# Appending Data
First, import the necessary packages and load `winequality-red.csv` and `winequality-white.csv`.

<p style="font-family: Arial; font-size:2em;color:gold; text-align:center;">---->Abdalla Nassar<---- </p>

In [1]:
# import numpy and pandas
import numpy as np
import pandas as pd

# load red and white wine datasets
df_red = pd.read_csv("winequality-red.csv", sep=";")
df_white = pd.read_csv("winequality-white.csv", sep=";")

## Create Color Columns
Create two arrays, each as long as the number of rows in the red and white dataframes, that repeat the value “red” or “white.” NumPy offers a really easy way to do this. Here’s the documentation for [NumPy’s repeat](https://docs.scipy.org/doc/numpy/reference/generated/numpy.repeat.html) function. Take a look and try it yourself.
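
As a quick illustration of how `np.repeat` behaves, here is a minimal sketch on toy inputs (the variable names `labels` and `pairs` are made up for this example and are not part of the exercise):

```python
import numpy as np

# A scalar is repeated the requested number of times
labels = np.repeat("red", 3)
print(labels)  # ['red' 'red' 'red']

# With an array input, each element is repeated in turn
pairs = np.repeat([1, 2], 2)
print(pairs)  # [1 1 2 2]
```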

In [2]:
# create color array for red dataframe
color_red = np.repeat("Red", len(df_red))
print("red color: {}".format(color_red))

# create color array for white dataframe
color_white = np.repeat("White", len(df_white))
print("White color: {}".format(color_white))


red color: ['Red' 'Red' 'Red' ... 'Red' 'Red' 'Red']
White color: ['White' 'White' 'White' ... 'White' 'White' 'White']


Add the arrays to the red and white dataframes by setting a new column called `Color` in each dataframe to the appropriate array.

In [3]:
df_red["Color"] = color_red
df_white["Color"] = color_white
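
As an aside, pandas can also fill a column from a single value, so the explicit array is not strictly required. A minimal sketch, assuming a small hypothetical frame `df_demo` in place of the real data:

```python
import pandas as pd

# Hypothetical miniature frame standing in for df_red
df_demo = pd.DataFrame({"quality": [5, 6, 5]})

# Assigning a scalar fills every row with the same value
df_demo["Color"] = "Red"
print(df_demo)
```

The array-based approach used in this notebook gives the same result; the scalar form is just shorter.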



Use `head()` on the white dataframe to confirm the change.

In [4]:
df_white.head(10)

Unnamed: 0,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality,Color
0,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6,White
1,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6,White
2,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6,White
3,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6,White
4,7.2,0.23,0.32,8.5,0.058,47.0,186.0,0.9956,3.19,0.4,9.9,6,White
5,8.1,0.28,0.4,6.9,0.05,30.0,97.0,0.9951,3.26,0.44,10.1,6,White
6,6.2,0.32,0.16,7.0,0.045,30.0,136.0,0.9949,3.18,0.47,9.6,6,White
7,7.0,0.27,0.36,20.7,0.045,45.0,170.0,1.001,3.0,0.45,8.8,6,White
8,6.3,0.3,0.34,1.6,0.049,14.0,132.0,0.994,3.3,0.49,9.5,6,White
9,8.1,0.22,0.43,1.5,0.044,28.0,129.0,0.9938,3.22,0.45,11.0,6,White


## Combine DataFrames with Append
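Check the documentation for [Pandas' append](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.append.html) function and see if you can use this to figure out how to combine the dataframes. Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use `pd.concat` instead.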

In [5]:
print(df_white.shape)
print(df_red.shape)

(4898, 13)
(1599, 13)


In [6]:
# append the dataframes (there are several ways to do this; can you use them all?)

# pd.concat stacks the rows of both frames (recommended)
df1 = pd.concat([df_white, df_red], axis=0)
df1

#-----------------------OR-------------------------

# DataFrame.append: deprecated in pandas 1.4, removed in pandas 2.0
#df2 = df_white.append(df_red)
#df2

#-----------------------OR-------------------------

# outer merge: joins on all shared columns; works here, but row order may change
#df3 = df_red.merge(df_white, how='outer')
#df3.head(10)

#-----------------------OR-------------------------

# bad way: join aligns on the index and puts the columns side by side instead of stacking rows
#df4 = df_red.join(df_white, how='left', lsuffix='_left', rsuffix='_right')
#df4

#-----------------------OR-------------------------

# appends only a single row and modifies df_white in place, so it would need a loop
#df_white.loc[len(df_white)] = df_red.iloc[0]
#df_white

#-----------------------Done-------------------------
# view dataframe to check for success
# there were more ways, but I got tired 😂


Unnamed: 0,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality,Color,total_sulfur-dioxide
0,7.0,0.270,0.36,20.7,0.045,45.0,170.0,1.00100,3.00,0.45,8.8,6,White,
1,6.3,0.300,0.34,1.6,0.049,14.0,132.0,0.99400,3.30,0.49,9.5,6,White,
2,8.1,0.280,0.40,6.9,0.050,30.0,97.0,0.99510,3.26,0.44,10.1,6,White,
3,7.2,0.230,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6,White,
4,7.2,0.230,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6,White,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1594,6.2,0.600,0.08,2.0,0.090,32.0,,0.99490,3.45,0.58,10.5,5,Red,44.0
1595,5.9,0.550,0.10,2.2,0.062,39.0,,0.99512,3.52,0.76,11.2,6,Red,51.0
1596,6.3,0.510,0.13,2.3,0.076,29.0,,0.99574,3.42,0.75,11.0,6,Red,40.0
1597,5.9,0.645,0.12,2.0,0.075,32.0,,0.99547,3.57,0.71,10.2,5,Red,44.0
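
Note that the index in the output above restarts at 0 for the red rows, because concatenation keeps each frame's original index labels. The sketch below, which assumes two tiny hypothetical frames `white` and `red` rather than the real datasets, shows how `ignore_index=True` renumbers the rows if that is preferred:

```python
import pandas as pd

# Hypothetical miniature frames standing in for df_white and df_red
white = pd.DataFrame({"quality": [6, 6], "Color": ["White", "White"]})
red = pd.DataFrame({"quality": [5], "Color": ["Red"]})

# Default: each frame keeps its own index labels, so 0 appears twice
stacked = pd.concat([white, red], axis=0)
print(stacked.index.tolist())  # [0, 1, 0]

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([white, red], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```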


## Scroll to the right and you will find a column filled with NaN values. Go watch the next video, then come back here to solve the problem (this is necessary for the next tasks!)

In [7]:
# fix column names (do not use the usual solution we used earlier in the previous tasks)
df_red.rename(columns={'total_sulfur-dioxide': 'total_sulfur_dioxide'}, inplace=True)

# merge the two datasets again after fixing the issue (keep the color column)
df1 = pd.concat([df_white, df_red], axis=0)
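
To see why the rename matters: `pd.concat` aligns frames by column name, so two names that differ by a single character are treated as separate columns and the gaps are filled with NaN. A minimal sketch, assuming two hypothetical one-column frames in place of the real data:

```python
import pandas as pd

# Hypothetical miniature frames; the column names differ only by "-" vs "_"
white = pd.DataFrame({"total_sulfur_dioxide": [170.0, 132.0]})
red = pd.DataFrame({"total_sulfur-dioxide": [44.0]})

# concat aligns on column names, so the mismatch yields two columns with NaN gaps
broken = pd.concat([white, red], axis=0)
print(broken.shape)  # (3, 2)

# After renaming, the columns line up and no NaN values are introduced
red = red.rename(columns={"total_sulfur-dioxide": "total_sulfur_dioxide"})
fixed = pd.concat([white, red], axis=0)
print(fixed.shape)  # (3, 1)
print(fixed.isna().sum().sum())  # 0
```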


In [8]:
# Confirm your changes
df1


Unnamed: 0,fixed_acidity,volatile_acidity,citric_acid,residual_sugar,chlorides,free_sulfur_dioxide,total_sulfur_dioxide,density,pH,sulphates,alcohol,quality,Color
0,7.0,0.270,0.36,20.7,0.045,45.0,170.0,1.00100,3.00,0.45,8.8,6,White
1,6.3,0.300,0.34,1.6,0.049,14.0,132.0,0.99400,3.30,0.49,9.5,6,White
2,8.1,0.280,0.40,6.9,0.050,30.0,97.0,0.99510,3.26,0.44,10.1,6,White
3,7.2,0.230,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6,White
4,7.2,0.230,0.32,8.5,0.058,47.0,186.0,0.99560,3.19,0.40,9.9,6,White
...,...,...,...,...,...,...,...,...,...,...,...,...,...
1594,6.2,0.600,0.08,2.0,0.090,32.0,44.0,0.99490,3.45,0.58,10.5,5,Red
1595,5.9,0.550,0.10,2.2,0.062,39.0,51.0,0.99512,3.52,0.76,11.2,6,Red
1596,6.3,0.510,0.13,2.3,0.076,29.0,40.0,0.99574,3.42,0.75,11.0,6,Red
1597,5.9,0.645,0.12,2.0,0.075,32.0,44.0,0.99547,3.57,0.71,10.2,5,Red


## Save Combined Dataset
Save your newly combined dataframe as `winequality_edited.csv`. Remember, set `index=False` to avoid saving with an unnamed column!

In [9]:
# save the dataframe
df1.to_csv('winequality_edited.csv', index=False)
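
The effect of `index=False` can be checked without touching the disk. The sketch below assumes a tiny hypothetical frame `df_demo` and writes to an in-memory string instead of a file:

```python
import io
import pandas as pd

# Hypothetical miniature frame standing in for the combined dataframe
df_demo = pd.DataFrame({"quality": [6, 5], "Color": ["White", "Red"]})

# Without index=False the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df_demo.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'quality', 'Color']

# With index=False only the data columns are written
without_index = pd.read_csv(io.StringIO(df_demo.to_csv(index=False)))
print(without_index.columns.tolist())  # ['quality', 'Color']
```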

In [10]:
# How many samples are there in the newly saved dataframe?___________
print("number of samples winequality: {} ".format(df1.shape[0]))

print("*"*60)
print("number of samples winequality: {} ".format(len(df1)))




number of samples winequality: 6497 
************************************************************
number of samples winequality: 6497 


In [11]:
# How many columns are there?___________ 

print("number of columns winequality: {} ".format(len(df1.columns)))

print("*"*60)
print("number of columns winequality: {} ".format(df1.shape[1]))


number of columns winequality: 13 
************************************************************
number of columns winequality: 13 
