#### Hi! Today you will play with NaN values and other data manipulation methods. Let us know if you need help, or if you would like more challenges :)

#### The dataset that you will use is again recipes.csv. You can either use your own or the one in the subfolder "data". Do not forget to import the necessary libraries!

In [1]:
import pandas as pd
import numpy as np

In [2]:
recipes = pd.read_csv('data/recipes.csv', index_col=0)
recipes.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 150 entries, 0 to 149
Data columns (total 78 columns):
 #   Column                              Non-Null Count  Dtype  
---  ------                              --------------  -----  
 0   vegetarian                          150 non-null    bool   
 1   glutenFree                          150 non-null    bool   
 2   dairyFree                           150 non-null    bool   
 3   veryHealthy                         150 non-null    bool   
 4   healthScore                         150 non-null    float64
 5   aggregateLikes                      150 non-null    int64  
 6   id                                  150 non-null    int64  
 7   title                               150 non-null    object 
 8   pricePerServing                     150 non-null    float64
 9   readyInMinutes                      150 non-null    int64  
 10  servings                            150 non-null    int64  
 11  sourceUrl                           150 non-n

#### Print the names of Columns that have null values

In [18]:
### Columns with null values
### Check out the isna() method here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isna.html
null_values_mask = recipes.isna().any()  # one boolean per column: True if the column has any NaN
columns_with_null = recipes.columns[null_values_mask]
print(columns_with_null)
print(f'Number of columns with null values: {len(columns_with_null)}')

Index(['FiberAmount', 'FiberpercentOfDailyNeed', 'VitaminB6Amount',
       'VitaminB6percentOfDailyNeed', 'ManganeseAmount',
       'ManganesepercentOfDailyNeed', 'SeleniumAmount',
       'SeleniumpercentOfDailyNeed', 'PotassiumAmount',
       'PotassiumpercentOfDailyNeed', 'VitaminB2Amount',
       'VitaminB2percentOfDailyNeed', 'VitaminCAmount',
       'VitaminCpercentOfDailyNeed', 'PhosphorusAmount',
       'PhosphoruspercentOfDailyNeed', 'FolateAmount',
       'FolatepercentOfDailyNeed', 'VitaminB5Amount',
       'VitaminB5percentOfDailyNeed', 'MagnesiumAmount',
       'MagnesiumpercentOfDailyNeed', 'IronAmount', 'IronpercentOfDailyNeed',
       'VitaminKAmount', 'VitaminKpercentOfDailyNeed', 'CopperAmount',
       'CopperpercentOfDailyNeed', 'VitaminAAmount',
       'VitaminApercentOfDailyNeed', 'VitaminB12Amount',
       'VitaminB12percentOfDailyNeed', 'VitaminDAmount',
       'VitaminDpercentOfDailyNeed', 'ZincAmount', 'ZincpercentOfDailyNeed',
       'VitaminEAmount', 'VitaminE
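
Beyond the column names, it can help to see how many values are missing in each column. The following is a minimal sketch on a small made-up DataFrame (`toy` and its values are assumptions for illustration, not the real recipes data); the same calls should carry over to `recipes`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the recipes data.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "FiberAmount": [1.2, np.nan, 0.4],
    "ZincAmount": [np.nan, np.nan, 0.9],
})

# Count missing values per column and keep only the columns that have any,
# sorted so the emptiest column comes first.
null_counts = toy.isna().sum()
null_counts = null_counts[null_counts > 0].sort_values(ascending=False)
print(null_counts)

# Names only: index the columns with the boolean mask from isna().any().
columns_with_null = toy.columns[toy.isna().any()].tolist()
print(columns_with_null)
```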

#### Print the recipes with the highest value in the aggregateLikes column

In [19]:
most_likes_mask = recipes['aggregateLikes'] == recipes['aggregateLikes'].max()
recipes[most_likes_mask].title

50    Slow Cooker Spicy Hot Wings
Name: title, dtype: object

#### Handle the missing values of “calcium percent of Daily need”. Sort the resulting DataFrame by id column.

In [20]:
### A quick check shows that CalciumpercentOfDailyNeed is missing in exactly the rows where CalciumAmount is missing,
### so the percentage cannot be derived from the amount. Since there is nothing to infer it from, we fill both columns with 0.
print(recipes['CalciumpercentOfDailyNeed'].isna().equals(recipes['CalciumAmount'].isna()))

recipes[['CalciumpercentOfDailyNeed', 'CalciumAmount']] = recipes[['CalciumpercentOfDailyNeed','CalciumAmount']].fillna(value=0)

print(recipes['CalciumpercentOfDailyNeed'].isna().any()) ### No more missing values

True
False
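
The task also asks for the result to be sorted by the id column, which the cell above does not do. A minimal sketch of that step follows, again on made-up data (`toy` and its values are assumptions); on the real data the equivalent call would presumably be `recipes.sort_values('id')`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the recipes data, deliberately out of id order.
toy = pd.DataFrame({
    "id": [663229, 633998, 635085],
    "CalciumAmount": [26.79, np.nan, 100.80],
    "CalciumpercentOfDailyNeed": [2.68, np.nan, 10.08],
})

# Fill the missing calcium values with 0, as in the cell above.
calcium_cols = ["CalciumAmount", "CalciumpercentOfDailyNeed"]
toy[calcium_cols] = toy[calcium_cols].fillna(0)

# sort_values returns a new DataFrame; the original row labels are kept.
sorted_toy = toy.sort_values("id")
print(sorted_toy)
```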


#### Delete the columns that have null values == 141

In [22]:
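### Note: thresh=141 keeps only the columns with at least 141 non-null values, which is not the same as
### dropping the columns with exactly 141 null values. The result is not assigned, so `recipes` itself is unchanged.
recipes.dropna(thresh=141, axis=1)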

Unnamed: 0,vegetarian,glutenFree,dairyFree,veryHealthy,healthScore,aggregateLikes,id,title,pricePerServing,readyInMinutes,...,MagnesiumAmount,MagnesiumpercentOfDailyNeed,IronAmount,IronpercentOfDailyNeed,CopperAmount,CopperpercentOfDailyNeed,VitaminB1Amount,VitaminB1percentOfDailyNeed,CalciumAmount,CalciumpercentOfDailyNeed
0,True,True,True,False,1.0,3,633998,Banana Blueberry Pancakes,55.79,45,...,19.91,4.98,0.75,4.14,0.08,3.80,0.03,2.26,18.11,1.81
1,True,True,False,False,0.0,1,634426,Basil and Orange Confit Compound Butter,15.07,45,...,,,,,,,,,0.00,0.00
2,False,False,False,False,3.0,1,635085,Black Bottom Banana Bars,78.85,45,...,43.70,10.93,3.14,17.46,0.26,13.10,0.32,21.09,100.80,10.08
3,True,False,True,False,2.0,1,663229,The Best Raw Chocolate Chip Cookies,54.64,45,...,8.51,2.13,1.21,6.71,0.03,1.65,0.10,6.47,26.79,2.68
4,False,True,False,False,31.0,3,651437,Mediterranean Spinach Artichoke Dip,115.26,45,...,42.27,10.57,1.48,8.23,0.13,6.39,0.07,4.64,162.84,16.28
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
145,False,True,True,False,18.0,13,638315,"Chicken Sausage, White Bean and Cabbage Soup",196.23,45,...,53.40,13.35,3.34,18.53,0.21,10.62,0.15,10.23,102.67,10.27
146,True,False,False,False,1.0,2,656248,Pinot Noir Brownies,65.76,45,...,33.91,8.48,1.13,6.28,0.28,14.15,0.03,1.68,19.82,1.98
147,True,True,True,True,93.0,2,637297,Cauliflower Chickpea Stew,139.28,45,...,162.82,40.71,4.85,26.92,0.68,34.15,0.45,29.70,141.16,14.12
148,True,True,False,False,5.0,1,659412,Sautéed Balsamic Green Beans With Cherry Tomatoes,101.81,45,...,35.26,8.82,1.65,9.18,0.13,6.71,0.12,7.85,55.64,5.56
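
To drop the columns with exactly a given number of nulls, one option is to count the nulls per column and select on that count. This is a sketch on made-up data (`toy`, `n_nulls` and the column names are assumptions for illustration); with the real data, `n_nulls` would be 141. It also shows how `dropna(thresh=...)` behaves differently.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the recipes data.
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 3 nulls
    "some_missing": [np.nan, 2.0, 3.0, 4.0],          # 1 null
})

n_nulls = 3  # plays the role of 141 in the exercise

# Keep every column whose null count is NOT exactly n_nulls.
null_counts = toy.isna().sum()
exact = toy.loc[:, null_counts != n_nulls]
print(list(exact.columns))

# For comparison: thresh=4 keeps only columns with at least 4 non-null values,
# so both columns containing any null are dropped here.
by_thresh = toy.dropna(thresh=4, axis=1)
print(list(by_thresh.columns))
```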


#### Print the titles of the recipes that are vegetarian == True and glutenFree == True

In [23]:
mask = recipes['vegetarian'] == True
mask_2 = recipes['glutenFree'] == True
recipes[mask & mask_2].title

0                              Banana Blueberry Pancakes
1                Basil and Orange Confit Compound Butter
14                            Spicy Carrot Amaranth Soup
15                                            Nutty Rice
20                           Fire Roasted Tomato Chutney
22                                 Butternut Squash Soup
24                         Green Beans with Garlic Chips
27                                Easy Eggplant Parmesan
29                                            Kappa Maki
30                              Vegan Chana Masala Curry
31     Grilled Peach Melba with Vanilla Bean Frozen Y...
33                             Detox Orange Carrot Juice
37                      Roasted Asparagus with Egg Salad
39                   Three Ingredient Frozen Pina Colada
42                               Peach Coconut Ice Cream
57     Sautéed Balsamic Green Beans With Cherry Tomatoes
61                              Ginger Melon Side Salad 
69                      Chicken

#### How many vegan recipes are there (Vegan = Vegetarian and dairy free)?

In [25]:
mask = recipes['vegetarian'] == True
mask_2 = recipes['dairyFree'] == True
vegan_recipes = recipes[mask & mask_2]
print(len(vegan_recipes))

29


#### Compare the average amount of Vitamin B12 for the vegan and non-vegan recipes. How reliable are the results?

In [27]:
print(vegan_recipes['VitaminB12Amount'].mean())
print(f'Number of vegan recipes {len(vegan_recipes)}')
print(f"Null values in VitaminB12Amount in vegan recipes {vegan_recipes['VitaminB12Amount'].isna().sum()}")

0.24333333333333332
Number of vegan recipes 29
Null values in VitaminB12Amount in vegan recipes 26


In [28]:
### Note: mask_2 uses glutenFree rather than dairyFree, so this selects recipes that are non-vegetarian or not gluten-free (108 rows).
### It is not the exact complement of the 29 vegan recipes, which would contain 150 - 29 = 121 rows.
mask = recipes['vegetarian'] == False
mask_2 = recipes['glutenFree'] == False
none_vegan_recipes = recipes[mask | mask_2]
print(none_vegan_recipes['VitaminB12Amount'].mean())
print(f'Number of non-vegan recipes {len(none_vegan_recipes)}')
print(f"Null values in VitaminB12Amount in non-vegan recipes {none_vegan_recipes['VitaminB12Amount'].isna().sum()}")

1.0472619047619045
Number of non-vegan recipes 108
Null values in VitaminB12Amount in non-vegan recipes 24


In [29]:
### The results are not very reliable because of the share of null values in each group. 89.66% of the vegan recipes (26 of 29) have no VitaminB12Amount, so their mean rests on only 3 values, while 22.22% of the non-vegan group (24 of 108) is null.
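
The comparison above could also be set up so that the non-vegan group is the exact complement of the vegan one, by negating the vegan mask with `~`. The sketch below uses made-up data (`toy` and its values are assumptions) and reports how many non-null values each mean is based on.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the recipes data.
toy = pd.DataFrame({
    "vegetarian": [True, True, False, False, True],
    "dairyFree": [True, False, True, False, True],
    "VitaminB12Amount": [np.nan, 0.5, 1.5, 2.5, 0.1],
})

# Vegan = vegetarian AND dairy free; everything else is non-vegan.
is_vegan = toy["vegetarian"] & toy["dairyFree"]
vegan = toy[is_vegan]
non_vegan = toy[~is_vegan]

# mean() skips NaN by default, so also report how many values it used.
for name, subset in [("vegan", vegan), ("non-vegan", non_vegan)]:
    col = subset["VitaminB12Amount"]
    print(f"{name}: mean={col.mean():.2f}, rows={len(col)}, non-null={col.count()}")
```

Because the two masks are complements, the two groups always add up to the full DataFrame.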

#### Compare the average health Score for both types of recipes (vegan vs non-vegan).

In [30]:
print(vegan_recipes['healthScore'].mean())
print(f'Number of vegan recipes {len(vegan_recipes)}')
print(f"Null values in healthScore in vegan recipes {vegan_recipes['healthScore'].isna().sum()}")

27.724137931034484
Number of vegan recipes 29
Null values in healthScore in vegan recipes 0


In [31]:
print(none_vegan_recipes['healthScore'].mean())
print(f'Number of non-vegan recipes {len(none_vegan_recipes)}')
print(f"Null values in healthScore of non-vegan recipes {none_vegan_recipes['healthScore'].isna().sum()}")

19.73148148148148
Number of non-vegan recipes 108
Null values in healthScore of non-vegan recipes 0


In [32]:
### We can see that the vegan recipes have a higher average health score (27.72) than the non-vegan recipes (19.73).

#### Create a new column: "VeganWeek" where you decrease the price of vegan products by 10%, but only if they already cost more than 100.

In [33]:
mask = recipes['vegetarian'] == True
mask_2 = recipes['dairyFree'] == True
mask_3 = recipes['pricePerServing'] > 100
recipes['VeganWeek'] = recipes[mask & mask_2 & mask_3]['pricePerServing'] * 0.9 ## Update price
recipes['VeganWeek'] = recipes['VeganWeek'].fillna(value=recipes['pricePerServing']) ## Fill null values with original price (undiscounted)
recipes['VeganWeek']

0       55.790
1       15.070
2       78.850
3       54.640
4      115.260
        ...   
145    196.230
146     65.760
147    125.352
148    101.810
149    105.516
Name: VeganWeek, Length: 150, dtype: float64
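
The two-step approach above (compute the discounted prices, then fill the gaps) could also be written as a single conditional replacement with `Series.mask`, which replaces values where a condition is True and keeps the rest. A minimal sketch on made-up data (`toy` and its values are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for the recipes data.
toy = pd.DataFrame({
    "vegetarian": [True, True, False, True],
    "dairyFree": [True, True, True, False],
    "pricePerServing": [50.0, 120.0, 200.0, 150.0],
})

# Only vegan recipes that cost more than 100 get the discount.
discounted = toy["vegetarian"] & toy["dairyFree"] & (toy["pricePerServing"] > 100)

# Where the condition holds, use 90% of the price; elsewhere keep the price.
toy["VeganWeek"] = toy["pricePerServing"].mask(discounted, toy["pricePerServing"] * 0.9)
print(toy[["pricePerServing", "VeganWeek"]])
```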

#### Among the non-vegan recipes, find the one that takes the most time to prepare. What is it called?

In [34]:
most_time_consuming_recipe = none_vegan_recipes.loc[none_vegan_recipes['readyInMinutes'].idxmax()]

print(f'{most_time_consuming_recipe.title} recipe takes the most time at {most_time_consuming_recipe.readyInMinutes} minutes')


Oeufs En Meurette recipe takes the most time at 328 minutes
