# Data Manipulation in Python

### Loading Packages

In this set of challenge problems, we will practice common pandas DataFrame manipulations.

First, let's load the McDonald's India Menu file. This dataset was downloaded from [Kaggle](https://www.kaggle.com).

Kaggle is a free website that hosts thousands of free datasets you can download to practice your skills. It also runs competitions throughout the year built around real-world data problems provided by outside businesses. The competitions are a great way to find a problem that interests you and practice your Python and data science skills.

In [2]:
# Import pandas
import pandas as pd

# Define the file path & file name
file_name = file_path = '../../2022-fall-python-tutorial/data/India_Menu.csv'

# Load the menu to menu_df
menu_df = pd.read_csv(file_name)

# Print the first 3 rows
menu_df.head(3)

| | Menu Category | Menu Items | Per Serve Size | Energy (kCal) | Protein (g) | Total fat (g) | Sat Fat (g) | Trans fat (g) | Cholesterols (mg) | Total carbohydrate (g) | Total Sugars (g) | Added Sugars (g) | Sodium (mg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Regular Menu | McVeggie™ Burger | 168 g | 402.05 | 10.24 | 13.83 | 5.34 | 0.16 | 2.49 | 56.54 | 7.9 | 4.49 | 706.13 |
| 1 | Regular Menu | McAloo Tikki Burger® | 146 g | 339.52 | 8.5 | 11.31 | 4.27 | 0.2 | 1.47 | 50.27 | 7.05 | 4.07 | 545.34 |
| 2 | Regular Menu | McSpicy™ Paneer Burger | 199 g | 652.76 | 20.29 | 39.45 | 17.12 | 0.18 | 21.85 | 52.33 | 8.35 | 5.27 | 1074.58 |


#### CP #1

Select only the following columns & re-define menu_df:
* Menu Category
* Menu Items
* Energy (kCal)
* Protein (g)
* Total fat (g)
* Total carbohydrate (g)
* Total Sugars (g)
* Added Sugars (g)

In [8]:
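# Keep only the required columns (.copy() gives an independent DataFrame for later in-place edits)
keep_cols = ["Menu Category", "Menu Items", "Energy (kCal)", "Protein (g)",
             "Total fat (g)", "Total carbohydrate (g)", "Total Sugars (g)", "Added Sugars (g)"]
menu_df = menu_df[keep_cols].copy()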

# Print first 5 rows
menu_df.head()

| | Menu Category | Menu Items | Energy (kCal) | Protein (g) | Total fat (g) | Total carbohydrate (g) | Total Sugars (g) | Added Sugars (g) |
|---|---|---|---|---|---|---|---|---|
| 0 | Regular Menu | McVeggie™ Burger | 402.05 | 10.24 | 13.83 | 56.54 | 7.9 | 4.49 |
| 1 | Regular Menu | McAloo Tikki Burger® | 339.52 | 8.5 | 11.31 | 50.27 | 7.05 | 4.07 |
| 2 | Regular Menu | McSpicy™ Paneer Burger | 652.76 | 20.29 | 39.45 | 52.33 | 8.35 | 5.27 |
| 3 | Regular Menu | Spicy Paneer Wrap | 674.68 | 20.96 | 39.1 | 59.27 | 3.5 | 1.08 |
| 4 | Regular Menu | American Veg Burger | 512.17 | 15.3 | 23.45 | 56.96 | 7.85 | 4.76 |
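As a small illustration of the selection step, here is a sketch on a toy DataFrame. The frame `toy_df` and its values are made up for this example and are not taken from India_Menu.csv. It shows that a list of labels inside the brackets returns a DataFrame, while a single label returns a Series.

```python
import pandas as pd

# Toy stand-in for the menu data (hypothetical values)
toy_df = pd.DataFrame({
    "item": ["burger", "wrap"],
    "calories": [400.0, 650.0],
    "sodium": [700.0, 1000.0],
})

# A list of labels inside the brackets returns a DataFrame with only those columns
subset_df = toy_df[["item", "calories"]]

# A single label returns a Series instead
calories_series = toy_df["calories"]

print(list(subset_df.columns))
print(type(calories_series).__name__)
```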


#### CP #2

Drop the 'Total Sugars (g)' column in place:

*Hint - make sure you specify whether you're dropping a row or a column*

In [9]:
# Drop the 'Total Sugars (g)' column in place (axis=1 means column; axis=0 would mean row)
menu_df.drop("Total Sugars (g)", axis=1, inplace=True)

# Print first 5 rows
menu_df.head()

| | Menu Category | Menu Items | Energy (kCal) | Protein (g) | Total fat (g) | Total carbohydrate (g) | Added Sugars (g) |
|---|---|---|---|---|---|---|---|
| 0 | Regular Menu | McVeggie™ Burger | 402.05 | 10.24 | 13.83 | 56.54 | 4.49 |
| 1 | Regular Menu | McAloo Tikki Burger® | 339.52 | 8.5 | 11.31 | 50.27 | 4.07 |
| 2 | Regular Menu | McSpicy™ Paneer Burger | 652.76 | 20.29 | 39.45 | 52.33 | 5.27 |
| 3 | Regular Menu | Spicy Paneer Wrap | 674.68 | 20.96 | 39.1 | 59.27 | 1.08 |
| 4 | Regular Menu | American Veg Burger | 512.17 | 15.3 | 23.45 | 56.96 | 4.76 |
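To make the row-versus-column hint concrete, here is a sketch on a made-up `toy_df` (hypothetical names and values). It compares `axis=1` with the equivalent `columns=` keyword and shows that `inplace=True` changes the original object and returns `None`.

```python
import pandas as pd

# Toy frame with hypothetical values
toy_df = pd.DataFrame({
    "item": ["burger", "wrap"],
    "sugar": [7.9, 3.5],
    "fat": [13.8, 39.1],
})

# axis=1 tells drop() the label is a column; the default axis=0 looks for a row label
dropped_by_axis = toy_df.drop("sugar", axis=1)

# columns= is an equivalent, more explicit spelling
dropped_by_keyword = toy_df.drop(columns="sugar")

# inplace=True modifies toy_df itself and returns None
result = toy_df.drop("fat", axis=1, inplace=True)

print(list(toy_df.columns))
```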


#### CP #3

Rename the columns inplace as outlined below:
* Menu Category = menu_category
* Menu Items = menu_item
* Energy (kCal) = calories
* Protein (g) = protein
* Total fat (g) = fat
* Total carbohydrate (g) = carbs
* Added Sugars (g) = added_sugar

*Hint - make sure you specify whether you're renaming a row or a column*

In [17]:
# Rename columns in place; columns= applies the mapping to column labels, not row labels
menu_df.rename(
    columns={
        "Menu Category": "menu_category",
        "Menu Items": "menu_item",
        "Energy (kCal)": "calories",
        "Protein (g)": "protein",
        "Total fat (g)": "fat",
        "Total carbohydrate (g)": "carbs",
        "Added Sugars (g)": "added_sugar"
    },
    inplace=True,
)
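The hint matters because `rename()` applies a positionally passed mapping to the row index by default. A minimal sketch on a made-up one-column frame (hypothetical values) shows the difference:

```python
import pandas as pd

# Toy frame with a single column (hypothetical values)
toy_df = pd.DataFrame({"Protein (g)": [10.2, 8.5]})

# Passed positionally, the mapping targets row labels, so the column is untouched
unchanged = toy_df.rename({"Protein (g)": "protein"})

# columns= targets column labels; without inplace=True a new DataFrame is returned
renamed = toy_df.rename(columns={"Protein (g)": "protein"})

print(list(unchanged.columns))
print(list(renamed.columns))
```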

#### CP #4

Sort menu_df by added_sugar from highest to lowest & reset the index:

In [20]:
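# Sort menu_df by added_sugar, highest first, then reset the index
# (reset_index() without drop=True keeps the old row labels in an "index" column, as in the output below)
menu_df = menu_df.sort_values("added_sugar", ascending=False).reset_index()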

# Print first 3 rows
menu_df.head(3)

| | index | menu_category | menu_item | calories | protein | fat | carbs | added_sugar |
|---|---|---|---|---|---|---|---|---|
| 0 | 120 | Beverages Menu | Large Fanta Oragne | 256.88 | 0.0 | 0.0 | 64.22 | 64.22 |
| 1 | 126 | Beverages Menu | Large Sprite | 237.12 | 0.0 | 0.0 | 59.28 | 59.28 |
| 2 | 117 | Beverages Menu | Large Coca-Cola | 217.36 | 0.0 | 0.0 | 54.34 | 54.34 |
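The `index` column in the output above holds the row labels from before the sort. The sketch below, on a made-up `toy_df` (hypothetical values), contrasts `reset_index()` with `reset_index(drop=True)`, which discards the old labels instead of keeping them.

```python
import pandas as pd

# Toy frame with hypothetical values
toy_df = pd.DataFrame({
    "item": ["tea", "cola", "shake"],
    "added_sugar": [5.0, 54.0, 30.0],
})

# ascending=False sorts from highest to lowest
sorted_df = toy_df.sort_values("added_sugar", ascending=False)

# reset_index() keeps the old row labels in a new column named "index"
kept = sorted_df.reset_index()

# drop=True discards the old labels entirely
dropped = sorted_df.reset_index(drop=True)

print(kept)
print(dropped)
```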


#### CP #5

Get a list of distinct menu_category values & define it as menu_cat_list:

In [21]:
# Create list of unique menu categories (tolist() converts the Series to a plain Python list)
menu_cat_list = menu_df["menu_category"].drop_duplicates().tolist()

# Print our category list
menu_cat_list

['Beverages Menu',
 'McCafe Menu',
 'Desserts Menu',
 'Breakfast Menu',
 'Gourmet Menu',
 'Regular Menu',
 'Condiments Menu']
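Note that `drop_duplicates()` on its own returns a pandas Series, not a list. The sketch below uses a made-up column (hypothetical values) to show two ways of getting a plain Python list of distinct values in first-seen order.

```python
import pandas as pd

# Toy frame with hypothetical category values
toy_df = pd.DataFrame({"menu_category": ["Beverages", "Desserts", "Beverages"]})

# drop_duplicates() returns a Series that keeps the original row labels
as_series = toy_df["menu_category"].drop_duplicates()

# tolist() turns that Series into a plain list
as_list = as_series.tolist()

# unique() wrapped in list() is an alternative that gives the same values
also_list = list(toy_df["menu_category"].unique())

print(as_list)
```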

#### CP #6

Find the positions of 'Regular Menu' & 'Breakfast Menu' in menu_cat_list:

In [26]:
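# Find index of 'Breakfast Menu' (menu_cat_list is a plain list, so list.index() applies)
break_menu_ind = menu_cat_list.index("Breakfast Menu")

# Find index of 'Regular Menu'
reg_menu_ind = menu_cat_list.index("Regular Menu")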

# Define list to filter menu_df
filter_list = [menu_cat_list[break_menu_ind], menu_cat_list[reg_menu_ind]]

# Print filter_list
print(filter_list)
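For reference, a sketch of position lookup on a plain Python list. The list `categories` here is a shortened, hypothetical stand-in for `menu_cat_list`. `list.index()` returns the position of the first match and raises `ValueError` if the value is absent.

```python
# Hypothetical, shortened stand-in for menu_cat_list
categories = ["Beverages Menu", "Breakfast Menu", "Regular Menu"]

# list.index() returns the zero-based position of the first matching element
breakfast_pos = categories.index("Breakfast Menu")
regular_pos = categories.index("Regular Menu")

# Use the positions to pull the values back out
picked = [categories[breakfast_pos], categories[regular_pos]]

print(picked)
```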

#### CP #7

Filter menu_df for only items in the breakfast & regular menu:

In [33]:
# Filter menu_df using filter_list (isin() builds a True/False mask, one value per row)
menu_df = menu_df[menu_df["menu_category"].isin(filter_list)].reset_index(drop=True)

# Print first 3 rows
menu_df.head(3)

| | menu_category | menu_item | calories | protein | fat | carbs | added_sugar |
|---|---|---|---|---|---|---|---|
| 0 | Breakfast Menu | Hot Cake with maple syrup | 432.98 | 8.6 | 14.02 | 68.01 | 13.5 |
| 1 | Regular Menu | Veg Maharaja Mac | 832.67 | 24.17 | 37.94 | 93.84 | 6.92 |
| 2 | Regular Menu | Chicken Maharaja Mac | 689.12 | 34.0 | 36.69 | 55.39 | 6.14 |
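Filtering by a list of allowed values is usually done with a boolean mask. The sketch below uses a made-up `toy_df` and `keep` list (hypothetical names and values) to show the mask that `isin()` produces and how it selects rows.

```python
import pandas as pd

# Toy frame with hypothetical items
toy_df = pd.DataFrame({
    "menu_category": ["Breakfast Menu", "Beverages Menu", "Regular Menu"],
    "menu_item": ["hot cake", "cola", "burger"],
})

keep = ["Breakfast Menu", "Regular Menu"]

# isin() returns one True/False value per row
mask = toy_df["menu_category"].isin(keep)

# Indexing with the mask keeps only the True rows; drop=True renumbers them from 0
filtered = toy_df[mask].reset_index(drop=True)

print(filtered)
```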


#### CP #8

Add a column calculating how many calories come from protein:

*Note - 4 calories = 1 gram of protein*
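One possible approach, sketched on a two-row toy frame that reuses the protein values shown earlier for the first two burgers. The column name `protein_calories` and the constant name are assumptions made for this example, not names from the original notebook.

```python
import pandas as pd

# Two rows with protein values copied from the menu output shown earlier
toy_df = pd.DataFrame({
    "menu_item": ["McVeggie™ Burger", "McAloo Tikki Burger®"],
    "protein": [10.24, 8.5],
})

# Conversion factor from the note: 1 gram of protein = 4 calories
CALORIES_PER_GRAM_PROTEIN = 4

# Multiplying a column by a scalar applies the operation to every row
# ("protein_calories" is a hypothetical column name)
toy_df["protein_calories"] = toy_df["protein"] * CALORIES_PER_GRAM_PROTEIN

print(toy_df)
```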