# Reading Data and Simple Manipulations

Data scientists use the `pandas` library constantly.  We tend to think of it as
our main workhorse for tabular data: reading it in, inspecting it, and summarizing it.

In Python, we use the `import` statement to give us access to libraries.  The following line
imports the `pandas` library and lets us refer to it as `pd` (data scientists are lazy and type poorly):

In [1]:
import pandas as pd

## Reading data

Next, let's read a CSV file into a DataFrame (which is basically a table).
This particular file contains nutritional information from McDonald's.  It was released by McDonald's in 2015 and is
available via https://www.kaggle.com/mcdonalds/nutrition-facts

In [2]:
mcdonalds = pd.read_csv("menu.csv")
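
If `menu.csv` isn't at hand, the same call can be tried on a small in-memory CSV. This is only a sketch: the three rows and three columns are an excerpt of the real file, and the names `sample_csv` and `sample` are made up for this illustration.

```python
import io

import pandas as pd

# A tiny excerpt of the menu data (three of the 24 columns, three rows).
sample_csv = io.StringIO(
    "Category,Item,Calories\n"
    "Breakfast,Egg McMuffin,300\n"
    "Breakfast,Egg White Delight,250\n"
    "Breakfast,Sausage McMuffin,370\n"
)

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(sample_csv)

print(sample.shape)  # (3, 3)
```

The first line of the CSV becomes the column names, and each remaining line becomes one row.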

Running the next block will "evaluate" the DataFrame.  In a notebook, that means the DataFrame is
displayed below the cell; long tables are truncated, with `...` standing in for the omitted rows and columns:

In [3]:
mcdonalds

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
0,Breakfast,Egg McMuffin,4.8 oz (136 g),300,120,13.0,20,5.0,25,0.0,...,31,10,4,17,3,17,10,0,25,15
1,Breakfast,Egg White Delight,4.8 oz (135 g),250,70,8.0,12,3.0,15,0.0,...,30,10,4,17,3,18,6,0,25,8
2,Breakfast,Sausage McMuffin,3.9 oz (111 g),370,200,23.0,35,8.0,42,0.0,...,29,10,4,17,2,14,8,0,25,10
3,Breakfast,Sausage McMuffin with Egg,5.7 oz (161 g),450,250,28.0,43,10.0,52,0.0,...,30,10,4,17,2,21,15,0,30,15
4,Breakfast,Sausage McMuffin with Egg Whites,5.7 oz (161 g),400,210,23.0,35,8.0,42,0.0,...,30,10,4,17,2,21,6,0,25,10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
255,Smoothies & Shakes,McFlurry with Oreo Cookies (Small),10.1 oz (285 g),510,150,17.0,26,9.0,44,0.5,...,80,27,1,4,64,12,15,0,40,8
256,Smoothies & Shakes,McFlurry with Oreo Cookies (Medium),13.4 oz (381 g),690,200,23.0,35,12.0,58,1.0,...,106,35,1,5,85,15,20,0,50,10
257,Smoothies & Shakes,McFlurry with Oreo Cookies (Snack),6.7 oz (190 g),340,100,11.0,17,6.0,29,0.0,...,53,18,1,2,43,8,10,0,25,6
258,Smoothies & Shakes,McFlurry with Reese's Peanut Butter Cups (Medium),14.2 oz (403 g),810,290,32.0,50,15.0,76,1.0,...,114,38,2,9,103,21,20,0,60,6


## Simple manipulations

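We can look at the different parts of a DataFrame: its column names (`columns`), its dimensions (`shape`),
its first and last rows (`head()` and `tail()`), and a single column selected by name: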


In [4]:
mcdonalds.columns

Index(['Category', 'Item', 'Serving Size', 'Calories', 'Calories from Fat',
       'Total Fat', 'Total Fat (% Daily Value)', 'Saturated Fat',
       'Saturated Fat (% Daily Value)', 'Trans Fat', 'Cholesterol',
       'Cholesterol (% Daily Value)', 'Sodium', 'Sodium (% Daily Value)',
       'Carbohydrates', 'Carbohydrates (% Daily Value)', 'Dietary Fiber',
       'Dietary Fiber (% Daily Value)', 'Sugars', 'Protein',
       'Vitamin A (% Daily Value)', 'Vitamin C (% Daily Value)',
       'Calcium (% Daily Value)', 'Iron (% Daily Value)'],
      dtype='object')

In [5]:
mcdonalds.shape

(260, 24)

In [6]:
mcdonalds.head()

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
0,Breakfast,Egg McMuffin,4.8 oz (136 g),300,120,13.0,20,5.0,25,0.0,...,31,10,4,17,3,17,10,0,25,15
1,Breakfast,Egg White Delight,4.8 oz (135 g),250,70,8.0,12,3.0,15,0.0,...,30,10,4,17,3,18,6,0,25,8
2,Breakfast,Sausage McMuffin,3.9 oz (111 g),370,200,23.0,35,8.0,42,0.0,...,29,10,4,17,2,14,8,0,25,10
3,Breakfast,Sausage McMuffin with Egg,5.7 oz (161 g),450,250,28.0,43,10.0,52,0.0,...,30,10,4,17,2,21,15,0,30,15
4,Breakfast,Sausage McMuffin with Egg Whites,5.7 oz (161 g),400,210,23.0,35,8.0,42,0.0,...,30,10,4,17,2,21,6,0,25,10


In [7]:
mcdonalds.tail()

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
255,Smoothies & Shakes,McFlurry with Oreo Cookies (Small),10.1 oz (285 g),510,150,17.0,26,9.0,44,0.5,...,80,27,1,4,64,12,15,0,40,8
256,Smoothies & Shakes,McFlurry with Oreo Cookies (Medium),13.4 oz (381 g),690,200,23.0,35,12.0,58,1.0,...,106,35,1,5,85,15,20,0,50,10
257,Smoothies & Shakes,McFlurry with Oreo Cookies (Snack),6.7 oz (190 g),340,100,11.0,17,6.0,29,0.0,...,53,18,1,2,43,8,10,0,25,6
258,Smoothies & Shakes,McFlurry with Reese's Peanut Butter Cups (Medium),14.2 oz (403 g),810,290,32.0,50,15.0,76,1.0,...,114,38,2,9,103,21,20,0,60,6
259,Smoothies & Shakes,McFlurry with Reese's Peanut Butter Cups (Snack),7.1 oz (202 g),410,150,16.0,25,8.0,38,0.0,...,57,19,1,5,51,10,10,0,30,4


In [8]:
mcdonalds['Calories']

0      300
1      250
2      370
3      450
4      400
      ... 
255    510
256    690
257    340
258    810
259    410
Name: Calories, Length: 260, dtype: int64
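
Selecting with a single column name, as above, gives back a Series. Passing a list of names instead gives back a smaller DataFrame. Here is a minimal sketch on a hypothetical three-row `sample` table (the values are taken from the first rows of the menu):

```python
import pandas as pd

# Hypothetical excerpt of the menu data, used only for illustration.
sample = pd.DataFrame({
    "Item": ["Egg McMuffin", "Egg White Delight", "Sausage McMuffin"],
    "Calories": [300, 250, 370],
    "Protein": [17, 18, 14],
})

calories = sample["Calories"]          # one name -> Series
subset = sample[["Item", "Calories"]]  # list of names -> DataFrame

print(type(calories).__name__)  # Series
print(type(subset).__name__)    # DataFrame
print(subset.shape)             # (3, 2)
```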

## Simple statistics

In [9]:
mcdonalds['Calories'].mean()

368.2692307692308

In [10]:
mcdonalds['Calories'].sum()

95750

In [11]:
mcdonalds['Calories'].count()

260
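
These three numbers are related: the mean is the sum divided by the count (95750 / 260 ≈ 368.27). A quick sketch of the same check, using just the five calorie values from `head()` as a stand-in for the full column:

```python
import pandas as pd

# Calories of the first five menu items, standing in for the full column.
calories = pd.Series([300, 250, 370, 450, 400], name="Calories")

total = calories.sum()   # 1770
n = calories.count()     # 5 (count() only counts non-missing values)

print(total / n)         # 354.0
print(calories.mean())   # 354.0
```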

In [12]:
mcdonalds.describe()

Unnamed: 0,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,Cholesterol,Cholesterol (% Daily Value),Sodium,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
count,260.0,260.0,260.0,260.0,260.0,260.0,260.0,260.0,260.0,260.0,...,260.0,260.0,260.0,260.0,260.0,260.0,260.0,260.0,260.0,260.0
mean,368.269231,127.096154,14.165385,21.815385,6.007692,29.965385,0.203846,54.942308,18.392308,495.75,...,47.346154,15.780769,1.630769,6.530769,29.423077,13.338462,13.426923,8.534615,20.973077,7.734615
std,240.269886,127.875914,14.205998,21.885199,5.321873,26.639209,0.429133,87.269257,29.091653,577.026323,...,28.252232,9.419544,1.567717,6.307057,28.679797,11.426146,24.366381,26.345542,17.019953,8.723263
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,210.0,20.0,2.375,3.75,1.0,4.75,0.0,5.0,2.0,107.5,...,30.0,10.0,0.0,0.0,5.75,4.0,2.0,0.0,6.0,0.0
50%,340.0,100.0,11.0,17.0,5.0,24.0,0.0,35.0,11.0,190.0,...,44.0,15.0,1.0,5.0,17.5,12.0,8.0,0.0,20.0,4.0
75%,500.0,200.0,22.25,35.0,10.0,48.0,0.0,65.0,21.25,865.0,...,60.0,20.0,3.0,10.0,48.0,19.0,15.0,4.0,30.0,15.0
max,1880.0,1060.0,118.0,182.0,20.0,102.0,2.5,575.0,192.0,3600.0,...,141.0,47.0,7.0,28.0,128.0,87.0,170.0,240.0,70.0,40.0


## Split-apply-combine

In [13]:
mcdonalds['Category'].value_counts()

Coffee & Tea          95
Breakfast             42
Smoothies & Shakes    28
Chicken & Fish        27
Beverages             27
Beef & Pork           15
Snacks & Sides        13
Desserts               7
Salads                 6
Name: Category, dtype: int64

In [14]:
mcdonalds[mcdonalds["Category"] == "Salads"]

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
84,Salads,Premium Bacon Ranch Salad (without Chicken),7.9 oz (223 g),140,70,7.0,11,3.5,18,0.0,...,10,3,3,12,4,9,170,30,15,6
85,Salads,Premium Bacon Ranch Salad with Crispy Chicken,9 oz (255 g),380,190,21.0,33,6.0,29,0.0,...,22,7,2,10,5,25,100,25,15,8
86,Salads,Premium Bacon Ranch Salad with Grilled Chicken,8.5 oz (241 g),220,80,8.0,13,4.0,20,0.0,...,8,3,2,10,4,29,110,30,15,8
87,Salads,Premium Southwest Salad (without Chicken),8.1 oz (230 g),140,40,4.5,7,2.0,9,0.0,...,20,7,6,23,6,6,160,25,15,10
88,Salads,Premium Southwest Salad with Crispy Chicken,12.3 oz (348 g),450,190,22.0,33,4.5,22,0.0,...,42,14,7,28,12,23,170,30,15,15
89,Salads,Premium Southwest Salad with Grilled Chicken,11.8 oz (335 g),290,80,8.0,13,2.5,13,0.0,...,28,9,7,28,10,27,170,30,15,15


In [15]:
mcdonalds.query("Category == 'Salads'")

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
84,Salads,Premium Bacon Ranch Salad (without Chicken),7.9 oz (223 g),140,70,7.0,11,3.5,18,0.0,...,10,3,3,12,4,9,170,30,15,6
85,Salads,Premium Bacon Ranch Salad with Crispy Chicken,9 oz (255 g),380,190,21.0,33,6.0,29,0.0,...,22,7,2,10,5,25,100,25,15,8
86,Salads,Premium Bacon Ranch Salad with Grilled Chicken,8.5 oz (241 g),220,80,8.0,13,4.0,20,0.0,...,8,3,2,10,4,29,110,30,15,8
87,Salads,Premium Southwest Salad (without Chicken),8.1 oz (230 g),140,40,4.5,7,2.0,9,0.0,...,20,7,6,23,6,6,160,25,15,10
88,Salads,Premium Southwest Salad with Crispy Chicken,12.3 oz (348 g),450,190,22.0,33,4.5,22,0.0,...,42,14,7,28,12,23,170,30,15,15
89,Salads,Premium Southwest Salad with Grilled Chicken,11.8 oz (335 g),290,80,8.0,13,2.5,13,0.0,...,28,9,7,28,10,27,170,30,15,15
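
The last two cells return the same rows. The first builds a boolean Series (one `True`/`False` per row) and uses it to pick rows; `query` expresses the same condition as a string. A small sketch on a hypothetical four-row `sample` table:

```python
import pandas as pd

# Hypothetical excerpt of the menu data, used only for illustration.
sample = pd.DataFrame({
    "Category": ["Breakfast", "Salads", "Salads", "Beverages"],
    "Item": [
        "Egg McMuffin",
        "Premium Bacon Ranch Salad (without Chicken)",
        "Premium Bacon Ranch Salad with Crispy Chicken",
        "Diet Coke (Small)",
    ],
    "Calories": [300, 140, 380, 0],
})

mask = sample["Category"] == "Salads"  # boolean Series, one value per row
by_mask = sample[mask]                 # keep only the rows where mask is True
by_query = sample.query("Category == 'Salads'")

print(mask.tolist())             # [False, True, True, False]
print(by_mask.equals(by_query))  # True
```

Note that the selected rows keep their original index labels (here 1 and 2), just as the salads above keep labels 84 through 89.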


In [16]:
mcdonalds.groupby('Category')['Calories'].mean()

Category
Beef & Pork           494.000000
Beverages             113.703704
Breakfast             526.666667
Chicken & Fish        552.962963
Coffee & Tea          283.894737
Desserts              222.142857
Salads                270.000000
Smoothies & Shakes    531.428571
Snacks & Sides        245.769231
Name: Calories, dtype: float64
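
The `groupby` call above is where the section title comes from: split the rows by category, apply `mean()` to each piece, and combine the results into one Series. The sketch below spells those three steps out by hand on a hypothetical four-row `sample` table and compares the result with `groupby`; the loop is for illustration only, not how pandas does it internally.

```python
import pandas as pd

# Hypothetical excerpt of the menu data, used only for illustration.
sample = pd.DataFrame({
    "Category": ["Breakfast", "Breakfast", "Salads", "Salads"],
    "Calories": [300, 250, 140, 380],
})

pieces = {}
for category in sample["Category"].unique():
    rows = sample[sample["Category"] == category]  # split
    pieces[category] = rows["Calories"].mean()     # apply
manual = pd.Series(pieces).sort_index()            # combine

# The same computation in one line.
grouped = sample.groupby("Category")["Calories"].mean()

print(grouped["Breakfast"], grouped["Salads"])  # 275.0 260.0
```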

### Example: Which menu item has the most calories?

In [18]:
# Solution: sort by calories, highest first; the top row is the answer
mcdonalds.sort_values('Calories', ascending=False)

Unnamed: 0,Category,Item,Serving Size,Calories,Calories from Fat,Total Fat,Total Fat (% Daily Value),Saturated Fat,Saturated Fat (% Daily Value),Trans Fat,...,Carbohydrates,Carbohydrates (% Daily Value),Dietary Fiber,Dietary Fiber (% Daily Value),Sugars,Protein,Vitamin A (% Daily Value),Vitamin C (% Daily Value),Calcium (% Daily Value),Iron (% Daily Value)
82,Chicken & Fish,Chicken McNuggets (40 piece),22.8 oz (646 g),1880,1060,118.0,182,20.0,101,1.0,...,118,39,6,24,1,87,0,15,8,25
32,Breakfast,Big Breakfast with Hotcakes (Large Biscuit),15.3 oz (434 g),1150,540,60.0,93,20.0,100,0.0,...,116,39,7,28,17,36,15,2,30,40
31,Breakfast,Big Breakfast with Hotcakes (Regular Biscuit),14.8 oz (420 g),1090,510,56.0,87,19.0,96,0.0,...,111,37,6,23,17,36,15,2,25,40
34,Breakfast,Big Breakfast with Hotcakes and Egg Whites (La...,15.4 oz (437 g),1050,450,50.0,77,16.0,81,0.0,...,115,38,7,28,18,35,4,2,25,30
33,Breakfast,Big Breakfast with Hotcakes and Egg Whites (Re...,14.9 oz (423 g),990,410,46.0,70,16.0,78,0.0,...,110,37,6,23,17,35,0,2,25,30
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
117,Beverages,Diet Coke (Child),12 fl oz cup,0,0,0.0,0,0.0,0,0.0,...,0,0,0,0,0,0,0,0,0,0
116,Beverages,Diet Coke (Large),30 fl oz cup,0,0,0.0,0,0.0,0,0.0,...,0,0,0,0,0,0,0,0,0,0
115,Beverages,Diet Coke (Medium),21 fl oz cup,0,0,0.0,0,0.0,0,0.0,...,0,0,0,0,0,0,0,0,0,0
114,Beverages,Diet Coke (Small),16 fl oz cup,0,0,0.0,0,0.0,0,0.0,...,0,0,0,0,0,0,0,0,0,0
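
Sorting shows the whole table, with the answer in the top row. If we only want the top item (or the top few), `idxmax` and `nlargest` are possible alternatives. A sketch on a hypothetical three-row `sample` table:

```python
import pandas as pd

# Hypothetical excerpt of the menu data, used only for illustration.
sample = pd.DataFrame({
    "Item": ["Egg McMuffin", "Chicken McNuggets (40 piece)", "Diet Coke (Small)"],
    "Calories": [300, 1880, 0],
})

top_label = sample["Calories"].idxmax()  # index label of the largest value
print(sample.loc[top_label, "Item"])     # Chicken McNuggets (40 piece)

# The two highest-calorie rows, highest first.
print(sample.nlargest(2, "Calories")["Item"].tolist())
# ['Chicken McNuggets (40 piece)', 'Egg McMuffin']
```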


### Challenge: Come up with a question that could be answered with this data set