<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 2: Analyzing Chipotle Data

_Author: Joseph Nelson (DC)_

---

For Project 2, you will complete a series of exercises exploring [order data from Chipotle](https://github.com/TheUpshot/chipotle), courtesy of _The New York Times_' "The Upshot."

For these exercises, you will conduct basic exploratory data analysis (Pandas not required) to understand the essentials of Chipotle's order data: how many orders are being made, the average price per order, how many different ingredients are used, etc. These exercises let you practice business analysis skills while also becoming comfortable with Python.

---

## Basic Level

### Part 1: Read in the file with `csv.reader()` and store it in an object called `file_nested_list`.

Hint: This is a TSV (tab-separated value) file, and `csv.reader()` needs to be told [how to handle it](https://docs.python.org/2/library/csv.html).
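The cells below load the data with Pandas instead. For the `csv.reader()` route that this part asks for, a minimal sketch might look like the following. The in-memory `sample_tsv` string is a hypothetical stand-in for `chipotle.tsv` so the snippet runs on its own; with the real file you would pass an open file handle instead.

```python
import csv
import io

# Hypothetical two-row sample standing in for chipotle.tsv (assumption:
# the real file has the same five tab-separated columns).
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
)

# With the real file, use: open('chipotle.tsv', newline='') in place of StringIO.
with io.StringIO(sample_tsv) as f:
    # delimiter='\t' tells csv.reader to split on tabs instead of commas.
    file_nested_list = [row for row in csv.reader(f, delimiter='\t')]

header = file_nested_list[0]   # column names
data = file_nested_list[1:]    # one list of strings per order line
```

Every value comes back as a string, so numeric columns such as `quantity` still need converting before any arithmetic.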

In [40]:
import pandas as pd
import numpy as np

DATA_FILE_TSV = 'chipotle.tsv'
df = pd.read_csv(DATA_FILE_TSV, sep="\t")

In [41]:
df.head()

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |


---

## Intermediate Level

### Part 3: Calculate the average price of an order.

Hint: Examine the data to see if the `quantity` column is relevant to this calculation.

Hint: Think carefully about the simplest way to do this!

In [42]:
df.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

In [43]:
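# regex=False treats '$' as a literal character rather than a regex end-of-string anchor
df['item_price'] = df['item_price'].str.replace('$', '', regex=False)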

In [45]:
df['item_price'] = df['item_price'].astype(float)
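Since Pandas is not required for this project, the same clean-up can be sketched in plain Python. The helper name `parse_price` is hypothetical, and the sample strings are taken from the `item_price` values displayed earlier.

```python
def parse_price(text):
    """Convert a price string such as '$2.39' to a float."""
    # strip() removes surrounding whitespace; lstrip('$') drops the leading dollar sign.
    return float(text.strip().lstrip('$'))

prices = [parse_price(p) for p in ['$2.39', '$3.39', '$16.98']]
```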

In [102]:
df['total_price_per_item_per_order'] = df['item_price'] * df['quantity']
df.head()

| | order_id | quantity | item_name | choice_description | item_price | total_price_per_item_per_order |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | | 2.39 | 2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 | 3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 | 3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | 2.39 | 2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 | 33.96 |


In [135]:
orders = df.groupby('order_id')['total_price_per_item_per_order'].sum()
orders

order_id
1       11.56
2       33.96
3       12.67
4       21.00
5       13.70
        ...  
1830    23.00
1831    12.90
1832    13.20
1833    23.50
1834    28.75
Name: total_price_per_item_per_order, Length: 1834, dtype: float64

In [59]:
orders.mean()

21.394231188658654
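The hint about `quantity` matters here. In the rows displayed above, the only line with `quantity` 2 (a Chicken Bowl) has an `item_price` of 16.98, while single Chicken Burritos elsewhere in this notebook are listed at 8.49. That suggests, without proving it, that `item_price` may already be a line total, in which case multiplying by `quantity` would double count. The sketch below, in plain Python, computes the average both ways on the five rows shown so the two readings can be compared; the function name and its flag are hypothetical, and the same check should be repeated on the full data set before trusting either figure.

```python
from collections import defaultdict

# (order_id, quantity, item_price) copied from the five rows displayed above.
rows = [
    (1, 1, 2.39),
    (1, 1, 3.39),
    (1, 1, 3.39),
    (1, 1, 2.39),
    (2, 2, 16.98),
]

def average_order_price(rows, multiply_by_quantity):
    """Mean of per-order totals under one reading of item_price."""
    totals = defaultdict(float)
    for order_id, quantity, item_price in rows:
        # Either treat item_price as a unit price (multiply) or as a line total (use as-is).
        line_total = item_price * quantity if multiply_by_quantity else item_price
        totals[order_id] += line_total
    return sum(totals.values()) / len(totals)

avg_if_unit_price = average_order_price(rows, multiply_by_quantity=True)
avg_if_line_total = average_order_price(rows, multiply_by_quantity=False)
print(round(avg_if_unit_price, 2), round(avg_if_line_total, 2))
```

On the full data set, comparing `item_price` for the same `item_name` at different quantities is one way to decide which reading applies.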

### Part 4: Create a list (or set) named `unique_sodas` containing all of the unique sodas and soft drinks that Chipotle sells.

Note: Just look for `'Canned Soda'` and `'Canned Soft Drink'`, and ignore other drinks like `'Izze'`.

In [121]:
unique_sodas = df[df.item_name.isin(['Canned Soda', 'Canned Soft Drink'])].choice_description.unique()
unique_sodas

array(['[Sprite]', '[Dr. Pepper]', '[Mountain Dew]', '[Diet Dr. Pepper]',
       '[Coca Cola]', '[Diet Coke]', '[Coke]', '[Lemonade]', '[Nestea]'],
      dtype=object)
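The result above is a NumPy array with the square brackets still attached to each name. If an actual `set` is wanted, as the task statement asks, a plain-Python sketch could look like this. The `rows` sample is hypothetical (item names and choices paired up for illustration), and stripping the brackets is an optional extra step, not something the task requires.

```python
# Hypothetical (item_name, choice_description) pairs for illustration.
rows = [
    ('Canned Soda', '[Sprite]'),
    ('Canned Soft Drink', '[Coke]'),
    ('Izze', '[Clementine]'),      # ignored: not a canned soda or soft drink
    ('Canned Soda', '[Sprite]'),   # duplicate, collapsed by the set
]

SODA_ITEMS = {'Canned Soda', 'Canned Soft Drink'}

# A set comprehension keeps one copy of each drink; strip('[]') removes the brackets.
unique_sodas = {choice.strip('[]') for item, choice in rows if item in SODA_ITEMS}
```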

---

## Advanced Level


### Part 5: Calculate the average number of toppings per burrito.

Note: Let's ignore the `quantity` column to simplify this task.

Hint: Think carefully about the easiest way to count the number of toppings!


In [110]:
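# Note: this counts commas, which is one fewer than the number of listed items
# (a single-item description such as '[Clementine]' yields 0).
df['topping_count'] = df['choice_description'].str.count(',')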
df.head()

| | order_id | quantity | item_name | choice_description | item_price | total_price_per_item_per_order | topping_count |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | | 2.39 | 2.39 | |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 | 3.39 | 0.0 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 | 3.39 | 0.0 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | 2.39 | 2.39 | |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 | 33.96 | 4.0 |


In [111]:
burritos = df[df['item_name'].str.contains("Burrito")]
burritos

| | order_id | quantity | item_name | choice_description | item_price | total_price_per_item_per_order | topping_count |
|---|---|---|---|---|---|---|---|
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.75 | 11.75 | 7.0 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | 9.25 | 9.25 | 6.0 |
| 16 | 8 | 1 | Chicken Burrito | [Tomatillo-Green Chili Salsa (Medium), [Pinto ... | 8.49 | 8.49 | 3.0 |
| 17 | 9 | 1 | Chicken Burrito | [Fresh Tomato Salsa (Mild), [Black Beans, Rice... | 8.49 | 8.49 | 5.0 |
| 21 | 11 | 1 | Barbacoa Burrito | [[Fresh Tomato Salsa (Mild), Tomatillo-Green C... | 8.99 | 8.99 | 6.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4608 | 1829 | 1 | Veggie Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.25 | 11.25 | 6.0 |
| 4610 | 1830 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Sour Cream, Cheese... | 11.75 | 11.75 | 5.0 |
| 4611 | 1830 | 1 | Veggie Burrito | [Tomatillo Green Chili Salsa, [Rice, Fajita Ve... | 11.25 | 11.25 | 4.0 |
| 4617 | 1833 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Sour ... | 11.75 | 11.75 | 6.0 |


In [114]:
burritos.groupby('item_name')['topping_count'].mean()

item_name
Barbacoa Burrito    4.142857
Burrito             4.833333
Carnitas Burrito    4.372881
Chicken Burrito     4.329114
Steak Burrito       4.407609
Veggie Burrito      4.957895
Name: topping_count, dtype: float64
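One caveat on the figures above: counting commas gives one fewer than the number of items listed in `choice_description`, which is why `[Clementine]` shows a count of 0.0 earlier. If each listed item should count as a topping, the averages would each be higher by one. A minimal plain-Python sketch of the adjusted count follows; the helper name `count_toppings` and the short sample descriptions are hypothetical, and whether the salsa counts as a topping is a judgment call the task leaves open.

```python
def count_toppings(choice_description):
    """Number of comma-separated items in a choice description."""
    if not choice_description:
        return 0  # no description means nothing was chosen
    # n commas separate n + 1 items.
    return choice_description.count(',') + 1

# Hypothetical short descriptions for illustration.
descriptions = ['[Clementine]', '[Fresh Tomato Salsa, [Rice, Black Beans]]', '']
counts = [count_toppings(d) for d in descriptions]

# Average over the non-empty descriptions only.
non_empty = [c for c, d in zip(counts, descriptions) if d]
average_toppings = sum(non_empty) / len(non_empty)
```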

### Part 6: Create a dictionary. Let the keys represent chip items and the values represent the total quantity ordered of each.

Expected output: `{'Chips and Roasted Chili-Corn Salsa': 18, ... }`

Note: Please take the `quantity` column into account!

Optional: Learn how to use `collections.defaultdict` to simplify your code.
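As a rough illustration of that optional route, the sketch below tallies quantities with a `defaultdict`, so there is no need to check whether a key exists before adding to it. The `rows` sample is hypothetical and only stands in for the real (item name, quantity) pairs.

```python
from collections import defaultdict

# Hypothetical (item_name, quantity) pairs for illustration.
rows = [
    ('Chips and Guacamole', 1),
    ('Side of Chips', 1),
    ('Chips and Guacamole', 2),
    ('Chicken Bowl', 2),          # not a chip item, skipped below
]

# defaultdict(int) starts every missing key at 0.
chip_orders = defaultdict(int)
for item_name, quantity in rows:
    if 'Chips' in item_name:
        chip_orders[item_name] += quantity

chip_orders = dict(chip_orders)   # convert back to a plain dict for display
```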

In [117]:
chips = df[df['item_name'].str.contains("Chips")]
chips

| | order_id | quantity | item_name | choice_description | item_price | total_price_per_item_per_order | topping_count |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | | 2.39 | 2.39 | |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | 2.39 | 2.39 | |
| 6 | 3 | 1 | Side of Chips | | 1.69 | 1.69 | |
| 10 | 5 | 1 | Chips and Guacamole | | 4.45 | 4.45 | |
| 14 | 7 | 1 | Chips and Guacamole | | 4.45 | 4.45 | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4596 | 1826 | 1 | Chips and Guacamole | | 4.45 | 4.45 | |
| 4600 | 1827 | 1 | Chips and Guacamole | | 4.45 | 4.45 | |
| 4605 | 1828 | 1 | Chips and Guacamole | | 4.45 | 4.45 | |
| 4613 | 1831 | 1 | Chips | | 2.15 | 2.15 | |


In [130]:
chip_orders = chips.groupby('item_name')['quantity'].sum()
chip_orders

item_name
Chips                                    230
Chips and Fresh Tomato Salsa             130
Chips and Guacamole                      506
Chips and Mild Fresh Tomato Salsa          1
Chips and Roasted Chili Corn Salsa        23
Chips and Roasted Chili-Corn Salsa        18
Chips and Tomatillo Green Chili Salsa     45
Chips and Tomatillo Red Chili Salsa       50
Chips and Tomatillo-Green Chili Salsa     33
Chips and Tomatillo-Red Chili Salsa       25
Side of Chips                            110
Name: quantity, dtype: int64

In [134]:
chip_orders.to_dict()

{'Chips': 230,
 'Chips and Fresh Tomato Salsa': 130,
 'Chips and Guacamole': 506,
 'Chips and Mild Fresh Tomato Salsa': 1,
 'Chips and Roasted Chili Corn Salsa': 23,
 'Chips and Roasted Chili-Corn Salsa': 18,
 'Chips and Tomatillo Green Chili Salsa': 45,
 'Chips and Tomatillo Red Chili Salsa': 50,
 'Chips and Tomatillo-Green Chili Salsa': 33,
 'Chips and Tomatillo-Red Chili Salsa': 25,
 'Side of Chips': 110}