<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 2: Analyzing Chipotle Data

_Author: Joseph Nelson (DC)_

---

For Project 2, you will complete a series of exercises exploring [order data from Chipotle](https://github.com/TheUpshot/chipotle), compliments of _The New York Times'_ "The Upshot."

For these exercises, you will conduct basic exploratory data analysis (Pandas not required) to understand the essentials of Chipotle's order data: how many orders are being made, the average price per order, how many different ingredients are used, etc. These exercises let you practice business analysis skills while also becoming comfortable with Python.

---

## Basic Level

### Part 1: Read in the file with `csv.reader()` and store it in an object called `file_nested_list`.

Hint: This is a TSV (tab-separated values) file, and `csv.reader()` needs to be told [how to handle it](https://docs.python.org/3/library/csv.html).

In [2]:
import csv
from collections import namedtuple   # Convenient to store the data rows

DATA_FILE = './data/chipotle.tsv'

In [18]:
file_nested_list = []
with open(DATA_FILE, 'r', newline='') as tsvfile:  # newline='' is recommended by the csv docs
    file_input = csv.reader(tsvfile, delimiter='\t')
    for row in file_input:
        file_nested_list.append(row)


In [19]:
file_nested_list[0]

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']

In [20]:
file_nested_list[1]

['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 ']

In [21]:
len(file_nested_list)

4623
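As a minimal sketch, the read step above can be tried without the data file by parsing a small in-memory string. The `tsv_text` value is a hypothetical stand-in for `chipotle.tsv` and contains only three columns.

```python
import csv
import io

# Hypothetical in-memory stand-in for the TSV file (not the real data)
tsv_text = "order_id\tquantity\titem_name\n1\t1\tIzze\n"

with io.StringIO(tsv_text) as tsvfile:
    # delimiter='\t' tells csv.reader to split fields on tabs instead of commas
    rows = list(csv.reader(tsvfile, delimiter='\t'))

print(rows)  # [['order_id', 'quantity', 'item_name'], ['1', '1', 'Izze']]
```

Wrapping the reader in `list()` collects every row at once, which is equivalent to appending each row in a loop.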

In [23]:
file_nested_list

[['order_id', 'quantity', 'item_name', 'choice_description', 'item_price'],
 ['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 '],
 ['1', '1', 'Izze', '[Clementine]', '$3.39 '],
 ['1', '1', 'Nantucket Nectar', '[Apple]', '$3.39 '],
 ['1', '1', 'Chips and Tomatillo-Green Chili Salsa', 'NULL', '$2.39 '],
 ['2',
  '2',
  'Chicken Bowl',
  '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]',
  '$16.98 '],
 ['3',
  '1',
  'Chicken Bowl',
  '[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]',
  '$10.98 '],
 ['3', '1', 'Side of Chips', 'NULL', '$1.69 '],
 ['4',
  '1',
  'Steak Burrito',
  '[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
  '$11.75 '],
 ['4',
  '1',
  'Steak Soft Tacos',
  '[Tomatillo Green Chili Salsa, [Pinto Beans, Cheese, Sour Cream, Lettuce]]',
  '$9.25 '],
 ['5',
  '1',
  'Steak Burrito',
  '[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Ch

### Part 2: Separate `file_nested_list` into the `header` and the `data`.


In [25]:
header = file_nested_list[0]

In [27]:
len(header)

5

In [31]:
header

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']

In [33]:
header[0]

'order_id'

In [28]:
data = file_nested_list[1:]

In [29]:
len(data)

4622

In [30]:
data[0]

['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 ']
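The same split can also be sketched with starred unpacking. The `sample` list below copies the header and two rows from the output above and stands in for `file_nested_list`; the names `header_row` and `data_rows` are illustrative only.

```python
# Stand-in for file_nested_list: the header plus two data rows from the output above
sample = [
    ['order_id', 'quantity', 'item_name', 'choice_description', 'item_price'],
    ['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 '],
    ['1', '1', 'Izze', '[Clementine]', '$3.39 '],
]

# The first row goes to header_row; the starred name collects the rest as a list
header_row, *data_rows = sample

print(header_row[0])   # order_id
print(len(data_rows))  # 2
```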

---

## Intermediate Level

### Part 3: Calculate the average price of an order.

Hint: Examine the data to see if the `quantity` column is relevant to this calculation.

Hint: Think carefully about the simplest way to do this!

Step 1 - Convert the data to a list of dictionaries.

In [46]:
data_dicts = []
for r in data:
    data_dicts.append(
        {
            header[0]:r[0],
            header[1]:r[1],
            header[2]:r[2],
            header[3]:r[3],
            header[4]:r[4],
        }
    )

In [48]:
data_dicts[:4]

[{'order_id': '1',
  'quantity': '1',
  'item_name': 'Chips and Fresh Tomato Salsa',
  'choice_description': 'NULL',
  'item_price': '$2.39 '},
 {'order_id': '1',
  'quantity': '1',
  'item_name': 'Izze',
  'choice_description': '[Clementine]',
  'item_price': '$3.39 '},
 {'order_id': '1',
  'quantity': '1',
  'item_name': 'Nantucket Nectar',
  'choice_description': '[Apple]',
  'item_price': '$3.39 '},
 {'order_id': '1',
  'quantity': '1',
  'item_name': 'Chips and Tomatillo-Green Chili Salsa',
  'choice_description': 'NULL',
  'item_price': '$2.39 '}]
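A shorter way to build each dictionary, sketched here for a single row, is to pair the column names with the row values using `zip`. The `header` and `row` values are copied from the outputs above; `row_dict` is an illustrative name.

```python
header = ['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
row = ['1', '1', 'Izze', '[Clementine]', '$3.39 ']

# zip pairs each column name with the value in the same position
row_dict = dict(zip(header, row))

print(row_dict['item_name'])   # Izze
print(row_dict['item_price'])  # $3.39 (with a trailing space)
```

This avoids hard-coding the five column indices, so it keeps working if the file gains or loses a column.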

Step 2 - Clean up `data_dicts` so that the numeric fields are stored as numbers.

In [49]:
# First convert order_id and quantity from strings to integers
for item in data_dicts:
    for h in header:
        if item[h].isdigit():
            item[h] = int(item[h])
        

[{'order_id': 2,
  'quantity': 2,
  'item_name': 'Chicken Bowl',
  'choice_description': '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]',
  'item_price': '$16.98 '}]

In [None]:
# then clean up item_price:

In [53]:
for item in data_dicts:
    # Strip the leading "$" and the trailing space, then convert to a float
    item['item_price'] = float(item['item_price'].lstrip('$').rstrip(' '))

In [54]:
data_dicts[:3]

[{'order_id': 1,
  'quantity': 1,
  'item_name': 'Chips and Fresh Tomato Salsa',
  'choice_description': 'NULL',
  'item_price': 2.39},
 {'order_id': 1,
  'quantity': 1,
  'item_name': 'Izze',
  'choice_description': '[Clementine]',
  'item_price': 3.39},
 {'order_id': 1,
  'quantity': 1,
  'item_name': 'Nantucket Nectar',
  'choice_description': '[Apple]',
  'item_price': 3.39}]
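The price cleanup can also be wrapped in a small helper so that it is easy to test on its own. This is a sketch; `parse_price` is a hypothetical name that does not appear in the original notebook.

```python
def parse_price(raw):
    """Convert a price string such as '$2.39 ' to the float 2.39."""
    # strip() removes surrounding whitespace; lstrip('$') removes the dollar sign
    return float(raw.strip().lstrip('$'))

print(parse_price('$2.39 '))   # 2.39
print(parse_price('$16.98 '))  # 16.98
```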

In [55]:
for i in data_dicts:
    if i['order_id'] == 1:
        print(i)

{'order_id': 1, 'quantity': 1, 'item_name': 'Chips and Fresh Tomato Salsa', 'choice_description': 'NULL', 'item_price': 2.39}
{'order_id': 1, 'quantity': 1, 'item_name': 'Izze', 'choice_description': '[Clementine]', 'item_price': 3.39}
{'order_id': 1, 'quantity': 1, 'item_name': 'Nantucket Nectar', 'choice_description': '[Apple]', 'item_price': 3.39}
{'order_id': 1, 'quantity': 1, 'item_name': 'Chips and Tomatillo-Green Chili Salsa', 'choice_description': 'NULL', 'item_price': 2.39}

In [162]:
[i for i in data_dicts if i['quantity']>=2][:3]

[{'order_id': 2,
  'quantity': 2,
  'item_name': 'Chicken Bowl',
  'choice_description': '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]',
  'item_price': 16.98},
 {'order_id': 9,
  'quantity': 2,
  'item_name': 'Canned Soda',
  'choice_description': '[Sprite]',
  'item_price': 2.18},
 {'order_id': 23,
  'quantity': 2,
  'item_name': 'Canned Soda',
  'choice_description': '[Mountain Dew]',
  'item_price': 2.18}]

In [168]:
[i for i in data_dicts if (i['choice_description']=="[Sprite]" and i['quantity'] > 2)][:10]

# It looks like item_price already accounts for the quantity, e.g. $1.09 * 2 = $2.18

[{'order_id': 350,
  'quantity': 3,
  'item_name': 'Canned Soft Drink',
  'choice_description': '[Sprite]',
  'item_price': 3.75},
 {'order_id': 901,
  'quantity': 4,
  'item_name': 'Canned Soda',
  'choice_description': '[Sprite]',
  'item_price': 4.36},
 {'order_id': 1786,
  'quantity': 4,
  'item_name': 'Canned Soft Drink',
  'choice_description': '[Sprite]',
  'item_price': 5.0}]
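One way to check whether `item_price` already includes the quantity is to divide it back out and compare the per-unit prices. The sketch below uses four Sprite rows copied from the outputs above; `sprite_rows` and `unit_prices` are illustrative names.

```python
# Sprite rows copied from the outputs above
sprite_rows = [
    {'item_name': 'Canned Soda', 'quantity': 2, 'item_price': 2.18},
    {'item_name': 'Canned Soda', 'quantity': 4, 'item_price': 4.36},
    {'item_name': 'Canned Soft Drink', 'quantity': 3, 'item_price': 3.75},
    {'item_name': 'Canned Soft Drink', 'quantity': 4, 'item_price': 5.0},
]

# Divide the line price by the quantity to recover the price of a single can
unit_prices = [round(r['item_price'] / r['quantity'], 2) for r in sprite_rows]

for r, unit in zip(sprite_rows, unit_prices):
    print(r['item_name'], unit)
```

In these rows the per-unit price is the same within each item name, which is consistent with `item_price` being the total for the line rather than the price of one item.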

In [169]:
[i for i in data_dicts if i['order_id'] == 1786]

# $1.25 is believable for a can of soft drink, and it's consistent across soft drinks. Maybe a different state?

[{'order_id': 1786,
  'quantity': 1,
  'item_name': 'Chicken Bowl',
  'choice_description': '[Fresh Tomato Salsa, Rice]',
  'item_price': 8.75},
 {'order_id': 1786,
  'quantity': 1,
  'item_name': 'Carnitas Burrito',
  'choice_description': '[Fresh Tomato Salsa, [Fajita Vegetables, Rice, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
  'item_price': 11.75},
 {'order_id': 1786,
  'quantity': 1,
  'item_name': 'Chicken Bowl',
  'choice_description': '[Fresh Tomato Salsa, [Rice, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
  'item_price': 11.25},
 {'order_id': 1786,
  'quantity': 1,
  'item_name': 'Chicken Bowl',
  'choice_description': '[Fresh Tomato Salsa, [Fajita Vegetables, Rice, Black Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
  'item_price': 11.25},
 {'order_id': 1786,
  'quantity': 1,
  'item_name': 'Barbacoa Bowl',
  'choice_description': '[Fresh Tomato Salsa, [Fajita Vegetables, Rice, Black Beans, Guacamole, Lettuce]]',
  'item_price': 11.75},
 {'order_

In [57]:
sum([1,2,3])

6

Calculate the average order price (i.e. total cost of orders / number of distinct orders)

In [154]:
total_orders_cost = sum(i['item_price'] for i in data_dicts)

In [155]:
# Count the distinct order IDs rather than assuming the largest ID equals the number of orders
total_orders = len({i['order_id'] for i in data_dicts})

In [156]:
print(f"Average cost per order: {total_orders_cost/total_orders:.2f}")

Average cost per order: 18.81
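An alternative sketch is to total each order first and then average those totals, which also yields the per-order values needed for later analysis. The `rows` list holds the item prices for orders 1 to 3 as shown in the outputs above; `order_totals` is an illustrative name.

```python
from collections import defaultdict

# Item prices for orders 1-3, copied from the outputs above
rows = [
    {'order_id': 1, 'item_price': 2.39},
    {'order_id': 1, 'item_price': 3.39},
    {'order_id': 1, 'item_price': 3.39},
    {'order_id': 1, 'item_price': 2.39},
    {'order_id': 2, 'item_price': 16.98},
    {'order_id': 3, 'item_price': 10.98},
    {'order_id': 3, 'item_price': 1.69},
]

# Accumulate the total cost of each order; missing keys start at 0.0
order_totals = defaultdict(float)
for r in rows:
    order_totals[r['order_id']] += r['item_price']

average = sum(order_totals.values()) / len(order_totals)
print(f"Average cost per order: {average:.2f}")  # 13.74 for these three orders
```

Dividing by `len(order_totals)` counts distinct orders directly, so the result does not depend on the order IDs being consecutive.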


### Part 4: Create a list (or set) named `unique_sodas` containing all of the unique sodas and soft drinks that Chipotle sells.

Note: Just look for `'Canned Soda'` and `'Canned Soft Drink'`, and ignore other drinks like `'Izze'`.

In [84]:
all_sodas = [i['choice_description'] for i in data_dicts if i['item_name'].startswith('Canned')]

In [94]:
# Strip the surrounding square brackets, e.g. '[Sprite]' -> 'Sprite'
all_sodas_clean = [i[1:-1] for i in all_sodas]

In [97]:
unique_sodas = list(set(all_sodas_clean))

In [98]:
unique_sodas

['Lemonade',
 'Sprite',
 'Mountain Dew',
 'Diet Coke',
 'Coke',
 'Coca Cola',
 'Diet Dr. Pepper',
 'Dr. Pepper',
 'Nestea']
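The filter, cleanup, and de-duplication steps can also be sketched as a single set comprehension. The `rows` list is a small sample taken from values shown in the outputs above; `soda_items` and `unique` are illustrative names.

```python
# Small sample of rows using values that appear in the outputs above
rows = [
    {'item_name': 'Canned Soda', 'choice_description': '[Sprite]'},
    {'item_name': 'Canned Soft Drink', 'choice_description': '[Sprite]'},
    {'item_name': 'Canned Soda', 'choice_description': '[Mountain Dew]'},
    {'item_name': 'Izze', 'choice_description': '[Clementine]'},
]

# The two item names the exercise asks for
soda_items = {'Canned Soda', 'Canned Soft Drink'}

# strip('[]') removes the square brackets; the set drops duplicates
unique = {
    r['choice_description'].strip('[]')
    for r in rows
    if r['item_name'] in soda_items
}

print(sorted(unique))  # ['Mountain Dew', 'Sprite']
```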

---

## Advanced Level


### Part 5: Calculate the average number of toppings per burrito.

Note: Let's ignore the `quantity` column to simplify this task.

Hint: Think carefully about the easiest way to count the number of toppings!


In [99]:
len(['1','2','3'])

3

In [105]:
[(i['choice_description']) for i in data_dicts if "Burrito" in i['item_name']]

['[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
 '[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]]',
 '[Tomatillo-Green Chili Salsa (Medium), [Pinto Beans, Cheese, Sour Cream]]',
 '[Fresh Tomato Salsa (Mild), [Black Beans, Rice, Cheese, Sour Cream, Lettuce]]',
 '[[Fresh Tomato Salsa (Mild), Tomatillo-Green Chili Salsa (Medium), Tomatillo-Red Chili Salsa (Hot)], [Rice, Cheese, Sour Cream, Lettuce]]',
 '[[Tomatillo-Green Chili Salsa (Medium), Tomatillo-Red Chili Salsa (Hot)], [Pinto Beans, Rice, Cheese, Sour Cream, Guacamole, Lettuce]]',
 '[[Tomatillo-Green Chili Salsa (Medium), Roasted Chili Corn Salsa (Medium)], [Black Beans, Rice, Sour Cream, Lettuce]]',
 '[Tomatillo-Green Chili Salsa (Medium), [Pinto Beans, Rice, Cheese, Sour Cream]]',
 '[[Roasted Chili Corn Salsa (Medium), Fresh Tomato Salsa (Mild)], [Rice, Black Beans, Sour Cream]]',
 '[Fresh Tomato Salsa, [Rice, Pinto Beans, C

In [110]:
print('[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]'
     .replace('[','').replace(']','')
     )

Tomatillo Red Chili Salsa, Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce


In [126]:
toppings_per_burrito = [i['choice_description'].replace('[','').replace(']','') for i in data_dicts if "Burrito" in i['item_name']]

In [127]:
list(toppings_per_burrito[0].replace(' ','').split(','))

['TomatilloRedChiliSalsa',
 'FajitaVegetables',
 'BlackBeans',
 'PintoBeans',
 'Cheese',
 'SourCream',
 'Guacamole',
 'Lettuce']

In [128]:
toppings_per_burrito_clean = [list(i.replace(' ','').split(',')) for i in toppings_per_burrito]

In [129]:
toppings_per_burrito_clean

[['TomatilloRedChiliSalsa',
  'FajitaVegetables',
  'BlackBeans',
  'PintoBeans',
  'Cheese',
  'SourCream',
  'Guacamole',
  'Lettuce'],
 ['FreshTomatoSalsa',
  'Rice',
  'BlackBeans',
  'PintoBeans',
  'Cheese',
  'SourCream',
  'Lettuce'],
 ['Tomatillo-GreenChiliSalsa(Medium)', 'PintoBeans', 'Cheese', 'SourCream'],
 ['FreshTomatoSalsa(Mild)',
  'BlackBeans',
  'Rice',
  'Cheese',
  'SourCream',
  'Lettuce'],
 ['FreshTomatoSalsa(Mild)',
  'Tomatillo-GreenChiliSalsa(Medium)',
  'Tomatillo-RedChiliSalsa(Hot)',
  'Rice',
  'Cheese',
  'SourCream',
  'Lettuce'],
 ['Tomatillo-GreenChiliSalsa(Medium)',
  'Tomatillo-RedChiliSalsa(Hot)',
  'PintoBeans',
  'Rice',
  'Cheese',
  'SourCream',
  'Guacamole',
  'Lettuce'],
 ['Tomatillo-GreenChiliSalsa(Medium)',
  'RoastedChiliCornSalsa(Medium)',
  'BlackBeans',
  'Rice',
  'SourCream',
  'Lettuce'],
 ['Tomatillo-GreenChiliSalsa(Medium)',
  'PintoBeans',
  'Rice',
  'Cheese',
  'SourCream'],
 ['RoastedChiliCornSalsa(Medium)',
  'FreshTomatoSalsa(M

In [130]:
avg_toppings_per_burrito = sum([len(i) for i in toppings_per_burrito_clean])/len(toppings_per_burrito_clean)

In [131]:
print(f"Average toppings per burrito: {avg_toppings_per_burrito:f}")

Average toppings per burrito: 5.395051



**Consolidated set of code to determine average:**



In [133]:
# strip the square brackets from each burrito's choice_description
toppings_per_burrito = [i['choice_description'].replace('[','').replace(']','') for i in data_dicts if "Burrito" in i['item_name']]

# split each description on commas to get one list of toppings per burrito
toppings_per_burrito_clean = [i.replace(' ','').split(',') for i in toppings_per_burrito]

# calculate the average toppings per burrito
avg_toppings_per_burrito = sum(len(i) for i in toppings_per_burrito_clean)/len(toppings_per_burrito_clean)

# print the output
print(f"Average toppings per burrito: {avg_toppings_per_burrito:f}")

Average toppings per burrito: 5.395051
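The counting step can be sketched in a way that keeps the topping names readable by splitting on `', '` instead of removing every space. The two descriptions are copied from the output above; `split_toppings` is a hypothetical helper name.

```python
# Two burrito descriptions copied from the output above
descriptions = [
    '[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
    '[Tomatillo-Green Chili Salsa (Medium), [Pinto Beans, Cheese, Sour Cream]]',
]

def split_toppings(description):
    """Return the toppings in a choice_description as a flat list of names."""
    cleaned = description.replace('[', '').replace(']', '')
    # Splitting on ', ' keeps the spaces inside each topping name
    return cleaned.split(', ')

counts = [len(split_toppings(d)) for d in descriptions]

print(counts)                     # [8, 4]
print(sum(counts) / len(counts))  # 6.0
```

As in the solution above, the salsa is counted as a topping here.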


### Part 6: Create a dictionary. Let the keys represent chip orders and the values represent the total number of orders.

Expected output: `{'Chips and Roasted Chili-Corn Salsa': 18, ... }`

Note: Please take the `quantity` column into account!

Optional: Learn how to use `collections.defaultdict` to simplify your code.

In [138]:
# Get a list of all cases where chips were ordered:

chips_orders = [i for i in data_dicts if 'Chips' in i['item_name']]


In [139]:
len(chips_orders)

1084

In [140]:
chips_orders[:4]

[{'order_id': 1,
  'quantity': 1,
  'item_name': 'Chips and Fresh Tomato Salsa',
  'choice_description': 'NULL',
  'item_price': 2.39},
 {'order_id': 1,
  'quantity': 1,
  'item_name': 'Chips and Tomatillo-Green Chili Salsa',
  'choice_description': 'NULL',
  'item_price': 2.39},
 {'order_id': 3,
  'quantity': 1,
  'item_name': 'Side of Chips',
  'choice_description': 'NULL',
  'item_price': 1.69},
 {'order_id': 5,
  'quantity': 1,
  'item_name': 'Chips and Guacamole',
  'choice_description': 'NULL',
  'item_price': 4.45}]

In [175]:
# Compile unique list of chips_orders item names

# Option 1: replace "-" with " " so that the two spellings of the same item share one name
chips_item_names = list(set([i['item_name'].replace('-',' ') for i in chips_orders]))

# Option 2: keep "-" so that both spellings are listed separately
#chips_item_names = list(set([i['item_name'] for i in chips_orders]))

In [176]:
chips_item_names

['Chips and Tomatillo Red Chili Salsa',
 'Chips and Fresh Tomato Salsa',
 'Chips and Tomatillo Green Chili Salsa',
 'Chips and Tomatillo-Green Chili Salsa',
 'Chips',
 'Chips and Roasted Chili-Corn Salsa',
 'Side of Chips',
 'Chips and Guacamole',
 'Chips and Roasted Chili Corn Salsa',
 'Chips and Tomatillo-Red Chili Salsa',
 'Chips and Mild Fresh Tomato Salsa']

In [177]:
# Check whether any plain 'Chips' row carries a choice_description
[i for i in chips_orders if (i['item_name'] == 'Chips' and i['choice_description'] != 'NULL')]

[]

In [178]:
# Check whether any 'Side of Chips' row carries a choice_description
[i for i in chips_orders if (i['item_name'] == 'Side of Chips' and i['choice_description'] != 'NULL')]

[]

In [179]:
chip_dict = {}

for chip_name in chips_item_names:
    # Sum the quantity column over every row whose item_name matches this name
    num_orders = sum(i['quantity'] for i in chips_orders if i['item_name']==chip_name)
    chip_dict[chip_name] = num_orders

In [174]:
chip_dict # Output with the "-" replacement: hyphenated item names no longer match any key, so their orders are dropped rather than merged

{'Chips and Tomatillo Red Chili Salsa': 50,
 'Chips and Fresh Tomato Salsa': 130,
 'Chips and Tomatillo Green Chili Salsa': 45,
 'Chips': 230,
 'Side of Chips': 110,
 'Chips and Guacamole': 506,
 'Chips and Roasted Chili Corn Salsa': 23,
 'Chips and Mild Fresh Tomato Salsa': 1}

In [180]:
chip_dict # Output without the "-" replacement: both spellings of each item are counted separately

{'Chips and Tomatillo Red Chili Salsa': 50,
 'Chips and Fresh Tomato Salsa': 130,
 'Chips and Tomatillo Green Chili Salsa': 45,
 'Chips and Tomatillo-Green Chili Salsa': 33,
 'Chips': 230,
 'Chips and Roasted Chili-Corn Salsa': 18,
 'Side of Chips': 110,
 'Chips and Guacamole': 506,
 'Chips and Roasted Chili Corn Salsa': 23,
 'Chips and Tomatillo-Red Chili Salsa': 25,
 'Chips and Mild Fresh Tomato Salsa': 1}
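A sketch of the optional `defaultdict` approach follows. It normalizes the name of each row before counting, so the hyphenated and unhyphenated spellings are merged into one key. The item names come from the outputs above, but the quantities in `chips_rows` are hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Item names from the outputs above; the quantities are hypothetical
chips_rows = [
    {'item_name': 'Chips and Tomatillo-Green Chili Salsa', 'quantity': 1},
    {'item_name': 'Chips and Tomatillo Green Chili Salsa', 'quantity': 2},
    {'item_name': 'Side of Chips', 'quantity': 1},
]

# defaultdict(int) starts every new key at 0, so no existence check is needed
chip_counts = defaultdict(int)
for r in chips_rows:
    # Normalize the name on the row itself so both spellings share one key
    name = r['item_name'].replace('-', ' ')
    chip_counts[name] += r['quantity']

print(dict(chip_counts))
# {'Chips and Tomatillo Green Chili Salsa': 3, 'Side of Chips': 1}
```

Because the name is normalized at the point of counting, the approach needs only one pass over the rows and no separate list of unique names.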

---

## Bonus: Craft a problem statement about this data that interests you, and then answer it!


**Potential options:**

* Look at the distribution of order values
* Look at the distribution of item values, across different items
* Try to infer the price of different additions/extras (and check whether there are any)
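As a starting point for the first option, the sketch below groups order totals into fixed-width price buckets using only the standard library. The four totals are the sums for orders 1 to 4 computed from the rows shown earlier; the bucket width of 5 dollars is an arbitrary assumption.

```python
from collections import Counter

# Totals for orders 1-4, summed from the rows shown earlier
order_totals = [11.56, 16.98, 12.67, 21.00]

# Assumed bucket width in dollars; adjust to taste
bucket_width = 5

# Map each total to the lower edge of its bucket and count the occurrences
buckets = Counter(int(t // bucket_width) * bucket_width for t in order_totals)

for low in sorted(buckets):
    print(f"${low}-{low + bucket_width}: {buckets[low]}")
```

For these four orders the loop prints one line per bucket: two orders in the 10 to 15 range and one each in the 15 to 20 and 20 to 25 ranges.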
