<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Project 2: Analyzing Chipotle Data

_Author: Joseph Nelson (DC)_

---

For Project 2, you will complete a series of exercises exploring [order data from Chipotle](https://github.com/TheUpshot/chipotle), compliments of _The New York Times'_ "The Upshot."

For these exercises, you will conduct basic exploratory data analysis (Pandas not required) to understand the essentials of Chipotle's order data: how many orders are being made, the average price per order, how many different ingredients are used, etc. These exercises let you practice business analysis skills while becoming comfortable with Python.

---

## Basic Level

### Part 1: Read in the file with `csv.reader()` and store it in an object called `file_nested_list`.

Hint: This is a TSV (tab-separated values) file, and `csv.reader()` needs to be told [how to handle it](https://docs.python.org/3/library/csv.html).

In [2]:
import csv
from collections import namedtuple   # Convenient to store the data rows

DATA_FILE = './data/chipotle.tsv'

In [15]:
help(csv.reader)

Help on built-in function reader in module _csv:

reader(...)
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)
    
    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.
    
    The returned object is an iterator.  Each iteration returns a row
    of the CSV file (which can span multiple input lines).



In [21]:
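# The csv docs recommend opening the file with newline='' so the reader handles line endings itself
with open(DATA_FILE, newline='') as file:
    reader = csv.reader(file, delimiter='\t')
    file_nested_list = []
    for row in reader:
        file_nested_list.append(row)

file_nested_list[:5]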

[['order_id', 'quantity', 'item_name', 'choice_description', 'item_price'],
 ['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 '],
 ['1', '1', 'Izze', '[Clementine]', '$3.39 '],
 ['1', '1', 'Nantucket Nectar', '[Apple]', '$3.39 '],
 ['1', '1', 'Chips and Tomatillo-Green Chili Salsa', 'NULL', '$2.39 ']]
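
Because `csv.reader()` returns an iterator of rows, the loop above can also be collapsed into a single `list()` call. The sketch below shows this on a small in-memory sample rather than the real file; `sample_tsv` and `sample_rows` are hypothetical names used only for illustration.

```python
import csv
import io

# Hypothetical two-row sample in the same tab-separated layout as chipotle.tsv.
sample_tsv = "order_id\tquantity\titem_name\n1\t1\tIzze\n"

# csv.reader returns an iterator of rows, so list() collects them in one step.
sample_rows = list(csv.reader(io.StringIO(sample_tsv), delimiter='\t'))
print(sample_rows)
```

The same pattern should work with the opened data file in place of `io.StringIO(...)`.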

### Part 2: Separate `file_nested_list` into the `header` and the `data`.


In [22]:
header, data = file_nested_list[0], file_nested_list[1:]
print('Header: ', header)
print('Data: ', data[:5])

Header:  ['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']
Data:  [['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 '], ['1', '1', 'Izze', '[Clementine]', '$3.39 '], ['1', '1', 'Nantucket Nectar', '[Apple]', '$3.39 '], ['1', '1', 'Chips and Tomatillo-Green Chili Salsa', 'NULL', '$2.39 '], ['2', '2', 'Chicken Bowl', '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]', '$16.98 ']]


---

## Intermediate Level

### Part 3: Calculate the average price of an order.

Hint: Examine the data to see if the `quantity` column is relevant to this calculation.

Hint: Think carefully about the simplest way to do this!

In [34]:
# Turn the nested list into an imitation dataframe (i.e. a dictionary of lists)
df = {col:[] for col in header}
df

{'order_id': [],
 'quantity': [],
 'item_name': [],
 'choice_description': [],
 'item_price': []}

In [35]:
for row in data:
    assert len(row) == len(header)
    # Pair each value with its column name and append it to that column's list
    for col, value in zip(header, row):
        df[col].append(value)

In [36]:
df['item_price'][:10]

['$2.39 ',
 '$3.39 ',
 '$3.39 ',
 '$2.39 ',
 '$16.98 ',
 '$10.98 ',
 '$1.69 ',
 '$11.75 ',
 '$9.25 ',
 '$9.25 ']

In [37]:
# The item price is a string with a leading $ symbol and a trailing space. Convert it to a float.
df['item_price'] = list(map(lambda price: float(price.replace('$','').strip()), df['item_price']))

In [41]:
df['quantity'][:10]

['1', '1', '1', '1', '2', '1', '1', '1', '1', '1']

In [42]:
# Quantity is stored as a string and needs to be converted to an integer
df['quantity'] = list(map(int, df['quantity']))
df['quantity'][:10]

[1, 1, 1, 1, 2, 1, 1, 1, 1, 1]

In [49]:
# Multiply price by quantity to get each line item's total, then average over all line items.
totals = [price * qty for price, qty in zip(df['item_price'], df['quantity'])]
sum(totals) / len(totals)

8.489186499350943

In [87]:
# Aggregate the orders
orders = {order: 0 for order in set(df['order_id'])}

for i in range(len(df['order_id'])):
    order_id = df['order_id'][i]
    total = totals[i]
    orders[order_id] += total

In [88]:
grand_totals = orders.values()
sum(grand_totals) / len(grand_totals)

21.39423118865875

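The average price of an order at Chipotle is $21.39, assuming `item_price` is a per-unit price. Note that the quantity-2 Chicken Bowl row in the data is listed at $16.98, which suggests `item_price` may already be the total for the line. If so, multiplying by `quantity` double counts, and the average should be computed from the sum of `item_price` per order instead.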
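
The hint for this part asks whether `quantity` is relevant. The quantity-2 Chicken Bowl row at $16.98 suggests that `item_price` may already cover the whole line. The sketch below computes the per-order average under that assumption, using only the first five data rows shown earlier; `sample_rows`, `order_totals`, and `average_order` are illustrative names, and the result applies to this small sample only.

```python
from collections import defaultdict

# Sample rows copied from the first two orders shown earlier:
# (order_id, quantity, item_price as listed).
sample_rows = [
    ('1', 1, 2.39),
    ('1', 1, 3.39),
    ('1', 1, 3.39),
    ('1', 1, 2.39),
    ('2', 2, 16.98),
]

# Assumption: item_price is already the line total, so quantity is not applied again.
order_totals = defaultdict(float)
for order_id, quantity, item_price in sample_rows:
    order_totals[order_id] += item_price

average_order = sum(order_totals.values()) / len(order_totals)
print(round(average_order, 2))
```

Running the same aggregation over the full `df` columns would show how much the choice of assumption changes the answer.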

### Part 4: Create a list (or set) named `unique_sodas` containing all of the unique sodas and soft drinks that Chipotle sells.

Note: Just look for `'Canned Soda'` and `'Canned Soft Drink'`, and ignore other drinks like `'Izze'`.

In [51]:
df['item_name'][:20]

['Chips and Fresh Tomato Salsa',
 'Izze',
 'Nantucket Nectar',
 'Chips and Tomatillo-Green Chili Salsa',
 'Chicken Bowl',
 'Chicken Bowl',
 'Side of Chips',
 'Steak Burrito',
 'Steak Soft Tacos',
 'Steak Burrito',
 'Chips and Guacamole',
 'Chicken Crispy Tacos',
 'Chicken Soft Tacos',
 'Chicken Bowl',
 'Chips and Guacamole',
 'Chips and Tomatillo-Green Chili Salsa',
 'Chicken Burrito',
 'Chicken Burrito',
 'Canned Soda',
 'Chicken Bowl']

In [60]:
# Collect the description of every item whose name starts with 'Canned' into a set
unique_sodas = {
    description
    for item, description in zip(df['item_name'], df['choice_description'])
    if item.startswith('Canned')
}

In [61]:
unique_sodas

{'[Coca Cola]',
 '[Coke]',
 '[Diet Coke]',
 '[Diet Dr. Pepper]',
 '[Dr. Pepper]',
 '[Lemonade]',
 '[Mountain Dew]',
 '[Nestea]',
 '[Sprite]'}
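
The soda names above still carry the square brackets from the `choice_description` column. A minimal sketch of one way to clean them, assuming each description holds a single bracketed name, is shown below; `raw_sodas` and `clean_sodas` are hypothetical names for illustration.

```python
# Sample values copied from the set shown above.
raw_sodas = {'[Coke]', '[Diet Coke]', '[Sprite]'}

# str.strip('[]') removes any leading or trailing '[' and ']' characters.
clean_sodas = {name.strip('[]') for name in raw_sodas}
print(sorted(clean_sodas))
```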

---

## Advanced Level


### Part 5: Calculate the average number of toppings per burrito.

Note: Let's ignore the `quantity` column to simplify this task.

Hint: Think carefully about the easiest way to count the number of toppings!


In [65]:
toppings = [description for item, description in zip(df['item_name'], df['choice_description']) if item.endswith('Burrito')]

In [69]:
toppings[:10]

['[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]',
 '[Fresh Tomato Salsa, [Rice, Black Beans, Pinto Beans, Cheese, Sour Cream, Lettuce]]',
 '[Tomatillo-Green Chili Salsa (Medium), [Pinto Beans, Cheese, Sour Cream]]',
 '[Fresh Tomato Salsa (Mild), [Black Beans, Rice, Cheese, Sour Cream, Lettuce]]',
 '[[Fresh Tomato Salsa (Mild), Tomatillo-Green Chili Salsa (Medium), Tomatillo-Red Chili Salsa (Hot)], [Rice, Cheese, Sour Cream, Lettuce]]',
 '[[Tomatillo-Green Chili Salsa (Medium), Tomatillo-Red Chili Salsa (Hot)], [Pinto Beans, Rice, Cheese, Sour Cream, Guacamole, Lettuce]]',
 '[[Tomatillo-Green Chili Salsa (Medium), Roasted Chili Corn Salsa (Medium)], [Black Beans, Rice, Sour Cream, Lettuce]]',
 '[Tomatillo-Green Chili Salsa (Medium), [Pinto Beans, Rice, Cheese, Sour Cream]]',
 '[[Roasted Chili Corn Salsa (Medium), Fresh Tomato Salsa (Mild)], [Rice, Black Beans, Sour Cream]]',
 '[Fresh Tomato Salsa, [Rice, Pinto Beans, C

In [72]:
# Toppings look like lists but they are actually stored as strings. For today's life hack, we just count the commas
# in each row and add 1 to get the number of toppings

topping_counts = list(map(lambda row: row.count(',') + 1, toppings))

# And now get the average
sum(topping_counts) / len(topping_counts)

5.395051194539249

On average, each burrito has about 5.4 toppings.
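
To see why counting commas works, the sketch below applies it to one of the description strings shown above. The `count_toppings` helper is a hypothetical name, and treating `'NULL'` as zero toppings is an assumption that only matters for items without a description.

```python
def count_toppings(description):
    """Count comma-separated entries in a choice_description string."""
    # Assumption: a 'NULL' description means the item has no toppings.
    if description == 'NULL':
        return 0
    return description.count(',') + 1

sample = '[Tomatillo-Green Chili Salsa (Medium), [Pinto Beans, Cheese, Sour Cream]]'
print(count_toppings(sample))  # 3 commas, so 4 toppings
```

Note that this counts the salsa choice as a topping, which matches the approach used in the cell above.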

### Part 6: Create a dictionary. Let the keys represent chip orders and the values represent the total number of orders.

Expected output: `{'Chips and Roasted Chili-Corn Salsa': 18, ... }`

Note: Please take the `quantity` column into account!

Optional: Learn how to use `collections.defaultdict` to simplify your code.

In [78]:
chip_orders = {item:0 for item in set(df['item_name']) if item.startswith('Chips') or item.endswith('Chips')}
chip_orders

{'Chips and Guacamole': 0,
 'Chips and Fresh Tomato Salsa': 0,
 'Chips and Tomatillo-Green Chili Salsa': 0,
 'Chips and Tomatillo Red Chili Salsa': 0,
 'Chips and Mild Fresh Tomato Salsa': 0,
 'Chips and Roasted Chili-Corn Salsa': 0,
 'Chips and Tomatillo-Red Chili Salsa': 0,
 'Chips and Tomatillo Green Chili Salsa': 0,
 'Side of Chips': 0,
 'Chips and Roasted Chili Corn Salsa': 0,
 'Chips': 0}

In [79]:
for i in range(len(df['item_name'])):
    item = df['item_name'][i]
    if item.startswith('Chips') or item.endswith('Chips'):
        chip_orders[item] += df['quantity'][i]
chip_orders

{'Chips and Guacamole': 506,
 'Chips and Fresh Tomato Salsa': 130,
 'Chips and Tomatillo-Green Chili Salsa': 33,
 'Chips and Tomatillo Red Chili Salsa': 50,
 'Chips and Mild Fresh Tomato Salsa': 1,
 'Chips and Roasted Chili-Corn Salsa': 18,
 'Chips and Tomatillo-Red Chili Salsa': 25,
 'Chips and Tomatillo Green Chili Salsa': 45,
 'Side of Chips': 110,
 'Chips and Roasted Chili Corn Salsa': 23,
 'Chips': 230}
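
The optional note for this part mentions `defaultdict`. As a rough sketch of how it could replace the pre-filled dictionary, the example below tallies a small hypothetical sample; `sample_items` and `chip_counts` are illustrative names, and the `'Chips' in item` test is a simplification of the `startswith`/`endswith` check used above.

```python
from collections import defaultdict

# Hypothetical sample of (item_name, quantity) pairs for illustration.
sample_items = [
    ('Chips and Guacamole', 1),
    ('Chicken Bowl', 2),
    ('Side of Chips', 1),
    ('Chips and Guacamole', 2),
]

# defaultdict(int) starts every missing key at 0, so no pre-filled dictionary is needed.
chip_counts = defaultdict(int)
for item, quantity in sample_items:
    if 'Chips' in item:
        chip_counts[item] += quantity

print(dict(chip_counts))
```

With the real data, the loop would iterate over `zip(df['item_name'], df['quantity'])` instead of `sample_items`.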

---

## Bonus: Craft a problem statement about this data that interests you, and then answer it!
