'''
Python Homework with Chipotle data
https://github.com/TheUpshot/chipotle
'''

'''
BASIC LEVEL
PART 1: Read in the file with csv.reader() and store it in an object called 'file_nested_list'.
Hint: This is a TSV file, and csv.reader() needs to be told how to handle it.
      https://docs.python.org/2/library/csv.html
'''

In [19]:
import csv

# specify that the delimiter is a tab character
with open('chipotle.tsv') as f:
    file_nested_list = [row for row in csv.reader(f, delimiter='\t')]
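
'''
NOTE: A minimal sketch of why the delimiter matters. It is not part of the
original solution and parses a single in-memory sample line (via io.StringIO)
instead of 'chipotle.tsv', so it runs on its own. The names 'line',
'default_row' and 'tab_row' are illustrative only.
'''

```python
import csv
import io

# one sample line in the same layout as the TSV file
line = "1\t1\tIzze\t[Clementine]\t$3.39 \n"

# default dialect splits on commas, so the tabs are left untouched
default_row = next(csv.reader(io.StringIO(line)))

# telling the reader to split on tabs yields one field per column
tab_row = next(csv.reader(io.StringIO(line), delimiter='\t'))

len(default_row)    # 1: the whole line is a single field
len(tab_row)        # 5: one field per column
```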

'''
BASIC LEVEL
PART 2: Separate 'file_nested_list' into the 'header' and the 'data'.
'''

In [17]:
header = file_nested_list[0]
data = file_nested_list[1:]

In [20]:
header

['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']

In [24]:
data[0:2]

[['1', '1', 'Chips and Fresh Tomato Salsa', 'NULL', '$2.39 '],
 ['1', '1', 'Izze', '[Clementine]', '$3.39 ']]
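
'''
NOTE: An alternative to splitting off the header by hand is csv.DictReader,
which treats the first row as field names and keys every later row by them.
This is a sketch, not the original solution: 'sample_tsv' and 'dict_rows' are
illustrative names, and the sample reuses the two rows shown above.
'''

```python
import csv
import io

# in-memory stand-in for the first lines of the TSV file (sample only)
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39 \n"
    "1\t1\tIzze\t[Clementine]\t$3.39 \n"
)

# DictReader consumes the header row and returns one mapping per data row
dict_rows = list(csv.DictReader(io.StringIO(sample_tsv), delimiter='\t'))

dict_rows[1]['item_name']       # 'Izze'
```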

'''
INTERMEDIATE LEVEL
PART 3: Calculate the average price of an order.
Hint: Examine the data to see if the 'quantity' column is relevant to this calculation.
Hint: Think carefully about the simplest way to do this!
'''

In [53]:
# count the number of unique order_id values
# note: you could assume this is 1834 since that's the maximum order_id, but it's best to check
num_orders = len(set([row[0] for row in data]))     # 1834
num_orders

1834

In [45]:
# create a list of prices
# note: ignore the 'quantity' column because 'item_price' already takes quantity into account
prices = [float(row[4][1:-1]) for row in data]      # strip the dollar sign and trailing space
prices[0:10]

[2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69, 11.75, 9.25, 9.25]
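
'''
NOTE: The [1:-1] slice assumes every price has exactly one leading '$' and one
trailing space. A slightly more defensive sketch is shown here; 'parse_price'
is a hypothetical helper, not part of the original solution.
'''

```python
def parse_price(text):
    """Convert a price string such as '$2.39 ' to a float (hypothetical helper)."""
    # strip surrounding whitespace first, then the leading dollar sign
    return float(text.strip().lstrip('$'))

parse_price('$2.39 ')       # 2.39
parse_price('$16.98')       # 16.98, even without the trailing space

# by contrast, slicing silently drops a digit when the trailing space is missing
float('$16.98'[1:-1])       # 16.9
```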

In [54]:
# calculate the average price of an order and round to 2 digits
round(sum(prices) / num_orders, 2)      # $18.81

18.81
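
'''
NOTE: Dividing the sum of all prices by the number of orders gives the same
result as totalling each order first and then averaging those totals. The
sketch below checks this on made-up rows ('sample_data' is not real data and
only mimics the column layout of 'data').
'''

```python
from collections import defaultdict

# made-up rows in the same column layout as 'data' (not real records)
sample_data = [
    ['1', '1', 'Item A', 'NULL', '$2.00 '],
    ['1', '2', 'Item B', 'NULL', '$6.00 '],
    ['2', '1', 'Item C', 'NULL', '$4.00 '],
]

# total price per order_id
order_totals = defaultdict(float)
for row in sample_data:
    order_totals[row[0]] += float(row[4][1:-1])

# average of the per-order totals
per_order_avg = sum(order_totals.values()) / len(order_totals)

# same calculation as the solution above: all prices / number of unique orders
sample_prices = [float(row[4][1:-1]) for row in sample_data]
overall_avg = sum(sample_prices) / len(set(row[0] for row in sample_data))

per_order_avg == overall_avg    # True
```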

'''
INTERMEDIATE LEVEL
PART 4: Create a list (or set) of all unique sodas and soft drinks that they sell.
Note: Just look for 'Canned Soda' and 'Canned Soft Drink', and ignore other drinks like 'Izze'.
'''

In [17]:
# if 'item_name' includes 'Canned', append 'choice_description' to 'sodas' list
sodas = []
for row in data:
    if 'Canned' in row[2]:
        sodas.append(row[3][1:-1])      # strip the brackets

In [60]:
# create a set of unique sodas
unique_sodas = set(sodas)
unique_sodas

{'Coca Cola',
 'Coke',
 'Diet Coke',
 'Diet Dr. Pepper',
 'Dr. Pepper',
 'Lemonade',
 'Mountain Dew',
 'Nestea',
 'Sprite'}

In [66]:
# equivalent list comprehension (using an 'if' condition)
sodas = [row[3][1:-1] for row in data if 'Canned' in row[2]]

In [67]:
# create a set of unique sodas
unique_sodas = set(sodas)
unique_sodas

{'Coca Cola',
 'Coke',
 'Diet Coke',
 'Diet Dr. Pepper',
 'Dr. Pepper',
 'Lemonade',
 'Mountain Dew',
 'Nestea',
 'Sprite'}
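
'''
NOTE: The list and the set() call can also be merged into a single set
comprehension. A sketch on made-up rows ('sample_data' only mimics the column
layout of 'data'; the prices and order ids are invented):
'''

```python
# made-up rows in the same column layout as 'data' (not real records)
sample_data = [
    ['1', '1', 'Canned Soda', '[Sprite]', '$1.00 '],
    ['2', '1', 'Canned Soft Drink', '[Coke]', '$1.00 '],
    ['3', '1', 'Canned Soda', '[Sprite]', '$1.00 '],
    ['3', '1', 'Izze', '[Clementine]', '$3.39 '],
]

# curly braces build the set directly, so duplicates are dropped as rows are read
sample_sodas = {row[3][1:-1] for row in sample_data if 'Canned' in row[2]}
```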

'''
ADVANCED LEVEL
PART 5: Calculate the average number of toppings per burrito.
Note: Let's ignore the 'quantity' column to simplify this task.
Hint: Think carefully about the easiest way to count the number of toppings!
'''

In [68]:
# keep a running total of burritos and toppings
burrito_count = 0
topping_count = 0

In [69]:
# toppings in 'choice_description' are comma-separated, so the number of toppings is the number of commas plus 1
# note: x += 1 is equivalent to x = x + 1
for row in data:
    if 'Burrito' in row[2]:
        burrito_count += 1
        topping_count += (row[3].count(',') + 1)

In [70]:
burrito_count

1172

In [71]:
topping_count

6323

In [72]:
# calculate the average topping count and round to 2 digits
round(topping_count / float(burrito_count), 2)      # 5.4

5.4
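
'''
NOTE: Counting commas works even when the description contains nested
brackets, but it reports 1 topping for a description of 'NULL'. Whether any
burrito rows actually have 'NULL' is not checked here. 'count_toppings' is a
hypothetical helper and the sample description is made up.
'''

```python
def count_toppings(description):
    """Count comma-separated items in a choice description (hypothetical helper)."""
    if description == 'NULL':
        return 0        # assumption: 'NULL' means no choices were recorded
    return description.count(',') + 1

count_toppings('[Salsa, [Rice, Beans, Cheese]]')    # 4 (three commas plus 1)
count_toppings('NULL')                              # 0, where count(',') + 1 would give 1
```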

'''
ADVANCED LEVEL
PART 6: Create a dictionary in which the keys are the names of the chip items and
  the values are the total quantity ordered of each item.
Expected output: {'Chips and Roasted Chili-Corn Salsa': 18, ... }
Note: Please take the 'quantity' column into account!
Optional: Learn how to use 'defaultdict' to simplify your code.
'''

In [73]:
# start with an empty dictionary
chips = {}

# if chip order is not in dictionary, then add a new key/value pair
# if chip order is already in dictionary, then update the value for that key
for row in data:
    if 'Chips' in row[2]:
        if row[2] not in chips:
            chips[row[2]] = int(row[1])     # this is a new key, so create key/value pair
        else:
            chips[row[2]] += int(row[1])    # this is an existing key, so add to the value

In [74]:
chips

{'Chips and Fresh Tomato Salsa': 130,
 'Chips and Tomatillo-Green Chili Salsa': 33,
 'Side of Chips': 110,
 'Chips and Guacamole': 506,
 'Chips and Tomatillo Green Chili Salsa': 45,
 'Chips': 230,
 'Chips and Tomatillo Red Chili Salsa': 50,
 'Chips and Roasted Chili-Corn Salsa': 18,
 'Chips and Roasted Chili Corn Salsa': 23,
 'Chips and Tomatillo-Red Chili Salsa': 25,
 'Chips and Mild Fresh Tomato Salsa': 1}

In [75]:
# defaultdict saves you the trouble of checking whether a key already exists
from collections import defaultdict
dchips = defaultdict(int)
for row in data:
    if 'Chips' in row[2]:
        dchips[row[2]] += int(row[1])
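
'''
NOTE: collections.Counter is another option for this kind of tally. Like
defaultdict(int), it treats missing keys as 0, and it adds helpers such as
most_common(). A sketch on made-up rows ('sample_data' and 'cchips' are
illustrative names; the rows are not real records):
'''

```python
from collections import Counter

# made-up rows in the same column layout as 'data' (not real records)
sample_data = [
    ['1', '1', 'Chips and Guacamole', 'NULL', '$4.00 '],
    ['2', '2', 'Chips', 'NULL', '$4.00 '],
    ['3', '3', 'Chips and Guacamole', 'NULL', '$12.00 '],
    ['3', '1', 'Izze', '[Clementine]', '$3.39 '],
]

# add each row's quantity to the tally for its item name
cchips = Counter()
for row in sample_data:
    if 'Chips' in row[2]:
        cchips[row[2]] += int(row[1])

cchips.most_common(1)       # [('Chips and Guacamole', 4)]
```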

'''
BONUS: Think of a question about this data that interests you, and then answer it!
'''