# Pandas Exercises
---

## Part 1 - Getting to know your data

In this series of exercises, we're going to use pandas to load a dataset into a dataframe and get to know it with a few pandas techniques. This is by no means an exhaustive example, but it covers some useful ways to get acquainted with a dataset. We're going to use a dataset of Chipotle orders because it's relatively small and it offers a good variety of things to look at (and because I love Chipotle :) ).

### Step 1. Import the necessary libraries

In [3]:
import pandas as pd

### Step 2. Import the dataset from chipotle.tsv and assign it to a variable called chip

TSV stands for 'tab-separated values', and it works just like a CSV. Use '\t' as the delimiter when reading it into a pandas dataframe.

In [4]:
chip = pd.read_csv('chipotle.tsv', sep='\t')
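To see what the `sep='\t'` argument does without needing the file on disk, here is a minimal sketch that reads a tab-separated string from memory. The three sample rows and the names `sample_tsv`, `sample`, and `sample_alt` are made up for illustration; they only imitate the layout of chipotle.tsv.

```python
import io
import pandas as pd

# Hypothetical sample rows laid out like chipotle.tsv (fields separated by tabs).
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Tomatillo-Red Chili Salsa (Hot)]\t$16.98\n"
)

# sep='\t' tells read_csv to split fields on tabs instead of commas.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# read_table uses a tab as its default separator, so it should give the same frame.
sample_alt = pd.read_table(io.StringIO(sample_tsv))

print(sample.shape)
print(sample.equals(sample_alt))
```

The empty `choice_description` field in the first row is read as a missing value, which is why that column shows fewer non-null entries than the others in the real dataset.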

### Step 3. View the first 10 entries of the dataframe

In [5]:
chip.head(10)

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,3,1,Side of Chips,,$1.69
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25


### Step 4. Print the number of observations in the dataframe

Note: you can do this in a few different ways.

In [6]:
# Solution 1
chip.shape[0]

4622

In [7]:
# Solution 2
chip.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.6+ KB
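A couple of other ways to count rows, sketched on a small made-up dataframe (`toy` is a hypothetical stand-in for `chip`). Unlike `info()`, which prints a report, these return the count as a plain integer that can be reused in later calculations.

```python
import pandas as pd

# Hypothetical stand-in for the chip dataframe.
toy = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 1, 2]})

n_rows_shape = toy.shape[0]    # first element of the (rows, columns) tuple
n_rows_len = len(toy)          # len() of a dataframe is its number of rows
n_rows_index = len(toy.index)  # length of the row index

print(n_rows_shape, n_rows_len, n_rows_index)
```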


### Step 5. Print the number of columns in the dataframe

In [8]:
chip.shape[1]

5

### Step 6. Print the names of all of the columns in the dataframe

In [9]:
chip.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### Step 7. Figure out how the dataset is indexed

In [11]:
chip.index

RangeIndex(start=0, stop=4622, step=1)

### Step 8. Find the most commonly ordered items in the dataset

In [18]:
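# Sum the numeric columns per item, then rank items by total quantity ordered.
# (The summed order_id column has no real meaning; only quantity matters here.)
c = chip.groupby('item_name')[['order_id', 'quantity']].sum()
c = c.sort_values('quantity', ascending=False)
c.head()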

item_name,order_id,quantity
Chicken Bowl,713926,761
Chicken Burrito,497303,591
Chips and Guacamole,449959,506
Steak Burrito,328437,386
Canned Soft Drink,304753,351
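"Most commonly ordered" can be read two ways: the most units sold, or the most order lines an item appears on. The sketch below uses a small made-up dataframe (`toy` is hypothetical) to show that the two readings can rank items differently, which is why the solution above sums `quantity` rather than counting rows.

```python
import pandas as pd

# Hypothetical data: Chicken Bowl appears on fewer lines but in larger quantities.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Izze", "Izze", "Izze", "Chicken Bowl"],
    "quantity": [3, 1, 1, 1, 2],
})

# Total units sold per item (what the exercise asks for).
units_sold = toy.groupby("item_name")["quantity"].sum().sort_values(ascending=False)

# Number of rows (order lines) each item appears on.
line_count = toy["item_name"].value_counts()

print(units_sold)
print(line_count)
```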


### Step 9. Find out how many items were ordered in *total*

In [22]:
chip.quantity.sum()

4972

### Step 10. Find out how many *different types* of items are sold

In [23]:
chip.item_name.nunique()

50
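There are several equivalent ways to count distinct values in a column. A minimal sketch on a made-up series (`items` is a hypothetical stand-in for `chip.item_name`):

```python
import pandas as pd

# Hypothetical stand-in for the item_name column.
items = pd.Series(["Izze", "Chicken Bowl", "Izze", "Side of Chips"], name="item_name")

via_nunique = items.nunique()                 # counts distinct non-missing values
via_value_counts = items.value_counts().count()  # one row per distinct value, then count rows
via_unique = len(items.unique())              # length of the array of distinct values

print(via_nunique, via_value_counts, via_unique)
```

Note that `unique()` keeps a missing value as its own entry while `nunique()` and `value_counts()` skip it by default, so the three only agree when the column has no missing values, as in this sample.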

### Step 11. Turn the values in the price column into floats

We have to deal with those pesky dollar signs in the strings somehow... (Hint: you can replace every '$' with '' and *then* convert to a float.)

In [39]:
# Raw string avoids an invalid-escape warning for \$ in newer Python versions.
chip.item_price = chip.item_price.replace(r'[\$,]', '', regex=True).astype(float)
chip.head()

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98
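The hint can also be followed literally, without a regular expression. This is a minimal sketch on a made-up series (`prices` is a hypothetical stand-in for `chip.item_price`), assuming the only non-numeric character is the dollar sign:

```python
import pandas as pd

# Hypothetical stand-in for the item_price column before conversion.
prices = pd.Series(["$2.39", "$3.39", "$16.98"])

# regex=False treats "$" as a plain character rather than the regex end-of-string anchor.
cleaned = prices.str.replace("$", "", regex=False).astype(float)

print(cleaned.tolist())
print(cleaned.dtype)
```

The regex version used above additionally strips commas, which would matter only if a price contained a thousands separator.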


### Step 12. Find the revenue for the period in the dataset

In [52]:
(chip['quantity'] * chip['item_price']).sum()

39237.020000000055
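The trailing digits in the result come from binary floating-point arithmetic, not from the data. For reporting, the total can be rounded to cents. A minimal sketch on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical rows with prices already converted to floats.
toy = pd.DataFrame({"quantity": [1, 1, 2], "item_price": [2.39, 3.39, 16.98]})

# Same calculation as above: quantity times price, summed over all rows.
revenue = float((toy["quantity"] * toy["item_price"]).sum())

# Round to two decimal places for display.
revenue_rounded = round(revenue, 2)

print(revenue_rounded)
```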

### Step 13. Find out how many orders were made in the period

In [47]:
chip.order_id.nunique()

1834

### Step 14. Find the average revenue per order

In [50]:
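# Solution 1
# The revenue column must exist before we can group and sum it.
chip['revenue'] = chip['quantity'] * chip['item_price']
chip.groupby(by=['order_id'])['revenue'].sum().mean()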

21.394231188658654

In [51]:
# Solution 2

chip['revenue'] = chip['quantity'] * chip['item_price']
# numeric_only=True skips the text columns, which cannot be averaged later.
order_grouped = chip.groupby(by=['order_id']).sum(numeric_only=True)
order_grouped['revenue'].mean()

21.394231188658654
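As a cross-check, the mean of the per-order totals should equal total revenue divided by the number of distinct orders. A minimal sketch on made-up rows (`toy` is hypothetical), using the same `revenue` definition as above:

```python
import pandas as pd

# Hypothetical rows spanning three orders.
toy = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "quantity": [1, 1, 2, 1],
    "item_price": [2.39, 3.39, 16.98, 10.98],
})
toy["revenue"] = toy["quantity"] * toy["item_price"]

# Approach used in the solutions: total each order, then average the totals.
mean_of_order_totals = toy.groupby("order_id")["revenue"].sum().mean()

# Cross-check: overall revenue divided by the number of distinct orders.
total_over_orders = toy["revenue"].sum() / toy["order_id"].nunique()

print(mean_of_order_totals, total_over_orders)
```

The two numbers may differ in the last few decimal places because of floating-point rounding, so compare them with a tolerance rather than with `==`.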