# Pandas Exercises
---

## Part 1 - Getting to know your data

In this series of exercises, we're going to use pandas to load a dataset into a dataframe and get to know it with a few basic techniques. This is by no means an exhaustive example, but it shows some useful ways to get acquainted with a dataset. We're going to use a dataset of Chipotle orders because it's relatively small and it provides a good variety of things to look at (and because I love Chipotle :) ).

### Step 1. Import the necessary libraries

In [2]:
import pandas as pd

### Step 2. Import the dataset from chipotle.tsv and assign it to a variable called chip

TSV stands for 'tab-separated values'. It works just like a CSV, except that fields are separated by tabs instead of commas. Use '\t' as your delimiter when reading it into a pandas dataframe.

In [3]:
chip = pd.read_csv('chipotle.tsv', sep='\t')
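If you want to see what `sep='\t'` does without the file on hand, here is a minimal sketch. The two-row `tsv_text` string is a made-up stand-in for chipotle.tsv, not real data from it.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for chipotle.tsv
tsv_text = "order_id\tquantity\titem_name\n1\t1\tIzze\n2\t2\tChicken Bowl\n"

# read_csv accepts any file-like object; sep='\t' splits fields on tabs
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample)
```

Because the fields are split on tabs, an item name that contains spaces (like "Chicken Bowl") stays in one column.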

### Step 3. View the first 10 entries of the dataframe

In [4]:
chip.head(10)

|  | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | 2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | 2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | 10.98 |
| 6 | 3 | 1 | Side of Chips | NaN | 1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | 11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | 9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | 9.25 |


### Step 4. Print the number of observations in the dataframe

Note: you can do this in a few different ways.

In [5]:
# Solution 1
chip.shape[0]

4622

In [7]:
# Solution 2
len(chip)

4622

### Step 5. Print the number of columns in the dataframe

In [8]:
chip.shape[1]

5
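Since `shape` is a `(rows, columns)` tuple, you can also grab both numbers at once. A small sketch on a made-up three-row dataframe (the values are illustrative only):

```python
import pandas as pd

# Hypothetical mini-dataframe, not the real Chipotle data
sample = pd.DataFrame({
    "order_id": [1, 1, 2],
    "quantity": [1, 1, 2],
    "item_name": ["Izze", "Chips", "Chicken Bowl"],
})

# shape is a plain tuple, so it unpacks like any other
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")
```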

### Step 6. Print the names of all of the columns in the dataframe

In [25]:
chip.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### Step 7. What are the column datatypes?

In [9]:
chip.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null float64
dtypes: float64(1), int64(2), object(2)
memory usage: 180.6+ KB
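`info()` prints a report. If you only need the datatypes as data you can work with, the `dtypes` attribute returns them as a Series. The output above also shows that `choice_description` has fewer non-null entries than there are rows (3376 of 4622), and `isna().sum()` is one way to count the gaps directly. A sketch on a made-up dataframe:

```python
import pandas as pd

# Hypothetical mini-dataframe with one missing description
sample = pd.DataFrame({
    "item_name": ["Izze", "Side of Chips"],
    "choice_description": ["[Clementine]", None],
    "item_price": [3.39, 1.69],
})

# One dtype per column, returned as a Series
print(sample.dtypes)

# Number of missing values in a single column
n_missing = sample["choice_description"].isna().sum()
print(n_missing)
```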


### Step 8. Find the most common items in the dataset

In [27]:
chip['item_name'].value_counts()

Chicken Bowl                             726
Chicken Burrito                          553
Chips and Guacamole                      479
Steak Burrito                            368
Canned Soft Drink                        301
Chips                                    211
Steak Bowl                               211
Bottled Water                            162
Chicken Soft Tacos                       115
Chips and Fresh Tomato Salsa             110
Chicken Salad Bowl                       110
Canned Soda                              104
Side of Chips                            101
Veggie Burrito                            95
Barbacoa Burrito                          91
Veggie Bowl                               85
Carnitas Bowl                             68
Barbacoa Bowl                             66
Carnitas Burrito                          59
Steak Soft Tacos                          55
6 Pack Soft Drink                         54
Chips and Tomatillo Red Chili Salsa       48
Chicken Cr
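Keep in mind that `value_counts()` counts rows, not units. A row with a quantity of 2 still counts once. If "most common" should mean "most units ordered", one option is to sum `quantity` per item instead. A sketch on made-up rows showing that the two rankings can differ:

```python
import pandas as pd

# Hypothetical rows, not taken from chipotle.tsv
orders = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chips", "Chicken Bowl", "Chips", "Chips"],
    "quantity": [1, 1, 3, 1, 1],
})

# Number of rows each item appears in
rows_per_item = orders["item_name"].value_counts()

# Number of units ordered per item, largest first
units_per_item = (
    orders.groupby("item_name")["quantity"].sum().sort_values(ascending=False)
)

print(rows_per_item.head())
print(units_per_item.head())
```

Here Chips appears in the most rows, but Chicken Bowl has the most units.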

### Step 9. Find out how many items were ordered in *total*

In [28]:
chip['quantity'].sum()

4972

### Step 10. Find out how many *different types* of items are sold

In [36]:
chip['item_name'].value_counts().count()

50

In [35]:
len(chip['item_name'].unique())

50
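A third option is `nunique()`, which returns the count directly. The approaches agree on `item_name` because that column has no missing values, but they treat missing values differently: `nunique()` and `value_counts()` skip them by default, while `unique()` keeps them. A sketch on a made-up Series:

```python
import pandas as pd

# Hypothetical column with one missing entry
items = pd.Series(["Izze", "Chips", "Izze", None])

n_distinct = items.nunique()              # ignores the missing value
n_via_counts = items.value_counts().count()  # also ignores it
n_via_unique = len(items.unique())        # counts the missing value as an entry

print(n_distinct, n_via_counts, n_via_unique)
```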

### Step 11. Find the revenue for each row in the dataset

In [37]:
chip['quantity'] * chip['item_price']

0        2.39
1        3.39
2        3.39
3        2.39
4       33.96
5       10.98
6        1.69
7       11.75
8        9.25
9        9.25
10       4.45
11       8.75
12       8.75
13      11.25
14       4.45
15       2.39
16       8.49
17       8.49
18       4.36
19       8.75
20       4.45
21       8.99
22       3.39
23      10.98
24       3.39
25       2.39
26       8.49
27       8.99
28       1.09
29       8.49
        ...  
4592    11.75
4593    11.75
4594    11.75
4595     8.75
4596     4.45
4597     1.25
4598     1.50
4599     8.75
4600     4.45
4601     1.25
4602     9.25
4603     9.25
4604     8.75
4605     4.45
4606     1.25
4607    11.75
4608    11.25
4609     1.25
4610    11.75
4611    11.25
4612     9.25
4613     2.15
4614     1.50
4615     8.75
4616     4.45
4617    11.75
4618    11.75
4619    11.25
4620     8.75
4621     8.75
Length: 4622, dtype: float64
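One thing worth checking before trusting these numbers: the multiplication assumes `item_price` is a per-unit price. Row 4 has a quantity of 2 and a price of 16.98, which is exactly 2 × 8.49, and 8.49 shows up as a price elsewhere in the output, so `item_price` may already be the total for the row. The sketch below is one way to probe this, on made-up rows and with a hypothetical `implied_unit_price` column. It is a sanity check, not a verdict on the real data.

```python
import pandas as pd

# Hypothetical rows, not taken from chipotle.tsv
orders = pd.DataFrame({
    "item_name": ["Bowl", "Bowl", "Chips", "Chips"],
    "quantity": [1, 2, 1, 3],
    "item_price": [8.49, 16.98, 2.15, 6.45],
})

# What one unit would cost if item_price were a row total
orders["implied_unit_price"] = orders["item_price"] / orders["quantity"]

# If item_price is a row total, the implied unit price should be
# (nearly) constant within each item, whatever the quantity
spread = orders.groupby("item_name")["implied_unit_price"].agg(
    lambda s: s.max() - s.min()
)
looks_like_row_total = bool((spread.abs() < 1e-9).all())
print(looks_like_row_total)
```

On the real dataset the price of an item can also depend on `choice_description`, so you would probably want to group by both columns before drawing a conclusion.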

### Step 12. Make a new column called revenue and add it to the dataframe

HINT: Try Googling "how to create new column in pandas"

In [10]:
chip['revenue'] = chip['quantity'] * chip['item_price']
chip.head()

|  | order_id | quantity | item_name | choice_description | item_price | revenue |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | 2.39 | 2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 | 3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 | 3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | 2.39 | 2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 | 33.96 |
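Assigning to `chip['revenue']` changes the dataframe in place. If you'd rather leave the original untouched, `assign()` returns a new dataframe with the extra column. A sketch on a made-up two-row dataframe:

```python
import pandas as pd

# Hypothetical mini-dataframe, not the real Chipotle data
sample = pd.DataFrame({"quantity": [1, 2], "item_price": [2.39, 16.98]})

# assign() returns a copy with the new column; sample itself is unchanged
with_revenue = sample.assign(revenue=sample["quantity"] * sample["item_price"])

print(with_revenue)
print(list(sample.columns))
```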


### Step 13. Calculate the average price for items

In [40]:
chip['item_price'].mean()

7.464335785374297

### Step 14. Calculate the total revenue for the whole dataset

In [41]:
chip['revenue'].sum()

39237.02
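You don't strictly need the stored column to get this total. The product can be summed directly. A sketch on made-up rows, with `round()` applied because floating-point sums of prices often carry tiny rounding noise:

```python
import pandas as pd

# Hypothetical mini-dataframe, not the real Chipotle data
sample = pd.DataFrame({"quantity": [1, 2], "item_price": [2.39, 16.98]})

# Multiply row by row, then add everything up
total = (sample["quantity"] * sample["item_price"]).sum()

# Round to cents for display
print(round(total, 2))
```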