# Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

### Step 3. Assign it to a variable called chipo.

In [2]:
chipo = pd.read_table('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv')  # Read a tsv into a DataFrame
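
`pd.read_table` uses a tab as its default separator, so it should behave like `pd.read_csv(..., sep='\t')`. The sketch below checks this offline on three rows copied from the Step 4 output; `sample_tsv` is an illustrative stand-in for the remote file, not the real data source.

```python
import io

import pandas as pd

# Three rows copied from the Step 4 output, joined with tabs to mimic the TSV layout.
# `sample_tsv` is an illustrative stand-in, not the real file.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "3\t1\tSide of Chips\t\t$1.69\n"
)

by_table = pd.read_table(io.StringIO(sample_tsv))
by_csv = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(by_table.equals(by_csv))  # both calls parse the text identically
print(by_table.shape)
```

Empty fields are read as missing values, which is why `choice_description` shows no value for items without options.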

### Step 4. See the first 10 entries

In [3]:
chipo.head(10)

|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips | | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |


### Step 5. What is the number of observations in the dataset?

In [4]:
chipo.shape # 4,622 observations

(4622, 5)

### Step 6. What is the number of columns in the dataset?

In [5]:
chipo.shape # 5 columns

(4622, 5)
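
Steps 5 and 6 both read the same `shape` tuple: the first element is the row count and the second is the column count. A minimal sketch of unpacking it, using a small made-up frame (`df` and its values are illustrative, not the Chipotle data):

```python
import pandas as pd

# Illustrative three-row frame; the values are placeholders.
df = pd.DataFrame(
    {
        "order_id": [1, 1, 2],
        "quantity": [1, 1, 2],
        "item_name": ["Izze", "Nantucket Nectar", "Chicken Bowl"],
    }
)

n_rows, n_cols = df.shape  # shape is (rows, columns)
print(n_rows, n_cols)

# Equivalent ways to get each number on its own.
print(len(df), len(df.columns))
```

Unpacking avoids printing the whole tuple when only one of the two numbers is needed.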

### Step 7. Print the name of all the columns.

In [6]:
chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### Step 8. How is the dataset indexed?

In [48]:
chipo.index

RangeIndex(start=0, stop=4622, step=1)

### Step 9. Which was the most ordered item?

In [17]:
# Chicken Bowl was the most ordered item (count() tallies order lines per item, not units)
chipo.groupby('item_name')['quantity'].count().sort_values()

item_name
Carnitas Salad                             1
Veggie Crispy Tacos                        1
Chips and Mild Fresh Tomato Salsa          1
Crispy Tacos                               2
Bowl                                       2
Salad                                      2
Steak Salad                                4
Veggie Salad                               6
Carnitas Salad Bowl                        6
Burrito                                    6
Carnitas Crispy Tacos                      7
Veggie Soft Tacos                          7
Chicken Salad                              9
Barbacoa Salad Bowl                       10
Barbacoa Crispy Tacos                     11
Veggie Salad Bowl                         18
Chips and Roasted Chili-Corn Salsa        18
Izze                                      20
Chips and Tomatillo-Red Chili Salsa       20
Chips and Roasted Chili Corn Salsa        22
Barbacoa Soft Tacos                       25
Nantucket Nectar                          27


In [55]:
chipo.groupby('item_name')['quantity'].count().sort_values().tail(1)

item_name
Chicken Bowl    726
Name: quantity, dtype: int64
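
Note that `count()` tallies how many order lines mention an item, while `sum()` adds up the `quantity` values. The two can disagree when a single line orders several units. A small sketch with made-up rows (the data in `df` is hypothetical and chosen only to make the difference visible):

```python
import pandas as pd

# Hypothetical rows: two Chicken Bowl lines of 2 units each, three Izze lines of 1 unit each.
df = pd.DataFrame(
    {
        "item_name": ["Chicken Bowl", "Chicken Bowl", "Izze", "Izze", "Izze"],
        "quantity": [2, 2, 1, 1, 1],
    }
)

lines_per_item = df.groupby("item_name")["quantity"].count()  # rows per item
units_per_item = df.groupby("item_name")["quantity"].sum()    # units per item

print(lines_per_item.idxmax())  # Izze: 3 lines vs 2
print(units_per_item.idxmax())  # Chicken Bowl: 4 units vs 3
```

Which measure counts as "most ordered" depends on the question being asked; `idxmax()` returns the winning label directly instead of sorting and taking the tail.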

### Step 10. How many items were ordered?

In [20]:
chipo['quantity'].sum()  # 4,972 items were ordered

4972

### Step 11. What was the most ordered item in the choice_description column?

In [50]:
# Diet Coke was the most frequent entry in the choice_description column
chipo.groupby('choice_description')['quantity'].count().sort_values()

choice_description
[Adobo-Marinated and Grilled Chicken, Pinto Beans, [Sour Cream, Salsa, Cheese, Cilantro-Lime Rice, Guacamole]]                  1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Black Beans, Pinto Beans, Guacamole]]                                   1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Black Beans]]                                                           1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Cheese, Lettuce]]                                                       1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Cheese, Sour Cream, Guacamole]]                                         1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Cheese, Sour Cream]]                                                    1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Lettuce]]                                                               1
[Tomatillo Green Chili Salsa, [Fajita Vegetables, Rice, Pinto Beans, Ch

In [52]:
chipo.groupby('choice_description')['quantity'].count().sort_values().tail(1)

choice_description
[Diet Coke]    134
Name: quantity, dtype: int64
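
Counting rows per `choice_description` can also be written with `value_counts()`. Both approaches skip missing descriptions by default, so items without options do not compete for the top spot. A sketch on made-up values (`df` here is hypothetical):

```python
import pandas as pd

# Hypothetical column with three missing descriptions.
df = pd.DataFrame(
    {
        "choice_description": ["[Diet Coke]", None, "[Diet Coke]", "[Sprite]", None, None],
        "quantity": [1, 1, 1, 1, 1, 1],
    }
)

counts = df["choice_description"].value_counts()  # NaN is excluded by default
grouped = df.groupby("choice_description")["quantity"].count()  # NaN keys are dropped too

print(counts.idxmax())
print(counts.max())
```

Passing `dropna=False` to `value_counts()` would include the missing entries if they are of interest.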

### Step 12. How many items were ordered in total?

In [22]:
chipo['quantity'].sum()  # 4,972 items were ordered

4972

### Step 13. Turn the item price into a float

In [24]:
chipo.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

In [28]:
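# Remove the dollar sign as a literal character ('$' is an anchor in regex mode), then cast to float
chipo['item_price'] = chipo['item_price'].str.replace('$', '', regex=False)
chipo['item_price'] = chipo['item_price'].astype(float)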

In [29]:
chipo.dtypes

order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object
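
The conversion above can be tried in isolation. A minimal sketch on three prices taken from the Step 4 output (`prices` is an illustrative Series, not the full column):

```python
import pandas as pd

# Three price strings taken from the first rows of the dataset.
prices = pd.Series(["$2.39", "$3.39", "$16.98"])

# regex=False treats '$' as a plain character rather than the end-of-string anchor.
as_float = prices.str.replace("$", "", regex=False).astype(float)

print(as_float.dtype)
print(as_float.tolist())
```

Passing `regex=False` explicitly keeps the behaviour the same across pandas versions, since the default for `regex` has changed over time.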

### Step 14. How much was the revenue for the period in the dataset?

In [56]:
chipo.sum(numeric_only=True)  # 34,500.16 in revenue (the item_price total)

order_id      4285772.00
quantity         4972.00
item_price      34500.16
dtype: float64

### Step 15. How many orders were made in the period?

In [57]:
chipo['order_id'].nunique()  # 1,834 orders were made

1834

### Step 16. What is the average amount per order?

In [58]:
# Total each order's item_price first, then average those per-order totals
chipo_orders = chipo.groupby('order_id')['item_price'].sum()
chipo_orders.mean()

18.811428571428689
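
The average amount per order is a two-stage aggregation: sum within each order, then take the mean across orders. A sketch using the prices of orders 1 to 3 from the Step 4 output (`orders` is an illustrative subset, so the result differs from the full-dataset figure):

```python
import pandas as pd

# Prices for orders 1, 2 and 3, copied from the first seven rows of the dataset.
orders = pd.DataFrame(
    {
        "order_id": [1, 1, 1, 1, 2, 3, 3],
        "item_price": [2.39, 3.39, 3.39, 2.39, 16.98, 10.98, 1.69],
    }
)

order_totals = orders.groupby("order_id")["item_price"].sum()  # one total per order
average_order = order_totals.mean()                             # mean of those totals

print(f"{average_order:.2f}")  # 13.74
```

Taking the mean of `item_price` directly would give the average price per order line instead, which is a different quantity.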

### Step 17. How many different items are sold?

In [41]:
chipo['item_name'].value_counts()

Chicken Bowl                             726
Chicken Burrito                          553
Chips and Guacamole                      479
Steak Burrito                            368
Canned Soft Drink                        301
Steak Bowl                               211
Chips                                    211
Bottled Water                            162
Chicken Soft Tacos                       115
Chips and Fresh Tomato Salsa             110
Chicken Salad Bowl                       110
Canned Soda                              104
Side of Chips                            101
Veggie Burrito                            95
Barbacoa Burrito                          91
Veggie Bowl                               85
Carnitas Bowl                             68
Barbacoa Bowl                             66
Carnitas Burrito                          59
Steak Soft Tacos                          55
6 Pack Soft Drink                         54
Chips and Tomatillo Red Chili Salsa       48
Chicken Cr

In [43]:
chipo['item_name'].nunique()  # 50 different items are sold

50
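
Counting distinct values comes up in both Step 15 and Step 17. `nunique()` gives the same number as `value_counts().count()` in one call. A sketch on a made-up Series (`items` is illustrative):

```python
import pandas as pd

# Illustrative item names with one repeat.
items = pd.Series(["Chicken Bowl", "Izze", "Chicken Bowl", "Side of Chips"])

distinct = items.nunique()               # number of distinct non-missing values
via_counts = items.value_counts().count()  # same number, via the frequency table

print(distinct, via_counts)
```

Both forms ignore missing values by default, so they agree even when the column contains gaps.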