# Ex2 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

In [3]:
df = pd.read_csv("https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv",sep = "\t")
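The file at this address is tab-separated even though it is loaded with `read_csv`, so the `sep="\t"` argument matters. The minimal sketch below parses an inline two-line sample (a hypothetical string that mimics the header and first row, not the real download) with and without the separator. `pd.read_table`, which defaults to a tab separator, should be an equivalent alternative.

```python
import io

import pandas as pd

# Hypothetical two-line sample mimicking the header and first row of chipotle.tsv
tsv_text = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
)

with_tab = pd.read_csv(io.StringIO(tsv_text), sep="\t")
without_tab = pd.read_csv(io.StringIO(tsv_text))  # default separator is ","

print(with_tab.shape)     # (1, 5)
print(without_tab.shape)  # (1, 1): each whole line lands in a single column
```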

In [4]:
df.head()

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


### Step 3. Assign it to a variable called chipo.

In [5]:
chipo = df
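Plain assignment gives the same DataFrame a second name; it does not copy the data. Later steps add columns such as `new_item_price` and `Sales` to `chipo`, and those changes are therefore visible through `df` as well. A minimal sketch on a small made-up frame, using `.copy()` where an independent object is wanted:

```python
import pandas as pd

df = pd.DataFrame({"order_id": [1, 1], "quantity": [1, 1]})

chipo = df               # a second name for the same object, no copy is made
chipo_copy = df.copy()   # an independent copy of the data

chipo["quantity"] = chipo["quantity"] * 2

print(chipo is df)                      # True
print(df["quantity"].tolist())          # [2, 2]: df changed as well
print(chipo_copy["quantity"].tolist())  # [1, 1]: the copy is untouched
```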

### Step 4. See the first 10 entries

In [6]:
chipo.head(10)

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,3,1,Side of Chips,,$1.69
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25


### Step 5. What is the number of observations in the dataset?

In [7]:
# Solution 1
chipo.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB


In [8]:
# Solution 2
chipo.shape


(4622, 5)
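`shape[0]` and `len()` both give the number of rows, while `info()` also lists non-null counts per column. Here 4622 entries against 3376 non-null `choice_description` values means 1246 rows have no description. A minimal sketch of the same counts on a small illustrative frame (values taken from the first rows above, with `NaN` standing in for the empty description):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 1, 1],
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Nantucket Nectar"],
    "choice_description": [np.nan, "[Clementine]", "[Apple]"],
})

n_rows = len(df)           # same value as df.shape[0]
non_null = df.count()      # non-null values per column, as listed by info()
missing = df.isna().sum()  # missing values per column

print(n_rows)                          # 3
print(non_null["choice_description"])  # 2
print(missing["choice_description"])   # 1
```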

### Step 6. What is the number of columns in the dataset?

In [10]:
chipo.shape[1]

5

### Step 7. Print the name of all the columns.

In [15]:
for x in chipo.columns:
    print(x)

order_id
quantity
item_name
choice_description
item_price
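If the names are needed as data rather than printed, `columns.tolist()` returns them as a plain list, and `in` tests membership directly on the column index. A minimal sketch on an empty frame with the same column names:

```python
import pandas as pd

df = pd.DataFrame(columns=["order_id", "quantity", "item_name",
                           "choice_description", "item_price"])

names = df.columns.tolist()  # plain Python list of the column labels
print(names)
print("item_price" in df.columns)  # True
```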


### Step 8. How is the dataset indexed?

In [16]:
chipo.index

RangeIndex(start=0, stop=4622, step=1)
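Because no `index_col` was passed to `read_csv`, pandas assigns a default `RangeIndex` running from 0 to n-1. `order_id` would not be a unique replacement: rows 0 to 3 all belong to order 1. A minimal sketch on a small made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"order_id": [1, 1, 2]})

print(df.index)                                 # RangeIndex(start=0, stop=3, step=1)
print(df.index.equals(pd.RangeIndex(len(df))))  # True
print(df["order_id"].is_unique)                 # False: order_id repeats within an order
```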

### Step 9. Which was the most-ordered item? 

In [17]:
chipo.head()

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


In [24]:
# Select the quantity column before summing so text columns are not aggregated
chipo.groupby("item_name")["quantity"].sum().sort_values(ascending=False).head(5)

item_name
Chicken Bowl           761
Chicken Burrito        591
Chips and Guacamole    506
Steak Burrito          386
Canned Soft Drink      351
Name: quantity, dtype: int64
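"Most ordered" can mean most units (sum of `quantity`) or most order lines (number of rows); `value_counts()` gives the latter. A minimal sketch using the `item_name` and `quantity` values of the first ten rows shown in Step 4, where the two measures already differ for Chicken Bowl:

```python
import pandas as pd

# item_name and quantity for the first ten rows shown in Step 4
sample = pd.DataFrame({
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Nantucket Nectar",
                  "Chips and Tomatillo-Green Chili Salsa", "Chicken Bowl",
                  "Chicken Bowl", "Side of Chips", "Steak Burrito",
                  "Steak Soft Tacos", "Steak Burrito"],
    "quantity": [1, 1, 1, 1, 2, 1, 1, 1, 1, 1],
})

units = sample.groupby("item_name")["quantity"].sum()  # units ordered per item
lines = sample["item_name"].value_counts()             # order lines per item

print(units.nlargest(2))
print(units["Chicken Bowl"], lines["Chicken Bowl"])    # 3 2
```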

### Step 10. For the most-ordered item, how many items were ordered?

In [25]:
# .iloc[0] takes the first row by position; a bare [0] on a string-labelled Series is deprecated
chipo.groupby("item_name")["quantity"].sum().sort_values(ascending=False).iloc[0]

761
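Sorting is not strictly needed to answer Steps 9 and 10: `idxmax()` returns the label of the largest total and `max()` the total itself. A minimal sketch on a few illustrative rows, comparing with the sorted `.iloc[0]` approach:

```python
import pandas as pd

sample = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Steak Burrito", "Izze"],
    "quantity": [2, 1, 1, 1],
})

units = sample.groupby("item_name")["quantity"].sum()

top_item = units.idxmax()  # label of the largest total
top_units = units.max()    # the largest total itself
# Positional lookup on the sorted Series uses .iloc explicitly
top_units_sorted = units.sort_values(ascending=False).iloc[0]

print(top_item, top_units)  # Chicken Bowl 3
```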

### Step 11. What was the most ordered item in the choice_description column?

In [28]:
chipo[0:1]

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39


In [30]:
chipo.groupby("choice_description")["quantity"].sum().sort_values(ascending=False).head(1)

choice_description
[Diet Coke]    159
Name: quantity, dtype: int64
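Two details shape this answer. Each `choice_description` value is the whole selection stored as one string, so the grouping ranks complete combinations such as `[Diet Coke]`, not individual ingredients. Also, `groupby` drops missing keys by default, so the 1246 rows without a description are ignored unless `dropna=False` is passed. A minimal sketch on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; NaN stands for an order line with no choice_description
sample = pd.DataFrame({
    "choice_description": ["[Diet Coke]", "[Coke]", "[Diet Coke]", np.nan],
    "quantity": [1, 1, 2, 1],
})

default = sample.groupby("choice_description")["quantity"].sum()
with_missing = sample.groupby("choice_description", dropna=False)["quantity"].sum()

print(default.idxmax(), default.max())  # [Diet Coke] 3
print(len(default), len(with_missing))  # 2 3
```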

### Step 12. How many items were ordered in total?

In [31]:
chipo.quantity.sum()

4972

### Step 13. Turn the item price into a float

#### Step 13.a. Check the item price type

In [32]:
chipo.item_price.dtype

dtype('O')

#### Step 13.b. Create a lambda function and change the type of item price

In [34]:
# Drop the leading "$" from each price string, e.g. "$2.39" -> "2.39"
chipo["new_item_price"] = chipo["item_price"].apply(lambda x: x[1:])

In [35]:
chipo.head(1)

,order_id,quantity,item_name,choice_description,item_price,new_item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39,2.39


In [37]:
chipo["new_item_price"] = pd.to_numeric(chipo["new_item_price"])
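The same conversion can be done with vectorized string methods instead of a Python-level `apply`. `pd.to_numeric(..., errors="coerce")` is a useful variant when some values may not parse, since it turns them into `NaN` instead of raising. A minimal sketch on price strings taken from the rows shown in Step 2:

```python
import pandas as pd

prices = pd.Series(["$2.39", "$3.39", "$16.98"])  # sample strings from Step 2

# Strip stray whitespace, drop the leading "$", then parse as float
as_float = prices.str.strip().str.lstrip("$").astype(float)
print(as_float.dtype)  # float64

# Unparsable entries become NaN instead of raising an error
checked = pd.to_numeric(pd.Series(["2.39", "n/a"]), errors="coerce")
print(checked.isna().tolist())  # [False, True]
```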

In [42]:
chipo["item_price"] = chipo["new_item_price"]
chipo.head()

,order_id,quantity,item_name,choice_description,item_price,new_item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39,2.39
1,1,1,Izze,[Clementine],3.39,3.39
2,1,1,Nantucket Nectar,[Apple],3.39,3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98,16.98


In [44]:
chipo.drop(columns=["new_item_price"], inplace=True)

In [45]:
chipo.head()

,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,2.39
1,1,1,Izze,[Clementine],3.39
2,1,1,Nantucket Nectar,[Apple],3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98


#### Step 13.c. Check the item price type

In [46]:
chipo["item_price"].dtype

dtype('float64')

### Step 14. How much was the revenue for the period in the dataset?

In [47]:
chipo["Sales"] = chipo["item_price"] * chipo["quantity"]
chipo.head(1)

,order_id,quantity,item_name,choice_description,item_price,Sales
0,1,1,Chips and Fresh Tomato Salsa,,2.39,2.39


In [48]:
chipo["Sales"].sum()

39237.02
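This total assumes `item_price` is a unit price. Row 4, however, records quantity 2 with an `item_price` of $16.98, which is exactly 2 × $8.49, so `item_price` may already be the total for the line; if so, multiplying by `quantity` double-counts every multi-quantity line. The notebook does not establish which reading is correct. A minimal sketch comparing both readings on the first five rows shown in Step 2:

```python
import pandas as pd

# First five rows from Step 2, with item_price already converted to float
sample = pd.DataFrame({
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Nantucket Nectar",
                  "Chips and Tomatillo-Green Chili Salsa", "Chicken Bowl"],
    "quantity": [1, 1, 1, 1, 2],
    "item_price": [2.39, 3.39, 3.39, 2.39, 16.98],
})

# Reading A (used above): item_price is a unit price
revenue_unit = (sample["item_price"] * sample["quantity"]).sum()
# Reading B: item_price is already the line total
revenue_line = sample["item_price"].sum()

print(round(revenue_unit, 2), round(revenue_line, 2))  # 45.52 28.54
```

On the full dataset, `chipo["item_price"].sum()` gives the reading-B figure.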

### Step 15. How many orders were made in the period?

In [73]:
chipo["order_id"].nunique()

1834
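`nunique()` counts distinct values directly. One difference from `len(unique())` is that `nunique()` skips missing values unless `dropna=False` is passed; `order_id` has no missing values, so both give 1834 here. A minimal sketch using the `order_id` values of the first ten rows, plus a small series with a `NaN`:

```python
import pandas as pd

order_id = pd.Series([1, 1, 1, 1, 2, 3, 3, 4, 4, 5])  # order_id of the first ten rows

print(order_id.nunique())      # 5
print(len(order_id.unique()))  # 5

with_nan = pd.Series([1.0, 1.0, None])
print(with_nan.nunique(), with_nan.nunique(dropna=False), len(with_nan.unique()))  # 1 2 2
```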

### Step 16. What is the average revenue amount per order?

In [60]:
# Solution 1
chipo.groupby("order_id")["Sales"].sum().mean()

21.394231188658654

In [61]:
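# Solution 2: inspect the per-order totals; the answer is the mean of the Sales column
chipo.groupby("order_id")[["quantity", "item_price", "Sales"]].sum()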


order_id,quantity,item_price,Sales
1,4,11.56,11.56
2,2,16.98,33.96
3,2,12.67,12.67
4,2,21.00,21.00
5,2,13.70,13.70
...,...,...,...
1830,2,23.00,23.00
1831,3,12.90,12.90
1832,2,13.20,13.20
1833,2,23.50,23.50
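The mean of the per-order totals equals total revenue divided by the number of orders, which allows a cross-check against Steps 14 and 15: 39237.02 / 1834 ≈ 21.394, matching Solution 1. A minimal sketch of the equivalence on the first five rows, using the `Sales` values computed in Step 14:

```python
import math

import pandas as pd

sample = pd.DataFrame({
    "order_id": [1, 1, 1, 1, 2],
    "Sales": [2.39, 3.39, 3.39, 2.39, 33.96],
})

per_order = sample.groupby("order_id")["Sales"].sum()  # revenue of each order
mean_a = per_order.mean()
mean_b = sample["Sales"].sum() / sample["order_id"].nunique()

print(round(mean_a, 2), math.isclose(mean_a, mean_b))  # 22.76 True
```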


### Step 17. How many different items are sold?

In [62]:
chipo.head()

,order_id,quantity,item_name,choice_description,item_price,Sales
0,1,1,Chips and Fresh Tomato Salsa,,2.39,2.39
1,1,1,Izze,[Clementine],3.39,3.39
2,1,1,Nantucket Nectar,[Apple],3.39,3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,2.39,2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",16.98,33.96


In [70]:
chipo["item_name"].nunique()

50
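`nunique()` counts exact strings, so spelling variants of one product would each be counted. The rows shown already spell the salsa both "Tomatillo-Green" (in `item_name`) and "Tomatillo Green" (in `choice_description`). Whether `item_name` itself contains such variants is not shown here; `sorted(chipo["item_name"].unique())` is a quick way to check. A minimal sketch of a normalized count on hypothetical names:

```python
import pandas as pd

# Hypothetical item names that differ only by a hyphen
names = pd.Series([
    "Chips and Tomatillo-Green Chili Salsa",
    "Chips and Tomatillo Green Chili Salsa",
    "Chicken Bowl",
])

raw_count = names.nunique()

# Treat hyphens as spaces and ignore case before counting
normalized = names.str.replace("-", " ", regex=False).str.strip().str.lower()
normalized_count = normalized.nunique()

print(raw_count, normalized_count)  # 3 2
```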