# Tutorial 1: Getting and Knowing Your Data

**Tut 1.1**

@Date: 3rd Sep 2019

Link: [Pandas Exercises](https://github.com/SharkChilli-Cyrus/pandas_exercises)

[pd.df.sort_values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html?highlight=sort_values#pandas.DataFrame.sort_values)

[pd.df.groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html)

## Import packages

```python
import numpy as np
import pandas as pd
```

## Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv)

```python
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'

# The file is tab-separated, so pass sep='\t' instead of the default comma
chipo = pd.read_csv(url, sep='\t')
```
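A minimal offline sketch of what `sep='\t'` does, assuming a small hypothetical TSV string (`tsv_text`) in the same layout as `chipotle.tsv` in place of the real URL:

```python
import io

import pandas as pd

# Hypothetical three-line sample in the chipotle.tsv layout (tab-separated,
# '$'-prefixed prices); the real notebook reads the file from the URL instead.
tsv_text = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Rice]\t$16.98\n"
)

# sep='\t' splits each line on tabs; an empty field is read as NaN.
toy = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(toy.shape)
print(toy.head())
```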

## See the first 10 entries

```python
chipo.head(10)
```

|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips | NaN | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |


## What is the number of observations in the dataset?

```python
chipo.shape
```

```
(4622, 5)
```

So the dataset has 4622 observations (rows) and 5 columns.
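The first element of `shape` is the row count; `len()` returns the same number. A quick sketch on a hypothetical three-row frame standing in for `chipo`:

```python
import pandas as pd

# Hypothetical small frame standing in for chipo.
df = pd.DataFrame({'order_id': [1, 1, 2], 'quantity': [1, 1, 2]})

n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(n_rows, n_cols)
print(len(df))              # len() also counts rows
```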

```python
chipo.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
order_id              4622 non-null int64
quantity              4622 non-null int64
item_name             4622 non-null object
choice_description    3376 non-null object
item_price            4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.6+ KB
```
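`info()` reports 3376 non-null `choice_description` values out of 4622 entries, so 4622 - 3376 = 1246 rows have no description. A sketch for counting missing values per column directly, on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the NaN plays the role of a missing choice_description.
df = pd.DataFrame({
    'item_name': ['Chips and Fresh Tomato Salsa', 'Izze', 'Chicken Bowl'],
    'choice_description': [np.nan, '[Clementine]', '[Rice]'],
})

missing = df.isna().sum()    # NaN count per column
non_null = df.notna().sum()  # the "non-null" count that info() prints
print(missing)
```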


## Show the names of all the columns

```python
chipo.columns
```

```
Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')
```

## How is the dataset indexed?

```python
chipo.index
```

```
RangeIndex(start=0, stop=4622, step=1)
```

## <font color=red>Which was the most-ordered item?</font>

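```python
# Sum per item. Selecting the numeric columns explicitly keeps this working in
# recent pandas, where groupby().sum() no longer silently drops text columns.
# (The order_id sum has no real meaning; only quantity matters here.)
c = (chipo.groupby('item_name')[['order_id', 'quantity']]
          .sum()
          .sort_values('quantity', ascending=False))
c.head()
```

| item_name | order_id | quantity |
|---|---|---|
| Chicken Bowl | 713926 | 761 |
| Chicken Burrito | 497303 | 591 |
| Chips and Guacamole | 449959 | 506 |
| Steak Burrito | 328437 | 386 |
| Canned Soft Drink | 304753 | 351 |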
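Summing only the `quantity` column and using `nlargest` gives the same ranking as `sort_values(ascending=False).head(n)` without carrying `order_id` along. A sketch on hypothetical order lines that reuse `chipo`'s column names:

```python
import pandas as pd

# Hypothetical order lines using chipo's column names.
df = pd.DataFrame({
    'item_name': ['Chicken Bowl', 'Chips', 'Chicken Bowl', 'Izze'],
    'quantity': [2, 1, 1, 2],
})

# Total units per item, then the two largest totals.
per_item = df.groupby('item_name')['quantity'].sum()
top = per_item.nlargest(2)  # same order as sort_values(ascending=False).head(2)
print(top)
```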


### For the most-ordered item, how many items were ordered?

```python
# Same aggregation as above; head(1) keeps only the top item
c = (chipo.groupby('item_name')[['order_id', 'quantity']]
          .sum()
          .sort_values('quantity', ascending=False))
c.head(1)
```

| item_name | order_id | quantity |
|---|---|---|
| Chicken Bowl | 713926 | 761 |
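`head(1)` still returns a one-row DataFrame. If only the item name and its unit count are wanted as plain values, `idxmax()` and `max()` on the per-item totals give them directly. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical order lines using chipo's column names.
df = pd.DataFrame({
    'item_name': ['Chicken Bowl', 'Chips', 'Chicken Bowl', 'Izze'],
    'quantity': [2, 1, 1, 2],
})

per_item = df.groupby('item_name')['quantity'].sum()
top_item = per_item.idxmax()  # label with the largest total
top_qty = per_item.max()      # the largest total itself
print(f'{top_item}: {top_qty}')
```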


### What was the most-ordered item in the choice_description column?

```python
# Group by the description text and rank by total quantity
c = (chipo.groupby('choice_description')[['order_id', 'quantity']]
          .sum()
          .sort_values('quantity', ascending=False))
c.head()
```

| choice_description | order_id | quantity |
|---|---|---|
| [Diet Coke] | 123455 | 159 |
| [Coke] | 122752 | 143 |
| [Sprite] | 80426 | 89 |
| [Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Lettuce]] | 43088 | 49 |
| [Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream]] | 36041 | 42 |

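The ranking by `choice_description` has no row for the 1246 rows whose description is missing: `groupby` leaves out rows with a NaN key by default. A toy sketch (hypothetical data) contrasting the default with `dropna=False`, which is available from pandas 1.1:

```python
import numpy as np
import pandas as pd

# Hypothetical descriptions; one row has no description at all.
df = pd.DataFrame({
    'choice_description': ['[Coke]', np.nan, '[Coke]', '[Sprite]'],
    'quantity': [1, 5, 2, 1],
})

# Default (dropna=True): the NaN row is excluded from every group.
default_counts = df.groupby('choice_description')['quantity'].sum()
# dropna=False: the NaN rows form their own group.
with_nan = df.groupby('choice_description', dropna=False)['quantity'].sum()
print(default_counts)
print(with_nan)
```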

### How many items were ordered in total?

```python
total_items_orders = chipo['quantity'].sum()
print(total_items_orders)
```

```
4972
```


## Turn the item price into a float

### Check the item price type

```python
chipo['item_price'].dtype
```

```
dtype('O')
```

### Create a <font color=red>lambda function</font> and change the type of item price

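```python
# Each raw value is a string like '$2.39 ': the slice [1:-1] drops the leading '$'
# and the final character (a trailing space in this file) before converting.
dollarizer = lambda x: float(x[1:-1])
chipo['item_price'] = chipo['item_price'].apply(dollarizer)
```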
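The `[1:-1]` slice silently drops a real digit if a value has no trailing character to remove. A vectorized alternative that strips whitespace and the `$` explicitly, sketched on hypothetical raw prices that assume the `'$'` prefix and trailing space:

```python
import pandas as pd

# Hypothetical raw prices: '$' prefix plus a trailing space (assumed format).
prices = pd.Series(['$2.39 ', '$3.39 ', '$16.98 '])

# Strip surrounding whitespace, drop the leading '$', then parse as float.
clean = prices.str.strip().str.lstrip('$').astype(float)
print(clean.dtype)
print(clean.tolist())
```

Unlike the lambda, this works whether or not the trailing space is present.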

### Check the item price type

```python
chipo['item_price'].dtype
```

```
dtype('float64')
```

## How much was the revenue for the period in the dataset?

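```python
revenue = np.multiply(chipo['quantity'], chipo['item_price']).sum()

print('Revenue was: $' + str(np.round(revenue, 2)))
```

```
Revenue was: $39237.02
```

This treats `item_price` as a unit price. Row 4 of `head(10)` lists 2 Chicken Bowls at $16.98, which suggests `item_price` may already be the total for the line; if so, multiplying by `quantity` double-counts multi-unit lines and `chipo['item_price'].sum()` would be the revenue instead.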
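The same calculation can use the `*` operator, which multiplies two Series element-wise just like `np.multiply`, and an f-string for the two-decimal formatting. A sketch on hypothetical lines, treating `item_price` as a unit price as the cell above does:

```python
import numpy as np
import pandas as pd

# Hypothetical lines; item_price treated as a per-unit price.
df = pd.DataFrame({'quantity': [1, 2, 1], 'item_price': [2.39, 8.49, 3.39]})

line_totals = df['quantity'] * df['item_price']  # element-wise product
same = np.allclose(line_totals, np.multiply(df['quantity'], df['item_price']))
revenue = line_totals.sum()
print(f'Revenue was: ${revenue:.2f}')
```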


## How many orders were made in the period?

```python
orders = chipo['order_id'].value_counts().count()
print(orders)
```

```
1834
```
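`value_counts().count()` counts the distinct order IDs; `Series.nunique()` returns the same number in one call. A toy check with hypothetical IDs:

```python
import pandas as pd

# Hypothetical order IDs: order 1 has three lines, order 3 has two.
df = pd.DataFrame({'order_id': [1, 1, 1, 2, 3, 3]})

via_value_counts = df['order_id'].value_counts().count()
via_nunique = df['order_id'].nunique()
print(via_value_counts, via_nunique)
```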


## What is the average revenue amount per order?

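```python
# Revenue of each order line
chipo['revenue'] = np.multiply(chipo['quantity'], chipo['item_price'])
# Total revenue per order (summing only 'revenue' avoids text columns),
# then the mean over all orders
order_grouped = chipo.groupby('order_id')['revenue'].sum()
order_grouped.mean()
```

```
21.394231188658654
```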
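The mean of the per-order totals equals total revenue divided by the number of orders, which cross-checks the earlier cells: 39237.02 / 1834 ≈ 21.39. A sketch of both routes on hypothetical data:

```python
import pandas as pd

# Hypothetical order lines; item_price treated as a unit price.
df = pd.DataFrame({
    'order_id': [1, 1, 2, 3],
    'quantity': [1, 1, 2, 1],
    'item_price': [2.39, 3.39, 8.49, 10.98],
})

df['revenue'] = df['quantity'] * df['item_price']
per_order = df.groupby('order_id')['revenue'].sum()  # one total per order
avg = per_order.mean()

# Same value: total revenue divided by the number of distinct orders.
alt = df['revenue'].sum() / df['order_id'].nunique()
print(avg, alt)
```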

## How many different items are sold?

```python
chipo['item_name'].value_counts().count()
```

```
50
```
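`nunique()` gives the number of different items directly. Note that `value_counts()` counts rows (order lines) per item, not units, so a line with quantity 2 is counted once. A toy sketch with hypothetical data contrasting the three counts:

```python
import pandas as pd

# Hypothetical order lines using chipo's column names.
df = pd.DataFrame({
    'item_name': ['Chicken Bowl', 'Chips', 'Chicken Bowl', 'Izze'],
    'quantity': [2, 1, 1, 1],
})

n_distinct = df['item_name'].nunique()                    # different items
line_counts = df['item_name'].value_counts()              # order lines per item
unit_counts = df.groupby('item_name')['quantity'].sum()   # units per item
print(n_distinct)
print(line_counts)
print(unit_counts)
```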