# Part 1 - Getting and Knowing your Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [5]:
import pandas as pd
import numpy as np



### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

### Step 3. Assign it to a variable called chipo.

In [67]:
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, delimiter='\t')

In [68]:
chipo.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
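To see what the tab delimiter does without touching the network, here is a minimal sketch that parses a small inline sample instead of the remote file. The names `sample_tsv` and `sample` are illustrative, the rows are adapted from the output above, and the `NULL` marker for a missing description is an assumption about how the file encodes it. `sep="\t"` is used here as the equivalent of `delimiter='\t'`.

```python
import io

import pandas as pd

# Small tab-separated sample in the same column layout as the dataset (illustrative).
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "1\t1\tChips and Fresh Tomato Salsa\tNULL\t$2.39\n"
)

# read_csv accepts any file-like object, so StringIO stands in for the URL.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.shape)
print(sample["choice_description"].isna().sum())  # "NULL" is read as a missing value by default
```

Note that `item_price` still arrives as text because of the `$` prefix.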


### Step 4. See the first 10 entries

In [69]:
chipo.head(10)

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
6,3,1,Side of Chips,,$1.69
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$11.75
8,4,1,Steak Soft Tacos,"[Tomatillo Green Chili Salsa, [Pinto Beans, Ch...",$9.25
9,5,1,Steak Burrito,"[Fresh Tomato Salsa, [Rice, Black Beans, Pinto...",$9.25


### Step 5. What is the number of observations in the dataset?

In [70]:
# Solution 1
chipo.shape[0]

4622

In [71]:
# Solution 2

chipo['item_name'].count()

4622
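The two solutions agree here because `item_name` has no missing values. As a small sketch on made-up data (the frame `toy` is hypothetical), `count()` skips missing entries while `shape[0]` counts every row, so the two can differ on a column such as `choice_description`:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing description.
toy = pd.DataFrame({
    "item_name": ["Izze", "Side of Chips", "Chicken Bowl"],
    "choice_description": ["[Clementine]", np.nan, "[Fresh Tomato Salsa]"],
})

n_rows = toy.shape[0]                             # every row, missing or not
n_non_null = toy["choice_description"].count()    # non-missing values only

print(n_rows, n_non_null)
```

For a row count, `shape[0]` or `len(df)` is therefore the safer choice.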

### Step 6. What is the number of columns in the dataset?

In [72]:
chipo.shape[1]

5

### Step 7. Print the name of all the columns.

In [73]:
chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

### Step 8. How is the dataset indexed?

In [74]:
chipo.index

RangeIndex(start=0, stop=4622, step=1)

### Step 9. Which was the most-ordered item? 

In [75]:
chipo['item_name'].value_counts().max()

726

In [76]:
# Find the most-ordered product
top_item = chipo['item_name'].value_counts().idxmax()

subset = chipo[chipo['item_name'] == top_item]

print(f"The most ordered item is: {top_item}")

The most ordered item is: Chicken Bowl
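`value_counts()` ranks items by how many rows mention them, not by how many units were ordered. A possible alternative, sketched on a hypothetical frame `toy`, is to sum `quantity` per item. On the real data both approaches may well name the same item, but they can disagree in general:

```python
import pandas as pd

# Hypothetical data: Chicken Bowl appears on more rows, Canned Soda has more units.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Canned Soda"],
    "quantity": [1, 1, 4],
})

by_rows = toy["item_name"].value_counts().idxmax()               # most rows
by_units = toy.groupby("item_name")["quantity"].sum().idxmax()   # most units

print(by_rows, "|", by_units)
```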


### Step 10. For the most-ordered item, how many items were ordered?

In [78]:
quantity_subset = chipo[chipo['item_name'] == 'Chicken Bowl']

quantity_subset

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",$10.98
13,7,1,Chicken Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",$11.25
19,10,1,Chicken Bowl,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",$8.75
26,13,1,Chicken Bowl,"[Roasted Chili Corn Salsa (Medium), [Pinto Bea...",$8.49
...,...,...,...,...,...
4590,1825,1,Chicken Bowl,"[Roasted Chili Corn Salsa, [Rice, Black Beans,...",$11.25
4591,1825,1,Chicken Bowl,"[Tomatillo Red Chili Salsa, [Rice, Black Beans...",$8.75
4595,1826,1,Chicken Bowl,"[Tomatillo Green Chili Salsa, [Rice, Black Bea...",$8.75
4599,1827,1,Chicken Bowl,"[Roasted Chili Corn Salsa, [Cheese, Lettuce]]",$8.75


In [79]:
quantity_subset.quantity.sum()

761

### Step 11. What was the most ordered item in the choice_description column?

In [80]:
chipo['choice_description'].value_counts()

choice_description
[Diet Coke]                                                                                                                                      134
[Coke]                                                                                                                                           123
[Sprite]                                                                                                                                          77
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Lettuce]]                                                                            42
[Fresh Tomato Salsa, [Rice, Black Beans, Cheese, Sour Cream, Guacamole, Lettuce]]                                                                 40
                                                                                                                                                ... 
[Fresh Tomato Salsa (Mild), [Pinto Beans, Black Beans, Rice, Cheese, Sour Cream, Lettuc

### Step 12. How many items were ordered in total?

In [93]:
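# nunique() counts distinct item names; chipo['quantity'].sum() would give the total number of units ordered
uniques_prod = chipo['item_name'].nunique()
print(f"There were {uniques_prod} distinct items ordered")

There were 50 distinct items ordered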
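"Items ordered in total" can be read three ways: rows in the table, distinct item names, or units summed over `quantity`. The sketch below, on a hypothetical frame `toy`, shows that the three numbers differ; which one the question intends is a judgment call, though summing `quantity` is the usual reading.

```python
import pandas as pd

# Hypothetical order lines.
toy = pd.DataFrame({
    "item_name": ["Izze", "Chicken Bowl", "Chicken Bowl", "Side of Chips"],
    "quantity": [1, 2, 1, 3],
})

n_lines = len(toy)                          # rows (order lines)
n_distinct = toy["item_name"].nunique()     # distinct item names
n_units = toy["quantity"].sum()             # total units ordered

print(n_lines, n_distinct, n_units)
```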


### Step 13. Turn the item price into a float

#### Step 13.a. Check the item price type

In [104]:
chipo.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

#### Step 13.b. Create a lambda function and change the type of item price

In [114]:
chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.replace('$', '')))


#### Step 13.c. Check the item price type

In [115]:
chipo.dtypes

order_id                int64
quantity                int64
item_name              object
choice_description     object
item_price            float64
dtype: object
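The lambda works once, but re-running that cell raises an `AttributeError` because the values are already floats and have no `replace` method. A vectorized alternative, sketched here on a hypothetical Series `prices`, uses the `.str` accessor followed by `astype(float)`:

```python
import pandas as pd

# Hypothetical price strings in the same "$x.xx" format as item_price.
prices = pd.Series(["$2.39", "$16.98", "$10.98"])

# regex=False treats "$" as a literal character rather than a regex anchor.
as_float = prices.str.replace("$", "", regex=False).astype(float)

print(as_float.dtype)
print(as_float.tolist())
```

In a notebook it can help to guard the conversion with a dtype check so the cell stays safe to re-run.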

### Step 14. How much was the revenue for the period in the dataset?

In [119]:
revenue = (chipo['quantity']* chipo['item_price']).sum()
print(f"The revenue has been ${revenue}")

The revenue has been $39237.02
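One caveat on this figure: the first rows shown earlier list a Chicken Bowl with quantity 2 at $16.98, and Part 2 of this notebook divides `item_price` by `quantity` to get a unit price. If `item_price` is already the total for the line (an assumption, since the dataset is not documented here), multiplying by `quantity` again counts multi-unit lines more than once. A sketch of the difference on a hypothetical two-row frame:

```python
import pandas as pd

# Hypothetical lines; item_price is ASSUMED to be the total for the line.
toy = pd.DataFrame({
    "quantity": [1, 2],
    "item_price": [2.39, 16.98],
})

revenue_line_totals = toy["item_price"].sum()                   # treats price as line total
revenue_multiplied = (toy["quantity"] * toy["item_price"]).sum()  # treats price as unit price

print(round(revenue_line_totals, 2), round(revenue_multiplied, 2))
```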


### Step 15. How many orders were made in the period?

In [120]:
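# Note: this counts rows (one per order line); chipo['order_id'].nunique() would count distinct orders
orders = chipo['item_name'].count()
orders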

4622

### Step 16. What is the average revenue amount per order?

In [121]:
# Solution 1

average_per_order = revenue / orders
average_per_order

8.48918649935093

In [124]:
# Solution 2
sum(chipo['quantity']* chipo['item_price'])/chipo.shape[0]

8.489186499350943
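Both solutions divide by the number of rows, so the result is an average per order line. A per-order average would first total each `order_id` and then take the mean. The sketch below uses a hypothetical frame `toy` with a made-up `line_total` column so that it does not depend on how `item_price` is interpreted:

```python
import pandas as pd

# Hypothetical order lines: order 1 has two lines, order 2 has one.
toy = pd.DataFrame({
    "order_id": [1, 1, 2],
    "line_total": [2.39, 3.39, 16.98],
})

n_orders = toy["order_id"].nunique()                     # distinct orders
per_order = toy.groupby("order_id")["line_total"].sum()  # revenue of each order
avg_per_order = per_order.mean()
avg_per_line = toy["line_total"].sum() / len(toy)        # what dividing by rows gives

print(n_orders, round(avg_per_order, 2), round(avg_per_line, 2))
```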

### Step 17. How many different items are sold?

In [127]:
chipo['item_name'].nunique()

50

# Part 2 - Filtering and Sorting Data

This time we are going to pull data directly from the internet.
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.

### Step 1. Import the necessary libraries

In [337]:
import pandas as pd
import numpy as np

### Step 2. Import the dataset from this [address](https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv). 

### Step 3. Assign it to a variable called chipo.

In [338]:
url = ('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv')
chipo = pd.read_csv(url, delimiter='\t')

In [339]:
chipo.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


In [343]:
chipo.shape

(4622, 5)

### Step 4. How many products cost more than $10.00?

In [344]:
chipo['item_price'] = chipo['item_price'].apply(lambda x: float(x.replace('$', ''))) # Convert the price strings to floats so we can do arithmetic on them

In [346]:
chipo['unit_price'] = chipo['item_price']/chipo['quantity']

In [351]:
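expensive_prod = chipo[chipo['unit_price']>= 10] # Note: >= also keeps items priced at exactly $10.00
expensive_prod.head(3).sort_values(['unit_price'], ascending=False) # First three matching rows, shown by descending unit price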

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price,unit_price
7,4,1,Steak Burrito,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",11.75,11.75
13,7,1,Chicken Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",11.25,11.25
5,3,1,Chicken Bowl,"[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...",10.98,10.98


In [352]:
unique_items = expensive_prod[['item_price', 'item_name']].groupby('item_name').mean()
print(f"There are {unique_items.shape[0]} products with a price higher than $10")

There are 25 products with a price higher than $10
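The same count can be obtained without the intermediate `groupby`, by filtering on the unit price and counting distinct names. This is only a sketch on a hypothetical frame `toy`; it uses a strict `>` to match the wording "more than $10.00".

```python
import pandas as pd

# Hypothetical unit prices; Chicken Bowl appears above and below the threshold.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Steak Burrito",
                  "Side of Chips", "Chicken Bowl"],
    "unit_price": [10.98, 8.75, 11.75, 1.69, 11.25],
})

above = toy.loc[toy["unit_price"] > 10, "item_name"]
n_rows_above = len(above)              # order lines above $10
n_products_above = above.nunique()     # distinct products above $10

print(n_rows_above, n_products_above)
```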


### Step 5. What is the price of each item? 
###### print a data frame with only two columns item_name and item_price

In [353]:
expensive_prod[['item_name', 'unit_price']]

Unnamed: 0,item_name,unit_price
5,Chicken Bowl,10.98
7,Steak Burrito,11.75
13,Chicken Bowl,11.25
23,Chicken Burrito,10.98
39,Barbacoa Bowl,11.75
...,...,...
4610,Steak Burrito,11.75
4611,Veggie Burrito,11.25
4617,Steak Burrito,11.75
4618,Steak Burrito,11.75
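The listing above repeats an item once per order line. One possible way to get a compact price list, sketched on a hypothetical frame `toy`, is to drop duplicate (name, price) pairs and sort by name:

```python
import pandas as pd

# Hypothetical lines with a repeated (item, price) pair.
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Steak Burrito", "Chicken Bowl", "Chicken Bowl"],
    "unit_price": [10.98, 11.75, 10.98, 11.25],
})

price_list = (
    toy[["item_name", "unit_price"]]
    .drop_duplicates()                          # keep one row per (name, price) pair
    .sort_values(["item_name", "unit_price"])   # name first, then price
    .reset_index(drop=True)
)

print(price_list)
```

An item can still appear more than once if it was sold at several prices, which seems to be the case in this dataset.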


### Step 6. Sort by the name of the item

In [354]:
expensive_prod.sort_values(by="item_name", ascending=True) # Sort by name (same rows as the previous list)
#unique_items.sort_values(by="item_price", ascending=False) # Alternative: sort by highest average price

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price,unit_price
1175,483,1,Barbacoa Bowl,"[Fresh Tomato Salsa (Mild), [Black Beans, Rice...",11.48,11.48
2013,812,1,Barbacoa Bowl,"[Tomatillo Red Chili Salsa, [Black Beans, Chee...",11.75,11.75
2073,836,1,Barbacoa Bowl,"[Tomatillo Green Chili Salsa, [Fajita Vegetabl...",11.75,11.75
4485,1786,1,Barbacoa Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",11.75,11.75
471,202,1,Barbacoa Bowl,"[[Tomatillo-Green Chili Salsa (Medium), Roaste...",11.48,11.48
...,...,...,...,...,...,...
1884,760,1,Veggie Salad Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",11.25,11.25
4261,1700,1,Veggie Salad Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Rice,...",11.25,11.25
295,128,1,Veggie Salad Bowl,"[Fresh Tomato Salsa, [Fajita Vegetables, Lettu...",11.25,11.25
738,304,1,Veggie Soft Tacos,"[Tomatillo Red Chili Salsa, [Fajita Vegetables...",11.25,11.25


### Step 7. What was the quantity of the most expensive item ordered?

In [391]:
most_exp_item = chipo[chipo['unit_price'] == chipo['unit_price'].max()] 
most_exp_item.head(5) # There are multiple items that have the maximum unit price 

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price,unit_price
3546,1426,1,Barbacoa Salad Bowl,"[Fresh Tomato Salsa, Guacamole]",11.89,11.89
1311,534,1,Steak Salad Bowl,"[Roasted Chili Corn Salsa, [Fajita Vegetables,...",11.89,11.89
1132,468,1,Carnitas Salad Bowl,"[Fresh Tomato Salsa, [Rice, Black Beans, Chees...",11.89,11.89
1159,478,1,Steak Salad Bowl,"[Fresh Tomato Salsa, [Rice, Fajita Vegetables,...",11.89,11.89
3350,1343,1,Steak Salad Bowl,"[Fresh Tomato Salsa, [Cheese, Guacamole, Lettu...",11.89,11.89


In [392]:
most_exp_item['item_name'].value_counts() # We want to count the times each product has been ordered

item_name
Steak Salad Bowl       19
Barbacoa Salad Bowl     5
Carnitas Salad Bowl     4
Name: count, dtype: int64

In [397]:
# A single order line can include more than one unit of the same item, so we sum the quantities per item.

sum_quantities = most_exp_item.groupby('item_name')['quantity'].sum() # sum quantities for the same item
sum_quantities # The three products with the highest unit price and their ordered quantities.

item_name
Barbacoa Salad Bowl     5
Carnitas Salad Bowl     4
Steak Salad Bowl       21
Name: quantity, dtype: int64

### Step 8. How many times was a Veggie Salad Bowl ordered?

In [398]:
veggie_salad = chipo['item_name'].loc[chipo['item_name'] == 'Veggie Salad Bowl'].count()
print(f"The Veggie Salad Bowl has been ordered {veggie_salad} times")

The Veggie Salad Bowl has been ordered 18 times


### Step 9. How many times did someone order more than one Canned Soda?

In [429]:
canned_soda = chipo.loc[chipo['item_name'] == 'Canned Soda', ['item_name', 'quantity']]
canned_soda.sort_values(by= ['quantity'], ascending=False).head(21)

Unnamed: 0,item_name,quantity
2235,Canned Soda,4
3592,Canned Soda,2
700,Canned Soda,2
3866,Canned Soda,2
350,Canned Soda,2
18,Canned Soda,2
51,Canned Soda,2
162,Canned Soda,2
3364,Canned Soda,2
171,Canned Soda,2


In [430]:
more_canned_soda = canned_soda['quantity']>1
print(f"There have been {more_canned_soda.sum()} clients ordering more than 1 soda")

There have been 20 clients ordering more than 1 soda
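The two filtering steps can also be combined into a single boolean mask. Summing a boolean Series counts its `True` values, so no intermediate frame is needed. A minimal sketch on a hypothetical frame `toy`:

```python
import pandas as pd

# Hypothetical order lines.
toy = pd.DataFrame({
    "item_name": ["Canned Soda", "Canned Soda", "Canned Soda", "Izze"],
    "quantity": [4, 1, 2, 3],
})

# Each condition needs its own parentheses because & binds tighter than == and >.
mask = (toy["item_name"] == "Canned Soda") & (toy["quantity"] > 1)
n_multi_soda = mask.sum()

print(n_multi_soda)
```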
