# Pizza dataset

Welcome to a quick exercise to practice your pandas skills! We will use the pizza sales dataset from previous Academy projects. Follow along and complete the tasks outlined in bold below; they get progressively harder as you go.

**Import pandas as pd.**

In [1]:
import pandas as pd

**Read pizza_sales.csv as a DataFrame called pizzas. Uncomment the lines below once you're done.**

In [3]:
pizzas = pd.read_csv("data/pizza_sales.csv")

# Convert order_time (HH:MM only, so pandas adds the placeholder date 1900-01-01) and order_date to datetimes.
pizzas["order_time"] = pd.to_datetime(pizzas["order_time"], format="%H:%M")
pizzas["order_date"] = pd.to_datetime(pizzas["order_date"])
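Because `order_time` holds only a clock time, parsing it with `format="%H:%M"` attaches the placeholder date 1900-01-01, which is why later outputs show values such as `1900-01-01 12:12:00`. A minimal sketch on a few toy time strings (assumed to have the same `HH:MM` shape as the real column):

```python
import pandas as pd

# Toy clock times in the same "HH:MM" shape assumed for order_time
times = pd.Series(["11:38", "12:12", "13:34"])

# With only a time in the string, pandas fills in the placeholder date 1900-01-01
parsed = pd.to_datetime(times, format="%H:%M")

# The hour component is all the lunchtime questions need
hours = parsed.dt.hour
```

The placeholder date is harmless here because only `.dt.hour` is used afterwards.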

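**Check the head of the DataFrame.**

In [ ]:
pizzas.head()

The hour component of `order_time`, used in the lunchtime question below, can be inspected with `.dt.hour`:

In [5]:
pizzas['order_time'].dt.hour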

0        11
1        11
2        11
3        11
4        11
         ..
48615    21
48616    21
48617    21
48618    22
48619    23
Name: order_time, Length: 48620, dtype: int64

**Use the `.info()` method to find out how many entries there are.**

In [4]:
pizzas.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48620 entries, 0 to 48619
Data columns (total 16 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   order_details_id   48620 non-null  int64         
 1   order_id           48620 non-null  int64         
 2   pizza_id           48620 non-null  object        
 3   quantity           48620 non-null  int64         
 4   order_date         48620 non-null  datetime64[ns]
 5   order_time         48620 non-null  datetime64[ns]
 6   unit_price         48620 non-null  float64       
 7   total_price        48620 non-null  float64       
 8   pizza_size         48620 non-null  object        
 9   pizza_category     48620 non-null  object        
 10  pizza_ingredients  48620 non-null  object        
 11  pizza_name         48620 non-null  object        
 12  first_name         48619 non-null  object        
 13  second_name        48619 non-null  object        
 14  addres
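`.info()` reports 48620 entries, and `first_name`/`second_name` have one fewer non-null value. The same numbers can be read off directly with `len()` and `isna().sum()`; a minimal sketch on a small hypothetical frame standing in for `pizzas`:

```python
import pandas as pd

# Hypothetical stand-in for pizzas; one missing first name mimics the 48619 non-null count
df = pd.DataFrame({
    "order_id": [1, 2, 2],
    "first_name": ["Victor", None, "Lisa"],
})

n_entries = len(df)          # number of rows, i.e. the RangeIndex length shown by .info()
missing = df.isna().sum()    # count of missing values per column
```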

**What is the average `total_price`?**

In [5]:
pizzas["total_price"].mean()

16.821473673385437

**What is the best-selling pizza?**

In [6]:
most_sold_pizza = pizzas["pizza_name"].value_counts().idxmax()
most_sold_pizza

'The Classic Deluxe Pizza'
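`value_counts()` counts rows, i.e. order lines, not pizzas. Since a single line can have `quantity` greater than 1, summing `quantity` per pizza measures units sold instead. A sketch on hypothetical order lines where the two approaches disagree (whether they disagree on the real data is not checked here):

```python
import pandas as pd

# Hypothetical order lines: Hawaiian appears on more lines, Pepperoni sells more units
orders = pd.DataFrame({
    "pizza_name": ["Hawaiian", "Hawaiian", "Hawaiian", "Pepperoni"],
    "quantity":   [1, 1, 1, 5],
})

# Ranking by number of order lines
best_by_lines = orders["pizza_name"].value_counts().idxmax()

# Ranking by units actually sold
by_units = orders.groupby("pizza_name")["quantity"].sum()
best_by_units = by_units.idxmax()
```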

**What is the best-selling pizza size? What is the ratio of that size to overall sales?**

In [7]:
# Calculate total sales of pizzas by size
pizza_size_sales = pizzas.groupby("pizza_size")["quantity"].sum()

# Units sold in the best-selling size (.max() returns the value, not the size label)
best_selling_size_sales = pizza_size_sales.max()

# Overall sales
total_sales = pizza_size_sales.sum()

# Calculate the ratio of best-selling pizza size to overall sales
best_selling_ratio = best_selling_size_sales / total_sales
best_selling_size_sales, best_selling_ratio

(18956, 0.3823778593617622)
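The cell above returns the unit count and the ratio, but `.max()` does not say which size that is. `.idxmax()` returns the index label, and dividing by the total gives every size's share at once. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical order lines with sizes and quantities
orders = pd.DataFrame({
    "pizza_size": ["L", "M", "L", "S", "M"],
    "quantity":   [2, 1, 1, 1, 1],
})

size_sales = orders.groupby("pizza_size")["quantity"].sum()  # L: 3, M: 2, S: 1
best_size = size_sales.idxmax()                              # label of the top size
share = size_sales / size_sales.sum()                        # each size's share of units sold
```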

**What is the best-selling pizza during lunchtime (12-13)?**

In [6]:
# Lunchtime window: orders placed from 12:00 up to, but not including, 13:00
lunchtime_pizzas = pizzas[
    (pizzas["order_time"].dt.hour >= 12) & (pizzas["order_time"].dt.hour < 13)
]
most_sold_lunchtime_pizza = lunchtime_pizzas["pizza_name"].value_counts().idxmax()
most_sold_lunchtime_pizza

'The Pepperoni Pizza'

In [12]:
# Wider window: hours 12 and 13, i.e. orders placed from 12:00 to 13:59
pizzas['order_hour'] = pizzas['order_time'].dt.hour
lunchtime_orders = pizzas[(pizzas["order_hour"] >= 12) & (pizzas["order_hour"] <= 13)]
lunchtime_orders

| | order_details_id | order_id | pizza_id | quantity | order_date | order_time | unit_price | total_price | pizza_size | pizza_category | pizza_ingredients | pizza_name | first_name | second_name | address | date_of_birth | order_hour |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 7 | 3 | ital_supr_m | 1 | 2015-01-01 | 1900-01-01 12:12:00 | 16.50 | 16.50 | M | Supreme | Calabrese Salami, Capocollo, Tomatoes, Red Oni... | The Italian Supreme Pizza | Victor | Hardy | 1276 Anderson Mountain\r\nWrightborough, FL 13840 | 1963-03-22 | 12 |
| 7 | 8 | 3 | prsc_argla_l | 1 | 2015-01-01 | 1900-01-01 12:12:00 | 20.75 | 20.75 | L | Supreme | Prosciutto di San Daniele, Arugula, Mozzarella... | The Prosciutto and Arugula Pizza | Victor | Hardy | 1276 Anderson Mountain\r\nWrightborough, FL 13840 | 1963-03-22 | 12 |
| 8 | 9 | 4 | ital_supr_m | 1 | 2015-01-01 | 1900-01-01 12:16:00 | 16.50 | 16.50 | M | Supreme | Calabrese Salami, Capocollo, Tomatoes, Red Oni... | The Italian Supreme Pizza | Michelle | Walls | 4278 Davis Corner\r\nNorth Brianna, FM 60171 | 1976-03-28 | 12 |
| 9 | 10 | 5 | ital_supr_m | 1 | 2015-01-01 | 1900-01-01 12:21:00 | 16.50 | 16.50 | M | Supreme | Calabrese Salami, Capocollo, Tomatoes, Red Oni... | The Italian Supreme Pizza | Lisa | Mills | 54013 Schmidt Groves Suite 440\r\nSanchezmouth... | 1997-08-14 | 12 |
| 10 | 11 | 6 | bbq_ckn_s | 1 | 2015-01-01 | 1900-01-01 12:29:00 | 12.75 | 12.75 | S | Chicken | Barbecued Chicken, Red Peppers, Green Peppers,... | The Barbecue Chicken Pizza | Traci | Evans | 33172 Veronica Mission\r\nVasquezberg, ID 63500 | 1941-04-17 | 12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 48477 | 48478 | 21290 | ckn_pesto_l | 1 | 2015-12-31 | 1900-01-01 13:16:00 | 20.75 | 20.75 | L | Chicken | Chicken, Tomatoes, Red Peppers, Spinach, Garli... | The Chicken Pesto Pizza | Maureen | Carpenter | 980 Hernandez Landing\r\nEast Christopher, GU ... | 1942-01-12 | 13 |
| 48478 | 48479 | 21290 | ital_veggie_l | 1 | 2015-12-31 | 1900-01-01 13:16:00 | 21.00 | 21.00 | L | Veggie | Eggplant, Artichokes, Tomatoes, Zucchini, Red ... | The Italian Vegetables Pizza | Maureen | Carpenter | 980 Hernandez Landing\r\nEast Christopher, GU ... | 1942-01-12 | 13 |
| 48479 | 48480 | 21291 | green_garden_s | 1 | 2015-12-31 | 1900-01-01 13:29:00 | 12.00 | 12.00 | S | Veggie | Spinach, Mushrooms, Tomatoes, Green Olives, Fe... | The Green Garden Pizza | Karen | Bradley | 33330 David Station\r\nSouth Amber, KS 96633 | 2005-08-09 | 13 |
| 48480 | 48481 | 21292 | classic_dlx_s | 1 | 2015-12-31 | 1900-01-01 13:34:00 | 12.00 | 12.00 | S | Classic | Pepperoni, Mushrooms, Red Onions, Red Peppers,... | The Classic Deluxe Pizza | Tammy | Barnes | 30853 Cody Forge Apt. 820\r\nWest Denisecheste... | 1990-02-10 | 13 |
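The two lunchtime cells use different windows: `< 13` keeps only 12:00 to 12:59, while `<= 13` keeps 12:00 to 13:59, and the answer can depend on that choice. A sketch with a hypothetical helper, `best_seller_between`, that makes the half-open hour window an explicit parameter and ranks by units sold (toy data, assumed for illustration):

```python
import pandas as pd

def best_seller_between(df, start_hour, end_hour):
    """Hypothetical helper: pizza with the most units ordered in [start_hour, end_hour)."""
    hours = df["order_time"].dt.hour
    window = df[(hours >= start_hour) & (hours < end_hour)]
    return window.groupby("pizza_name")["quantity"].sum().idxmax()

# Toy orders: Pepperoni leads at noon, Hawaiian leads once 13:xx is included
orders = pd.DataFrame({
    "order_time": pd.to_datetime(["12:05", "12:40", "13:10", "13:20", "13:50"], format="%H:%M"),
    "pizza_name": ["Pepperoni", "Pepperoni", "Hawaiian", "Hawaiian", "Hawaiian"],
    "quantity":   [1, 1, 1, 1, 1],
})

noon_only = best_seller_between(orders, 12, 13)    # 12:00 to 12:59
noon_to_two = best_seller_between(orders, 12, 14)  # 12:00 to 13:59
```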


**Calculate the total earnings from each pizza type. What are the three top earners?**

In [9]:
# Calculating total earnings from each pizza type/category
total_earnings_per_pizza_type = pizzas.groupby("pizza_category")["total_price"].sum()

# Finding the top 3 pizza types with the highest total earnings
top_3_earning_pizza_types = total_earnings_per_pizza_type.sort_values(
    ascending=False
).head(3)

top_3_earning_pizza_types

pizza_category
Classic    220053.1
Supreme    208197.0
Chicken    195919.5
Name: total_price, dtype: float64
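`Series.nlargest(3)` is a shorter equivalent of `sort_values(ascending=False).head(3)`. The cell above reads "pizza type" as `pizza_category`; grouping by `pizza_name` instead would rank individual pizzas. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical order lines
sales = pd.DataFrame({
    "pizza_category": ["Classic", "Veggie", "Supreme", "Chicken", "Classic"],
    "total_price":    [12.0, 20.0, 16.5, 18.0, 10.0],
})

earnings = sales.groupby("pizza_category")["total_price"].sum()
top_3 = earnings.nlargest(3)  # largest three, already in descending order
```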

**What's the name of the person who bought the most pizzas?**

In [18]:
# Calculating the total quantity of pizzas bought by each person
pizzas['full_name'] = pizzas['first_name'] + " " + pizzas['second_name']
total_pizzas_per_person = pizzas.groupby("full_name")["quantity"].sum()

# Find the person who bought the most pizzas
person_who_bought_most_pizzas = total_pizzas_per_person.idxmax()
most_pizzas_bought = total_pizzas_per_person.max()

person_who_bought_most_pizzas, most_pizzas_bought

('Emily Davis', 55)

In [20]:
total_pizzas_per_person.sort_values()

full_name
David Moore              1
Heather White            1
Kathleen Herrera         1
Heather Shea             1
Jimmy Hill               1
                        ..
David Smith             47
James Miller            47
Christopher Martinez    51
John Johnson            53
Emily Davis             55
Name: quantity, Length: 4745, dtype: int64
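Grouping by name alone merges every customer who shares a name, and the top of the list is made of common names. The dataset also carries `address` and `date_of_birth`; a sketch assuming that name + address + date of birth identifies one customer (whether the real top buyers are affected is not checked here):

```python
import pandas as pd

# Hypothetical rows: two different customers are both called "Emily Davis"
orders = pd.DataFrame({
    "first_name":    ["Emily", "Emily", "Emily", "John"],
    "second_name":   ["Davis", "Davis", "Davis", "Johnson"],
    "address":       ["1 Oak St", "1 Oak St", "9 Elm Rd", "5 Pine Ave"],
    "date_of_birth": ["1980-01-01", "1980-01-01", "1992-06-30", "1975-03-03"],
    "quantity":      [2, 1, 2, 4],
})

by_name = orders.groupby(["first_name", "second_name"])["quantity"].sum()

# Assumption: these four columns together identify a single customer
customer_key = ["first_name", "second_name", "address", "date_of_birth"]
by_customer = orders.groupby(customer_key)["quantity"].sum()
```

Here the merged "Emily Davis" (5 pizzas) outranks John Johnson (4), but neither Emily Davis alone does.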

**How much money did that person spend on pizzas?**

In [11]:
# Calculating the total amount of money spent on pizzas by the person who bought the most pizzas
money_spent_by_top_buyer = pizzas[
    (pizzas["first_name"] == "Emily") & (pizzas["second_name"] == "Davis")
]["total_price"].sum()
money_spent_by_top_buyer

940.25
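Rather than hard-coding "Emily" and "Davis", both totals can come from one named aggregation over the `full_name` column created earlier, so the spend always matches whoever tops the quantity ranking. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical order lines with a precomputed full_name
orders = pd.DataFrame({
    "full_name":   ["Emily Davis", "Emily Davis", "John Johnson"],
    "quantity":    [2, 1, 4],
    "total_price": [33.0, 12.5, 50.0],
})

# One row per person with units bought and money spent
per_person = orders.groupby("full_name").agg(
    pizzas=("quantity", "sum"),
    spent=("total_price", "sum"),
)
top_buyer = per_person["pizzas"].idxmax()
top_spend = per_person.loc[top_buyer, "spent"]
```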

**What was the average sale per month?**

In [12]:
average_sale_per_month = pizzas.groupby(pizzas["order_date"].dt.to_period("M"))[
    "total_price"
].mean()
average_sale_per_month

order_date
2015-01    16.793383
2015-02    16.741932
2015-03    16.817272
2015-04    16.901106
2015-05    16.844244
2015-06    16.951602
2015-07    16.870007
2015-08    16.677638
2015-09    16.805460
2015-10    16.862681
2015-11    16.820872
2015-12    16.766300
Freq: M, Name: total_price, dtype: float64
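Each row is an order line (`order_details_id`), so the means above are average line values. If "sale" is meant as a whole order, the lines can be summed per `order_id` first and then averaged by month. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical order lines: order 1 has two lines
lines = pd.DataFrame({
    "order_id":    [1, 1, 2, 3],
    "order_date":  pd.to_datetime(["2015-01-05", "2015-01-05", "2015-01-20", "2015-02-02"]),
    "total_price": [10.0, 20.0, 15.0, 12.0],
})

# Per line, as in the cell above
per_line = lines.groupby(lines["order_date"].dt.to_period("M"))["total_price"].mean()

# Per order: total each order first, then average those totals by month
orders = lines.groupby("order_id").agg(
    order_date=("order_date", "first"),
    order_total=("total_price", "sum"),
)
per_order = orders.groupby(orders["order_date"].dt.to_period("M"))["order_total"].mean()
```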

**On average, how many pizzas does a person buy per month?**

In [13]:
# Total pizzas bought by each person (first + second name) in each month
pizzas["year_month"] = pizzas["order_date"].dt.to_period("M")
monthly_pizza_per_person = (
    pizzas.groupby(["year_month", "first_name", "second_name"])["quantity"]
    .sum()
    .reset_index()
)

# Average over (month, person) pairs with at least one order; months without orders are not counted
average_pizzas_per_person_per_month = monthly_pizza_per_person["quantity"].mean()
average_pizzas_per_person_per_month

2.7978891522745233
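The figure above averages only over person-months that contain an order, so it reads as "pizzas per month, in months when the person orders". If months without orders should count as zero, the monthly totals can be spread over a full month range. A sketch assuming a hypothetical `customer` column stands in for the name fields and that every customer could have ordered in every month of the range:

```python
import pandas as pd

# Hypothetical: Anna orders in January and March, Ben only in January
orders = pd.DataFrame({
    "customer":   ["Anna", "Anna", "Ben"],
    "year_month": pd.PeriodIndex(["2015-01", "2015-03", "2015-01"], freq="M"),
    "quantity":   [2, 4, 3],
})

monthly = orders.groupby(["year_month", "customer"])["quantity"].sum()

# Mean over (month, customer) pairs that have orders, as in the cell above
active_mean = monthly.mean()

# Months x customers grid, with missing months filled as zero pizzas
all_months = pd.period_range("2015-01", "2015-03", freq="M")
grid = monthly.unstack("customer").reindex(all_months).fillna(0)
overall_mean = grid.to_numpy().mean()
```

Which of the two averages is "right" depends on the intended question.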

**Bonus: What are the 5 most common ingredients?**

Tip: look up the `.explode()` method in pandas!

In [14]:
# Split each ingredient list and explode it into one row per ingredient per order line
pizzas_exploded = pizzas.copy()

pizzas_exploded["pizza_ingredients"] = pizzas_exploded["pizza_ingredients"].str.split(
    ", "
)
pizzas_exploded = pizzas_exploded.explode("pizza_ingredients")[
    ["order_details_id", "order_id", "pizza_ingredients"]
]
# Find the 5 most common pizza ingredients
top_ingredients = pizzas_exploded["pizza_ingredients"].value_counts().head(5)
top_ingredients

Garlic               27422
Tomatoes             26601
Red Onions           19547
Red Peppers          16284
Mozzarella Cheese    10333
Name: pizza_ingredients, dtype: int64
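As with the best-selling pizza, `value_counts()` after `.explode()` counts order lines, so a line with `quantity` 3 contributes its ingredients once. Summing `quantity` per ingredient weights by pizzas sold instead. A sketch on hypothetical rows, keeping the original `", "` separator assumption (whether the real top five changes is not checked here):

```python
import pandas as pd

# Hypothetical order lines
lines = pd.DataFrame({
    "pizza_ingredients": ["Garlic, Tomatoes", "Tomatoes, Basil", "Garlic"],
    "quantity":          [1, 3, 1],
})

# One row per ingredient per order line, quantity repeated on each row
exploded = lines.assign(
    ingredient=lines["pizza_ingredients"].str.split(", ")
).explode("ingredient")

by_lines = exploded["ingredient"].value_counts()              # one count per order line
by_units = exploded.groupby("ingredient")["quantity"].sum()   # weighted by pizzas sold
top_by_units = by_units.nlargest(2)
```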