# Chocolate Sales Analysis
This notebook uses pandas to explore the Chocolate Sales dataset: it checks data quality, aggregates sales by product, country, month, and salesperson, and flags high-value transactions.

## Step 1: Import Necessary Libraries

In [1]:
import pandas as pd
import plotly.express as px

## Step 2: Load the Dataset

In [3]:
file_path = "C:/Users/DYNABOOK/Desktop/Pandas_Assignemnt/Chocolate Sales.xlsx"
xls = pd.ExcelFile(file_path)
df = pd.read_excel(xls, sheet_name="data")
df.head()

Unnamed: 0,Sales Person,Country,Product,Date,Amount,Boxes Shipped
0,Jehu Rudeforth,UK,Mint Chip Choco,2022-01-04,5320,180
1,Van Tuxwell,India,85% Dark Bars,2022-08-01,7896,94
2,Gigi Bohling,India,Peanut Butter Cubes,2022-07-07,4501,91
3,Jan Morforth,Australia,Peanut Butter Cubes,2022-04-27,12726,342
4,Jehu Rudeforth,UK,Peanut Butter Cubes,2022-02-24,13685,184


## Step 3: Get an Overview of the Dataset

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1094 entries, 0 to 1093
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Sales Person   1094 non-null   object        
 1   Country        1094 non-null   object        
 2   Product        1094 non-null   object        
 3   Date           1094 non-null   datetime64[ns]
 4   Amount         1094 non-null   int64         
 5   Boxes Shipped  1094 non-null   int64         
dtypes: datetime64[ns](1), int64(2), object(3)
memory usage: 51.4+ KB


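Conclusion: The dataset has 1,094 rows and 6 columns with no null entries. Date is already parsed as datetime64[ns], Amount and Boxes Shipped are int64, and the three text columns are stored as object.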

## Step 4: Check for Missing Values

In [5]:
df.isnull().sum()

Sales Person     0
Country          0
Product          0
Date             0
Amount           0
Boxes Shipped    0
dtype: int64

Conclusion: No column has missing values, so no imputation or row removal is needed before analysis.
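
Had the check found gaps, one common approach is to drop rows missing a key label and fill numeric gaps with a summary statistic. The sketch below illustrates this on a hypothetical toy frame (`toy` and its values are invented for illustration, not taken from the dataset); the choice of median is an assumption, not a recommendation for every column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps; the real dataset has none.
toy = pd.DataFrame({
    "Product": ["Eclairs", None, "White Choc", "Eclairs"],
    "Amount": [100.0, 250.0, np.nan, 400.0],
})

# Count missing values per column, as in the cell above.
missing_counts = toy.isnull().sum()

# Drop rows whose product label is missing, then fill the
# remaining numeric gaps with the column median.
cleaned = toy.dropna(subset=["Product"]).copy()
cleaned["Amount"] = cleaned["Amount"].fillna(cleaned["Amount"].median())

print(missing_counts)
print(cleaned)
```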

## Step 5: Count Unique Products Sold

In [6]:
df["Product"].nunique()

22

Conclusion: The dataset covers 22 distinct chocolate products.

## Step 6: Identify the Top 5 Best-Selling Products

In [7]:
df.groupby("Product")["Amount"].sum().sort_values(ascending=False).head(5)

Product
Smooth Sliky Salty     349692
50% Dark Bites         341712
White Choc             329147
Peanut Butter Cubes    324842
Eclairs                312445
Name: Amount, dtype: int64

Conclusion: These are the top five products by total revenue, led by Smooth Sliky Salty (349,692) and 50% Dark Bites (341,712).
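
An equivalent way to get the top N groups is `Series.nlargest`, which avoids sorting the full result. A minimal sketch on a hypothetical toy frame (product labels and amounts are invented):

```python
import pandas as pd

# Hypothetical toy data; labels and amounts are illustrative only.
toy = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B", "D"],
    "Amount": [10, 40, 25, 5, 20, 30],
})

# Total revenue per product, then keep the three largest totals.
totals = toy.groupby("Product")["Amount"].sum()
top3 = totals.nlargest(3)

print(top3)
```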

## Step 7: Total Sales per Country

In [8]:
df.groupby("Country")["Amount"].sum()

Country
Australia      1137367
Canada          962899
India          1045800
New Zealand     950418
UK             1051792
USA            1035349
Name: Amount, dtype: int64

Conclusion: Australia contributes the most to sales (1,137,367) and New Zealand the least (950,418); the other four countries fall between roughly 0.96 and 1.05 million.
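
The grouped output is in alphabetical order, so the leader is easier to read after sorting. The sketch below rebuilds the totals printed above as a Series (the variable names are assumptions) and reports the top country, the bottom country, and the gap between them.

```python
import pandas as pd

# Country totals copied from the output of the cell above.
country_totals = pd.Series(
    {
        "Australia": 1137367,
        "Canada": 962899,
        "India": 1045800,
        "New Zealand": 950418,
        "UK": 1051792,
        "USA": 1035349,
    },
    name="Amount",
)

# Sort from highest to lowest total sales.
ranked = country_totals.sort_values(ascending=False)
top_country, bottom_country = ranked.index[0], ranked.index[-1]

# Difference between the best and worst performing country.
spread = ranked.iloc[0] - ranked.iloc[-1]

print(ranked)
print(top_country, bottom_country, spread)
```

On the full DataFrame, the same ordering would come from appending `.sort_values(ascending=False)` to the `groupby` expression.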

## Step 8: Analyze Sales Trend Over Time

In [None]:
df['Month'] = df['Date'].dt.to_period('M').astype(str)
fig = px.line(df.groupby('Month')['Amount'].sum().reset_index(), x='Month', y='Amount', title='Sales Trend Over Time')
fig.show()

Conclusion: Observing monthly sales trends helps identify peak seasons.
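
The peak month can also be read off numerically rather than from the chart. A minimal sketch, assuming a hypothetical toy frame with the same `Date` and `Amount` columns (the values are invented); it keeps the month as a pandas `Period` instead of casting to string, which is only needed for plotting.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the dataset.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2022-01-04", "2022-01-20", "2022-02-24", "2022-03-01", "2022-03-15"]
    ),
    "Amount": [100, 50, 300, 120, 80],
})

# Total sales per calendar month, indexed by monthly Period.
monthly = toy.groupby(toy["Date"].dt.to_period("M"))["Amount"].sum()

# Month with the highest total, and the change from the previous month.
peak_month = monthly.idxmax()
mom_change = monthly.diff()

print(monthly)
print("Peak month:", peak_month)
print(mom_change)
```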

## Step 9: Calculate Average Sale Amount per Box

In [10]:
df["Avg Sale per Box"] = df["Amount"] / df["Boxes Shipped"]
df.head()

Unnamed: 0,Sales Person,Country,Product,Date,Amount,Boxes Shipped,Month,Avg Sale per Box
0,Jehu Rudeforth,UK,Mint Chip Choco,2022-01-04,5320,180,2022-01,29.555556
1,Van Tuxwell,India,85% Dark Bars,2022-08-01,7896,94,2022-08,84.0
2,Gigi Bohling,India,Peanut Butter Cubes,2022-07-07,4501,91,2022-07,49.461538
3,Jan Morforth,Australia,Peanut Butter Cubes,2022-04-27,12726,342,2022-04,37.210526
4,Jehu Rudeforth,UK,Peanut Butter Cubes,2022-02-24,13685,184,2022-02,74.375


Conclusion: This metric helps understand the revenue generated per box shipped.
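
Note that averaging the per-row ratio is not the same as dividing total revenue by total boxes: the first weights every transaction equally, the second weights by volume. The sketch below uses the five rows shown by `df.head()` above to illustrate the difference; the variable names are assumptions.

```python
import pandas as pd

# The five rows displayed by df.head() above.
sample = pd.DataFrame({
    "Amount": [5320, 7896, 4501, 12726, 13685],
    "Boxes Shipped": [180, 94, 91, 342, 184],
})

# Per-transaction revenue per box, as computed in the notebook.
per_row = sample["Amount"] / sample["Boxes Shipped"]

# Unweighted mean of the per-row ratios.
mean_of_ratios = per_row.mean()

# Volume-weighted figure: total revenue divided by total boxes.
ratio_of_totals = sample["Amount"].sum() / sample["Boxes Shipped"].sum()

print(per_row)
print(mean_of_ratios, ratio_of_totals)
```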

## Step 10: Identify High-Value Sales Transactions

In [11]:
df[df["Amount"] > 10000]

Unnamed: 0,Sales Person,Country,Product,Date,Amount,Boxes Shipped,Month,Avg Sale per Box
3,Jan Morforth,Australia,Peanut Butter Cubes,2022-04-27,12726,342,2022-04,37.210526
4,Jehu Rudeforth,UK,Peanut Butter Cubes,2022-02-24,13685,184,2022-02,74.375000
6,Oby Sorrel,UK,99% Dark & Pure,2022-01-25,13685,176,2022-01,77.755682
31,Rafaelita Blaksland,UK,99% Dark & Pure,2022-06-29,12446,150,2022-06,82.973333
41,Karlen McCaffrey,USA,Raspberry Choco,2022-04-15,14749,354,2022-04,41.663842
...,...,...,...,...,...,...,...,...
1051,Van Tuxwell,India,Eclairs,2022-07-21,10500,106,2022-07,99.056604
1067,Camilla Castle,New Zealand,85% Dark Bars,2022-08-08,15099,55,2022-08,274.527273
1075,Roddy Speechley,India,Spicy Special Slims,2022-03-22,10647,173,2022-03,61.543353
1081,Dennison Crosswaite,USA,Smooth Sliky Salty,2022-05-12,11781,91,2022-05,129.461538


Conclusion: Filtering on Amount > 10000 isolates the largest transactions for closer review.
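
The 10,000 cutoff is a fixed number; an alternative is to derive the threshold from the data, for example the 90th percentile of `Amount`. This is a sketch of that alternative on a hypothetical toy frame (values invented), not the method used in the notebook.

```python
import pandas as pd

# Hypothetical toy amounts; illustrative only.
toy = pd.DataFrame({
    "Amount": [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 20000]
})

# Use the 90th percentile as a data-driven cutoff.
threshold = toy["Amount"].quantile(0.90)

# Keep only transactions strictly above the cutoff.
high_value = toy[toy["Amount"] > threshold]

print(threshold)
print(high_value)
```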

## Step 11: Find the Most Active Salesperson

In [18]:
df["Sales Person"].value_counts()

Sales Person
Kelci Walkden          54
Brien Boise            53
Van Tuxwell            51
Beverie Moffet         50
Dennison Crosswaite    49
Oby Sorrel             49
Ches Bonnell           48
Karlen McCaffrey       47
Gigi Bohling           47
Curtice Advani         46
Marney O'Breen         45
Kaine Padly            45
Madelene Upcott        45
Jehu Rudeforth         43
Roddy Speechley        43
Barr Faughny           43
Gunar Cockshoot        43
Mallorie Waber         41
Jan Morforth           39
Andria Kimpton         39
Husein Augar           38
Dotty Strutley         36
Wilone O'Kielt         34
Rafaelita Blaksland    34
Camilla Castle         32
Name: count, dtype: int64

Conclusion: Kelci Walkden has the most transactions (54), followed by Brien Boise (53) and Van Tuxwell (51). Camilla Castle has the fewest (32).
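
Transaction count and revenue can point to different people, so it may help to compute both side by side. A minimal sketch using named aggregation on a hypothetical toy frame (the names and amounts are invented):

```python
import pandas as pd

# Hypothetical toy data; names and amounts are illustrative only.
toy = pd.DataFrame({
    "Sales Person": ["Ann", "Bob", "Ann", "Cy", "Ann", "Bob"],
    "Amount": [100, 900, 150, 50, 200, 800],
})

# Number of transactions and total revenue per salesperson.
summary = toy.groupby("Sales Person")["Amount"].agg(
    transactions="count", revenue="sum"
)

# The most active person is not necessarily the top earner.
most_active = summary["transactions"].idxmax()
top_earner = summary["revenue"].idxmax()

print(summary)
print(most_active, top_earner)
```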

## Step 12: Total Chocolate Boxes Shipped per Product

In [13]:
fig = px.bar(df.groupby("Product")["Boxes Shipped"].sum().reset_index(), x="Product", y="Boxes Shipped", title="Total Chocolate Boxes Shipped per Product")
fig.show()

Conclusion: The bar chart shows which products shipped the most boxes in total.

## Step 13: Calculate Sales Contribution by Product

In [14]:
fig = px.pie(df.groupby("Product")["Amount"].sum().reset_index(), values="Amount", names="Product", title="Sales Contribution by Product")
fig.show()

Conclusion: The pie chart shows each product's share of total revenue.
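
The shares can also be computed as numbers. The sketch below takes the top-five product totals from Step 6 and divides by the grand total, which is assumed here to equal the sum of the country totals printed in Step 7 (6,183,625), since both aggregate the same `Amount` column.

```python
import pandas as pd

# Top-five product totals copied from the Step 6 output.
top5 = pd.Series(
    {
        "Smooth Sliky Salty": 349692,
        "50% Dark Bites": 341712,
        "White Choc": 329147,
        "Peanut Butter Cubes": 324842,
        "Eclairs": 312445,
    },
    name="Amount",
)

# Assumption: grand total is the sum of the Step 7 country totals.
grand_total = 1137367 + 962899 + 1045800 + 950418 + 1051792 + 1035349

# Percentage of total revenue contributed by each product.
share_pct = (top5 / grand_total * 100).round(2)

print(share_pct)
```

On the full DataFrame, the same figures would come from dividing the per-product sums by `df["Amount"].sum()`.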

## Step 14: Rank Transactions by Sales Amount

In [15]:
# Rank each transaction (row) by Amount; rank 1.0 is the largest single sale.
df["Product Rank"] = df["Amount"].rank(ascending=False)
df.sort_values("Product Rank").head(10)

Unnamed: 0,Sales Person,Country,Product,Date,Amount,Boxes Shipped,Month,Avg Sale per Box,Product Rank
543,Ches Bonnell,India,Peanut Butter Cubes,2022-01-27,22050,208,2022-01,106.009615,1.0
135,Van Tuxwell,India,Organic Choco Syrup,2022-05-16,19929,174,2022-05,114.534483,2.0
751,Rafaelita Blaksland,New Zealand,Eclairs,2022-02-07,19481,51,2022-02,381.980392,3.0
66,Van Tuxwell,Australia,Organic Choco Syrup,2022-08-10,19453,14,2022-08,1389.5,4.0
589,Curtice Advani,India,Smooth Sliky Salty,2022-04-19,19327,135,2022-04,143.162963,5.0
212,Marney O'Breen,UK,Smooth Sliky Salty,2022-05-13,18991,88,2022-05,215.806818,6.0
1008,Kaine Padly,UK,After Nines,2022-01-21,18697,176,2022-01,106.232955,7.0
434,Jan Morforth,New Zealand,Mint Chip Choco,2022-06-30,18340,285,2022-06,64.350877,8.0
806,Brien Boise,India,85% Dark Bars,2022-08-09,18032,205,2022-08,87.960976,9.0
609,Jan Morforth,Australia,Mint Chip Choco,2022-02-22,17626,103,2022-02,171.126214,10.0


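Conclusion: Despite the column name, this ranks individual transactions rather than products. The largest single sale is 22,050 (Peanut Butter Cubes, India). Comparing products against each other requires aggregating Amount per product first.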
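
To rank products rather than rows, one option is to sum `Amount` per product and rank the totals. A minimal sketch on a hypothetical toy frame (labels and amounts are invented); `method="min"` is an assumed tie-breaking choice that gives tied products the same rank.

```python
import pandas as pd

# Hypothetical toy data; labels and amounts are illustrative only.
toy = pd.DataFrame({
    "Product": ["A", "B", "A", "C", "B", "C"],
    "Amount": [10, 40, 25, 5, 20, 30],
})

# Aggregate revenue per product before ranking.
product_totals = toy.groupby("Product")["Amount"].sum()

# Rank totals from highest to lowest; ties share the lowest rank number.
product_rank = product_totals.rank(ascending=False, method="min").astype(int)

ranking = pd.DataFrame(
    {"Total Amount": product_totals, "Rank": product_rank}
).sort_values("Rank")

print(ranking)
```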