# Assignment 3


**Dataset Name**: Company Sales Data

**Dataset Source**: Internal company records

**Description**: This dataset contains records of sales transactions made by customers on our company's online platform. Each row represents a unique transaction, and the dataset includes various attributes related to each transaction.

**Columns**:

**id**: Unique identifier for each transaction.

**product**: Identifier of the product involved in the transaction.

**recommended**: Indicates whether the product was recommended (True/False).

**shop**: The online shop where the transaction took place.

**uid**: Unique user identifier for the customer.

**api_key**: API key associated with the transaction (if applicable).

**email**: Customer's email address (if available).

**order_id**: Unique identifier for the order (if applicable).

**created_at**: Date and time when the transaction occurred.

**device**: Type of device used for the transaction (e.g., desktop, mobile).

**price**: Purchase price for the product.
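
Before answering the questions, it can help to confirm that the loaded columns match this description. The sketch below uses a small made-up frame (`sample`, with invented values) instead of the real file, and only illustrates the checks themselves: column dtypes, missing-value counts, and parsing `created_at` into timestamps.

```python
import pandas as pd

# Tiny made-up frame that mirrors some of the column names described above.
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "product": [111, 222, 111],
    "recommended": [True, False, True],
    "created_at": ["2023-08-16 12:52:55", "2023-08-18 13:45:54", "2023-09-20 06:33:46"],
    "device": ["desktop", "mobile", "desktop"],
    "price": [1024.0, None, 2490.0],
})

# Parse the timestamp strings so date-based grouping works later.
sample["created_at"] = pd.to_datetime(sample["created_at"])

# Count missing values per column.
missing_per_column = sample.isna().sum()

print(sample.dtypes)
print(missing_per_column)
```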

**Objective**: The objective of this analysis is to gain insights into customer behavior, purchase patterns, and overall sales performance on our platform. We will explore various aspects of the data to inform business decisions and identify opportunities for improvement.

In [12]:
# Load Necessary Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Raw string so the backslashes in the Windows path are not read as escape sequences
df = pd.read_csv(r'C:\Users\MOEED\Desktop\Knowledge Streams\Machine Learning\Assignment_III_database.csv')
df

Unnamed: 0,id,product,recommended,shop,uid,api_key,email,order_id,created_at,device,price
0,295,8270579663139,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,,,2023-08-18 13:45:54,desktop,2490.0
1,228,8270579335459,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,,,2023-08-16 12:59:52,desktop,960000.0
2,279,8270579335459,True,shopcast-stage-1-0.myshopify.com,kM7u3GN9Qqme,015629919a40414db823561bddb1e8e3,,,2023-08-18 07:29:15,desktop,960000.0
3,231,8270579269923,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,,,2023-08-16 12:52:55,desktop,1024.0
4,235,8270579335459,True,shopcast-stage-1-0.myshopify.com,Ue66GQ3Hp5Ha,015629919a40414db823561bddb1e8e3,,,2023-08-16 12:54:08,desktop,960000.0
...,...,...,...,...,...,...,...,...,...,...,...
4995,16131932,6877473243254,False,shoeboxpk.myshopify.com,sxsPnVSyArUr,12373fcff2284a409d56bcbe7c4a1ea1,,,2023-09-20 06:33:46,mobile,
4996,16143424,6842053886070,False,shoeboxpk.myshopify.com,np7qVDz6fGLJ,12373fcff2284a409d56bcbe7c4a1ea1,,,2023-09-20 06:38:39,mobile,
4997,16132540,6691491545206,False,shoeboxpk.myshopify.com,AHla2f2pDC4-,12373fcff2284a409d56bcbe7c4a1ea1,,,2023-09-20 06:34:01,desktop,
4998,16144720,6842050576502,False,shoeboxpk.myshopify.com,cNt-6b0ntESD,12373fcff2284a409d56bcbe7c4a1ea1,,,2023-09-20 06:39:09,mobile,


# Customer Analysis:
Question: How many unique customers made purchases?

In [24]:
df['uid'].nunique()

2699

# Purchase Patterns:

Question: What is the average purchase price across all transactions?

In [26]:
df['price'].mean()

141218.17142857143
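
`Series.mean()` skips missing values by default, so this average is computed only over rows that actually have a price. The data preview above shows rows with an empty `price`, so it may be worth reporting how many rows contribute to the mean. A minimal sketch with made-up numbers (the `prices` series is illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up prices: two known values and two missing ones.
prices = pd.Series([100.0, 300.0, np.nan, np.nan])

mean_price = prices.mean()   # NaN values are skipped by default
n_priced = prices.count()    # number of non-missing prices
n_rows = len(prices)         # total number of rows

print(f"mean over {n_priced} of {n_rows} rows: {mean_price}")
```

As a rough cross-check on the real data, the device totals reported in the device section sum to 4,942,636, and 4,942,636 / 141,218.17 is 35. Assuming every priced row also has a device value, this suggests that only about 35 transactions carry a price.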

# Top-Selling Products:

Question: What are the top 5 products by the number of units sold?

In [41]:
df1=pd.DataFrame(df.groupby('product').size().sort_values(ascending=False).head())
df1

product,0
6802312102006,56
6630766936182,49
6802312364150,40
6690173517942,40
6682259423350,33
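
The result above counts rows per product, which equals units sold only if each transaction covers a single unit (the column list has no quantity field). An equivalent way to get the same ranking with a labeled count column is `value_counts()`. The sketch below uses a made-up `transactions` frame, and the column name `transactions` for the count is an arbitrary choice:

```python
import pandas as pd

# Made-up product ids, one row per transaction.
transactions = pd.DataFrame({"product": [10, 20, 10, 30, 10, 20]})

top_products = (
    transactions["product"]
    .value_counts()                       # sorted by count, descending
    .head(5)                              # keep at most the top 5
    .rename_axis("product")               # label the index
    .reset_index(name="transactions")     # turn it into a two-column frame
)

print(top_products)
```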


# Purchase by Device Type:

Question: What is the total revenue generated from desktop vs. mobile purchases?

In [45]:
df.groupby('device')['price'].sum().to_frame()

device,price
desktop,3981676.0
mobile,960960.0
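
Totals alone can hide how few rows they are based on, since `sum()` also skips missing prices. One possible extension is to report the number of priced rows and each device's share of revenue next to the total. This is a sketch on made-up data; the output column names (`revenue`, `priced_rows`, `revenue_share`) are illustrative choices:

```python
import numpy as np
import pandas as pd

# Made-up transactions; one mobile row has no price.
transactions = pd.DataFrame({
    "device": ["desktop", "desktop", "mobile", "mobile"],
    "price": [100.0, 50.0, 30.0, np.nan],
})

# Named aggregation: total revenue and number of non-missing prices per device.
by_device = transactions.groupby("device")["price"].agg(
    revenue="sum",
    priced_rows="count",
)

# Share of overall revenue contributed by each device.
by_device["revenue_share"] = by_device["revenue"] / by_device["revenue"].sum()

print(by_device)
```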


# Date Analysis:

Question: What is the month with the highest total revenue?

In [None]:
df['month'] = pd.to_datetime(df['created_at']).dt.to_period('M')

revenue_by_month = df.groupby('month')['price'].sum().reset_index()


# idxmax() returns an index label, so look the row up with .loc
highest_revenue_month = revenue_by_month.loc[revenue_by_month['price'].idxmax()]

print("Month with the highest total revenue:")
print(highest_revenue_month)
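
The cell above has no recorded output, so here is a small self-contained sketch of the same idea on made-up data. It groups directly on the monthly period instead of adding a column, and the `transactions` frame and its values are invented for illustration:

```python
import pandas as pd

# Made-up transactions spanning two months.
transactions = pd.DataFrame({
    "created_at": ["2023-08-16 12:52:55", "2023-08-18 13:45:54", "2023-09-20 06:33:46"],
    "price": [1024.0, 2490.0, 500.0],
})

# Convert each timestamp to its calendar month (a monthly Period).
month = pd.to_datetime(transactions["created_at"]).dt.to_period("M")

# Total revenue per month, then the month label with the largest total.
revenue_by_month = transactions.groupby(month)["price"].sum()
best_month = revenue_by_month.idxmax()

print(best_month, revenue_by_month.max())
```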

# Customer Behavior:

Question: How many customers made repeat purchases (more than one transaction)?

In [68]:
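# Number of transactions per customer
purchases_per_customer = df['uid'].value_counts()
# Customers with more than one transaction
print((purchases_per_customer > 1).sum())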

1004
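
Taken together with the 2,699 unique customers found earlier, 1,004 repeat customers is roughly 37% of all customers. The sketch below shows one way to compute that share in the same step, using a made-up `uids` series:

```python
import pandas as pd

# Made-up customer ids, one entry per transaction.
uids = pd.Series(["a", "b", "a", "c", "b", "a"])

# Transactions per customer.
counts = uids.value_counts()

# Customers with more than one transaction, and their share of all customers.
repeat_customers = int((counts > 1).sum())
repeat_share = repeat_customers / counts.size

print(repeat_customers, round(repeat_share, 3))
```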


# Recommendation Effectiveness:

Question: What is the average purchase price for recommended products vs. non-recommended products?

In [74]:
df['product_recommendation'] = df['recommended'].map({True: 'Recommended', False: 'Non-Recommended'})
df.groupby('product_recommendation')['price'].mean()

product_recommendation
Non-Recommended        10.000000
Recommended        145371.352941
Name: price, dtype: float64
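
A group mean is easier to interpret when the number of priced rows behind it is shown as well, because a mean can rest on very few observations. This is a hedged sketch on made-up data; the frame and the output column names (`mean_price`, `priced_rows`, `all_rows`) are illustrative:

```python
import numpy as np
import pandas as pd

# Made-up transactions; two non-recommended rows have no price.
transactions = pd.DataFrame({
    "recommended": [True, True, False, False, False],
    "price": [200.0, 400.0, 10.0, np.nan, np.nan],
})

grouped = transactions.groupby("recommended")

# Mean price and number of non-missing prices per group.
summary = grouped["price"].agg(mean_price="mean", priced_rows="count")

# Total rows per group, including rows without a price.
summary["all_rows"] = grouped.size()

print(summary)
```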

# Price Category Analysis:

Question: What is the distribution of purchases between 'Low' and 'High' price categories, where 'High' means a price above 1000?

In [78]:
df['Category']=df['price'].apply(lambda x: 'high' if x>1000 else 'low')

In [84]:
df['Category'].value_counts().to_frame()

Category,count
low,4969
high,31
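
Note that a comparison such as `x > 1000` evaluates to `False` when `x` is missing (NaN), so the lambda above places rows without a price in the 'low' category. One way to keep them apart is to label missing prices explicitly. The sketch below assumes a third label, `missing`, which is not part of the original question, and uses made-up prices:

```python
import numpy as np
import pandas as pd

# Made-up prices, two of them missing.
prices = pd.Series([2490.0, 10.0, np.nan, 960000.0, np.nan])

# Conditions are checked in order: missing first, then the 1000 threshold.
category = pd.Series(
    np.select(
        [prices.isna(), prices > 1000],
        ["missing", "high"],
        default="low",
    ),
    index=prices.index,
)

counts = category.value_counts()
print(counts)
```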
