# Analysis of Motorcycle Sales Data

## Data Transformation

In [250]:
import pandas as pd

In [251]:
sales = pd.read_csv("data/sales_data.csv")
sales.head()

| | date | warehouse | client_type | product_line | quantity | unit_price | total | payment |
|---|---|---|---|---|---|---|---|---|
| 0 | 1/6/2021 | Central | Retail | Miscellaneous | 8 | 16.85 | 134.83 | Credit card |
| 1 | 1/6/2021 | North | Retail | Breaking system | 9 | 19.29 | 173.61 | Cash |
| 2 | 1/6/2021 | North | Retail | Suspension & traction | 8 | 32.93 | 263.45 | Credit card |
| 3 | 1/6/2021 | North | Wholesale | Frame & body | 16 | 37.84 | 605.44 | Transfer |
| 4 | 1/6/2021 | Central | Retail | Engine | 2 | 60.48 | 120.96 | Credit card |


In [252]:
# Convert the 'date' column to datetime. With the default month-first parser,
# pandas warns that these DD/MM/YYYY strings may be parsed inconsistently,
# so the day-first format is passed explicitly.
sales['date'] = pd.to_datetime(sales['date'], format='%d/%m/%Y')

# Optionally set the 'date' column as the index of the DataFrame
#sales = sales.set_index('date')
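Date strings such as `1/6/2021` are ambiguous: read day-first they mean 1 June, read month-first they mean 6 January. The sketch below illustrates the difference on a few made-up strings written in the same style as the `date` column; the values are illustrative and not taken from the CSV.

```python
import pandas as pd

# Toy strings in the same style as the 'date' column (not taken from the CSV).
raw = pd.Series(["1/6/2021", "2/6/2021", "13/6/2021"])

# An explicit format removes the ambiguity: day first, then month.
day_first = pd.to_datetime(raw, format="%d/%m/%Y")
print(day_first.dt.month_name().tolist())

# Read month-first, the same string "1/6/2021" lands in January instead.
month_first = pd.to_datetime("1/6/2021", format="%m/%d/%Y")
print(month_first.month_name(), month_first.day)
```

Passing `format` also makes the parse fail loudly on a string that does not match, which is usually preferable to a silent mix of day-first and month-first dates.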



In [253]:
from pandas.api.types import CategoricalDtype

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

cat_type = CategoricalDtype(categories=cats, ordered=True)

# Create new columns for the weekday and month
sales['Weekday'] = sales['date'].dt.day_name().astype(cat_type)
sales['Month'] = sales['date'].dt.month_name()
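The ordered categorical type matters later, when the weekdays are sorted for plotting: plain strings sort alphabetically, while an ordered categorical sorts in the order given by its categories. A minimal sketch on a made-up Series (`days` is a hypothetical example, not part of the sales data):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
cat_type = CategoricalDtype(categories=cats, ordered=True)

# Hypothetical weekday names, deliberately out of order.
days = pd.Series(["Sunday", "Friday", "Monday", "Wednesday"])

# Plain strings sort alphabetically.
alphabetical = days.sort_values().tolist()

# The ordered categorical sorts in calendar order instead.
calendar = days.astype(cat_type).sort_values().tolist()

print(alphabetical)
print(calendar)
```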

## Exploratory Data Analysis

### Payment Method

In [254]:
import plotly.express as px

# Share of transactions per payment method, sorted from lowest to highest
payment_method = px.bar(sales.groupby('payment').count().reset_index().
                        assign(percentage=lambda x: (x['date'] / x['date'].sum()) * 100).
                        loc[:,["payment","percentage"]].sort_values(by='percentage', ascending=True),
                        color="payment", x='payment', y='percentage', title='Share of Transactions by Payment Method',
                        labels={'payment': 'Payment','percentage': 'Share of Transactions (%)'})

# Show the plot
payment_method.show()
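The percentage per payment method can also be computed without counting every column, for example with `value_counts(normalize=True)`. This is an alternative sketch on a made-up frame (`toy` and its rows are assumptions for illustration), not the computation used in the cell above:

```python
import pandas as pd

# Hypothetical transactions; only the 'payment' column is needed.
toy = pd.DataFrame({"payment": ["Cash", "Credit card", "Credit card", "Transfer"]})

share = (
    toy["payment"]
    .value_counts(normalize=True)  # fraction of rows per payment method
    .mul(100)                      # convert to percent
    .sort_values()                 # lowest share first, as in the bar chart
    .rename("percentage")
    .rename_axis("payment")
    .reset_index()
)
print(share)
```

The resulting frame has the same two columns, `payment` and `percentage`, that the chart expects.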

### Daily Revenue

In [255]:
# Sum the revenue per day; selecting 'total' before summing keeps the
# non-numeric columns out of the aggregation (summing every column triggers
# the numeric_only deprecation warning in DataFrameGroupBy.sum)
daily_revenue = px.line(sales.groupby('date')['total'].sum().reset_index(),
                        x='date', y='total', title='Daily Revenue',
                        labels={'date': 'Date','total': 'Daily Revenue'})

# Show the plot
daily_revenue.show()
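Summing a grouped DataFrame aggregates every column, including text columns such as `payment`. Selecting the column of interest before calling `sum()` sidesteps that. A small sketch on made-up rows (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical rows with one numeric and one text column.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2021-06-01", "2021-06-01", "2021-06-02"]),
    "payment": ["Cash", "Transfer", "Cash"],
    "total": [100.0, 50.0, 25.0],
})

# Only 'total' is aggregated; 'payment' never enters the sum.
daily = toy.groupby("date", as_index=False)["total"].sum()
print(daily)
```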



### Days of the Week with the Most Transactions

In [256]:
# Share of transactions per weekday, kept in calendar order
weekday_transactions = px.bar(sales.groupby('Weekday').count().reset_index().
                        assign(percentage=lambda x: (x['date'] / x['date'].sum()) * 100).
                        loc[:,["Weekday","percentage"]].sort_values(by="Weekday").reset_index(drop=True),
                        color="Weekday", x='Weekday', y='percentage', title='Share of Transactions by Day of the Week',
                        labels={'Weekday': 'Day of the Week','percentage': 'Share of Transactions (%)'})

# Show the plot
weekday_transactions.show()

### Product Line with the Most Transactions

In [262]:
# Share of transactions per product line, sorted from lowest to highest
product_line_transactions = px.bar(sales.groupby('product_line').count().reset_index().
                        assign(percentage=lambda x: (x['date'] / x['date'].sum()) * 100).
                        loc[:,["product_line","percentage"]].sort_values(by="percentage").reset_index(drop=True),
                        color="product_line", x='product_line', y='percentage', title='Share of Transactions by Product Line',
                        labels={'product_line': 'Product Category','percentage': 'Share of Transactions (%)'})

# Show the plot
product_line_transactions.show()

### Product Category with the Most Sales

| product_line | quantity | unit_price | total |
|---|---|---|---|
| Breaking system | 2130 | 4080.32 | 38350.15 |
| Electrical system | 1698 | 4937.93 | 43612.71 |
| Engine | 627 | 3665.6 | 37945.38 |
| Frame & body | 1619 | 7110.15 | 69024.73 |
| Miscellaneous | 1176 | 2782.91 | 27165.82 |
| Suspension & traction | 2145 | 7745.13 | 73014.21 |
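The summed `unit_price` column is hard to interpret on its own, since adding prices across transactions has no direct meaning. Two derived figures are easier to read: each product line's share of total revenue and its average revenue per unit sold. The sketch below recomputes both from the `quantity` and `total` values in the table; the column names `revenue_share_pct` and `avg_revenue_per_unit` are introduced here for illustration.

```python
import pandas as pd

# Figures copied from the summary table (quantity and total per product line).
summary = pd.DataFrame(
    {
        "quantity": [2130, 1698, 627, 1619, 1176, 2145],
        "total": [38350.15, 43612.71, 37945.38, 69024.73, 27165.82, 73014.21],
    },
    index=pd.Index(
        [
            "Breaking system",
            "Electrical system",
            "Engine",
            "Frame & body",
            "Miscellaneous",
            "Suspension & traction",
        ],
        name="product_line",
    ),
)

# Share of overall revenue contributed by each product line, in percent.
summary["revenue_share_pct"] = summary["total"] / summary["total"].sum() * 100

# Average revenue per unit sold (total revenue divided by units).
summary["avg_revenue_per_unit"] = summary["total"] / summary["quantity"]

print(summary.sort_values("total").round(2))
```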


In [265]:
# Total revenue per product line, sorted from lowest to highest; selecting
# 'total' before summing avoids the numeric_only deprecation warning
product_line_revenue = px.bar(sales.groupby('product_line')['total'].sum().reset_index().
                        sort_values(by="total"),
                        color="product_line", x='product_line', y='total', title='Revenue by Product Line',
                        labels={'product_line': 'Product Category','total': 'Revenue ($)'})

# Show the plot
product_line_revenue.show()

