### Bike Store Sales Dataset

<img src="https://user-images.githubusercontent.com/7065401/58563302-42466a80-8201-11e9-9948-b3e9f88a5662.jpg"
    style="width:400px; float: right; margin: 0 30px 30px 30px;"></img>
**Analyzing sales made at bike stores**


**Dataset**: [Bike Store Sales](https://docs.google.com/spreadsheets/d/1NOe_UrPx6ULF2C5MvHmZ9ODuw8t9M77Q1Y64gP-7JHA/edit?usp=sharing)


## Import Libraries

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns



### Load Data

In [None]:
Sales=pd.read_csv('/kaggle/input/bike-sales-in-europe/Sales.csv')

In [None]:
Sales.head()

### Rows & Columns

In [None]:
Sales['Country'].unique()

In [None]:
Shape = Sales.shape
Rows=Shape[0]
Col= Shape[1]
print(f"Number of rows in the dataset: {Rows}")
print(f"Number of columns in the dataset: {Col}")

# Data Preprocessing
-------------------------------------------
### **Identify Column Types**

In [None]:
Sales.info()

### Convert date

> **Convert the `Date` column from object to datetime**

In [None]:
Sales['Date'] = pd.to_datetime(Sales['Date'])

### Build a Date from the Year, Month, and Day Columns

In [None]:
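# Index each row by column label; positional access such as x[0] on a labeled row is deprecated in recent pandas
Sales["Calculated_Date"] = Sales[['Year', 'Month', 'Day']].apply(lambda x: '{}-{}-{}'.format(x['Year'], x['Month'], x['Day']), axis=1)
Sales["Calculated_Date"] = pd.to_datetime(Sales['Calculated_Date'])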
Sales["Calculated_Date"].head()
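As a minimal sketch of the step above, the following builds a date from separate year, month, and day columns on a small made-up frame (the values are illustrative, not taken from the dataset). It assumes months are stored as full English names, as the `'May'` filters used elsewhere in this notebook suggest, and passes an explicit `format` so that parsing does not rely on inference.

```python
import pandas as pd

# Toy data; the values are made up for illustration.
toy = pd.DataFrame({
    "Year": [2013, 2016],
    "Month": ["November", "May"],
    "Day": [26, 1],
})

# Join the three columns into strings such as "2013-November-26".
as_text = toy["Year"].astype(str) + "-" + toy["Month"] + "-" + toy["Day"].astype(str)

# %B matches a full month name (English names are assumed here).
toy["Calculated_Date"] = pd.to_datetime(as_text, format="%Y-%B-%d")
print(toy["Calculated_Date"])
```

String concatenation on whole columns avoids a row-by-row `apply`, which tends to be slower on large frames.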


### **Identify Null And Missing Values**

In [None]:
Sales.isnull().sum()

In [None]:
Sales.describe()

> ## Numerical analysis and visualization
_We'll analyze the numerical feature columns:_

In [None]:
Sales['Unit_Cost'].describe()


In [None]:
Sales['Unit_Cost'].mean()

In [None]:
Sales['Unit_Cost'].median()

In [None]:
Sales['Unit_Cost'].plot(kind="box",figsize=(14,6),vert=False, fontsize=12)

In [None]:
Sales['Unit_Cost'].plot(kind="density" , figsize=(12,6), fontsize=12)
plt.title("Unit Cost",fontsize=15)
plt.xlabel("Unit Cost",fontsize=12)
plt.ylabel("Density",fontsize=12)


### Median and Mean of Unit Cost

In [None]:
ax = Sales['Unit_Cost'].plot(kind="density", figsize=(12,6))
mean = ax.axvline(Sales['Unit_Cost'].mean(), color='red')
median = ax.axvline(Sales['Unit_Cost'].median(), color='g')
# Pass handles and labels explicitly so each line gets the right name
plt.legend([mean, median], ['Mean', 'Median'])
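The sketch below shows one way to label mean and median marker lines by pairing each line handle with its label. The data is made up, and a histogram stands in for the density plot so the example does not need SciPy. The `Agg` backend line is only there so the code runs without a display; it can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up, right-skewed values standing in for a cost column.
costs = pd.Series([1, 2, 2, 3, 3, 3, 5, 8, 13, 60], name="Unit_Cost")

ax = costs.plot(kind="hist", figsize=(8, 4))
mean_line = ax.axvline(costs.mean(), color="red")
median_line = ax.axvline(costs.median(), color="green")

# Pair each handle with its label so the legend order cannot drift.
ax.legend([mean_line, median_line], ["Mean", "Median"])
```

With skewed data like this, the mean sits well to the right of the median, which is the pattern these marker lines are meant to reveal.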

In [None]:
ax = Sales['Unit_Cost'].plot(kind='hist', figsize=(14,6))
ax.set_ylabel('Number of Sales', fontsize=15)
ax.set_xlabel('Dollars', fontsize=15)

### Mean of Customer Age

In [None]:
Sales["Customer_Age"].mean()
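`Series.mean()` and `Series.value_counts().mean()` answer different questions: the first averages the ages themselves, while the second averages how many rows share each distinct age. A small sketch on made-up ages makes the difference concrete.

```python
import pandas as pd

# Made-up ages for illustration.
ages = pd.Series([20, 20, 20, 30, 40])

mean_age = ages.mean()                   # average of the ages themselves
mean_count = ages.value_counts().mean()  # average number of rows per distinct age

print(mean_age, mean_count)
```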

In [None]:
Sales["Customer_Age"].plot(kind='box',vert=False,figsize=(15,6), fontsize=12)

In [None]:
Sales["Customer_Age"].plot(kind='kde',figsize=(12,6))
plt.title("Customer Age Density",fontsize=15)
plt.xlabel("Customer Age",fontsize=12)
plt.ylabel("Density",fontsize=12)
plt.legend()

### Sales According to Year

In [None]:
Sales['Year'].value_counts()

In [None]:
Sales["Year"].value_counts().plot(kind="bar",figsize=(16,6))

### Sales According to Month

In [None]:
Sales['Month'].value_counts()

In [None]:
Sales["Month"].value_counts().plot(kind="bar",figsize=(12,6))
plt.legend()

> ### How sales evolve through the years

In [None]:
Sales['Calculated_Date'].value_counts().plot(kind="line",figsize=(14,6))
plt.legend()

### Add 50 dollars to the revenue of every sale

In [None]:
Sales['Revenue']+50

> ## Categorical Analysis and visualization
_We'll analyze the categorical feature columns:_

> ### Sales According to Age Group

In [None]:
Sales['Age_Group'].value_counts()

In [None]:
Sales["Age_Group"].value_counts().plot(kind='bar',figsize=(14,6))
plt.legend()
plt.ylabel("Sales")
plt.xlabel("Age Group")

In [None]:
Sales["Age_Group"].value_counts().plot(kind='pie',figsize=(14,8),autopct='%1.1f%%',fontsize=13)
plt.title("Age Group",fontsize=14)

### Mean Order Quantity

In [None]:
Sales["Order_Quantity"].mean()

In [None]:
Sales["Order_Quantity"].plot(kind='box',vert=False,figsize=(12,6),fontsize=12)
plt.title("Order Quantity",fontsize=15)


> ### Sales According to Country

In [None]:
Sales['Country'].value_counts()

In [None]:
Sales['Country'].value_counts().plot(kind='bar',figsize=(12,4))
plt.title("Sales In Each Country",fontsize=17)
plt.ylabel("Sales",fontsize=15)
plt.xlabel("Country",fontsize=15)
plt.legend()

> ### List of every product sold

In [None]:
MostSales=Sales.loc[: ,"Product"].unique()
# sales['Product'].unique()

In [None]:
Sales.loc[: ,"Product"].value_counts().head(10).plot(kind='bar',figsize=(14,5))
plt.title("Top 10 Products by Number of Sales",fontsize=17)
plt.ylabel("Number of Sales",fontsize=15)
plt.xlabel("Products",fontsize=15)
plt.legend()

> ### Relation between Unit cost & Unit Price

In [None]:
Sales.plot(kind='scatter',x="Unit_Cost", y="Unit_Price",figsize=(12,4),fontsize=10)
plt.title("Relation between Unit cost & Unit Price",fontsize=17)
plt.ylabel("Unit Price",fontsize=15)
plt.xlabel("Unit Cost",fontsize=15)


> ### Relation between Order Quantity & Profit

In [None]:
# Group Profit by Order_Quantity; plot(kind="box") with x/y does not group the data
Sales[['Profit','Order_Quantity']].boxplot(by='Order_Quantity',figsize=(12,4),fontsize=10)
plt.title("Relation Between Order Quantity & Profit",fontsize=15)


> ### Relation between Country & Profit

In [None]:
# Group Profit by Country; plot(kind="box") with x/y does not group the data
Sales[['Profit','Country']].boxplot(by='Country',figsize=(12,4),fontsize=13,vert=False)
plt.title("Relation between Country & Profit",fontsize=17)


> ### Customer Age According to Country

In [None]:
Sales[["Customer_Age","Country"]].boxplot(by='Country',figsize=(10,6))
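A grouped box plot can be backed by a numeric summary of the same groups. The sketch below, on made-up rows, uses `groupby` with `agg` to list the median, minimum, and maximum age per country; the column names mirror the notebook, but the values are hypothetical.

```python
import pandas as pd

# Toy rows; the values are made up for illustration.
toy = pd.DataFrame({
    "Country": ["France", "France", "Canada", "Canada", "Canada"],
    "Customer_Age": [30, 40, 20, 25, 60],
})

# One row per country, one column per statistic.
age_by_country = toy.groupby("Country")["Customer_Age"].agg(["median", "min", "max"])
print(age_by_country)
```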

> ### How many orders were made in Canada or France?

In [None]:
Sales.loc[((Sales['Country']=='Canada' ) |  (Sales['Country']=='France' ))].shape[0]

> ### How many Bike Racks orders were made from Canada?

In [None]:
Sales.loc[(Sales['Country']=='Canada' ) & (Sales['Sub_Category']=="Bike Racks")].shape[0]
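The two counts above combine boolean masks with `|` (or) and `&` (and). Each comparison needs its own parentheses because these operators bind more tightly than `==`. A minimal sketch on made-up rows, with `isin` shown as an equivalent for the "or" case:

```python
import pandas as pd

# Toy rows; the values are made up for illustration.
toy = pd.DataFrame({
    "Country": ["Canada", "France", "Canada", "Germany"],
    "Sub_Category": ["Bike Racks", "Helmets", "Helmets", "Bike Racks"],
})

in_canada = toy["Country"] == "Canada"
in_france = toy["Country"] == "France"
is_rack = toy["Sub_Category"] == "Bike Racks"

# Summing a boolean mask counts the rows where it is True.
canada_or_france = int((in_canada | in_france).sum())
canada_racks = int((in_canada & is_rack).sum())

# isin expresses the same "or" over a list of values.
same_with_isin = int(toy["Country"].isin(["Canada", "France"]).sum())
```

Naming the masks keeps longer conditions readable and lets the same mask be reused across cells.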

> ### Sales in Each State of France

In [None]:
Sales.loc[Sales["Country"]=="France","State"].value_counts()

In [None]:
Sales.loc[Sales["Country"]=="France","State"].value_counts().plot(kind='bar',figsize=(16,5))
plt.title("Sales in Each state of France",fontsize=17)
plt.ylabel("Number of Sales",fontsize=15)
plt.xlabel("States",fontsize=15)
plt.legend()

### How many sales were made per category?

In [None]:
Sales['Product_Category'].value_counts()

In [None]:
Sales['Product_Category'].value_counts().plot(kind='pie',figsize=(8,8),autopct='%1.1f%%',fontsize=12)
plt.title("Product Categories",fontsize=15)
# The wedge labels come from the value_counts index, so no explicit handles are needed
plt.legend(fontsize=12,loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))



In [None]:
Sales["Sub_Category"].value_counts()

In [None]:
Sales.loc[Sales['Product_Category']=='Accessories' ,"Sub_Category"].value_counts()

In [None]:
Sales.loc[Sales['Product_Category']=='Accessories' ,"Sub_Category"].value_counts().plot(kind="bar",figsize=(16,4))
plt.legend()

In [None]:
Sales.loc[Sales['Product_Category']=="Bikes","Sub_Category"].value_counts()

In [None]:
Sales.loc[Sales['Product_Category']=="Bikes","Sub_Category"].value_counts().plot(kind="pie",figsize=(8,8),fontsize=12)
plt.legend(fontsize=13,loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
plt.title("Sub-Categories within Bikes",fontsize=15)



### Which gender has the most sales?

In [None]:
Sales['Customer_Gender'].value_counts()

In [None]:
Sales['Customer_Gender'].value_counts().plot(kind='bar',fontsize=12,figsize=(8,6))
plt.ylabel("Sales",fontsize=13)


### How many sales with more than 500 in Revenue were made by men?

In [None]:
Sales.loc[(Sales['Customer_Gender']=="M") & (Sales["Revenue"]>500)].shape[0]

### Get the top-5 sales with the highest revenue

In [None]:
Sales.sort_values(['Revenue'],ascending=False).head(5)

### Get the sale with the highest revenue

In [None]:
# Sales.sort_values(['Revenue'],ascending=False).head(1)

Sales['Revenue'].max()

# Cond=Sales['Revenue']==Sales["Revenue"].max()
# Sales.loc[Cond]

### What is the mean Order_Quantity of orders with more than 10K in revenue?

In [None]:

Sales.loc[Sales["Revenue"]>10_000 , "Order_Quantity"].mean()

# cond = Sales['Revenue'] > 10_000
# Sales.loc[cond, 'Order_Quantity'].mean()


### What is the mean Order_Quantity of orders with less than 10K in revenue?

In [None]:
Sales.loc[Sales["Revenue"]<10_000,"Order_Quantity"].mean()

### How many orders were made in May of 2016?

In [None]:
Sales.loc[(Sales["Year"]==2016) & (Sales["Month"]=='May')].shape[0]

### How many orders were made in May, June, and July of 2016?

In [None]:
Sales.loc[(Sales['Year'] == 2016) & (Sales['Month'].isin(['May', 'June', 'July']))].shape[0]

In [None]:
Sales2016=Sales.loc[Sales["Year"]==2016 , ['Profit',"Month"]]
Sales2016.boxplot(by="Month", figsize=(14,6))

### 7.2% tax on Sales in the United States

In [None]:
Sales.loc[Sales["Country"]=="United States",'Unit_Price']*=1.072
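A conditional update through `.loc` changes only the rows where the mask is true, and a mask that matches nothing changes nothing without raising an error. The sketch below, on made-up rows, checks the mask before applying the multiplier; the guard is an optional habit, not something the notebook requires.

```python
import pandas as pd

# Toy rows; the prices are made up for illustration.
toy = pd.DataFrame({
    "Country": ["United States", "France"],
    "Unit_Price": [100.0, 100.0],
})

# A misspelled label silently selects no rows.
typo_matches = int((toy["Country"] == "United State").sum())

mask = toy["Country"] == "United States"
if not mask.any():
    raise ValueError("no rows matched; check the spelling of the label")

# Only the masked rows are scaled; the others keep their value.
toy.loc[mask, "Unit_Price"] *= 1.072
```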

In [None]:
Sales["Unit_Price"].head(2)

### Customer Revenue According to Age

In [None]:
# Sales.plot creates its own figure, so a separate plt.figure call would only leave an empty one behind
Sales.plot(kind="scatter",x="Customer_Age",y="Revenue",figsize=(10,8),fontsize=12)
plt.xlabel("Customer Age",fontsize=13)
plt.ylabel("Revenue",fontsize=13)
plt.show()

### Revenue versus Profit

In [None]:
Sales.plot(kind='scatter', x='Revenue', y='Profit', figsize=(10,8),fontsize=12)
plt.xlabel("Revenue",fontsize=13)
plt.ylabel("Profit",fontsize=13)

### Add and calculate a new Revenue_per_Age column

In [None]:
Sales["Revenue_per_Age"] = Sales["Revenue"]/Sales['Customer_Age']

In [None]:
Sales['Revenue_per_Age'].plot(kind='density', figsize=(14,6))
plt.title("Revenue Per Age",fontsize=15)
plt.legend()

In [None]:
Sales['Revenue_per_Age'].plot(kind='hist', figsize=(14,6))
plt.title("Revenue Per Age",fontsize=15)
plt.legend()

> ### Add and calculate a new Calculated_Cost column

#### Formula: `Calculated_Cost = Order_Quantity * Unit_Cost`

In [None]:
Sales['Calculated_Cost'] = Sales['Order_Quantity'] * Sales['Unit_Cost']

> ### Add and calculate a new Calculated_Revenue column

#### Formula: `Calculated_Revenue = Cost + Profit`

In [None]:
Sales['Calculated_Revenue']= Sales["Cost"] + Sales["Profit"]
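Derived columns like the two above can be sanity-checked against the columns they are meant to reproduce. The sketch below uses made-up rows that are built to satisfy both formulas; whether every row of the real dataset does is not something this example establishes.

```python
import pandas as pd

# Toy rows constructed so that both formulas hold.
toy = pd.DataFrame({
    "Order_Quantity": [2, 5],
    "Unit_Cost": [10, 4],
    "Cost": [20, 20],
    "Profit": [15, 5],
    "Revenue": [35, 25],
})

toy["Calculated_Cost"] = toy["Order_Quantity"] * toy["Unit_Cost"]
toy["Calculated_Revenue"] = toy["Cost"] + toy["Profit"]

# Element-wise comparison, reduced to a single True/False per check.
cost_matches = bool((toy["Calculated_Cost"] == toy["Cost"]).all())
revenue_matches = bool((toy["Calculated_Revenue"] == toy["Revenue"]).all())
print(cost_matches, revenue_matches)
```

On real data, a `False` here would point to rows worth inspecting, for example with `toy[toy["Calculated_Cost"] != toy["Cost"]]`.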

In [None]:
Sales["Revenue"].plot(kind="hist" , bins=100 ,figsize=(14,6))
plt.legend()

> ### Modify all Unit_Price values by adding a 3% tax

In [None]:
Tax = 1.03
Unit_Price_Tax=Sales['Unit_Price']*Tax

In [None]:
Unit_Price_Tax.plot(kind="hist",figsize=(12,4))
plt.xlabel("Unit_Price",fontsize=13)
plt.ylabel("Sales",fontsize=13)
plt.legend()

> ### Get all the sales made in the state of _Kentucky_

In [None]:

Sales.loc[Sales["State"]=='Kentucky'].head()

> ### Get the mean revenue of the Adults (35-64) sales group

In [None]:
Sales.loc[Sales['Age_Group'] == 'Adults (35-64)', 'Revenue'].mean()

In [None]:
Sales.loc[(Sales['Age_Group'] == 'Youth (<25)') | (Sales['Age_Group'] == 'Adults (35-64)')].shape[0]

> ### Get the mean revenue of the sales group Adults (35-64) in United States

In [None]:
Sales.loc[(Sales['Age_Group'] == 'Adults (35-64)') & (Sales['Country'] == 'United States'), 'Revenue'].mean()

> ### Increase the revenue by 10% to every sale made in France

In [None]:
Revenue_France=Sales.loc[Sales['Country']=="France",'Revenue']
Revenue_France*=1.1

In [None]:
Revenue_France
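Selecting rows with a boolean mask returns a separate Series, so scaling that Series leaves the original frame untouched. To write the change back, the assignment has to go through `.loc` on the frame itself. A minimal sketch on made-up rows, using a float column to sidestep integer-to-float conversion:

```python
import pandas as pd

# Toy rows; the revenues are made up for illustration.
toy = pd.DataFrame({
    "Country": ["France", "Canada"],
    "Revenue": [100.0, 200.0],
})
is_france = toy["Country"] == "France"

# Scaling the selection only changes the selected copy.
selected = toy.loc[is_france, "Revenue"]
selected *= 1.1
unchanged = toy.loc[0, "Revenue"]

# Assigning through .loc updates the frame.
toy.loc[is_france, "Revenue"] *= 1.1
updated = toy.loc[0, "Revenue"]
```

Keeping the scaled values in a separate variable is the safer choice when later cells still need the original revenue figures.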

> ## Relationship between the columns

In [None]:
# Restrict to numeric columns; pandas 2.0+ raises on string columns otherwise
Corr=Sales.corr(numeric_only=True)
Corr
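A correlation matrix is only defined for numeric columns. The sketch below, on made-up rows, passes `numeric_only=True` (available in pandas 1.5 and later) so that the text column is skipped instead of causing an error.

```python
import pandas as pd

# Toy rows; Revenue is exactly 10 times Order_Quantity by construction.
toy = pd.DataFrame({
    "Country": ["France", "Canada", "France", "Canada"],
    "Order_Quantity": [1, 2, 3, 4],
    "Revenue": [10, 20, 30, 40],
})

# The string column is dropped before the pairwise correlations are computed.
corr = toy.corr(numeric_only=True)
print(corr)
```

Because the two numeric columns are perfectly linearly related here, their correlation comes out at 1, up to floating-point error.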

In [None]:
figure = plt.figure(figsize=(8,8))
plt.matshow(Corr, cmap='RdBu', fignum=figure.number)
plt.xticks(range(len(Corr.columns)),Corr.columns,rotation='vertical')
plt.yticks(range(len(Corr.columns)), Corr.columns);