# EDA on AeroFit Treadmill Buyer
## About this Dataset
The company collected the data on individuals who purchased a treadmill from the AeroFit stores during the prior three months. The dataset features their Age, Gender, Education, Marital-Status, Income and other attributes.

### Product Portfolio:

    1. The KP281 is an entry-level treadmill that sells for $1,500.
    2. The KP481 is for mid-level runners and sells for $1,750.
    3. The KP781 treadmill has advanced features and sells for $2,500.

In [None]:
from warnings import filterwarnings
filterwarnings('ignore')

In [None]:
import os
os.chdir('/kaggle/input/aerofit-treadmill')

In [None]:
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
td=pd.read_csv('aerofit_treadmill.csv')
td.head()

In [None]:
td.shape

In [None]:
td.describe()

In [None]:
td.describe(include='object')

### Overall Customer Profile:
    1. Customer ages range from 18 to 50
    2. Education ranges from 12 to 21 years
    3. Usage ranges from 2 to 7 times a week
    4. Customers' fitness ratings lie between 1 and 5
    5. Customer income ranges from 29k to 104k
    6. Miles covered range from 21 to 360 each week
    7. The majority of customers are male
    8. KP281 is the best-selling product
    9. The majority of customers are Partnered

In [None]:
td.columns

## Missing values/Duplicate values

In [None]:
td.head()

In [None]:
td.isna().sum()

In [None]:
td.isnull().sum()

In [None]:
td.duplicated().sum()

## There are no missing values/null values or duplicated values in the dataset
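
The checks above can be folded into one small helper. This is a minimal sketch, not part of the original notebook: `quality_report`, `clean`, and `dirty` are hypothetical names, and the two frames are made-up rows rather than the AeroFit data. Note that `isna` and `isnull` are aliases in pandas, so one of the two calls is enough.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Count missing cells and fully duplicated rows in df."""
    return {
        "missing_cells": int(df.isna().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Hypothetical frames for illustration only.
clean = pd.DataFrame({"Age": [18, 25], "Income": [29562, 45000]})
dirty = pd.DataFrame({"Age": [18, 18, None], "Income": [29562, 29562, 45000]})

clean_report = quality_report(clean)
dirty_report = quality_report(dirty)
```

Calling `quality_report(td)` on the notebook's frame would summarize the same three cells in one dictionary.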

# Descriptive Analysis

### Categorical and continuous features

In [None]:
td.dtypes

In [None]:
cat=list(td.columns[td.dtypes=='object'])
cat

In [None]:
con=list(td.columns[td.dtypes!='object'])
con

In [None]:
td[cat].describe().T

In [None]:
td[con].describe().T

## Separate data based on the company's products
    1. p1 holds the KP281 rows
    2. p2 holds the KP481 rows
    3. p3 holds the KP781 rows
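
As an alternative to one `query` call per product, the split can be done in a single pass with `groupby`. This is a sketch under assumptions: `sample` is a hypothetical four-row frame that only borrows the notebook's `Product` and `Age` column names, and `by_product` is an illustrative name.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real AeroFit data.
sample = pd.DataFrame({
    "Product": ["KP281", "KP481", "KP781", "KP281"],
    "Age": [18, 31, 28, 25],
})

# One sub-frame per product code, keyed by the code itself.
by_product = {name: frame for name, frame in sample.groupby("Product")}

kp281_rows = by_product["KP281"]
```

The dictionary avoids repeating the product code in both a variable name and a query string, which is where a mismatch is easiest to introduce.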

In [None]:
p1=td.query("Product=='KP281'")
p1.head(2)

In [None]:
p2=td.query("Product=='KP481'")
p2.head(2)

In [None]:
p3=td.query("Product=='KP781'")
p3.head(2)

# Data Visualisation
# Product KP281

## Univariate Analysis
    Univariate analysis explores each variable separately:
    it describes each variable on its own,
    it looks at the range and central tendency of its values,
    it describes the pattern of responses to the variable.
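
The quantities named above (range and central tendency) can also be computed directly instead of being read off a histogram. The following is a minimal sketch: `univariate_summary` is a hypothetical helper and the `miles` values are invented for illustration.

```python
import pandas as pd

def univariate_summary(series: pd.Series) -> pd.Series:
    """Range, central tendency, and spread of one numeric variable."""
    return pd.Series({
        "min": series.min(),
        "max": series.max(),
        "mean": series.mean(),
        "median": series.median(),
        "std": series.std(),
    })

# Hypothetical weekly mileage values, not taken from the dataset.
miles = pd.Series([47, 66, 75, 85, 94, 113], name="Miles")
summary = univariate_summary(miles)
```

In the notebook this could be applied per column, for example `p1[con].apply(univariate_summary)`.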

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=2)
p1['Gender'].value_counts().plot(kind='bar',ax=ax[0])
ax[0].set_title('Countplot of Gender')
p1['MaritalStatus'].value_counts().plot(kind='bar',ax=ax[1])
ax[1].set_title('Countplot of MaritalStatus')

In [None]:
for i in con:
    plt.figure(figsize=(6,4))
    sns.histplot(data=p1,x=i,kde=True)
    plt.show()

## Graphical Analysis for KP281:
    1. Partnered customers purchase this product more often than single customers
    2. Most sales come from the 22-30 age group
    3. Most customers have 14 or 16 years of education
    4. The most common usage is around 3 times a week
    5. The most common fitness rating is 3-3.5
    6. Most customers have an income between 34k and 53k
    7. Most customers cover 60-85 miles

## Bivariate Analysis
    Bivariate analysis examines the relationship between two variables
    
    scatter plot and correlation heatmap : continuous variables vs continuous variables
    boxplot : categorical variables vs continuous variables
    crosstab : categorical variables vs categorical variables
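
Each of the three pairings above has a numeric counterpart to its plot. The sketch below assumes a hypothetical six-row `sample` frame that reuses the notebook's column names; the values are invented so that the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real AeroFit data.
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "MaritalStatus": ["Single", "Partnered", "Partnered",
                      "Partnered", "Single", "Single"],
    "Age": [20, 24, 28, 32, 36, 40],
    "Income": [30000, 34000, 38000, 42000, 46000, 50000],
})

# Continuous vs continuous: Pearson correlation, the value a heatmap cell shows.
age_income_corr = sample["Age"].corr(sample["Income"])

# Categorical vs continuous: the quartiles a boxplot draws, one row per category.
income_by_gender = sample.groupby("Gender")["Income"].describe()

# Categorical vs categorical: a contingency table of counts.
counts = pd.crosstab(sample["Gender"], sample["MaritalStatus"])
```

Income in this toy frame is an exact linear function of age, so the correlation comes out at 1; real data will be noisier.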

In [None]:
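# one-hot encode the categorical columns so they can enter the correlation matrix
# (Product is constant within p1, so it is dropped to avoid an all-NaN row and column)
p1gd=pd.get_dummies(p1.drop(columns='Product'),dtype=int)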
plt.figure(figsize=(16,8))
sns.heatmap(p1gd.corr(),annot=True)

In [None]:
sns.scatterplot(p1,x='Fitness',y='Miles')
plt.show()

In [None]:
sns.scatterplot(p1,x='Age',y='Income')
plt.show()

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=2,figsize=(8,6))

ctab1=pd.crosstab(p1['Product'],p1['MaritalStatus'])
sns.heatmap(ctab1,annot=True,ax=ax[0])
ax[0].set_title('KP281 product vs MaritalStatus')

ctab2=pd.crosstab(p1['Product'],p1['Age'])
sns.heatmap(ctab2,annot=True,ax=ax[1])
ax[1].set_title('KP281 product vs Age')

In [None]:
fig,ax = plt.subplots(nrows=3, ncols=2, figsize=(12,15))


sns.boxplot(p1,x='Product',y='Age',ax=ax[0,0],color='y')
ax[0,0].set_title("KP281 Customers Age")

sns.boxplot(p1,x='Product',y='Usage',ax=ax[0,1],color='orange')
ax[0,1].set_title("KP281 Customers Usage per week")

sns.boxplot(p1,x='Product',y='Income',ax=ax[1,0],color='r')
ax[1,0].set_title("KP281 Customers Income")

sns.boxplot(p1,x='Product',y='Fitness',ax=ax[1,1])
ax[1,1].set_title("KP281 Customers Fitness rating")

sns.boxplot(p1,x='Product',y='Education',ax=ax[2,0],color='orange')
ax[2,0].set_title("KP281 Customers Education")

sns.boxplot(p1,x='Product',y='Miles',ax=ax[2,1],color='y')
ax[2,1].set_title("KP281 Customers Miles per week")
plt.show()

# Product KP481

## Univariate Analysis

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=2)
p2['Gender'].value_counts().plot(kind='bar',ax=ax[0])
ax[0].set_title('Countplot of Gender')
p2['MaritalStatus'].value_counts().plot(kind='bar',ax=ax[1])
ax[1].set_title('Countplot of MaritalStatus')

In [None]:
# The Male/Female split is nearly even. Partnered customers outnumber single customers

In [None]:
for i in con:
    plt.figure(figsize=(6,4))
    sns.histplot(data=p2,x=i,kde=True)
    plt.show()

### Analysis:
    Customers are mostly aged 19-27 and 32-35
    Education of 14 or 16 years
    Usage of 3 times a week
    Fitness rating of 3/5
    Income range of 44k-55k
    Miles target of 85-105

## Bivariate Analysis
    scatter plot and heatmap: continuous vs continuous
    crosstab: categorical vs categorical
    boxplot: categorical vs continuous

In [None]:
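# one-hot encode the categorical columns so they can enter the correlation matrix
# (Product is constant within p2, so it is dropped to avoid an all-NaN row and column)
p2gd=pd.get_dummies(p2.drop(columns='Product'),dtype=int)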
plt.figure(figsize=(16,8))
sns.heatmap(p2gd.corr(),annot=True)

In [None]:
sns.scatterplot(p2,x='Age',y='Income')

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=2,figsize=(8,6))

ctab1=pd.crosstab(p2['Product'],p2['MaritalStatus'])
sns.heatmap(ctab1,annot=True,ax=ax[0])
ax[0].set_title('KP481 product vs MaritalStatus')

ctab2=pd.crosstab(p2['Product'],p2['Age'])
sns.heatmap(ctab2,annot=True,ax=ax[1])
ax[1].set_title('KP481 product vs Age')

In [None]:
# Most customers are 23 or 25 years old
# Partnered customers outnumber single customers

In [None]:
fig,ax = plt.subplots(nrows=3, ncols=2, figsize=(12,15))


sns.boxplot(p2,x='Product',y='Age',ax=ax[0,0],color='y')
ax[0,0].set_title("KP481 Customers Age")

sns.boxplot(p2,x='Product',y='Usage',ax=ax[0,1],color='orange')
ax[0,1].set_title("KP481 Customers Usage per week")

sns.boxplot(p2,x='Product',y='Income',ax=ax[1,0],color='r')
ax[1,0].set_title("KP481 Customers Income")

sns.boxplot(p2,x='Product',y='Fitness',ax=ax[1,1])
ax[1,1].set_title("KP481 Customers Fitness rating")

sns.boxplot(p2,x='Product',y='Education',ax=ax[2,0],color='orange')
ax[2,0].set_title("KP481 Customers Education")

sns.boxplot(p2,x='Product',y='Miles',ax=ax[2,1],color='y')
ax[2,1].set_title("KP481 Customers Miles per week")
plt.show()

# Product KP781

## Univariate Analysis

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=2)
p3['Gender'].value_counts().plot(kind='bar',ax=ax[0])
ax[0].set_title('Countplot of Gender')
p3['MaritalStatus'].value_counts().plot(kind='bar',ax=ax[1])
ax[1].set_title('Countplot of MaritalStatus')

In [None]:
for i in con:
    plt.figure(figsize=(6,4))
    sns.histplot(data=p3,x=i,kde=True)  # p3 holds the KP781 rows
    plt.show()

## Analysis:
    Partnered customers outnumber single customers
    Most customers are male
    Age group 22-31 years
    Education of 16-17 and 18-19 years
    Usage of 4 or 5 times a week
    Fitness rating of 4.75-5
    Income around 50k or 95k
    Miles 160-200

## Bivariate Analysis
    scatter plot and heatmap: continuous vs continuous
    boxplot: categorical vs continuous
    crosstab: categorical vs categorical

In [None]:
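# one-hot encode the categorical columns so they can enter the correlation matrix
# (Product is constant within p3, so it is dropped to avoid an all-NaN row and column)
p3gd=pd.get_dummies(p3.drop(columns='Product'),dtype=int)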
plt.figure(figsize=(16,8))
sns.heatmap(p3gd.corr(),annot=True)

In [None]:
sns.scatterplot(p3,x='Income',y='Age')

In [None]:
# Income and Age are correlated

In [None]:
fig,ax=plt.subplots(nrows=1,ncols=2,figsize=(8,6))

ctab1=pd.crosstab(p3['Product'],p3['MaritalStatus'])
sns.heatmap(ctab1,annot=True,ax=ax[0])
ax[0].set_title('KP781 product vs MaritalStatus')

ctab2=pd.crosstab(p3['Product'],p3['Age'])
sns.heatmap(ctab2,annot=True,ax=ax[1])
ax[1].set_title('KP781 product vs Age')

In [None]:
fig,ax = plt.subplots(nrows=3, ncols=2, figsize=(12,15))


sns.boxplot(p3,x='Product',y='Age',ax=ax[0,0],color='y')
ax[0,0].set_title("KP781 Customers Age")

sns.boxplot(p3,x='Product',y='Usage',ax=ax[0,1],color='orange')
ax[0,1].set_title("KP781 Customers Usage per week")

sns.boxplot(p3,x='Product',y='Income',ax=ax[1,0],color='r')
ax[1,0].set_title("KP781 Customers Income")

sns.boxplot(p3,x='Product',y='Fitness',ax=ax[1,1])
ax[1,1].set_title("KP781 Customers Fitness rating")

sns.boxplot(p3,x='Product',y='Education',ax=ax[2,0],color='orange')
ax[2,0].set_title("KP781 Customers Education")

sns.boxplot(p3,x='Product',y='Miles',ax=ax[2,1],color='y')
ax[2,1].set_title("KP781 Customers Miles per week")
plt.show()

# ===========================================================

In [None]:
# Product vs each continuous feature, all three products side by side
fig,ax = plt.subplots(nrows=3, ncols=2, figsize=(12,15))
sns.boxplot(data=td, x='Age', y='Product', ax=ax[0,0]); ax[0,0].set_title('Customers Age for different Products')
sns.boxplot(data=td, x='Income', y='Product', ax=ax[0,1]); ax[0,1].set_title('Customers Income for different Products')
sns.boxplot(data=td, x='Fitness', y='Product', ax=ax[1,0]); ax[1,0].set_title('Customers Fitness for different Products')
sns.boxplot(data=td, x='Miles', y='Product', ax=ax[1,1]); ax[1,1].set_title('Customers Miles for different Products')
sns.boxplot(data=td, x='Usage', y='Product', ax=ax[2,0]); ax[2,0].set_title('Customers Usage for different Products')
sns.boxplot(data=td, x='Education', y='Product', ax=ax[2,1]); ax[2,1].set_title('Customers Education for different Products')
plt.show()

# Individual Product Analysis:
### KP281: This is the best-selling product. Our target customers are:
        1. Lower-income people (38k-55k) aged 23-34 with around 14 or 16 years of education.
        2. People with a fitness level of 3/5 who target 65-95 miles per week on average.
        3. Partnered customers, who tend to purchase this product more than single customers.
    
### KP481: This is the second best-selling product. Our target customers are:
        1. Moderate-income people (45k-55k) aged 24-34 with around 14 or 16 years of education.
        2. People with a fitness level of 3/5 who target 60-110 miles per week on average.
        3. Partnered customers, who tend to purchase this product more than single customers.

### KP781: This is the lowest-selling product. Our target customers are:
        1. High-income people (58k-90k) aged 25-28 with around 16 or 18 years of education.
        2. People with a fitness level of 4/5 who target 120-200 miles per week on average.
        3. Partnered customers, who tend to purchase this product more than single customers.
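
Profiles like the three above can be tabulated side by side with a grouped aggregation, which makes the ranges easier to compare than separate plots. This is a sketch under assumptions: the `sample` rows are invented, and `profile` and its column labels are illustrative names, not values from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real AeroFit data.
sample = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP481", "KP481", "KP781", "KP781"],
    "Income": [40000, 50000, 48000, 52000, 70000, 90000],
    "Fitness": [3, 3, 3, 3, 5, 4],
    "Miles": [66, 85, 85, 95, 160, 200],
})

# One row per product: number of buyers plus the median of each feature.
profile = sample.groupby("Product").agg(
    buyers=("Income", "size"),
    median_income=("Income", "median"),
    median_fitness=("Fitness", "median"),
    median_miles=("Miles", "median"),
)
```

The median is used here because income and mileage are typically skewed; swapping in quantiles would give the interval-style ranges quoted in the lists.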

# Overall Product Analysis
## Among the three products, KP281 is the best-selling and KP481 the second best-selling.
## KP781 is the premium product. The following observations can guide customer targeting:
    1. Many people prefer KP281 over KP481 for economic reasons.
    2. The target fitness goals and age groups are similar for KP281 and KP481.
    3. KP781 is mostly bought by high-income people with good fitness levels.
    
### Irrespective of product type, the following can be observed in target customers:
    1. Partnered people purchase at a higher rate than single people.
    2. People aim to work out at least 3 times a week.
    3. Income tends to rise with age, so older age groups are worth targeting.
    4. People with more education tend to earn more and spend on premium products.
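
A claim such as the first one can be checked as a conditional share rather than a raw count. The sketch below uses a row-normalized crosstab; the `sample` rows are hypothetical and only reuse the notebook's `Product` and `MaritalStatus` column names.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real AeroFit data.
sample = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP281", "KP281", "KP781", "KP781"],
    "MaritalStatus": ["Partnered", "Partnered", "Partnered",
                      "Single", "Partnered", "Single"],
})

# Row-normalized crosstab: each row sums to 1, giving P(status | product).
share = pd.crosstab(sample["Product"], sample["MaritalStatus"], normalize="index")
```

Comparing these shares with the Partnered share in the whole dataset shows whether a product attracts partnered buyers more than the base rate, which a raw count cannot show.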