# Business Case: Aerofit - Descriptive Statistics & Probability

### About Aerofit

Aerofit is a leading brand in the field of fitness equipment. Aerofit provides a product range including machines such as treadmills, exercise bikes, gym equipment, and fitness accessories to cater to the needs of all categories of people.


### Business Problem

The market research team at AeroFit wants to identify the characteristics of the target audience for each type of treadmill the company offers, so that it can recommend the right treadmill to new customers. The team decides to investigate whether customer characteristics differ across the products.

Perform descriptive analytics to create a customer profile for each AeroFit treadmill product by developing appropriate tables and charts.
For each AeroFit treadmill product, construct two-way contingency tables and compute all conditional and marginal probabilities along with their insights/impact on the business.
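
As a rough illustration of the second task, the sketch below builds a two-way contingency table with `pd.crosstab` and derives marginal and conditional probabilities from it. The six-row `toy` frame is made up for illustration and is not the Aerofit data; only the column names `Product` and `Gender` mirror the dataset.

```python
import pandas as pd

# Hypothetical toy data (NOT the Aerofit dataset); column names mirror it.
toy = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP281", "KP481", "KP481", "KP781"],
    "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
})

# Two-way contingency table of counts; "All" holds the row/column totals.
counts = pd.crosstab(toy["Product"], toy["Gender"], margins=True)

# Marginal probabilities: P(Product) and P(Gender).
p_product = toy["Product"].value_counts(normalize=True)
p_gender = toy["Gender"].value_counts(normalize=True)

# Conditional probabilities.
# P(Gender | Product): each row sums to 1.
p_gender_given_product = pd.crosstab(toy["Product"], toy["Gender"], normalize="index")
# P(Product | Gender): each column sums to 1.
p_product_given_gender = pd.crosstab(toy["Product"], toy["Gender"], normalize="columns")

print(counts)
print(p_gender_given_product.round(2))
print(p_product_given_gender.round(2))
```

Normalizing by `"index"` answers "given the product, who buys it?", while normalizing by `"columns"` answers "given the customer group, which product do they buy?". The two are easy to confuse, so it helps to name the resulting tables explicitly.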

### Dataset

The company collected the data on individuals who purchased a treadmill from the AeroFit stores during the prior three months. The dataset has the following features:

Dataset link: <a href="https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/125/original/aerofit_treadmill.csv?1639992749" >Aerofit_treadmill.csv </a>


|Feature|Possible Values|
|-------|---------------|
|Product Purchased|KP281, KP481, or KP781|
|Age|In years|
|Gender|Male/Female|
|Education|In years|
|MaritalStatus|Single or Partnered|
|Usage|The average number of times the customer plans to use the treadmill each week|
|Income|Annual income (in $)|
|Fitness|Self-rated fitness on a 1-to-5 scale (1 = poor shape, 5 = excellent shape)|
|Miles|The average number of miles the customer expects to walk/run each week|

### Product Portfolio:
  - The KP281 is an entry-level treadmill that sells for \$1,500.
  - The KP481 is for mid-level runners and sells for \$1,750.
  - The KP781 has advanced features and sells for \$2,500.
 
  

### Importing the required libraries or packages for EDA 

In [18]:
#Importing packages
import numpy as np
import pandas as pd

# Importing matplotlib and seaborn for graphs
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

### Utility Functions - Used during Analysis

#### Missing Value - Calculator

In [19]:
def missingValue(df):
    # Summarise missing data: count and percentage of null values per column.
    total_null = df.isnull().sum().sort_values(ascending = False)
    percent = ((df.isnull().sum()/df.isnull().count())*100).sort_values(ascending = False)
    print("Total records = ", df.shape[0])

    md = pd.concat([total_null,percent.round(2)],axis=1,keys=['Total Missing','In Percent'])
    return md
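
To see what such a helper returns, here is a minimal self-contained sketch on a made-up frame with a few gaps. It uses `df.isna().mean()`, which gives the same fraction as dividing the null count by the row count. The name `missing_summary` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values, worst first."""
    total = df.isna().sum()
    percent = (df.isna().mean() * 100).round(2)
    summary = pd.concat([total, percent], axis=1, keys=["Total Missing", "In Percent"])
    return summary.sort_values("Total Missing", ascending=False)


# Hypothetical toy data with deliberate gaps (NOT the Aerofit dataset).
toy = pd.DataFrame({
    "Age": [18, np.nan, 20, 21],
    "Gender": ["Male", "Female", None, None],
    "Miles": [112, 75, 66, 85],
})

result = missing_summary(toy)
print(result)
```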

In [3]:
aerofit_data = pd.read_csv("./aerofit_treadmill.csv")

In [4]:
aerofit_data.head()

| |Product|Age|Gender|Education|MaritalStatus|Usage|Fitness|Income|Miles|
|-|-------|---|------|---------|-------------|-----|-------|------|-----|
|0|KP281|18|Male|14|Single|3|4|29562|112|
|1|KP281|19|Male|15|Single|2|3|31836|75|
|2|KP281|19|Female|14|Partnered|4|3|30699|66|
|3|KP281|19|Male|12|Single|3|3|32973|85|
|4|KP281|20|Male|13|Partnered|4|2|35247|47|


In [5]:
aerofit_data.shape

(180, 9)

In [6]:
aerofit_data.columns

Index(['Product', 'Age', 'Gender', 'Education', 'MaritalStatus', 'Usage',
       'Fitness', 'Income', 'Miles'],
      dtype='object')

### Validating Duplicate Records

In [7]:
aerofit_data = aerofit_data.drop_duplicates()
aerofit_data.shape

(180, 9)

### Inference
  - No duplicate records found: the shape is unchanged after `drop_duplicates()`.
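
Comparing shapes before and after `drop_duplicates()` works, but counting duplicates explicitly with `DataFrame.duplicated()` makes the check visible in the output. A minimal sketch on a made-up frame (not the Aerofit data):

```python
import pandas as pd

# Hypothetical toy data: the last row repeats the first one exactly.
toy = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP481", "KP281"],
    "Age": [18, 19, 25, 18],
})

# duplicated() flags every repeat of an earlier, fully identical row.
n_duplicates = int(toy.duplicated().sum())
print("Duplicate rows:", n_duplicates)

deduped = toy.drop_duplicates()
print("Shape before:", toy.shape, "after:", deduped.shape)
```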

### Missing Data Analysis

In [22]:
missingValue(aerofit_data).head(5)

Total records =  180


| |Total Missing|In Percent|
|-|-------------|----------|
|Product|0|0.0|
|Age|0|0.0|
|Gender|0|0.0|
|Education|0|0.0|
|MaritalStatus|0|0.0|


### Inference
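  - No missing values found in the five columns shown (`.head(5)` truncates the summary); the `info()` output further below reports 180 non-null entries for all nine columns.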

### Unique values (counts) for each Feature

In [23]:
aerofit_data.nunique()

Product           3
Age              32
Gender            2
Education         8
MaritalStatus     2
Usage             6
Fitness           5
Income           62
Miles            37
dtype: int64

### Unique values (names) for each Feature

In [25]:
aerofit_data['Product'].unique()

array(['KP281', 'KP481', 'KP781'], dtype=object)

In [26]:
aerofit_data['Age'].unique()

array([18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 43, 44, 46, 47, 50, 45, 48, 42],
      dtype=int64)

In [27]:
aerofit_data['Gender'].unique()

array(['Male', 'Female'], dtype=object)

In [28]:
aerofit_data['Education'].unique()

array([14, 15, 12, 13, 16, 18, 20, 21], dtype=int64)

In [29]:
aerofit_data['MaritalStatus'].unique()

array(['Single', 'Partnered'], dtype=object)

In [30]:
aerofit_data['Usage'].unique()

array([3, 2, 4, 5, 6, 7], dtype=int64)

In [31]:
aerofit_data['Fitness'].unique()

array([4, 3, 2, 1, 5], dtype=int64)

In [32]:
aerofit_data['Income'].unique()

array([ 29562,  31836,  30699,  32973,  35247,  37521,  36384,  38658,
        40932,  34110,  39795,  42069,  44343,  45480,  46617,  48891,
        53439,  43206,  52302,  51165,  50028,  54576,  68220,  55713,
        60261,  67083,  56850,  59124,  61398,  57987,  64809,  47754,
        65220,  62535,  48658,  54781,  48556,  58516,  53536,  61006,
        57271,  52291,  49801,  62251,  64741,  70966,  75946,  74701,
        69721,  83416,  88396,  90886,  92131,  77191,  52290,  85906,
       103336,  99601,  89641,  95866, 104581,  95508], dtype=int64)

In [33]:
aerofit_data['Miles'].unique()

array([112,  75,  66,  85,  47, 141, 103,  94, 113,  38, 188,  56, 132,
       169,  64,  53, 106,  95, 212,  42, 127,  74, 170,  21, 120, 200,
       140, 100,  80, 160, 180, 240, 150, 300, 280, 260, 360], dtype=int64)

### Inference
  - No abnormalities were found in the data.
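
The per-column cells above can also be collapsed into a single loop. The sketch below does this on a small made-up frame; with the real data, `toy` would presumably be replaced by `aerofit_data`.

```python
import pandas as pd

# Hypothetical toy data (NOT the Aerofit dataset).
toy = pd.DataFrame({
    "Product": ["KP281", "KP481", "KP281"],
    "Gender": ["Male", "Female", "Male"],
    "Usage": [3, 2, 3],
})

# Map each column to its sorted unique values for a quick sanity check.
unique_values = {col: sorted(toy[col].unique().tolist()) for col in toy.columns}

for col, values in unique_values.items():
    print(f"{col} ({len(values)} unique): {values}")
```

Sorting the values makes gaps and unexpected entries (for example, a misspelled category) easier to spot than the order-of-appearance output of `unique()`.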

### DataType Validation

In [34]:
aerofit_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 180 entries, 0 to 179
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Product        180 non-null    object
 1   Age            180 non-null    int64 
 2   Gender         180 non-null    object
 3   Education      180 non-null    int64 
 4   MaritalStatus  180 non-null    object
 5   Usage          180 non-null    int64 
 6   Fitness        180 non-null    int64 
 7   Income         180 non-null    int64 
 8   Miles          180 non-null    int64 
dtypes: int64(6), object(3)
memory usage: 14.1+ KB


### Inference
  - No problems with the type of data used.

In [35]:
aerofit_data.describe()

| |Age|Education|Usage|Fitness|Income|Miles|
|-|---|---------|-----|-------|------|-----|
|count|180.0|180.0|180.0|180.0|180.0|180.0|
|mean|28.788889|15.572222|3.455556|3.311111|53719.577778|103.194444|
|std|6.943498|1.617055|1.084797|0.958869|16506.684226|51.863605|
|min|18.0|12.0|2.0|1.0|29562.0|21.0|
|25%|24.0|14.0|3.0|3.0|44058.75|66.0|
|50%|26.0|16.0|3.0|3.0|50596.5|94.0|
|75%|33.0|16.0|4.0|4.0|58668.0|114.75|
|max|50.0|21.0|7.0|5.0|104581.0|360.0|
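
The summary already hints at right skew in `Income` and `Miles`: in both columns the mean exceeds the median, and the maximum sits far above the 75th percentile. One common way to quantify this is Tukey's 1.5 × IQR rule; the sketch below applies it to the quartiles printed above. The choice of rule and the helper name `iqr_fences` are assumptions for illustration, not part of the original analysis.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) Tukey fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles, min and max copied from the describe() output above.
summary = {
    "Income": {"q1": 44058.75, "q3": 58668.0, "min": 29562.0, "max": 104581.0},
    "Miles": {"q1": 66.0, "q3": 114.75, "min": 21.0, "max": 360.0},
}

flags = {}
for name, s in summary.items():
    lower, upper = iqr_fences(s["q1"], s["q3"])
    flags[name] = {
        "lower": lower,
        "upper": upper,
        "has_low": s["min"] < lower,    # any value below the lower fence?
        "has_high": s["max"] > upper,   # any value above the upper fence?
    }
    print(f"{name}: fences=({lower:.3f}, {upper:.3f}), "
          f"low outliers={flags[name]['has_low']}, "
          f"high outliers={flags[name]['has_high']}")
```

For both columns the maximum lies above the upper fence while the minimum stays above the lower fence, so under this rule any outliers are on the high side only.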


## Data Preparation