# Background Information

#### Main Target
 - Investigate whether there are differences across the product lines with respect to customer characteristics.

#### Stakeholder
 -  AdRight market research team

#### Objective 
 - Descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line
 - Apply the analysis to generate and guide marketing strategy.

<br>

#### Dataset
The data was collected on individuals who purchased a treadmill at a CardioGood Fitness retail store during the prior three months, and it is stored in the **CardioGoodFitness.csv** file. The following customer variables were recorded:

- product purchased (TM195, TM498, or TM798)
- gender
- age (in years)
- education (in years)
- relationship status (single or partnered)
- annual household income (in dollars)
- average number of times the customer plans to use the treadmill each week
- average number of miles the customer expects to walk/run each week
- self-rated fitness on a 1-to-5 scale (where 1 is poor shape and 5 is excellent shape)

#### Study Directions
- Compare sales across the three products
- Compare customer characteristics (personal information and expectations of treadmill use)
- Examine the relationship between customer characteristics and the product purchased
 <br>

# Environment Set-up and Understanding the Data

In [None]:
# Import libraries
import os

import numpy as np
import pandas as pd

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
CGF = pd.read_csv("/kaggle/input/cardiogoodfitness/CardioGoodFitness.csv")

In [None]:
CGF.shape

##### Notes:
The CGF dataset contains 180 observations (one row per customer) and 9 columns.

In [None]:
CGF.head()

In [None]:
CGF.rename(columns={'MaritalStatus':'Marital_Status',
                    'Usage':'Treadmill_Usage',
                    'Fitness':'Self-rated_Fitness',
                    'Income':'Annual_Income'},inplace=True)

In [None]:
CGF.info()

##### Notes:
- String (object) columns: **"Product", "Gender", "Marital_Status"**
- Integer columns: **"Age", "Education", "Treadmill_Usage", "Self-rated_Fitness", "Annual_Income", "Miles"**
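The string/integer split above can also be derived programmatically. This helps later, when a step such as a correlation matrix should only see numeric columns. The minimal sketch below uses `DataFrame.select_dtypes` on a small hypothetical frame (`toy` is made up for illustration; in the notebook you would pass `CGF`). It selects non-numeric columns with `exclude="number"` rather than `include="object"`, so it still works if a newer pandas version stores strings in a dedicated string dtype.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; not taken from CardioGoodFitness.csv
toy = pd.DataFrame({
    "Product": ["TM195", "TM798"],
    "Gender": ["Male", "Female"],
    "Marital_Status": ["Single", "Partnered"],
    "Age": [25, 40],
    "Miles": [80, 150],
})

# Numeric (int/float) columns versus everything else (the string columns)
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
categorical_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```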


In [None]:
CGF.isna().any()

In [None]:
CGF.duplicated().any()

##### Notes:
There is no missing or duplicated data in the dataset.

# Data Preprocessing

In [None]:
CGF.groupby("Product")["Annual_Income"].agg(["mean","min","max","count"])

In [None]:
CGF.describe(include="all")

##### Notes:
- Customer ages range from 18 to 50, with a mean of 28.79.
- Customers plan to use the treadmill 2 to 7 times per week.
- Customers expect to walk/run 21 to 360 miles per week on the treadmill.
- The average self-rated fitness is 3.31, slightly above the midpoint (3) of the 1-to-5 scale.
- Annual income ranges from about 29.6K to 104.6K, with a mean of 53.7K (standard deviation about 16.5K).
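The ranges and means in these notes come from the `describe` output. If you only need a few statistics for a few variables, `DataFrame.agg` with a list of function names returns a compact table. The sketch below is a minimal example on hypothetical toy numbers (not the real dataset). For the real data, you would apply it to the numeric columns of `CGF`.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; NOT taken from CardioGoodFitness.csv
toy = pd.DataFrame({
    "Age": [18, 25, 31, 50],
    "Treadmill_Usage": [2, 3, 4, 7],
    "Miles": [21, 85, 120, 360],
    "Annual_Income": [30000, 45000, 52000, 104000],
})

# Rows = statistics (min, max, mean), columns = variables
ranges = toy.agg(["min", "max", "mean"])
print(ranges)
```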


#### Gender Distribution by Product

In [None]:
CGF["Gender"].value_counts()

In [None]:
Genderdist = CGF.groupby(["Product","Gender"])["Gender"].count()
TotalList = CGF.groupby("Product")["Gender"].count()
Gender_dist = round(Genderdist/TotalList*100)
print(Gender_dist)

##### Notes:
- The best-selling treadmill is TM195.
- Overall, more male than female customers buy treadmills, but **half of TM195 customers are female**.
- More partnered customers than single customers buy treadmills.
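The percentage table above divides per-product gender counts by per-product totals. `pd.crosstab` with `normalize="index"` gives the same row percentages in one step, and every product's row sums to 100. The minimal sketch below uses hypothetical toy rows (not the real data). The same call with `"Marital_Status"` in place of `"Gender"` would give the marital-status split.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; the notebook would use CGF instead
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM195", "TM195",
                "TM798", "TM798", "TM798", "TM798", "TM798"],
    "Gender": ["Male", "Female", "Male", "Female",
               "Male", "Male", "Male", "Male", "Female"],
})

# Row-normalised crosstab: share of each gender within each product, in percent
gender_pct = (
    pd.crosstab(toy["Product"], toy["Gender"], normalize="index")
    .mul(100)
    .round(1)
)
print(gender_pct)
```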


# Analysis with Visualization

In [None]:
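import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# set_theme applies the style and the figure size together; calling sns.set()
# after sns.set_style('ticks') would reset the style to the default "darkgrid"
sns.set_theme(style='ticks', rc={'figure.figsize': (15, 10)})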

In [None]:
sns.pairplot(data=CGF)

In [None]:
sns.countplot(x="Gender", hue="Marital_Status", data=CGF).set(title="Customer Marital Status by Gender")

In [None]:
sns.countplot(x="Product", hue="Marital_Status", data=CGF).set(title="Customer Marital Status by Product")

In [None]:
sns.scatterplot(x="Age", y="Annual_Income", hue="Gender", data=CGF, size="Miles").set(title="Annual Income by Age and Gender")

In [None]:
sns.scatterplot(x="Treadmill_Usage", y="Annual_Income", hue="Product", data=CGF, size="Miles").set(title="Annual Income by Treadmill Usage and Product")

In [None]:
sns.scatterplot(x="Age", y="Miles", hue="Gender", data=CGF, size="Treadmill_Usage").set(title="Miles by Age and Gender")

In [None]:
sns.swarmplot(x="Self-rated_Fitness", y="Miles", hue="Gender", data=CGF).set(title="Miles by Self-rated Fitness and Gender")

In [None]:
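# numeric_only=True skips the string columns (Product, Gender, Marital_Status);
# since pandas 2.0, corr() raises an error on them otherwise
CGF_corr = CGF.corr(numeric_only=True)
sns.heatmap(CGF_corr, annot=True)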

# Important Observations

There is a strong positive correlation between:
- the average number of times the customer plans to use the treadmill each week and the average number of miles the customer expects to walk/run each week, and
- self-rated fitness and the average number of miles the customer expects to walk/run each week.
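The heatmap shows every pairwise correlation. To list only the strong pairs, you can walk the upper triangle of the correlation matrix with `itertools.combinations`, so that each pair appears once, and then filter on an absolute threshold. The minimal sketch below makes two assumptions: the 0.7 cut-off is an arbitrary illustrative choice, not one taken from this analysis, and the `toy` values are hypothetical. It also requires pandas 1.5 or later for `numeric_only`.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy rows for illustration only; not values from the real dataset
toy = pd.DataFrame({
    "Product": ["TM195", "TM498", "TM798", "TM195", "TM798"],
    "Treadmill_Usage": [2, 3, 5, 3, 6],
    "Self-rated_Fitness": [2, 3, 5, 3, 5],
    "Miles": [40, 75, 180, 66, 200],
    "Age": [30, 22, 28, 41, 25],
})

# Pearson correlations between the numeric columns only
corr = toy.corr(numeric_only=True)

# One entry per unordered column pair (upper triangle of the matrix)
pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)})

# Keep pairs above the assumed 0.7 threshold, strongest first
THRESHOLD = 0.7
strong = pairs[pairs.abs() > THRESHOLD].sort_values(ascending=False)
print(strong)
```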


## Customer Profiles by Model
- There are **more male customers than female customers overall**.
- By marital status, there are **more partnered customers than single customers for all three products**.
- **Male customers with a high annual income tend to use the treadmill more.**
- Customers of TM798 expect to walk/run more miles per week than customers of the other two products.
- Generally, male customers expect to walk/run more miles than female customers at most ages.
- **Customers with higher income tend to choose TM798** and plan to use the treadmill more often each week.
- There is a positive relationship between planned weekly treadmill usage and self-rated fitness: **customers who plan to use the treadmill more often tend to rate their fitness higher**, although most customers rate themselves a 3.
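The project objective is a customer profile for each product line. One compact way to build one is a single `groupby` with named aggregations, which yields one row per product with its customer count, average income, usage, miles, and share of female customers. The minimal sketch below runs on hypothetical toy rows, and the output column names (`customers`, `mean_income`, and so on) are illustrative choices of ours. Applied to `CGF`, it would gather the per-product figures quoted in the TM195, TM498, and TM798 notes into one table.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; not values from the real dataset
toy = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM498", "TM498", "TM798", "TM798"],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Male"],
    "Annual_Income": [40000, 50000, 45000, 52000, 80000, 90000],
    "Treadmill_Usage": [3, 3, 3, 4, 5, 6],
    "Miles": [70, 90, 80, 100, 160, 200],
})

# One row per product line: number of buyers plus average characteristics
profile = toy.groupby("Product").agg(
    customers=("Annual_Income", "size"),
    mean_income=("Annual_Income", "mean"),
    mean_usage=("Treadmill_Usage", "mean"),
    mean_miles=("Miles", "mean"),
    pct_female=("Gender", lambda g: (g == "Female").mean() * 100),
)
print(profile)

# Products ranked by the mean income of their customers, highest first
print(profile["mean_income"].sort_values(ascending=False))
```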

<br>


#### TM195
- TM195 is **the best-selling product**; the mean annual income of its customers is 46.4K.
- It is the **only** product with an equal gender split among its customers.


#### TM498
- TM498 is the second best-selling product; the mean annual income of its customers is 48.9K.
- **The customer characteristics of TM498 are similar to those of TM195**, as can be seen in the scatter plot of annual income colored by product.


#### TM798
- TM798 has the **lowest** sales but the **highest mean annual income among its customers (75K)**, which suggests that high-income customers prefer this model over the others.
- It has the most unequal gender distribution: **only 18% of TM798 customers are female**.

# Recommendations

 
From the customer characteristics of each product, **we can observe that TM798 attracts higher-income customers who plan to use the treadmill actively, while customers of TM195 and TM498 share a similar profile**. I recommend **enhancing the features of TM498 and giving it a clearer market position, so that it targets the customer group between TM195 and TM798**. That way, each of the three products would target a distinct group of customers.