# Aerofit: Descriptive Statistics & Probability

**By Akanksha Trivedi - Scaler Academy**

## Introduction

Aerofit is a leading brand in fitness equipment. This analysis aims to identify the characteristics of the target audience for each treadmill product offered by the company.

## Purpose

- Perform descriptive analytics to create customer profiles
- Construct contingency tables for conditional/marginal probabilities
- Recommend treadmills based on customer profiles

## Dataset Characteristics

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

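# Load the dataset and inspect its structure
df = pd.read_csv('aerofit_data.csv')
print(df.head())      # first five rows
df.info()             # column dtypes and non-null counts (prints directly)
print(df.describe())  # summary statistics for the numeric columns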
```

### 🔍 Observations

- The dataset has 180 rows and 9 columns
- KP281 is the top-selling product
- The 24-33 age group is the largest segment
- Male and partnered customers make up the majority
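
The observations above can be checked with a few pandas calls. The following is a minimal sketch, not the notebook's own code: the rows are toy data invented for illustration, `summarize_segments` is a hypothetical helper, the `MaritalStatus` column name is an assumption, and the age-band edges are chosen only so that one band covers ages 24-33.

```python
import pandas as pd

# Toy stand-in rows (hypothetical, NOT the Aerofit data); in the notebook,
# pass the `df` loaded from the CSV instead.
toy = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP281", "KP481", "KP481", "KP781"],
    "Age": [22, 25, 28, 31, 36, 45],
    "Gender": ["Male", "Female", "Male", "Male", "Female", "Male"],
    # Column name `MaritalStatus` is an assumption
    "MaritalStatus": ["Partnered", "Single", "Partnered",
                      "Partnered", "Single", "Partnered"],
})


def summarize_segments(data: pd.DataFrame) -> dict:
    """Return the table shape and the most frequent value of each segment."""
    # Assumed age bands; right-closed bins, so (23, 33] covers ages 24-33
    age_band = pd.cut(
        data["Age"],
        bins=[13, 23, 33, 43, 53],
        labels=["14-23", "24-33", "34-43", "44-53"],
    )
    return {
        "shape": data.shape,
        "top_product": data["Product"].value_counts().idxmax(),
        "top_age_band": age_band.value_counts().idxmax(),
        "top_gender": data["Gender"].value_counts().idxmax(),
        "top_marital_status": data["MaritalStatus"].value_counts().idxmax(),
    }


summary = summarize_segments(toy)
print(summary)
```

Calling `summarize_segments(df)` on the real data should reproduce the figures listed above, provided the column names match.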

## Univariate Analysis & Outliers

```python
# Distribution plots
numeric_cols = ['Age', 'Education', 'Usage', 'Fitness', 'Income', 'Miles']
df[numeric_cols].hist(bins=20, figsize=(12, 10))
plt.tight_layout()
plt.show()

# Boxplots for outliers
for col in numeric_cols:
    sns.boxplot(x=df[col])
    plt.title(f'Boxplot for {col}')
    plt.show()
```

### 🔍 Observations

- Most common age: around 25
- Typical usage: 3-4 times per week
- Most common fitness self-rating: 3
- KP281 is the best seller (44.44%), followed by KP481 (33.33%) and KP781 (22.22%)
- Income and Miles both have high-end outliers
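
As a rough sketch of how the product shares and the outliers can be quantified: the product counts below (80, 60, 40) are the ones implied by the reported percentages over 180 rows, while the income values are made-up toy numbers. The helper names `product_share` and `iqr_outliers` are hypothetical, and the 1.5 × IQR rule is assumed because it is the default whisker rule behind seaborn's boxplots.

```python
import pandas as pd


def product_share(products: pd.Series) -> pd.Series:
    """Percentage of rows per product, rounded to two decimals."""
    return (products.value_counts(normalize=True) * 100).round(2)


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot fences."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values[(values < q1 - k * iqr) | (values > q3 + k * iqr)]


# Counts implied by the reported shares of 180 rows
products = pd.Series(["KP281"] * 80 + ["KP481"] * 60 + ["KP781"] * 40)
shares = product_share(products)
print(shares)

# Toy income values in thousands (hypothetical, for illustration only)
income = pd.Series([30, 40, 45, 50, 55, 60, 200])
outliers = iqr_outliers(income)
print(outliers.tolist())
```

On the real data, `iqr_outliers(df['Income'])` and `iqr_outliers(df['Miles'])` would list the points that the boxplots draw beyond the whiskers.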

## Bivariate Analysis

```python
# Example: Age vs Product
sns.boxplot(x='Product', y='Age', data=df)
plt.title('Age Distribution by Product')
plt.show()
```

### 🔍 Observations

- KP781 attracts older, fitter users
- KP281 buyers are consistent users with mid-range incomes
- KP781 buyers have higher education and fitness levels
- KP481 buyers overlap with KP281 buyers but have higher incomes
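
Comparisons like these can be backed by a per-product summary table alongside the boxplots. The sketch below assumes the median is an acceptable summary (it is less sensitive to the income outliers than the mean). The data is a toy frame invented for illustration and `profile_by_product` is a hypothetical helper.

```python
import pandas as pd


def profile_by_product(data: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Median of each selected numeric column, one row per product."""
    return data.groupby("Product")[columns].median()


# Toy rows (hypothetical, NOT the Aerofit data)
toy = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP481", "KP481", "KP781", "KP781"],
    "Age": [24, 26, 25, 29, 30, 34],
    "Fitness": [3, 3, 3, 3, 5, 4],
    "Income": [40000, 46000, 50000, 54000, 80000, 90000],
})

by_product = profile_by_product(toy, ["Age", "Fitness", "Income"])
print(by_product)
```

With the real data, `profile_by_product(df, numeric_cols)` gives one row per treadmill, which makes differences in age, fitness, and income easy to read off.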

## Marginal & Conditional Probability

```python
# Row-normalized contingency table: each row sums to 1 and gives
# the conditional probability P(Gender | Product)
pd.crosstab(df['Product'], df['Gender'], normalize='index')
```
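
The row-normalized table gives only one kind of conditional probability. As a hedged sketch on toy data (the rows below are invented for illustration), the same `pd.crosstab` call can also produce joint and marginal probabilities with `normalize='all', margins=True`, and the conditional probability in the other direction, P(Product | Gender), with `normalize='columns'`.

```python
import pandas as pd

# Toy rows (hypothetical, NOT the Aerofit data)
toy = pd.DataFrame({
    "Product": ["KP281", "KP281", "KP281", "KP481",
                "KP481", "KP781", "KP781", "KP781"],
    "Gender": ["Male", "Female", "Female", "Male",
               "Female", "Male", "Male", "Male"],
})

# Joint probabilities P(Product, Gender); the 'All' row and column
# hold the marginal probabilities P(Gender) and P(Product)
joint = pd.crosstab(toy["Product"], toy["Gender"],
                    normalize="all", margins=True)

# Conditional probabilities P(Product | Gender): each column sums to 1
given_gender = pd.crosstab(toy["Product"], toy["Gender"],
                           normalize="columns")

p_joint = joint.loc["KP781", "Male"]         # P(KP781 and Male)
p_male = joint.loc["All", "Male"]            # marginal P(Male)
p_cond = given_gender.loc["KP781", "Male"]   # P(KP781 | Male)

print(joint)
print(given_gender)
print(p_joint / p_male, p_cond)  # both routes give the same conditional
```

The last line checks the definition P(A | B) = P(A and B) / P(B): dividing the joint entry by the marginal gives the same value as the column-normalized table.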

## Customer Profiling & Recommendations

### KP281
- Age: 25–35
- Gender & Marital Status: No strong influence
- Income: 40k–50k

**Recommendations:** Keep the price competitive, since this segment is cost-sensitive.

### KP481
- Similar to KP281 but higher income

**Recommendations:** Add features that differentiate it from KP281, and promote it through marketing.

### KP781
- Older, fit, and higher income customers

**Recommendations:** Position it for the high-end market, and also target wealthy customers who are less fit.