### The Data Analysis Framework

1. Look at the information in the data
2. Decide on the analysis objective(s)
3. List the questions that come to mind when looking at the data
4. Select the questions that are in line with the objective(s)
5. Look for answers to these questions in the data
6. Summarise the findings

### Let's see how to use this in a data analysis setting with a simple example
We will use the gender-weight-height-BMI dataset downloaded from Kaggle and apply the framework above to it to generate some insights.

```python
# Load the required packages
import numpy as np
import pandas as pd
import seaborn as sns
```

```python
# Load the gender-weight-height dataset
df = pd.read_csv('500_Person_Gender_Height_Weight_Index.csv')
print('data shape:', df.shape)
df.head()
```

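```
data shape: (500, 4)
```

|   | Gender | Height | Weight | Index |
|---|--------|--------|--------|-------|
| 0 | Male   | 174    | 96     | 4     |
| 1 | Male   | 189    | 87     | 2     |
| 2 | Female | 185    | 110    | 4     |
| 3 | Female | 195    | 104    | 3     |
| 4 | Male   | 149    | 61     | 3     |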


#### Step 1: Look at the information in the data

Our data contains the gender, height, weight and body mass index (BMI) category of 500 people. Height and Weight are numeric fields, while Gender and Index are categorical. There are no missing values in the data.
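The claims above about types and missing values can be checked directly in pandas. The sketch below uses only the five rows shown by `df.head()` so that it runs on its own; in the notebook you would call the same methods on the full `df`. Converting `Gender` and `Index` to pandas' `category` dtype is an optional extra step assumed here, not something the original analysis does.

```python
import pandas as pd

# The five rows displayed by df.head(); substitute the full df in practice
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Height": [174, 189, 185, 195, 149],
    "Weight": [96, 87, 110, 104, 61],
    "Index": [4, 2, 4, 3, 3],
})

# Missing values per column: all zeros means the data has no gaps
missing = df.isna().sum()
print(missing)

# Optional (assumption, not in the original): store the categorical fields as
# categories; Index is ordinal, so its codes 0-5 are kept in order
df["Gender"] = df["Gender"].astype("category")
df["Index"] = pd.Categorical(df["Index"], categories=[0, 1, 2, 3, 4, 5], ordered=True)

print(df.dtypes)
```

Storing `Index` as an ordered categorical keeps the ranking from Extremely Weak to Extreme Obesity explicit, which plain integers only imply.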

Below is a description of the fields in the data:

- Gender: Male / Female
- Height: number (cm)
- Weight: number (kg)
- Index: BMI category
  - 0 - Extremely Weak
  - 1 - Weak
  - 2 - Normal
  - 3 - Overweight
  - 4 - Obesity
  - 5 - Extreme Obesity
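Because `Index` is stored as an integer code, it can help to attach the labels from the description above before analysing it. A minimal sketch using `Series.map`; the dictionary `INDEX_LABELS` and the column name `Index_Label` are hypothetical names chosen for illustration, not part of the dataset.

```python
import pandas as pd

# Code-to-label mapping taken from the field description (hypothetical name)
INDEX_LABELS = {
    0: "Extremely Weak",
    1: "Weak",
    2: "Normal",
    3: "Overweight",
    4: "Obesity",
    5: "Extreme Obesity",
}

# Gender and Index values from the five rows displayed by df.head()
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Index": [4, 2, 4, 3, 3],
})

# map() looks each code up in the dict; a code missing from the dict becomes NaN
df["Index_Label"] = df["Index"].map(INDEX_LABELS)
print(df)
```

Checking `df["Index_Label"].isna().sum()` afterwards is a cheap way to catch codes outside 0-5.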

#### Step 2: Decide on the analysis objective(s)

Let's look at whether gender has an impact on height and weight.

#### Steps 3 & 4: Jot down questions and select those in line with the objective

We can combine steps 3 and 4 of the framework and jot down the questions we will try to answer in the rest of the exercise:

Q1: What is the average height of males vs. females? <br>
Q2: What is the minimum height of males vs. females? <br>
Q3: What is the maximum height of males vs. females?

Q4: What is the average weight of males vs. females? <br>
Q5: What is the minimum weight of males vs. females? <br>
Q6: What is the maximum weight of males vs. females?
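Each of the six questions is a mean, minimum or maximum of one numeric column computed per gender, so all of them can also be answered with a single `groupby(...).agg(...)` call. The sketch below runs on the five `df.head()` rows only, so its numbers are not the dataset-wide answers.

```python
import pandas as pd

# The five rows displayed by df.head(); substitute the full df in practice
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Height": [174, 189, 185, 195, 149],
    "Weight": [96, 87, 110, 104, 61],
    "Index": [4, 2, 4, 3, 3],
})

# Rows: Female / Male; columns: (Height, Weight) x (mean, min, max), covering Q1-Q6
summary = df.groupby("Gender")[["Height", "Weight"]].agg(["mean", "min", "max"])
print(summary.round(1))
```

A single answer can then be read off the result, for example `summary.loc["Male", ("Height", "mean")]` for Q1.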

#### Step 5: Try answering the questions using the data

```python
# Average height
avg_height = df.groupby('Gender')['Height'].mean()

# Minimum height
min_height = df.groupby('Gender')['Height'].min()

# Maximum height
max_height = df.groupby('Gender')['Height'].max()

# Average weight
avg_weight = df.groupby('Gender')['Weight'].mean()

# Minimum weight
min_weight = df.groupby('Gender')['Weight'].min()

# Maximum weight
max_weight = df.groupby('Gender')['Weight'].max()


print('### Height Metrics ###')
print('Avg_Height:', avg_height.round(1))
print('Min_Height:', min_height)
print('Max_Height:', max_height)

print('\n')
print('\n')

print('### Weight Metrics ###')
print('Avg_Weight:', avg_weight.round(1))
print('Min_Weight:', min_weight)
print('Max_Weight:', max_weight)
```

```
### Height Metrics ###
Avg_Height: Gender
Female    170.2
Male      169.6
Name: Height, dtype: float64
Min_Height: Gender
Female    140
Male      140
Name: Height, dtype: int64
Max_Height: Gender
Female    199
Male      199
Name: Height, dtype: int64




### Weight Metrics ###
Avg_Weight: Gender
Female    105.7
Male      106.3
Name: Weight, dtype: float64
Min_Weight: Gender
Female    50
Male      50
Name: Weight, dtype: int64
Max_Weight: Gender
Female    160
Male      160
Name: Weight, dtype: int64
```
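The metrics above compare averages and extremes only. Group sizes and spread give extra context when judging whether a gap such as 170.2 cm vs. 169.6 cm matters; `value_counts()` gives the number of people per gender and `describe()` reports count, mean, standard deviation and quartiles per group. As before, this sketch uses the five `df.head()` rows as a stand-in for the full data.

```python
import pandas as pd

# The five rows displayed by df.head(); substitute the full df in practice
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Height": [174, 189, 185, 195, 149],
    "Weight": [96, 87, 110, 104, 61],
    "Index": [4, 2, 4, 3, 3],
})

# Number of people per gender
counts = df["Gender"].value_counts()
print(counts)

# count, mean, std, min, quartiles and max of Height for each gender
height_stats = df.groupby("Gender")["Height"].describe()
print(height_stats.round(1))
```

Replacing `"Height"` with `"Weight"` gives the same breakdown for weight.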


#### Step 6: Summarising the findings

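In this dataset, males and females have nearly identical height and weight profiles. Average heights differ by only 0.6 cm (170.2 cm for females vs. 169.6 cm for males) and average weights by only 0.6 kg (105.7 kg vs. 106.3 kg), while the minimum and maximum values are the same for both genders (140-199 cm for height, 50-160 kg for weight). Based on these metrics, gender shows no meaningful impact on height or weight.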