# Health Stats Part 4: Waist 2 Hip Ratios - Pandas Only

<!--- Write an explanation of the Waist To Hips Ratio statistic used by health professionals. Please include an explanation of what it is used for, exactly how it is calculated, and how to interpret the results. Note: Formmatting matters. Make this as professional as you can using Markdown.  --->

<!--- feel free to use any web resources, including [Wikipedia](https://en.wikipedia.org/wiki/Waist%E2%80%93hip_ratio) or any other resources that you can find online. Just MAKE SURE you provide a link to every resource you decide to use. --->

<!--- Including the formula, or that fancy diagram/table you see on wikipedia is DEFINITELY a good idea! How? The LaTeX equations section in [This link](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) might help. --->

<!--- For extra points, try to create a table similar to the one on the wikipedia page on your own. --->

## Waist-to-Hip Ratio

The waist-to-hip ratio (WHR) is calculated by dividing the circumference of the waist by the circumference of the hips. Health professionals use it as an indicator of a subject's health and of the subject's risk of developing serious health conditions. The World Health Organization (WHO) specifies how the measurements should be taken. The waist should be measured at the midpoint between the lower margin of the last palpable rib and the top of the iliac crest. Measuring at the navel is more common, but it may understate the true waist circumference. The hips should be measured around the widest part of the buttocks.

Because the waist-to-hip ratio is an important indicator of obesity, it is essential to understand how to interpret the result. Health organizations publish different cutoff values for interpreting it.

Source: https://en.wikipedia.org/wiki/Waist%E2%80%93hip_ratio

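$$ \text{ratio}_{w2h} = \frac{w}{h} $$

where $w$ is the waist circumference and $h$ is the hip circumference, both measured in the same unit.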
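As a quick worked example of the formula, the sketch below computes the ratio for a single subject. The function name `waist_to_hip_ratio` is illustrative only, and the measurements (waist 30, hip 32) are taken from the first row of the data set used later in this notebook.

```python
def waist_to_hip_ratio(waist: float, hip: float) -> float:
    """Return the waist circumference divided by the hip circumference."""
    if hip <= 0:
        raise ValueError("hip circumference must be positive")
    return waist / hip


# First subject in the data set: waist 30, hip 32
ratio = waist_to_hip_ratio(30, 32)
print(round(ratio, 4))  # 0.9375
```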

DGSP

- Women:
    - Normal-weight: <= 0.80
    - Over-weight: 0.81-0.84
    - Obese: >= 0.85
- Men:
    - Normal-weight: <= 0.90
    - Over-weight: 0.91-0.99
    - Obese: >= 1.00

WHO

- Women:
    - Obese: > 0.85
- Men:
    - Obese: > 0.90

NIDDK

- Women:
    - Obese: > 0.80
- Men:
    - Obese: > 0.90
        
 <img src="https://www.meandmywaist.com/wp-content/uploads/2017/06/WHR.png">
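To make the interpretation step concrete, here is a minimal sketch that applies the DGSP cutoffs listed above. The names `DGSP_CUTOFFS` and `dgsp_category` are hypothetical, and the handling of values that fall between the listed bands (for example 0.805 for women) is an assumption, since the list does not cover them.

```python
# Upper bound of "normal-weight" and lower bound of "obese" for each gender,
# copied from the DGSP list above.
DGSP_CUTOFFS = {"F": (0.80, 0.85), "M": (0.90, 1.00)}


def dgsp_category(ratio: float, gender: str) -> str:
    """Classify a waist-to-hip ratio using the DGSP cutoffs."""
    normal_max, obese_min = DGSP_CUTOFFS[gender]
    if ratio <= normal_max:
        return "Normal-weight"
    if ratio < obese_min:
        # Assumption: values between the listed bands count as over-weight.
        return "Over-weight"
    return "Obese"


print(dgsp_category(0.78, "F"))    # Normal-weight
print(dgsp_category(0.83, "F"))    # Over-weight
print(dgsp_category(0.9375, "M"))  # Over-weight
print(dgsp_category(1.02, "M"))    # Obese
```

The same pattern would work for the WHO or NIDDK cutoffs by swapping in a different table of thresholds.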








## Source Data 

<!--- Replace the text below with a Markdown bullet list that defines the columns of the CSV file. Be sure to indicate the data type for each column. --->

<!--- Example can be: ID, unique identifier of each person, integer. Remember you need to put this into a bullet list! How? [This link](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) might help. --->

<!--- These two markdown cells are required in almost any analytical report. --->

- ID
    - Integer
    - Unique identifier of the subject
- Waist
    - Integer
    - The subject's waist measurement
- Hip
    - Integer
    - The subject's hip measurement
- Gender
    - String
    - The subject's gender (`M` or `F`)
- W2H_Ratio
    - Float
    - The subject's waist-to-hip ratio
    - Calculated by dividing `Waist` by `Hip`; added in the Calculations section rather than read from the CSV file
- Shape
    - String
    - `Apple` or `Pear`, determined from the subject's ratio and gender; also added in the Calculations section
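One way to confirm that the source columns have the data types listed above is to test each column with the helpers in `pandas.api.types`. This is only a sketch: the two-row `sample` frame is a stand-in built in memory, not the real CSV file.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_string_dtype

# Stand-in for the source data (the first two rows of the data set)
sample = pd.DataFrame(
    {"ID": [1, 2], "Waist": [30, 32], "Hip": [32, 37], "Gender": ["M", "M"]}
)

# Map each column name to whether it has the expected kind of dtype
checks = {
    "ID": is_integer_dtype(sample["ID"]),
    "Waist": is_integer_dtype(sample["Waist"]),
    "Hip": is_integer_dtype(sample["Hip"]),
    "Gender": is_string_dtype(sample["Gender"]),
}
print(checks)
```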
  

## Data Import

In [3]:
# Goal: Extract the data from the file

# Use pandas' read_csv function to load the CSV file into a DataFrame
import pandas as pd

w2h_data = pd.read_csv("w2h_data.csv")
w2h_data

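| Unnamed: 0 | ID | Waist | Hip | Gender |
|---|---|---|---|---|
| 0 | 1 | 30 | 32 | M |
| 1 | 2 | 32 | 37 | M |
| 2 | 3 | 30 | 36 | M |
| 3 | 4 | 33 | 39 | M |
| 4 | 5 | 29 | 33 | M |
| 5 | 6 | 32 | 38 | M |
| 6 | 7 | 33 | 42 | M |
| 7 | 8 | 30 | 40 | M |
| 8 | 9 | 30 | 37 | M |
| 9 | 10 | 32 | 39 | M |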
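An `Unnamed: 0` column like the one in the output above typically appears when a CSV file was saved together with its row index. Whether that is the case for `w2h_data.csv` is an assumption, since the raw file is not shown here. The sketch below uses a small in-memory CSV (`csv_text`, made up for illustration) to show how `index_col=0` reads that first column back as the index instead.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file that was saved with its row index
csv_text = ",ID,Waist,Hip,Gender\n0,1,30,32,M\n1,2,32,37,M\n"

# Default read: the unnamed first column becomes a regular column
default_read = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column is used as the DataFrame index
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))  # ['Unnamed: 0', 'ID', 'Waist', 'Hip', 'Gender']
print(list(indexed_read.columns))  # ['ID', 'Waist', 'Hip', 'Gender']
```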


## Calculations

In [4]:
# Goal: For each row of data calculate and store the w2h_ratio and shape.
import numpy as np

# Step 1: Make sure the columns are in correct data types - refer to previous parts for the correct data types
# Note: only the last expression in a cell is displayed, so wrap this in print() to see the dtypes
w2h_data.dtypes

# Step 2: calculate the 'W2H_Ratio' column using the 'Waist' and 'Hip' columns
# Divide the Waist column by the Hip column, element-wise
w2h_data['W2H_Ratio'] = w2h_data['Waist'] / w2h_data['Hip']

# Step 3: Create the `Shape` column based on the values of the `W2H_Ratio` column - refer to previous parts for the log
# Boolean rules pairing each gender with its ratio range
rules = [
    (w2h_data['Gender'] == 'F') & (w2h_data['W2H_Ratio'] <= 0.8),
    (w2h_data['Gender'] == 'F') & (w2h_data['W2H_Ratio'] > 0.8),
    (w2h_data['Gender'] == 'M') & (w2h_data['W2H_Ratio'] <= 0.9),
    (w2h_data['Gender'] == 'M') & (w2h_data['W2H_Ratio'] > 0.9),
]
# Shape assigned by each rule, in the same order as the rules
shapes = ['Pear', 'Apple', 'Pear', 'Apple']
# Pick the shape of the first matching rule; rows that match no rule get 'Unknown'
# (without a string default, np.select falls back to the integer 0)
w2h_data['Shape'] = np.select(rules, shapes, default='Unknown')
w2h_data

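| Unnamed: 0 | ID | Waist | Hip | Gender | W2H_Ratio | Shape |
|---|---|---|---|---|---|---|
| 0 | 1 | 30 | 32 | M | 0.9375 | Apple |
| 1 | 2 | 32 | 37 | M | 0.864865 | Pear |
| 2 | 3 | 30 | 36 | M | 0.833333 | Pear |
| 3 | 4 | 33 | 39 | M | 0.846154 | Pear |
| 4 | 5 | 29 | 33 | M | 0.878788 | Pear |
| 5 | 6 | 32 | 38 | M | 0.842105 | Pear |
| 6 | 7 | 33 | 42 | M | 0.785714 | Pear |
| 7 | 8 | 30 | 40 | M | 0.75 | Pear |
| 8 | 9 | 30 | 37 | M | 0.810811 | Pear |
| 9 | 10 | 32 | 39 | M | 0.820513 | Pear |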
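Every row shown above is male, so only the two `M` rules are exercised. The sketch below runs the same `np.select` pattern on a small made-up frame (`demo`, hypothetical values) that also contains women and one unrecognized gender code, to show how each rule and the `default` argument behave.

```python
import numpy as np
import pandas as pd

# Hypothetical rows covering every rule, plus one that matches none of them
demo = pd.DataFrame(
    {
        "Gender": ["F", "F", "M", "M", "X"],
        "W2H_Ratio": [0.78, 0.86, 0.88, 0.94, 0.90],
    }
)

conditions = [
    (demo["Gender"] == "F") & (demo["W2H_Ratio"] <= 0.8),
    (demo["Gender"] == "F") & (demo["W2H_Ratio"] > 0.8),
    (demo["Gender"] == "M") & (demo["W2H_Ratio"] <= 0.9),
    (demo["Gender"] == "M") & (demo["W2H_Ratio"] > 0.9),
]
choices = ["Pear", "Apple", "Pear", "Apple"]

# The 'X' row matches no condition, so it receives the default value
demo["Shape"] = np.select(conditions, choices, default="Unknown")
print(demo)
```

Giving `default` a string keeps the whole `Shape` column textual, which makes unmatched rows easy to spot and filter.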


## Output

In [6]:
# Goal: pretty print the rows as an HTML table

# Display the complete DF (in a notebook, display() renders it as an HTML table)
display(w2h_data)

# to_html is a method and must be called; it returns the table markup as a string
html_table = w2h_data.to_html()

# Save the DF to a file './complete_w2h.csv'
w2h_data.to_csv('./complete_w2h.csv')

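| Unnamed: 0 | ID | Waist | Hip | Gender | W2H_Ratio | Shape |
|---|---|---|---|---|---|---|
| 0 | 1 | 30 | 32 | M | 0.9375 | Apple |
| 1 | 2 | 32 | 37 | M | 0.864865 | Pear |
| 2 | 3 | 30 | 36 | M | 0.833333 | Pear |
| 3 | 4 | 33 | 39 | M | 0.846154 | Pear |
| 4 | 5 | 29 | 33 | M | 0.878788 | Pear |
| 5 | 6 | 32 | 38 | M | 0.842105 | Pear |
| 6 | 7 | 33 | 42 | M | 0.785714 | Pear |
| 7 | 8 | 30 | 40 | M | 0.75 | Pear |
| 8 | 9 | 30 | 37 | M | 0.810811 | Pear |
| 9 | 10 | 32 | 39 | M | 0.820513 | Pear |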
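Two details of the export step are worth illustrating: `to_html()` returns its markup as a string, and `to_csv` writes the row index unless told otherwise. The sketch below uses a small stand-in frame (`df`) and an in-memory buffer instead of a real file; passing `index=False` is an optional choice shown for comparison, not something the cell above does.

```python
import io

import pandas as pd

# Stand-in for the completed data set
df = pd.DataFrame(
    {"ID": [1, 2], "Waist": [30, 32], "Hip": [32, 37], "Gender": ["M", "M"]}
)

# Calling to_html() returns the table markup as a string
html = df.to_html(index=False)
print(html.lstrip().startswith("<table"))  # True

# index=False keeps the row index out of the CSV output
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the CSV text back yields only the original columns
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(list(round_trip.columns))  # ['ID', 'Waist', 'Hip', 'Gender']
```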


## Lessons Learned

Have you noticed how much code we wrote in the previous two parts? Have you also noticed how little code we needed in this part, with the help of Pandas?

This is why we use Pandas to handle the data in our analytics work.