# Health Stats Part 1: Waist 2 Hip Ratios

## W2H Ratio
- The waist-to-hip (W2H) ratio compares the circumference of a person's waist to the circumference of their hips (see the image below for further explanation).
    - Waist circumference is measured just above the belly button.
    - Hip circumference is measured at the widest part of the hips.
- The ratio is calculated by dividing the waist measurement $w$ by the hip measurement $h$:
    $ratio_{w2h} = \frac{w}{h}$
- The ratio is used as an indicator of a person's relative health and of their risk of developing a serious health condition in the future.
    - A larger waist circumference (apple-shaped) is associated with greater health risk than a larger hip circumference (pear-shaped).
    - For instance, the risk of diabetes increases with a W2H ratio above 0.85 for females and above 1.0 for males, due to fat distribution.
    - The ratio is also thought to be correlated with fertility.

- Research shows that obesity can be defined by a W2H ratio:
    - above 0.90 for males
    - above 0.85 for females
  
source: https://en.wikipedia.org/wiki/Waist–hip_ratio

<img src = 'https://upload.wikimedia.org/wikipedia/commons/d/dd/Waist-hip_ratio.svg' />
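
As a quick worked example of the formula above, the sketch below computes the ratio for a waist of 30 and a hip of 32 (both in the same unit) and compares it with the obesity cutoffs quoted above. The function name `w2h_ratio` and the `OBESITY_CUTOFF` dictionary are hypothetical helpers introduced only for illustration.

```python
# Obesity cutoffs quoted in the list above, keyed by gender code (assumed 'M'/'F')
OBESITY_CUTOFF = {"M": 0.90, "F": 0.85}

def w2h_ratio(waist, hip):
    """Return the waist circumference divided by the hip circumference."""
    return waist / hip

ratio = w2h_ratio(30, 32)
print(ratio)                        # 0.9375
print(ratio > OBESITY_CUTOFF["M"])  # True
```

Because the ratio divides one length by another, the unit cancels out as long as both measurements use the same one.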


The following table shows the W2H ratio ranges used by three well-known organizations:

- **DGSP**: the first pair of Women/Men columns
- **WHO**: the second pair of Women/Men columns
- **NIDDK**: the third pair of Women/Men columns

| DGSP Women | DGSP Men | WHO Women | WHO Men | NIDDK Women | NIDDK Men | Category |
| ---: | ---: | ---: | ---: | ---: | ---: | :--- |
| ? | ? | ? | ? | ? | ? | **under-weight** |
| < 0.80 | < 0.90 | ? | ? | ? | ? | **normal weight** |
| 0.80-0.84 | 0.90-0.99 | ? | ? | ? | ? | **over-weight** |
| > 0.85 | > 1.00 | > 0.85 | > 0.90 | > 0.80 | > 1.00 | **obesity** |
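
The DGSP columns are the only ones in the table with more than one range filled in, so they can be turned into a small lookup. The sketch below is one possible reading of those columns, not an official DGSP rule. The names `DGSP_CUTOFFS` and `dgsp_category` are hypothetical, and treating every value between the normal-weight and obesity bounds as over-weight is an assumption, since the table leaves small gaps (for example between 0.84 and 0.85). The under-weight row has no values, so it is never returned.

```python
# DGSP cutoffs from the table: (upper bound of normal weight, lower bound of obesity)
DGSP_CUTOFFS = {"F": (0.80, 0.85), "M": (0.90, 1.00)}

def dgsp_category(ratio, gender):
    """Map a W2H ratio to a DGSP category for gender 'F' or 'M'."""
    normal_below, obese_above = DGSP_CUTOFFS[gender]
    if ratio < normal_below:
        return "normal weight"
    if ratio > obese_above:
        return "obesity"
    # Assumption: everything between the two bounds counts as over-weight
    return "over-weight"

print(dgsp_category(0.9375, "M"))  # over-weight
print(dgsp_category(0.75, "F"))    # normal weight
print(dgsp_category(0.90, "F"))    # obesity
```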
   
   
   





## Source Data 



## Definitions of Columns in CSV File
- **ID**: unique identifier of each person, integer
- **Waist**: circumference measured just above the belly button, integer
- **Hip**: circumference measured at the widest part of the hips, integer
- **Gender**: gender of each person (`M` or `F`), string



## Data Import

Whatever the type of analysis, we first need to read in the data.

The cells below show the basic way to read data from a file in Python.

For more information on this part, read Chapter 7 of your PY4E textbook.

In [1]:
# Goal: Extract the data from the file

# opens the w2h_data.csv for reading
f = open("w2h_data.csv", "r")

# loads the file into a list of strings, one string per line
raw_lines = list(f)

# closes the file
f.close()

In [2]:
raw_rows = [r.rstrip('\n').split(',') for r in raw_lines]

In [3]:
print(raw_rows)

[['ID', 'Waist', 'Hip', 'Gender'], ['1', '30', '32', 'M'], ['2', '32', '37', 'M'], ['3', '30', '36', 'M'], ['4', '33', '39', 'M'], ['5', '29', '33', 'M'], ['6', '32', '38', 'M'], ['7', '33', '42', 'M'], ['8', '30', '40', 'M'], ['9', '30', '37', 'M'], ['10', '32', '39', 'M'], ['11', '24', '35', 'F'], ['12', '25', '37', 'F'], ['13', '24', '37', 'F'], ['14', '22', '34', 'F'], ['15', '26', '38', 'F'], ['16', '26', '37', 'F'], ['17', '25', '38', 'F'], ['18', '26', '37', 'F'], ['19', '28', '40', 'F'], ['20', '23', '35', 'F']]
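
The same list of rows can also be produced with Python's built-in `csv` module, which takes care of stripping newlines and splitting on commas. The sketch below is an alternative, not the approach this notebook uses. To stay self-contained it reads from an in-memory string (`sample`, a hypothetical stand-in holding the first three lines of the file) rather than from `w2h_data.csv`.

```python
import csv
import io

# Hypothetical stand-in for the first three lines of w2h_data.csv
sample = "ID,Waist,Hip,Gender\n1,30,32,M\n2,32,37,M\n"

# csv.reader accepts any iterable of lines, such as an open file or a StringIO
with io.StringIO(sample) as f:
    sample_rows = list(csv.reader(f))

print(sample_rows)
# [['ID', 'Waist', 'Hip', 'Gender'], ['1', '30', '32', 'M'], ['2', '32', '37', 'M']]
```

With a real file, the `io.StringIO(sample)` call would be replaced by `open("w2h_data.csv", newline="")`; the `with` statement then closes the file automatically.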


Data are not useful when they have the wrong data type, wrong values, or missing values.

Cleaning up your data is an important step in any analysis.

In [4]:
# Goal: Scrub and convert the data, loading it into a new list called rows

# Strip the trailing newline '\n' from each line, then split it on commas:
#   rstrip('\n') removes newline characters from the right end of the string
#   split(',') breaks the string into a list of field values
raw_rows = [r.rstrip('\n').split(',') for r in raw_lines]

# Create a new list `rows`, starting with just the column names
rows = list()
rows.append(raw_rows[0])

# Convert each `raw_row`, starting with the second (the first is the header)
for raw_row in raw_rows[1:]:

    # The values in `raw_row` are all strings, so convert each item to the
    # right data type: ID, Waist, and Hip become integers; Gender stays a string
    row = [int(raw_row[0]), int(raw_row[1]), int(raw_row[2]), raw_row[3]]

    # Append the new `row` to the `rows` list
    rows.append(row)

# Print `rows` to check that the conversion worked
print(rows)

# From here on, use the `rows` list instead of `raw_rows` or `raw_lines`

[['ID', 'Waist', 'Hip', 'Gender'], [1, 30, 32, 'M'], [2, 32, 37, 'M'], [3, 30, 36, 'M'], [4, 33, 39, 'M'], [5, 29, 33, 'M'], [6, 32, 38, 'M'], [7, 33, 42, 'M'], [8, 30, 40, 'M'], [9, 30, 37, 'M'], [10, 32, 39, 'M'], [11, 24, 35, 'F'], [12, 25, 37, 'F'], [13, 24, 37, 'F'], [14, 22, 34, 'F'], [15, 26, 38, 'F'], [16, 26, 37, 'F'], [17, 25, 38, 'F'], [18, 26, 37, 'F'], [19, 28, 40, 'F'], [20, 23, 35, 'F']]
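
The conversion above assumes that every field is present and numeric. If a file could contain missing or malformed values, one option is to skip the rows that fail to convert. The sketch below is a minimal illustration of that idea; the helper name `convert_row` and the sample rows are hypothetical and are not part of the data set.

```python
def convert_row(raw_row):
    """Return [ID, Waist, Hip, Gender] with numbers as ints, or None if the row is bad."""
    if len(raw_row) != 4:
        return None
    try:
        return [int(raw_row[0]), int(raw_row[1]), int(raw_row[2]), raw_row[3]]
    except ValueError:
        # int() raises ValueError for empty or non-numeric strings
        return None

# Hypothetical rows: one valid, one with a missing waist, one with a missing column
samples = [['1', '30', '32', 'M'], ['2', '', '37', 'M'], ['3', '30', '36']]
cleaned = [row for row in map(convert_row, samples) if row is not None]
print(cleaned)  # [[1, 30, 32, 'M']]
```

Dropping bad rows silently is only one possible policy; depending on the analysis, it may be better to report them or stop with an error.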



## Calculations

Sometimes the data you are given do not directly contain the values you need, so you have to calculate them.

In this part, you calculate two new features: `W2H Ratio` and `Shape`.

In [7]:
# Goal: For each row of data calculate and store the w2h_ratio and shape.

# Adds columns for the two new features
rows[0].extend(["W2H Ratio", "Shape"])

# For each row in the rows list, calculate the waist to hips ratio and shape
for row in rows[1:]:
    # Calculate the w2h_ratio; `/` is true division, so two ints give a float
    w2h_ratio = row[1] / row[2]

    # Pick the cutoff by gender, using the obesity cutoffs quoted earlier:
    # 0.90 for males and 0.85 for females
    threshold = 0.90 if row[3] == 'M' else 0.85

    # Based on the ratio and the gender, set shape to either 'Apple' or 'Pear'
    if w2h_ratio < threshold:
        shape = 'Pear'
    else:
        shape = 'Apple'

    # Add the new data to the end of the row
    row += [w2h_ratio, shape]  # note: += is shorthand for the extend method used above
    

## Output

In an analysis report, it is always helpful to display your data.

The cell below is a very rudimentary way of displaying your data, including both the original features and the new features you just calculated.

In [8]:
# Goal: pretty print the rows as an HTML table

# Note: this works, but we can do this much better with pandas
html_table = '<table><tr><th>'
html_table += "</th><th>".join(rows[0])
html_table += '</th></tr>'
for row in rows[1:]:
    html_table += "<tr><td>"
    html_table += "</td><td>".join(str(col) for col in row)
    html_table += "</td></tr>"
html_table += "</table>"

from IPython.display import HTML, display
display(HTML(html_table))

ID,Waist,Hip,Gender,W2H Ratio,Shape
1,30,32,M,0.9375,Apple
2,32,37,M,0.8648648648648649,Pear
3,30,36,M,0.8333333333333334,Pear
4,33,39,M,0.8461538461538461,Pear
5,29,33,M,0.8787878787878788,Pear
6,32,38,M,0.8421052631578947,Pear
7,33,42,M,0.7857142857142857,Pear
8,30,40,M,0.75,Pear
9,30,37,M,0.8108108108108109,Pear
10,32,39,M,0.8205128205128205,Pear
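
The long decimals in the `W2H Ratio` column make the output harder to read. One possible refinement is to format floats to a fixed number of decimal places before displaying them. The sketch below assumes two decimal places are enough; `format_cell` is a hypothetical helper, and the two sample rows are copied from the output above.

```python
def format_cell(value):
    """Format floats with two decimal places; convert everything else with str()."""
    if isinstance(value, float):
        return f"{value:.2f}"
    return str(value)

# Two rows copied from the output above
sample_rows = [
    [1, 30, 32, 'M', 0.9375, 'Apple'],
    [2, 32, 37, 'M', 0.8648648648648649, 'Pear'],
]
for row in sample_rows:
    print(",".join(format_cell(col) for col in row))
# 1,30,32,M,0.94,Apple
# 2,32,37,M,0.86,Pear
```

The same helper could replace `str(col)` when building the HTML table, so that only the displayed text is rounded while the stored ratios keep their full precision.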
