# Health Stats Part 1: Waist-to-Hip Ratios

<!--- Write an explanation of the Waist-to-Hip Ratio statistic used by health professionals. Please include an explanation of what it is used for, exactly how it is calculated, and how to interpret the results. Note: Formatting matters. Make this as professional as you can using Markdown.  --->

<!--- feel free to use any web resources, including [Wikipedia](https://en.wikipedia.org/wiki/Waist%E2%80%93hip_ratio) or any other resources that you can find online. Just MAKE SURE you provide a link to every resource you decide to use. --->

<!--- Including the formula, or that fancy diagram/table you see on wikipedia is DEFINITELY a good idea! How? The LaTeX equations section in [This link](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) might help. --->

<!--- For extra points, try to create a table similar to the one on the wikipedia page on your own. --->


# Waist-hip ratio

The __waist-hip ratio__ or waist-to-hip ratio (WHR) is the dimensionless ratio of the circumference of the waist to that of the hips. This is calculated as waist measurement divided by hip measurement (W ÷ H). For example, a person with a 30″ (76 cm) waist and 38″ (97 cm) hips has a waist-hip ratio of about 0.78.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Waist-hip_ratio.svg/300px-Waist-hip_ratio.svg.png" alt="Waist-hip ratio diagram" />

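__To calculate the WHR, use this formula:__

$$ \mathrm{ratio}_{w2h} = \frac{w}{h} $$

where $w$ is the waist circumference and $h$ is the hip circumference, both measured in the same unit.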
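
A minimal Python sketch of the formula, applied to the worked example from the definition (30″ waist and 38″ hips, or 76 cm and 97 cm). The function name `waist_hip_ratio` is illustrative only; the one real requirement is that both measurements use the same unit, since the ratio is dimensionless.

```python
def waist_hip_ratio(waist: float, hip: float) -> float:
    """Return the dimensionless waist-to-hip ratio w / h.

    Both arguments must be in the same unit (inches or centimetres).
    """
    if hip <= 0:
        raise ValueError("hip circumference must be positive")
    return waist / hip


# Worked example from the text: 30 in waist, 38 in hips
print(round(waist_hip_ratio(30, 38), 3))  # 0.789
# The same person measured in whole centimetres
print(round(waist_hip_ratio(76, 97), 3))  # 0.784
```

The two results differ slightly only because the centimetre values are rounded to whole numbers; both round to about 0.78 to 0.79.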


The __WHR__ has been used as an indicator or measure of health, and of the risk of developing serious health conditions. WHR correlates with fertility, with different optimal values in males and females.

WHR is used as a measurement of obesity, which in turn is a possible indicator of other more serious health conditions. The WHO states that abdominal obesity is defined as a waist-hip ratio above 0.90 for males and above 0.85 for females, or a body mass index (BMI) above 30.0. The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) states that women with waist-hip ratios of more than 0.8, and men with more than 1.0, are at increased health risk because of their fat distribution.
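
The WHO and NIDDK cut-offs quoted above can be written as a small lookup. This is a sketch: the names `THRESHOLDS` and `exceeds_threshold` are illustrative, and it checks only the WHR part of the WHO definition (the alternative BMI above 30.0 criterion is left out).

```python
# Cut-offs quoted in the text: a ratio strictly ABOVE the value signals risk.
THRESHOLDS = {
    "WHO": {"M": 0.90, "F": 0.85},    # abdominal obesity
    "NIDDK": {"M": 1.0, "F": 0.80},   # increased health risk
}


def exceeds_threshold(ratio: float, sex: str, standard: str = "WHO") -> bool:
    """Return True if `ratio` is above the cut-off for `sex` ("M" or "F")."""
    return ratio > THRESHOLDS[standard][sex]


print(exceeds_threshold(0.9375, "M"))           # True  (WHO: above 0.90)
print(exceeds_threshold(0.9375, "M", "NIDDK"))  # False (NIDDK: not above 1.0)
print(exceeds_threshold(0.82, "F", "NIDDK"))    # True  (NIDDK: above 0.8)
```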

__Table 1__

The following table shows WHR standards for women and men for what is considered underweight, normal weight, overweight, and obese. The three pairs of columns were set by DGSB, WHO, and NIDDK, respectively (e.g. the first Women/Men pair was set by DGSB). A `?` marks a value that the source does not give.

                                                                                     
| Category | Women (DGSB) | Men (DGSB) | Women (WHO) | Men (WHO) | Women (NIDDK) | Men (NIDDK) |
|----------|--------------|------------|-------------|-----------|---------------|-------------|
| Underweight | ? | ? | ? | ? | ? | ? |
| Normal-weight | < 0.80 | < 0.90 | ? | ? | ? | ? |
| Over-weight | 0.80-0.84 | 0.90-0.99 | ? | ? | ? | ? |
| Obesity | > 0.85 | > 1.00 | > 0.85 | > 0.90 | > 0.80 | > 1.00 |

[1] Source: [Wikipedia, Waist-hip ratio](https://en.wikipedia.org/wiki/Waist%E2%80%93hip_ratio#cite_note-6)
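
The DGSB columns of Table 1 can be turned into a classifier. This is a hedged sketch: `dgsb_category` is a hypothetical helper, and because the table leaves the boundary values unassigned (0.85 for women and 1.00 for men fall between the Over-weight and Obesity ranges), the sketch assumes they count as Obesity. The table gives no underweight cut-off, so that class is never returned.

```python
def dgsb_category(ratio: float, sex: str) -> str:
    """Classify a WHR using the DGSB columns of Table 1.

    Assumption: boundary values the table leaves unassigned
    (0.85 for women, 1.00 for men) are counted as "Obesity".
    """
    # (upper bound for Normal-weight, lower bound for Obesity)
    normal_below, obese_from = {"F": (0.80, 0.85), "M": (0.90, 1.00)}[sex]
    if ratio < normal_below:
        return "Normal-weight"
    if ratio < obese_from:
        return "Over-weight"
    return "Obesity"


print(dgsb_category(0.9375, "M"))  # Over-weight
print(dgsb_category(0.70, "F"))    # Normal-weight
print(dgsb_category(0.86, "F"))    # Obesity
```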

                                                                                      

In [5]:
# NOTE: Markdown and LaTeX tips
# In a Markdown table, one |---| separator row directly under the header row is enough.
# $ ratio_{w2h} = \frac{w}{h} $ --> a $ at the beginning and at the end marks a formula
#     ratio (plain letters), _ (subscript what follows), {} (group what the previous command applies to)
#     \frac --> makes a fraction: \frac{numerator}{denominator}
#     the closing $ ends the formula

## Source Data 

<!--- Replace the text below with a Markdown bullet list that defines the columns of the CSV file. Be sure to indicate the data type for each column. --->

<!--- Example can be: ID, unique identifier of each person, integer. Remember you need to put this into a bullet list! How? [This link](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html) might help. --->

<!--- These two markdown cells are required in almost any analytical report. --->

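* `ID`: unique identifier of each person, integer
* `Waist`: waist circumference, integer (the file does not state the unit; the values look like inches)
* `Hip`: hip circumference, integer (same unit as `Waist`)
* `Gender`: `M` (male) or `F` (female), string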
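
The four columns of `w2h_data.csv` (`ID`, `Waist`, `Hip`, `Gender`) can also be described in code. A minimal sketch: the `Measurement` record and its `from_fields` helper are hypothetical names, not part of the assignment, and stripping whitespace (for example a stray Windows `\r`) is an added assumption.

```python
from __future__ import annotations

from typing import NamedTuple


class Measurement(NamedTuple):
    """One data row of w2h_data.csv, one typed field per CSV column."""
    id: int
    waist: int
    hip: int
    gender: str  # "M" or "F"

    @classmethod
    def from_fields(cls, fields: list[str]) -> Measurement:
        """Build a typed record from the four string fields of one CSV line."""
        id_, waist, hip, gender = (f.strip() for f in fields)
        if gender not in ("M", "F"):
            raise ValueError(f"unexpected gender value: {gender!r}")
        return cls(int(id_), int(waist), int(hip), gender)


print(Measurement.from_fields(["1", "30", "32", "M"]))
# Measurement(id=1, waist=30, hip=32, gender='M')
```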




## Data Import

Whatever the analysis, we first need to read in the data.

This is the most basic way to read data into Python.

For more information regarding this part, read Chapter 7 in your PY4E textbook.

In [1]:
# Goal: Extract the data from the file
#   open("file name", mode) returns a file object; mode "r" means read (text mode)
#   Options such as delimiter, usecols (which columns to load), dtype (one type for all
#   entries) and skiprows are parameters of NumPy readers like np.loadtxt() and
#   np.genfromtxt(), not of open()
#   A `with open(...)` block is often better, because it closes the file automatically

# opens w2h_data.csv for reading
f = open("w2h_data.csv", "r")
# Alternative: d = f.read(); f.close(); print(d) reads the whole file as ONE string

# loads the file into a list of strings, one string per line
#   list(iterable) builds a list from any iterable; iterating over a file object
#   yields one line at a time, with its trailing '\n' still attached
raw_lines = list(f)

# closes the file and releases the file handle
f.close()
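
The comments above mention `with open(...)` as an alternative. A minimal sketch of the same import written that way; `read_lines` is a hypothetical helper name, and the file name comes from the cell above.

```python
def read_lines(path: str) -> list:
    """Return the lines of a text file as a list of strings.

    Same result as `f = open(path, "r"); raw_lines = list(f); f.close()`,
    but the `with` block closes the file automatically, even if an error occurs.
    """
    with open(path, "r") as f:
        return list(f)


# Usage (assumes w2h_data.csv is in the working directory):
# raw_lines = read_lines("w2h_data.csv")
```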

Data are not useful when they have the wrong data type, wrong values, or missing values.

Cleaning up your data is an important step in any analysis.

In [2]:
# Goal: Scrub and convert the data, loading it into a new list called rows

# Strip the newline '\n' from each line and split it into a list of fields.
#   .rstrip('\n') removes trailing newline characters from the right end of the string
#   .split(',') cuts the string at every comma and returns the pieces as a list
#   The square brackets make this a list comprehension: the expression on the left
#   is evaluated once for every r in raw_lines, and the results are collected into
#   a new list, so raw_rows is a list of lists (one inner list per line)
raw_rows = [r.rstrip('\n').split(',') for r in raw_lines]

# Create a new list `rows`, starting with just the column names.
#   .append() adds its argument as a single element to the end of a list.
#   The header row is copied unchanged because its values are names, not numbers;
#   the loop below converts only the data rows.
rows = list()
rows.append(raw_rows[0])  # a trailing ';' only hides a cell's last output in Jupyter; not needed here

# Convert each `raw_row`, starting with the second (index 1 skips the header)
for raw_row in raw_rows[1:]:

    # Note: the values in the `raw_row` list are all strings.
    # Convert ID, Waist and Hip to int so they can be used in calculations;
    # Gender stays a string ('M' or 'F').
    row = [int(raw_row[0]), int(raw_row[1]), int(raw_row[2]), raw_row[3]]

    # Append the converted `row` to the `rows` list
    rows.append(row)

# From here on, use the `rows` list instead of `raw_rows` or `raw_lines`.
# Printing `rows` is a quick check that the conversion worked.
print(rows)

[['ID', 'Waist', 'Hip', 'Gender'], [1, 30, 32, 'M'], [2, 32, 37, 'M'], [3, 30, 36, 'M'], [4, 33, 39, 'M'], [5, 29, 33, 'M'], [6, 32, 38, 'M'], [7, 33, 42, 'M'], [8, 30, 40, 'M'], [9, 30, 37, 'M'], [10, 32, 39, 'M'], [11, 24, 35, 'F'], [12, 25, 37, 'F'], [13, 24, 37, 'F'], [14, 22, 34, 'F'], [15, 26, 38, 'F'], [16, 26, 37, 'F'], [17, 25, 38, 'F'], [18, 26, 37, 'F'], [19, 28, 40, 'F'], [20, 23, 35, 'F']]
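
The list comprehension in the cleaning cell can be checked against an explicit loop. The three input lines below are a short hypothetical excerpt in the same format as `w2h_data.csv`, used so the example runs without the file; the variable names are chosen so they do not overwrite the notebook's own `raw_lines` or `raw_rows`.

```python
sample_lines = ["ID,Waist,Hip,Gender\n", "1,30,32,M\n", "11,24,35,F\n"]

# List comprehension, as in the cell above: [expression for item in iterable]
via_comprehension = [r.rstrip('\n').split(',') for r in sample_lines]

# The same thing written as an explicit loop
via_loop = []
for r in sample_lines:
    stripped = r.rstrip('\n')           # "1,30,32,M\n" -> "1,30,32,M"
    via_loop.append(stripped.split(','))  # -> ['1', '30', '32', 'M']

print(via_comprehension == via_loop)  # True
print(via_comprehension[1])           # ['1', '30', '32', 'M']
```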


## Calculations

Sometimes the data given to you do not contain the values you need directly, so you have to calculate them.

In this part, you calculate two new features, namely `W2H Ratio` and `Shape`.

In [3]:
# Goal: For each row of data calculate and store the w2h_ratio and shape.
#   Shape describes where body fat is stored: "Apple" = mostly around the waist
#   (higher ratio), "Pear" = mostly around the hips (lower ratio).

# Add column names for the two new features to the header row.
#   .extend() adds each item of the given list as a separate element,
#   so the header grows from 4 to 6 column names.
rows[0].extend(["W2H Ratio", "Shape"])

# For each data row (skipping the header), calculate the ratio and shape
for row in rows[1:]:

    # Waist (index 1) divided by hip (index 2).
    # In Python 3, `/` always returns a float, so int / int is safe here.
    w2h_ratio = row[1] / row[2]

    # Classify with the WHO cut-offs from the table above:
    # above 0.90 for men or above 0.85 for women -> "Apple", otherwise "Pear".
    shape = "Pear"
    if row[3] == "M" and w2h_ratio > 0.90:
        shape = "Apple"
    elif row[3] == "F" and w2h_ratio > 0.85:
        shape = "Apple"

    # For lists, `row += [...]` extends `row` IN PLACE (like .extend()).
    # Because `row` is the same list object stored in `rows`, this updates `rows`
    # (`row = row + [...]` would build a new list and leave `rows` unchanged).
    # The values line up with the header because they are added in the same order
    # as "W2H Ratio" and "Shape": position 4 holds the ratio, position 5 the shape.
    row += [w2h_ratio, shape]

print(rows)

[['ID', 'Waist', 'Hip', 'Gender', 'W2H Ratio', 'Shape'], [1, 30, 32, 'M', 0.9375, 'Apple'], [2, 32, 37, 'M', 0.8648648648648649, 'Pear'], [3, 30, 36, 'M', 0.8333333333333334, 'Pear'], [4, 33, 39, 'M', 0.8461538461538461, 'Pear'], [5, 29, 33, 'M', 0.8787878787878788, 'Pear'], [6, 32, 38, 'M', 0.8421052631578947, 'Pear'], [7, 33, 42, 'M', 0.7857142857142857, 'Pear'], [8, 30, 40, 'M', 0.75, 'Pear'], [9, 30, 37, 'M', 0.8108108108108109, 'Pear'], [10, 32, 39, 'M', 0.8205128205128205, 'Pear'], [11, 24, 35, 'F', 0.6857142857142857, 'Pear'], [12, 25, 37, 'F', 0.6756756756756757, 'Pear'], [13, 24, 37, 'F', 0.6486486486486487, 'Pear'], [14, 22, 34, 'F', 0.6470588235294118, 'Pear'], [15, 26, 38, 'F', 0.6842105263157895, 'Pear'], [16, 26, 37, 'F', 0.7027027027027027, 'Pear'], [17, 25, 38, 'F', 0.6578947368421053, 'Pear'], [18, 26, 37, 'F', 0.7027027027027027, 'Pear'], [19, 28, 40, 'F', 0.7, 'Pear'], [20, 23, 35, 'F', 0.6571428571428571, 'Pear']]


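__Note:__ running the Calculations cell twice in a row adds the `W2H Ratio` and `Shape` columns and their values a second time, because `rows[0].extend(...)` and `row += [...]` modify `rows` in place. How can we prevent this?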
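
One way to make the Calculations step safe to re-run is to build a new table instead of modifying `rows` in place, and to skip the work if the header already has a `W2H Ratio` column. A sketch under those assumptions: `add_features` is a hypothetical helper that uses the same WHO cut-offs (0.90 for men, 0.85 for women) as the notebook, and unlike the notebook it raises a `KeyError` for a gender other than `'M'` or `'F'`.

```python
def add_features(rows):
    """Return a NEW table with 'W2H Ratio' and 'Shape' columns added.

    The input table is not modified, and if its header already contains
    'W2H Ratio' an unchanged copy is returned, so re-running is harmless.
    Shape: ratio above 0.90 (men) or above 0.85 (women) -> "Apple", else "Pear".
    """
    header = rows[0]
    if "W2H Ratio" in header:                   # features already present
        return [list(row) for row in rows]
    cutoff = {"M": 0.90, "F": 0.85}             # KeyError for any other gender
    result = [header + ["W2H Ratio", "Shape"]]  # `+` creates a new list
    for row in rows[1:]:
        ratio = row[1] / row[2]                 # waist / hip
        shape = "Apple" if ratio > cutoff[row[3]] else "Pear"
        result.append(row + [ratio, shape])     # new list; `row` is untouched
    return result


sample_rows = [['ID', 'Waist', 'Hip', 'Gender'], [1, 30, 32, 'M'], [11, 24, 35, 'F']]
once = add_features(sample_rows)
twice = add_features(once)
print(once[0])         # ['ID', 'Waist', 'Hip', 'Gender', 'W2H Ratio', 'Shape']
print(twice == once)   # True
print(sample_rows[0])  # ['ID', 'Waist', 'Hip', 'Gender']
```

In the notebook, the in-place loop could then be replaced by `rows = add_features(rows)`, which gives the same table no matter how many times the cell is run.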

## Output 

In your analysis report, it is always helpful to display your data somehow.

This is a very rudimentary way to display your data, including the original features and the new features you just calculated.

*Discussion answers follow the output.*

In [4]:
# Goal: pretty print the rows as an HTML table

# Note: this works, but we can do this much better with pandas
from IPython.display import HTML, display

# html_table is a plain string that we build up piece by piece.
#   <table> starts a table, <tr> starts a table row, <th> starts a header cell
html_table = '<table><tr><th>'

# Join the column names with "</th><th>", which closes one header cell and
# opens the next, so every column name gets its own header cell
html_table += "</th><th>".join(rows[0])

# Close the last header cell and the header row
html_table += '</th></tr>'

# One table row per data row; <td> is a regular (data) cell
for row in rows[1:]:
    html_table += "<tr><td>"
    # str(col) turns every value (int, float, str) into text, because join()
    # only accepts strings; "</td><td>" is placed BETWEEN neighbouring values
    html_table += "</td><td>".join(str(col) for col in row)
    html_table += "</td></tr>"
html_table += "</table>"

# Render the HTML string in the notebook
display(HTML(html_table))


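| ID | Waist | Hip | Gender | W2H Ratio | Shape |
|----|-------|-----|--------|-----------|-------|
| 1 | 30 | 32 | M | 0.9375 | Apple |
| 2 | 32 | 37 | M | 0.8648648648648649 | Pear |
| 3 | 30 | 36 | M | 0.8333333333333334 | Pear |
| 4 | 33 | 39 | M | 0.8461538461538461 | Pear |
| 5 | 29 | 33 | M | 0.8787878787878788 | Pear |
| 6 | 32 | 38 | M | 0.8421052631578947 | Pear |
| 7 | 33 | 42 | M | 0.7857142857142857 | Pear |
| 8 | 30 | 40 | M | 0.75 | Pear |
| 9 | 30 | 37 | M | 0.8108108108108109 | Pear |
| 10 | 32 | 39 | M | 0.8205128205128205 | Pear |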
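
The line `"</td><td>".join(str(col) for col in row)` from the output cell can be unpacked step by step on the first data row of the table. The notebook passes a generator expression to `join()`; the sketch uses a list comprehension instead, which yields the same items and is easier to print.

```python
sample_row = [1, 30, 32, 'M', 0.9375, 'Apple']

# Step 1: convert every value to a string (join() raises TypeError on ints/floats)
as_text = [str(col) for col in sample_row]
print(as_text)  # ['1', '30', '32', 'M', '0.9375', 'Apple']

# Step 2: join() places the separator only BETWEEN neighbouring items...
inner = "</td><td>".join(as_text)
print(inner)    # 1</td><td>30</td><td>32</td><td>M</td><td>0.9375</td><td>Apple

# Step 3: ...so the cell adds the opening "<tr><td>" and closing "</td></tr>" itself
print("<tr><td>" + inner + "</td></tr>")
```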


# Discussion Questions

* How long did it take you to figure out how to do a bullet list in Markdown? What other formatting tricks did you try?

    Not too long, since I heard you say that we have to double-click a Markdown cell to see the first assignment's instructions.
    I found a great formatting reference: [WordPress Markdown Quick Reference](https://en.support.wordpress.com/markdown-quick-reference/).
    I tried italic and bold text, headers, a table, an inserted image, a source reference, and formulas.

* Was there any code that you thought was particularly elegant? How about cryptic or buggy?
    
    This was genius: `raw_rows = [r.rstrip('\n').split(',') for r in raw_lines]`.
    My only question was why the square brackets are needed. They make the expression a list comprehension, which applies `r.rstrip('\n').split(',')` to every `r` in `raw_lines` and collects the results in a new list.

    Moreover, while I could explain the following code, I did not understand exactly how it works:
    `html_table += "</td><td>".join(str(col) for col in row)`,
    `html_table += "</td></tr>"`,
    `html_table += "</table>"`.

    Also, if you run the Calculations cell twice, it adds the `W2H Ratio` and `Shape` columns and their values again. The cause is that `rows[0].extend(...)` and `row += [...]` modify `rows` in place, so each run appends to what the previous run left behind.

* What does the code `raw_lines = list(f)` in the first code cell do exactly? Where can we learn more about loading files? Why do we bother closing the file at the end of the cell?

    `list(f)` iterates over the file object `f`, which yields one line at a time (each ending in `'\n'`, except possibly the last), and collects those lines into a list of strings, one string per line.
    We can learn more in Chapter 7 of the PY4E textbook and in DataCamp's course on importing data, which I am taking.
    We close the file to release the file handle the operating system gave us; it is good practice, especially if the file will be opened again.
    Alternatives are `np.loadtxt()`, `np.genfromtxt()`, or a `with open(...)` block, which closes the file automatically.
        
        
  * In the second code cell, why do we try to clean up the data all at once? Why not just deal with it as raw strings?
      
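      Converting everything once, up front, means every later cell can rely on the values having the right type. As raw strings, the numbers could not be used in calculations (`'30' / '32'` raises a `TypeError`) and would compare alphabetically rather than numerically (`'9' > '10'` is `True`). Doing it in one place also keeps the conversion code out of every later calculation.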
      
  * What is going on in the line below, also from the second code cell?  
  ```raw_rows = [r.rstrip('\n').split(',') for r in raw_lines]```
    
      `.rstrip('\n')` removes newline characters from the right end of each line string `r`.
      `.split(',')` cuts the string at every comma and returns the pieces as a list, e.g. `'1,30,32,M'` becomes `['1', '30', '32', 'M']`.
      The result is a list of lists: `[column names], [ID, waist, hip, gender], [...]`.
      `for r in raw_lines` runs the expression once for every line in `raw_lines`, and the result is stored in a new variable called `raw_rows`.
      The square brackets make the whole expression a list comprehension, which collects the result for each line into a new list.
    
  * What does this do?  
  ```for raw_row in raw_rows[1:]:```
      
      It loops over every element of `raw_rows` except the first: the slice `raw_rows[1:]` starts at index 1, so the header row with the column names is skipped.
      
  * In the third code cell, a list is extended by another list. What does that mean and how is that different from appending list items to the list? How could we do the same thing using `append()`?
      
      `extend()` adds each item of another list (any iterable) to the end of the list as a separate element, so `rows[0].extend(["W2H Ratio", "Shape"])` grows the header from 4 to 6 names. For lists, `row += [w2h_ratio, shape]` does the same thing in place.
      `append()` adds its single argument as one element, so `rows[0].append(["W2H Ratio", "Shape"])` would add one nested list as the fifth element instead of two new column names.
      To get the same result with `append()`, call it once per item: `rows[0].append("W2H Ratio")` followed by `rows[0].append("Shape")`.
      
  * When might the calculation
  ```w2h_ratio = row[1]/row[2]``` give inaccurate results?
      
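      When the code runs under Python 2, where `/` between two integers is integer division (`30 / 32` gives `0`), so the ratio is wrong unless one value is converted to `float` first. It can also mislead if waist and hip are measured in different units, and the whole-number measurements in this dataset limit the precision of the ratio. If the header row were not skipped, or the values were still strings, the division would fail with a `TypeError` rather than return a number.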
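
The answers on `extend()` versus `append()` and on integer division can be checked directly in Python 3; the `//` operator below reproduces what `/` did with two integers in Python 2. The variable names are illustrative.

```python
header = ['ID', 'Waist', 'Hip', 'Gender']

# extend() adds each item of the iterable as a separate element
with_extend = list(header)
with_extend.extend(["W2H Ratio", "Shape"])
print(with_extend)  # ['ID', 'Waist', 'Hip', 'Gender', 'W2H Ratio', 'Shape']

# append() adds its single argument as ONE element, so the list gets nested
with_append = list(header)
with_append.append(["W2H Ratio", "Shape"])
print(with_append)  # ['ID', 'Waist', 'Hip', 'Gender', ['W2H Ratio', 'Shape']]

# Matching extend() with append() takes one call per item
append_each = list(header)
for name in ["W2H Ratio", "Shape"]:
    append_each.append(name)
print(append_each == with_extend)  # True

# Python 3: `/` is true division; `//` mimics Python 2's int / int
print(30 / 32)   # 0.9375
print(30 // 32)  # 0
```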