# Project Final Submission Template

### Step 1a: Planning 
#### Identify the information in the file your program will read

Describe (all) the information that is available. Be sure to note any surprising or unusual features. (For example, some information sources have missing data, which may be blank or flagged using values like -99, NaN, or something else.)

<font color="blue">The information provided in the file is:
- <font color="blue">patient number, age (in days)
- <font color="blue">height (cm), 
- <font color="blue">weight (kg), 
- <font color="blue">gender (2 for male, 1 for female), 
- <font color="blue">systolic blood pressure (mmHg), 
- <font color="blue">diastolic blood pressure (mmHg), 
- <font color="blue">cholesterol (1 for normal cholesterol levels, 2 for above normal, 3 for well above normal), 
- <font color="blue">glucose (1 for normal glucose levels, 2 for above normal, 3 for well above normal), 
- <font color="blue">whether they smoke (0 for no, 1 for yes),
- <font color="blue">whether they drink (0 for no, 1 for yes), 
- <font color="blue">whether they work out (0 for no, 1 for yes), 
- <font color="blue">and whether they have CVD (0 for no, 1 for yes).
 
 
 
 
- <font color="blue">The ages in the dataset are consistent with plausible human lifetimes. We will keep only patients between 18 years old and 122 years old (the age of the oldest person to have lived) [1].
- <font color="blue">All of the heights listed are positive, but some are very tall (though not unheard of) and some are extremely short for an adult. We plan to remove heights below 55 cm or above 267 cm. These are the most extreme heights recorded, so values outside this range are unreasonable [2].
- <font color="blue">Some weights also seem extremely small. We plan to remove weights below 5.9 kg or above 595 kg. These are the most extreme cases recorded worldwide, so values outside this range are unreasonable [3], [4].
- <font color="blue">We will also check each patient's BMI, computed from their height and weight. This ensures that an extreme height is not paired with a disproportionate weight, or vice versa. The BMI must be between 7.5 and 188; otherwise the row is removed as unreasonable [5].
- <font color="blue">Some systolic blood pressures are negative or in the thousands. We plan to remove those below 30 mmHg or above 370 mmHg, for similar reasons to those above [6], [7].
- <font color="blue">Some diastolic blood pressures are negative or in the thousands. We plan to remove those below 10 mmHg or above 360 mmHg, for similar reasons to those above [6], [7].
- <font color="blue">All gender, cholesterol, glucose, alcohol, physical activity, and CVD values fall within their permitted integer codes (three codes for cholesterol and glucose, two for the others). This will still be checked.



<font color="blue">[1] Eckart, Kim. 2021. “How long can a person live? The 21st century may see a record-breaker.” University of Washington News Blog (blog). University of Washington. Jul 1. https://www.washington.edu/news/2021/07/01/how-long-can-a-person-live-the-21st-century-may-see-a-record-breaker/


<font color="blue">[2] Grossman, Samantha. 2014. “Here’s a Picture of the World’s Tallest Man and the World’s Shortest Man Shaking Hands.” Newsfeed. Time. Nov 13. https://time.com/3583663/worlds-tallest-man-shortest-man-shaking-hands/

<font color="blue">[3] Guinness World Records. “Lightest Person.” Guinness World Records. Accessed Nov 20 2022. https://www.guinnessworldrecords.com/world-records/67485-lightest-person

<font color="blue">[4] Young, Sarah. 2018. “The world’s most obese man is attempting to lose weight.” Lifestyle. The Independent. Jan 6. https://www.independent.co.uk/life-style/juan-pedro-franco-world-s-heaviest-man-mexico-obesity-weight-loss-a8145396.html

<font color="blue">[5] BMI = EDI. 2011. “Part V Fat: No More Fear, No More Contempt.” EDI. Dec 8. https://edinstitute.org/blog/2011/12/8/part-v-fat-no-more-fear-no-more-contempt

<font color="blue">[6] All American Hospice. 2021. “How Dangerous is Low Blood Pressure?” (blog). All American Hospice. Nov 25. https://myallamericanhospice.com/dangerous-low-blood-pressure/

<font color="blue">[7] Narloch J, M Brandstater. 1995. “Influence of Breathing Technique on Arterial Blood Pressure During Heavy Weight Lifting.” Archives of physical medicine and rehabilitation. (May): https://pubmed.ncbi.nlm.nih.gov/7741618/
</font>
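
The limits listed above can be collected in one place and checked with a single helper. The sketch below is only an illustration: the names `LIMITS` and `is_plausible` are hypothetical and are not part of the submitted program, which performs these checks in its own filtering functions.

```python
from typing import Dict, Tuple

# Inclusive plausibility limits taken from the planning notes above.
# LIMITS and is_plausible are hypothetical names used only in this sketch.
LIMITS = {
    "age_years": (18, 122),
    "height_cm": (55, 267),
    "weight_kg": (5.9, 595),
    "systolic_mmHg": (30, 370),
    "diastolic_mmHg": (10, 360),
    "bmi": (7.5, 188),
}  # type: Dict[str, Tuple[float, float]]


def is_plausible(field: str, value: float) -> bool:
    """Return True if value lies within the inclusive limits for field."""
    low, high = LIMITS[field]
    return low <= value <= high


print(is_plausible("height_cm", 183))       # True
print(is_plausible("systolic_mmHg", -120))  # False
```

Keeping the bounds in one table makes it easy to compare them against the cited sources when a limit is revised.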

### Step 1b: Planning 
#### Brainstorm ideas for what your program will produce
#### Select the idea you will build on for subsequent steps

You must brainstorm at least three ideas for graphs or charts that your program could produce and choose the one that you'd like to work on. You can choose between a line chart, histogram, bar chart, scatterplot, or pie chart.

If you would like to change your project idea from what was described in the proposal, you will need to get permission from your project TA. This is intended to help ensure that your new project idea will meet the requirements of the project. Please see the project proposal for things to be aware of when communicating with your project TA.

<font color="blue">
    
- Histogram: the CVD data is shown in one color and the non-CVD data in another. Multiple categories (symptoms and risk-adding behaviours) are placed on the x-axis, and the y-axis shows the proportion of patients with and without CVD who present each attribute. This makes it possible to compare the level of “risk factors” between those with and without CVD.
- Scatterplots: the input could be a single risk-factor attribute, and the scatterplot would show patient age against the proportion of patients of that age who present the attribute, plotted separately for those with and without CVD.
- Pie charts: each pie chart would show, among the patients who have a given attribute, the proportion with and without CVD. Instead of the proportion of a CVD or non-CVD group that has the risk factor, we would find the proportion of people with the risk factor who do or do not have CVD. We could then make one pie chart per risk factor.

    
    
We have decided to go with the histogram because it shows the proportions of the risk factors between those with and without CVD more clearly, while also showing the proportions within each of the with/without groups. Overall, it also provides more information to observe the data by age category rather than as a change across ages for a single risk factor.
</font>
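
A rough sketch of how the chosen chart could be laid out with `matplotlib` is shown below. The percentages are placeholders, not results from the data set, and the names `grouped_bar_positions` and `draw_sketch` are hypothetical. The only real logic is the offset calculation that places two bars side by side for each category.

```python
from typing import List


def grouped_bar_positions(n_categories: int, bar_width: float) -> List[List[float]]:
    """Return the x positions of the left and right bar for each category."""
    left = []   # type: List[float]
    right = []  # type: List[float]
    for i in range(n_categories):
        left.append(i - bar_width / 2)
        right.append(i + bar_width / 2)
    return [left, right]


def draw_sketch() -> None:
    """Draw a grouped bar chart using placeholder percentages."""
    # Imported here so the helper above can be used without matplotlib.
    import matplotlib.pyplot as plt

    categories = ["BMI", "Systolic", "Diastolic", "Cholesterol", "Smoke", "Exercise"]
    # Placeholder values for illustration only, NOT computed from the file.
    with_cvd = [60.0, 55.0, 40.0, 35.0, 10.0, 75.0]
    without_cvd = [45.0, 25.0, 20.0, 20.0, 9.0, 80.0]

    width = 0.4
    positions = grouped_bar_positions(len(categories), width)
    plt.bar(positions[0], with_cvd, width, label="CVD")
    plt.bar(positions[1], without_cvd, width, label="No CVD")
    plt.xticks(list(range(len(categories))), categories)
    plt.ylabel("Patients with attribute (%)")
    plt.legend()
    plt.show()
```

Calling `draw_sketch()` in a notebook cell would display the chart. Each category is centred on an integer x position, with the CVD bar to its left and the non-CVD bar to its right.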

### Step 1c: Planning 
#### Write or draw examples of what your program will produce

You must include an image that shows what your chart or plot will look like. You can insert an image using the Insert Image command near the bottom of the Edit menu.

![CPSC%20milestone%20histogram%20draw.tiff](attachment:CPSC%20milestone%20histogram%20draw.tiff)

### Step 2a: Building
#### Document which information you will represent in your data definitions

Before you design data definitions in the code cell below, you must explicitly document here which information in the file you chose to represent and why that information is crucial to the chart or graph that you'll produce when you complete step 2c.

<font color="blue">

We want to make a histogram with the diastolic pressure, systolic pressure, elevated cholesterol, smoking habits, exercise habits, and BMI. We will need:
    
- <font color="blue">Age (in days) -> Required to sort and put patient in respective age range</font>
    
- <font color="blue">Height (cm) to calculate BMI, which is a category in the histogram
    
- <font color="blue">Weight (kg) to calculate BMI
    
- <font color="blue">Systolic blood pressure (mmHg) -> Used as one of the categories on the histogram
    
- <font color="blue">Diastolic blood pressure (mmHg) -> Used as one of the categories on the histogram
    
- <font color="blue">cholesterol levels (1 for normal cholesterol levels, 2 for above normal, 3 for well above normal) -> Used as one of the categories on the histogram
    
- <font color="blue">Smoking habits (0 for no, 1 for yes) -> Used as one of the categories on the histogram
    
- <font color="blue">Exercise habits (0 for no, 1 for yes) -> Used as one of the categories on the histogram
    
Note: BMI will be calculated based on height and weight, and will be one of the categories on the graph.

</font>
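
Because height is recorded in centimetres and weight in kilograms, the BMI category relies on the standard definition of BMI, with the height converted to metres first:

$$
\mathrm{BMI} = \frac{w}{(h/100)^{2}}
$$

Here $w$ is the weight in kg and $h$ is the height in cm. As a worked example, a patient with $h = 183$ and $w = 73.7$ has $\mathrm{BMI} = 73.7 / 1.83^{2} = 73.7 / 3.3489 \approx 22.0$.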

#### Design data definitions

In [None]:
from cs103 import *
import csv
from typing import NamedTuple, List
from enum import Enum
import matplotlib.pyplot as plt

##################
# Data Definitions

ElevationLevel = Enum("ElevationLevel", ["normal","above_normal","well_above_normal"])
#interp. the possible elevation levels of a measured component in the patient: normal, above normal, and
#well above normal (in this case, for cholesterol).

#examples are redundant for enums

@typecheck
def fn_for_elevation_level(el: ElevationLevel)-> ...:
    #template based on atomic distinct and one-of rule (3 times)
    if el == ElevationLevel.normal:
        return ...
    elif el == ElevationLevel.above_normal:
        return ...
    elif el == ElevationLevel.well_above_normal:
        return ...

    
    
Patient = NamedTuple("Patient", [("age", int), #in days [1,...]
                                ("height", int), # in cm [0,...]
                                ("weight", float), #in kg [0,...]
                                ("sys_psi", int), #in mmHg [0,...]
                                ("dias_psi", int), #in mmHg [0,...]
                                ("cholesterol", ElevationLevel), 
                                ("smoke", bool), #False is no, True is yes
                                ("exercise", bool), #False is no, True is yes
                                ("CVD", bool)]) #False is no, True is yes
#interp. information about a patient from age (age), height(height), weight (weight), 
#systolic blood pressure (sys_psi), diastolic blood pressure (dias_psi), cholesterol 
#elevation level (cholesterol), whether they smoke or not (smoke), whether they exercise or not (exercise),
#whether they have CVD or not (CVD).

#healthy patient with normal readings. No smoke/CVD, yes exercise, 50yo
P0 = Patient(18614, 183, 73.7, 120, 80, ElevationLevel.normal, False, True, False)

#too young to be included
P1 = Patient(25, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False)

#too short, too low diastolic bp, elevated cholesterol, 18yo
            #--> unrelated measures therefore combined for simplicity
P2 = Patient(6570, 25, 87.3, 120, 2, ElevationLevel.above_normal, False, True,False)

#too tall, too high diastolic bp, very elevated cholesterol, Yes smoke/CVD, no exercise, 35yo
        #-> unrelated measures therefore combined for simplicity
P3 = Patient(13139, 300, 83.7, 120, 500, ElevationLevel.well_above_normal,True,False,True)

#above the height limit (272 > 267), exact limit upper dias_psi, 36yo -->  unrelated measures therefore combined for simplicity
P11 = Patient(13140, 272, 83.7, 120, 360, ElevationLevel.normal,True, False, True)

#exact limit short, low but valid dias_psi (30), 51yo  --> unrelated measures therefore combined for simplicity
P4 = Patient(18615, 55, 83.7, 120, 30, ElevationLevel.normal,False,True,False)

#too low weight, too low sys_psi, 50yo --> unrelated measures therefore combined for simplicity
P5 = Patient(18250, 183, 1, 2,80,ElevationLevel.normal,False,True,False)

#too high weight, too high sys_psi, 50yo --> unrelated measures therefore combined for simplicity
P6 = Patient(18250, 183, 900, 400, 80, ElevationLevel.normal, True, False, True)

#above the weight limit (653 > 595), exact limit upper sys_psi, 79yo --> unrelated measures therefore combined for simplicity
P7 = Patient(29199, 183, 653, 370, 80, ElevationLevel.above_normal, True, False, True)

#below the weight limit (2.13 < 5.9), low but valid sys_psi (45), 80yo --> unrelated measures therefore combined for simplicity
P8 = Patient(29200, 183, 2.13, 45, 80, ElevationLevel.well_above_normal, True, False, True)


@typecheck
def fn_for_patient(p: Patient)-> ...:
    #template based on compound (9 fields)
    return ... (p.age,
               p.height,
               p.weight,
               p.sys_psi,
               p.dias_psi,
               p.cholesterol,
               p.smoke,
               p.exercise,
               p.CVD)

                
# List[Patient]
# interp. a list of Patients

LOP0 = []
LOP1 = [P0,P1,P2,P3,P4,P5,P6,P7,P8]

@typecheck
def fn_for_lop(lop: List[Patient]) -> ...:
    #template based on arbitrary-sized and reference rule
    #describe accumulator
    acc = ...   #type:
    for patient in lop:
        acc = ... (acc, fn_for_patient(patient))
        
    return acc

: 
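
The data definitions above rely on two standard-library tools, `Enum` and `NamedTuple`. The minimal sketch below shows how they behave, using the hypothetical names `Level` and `Reading` rather than the project's own types. It leaves out the course's `@typecheck` decorator so that it runs without `cs103`.

```python
from enum import Enum
from typing import NamedTuple

# Functional Enum API: members are numbered 1, 2, 3 in the order given.
Level = Enum("Level", ["normal", "above_normal", "well_above_normal"])

# A compound value with named, typed fields (hypothetical example type).
Reading = NamedTuple("Reading", [("sys_mmHg", int), ("chol", Level)])

r = Reading(120, Level.normal)

# Fields are read by name, and enum members are compared by identity.
print(r.sys_mmHg)                    # 120
print(r.chol == Level.normal)        # True
print(Level(2) == Level.above_normal)  # True
```

Because the members are numbered from 1, the integer codes 1 to 3 used for cholesterol in the file line up with the three enum members in order.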

### Step 2b and 2c: Building
#### Design a function to read the information and store it as data in your program
#### Design functions to analyze the data


Complete these steps in the code cell below. You will likely want to rename the analyze function so that the function name describes what your analysis function does.

Unless approved by your project TA, you **cannot** use libraries such as `numpy` or `pandas`. The project is meant as a way for you to demonstrate your knowledge of the learning goals in this course. While it is convenient to use external libraries, they would do all the work and would not help us gauge your mastery of the concepts.

You also cannot use built in list functions (e.g., `sum` or `average`) when writing code to do your substantial computation. Normally we encourage you to make use of what is already available but in this case, the final project involves demonstrating skills from class (e.g., how to work with a list). Using pre-built functions for this does not enable you to demonstrate what you know.
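
As an illustration of the restriction above, a percentage can be computed with explicit accumulators instead of built-in list functions. This is a minimal sketch with a hypothetical function name, `percent_true`. It is written without the course's `@typecheck` decorator so that it runs on its own.

```python
from typing import List


def percent_true(flags: List[bool]) -> float:
    """Return the percentage of True values in flags, or 0 for an empty list."""
    # number of True values seen so far
    count = 0  # type: int
    # number of values seen so far
    total = 0  # type: int
    for flag in flags:
        if flag:
            count = count + 1
        total = total + 1
    if total == 0:
        return 0.0
    return count / total * 100


print(percent_true([True, False, True, True]))  # 75.0
```

The empty-list check avoids a division by zero, which matters when a filtered group contains no patients.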

If you wish to change your project idea, you must **first** obtain permission from your TA. When contacting your TA, please provide a valid reason for why you want to change your project. Each time you change your topic idea, your TA will have to evaluate it to see if it will meet all of the project requirements. This is a non-trivial task during one of the busiest times of the semester. As such, the deadline for project idea changes will be 3 business days before the deadline. Note that the deliverable deadline will not be extended and there is no compensation for the time you spent on the previous idea.

In [None]:
#############
#main function

@typecheck
def main(filename: str, age: int) -> None:
    """
    Reads the file from the given filename, analyzes the data for patients in the same age range
    as age (in years), and displays a bar graph with the proportion of each category
    (BMI, systolic, diastolic, cholesterol, smoke, exercise) for those with CVD
    and those without, side by side.
    """
    # Template from HtDAP, based on function composition 
    return analyze_bar_create(read(filename), age) 

#tests for main are in a cell below all the analyze/read etc functions

: 

In [None]:
###########
# Functions for read

@typecheck
def read(filename: str) -> List[Patient]:
    """    
    reads information from the specified file and returns a list of patients.
    Filters through the data in the CSV file to ensure only the reliable rows become a 
    Patient in the List[Patient].
    
    the restrictions are:
    18 years<=age<=122 years
    55<=height<=267
    5.9<=weight<=595
    30<=sys_psi<=370
    10<=dias_psi<=360
    1<=chol<= 3
    0<= smoke, exercise, CVD<= 1
    7.5 <= BMI <= 188
    """
    #return []  #stub
    # Template from HtDAP
    # lop contains the list of Patient accumulated so far
    lop = [] # type: List[Patient]

    with open(filename) as csvfile:
        
        reader = csv.reader(csvfile)
        next(reader) # skip header line

        for row in reader:
            age = parse_int(row[1])
            height = parse_int(row[3])
            weight = parse_float(row[4])
            sys = parse_int(row[5])
            dias = parse_int(row[6])
            chol = parse_int(row[7])
            smoke = parse_int(row[9])
            exc = parse_int(row[11])
            CVD = parse_int(row[12])
            
            if mega_filter(age, height, weight, sys, dias, chol, smoke, exc, CVD):
                p = Patient(age, height, weight, sys, dias, convert_el(chol), bool_converter(smoke),
                            bool_converter(exc), bool_converter(CVD))
                lop.append(p)
    
    return lop


@typecheck
def mega_filter(age: int, height: float, weight: float, sys_psi: int, dias_psi: int, chol: int,
                smoke: int, exercise: int, CVD: int) -> bool:
    """
    returns True if all parameters are within their correct range, False otherwise.
    age is given in days and is converted to years before it is checked.
    Ranges are:
    18<=age (in years)<=122
    55<=height<=267
    5.9<=weight<=595
    30<=sys_psi<=370
    10<=dias_psi<=360
    1<=chol<= 3
    0<= smoke, exercise, CVD<= 1
    7.5 <= BMI <= 188
    """
    #return False #stub
    #template based on simple atomic non-distinct.
    return (is_correct_range(18, 122, day_to_year_age(age)) and is_correct_range(55, 267, height) and
            is_correct_range(5.9, 595, weight) and is_correct_range(30, 370, sys_psi) and
            is_correct_range(10, 360, dias_psi) and is_correct_range(1, 3, chol) and
            is_correct_range(0, 1, smoke) and is_correct_range(0, 1, exercise) and
            is_correct_range(0, 1, CVD) and is_correct_range(7.5, 188, BMI(height, weight)))


@typecheck
def day_to_year_age(days: int) -> float:
    """
    Converts an age in days to an age in years by dividing it by 365 (leap years are ignored).
    """
    #return 0 #stub
    #template based on atomic non-distinct
    return days/365


@typecheck
def is_correct_range(minimum: float, maximum: float, data: float) -> bool:
    """
    returns True if the data is within the range between/equal to the min and max. Returns False otherwise.
    """
    #return True #stub
    #template based on atomic non-distinct
    return minimum <= data and data <= maximum
                

@typecheck
def bool_converter(p: int) -> bool:
    """
    Returns True if p is 1, and False if p is 0. Assumes those are the only 2 possible inputs in the function.
    """
    #return True #stub
    #template based on atomic non-distinct.
    return p == 1
    
    
@typecheck
def convert_el(p: int) -> ElevationLevel:
    """
    Returns the corresponding ElevationLevel of the number: 
    1 -> normal, 2 -> above_normal, 3-> well_above_normal.
    Assume those are the only 3 possible inputs.
    """
    #return ElevationLevel.normal #stub
    #template based on atomic non-distinct
    if p == 1:
        return ElevationLevel.normal
        
    elif p == 2:
        return ElevationLevel.above_normal
        
    elif p == 3:
        return ElevationLevel.well_above_normal
    

@typecheck
def BMI(h: float, w: float) -> float:
    """
    calculates the BMI of the patient from height in cm (h) and weight in kg (w), rounded to
    4 decimal places. Returns 0 if the height is not above 0.
    """
    #return 0 #stub
    #template based on atomic non-distinct (2 parameters)
    if h > 0:
        return round(w / (h / 100) ** 2, 4)
    else:
        return 0
    
    



# Begin testing
start_testing()

expect(read("project_empty_test - project_empty_test-Copy1.csv"), [])
expect(read("Untitled spreadsheet - project_some_rows-Copy1.csv"), [Patient(21413, 157, 69.0, 130, 80,\
                                                                      ElevationLevel.normal,\
                                                                      False,True, False),\
                                                              Patient(23046, 158,90.0, 145, 85,\
                                                                      ElevationLevel.above_normal,False,\
                                                                      True, True),\
                                                              Patient(23376, 156, 45.0, 110, 60,\
                                                                      ElevationLevel.normal,\
                                                                      True, True, False)])
       
expect(read("project_some_rows-rewmoval - project_rows_with_removals-Copy1.csv"),\
       [Patient(18614, 183, 83.7, 120, 80, ElevationLevel.normal,False, True, False),\
        Patient(6570, 183, 83.7, 120, 80, ElevationLevel.normal,False, True, False),\
        Patient(44530, 183, 83.7, 120, 80, ElevationLevel.normal,False, True, False),\
        Patient(18614, 267, 83.7, 120, 80, ElevationLevel.normal,False, True, False),\
        Patient(18614, 183, 595.0, 120, 80, ElevationLevel.normal,False, True, False), \
        Patient(18614, 183, 83.0, 30, 80, ElevationLevel.normal,False, True, False), \
        Patient(18614, 183, 83.0, 370, 80, ElevationLevel.normal,False, True, False),\
        Patient(18614, 183, 85.0, 120, 10, ElevationLevel.normal, False, True, False),\
        Patient(18614, 183, 85.0, 120, 360, ElevationLevel.normal,False, True, False)])


# show testing summary
summary()

start_testing()

# healthy patient with normal readings. No smoke/CVD, yes exercise, 50yo -> 18614 (age)
expect(mega_filter(18614, 183, 83.7, 120, 80, 1, 0, 1, 0), True)

# age at lower limit
expect(mega_filter(6570, 183, 83.7, 120, 80, 1, 0, 1, 0), True)

# too young to be included -> 25 days (age) 
        #note: all data in our data set is within age ranges, so this is a random example
expect(mega_filter(25, 183, 83.7, 120, 80, 1, 0, 1, 0), False)

# age at upper limit
expect(mega_filter(44530, 183, 83.7, 120, 80, 1, 0, 1, 0), True)

# too old to be included -> 44895 days or 123 years (age) 
        #note: all data in our data set is within age ranges, so this is a random example
expect(mega_filter(44895, 183, 83.7, 120, 80, 2, 0, 1, 0), False)

# height at lower limit
expect(mega_filter(18614, 55, 25, 120, 80, 1, 0, 1, 0), True)

# too short to be included
expect(mega_filter(18614, 40, 83.7, 120, 80, 1, 0, 1, 0), False)

# height at upper limit
expect(mega_filter(18614, 267, 83.7, 120, 80, 1, 0, 1, 0), True)

# too tall to be included
expect(mega_filter(18614, 300, 83.7, 120, 80, 1, 0, 1, 0), False)

# weight at lower limit
expect(mega_filter(18614, 56, 5.9, 120, 80, 1, 0, 1, 0), True)

# too light to be included -> 5kgs (weight), with a height that keeps the BMI in range
expect(mega_filter(18614, 56, 5, 120, 80, 1, 0, 1, 0), False)

# weight at upper limit
expect(mega_filter(18614, 183, 595, 120, 80, 1, 0, 1, 0), True)

# too heavy to be included -> 596kgs (weight) 
expect(mega_filter(17614, 183, 596, 120, 80, 3, 0, 1, 0), False)

# Systolic pressure at lower limit
expect(mega_filter(18614, 183, 83, 30, 80, 1, 0, 1, 0), True)

# Systolic pressure too low
expect(mega_filter(17614, 183, 83.7, 25, 80, 3, 0, 1, 0), False)

# Systolic pressure at upper limit
expect(mega_filter(18614, 183, 83, 370, 80, 1, 0, 1, 0), True)

# Systolic pressure too high
expect(mega_filter(17614, 183, 83.7, 500, 80, 1, 0, 1, 0), False)

# Diastolic pressure at lower limit
expect(mega_filter(18614, 183, 85, 120, 10, 1, 0, 1, 0), True)

# Diastolic pressure too low
expect(mega_filter(17614, 183, 83.7, 120, 9, 3, 0, 1, 0), False)

# Diastolic pressure at upper limit
expect(mega_filter(18614, 183, 85, 120, 360, 1, 0, 1, 0), True)

# Diastolic pressure too high
expect(mega_filter(17614, 183, 83.7, 120, 500, 1, 0, 1, 0), False)

# cholesterol out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 4, 0, 1, 0), False)

# cholesterol out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 0, 0, 1, 0), False)

# smoking out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 2, 2, 1, 0), False)

# smoking out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 1, -1, 1, 0), False)

# exercise out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 3, 1, 2, 0), False)

# exercise out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 1, 0, -1, 0), False)

# cvd out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 3, 0, 1, 3), False)

# cvd out of range
expect(mega_filter(17614, 183, 83.7, 120, 80, 2, 1, 1, -1), False)

#BMI too high
expect(mega_filter(18614, 56, 80, 120, 80, 1, 0, 1, 0), False)

#BMI too low
expect(mega_filter(18614, 265, 20, 120, 80, 1, 0, 1, 0), False)


summary()

start_testing()
expect(day_to_year_age(15), 15/365)
expect(day_to_year_age(0), 0)
expect(day_to_year_age(365), 1)
summary()

start_testing()
expect(is_correct_range(0,1,0), True)
expect(is_correct_range(0,1,1), True)
expect(is_correct_range(1,3.2,2.3), True)
expect(is_correct_range(1.1,3,4.5), False)
expect(is_correct_range(1.2,2.4,0.5), False)


summary()

start_testing()

expect(bool_converter(0), False)
expect(bool_converter(1), True)

summary()

start_testing()

expect(convert_el(1), ElevationLevel.normal)
expect(convert_el(2), ElevationLevel.above_normal)
expect(convert_el(3), ElevationLevel.well_above_normal)

summary()

start_testing()

#h and w from P0
expect(BMI(183, 73.7), 22.0072)

#from P4
expect(BMI(55, 83.7), 276.6942)

# from P7
expect(BMI(183,653), 194.9894)

#from P8
expect(BMI(183,2.13), 0.6360)

summary()

: 
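
The `read` function above follows a common pattern: skip the header, pick columns by index, and keep only the rows that pass validation. The sketch below shows that pattern on an in-memory CSV. The column names, the sample values, and the function name `read_valid_rows` are made up for the illustration and do not reproduce the real file's layout.

```python
import csv
import io
from typing import List, Tuple

# Made-up sample data: a header line followed by three rows.
SAMPLE = "id,age_days,height_cm\n0,18614,183\n1,25,183\n2,6570,40\n"


def read_valid_rows(text: str) -> List[Tuple[int, int]]:
    """Return (age_days, height_cm) pairs for rows within the plausible limits."""
    # rows accepted so far
    rows = []  # type: List[Tuple[int, int]]
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip header line
    for row in reader:
        age_days = int(row[1])
        height = int(row[2])
        # Same inclusive limits as in the planning notes: 18 to 122 years, 55 to 267 cm.
        if 18 <= age_days / 365 <= 122 and 55 <= height <= 267:
            rows.append((age_days, height))
    return rows


print(read_valid_rows(SAMPLE))  # [(18614, 183)]
```

The second row is rejected for age (25 days) and the third for height (40 cm), so only the first row survives.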

In [None]:
############
#Functions for analyze (analyze_bar_create)

@typecheck
def analyze_bar_create(lop: List[Patient], age:int) -> None: 
    """ 
    takes a list of Patients, filters it for the specified age, and produces a bar chart showing
    the percentage of those with and without CVD that have a high BMI, high systolic pressure,
    high diastolic pressure, high cholesterol, that smoke, and that exercise
    """
    #return None #stub
    #template based on composition
    
    #1. variable that is the list of patients filtered for age.
    list_filtered_age = filters_for_age(lop, age)
    
    #2. variable that is the age-filtered list also filtered for CVD -- done by a helper in the function
    #3. variable that is the final categories list for CVD
    list_cvd_values = either_final_categories_list(list_filtered_age, True)
    
    #4. variable that is the age-filtered list also filtered for no CVD -- done by a helper in the function
    #5. variable that is the final categories list for no CVD.
    list_no_cvd_values = either_final_categories_list(list_filtered_age, False)
    
    return produce_final_bar(list_cvd_values,list_no_cvd_values)


#math part of analyze functions.

#turn day age into year age
#using a function on Patient age and input age:
#assign each age range a value, and see what value this helper function returns for both
#if they are the same, it's true and you can add the patient to the list.

@typecheck
def filters_for_age(lop: List[Patient], age:int)-> List[Patient]:
    """
    takes a list of patients and filters it to a list of patients within the specified age range
    """
    #return [] #stub
    #template based on List[Patient] with additional parameter age.
    #list of patients in correct age range from one specified seen in lop so far
    correct_age_list = [] #type: List[Patient]
    for p in lop:
        if is_correct_age_range(p, age):
            correct_age_list.append(p)
    return correct_age_list

@typecheck
def is_correct_age_range(p: Patient, age: int)-> bool:
    """
    takes the Patient and the specified age and determines if 
    the Patient's age is in the same age range as the specified age (True).
    False otherwise.
    Assumes the patient is within the proper age ranges as determined in the read function, and same for age.
    """
    #return True #stub
    #template based on Patient with additional parameter age.
    return age_range(day_to_year_age(p.age)) == age_range(age)

@typecheck
def age_range(age:float)-> float:
    """
    Takes the age in years and assigns it a float value based on the age range it is in.
    The returned float encodes the range: the integer part is the lower limit and the decimal
    part is the upper limit (e.g. 18.35 for ages 18 to 35). Ages of 81 and above return 81.
    Assume the age is within the correct limits set by the read function.
    """
    #return 8.8 #stub
    #template based on atomic non-distinct
    if age <36:
        return 18.35
    elif age <51:
        return 36.50
    elif age< 66:
        return 51.65
    elif age<81:
        return 66.80
    else:
        return 81
    
#you start with a list of patients.
#filter list of Patients into list of Patients within wanted age range
@typecheck
def either_final_categories_list(lop: List[Patient], cvd: bool)-> List[float]:
    """
    takes a list of Patient, filters it for patients with or without CVD, and then returns
    a list with the percentages of each category within this filtered list.
    """
    #return [] #stub
    #template based on composition
    list_filtered_cvd = filter_for_either_cvd(lop, cvd)
    return percentages_list(list_filtered_cvd)


#filter list of Patients in age range into 2 lists: those with and those without CVD
#make a loop. If p.CVD == True, add to an accumulator.

@typecheck
def filter_for_either_cvd(lop: List[Patient], cvd: bool)-> List[Patient]:
    """
    takes a list of Patient and filters it, keeping the 
    Patients that either do or don't have CVD based on the specification when function is called.
    """
    #return [] #stub
    #template based on List[Patient] with additional parameter cvd
    #list of Patients that match the CVD status as specified seen so far in lop
    right_cvd_list = []   #type: List[Patient]
    for p in lop:
        if is_same_cvd(p, cvd):
            right_cvd_list.append(p)
    return right_cvd_list


@typecheck
def is_same_cvd(p: Patient, cvd: bool) -> bool:
    """
    takes a patient and compares whether its CVD status is the same as the one specified, thus returns True. 
    False otherwise.
    """
    #return True #stub
    #template based on Patient with additional parameter cvd
    return p.CVD == cvd


#take all of these functions and turn them into a list of values
#fn takes in List[Patient] (either with or without CVD)
#returns [percent_bmi, percent_sys_psi, percent_dias_psi, percent_chol, percent_smoke, percent_exercise]
        ## -> each value comes from one of the functions outlined below; they all take in the whole list.
        # template is NOT based on List[Patient], but composition.
        
@typecheck
def percentages_list(lop: List[Patient])-> List[float]:
    """
    takes in a list of patients and returns a list of floats that has the percentage values of each category. 
    The list is either list of patients with CVD or without, both within the specified age range, 
    however this function will not differentiate and calculate based on all Patients provided.
    The final list is in this order: [bmi, systolic psi, diastolic psi, cholesterol, smoking, exercise]
    """
    #return [] #stub
    #template based on composition
    bmi_percent = find_percent_high_bmi(lop)
    sys_psi = find_percent_high_sys(lop)
    dias_psi = find_percent_high_dias(lop)
    chol = find_percent_high_chol(lop)
    smoke = find_percent_smoke(lop)
    exercise = find_percent_exercise(lop)
    return [bmi_percent, sys_psi, dias_psi, chol, smoke, exercise]

#bmi: you call it once with list that have CVD, and again for list that doesn't
#find_percent_high_bmi
#is a for loop for fn that is List[Patient]
#for a patient, you have 2 accumulators: high bmi, total patients.
#for every patient, add 1 to total.
#high bmi is 25+ 

#(National Heart, Lung, and Blood Institute. "Calculate Your Body Mass Index".
#National Heart, Lung, and Blood Institute. Accessed Dec 7 2022. 
#https://www.nhlbi.nih.gov/health/educational/lose_wt/BMI/bmicalc.htm).

#use the bmi calculate helper, and then have helper fn that returns true if bmi >= 25.
#if true, add 1 to high bmi accumulator.
# returns high bmi/total * 100

@typecheck
def find_percent_high_bmi(lop: List[Patient])-> float:
    """
    takes a list of Patients and returns the percentage of them that have a high bmi.
    A high bmi is a value >= 25.
    if empty list, return 0
    """
    #return 15 #stub
    #template based on List[Patient]
    #the number of Patients with a high bmi seen so far in lop
    high_bmi = 0 #type: int
    #the total number of Patients seen so far in lop
    total = 0 #type: int
    for p in lop:
        if is_elevated(BMI(p.height, p.weight),25):
            high_bmi = high_bmi +1
        total = total+1
    if total == 0:
        return 0
    else:
        return high_bmi/total *100

@typecheck
def is_elevated(number_patient: float, number:float ) -> bool:
    """
    takes 2 numbers and determines whether number_patient >= number
    """
    #return True #stub
    #template based on Patient
    return number_patient >= number

#sys_psi: you call it once with list that have CVD, and again for list that doesn't
#find_percent_high_sys
#is a for loop for fn that is List[Patient]
#for a patient, you have 2 accumulators: high sys psi, total
#for every patient, add 1 to total.
#high sys psi = 130+ 

#(Center for Disease Control. "About High Blood Pressure" . Center for Disease Control. 
#Accessed Dec 7 2022. https://www.cdc.gov/bloodpressure/about.htm)

#have a helper that returns true if sys_psi>=130 (same fn as see if bmi is high)
#if true, add 1 to high sys_psi accumulator
#return high sys_psi/total *100

@typecheck
def find_percent_high_sys(lop: List[Patient])-> float:
    """
    takes a list of Patients and returns the percentage of them that have high systolic blood pressure.
    this is if the value is 130+
    if empty list, return 0
    """
    #return -1 #stub
    #template based on List[Patient]
    #number of patients with high sys pressure seen so far in lop
    high_sys_number = 0 #type: int
    #total number of Patients seen so far in lop
    total = 0 #type: int
    for p in lop:
        if is_elevated(p.sys_psi, 130):
            high_sys_number = high_sys_number + 1
        total = total + 1
    if total == 0:
        return 0
    else:
        return high_sys_number/total*100
    
#dias_psi: you call it once with list that have CVD, and again for list that doesn't
#find_percent_high_dias
#is a for loop for fn that is List[Patient]
#for a patient, you have 2 accumulators: high dias_psi, total
#for every patient, add 1 to total
#high dias_psi = 80+

#(Center for Disease Control. "About High Blood Pressure" . Center for Disease Control. 
#Accessed Dec 7 2022. https://www.cdc.gov/bloodpressure/about.htm)

#have a helper that returns true if dias_psi>=80 (same fn as see if bmi is high)
#if true, add 1 to high dias_psi accumulator
#return high dias_psi/total * 100

@typecheck
def find_percent_high_dias(lop: List[Patient])-> float:
    """
    takes a list of patients and returns the percentage of them that have high diastolic blood pressure.
    that value is 80+
    If empty list, return 0
    """
    #return -1 #stub
    #template based on List[Patient]
    #number of Patients with high diastolic pressure seen in lop so far
    high_dias_number = 0 #type: int
    #total number of Patients seen in lop so far
    total = 0 #type: int
    for p in lop:
        if is_elevated(p.dias_psi, 80):
            high_dias_number = high_dias_number +1
        total = total + 1
    if total ==0:
        return 0
    else:
        return high_dias_number/total *100
    
#chol: you call it once with list that have CVD, and again for list that doesn't
#find_percent_high_chol
#is a for loop for fn that is List[Patient]
#for a patient, you have 2 accumulators: high chol, total
#for every patient, add 1 to total
#if chol != ElevationLevel.normal (happens in a helper fn):
#add 1 to high chol acc
#return high chol/total * 100
@typecheck
def find_percent_high_chol(lop: List[Patient])-> float:
    """
    takes a list of patients and returns the percentage of them that have elevated cholesterol.
    this is if it is ElevationLevel.above_normal or ElevationLevel.well_above_normal
    If empty list, return 0
    """
    #return -1 #stub
    #template based on List[Patient]
    #number of Patients with high cholesterol seen so far in lop
    high_chol_number = 0 #type: int
    #total number of Patients seen so far in lop
    total = 0 #type: int
    for p in lop:
        if is_high_el(p.cholesterol):
            high_chol_number = high_chol_number +1
        total = total +1
    if total == 0:
        return 0
    else:
        return high_chol_number/total*100
    
@typecheck
def is_high_el(el: ElevationLevel)-> bool:
    """
    takes an ElevationLevel and returns True if it is elevated, meaning ElevationLevel.above_normal
    or ElevationLevel.well_above_normal. If it is ElevationLevel.normal, returns False.
    """
    #return True #stub
    #template based on ElevationLevel
    if el == ElevationLevel.normal:
        return False
    elif el == ElevationLevel.above_normal:
        return True
    elif el == ElevationLevel.well_above_normal:
        return True

#smoke: you call it once with list that have CVD, and again for list that doesn't
#find_percent_smoke
#is a for loop for fn that is List[Patient]
#for a patient, you have 2 accumulators: smoke, total
#for every patient, add 1 to total
#if smoke (in Patient) == True (no helper needed)
#add 1 to smoke acc
#return smoke/total *100

@typecheck
def find_percent_smoke(lop: List[Patient])-> float:
    """
    takes a list of Patients and finds the percentage of them that smoke. 
    If the list is empty, return 0
    """
    #return -1 #stub
    #template based on List[Patient]
    #number of patients in lop that smoke seen so far
    number_smoke = 0 #type: int
    #total number of Patients seen so far in lop
    total = 0 #type: int
    for p in lop:
        if p.smoke:
            number_smoke = number_smoke + 1
        total = total + 1
    if total == 0:
        return 0
    else:
        return number_smoke/total *100
    

#exercise: you call it once with list that have CVD, and again for list that doesn't
#find_percent_exercise
#is a for loop for fn that is List[Patient]
#for a patient, you have 2 accumulators: exercise, total
#for every patient, add 1 to total
#if exercise (in Patient) == True (no helper needed)
#add 1 to exercise acc
#return exercise/total *100

@typecheck
def find_percent_exercise(lop: List[Patient])-> float:
    """
    takes a list of Patients and finds the percentage of them that exercise.
    If the list is empty, return 0
    """
    #return -1 #stub
    #template based on List[Patient]
    #number of patients that exercise seen in lop so far
    number_exercise = 0 #type: int
    #total number of Patients seen so far in lop
    total = 0 #type: int
    for p in lop:
        if p.exercise:
            number_exercise = number_exercise + 1
        total = total + 1
    if total == 0:
        return 0
    else:
        return number_exercise/total*100



    
#bar graph function for analyze    
    
    
@typecheck
def produce_final_bar(means_CVD: List[float], 
                       means_No_CVD: List[float]) -> None:
    """
    Plot a bar graph with proportion of each category (BMI, systolic, diastolic, cholesterol, smoke, exercise)
    for those with CVD and those without side-by-side. X-axis = categories, y-axis = proportion as % from 0-100
    
    Assumes both list parameters are the same (non-zero) length.
    """
    #return None  #stub
    # Template based on visualization
    
    # the width of each bar
    bar_width = 2
    
    # the x coordinate of the left edge of each bar in the first bar chart (CVD); because the
    # bars are drawn with align='edge', these are left edges rather than bar centres
    middle_of_bars_CVD = [2, 7, 12, 17, 22, 27]
    
    # the x coordinate of the left edge of each bar in the second bar chart (No CVD). Note that each 
    # entry in the list is exactly bar_width greater than the corresponding entry in the 
    # middle_of_bars_CVD list, so the two bars in a category sit side by side
    middle_of_bars_No_CVD = [4, 9, 14, 19, 24, 29]
    
    # the opacity for the bars. It must be between 0 and 1, and higher numbers are more
    # opaque (darker)
    opacity = 0.4
    
    # create the first bar chart
    rect1 = plt.bar(middle_of_bars_CVD, 
                     means_CVD,                # list containing the height for each bar, here the means
                     bar_width,
                     alpha=opacity,                  # set the opacity
                     align='edge',
                     color='b',                      # set the colour (here, blue)
                     label='CVD')

    rect2 = plt.bar(middle_of_bars_No_CVD, 
                     means_No_CVD, 
                     bar_width,
                     alpha=opacity,
                     align='edge',
                     color='r',                       # notice that we use a different colour (red)
                     label='No CVD')

    # set the labels for the x-axis, y-axis, and plot title
    plt.xlabel('Category')
    plt.ylabel('Percentage')
    plt.title('Percentage of each category to CVD diagnosis')

    # set the axes for our chart
    plt.axis([0,33,0,100])

    # set the labels for each 'tick' on the x-axis
    plt.xticks(middle_of_bars_No_CVD, ['BMI', 'Systolic', 'Diastolic', 'Cholesterol', 'Smoke', 'Exercise'])
    
    # we want to show the legend because this plot contains two bar charts. The argument places
    # the legend in the upper right corner of the plot
    plt.legend(loc='upper right')
    
    # show the plot
    plt.show()
    
    return None

start_testing()

#returns a bar graph with x-axis "Category" and tick labels "BMI", "Systolic", "Diastolic", "Cholesterol",
#"Smoke", "Exercise". has y-axis "Percentage". 2 bar colours: blue for CVD, red for no CVD.

expect(analyze_bar_create(LOP1,36), None)
#expect a graph: (* = CVD, € = no cvd)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise


#100
#| *.   *     * €       *.   €
#| *    *.    * €.      *.   €
#|_*____*_____*_€_______*____€_
#  B    S      D    C   S   E
#         Category

expect(analyze_bar_create([], 36), None)
#expect a graph: (empty)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise
#
#  100|                
#     |              
#     |  
#     |  
#     +--------------------------
#       B   S   D   C   S   E
#           Category

summary()

start_testing()
#empty test
expect(filters_for_age([], 45), [])
#test with what you need
expect(filters_for_age(LOP1, 50), [P0,P5,P6])
#test without what you need
expect(filters_for_age([P1,P2,P3,P4,P7,P8], 45), [])
summary()


start_testing()
#works = both are in the same age range, doesn't work = they are in different age ranges.
#lower limit 18 works
expect(is_correct_age_range(P2, 18), True)
#lower limit 18 doesn't work
expect(is_correct_age_range(P2, 65), False)
#upper limit 35 works
expect(is_correct_age_range(P3, 35), True)
#upper limit 35 doesn't work
expect(is_correct_age_range(P3, 82), False)
#lower limit 36 works
expect(is_correct_age_range(P11, 38), True)
#lower limit 36 doesn't work
expect(is_correct_age_range(P11, 99), False)
#upper limit 50 works
expect(is_correct_age_range(P5, 44), True)
#upper limit 50 doesn't work
expect(is_correct_age_range(P5, 22), False)
#lower limit 51 works
expect(is_correct_age_range(Patient(18615, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 52), True)
#lower limit 51 doesn't work
expect(is_correct_age_range(Patient(18615, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 90), False)
#upper limit 65 works
expect(is_correct_age_range(Patient(24089, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 65), True)
#upper limit 65 doesn't work
expect(is_correct_age_range(Patient(24089, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 25), False)
#lower limit 66 works
expect(is_correct_age_range(Patient(24090, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 67), True)
#lower limit 66 doesn't work
expect(is_correct_age_range(Patient(24090, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 86), False)
#upper limit 80 works
expect(is_correct_age_range(Patient(29564, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 79), True)
#upper limit 80 doesn't work
expect(is_correct_age_range(Patient(29564, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 100), False)
#lower limit 81 works
expect(is_correct_age_range(Patient(29565, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 82), True)
#lower limit 81 doesn't work
expect(is_correct_age_range(Patient(29565, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 25), False)
#above 81 works
expect(is_correct_age_range(Patient(34680, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 90), True)
#above 81 doesn't work
expect(is_correct_age_range(Patient(34680, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 54), False)
#happily within a range, works
expect(is_correct_age_range(Patient(20000, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 55), True)
#happily within a range, doesn't work
expect(is_correct_age_range(Patient(20000, 183, 83.7, 120, 80, ElevationLevel.normal,False,True,False), 80), False)

summary()

start_testing()
#lower limit 18
expect(age_range(18), 18.35)
#upper limit 35
expect(age_range(35.99), 18.35)
#lower limit 36
expect(age_range(36), 36.50)
#upper limit 50
expect(age_range(50.99), 36.50)
#lower limit 51
expect(age_range(51), 51.65)
#upper limit 65
expect(age_range(65.99), 51.65)
#lower limit 66
expect(age_range(66), 66.80)
#upper limit 80
expect(age_range(80.99), 66.80)
#lower limit 81
expect(age_range(81), 81)
#above 81
expect(age_range(85), 81)
#happily within a range.
expect(age_range(25), 18.35)
summary()

start_testing()

expect(either_final_categories_list([], True), [0,0,0,0,0,0])
expect(either_final_categories_list(LOP1, True), [50.0, 50.0, 100.0, 75.0, 100.0, 0.0])


summary()

start_testing()
expect(filter_for_either_cvd([], True), [])
expect(filter_for_either_cvd(LOP1, True), [P3,P6,P7,P8])
expect(filter_for_either_cvd(LOP1, False), [P0,P1,P2,P4,P5])
summary()

start_testing()

expect(is_same_cvd(P0, True), False)
expect(is_same_cvd(P0, False), True)

summary()


start_testing()
expect(percentages_list([]), [0,0,0,0,0,0])
expect(percentages_list(LOP1), [4/9*100,2/9*100,7/9*100,4/9*100,4/9*100,5/9*100])

summary()

start_testing()

expect(find_percent_high_bmi([]), 0)
expect(find_percent_high_bmi(LOP1), 4/9*100)

summary()

start_testing()

expect(is_elevated(20, 25.5), False)
expect(is_elevated(27.3, 25.5), True)

summary()

start_testing()

expect(find_percent_high_sys([]), 0)
expect(find_percent_high_sys(LOP1), 2/9*100)

summary()


start_testing()

expect(find_percent_high_dias([]), 0)
expect(find_percent_high_dias(LOP1), 7/9*100)

summary()

start_testing()
expect(find_percent_high_chol([]), 0)
expect(find_percent_high_chol(LOP1), 4/9*100)

summary()

start_testing()
expect(is_high_el(ElevationLevel.normal), False)
expect(is_high_el(ElevationLevel.above_normal), True)
expect(is_high_el(ElevationLevel.well_above_normal), True)

summary()

start_testing()
expect(find_percent_smoke([]), 0)
expect(find_percent_smoke(LOP1), 4/9*100)

summary()

start_testing()

expect(find_percent_exercise([]), 0)
expect(find_percent_exercise(LOP1), 5/9*100)


summary()

start_testing()

#returns a bar graph with x-axis "Category" and tick labels "BMI", "Systolic", "Diastolic", "Cholesterol",
#"Smoke", "Exercise". has y-axis "Percentage". 2 bar colours: blue for CVD, red for no CVD.


expect(produce_final_bar([20,20,20,20,20,20], [40,40,40,40,40,40]), None)
#expect a graph: (€ = CVD, * = no cvd)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise

#100
#|
#|
#|
#|40
#|  *.     *.    *.    *.    *.    *.  
#|  *.     *.    *.    *.    *.    *.   
#|€ *.   € *.  € *.  € *.  € *.  € *.  
#|€ *.   € *.  € *.  € *.  € *.  € *.  
#_______________________________________
#    B.    S.    D.   C.    S     E

expect(produce_final_bar([0,0,0,0,0,0], [0,0,0,0,0,0]), None)
#expect a graph: (empty)
#expect a graph: (€ = CVD, * = no cvd)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise
#
#  100|                
#     |              
#     |  
#     |  
#     +--------------------------
#       B   S   D   C   S   E
#           Category


expect(produce_final_bar([20, 20, 30,0,0,0], [10, 10,0,0,0, 10]), None)
#expect a graph: (€ = CVD, * = no cvd)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise

#  100|                
#     | 
#     |   
#     |          €             
#     | €   €    €              
#     | €*  €*   €            * 
#     +--------------------------
#       B   S   D   C   S   E
#           Category


summary()

: 
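The six `find_percent_*` functions above all share one shape: two accumulators (a match count and a total), a loop over the list, and a guard for the empty list. As a rough sketch of that shared pattern, the block below factors it into a single helper. The names `percent_matching` and `bmi_from_cm_kg` are hypothetical and are not part of this notebook; `bmi_from_cm_kg` assumes height in cm and weight in kg (the units listed in Step 1a) and uses the standard formula of weight divided by height in metres squared, which may differ in detail from the notebook's own `BMI` helper.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def percent_matching(items: List[T], predicate: Callable[[T], bool]) -> float:
    """
    Hypothetical helper: returns the percentage (0-100) of items for which
    predicate is True. Returns 0.0 for an empty list.
    """
    matches = 0  # number of items seen so far that satisfy predicate
    total = 0    # number of items seen so far
    for item in items:
        if predicate(item):
            matches = matches + 1
        total = total + 1
    if total == 0:
        return 0.0
    return matches / total * 100


def bmi_from_cm_kg(height_cm: float, weight_kg: float) -> float:
    """
    Assumed stand-in for the notebook's BMI helper: weight (kg) divided by
    the square of height (m).
    """
    height_m = height_cm / 100
    return weight_kg / (height_m ** 2)


# Illustrative (height_cm, weight_kg) pairs, not taken from the dataset.
sample: List[Tuple[float, float]] = [(170, 80.0), (180, 70.0), (160, 70.0), (175, 60.0)]

# Two of the four sample entries have a BMI of 25 or more.
high_bmi_percent = percent_matching(sample, lambda hw: bmi_from_cm_kg(hw[0], hw[1]) >= 25)
print(high_bmi_percent)  # prints 50.0
```

Keeping one function per category, as the notebook does, follows the List[Patient] template more directly; the helper is only one possible way to show what the six loops have in common.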

In [None]:
start_testing()

#returns a bar graph with x-axis "Category" and tick labels "BMI", "Systolic", "Diastolic", "Cholesterol",
#"Smoke", "Exercise". has y-axis "Percentage". 2 bar colours: blue for CVD, red for no CVD.


expect(main("Untitled spreadsheet - project_some_rows-Copy1.csv", 63), None)
#expect a graph: (* = CVD, € = no cvd)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise


#100
#| *     *      *       *        * €
#| *.    *      *       *        * €
#| * €   * €    * €.    *    €   * €
#|_*_€___*_€____*_€_____*____€___*_€__
#   B     S      D      C     S   E
#           Category

expect(main("project_empty_test - project_empty_test-Copy1.csv", 63), None)

#expect a graph: (empty)
#B = BMI, S = Systolic, D = Diastolic, C = Cholesterol, S = Smoke, E = Exercise

#100
#|
#|
#|
#|
#|________________________
#   B   S   D   C   S   E
#      Category

summary()

: 
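In `produce_final_bar`, the two x-position lists are written out by hand. Because the bars are drawn with `align='edge'`, each value is the left edge of a bar, and the second list is the first list shifted by `bar_width`. The sketch below shows one way those positions could be derived instead of hard-coded. The function `grouped_bar_edges` and its `gap` and `start` parameters are assumptions made for illustration and do not appear in the notebook; the defaults were chosen so that the result matches the lists used above.

```python
from typing import List, Tuple


def grouped_bar_edges(n_groups: int, bar_width: int,
                      gap: int = 1, start: int = 2) -> Tuple[List[int], List[int]]:
    """
    Hypothetical helper: returns the left-edge x positions for the first and
    second bar of each group in a two-series grouped bar chart.

    Each group is two bars wide, and consecutive groups are separated by gap.
    """
    # distance from the left edge of one group to the left edge of the next
    group_stride = 2 * bar_width + gap
    first_edges = [start + i * group_stride for i in range(n_groups)]
    # the second bar of each group starts where the first bar ends
    second_edges = [edge + bar_width for edge in first_edges]
    return first_edges, second_edges


cvd_edges, no_cvd_edges = grouped_bar_edges(6, 2)
print(cvd_edges)     # prints [2, 7, 12, 17, 22, 27]
print(no_cvd_edges)  # prints [4, 9, 14, 19, 24, 29]
```

Since each second edge is also the boundary between the two bars of a group, using the second list as the tick positions (as `plt.xticks` does above) centres each label under its pair of bars.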

### Final Graph/Chart

Now that everything is working, you **must** call `main` on the intended information source in order to display the final graph/chart:

In [None]:
#the main data source is the file named in the call to main below. It has almost 70k rows. We will also submit
#a data file called "CPSC103 cvd table 500ish lines.csv", which has roughly the first 500 lines of the file.
#It loads faster, so use it instead if that is easier.

#main takes the name of the csv file and an age (in years, as an integer). The graph is produced for the
#age range that contains that age.
#the age ranges are:
# 18-35
# 36-50
# 51-65
# 66-80
# 81+

#the category names were too long to write in full on the graph. The full names (with graph labels in brackets) are:
#BMI above healthy levels (25+), (BMI)
#high systolic blood pressure (130+), (systolic)
#high diastolic blood pressure (80+), (diastolic)
#high cholesterol (cholesterol)
#people that smoke (smoke)
#people that exercise (exercise)

#If there is no patient within your specified age range, the graph will have no bars (will be empty/blank)

main("cardio_train_full_file-Copy1.csv", 90)

: 
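The age ranges listed above decide which patients appear in the graph, and the file stores age in days rather than years. As a small illustration of how an age lands in a range, the sketch below maps an age to a range label. It is not the notebook's `age_range` function, which encodes a range as a number; `bracket_label`, `bracket_from_days`, and the 365-days-per-year conversion are assumptions made here for readability, and the lower bound of 18 follows the limit stated in the planning step.

```python
from typing import List, Tuple

DAYS_PER_YEAR = 365  # assumption: leap days are ignored

# (lowest age, highest age, label) for every bounded range; 81+ is handled separately
AGE_BRACKETS: List[Tuple[int, int, str]] = [
    (18, 35, "18-35"),
    (36, 50, "36-50"),
    (51, 65, "51-65"),
    (66, 80, "66-80"),
]


def bracket_label(age_years: float) -> str:
    """
    Hypothetical helper: returns the label of the age range containing
    age_years. Partial years are dropped, so 35.99 still counts as 35.
    """
    whole_years = int(age_years)
    for low, high, label in AGE_BRACKETS:
        if low <= whole_years <= high:
            return label
    if whole_years >= 81:
        return "81+"
    raise ValueError("ages under 18 are outside the ranges used here")


def bracket_from_days(age_days: int) -> str:
    """Hypothetical helper: converts an age in days to a range label."""
    return bracket_label(age_days // DAYS_PER_YEAR)


print(bracket_label(90))         # prints 81+
print(bracket_from_days(20000))  # prints 51-65
```

With this conversion, 24089 days falls in 51-65 and 24090 days falls in 66-80, which agrees with the boundary values exercised in the `is_correct_age_range` tests.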