# HDS5210-2022 Midterm

In the midterm, you're going to use all of the programming and data management skills you've developed so far to build a risk calculator that pretends to be integrated with a clinical registry.  You'll compute the PRIEST COVID-19 Clinical Severity Score for a series of patients and, based on their risk of an adverse outcome, query a REST web service to find a hospital to transfer them to. The end result of your work will be a list of instructions on where each patient should be discharged: "home" if they are below 30% risk and the recommended hospital if they are at or above 30%.
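
The discharge rule in the last sentence above can be sketched as a small helper. The name `discharge_destination` and its parameters are illustrative assumptions, not one of the functions the midterm asks for; only the 30% threshold comes from the description.

```python
def discharge_destination(risk, hospital):
    """Return 'home' below 30% risk, otherwise the recommended hospital.

    `risk` is a fraction between 0 and 1; `hospital` is whatever name the
    hospital lookup suggested (a hypothetical input for this sketch).
    """
    if not 0 <= risk <= 1:
        raise ValueError(f"Invalid input for risk: {risk!r}")
    return "home" if risk < 0.30 else hospital


print(discharge_destination(0.29, "Example Hospital"))  # home
print(discharge_destination(0.30, "Example Hospital"))  # Example Hospital
```

Note that a risk of exactly 0.30 goes to the hospital, because the rule says "at or above 30%".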

Each step in the midterm will build up to form your final solution. Along the way, I've provided plenty of test cases to make sure that you're getting those steps correct.

The midterm is due at 11:59 PM CST on Monday, March 14th.

---

## Part 1: Calculate PRIEST Clinical Severity Score

This scoring algorithm can be found [here on the MDCalc website](https://www.mdcalc.com/priest-covid-19-clinical-severity-score#evidence).  

1. You will need to write a function called **priest()** with the following input parameters.  
 * Sex
 * Age in years
 * Respiratory rate in breaths per minute
 * Oxygen saturation as a percent between 0 and 1
 * Heart rate in beats per minute
 * Systolic BP in mmHg
 * Temperature in degrees C
 * Alertness as a string description
 * Inspired Oxygen as a string description
 * Performance Status as a string description
2. The function will need to follow the algorithm provided on the MDCalc website to compute a risk percentage that should be returned as a numeric value between 0 and 1.
3. Be sure to use docstring documentation and at least three built-in docstring test cases.
4. Assume that the input values that are strings could be any combination of upper or lower case. For example: 'male', 'Male', 'MALE', 'MalE' should all be interpreted by your code as male.
5. If any of the inputs are invalid (for example a sex value that is not recognizable as male or female) your code should raise a ValueError that includes a message with the invalid input and the parameter it was provided for.
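
Requirements 4 and 5 can be handled together by one small validation helper. The sketch below is only one possible approach: `normalize_choice` and its parameters are hypothetical names, and stripping surrounding whitespace is an extra tolerance that the requirements do not ask for.

```python
def normalize_choice(value, param_name, allowed):
    """Lower-case `value` and check it against the `allowed` options.

    Raises ValueError naming both the bad value and the parameter.
    """
    cleaned = str(value).strip().lower()
    if cleaned not in allowed:
        raise ValueError(
            f"Invalid input {value!r} for parameter '{param_name}'; "
            f"expected one of {sorted(allowed)}"
        )
    return cleaned


for raw in ["male", "Male", "MALE", "MalE"]:
    print(normalize_choice(raw, "sex", {"male", "female"}))  # male each time

try:
    normalize_choice("unknown", "sex", {"male", "female"})
except ValueError as err:
    print(err)  # the message names both 'unknown' and 'sex'
```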

NOTES:
1. In the final step there is a rule to convert from raw score to percentile.  In that conversion, 17-25 maps to 59-88% and ≥26 maps to >99%.  For our purposes, we want these to be specific percentage outputs.  Use the following rule:
 * If score is between 17 and 25, percentile should be 0.59
 * If score is greater than or equal to 26, percentile should be 0.99
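
One way to express the score-to-risk conversion is a threshold table searched with the standard-library `bisect` module. This is a sketch rather than the required implementation: the name `score_to_risk` is made up here, and the per-score percentages are the ones used in the solution cell that follows, with 17-25 pinned to 0.59 and 26 or more pinned to 0.99 per the rule above.

```python
from bisect import bisect_right

# Lowest raw score of each band, and the risk assigned to that band.
BAND_STARTS = [0, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 26]
BAND_RISKS = [0.01, 0.02, 0.03, 0.09, 0.15, 0.18, 0.22, 0.26, 0.29,
              0.34, 0.38, 0.46, 0.47, 0.49, 0.55, 0.59, 0.99]


def score_to_risk(points):
    """Map a raw PRIEST point total to a risk fraction between 0 and 1."""
    if points < 0:
        raise ValueError(f"Invalid input for points: {points!r}")
    # bisect_right finds the first band that starts above `points`;
    # the band just before it is the one `points` falls into.
    return BAND_RISKS[bisect_right(BAND_STARTS, points) - 1]


print(score_to_risk(13))  # 0.46
print(score_to_risk(23))  # 0.59
print(score_to_risk(26))  # 0.99
```

Keeping the thresholds in data rather than in a long `if`/`elif` chain makes the table easier to check against MDCalc line by line.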


In [1]:
def priest(sex, age, resp_rate, oxyg_sat, heart_rate, systolBP, temp, alertness, insprd_oxyg, perf_stat):
    """ (string, int, int, int/float, int, int/float, float, string, string, string) -> float
    This function takes 10 input variables and outputs a risk score between 0 and 1.
    The function follows the algorithm provided on the MDCalc website (link below)
    to compute a risk percentage that is returned as a numeric value between 0 and 1.
    
    (MDCalc website... https://www.mdcalc.com/priest-covid-19-clinical-severity-score#evidence)
    
    Assumption: the input values that are strings could be any combination of upper or lower case. 
    For example: 'male', 'Male', 'MALE', 'MalE' are all interpreted as male.    
    If any of the inputs are invalid (for example a sex value that is not recognizable as male or female) the code 
    raises a ValueError with a message describing the invalid input.
    Note: 
    1) oxyg_sat accepts a fraction (0-1) or a percentage (1-100); systolBP accepts both int & float values.
    2) temp parameter accepts both celsius and fahrenheit values.
    
    >>> priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=219, temp=94, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care")
    0.59
    
    >>> priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=220, temp=94, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care")
    0.99
    
    >>> priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=96, heart_rate=150, systolBP=220, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care")
    0.49
    
    """
  
    #initialize points
    points = 0
    
    #gender (female=0; male=1)    
    if sex.lower() == "female":
        points += 0
    elif sex.lower() == "male":
        points += 1
    else:
        #raise an error that names the parameter and the unrecognized value
        raise ValueError(f"Invalid input for sex: {sex!r}. Valid inputs for sex are: Male, Female (values are not case sensitive)")

    #age in years (does not include the age<16 category)
    if 16 <= int(age) <= 49:
        points += 0
    elif 50 <= int(age) <= 65:
        points += 2
    elif 66 <= int(age) <= 80:
        points += 3       
    elif int(age) > 80:
        points += 4       
    else:
        #raise an error for age values less than 16 years (the original study included only people aged 16+ years) 
        raise ValueError(f"Invalid input for age: {age!r}. Please input integer values for age (in years) greater than or equal to 16 years")
    
    #Respiratory rate (breaths/min)
    if int(resp_rate) < 9:
        points += 3
    elif 9 <= int(resp_rate) <=11:
        points += 1
    elif 12 <= int(resp_rate) <=20:
        points += 0       
    elif 21 <= int(resp_rate) <=24:
        points += 2             
    elif int(resp_rate) > 24:
        points += 3    
    else:
        #this would throw an error for unrecognizable values         
        raise ValueError(f"""Invalid input for respiratory rate: {resp_rate!r}. Please input integer values for respiratory rate (breaths/min) 
                        in the following ranges: <9, 9-11, 12-20, 21-24, >24""")

    #Oxygen saturation: accepts a fraction (0-1] or a percentage (1-100]
    oxySAT = float(oxyg_sat)
    if 0 < oxySAT <= 1:
        #fractions are converted to a whole-number percentage
        oxySAT = round(oxySAT * 100)
    elif not (1 < oxySAT <= 100):
        #raise an error for values outside both accepted ranges
        raise ValueError(f"Invalid input for oxyg_sat: {oxyg_sat!r}. Please input oxygen saturation as a fraction between 0 and 1 or as a percentage between 1 and 100.")

    if oxySAT < 92:
        points += 3
    elif oxySAT <= 93:
        points += 2
    elif oxySAT <= 95:
        points += 1
    else:
        points += 0

    #Heart rate (beats/min)
    if int(heart_rate) < 41:
        points += 3
    elif 41 <= int(heart_rate) <= 50:
        points += 1
    elif 51 <= int(heart_rate) <= 90:
        points += 0       
    elif 91 <= int(heart_rate) <= 110:
        points += 1       
    elif 111 <= int(heart_rate) <= 130:
        points += 2         
    elif int(heart_rate) > 130:
        points += 3
    else:
        #this would throw an error for unrecognizable values        
        raise ValueError("""Invalid input for Heart rate! Please input integer values for heart rate (beats/min). Range of values can be: 
                            <41, 41-50, 51-90, 91-110, 111-130, >130""")
  
    #Systolic BP (mmHg): accepts int or float values
    sysBP = float(systolBP)
    if sysBP <= 0:
        #raise an error for values that cannot be a blood pressure reading
        raise ValueError(f"Invalid input for systolBP: {systolBP!r}. Please input systolic BP (mmHg) as a positive integer or decimal value.")
    elif sysBP < 91:
        points += 3
    elif sysBP < 101:
        points += 2
    elif sysBP < 111:
        points += 1
    elif sysBP <= 219:
        points += 0
    else:
        points += 3
        
    #Temperature (acceptable ranges: 20°C-43°C or 68°F-109°F)
    temps = float(temp)
    if (20.00 <= round(temps,2) < 35.10) or (68.00 <= round(temps,2) < 95.18):
        points += 3
    elif (35.10 <= round(temps,2) <= 36.09) or (95.18 <= round(temps,2) <= 96.80):
        points += 1
    elif (36.10 <= round(temps,2) <= 38.09) or (96.80 < round(temps,2) <= 100.40):
        points += 0      
    elif (38.10 <= round(temps,2) <= 39.00) or (100.40 < round(temps,2) <= 102.20):
        points += 1         
    elif (39.00 < round(temps,2) <= 43.00) or (102.20 < round(temps,2) <= 109.00):
        points += 2
    else:
        #this would throw an error for unrecognizable values
        raise ValueError(f"""Invalid input for body temperature: {temp!r}. Please input temperature values in the range 20°C to 43°C (for the Celsius scale) OR 68°F to 109°F (for the Fahrenheit scale)""")
        #the Celsius (20°C-43°C) and Fahrenheit (68°F-109°F) ranges do not overlap, so a value's scale is inferred from its magnitude
        
    #Alertness (alert=0; confused or not alert=3)    
    if alertness.lower() == "alert":
        points += 0
    elif alertness.lower() == "confused or not alert" or (alertness.lower() == "confused") or (alertness.lower() == "not alert"):
        points += 3
    else:
        #this would throw an error for unrecognizable values        
        raise ValueError("""Invalid input for Alertness! Valid inputs for alertness are: 'alert' OR 'confused, not alert' (values are not case sensitive).""")
        
    #Inspired oxygen (air=0; supplemental oxygen=2)    
    if insprd_oxyg.lower() == "air":
        points += 0
    elif insprd_oxyg.lower() == "supplemental oxygen":       
        points += 2
    else:
        #this would throw an error for unrecognizable values        
        raise ValueError("""Invalid input for Inspired oxygen! Valid inputs for Inspired oxygen are: air, supplemental oxygen (values are not case sensitive).""")
                
    #Performance Status (i.e. the ability to perform normal activities)
    category1 = "Unrestricted normal activity"
    category2 = "Limited strenuous activity, can do light activity"
    category3 = "Limited activity, can self-care"
    category4 = "Limited self-care"
    category5 = "Bed/chair bound, no self-care"
    
    if perf_stat.lower() == category1.lower():
        points += 0
    elif perf_stat.lower() == category2.lower():
        points += 1
    elif perf_stat.lower() == category3.lower():
        points += 2        
    elif perf_stat.lower() == category4.lower():
        points += 3            
    elif perf_stat.lower() == category5.lower():
        points += 4           
    else:
        #this would throw an error for unrecognizable values   
        raise ValueError(f"""Invalid input for perf_stat: {perf_stat!r}. Please input valid values for Performance Status...
                         ["Unrestricted normal activity", "Limited strenuous activity, can do light activity",
                         "Limited activity, can self-care", "Limited self-care", "Bed/chair bound, no self-care"]""")
    #return points
        
    #initialize priest risk score
    risk_score = 0
    
    if 0 <= points <= 1:
        risk_score = 0.01
    elif 2 <= points <= 3:
        risk_score = 0.02        
    elif points == 4:
        risk_score = 0.03
    elif points == 5:
        risk_score = 0.09       
    elif points == 6:
        risk_score = 0.15                         
    elif points == 7:
        risk_score = 0.18
    elif points == 8:
        risk_score = 0.22
    elif points == 9:
        risk_score = 0.26       
    elif points == 10:
        risk_score = 0.29     
    elif points == 11:
        risk_score = 0.34     
    elif points == 12:
        risk_score = 0.38             
    elif points == 13:
        risk_score = 0.46             
    elif points == 14:
        risk_score = 0.47             
    elif points == 15:
        risk_score = 0.49     
    elif points == 16:
        risk_score = 0.55             
    elif 17 <= points <= 25:
        risk_score = 0.59    
    elif points >= 26:
        risk_score = 0.99        
    else: 
        raise ValueError("""Priest Risk Score cannot be Calculated! Please check your input values! 
                            See the following link for more Info: https://www.mdcalc.com/priest-covid-19-clinical-severity-score#evidence""")
        
    return risk_score
    
 #["unrestricted normal activity", "Limited strenuous activity, can do light activity",n\
 #"Limited activity, can self-care", "Limited self-care", "Bed/chair bound, no self-care"]    

In [2]:
#Quick Checks
print("-------Tests/Examples------")
print(priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=219, temp=94, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 23 points; >59% 30-day prob of adverse outcome (using MDCalc)

print(priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=220, temp=94, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 26 points; >99% 30-day prob of adverse outcome (using MDCalc)

print(priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=96, heart_rate=150, systolBP=110.9, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 13 points; >46% 30-day prob of adverse outcome (using MDCalc)

print(priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=96, heart_rate=150, systolBP=220, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 15 points; >49% 30-day prob of adverse outcome (using MDCalc)


-------Tests/Examples------
0.59
0.99
0.46
0.49


In [3]:
#More Checks
print("-------oxygen saturation checks (float vs. int)------")
#both int & float values should return the same priest score
#oxyg_sat=93 (given as a percentage)
print(priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=93, heart_rate=150, systolBP=110.9, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 15 points; >49% 30-day prob of adverse outcome (using MDCalc)

#oxyg_sat=0.93 (given as a fraction)
print(priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=0.93, heart_rate=150, systolBP=110.9, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 15 points; >49% 30-day prob of adverse outcome (using MDCalc)

print("-------temp checks (°Celsius vs. °Fahrenheit)------")
#both scales should return the same priest score
#temp=95.18°F
print(priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=220, temp=95.18, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 24 points; >59% 30-day prob of adverse outcome (using MDCalc)

#temp=35.1°C
print(priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=220, temp=35.1, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 24 points; >59% 30-day prob of adverse outcome (using MDCalc)

print("-------Systolic BP checks (float vs. int)------")
#both int & float values should return the same priest score
print(priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=96, heart_rate=150, systolBP=220, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 15 points; >49% 30-day prob of adverse outcome (using MDCalc)

print(priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=96, heart_rate=150, systolBP=220.00, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care"))
#priest score = 15 points; >49% 30-day prob of adverse outcome (using MDCalc)

-------oxygen saturation checks (float vs. int)------
0.49
0.49
-------temp checks (°Celsius vs. °Fahrenheit)------
0.59
0.59
-------Systolic BP checks (float vs. int)------
0.49
0.49


In [4]:
import doctest
doctest.run_docstring_examples(priest, globals(), verbose=True)

Finding tests in NoName
Trying:
    priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=219, temp=94, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care")
Expecting:
    0.59
ok
Trying:
    priest(sex="Male", age=80, resp_rate=24, oxyg_sat=80, heart_rate=130, systolBP=220, temp=94, alertness="not alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care")
Expecting:
    0.99
ok
Trying:
    priest(sex="feMale", age=18, resp_rate=20, oxyg_sat=96, heart_rate=150, systolBP=220, temp=24, alertness="alert", insprd_oxyg="Supplemental oxygen", perf_stat="Bed/chair bound, no self-care")
Expecting:
    0.49
ok


## Part 2: Find a hospital

The next thing we have to do is figure out where to send this particular patient.  The guidelines on where to send a patient are based on their age (pediatric, adult, geriatric), sex, and risk percentage.  Luckily, you don't have to implement these rules. I already have. All you have to do is use a REST web service that I've created for you.

You'll want to use Python to make a call to my REST web service similar to the example URL below. The first part of the URL will be the same for everyone and every request that you make. What you will need to modify for each of your requests is the information after the question mark.

```
https://oumdj6oci2.execute-api.us-east-1.amazonaws.com/prd/?age=40&sex=male&risk_pct=0.1
```

The example above asks my web service where a 40-year-old male with a risk percentage of 10% should go.  The web service returns a JSON string containing the information you need.  That JSON will look like this:

```json
{
  "age": "40",
  "sex": "male",
  "risk": "0.1",
  "hospital": "Southwest Hospital and Medical Center"
}
```
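
The two halves of this exchange, building the query string and reading the reply, can be tried without touching the network. In the sketch below the helper name `build_url` is an assumption, `urlencode` from the standard library stands in for hand-built string formatting, and `sample_response` is simply the example JSON above pasted in as text.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://oumdj6oci2.execute-api.us-east-1.amazonaws.com/prd/"


def build_url(age, sex, risk):
    """Append the three query parameters after the question mark."""
    query = urlencode({"age": age, "sex": sex, "risk_pct": risk})
    return f"{BASE_URL}?{query}"


# The example reply from above, as the raw text the service would send back.
sample_response = (
    '{"age": "40", "sex": "male", "risk": "0.1", '
    '"hospital": "Southwest Hospital and Medical Center"}'
)

print(build_url(40, "male", 0.1))
print(json.loads(sample_response)["hospital"])  # Southwest Hospital and Medical Center
```

`urlencode` also escapes characters such as spaces, which matters if a parameter value is ever more than a single word.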

1. Your job is to write a function called **find_hospital()** that takes age, sex, and risk as parameters.
2. Your function should call this REST web service
3. Then your function will need to interpret the JSON it gets and return just the name of the hospital
4. If anything fails, return None without raising any exceptions
5. Include a good docstring with at least five test cases.
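
Requirement 4 means every failure (network error, bad JSON, missing field) has to collapse into `None`. A minimal sketch of that pattern follows; it assumes a hypothetical `fetch_text` callable is passed in so the example runs offline. In a real solution, something like `requests.get(url).text` would play that role.

```python
import json


def hospital_or_none(fetch_text, url):
    """Return the 'hospital' field from the JSON at `url`, or None on any failure."""
    try:
        return json.loads(fetch_text(url)).get("hospital")
    except Exception:
        # Network errors, invalid JSON, and unexpected shapes all end up here.
        return None


# Hypothetical stand-ins for the web service, used only for this sketch.
def fake_ok(url):
    return '{"hospital": "Example Hospital"}'


def fake_down(url):
    raise ConnectionError("service unreachable")


def fake_garbage(url):
    return "<html>not json</html>"


print(hospital_or_none(fake_ok, "http://example.invalid"))       # Example Hospital
print(hospital_or_none(fake_down, "http://example.invalid"))     # None
print(hospital_or_none(fake_garbage, "http://example.invalid"))  # None
```

Catching `Exception` rather than using a bare `except:` still lets `KeyboardInterrupt` stop a long-running notebook cell.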


In [5]:
import requests
import json

def find_hospital(age, sex, risk):
    """(int, str, float)-> str
    Use the professor's REST web service to lookup and return the
    name of the hospital where this person should be sent.
    
    >>> find_hospital(40, 'male', 0.1)
    'Southwest Hospital and Medical Center'
    
    >>> find_hospital(40, 'female', 0.1)
    'Select Specialty Hospital - Northeast Atlanta'
    
    >>> find_hospital(60, 'male', 0.5)
    'Emory Dunwoody Medical Center'
    
    >>> find_hospital(20, 'female', 0.9)
    'Emory Dunwoody Medical Center'
    
    >>> find_hospital(90, 'male', 0.9)
    'Wesley Woods Geriatric Hospital'
    
    """
    
    url = f'https://oumdj6oci2.execute-api.us-east-1.amazonaws.com/prd/?age={age}&sex={sex}&risk_pct={risk}'
    
    try:
        r = requests.get(url)
        info = r.json()
        hospital = info.get('hospital')
    except Exception:
        #per the instructions, return None on any failure instead of raising
        hospital = None
    
    return hospital

In [6]:
#Quick Checks
print(find_hospital(40, 'male', 0.1))
print(find_hospital(40, 'female', 0.1))
print(find_hospital(60, 'male', 0.5))
print(find_hospital(20, 'female', 0.9))
print(find_hospital(90, 'male', 0.9))

Southwest Hospital and Medical Center
Select Specialty Hospital - Northeast Atlanta
Emory Dunwoody Medical Center
Emory Dunwoody Medical Center
Wesley Woods Geriatric Hospital


In [7]:
import doctest
doctest.run_docstring_examples(find_hospital, globals(), verbose=True)

Finding tests in NoName
Trying:
    find_hospital(40, 'male', 0.1)
Expecting:
    'Southwest Hospital and Medical Center'
ok
Trying:
    find_hospital(40, 'female', 0.1)
Expecting:
    'Select Specialty Hospital - Northeast Atlanta'
ok
Trying:
    find_hospital(60, 'male', 0.5)
Expecting:
    'Emory Dunwoody Medical Center'
ok
Trying:
    find_hospital(20, 'female', 0.9)
Expecting:
    'Emory Dunwoody Medical Center'
ok
Trying:
    find_hospital(90, 'male', 0.9)
Expecting:
    'Wesley Woods Geriatric Hospital'
ok


## Part 3: Get the address for that hospital from a webpage

Great! Now we have code to tell us which hospital to send someone to... but we don't know where that hospital is. The next function we need to create is one that looks up the address of that hospital.  All of these hospitals are in Atlanta, Georgia.  We're going to use the list on this webpage to lookup the address for that hospital, based on its name.

https://www.officialusa.com/stateguides/health/hospitals/georgia.html

1. You need to create a function called **get_address()** that takes hospital name as a parameter and searches the data on the webpage above to find the address for that hospital.
2. I've said that all the hospitals are in Atlanta, but this webpage has hospitals from all of Georgia.  So, make sure you verify that the row of data you're using is in Atlanta, just in case there are hospitals with the same name in different cities.
3. Your code will have to parse the HTML on the webpage and turn that into some kind of data structure that you can search through to find the right hospital.
4. If you do find more than one matching hospital in Atlanta with the same name, you should raise a KeyError.
5. If the hospital name isn't found, the function should raise a KeyError.
6. Be sure to use good docstring documentation and include at least 3 test cases.
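
The parse-then-search flow in steps 2 through 5 can be sketched on a tiny inline table. To keep the example runnable without third-party packages it uses the standard library's `html.parser` instead of BeautifulSoup. The class `TableRows`, the function `lookup_address`, the sample rows, and the assumed column order (city, name, an unnamed third column, address) are all made up for illustration and are not taken from the real webpage.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # cells of the row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:       # header rows have no <td> cells, so skip them
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data


def lookup_address(rows, hospital_name, city="ATLANTA"):
    """Return the single matching address, or raise KeyError for 0 or 2+ matches."""
    matches = [r[3].strip() for r in rows
               if len(r) > 3
               and r[0].strip().upper() == city
               and r[1].strip().lower() == hospital_name.strip().lower()]
    if len(matches) != 1:
        raise KeyError(f"{len(matches)} matches for {hospital_name!r} in {city}")
    return matches[0]


# Made-up placeholder rows; the same name appears in two different cities.
SAMPLE_HTML = """
<table id="myTable">
<tr><th>City</th><th>Hospital</th><th>Type</th><th>Address</th></tr>
<tr><td>ATLANTA</td><td>EXAMPLE GENERAL</td><td>Acute</td><td>1 SAMPLE ST</td></tr>
<tr><td>MACON</td><td>EXAMPLE GENERAL</td><td>Acute</td><td>9 OTHER RD</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
print(lookup_address(parser.rows, "Example General"))  # 1 SAMPLE ST
```

Filtering on city and name before counting matches means a same-named hospital elsewhere in Georgia does not trigger the duplicate error.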

In [8]:
from bs4 import BeautifulSoup
import requests
import json

def get_address(hospital_name):
    """ (str) -> str
    This function takes a hospital name as a parameter and searches the data on the webpage below to find the address for that hospital.
    https://www.officialusa.com/stateguides/health/hospitals/georgia.html
    
    If there is more than one matching hospital in Atlanta with the same name, the function would raise a KeyError.
    If the hospital name isn't found, the function would raise a KeyError.

    >>> get_address('SOUTHWEST HOSPITAL AND MEDICAL CENTER')
    '501 FAIRBURN ROAD SW'

    >>> get_address('EMORY DUNWOODY MEDICAL CENTER')
    '4500 NORTH SHALLOWFORD ROAD'

    >>> get_address('WELLSTAR ATLANTA MEDICAL CENTER')
    '303 PARKWAY DRIVE NE'

    >>> get_address('EMORY UNIVERSITY HOSPITAL MIDTOWN')
    '550 PEACHTREE ST NE'

    >>> get_address('HUGHES SPALDING CHILDRENS HOSPITAL')
    '1711 TULLIE CIRCLE NE'
     
    """
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 11_2_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36" }

    page = requests.get("https://www.officialusa.com/stateguides/health/hospitals/georgia.html", headers=headers)
    #print(page.status_code) #successful!
   
    soup = BeautifulSoup(page.content, 'html.parser')
    
    #scrape every table row; cell text lives in the 'td' tags (header rows have none)
    tblData = []
    for tblRow in soup.find_all("tr"):
        tblData.append([cell.get_text() for cell in tblRow.select("td")])

    #collect the addresses of Atlanta hospitals whose name matches (not case sensitive)
    #column positions: city = 0, hospital name = 1, address = 3
    matches = []
    for item in tblData:
        if len(item) > 3 and item[0] == "ATLANTA" and item[1].lower() == hospital_name.lower():
            matches.append(item[3])

    #if the hospital name isn't found in Atlanta, raise a KeyError
    if len(matches) == 0:
        raise KeyError(f"Hospital not found in Atlanta: {hospital_name}")

    #if there are multiple Atlanta hospitals with that name, raise a KeyError
    if len(matches) > 1:
        raise KeyError(f"Multiple hospitals in Atlanta with that name: {hospital_name}")

    return matches[0]


In [9]:
#Quick Checks
print(get_address('SOUTHWEST HOSPITAL AND MEDICAL CENTER'))
print(get_address('EMORY DUNWOODY MEDICAL CENTER'))
print(get_address('WELLSTAR ATLANTA MEDICAL CENTER'))
print(get_address('EMORY UNIVERSITY HOSPITAL MIDTOWN'))
print(get_address('HUGHES SPALDING CHILDRENS HOSPITAL'))

get_address('SOUTHWEST HOSPITAL AND MEDICAL CENTER')

#'SOUTHWEST HOSPITAL AND MEDICAL CENTER': ['501 FAIRBURN ROAD SW'],
#'EMORY DUNWOODY MEDICAL CENTER': ['4500 NORTH SHALLOWFORD ROAD']
# 'WELLSTAR ATLANTA MEDICAL CENTER': ['303 PARKWAY DRIVE NE'],
# 'EMORY UNIVERSITY HOSPITAL MIDTOWN': ['550 PEACHTREE ST NE'],
# 'HUGHES SPALDING CHILDRENS HOSPITAL': ['1711 TULLIE CIRCLE NE'],

501 FAIRBURN ROAD SW
4500 NORTH SHALLOWFORD ROAD
303 PARKWAY DRIVE NE
550 PEACHTREE ST NE
1711 TULLIE CIRCLE NE


'501 FAIRBURN ROAD SW'

In [10]:
import doctest
doctest.run_docstring_examples(get_address, globals(), verbose = True)

Finding tests in NoName
Trying:
    get_address('SOUTHWEST HOSPITAL AND MEDICAL CENTER')
Expecting:
    '501 FAIRBURN ROAD SW'
ok
Trying:
    get_address('EMORY DUNWOODY MEDICAL CENTER')
Expecting:
    '4500 NORTH SHALLOWFORD ROAD'
ok
Trying:
    get_address('WELLSTAR ATLANTA MEDICAL CENTER')
Expecting:
    '303 PARKWAY DRIVE NE'
ok
Trying:
    get_address('EMORY UNIVERSITY HOSPITAL MIDTOWN')
Expecting:
    '550 PEACHTREE ST NE'
ok
Trying:
    get_address('HUGHES SPALDING CHILDRENS HOSPITAL')
Expecting:
    '1711 TULLIE CIRCLE NE'
ok


## Part 4: Run the risk calculator on a population

The `/data` folder has a file called `people.psv`.  It is a pipe-delimited (`|`) file with columns that match the inputs for the PRIEST calculation above.

In addition, the file has a patient identifier in the first column.

1. Write a function called **process_people()** that takes a file name as a parameter. Your Python program should use your code above to process all of these rows, determine the hospital and address, and return a list whose items are a dictionary like `{ patient_number: [sex, age, breath, o2sat, heart, systolic, temp, alertness, inspired, status, hospital, address]}`
2. Be sure to use good docstrings, but you don't need any tests in your doc strings.  I've provided those for you below.


**NOTE** that when running your code for all the 100 records in the `people.psv` file, it may take a few minutes to complete.  You're making multiple calls to the internet for each record, so that can take a little while.
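
Reading a pipe-delimited file only requires passing `delimiter='|'` to `csv.reader`. The sketch below uses an in-memory string in place of `people.psv`; the column names and the two rows are made up, and it assumes the file starts with a header row. The helper name `read_patients` is hypothetical.

```python
import csv
import io

# Made-up stand-in for people.psv: a header row, then one patient per line.
SAMPLE_PSV = (
    "patient|sex|age|breath\n"
    "E001|FEMALE|40|18\n"
    "E002|male|71|24\n"
)


def read_patients(handle):
    """Return {patient_id: [sex, age, breath]} from a pipe-delimited file object."""
    reader = csv.reader(handle, delimiter="|")
    next(reader)                        # skip the header row
    patients = {}
    for row in reader:
        patient_id, sex, age, breath = row
        # csv yields strings, so numeric columns are converted explicitly
        patients[patient_id] = [sex, int(age), int(breath)]
    return patients


print(read_patients(io.StringIO(SAMPLE_PSV)))
# {'E001': ['FEMALE', 40, 18], 'E002': ['male', 71, 24]}
```

With a real file, the same function would be called as `read_patients(open(path, newline=""))` inside a `with` block.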


In [11]:
import csv

def process_people(file_name):
    """ (filepath) -> dict 
    This function takes the name of a pipe-delimited file as a parameter, computes each patient's PRIEST risk,
    determines the hospital and address, and returns a dictionary like 
    {patient_number: [sex, age, breath, o2sat, heart, systolic, temp, alertness, inspired, status, risk, hospital, address]}
    with the patient id as key.
    """
    
    #read in file as csv (wk6 notes)
    ##standard syntax is... with open(filepath+filename, 'r') as csvfile: 
    with open(file_name,'r') as dfl:
        #create a csv reader object
        csvreader = csv.reader(dfl, delimiter='|')

        #dictionary of results, keyed by patient id
        pat_dict = {}
        
        #skip the header row
        next(csvreader)

        #process each data row one by one
        for item in csvreader:

            #assign names to columns
            pat_id = item[0]
            sex = item[1]
            age = int(item[2])
            resp_rate = int(item[3])
            oxyg_sat = float(item[4])
            heart_rate = int(item[5])
            systolBP = int(item[6])
            temp = float(item[7])
            alertness = item[8]
            insprd_oxyg = item[9]
            perf_stat = item[10]

            #assign the needed functions to python objects
            risk = priest(sex, age, resp_rate, oxyg_sat, heart_rate, systolBP, temp, alertness, insprd_oxyg, perf_stat)
            hospital = find_hospital(age, sex, risk)
            address = get_address(hospital)     

            pat_dict[pat_id] = [sex, age, resp_rate, oxyg_sat, heart_rate, systolBP, temp, alertness, insprd_oxyg, perf_stat, float(risk), hospital, address]
        
        return pat_dict 
    
#notes to self...
#https://www.geeksforgeeks.org/working-csv-files-python/
#https://www.codegrepper.com/code-examples/python/skip+header+in+csv+python

In [12]:
#Testing (Note: uncomment this for testing purposes...)
#process_people('/data/people.psv')     

##### Note: This output is quite long. To test the "process_people" function, please uncomment the code above.

## Part 5: Checking your final results

The final step is to check your results.  You should be able to compare your results to the output in `people_results.json` in the `/data` folder.  Write some code to check your results.  This does not need to be a function.
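
A row-by-row comparison is more informative than a single equality test, because it shows which patients differ. The following sketch assumes both results are dictionaries keyed by patient id; `compare_results` and the two small dictionaries are hypothetical examples, not the contents of `people_results.json`.

```python
def compare_results(expected, actual):
    """Return (matched_count, mismatches) for two dicts keyed by patient id."""
    mismatches = {}
    for key, want in expected.items():
        got = actual.get(key)       # None when the key is missing entirely
        if got != want:
            mismatches[key] = (want, got)
    return len(expected) - len(mismatches), mismatches


# Made-up data: the second patient was sent to a different hospital.
expected = {"E001": ["female", 40, "home"], "E002": ["male", 71, "Example Hospital"]}
actual = {"E001": ["female", 40, "home"], "E002": ["male", 71, "Other Hospital"]}

matched, mismatches = compare_results(expected, actual)
print(f"{matched} of {len(expected)} rows match")  # 1 of 2 rows match
for key, (want, got) in mismatches.items():
    print(key, "expected:", want, "got:", got)
```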

In [13]:
#Title: Checking the Final Results...

import json

#step 1: read in the correct results file (file_name = '/data/people_results.json')
with open("/data/people_results.json") as file_1:
    file_1_results = json.load(file_1)

#step 2: call the process_people function on the people.psv file
file_2_procOutput = process_people('/data/people.psv') 

    
#step 3: check if both files match
#3.1 compare the two results as a whole
if file_1_results == file_2_procOutput:
    print("Successful! Both files are matched!")
else:
    print("Sorry, files don't match! Please check your inputs.")

#3.2 if the files aren't fully matched, determine how many matched rows
count = 0 #initialize counter
length = len(file_1_results) #get the number of rows in the results file 

#iterate through each row & count the instances where both files are matched
for rows in file_1_results.keys():
    if file_1_results[rows] == file_2_procOutput.get(rows):
        count += 1
print(f'There are {count} matched rows out of a total of {length} rows!')

#3.3 find and print the mismatches (expected value first, computed value second)
for rows in file_1_results.keys():
    if file_1_results[rows] != file_2_procOutput.get(rows):
        print(rows, file_1_results[rows], file_2_procOutput.get(rows))
        
#notes to self...
#https://www.geeksforgeeks.org/compare-two-files-line-by-line-in-python/

Successful! Both files are matched!
There are 99 matched rows out of a total of 99 rows!


---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

In order to submit your work, you'll need to use the `git` command line program to **add** your homework file (this file) to your local repository, **commit** your changes to your local repository, and then **push** those changes up to github.com.  From there, I'll be able to **pull** the changes down and do my grading.  I'll provide some feedback, **commit** and **push** my comments back to you.  Next week, I'll show you how to **pull** down my comments.

To run through everything one last time and submit your work:
1. Use the `Kernel` -> `Restart Kernel and Run All Cells` menu option to run everything from top to bottom and stop here.
2. Follow the instructions on the prompt below to either save and submit your work, or continue working.

If anything fails along the way with this submission part of the process, let me know.  I'll help you troubleshoot.

---

In [None]:
a=input('''
Are you ready to submit your work?
1. Click the Save icon (or do Ctrl-S / Cmd-S)
2. Type "yes" or "no" below
3. Press Enter

''')

if a=='yes':
    !git add "midterm-2022.ipynb"
    !git commit -a -m "Submitting the midterm"
    !git push
else:
    print('''
    
OK. We can wait.
''')