# Week 12 - Earn-Back Points Assignment #2

These exercises are entirely optional, but they provide good practice, and you can use them to earn extra points toward your semester grade. Completing all the questions in this assignment correctly will earn you back 8 points.

There will be 2 more assignments like this between now and the end of the semester, giving you the opportunity to earn back a total of 32 points.

**If anything about the above rules is unclear, please message me on Canvas or via email**

---

## Introduction

The Centers for Medicare and Medicaid Services (CMS) provides lots of information online, including a general directory of hospitals in the US. For this set of exercises, we'll be working with a file referred to as [Hospital General Information](https://data.cms.gov/provider-data/dataset/xubh-q36u). **Download this file as a CSV and upload it to your week 13 directory on Jupyter.**

Each of these exercises asks a specific question, and you will submit your answer to it. Your answers must be computed using Python code within this notebook to earn full credit.

You do not need to write functions to compute the answers and do not need to provide any special documentation. You can simply calculate the answers inline in the notebook and then submit your answers using the `answers` dictionary, similar to how most of our part 1 assignments work.


In [1]:
import pandas as pd
answers = {}
df = pd.read_csv('Hospital_General_Information.csv')

---

## Tips

Before you get started, I want to show you a pattern that you might find useful. In the example below, I'm going to summarize a simple data frame: determine which name occurs most often, how often it occurs, and what percent of the total that represents. This is a useful general pattern, and you should be able to apply it in the exercises below.

In [2]:
# Use a separate name (people) so the hospital data loaded into df above is not overwritten
people = pd.DataFrame([
    ['Boal','Paul',45],
    ['Boal','Anny',47],
    ['Boal','James',75],
    ['Lester','Sarahlynn',48],
    ['Lester','Carolynn',70]
], columns=['Last Name','First Name','Age'])

people

|   | Last Name | First Name | Age |
|---|-----------|------------|-----|
| 0 | Boal | Paul | 45 |
| 1 | Boal | Anny | 47 |
| 2 | Boal | James | 75 |
| 3 | Lester | Sarahlynn | 48 |
| 4 | Lester | Carolynn | 70 |


In [3]:
# Which family (based on Last Name) has the most people?

# 1. Group by Last Name
# 2. Count how many people are in each family
# 3. Sort by value


by_last_name = people.groupby('Last Name')
family_count = by_last_name['Last Name'].count()
family_sorted = family_count.sort_values(ascending=False)

family_sorted

Last Name
Boal      3
Lester    2
Name: Last Name, dtype: int64

In [4]:
# 4. Extract the "index" (aka Last Name)
# 5. Choose the first value

top_family = list(family_sorted.index)[0]
top_family

'Boal'

In [5]:
# How many members does that family have?

# 6. Choose that family from the counts we already computed.

family_count[top_family]

3

In [6]:
# What percent of total is that?

# 7. Compute a total
# 8. Compute the percent

total = family_count.sum()
pct = family_count[top_family] / total

pct

0.6
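The same steps can also be condensed with `value_counts()`, which counts each distinct value and sorts the result from most to least frequent, and `idxmax()`, which returns the label with the largest count. A minimal sketch on the same toy family data follows. Note that `idxmax()` returns only the first label when there is a tie, so the explicit "compare to the maximum" approach is safer when ties matter.

```python
import pandas as pd

# Same toy data as the Tips example, under a separate name
people = pd.DataFrame([
    ['Boal', 'Paul', 45],
    ['Boal', 'Anny', 47],
    ['Boal', 'James', 75],
    ['Lester', 'Sarahlynn', 48],
    ['Lester', 'Carolynn', 70],
], columns=['Last Name', 'First Name', 'Age'])

# Count rows per Last Name; value_counts sorts from largest to smallest
family_count = people['Last Name'].value_counts()

top_family = family_count.idxmax()     # label with the largest count
top_size = family_count[top_family]    # how many members that family has
pct = top_size / family_count.sum()    # share of all rows

# normalize=True returns proportions directly instead of raw counts
shares = people['Last Name'].value_counts(normalize=True)

print(top_family, top_size, pct, shares[top_family])
```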

---

### E12.01

How many columns are there in this data frame?

In [7]:
answers['E12.01'] = len(df.columns)

### E12.02

How many hospitals are there in this file? (Each row is one hospital.)

In [8]:
answers['E12.02'] = len(df)

### E12.03

How many hospitals from Missouri (state abbreviation MO) are in this file?

In [9]:
mo_hospitals = df[df['State'] == 'MO']
answers['E12.03'] = len(mo_hospitals)


### E12.04

How many different ZIP Codes from Missouri are represented in this file?
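Two common ways to count distinct values give different results when a column has missing entries. `len(series.unique())` counts a missing value as one of the distinct values, while `series.nunique()` skips missing values by default (pass `dropna=False` to include them). Here is a small sketch with made-up ZIP Codes. It makes no assumption about whether the real file has blanks in that column.

```python
import pandas as pd

# Hypothetical ZIP Codes, including one missing value
zips = pd.Series(['63103', '63110', '63103', None, '64108'])

with_missing = len(zips.unique())               # the missing value counts as its own entry
without_missing = zips.nunique()                # missing values dropped by default
including_missing = zips.nunique(dropna=False)  # missing values kept

print(with_missing, without_missing, including_missing)
```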

In [None]:
answers['E12.04'] = len(mo_hospitals['ZIP Code'].unique())

### E12.05

Which of those ZIP Codes has the most hospitals?  If it's a tie, submit your answer as a list of ZIP Codes.  Make sure your answer is submitted as a string or list of strings. Do not submit the ZIP Code values as numbers.
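ZIP Codes look numeric, so `read_csv` will usually parse them as integers, which drops leading zeros. One way to avoid that is to tell pandas to keep the column as text with the `dtype` argument. The sketch below reads a tiny in-memory CSV that stands in for the real file (only the `ZIP Code` column name comes from the exercise). It then collects every ZIP Code that ties for the highest count.

```python
import io
import pandas as pd

# Tiny stand-in for the real CSV; the rows are made up
csv_text = """Facility Name,ZIP Code
A,02115
B,02115
C,63110
D,63110
E,64108
"""

# Parsed as numbers, the leading zero is lost
as_numbers = pd.read_csv(io.StringIO(csv_text))
# Parsed as strings, the ZIP Codes are preserved exactly
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'ZIP Code': str})

counts = as_text['ZIP Code'].value_counts()
most = counts.max()
# Keep every ZIP Code whose count equals the maximum (handles ties)
tied = sorted(counts[counts == most].index)

print(as_numbers['ZIP Code'].tolist())
print(tied)
```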


In [None]:
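# Count hospitals per ZIP Code, sorted from most to fewest
mo_count = mo_hospitals['ZIP Code'].value_counts().sort_values(ascending=False)

# Find the highest count, then keep every ZIP Code that ties for it
most = mo_count.max()
top = list(mo_count[mo_count == most].index.astype(str))

# Submit a single string unless there is a tie
answers['E12.05'] = top[0] if len(top) == 1 else top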

In [None]:
mo_count = mo_hospitals['ZIP Code'].value_counts()

In [None]:
list(mo_count[mo_count==most].index.astype(str))

mo_count

In [None]:
list(mo_count)[0]

### E12.06

Which state has the most hospitals?

In [None]:
count_by_state = df.groupby('State')['State'].count()
# Sort descending and take the first index label (the state abbreviation), not the count
answers['E12.06'] = list(count_by_state.sort_values(ascending=False).index)[0]

### E12.07

How many different Hospital Types are there in this file?

In [None]:
answers['E12.07'] = ""

### E12.08

Which Hospital Type has the greatest number of hospitals?

In [None]:
answers['E12.08'] = ""

### E12.09

What percent of the total hospital count does that Hospital Type represent? Format your answer as ##.##%, rounded to two decimal places.
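Python's format specification can produce the `##.##%` form directly. The `%` presentation type multiplies the value by 100, rounds to the requested number of decimal places, and appends a percent sign. Here is a short sketch using placeholder counts, not values from the CMS file.

```python
# Placeholder counts, not taken from the real data
type_count = 2
total = 3

share = type_count / total    # 0.666...
formatted = f"{share:.2%}"    # multiply by 100, round to 2 places, add '%'

print(formatted)
```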

In [None]:
answers['E12.09'] = ""

### E12.10

What percent of all hospitals provide Emergency Services? Format your answer as ##.##%, rounded to two decimal places.
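A quick way to get the share of rows that meet a condition is to compare the column to a value and take the mean of the resulting `True`/`False` Series. This works because `True` counts as 1 and `False` as 0. The sketch assumes the Emergency Services column holds `'Yes'`/`'No'` strings. Check the actual values with `value_counts()` first, because the real file may encode them differently.

```python
import pandas as pd

# Hypothetical values; check the real column's encoding before relying on 'Yes'
hospitals = pd.DataFrame({'Emergency Services': ['Yes', 'No', 'Yes', 'Yes']})

# Inspect which values actually appear
print(hospitals['Emergency Services'].value_counts())

# Boolean Series: True where the hospital offers emergency services
has_er = hospitals['Emergency Services'] == 'Yes'
share = has_er.mean()         # fraction of True values
formatted = f"{share:.2%}"

print(formatted)
```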

In [None]:
answers['E12.10'] = ""

---

## Checking Your Work

After completing your work above and running each cell, you can check your answers by running the code below. 

The easiest way to do this is to use the `Kernel` -> `Restart Kernel and Run All Cells` menu option. This option restarts Python and runs every cell from top to bottom until it encounters an exception of some kind. It will stop after running the cell below and printing a summary of how many answers are correct or incorrect.
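The checking cell can fail before it ever contacts the grader. `json.dumps` cannot serialize NumPy scalar types such as `numpy.int64`, and pandas returns that type from lookups like `family_count[top_family]` or `.max()`. The minimal sketch below assumes your answers may contain such values and converts them to plain Python types with `.item()` before the payload is built. The `to_builtin` helper and the sample dictionary are hypothetical illustrations, not part of the grading code.

```python
import json
import numpy as np

# Hypothetical answers mixing plain Python values and NumPy scalars
answers = {
    'example_int': 3,
    'example_numpy': np.int64(5),
    'example_str': '12.34%',
}

def to_builtin(value):
    """Convert NumPy scalars (also inside lists) to built-in Python types."""
    if isinstance(value, np.generic):
        return value.item()
    if isinstance(value, list):
        return [to_builtin(v) for v in value]
    return value

try:
    json.dumps(answers)
except TypeError as err:
    print('Not serializable:', err)

clean = {key: to_builtin(val) for key, val in answers.items()}
payload = json.dumps(clean)
print(payload)
```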


In [None]:
import getpass
import boto3
import json

test = {
    "user": getpass.getuser(),
    "week": "week12EB",
    "answers": answers
}

client = boto3.client('lambda')

response = client.invoke(
    FunctionName="hds5210",
    InvocationType="RequestResponse",
    Payload=json.dumps(test))

result = json.loads(response['Payload'].read().decode('utf-8'))
# print(result)

try:
    print('{0:>7}{1:>30}{2:>10}'.format('Q#','Yours','Correct?'))
    for row in result.get('results'):
        print('{0:>7}{1:>30}{2:>10}'.format(str(row[0]),str(row[1]),str(row[2])))
except Exception:
    # Unexpected response shape: show the raw result instead
    print(result)

## Submit your work to GitHub in your week 13 folder by 12/5 11:59 PM