<a href="https://colab.research.google.com/github/christophermalone/HLA311/blob/main/Module2_Part3B_Advanced.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 2 | Part 3B: Summary - Advanced Level 

The purpose of this IPython notebook is to show how a data scientist would obtain basic summaries of a data table.

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Example #1 - Uninsured Rates 

For this example, we will reconsider the Small Area Health Insurance Estimates (SAHIE) data from the United States Census Bureau. These data provide counts of people with and without health insurance. This subset includes county-level information for people under the age of 19 (agecat = 4) who are in the poorest income category (iprcat = 3).

Source: https://www.census.gov/data/datasets/time-series/demo/sahie/estimates-acs.html 


Google Folder: <a href="https://drive.google.com/drive/folders/1t7TYoPlQggErulrxNw-t9v8B9VLvQ7jk?usp=sharing" target="_blank">Link to Data</a>

The following fields are included in this data table:
*   State_Name: state name
*   State_Abbreviation: abbreviation for state
*   County_Name: name of county
*   CountyName_State: Combination of County_Name field and State_Abbreviation
*   MedicareExpansion_Adopted_StateLevel: Whether the state had adopted Medicare expansion in the year the data were collected
*   agecat: category of age (see data dictionary)
*   racecat: category of race (see data dictionary)
*   sexcat: category of sex (see data dictionary)
*   iprcat: category of income (see data dictionary)
*   NumberInGroup: Number of people in this demographic group
*   Uninsured: Number of people uninsured in this demographic group
*   Insured: Number of people insured in this demographic group
*   PercentUninsured: Percent of people uninsured in this demographic group


<table width='100%' ><tr><td bgcolor='green'></td></tr></table>


<strong>Goal</strong>: Compute the percent of people under the age of 19 in the lowest income category who are uninsured.


### Load the data in Python

In [49]:
#Load the pandas package
import pandas as pd

In [50]:
#Use read_csv to read the comma-delimited file into Python
SAHIE_Children_Poorest = pd.read_csv('http://www.statsclass.org/online/hla311/datasets/SAHIE_CountyData_Children_Poorest.csv') 

In [51]:
#How many records (rows) and fields (columns)?
SAHIE_Children_Poorest.shape

(3142, 13)

In [52]:
#Look at first few rows of the data
SAHIE_Children_Poorest.head(n=5)

Unnamed: 0,State_Name,State_Abbreviation,County_Name,CountyName_State,MedicareExpansion_Adopted_StateLevel,agecat,racecat,sexcat,iprcat,NumberInGroup,Uninsured,Insured,PercentUninsured
0,Alabama,AL,Autauga County,"Autauga County, AL",No,4,0,0,3,3494.0,125.0,3369.0,3.6
1,Alabama,AL,Baldwin County,"Baldwin County, AL",No,4,0,0,3,11602.0,661.0,10941.0,5.7
2,Alabama,AL,Barbour County,"Barbour County, AL",No,4,0,0,3,2727.0,95.0,2632.0,3.5
3,Alabama,AL,Bibb County,"Bibb County, AL",No,4,0,0,3,1669.0,63.0,1606.0,3.8
4,Alabama,AL,Blount County,"Blount County, AL",No,4,0,0,3,4206.0,257.0,3949.0,6.1
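As a quick check on how the count fields relate to one another, the sketch below rebuilds the five rows printed above as a small stand-in table. The name `sample` is hypothetical and not part of the notebook; it only mirrors the `head()` output. The check confirms that `Uninsured + Insured` equals `NumberInGroup` and that `PercentUninsured` is the ratio rounded to one decimal place. Whether this holds for every row of the full table is not verified here.

```python
import pandas as pd

# Hypothetical stand-in: the five Alabama rows shown in the head() output above
sample = pd.DataFrame({
    "County_Name": ["Autauga County", "Baldwin County", "Barbour County",
                    "Bibb County", "Blount County"],
    "NumberInGroup": [3494.0, 11602.0, 2727.0, 1669.0, 4206.0],
    "Uninsured": [125.0, 661.0, 95.0, 63.0, 257.0],
    "Insured": [3369.0, 10941.0, 2632.0, 1606.0, 3949.0],
    "PercentUninsured": [3.6, 5.7, 3.5, 3.8, 6.1],
})

# Do the insured and uninsured counts add up to the group size in every row?
counts_add_up = (sample["Uninsured"] + sample["Insured"] == sample["NumberInGroup"]).all()

# Recompute the percent uninsured and compare it with the published column
recomputed = (100 * sample["Uninsured"] / sample["NumberInGroup"]).round(1)
percents_match = (recomputed - sample["PercentUninsured"]).abs().lt(0.05).all()

print("Counts add up:", counts_add_up)
print("Percents match:", percents_match)
```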


Next, install the dfply package, which can be used to invoke various data verbs in Python.

In [53]:
#Install dfply with the %pip magic so it goes into the notebook's own environment
%pip install dfply



In [54]:
#Load the dfply package
from dfply import *

### Summarize and obtain the Overall % 

The .sum() method is used here to obtain the desired summary. Other summary methods include .mean(), .median(), .min(), .max(), and .std().
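To make the summary methods concrete, here is a minimal sketch that applies each of them to a pandas Series. The Series `uninsured` is a hypothetical stand-in holding only the five `Uninsured` values shown in the `head()` output above, so the numbers describe those five counties and not the full table.

```python
import pandas as pd

# Hypothetical stand-in: Uninsured counts from the five rows shown by head() above
uninsured = pd.Series([125.0, 661.0, 95.0, 63.0, 257.0], name="Uninsured")

# Each method collapses the column to a single number
summaries = {
    "sum": uninsured.sum(),
    "mean": uninsured.mean(),
    "median": uninsured.median(),
    "min": uninsured.min(),
    "max": uninsured.max(),
    "std": uninsured.std(),  # sample standard deviation (ddof=1) by default
}

for name, value in summaries.items():
    print(f"{name}: {value:.1f}")
```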

In [55]:
# Using Python to obtain the overall % for uninsured children
OutcomeTable = (
           SAHIE_Children_Poorest
            >> summarize(Overall_Rate = X.Uninsured.sum() / X.NumberInGroup.sum())
          )

#Pretty print the desired table
print(OutcomeTable.to_string(index=False))

 Overall_Rate
     0.068159
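The value printed above is a proportion, so 0.068159 corresponds to roughly 6.8% once multiplied by 100. The sketch below repeats the same calculation in plain pandas, without dfply, on a hypothetical stand-in table `sample` built from the five rows shown by `head()` earlier. It also contrasts the overall percent (total uninsured divided by total people) with the simple average of the county-level `PercentUninsured` values. The two generally differ because the simple average gives small and large counties equal weight.

```python
import pandas as pd

# Hypothetical stand-in: the five Alabama rows shown in the head() output earlier
sample = pd.DataFrame({
    "NumberInGroup": [3494.0, 11602.0, 2727.0, 1669.0, 4206.0],
    "Uninsured": [125.0, 661.0, 95.0, 63.0, 257.0],
    "PercentUninsured": [3.6, 5.7, 3.5, 3.8, 6.1],
})

# Same calculation as the summarize() call: total uninsured / total people
overall_rate = sample["Uninsured"].sum() / sample["NumberInGroup"].sum()
overall_percent = 100 * overall_rate

# Unweighted average of county percents: every county counts equally
mean_of_county_percents = sample["PercentUninsured"].mean()

print(f"Overall percent uninsured: {overall_percent:.2f}")
print(f"Mean of county percents:   {mean_of_county_percents:.2f}")
```

For these five counties the overall percent is higher than the unweighted mean, because the largest county (Baldwin) also has one of the higher uninsured rates.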





