# County Health Data Subset and Instructions

#### This file outlines how to use the County Health Data and explains how the user can reproduce each of the following steps.

The `CountyHealthData_2014-2015.csv` file is used to create a smaller subset of the data containing specific rows and columns. This subset can then be used to create visualizations or to view the desired data. This notebook shows how to create a subset for the state of Connecticut that includes information about the counties, population, median household income, health care costs, and uninsured individuals.

## Importing Packages and Data
Import the Pandas and NumPy packages in order to use and manipulate the data from the .csv file.

Pandas and NumPy can be imported using the following commands:

In [1]:
import pandas as pd
import numpy as np

Once the code above has been typed, press Shift and Return together on the keyboard to run it.

Pandas is abbreviated to `pd` so that "pandas" does not have to be written out every time it is used. Likewise, NumPy is abbreviated to `np`.

After Pandas and NumPy have been imported, the .csv file containing the data must be read before the data can be used. `CountyHealthData_2014-2015.csv` is read using Pandas (`pd`) and the result is assigned to `df`. The data can then be referred to as `df` without typing the full name of the file every time.

`CountyHealthData_2014-2015.csv` can be read and assigned to `df` using the following command:

In [2]:
df=pd.read_csv("CountyHealthData_2014-2015.csv")
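
After reading a file, it can help to confirm that it loaded as expected. The sketch below is only an illustration: it uses a hypothetical three-row, in-memory stand-in (`sample_csv`) with a handful of the columns shown later in this notebook, rather than the real file, so that it runs on its own. The same checks can be run on `df`.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for CountyHealthData_2014-2015.csv,
# limited to a few of the columns shown in this notebook.
sample_csv = io.StringIO(
    "State,County,Year,Median household income\n"
    "AK,Aleutians West Census Area,1/1/2014,69192\n"
    "AK,Aleutians West Census Area,1/1/2015,74088\n"
    "AK,Anchorage Borough,1/1/2014,71094\n"
)

sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)          # (number of rows, number of columns)
print(list(sample_df.columns))  # column names taken from the header row
print(sample_df.head(2))        # first two rows of the table
```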

## Selecting Specific Data
Brackets can be used to select specific rows within the data. To do this, the numbers that correspond to the desired rows are placed in brackets and the code is run. Placing `df` before the brackets indicates that the rows should come from the County Health Data that was read by Pandas.

### Selecting Rows
In the example below, the desired rows are rows 0-2. To get these rows, `0:3` must be placed in the brackets because the selection stops one row before the last number. When this code was run, all 64 columns were returned.

In [89]:
df[0:3]

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
0,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/2014,,0.122,...,,0.374,0.25,3791.0,0.185,216.0,69192,0.127,,0.287
1,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/2015,,0.122,...,,0.314,0.176,4837.0,0.185,254.0,74088,0.133,,
2,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/2014,6827.0,0.125,...,15.37,0.218,0.096,6588.0,0.119,135.0,71094,0.319,6.29,0.16
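
The end-exclusive behavior of row slices can be checked on a small table. The following is a minimal sketch using a toy DataFrame named `demo` (an assumption for illustration, holding only two columns) instead of the full data set.

```python
import pandas as pd

# Toy four-row table; the real data set has many more rows and columns.
demo = pd.DataFrame({
    "State": ["AK", "AK", "AK", "AK"],
    "County": [
        "Aleutians West Census Area",
        "Aleutians West Census Area",
        "Anchorage Borough",
        "Anchorage Borough",
    ],
})

# 0:3 starts at row 0 and stops before row 3, so rows 0, 1, and 2 come back.
first_three = demo[0:3]

print(first_three)
print(len(first_three))

# Bracket slicing with whole numbers selects by position, like .iloc.
print(demo.iloc[0:3].equals(first_three))
```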


### Selecting Columns
Brackets can also be used to examine specific columns. The name of the column is placed in quotations within a set of brackets and the code is run. Placing `df` before the brackets once again indicates that the information should come from the County Health Data that was read by Pandas.

In the example below, the desired column is the one containing the 2011 population estimate. When the column name was placed in brackets and run, the data from every row of the 2011 population estimate column was returned.

In [88]:
df["2011 population estimate"]

0         5547
1         5511
2       298610
3       300950
4        17746
         ...  
6104     21066
6105      8464
6106      8463
6107      7082
6108      7158
Name: 2011 population estimate, Length: 6109, dtype: int64
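
A single column selected this way comes back as a Pandas Series, which is why the output above ends with a `Name:` and `dtype:` line instead of being drawn as a table. The sketch below, on a toy DataFrame named `demo` (an illustrative assumption), contrasts this with wrapping the name in a second pair of brackets, which returns a one-column table.

```python
import pandas as pd

# Toy table with values taken from the first rows shown in this notebook.
demo = pd.DataFrame({
    "County": [
        "Aleutians West Census Area",
        "Anchorage Borough",
        "Bethel Census Area",
    ],
    "2011 population estimate": [5547, 298610, 17746],
})

population = demo["2011 population estimate"]          # Series
population_table = demo[["2011 population estimate"]]  # one-column DataFrame

print(type(population).__name__)
print(type(population_table).__name__)

# A Series supports quick summaries such as the largest value.
print(population.max())
```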

### Organizing Columns
Multiple specific columns can be examined by combining them into a table. The names of the desired columns are placed in quotations within a double set of brackets. Each column name must be in its own set of quotations and separated from the others by commas. `df` should be placed before the brackets.

In the example below, the desired columns are `State`, `County`, and `2011 population estimate`. When the code was run, the data from every row of all three columns was returned.

In [109]:
df[["State", "County", "2011 population estimate"]]

Unnamed: 0,State,County,2011 population estimate
0,AK,Aleutians West Census Area,5547
1,AK,Aleutians West Census Area,5511
2,AK,Anchorage Borough,298610
3,AK,Anchorage Borough,300950
4,AK,Bethel Census Area,17746
...,...,...,...
6104,WY,Uinta County,21066
6105,WY,Washakie County,8464
6106,WY,Washakie County,8463
6107,WY,Weston County,7082


### Combining Rows and Columns
When multiple specific columns are selected, all of the rows are displayed. If only certain rows from the specified columns are desired, the code for specific rows and the code for specific columns can be combined. The numbers associated with the desired rows must be placed in one set of brackets. The number of the last row must be increased by one to ensure that all the desired rows are shown. For the desired columns, the same steps as above should be followed.

In the example below, the desired rows are 19 through 24. The desired columns are `State`, `County`, and `2011 population estimate`. After the code is run, the desired information is displayed in a table.

In [95]:
df[19:25][["State", "County", "2011 population estimate"]]

Unnamed: 0,State,County,2011 population estimate
19,AK,Kodiak Island Borough,14135
20,AK,Lake and Peninsula Borough,1654
21,AK,Lake and Peninsula Borough,1648
22,AK,Matanuska-Susitna Borough,93925
23,AK,Matanuska-Susitna Borough,95192
24,AK,Nome Census Area,9915
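
Because each set of brackets returns a new table, the row step and the column step can be applied in either order. The sketch below illustrates this on a toy three-row DataFrame named `demo` (an assumption for illustration, not the full data set).

```python
import pandas as pd

# Toy table built from the first three rows shown in this notebook.
demo = pd.DataFrame({
    "State": ["AK", "AK", "AK"],
    "County": [
        "Aleutians West Census Area",
        "Aleutians West Census Area",
        "Anchorage Borough",
    ],
    "Year": ["1/1/2014", "1/1/2015", "1/1/2014"],
    "2011 population estimate": [5547, 5511, 298610],
})

columns = ["State", "County", "2011 population estimate"]

rows_then_columns = demo[1:3][columns]  # pick rows 1-2, then the columns
columns_then_rows = demo[columns][1:3]  # pick the columns, then rows 1-2

print(rows_then_columns)
print(rows_then_columns.equals(columns_then_rows))
```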


## Connecticut Data Subset
The above steps are used to build the desired table. This table contains only data from the state of Connecticut and only the columns that display the state, county, 2011 population estimate, median household income, health care costs, and uninsured individuals.

### Rows of Connecticut Data
The first step in creating this table is to determine which rows contain information about Connecticut. To do this, `df["State"]` is compared with the abbreviation of the desired state, in this case `"CT"` for Connecticut, using `==`, and that comparison is placed within another set of brackets. `df` therefore appears twice in the code: once before the outermost set of brackets and once before the innermost set. This code is shown below.

In [112]:
df[df["State"] == "CT"]

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
593,CT,Northeast,New England,Fairfield County,9001,9001,Region 12,1/1/2014,4541.0,0.1,...,8.16,0.151,0.034,9101.0,0.092,68.0,79536,0.274,3.39,0.193
594,CT,Northeast,New England,Fairfield County,9001,9001,Region 12,1/1/2015,4459.0,0.1,...,8.6,0.151,0.048,9132.0,0.092,74.0,81816,0.289,4.1,
595,CT,Northeast,New England,Hartford County,9003,9003,Region 12,1/1/2014,5808.0,0.109,...,10.59,0.125,0.032,8831.0,0.096,98.0,63374,0.326,4.46,0.198
596,CT,Northeast,New England,Hartford County,9003,9003,Region 12,1/1/2015,5746.0,0.109,...,11.33,0.122,0.038,8864.0,0.096,110.0,63491,0.34,4.9,
597,CT,Northeast,New England,Litchfield County,9005,9005,Region 12,1/1/2014,5191.0,0.092,...,11.15,0.104,0.029,8606.0,0.09,43.0,67746,0.151,1.43,0.192
598,CT,Northeast,New England,Litchfield County,9005,9005,Region 12,1/1/2015,5325.0,0.092,...,11.76,0.11,0.04,8535.0,0.09,43.0,70519,0.177,1.4,
599,CT,Northeast,New England,Middlesex County,9007,9007,Region 12,1/1/2014,4899.0,0.094,...,11.24,0.095,0.029,8690.0,0.078,75.0,74588,0.149,1.31,0.173
600,CT,Northeast,New England,Middlesex County,9007,9007,Region 12,1/1/2015,4985.0,0.094,...,13.15,0.099,0.034,8617.0,0.078,85.0,75335,0.108,1.3,
601,CT,Northeast,New England,New Haven County,9009,9009,Region 12,1/1/2014,6030.0,0.124,...,13.21,0.124,0.029,9438.0,0.096,96.0,59217,0.365,3.95,0.208
602,CT,Northeast,New England,New Haven County,9009,9009,Region 12,1/1/2015,5761.0,0.124,...,13.83,0.137,0.042,9515.0,0.096,103.0,58672,0.372,4.6,


From the code above, it can be determined that the rows that contain information about Connecticut are `rows 593 through 608`.
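
The comparison inside the brackets produces one `True` or `False` value per row, and only the `True` rows are kept. The first and last row numbers can also be read from the result instead of from the printed table. The sketch below shows both ideas on a toy DataFrame named `demo`, whose row numbers are illustrative and do not match the real file.

```python
import pandas as pd

# Toy table; the row numbers (0-3) are not the real positions in the file.
demo = pd.DataFrame({
    "State": ["AK", "CT", "CT", "WY"],
    "County": [
        "Anchorage Borough",
        "Fairfield County",
        "Hartford County",
        "Weston County",
    ],
})

is_ct = demo["State"] == "CT"  # one True/False value per row
print(is_ct.tolist())

ct_rows = demo[is_ct]          # keeps only the rows marked True
print(ct_rows)

# First and last row labels of the matching rows, and how many matched.
print(ct_rows.index.min(), ct_rows.index.max(), len(ct_rows))
```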

## Connecticut Table
There are two ways to build a table with the desired information. The first is the same way the example tables were created above.

### Creation Option One
The first way uses two sets of brackets. The first set contains the desired rows. In this scenario, rows 593-608 are desired, but the last row number in the brackets must be increased by one in order to see all the rows for Connecticut, so the first brackets contain `593:609`. The second set is a double set of brackets that contains the names of the columns in quotations, separated by commas. `df` precedes the first set of brackets so that the rows are read from the correct data set by Pandas.

In [107]:
df[593:609][["State", "County", "2011 population estimate", "Median household income", "Health care costs", "Uninsured"]]

Unnamed: 0,State,County,2011 population estimate,Median household income,Health care costs,Uninsured
593,CT,Fairfield County,933835,79536,9101.0,0.117
594,CT,Fairfield County,939904,81816,9132.0,0.121
595,CT,Hartford County,897259,63374,8831.0,0.1
596,CT,Hartford County,898272,63491,8864.0,0.099
597,CT,Litchfield County,187530,67746,8606.0,0.085
598,CT,Litchfield County,186924,70519,8535.0,0.092
599,CT,Middlesex County,165602,74588,8690.0,0.078
600,CT,Middlesex County,165562,75335,8617.0,0.083
601,CT,New Haven County,862813,59217,9438.0,0.099
602,CT,New Haven County,862287,58672,9515.0,0.112


### Creation Option Two
The other way to create the desired table is by using the `.loc` indexer. It allows the exact numbers associated with the rows to be placed in a single set of brackets along with the names of the columns. The advantage of this technique is that the last number for the desired rows does not have to be increased, because `.loc` includes both ends of the range.

This strategy places `df.loc` in front of the outermost brackets. Within the brackets are the numbers corresponding to the rows, followed by a comma. The table is still made up of the Connecticut data, so the rows are `593:608`. After the comma is another set of brackets containing the name of each column in quotations, separated by commas. Doing this gives the same table as above.

In [104]:
df.loc[593:608,["State", "County", "2011 population estimate", "Median household income", "Health care costs", "Uninsured"]]

Unnamed: 0,State,County,2011 population estimate,Median household income,Health care costs,Uninsured
593,CT,Fairfield County,933835,79536,9101.0,0.117
594,CT,Fairfield County,939904,81816,9132.0,0.121
595,CT,Hartford County,897259,63374,8831.0,0.1
596,CT,Hartford County,898272,63491,8864.0,0.099
597,CT,Litchfield County,187530,67746,8606.0,0.085
598,CT,Litchfield County,186924,70519,8535.0,0.092
599,CT,Middlesex County,165602,74588,8690.0,0.078
600,CT,Middlesex County,165562,75335,8617.0,0.083
601,CT,New Haven County,862813,59217,9438.0,0.099
602,CT,New Haven County,862287,58672,9515.0,0.112
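
The difference between the two options comes down to how the end of the range is treated. The sketch below compares them on a toy DataFrame named `demo` with row numbers 0-3 (illustrative values, not the real row numbers in the file).

```python
import pandas as pd

# Toy table with the default row numbers 0, 1, 2, 3.
demo = pd.DataFrame({
    "County": [
        "Fairfield County",
        "Fairfield County",
        "Hartford County",
        "Hartford County",
    ],
    "Median household income": [79536, 81816, 63374, 63491],
})

columns = ["County", "Median household income"]

# .loc selects by label and includes the last label (2).
by_label = demo.loc[1:2, columns]

# Plain brackets stop before the last number, so 3 is needed to reach row 2.
by_position = demo[1:3][columns]

print(by_label)
print(by_label.equals(by_position))
```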


### Defining the Subset
A new variable was defined by assigning the `.loc` expression from above to `CT_subset`. The other option could also be used and would give the same result. This allows the table to be reproduced or manipulated later without having to type the entire line of code again.

Once the assignment was run, `CT_subset` was run by itself and the table was produced.

In [115]:
CT_subset=df.loc[593:608,["State", "County", "2011 population estimate", "Median household income", "Health care costs", "Uninsured"]]

In [116]:
CT_subset

Unnamed: 0,State,County,2011 population estimate,Median household income,Health care costs,Uninsured
593,CT,Fairfield County,933835,79536,9101.0,0.117
594,CT,Fairfield County,939904,81816,9132.0,0.121
595,CT,Hartford County,897259,63374,8831.0,0.1
596,CT,Hartford County,898272,63491,8864.0,0.099
597,CT,Litchfield County,187530,67746,8606.0,0.085
598,CT,Litchfield County,186924,70519,8535.0,0.092
599,CT,Middlesex County,165602,74588,8690.0,0.078
600,CT,Middlesex County,165562,75335,8617.0,0.083
601,CT,New Haven County,862813,59217,9438.0,0.099
602,CT,New Haven County,862287,58672,9515.0,0.112
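
As a possible variation, not the approach used in this notebook, the `State == "CT"` comparison can be passed to `.loc` directly in place of the row numbers, so the subset does not depend on knowing where the Connecticut rows sit. The sketch below uses a toy DataFrame named `demo` and a hypothetical variable name `ct_by_mask`.

```python
import pandas as pd

# Toy table with values taken from rows shown earlier in this notebook.
demo = pd.DataFrame({
    "State": ["AK", "AK", "CT", "CT"],
    "County": [
        "Aleutians West Census Area",
        "Anchorage Borough",
        "Fairfield County",
        "Hartford County",
    ],
    "Median household income": [69192, 71094, 79536, 63374],
})

columns = ["State", "County", "Median household income"]

# Rows are chosen by the True/False comparison, columns by name.
# .copy() makes the subset an independent table that can be edited safely.
ct_by_mask = demo.loc[demo["State"] == "CT", columns].copy()

print(ct_by_mask)
```

If rows are added to or removed from the source file, a hard-coded range would need to be updated, while the comparison would still pick up every Connecticut row.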


### Exporting Connecticut Subset
The `CT_subset` table was then exported as a .csv file by adding `.to_csv` after `CT_subset`. After `CT_subset.to_csv` is a set of parentheses containing the desired name of the exported .csv file and `index=False`. Including `index=False` leaves out the row numbers that Pandas added automatically when the .csv file was read.

Once the code below was run, a new file was created containing just the information from the Connecticut table created above.

In [118]:
CT_subset.to_csv("CT_subset.csv", index=False)
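
The effect of `index=False` can be seen by comparing the exported text with and without it. The sketch below is an illustration on a toy two-row table named `ct_demo` (an assumption standing in for `CT_subset`). It writes to text in memory rather than to a file by leaving out the file name.

```python
import io

import pandas as pd

# Toy stand-in for CT_subset, keeping the original row numbers 593 and 594.
ct_demo = pd.DataFrame(
    {
        "State": ["CT", "CT"],
        "County": ["Fairfield County", "Fairfield County"],
        "Uninsured": [0.117, 0.121],
    },
    index=[593, 594],
)

# With no file name, to_csv returns the CSV text instead of writing a file.
with_index = ct_demo.to_csv()
without_index = ct_demo.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index column
print(without_index.splitlines()[0])  # header holds only the three column names

# Reading the export back gives fresh row numbers starting at 0.
reloaded = pd.read_csv(io.StringIO(without_index))
print(reloaded)
```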