## **Overview**


*   During this process, you will use Python to create and merge three subsets of county health data.
*   The finished product should be a data subset that displays physical inactivity within different states.
*   This data should be specifically for the West region.


In [2]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


## **Getting Started**


1.   Create a folder in the file explorer on your device for your project.

* This folder will help with organizing the data in a certain location, so make sure you name the folder something easy to remember.

2.   Download the CountyHealth data from [this link](https://drive.google.com/file/d/134lz04JTLVIbwfsBmuJOEWcRZFOPvUUo/view) and drag the file into the folder you created.

3. Import the pandas and NumPy packages as shown below:

* Make sure to include **as pd** after the **import** statement. This alias lets you call pandas functions later by typing **pd** instead of the full package name. The same applies to NumPy and **as np**.


In [3]:
import numpy as np
import pandas as pd

4. Read the file with Pandas and display the data to make sure it is working properly.
* The file can be read using the **.read_csv()** function with the file path, in quotes, inside the parentheses.
* You can give the data any name you like by changing the name on the left side of the **=**.

In [4]:
df=pd.read_csv('gdrive/My Drive/Unit 3 Project Data/CountyHealthData_2014-2015.csv')

In [5]:
dataFrame = pd.read_csv('gdrive/My Drive/Unit 3 Project Data/CountyHealthData_2014-2015.csv')

In [6]:
dataFrame

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
0,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/14,,0.122,...,,0.374,0.250,3791.0,0.185,216.0,69192,0.127,,0.287
1,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/15,,0.122,...,,0.314,0.176,4837.0,0.185,254.0,74088,0.133,,
2,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/14,6827.0,0.125,...,15.37,0.218,0.096,6588.0,0.119,135.0,71094,0.319,6.29,0.160
3,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/15,6856.0,0.125,...,17.08,0.227,0.123,6582.0,0.119,148.0,76362,0.334,5.60,
4,AK,West,Pacific,Bethel Census Area,2050,2050,Insuff Data,1/1/14,13345.0,0.211,...,,0.394,0.124,5860.0,0.200,169.0,41722,0.668,12.77,0.477
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6104,WY,West,Mountain,Uinta County,56041,56041,Insuff Data,1/1/15,7436.0,0.135,...,18.66,0.192,0.090,7600.0,0.123,47.0,60953,0.273,,
6105,WY,West,Mountain,Washakie County,56043,56043,Insuff Data,1/1/14,6580.0,0.106,...,,0.225,0.086,8202.0,0.099,47.0,49533,0.328,,0.133
6106,WY,West,Mountain,Washakie County,56043,56043,Insuff Data,1/1/15,7572.0,0.106,...,,0.226,0.101,7940.0,0.099,47.0,50740,0.309,,
6107,WY,West,Mountain,Weston County,56045,56045,Insuff Data,1/1/14,5633.0,0.162,...,,0.201,0.084,6906.0,0.130,28.0,53665,0.232,,0.171


***Good job!*** You should be able to see your data set.

If you are having issues, double-check that the file path you put in the **.read_csv()** function exactly matches the folder and file name you created, including the **.csv** extension.
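If the path is hard to debug by eye, a small check before reading can help. The following is a minimal sketch, not part of the original notebook: the helper name `read_csv_checked` is hypothetical, and it relies only on the standard `os` module and `pd.read_csv`.

```python
import os

import pandas as pd


def read_csv_checked(path):
    """Read a CSV with pandas, or report what is in the folder if the file is missing."""
    if not os.path.exists(path):
        folder = os.path.dirname(path) or "."
        # Listing the folder makes spelling or extension mismatches easy to spot.
        nearby = sorted(os.listdir(folder)) if os.path.isdir(folder) else []
        raise FileNotFoundError(f"{path!r} not found. Files in {folder!r}: {nearby}")
    return pd.read_csv(path)
```

You could then call `read_csv_checked('gdrive/My Drive/Unit 3 Project Data/CountyHealthData_2014-2015.csv')` in place of `pd.read_csv(...)`. If the file is missing, the error message lists the files that are actually in that folder.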


## **Creating the First Subset**

Now that you have all the data, we can begin by isolating the rows whose "Region" column refers only to the West:
1. Create a filtering command that isolates every row where **["Region"] == "West"**
* The inner statement should contain **dataFrame["Region"] == "West"**, which returns True for each row in the West region and False for every other row.
* The outer statement should be a general reference to the dataFrame: **dataFrame[]**.
2. Assign an easy-to-remember name to this subset.

In [7]:
one_subset = dataFrame[dataFrame["Region"] == "West"].copy()

* Make sure to use **.copy()** to avoid a **SettingWithCopyWarning** later.
3. Display the subset to make sure it is working as intended.

In [8]:
one_subset

Unnamed: 0,State,Region,Division,County,FIPS,GEOID,SMS Region,Year,Premature death,Poor or fair health,...,Drug poisoning deaths,Uninsured adults,Uninsured children,Health care costs,Could not see doctor due to cost,Other primary care providers,Median household income,Children eligible for free lunch,Homicide rate,Inadequate social support
0,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/14,,0.122,...,,0.374,0.250,3791.0,0.185,216.0,69192,0.127,,0.287
1,AK,West,Pacific,Aleutians West Census Area,2016,2016,Insuff Data,1/1/15,,0.122,...,,0.314,0.176,4837.0,0.185,254.0,74088,0.133,,
2,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/14,6827.0,0.125,...,15.37,0.218,0.096,6588.0,0.119,135.0,71094,0.319,6.29,0.160
3,AK,West,Pacific,Anchorage Borough,2020,2020,Region 22,1/1/15,6856.0,0.125,...,17.08,0.227,0.123,6582.0,0.119,148.0,76362,0.334,5.60,
4,AK,West,Pacific,Bethel Census Area,2050,2050,Insuff Data,1/1/14,13345.0,0.211,...,,0.394,0.124,5860.0,0.200,169.0,41722,0.668,12.77,0.477
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6104,WY,West,Mountain,Uinta County,56041,56041,Insuff Data,1/1/15,7436.0,0.135,...,18.66,0.192,0.090,7600.0,0.123,47.0,60953,0.273,,
6105,WY,West,Mountain,Washakie County,56043,56043,Insuff Data,1/1/14,6580.0,0.106,...,,0.225,0.086,8202.0,0.099,47.0,49533,0.328,,0.133
6106,WY,West,Mountain,Washakie County,56043,56043,Insuff Data,1/1/15,7572.0,0.106,...,,0.226,0.101,7940.0,0.099,47.0,50740,0.309,,
6107,WY,West,Mountain,Weston County,56045,56045,Insuff Data,1/1/14,5633.0,0.162,...,,0.201,0.084,6906.0,0.130,28.0,53665,0.232,,0.171
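To see what the inner and outer statements each do, it can help to run them separately on a tiny table. This is a minimal sketch using made-up rows (the `toy` DataFrame is hypothetical, not the CountyHealth data):

```python
import pandas as pd

# Hypothetical toy data, not taken from the CountyHealth file.
toy = pd.DataFrame({
    "State": ["AK", "TX", "WY", "NY"],
    "Region": ["West", "South", "West", "Northeast"],
})

# Inner statement: one True/False value per row.
mask = toy["Region"] == "West"
print(mask.tolist())  # [True, False, True, False]

# Outer statement: keep only the rows where the mask is True.
west = toy[mask].copy()
print(west["State"].tolist())  # ['AK', 'WY']

# The kept rows retain their original index labels.
print(west.index.tolist())  # [0, 2]
```

Because the original index labels are retained, the last rows of a filtered subset can still show large index numbers even though the subset has fewer rows than the full data set.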


We can break this subset down further to show only the data for physical inactivity.
1. Apply the row labels of the subset you just displayed above to the original **dataFrame** by using **.loc**.
* **.loc** selects rows and columns by label: the rows go before the comma and the list of columns goes after it.
2. Include the **"Region"** and **"State"** columns to show clearly where each row comes from, and the **"Physical inactivity"** column to show only the physical inactivity data.

In [9]:
one_subset = dataFrame.loc[one_subset.index, ["Region", "State", "Physical inactivity"]].copy()

* Make sure to use **.copy()** to avoid a **SettingWithCopyWarning** later.
3. Display this subset to make sure it is working as intended.

In [10]:
one_subset

Unnamed: 0,Region,State,Physical inactivity
0,West,AK,0.234
1,West,AK,0.220
2,West,AK,0.205
3,West,AK,0.180
4,West,AK,0.283
...,...,...,...
6104,West,WY,0.244
6105,West,WY,0.246
6106,West,WY,0.240
6107,West,WY,0.285
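The row filter and the column selection can also be written as a single **.loc** call. The sketch below assumes a small made-up table (the `toy` DataFrame, its values, and the `Inactive percent` column are all hypothetical). It also shows why **.copy()** matters: the copy can be changed without touching the original.

```python
import pandas as pd

# Hypothetical toy data with made-up values.
toy = pd.DataFrame({
    "State": ["AK", "TX", "WY"],
    "Region": ["West", "South", "West"],
    "Physical inactivity": [0.20, 0.30, 0.25],
    "Homicide rate": [6.0, 5.0, None],
})

rows = toy["Region"] == "West"                       # which rows to keep
cols = ["Region", "State", "Physical inactivity"]    # which columns to keep
west_inactivity = toy.loc[rows, cols].copy()

# Because of .copy(), adding a column here leaves toy unchanged
# and does not raise a SettingWithCopyWarning.
west_inactivity["Inactive percent"] = west_inactivity["Physical inactivity"] * 100

print(west_inactivity.shape)  # (2, 4)
print("Inactive percent" in toy.columns)  # False
```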


## **Exporting Your New Subset**
* To save the subset as a file, use the **.to_csv()** method, with the file name and extension inside the parentheses.
* By default, this .csv will include the column of indices that pandas created when we read the original file into our notebook using **.read_csv()**.
* To leave these out, add **index=False** to the statement, which tells pandas not to write those index numbers to the file.

In [11]:
one_subset.to_csv("one_subset.csv", index=False)
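To see the effect of **index=False**, compare the two forms on a small example. This is a sketch under stated assumptions: the `toy` DataFrame is made up, and `to_csv()` is called without a file name so that it returns the CSV text instead of writing a file.

```python
import io

import pandas as pd

# Hypothetical toy subset whose index labels are left over from filtering.
toy = pd.DataFrame(
    {"State": ["AK", "WY"], "Physical inactivity": [0.20, 0.25]},
    index=[0, 2],
)

with_index = toy.to_csv()                # default: index written as the first column
without_index = toy.to_csv(index=False)  # index left out

print(with_index.splitlines()[0])     # ,State,Physical inactivity
print(without_index.splitlines()[0])  # State,Physical inactivity

# Reading the default version back in produces an extra unnamed column.
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())  # ['Unnamed: 0', 'State', 'Physical inactivity']
```

An extra `Unnamed: 0` column like this one usually means a file was saved without **index=False** at some point.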