## County Health Data Subset
### Overview

This is a five-step process that runs from importing the libraries to exporting the finished subset.

We will:
1. Import pandas and numpy
2. Create a dataframe object
3. Create our subset
4. Make a copy of our subset
5. Export the subset to .csv


## Step 1
Import the pandas and numpy libraries and give them short aliases using the following code: `import pandas as pd` and `import numpy as np`. The aliases make later commands faster to type.

In [49]:
import pandas as pd
import numpy as np

## Step 2
Now we want pandas to create our dataframe object by reading in our data file, allowing us to pull from it.

First, locate the master dataset called "CountyHealthData_2014-2015.csv" in the **data** folder on our GitHub landing page. Download it and store it in the same folder as the JupyterLab notebook you are working in.

We will then use the following code: `df=pd.read_csv("CountyHealthData_2014-2015.csv")`.

This assigns the name `df` to the dataframe that pandas creates when it reads our .csv file.

In [50]:
df=pd.read_csv("CountyHealthData_2014-2015.csv")
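
Before building the subset, it can help to check what pandas actually read. The sketch below is illustrative only: it reads a tiny made-up stand-in for the real file from memory (the `sample_csv` text and its values are assumptions, not rows from the dataset), so it runs without the download. With the real file, the same checks would be applied to `df`.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the real CSV; the values are made up.
sample_csv = io.StringIO(
    "State,County,Year,Excessive drinking,Median household income\n"
    "NC,Example County A,1/1/2014,0.12,40000\n"
    "NC,Example County A,1/1/2015,0.13,41000\n"
    "NC,Example County B,1/1/2014,,35000\n"
)
sample_df = pd.read_csv(sample_csv)

print(sample_df.shape)             # (number of rows, number of columns)
print(sample_df.columns.tolist())  # exact column names, including capitalization
print(sample_df.head())            # first rows; empty cells show up as NaN
```

Column names are case-sensitive, so checking them here avoids a `KeyError` later when selecting columns by name.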

## Step 3

To pull the data that lets us compare excessive drinking with median household income in North Carolina in 2014, we will use the following code: `df.loc[3244:3442,["State","County","Excessive drinking","Median household income"]][df["Year"]=="1/1/2014"]`.

* `.loc` allows us to select a range of rows by their index labels (both endpoints are included), and as many columns as we like by name.
* `[df["Year"]=="1/1/2014"]` keeps only the rows whose `Year` value is `"1/1/2014"`, which separates the 2014 data from the 2015 data.
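
To make the `.loc` behavior concrete, here is a minimal sketch on a made-up frame (the `demo` data and its index labels are assumptions for illustration). It shows that `.loc` slices by index label and includes the end label, while `.iloc` slices by position and excludes the end.

```python
import pandas as pd

# Made-up frame; the index starts at 10 so labels and positions differ.
demo = pd.DataFrame(
    {
        "State": ["NC", "NC", "NC", "ND"],
        "County": ["A", "B", "C", "D"],
        "Value": [1, 2, 3, 4],
    },
    index=[10, 11, 12, 13],
)

by_label = demo.loc[10:12, ["State", "County"]]  # labels 10, 11 and 12 (end included)
by_position = demo.iloc[0:2]                     # positions 0 and 1 (end excluded)

print(len(by_label))     # 3
print(len(by_position))  # 2
```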

In [54]:
df.loc[3244:3442,["State","County","Excessive drinking","Median household income"]][df["Year"]=="1/1/2014"]


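|      | State | County | Excessive drinking | Median household income |
|------|-------|--------|--------------------|-------------------------|
| 3245 | NC | Alexander County | 0.119 | 39655 |
| 3247 | NC | Alleghany County | 0.165 | 34046 |
| 3249 | NC | Anson County |  | 32339 |
| 3251 | NC | Ashe County | 0.086 | 34080 |
| 3253 | NC | Avery County |  | 34727 |
| ... | ... | ... | ... | ... |
| 3433 | NC | Wayne County | 0.099 | 38776 |
| 3435 | NC | Wilkes County | 0.086 | 35362 |
| 3437 | NC | Wilson County |  | 37440 |
| 3439 | NC | Yadkin County |  | 40012 |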
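
The selection above depends on knowing which row numbers hold the North Carolina records, and pandas may warn when a filter built from the full dataframe is chained onto a slice of it. One possible alternative, sketched here on made-up rows (the `toy` frame and its values are assumptions, not data from the file), is to build a single boolean mask on `State` and `Year` and pass it to `.loc` together with the column list.

```python
import pandas as pd

# Made-up rows that reuse the tutorial's column names.
toy = pd.DataFrame(
    {
        "State": ["NC", "NC", "NC", "NC", "ND", "ND"],
        "County": ["A County", "A County", "B County", "B County", "C County", "C County"],
        "Year": ["1/1/2014", "1/1/2015", "1/1/2014", "1/1/2015", "1/1/2014", "1/1/2015"],
        "Excessive drinking": [0.119, 0.121, None, 0.090, 0.200, 0.210],
        "Median household income": [39655, 40100, 32339, 33000, 50000, 51000],
    }
)

# One mask for both conditions; & combines them row by row.
is_nc_2014 = (toy["State"] == "NC") & (toy["Year"] == "1/1/2014")
columns = ["State", "County", "Excessive drinking", "Median household income"]

nc_2014 = toy.loc[is_nc_2014, columns]
print(nc_2014)
```

This form keeps working if rows are added or reordered, because it selects by value rather than by row number.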


## Step 4

Now we want to make this subset of data into its own table so that we can export it to .csv and recall it easily later. To do this, we will input the following code: `ED_to_MHI_subset=df.loc[3244:3442,["State","County","Excessive drinking","Median household income"]][df["Year"]=="1/1/2014"].copy()`.
* `.copy()` creates an independent copy of the subset, so we can change it later without altering the original dataset.
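
A small sketch of what `.copy()` buys us, using a made-up frame (`original` and its values are assumptions for illustration): edits to the copied subset leave the source dataframe untouched.

```python
import pandas as pd

# Made-up data standing in for the full dataset.
original = pd.DataFrame(
    {
        "County": ["A", "B", "C"],
        "Median household income": [30000, 40000, 50000],
    }
)

subset = original.loc[0:1, ["County", "Median household income"]].copy()

# Change the copy only: express income in thousands of dollars.
subset["Median household income"] = subset["Median household income"] / 1000

print(original["Median household income"].tolist())  # [30000, 40000, 50000]
print(subset["Median household income"].tolist())    # [30.0, 40.0]
```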

In [55]:
ED_to_MHI_subset=df.loc[3244:3442,["State","County","Excessive drinking","Median household income"]][df["Year"]=="1/1/2014"].copy()


## Step 5

Lastly, we want to export the table as a .csv file. To do this we can use the method `.to_csv()`, adding the filename and extension within the parentheses at the end.

By default, this .csv will include the column of index labels that pandas created when we read the original file into our notebook using `pd.read_csv()`.

To leave these out, we can add `index=False` to our statement, which tells pandas not to write those index numbers to the file.

The code will look like this: `ED_to_MHI_subset.to_csv("ExcessiveDrinking-MedianHHIncome.csv",index=False)`

In [53]:
ED_to_MHI_subset.to_csv("ExcessiveDrinking-MedianHHIncome.csv",index=False)
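
To see what `index=False` changes, the sketch below writes a made-up two-row table (the `table` frame is an assumption for illustration) to CSV text in memory rather than to a file, once with the default settings and once with `index=False`, and compares the header lines.

```python
import io
import pandas as pd

# Made-up table whose index labels resemble the subset's row numbers.
table = pd.DataFrame(
    {"State": ["NC", "NC"], "County": ["A County", "B County"]},
    index=[3245, 3247],
)

with_index = table.to_csv()                # no path given: returns the CSV as a string
without_index = table.to_csv(index=False)

print(with_index.splitlines()[0])     # ,State,County
print(without_index.splitlines()[0])  # State,County

# Reading the first version back shows the extra, unnamed index column.
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())      # ['Unnamed: 0', 'State', 'County']
```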

## Remarks

Now that we have exported the data subset, we can use it for whatever purpose we need. This could be as a standalone table, or as input data for creating graphics (see the README.md file).

I hope these instructions were helpful. Best of luck!