## Instructions for Compiling Data Sets
### This notebook explains how to compile a data set of obesity and health statistics for North Carolina.
### To begin, find the ```CountyHealthData_2014-2015.csv``` file in the data folder of the data repository and download it.
**Uploading a .csv File**
* After the ```CountyHealthData_2014-2015.csv``` file is downloaded, find the folder icon on the left side of the screen and click it.
* Next, click the folder with the arrow at the top of the list to find the content folder.
* Finally, click the three dots next to the content folder and upload the .csv file.

After the data has been uploaded, click the **+ Code** button at the top of the screen to add a code cell and begin coding.
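Before loading the file, it can help to confirm that the upload landed where the notebook will look for it. The sketch below uses only Python's standard `os` module and assumes the notebook's working directory is the folder you uploaded into (in Colab this is normally the content folder, `/content`).

```python
import os

# File name used throughout this notebook; change it if yours differs.
file_name = "CountyHealthData_2014-2015.csv"

# os.path.exists returns True only if the file is in the current working folder.
file_found = os.path.exists(file_name)

print("Current folder:", os.getcwd())
print("File found:", file_found)
```

If this prints `File found: False`, check the spelling of the file name and which folder the file was uploaded to.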

**Importing Pandas**
* To begin compiling your new data set, you must import the pandas package into your notebook.
* Type ```import pandas as pd``` into a code cell, then press **Shift** + **Enter** on your keyboard to run it.

**You must do this for every code cell you run.**


```python
import pandas as pd
```
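To check that the import worked, you can print the installed pandas version. This is a minimal sketch; the exact version string depends on your environment.

```python
import pandas as pd

# If the import succeeded, pd refers to the pandas module and this
# prints its installed version string.
print(pd.__version__)
```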

## Creating and Analyzing Data Sets
#### Now that pandas is imported, we can begin analyzing the County Health data set. To load it, we will use ```pd.read_csv```, which reads a .csv file into a pandas DataFrame (a table of rows and columns).
* The result needs to be assigned to a name so that we can refer to the data set while we code. For this project, I am using "df" as the reference name for the data. Inside the parentheses after ```pd.read_csv```, type the file name in quotation marks.
#### ```df = pd.read_csv("CountyHealthData_2014-2015.csv")```

**Make sure your capitalization and spelling are correct: Python and pandas are case-sensitive, so ```State``` and ```state``` are treated as different names.**


```python
df = pd.read_csv("CountyHealthData_2014-2015.csv")
```
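A quick way to confirm the file loaded correctly is to look at its size, its exact column names, and its first few rows. Because the real file is not available here, this sketch reads a small stand-in CSV built from a few rows shown in this notebook's outputs. It has only three columns and a fresh 0 to 4 index, so the printed shape will differ from the real data set.

```python
import io

import pandas as pd

# Stand-in for CountyHealthData_2014-2015.csv, using rows copied from the
# outputs in this notebook. With the real file, use
# df = pd.read_csv("CountyHealthData_2014-2015.csv") instead.
sample_csv = io.StringIO(
    "State,Adult obesity,Poor or fair health\n"
    "AK,0.300,0.122\n"
    "AK,0.329,0.122\n"
    "NC,0.341,0.192\n"
    "NC,0.332,0.192\n"
    "WY,0.293,0.135\n"
)
df = pd.read_csv(sample_csv)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # exact, case-sensitive column names
print(df.head())         # first five rows
```

Copying column names from the `list(df.columns)` output is a simple way to avoid capitalization and spelling mistakes later.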

## Creating a Subset with the .loc Indexer
#### Now we need to decide which data we want. We will use ```.loc```, which selects rows and columns by their labels, to gather the data we need into one table.

1. First, decide which columns you want to analyze in the new data set. Here we will use the "State", "Adult obesity", and "Poor or fair health" columns.
2. Type the reference name for the data set (```df```), followed by ```.loc```.
3. Then open a set of square brackets ```[``` after ```.loc```.
4. To select every row, type a colon ```:``` where the row labels go, followed by a comma.
5. Then add another set of square brackets containing the column names, each enclosed in quotation marks and separated by commas.
6. Finally, close all of your brackets and run your code.

**Note: The column names must be enclosed in their own set of square brackets (a Python list) so that all of the columns are pulled at once.**

```python
df.loc[:, ["State", "Adult obesity", "Poor or fair health"]]
```

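|  | State | Adult obesity | Poor or fair health |
|---|---|---|---|
| 0 | AK | 0.300 | 0.122 |
| 1 | AK | 0.329 | 0.122 |
| 2 | AK | 0.257 | 0.125 |
| 3 | AK | 0.268 | 0.125 |
| 4 | AK | 0.315 | 0.211 |
| ... | ... | ... | ... |
| 6104 | WY | 0.293 | 0.135 |
| 6105 | WY | 0.241 | 0.106 |
| 6106 | WY | 0.242 | 0.106 |
| 6107 | WY | 0.313 | 0.162 |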
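The `.loc` steps above can be compared with plain square-bracket column selection, which gives the same result when you want every row. This sketch uses a tiny stand-in DataFrame built from rows shown in the output above (keeping their row labels) and selects only two of the three columns so you can see that unlisted columns are dropped.

```python
import pandas as pd

# Stand-in built from rows shown in the output above; the real df
# has many more rows and columns.
df = pd.DataFrame(
    {
        "State": ["AK", "NC", "WY"],
        "Adult obesity": [0.300, 0.341, 0.293],
        "Poor or fair health": [0.122, 0.192, 0.135],
    },
    index=[0, 3243, 6104],
)

columns = ["State", "Adult obesity"]

# ":" selects every row; the list selects the named columns.
subset_loc = df.loc[:, columns]

# Passing the list of column names directly selects the same columns.
subset_brackets = df[columns]

print(subset_loc.equals(subset_brackets))  # True
print(list(subset_loc.columns))            # ['State', 'Adult obesity']
```

`.loc` is the more general form because it also lets you choose rows, which the later steps rely on.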


## Saving the Column Subset
#### Now that we have the three columns we need, we can store this subset under a new name. (It still contains every state; we filter it to North Carolina in the next section.)
#### **Set the new name equal to the code used to make the subset above.**

```python
Compiled_Data = df.loc[:, ["State", "Adult obesity", "Poor or fair health"]]
```

```python
Compiled_Data
```

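|  | State | Adult obesity | Poor or fair health |
|---|---|---|---|
| 0 | AK | 0.300 | 0.122 |
| 1 | AK | 0.329 | 0.122 |
| 2 | AK | 0.257 | 0.125 |
| 3 | AK | 0.268 | 0.125 |
| 4 | AK | 0.315 | 0.211 |
| ... | ... | ... | ... |
| 6104 | WY | 0.293 | 0.135 |
| 6105 | WY | 0.241 | 0.106 |
| 6106 | WY | 0.242 | 0.106 |
| 6107 | WY | 0.313 | 0.162 |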
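If you later plan to change values in `Compiled_Data`, you may want it to be clearly independent of `df`. This sketch adds `.copy()`, which the code above does not use, so that edits to the new name never reach the original data. The two stand-in rows are copied from the outputs in this notebook.

```python
import pandas as pd

# Stand-in for df using two rows copied from this notebook's outputs.
df = pd.DataFrame(
    {
        "State": ["AK", "NC"],
        "Adult obesity": [0.300, 0.341],
        "Poor or fair health": [0.122, 0.192],
    },
    index=[0, 3243],
)

# .copy() makes Compiled_Data an independent DataFrame.
Compiled_Data = df.loc[:, ["State", "Adult obesity", "Poor or fair health"]].copy()

# Change a value in the copy only.
Compiled_Data.loc[0, "Adult obesity"] = 0.0

print(df.loc[0, "Adult obesity"])             # 0.3 (original unchanged)
print(Compiled_Data.loc[0, "Adult obesity"])  # 0.0
```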


##Filtering the Data
#### Now, to keep only the rows that contain North Carolina data, we can write code that pulls the rows whose "State" value is "NC".

1. First, type the reference name of the data set (```Compiled_Data```).
2. Then add square brackets and type the reference name again, so that pandas can check each row of the data.
3. Add another set of square brackets and specify the column that contains the value you are looking up. We are looking up NC, which is a state, so we type "State" in quotation marks.
4. Then add two equal signs (```==```, which tests whether two values are equal) and specify the state you want, in our case "NC".
5. Make sure you close your brackets at the end of your code.

```python
Compiled_Data[Compiled_Data["State"] == "NC"]
```

|  | State | Adult obesity | Poor or fair health |
|---|---|---|---|
| 3243 | NC | 0.341 | 0.192 |
| 3244 | NC | 0.332 | 0.192 |
| 3245 | NC | 0.272 | 0.178 |
| 3246 | NC | 0.283 | 0.178 |
| 3247 | NC | 0.247 | 0.234 |
| ... | ... | ... | ... |
| 3438 | NC | 0.373 | 0.159 |
| 3439 | NC | 0.297 | 0.207 |
| 3440 | NC | 0.301 | 0.207 |
| 3441 | NC | 0.287 | 0.193 |
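The filter works in two stages: the comparison builds a column of True/False values, one per row, and the outer brackets keep only the rows marked True. This sketch makes both stages visible on a small stand-in built from rows shown in this notebook's outputs (with their original row labels).

```python
import pandas as pd

# Stand-in for Compiled_Data using rows copied from this notebook's outputs.
Compiled_Data = pd.DataFrame(
    {
        "State": ["AK", "NC", "NC", "WY"],
        "Adult obesity": [0.300, 0.341, 0.332, 0.293],
        "Poor or fair health": [0.122, 0.192, 0.192, 0.135],
    },
    index=[0, 3243, 3244, 6104],
)

# Stage 1: the comparison produces a True/False value for every row.
is_nc = Compiled_Data["State"] == "NC"
print(is_nc.tolist())  # [False, True, True, False]

# Stage 2: passing that Series inside brackets keeps only the True rows.
nc_rows = Compiled_Data[is_nc]
print(nc_rows.index.tolist())  # [3243, 3244]

# True counts as 1, so summing the mask counts the NC rows.
print(is_nc.sum())  # 2
```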


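#### Now that we know which rows contain NC, we can plug their row labels into the rows section of our ```.loc``` code.

* The rows that contain "NC" are labeled 3243 through 3441. Unlike ordinary Python slicing, a ```.loc``` slice includes its end label, so the slice is ```3243:3441```.

```python
Compiled_Data.loc[3243:3441, ["State", "Adult obesity", "Poor or fair health"]]
```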
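The filtered output above ends at label 3441, so it matters whether a slice includes its end point. This sketch, on a stand-in DataFrame with labels 0 to 9, shows that `.loc` (label-based) includes the end label while `.iloc` (position-based) excludes it, just like ordinary Python slicing. A `.loc` slice ending at 3442 would therefore also pull in row 3442, the first row after the NC block.

```python
import pandas as pd

# Stand-in with the default integer labels 0 to 9.
data = pd.DataFrame({"row": range(10)})

# .loc slices by label and INCLUDES the end label.
print(data.loc[3:6, "row"].tolist())   # [3, 4, 5, 6]

# .iloc slices by position and EXCLUDES the end, like Python lists.
print(data.iloc[3:6]["row"].tolist())  # [3, 4, 5]
```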

Unnamed: 0,State,Adult obesity,Poor or fair health
3243,NC,0.341,0.192
3244,NC,0.332,0.192
3245,NC,0.272,0.178
3246,NC,0.283,0.178
3247,NC,0.247,0.234
...,...,...,...
3438,NC,0.373,0.159
3439,NC,0.297,0.207
3440,NC,0.301,0.207
3441,NC,0.287,0.193


## Naming the Finalized Subset
#### Now assign the filtered rows to a new name to show the finalized subset.

```python
Compiled_Data_Final = Compiled_Data.loc[3243:3441, ["State", "Adult obesity", "Poor or fair health"]]
```

```python
Compiled_Data_Final
```

|  | State | Adult obesity | Poor or fair health |
|---|---|---|---|
| 3243 | NC | 0.341 | 0.192 |
| 3244 | NC | 0.332 | 0.192 |
| 3245 | NC | 0.272 | 0.178 |
| 3246 | NC | 0.283 | 0.178 |
| 3247 | NC | 0.247 | 0.234 |
| ... | ... | ... | ... |
| 3438 | NC | 0.373 | 0.159 |
| 3439 | NC | 0.297 | 0.207 |
| 3440 | NC | 0.301 | 0.207 |
| 3441 | NC | 0.287 | 0.193 |
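Typing row numbers by hand works, but it breaks if the file's rows are ever reordered. As an alternative to the approach above, `.loc` also accepts the True/False filter in its rows position, so the row filter and the column list can go into a single step. This sketch uses a stand-in built from rows shown in this notebook's outputs.

```python
import pandas as pd

# Stand-in for df using rows copied from this notebook's outputs.
df = pd.DataFrame(
    {
        "State": ["AK", "NC", "NC", "WY"],
        "Adult obesity": [0.300, 0.341, 0.332, 0.293],
        "Poor or fair health": [0.122, 0.192, 0.192, 0.135],
    },
    index=[0, 3243, 3244, 6104],
)

columns = ["State", "Adult obesity", "Poor or fair health"]

# Rows: a True/False filter. Columns: a list of names. No row numbers needed.
Compiled_Data_Final = df.loc[df["State"] == "NC", columns]
print(Compiled_Data_Final)
```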


## Exporting the Subset as a New .csv File
#### Now we can export our data as a .csv file.

1. Type the name of the newly created data subset.
2. Then type ```.to_csv```.
3. Then add an opening parenthesis and a quotation mark, and type the name of the new .csv file.
4. Close the quotation marks.
5. Add a comma, then a space, then type ```index=False``` (this leaves the row labels out of the file), and then close the parenthesis.
6. Completed, the code should look like: ```Compiled_Data_Final.to_csv("CompiledDataFinal.csv", index=False)```
7. Finally, run the code.

#### This will export your data. To confirm that the .csv file was created, look in the folder panel on the left side of your screen where you uploaded your data set. Then click the three dots next to the new file and download it.

```python
Compiled_Data_Final.to_csv("CompiledDataFinal.csv", index=False)
```
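To see what `index=False` changes, this sketch writes the same small stand-in subset twice, with and without the row labels, and reads each file back. The files go to a temporary folder so nothing is left behind; the data rows are copied from this notebook's outputs. Without `index=False`, the row labels come back as an extra column that pandas names `Unnamed: 0`.

```python
import os
import tempfile

import pandas as pd

# Stand-in for Compiled_Data_Final using rows copied from this notebook's outputs.
Compiled_Data_Final = pd.DataFrame(
    {
        "State": ["NC", "NC"],
        "Adult obesity": [0.341, 0.332],
        "Poor or fair health": [0.192, 0.192],
    },
    index=[3243, 3244],
)

with tempfile.TemporaryDirectory() as folder:
    with_index = os.path.join(folder, "with_index.csv")
    without_index = os.path.join(folder, "without_index.csv")

    # Default: the row labels are written as an extra, unnamed first column.
    Compiled_Data_Final.to_csv(with_index)
    # index=False: only the three data columns are written.
    Compiled_Data_Final.to_csv(without_index, index=False)

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # ['Unnamed: 0', 'State', 'Adult obesity', 'Poor or fair health']
print(cols_without)  # ['State', 'Adult obesity', 'Poor or fair health']
```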