***
# ETL Project: Extract, Transform, Load
***
## Step 1: Extract
> The data sources are CSV files. After exploring the data needed for this project, we begin retrieval by reading in the dataset.
> The dataset used in this part may help answer the following question:
> * Using California as a model, is there a relationship between enrollment in medication-assisted treatment and rates of overdose deaths?
> * Data source: https://data.world/chhs/8329a339-ab77-4d05-ab7a-405d0ae5765c

### Importing Dependencies

In [2]:
# Import Dependencies:
import pandas as pd
import os

In [3]:
# Creating csv data file path:
medication_assisted = os.path.join("Resources", "mat_annually.csv")

### Store CSV into DataFrame

In [5]:
# Reading in data file to store into Pandas DataFrame:
medication_assisted_treatment = pd.read_csv(medication_assisted)
medication_assisted_treatment.head()

| | County | Year | Medication_Assisted_Treatment | Beneficiaries | Status | Annotation | Annotation_Description |
|---|---|---|---|---|---|---|---|
| 0 | Statewide | 2010 | Buprenorphine | 1265.0 | F | | |
| 1 | Statewide | 2011 | Buprenorphine | 1680.0 | F | | |
| 2 | Statewide | 2012 | Buprenorphine | 2099.0 | F | | |
| 3 | Statewide | 2013 | Buprenorphine | 2129.0 | F | | |
| 4 | Statewide | 2014 | Buprenorphine | 5000.0 | F | | |
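
After loading, a quick look at shape, dtypes, and missing values helps decide what to clean. The sketch below is only illustrative: it reads a small inline sample whose two rows are copied from the preview above, and `sample_csv` and `sample` are hypothetical names, not part of the notebook.

```python
import io
import pandas as pd

# Hypothetical inline sample; the rows mirror the preview of mat_annually.csv
sample_csv = io.StringIO(
    "County,Year,Medication_Assisted_Treatment,Beneficiaries,Status,"
    "Annotation,Annotation_Description\n"
    "Statewide,2010,Buprenorphine,1265.0,F,,\n"
    "Statewide,2011,Buprenorphine,1680.0,F,,\n"
)

sample = pd.read_csv(sample_csv)

# Summarize size, column types, and the number of missing values per column
print(sample.shape)
print(sample.dtypes)
print(sample.isna().sum())
```

In this sample the two annotation columns are entirely empty, which is one reason a later step keeps only a subset of columns.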


***
## Step 2: Transform
***
> Transform the dataset to suit the needs of our project. This includes:
> 1. Cleaning Data
> 2. Removing NaNs
> 3. Selecting needed columns
> 4. Re-naming columns
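
The four steps listed above can be sketched as one chained expression. This is a minimal illustration under assumptions: the frame `raw` is a hypothetical sample (its last row is made up to show a missing value), and the lowercase column names are only one possible renaming, not names taken from the project.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first two rows follow the preview, the last is invented
raw = pd.DataFrame({
    "County": ["Statewide", "Statewide", "Example County"],
    "Year": [2010, 2011, 2010],
    "Medication_Assisted_Treatment": ["Buprenorphine"] * 3,
    "Beneficiaries": [1265.0, 1680.0, np.nan],
    "Status": ["F", "F", "F"],
})

tidy = (
    raw[["County", "Year", "Beneficiaries"]]  # select the needed columns
    .dropna(how="any")                        # drop rows containing any NaN
    .rename(columns={                         # assumed target names
        "County": "county",
        "Year": "year",
        "Beneficiaries": "beneficiaries",
    })
    .reset_index(drop=True)                   # renumber rows after dropping
)
print(tidy)
```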

### Create new data with select columns

In [5]:
# Select only the columns needed to answer the question:
mat_subset = medication_assisted_treatment[["County", "Year", "Beneficiaries"]]
mat_subset.head()

| | County | Year | Beneficiaries |
|---|---|---|---|
| 0 | Statewide | 2010 | 1265.0 |
| 1 | Statewide | 2011 | 1680.0 |
| 2 | Statewide | 2012 | 2099.0 |
| 3 | Statewide | 2013 | 2129.0 |
| 4 | Statewide | 2014 | 5000.0 |


In [15]:
# Cleaning dataset by dropping rows that contain any missing value:
assisted_treatment = mat_subset.dropna(how='any')
assisted_treatment.head()

| | County | Year | Beneficiaries |
|---|---|---|---|
| 0 | Statewide | 2010 | 1265.0 |
| 1 | Statewide | 2011 | 1680.0 |
| 2 | Statewide | 2012 | 2099.0 |
| 3 | Statewide | 2013 | 2129.0 |
| 4 | Statewide | 2014 | 5000.0 |


In [16]:
# Saving needed subset into csv file:
assisted_treatment.to_csv('Resources/assisted_treatment.csv')
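
One thing to keep in mind when saving: by default `to_csv` also writes the row index, which shows up as an `Unnamed: 0` column when the file is read back. The sketch below compares the default with `index=False` using in-memory buffers; `frame` and the buffer names are hypothetical, and whether to drop the index depends on how the saved file is used later.

```python
import io
import pandas as pd

# Hypothetical one-row frame with the same columns as the saved subset
frame = pd.DataFrame(
    {"County": ["Statewide"], "Year": [2010], "Beneficiaries": [1265.0]}
)

# Default behavior: the index is written as an extra, unnamed first column
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False writes only the data columns
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```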