<a id="TOC"></a>
## Table of Contents
1. [Loading Libraries](#Loading-Libraries)
2. [Pulling Data from ADRF Databases](#Pull-Data)
3. [Dealing with Dates](#Dates)
4. [Unique Values in Column](#Column-Values)

<a id="Loading-Libraries"></a>
# Loading Libraries

- Back to [Table of Contents](#TOC)

In [None]:
from __future__ import print_function, division # Python 3.x behavior; must be the first statement in the cell
import numpy as np # math-related functions
import pandas as pd # pulls in data, dataframe manipulations
import datetime # date and time types used to build date columns
import psycopg2 # needed for SQL database connection/pull
import sqlalchemy # needed for SQL database connection/pull

<a id="Pull-Data"></a>
# Pulling Data from ADRF Databases

- Back to [Table of Contents](#TOC)

### Setting up the database connection

In [None]:
# connection using psycopg2, a lower level connection package
db_name = "appliedda"
db_host = "stuffed"
pgsql_connection = psycopg2.connect( host = db_host, database = db_name )

In [None]:
cur = pgsql_connection.cursor()

In [None]:
# connection using SQLAlchemy, which uses psycopg2 under the hood for postgresql connections
pgsql_engine = sqlalchemy.create_engine( "postgresql://10.10.2.10/appliedda" )

# alternatively we could create the DB connection from our variables above:
# pgsql_engine = sqlalchemy.create_engine( "postgresql://{}/{}".format(db_host, db_name) )

### Pulling data 
This code helps you pull data from a specific table
- Back to [Table of Contents](#TOC)

In [None]:
# This sets up the parameters of a query
# This example query pulls 1000 records from the ildoc_admit table, only for 2015 admissions (curadmyr = 2015)
# adjust the schema, table, and filter to your dataset/data subset

query = '''SELECT * FROM {schema}.{table} 
WHERE curadmyr = 2015 LIMIT 1000;'''.format(schema='il_doc_kcmo', table="ildoc_admit")

In [None]:
# this code block takes the query parameters and pulls the data
# be sure to rename df_ildoc_admit to your table name (if needed)

df_ildoc_admit = pd.read_sql( query, con = pgsql_engine )
# note that we are using pd (pandas) to pull the data into a dataframe
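
If the admission year changes between runs, it can be passed as a bound parameter instead of being typed into the SQL string. The following is a minimal sketch, not the ADRF setup: it assumes an in-memory SQLite database as a stand-in so that it runs without access to the server, the `person_id` column and the three rows are made up for illustration, and the `?` placeholder is SQLite's style (psycopg2 uses `%s` or `%(name)s` instead).

```python
import sqlite3
import pandas as pd

# Stand-in database (assumption): in-memory SQLite, not the ADRF PostgreSQL server
conn = sqlite3.connect(":memory:")
# person_id is a hypothetical column; curadmyr mirrors the filter used above
conn.execute("CREATE TABLE ildoc_admit (person_id TEXT, curadmyr INTEGER)")
conn.executemany(
    "INSERT INTO ildoc_admit VALUES (?, ?)",
    [("A1", 2014), ("A2", 2015), ("A3", 2015)],
)

# The year is supplied through params, so the SQL text itself never changes
query = "SELECT * FROM ildoc_admit WHERE curadmyr = ? LIMIT 1000;"
df_demo = pd.read_sql(query, con=conn, params=(2015,))
conn.close()

print(df_demo)
```

Keeping values out of the query string also avoids quoting mistakes when the filter is a text value rather than a number.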

### Viewing the Data

In [None]:
# after you make the database request, it is useful to verify that the right data came back
# .head() displays the top 5 rows of data
# you can specify a different number of rows by adding a number in the parentheses

df_ildoc_admit.head()
# df_ildoc_admit.tail()
# tail shows the last few lines of the dataframe

In [None]:
# provides descriptive statistics (count, mean, std, min, quartiles, max) for the numeric columns of the table
df_ildoc_admit.describe()
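
By default `.describe()` summarizes numeric columns only, so text columns are silently left out. As a small sketch on a made-up dataframe (the `sex` column and all values are assumptions for illustration), `include="all"` adds the non-numeric columns and `.dtypes` shows how each column was read in:

```python
import pandas as pd

# Hypothetical stand-in for a pulled table
df_demo = pd.DataFrame({"birthyr": [1975, 1980, 1980], "sex": ["M", "F", "M"]})

numeric_summary = df_demo.describe()            # numeric columns only
full_summary = df_demo.describe(include="all")  # adds unique/top/freq for text columns
column_types = df_demo.dtypes                   # data type of each column

print(full_summary)
print(column_types)
```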

<a id="Dates"></a>
# Dealing with Dates
Combine separate year, month, and day columns into a single new column containing dates.
- Back to [Table of Contents](#TOC)

In [None]:
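# this example combines birthdate info - replace with other date-related column headings
# note that it requires the original date columns in year - month - day order
# datetime.datetime raises an error if any row has a missing or invalid value (for example, month 13)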
df_ildoc_admit['birthdate'] = df_ildoc_admit[['birthyr', 'birthmo', 'birthda']].apply(lambda s : datetime.datetime(*s), axis = 1)

In [None]:
# look at the top 5 rows of the new column
# note that it tells you that they are of datatype datetime64
df_ildoc_admit['birthdate'].head()
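
An alternative that tolerates bad rows is `pd.to_datetime`, which can assemble dates from a dataframe whose columns are named `year`, `month`, and `day`. This is a sketch on made-up values, not the notebook's method: the rows are invented, and `errors="coerce"` is used so that an impossible date becomes `NaT` instead of stopping the cell.

```python
import pandas as pd

# Hypothetical rows; the second one (February 30) is not a real date
df_demo = pd.DataFrame(
    {"birthyr": [1980, 1975], "birthmo": [5, 2], "birthda": [17, 30]}
)

# pd.to_datetime expects the component columns to be called year, month, day
parts = df_demo[["birthyr", "birthmo", "birthda"]].rename(
    columns={"birthyr": "year", "birthmo": "month", "birthda": "day"}
)
df_demo["birthdate"] = pd.to_datetime(parts, errors="coerce")

print(df_demo["birthdate"])
```

Counting the `NaT` entries afterwards with `df_demo["birthdate"].isna().sum()` gives a quick measure of how many rows had unusable date parts.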

<a id = "Column-Values"></a>
# List of values in a column
- Back to [Table of Contents](#TOC)

In [None]:
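# this code displays each unique value in the column named 'birthyr' along with how often it occurs
# (sort = True), the default, orders the result by frequency, most common value first
# (sort = False) skips the frequency sort; chain .sort_index() to order by the value itself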

print(df_ildoc_admit['birthyr'].value_counts(sort = False))
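
A few common variations on `value_counts` are sketched below on a made-up series (the values are assumptions for illustration): ordering by the value itself, counting missing entries, and reporting shares instead of counts.

```python
import pandas as pd

# Hypothetical birth years, including one missing value
birthyr = pd.Series([1980, 1975, 1980, None, 1990, 1980])

by_frequency = birthyr.value_counts()               # most common value first
by_value = birthyr.value_counts().sort_index()      # ordered by year
with_missing = birthyr.value_counts(dropna=False)   # also counts missing entries
shares = birthyr.value_counts(normalize=True)       # proportions of non-missing rows

print(by_value)
```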