In this notebook, you'll see how to connect to a Postgres database using the sqlalchemy library.

For this notebook, you'll need both the `sqlalchemy` and `psycopg2` libraries installed.

For much more information about SQLAlchemy and to see a more “Pythonic” way to execute queries, see Introduction to Databases in Python: https://www.datacamp.com/courses/introduction-to-relational-databases-in-python

In [1]:
#!pip install psycopg2-binary

In [2]:
#!pip install psycopg2

In [7]:
from sqlalchemy import create_engine, text



First, we need to create a connection string. The format is

 ```<dialect(+driver)>://<username>:<password>@<hostname>:<port>/<database>```
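
As a minimal sketch of how the pieces of that format fit together, the helper below assembles a connection string from its parts using only the standard library. The function name `build_connection_string` and the sample credentials are hypothetical, chosen for illustration. The username and password are percent-encoded with `quote_plus` so that characters such as `@` or `/` do not break the URL.

```python
from urllib.parse import quote_plus


def build_connection_string(username, password, hostname, port, database,
                            dialect="postgresql", driver=None):
    """Assemble <dialect(+driver)>://<username>:<password>@<hostname>:<port>/<database>."""
    # The driver part is optional; when omitted, only the dialect is used.
    scheme = f"{dialect}+{driver}" if driver else dialect
    # Percent-encode credentials so special characters stay inside their field.
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"{scheme}://{user}:{pwd}@{hostname}:{port}/{database}"


# Hypothetical values, for illustration only.
example = build_connection_string(
    "postgres", "p@ss/word", "localhost", 5432, "prescribers", driver="psycopg2"
)
print(example)
```

The resulting string can be passed to `create_engine` in the same way as a hand-written one.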

To connect to the Prescribers database, you can use the following connection string.

In [19]:
database_name = 'Prescribers'    # Fill this in with your prescribers database name

connection_string = f"postgresql://postgres:postgres@localhost:5432/{database_name}"

Now, we need to create an engine and use it to connect.

In [22]:
engine = create_engine(connection_string)

SQLAlchemy works well with pandas: query results can be read directly into DataFrames.

In [25]:
import pandas as pd

First, let's write one query for each table in the database.

In [27]:
prescriber_query = 'SELECT * FROM prescriber'
prescription_query = 'SELECT * FROM prescription'
drug_query = 'SELECT * FROM drug'
zip_fips_query = 'SELECT * FROM zip_fips'
population_query = 'SELECT * FROM population'
cbsa_query = 'SELECT * FROM cbsa'
fips_county_query = 'SELECT * FROM fips_county'
overdose_deaths_query = 'SELECT * FROM overdose_deaths'
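
Since the eight queries above differ only in the table name, they could also be generated from a list. This is only a sketch of an alternative layout; the names `table_names` and `queries` are not part of the original notebook. Formatting a table name into SQL like this is reasonable only because the names are hard-coded here, never taken from user input.

```python
# Table names copied from the queries above.
table_names = [
    "prescriber",
    "prescription",
    "drug",
    "zip_fips",
    "population",
    "cbsa",
    "fips_county",
    "overdose_deaths",
]

# Build one "SELECT * FROM <table>" string per table, keyed by table name.
queries = {name: f"SELECT * FROM {name}" for name in table_names}

for name, query in queries.items():
    print(f"{name}: {query}")
```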

Now, bring it all together using the following syntax.

In [29]:
with engine.connect() as connection:
    prescriber_df = pd.read_sql(text(prescriber_query), con=connection)
    prescription_df = pd.read_sql(text(prescription_query), con=connection)
    drug_df = pd.read_sql(text(drug_query), con=connection)
    zipfips_df = pd.read_sql(text(zip_fips_query), con=connection)
    population_df = pd.read_sql(text(population_query), con=connection)
    cbsa_df = pd.read_sql(text(cbsa_query), con=connection)
    fipscounty_df = pd.read_sql(text(fips_county_query), con=connection)
    overdose_df = pd.read_sql(text(overdose_deaths_query), con=connection)

### Q_4 Is there an association between rates of opioid prescriptions and overdose deaths by county?


In [86]:
# Keep only drugs flagged as opioids (regular or long-acting)
drug_opioid = drug_df[(drug_df.opioid_drug_flag == 'Y') | (drug_df.long_acting_opioid_drug_flag == 'Y')]
drug_opioid.info()


<class 'pandas.core.frame.DataFrame'>
Index: 91 entries, 10 to 3381
Data columns (total 6 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   drug_name                     91 non-null     object
 1   generic_name                  91 non-null     object
 2   opioid_drug_flag              91 non-null     object
 3   long_acting_opioid_drug_flag  91 non-null     object
 4   antibiotic_drug_flag          91 non-null     object
 5   antipsychotic_drug_flag       91 non-null     object
dtypes: object(6)
memory usage: 5.0+ KB


In [106]:
# Keep only prescriptions whose drug_name appears in the opioid drug list
prescription_opioid = prescription_df[prescription_df['drug_name'].isin(drug_opioid['drug_name'])]

prescription_opioid.info()

<class 'pandas.core.frame.DataFrame'>
Index: 31932 entries, 48 to 656049
Data columns (total 14 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   npi                            31932 non-null  float64
 1   drug_name                      31932 non-null  object 
 2   bene_count                     20538 non-null  float64
 3   total_claim_count              31932 non-null  float64
 4   total_30_day_fill_count        31932 non-null  float64
 5   total_day_supply               31932 non-null  float64
 6   total_drug_cost                31932 non-null  float64
 7   bene_count_ge65                8048 non-null   float64
 8   bene_count_ge65_suppress_flag  23884 non-null  object 
 9   total_claim_count_ge65         17268 non-null  float64
 10  ge65_suppress_flag             14664 non-null  object 
 11  total_30_day_fill_count_ge65   17268 non-null  float64
 12  total_day_supply_ge65          17268 non-null  fl

In [134]:
# Attach prescriber details to each opioid prescription, matching on npi
prescription_prescription = pd.merge(
    left=prescription_opioid,
    right=prescriber_df,
    how='left',
    on='npi'
)
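
A left merge keeps every row of the left table even when no match exists on the right, so it can be worth checking how many prescriptions actually found a prescriber. The sketch below illustrates this on two tiny made-up frames (`toy_prescriptions` and `toy_prescribers` are hypothetical, not the notebook's data) using the `indicator=True` option of `pd.merge`, which adds a `_merge` column describing where each row came from.

```python
import pandas as pd

# Made-up data: the prescription with npi 2 has no matching prescriber.
toy_prescriptions = pd.DataFrame(
    {"npi": [1, 2], "drug_name": ["DRUG_A", "DRUG_B"]}
)
toy_prescribers = pd.DataFrame(
    {"npi": [1], "nppes_provider_zip5": ["37243"]}
)

toy_merged = pd.merge(
    left=toy_prescriptions,
    right=toy_prescribers,
    how="left",
    on="npi",
    indicator=True,  # adds a "_merge" column: "both" or "left_only"
)

# Count matched and unmatched rows.
print(toy_merged["_merge"].value_counts())
```

Rows marked `left_only` have missing values in every prescriber column, which matters for any later grouping by zip code or county.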

### Q_5 Is there any association between a particular type of opioid and number of overdose deaths?

In [108]:
prescriber_df.head()

Unnamed: 0,npi,nppes_provider_last_org_name,nppes_provider_first_name,nppes_provider_mi,nppes_credentials,nppes_provider_gender,nppes_entity_code,nppes_provider_street1,nppes_provider_street2,nppes_provider_city,nppes_provider_zip5,nppes_provider_zip4,nppes_provider_state,nppes_provider_country,specialty_description,description_flag,medicare_prvdr_enroll_status
0,1003000000.0,BLAKEMORE,ROSIE,K,FNP,F,I,TENNESSEE PRISON FOR WOMEN,3881 STEWARTS LANE,NASHVILLE,37243,1,TN,US,Nurse Practitioner,S,N
1,1003012000.0,CUDZILO,COREY,,M.D.,M,I,2240 SUTHERLAND AVE,SUITE 103,KNOXVILLE,37919,2333,TN,US,Pulmonary Disease,S,E
2,1003013000.0,GRABENSTEIN,WILLIAM,P,M.D.,M,I,1822 MEMORIAL DR,,CLARKSVILLE,37043,4605,TN,US,Family Practice,S,E
3,1003014000.0,OTTO,ROBERT,J,M.D.,M,I,2400 PATTERSON STREET SUITE 100,,NASHVILLE,37203,2786,TN,US,Orthopedic Surgery,S,E
4,1003018000.0,TODD,JOSHUA,W,M.D.,M,I,1819 W CLINCH AVE,SUITE 108,KNOXVILLE,37916,2435,TN,US,Cardiology,S,E


In [112]:
zipfips_df.tail()

Unnamed: 0,zip,fipscounty,res_ratio,bus_ratio,oth_ratio,tot_ratio
54176,99925,2198,0.0,0.0,1.0,1.0
54177,99926,2198,0.0,0.0,1.0,1.0
54178,99927,2198,0.0,0.0,1.0,1.0
54179,99928,2130,0.0,0.0,1.0,1.0
54180,99929,2275,0.0,0.0,1.0,1.0
