### Covid Policy by State 
This notebook will clean/organize data from Boston University School of Public Health that tracks individual state's COVID policies (including stay at home orders, business closures, and socioeconomic safety nets such as utility shut off freezes) as well as pre-existing state policies (including minimum wage). 

The resulting dataframes from this notebook will be used to determine if state's policies have an affect on its overdose rates, particularly COVID policies. Data limitation: overdose data is aggregated monthly data, so we won't be able to narrow down exact weeks after orders to see if there was an significant change in overdose rates immediately following policy implementations.

In [1]:
import pandas as pd
import numpy as np
import datetime as dt

### Initial data exploration
Steps
* Load in dataframe
* How many rows/columns are there?
* Define column labels with meta data 
* What does each row represent? 
* Is there missing data?

In [2]:
saho_df = pd.read_csv('../../data/data_raw/covid_policy_bystate.csv')
saho_df

Unnamed: 0,STATE,POSTCODE,FIPS,STEMERG,CLSCHOOL,CLDAYCR,OPNCLDCR,CLNURSHM,STAYHOME,STAYHOMENOGP,...,MINWAGE2020,ALTMINWAGE2020,TIPPEDMINWAGE2020,SMALLBIZMINWAGE2020,PLANMINWAGE2021,PLANMINWAGE2022,PLANMINWAGE2023,PLANMINWAGE2024,PLANMINWAGE2025,PLANMINWAGE2026
0,State,State Abbreviation,FIPS Code,State of emergency,Date closed K-12 public schools,Closed day cares,Reopen day cares,Date banned visitors to nursing homes,Stay at home/ shelter in place,Stay at home order' issued but did not specifi...,...,2020 Minimum Wage,2020 Alternative Minimum Wage,2020 Minimum Wage for Tipped Workers,Different Minimum Wage for Smaller Businesses,[Planned] 2021 Minimum Wage,[Planned] 2022 Minimum Wage,[Planned] 2023 Minimum Wage,[Planned] 2024 Minimum Wage,[Planned] 2025 Minimum Wage,[Planned] 2026 Minimum Wage
1,category,,,state_of_emergency,physical_distance_closure,physical_distance_closure,Reopening,physical_distance_closure,shelter,shelter,...,minimum_wage,minimum_wage,minimum_wage,minimum_wage,minimum_wage,minimum_wage,minimum_wage,minimum_wage,minimum_wage,minimum_wage
2,type,note,note,start,start,start,end,start,start,start,...,quantity,quantity,quantity,attribute,quantity,quantity,quantity,quantity,quantity,quantity
3,unit,text,attribute,date,date,date,date,date,date,date,...,dollars,dollars,dollars,flag,dollars,dollars,dollars,dollars,dollars,dollars
4,Alabama,AL,1,3/13/20,3/20/20,3/20/20,5/23/20,3/19/20,4/4/20,0,...,.,.,$2.13,0,.,.,.,.,.,.
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
999,,,,,,,,,,,...,,,,,,,,,,
1000,,,,,,,,,,,...,,,,,,,,,,
1001,,,,,,,,,,,...,,,,,,,,,,
1002,,,,,,,,,,,...,,,,,,,,,,


In [3]:
list(saho_df.columns)

['STATE',
 'POSTCODE',
 'FIPS',
 'STEMERG',
 'CLSCHOOL',
 'CLDAYCR',
 'OPNCLDCR',
 'CLNURSHM',
 'STAYHOME',
 'STAYHOMENOGP',
 'END_STHM',
 'CLBSNS',
 'END_BSNS',
 'RELIGEX',
 'FM_ALL',
 'FM_ALL2',
 'FMFINE',
 'FMCITE',
 'FMNOENF',
 'FM_EMP',
 'FM_END',
 'FM_STP',
 'ALCOPEN',
 'ALCREST',
 'ALCDELIV',
 'GUNOPEN',
 'CLREST',
 'ENDREST',
 'RSTOUTDR',
 'CLGYM',
 'ENDGYM',
 'CLMOVIE',
 'END_MOV',
 'CLOSEBAR',
 'END_BRS',
 'END_HAIR',
 'END_RELG',
 'ENDRETL',
 'BCLBAR2',
 'CLBAR2',
 'CLMV2',
 'CLHAIR2',
 'CLGYM2',
 'CLRST2',
 'ENDREST2',
 'END_BRS2',
 'CLBAR3',
 'CLMV3',
 'CLGYM3',
 'CLRST3',
 'QRSOMEST',
 'QR_ALLST',
 'QR_END',
 'EVICINTN',
 'EVICINRESUME',
 'EVICENF',
 'EVICSTPTWO',
 'EVICCOURTCL',
 'EVICCOURTRE',
 'EVICEND1',
 'EVICEND2',
 'EVICEND3',
 'RNTGP',
 'UTILSO',
 'UTILGAS_SO2',
 'UTILGAS_END',
 'UTILGAS_SO2.1',
 'UTILELEC_SO',
 'UTILELEC_END',
 'UTILELEC_SO2',
 'UTILWAT_SO',
 'UTILWAT_END',
 'UTILWAT_SO2',
 'UTILTEL_SO',
 'UTILTEL_END',
 'UTILTEL_SO2',
 'MORGFR',
 'SNAPALLO',
 'S

Column heading definitions:

In [4]:
define_columns = saho_df.loc[0]
list(define_columns)

['State',
 'State Abbreviation',
 'FIPS Code',
 'State of emergency',
 'Date closed K-12 public schools',
 'Closed day cares',
 'Reopen day cares',
 'Date banned visitors to nursing homes',
 'Stay at home/ shelter in place',
 "Stay at home order' issued but did not specifically restrict movement of the general public",
 'End/relax stay at home/shelter in place',
 'Closed other non-essential businesses',
 'Began to reopen businesses',
 'Religious Gatherings Exempt Without Clear Social Distance Mandate*',
 'Mandate face mask use by all individuals in public spaces',
 'Second mandate for facemasks by all individuals in public places',
 'Face mask mandate enforced by fines',
 'Face mask mandate enforced by criminal charge/citation',
 'No legal enforcement of face mask mandate',
 'Mandate face mask use by employees in public-facing businesses',
 'State ended statewide mask use by individuals in public spaces',
 'Attempt by state government to prevent local governments from implementing face

Subset by policies I think will be relevent:

In [5]:
cols_to_use = ['STATE','POSTCODE','STAYHOME','STAYHOMENOGP','END_STHM','CLBSNS','END_BSNS','CLSCHOOL','CLREST','ENDREST','RSTOUTDR','ENDREST2','CLRST2','CLRST3','CLGYM','ENDGYM','CLGYM2','CLGYM3','CLMOVIE','END_MOV','CLMV2', 'CLMV3','CLOSEBAR','END_BRS','CLBAR2','END_BRS2','CLBAR3','END_HAIR','CLHAIR2','END_RELG','ENDRETL','CLDAYCR','OPNCLDCR','CLNURSHM','ALCOPEN','ALCREST','ALCDELIV','QR_END','EVICINTN','EVICINRESUME','RNTGP','UTILSO','UTILGAS_END','UTILGAS_SO2','UTILELEC_SO','UTILELEC_END','UTILELEC_SO2','UTILWAT_SO','UTILWAT_END','UTILWAT_SO2','UTILTEL_SO','UTILTEL_END','UTILTEL_SO2','MORGFR','TLHLAUD','TLHLMED','TLHlBUPR','EXTOPFL','HMDLVOP','TLHLCL24','WVDEAREQ','MH19','MINWAGE2020']
policy_df = saho_df[cols_to_use]

Filter out non-states:

In [6]:
state_filter = policy_df['STATE'] == 'District of Columbia'
policy_df = policy_df[-state_filter]

There are 1000+ rows, but all rows after the states + DC are empty to I can get rid of those:

In [7]:
edit_df1 = policy_df.drop(index = range(55,1004))

In [8]:
edit_df1

Unnamed: 0,STATE,POSTCODE,STAYHOME,STAYHOMENOGP,END_STHM,CLBSNS,END_BSNS,CLSCHOOL,CLREST,ENDREST,...,MORGFR,TLHLAUD,TLHLMED,TLHlBUPR,EXTOPFL,HMDLVOP,TLHLCL24,WVDEAREQ,MH19,MINWAGE2020
0,State,State Abbreviation,Stay at home/ shelter in place,Stay at home order' issued but did not specifi...,End/relax stay at home/shelter in place,Closed other non-essential businesses,Began to reopen businesses,Date closed K-12 public schools,Closed restaurants except take out,Reopen restaurants,...,Froze mortgage payments,Allow audio-only telehealth,Allow/expand Medicaid telehealth coverage,Use of telemedicine/telephone evaluations to i...,Patients can receive 14-28 take-home doses of ...,Home delivery of take-home medication by opioi...,Use of telemedicine for schedule II-V prescrip...,Waive requirement to obtain separate DEA regis...,"Mental health professionals per 100,000 popula...",2020 Minimum Wage
1,category,,shelter,shelter,shelter,business_closure,business_closure,physical_distance_closure,physical_distance_closures,reopening,...,housing,healthcare_delivery,healthcare_delivery,SUD_policies,SUD_policies,SUD_policies,SUD_policies,SUD_policies,state_characteristics,minimum_wage
2,type,note,start,start,end,start,end,start,start,end,...,start,start,start,start,start,start,start,start,quantity,quantity
3,unit,text,date,date,date,date,date,date,date,date,...,date,date,date,date,date,date,date,date,"per 100,000",dollars
4,Alabama,AL,4/4/20,0,4/30/20,3/28/20,4/30/20,3/20/20,3/19/20,5/11/20,...,0,3/22/20,3/16/20,0,0,0,3/20/20,0,100.7,.
5,Alaska,AK,3/28/20,0,4/24/20,3/24/20,4/24/20,3/16/20,3/18/20,4/24/20,...,0,3/17/20,3/20/20,0,0,0,0,0,429.9,$10.19
6,Arizona,AZ,3/31/20,0,5/16/20,3/31/20,5/8/20,3/16/20,3/21/20,5/11/20,...,0,3/25/20,3/25/20,0,0,0,0,0,132.9,$12.00
7,Arkansas,AR,0,0,0,4/6/20,5/4/20,3/17/20,3/20/20,5/11/20,...,0,3/13/20,3/13/20,0,0,0,0,0,231.6,$10.00
8,California,CA,3/19/20,0,0,3/19/20,5/8/20,3/23/20,3/16/20,5/18/20,...,0,3/30/20,3/18/20,0,0,0,0,0,356.2,$13.00
9,Colorado,CO,3/26/20,0,4/27/20,3/19/20,5/1/20,3/23/20,3/17/20,5/27/20,...,0,4/1/20,3/20/20,0,0,0,0,0,356.4,$12.00


First few rows are data descripters, drop those:

In [9]:
edit_df2 = edit_df1.drop(index= range (0,4))

In [10]:
df_revised = edit_df2.set_index('STATE')
df_revised

Unnamed: 0_level_0,POSTCODE,STAYHOME,STAYHOMENOGP,END_STHM,CLBSNS,END_BSNS,CLSCHOOL,CLREST,ENDREST,RSTOUTDR,...,MORGFR,TLHLAUD,TLHLMED,TLHlBUPR,EXTOPFL,HMDLVOP,TLHLCL24,WVDEAREQ,MH19,MINWAGE2020
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Alabama,AL,4/4/20,0,4/30/20,3/28/20,4/30/20,3/20/20,3/19/20,5/11/20,0,...,0,3/22/20,3/16/20,0,0,0,3/20/20,0,100.7,.
Alaska,AK,3/28/20,0,4/24/20,3/24/20,4/24/20,3/16/20,3/18/20,4/24/20,0,...,0,3/17/20,3/20/20,0,0,0,0,0,429.9,$10.19
Arizona,AZ,3/31/20,0,5/16/20,3/31/20,5/8/20,3/16/20,3/21/20,5/11/20,0,...,0,3/25/20,3/25/20,0,0,0,0,0,132.9,$12.00
Arkansas,AR,0,0,0,4/6/20,5/4/20,3/17/20,3/20/20,5/11/20,0,...,0,3/13/20,3/13/20,0,0,0,0,0,231.6,$10.00
California,CA,3/19/20,0,0,3/19/20,5/8/20,3/23/20,3/16/20,5/18/20,0,...,0,3/30/20,3/18/20,0,0,0,0,0,356.2,$13.00
Colorado,CO,3/26/20,0,4/27/20,3/19/20,5/1/20,3/23/20,3/17/20,5/27/20,0,...,0,4/1/20,3/20/20,0,0,0,0,0,356.4,$12.00
Connecticut,CT,0,3/23/20,5/20/20,3/23/20,5/20/20,3/17/20,3/16/20,5/20/20,1,...,0,3/19/20,3/18/20,0,0,0,0,0,396.9,$12.00
Delaware,DE,3/24/20,0,6/1/20,3/24/20,5/8/20,3/16/20,3/16/20,6/1/20,0,...,0,1/1/20,3/18/20,0,0,0,0,0,262.6,$9.75
Florida,FL,4/3/20,0,5/18/20,4/3/20,5/18/20,3/17/20,4/3/20,5/18/20,0,...,0,0,3/18/20,0,0,0,0,0,160.5,$8.56
Georgia,GA,4/3/20,0,5/1/20,4/3/20,5/1/20,3/18/20,4/3/20,4/27/20,0,...,0,3/18/20,3/18/20,0,0,0,0,0,137.3,$5.15


In [11]:
df_revised['POSTCODE'].nunique()


50

### Break this df into 3 separate dfs with related variables:
1. isolation factors - stay at home orders, business closures,etc; 
2. economic factors - freezing utility shut offs, eviction, mortgage freeze, min wage, etc; 
3. factors affecting healthcare services access - telehealth access, home delivery of meds, etc

In [12]:
df_revised.columns

Index(['POSTCODE', 'STAYHOME', 'STAYHOMENOGP', 'END_STHM', 'CLBSNS',
       'END_BSNS', 'CLSCHOOL', 'CLREST', 'ENDREST', 'RSTOUTDR', 'ENDREST2',
       'CLRST2', 'CLRST3', 'CLGYM', 'ENDGYM', 'CLGYM2', 'CLGYM3', 'CLMOVIE',
       'END_MOV', 'CLMV2', 'CLMV3', 'CLOSEBAR', 'END_BRS', 'CLBAR2',
       'END_BRS2', 'CLBAR3', 'END_HAIR', 'CLHAIR2', 'END_RELG', 'ENDRETL',
       'CLDAYCR', 'OPNCLDCR', 'CLNURSHM', 'ALCOPEN', 'ALCREST', 'ALCDELIV',
       'QR_END', 'EVICINTN', 'EVICINRESUME', 'RNTGP', 'UTILSO', 'UTILGAS_END',
       'UTILGAS_SO2', 'UTILELEC_SO', 'UTILELEC_END', 'UTILELEC_SO2',
       'UTILWAT_SO', 'UTILWAT_END', 'UTILWAT_SO2', 'UTILTEL_SO', 'UTILTEL_END',
       'UTILTEL_SO2', 'MORGFR', 'TLHLAUD', 'TLHLMED', 'TLHlBUPR', 'EXTOPFL',
       'HMDLVOP', 'TLHLCL24', 'WVDEAREQ', 'MH19', 'MINWAGE2020'],
      dtype='object')

###  Isolation factors 
To what degree was the state shut down? How long were businesses closed? 
I hypothesize that people in states with more extreme shut down policies will feel more isolated as a result of the pandemic, which could lead to an increase in drug use.

##### Steps:
1. Subset columns 
2. Reformat to datetime so I can use the date data 
3. Add column to represent business closures as an integer (sum of days closed)

Step 1: Subset

In [13]:
iso_cols = ['POSTCODE','STAYHOME','STAYHOMENOGP','END_STHM','CLBSNS','END_BSNS','CLSCHOOL','CLREST','ENDREST','RSTOUTDR', 'CLGYM','ENDGYM','CLMOVIE','END_MOV','CLOSEBAR','END_BRS','END_HAIR','END_RELG','ENDRETL','CLDAYCR','OPNCLDCR','CLNURSHM']
iso_df = df_revised[iso_cols]
iso_df

Unnamed: 0_level_0,POSTCODE,STAYHOME,STAYHOMENOGP,END_STHM,CLBSNS,END_BSNS,CLSCHOOL,CLREST,ENDREST,RSTOUTDR,...,CLMOVIE,END_MOV,CLOSEBAR,END_BRS,END_HAIR,END_RELG,ENDRETL,CLDAYCR,OPNCLDCR,CLNURSHM
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Alabama,AL,4/4/20,0,4/30/20,3/28/20,4/30/20,3/20/20,3/19/20,5/11/20,0,...,3/28/20,5/22/20,3/19/20,5/11/20,5/11/20,5/11/20,4/30/20,3/20/20,5/23/20,3/19/20
Alaska,AK,3/28/20,0,4/24/20,3/24/20,4/24/20,3/16/20,3/18/20,4/24/20,0,...,3/18/20,5/8/20,3/18/20,5/8/20,4/24/20,5/8/20,4/24/20,0,0,0
Arizona,AZ,3/31/20,0,5/16/20,3/31/20,5/8/20,3/16/20,3/21/20,5/11/20,0,...,3/21/20,5/16/20,3/21/20,5/16/20,5/8/20,0,5/8/20,0,0,0
Arkansas,AR,0,0,0,4/6/20,5/4/20,3/17/20,3/20/20,5/11/20,0,...,3/20/20,5/18/20,3/20/20,5/19/20,5/6/20,0,5/18/20,0,0,3/13/20
California,CA,3/19/20,0,0,3/19/20,5/8/20,3/23/20,3/16/20,5/18/20,0,...,3/19/20,0,3/16/20,0,8/28/20,5/25/20,5/8/20,0,0,0
Colorado,CO,3/26/20,0,4/27/20,3/19/20,5/1/20,3/23/20,3/17/20,5/27/20,0,...,3/17/20,6/18/20,3/17/20,6/18/20,5/1/20,6/4/20,5/1/20,0,0,3/12/20
Connecticut,CT,0,3/23/20,5/20/20,3/23/20,5/20/20,3/17/20,3/16/20,5/20/20,1,...,3/16/20,6/17/20,3/16/20,0,6/1/20,5/29/20,5/20/20,0,0,3/9/20
Delaware,DE,3/24/20,0,6/1/20,3/24/20,5/8/20,3/16/20,3/16/20,6/1/20,0,...,3/19/20,6/1/20,3/16/20,6/15/20,6/8/20,5/20/20,5/20/20,4/6/20,6/15/20,0
Florida,FL,4/3/20,0,5/18/20,4/3/20,5/18/20,3/17/20,4/3/20,5/18/20,0,...,4/3/20,6/5/20,3/17/20,6/5/20,5/11/20,0,5/18/20,0,0,3/15/20
Georgia,GA,4/3/20,0,5/1/20,4/3/20,5/1/20,3/18/20,4/3/20,4/27/20,0,...,4/3/20,4/27/20,3/24/20,6/1/20,4/24/20,0,4/23/20,0,0,4/3/20


Step 2: Reformat to datetime

In [14]:
iso_df = iso_df.copy()
iso_df[['STAYHOME','STAYHOMENOGP','END_STHM','CLBSNS','END_BSNS','CLSCHOOL','CLREST','ENDREST','RSTOUTDR','CLGYM','ENDGYM','CLMOVIE','END_MOV','CLOSEBAR','END_BRS','END_HAIR','END_RELG','ENDRETL','CLDAYCR','OPNCLDCR','CLNURSHM']] = iso_df[['STAYHOME','STAYHOMENOGP','END_STHM','CLBSNS','END_BSNS','CLSCHOOL','CLREST','ENDREST','RSTOUTDR','CLGYM','ENDGYM','CLMOVIE','END_MOV','CLOSEBAR','END_BRS','END_HAIR','END_RELG','ENDRETL','CLDAYCR','OPNCLDCR','CLNURSHM']].apply(pd.to_datetime, format = '%m/%d/%y',errors = 'coerce')
iso_df

Unnamed: 0_level_0,POSTCODE,STAYHOME,STAYHOMENOGP,END_STHM,CLBSNS,END_BSNS,CLSCHOOL,CLREST,ENDREST,RSTOUTDR,...,CLMOVIE,END_MOV,CLOSEBAR,END_BRS,END_HAIR,END_RELG,ENDRETL,CLDAYCR,OPNCLDCR,CLNURSHM
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Alabama,AL,2020-04-04,NaT,2020-04-30,2020-03-28,2020-04-30,2020-03-20,2020-03-19,2020-05-11,NaT,...,2020-03-28,2020-05-22,2020-03-19,2020-05-11,2020-05-11,2020-05-11,2020-04-30,2020-03-20,2020-05-23,2020-03-19
Alaska,AK,2020-03-28,NaT,2020-04-24,2020-03-24,2020-04-24,2020-03-16,2020-03-18,2020-04-24,NaT,...,2020-03-18,2020-05-08,2020-03-18,2020-05-08,2020-04-24,2020-05-08,2020-04-24,NaT,NaT,NaT
Arizona,AZ,2020-03-31,NaT,2020-05-16,2020-03-31,2020-05-08,2020-03-16,2020-03-21,2020-05-11,NaT,...,2020-03-21,2020-05-16,2020-03-21,2020-05-16,2020-05-08,NaT,2020-05-08,NaT,NaT,NaT
Arkansas,AR,NaT,NaT,NaT,2020-04-06,2020-05-04,2020-03-17,2020-03-20,2020-05-11,NaT,...,2020-03-20,2020-05-18,2020-03-20,2020-05-19,2020-05-06,NaT,2020-05-18,NaT,NaT,2020-03-13
California,CA,2020-03-19,NaT,NaT,2020-03-19,2020-05-08,2020-03-23,2020-03-16,2020-05-18,NaT,...,2020-03-19,NaT,2020-03-16,NaT,2020-08-28,2020-05-25,2020-05-08,NaT,NaT,NaT
Colorado,CO,2020-03-26,NaT,2020-04-27,2020-03-19,2020-05-01,2020-03-23,2020-03-17,2020-05-27,NaT,...,2020-03-17,2020-06-18,2020-03-17,2020-06-18,2020-05-01,2020-06-04,2020-05-01,NaT,NaT,2020-03-12
Connecticut,CT,NaT,2020-03-23,2020-05-20,2020-03-23,2020-05-20,2020-03-17,2020-03-16,2020-05-20,NaT,...,2020-03-16,2020-06-17,2020-03-16,NaT,2020-06-01,2020-05-29,2020-05-20,NaT,NaT,2020-03-09
Delaware,DE,2020-03-24,NaT,2020-06-01,2020-03-24,2020-05-08,2020-03-16,2020-03-16,2020-06-01,NaT,...,2020-03-19,2020-06-01,2020-03-16,2020-06-15,2020-06-08,2020-05-20,2020-05-20,2020-04-06,2020-06-15,NaT
Florida,FL,2020-04-03,NaT,2020-05-18,2020-04-03,2020-05-18,2020-03-17,2020-04-03,2020-05-18,NaT,...,2020-04-03,2020-06-05,2020-03-17,2020-06-05,2020-05-11,NaT,2020-05-18,NaT,NaT,2020-03-15
Georgia,GA,2020-04-03,NaT,2020-05-01,2020-04-03,2020-05-01,2020-03-18,2020-04-03,2020-04-27,NaT,...,2020-04-03,2020-04-27,2020-03-24,2020-06-01,2020-04-24,NaT,2020-04-23,NaT,NaT,2020-04-03


3. Sum the duration of business closures/stay at home orders

In [15]:
type(iso_df['STAYHOME'])

pandas.core.series.Series

In [16]:
iso_df.columns

Index(['POSTCODE', 'STAYHOME', 'STAYHOMENOGP', 'END_STHM', 'CLBSNS',
       'END_BSNS', 'CLSCHOOL', 'CLREST', 'ENDREST', 'RSTOUTDR', 'CLGYM',
       'ENDGYM', 'CLMOVIE', 'END_MOV', 'CLOSEBAR', 'END_BRS', 'END_HAIR',
       'END_RELG', 'ENDRETL', 'CLDAYCR', 'OPNCLDCR', 'CLNURSHM'],
      dtype='object')

In [17]:
sthm_duration = (iso_df['END_STHM'] - iso_df['STAYHOME']).dt.days
iso_df['sthm_duration'] = sthm_duration
clbsns_duration = (iso_df['END_BSNS'] - iso_df['CLBSNS']).dt.days
iso_df['clbsns_duration'] = clbsns_duration
clrest_duruation = (iso_df['ENDREST'] - iso_df['CLREST']).dt.days
iso_df['clrest_duruation'] = clrest_duruation
clgym_duration = (iso_df['ENDGYM'] - iso_df['CLGYM']).dt.days
iso_df['clgym_duration'] = clgym_duration
clmov_duration = (iso_df['END_MOV'] - iso_df['CLMOVIE']).dt.days
iso_df['clmov_duration'] = clmov_duration
clbar_duration = (iso_df['END_BRS'] - iso_df['CLOSEBAR']).dt.days
iso_df['clbar_duration'] = clbar_duration
clchildcare_duration = (iso_df['OPNCLDCR'] - iso_df['CLDAYCR']).dt.days
iso_df['clchildcare_duration'] = clchildcare_duration
iso_df

Unnamed: 0_level_0,POSTCODE,STAYHOME,STAYHOMENOGP,END_STHM,CLBSNS,END_BSNS,CLSCHOOL,CLREST,ENDREST,RSTOUTDR,...,CLDAYCR,OPNCLDCR,CLNURSHM,sthm_duration,clbsns_duration,clrest_duruation,clgym_duration,clmov_duration,clbar_duration,clchildcare_duration
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Alabama,AL,2020-04-04,NaT,2020-04-30,2020-03-28,2020-04-30,2020-03-20,2020-03-19,2020-05-11,NaT,...,2020-03-20,2020-05-23,2020-03-19,26.0,33.0,53.0,44.0,55.0,53.0,64.0
Alaska,AK,2020-03-28,NaT,2020-04-24,2020-03-24,2020-04-24,2020-03-16,2020-03-18,2020-04-24,NaT,...,NaT,NaT,NaT,27.0,31.0,37.0,51.0,51.0,51.0,
Arizona,AZ,2020-03-31,NaT,2020-05-16,2020-03-31,2020-05-08,2020-03-16,2020-03-21,2020-05-11,NaT,...,NaT,NaT,NaT,46.0,38.0,51.0,53.0,56.0,56.0,
Arkansas,AR,NaT,NaT,NaT,2020-04-06,2020-05-04,2020-03-17,2020-03-20,2020-05-11,NaT,...,NaT,NaT,2020-03-13,,28.0,52.0,45.0,59.0,60.0,
California,CA,2020-03-19,NaT,NaT,2020-03-19,2020-05-08,2020-03-23,2020-03-16,2020-05-18,NaT,...,NaT,NaT,NaT,,50.0,63.0,,,,
Colorado,CO,2020-03-26,NaT,2020-04-27,2020-03-19,2020-05-01,2020-03-23,2020-03-17,2020-05-27,NaT,...,NaT,NaT,2020-03-12,32.0,43.0,71.0,79.0,93.0,93.0,
Connecticut,CT,NaT,2020-03-23,2020-05-20,2020-03-23,2020-05-20,2020-03-17,2020-03-16,2020-05-20,NaT,...,NaT,NaT,2020-03-09,,58.0,65.0,93.0,93.0,,
Delaware,DE,2020-03-24,NaT,2020-06-01,2020-03-24,2020-05-08,2020-03-16,2020-03-16,2020-06-01,NaT,...,2020-04-06,2020-06-15,NaT,69.0,45.0,77.0,74.0,74.0,91.0,70.0
Florida,FL,2020-04-03,NaT,2020-05-18,2020-04-03,2020-05-18,2020-03-17,2020-04-03,2020-05-18,NaT,...,NaT,NaT,2020-03-15,45.0,45.0,45.0,45.0,63.0,80.0,
Georgia,GA,2020-04-03,NaT,2020-05-01,2020-04-03,2020-05-01,2020-03-18,2020-04-03,2020-04-27,NaT,...,NaT,NaT,2020-04-03,28.0,28.0,24.0,21.0,24.0,69.0,


In [19]:
iso_df.to_csv('../../data/data_clean/isolation_factors.csv',index=True)

### Economic Factors 
What economic policies were passed? Were there financial protections put in place for people who were quarantined/unemployed? How long did these policies last?
I hypothesize that states that provided more financial protections for people will see less drastic increases in overdose rates as its been shown that financial stressors (such as job loss or eviction) can increase drug use or high-risk drug use (i.e. reusing needles, increased dosage) 

##### Steps:
1. Subset columns
2. Reformat to datetime


Step 1: Subset columns

In [49]:
df_revised.columns

Index(['POSTCODE', 'STAYHOME', 'STAYHOMENOGP', 'END_STHM', 'CLBSNS',
       'END_BSNS', 'CLSCHOOL', 'CLREST', 'ENDREST', 'RSTOUTDR', 'ENDREST2',
       'CLRST2', 'CLRST3', 'CLGYM', 'ENDGYM', 'CLGYM2', 'CLGYM3', 'CLMOVIE',
       'END_MOV', 'CLMV2', 'CLMV3', 'CLOSEBAR', 'END_BRS', 'CLBAR2',
       'END_BRS2', 'CLBAR3', 'END_HAIR', 'CLHAIR2', 'END_RELG', 'ENDRETL',
       'CLDAYCR', 'OPNCLDCR', 'CLNURSHM', 'ALCOPEN', 'ALCREST', 'ALCDELIV',
       'QR_END', 'EVICINTN', 'EVICINRESUME', 'RNTGP', 'UTILSO', 'UTILGAS_END',
       'UTILGAS_SO2', 'UTILELEC_SO', 'UTILELEC_END', 'UTILELEC_SO2',
       'UTILWAT_SO', 'UTILWAT_END', 'UTILWAT_SO2', 'UTILTEL_SO', 'UTILTEL_END',
       'UTILTEL_SO2', 'MORGFR', 'TLHLAUD', 'TLHLMED', 'TLHlBUPR', 'EXTOPFL',
       'HMDLVOP', 'TLHLCL24', 'WVDEAREQ', 'MH19', 'MINWAGE2020'],
      dtype='object')

In [50]:
eco_cols = ['POSTCODE','EVICINTN', 'EVICINRESUME','RNTGP', 'UTILGAS_SO2','UTILGAS_END', 'UTILELEC_SO', 'UTILELEC_END', 'UTILWAT_SO', 'UTILWAT_END','UTILTEL_SO','UTILTEL_END', 'MORGFR', 'MINWAGE2020']
eco_df = df_revised[eco_cols]

In [51]:
eco_df

Unnamed: 0_level_0,POSTCODE,EVICINTN,EVICINRESUME,RNTGP,UTILGAS_SO2,UTILGAS_END,UTILELEC_SO,UTILELEC_END,UTILWAT_SO,UTILWAT_END,UTILTEL_SO,UTILTEL_END,MORGFR,MINWAGE2020
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Alabama,AL,0,6/1/20,0,0,0,0,0,0,0,0,0,0,.
Alaska,AK,3/23/20,4/24/20,0,4/9/20,0,4/9/20,0,4/9/20,0,4/9/20,0,0,$10.19
Arizona,AZ,0,0,0,0,0,0,0,0,0,0,0,0,$12.00
Arkansas,AR,0,0,0,4/10/20,0,4/10/20,0,4/10/20,0,0,0,0,$10.00
California,CA,0,5/31/20,0,3/17/20,0,3/17/20,0,3/17/20,0,3/17/20,0,0,$13.00
Colorado,CO,4/30/20,6/13/20,0,3/20/20,6/13/20,3/20/20,6/13/20,3/20/20,6/13/20,0,0,0,$12.00
Connecticut,CT,4/10/20,0,4/10/20,3/12/20,0,3/12/20,0,3/12/20,0,0,0,0,$12.00
Delaware,DE,3/17/20,7/1/20,0,3/24/20,7/1/20,3/24/20,7/1/20,3/24/20,7/1/20,3/24/20,7/1/20,0,$9.75
Florida,FL,4/2/20,0,0,0,0,0,0,0,0,0,0,0,$8.56
Georgia,GA,0,0,0,0,0,0,0,0,0,0,0,0,$5.15


Step 2: Reformat to datetime

In [52]:
eco_df = eco_df.copy()
eco_df[['EVICINTN', 'EVICINRESUME', 'UTILGAS_SO2','UTILGAS_END', 'UTILELEC_SO', 'UTILELEC_END', 'UTILWAT_SO', 'UTILWAT_END','UTILTEL_SO','UTILTEL_END', 'MORGFR']] = eco_df[['EVICINTN', 'EVICINRESUME','UTILGAS_SO2', 'UTILGAS_END', 'UTILELEC_SO', 'UTILELEC_END', 'UTILWAT_SO', 'UTILWAT_END','UTILTEL_SO','UTILTEL_END', 'MORGFR']].apply(pd.to_datetime, format = '%m/%d/%y',errors = 'coerce')
eco_df

Unnamed: 0_level_0,POSTCODE,EVICINTN,EVICINRESUME,RNTGP,UTILGAS_SO2,UTILGAS_END,UTILELEC_SO,UTILELEC_END,UTILWAT_SO,UTILWAT_END,UTILTEL_SO,UTILTEL_END,MORGFR,MINWAGE2020
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Alabama,AL,NaT,2020-06-01,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,.
Alaska,AK,2020-03-23,2020-04-24,0,2020-04-09,NaT,2020-04-09,NaT,2020-04-09,NaT,2020-04-09,NaT,NaT,$10.19
Arizona,AZ,NaT,NaT,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,$12.00
Arkansas,AR,NaT,NaT,0,2020-04-10,NaT,2020-04-10,NaT,2020-04-10,NaT,NaT,NaT,NaT,$10.00
California,CA,NaT,2020-05-31,0,2020-03-17,NaT,2020-03-17,NaT,2020-03-17,NaT,2020-03-17,NaT,NaT,$13.00
Colorado,CO,2020-04-30,2020-06-13,0,2020-03-20,2020-06-13,2020-03-20,2020-06-13,2020-03-20,2020-06-13,NaT,NaT,NaT,$12.00
Connecticut,CT,2020-04-10,NaT,4/10/20,2020-03-12,NaT,2020-03-12,NaT,2020-03-12,NaT,NaT,NaT,NaT,$12.00
Delaware,DE,2020-03-17,2020-07-01,0,2020-03-24,2020-07-01,2020-03-24,2020-07-01,2020-03-24,2020-07-01,2020-03-24,2020-07-01,NaT,$9.75
Florida,FL,2020-04-02,NaT,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,$8.56
Georgia,GA,NaT,NaT,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,$5.15


Step 3: 
    Convert dates to integers to create scores

In [53]:
evic_dur = (eco_df['EVICINRESUME'] - eco_df['EVICINTN']).dt.days
eco_df['evic_dur'] = evic_dur
elec_dur = (eco_df['UTILGAS_END'] - eco_df['UTILGAS_SO2']).dt.days
eco_df['elec_dur'] = elec_dur
wat_dur = (eco_df['UTILWAT_END'] - eco_df['UTILWAT_SO']).dt.days
eco_df['wat_dur'] = wat_dur
tel_dur = (eco_df['UTILTEL_END'] - eco_df['UTILTEL_SO']).dt.days
eco_df['tel_dur'] = tel_dur
eco_df

Unnamed: 0_level_0,POSTCODE,EVICINTN,EVICINRESUME,RNTGP,UTILGAS_SO2,UTILGAS_END,UTILELEC_SO,UTILELEC_END,UTILWAT_SO,UTILWAT_END,UTILTEL_SO,UTILTEL_END,MORGFR,MINWAGE2020,evic_dur,elec_dur,wat_dur,tel_dur
STATE,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
Alabama,AL,NaT,2020-06-01,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,.,,,,
Alaska,AK,2020-03-23,2020-04-24,0,2020-04-09,NaT,2020-04-09,NaT,2020-04-09,NaT,2020-04-09,NaT,NaT,$10.19,32.0,,,
Arizona,AZ,NaT,NaT,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,$12.00,,,,
Arkansas,AR,NaT,NaT,0,2020-04-10,NaT,2020-04-10,NaT,2020-04-10,NaT,NaT,NaT,NaT,$10.00,,,,
California,CA,NaT,2020-05-31,0,2020-03-17,NaT,2020-03-17,NaT,2020-03-17,NaT,2020-03-17,NaT,NaT,$13.00,,,,
Colorado,CO,2020-04-30,2020-06-13,0,2020-03-20,2020-06-13,2020-03-20,2020-06-13,2020-03-20,2020-06-13,NaT,NaT,NaT,$12.00,44.0,85.0,85.0,
Connecticut,CT,2020-04-10,NaT,4/10/20,2020-03-12,NaT,2020-03-12,NaT,2020-03-12,NaT,NaT,NaT,NaT,$12.00,,,,
Delaware,DE,2020-03-17,2020-07-01,0,2020-03-24,2020-07-01,2020-03-24,2020-07-01,2020-03-24,2020-07-01,2020-03-24,2020-07-01,NaT,$9.75,106.0,99.0,99.0,99.0
Florida,FL,2020-04-02,NaT,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,$8.56,,,,
Georgia,GA,NaT,NaT,0,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,NaT,$5.15,,,,


In [58]:
eco_df['RNTGP'] = np.where(eco_df['RNTGP']!='0','1','0') 
eco_df['MORGFR'] = np.where(eco_df['MORGFR']!='0','1','0')

In [59]:
eco_df.replace({pd.NaT:'0'},inplace=True)
eco_df.to_csv('../../data/data_clean/economic_factors.csv')

### Healthcare factors
What policies were put in place to keep healthcare accessible to people experiencing addiction? Was telehealth made available? 
I hypothesize that states that put in more policies to provide telehealth counseling or home-delivery of medication assisted therapy (MAT) drugs will have lower overdose rates than those that did not. For this dataframe, I really just want a yes/no for policies 

##### Steps:
1. Subset columns 
2. Reformat dates to 0/1 no/yes format

In [None]:
df_revised.columns

In [None]:
hlcr_cols = ['POSTCODE','TLHLAUD','TLHLMED','TLHlBUPR','EXTOPFL','HMDLVOP','TLHLCL24','MH19']
hlcr_df = df_revised[hlcr_cols]

In [None]:
hlcr_df

In [None]:
hlcr_df['TLHLAUD'] = np.where(hlcr_df['TLHLAUD']!='0','1','0') 
hlcr_df['TLHLMED'] = np.where(hlcr_df['TLHLMED']!='0','1','0')
hlcr_df['TLHlBUPR'] = np.where(hlcr_df['TLHlBUPR']!='0','1','0')
hlcr_df['EXTOPFL'] = np.where(hlcr_df['EXTOPFL']!='0','1','0')
hlcr_df['EXTOPFL'] = np.where(hlcr_df['EXTOPFL']!='0','1','0')
hlcr_df['HMDLVOP'] = np.where(hlcr_df['HMDLVOP']!='0','1','0')
hlcr_df['TLHLCL24'] = np.where(hlcr_df['TLHLCL24']!='0','1','0')
hlcr_df

In [None]:
hlcr_df.to_csv('../data/data_clean/healthcare_factors.csv')

#### Dictionary for Column Headings 
I'm going to create a small df defining the column headings so that it can easily be uploaded into the analysis notebooks:

In [None]:
saho_df

In [None]:
cov_pol_dict = saho_df[0:1]
cov_pol_dict


In [None]:
dict_cols = ['STATE','POSTCODE','STAYHOME','STAYHOMENOGP','END_STHM','CLBSNS','END_BSNS','CLSCHOOL','CLREST','ENDREST','RSTOUTDR','ENDREST2','CLRST2','CLRST3','CLGYM','ENDGYM','CLGYM2','CLGYM3','CLMOVIE','END_MOV','CLMV2', 'CLMV3','CLOSEBAR','END_BRS','CLBAR2','END_BRS2','CLBAR3','END_HAIR','CLHAIR2','END_RELG','ENDRETL','CLDAYCR','OPNCLDCR','CLNURSHM','ALCOPEN','ALCREST','ALCDELIV','QR_END','EVICINTN','EVICINRESUME','RNTGP','UTILSO','UTILGAS_END','UTILGAS_SO2','UTILELEC_SO','UTILELEC_END','UTILELEC_SO2','UTILWAT_SO','UTILWAT_END','UTILWAT_SO2','UTILTEL_SO','UTILTEL_END','UTILTEL_SO2','MORGFR','TLHLAUD','TLHLMED','TLHlBUPR','EXTOPFL','HMDLVOP','TLHLCL24','WVDEAREQ','MH19','MINWAGE2020']
cov_pol_dict = cov_pol_dict[dict_cols]
cov_pol_dict

In [None]:
cov_pol_dict.transpose()

In [56]:
cov_pol_dict.to_csv('../../data/data_clean/covid_policy_dictionary.csv')

NameError: name 'cov_pol_dict' is not defined