## Disparities in overdose rates during COVID 

* How has COVID affected fatal and nonfatal drug overdoses?
* Are there geographic disparities in changing overdose rates?
* Do Good Samaritan laws have an effect on overdose fatalities? Is there any correlation with states' COVID rates?

In [1]:
import pandas as pd 

In [2]:
covid_df = pd.read_csv('../data/us_covid19_data.csv', skiprows = 3)

In [3]:
overdose_df = pd.read_csv('../data/us_overdose_data.csv')

In [4]:
gs_laws_df = pd.read_csv('../data/good_sam_data.csv')

## Exploring COVID Data

In [5]:
covid_df.shape
covid_nrows = covid_df.shape[0]
covid_ncols = covid_df.shape[1]
print(f'There are {covid_nrows} rows and {covid_ncols} columns in the COVID data frame')

There are 61 rows and 13 columns in the COVID data frame


In [6]:
covid_df.columns

Index(['State/Territory', 'Total Cases', 'Confirmed Cases', 'Probable Cases',
       'Cases in Last 7 Days', 'Case Rate per 100000', 'Total Deaths',
       'Confirmed Deaths', 'Probable Deaths', 'Deaths in Last 7 Days',
       'Death Rate per 100000', 'Case Rate per 100000 in Last 7 Days',
       'Death Rate per 100K in Last 7 Days'],
      dtype='object')

In [7]:
covid_df['State/Territory'].unique()

array(['Alaska', 'Alabama', 'Arkansas', 'American Samoa', 'Arizona',
       'California', 'Colorado', 'Connecticut', 'District of Columbia',
       'Delaware', 'Florida', 'Federated States of Micronesia', 'Georgia',
       'Guam', 'Hawaii', 'Iowa', 'Idaho', 'Illinois', 'Indiana', 'Kansas',
       'Kentucky', 'Louisiana', 'Massachusetts', 'Maryland', 'Maine',
       'Michigan', 'Minnesota', 'Missouri', 'Northern Mariana Islands',
       'Mississippi', 'Montana', 'North Carolina', 'North Dakota',
       'Nebraska', 'New Hampshire', 'New Jersey', 'New Mexico', 'Nevada',
       'New York', 'New York City', 'Ohio', 'Oklahoma', 'Oregon',
       'Pennsylvania', 'Puerto Rico', 'Palau', 'Rhode Island',
       'Republic of Marshall Islands', 'South Carolina', 'South Dakota',
       'Tennessee', 'Texas', 'Utah', 'Virginia', 'Virgin Islands',
       'Vermont', 'Washington', 'Wisconsin', 'West Virginia', 'Wyoming',
       'United States of America'], dtype=object)

##### To match the other data, remove the territories and keep only the states.
##### For the data subset, add a dictionary with state abbreviations and regions. Regional data will be easier to compare at a higher level.
 Check the range of states within each region: are the rates of the states in each region similar?
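A minimal sketch of the state-only subset and region lookup described above. The `STATE_INFO` dictionary is hypothetical and lists only four states for illustration (region labels follow the U.S. Census Bureau regions); a real version would need all 50 states. The toy frame stands in for `covid_df` and reuses death rates printed elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical lookup: state name -> (abbreviation, Census region).
# Only a few entries are shown; a real version needs all 50 states.
STATE_INFO = {
    'Alabama': ('AL', 'South'),
    'California': ('CA', 'West'),
    'New York': ('NY', 'Northeast'),
    'Wisconsin': ('WI', 'Midwest'),
}

# Toy stand-in for covid_df.
toy_covid = pd.DataFrame({
    'State/Territory': ['Alabama', 'American Samoa', 'California',
                        'New York', 'Wisconsin', 'United States of America'],
    'Death Rate per 100000': [56, 0, 42, 81, 26, 66],
})

# Keep only rows whose name is in the lookup (drops territories and the US total).
states_only = toy_covid[toy_covid['State/Territory'].isin(list(STATE_INFO))].copy()
states_only['Abbrev'] = states_only['State/Territory'].map(lambda s: STATE_INFO[s][0])
states_only['Region'] = states_only['State/Territory'].map(lambda s: STATE_INFO[s][1])

# Spread of death rates within each region: are states in a region similar?
region_range = (states_only
                .groupby('Region')['Death Rate per 100000']
                .agg(['min', 'max', 'mean']))
print(region_range)
```

With the full dictionary, a wide gap between `min` and `max` inside a region would suggest that regional averages hide a lot of state-level variation.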

In [8]:
covid_df.max()

State/Territory                            Wyoming
Total Cases                                7958254
Confirmed Cases                        3.22464e+06
Probable Cases                              209037
Cases in Last 7 Days                        375244
Case Rate per 100000                          4015
Total Deaths                                216917
Confirmed Deaths                            115988
Probable Deaths                              11140
Deaths in Last 7 Days                         4836
Death Rate per 100000                          284
Case Rate per 100000 in Last 7 Days           84.2
Death Rate per 100K in Last 7 Days             1.3
dtype: object

In [9]:
covid_df.min()

State/Territory                        Alabama
Total Cases                                  0
Confirmed Cases                              0
Probable Cases                               0
Cases in Last 7 Days                         0
Case Rate per 100000                         0
Total Deaths                                 0
Confirmed Deaths                             2
Probable Deaths                              0
Deaths in Last 7 Days                        0
Death Rate per 100000                        0
Case Rate per 100000 in Last 7 Days          0
Death Rate per 100K in Last 7 Days           0
dtype: object

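#### `min()` works column by column, so 'Alabama' is only the alphabetically first name, not the row holding these zeros. Check which rows actually have zeros - missing data? Not enough data?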
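Because `DataFrame.min()` is computed separately for each column, the values above need not come from a single row. A small sketch of how the rows that actually contain zeros could be listed; the toy frame is a stand-in for `covid_df`, using values shown in this notebook's output.

```python
import pandas as pd

# Toy stand-in for covid_df.
toy_covid = pd.DataFrame({
    'State/Territory': ['Alaska', 'Alabama', 'American Samoa'],
    'Total Cases': [10323, 169162, 0],
    'Death Rate per 100000': [8, 56, 0],
})

# Rows where total cases are zero.
zero_rows = toy_covid[toy_covid['Total Cases'] == 0]
print(zero_rows['State/Territory'].tolist())

# Row label of the smallest value in one column, then the matching name.
idx = toy_covid['Total Cases'].idxmin()
print(toy_covid.loc[idx, 'State/Territory'])
```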

In [10]:
covid_df.sample(10)

Unnamed: 0,State/Territory,Total Cases,Confirmed Cases,Probable Cases,Cases in Last 7 Days,Case Rate per 100000,Total Deaths,Confirmed Deaths,Probable Deaths,Deaths in Last 7 Days,Death Rate per 100000,Case Rate per 100000 in Last 7 Days,Death Rate per 100K in Last 7 Days
32,North Dakota,30517,,,4477,4015,388,,,67,51,84.2,1.3
5,California,858401,,,23601,2170,16757,,,396,42,8.5,0.1
57,Wisconsin,171122,162325.0,8797.0,21655,2944,1565,1553.0,12.0,130,26,53.2,0.3
53,Virginia,164124,154126.0,9998.0,7475,1927,3408,3161.0,247.0,64,40,12.5,0.1
4,Arizona,228748,223692.0,5056.0,5347,3190,5789,5502.0,287.0,46,80,10.7,0.1
16,Idaho,50610,45223.0,5387.0,4184,2885,517,476.0,41.0,14,29,34.1,0.1
49,South Dakota,31012,,,4571,3515,304,299.0,5.0,30,34,74.0,0.5
38,New York,227126,,,5718,2038,9127,,,35,81,7.3,0.0
2,Arkansas,96524,,,6379,3203,1645,,,138,54,30.2,0.7
50,Tennessee,222827,212116.0,10711.0,13380,3291,2864,2731.0,133.0,159,42,28.2,0.3


### Not all of these columns are necessary; I'm really only interested in the total cases and the death rate per 100,000

In [18]:
cols_to_use = ['State/Territory','Total Cases','Death Rate per 100000']
covid_df2 = covid_df[cols_to_use]
print(covid_df2)

             State/Territory  Total Cases  Death Rate per 100000
0                     Alaska        10323                      8
1                    Alabama       169162                     56
2                   Arkansas        96524                     54
3             American Samoa            0                      0
4                    Arizona       228748                     80
..                       ...          ...                    ...
56                Washington        96185                     29
57                 Wisconsin       171122                     26
58             West Virginia        19082                     21
59                   Wyoming         8375                      9
60  United States of America      7958254                     66

[61 rows x 3 columns]


In [21]:
mean = covid_df2.groupby('State/Territory')['Death Rate per 100000'].mean()
covid_mean = mean.sort_values(ascending = False)
print(covid_mean)

State/Territory
New York City                     284
New Jersey                        181
Massachusetts                     139
Connecticut                       127
Louisiana                         122
                                 ... 
Northern Mariana Islands            3
Republic of Marshall Islands        0
Palau                               0
Federated States of Micronesia      0
American Samoa                      0
Name: Death Rate per 100000, Length: 61, dtype: int64


#### For cleaning data/finding average, filter out states with rates of 0 
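One possible way to drop the zero rates before averaging, sketched on a toy Series that stands in for `covid_mean`. Treating a zero as "no usable data" is an assumption; a zero could also be a genuine value, so it is worth checking the source before filtering.

```python
import pandas as pd

# Toy stand-in for covid_mean, using rates printed in this notebook.
rates = pd.Series(
    [284, 181, 8, 0, 0],
    index=['New York City', 'New Jersey', 'Alaska', 'Palau', 'American Samoa'],
    name='Death Rate per 100000',
)

# Keep only entries with a rate above zero, then average what remains.
nonzero = rates[rates > 0]
print(nonzero.mean())
```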

In [23]:
top_10 = covid_mean.head(10)
bottom_10 = covid_mean.tail(10)
print(top_10, bottom_10)

State/Territory
New York City           284
New Jersey              181
Massachusetts           139
Connecticut             127
Louisiana               122
Rhode Island            108
Mississippi             105
District of Columbia     90
New York                 81
Arizona                  80
Name: Death Rate per 100000, dtype: int64 State/Territory
Hawaii                            12
Maine                             10
Vermont                            9
Wyoming                            9
Alaska                             8
Northern Mariana Islands           3
Republic of Marshall Islands       0
Palau                              0
Federated States of Micronesia     0
American Samoa                     0
Name: Death Rate per 100000, dtype: int64


### Maybe try to find monthly data for 2020? 

## Exploring Overdose Data

In [11]:
overdose_df.shape
overdose_nrows = overdose_df.shape[0]
overdose_ncols = overdose_df.shape[1]
print(f'There are {overdose_nrows} rows and {overdose_ncols} columns in the Overdose data frame')

There are 34398 rows and 12 columns in the Overdose data frame


In [12]:
overdose_df.columns

Index(['State', 'Year', 'Month', 'Period', 'Indicator', 'Data Value',
       'Percent Complete', 'Percent Pending Investigation', 'State Name',
       'Footnote', 'Footnote Symbol', 'Predicted Value'],
      dtype='object')

In [13]:
overdose_df.sample(10)

Unnamed: 0,State,Year,Month,Period,Indicator,Data Value,Percent Complete,Percent Pending Investigation,State Name,Footnote,Footnote Symbol,Predicted Value
27185,TX,2017,January,12 month-ending,Methadone (T40.3),,100,0.167569,Texas,Numbers may differ from published reports usin...,**,
14964,MO,2016,June,12 month-ending,"Synthetic opioids, excl. methadone (T40.4)",,100,0.024536,Missouri,Numbers may differ from published reports usin...,**,
27129,TX,2017,April,12 month-ending,Heroin (T40.1),,100,0.148397,Texas,Numbers may differ from published reports usin...,**,
28911,UT,2018,October,12 month-ending,"Natural & semi-synthetic opioids, incl. methad...",327.0,100,0.14206,Utah,Numbers may differ from published reports usin...,**,332.0
25975,SD,2019,June,12 month-ending,"Natural & semi-synthetic opioids, incl. methad...",18.0,100,0.0,South Dakota,Underreported due to incomplete data.,*,19.0
32585,WV,2018,August,12 month-ending,Methadone (T40.3),31.0,100,0.412273,West Virginia,Numbers may differ from published reports usin...,**,33.0
12060,LA,2019,February,12 month-ending,Percent with drugs specified,52.4221453287197,100,0.0022,Louisiana,Underreported due to incomplete data.,*,
22336,OK,2016,June,12 month-ending,"Opioids (T40.0-T40.4,T40.6)",401.0,100,0.021142,Oklahoma,Numbers may differ from published reports usin...,**,401.0
28403,UT,2015,January,12 month-ending,Methadone (T40.3),48.0,100,0.340981,Utah,Numbers may differ from published reports usin...,**,50.0
2921,CO,2016,October,12 month-ending,Cocaine (T40.5),,100,0.026283,Colorado,Numbers may differ from published reports usin...,**,


#### Rows appear to be 12 month-ending overdose death counts (or percentages) by state, month, and indicator (drug type)? Data covers 2015 - 2020, so a pre-COVID baseline and COVID-period values can both be drawn from it


In [None]:
# dtype of the column's values; type() would only report that this is a pandas Series
overdose_df['Year'].dtype

In [14]:
overdose_df['Indicator'].unique()

array(['Percent with drugs specified',
       'Natural, semi-synthetic, & synthetic opioids, incl. methadone (T40.2-T40.4)',
       'Natural & semi-synthetic opioids (T40.2)',
       'Psychostimulants with abuse potential (T43.6)',
       'Number of Drug Overdose Deaths', 'Heroin (T40.1)',
       'Number of Deaths', 'Opioids (T40.0-T40.4,T40.6)',
       'Methadone (T40.3)', 'Synthetic opioids, excl. methadone (T40.4)',
       'Cocaine (T40.5)',
       'Natural & semi-synthetic opioids, incl. methadone (T40.2, T40.3)'],
      dtype=object)

In [28]:
fatal_filter = overdose_df['Indicator'] == "Number of Drug Overdose Deaths"
overdose_df2 = overdose_df[fatal_filter]
print(overdose_df2)

      State  Year      Month           Period                       Indicator  \
4        AK  2015      April  12 month-ending  Number of Drug Overdose Deaths   
19       AK  2015     August  12 month-ending  Number of Drug Overdose Deaths   
28       AK  2015   December  12 month-ending  Number of Drug Overdose Deaths   
38       AK  2015   February  12 month-ending  Number of Drug Overdose Deaths   
56       AK  2015    January  12 month-ending  Number of Drug Overdose Deaths   
...     ...   ...        ...              ...                             ...   
34345    YC  2019    October  12 month-ending  Number of Drug Overdose Deaths   
34352    YC  2019  September  12 month-ending  Number of Drug Overdose Deaths   
34364    YC  2020   February  12 month-ending  Number of Drug Overdose Deaths   
34375    YC  2020    January  12 month-ending  Number of Drug Overdose Deaths   
34388    YC  2020      March  12 month-ending  Number of Drug Overdose Deaths   

      Data Value Percent Co

#### Too much info with all the different drug info, pick out 2 or 3 drug types + total deaths to analyze 

#### Need to subset data by year, filter out cities and territories
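A sketch of how both notes above might be handled with `isin`. The three indicators kept here are an illustrative choice, not a recommendation. `'YC'` appears in the output above alongside the state codes; `'US'` is an assumed code for a national total and may not exist in the real file. The toy frame stands in for `overdose_df`.

```python
import pandas as pd

# Toy stand-in for overdose_df.
toy_od = pd.DataFrame({
    'State': ['AK', 'AK', 'YC', 'TX', 'TX'],
    'Year': [2019, 2019, 2019, 2020, 2020],
    'Indicator': ['Number of Drug Overdose Deaths', 'Heroin (T40.1)',
                  'Number of Drug Overdose Deaths', 'Cocaine (T40.5)',
                  'Methadone (T40.3)'],
})

# Illustrative selection: total deaths plus two drug types.
keep_indicators = ['Number of Drug Overdose Deaths', 'Heroin (T40.1)', 'Cocaine (T40.5)']

# Codes that are not states; 'US' is an assumption.
non_state_codes = ['YC', 'US']

# Combine both conditions: wanted indicator AND not a city/territory/total.
mask = toy_od['Indicator'].isin(keep_indicators) & ~toy_od['State'].isin(non_state_codes)
subset = toy_od[mask]
print(subset)
```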

In [29]:
filter_2019 = overdose_df['Year'] == '2019'
2019_overdose_df = overdose_df[filter_2019]
filter_2020 = overdose_df['Year'] == '2020'
2020_overdose_df = overdose_df[filter_2020]

SyntaxError: invalid token (<ipython-input-29-6bab26e9348e>, line 2)

^^^ Not sure what the issue is here 

* __NOTE__ A Python variable name can't start with a digit, so try:

```
_2019_overdose_df
```
or
```
overdose_2019_df
```
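Put together, the year subsets might look like the sketch below, shown on a toy frame standing in for `overdose_df`. One further assumption: the notebook does not show the dtype of `Year`. If it was read as integers, comparing against the string `'2019'` would match no rows, so the column is coerced to numbers first.

```python
import pandas as pd

# Toy stand-in for overdose_df.
toy_od = pd.DataFrame({'State': ['AK', 'AK', 'TX'], 'Year': [2019, 2020, 2020]})

# Coerce Year to numbers so the comparison works whether it was read as int or str.
year = pd.to_numeric(toy_od['Year'], errors='coerce')

# Valid identifiers: the year goes after the leading letters.
overdose_2019_df = toy_od[year == 2019]
overdose_2020_df = toy_od[year == 2020]
print(len(overdose_2019_df), len(overdose_2020_df))
```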

## Exploring Good Samaritan Data 
##### This data will not play a large role in the analysis. I just want to use it to compare the legal differences among the states with (1) the largest change in overdose rates (positive or negative); (2) the highest number of COVID deaths; (3) the lowest number of overdose deaths; (4) the lowest number of COVID deaths

In [30]:
gs_laws_df.sample(10)

Unnamed: 0,Jurisdictions,Effective Date,Valid Through Date,goodsam-law,goodsam-cs_Arrest,goodsam-cs_Charge,goodsam-cs_Prosecution,goodsam-cs_Law provides an affirmative defense,goodsam-cs_Law provides other procedural protections,goodsam-cs_None,...,goodsam-paroleyn,goodsam-parole_Protection from arrest,goodsam-parole_Protection from charge,goodsam-parole_Protection from prosecution,goodsam-parole_Protection from revocation of probation and/or parole,goodsam-parole_General protection from sanctions for violation of probation and/or parole,goodsam-mitigation,goodsam-mit-type_Controlled substances offenses,goodsam-mit-type_Alcohol-related offenses,goodsam-mit-type_Other offenses beyond controlled substances and alcohol-related violations
35,Florida,2012-10-01,2018-07-01,1,0,1,1,0,0,0,...,1,.,.,.,.,.,1,1,1,1
66,Maryland,2016-02-20,2016-03-13,1,1,1,1,0,0,0,...,0,0,0,0,0,1,1,1,1,1
107,North Carolina,2015-12-01,2018-07-01,1,0,0,1,0,0,0,...,0,1,0,0,1,0,0,.,.,.
115,Oklahoma,2007-01-01,2018-07-01,0,.,.,.,.,.,.,...,.,.,.,.,.,.,.,.,.,.
68,Maryland,2016-07-01,2016-09-30,1,1,1,1,0,0,0,...,0,0,0,0,0,1,1,1,1,1
0,Alabama,2007-01-01,2015-06-09,0,.,.,.,.,.,.,...,.,.,.,.,.,.,.,.,.,.
130,Tennessee,2007-01-01,2015-06-30,0,.,.,.,.,.,.,...,.,.,.,.,.,.,.,.,.,.
136,Utah,2014-05-13,2015-05-11,1,0,0,0,1,0,0,...,1,.,.,.,.,.,1,1,0,0
155,West Virginia,2015-06-12,2018-05-08,1,0,0,1,0,0,0,...,0,0,0,0,0,1,1,1,1,1
47,Illinois,2016-07-28,2016-08-21,1,0,1,1,0,0,0,...,1,.,.,.,.,.,1,1,0,0


In [32]:
gs_laws_df.columns

Index(['Jurisdictions', 'Effective Date', 'Valid Through Date', 'goodsam-law',
       'goodsam-cs_Arrest', 'goodsam-cs_Charge', 'goodsam-cs_Prosecution',
       'goodsam-cs_Law provides an affirmative defense',
       'goodsam-cs_Law provides other procedural protections',
       'goodsam-cs_None', 'goodsam-paraphernalia_Arrest',
       'goodsam-paraphernalia_Charge', 'goodsam-paraphernalia_Prosecution',
       'goodsam-paraphernalia_Law provides an affirmative defense',
       'goodsam-paraphernalia_Law provides other procedural protections',
       'goodsam-paraphernalia_None', 'goodsam-paroleyn',
       'goodsam-parole_Protection from arrest',
       'goodsam-parole_Protection from charge',
       'goodsam-parole_Protection from prosecution',
       'goodsam-parole_Protection from revocation of probation and/or parole',
       'goodsam-parole_General protection from sanctions for violation of probation and/or parole',
       'goodsam-mitigation', 'goodsam-mit-type_Controlled substan

### Need to rename the column headings to clearer names
### Find a way of numerically rating/comparing states with a Good Samaritan "score"
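A rough sketch of both ideas on a toy frame standing in for `gs_laws_df`. The new column names and the scoring rule (count the protections present, with `.` treated as missing and scored as 0) are assumptions for illustration, not an established scoring method. The three rows reuse values from the sample above.

```python
import pandas as pd

# Toy stand-in for gs_laws_df; '.' marks a value that does not apply.
toy_gs = pd.DataFrame({
    'Jurisdictions': ['Florida', 'Maryland', 'Oklahoma'],
    'goodsam-law': ['1', '1', '0'],
    'goodsam-cs_Arrest': ['0', '1', '.'],
    'goodsam-cs_Charge': ['1', '1', '.'],
    'goodsam-cs_Prosecution': ['1', '1', '.'],
})

# Hypothetical clearer names.
toy_gs = toy_gs.rename(columns={
    'Jurisdictions': 'state',
    'goodsam-law': 'has_law',
    'goodsam-cs_Arrest': 'cs_arrest',
    'goodsam-cs_Charge': 'cs_charge',
    'goodsam-cs_Prosecution': 'cs_prosecution',
})

score_cols = ['cs_arrest', 'cs_charge', 'cs_prosecution']

# Convert text flags to numbers; '.' becomes NaN, which is then scored as 0.
flags = toy_gs[score_cols].apply(pd.to_numeric, errors='coerce')
toy_gs['gs_score'] = flags.fillna(0).sum(axis=1).astype(int)
print(toy_gs[['state', 'gs_score']])
```

An unweighted count treats every protection as equally important; weighting some columns more heavily would be a separate design decision.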

# FB

* The Good Samaritan data looks like categorical data, i.e. whether there is a law or not, plus some more specific features of that law. So it would be worth thinking about which features are most relevant and have enough coverage to include as categorical variables in your main datasets.



* The COVID data seems to be fixed to the date you downloaded it (e.g. the "last 7 days" columns), while your OD data has month and year. So I think you'll want a COVID dataset with at least state-level monthly data, or you could aggregate one of the daily datasets up to the month level.
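If a daily dataset is used, the roll-up to months could look roughly like this. The column names `date`, `state`, and `new_cases` are hypothetical, and the sketch assumes the file reports new cases per day; cumulative counts would need to be differenced first. `Year` and `Month` are built to mirror the overdose data's columns, with months as names.

```python
import pandas as pd

# Hypothetical daily COVID data; column names are assumptions.
toy_daily = pd.DataFrame({
    'date': ['2020-03-30', '2020-03-31', '2020-04-01', '2020-04-02'],
    'state': ['TN', 'TN', 'TN', 'TN'],
    'new_cases': [10, 20, 30, 40],
})

# Parse dates, then derive Year and Month columns like those in overdose_df.
toy_daily['date'] = pd.to_datetime(toy_daily['date'])
toy_daily['Year'] = toy_daily['date'].dt.year
toy_daily['Month'] = toy_daily['date'].dt.month_name()

# Sum daily new cases within each state, year, and month.
monthly = (toy_daily
           .groupby(['state', 'Year', 'Month'], as_index=False)['new_cases']
           .sum())
print(monthly)
```

Note that the overdose values are 12 month-ending figures, so monthly COVID sums would not line up with them one-to-one without further thought about the comparison window.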
