### Naloxone Distribution 2019 
This notebook cleans and organizes Medicaid State Drug Utilization Data to estimate naloxone distribution in 2019. I will then compare it with 2020 to see whether there has been any significant change. Note: because this dataset is so large, each year gets its own notebook.

Naloxone (sold under the brand name Narcan) is an opioid antagonist. If administered during an overdose, naloxone blocks opioid receptors and, in most cases, prevents a fatal overdose. Many harm reduction centers (e.g. Prevention Point Philadelphia) distribute naloxone to community members, many states have laws that allow anyone to get naloxone from a pharmacy without a prescription, and many prescribers include a naloxone prescription as part of medication-assisted treatment (MAT). Research suggests that greater naloxone availability is associated with lower overdose fatality rates.

## Update 
This data seemed like a promising lead, but after going through the dataset I can see that it is not consistent enough to be useful for my final project: it does not provide nationally representative data.

In [19]:
import pandas as pd
import numpy as np

In [20]:
nal_2019 = pd.read_csv('../data/data_raw/State_Drug_Utilization_Data_2019.csv')
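Because the raw file has millions of rows, one option is to filter while reading instead of loading everything first. The sketch below assumes the raw CSV uses the same column names shown in the sample output; `read_product_rows` is a hypothetical helper, and the demo rows are made up for illustration only.

```python
import io

import pandas as pd


def read_product_rows(path_or_buf, pattern, chunksize=500_000):
    """Read an SDUD-style CSV in chunks, keeping rows whose Product Name matches `pattern`.

    `pattern` is a case-insensitive regular expression, e.g. "naloxone|narcan".
    Reading in chunks keeps peak memory bounded for very large files.
    """
    kept = []
    # dtype=str keeps Product Name an object column even if a chunk is all NaN,
    # so the .str accessor is always available
    for chunk in pd.read_csv(path_or_buf, chunksize=chunksize, dtype={"Product Name": str}):
        mask = chunk["Product Name"].str.contains(pattern, case=False, na=False)
        kept.append(chunk[mask])
    return pd.concat(kept, ignore_index=True)


# Tiny in-memory demo with made-up rows (illustration only)
demo = io.StringIO(
    "State,Product Name,Number of Prescriptions\n"
    "CA,NALOXONE H,10\nNY,NARCAN,5\nTX,NAPROXEN,7\nWA,,3\n"
)
demo_rows = read_product_rows(demo, "naloxone|narcan", chunksize=2)
```

On the real file this would be called as `read_product_rows('../data/data_raw/State_Drug_Utilization_Data_2019.csv', 'naloxone|narcan')`; adding `usecols=` to the `read_csv` call would cut memory further.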

### Initial data exploration
Steps
1. Load in dataframe
2. Define column labels 
3. Subset to find just Narcan/Naloxone
    * This can be done with either product name, NDC, or Labeler Code 
    * Be careful with spelling, spacing, dosage strings, etc., which can make one drug look like several different drugs or produce an incomplete list
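One way to handle the spelling and spacing variation in step 3 is to normalize product names before matching. This is a minimal sketch: `normalize_name` is a hypothetical helper, and the example names are taken from the product list printed further down (e.g. `NARCAN  NA` and `NARCAN (NA`).

```python
import re

import pandas as pd


def normalize_name(name):
    """Lower-case a product name, replace punctuation such as '(' with spaces, collapse whitespace."""
    if pd.isna(name):
        return ""
    name = re.sub(r"[^a-z0-9%.\s]", " ", str(name).lower())
    return re.sub(r"\s+", " ", name).strip()


names = pd.Series(["NARCAN", "NARCAN  NA", "NARCAN (NA", "NALOXONE H", "NALTREXONE", None])
normalized = names.map(normalize_name)
# Whole-word match on either the generic or the brand name
is_naloxone = normalized.str.contains(r"\b(?:naloxone|narcan)\b")
```

After normalization, `NARCAN  NA` and `NARCAN (NA` collapse to the same key (`narcan na`), while `NALTREXONE` is correctly left out.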

In [21]:
nal_2019.sample(10)

Unnamed: 0,Utilization Type,State,Labeler Code,Product Code,Package Size,Year,Quarter,Product Name,Suppression Used,Units Reimbursed,Number of Prescriptions,Total Amount Reimbursed,Medicaid Amount Reimbursed,Non Medicaid Amount Reimbursed,Quarter Begin,Quarter Begin Date,Latitude,Longitude,Location,NDC
3052430,FFSU,AL,10370,829,11,2019,4,Diltiazem,True,,,,,,10/1,10/01/2019,32.799,-86.8073,POINT (-86.8073 32.799),10370082911
4268843,FFSU,NY,65162,783,9,2019,4,MEMANTINE,True,,,,,,10/1,10/01/2019,34.8375,-106.2371,POINT (-106.2371 34.8375),65162078309
1245049,MCOU,IN,60432,528,4,2019,2,SELENIUM S,True,,,,,,4/1,04/01/2019,40.3363,-89.0022,POINT (-89.0022 40.3363),60432052804
2097332,FFSU,IN,66993,936,58,2019,3,Metronidaz,True,,,,,,7/1,07/01/2019,40.3363,-89.0022,POINT (-89.0022 40.3363),66993093658
1646980,MCOU,WV,409,1160,1,2019,3,BUPIVACAIN,True,,,,,,7/1,07/01/2019,18.0001,-64.8199,POINT (-64.8199 18.0001),409116001
3054600,FFSU,XX,93,7314,98,2019,2,CLOPIDOGRE,False,2581.0,84.0,985.75,985.69,0.06,4/1,04/01/2019,,,,93731498
2894762,MCOU,XX,70000,108,2,2019,2,LDR ENEMA,False,8911.0,17.0,63.56,63.56,0.0,4/1,04/01/2019,,,,70000010802
1412449,MCOU,MD,24090,495,84,2019,2,TIROSINT 1,True,,,,,,4/1,04/01/2019,42.2373,-71.5314,POINT (-71.5314 42.2373),24090049584
4855665,FFSU,GA,43547,353,10,2019,2,LISINOPRIL,False,1470.0,47.0,504.05,504.05,0.0,4/1,04/01/2019,27.8333,-81.717,POINT (-81.717 27.8333),43547035310
4101431,MCOU,VA,70677,3,1,2019,3,ALLERGY RE,True,,,,,,7/1,07/01/2019,35.7449,-86.7489,POINT (-86.7489 35.7449),70677000301


In [22]:
nal_2019['State'].nunique()

52

In [23]:
nal_2019['State'].unique()

array(['AZ', 'XX', 'CA', 'FL', 'OH', 'ME', 'LA', 'MO', 'GA', 'CO', 'NJ',
       'TN', 'IA', 'IL', 'NM', 'VA', 'OR', 'NV', 'MT', 'MN', 'MA', 'MD',
       'MS', 'SC', 'UT', 'ND', 'PA', 'NY', 'WA', 'DE', 'MI', 'VT', 'AR',
       'RI', 'WY', 'KS', 'KY', 'AK', 'WV', 'ID', 'AL', 'DC', 'IN', 'HI',
       'CT', 'TX', 'WI', 'NE', 'SD', 'NC', 'NH', 'OK'], dtype=object)
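The 52 codes above should correspond to the 50 states, DC, and the aggregate code `XX`. A set comparison makes this explicit; `state_coverage` is a hypothetical helper, sketched here.

```python
# 50 states plus DC; "XX" marks aggregated (multi-state) rows in this dataset
STATES_AND_DC = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL", "GA", "HI", "ID",
    "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO",
    "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA",
    "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
}


def state_coverage(codes):
    """Return (missing, extra): jurisdictions absent from `codes`, and codes that are not a state or DC."""
    present = set(codes)
    return sorted(STATES_AND_DC - present), sorted(present - STATES_AND_DC)


# The array printed above
printed = [
    "AZ", "XX", "CA", "FL", "OH", "ME", "LA", "MO", "GA", "CO", "NJ", "TN", "IA",
    "IL", "NM", "VA", "OR", "NV", "MT", "MN", "MA", "MD", "MS", "SC", "UT", "ND",
    "PA", "NY", "WA", "DE", "MI", "VT", "AR", "RI", "WY", "KS", "KY", "AK", "WV",
    "ID", "AL", "DC", "IN", "HI", "CT", "TX", "WI", "NE", "SD", "NC", "NH", "OK",
]
missing, extra = state_coverage(printed)
```

For the full 2019 file this gives no missing jurisdictions and `['XX']` as the only extra code. The same call on a filtered frame (e.g. `state_coverage(nal_st_dist['State'])`) lists exactly which states drop out.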

- Column Name : Column Description, as listed in the [Medicaid metadata](https://www.medicaid.gov/medicaid/prescription-drugs/state-drug-utilization-data/state-drug-utilization-data-faq/index.html?search_api_fulltext=91956)

- Utilization Type : Constant “FFSU” or “MCOU”. The FFSU Record ID indicates that the information for this National Drug Code (NDC) represents a Fee-For-Service (FFS) Utilization record. The MCOU Record ID indicates that the information for this NDC represents a Managed Care Organization (MCO) Utilization record. Valid Values: 4Q2009 and earlier = constant record ID of FFSU; 1Q2010 and beyond = FFSU & MCOU. NOTE: Per the Affordable Care Act, MCO utilization data cannot be reported for periods prior to 1Q2010.

- State	: Two-character post office abbreviation for State. Note: For any data where NDCs are aggregated (e.g. National Totals) the state code is “XX” to represent multiple states.

- Labeler Code : First segment of NDC that identifies the manufacturer, labeler, relabeler, packager, repackager or distributor of the drug.
- Product Code	: Second segment of NDC.
- Package Size Code	: Third segment of NDC.
- Year : year
- Quarter : Valid values are:
    1. January 1 – March 31
    2. April 1 – June 30
    3. July 1 – September 30
    4. October 1 – December 31
- Product Name :	First 10 characters of product name as approved by the Food and Drug Administration (FDA). (formerly “Product FDA List Name”)
- Suppression Used: The state drug utilization data includes state, drug name, NDC, number of prescriptions, and dollars reimbursed. As CMS is obligated by the Federal Privacy Act, 5 U.S.C. Section 552a and the HIPAA Privacy Rule, 45 C.F.R. Parts 160 and 164, to protect the privacy of individual beneficiaries and other persons, all direct identifiers have been removed and data that are less than eleven (11) counts are suppressed. A checkmark in the "Suppression Used" column (a value of `True` in this CSV) notes suppressed data. CMS applies counter or secondary suppression in cases where only one prescription is suppressed for primary reasons (e.g., one prescription in a state). Also, if one sub-group (e.g., number of prescriptions) is suppressed, then the other sub-groups are suppressed.
- Units Reimbursed :	FFS Units - The number of units (based on Unit Type) of the drug 11-digit NDC reimbursed by the state during the quarter/year covered. MCO Units - The number of units (based on Unit Type) of the 11-digit NDC dispensed during the quarter/year covered.
- Number of Prescriptions :	The number of prescriptions should include any prescription for which Medicaid paid a portion of the claim, as well as those prescriptions for which Medicaid paid the claim in full. FFS - The number of prescriptions reimbursed by the state Medicaid agency as outpatient drug claims during the quarter/year covered. MCO - The number of prescriptions dispensed as outpatient drug claims during the quarter/year covered.
- Total Amount Reimbursed: The FFS or MCO total amount reimbursed by both Medicaid and non-Medicaid entities to pharmacies or other providers for the 11-digit NDC drug in the period covered (two previous fields added together). This total is not reduced or affected by Medicaid rebates paid to the state. This amount represents both federal and state reimbursement and is inclusive of dispensing fees. Note: As capitated payment arrangements are sometimes utilized by states and MCOs, a zero value in this field could be appropriate for MCO data; however, FFS utilization records will reject if this field is reported with a value of zero.
- Medicaid Amount Reimbursed : The amount reimbursed by the Medicaid Program ONLY to pharmacies or other providers for the 11-digit NDC FFS or MCO drug in the quarter/year covered. This total is not reduced or affected by Medicaid rebates paid to the state. This amount represents both federal and state reimbursement and is inclusive of dispensing fees. Note: As capitated payment arrangements are sometimes utilized by states and MCOs, a zero value in this field could be appropriate for MCO data; however, FFS utilization records will reject if this field is reported with a value of zero.
- Non-Medicaid Amount Reimbursed : The amount reimbursed by non-Medicaid entities to pharmacies or other providers for the 11-digit NDC FFS or MCO drug in the quarter/year covered. The Non-Medicaid Amount Reimbursed includes any drug reimbursement amount for which the state is not eligible for federal matching funds.
- Quarter Begin :	Beginning date for quarter. Derived field provides ability to create comparisons over time. Can be used as a label for timelines.
- Quarter Begin Date : Beginning date for quarter. Derived field provides ability to create comparisons over time. Also can be used to create timeline visualizations.
- Latitude: Location within state. Derived from state code provides ability to create maps and geographic comparisons.
- Longitude:	Location within state. Derived from state code provides ability to create maps and geographic comparisons.
- Location:	Location within state. Derived from state code provides ability to create maps and geographic comparisons.
- NDC : The National Drug Code (NDC) is a numerical code maintained by the FDA that includes the labeler code, product code, and package code. The NDC is an 11-digit code.
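In the sample output the `NDC` column holds values such as `409116001`, shorter than 11 digits, because the leading zeros were dropped when pandas read the column as a number. The 11-digit NDC is laid out as labeler (5) - product (4) - package (2), which the sample rows confirm (Labeler Code 409, Product Code 1160, Package Size 1 gives `00409-1160-01`). A minimal sketch to restore it, assuming the column has no missing values; `pad_ndc` is a hypothetical helper.

```python
import pandas as pd


def pad_ndc(ndc: pd.Series) -> pd.DataFrame:
    """Zero-pad integer NDCs back to 11 digits and split them into 5-4-2 segments.

    Assumes `ndc` has no missing values (astype('int64') would fail on NaN).
    """
    s = ndc.astype("int64").astype(str).str.zfill(11)
    return pd.DataFrame({
        "ndc11": s,
        "labeler": s.str[:5],
        "product": s.str[5:9],
        "package": s.str[9:],
    })


# NDC values copied from the sample rows above
parts = pad_ndc(pd.Series([409116001, 93731498, 70000010802]))
```

Converting `labeler`, `product`, and `package` back to integers should reproduce the `Labeler Code`, `Product Code`, and `Package Size` columns, which is a quick consistency check.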


* Because the National Totals combine all states, rows from the "National Totals" and "Annual State Detail" options show the state abbreviation "XX".

1. List the product names that start with "N" to see how the drug is recorded: it appears under both NARCAN and NALOXONE.

In [24]:
# Unique product names starting with "N"; dropna() keeps NaN from being stringified to 'nan'
fun_list = sorted(str(s) for s in nal_2019['Product Name'].dropna().unique() if str(s)[:1].lower() == 'n')
fun_list

['NABI-HB',
 'NABI-HB VI',
 'NABUMETONE',
 'NACL 0.9%',
 'NADOLOL',
 'NADOLOL  2',
 'NADOLOL  4',
 'NADOLOL  8',
 'NADOLOL 20',
 'NADOLOL 40',
 'NADOLOL 80',
 'NADOLOL TA',
 'NADOLOL/BE',
 'NAFCILLIN',
 'NAFTIFINE',
 'NAFTIN',
 'NAFTIN 1%',
 'NAFTIN 2%',
 'NAFTIN GEL',
 'NAGLAZYME',
 'NALBUPHINE',
 'NALBUPHN 1',
 'NALFON',
 'NALFON 400',
 'NALOCET',
 'NALOXONE .',
 'NALOXONE 0',
 'NALOXONE 2',
 'NALOXONE 4',
 'NALOXONE H',
 'NALOXONE P',
 'NALOXONE S',
 'NALTREXONE',
 'NAMENDA',
 'NAMENDA 10',
 'NAMENDA 5',
 'NAMENDA 5-',
 'NAMENDA 5M',
 'NAMENDA TI',
 'NAMENDA XR',
 'NAMZARIC',
 'NAMZARIC 1',
 'NAMZARIC 2',
 'NAMZARIC 7',
 'NAMZARIC T',
 'NAPHCON A',
 'NAPHCON-A',
 'NAPRELAN',
 'NAPRELAN C',
 'NAPRELAN T',
 'NAPROSYN',
 'NAPROXEN',
 'NAPROXEN 1',
 'NAPROXEN 2',
 'NAPROXEN 3',
 'NAPROXEN 5',
 'NAPROXEN D',
 'NAPROXEN O',
 'NAPROXEN S',
 'NAPROXEN T',
 'NARATRIPTA',
 'NARCAN',
 'NARCAN  NA',
 'NARCAN (NA',
 'NARCAN 4 M',
 'NARDIL',
 'NARDIL  PH',
 'NARDIL (PH',
 'NAROPIN',
 'NAROPIN  R'

2. Naloxone filter: keep only rows whose product name contains "naloxone" (this excludes the NARCAN rows listed above).

In [25]:
nal_2019['Product Name'] = nal_2019['Product Name'].str.lower()
naloxone_filter = nal_2019['Product Name'].str.contains("naloxone", na=False)
nal_dist_2019 = nal_2019[naloxone_filter]
nal_dist_2019.sample(10)



Unnamed: 0,Utilization Type,State,Labeler Code,Product Code,Package Size,Year,Quarter,Product Name,Suppression Used,Units Reimbursed,Number of Prescriptions,Total Amount Reimbursed,Medicaid Amount Reimbursed,Non Medicaid Amount Reimbursed,Quarter Begin,Quarter Begin Date,Latitude,Longitude,Location,NDC
2012423,MCOU,LA,641,6132,25,2019,3,naloxone h,True,,,,,,7/1,07/01/2019,37.669,-84.6514,POINT (-84.6514 37.669),641613225
1900462,FFSU,NC,70069,71,10,2019,3,naloxone h,True,,,,,,7/1,07/01/2019,32.7673,-89.6812,POINT (-89.6812 32.7673),70069007110
2408509,FFSU,MS,409,1782,69,2019,3,naloxone 0,True,,,,,,7/1,07/01/2019,38.4623,-92.302,POINT (-92.302 38.4623),409178269
3181901,FFSU,GA,76329,3369,1,2019,3,naloxone h,False,260.0,67.0,4943.14,4943.14,0.0,7/1,07/01/2019,27.8333,-81.717,POINT (-81.717 27.8333),76329336901
1645431,MCOU,RI,409,1215,1,2019,1,naloxone .,True,,,,,,1/1,01/01/2019,44.5672,-122.1269,POINT (-122.1269 44.5672),409121501
832497,MCOU,KS,70069,71,10,2019,2,naloxone h,True,,,,,,4/1,04/01/2019,39.8647,-86.2604,POINT (-86.2604 39.8647),70069007110
3725492,FFSU,MO,67457,292,0,2019,4,naloxone h,True,,,,,,10/1,10/01/2019,45.7326,-93.9196,POINT (-93.9196 45.7326),67457029200
2140197,FFSU,AR,409,1215,25,2019,2,naloxone h,True,,,,,,4/1,04/01/2019,34.9513,-92.3809,POINT (-92.3809 34.9513),409121525
2207494,MCOU,OH,17478,41,1,2019,1,naloxone h,True,,,,,,1/1,01/01/2019,38.4199,-117.1219,POINT (-117.1219 38.4199),17478004101
355535,MCOU,DE,409,1215,25,2019,3,naloxone h,False,82.5,24.0,0.0,0.0,0.0,7/1,07/01/2019,38.8964,-77.0262,POINT (-77.0262 38.8964),409121525

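The filter above matches only "naloxone", so rows recorded under the brand name (`NARCAN`, `NARCAN 4 M`, etc.) are left out. A minimal sketch of a filter that catches both names; `naloxone_rows` is a hypothetical helper and the demo rows are made up.

```python
import pandas as pd


def naloxone_rows(df):
    """Rows whose Product Name mentions either the generic (naloxone) or the brand (Narcan) name."""
    names = df["Product Name"].str.lower()
    return df[names.str.contains(r"naloxone|narcan", na=False)].copy()


demo = pd.DataFrame({
    "State": ["CA", "NY", "TX", "WA"],
    "Product Name": ["naloxone h", "NARCAN 4 M", "naltrexone", None],
})
demo_nal = naloxone_rows(demo)
```

Applied to the full frame (`naloxone_rows(nal_2019)`), this would show how many additional rows, if any, the brand name contributes.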

In [26]:
nal_dist_2019['State'].nunique(), nal_dist_2019['State'].unique()

(52, array(['XX', 'UT', 'MD', 'CT', 'NJ', 'LA', 'NY', 'IN', 'VA', 'KY', 'TN',
        'MO', 'MA', 'CA', 'PA', 'KS', 'AR', 'MN', 'WV', 'WA', 'GA', 'NC',
        'TX', 'CO', 'NE', 'DE', 'NM', 'IA', 'MI', 'OH', 'NV', 'FL', 'OR',
        'OK', 'SC', 'ME', 'IL', 'AZ', 'SD', 'DC', 'ND', 'MS', 'MT', 'VT',
        'WY', 'NH', 'ID', 'AL', 'RI', 'WI', 'AK', 'HI'], dtype=object))

All 52 codes (50 states, DC, and the aggregate code XX) appear in the naloxone subset. Rows recorded only under the NARCAN name, however, are not captured by this filter.
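Suppressed rows have no counts, so a state whose naloxone rows are all suppressed would vanish once missing values are dropped. A sketch for checking this before dropping anything, assuming `Suppression Used` was parsed as a boolean column (as the `True`/`False` values in the sample suggest); `suppression_by_state` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd


def suppression_by_state(df):
    """Per state: number of rows, number of suppressed rows, and the suppressed share (CMS suppresses counts < 11)."""
    out = df.groupby("State")["Suppression Used"].agg(rows="size", suppressed="sum")
    out["share_suppressed"] = out["suppressed"] / out["rows"]
    return out.sort_values("share_suppressed", ascending=False)


demo = pd.DataFrame({
    "State": ["CA", "CA", "WY", "WY", "XX"],
    "Suppression Used": [False, True, True, True, False],
})
demo_supp = suppression_by_state(demo)
```

On the real data, `suppression_by_state(nal_2019[naloxone_filter])` would list states with `share_suppressed == 1.0`, which are exactly the ones that cannot survive a `dropna()`.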

In [27]:
nal_dist_2019.shape

(2690, 20)

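Start narrowing down the DataFrame: subset the columns, then drop rows with missing values (suppressed rows have no counts, so this removes them):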

In [28]:
nal_cols = ['State','Quarter Begin Date','Units Reimbursed','Number of Prescriptions','Total Amount Reimbursed','Product Name']
# dropna() removes every suppressed row; .copy() avoids SettingWithCopyWarning on later assignments
nal_dist_2019 = nal_dist_2019[nal_cols].dropna().copy()

In [29]:
nal_dist_2019['Quarter Begin Date'] = pd.to_datetime(nal_dist_2019['Quarter Begin Date'], format='%m/%d/%Y')

In [30]:
# Drop the aggregated "XX" rows; use ~ (not -) to negate a boolean mask
state_filter = nal_dist_2019['State'] == 'XX'
nal_st_dist = nal_dist_2019[~state_filter]
nal_st_dist

Unnamed: 0,State,Quarter Begin Date,Units Reimbursed,Number of Prescriptions,Total Amount Reimbursed,Product Name
14567,NJ,2019-04-01,68.4,31.0,484.49,naloxone h
15993,LA,2019-07-01,117.5,34.0,559.73,naloxone h
16226,NY,2019-04-01,205.0,58.0,12493.29,naloxone h
18743,VA,2019-04-01,40.0,12.0,19801.81,naloxone h
21387,IN,2019-01-01,163.0,59.0,2795.09,naloxone h
...,...,...,...,...,...,...
4835403,NY,2019-01-01,122.5,35.0,6723.26,naloxone h
4854614,PA,2019-01-01,120.0,12.0,1362.12,naloxone h
4866156,GA,2019-01-01,72.5,23.0,775.68,naloxone h
4873545,DE,2019-04-01,40.0,15.0,1917.60,naloxone h


In [31]:
nal_st_dist["State"].value_counts()

CA    31
NY    30
OH    23
VA    22
IN    21
NJ    19
PA    18
MD    16
MN    12
WV    12
MA    11
OR    11
GA    11
TN    10
LA    10
DE    10
NC     9
FL     9
NV     8
MO     8
TX     8
WA     8
KY     8
NM     8
CO     7
RI     5
AZ     4
IA     4
OK     4
CT     4
AL     4
MI     4
ME     3
ID     3
KS     3
MT     2
AK     1
NH     1
UT     1
Name: State, dtype: int64

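After dropping suppressed rows, only 39 states remain (AR, DC, HI, IL, MS, NE, ND, SC, SD, VT, WI, and WY drop out), and the number of rows per state varies widely, from 31 in CA to 1 in AK, NH, and UT. Each row is most likely one NDC for one utilization type (FFS or MCO) in one quarter, so states that reimbursed more distinct naloxone products, or that report both FFS and MCO utilization, will have more rows. Differences in product names, or states aggregating data before submitting it, could also contribute.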
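Because row counts reflect how many products a state reports rather than how much naloxone it distributes, one way to compare states is to collapse rows to one per state and quarter. This is a sketch; `state_quarter_totals` is a hypothetical helper and the demo values are made up.

```python
import pandas as pd

COUNT_COLS = ["Units Reimbursed", "Number of Prescriptions", "Total Amount Reimbursed"]


def state_quarter_totals(df):
    """Sum NDC-level rows into one row per state and quarter."""
    return df.groupby(["State", "Quarter Begin Date"], as_index=False)[COUNT_COLS].sum()


demo = pd.DataFrame({
    "State": ["CA", "CA", "CA", "NY"],
    "Quarter Begin Date": pd.to_datetime(["2019-10-01", "2019-10-01", "2019-07-01", "2019-10-01"]),
    "Units Reimbursed": [88.0, 52.5, 176.0, 10.0],
    "Number of Prescriptions": [40.0, 15.0, 64.0, 2.0],
    "Total Amount Reimbursed": [100.0, 50.0, 25.0, 5.0],
})
totals = state_quarter_totals(demo)
```

Note that these totals only cover unsuppressed rows, so they are lower bounds rather than true state totals.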


In [32]:
CA_filter = nal_st_dist["State"] == 'CA'
nal_CA = nal_st_dist[CA_filter]
nal_CA  

Unnamed: 0,State,Quarter Begin Date,Units Reimbursed,Number of Prescriptions,Total Amount Reimbursed,Product Name
81897,CA,2019-10-01,88.0,40.0,1694.17,naloxone h
143370,CA,2019-07-01,176.0,64.0,2026.96,naloxone h
163942,CA,2019-10-01,52.5,15.0,151.98,naloxone s
241082,CA,2019-07-01,165.0,82.0,1919.91,naloxone h
283432,CA,2019-01-01,242.5,72.0,1668.24,naloxone h
288136,CA,2019-04-01,62.5,20.0,384.06,naloxone h
305354,CA,2019-10-01,159.0,42.0,1786.29,naloxone h
342796,CA,2019-01-01,35.0,14.0,425.19,naloxone h
343145,CA,2019-10-01,47.0,19.0,472.42,naloxone h
352252,CA,2019-10-01,221.0,64.0,9206.33,naloxone h
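Beyond row counts, gaps in time coverage also limit a 2019 vs. 2020 comparison. A sketch that counts, per state, how many of the four quarters have any unsuppressed naloxone rows; `quarters_per_state` is a hypothetical helper and the demo frame is made up.

```python
import pandas as pd


def quarters_per_state(df):
    """Number of distinct quarters (out of 4) in which each state has unsuppressed rows."""
    return df.groupby("State")["Quarter Begin Date"].nunique().sort_values()


demo = pd.DataFrame({
    "State": ["CA"] * 4 + ["AK"],
    "Quarter Begin Date": pd.to_datetime(
        ["2019-01-01", "2019-04-01", "2019-07-01", "2019-10-01", "2019-01-01"]
    ),
})
coverage = quarters_per_state(demo)
```

On the real data, `(quarters_per_state(nal_st_dist) < 4).sum()` counts states with at least one missing quarter; AK, NH, and UT, with a single row each, can cover at most one quarter.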
