# Research Question 1: What FY18 projects in FederalReporter are explicitly tagged as having to do with opioids (in the project-terms field)?

## RQ 1.1: How many opioids projects were funded in FY18?
* Answer: 1,324 out of 92,508 (or 1.43%); the other 91,184 projects carry no opioid term
* (1,249 at NIH, 48 at VA, 24 at NSF, 3 at NIDILRR)

## RQ 1.2: How many project dollars went to opioids projects in FY18?
* Answer: $580 million out of $38.4 billion (or 1.5%)
* (Note that this describes all projects receiving funding in FY18, not just *new* projects)

#### Validation
These results are close to what a query for 'opioid' on federalreporter.nih.gov returns.
* The site reports 1,412 projects (compared to our 1,324)  
* The site reports $626 million in project funding (compared to our $580 million)

https://federalreporter.nih.gov/Projects/visualize/?searchId=8dc359a1301145e8a9dead51254d8f12&searchMode=Smart&resultType=projects
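
To quantify how far apart the two sets of figures are, the gap can be worked out from the four numbers in the Validation list. This is a small arithmetic sketch; the helper `pct_gap` and the variable names are illustrative and not part of the original notebook.

```python
# Figures quoted in the Validation list above
ours_projects, site_projects = 1324, 1412
ours_dollars_m, site_dollars_m = 580, 626  # millions of dollars

def pct_gap(ours, theirs):
    """Shortfall of our figure relative to the site's figure, in percent."""
    return 100 * (theirs - ours) / theirs

project_gap = pct_gap(ours_projects, site_projects)
dollar_gap = pct_gap(ours_dollars_m, site_dollars_m)
print(f"projects: {site_projects - ours_projects} fewer ({project_gap:.1f}%)")
print(f"funding: ${site_dollars_m - ours_dollars_m}M less ({dollar_gap:.1f}%)")
```

Both gaps point the same way: the project-terms filter returns somewhat less than the site query on both measures.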

In [9]:
import pandas as pd
import numpy as np

# cd to the directory with data
# %cd '/path/to/your/data'

# download csv(s) of project data from https://federalreporter.nih.gov/FileDownload

### LOAD IN PROJECT DATA
# relative path: assumes the working directory is the data directory (see %cd above)
file = 'FedRePORTER_PRJ_C_FY2018.csv'
df = pd.read_csv(file, skipinitialspace=True, encoding='utf-8')

# Change our date variable to the correct type
# df['PROJECT_START_DATE'] = pd.to_datetime(df['PROJECT_START_DATE'])
# df['PROJECT_END_DATE'] = pd.to_datetime(df['PROJECT_END_DATE'])
# df['BUDGET_START_DATE'] = pd.to_datetime(df['BUDGET_START_DATE'])
# df['BUDGET_END_DATE'] = pd.to_datetime(df['BUDGET_END_DATE'])

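# flag rows whose PROJECT_TERMS contains "opioid" (case-insensitive substring match;
# rows with missing terms count as non-matches)
# note: np.where mixes 1 and '', so the flag is stored as the strings '1' and ''
df['opioid'] = np.where(
    df['PROJECT_TERMS'].str.contains("opioid", case=False, na=False), 1, '')

# numeric version of the flag: 1.0 for opioid rows, NaN otherwise
df['opioid_num'] = pd.to_numeric(df['opioid'])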
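
The matching step can be tried on a few invented rows to see what a substring test on the term list does and does not catch. The sketch below is not the notebook's method: it uses a boolean flag instead of the string flag above, and the toy data, the `matched` column, and the helper `matching_terms` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the PROJECT_TERMS column (values are invented for illustration)
toy = pd.DataFrame({
    "PROJECT_TERMS": [
        "Actins; Agonist; Binding",
        "Atrial Fibrillation; Opioid; Patients",
        "Analgesics; opioid receptor; Pain",
        None,  # missing term list
    ]
})

# Substring match, case-insensitive; na=False treats missing lists as non-matches
toy["opioid"] = toy["PROJECT_TERMS"].str.contains("opioid", case=False, na=False)

def matching_terms(terms):
    """Return the individual semicolon-delimited terms that contain 'opioid'."""
    if not isinstance(terms, str):
        return []
    return [t.strip() for t in terms.split(";") if "opioid" in t.lower()]

toy["matched"] = toy["PROJECT_TERMS"].apply(matching_terms)
print(toy[["opioid", "matched"]])
print("flagged:", int(toy["opioid"].sum()))
```

Because the test is a substring match, compound terms such as "opioid receptor" are flagged along with the bare term "Opioid". Listing the matched terms is one way to check whether that breadth is wanted.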

In [24]:
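# show the full, untruncated term lists for two example rows
# (max_colwidth=None replaces the deprecated -1 in pandas 1.0 and later)
with pd.option_context('display.max_colwidth', None):
    display(df.iloc[7:9][['PROJECT_TERMS', 'opioid']])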

index,PROJECT_TERMS,opioid
7,"Actins; Agonist; base; Binding; Blood Vessels; Caveolae; Cell Adhesion; Cell membrane; Cells; Cessation of life; Compartment syndromes; Coupled; Cyclic AMP; Cyclic GMP; Cytosol; Data; Development; Edema; Elements; Embryo; Endothelial Cells; Excision; experimental study; Feedback; Gene Deletion; G-Protein-Coupled Receptors; GTP-Binding Protein alpha Subunits, Gs; guanine nucleotide binding protein; improved; In Vitro; in vivo; Inflammation; Inflammatory; inhibitor/antagonist; innovation; Knowledge; Laboratories; Location; Measurement; Measures; Mediating; Methods; Microvascular Permeability; Modification; Mus; Muscle; Nitric Oxide; Nitrosation; NOS3 gene; novel therapeutics; Organ; Pathologic; Pathology; Permeability; Phosphorylation; Physicians; Platelet Activating Factor; platelet activating factor receptor; prevent; Process; Production; Property; Protein Family; Protein S; Proteins; receptor; Receptor Protein-Tyrosine Kinases; Recovery; Regulation; Research Personnel; response; restoration; Role; Signal Pathway; Signal Transduction; Small Interfering RNA; Stimulus; Surgeon; targeted treatment; Testing; Therapeutic; therapy development; Time; tissue repair; Tissues; Vascular Diseases; vascular inflammation; Vascular Permeabilities; Vasodilator Agents; vasodilator-stimulated phosphoprotein",
8,"Adhesives; African American; Age; Ancillary Study; Anticoagulants; Arrhythmia; Atherosclerosis Risk in Communities; Atrial Fibrillation; Blood Pressure; Brain; brain abnormalities; Brain natriuretic peptide; Cardiac; Cardioscopes; Cardiovascular Diseases; cardiovascular disorder prevention; cardiovascular disorder risk; Cessation of life; Clinical; clinical risk; cognitive function; cognitive testing; Cohort Studies; Contracts; Cross-Sectional Studies; Data; Detection; Devices; Electrocardiogram; Frequencies; Functional disorder; Funding; Genetic Variation; Goals; health difference; Heart Atrium; Heart failure; High Prevalence; Holter Electrocardiography; Hypertension; Hypokalemia; imaging study; Impaired cognition; implantable device; improved; Individual; Jackson Heart Study; Knowledge; Lead; longitudinal analysis; Magnetic Resonance Imaging; Measurement; Measures; Methods; modifiable risk; Monitor; monitoring device; Multi-Ethnic Study of Atherosclerosis; Myocardial Infarction; National Heart, Lung, and Blood Institute; novel; N-terminal; Opioid; Participant; Patients; Pharmaceutical Preparations; Population; Potassium; Prevalence; Prevention; Prevention strategy; pro-brain natriuretic peptide (1-76); Prospective cohort study; Psychosocial Factor; racial difference; Reporting; Research; Rest; Risk; Risk Factors; screening; Serum; sex; stroke; stroke risk; Structure; Surveillance Methods; Symptoms; Time; treatment strategy; Vision",1.0


In [9]:
# RQ 1.1
# the flag is stored as the string '1', so count matches against that string
n_opioid = (df['opioid'] == '1').sum()
n_total = len(df)
print('# of opioid projects = ' + str(n_opioid) + '\n')
# divide by all projects, not only the untagged ones
print('% opioid projects of total = ' + f'{100 * n_opioid / n_total:.2f}%')

# opioid project counts by agency
df.groupby('AGENCY')['opioid_num'].sum()

# of opioid projects = 1324

% opioid projects of total = 1.43%


AGENCY
AHRQ          0.0
ALLCDC        0.0
FDA           0.0
NIDILRR       3.0
NIH        1249.0
NSF          24.0
VA           48.0
Name: opioid_num, dtype: float64
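
The per-agency count can be extended to include dollars and within-agency shares in one table. The following is a sketch on invented rows, not output from the FY2018 file: only the column names `AGENCY` and `FY_TOTAL_COST` come from the notebook, while the boolean column `is_opioid` and the output column names are hypothetical.

```python
import pandas as pd

# Invented rows; only the AGENCY and FY_TOTAL_COST column names mirror the real file
toy = pd.DataFrame({
    "AGENCY": ["NIH", "NIH", "NIH", "NSF", "VA", "VA"],
    "FY_TOTAL_COST": [100.0, 250.0, 50.0, 80.0, 40.0, 60.0],
    "is_opioid": [True, False, True, False, True, False],  # hypothetical boolean flag
})

# Project totals and flagged-project counts per agency
by_agency = toy.groupby("AGENCY").agg(
    projects=("is_opioid", "size"),
    opioid_projects=("is_opioid", "sum"),
)

# Dollars on flagged rows only, aligned back to every agency (0 where none)
opioid_cost = toy.loc[toy["is_opioid"]].groupby("AGENCY")["FY_TOTAL_COST"].sum()
by_agency["opioid_cost"] = opioid_cost.reindex(by_agency.index, fill_value=0.0)

# Share of each agency's projects that are flagged
by_agency["opioid_share_pct"] = (
    100 * by_agency["opioid_projects"] / by_agency["projects"]
)
print(by_agency)
```

Reindexing against the full agency list keeps agencies with no flagged projects in the table with a cost of 0, which matches how the count output above lists agencies with 0.0.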

In [5]:
# RQ 1.2
# total FY18 cost of flagged projects versus all projects
opioid_cost = df.loc[df['opioid'] == '1', 'FY_TOTAL_COST'].sum()
total_cost = df['FY_TOTAL_COST'].sum()
print(f'Opioid project costs = ${opioid_cost:,.0f}')
print(f'\nTotal project costs = ${total_cost:,.0f}')
print(f'\nPct opioid costs over total = {100 * opioid_cost / total_cost:.2f}%')

Opioid project costs = $579,487,666

Total project costs = $38,443,347,898

Pct opioid costs over total = 1.51%
