## VA and Centers for Medicare & Medicaid Services (CMS) Use Case
## Proposed Walkthrough/"Solution"
#### Written: 9/12/2020
#### Updated: 9/21/2020

### Introduction
The document below includes code and descriptive text evaluating the use case assessment *title here*. Full materials for this use case can be found on the [Python OER GitHub Repository](https://github.com/domdisanto/Python_OER/tree/master/Use%20Cases/VA%20Dual%20Enrollment%20Case). This walkthrough is tentatively titled a "solution" because, while it offers one specific way of cleaning and analyzing the available data, it certainly does not offer a unique (or even a uniquely best) solution to the given assessment. 

The writing in this document will do its best to outline specific parameters that should be met to satisfactorily complete the assessment. These parameters, tasks, outputs, etc. exist solely to assess your developing skills as an analyst and Python programmer. That being said, if you take different steps or follow a different analytic method to reach the same results, that is perfectly acceptable! 

### Data and Module Imports

In [1]:
import pandas as pd 
import numpy as np

In [2]:
va_data = pd.read_csv('VA_data.csv')
cms_data = pd.read_csv('CMS_data.csv')

In [3]:
va_data.sample(5)

Unnamed: 0,Patient ID,Visit Date,Age,Height,Weight,Medication,Medication Dose,Medication Dose Unit,Medication Duration Value,Medication Duration Unit
402,793635600,2019-07-01,67.0,175.49,84.62,ibuprofen,200.0,mg,42,Day
429,224350888,2018-08-12,67.0,173.19,91.69,ibuprofen,400000.0,mcg,4,Week
755,320153536,2018-08-14,45.0,179.12,97.04,ibuprofen,100.0,mg,43,Day
165,816915598,2019-10-27,71.0,181.86,99.91,ibuprofen,400.0,mg,4,Week
270,789760172,2019-11-11,58.0,189.58,106.66,ibuprofen,400.0,mg,4,Week


In [4]:
cms_data.sample(5)

Unnamed: 0,Patient ID,Medication,Medication Dose Unit,Medication Dose,Medication Duration,Duration Unit,Visit Date
87,193-49-0420,BUPRENORPHINE,mg,,34,Day,2018-10-15
218,343-37-5645,BUPRENORPHINE,mg,,30,Day,2019-01-01
33,129-00-5605,TRAMADOL,mcg,400000.0,52,Day,2019-12-17
139,906-51-2385,IBUPROFEN,mcg,100000.0,41,Day,2018-05-16
128,550-92-8654,IBUPROFEN,mg,200.0,44,Day,2018-11-13
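
A side note on the `Unnamed: 0` column in both previews: a column with that name usually appears when a CSV was written together with its row index and then read back without declaring it. The sketch below uses a tiny in-memory CSV (made up for illustration, not the real files) to show the effect of `index_col=0`; whether the use case files were produced this way is an assumption.

```python
import io

import pandas as pd

# Hypothetical two-row CSV whose first column has an empty header,
# mimicking a file saved with DataFrame.to_csv() and its default index
csv_text = ",Patient ID,Medication\n0,793635600,ibuprofen\n1,224350888,ibuprofen\n"

# Default read: the unlabeled first column becomes 'Unnamed: 0'
default_read = pd.read_csv(io.StringIO(csv_text))

# Declaring the first column as the index avoids the extra column
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```

Either form works for the rest of this walkthrough; the extra column is harmless but adds clutter.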


### Overview
We know that we must merge our data frames. However, I'd like to first clean both data frames, reducing them to only the information I want in a cleaned format, prior to merging. This will make the merge itself a relatively simpler, cleaner step.  
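
One thing worth checking before any merge is the key itself. In the previews above, the VA `Patient ID` is read as an integer (e.g. `793635600`) while the CMS `Patient ID` is a hyphenated string (e.g. `193-49-0420`). The sketch below shows one possible way to bring both to a common nine-character string; it assumes the two systems share the same nine digits, which the source data does not confirm, and the example IDs are invented for illustration.

```python
import pandas as pd

# Hypothetical IDs in the two formats seen in the previews
va_ids = pd.Series([793635600, 12345678])               # integers; leading zeros are lost
cms_ids = pd.Series(['793-63-5600', '012-34-5678'])     # hyphenated strings

# VA: cast to string and restore any leading zeros dropped by integer parsing
va_key = va_ids.astype(str).str.zfill(9)

# CMS: strip the hyphens (literal replacement, not a regular expression)
cms_key = cms_ids.str.replace('-', '', regex=False)

print(va_key.tolist())
print(cms_key.tolist())
```

With both keys in the same format, a later `merge` on the key column can match patients across the two sources.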

Let's outline and handle some of the necessary data cleaning in both data frames, prior to merging:  
> 1. Medication doses should be standardized to the same unit     
> 2. We can check medication names to ensure differently named medications are truly distinct meds, that is, that there are not multiple entries for essentially the same medication  
> 3. Medications can then be standardized for their mg-morphine equivalent (MME)  
> 4. Prescription duration should be standardized and calculated for each patient  
>     a. We can then calculate a "from" and "to" date for each medication prescription course
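
As a preview of item 4, the following is a minimal sketch of how durations recorded in mixed units could be converted to days and turned into "from" and "to" dates. It uses the VA column names shown above on a small made-up frame. The day counts per unit (in particular 30 days per month), the use of the visit date as the start of the prescription, and the new column names are all assumptions for illustration.

```python
import pandas as pd

# Assumed day-equivalents for each duration unit (Month = 30 is a simplification)
DAYS_PER_UNIT = {'Day': 1, 'Week': 7, 'Month': 30}

# Hypothetical rows using the VA column names
toy = pd.DataFrame({
    'Visit Date': ['2019-07-01', '2018-08-12', '2018-09-23'],
    'Medication Duration Value': [42, 4, 1],
    'Medication Duration Unit': ['Day', 'Week', 'Month'],
})

# Duration in days = value * days per unit (unknown units would become NaN)
toy['Duration Days'] = (
    toy['Medication Duration Value'] * toy['Medication Duration Unit'].map(DAYS_PER_UNIT)
)

# Assume the prescription starts on the visit date
toy['From Date'] = pd.to_datetime(toy['Visit Date'])
toy['To Date'] = toy['From Date'] + pd.to_timedelta(toy['Duration Days'], unit='D')

print(toy[['From Date', 'To Date', 'Duration Days']])
```

The same mapping would apply to the CMS frame after substituting its column names (`Medication Duration` and `Duration Unit`).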

### 1.) Standardizing Medication Doses
Upon reviewing the data, I notice columns containing string values for the units of doses. These columns appear to contain at least two units. Let's check what units are contained in our data frames:

In [5]:
print('The CMS data set contains units of: ', set(cms_data['Medication Dose Unit']), "\n"
      'The VA data set contains units of: ', set(va_data['Medication Dose Unit']))

The CMS data set contains units of:  {'mcg', 'mg'} 
The VA data set contains units of:  {'mcg', 'mg'}


Fortunately, our data frames collectively contain only two units: `mg` for milligrams and `mcg` for micrograms. For ease of comparison, we will create a recalculated dose variable in mg for all of our patient data. We have to do this for both the CMS and VA data frames, so I will define a function below and loop through the observations of both data frames:

In [6]:
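def med_dose(data, dose_var, unit_var, new_dose_var, n_obs):
    # Create the output column (all missing) the first time the function is called
    if new_dose_var not in data.columns:
        data[new_dose_var] = np.nan 
    # Doses already recorded in mg are copied over unchanged
    if data[unit_var].loc[n_obs] == 'mg':
        data[new_dose_var].loc[n_obs] = data[dose_var].loc[n_obs]
    # Doses recorded in mcg are converted to mg (1 mg = 1000 mcg)
    if data[unit_var].loc[n_obs] == 'mcg':
        data[new_dose_var].loc[n_obs] = (data[dose_var].loc[n_obs]) / 1000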

I will now loop through all of the observations in our two data frames to convert doses to mg where necessary (and to otherwise retain the original value when already recorded in mg):

In [7]:
# Loop up to the length of the longer data frame, skipping row labels
# that do not exist in the shorter one
for i in range(0,max(cms_data.shape[0], va_data.shape[0])):
    if i<cms_data.shape[0]:
        med_dose(data=cms_data, dose_var='Medication Dose', unit_var='Medication Dose Unit',
                 new_dose_var='Dose Mg Recalc', n_obs=i)
    if i<va_data.shape[0]:
        med_dose(data=va_data, dose_var='Medication Dose', unit_var='Medication Dose Unit',
                 new_dose_var='Dose Mg Recalc', n_obs=i)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)
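
The message above is pandas' `SettingWithCopyWarning`. It is triggered by the chained indexing inside `med_dose` (`data[new_dose_var].loc[n_obs] = ...`), which first selects a column and then assigns into the result. The assignment worked here, as the output below confirms, but pandas does not guarantee that it updates the original frame. One alternative, sketched below on a small made-up frame with the same column names, is to skip the row-by-row loop and compute the whole column at once. This is not the approach used in this walkthrough, and the `MG_DIVISOR` name is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using the same column names as the VA/CMS data
toy = pd.DataFrame({
    'Medication Dose': [200.0, 400000.0, np.nan],
    'Medication Dose Unit': ['mg', 'mcg', 'mg'],
})

# Divisor that converts each unit to mg; units not listed here map to NaN
MG_DIVISOR = {'mg': 1, 'mcg': 1000}

# One vectorized assignment to a whole column: no chained indexing, no warning
toy['Dose Mg Recalc'] = toy['Medication Dose'] / toy['Medication Dose Unit'].map(MG_DIVISOR)

print(toy)
```

Missing doses stay missing, and any unit outside the mapping yields `NaN` rather than a silently wrong value, which makes unexpected units easy to spot.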


In [8]:
display(va_data.head())
display(va_data.tail())

Unnamed: 0,Patient ID,Visit Date,Age,Height,Weight,Medication,Medication Dose,Medication Dose Unit,Medication Duration Value,Medication Duration Unit,Dose Mg Recalc
0,609548877,2019-05-29,76.0,177.61,98.53,tramadol HCL,200000.0,mcg,58,Day,200.0
1,224736371,2019-04-02,76.0,170.71,87.6,ibuprofen,200.0,mg,29,Day,200.0
2,601083249,2018-06-02,76.0,188.42,107.11,gabapentin,600000.0,mcg,42,Day,600.0
3,233862042,2019-12-26,80.0,168.5,90.58,butorphanol,2500.0,mcg,57,Day,2.5
4,485487112,2018-10-09,57.0,175.8,93.96,gabapentin,300.0,mg,30,Day,300.0


Unnamed: 0,Patient ID,Visit Date,Age,Height,Weight,Medication,Medication Dose,Medication Dose Unit,Medication Duration Value,Medication Duration Unit,Dose Mg Recalc
803,975180104,2019-07-23,35.0,169.84,84.09,acetaminophen,400.0,mg,34,Day,400.0
804,454394346,2019-09-14,42.0,181.29,106.32,ibuprofen,100000.0,mcg,31,Day,100.0
805,233499695,2018-09-09,40.0,169.51,73.74,ibuprofen,100.0,mg,38,Day,100.0
806,799529712,2018-07-07,35.0,164.54,71.06,dihydrocodeine,30.0,mg,46,Day,30.0
807,399606384,2018-09-23,45.0,170.99,72.2,ibuprofen,100.0,mg,1,Month,100.0


In [9]:
display(cms_data.head())
display(cms_data.tail())

Unnamed: 0,Patient ID,Medication,Medication Dose Unit,Medication Dose,Medication Duration,Duration Unit,Visit Date,Dose Mg Recalc
0,646-97-9801,DIHYDROCODEINE,mg,16.0,57,Day,2018-06-21,16.0
1,553-27-6047,TRAMADOL,mcg,400000.0,5,Week,2020-01-01,400.0
2,334-30-3080,TRAMADOL,mg,400.0,58,Day,2019-05-31,400.0
3,949-44-5667,GABAPENTIN,mg,300.0,43,Day,2019-10-29,300.0
4,995-42-5426,GABAPENTIN,mg,300.0,3,Week,2019-08-07,300.0


Unnamed: 0,Patient ID,Medication,Medication Dose Unit,Medication Dose,Medication Duration,Duration Unit,Visit Date,Dose Mg Recalc
379,309-23-6478,TRAMADOL,mg,200.0,31,Day,2018-01-16,200.0
380,543-29-3026,IBUPROFEN,mg,200.0,5,Week,2019-08-18,200.0
381,868-81-8652,BUTORPHANOL,mg,4.0,36,Day,2018-11-22,4.0
382,450-33-6659,IBUPROFEN,mg,400.0,28,Day,2019-11-01,400.0
383,263-06-4463,BUPERNORPHINE,mg,0.15,29,Day,2018-12-09,0.15


For simplicity, having reviewed the results of our function, I will drop the now-obsolete original variables for medication dose and unit. 

In [10]:
cms_data.drop(['Medication Dose Unit', 'Medication Dose'], axis=1, inplace=True)
va_data.drop(['Medication Dose Unit', 'Medication Dose'], axis=1, inplace=True)

### 2.) Reviewing and Cleaning Medication Names
Similar to our check of the dose unit variable above, we will check the unique values of our data's medication variable:

In [12]:
print('The CMS data set contains the following medications: \n', set(cms_data['Medication']), "\n \n"
      'The VA data set contains the following medications: \n', set(va_data['Medication']))

The CMS data set contains the following medications: 
 {'DIHYDROCODEINE', 'BUTORPHANOL', 'BUPRENORPHINE', 'TRAMADOL', 'GABAPENTIN', 'IBUPROFEN', 'BUPRENORPHINE TABLET', 'BUPERNORPHINE'} 
 
The VA data set contains the following medications: 
 {'tramadol HCL', 'dihydrocodeine-acetaminophin-caff', 'gabapentin', 'acetaminophen', 'butorphanol', 'buprenorphine', 'tramadol', 'dihydrocodeine', 'ibuprofen'}


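We notice that the lists of medications in the two data sets are thankfully fairly concordant, but with important differences in syntax, case, and spelling (for example, `BUPERNORPHINE` vs. `BUPRENORPHINE` in the CMS data, and `tramadol HCL` vs. `tramadol` in the VA data). 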
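
One possible way to reconcile these names is sketched below: lowercase and trim every name, then map the known variants to a single spelling. The `NAME_FIXES` mapping and the `clean_med_names` helper are hypothetical, and deciding which variants count as "the same medication" is an assumption made here for illustration. For instance, the combination product `dihydrocodeine-acetaminophin-caff` is deliberately left untouched.

```python
import pandas as pd

# Variant spellings seen in the printed sets, mapped to one lowercase name.
# Which variants should be merged is an assumption, not a clinical judgment.
NAME_FIXES = {
    'bupernorphine': 'buprenorphine',
    'buprenorphine tablet': 'buprenorphine',
    'tramadol hcl': 'tramadol',
}

def clean_med_names(names: pd.Series) -> pd.Series:
    """Lowercase and trim names, then replace known variants (exact matches only)."""
    lowered = names.str.strip().str.lower()
    return lowered.replace(NAME_FIXES)

# A few of the raw values from both data sets
raw = pd.Series(['BUPERNORPHINE', 'BUPRENORPHINE TABLET', 'tramadol HCL', 'IBUPROFEN'])
cleaned = clean_med_names(raw)

print(cleaned.tolist())
```

Applied to both frames, e.g. `cms_data['Medication'] = clean_med_names(cms_data['Medication'])`, this would leave the two sources with directly comparable medication names.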

### To Date
As of 9/21/20, this walkthrough notebook is incomplete but has been uploaded for reference. Future steps will include completing the data cleaning, merging our (simulated) "VA" and "CMS" data sources, and a visualization to be included in the student assessment prompt. 