# CCC Interview Presentation

**<u>Background</u>**

This data compares clients enrolled in two mental health programs: usual care and a new intervention we're piloting. The data represents a one-year lookback. All clients were enrolled for the full year.

<u>**Analysis Questions**</u>: The new intervention has been up and running for a full year, and everyone seems to think it’s very successful. However, the new intervention costs about 50% more than the “usual care.” Our leaders have asked for an analysis of the effectiveness of the new intervention. Key questions include (but are not limited to):
- Is the new intervention better than “usual care”?
- Which clients are best served by the intervention?
- Should we expand the care model from the new intervention to our “usual care” team?

																
<u>**Columns**</u>

- **PPID** - Unique client identifier
- **Program** - Program that the client was enrolled in for the full year (usual care vs. intervention)
- **Age** - Client's age at the start of the one-year lookback
- **Gender** - Categorical variable representing client's gender
- **RaceEthnicity** - Composite variable representing client's race and ethnicity
- **MHDx** - Primary mental health diagnosis; a diagnosis is required for enrollment in mental health treatment
- **SUDx** - Primary substance use diagnosis; a substance use disorder is not required for enrollment in mental health treatment
- **MedDx** - Number of chronic medical conditions; no medical diagnoses are required for enrollment in mental health treatment
- **PsychAdmit** - Number of psychiatric hospital admissions in the past year; this is a primary outcome that our funders would like to reduce
- **DLA1** - Average score on the DLA-20 assessment at the start of the one-year lookback; measures functional status; possible range of 1-7 (higher is better)
- **DLA2** - Average score on the DLA-20 assessment at the end of the one-year lookback; measures functional status; possible range of 1-7 (higher is better)

## Import and initial look at the Data

In [1]:
```python
import pandas as pd
```

In [2]:
```python
df = pd.read_csv('Test_MH_Data.csv')
df.head()
```

| | PPID | Program | Age | Gender | RaceEthnicity | MHDx | SUDx | MedDx | PsychAdmit | DLA1 | DLA2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A234282 | Intervention | 34 | F | Other | Depression | Alcohol | 2 | 1 | 3.69 | 4.13 |
| 1 | A232412 | Intervention | 26 | M | NonHispWhite | Trauma | Opioid | 0 | 0 | 4.22 | 4.68 |
| 2 | A259052 | Intervention | 62 | M | NativeAm | Depression | Opioid | 0 | 1 | 4.17 | 4.78 |
| 3 | A353421 | Intervention | 34 | F | NonHispWhite | Depression | Alcohol | 0 | 0 | 4.11 | 4.46 |
| 4 | A302351 | UsualCare | 46 | M | NonHispBlack | Trauma | Opioid | 0 | 1 | 4.19 | 4.25 |


In [3]:
```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 479 entries, 0 to 478
Data columns (total 11 columns):
PPID             479 non-null object
Program          479 non-null object
Age              479 non-null int64
Gender           479 non-null object
RaceEthnicity    479 non-null object
MHDx             479 non-null object
SUDx             479 non-null object
MedDx            479 non-null object
PsychAdmit       479 non-null int64
DLA1             479 non-null float64
DLA2             479 non-null float64
dtypes: float64(2), int64(2), object(7)
memory usage: 41.2+ KB
```
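`df.info()` lists `MedDx` as `object`, even though the column is described as a number of chronic medical conditions. That suggests at least one entry could not be parsed as a number (and `describe()` accordingly leaves the column out). A minimal sketch for finding and coercing such entries, assuming unparseable values should be treated as missing; the actual offending values in `Test_MH_Data.csv` are not shown in this notebook. The helper `coerce_count_column` and the demo frame are hypothetical.

```python
import pandas as pd


def coerce_count_column(frame: pd.DataFrame, column: str) -> tuple[pd.DataFrame, pd.Series]:
    """Return a copy of `frame` with `column` converted to numbers, plus the
    original values that could not be parsed. Hypothetical helper."""
    numeric = pd.to_numeric(frame[column], errors="coerce")
    # Entries that were present but became NaN after coercion are the culprits
    unparsed = frame.loc[numeric.isna() & frame[column].notna(), column]
    out = frame.copy()
    out[column] = numeric
    return out, unparsed


# Hypothetical demo data standing in for the real MedDx column
demo = pd.DataFrame({"MedDx": ["2", "0", "1", "unknown"]})
clean, bad = coerce_count_column(demo, "MedDx")
print(bad.tolist())          # ['unknown']
print(clean["MedDx"].dtype)  # float64
```

On the real data the call would be `df, bad_meddx = coerce_count_column(df, "MedDx")`, followed by `bad_meddx.value_counts()` to decide whether those entries are truly missing or follow some coding convention.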


In [4]:
```python
# Count null values in each column
df.isnull().sum()
```

```text
PPID             0
Program          0
Age              0
Gender           0
RaceEthnicity    0
MHDx             0
SUDx             0
MedDx            0
PsychAdmit       0
DLA1             0
DLA2             0
dtype: int64
```

In [17]:
```python
# Add a column with each client's change in DLA-20 score over the year (end minus start)
df["DLA_Diff"] = df["DLA2"] - df["DLA1"]
```

In [18]:
```python
df.head()
```

| | PPID | Program | Age | Gender | RaceEthnicity | MHDx | SUDx | MedDx | PsychAdmit | DLA1 | DLA2 | DLA_Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A234282 | Intervention | 34 | F | Other | Depression | Alcohol | 2 | 1 | 3.69 | 4.13 | 0.44 |
| 1 | A232412 | Intervention | 26 | M | NonHispWhite | Trauma | Opioid | 0 | 0 | 4.22 | 4.68 | 0.46 |
| 2 | A259052 | Intervention | 62 | M | NativeAm | Depression | Opioid | 0 | 1 | 4.17 | 4.78 | 0.61 |
| 3 | A353421 | Intervention | 34 | F | NonHispWhite | Depression | Alcohol | 0 | 0 | 4.11 | 4.46 | 0.35 |
| 4 | A302351 | UsualCare | 46 | M | NonHispBlack | Trauma | Opioid | 0 | 1 | 4.19 | 4.25 | 0.06 |


In [19]:
```python
# Summary statistics for the numeric columns
df.describe()
```

| | Age | PsychAdmit | DLA1 | DLA2 | DLA_Diff |
|---|---|---|---|---|---|
| count | 479.0 | 479.0 | 479.0 | 479.0 | 479.0 |
| mean | 45.661795 | 0.613779 | 3.878789 | 4.053236 | 0.174447 |
| std | 12.485909 | 0.807327 | 0.501458 | 0.587929 | 0.292922 |
| min | 18.0 | 0.0 | 2.27 | 2.29 | -0.12 |
| 25% | 37.0 | 0.0 | 3.51 | 3.65 | -0.02 |
| 50% | 46.0 | 0.0 | 3.88 | 4.04 | 0.02 |
| 75% | 55.0 | 1.0 | 4.23 | 4.46 | 0.44 |
| max | 80.0 | 5.0 | 5.53 | 6.02 | 0.96 |


In [20]:
```python
# Create a DataFrame of only the clients who received usual care
df_usual = df[df['Program'] == "UsualCare"]
```

In [21]:
```python
df_usual.head()
```

| | PPID | Program | Age | Gender | RaceEthnicity | MHDx | SUDx | MedDx | PsychAdmit | DLA1 | DLA2 | DLA_Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | A302351 | UsualCare | 46 | M | NonHispBlack | Trauma | Opioid | 0 | 1 | 4.19 | 4.25 | 0.06 |
| 8 | A212390 | UsualCare | 41 | F | Other | Depression | Opioid | 1 | 1 | 3.95 | 4.01 | 0.06 |
| 9 | A310084 | UsualCare | 33 | M | NativeAm | Trauma | | 2 | 0 | 4.18 | 4.16 | -0.02 |
| 11 | A280897 | UsualCare | 76 | M | NonHispWhite | Psychosis | Opioid | 0 | 1 | 3.74 | 3.7 | -0.04 |
| 14 | A350961 | UsualCare | 47 | M | NonHispWhite | Depression | Stimulant | 1 | 1 | 4.11 | 4.06 | -0.05 |
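Row 9 (client A310084) shows an empty `SUDx` value, yet `df.isnull().sum()` reported no missing values. One possible explanation, not verified here, is that the field holds a non-NA placeholder such as whitespace, which `read_csv` keeps as an ordinary string. A minimal sketch that counts empty or whitespace-only strings per column; the helper `count_blank_strings` and the demo frame are hypothetical.

```python
import pandas as pd


def count_blank_strings(frame: pd.DataFrame) -> pd.Series:
    """Count string entries that are empty or whitespace-only, per column.
    isnull() does not flag these. Hypothetical helper for illustration."""
    # True only for real strings with blank content; NaN and numbers give False
    is_blank = frame.apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "")
    )
    counts = is_blank.sum()
    # Keep only the columns that actually contain blank strings
    return counts[counts > 0]


# Hypothetical demo frame: one whitespace-only SUDx value
demo = pd.DataFrame({"SUDx": ["Opioid", " ", "Alcohol"], "Age": [41, 33, 76]})
print(demo.isnull().sum().tolist())          # [0, 0]
blank = count_blank_strings(demo)
print(blank.index.tolist(), blank.tolist())  # ['SUDx'] [1]
```

If blanks do turn up in `SUDx`, they could be mapped to an explicit category (for example, no substance use diagnosis) or to NaN, depending on what the source system means by an empty field.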


In [22]:
```python
# Create a DataFrame of only the clients who received the intervention
df_intervention = df[df['Program'] == "Intervention"]
```

In [23]:
```python
df_intervention.head()
```

| | PPID | Program | Age | Gender | RaceEthnicity | MHDx | SUDx | MedDx | PsychAdmit | DLA1 | DLA2 | DLA_Diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A234282 | Intervention | 34 | F | Other | Depression | Alcohol | 2 | 1 | 3.69 | 4.13 | 0.44 |
| 1 | A232412 | Intervention | 26 | M | NonHispWhite | Trauma | Opioid | 0 | 0 | 4.22 | 4.68 | 0.46 |
| 2 | A259052 | Intervention | 62 | M | NativeAm | Depression | Opioid | 0 | 1 | 4.17 | 4.78 | 0.61 |
| 3 | A353421 | Intervention | 34 | F | NonHispWhite | Depression | Alcohol | 0 | 0 | 4.11 | 4.46 | 0.35 |
| 5 | A315862 | Intervention | 51 | M | NonHispWhite | Anxiety | Opioid | 1 | 0 | 3.55 | 4.06 | 0.51 |


## Exploratory Data Analysis

In [5]:
```python
import plotly.express as px
```

In [24]:
```python
df_usual.describe()
```

| | Age | PsychAdmit | DLA1 | DLA2 | DLA_Diff |
|---|---|---|---|---|---|
| count | 336.0 | 336.0 | 336.0 | 336.0 | 336.0 |
| mean | 46.508929 | 0.71131 | 3.866042 | 3.863006 | -0.003036 |
| std | 11.089591 | 0.851683 | 0.510502 | 0.516134 | 0.048681 |
| min | 19.0 | 0.0 | 2.27 | 2.29 | -0.12 |
| 25% | 39.0 | 0.0 | 3.51 | 3.5 | -0.03 |
| 50% | 47.0 | 1.0 | 3.87 | 3.86 | 0.0 |
| 75% | 54.0 | 1.0 | 4.21 | 4.22 | 0.03 |
| max | 80.0 | 5.0 | 5.53 | 5.56 | 0.17 |


In [25]:
```python
df_intervention.describe()
```

| | Age | PsychAdmit | DLA1 | DLA2 | DLA_Diff |
|---|---|---|---|---|---|
| count | 143.0 | 143.0 | 143.0 | 143.0 | 143.0 |
| mean | 43.671329 | 0.384615 | 3.908741 | 4.50021 | 0.591469 |
| std | 15.131711 | 0.638253 | 0.479951 | 0.498272 | 0.18323 |
| min | 18.0 | 0.0 | 2.81 | 3.09 | 0.24 |
| 25% | 32.5 | 0.0 | 3.515 | 4.145 | 0.455 |
| 50% | 43.0 | 0.0 | 3.93 | 4.54 | 0.61 |
| 75% | 57.0 | 1.0 | 4.26 | 4.85 | 0.71 |
| max | 76.0 | 3.0 | 5.16 | 6.02 | 0.96 |
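The two filtered frames and separate `describe()` calls above can also be produced in one step with `groupby`, which keeps both programs side by side and avoids maintaining subsets by hand. A minimal sketch on a few rows copied from the `head()` outputs above; on the full data the equivalent call would be `df.groupby("Program")["DLA_Diff"].describe()`.

```python
import pandas as pd

# Four rows copied from the head() outputs above (illustration only)
demo = pd.DataFrame({
    "Program": ["UsualCare", "UsualCare", "Intervention", "Intervention"],
    "DLA1": [4.19, 3.95, 3.69, 4.22],
    "DLA2": [4.25, 4.01, 4.13, 4.68],
})
demo["DLA_Diff"] = demo["DLA2"] - demo["DLA1"]

# One row of summary statistics per program (groups are sorted by name)
summary = demo.groupby("Program")["DLA_Diff"].describe()
print(summary[["count", "mean", "std"]].round(3))
```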


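At first look, the intervention program appears to have better outcomes. The mean change in DLA-20 score (`DLA_Diff`) is -0.003 for usual care and 0.591 for the intervention, even though both groups started from similar average scores (mean DLA1 of 3.87 and 3.91). Intervention clients also averaged fewer psychiatric admissions over the year (0.38 vs. 0.71), although they were somewhat younger on average (43.7 vs. 46.5 years).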
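Because the mean of a difference equals the difference of the means, the `DLA_Diff` means in the two `describe()` tables follow directly from the `DLA1` and `DLA2` means. Writing $\bar{d}$ for a program's mean `DLA_Diff`:

$$
\begin{aligned}
\bar{d}_{\text{usual}} &= 3.863006 - 3.866042 = -0.003036 \\
\bar{d}_{\text{intervention}} &= 4.50021 - 3.908741 = 0.591469 \\
\bar{d}_{\text{intervention}} - \bar{d}_{\text{usual}} &= 0.591469 - (-0.003036) = 0.594505
\end{aligned}
$$

The last line, the gap between the two programs, is the quantity the t-test below evaluates.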

In [31]:
```python
# Plot a histogram of DLA_Diff for usual care
px.histogram(df_usual, x='DLA_Diff',
             title='Histogram of DLA Difference for Usual Care')
```


In [32]:
```python
# Plot a histogram of DLA_Diff for the intervention
px.histogram(df_intervention, x='DLA_Diff',
             title='Histogram of DLA Difference for Intervention Program')
```


In [33]:
```python
# Overlay both programs' DLA_Diff distributions on one histogram
px.histogram(df, x="DLA_Diff", color="Program", barmode="overlay",
             title="Histogram of DLA Difference by Program")
```

In [35]:
```python
fig = px.parallel_categories(
    df,
    dimensions=['Program', 'Gender', 'RaceEthnicity', 'MHDx', 'SUDx'],
    color='DLA_Diff',
    color_continuous_scale=px.colors.sequential.Inferno,
    title='DLA Difference Across Categories',
)
fig.show()
```


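Looking at the plot above, the intervention program appears to be associated with larger DLA-20 improvements for clients in every gender, race/ethnicity, mental health diagnosis, and substance use category shown. This is a visual impression; medical conditions (`MedDx`) and age are not among the plotted dimensions.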
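A table of mean `DLA_Diff` by subgroup and program puts numbers behind the visual impression and shows how many clients each cell rests on, which matters when some subgroups are small. A minimal sketch using `groupby` and `unstack`; the helper `subgroup_summary` is hypothetical, and the demo rows reuse values from the `head()` outputs above.

```python
import pandas as pd


def subgroup_summary(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Mean DLA_Diff and client count for each level of `by`, one column per
    Program. Hypothetical helper for illustration."""
    stats_by_group = frame.groupby([by, "Program"])["DLA_Diff"].agg(["mean", "count"])
    # Move Program from the row index into the columns: (statistic, Program)
    return stats_by_group.unstack("Program")


# Demo rows built from the head() outputs above (illustration only)
demo = pd.DataFrame({
    "Program": ["Intervention", "Intervention", "UsualCare", "UsualCare"],
    "Gender": ["F", "M", "M", "F"],
    "DLA_Diff": [0.44, 0.46, 0.06, 0.06],
})
table = subgroup_summary(demo, "Gender")
print(table)
```

On the full data, calls such as `subgroup_summary(df, "RaceEthnicity")` or `subgroup_summary(df, "MHDx")` would give one table per dimension in the plot.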

## Statistical Analysis

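A two-sample t-test to see whether the mean change in DLA-20 score (`DLA_Diff`) differs between usual care and the intervention.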

In [50]:
```python
import scipy.stats as stats
import numpy as np
```

In [44]:
```python
# DLA_Diff values as arrays: group1 = usual care, group2 = intervention
group1 = np.array(df_usual['DLA_Diff'])
group2 = np.array(df_intervention['DLA_Diff'])
```

H0: µ1 = µ2 (the mean DLA_Diff for usual care, µ1, equals the mean DLA_Diff for the intervention, µ2)

HA: µ1 ≠ µ2 (the two program means are not equal)
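For reference, `stats.ttest_ind` with its default `equal_var=True` computes Student's pooled two-sample statistic. Here $\bar{x}_i$, $s_i$ and $n_i$ are the sample mean, sample standard deviation (computed with $n_i - 1$) and size of `DLA_Diff` for group $i$ (1 = usual care, 2 = intervention):

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
$$

With the summary statistics from the `describe()` tables ($\bar{x}_1 = -0.003036$, $s_1 = 0.048681$, $n_1 = 336$; $\bar{x}_2 = 0.591469$, $s_2 = 0.18323$, $n_2 = 143$) this gives $s_p^2 \approx 0.01166$ and $t \approx -55.14$, consistent with the `ttest_ind` result in this section. Welch's variant (`equal_var=False`) replaces the pooled denominator with $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ and does not assume that the two groups share a variance.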

In [49]:
```python
# Population variance (np.var defaults to ddof=0) of DLA_Diff in each group
var1 = np.var(group1)
var2 = np.var(group2)
print(f'var1 = {var1.round(4)}\nvar2 = {var2.round(4)}')
```

```text
var1 = 0.0024
var2 = 0.0333
```


In [52]:
```python
stats.ttest_ind(a=group1, b=group2)
```

```text
Ttest_indResult(statistic=-55.143722159871736, pvalue=4.3055817180988177e-209)
```
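The variances computed above differ by a factor of roughly 14 (0.0333 vs. 0.0024), while `stats.ttest_ind` assumes equal variances unless `equal_var=False` is passed. Welch's t-test drops that assumption. A minimal sketch that reruns both versions from the rounded summary statistics in the two `describe()` tables using `scipy.stats.ttest_ind_from_stats`, so the effect of the assumption is visible without the raw file; on the raw arrays the Welch call would be `stats.ttest_ind(group1, group2, equal_var=False)`.

```python
from scipy import stats

# Rounded summary statistics from the describe() outputs (std uses ddof=1)
usual = dict(mean=-0.003036, std=0.048681, nobs=336)
intervention = dict(mean=0.591469, std=0.18323, nobs=143)

# Student's pooled t-test: the default behaviour of stats.ttest_ind
student = stats.ttest_ind_from_stats(
    usual["mean"], usual["std"], usual["nobs"],
    intervention["mean"], intervention["std"], intervention["nobs"],
    equal_var=True,
)

# Welch's t-test: does not assume the two groups share a variance
welch = stats.ttest_ind_from_stats(
    usual["mean"], usual["std"], usual["nobs"],
    intervention["mean"], intervention["std"], intervention["nobs"],
    equal_var=False,
)

print(f"Student: t = {student.statistic:.2f}, p = {student.pvalue:.2e}")
print(f"Welch:   t = {welch.statistic:.2f}, p = {welch.pvalue:.2e}")
```

With these inputs the Welch statistic (about -38.2) is noticeably smaller in magnitude than the pooled one (about -55.1). That is expected here: the pooled variance is weighted toward the larger usual-care group, whose variance is much smaller, so it understates the uncertainty in the intervention mean. The p-value still stays far below 0.05, so the conclusion holds, but Welch's test is the safer default for groups like these.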

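Since the p-value is far below the 0.05 significance level, we reject the null hypothesis: the mean change in DLA-20 score differs between the two programs, with intervention clients improving more on average. The test establishes a statistically significant difference, not that the intervention caused it; it does not adjust for other differences between the groups (for example, the intervention group is about three years younger on average).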
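The p-value says the difference is unlikely to be zero but not how large it is. A 95% confidence interval for the difference in mean DLA-20 change puts a range on the effect. A minimal sketch of a Welch-style interval (unequal variances, Welch-Satterthwaite degrees of freedom); the helper `welch_ci` is hypothetical, the demo uses the five `DLA_Diff` values from each `head()` output above purely for illustration, and on the real data the call would be `welch_ci(group2, group1)`.

```python
import numpy as np
from scipy import stats


def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal
    variances (Welch-Satterthwaite degrees of freedom). Hypothetical helper."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    diff = a.mean() - b.mean()
    se = np.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va**2 / (a.size - 1) + vb**2 / (b.size - 1))
    t_crit = stats.t.ppf(0.5 + confidence / 2, dof)
    return diff - t_crit * se, diff + t_crit * se


# DLA_Diff values from the head() outputs above (far too few for a real estimate)
intervention_demo = [0.44, 0.46, 0.61, 0.35, 0.51]
usual_demo = [0.06, 0.06, -0.02, -0.04, -0.05]
low, high = welch_ci(intervention_demo, usual_demo)
print(f"95% CI for difference in means: ({low:.3f}, {high:.3f})")
```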

## Conclusion


- Is the new intervention better than “usual care”?

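Overall, yes: clients in the new intervention showed greater improvement on the DLA-20 assessment over the year (mean change of 0.59 vs. -0.003 for usual care), and the difference is statistically significant. They also averaged fewer psychiatric admissions (0.38 vs. 0.71), the primary outcome our funders would like to reduce.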

- Which clients are best served by the intervention?

Clients in the intervention program showed greater DLA-20 improvement than clients in "usual care" across every gender, race/ethnicity, mental health diagnosis, and substance use category examined, so this analysis does not point to a single subgroup that benefits most.

- Should we expand the care model from the new intervention to our “usual care” team?

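Based on the data given, I would recommend expanding the new intervention model to the "usual care" teams, weighing its roughly 50% higher cost against the gains in functional status and the lower rate of psychiatric admissions.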
