## Medical Plan Recommender Model


#### This assignment evaluates your ability to engineer features and to design and evaluate a model. You will build a multi-class classification model that recommends medical plans to employees based on user data (inputs) and medical plan labels (outputs). We have data from ~250 users, each assigned to one of three plans by actuaries.

#### Feel free to use any Python packages you like. You may also search online for help with method names or syntax.

#### Dataset Columns:

- **age**: age of employee
- **family**: who is covered? (Just Me; Me and my Spouse; Me and my kids; Me, Spouse, and Kids)
- **salary**: income of employee
- **household_salaries**: household income of employee
- **financial_risk_preference**: scale from 1 (Prefer Savings) to 5 (Prefer Protection)
- **preexisting_conditions**: conditions that require frequent doctor visits (cancer, high blood pressure, etc.)
- **prescription_costs**: annual prescription costs
- **pcp_costs**: primary care (PCP) costs last year
- **specialist_costs**: specialty care costs last year
- **pcp_visits**: number of pcp visits last year
- **qle**: qualifying life event that might incur costs (baby, medical procedure, married, moving)
- **specialty_visits**: number of specialist visits last year
- **exercises**: frequency of exercise (I exercise everyday, I exercise 3x a week, I don't exercise)
- **savings**: how the employee would pay an unexpected $3000 bill (borrow money, have savings, HSA)
- **label**: plan recommendation as indicated by actuary

In [10]:
import pandas as pd
surveys = pd.read_csv("data/surveys.csv", index_col=0)
surveys.sample(5)

| idx | age | family | salary | household_salaries | financial_risk_preference | preexisting_conditions | prescription_costs | pcp_costs | specialist_costs | pcp_visits | qle | specialty_visits | exercises | savings | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 37 | 24 | Me and my kids | 40534 | 56747.6 | 3 | none | 45 | 1265 | 0 | 8 | none | 0 | I exercise everyday | borrow money | Cigna Choice HDHP |
| 201 | 19 | Just Me | 133141 | 133141.0 | 3 | high blood pressure | 87 | 671 | 811 | 8 | none | 2 | I exercise everyday | HSA | Cigna Choice HDHP |
| 105 | 30 | Me and my Spouse | 48047 | 76875.2 | 3 | none | 254 | 281 | 372 | 3 | none | 1 | I exercise everyday | borrow money | Cigna Base HDHP |
| 36 | 22 | Me and my kids | 60161 | 78209.3 | 4 | high blood pressure | 52 | 322 | 1209 | 2 | none | 3 | I exercise 3x a week | have savings | Cigna Choice HDHP |
| 209 | 18 | Me, Spouse, and Kids | 48505 | 67907.0 | 3 | obesity | 61 | 0 | 2574 | 0 | none | 7 | I don't exercise | have savings | Cigna Choice HDHP |


In [42]:
X = surveys.drop("label", axis=1)
y = surveys.label.values

### 1) Write code to split the data into train & test sets

In [35]:
train_split = 0.75

X_train = ### CHANGEME!
X_test = ### CHANGEME!

y_train = ### CHANGEME!
y_test = ### CHANGEME!
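
One possible way to fill in the split is sketched below. It assumes scikit-learn is available and uses a small synthetic frame (`demo`, with hypothetical plan names) in place of `data/surveys.csv`, so the column subset and labels are illustrative only. Stratifying on the label is a design choice, not a requirement of the assignment: with roughly 250 rows and 3 classes, it keeps the class proportions similar in both splits.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the survey data (the real file is data/surveys.csv).
rng = np.random.default_rng(0)
n = 240
demo = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "salary": rng.integers(30_000, 150_000, size=n),
    "savings": rng.choice(["borrow money", "have savings", "HSA"], size=n),
    "label": rng.choice(["Plan A", "Plan B", "Plan C"], size=n, p=[0.5, 0.3, 0.2]),
})
X_demo = demo.drop("label", axis=1)
y_demo = demo["label"].values

train_split = 0.75

# stratify=y_demo keeps the label proportions roughly equal across splits;
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, train_size=train_split, stratify=y_demo, random_state=42
)
print(X_train.shape, X_test.shape)
```

On the real data, the same call would take `X` and `y` from the cell above instead of `X_demo` and `y_demo`.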

### 2) You will train a multi-class classification model later in the script. Consider what model you would like to train, and implement a feature normalization strategy. Explain your reasoning behind your strategy. 

In [40]:
numeric_features = X_train.select_dtypes(include='number').columns
numeric_features

Index(['age', 'salary', 'household_salaries', 'financial_risk_preference',
       'prescription_costs', 'pcp_costs', 'specialist_costs', 'pcp_visits',
       'specialty_visits'],
      dtype='object')

In [41]:
categorical_features = X_train.select_dtypes(include='object').columns
categorical_features

Index(['family', 'preexisting_conditions', 'qle', 'exercises', 'savings'], dtype='object')

In [None]:
### CHANGEME! normalize variables based on model strategy you select

In [None]:
X_train_normalized = ### CHANGEME!
X_test_normalized = ### CHANGEME!
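
As a sketch of one normalization strategy, assuming a scale-sensitive model such as logistic regression: standardize the numeric columns and one-hot encode the categorical ones. Tree-based models would not need the scaling step. The toy frames below (`X_train_demo`, `X_test_demo`) are hypothetical stand-ins for the real splits, and the `"credit card"` value is invented to show how an unseen category is handled.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy frames standing in for X_train / X_test.
X_train_demo = pd.DataFrame({
    "age": [24, 19, 30, 22],
    "salary": [40534, 133141, 48047, 60161],
    "savings": ["borrow money", "HSA", "borrow money", "have savings"],
})
X_test_demo = pd.DataFrame({
    "age": [18, 45],
    "salary": [48505, 90000],
    "savings": ["have savings", "credit card"],  # "credit card" never appears in training
})

numeric = list(X_train_demo.select_dtypes(include="number").columns)
categorical = list(X_train_demo.select_dtypes(exclude="number").columns)

preprocessor = ColumnTransformer(
    [
        ("num", StandardScaler(), numeric),
        # handle_unknown="ignore" encodes unseen categories as all zeros
        # instead of raising an error at prediction time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Fit on the training split only so test-set statistics do not leak into the scaler.
X_train_normalized = preprocessor.fit_transform(X_train_demo)
X_test_normalized = preprocessor.transform(X_test_demo)
print(X_train_normalized.shape, X_test_normalized.shape)
```

Fitting the transformer on the training split and only applying it to the test split is the important part; the specific scaler and encoder are interchangeable.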

### 3) Select one model and train. We do not expect you to implement hyperparameter tuning, but please talk through how you would set this up.

In [None]:
clf = #### CHANGEME! select model
model = clf.fit() #### CHANGEME! fit model 
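
The assignment does not require hyperparameter tuning, but a setup for it could look like the sketch below. It assumes logistic regression as the chosen model (any multi-class classifier would fit the same pattern) and uses synthetic data from `make_classification` in place of the survey features. The grid values and the macro-F1 scoring choice are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for the normalized survey features.
X_demo, y_demo = make_classification(
    n_samples=240, n_features=9, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, train_size=0.75, stratify=y_demo, random_state=0
)

# Keeping the scaler inside the pipeline means it is re-fit on each CV fold,
# so validation folds never influence the scaling statistics.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}  # illustrative grid
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(pipe, param_grid, cv=cv, scoring="f1_macro")
model = search.fit(X_tr, y_tr)
print(model.best_params_, round(model.best_score_, 3))
```

With only ~250 rows, cross-validation on the training split is a reasonable way to compare settings while leaving the test split untouched until the final evaluation.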

### 4) Evaluate Model: Display model train/test classification metrics of your choice and describe them in the context of this problem.  

In [None]:
#### CHANGEME! display classification metrics of your choice 
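
A minimal sketch of the evaluation step, using scikit-learn metrics on small hand-written label arrays. The plan names and predictions here are hypothetical; on the real data, the arrays would be `y_test` and the model's predictions (and likewise for the training split). Macro-averaged F1 is shown because it weights each plan equally, which may matter if one plan is recommended far less often than the others.

```python
from sklearn.metrics import classification_report, confusion_matrix, f1_score

# Hypothetical true and predicted plan labels.
y_true = ["Plan A", "Plan A", "Plan A", "Plan B", "Plan B", "Plan C"]
y_pred = ["Plan A", "Plan A", "Plan B", "Plan B", "Plan B", "Plan C"]

labels = ["Plan A", "Plan B", "Plan C"]

# Rows are true plans, columns are predicted plans.
cm = confusion_matrix(y_true, y_pred, labels=labels)
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(cm)
print(classification_report(y_true, y_pred, labels=labels))
print("macro F1:", round(macro_f1, 4))
```

The confusion matrix shows which plans get confused with each other, which is often more actionable here than a single accuracy number: recommending a lower-coverage plan to someone who needs more protection is a different kind of error than the reverse.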

### 5) Monitoring: Describe metrics you would consider to monitor this model in production.  
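
One metric that could be monitored in production is drift in the input features or in the distribution of predicted plans. The sketch below computes a population stability index (PSI) between a reference sample and a recent sample using NumPy only. The function name `population_stability_index`, the bin count, and the synthetic samples are assumptions for illustration, not part of the assignment.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training) and a recent production sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges come from the reference sample's quantiles,
    # so each bin holds roughly the same share of reference data.
    inner_edges = np.quantile(expected, np.linspace(0, 1, bins + 1)[1:-1])

    # Assign each value to a bin index in 0..bins-1 and count per bin.
    exp_counts = np.bincount(np.searchsorted(inner_edges, expected, side="right"), minlength=bins)
    act_counts = np.bincount(np.searchsorted(inner_edges, actual, side="right"), minlength=bins)

    # Clip to avoid log(0) when a bin is empty in one of the samples.
    exp_frac = np.clip(exp_counts / len(expected), eps, None)
    act_frac = np.clip(act_counts / len(actual), eps, None)

    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=1000)  # e.g. a feature at training time
shifted = rng.normal(1.0, 1.0, size=1000)    # the same feature after a shift

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(psi_same, psi_shifted)
```

Alongside drift, it would be reasonable to track the share of each recommended plan over time, the rate of missing or out-of-range inputs, and, once actuary labels become available for new users, the same classification metrics used in the evaluation step.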
