### 5. Python API Training - Model Factory [Solution]

<b>Author:</b> Thodoris Petropoulos <br>
<b>Contributors:</b> Rajiv Shah

This is the fifth exercise in the `Python API Training for DataRobot` course. It teaches you how to use DataRobot to build a model factory, that is, a set of separate projects that are each trained on a different subset of the data. This can help you increase model accuracy and shows what an AutoML platform is capable of.

The notebook contains the following sections, each with an estimated time to complete:

1. Connect to DataRobot. [3min]
2. Create a Pandas Dataframe with the `readmissions` dataset. [3min]
3. Create a DataRobot Project with the `readmissions` dataset explicitly using `quick` autopilot. [5min]
4. Split the dataset and run multiple DataRobot Projects based on the `admission_type_id` column explicitly using `quick` autopilot. [25min]
5. Check the validation and cross validation scores of the best models for each one of the projects. What do you see? [15min]

Each section has specific instructions, so do not worry if things are still unclear!

As always, consult:

- [API Documentation](https://datarobot-public-api-client.readthedocs-hosted.com)
- [Samples](https://github.com/datarobot-community/examples-for-data-scientists)
- [Tutorials](https://github.com/datarobot-community/tutorials-for-data-scientists)

The last two links should provide you with the snippets you need to complete most of these exercises.

<b>Data</b>

The dataset we will be using throughout these exercises is the well-known `readmissions dataset`. You can access it or directly download it through DataRobot's public S3 bucket [here](https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv).

### Import Libraries
Import libraries here as you find out which ones you need. The DataRobot package is already included for your convenience.

In [1]:
import datarobot as dr

#Proposed Libraries needed
import pandas as pd

### 1. Connect to DataRobot. [3min]

In [2]:
#Possible solution
dr.Client(config_path='../../github/config.yaml')

<datarobot.rest.RESTClientObject at 0x1117bddd8>

### 2. Create a Pandas Dataframe with the `readmissions` dataset. [3min]

In [3]:
# Proposed Solution

df = pd.read_csv('https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.csv')
df.head()

Unnamed: 0,race,gender,age,weight,admission_type_id,discharge_disposition_id,admission_source_id,time_in_hospital,payer_code,medical_specialty,...,glipizide.metformin,glimepiride.pioglitazone,metformin.rosiglitazone,metformin.pioglitazone,change,diabetesMed,readmitted,diag_1_desc,diag_2_desc,diag_3_desc
0,Caucasian,Female,[50-60),?,Elective,Discharged to home,Physician Referral,1,CP,Surgery-Neuro,...,No,No,No,No,No,No,False,Spinal stenosis in cervical region,Spinal stenosis in cervical region,"Effusion of joint, site unspecified"
1,Caucasian,Female,[20-30),[50-75),Urgent,Discharged to home,Physician Referral,2,UN,?,...,No,No,No,No,No,No,False,"First-degree perineal laceration, unspecified ...","Diabetes mellitus of mother, complicating preg...",Sideroblastic anemia
2,Caucasian,Male,[80-90),?,Not Available,Discharged/transferred to home with home healt...,,7,MC,Family/GeneralPractice,...,No,No,No,No,No,Yes,True,Pneumococcal pneumonia [Streptococcus pneumoni...,"Congestive heart failure, unspecified",Hyperosmolality and/or hypernatremia
3,AfricanAmerican,Female,[50-60),?,Emergency,Discharged to home,Transfer from another health care facility,4,UN,?,...,No,No,No,No,No,Yes,False,Cellulitis and abscess of face,Streptococcus infection in conditions classifi...,Diabetes mellitus without mention of complicat...
4,AfricanAmerican,Female,[50-60),?,Emergency,Discharged to home,Emergency Room,5,?,Psychiatry,...,No,No,No,No,Ch,Yes,False,"Bipolar I disorder, single manic episode, unsp...",Diabetes mellitus without mention of complicat...,Depressive type psychosis


### 3. Create a DataRobot Project with the `readmissions` dataset explicitly using `quick` autopilot. [5min]


**Instructions**:
1. Set `readmitted` as the target.
2. Start the project with the `mode` parameter explicitly set to `quick`.
3. Set the `worker_count` parameter to -1.
4. Wait for Autopilot to complete.

**HINT**: You should have already done something similar during the 1st exercise of this course.

In [4]:
#Possible solution
project = dr.Project.create(sourcedata = df,
                           project_name = '05_Model_Factory_01')

project.set_target(target = 'readmitted', mode = 'quick', worker_count = -1)
project.wait_for_autopilot(verbosity = 0) #Instruction 4: block until Autopilot completes
project

Project(05_Model_Factory_01)

### 4. Split the dataset and run multiple DataRobot Projects based on the `admission_type_id` column explicitly using `quick` autopilot. [25min]

**Hint**: Some `admission_type_id` values have fewer than 20 rows, which is the minimum required to start a DataRobot project. Catch the resulting exception and skip those admission types.

**Hint 2**: This will take a long time to execute. You can either run it for only a few of the `admission_type_id` values or move on to the next exercise while it executes.
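
As an alternative to relying only on an exception, the small groups can be filtered out before any project is created. The following is a minimal sketch using pandas alone. The helper name `split_by_column`, the constant `MIN_ROWS`, and the toy DataFrame are assumptions made for illustration and are not part of the exercise or of the DataRobot client.

```python
import pandas as pd

MIN_ROWS = 20  # minimum number of rows per project, as stated in the first hint


def split_by_column(frame, column, min_rows=MIN_ROWS):
    """Return {value: subset} for every group that has at least `min_rows` rows."""
    subsets = {}
    for value, subset in frame.groupby(column):
        if len(subset) >= min_rows:
            subsets[value] = subset
    return subsets


# Synthetic stand-in data, NOT the readmissions dataset.
toy = pd.DataFrame({
    'admission_type_id': ['Emergency'] * 25 + ['Elective'] * 22 + ['Newborn'] * 3,
    'readmitted': [True, False] * 25,
})

subsets = split_by_column(toy, 'admission_type_id')
print(sorted(subsets))  # ['Elective', 'Emergency']
```

Each value of the returned dictionary could then be passed to `dr.Project.create` in place of the `df.loc[...]` expression. Note that `groupby` drops rows whose key is missing by default, so rows without an `admission_type_id` would not form a group in this sketch.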

In [5]:
#Proposed Solution

projects = {} #To save projects

#Create one project for each admission type
for value in df['admission_type_id'].unique():
    try:
        temp_project = dr.Project.create(df.loc[df['admission_type_id'] == value],
                                    project_name = 'Readmission_%s'%value)
        
        temp_project.set_target(target = 'readmitted', mode = 'quick', worker_count = 10)
        projects[value] = temp_project

    except Exception: #Catching the case when dataset has fewer than 20 rows.
        pass

#Wait for all autopilots to finish
for key in projects:
    projects[key].wait_for_autopilot()

In progress: 1, queued: 0 (waited: 0s)
In progress: 1, queued: 0 (waited: 1s)
In progress: 1, queued: 0 (waited: 2s)
In progress: 1, queued: 0 (waited: 3s)
In progress: 1, queued: 0 (waited: 4s)
In progress: 1, queued: 0 (waited: 7s)
In progress: 1, queued: 0 (waited: 11s)
In progress: 1, queued: 0 (waited: 18s)
In progress: 0, queued: 0 (waited: 32s)
In progress: 1, queued: 0 (waited: 52s)
In progress: 1, queued: 0 (waited: 73s)
In progress: 1, queued: 0 (waited: 94s)
In progress: 0, queued: 0 (waited: 114s)
In progress: 1, queued: 0 (waited: 135s)
In progress: 1, queued: 0 (waited: 156s)
In progress: 1, queued: 0 (waited: 177s)
In progress: 1, queued: 0 (waited: 197s)
In progress: 1, queued: 0 (waited: 218s)
In progress: 1, queued: 0 (waited: 239s)
In progress: 1, queued: 0 (waited: 260s)
In progress: 1, queued: 0 (waited: 280s)
In progress: 1, queued: 0 (waited: 301s)
In progress: 0, queued: 0 (waited: 322s)
In progress: 0, queued: 0 (waited: 343s)
In progress: 0, queued: 0 (waited:

### 5. Check the validation scores of the best models for each one of the projects. What do you see? [15min]

- Based on `AUC` score
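
Because the comparison is meant to be based on `AUC`, one option is to pick the top model by that metric explicitly instead of relying on list order. The sketch below assumes only that each model exposes a `metrics` dictionary shaped like `metrics['AUC']['validation']`, as used in this notebook. The helper `best_by_metric` and the stand-in objects are hypothetical, and the `crossValidation` key is an assumption about how cross validation scores are stored. The helper also assumes that a higher score is better, which holds for AUC but not for error metrics such as LogLoss.

```python
from types import SimpleNamespace


def best_by_metric(models, metric='AUC', partition='validation'):
    """Return the model with the highest `metric` score on `partition`."""
    # Ignore models that have no score for the requested partition.
    scored = [m for m in models
              if m.metrics.get(metric, {}).get(partition) is not None]
    if not scored:
        raise ValueError('No model has a %s %s score' % (metric, partition))
    return max(scored, key=lambda m: m.metrics[metric][partition])


# Hypothetical stand-ins that mimic only the `metrics` attribute.
fake_models = [
    SimpleNamespace(name='A', metrics={'AUC': {'validation': 0.68, 'crossValidation': None}}),
    SimpleNamespace(name='B', metrics={'AUC': {'validation': 0.71, 'crossValidation': 0.70}}),
    SimpleNamespace(name='C', metrics={'AUC': {'validation': None, 'crossValidation': None}}),
]

print(best_by_metric(fake_models).name)  # B
print(best_by_metric(fake_models, partition='crossValidation').name)  # B
```

With real projects, the list returned by `project.get_models()` would take the place of `fake_models`.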

In [6]:
#Proposed solution

#Get the top leaderboard model from the original project, which has been trained on all of the data.
#Note: get_models() is ordered by the project metric, which may differ from AUC.
print(project.get_models()[0].metrics['AUC']['validation'])

#Get the top leaderboard model from each one of the projects trained on a subset of data based on `admission_type_id`.
for key in projects:
    print(projects[key].get_models()[0].metrics['AUC']['validation'])

0.70857
0.67956
0.77702
0.61946
0.70357
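
To make the comparison easier to read, the printed scores can be placed side by side with the score of the project trained on all rows. This is a minimal sketch that reuses the values printed above. The output does not show which `admission_type_id` each score belongs to, so the keys `subset_1` to `subset_4` are placeholders in print order, and the column names are chosen for illustration.

```python
import pandas as pd

# First value printed above: the project trained on all of the data.
baseline_auc = 0.70857

# Remaining values in print order; the keys are placeholders because the
# output does not show which admission type produced which score.
subset_auc = {
    'subset_1': 0.67956,
    'subset_2': 0.77702,
    'subset_3': 0.61946,
    'subset_4': 0.70357,
}

comparison = pd.DataFrame({'validation_auc': pd.Series(subset_auc)})
comparison['delta_vs_baseline'] = (comparison['validation_auc'] - baseline_auc).round(5)
comparison['beats_baseline'] = comparison['delta_vs_baseline'] > 0
print(comparison)
```

In this run only one of the four subset projects scores above the project trained on all rows. Keep in mind that each project is scored on its own validation partition, so the numbers are indicative rather than a strict like-for-like comparison.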
