![title](media/DataRobot.png)

<a href="https://colab.research.google.com/github/datarobot-community/tutorials-for-data-scientists/blob/master/Classification/Python/predict_hospital_readmissions/src/readmissions_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### DataRobot provides R and Python packages for accessing the main functionality of its API
1 - Project   
2 - Model             
3 - Retraining    
4 - Predicting

Full documentation of the Python package can be found here: https://datarobot-public-api-client.readthedocs-hosted.com/en/

The dataset we will use is the well-known "readmissions dataset". It is available online, and it is also included when you download this notebook.

## Getting started
You can install the `datarobot` package with `pip` from any computer with internet access.

In [None]:
!pip install datarobot

### Loading the libraries

In [None]:
import pandas as pd
import datarobot as dr
import matplotlib.pyplot as plt
import seaborn as sns

### Credentials
To access the DataRobot API, you first need to connect to it. To ensure that only authorized users have access, you must authenticate with your username and password or with an API token.
You also need to make sure that "API Access" is enabled for your account (ask your administrator if it is not).

To find your API token, visit <code>YOUR_API_HOST</code>, log in, and follow the instructions below:

![title](media/credentials_1.png)

![title](media/credentials_2.png)

![title](media/credentials_3.png)

In [None]:
# Make sure your endpoint ends with '/api/v2'.

endpoint = "YOUR_DATAROBOT_HOST"
api_token = "YOUR_API_KEY"
dr.Client(token=api_token, endpoint=endpoint)
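
Hard-coding a token in a notebook makes it easy to leak. A minimal sketch, assuming the credentials are stored in environment variables named `DATAROBOT_ENDPOINT` and `DATAROBOT_API_TOKEN` (these names and the helper functions are illustrative assumptions, not part of this tutorial), reads them from the environment and appends `/api/v2` when it is missing:

```python
import os

API_SUFFIX = "/api/v2"

def normalize_endpoint(host: str) -> str:
    """Return the host with exactly one trailing '/api/v2'."""
    host = host.strip().rstrip("/")
    if not host.endswith(API_SUFFIX):
        host += API_SUFFIX
    return host

def load_credentials(env=os.environ):
    """Read endpoint and token from environment variables (names are assumptions)."""
    endpoint = normalize_endpoint(env["DATAROBOT_ENDPOINT"])
    token = env["DATAROBOT_API_TOKEN"]
    return endpoint, token

# Usage in this notebook (needs the datarobot package and real credentials):
# endpoint, api_token = load_credentials()
# dr.Client(token=api_token, endpoint=endpoint)
```

Keeping the suffix logic in one place means the same variable works whether or not the stored host already ends with `/api/v2`.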

### Read the Dataset

In [None]:
readmissions_data = pd.read_csv("https://raw.githubusercontent.com/datarobot-community/tutorials-for-data-scientists/master/Classification/Python/predict_hospital_readmissions/src/data/10k_diabetes_training.csv")

In [None]:
readmissions_data.head()

### Start a DataRobot Project!

In [None]:
project = dr.Project.start(readmissions_data,             # pandas DataFrame with the data; a file path or URL also works
                           project_name = 'readmissions', # Name of the project
                           target = 'readmitted',         # Target of the project
                           worker_count = -1,             # Number of workers to use; -1 means all available workers
                           autopilot_on = True)           # Run on autopilot (default value)

### Interacting with autopilot

In [None]:
project.pause_autopilot()

In [None]:
project.unpause_autopilot()

In [None]:
project.wait_for_autopilot()

In [None]:
# Each stage of autopilot adds more jobs to the queue.
# This returns the jobs that are currently in progress or queued.
project.get_model_jobs()
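
If you want to watch the queue drain rather than block silently, a small polling loop is one option. The sketch below is a hypothetical helper, not part of the `datarobot` package: it accepts any callable that returns the list of outstanding jobs, so `project.get_model_jobs` can be passed in directly.

```python
import time

def wait_until_queue_empty(get_jobs, poll_seconds=10.0, timeout_seconds=3600.0,
                           sleep=time.sleep):
    """Poll get_jobs() until it returns an empty list or the timeout passes.

    Returns True if the queue emptied, False if the timeout was reached.
    """
    waited = 0.0
    while True:
        jobs = get_jobs()
        if not jobs:
            return True
        if waited >= timeout_seconds:
            return False
        print(f"{len(jobs)} job(s) still queued or in progress")
        sleep(poll_seconds)
        waited += poll_seconds

# Usage in this notebook:
# wait_until_queue_empty(project.get_model_jobs)
```

Because each autopilot stage queues new jobs, an empty queue does not by itself mean that autopilot has finished; `project.wait_for_autopilot()` remains the way to wait for the whole run.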

### Pick another project

### Where to find the project ID?
![title](media/model_id.png)

### What if I don't want to use my browser?

In [None]:
for p in dr.Project.list()[0:3]:
    print(p, p.id)

In [None]:
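# To choose another project
try:
    another_project = dr.Project.get('YOUR_PROJECT_ID')
except Exception as exc:
    # The placeholder ID above is not a real project, so report the failure instead of hiding it
    print(f"Could not load project: {exc}")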
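
When you know a project's name but not its ID, you can look the ID up from the listed projects. This is a minimal sketch using a hypothetical helper, `find_project_ids`; it relies only on the `id` and `project_name` attributes of each project, and it uses stand-in objects so that it runs without a DataRobot connection.

```python
from types import SimpleNamespace

def find_project_ids(projects, name):
    """Return the ids of projects whose name equals `name`, ignoring case."""
    wanted = name.strip().lower()
    return [p.id for p in projects if p.project_name.lower() == wanted]

# Stand-in objects so the sketch runs without a DataRobot connection.
demo_projects = [
    SimpleNamespace(id="id-1", project_name="readmissions"),
    SimpleNamespace(id="id-2", project_name="Other project"),
    SimpleNamespace(id="id-3", project_name="Readmissions"),
]
print(find_project_ids(demo_projects, "readmissions"))  # prints ['id-1', 'id-3']
```

In this notebook you would call `find_project_ids(dr.Project.list(), 'readmissions')` and pass one of the returned IDs to `dr.Project.get`.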

### Take a look at finished models

In [None]:
for model in project.get_models():
    print(model)

In [None]:
# Pick the top model on the leaderboard (the first one in the returned list)
best_model = project.get_models()[0]

print(best_model)
print(best_model.metrics['AUC'])
print(best_model.metrics['Gini Norm'])
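
The first model in the list is not always the one you want, for example when you care about a different metric or partition. As a sketch, the hypothetical helper below ranks model-like objects by `metrics[metric][partition]`, the nested dictionary layout used in the cell above, and skips models that have no score for that partition. The demo objects and their scores are made up for illustration.

```python
from types import SimpleNamespace

def rank_models(models, metric="AUC", partition="crossValidation",
                higher_is_better=True):
    """Sort model-like objects by metrics[metric][partition], skipping missing scores."""
    def score(model):
        return (model.metrics.get(metric) or {}).get(partition)

    scored = [m for m in models if score(m) is not None]
    return sorted(scored, key=score, reverse=higher_is_better)

# Stand-in objects with invented scores, so the sketch runs offline.
demo_models = [
    SimpleNamespace(name="A", metrics={"AUC": {"validation": 0.70, "crossValidation": 0.69}}),
    SimpleNamespace(name="B", metrics={"AUC": {"validation": 0.72, "crossValidation": None}}),
    SimpleNamespace(name="C", metrics={"AUC": {"validation": 0.71, "crossValidation": 0.705}}),
]
print([m.name for m in rank_models(demo_models)])  # prints ['C', 'A']
```

With real models you would call `rank_models(project.get_models())`. For error metrics where lower is better, such as LogLoss, pass `higher_is_better=False`.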

In [None]:
#Visualise the ROC Curve
roc = best_model.get_roc_curve('crossValidation')
roc_df = pd.DataFrame(roc.roc_points)

plt.title('Receiver Operating Characteristic')
plt.plot(roc_df['false_positive_rate'], roc_df['true_positive_rate'], 'b', label = 'AUC = %0.2f' % best_model.metrics['AUC']['crossValidation'])
plt.legend(loc = 'lower right')
plt.plot([0,1], [0,1], 'r--')
plt.xlim([0,1])
plt.ylim([0,1])
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.show()
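
The AUC is the area under the curve plotted above. As a rough cross-check, it can be approximated from the ROC points with the trapezoidal rule. The helper below is a hypothetical sketch that expects dictionaries with the same `false_positive_rate` and `true_positive_rate` keys used for the plot; its result may differ slightly from the metric DataRobot reports, since it depends on how many points the curve contains.

```python
def auc_from_roc_points(points):
    """Approximate the area under a ROC curve with the trapezoidal rule."""
    pairs = sorted(
        (p["false_positive_rate"], p["true_positive_rate"]) for p in points
    )
    area = 0.0
    # Sum the trapezoids between each pair of consecutive points.
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# A random classifier follows the diagonal, which has an area of 0.5.
diagonal = [
    {"false_positive_rate": 0.0, "true_positive_rate": 0.0},
    {"false_positive_rate": 1.0, "true_positive_rate": 1.0},
]
print(auc_from_roc_points(diagonal))  # prints 0.5
```

In this notebook the call would be `auc_from_roc_points(roc.roc_points)`.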

### Plotting Feature Impact

In [None]:
#Get Feature Impact
feature_impacts = best_model.get_or_request_feature_impact()

#Sort feature impact based on normalised impact
feature_impacts.sort(key=lambda x: x['impactNormalized'], reverse=True)

#Save feature impact in pandas dataframe
fi_df = pd.DataFrame(feature_impacts)

In [None]:
fig, ax = plt.subplots(figsize = (12,5))

#Plot feature impact
sns.barplot(x='featureName', y='impactNormalized', data=fi_df[0:5], color='b')

### Train on 100% of Data

In [None]:
project.unlock_holdout()

# train() returns the ID of the model job, not the model itself.
model_job_id = best_model.train(sample_pct=100)

# Wait for the job to finish and fetch the resulting model
retrained_best_model = dr.models.modeljob.wait_for_async_model_creation(project.id, model_job_id)

## Predictions
#### Modelling API
If you work in Python or R, you can use the modelling API, and there are multiple ways to interact with it.
#### Prediction API
If you have prediction servers, models from any project can be called through the Prediction API, which is a simple REST API. Click on a model in the UI, then "Deploy Model" and "Activate now". You will then have access to a Python code snippet that helps you interact with it. You can also deploy the model through the Python API.

### Using the Modelling API

In [None]:
test_df = pd.read_csv("https://raw.githubusercontent.com/datarobot-community/tutorials-for-data-scientists/master/Classification/Python/predict_hospital_readmissions/src/data/10k_diabetes_test.csv") #Load testing data

prediction_data = project.upload_dataset(test_df)
predict_job = retrained_best_model.request_predictions(prediction_data.id)
result = predict_job.get_result_when_complete()
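
Once the predictions are back, a common next step is to flag the rows whose predicted probability of readmission exceeds a threshold of your choosing. The sketch below is a hypothetical helper that works on any sequence of probabilities. The column name `positive_probability` in the usage comment is an assumption; check `result.columns` for the actual name before relying on it.

```python
def flag_high_risk(probabilities, threshold=0.5):
    """Return the positions of rows whose probability is at or above the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [i for i, p in enumerate(probabilities) if p >= threshold]

# Invented probabilities, for illustration only.
print(flag_high_risk([0.10, 0.65, 0.50, 0.90], threshold=0.6))  # prints [1, 3]

# Usage in this notebook (the column name is an assumption):
# high_risk_rows = flag_high_risk(result["positive_probability"], threshold=0.6)
```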