<a href="https://colab.research.google.com/github/coryamanda/datarobot-api-lab/blob/main/Special_Topics_Python_API_no_credentials.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Agenda
- How to connect to AI Catalog
- How to run Manual mode
- How to share projects for collaboration
- Advanced Tuning API – Hyperparameter Tuning
- How to deploy a model using API

For a more comprehensive overview that starts from the basics, see the tutorial here: https://github.com/coryamanda/datarobot-api-lab

### Prerequisites

Before we do anything else, we need to connect to DataRobot and set up our user access credentials.

In [None]:
!pip install datarobot==2.21.5

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import yaml
import datarobot as dr
from google.colab import drive
import requests
import pprint as pp
from datarobot.models.modeljob import wait_for_async_model_creation
from datarobot.enums import AVAILABLE_STATEMENT_TYPES
from io import StringIO

In [None]:
token = ""  # paste your API token here
endpoint = "https://app.datarobot.com/api/v2"  # replace if not on Managed Cloud
client = dr.Client(token=token, endpoint=endpoint)
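
Hard-coding a token in a notebook makes it easy to leak when the notebook is shared. As a rough sketch of one alternative, the helper below reads the token and endpoint from environment variables. The function name `load_credentials` and the variable names `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT` are assumptions made for this example, not something the original notebook defines.

```python
import os

# Hypothetical helper for this sketch: read connection settings from
# environment variables instead of hard-coding them in the notebook.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def load_credentials(env=None):
    """Return (token, endpoint) read from a mapping such as os.environ."""
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN", "").strip()
    if not token:
        raise ValueError("DATAROBOT_API_TOKEN is not set")
    # Drop a trailing slash so the endpoint has a consistent form.
    endpoint = env.get("DATAROBOT_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    return token, endpoint


# Example with an explicit mapping so the snippet runs without real credentials.
example_token, example_endpoint = load_credentials(
    {"DATAROBOT_API_TOKEN": "not-a-real-token"}
)
print(example_endpoint)
```

With real credentials exported in the environment, the result of `load_credentials()` could be passed straight to `dr.Client(token=..., endpoint=...)`.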

# Connect to the AI Catalog

References: This section borrows heavily from the DataRobot GitHub notebook here: https://github.com/datarobot-community/tutorials-for-data-scientists/blob/master/integrations/Database%20Connections%20and%20Writebacks/databases_and_deployment.ipynb

The steps describing how to set up your database connections use the following terminology:

- **DataStore**: A configured connection to a database. It has a name, a specified driver, and a JDBC URL. You can register data stores with DataRobot for ease of re-use. A data store has one connector but can have many data sources.
- **DataSource**: A configured connection to the backing data store (the location of data within a given endpoint). A data source specifies, via SQL query or selected table and schema data, which data to extract from the data store to use for modeling or predictions. A data source has one data store and one connector but can have many datasets.
- **DataDriver**: The software that allows the DataRobot application to interact with a database; each data store is associated with one driver (created by the admin). The driver configuration saves the storage location in DataRobot of the JAR file and any additional dependency files associated with the driver.
- **Dataset**: Data, a file or the content of a data source, at a particular point in time. A data source can produce multiple datasets; a dataset has exactly one data source. When a DataSource, file, URL, or local dataframe is instantiated as a Dataset, it shows up in the AI Catalog.
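
To make the one-to-many relationships in this terminology concrete, here is a small sketch using plain dataclasses. These classes are illustrative stand-ins invented for this example (they are not the DataRobot client classes), and the names, URL, and query are made up.

```python
from dataclasses import dataclass, field
from typing import List


# Illustrative stand-ins only; NOT the DataRobot client classes.
@dataclass
class Driver:
    name: str


@dataclass
class Dataset:
    label: str
    # Back-reference: a dataset has exactly one data source.
    source: "DataSource" = field(repr=False, compare=False)


@dataclass
class DataSource:
    name: str
    query: str
    # Back-reference: a data source has exactly one data store.
    store: "DataStore" = field(repr=False, compare=False)
    datasets: List[Dataset] = field(default_factory=list)

    def snapshot(self, label: str) -> Dataset:
        """Materialize the source at a point in time as a new dataset."""
        dataset = Dataset(label=label, source=self)
        self.datasets.append(dataset)
        return dataset


@dataclass
class DataStore:
    name: str
    driver: Driver  # each data store is associated with one driver
    jdbc_url: str
    sources: List[DataSource] = field(default_factory=list)

    def add_source(self, name: str, query: str) -> DataSource:
        """Register a query against this store as a new data source."""
        source = DataSource(name=name, query=query, store=self)
        self.sources.append(source)
        return source


store = DataStore("example store", Driver("example driver"), "jdbc:example://host/db")
loans = store.add_source("loans", "SELECT * FROM example_table;")
loans.snapshot("first pull")
loans.snapshot("second pull")
print(len(store.sources), len(loans.datasets))
```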


In [None]:
#Database Credentials to connect to
USERNAME = ''
PASSWORD = ''
JDBC_URL = ''

List all the drivers I have access to. For on-prem installations, these are configured by the admin.


In [None]:
drivers = dr.DataDriver.list()
drivers

In [None]:
# Pick the Redshift driver by name, then register a data store that uses it
redshift_driver = [d for d in dr.DataDriver.list() if d.canonical_name == 'Redshift (1.2.12)'][0]
redshift_datastore = dr.DataStore.create(data_store_type='jdbc',
                                         canonical_name='DataRobot API Training Redshift2',
                                         driver_id=redshift_driver.id,
                                         jdbc_url=JDBC_URL)

In [None]:
# Look up the data store created above by its canonical name
redshift_datastore = [x for x in dr.DataStore.list() if x.canonical_name == 'DataRobot API Training Redshift2'][0]
redshift_datastore.tables(username=USERNAME, password=PASSWORD)
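
The `[...][0]` lookups above raise a bare `IndexError` when no name matches, which can be confusing to debug. As a minimal sketch, a small helper can report what was actually available. `find_by_canonical_name` is a hypothetical function written for this example, and the objects below are stand-ins (one with a made-up driver name) so the snippet runs without a DataRobot connection.

```python
from types import SimpleNamespace


def find_by_canonical_name(items, name):
    """Return the single item whose canonical_name equals name, else raise."""
    matches = [item for item in items if item.canonical_name == name]
    if not matches:
        available = sorted(item.canonical_name for item in items)
        raise LookupError(f"No entry named {name!r}; available: {available}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} entries named {name!r}; expected exactly one")
    return matches[0]


# Stand-in objects; real code would pass dr.DataDriver.list() or dr.DataStore.list().
fake_drivers = [
    SimpleNamespace(id="d1", canonical_name="Redshift (1.2.12)"),
    SimpleNamespace(id="d2", canonical_name="Example Driver (0.0.1)"),
]
print(find_by_canonical_name(fake_drivers, "Redshift (1.2.12)").id)
```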

Now we have a Data Connection set up, but we haven't set up a Data Source. To do that, we specify a query. 

In [None]:
params = dr.DataSourceParameters(data_store_id=redshift_datastore.id,
                                 query='SELECT * FROM lending_club_profile;')
data_source = dr.DataSource.create(data_source_type='jdbc',
                                   canonical_name='dr_api_training_lc2',
                                   params=params)

# Manual Mode

At this point, we have a Data Source. From here, we can create a DataRobot project. Note that we can have many different Data Sources set up to work with a single Data Connection.

Note that my database credentials aren't stored, so I need to supply them again.

In [None]:
new_proj = dr.Project.create_from_data_source(data_source_id=data_source.id,
                                              username=USERNAME,
                                              password=PASSWORD)

This creates a new project, which we'll run in Manual mode. For the sake of novelty, we'll try a different prediction task: predicting the category of a loan request from the other features.

In [None]:
new_proj.set_target(target="purpose",
                    mode=dr.AUTOPILOT_MODE.MANUAL,
                    worker_count=-1)

We can get the full list of blueprints available to a project.

In [None]:
new_proj.get_blueprints()


The cell below uses a simple substring match that checks whether the blueprint's model type contains "Neural Network Classifier". For a more precise selection, you could filter on the specific process steps of the model (see tutorial #1 for more info).

In [None]:
BLUEPRINT_TRAIN = 'Neural Network Classifier'
dr_nn = [b for b in new_proj.get_blueprints() if BLUEPRINT_TRAIN in b.model_type]
dr_nn
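
As a rough sketch of the more precise filter mentioned above, the helper below matches on both the model type and the listed process steps. It assumes each blueprint exposes `model_type` (a string) and `processes` (a list of strings); `filter_blueprints` is a hypothetical name, and the stand-in blueprints and their process names are invented for illustration. Unlike the match above, this one is case-insensitive.

```python
from types import SimpleNamespace


def filter_blueprints(blueprints, model_type_contains=None, required_processes=()):
    """Keep blueprints whose model type and process steps match (case-insensitive)."""
    selected = []
    for bp in blueprints:
        if model_type_contains and model_type_contains.lower() not in bp.model_type.lower():
            continue
        processes = [p.lower() for p in bp.processes]
        # Every required fragment must appear in at least one process step.
        if all(any(req.lower() in p for p in processes) for req in required_processes):
            selected.append(bp)
    return selected


# Stand-in blueprints; real code would pass new_proj.get_blueprints().
fake_blueprints = [
    SimpleNamespace(id="bp1", model_type="Neural Network Classifier",
                    processes=["One-Hot Encoding", "Standardize", "Neural Network Classifier"]),
    SimpleNamespace(id="bp2", model_type="Neural Network Classifier",
                    processes=["Ordinal Encoding", "Neural Network Classifier"]),
    SimpleNamespace(id="bp3", model_type="Example Tree Classifier",
                    processes=["One-Hot Encoding", "Example Tree Classifier"]),
]
chosen = filter_blueprints(fake_blueprints, "neural network", ["one-hot"])
print([bp.id for bp in chosen])
```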

We can then train individual blueprints on a chosen sample percentage, with a specific feature list if desired. The "wait_for_async_model_creation" call below is commented out; uncommenting it blocks until each model finishes, so the models train one at a time.

In [None]:
for blueprint in dr_nn:
    model_job_id = new_proj.train(blueprint, sample_pct=64)
    # Uncomment to wait for each model (helpful if your code progresses automatically)
    # dr_model = wait_for_async_model_creation(
    #     project_id=new_proj.id,
    #     model_job_id=model_job_id,
    # )

At this point, you can treat the Leaderboard as usual: grab models from it to compare, or run Autopilot if you want to do that at this stage.

In [None]:
new_proj.get_models()

# Hyperparameter Tuning

Hyperparameter tuning is a common use case for the API. To start, we'll create a new tuning session and understand which tasks are available for tuning within this Blueprint. 

In [None]:
#grab the top model on the leaderboard
model = new_proj.get_models()[0]
model
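
Taking the first model relies on the order in which the Leaderboard is returned. If you would rather rank by a metric of your own choosing, a sketch like the one below may help. It assumes each model carries a `metrics` dictionary shaped like `{"LogLoss": {"validation": 0.7, ...}}`, which is my understanding of the client's model objects rather than something this notebook shows; `rank_models` and the stand-in scores are made up for illustration.

```python
from types import SimpleNamespace


def rank_models(models, metric, partition="validation", higher_is_better=False):
    """Sort models by one metric/partition, skipping models without a score."""
    scored = [
        m for m in models
        if (m.metrics.get(metric) or {}).get(partition) is not None
    ]
    return sorted(scored, key=lambda m: m.metrics[metric][partition],
                  reverse=higher_is_better)


# Stand-in models with invented scores; real code would pass new_proj.get_models().
fake_models = [
    SimpleNamespace(id="m1", metrics={"LogLoss": {"validation": 0.9}}),
    SimpleNamespace(id="m2", metrics={"LogLoss": {"validation": 0.7}}),
    SimpleNamespace(id="m3", metrics={"LogLoss": {"validation": None}}),
]
best_first = rank_models(fake_models, "LogLoss")
print([m.id for m in best_first])
```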

In [None]:
tune = model.start_advanced_tuning_session()
tasks = tune.get_task_names()
tasks

Next, we can see which parameters of each task are available for tuning.

In [None]:
for task in tasks:
    pp.pprint(task)
    pp.pprint(tune.get_parameter_names(task))
    print()

DataRobot also provides helpful information about the constraints on each parameter, that is, which values can be provided for it.

In [None]:
param_details = model.get_advanced_tuning_parameters()["tuning_parameters"]
param_list = [x["parameter_name"] for x in param_details]
pp.pprint(param_details[param_list.index("n_hidden_units")])
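
Before submitting a tuning job, it can be handy to check a candidate value against these constraints locally. The sketch below assumes the `constraints` entry is a dictionary keyed by type, such as `{"int": {"min": ..., "max": ...}}` or `{"select": {"values": [...]}}`. That shape is an assumption on my part (compare it with what the cell above prints), and the sample dictionaries are hand-written rather than real output.

```python
def is_allowed(constraints, value):
    """Return True if value satisfies at least one constraint type (assumed shape)."""
    select = constraints.get("select")
    if select is not None and value in select.get("values", []):
        return True
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    for kind, accepted in (("int", (int,)), ("float", (int, float))):
        bounds = constraints.get(kind)
        if bounds is None or not is_number or not isinstance(value, accepted):
            continue
        low = bounds.get("min", float("-inf"))
        high = bounds.get("max", float("inf"))
        if low <= value <= high:
            return True
    return False


# Hand-written sample constraints for illustration only.
hidden_units_constraints = {"int": {"min": 1, "max": 1000}}
stemmer_constraints = {"select": {"values": ["snowball", "example_option"]}}

print(is_allowed(hidden_units_constraints, 20))
print(is_allowed(stemmer_constraints, "snowball"))
```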

In [None]:
tune.set_parameter(parameter_name="max_ngram", value=3)
tune.set_parameter(parameter_name="stemmer", value='snowball')
tune.set_parameter(parameter_name="n_hidden_units", value=20)
job = tune.run()

This is how you would retrieve an existing project by its ID:

In [None]:
proj2 = dr.Project.get("<project-id-here>")

Search through the available projects by name with a search parameter:

In [None]:
dr.Project.list(search_params={"project_name": "Lending Club"})

Filter the returned models by sample percentage, here keeping only those trained on 64% of the data, which excludes the models trained on 100%:

In [None]:
new_proj.get_models(search_params={"sample_pct": 64})

# Share Projects for Collaboration

We often want to provision access to a group of users or otherwise change permissions. This can be done using the API.

In [None]:
new_proj.get_status()

In [None]:
# Grant read-only access to one user
access_list = [
    dr.SharingAccess(username="timothy.whitaker@datarobot.com", role=dr.enums.SHARING_ROLE.READ_ONLY)
]
new_proj.share(access_list)

In [None]:
for user in new_proj.get_access_list():
    print("user: {} \nrole:{}\n".format(user.username, user.role))

The same call can be used to remove permissions if needed, by setting the role to None.

In [None]:
# Passing role=None revokes that user's access
access_list = [
    dr.SharingAccess(username="cory.kind@datarobot.com", role=None)
]
new_proj.share(access_list)

for user in new_proj.get_access_list():
    print("user: {} \nrole:{}\n".format(user.username, user.role))
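
When managing access for a whole group, it can help to compute only the changes needed to move from the current access list to a desired one. The sketch below does this with plain dictionaries mapping username to role, using `None` to mean "revoke", as in the cell above. `plan_access_changes` is a hypothetical helper, and the usernames and role strings are placeholders.

```python
def plan_access_changes(current, desired):
    """Return {username: role or None} describing the updates to apply.

    current and desired map username -> role; None in the result means revoke.
    """
    changes = {}
    for username, role in desired.items():
        if current.get(username) != role:
            changes[username] = role  # new user or changed role
    for username in current:
        if username not in desired:
            changes[username] = None  # no longer wanted: revoke
    return changes


# Placeholder data; real code could build `current` from new_proj.get_access_list().
current_access = {
    "owner@example.com": "OWNER",
    "a@example.com": "READ_ONLY",
    "b@example.com": "READ_ONLY",
}
desired_access = {
    "owner@example.com": "OWNER",
    "a@example.com": "READ_WRITE",
    "c@example.com": "READ_ONLY",
}
print(plan_access_changes(current_access, desired_access))
```

Each entry in the result could then be turned into a `dr.SharingAccess(username=..., role=...)` and passed to `new_proj.share`. Keep the project owner in the desired mapping so that their access is not revoked by accident.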

# Deploy a Model 

Once we finalize our model, we can deploy it using the API. First, it helps to look up the prediction servers available to us.

In [None]:
# Get your prediction servers and use the first one
prediction_servers = dr.PredictionServer.list()
prediction_server = prediction_servers[0]
prediction_server

In [None]:
#Create a new deployment
deployment = dr.Deployment.create_from_learning_model(model.id, 
                                                      label="lending club multiclass deployment2", 
                                                      description='Deployment for LC multiclass demo',
                                                      default_prediction_server_id=prediction_server.id)

In [None]:
deployment.id

I can edit my deployment to turn on additional functionality, such as Data Drift.

In [None]:
deployment.update_drift_tracking_settings(feature_drift_enabled=True)

I can also replace my model with a different version in the event of a refresh.

In [None]:
another_model = new_proj.get_models()[2]
deployment.replace_model(another_model.id, "ACCURACY")