## Starting a Project with Selected Blueprints

**Author**: Thodoris Petropoulos

**Label**: Modeling Options

### Scope
This notebook shows how to start a DataRobot project manually, so that you choose which models/blueprints to run instead of letting Autopilot decide. The procedure below should work for any type of problem you are trying to solve (regression, classification, time series, etc.).

### Requirements

- Python version 3.7.3
- DataRobot API version 2.19.0

Small adjustments might be needed depending on the Python version and DataRobot API version you are using.

Full documentation of the Python package can be found here: https://datarobot-public-api-client.readthedocs-hosted.com

#### Import Libraries

In [27]:
import datarobot as dr
import pandas as pd
import numpy as np
import time

#### Import Dataset
We will load the breast cancer dataset, a very simple binary classification dataset that is available through scikit-learn.

In [2]:
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

df = pd.DataFrame(np.c_[data['data'], data['target']],
                  columns= np.append(data['feature_names'], ['target']))
df.head()

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension,target
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,0.0
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,0.0
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,0.0
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,0.0
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,0.0


#### Connect to DataRobot
Connect to DataRobot using your API key and your endpoint. Replace the placeholder values below with your own.

In [3]:
dr.Client(token='YOUR_API_KEY', 
          endpoint='YOUR_DATAROBOT_HOSTNAME')

<datarobot.rest.RESTClientObject at 0x11fc81cf8>
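
If you would rather not hardcode the API key in the notebook, one option is to read it from environment variables. The sketch below is a minimal illustration using only the standard library. The helper `load_datarobot_credentials` is hypothetical, and the variable names `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT` are assumptions; adjust them to whatever your environment uses.

```python
import os


def load_datarobot_credentials(env=None):
    """Return (token, endpoint) read from an environment-style mapping.

    The variable names below are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN")
    endpoint = env.get("DATAROBOT_ENDPOINT")
    # Collect the names of any variables that are unset or empty
    missing = [name for name, value in
               (("DATAROBOT_API_TOKEN", token), ("DATAROBOT_ENDPOINT", endpoint))
               if not value]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return token, endpoint


# Usage in the notebook (requires the datarobot package and network access):
# token, endpoint = load_datarobot_credentials()
# dr.Client(token=token, endpoint=endpoint)
```

Failing early with a clear message is usually easier to debug than an authentication error raised later by the client.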

#### Initiate project
We will initiate the project with <code>autopilot_on = False</code>. This way, DataRobot will not start building models until we specify which ones we want to build.

In [4]:
project = dr.Project.start(project_name='MyBinaryClassificationProject',
                           sourcedata=df,
                           autopilot_on=False,
                           target='target')

#### Find all of the blueprints
We can use the <code>get_blueprints</code> method to see all of the blueprints DataRobot generated.

In [22]:
blueprints = project.get_blueprints()

# Now that we have the blueprints, we can search for specific ones,
# for example all blueprints that have "Gradient" in their model type.

models_to_run = []
for blueprint in blueprints:
    if 'Gradient' in blueprint.model_type:
        models_to_run.append(blueprint)
        

In [23]:
models_to_run

[Blueprint(Gradient Boosted Trees Classifier),
 Blueprint(Stochastic Gradient Descent Classifier),
 Blueprint(Light Gradient Boosted Trees Classifier with Early Stopping),
 Blueprint(Gradient Boosted Greedy Trees Classifier),
 Blueprint(eXtreme Gradient Boosted Trees Classifier with Early Stopping),
 Blueprint(eXtreme Gradient Boosted Trees Classifier),
 Blueprint(eXtreme Gradient Boosted Trees Classifier (learning rate =0.01)),
 Blueprint(Light Gradient Boosting on ElasticNet Predictions ),
 Blueprint(eXtreme Gradient Boosted Trees Classifier),
 Blueprint(eXtreme Gradient Boosted Trees Classifier (learning rate =0.01)),
 Blueprint(eXtreme Gradient Boosted Trees Classifier (learning rate =0.01)),
 Blueprint(eXtreme Gradient Boosted Trees Classifier with Unsupervised Learning Features),
 Blueprint(eXtreme Gradient Boosted Trees Classifier),
 Blueprint(eXtreme Gradient Boosted Trees Classifier with Early Stopping),
 Blueprint(Gradient Boosted Trees Classifier),
 Blueprint(eXtreme Gradien
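
The substring check above is case-sensitive and handles a single keyword. A possible generalization is sketched below: a hypothetical helper `select_blueprints` that matches any of several keywords regardless of case. It only relies on each blueprint having a `model_type` attribute, as used in the loop above. `FakeBlueprint` and its `id` values are stand-ins invented for this example so it can run without a DataRobot connection.

```python
from collections import namedtuple


def select_blueprints(blueprints, keywords):
    """Return blueprints whose model_type contains any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [bp for bp in blueprints
            if any(keyword in bp.model_type.lower() for keyword in lowered)]


# Stand-in objects for this sketch only; real blueprints come from project.get_blueprints()
FakeBlueprint = namedtuple("FakeBlueprint", ["id", "model_type"])
demo_blueprints = [
    FakeBlueprint("a1", "Gradient Boosted Trees Classifier"),
    FakeBlueprint("b2", "Some Other Classifier"),  # made-up name for illustration
    FakeBlueprint("c3", "eXtreme Gradient Boosted Trees Classifier"),
]

selected = select_blueprints(demo_blueprints, ["gradient"])
print([bp.model_type for bp in selected])
```

With a real project, the call would be `select_blueprints(project.get_blueprints(), ["gradient"])`.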

#### Let's now initiate these models
We can use the <code>train</code> method to initiate modeling for a specific blueprint. By default, the feature list used is the <code>informative features</code> list produced by DataRobot, but you can define your own feature list and pass its ID through the <code>featurelist_id</code> parameter.

In [24]:
for model in models_to_run:
    project.train(model, sample_pct = 80, featurelist_id=None)

#### Waiting for job completion
We can poll the <code>get_all_jobs</code> method until it returns no jobs, which means the models have finished running.

In [28]:
# Poll until no jobs are queued or running
while len(project.get_all_jobs()) > 0:
    time.sleep(1)
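
The loop above waits indefinitely and polls every second. A variant with a longer polling interval and a timeout might look like the sketch below. The helper `wait_for_jobs` and its default values are assumptions chosen for illustration, and `_CountdownProject` is a stand-in that mimics a queue draining so the example runs without a DataRobot connection.

```python
import time


def wait_for_jobs(project, poll_interval=5.0, timeout=3600.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Block until project.get_all_jobs() is empty, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        remaining = len(project.get_all_jobs())
        if remaining == 0:
            return
        if clock() >= deadline:
            raise TimeoutError(
                f"{remaining} job(s) still pending after {timeout} seconds")
        sleep(poll_interval)


# Offline stand-in: reports one fewer pending job on every call
class _CountdownProject:
    def __init__(self, pending):
        self.pending = pending
        self.polls = 0

    def get_all_jobs(self):
        self.polls += 1
        jobs = ["job"] * self.pending
        self.pending = max(0, self.pending - 1)
        return jobs


demo_queue = _CountdownProject(pending=2)
wait_for_jobs(demo_queue, poll_interval=0.0, timeout=10.0)
print("polls needed:", demo_queue.polls)
```

With a real project, `wait_for_jobs(project)` would replace the `while` loop. The `sleep` and `clock` parameters exist only to make the helper easy to test.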