## Starting a Regression Project
### Scope
This notebook shows how to initiate a DataRobot project for a numerical target using the Python API.

### Background
Regression Analysis is the task of predicting the value of a continuous target column.

Examples:

- Predicting the lifetime value (LTV) of a customer.
- Predicting player performance.
- Predicting house prices.

The target column is normally a continuous numeric variable, although regression can also be applied to a discrete variable with high cardinality.
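
As a rough illustration of the distinction above, the following sketch summarises a candidate target column: whether it is numeric, and how many distinct values it holds. The helper name `describe_target` and the sample values are hypothetical and are not part of the DataRobot API; what counts as "high cardinality" is left to your judgement.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def describe_target(series: pd.Series) -> dict:
    """Summarise a candidate regression target (hypothetical helper)."""
    n_rows = max(len(series), 1)  # avoid dividing by zero on an empty column
    n_unique = int(series.nunique())
    return {
        "is_numeric": bool(is_numeric_dtype(series)),
        "n_unique": n_unique,
        "unique_ratio": n_unique / n_rows,
    }


# A continuous target: every value is distinct.
prices = pd.Series([24.0, 21.6, 34.7, 33.4, 36.2])
# A discrete numeric target: few distinct values relative to the row count.
counts = pd.Series([1, 2, 2, 3, 3, 3])

print(describe_target(prices))
print(describe_target(counts))
```

A numeric column with a high ratio of distinct values is a natural regression target; a numeric column with only a handful of distinct values may be better framed as classification.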

### Requirements

- Python version 3.7.3
- DataRobot API version 2.19.0

Small adjustments might be needed depending on the Python version and DataRobot API version you are using.

Full documentation of the Python package can be found here: https://datarobot-public-api-client.readthedocs-hosted.com/en/

#### Import Libraries

In [1]:
import datarobot as dr
import pandas as pd
import numpy as np

#### Import Dataset
We will load the Boston Housing dataset, a very simple regression dataset available through scikit-learn. Note that <code>load_boston</code> was removed in scikit-learn 1.2, so this cell requires an earlier release.

In [2]:
from sklearn.datasets import load_boston
data = load_boston()

df = pd.DataFrame(np.c_[data['data'], data['target']],
                  columns= np.append(data['feature_names'], ['target']))
df.head()

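|   | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0.0 | 0.538 | 6.575 | 65.2 | 4.09 | 1.0 | 296.0 | 15.3 | 396.9 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0.0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2.0 | 242.0 | 17.8 | 396.9 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0.0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2.0 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0.0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3.0 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0.0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3.0 | 222.0 | 18.7 | 396.9 | 5.33 | 36.2 |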
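
The DataFrame construction in the cell above relies on two NumPy idioms that are easy to misread. The sketch below reproduces them on a toy array; the values and the names `f1` and `f2` are made up for illustration and stand in for `data['data']`, `data['target']` and `data['feature_names']`.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the feature matrix, target vector and feature names.
features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
target = np.array([0.5, 1.5, 2.5])
feature_names = np.array(["f1", "f2"])

# np.c_ places the 1-D target as an extra column to the right of the 2-D features.
combined = np.c_[features, target]

# np.append extends the array of feature names with the target column name.
columns = np.append(feature_names, ["target"])

toy_df = pd.DataFrame(combined, columns=columns)
print(toy_df)
```

The result is a single frame in which the last column holds the target, which is the layout passed to DataRobot later in the notebook.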


#### Connect to DataRobot
Connect to DataRobot using your API key and the API endpoint of your DataRobot installation. Replace the placeholder values below with your own.

In [None]:
dr.Client(token='YOUR_API_KEY', 
          endpoint='YOUR_DATAROBOT_HOSTNAME')
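
To keep the API key out of the notebook itself, one option is to read the credentials from environment variables. The following is a minimal sketch under that assumption: the helper names `load_datarobot_credentials` and `connect` are hypothetical, and the variable names `DATAROBOT_API_TOKEN` and `DATAROBOT_ENDPOINT` are a convention assumed here, so adjust them to match your own setup.

```python
import os


def load_datarobot_credentials(env=os.environ):
    """Read the API token and endpoint from environment variables (assumed names)."""
    token = env.get("DATAROBOT_API_TOKEN")
    endpoint = env.get("DATAROBOT_ENDPOINT")
    missing = [
        name
        for name, value in (
            ("DATAROBOT_API_TOKEN", token),
            ("DATAROBOT_ENDPOINT", endpoint),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"token": token, "endpoint": endpoint}


def connect():
    """Create the DataRobot client from the environment."""
    import datarobot as dr  # imported lazily so the helper above works without it

    return dr.Client(**load_datarobot_credentials())
```

Calling `connect()` then replaces the hard-coded `dr.Client(...)` call, and a missing variable fails early with a clear message instead of an authentication error.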

#### Initiate Project
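We will initiate a project by calling the method <code>dr.Project.start</code>. Its main arguments are listed below; <code>worker_count</code> and <code>metric</code> are optional and are left at their defaults in this example:
* project_name: Name of the project
* sourcedata: Data source (path to a file or a pandas DataFrame)
* target: String with the name of the target variable
* worker_count: Number of workers to use
* metric: Optimisation metric to use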

In [None]:
project = dr.Project.start(project_name='MyRegressionProject',
                           sourcedata=df,
                           target='target')
project.wait_for_autopilot()  # Wait for Autopilot to complete
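
The call above leaves `worker_count` and `metric` at their defaults. The sketch below shows one way to pass them explicitly, after checking that the target column exists in the data. The helper names `build_start_kwargs` and `start_regression_project` are hypothetical, and the values `4` and `'RMSE'` in the usage note are illustrative assumptions; the metrics you can choose depend on your project and DataRobot version.

```python
import pandas as pd


def build_start_kwargs(df, target, project_name, worker_count=None, metric=None):
    """Assemble keyword arguments for dr.Project.start (hypothetical helper)."""
    if target not in df.columns:
        raise ValueError(f"Target column {target!r} not found in the data")
    kwargs = {"sourcedata": df, "target": target, "project_name": project_name}
    # Only pass the optional settings when they were supplied explicitly.
    if worker_count is not None:
        kwargs["worker_count"] = worker_count
    if metric is not None:
        kwargs["metric"] = metric
    return kwargs


def start_regression_project(df, **options):
    """Start the project and block until Autopilot finishes."""
    import datarobot as dr  # requires a configured dr.Client

    project = dr.Project.start(**build_start_kwargs(df, **options))
    project.wait_for_autopilot()
    return project
```

For example, `start_regression_project(df, target='target', project_name='MyRegressionProject', worker_count=4, metric='RMSE')` would start the same project with four workers and RMSE as the optimisation metric.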