## Starting a Binary Classification Project

**Author**: Thodoris Petropoulos

**Label**: Modeling Options
### Scope
This notebook shows how to start a DataRobot project with a binary classification target using the Python API.

### Background
Binary classification is the task of assigning each element of a set to one of two groups (classes).

Examples:

- Whether a customer will churn.
- Whether a loan will default.
- Whether a patient has a disease.

The target column most commonly contains one of these value pairs:

- 0/1
- Yes/No
- True/False
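If your own target column mixes these spellings (for example `"Yes"`, `"yes"` and `"No"`), a small helper can map them to a consistent 0/1 encoding and confirm there are exactly two classes before you upload anything. This is a minimal sketch: `to_binary_labels` and the accepted spellings are our own illustrative choices, not part of the DataRobot API, and treating "yes"/"true"/1 as the positive class is an assumption you may want to change.

```python
# Hypothetical helper: normalise common binary label spellings to ints 1/0.
POSITIVE = {"1", "yes", "true"}
NEGATIVE = {"0", "no", "false"}


def _normalise(value):
    """Map a single label such as True, 1.0 or " No " to 1 or 0."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (int, float)) and value in (0, 1):
        return int(value)
    text = str(value).strip().lower()
    if text in POSITIVE:
        return 1
    if text in NEGATIVE:
        return 0
    raise ValueError(f"Unrecognised label: {value!r}")


def to_binary_labels(values):
    """Return a list of 0/1 labels; raise if there are not exactly two classes."""
    labels = [_normalise(v) for v in values]
    if len(set(labels)) != 2:
        raise ValueError("Target must contain exactly two classes")
    return labels
```

For example, `to_binary_labels(["Yes", "No", "yes"])` returns `[1, 0, 1]`.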

### Requirements

- Python version 3.7.3
- DataRobot API version 2.19.0

Small adjustments might be needed depending on the Python and DataRobot API versions you are using.

Full documentation of the Python package can be found here: https://datarobot-public-api-client.readthedocs-hosted.com/en/
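Because behaviour can shift between client releases, it can help to compare the installed client against the version listed above. A minimal sketch, assuming the package exposes `datarobot.__version__` (if it does not, the helper reports the version as unknown). `version_tuple` and `check_client` are hypothetical helpers, and the simple numeric comparison ignores pre-release ordering.

```python
# Client version this notebook was written against (from the Requirements list)
REQUIRED_CLIENT = "2.19.0"


def version_tuple(version):
    """Turn '2.19.0' into (2, 19, 0).

    Only the leading digits of the first three components are kept, so a
    pre-release such as '3.0.0b1' compares as (3, 0, 0). Missing components
    are padded with zeros ('2.19' -> (2, 19, 0)).
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def installed_client_version():
    """Return the installed datarobot version string, or None if unavailable."""
    try:
        import datarobot
    except ImportError:
        return None
    return getattr(datarobot, "__version__", None)


def check_client(required=REQUIRED_CLIENT):
    """Return a short message comparing the installed client with `required`."""
    installed = installed_client_version()
    if installed is None:
        return "datarobot is not installed (or its version is unknown)"
    if version_tuple(installed) < version_tuple(required):
        return f"datarobot {installed} is older than {required}; some calls may differ"
    return f"datarobot {installed} is {required} or newer"
```

Calling `print(check_client())` at the top of the notebook gives an early warning before any API calls are made.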

#### Import Libraries

```python
import datarobot as dr
import pandas as pd
import numpy as np
```

#### Import Dataset
We will load the breast cancer dataset, a simple binary classification dataset that ships with scikit-learn.

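```python
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()

# Stack the feature matrix and the target vector side by side.
# np.c_ produces a single float array, so the target becomes 0.0/1.0.
df = pd.DataFrame(np.c_[data['data'], data['target']],
                  columns=np.append(data['feature_names'], ['target']))
df.head()
```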

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0.0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0.0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0.0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0.0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0.0 |
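Before uploading, it is worth confirming that the target really has two classes and seeing how imbalanced they are. The sketch below uses only the standard library, so it works on any iterable of labels, including `df['target']`; the helper name `class_balance` is our own. In this dataset `data['target_names']` lists the classes in label order (0 is malignant, 1 is benign), and scikit-learn documents 212 malignant and 357 benign samples.

```python
from collections import Counter


def class_balance(labels):
    """Return (counts, minority_fraction) for a binary target.

    Raises ValueError if the labels do not contain exactly two classes.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError(f"Expected 2 classes, found {len(counts)}: {list(counts)}")
    total = sum(counts.values())
    # Share of rows belonging to the less frequent class
    minority_fraction = min(counts.values()) / total
    return dict(counts), minority_fraction
```

Running `counts, minority = class_balance(df['target'])` on the DataFrame above should report 212 zeros and 357 ones.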


#### Connect to DataRobot
Connect to DataRobot with your API token and endpoint URL.

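```python
# Replace the placeholders with your API token and your DataRobot endpoint,
# e.g. https://app.datarobot.com/api/v2 for the managed cloud
dr.Client(token='YOUR_API_KEY',
          endpoint='https://YOUR_DATAROBOT_HOSTNAME/api/v2')
```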
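Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export the token and endpoint as environment variables: the variable names and the `client_kwargs_from_env` helper are our own choices, and appending `/api/v2` assumes your endpoint follows the usual `https://<host>/api/v2` form.

```python
import os


def client_kwargs_from_env(token_var="DATAROBOT_API_TOKEN",
                           endpoint_var="DATAROBOT_ENDPOINT"):
    """Build keyword arguments for dr.Client from environment variables."""
    token = os.environ.get(token_var)
    endpoint = os.environ.get(endpoint_var)
    missing = [name for name, value in ((token_var, token), (endpoint_var, endpoint))
               if not value]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    endpoint = endpoint.rstrip("/")
    # Assumption: the API is served under /api/v2 on your host
    if not endpoint.endswith("/api/v2"):
        endpoint += "/api/v2"
    return {"token": token, "endpoint": endpoint}
```

You would then connect with `dr.Client(**client_kwargs_from_env())`, keeping the secret out of the notebook file.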

#### Initiate Project
We start the project by calling `dr.Project.start`, which accepts these arguments:
* `project_name`: Name of the project
* `sourcedata`: Data source (a path to a file or a pandas DataFrame)
* `target`: Name of the target column
* `worker_count`: Number of workers to use (optional)
* `metric`: Optimisation metric to use (optional)
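One way to keep the optional arguments tidy is to collect them in a dict and only include the ones you actually set, so DataRobot falls back to its own defaults for the rest. A minimal sketch: `build_start_kwargs` is a hypothetical helper, it checks only that the target column exists, and the data itself is still passed separately as `sourcedata`. Any `metric` string must be one DataRobot offers for your target.

```python
def build_start_kwargs(columns, target, project_name,
                       worker_count=None, metric=None):
    """Collect keyword arguments for dr.Project.start (hypothetical helper).

    Optional arguments left as None are omitted so DataRobot applies its
    own defaults.
    """
    if target not in columns:
        raise ValueError(f"Target column {target!r} not found in data")
    kwargs = {"project_name": project_name, "target": target}
    if worker_count is not None:
        kwargs["worker_count"] = worker_count
    if metric is not None:
        kwargs["metric"] = metric
    return kwargs
```

Usage would look like `dr.Project.start(sourcedata=df, **build_start_kwargs(df.columns, 'target', 'MyBinaryClassificationProject'))`.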

```python
project = dr.Project.start(project_name='MyBinaryClassificationProject',
                           sourcedata=df,
                           target='target')
# worker_count and metric are omitted here, so DataRobot uses its defaults

project.wait_for_autopilot()  # Block until Autopilot has finished building models
```
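Once Autopilot finishes, the usual next step is to pick the top model from the leaderboard. A minimal sketch, assuming each model returned by `project.get_models()` exposes a `metrics` dict keyed by metric name with per-partition scores such as `'validation'`; verify this shape against your client version. Whether lower or higher is better depends on the metric (lower for LogLoss, higher for AUC), so it is a parameter; `best_model` is a hypothetical helper.

```python
def best_model(models, metric, partition="validation", higher_is_better=False):
    """Pick the model with the best `partition` score for `metric`.

    Assumes each model has a `metrics` dict such as
    {"LogLoss": {"validation": 0.21, "crossValidation": 0.22}}.
    Models without a score for that partition are skipped.
    """
    def score(model):
        return (model.metrics.get(metric) or {}).get(partition)

    scored = [m for m in models if score(m) is not None]
    if not scored:
        raise ValueError(f"No model has a {partition!r} score for {metric!r}")
    return max(scored, key=score) if higher_is_better else min(scored, key=score)
```

With the project above, this might be called as `top = best_model(project.get_models(), project.metric)`.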