## ***Machine learning*** 

Machine learning is the science and art of giving computers the ability to learn to make decisions from data without being explicitly programmed.

### 1. When there are labels present, we call it ***supervised learning***. 
    
The aim of supervised learning is to build a model that is able to predict the target variable. 

> feature = predictor variable = independent variable

> target variable = dependent variable = response variable
            
* If the target variable consists of categories, *like 'click' or 'no click', 'spam' or 'not spam', or different species of flowers*, we call the learning task a ***classification task***.

> Example: to label budget line items correctly, train a model on correctly labeled examples to predict the probability of each possible label, then take the most probable label as the prediction.
            
* If the target is a continuously varying variable, *for example, the price of a house*, it is a ***regression task***.
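
The "predict a probability for each label, then take the most probable one" idea can be sketched as follows. This is only an illustration: the Iris dataset, `LogisticRegression`, and `max_iter=1000` are choices made here, not something the notes prescribe.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris: 4 numeric features, 3 flower species as the target variable.
X, y = load_iris(return_X_y=True)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X)

# Picking the column with the highest probability gives the predicted label.
most_probable = clf.classes_[np.argmax(proba, axis=1)]
```

Here `most_probable` should match what `clf.predict(X)` returns, since `predict` picks the highest-scoring class as well.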

Supervised learning needs labeled data. How do you get it?
1. Historical data that already has the labels you are interested in.
2. Experiments that produce labeled data, such as A/B testing to see how many clicks you get.
3. Crowdsourced labeling, as reCAPTCHA does for text recognition.

Libraries in Python:
* scikit-learn/sklearn
* TensorFlow
* Keras

### 2. When there are no labels present, we call it ***unsupervised learning***.
    
It consists of uncovering hidden patterns and structures in unlabeled data.
            
* ***Clustering*** is one branch of unsupervised learning: it groups data into distinct categories based on their features, without knowing in advance what these categories may be.
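
A minimal clustering sketch, assuming k-means as the algorithm and synthetic data from `make_blobs` (both are illustrative choices, as are `n_clusters=3` and the random seeds):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points drawn around 3 centers; the true labels are discarded.
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# k-means only sees the features and assigns each point to one of 3 clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# One centroid per cluster, in the same feature space as X.
centers = kmeans.cluster_centers_
```

Note that the number of clusters still has to be chosen up front here; the algorithm discovers which points belong together, not how many groups exist.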

### 3. When machines or software agents interact with an environment, it is ***reinforcement learning***. 
    
Reinforcement learning agents automatically figure out how to optimize their behavior given a system of rewards and punishments. The approach draws inspiration from behavioral psychology.
            
* ***Deep learning***

## Useful imports in scikit-learn:

#### process

* `from sklearn.pipeline import Pipeline`
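
As a rough sketch of what `Pipeline` is for, the example below chains a scaler and a classifier so that both are fitted together on the training data. The step names (`"scaler"`, `"knn"`), the Iris dataset, and the split parameters are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# fit() scales the training data, then trains the classifier on the result.
pipeline.fit(X_train, y_train)

# score() applies the already-fitted scaler to X_test before predicting.
accuracy = pipeline.score(X_test, y_test)
```

Bundling the steps this way keeps the scaler from ever being fitted on test data.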

#### datasets and data prep:

* `from sklearn import datasets`

* `from sklearn.model_selection import train_test_split`

* `from sklearn.impute import SimpleImputer` # replaces `sklearn.preprocessing.Imputer`, which was removed in scikit-learn 0.22

* `from sklearn.preprocessing import scale` # scaling the features can greatly improve models that are sensitive to feature magnitude, such as k-NN

* `from sklearn.preprocessing import StandardScaler`
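
A short data-prep sketch using two of these imports: split first, then fit the scaler on the training portion only. The dataset, `test_size=0.3`, and `random_state=42` are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Learn mean and standard deviation from the training set only...
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# ...and reuse those statistics on the test set.
X_test_scaled = scaler.transform(X_test)
```

After this, each column of `X_train_scaled` has mean 0 and unit variance, while `X_test_scaled` is only approximately standardized because it reuses the training statistics.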

#### Hyperparameter tuning 

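Tune hyperparameters on the training set only (typically with cross-validation), so that the hold-out set stays unseen for the final evaluation.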

* `from sklearn.model_selection import GridSearchCV`

* `from sklearn.model_selection import RandomizedSearchCV`
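
One way this might look in practice: a grid search over `n_neighbors` for k-NN, cross-validated on the training set, with the hold-out set touched only once at the end. The grid values, `cv=5`, and the dataset are assumptions for the sake of the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Candidate values to try; every combination is scored with 5-fold CV.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

# Only the training set is used for tuning.
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]

# By default the best model is refit on the full training set,
# so it can be scored directly on the hold-out set.
holdout_accuracy = search.score(X_test, y_test)
```

`RandomizedSearchCV` follows the same pattern but samples a fixed number of candidates (`n_iter`) instead of trying every combination.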

#### models:

##### classification

* `from sklearn.neighbors import KNeighborsClassifier` 

* `from sklearn.linear_model import LogisticRegression`

* `from sklearn.tree import DecisionTreeClassifier`

* `from sklearn.svm import SVC`

##### regression

* `from sklearn.linear_model import LinearRegression`

* `from sklearn.linear_model import Lasso`

* `from sklearn.linear_model import Ridge`

* `from sklearn.linear_model import ElasticNet`
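
To give a feel for how these regressors differ, the sketch below fits plain least squares, ridge, and lasso on the bundled diabetes dataset and compares coefficient sizes. The dataset and `alpha=1.0` are illustrative assumptions, not recommended settings.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LinearRegression, Ridge

X, y = load_diabetes(return_X_y=True)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: can set coefficients to 0

# Size of the coefficient vector under each model.
ols_l2 = np.linalg.norm(ols.coef_)
ridge_l2 = np.linalg.norm(ridge.coef_)
ols_l1 = np.abs(ols.coef_).sum()
lasso_l1 = np.abs(lasso.coef_).sum()

# Number of features the lasso dropped entirely.
n_zero = int(np.sum(lasso.coef_ == 0))
```

Because lasso can zero out coefficients, it is sometimes used as a rough feature selector; `ElasticNet` mixes both penalties through its `l1_ratio` parameter.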

#### metrics

Evaluation should be done on unseen data, the hold-out set.

* `from sklearn.metrics import classification_report, confusion_matrix`

* `from sklearn.metrics import roc_curve`

* `from sklearn.metrics import roc_auc_score`  

* `from sklearn.model_selection import cross_val_score` 

* `from sklearn.metrics import mean_squared_error`
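
A sketch of how the classification metrics above might be used together on a hold-out set. The breast cancer dataset, the scaled logistic regression model, and the split parameters are assumptions chosen only to have a binary problem to evaluate.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Hard labels for the confusion matrix and report...
y_pred = model.predict(X_test)
# ...and positive-class probabilities for the ROC-based metrics.
y_score = model.predict_proba(X_test)[:, 1]

cm = confusion_matrix(y_test, y_pred)            # rows: true, cols: predicted
report = classification_report(y_test, y_pred)   # precision, recall, F1 as text
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)
```

Printing `report` gives per-class precision, recall, and F1; plotting `tpr` against `fpr` gives the ROC curve whose area is `auc`.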

