# Linear and Logistic Regression
### Simple, yet powerful predictors



## Linear Regression:
### Predict continuous values
### Intuition
- Regression: predicting a continuous variable
- Problem statement:
    - given pairs of (x, y) points, create a model
        - input x, output y; goal: predict y given x
        - under the assumption that y depends linearly on x (and nothing else)
- Modelling function:
    - $ \tilde{y} = ax + b $
    - many samples $ (x_1, y_1), \dots, (x_m, y_m) $: for each sample
        - $ \tilde{y}_i = ax_i + b, \; i \in [1; m] $
    - many variables: $ \tilde{y} = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b \equiv a^T x + b $
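
With many samples and many variables, all predictions can be computed at once as a matrix-vector product. A minimal NumPy sketch, assuming the m samples are stacked as the rows of a matrix `X` of shape (m, n); the names `X`, `a`, `b` and `predict` are illustrative:

```python
import numpy as np

def predict(X, a, b):
    """Return y_tilde_i = a^T x_i + b for every row x_i of X."""
    # X: (m, n) samples, a: (n,) coefficients, b: scalar intercept
    return X @ a + b

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # m = 3 samples, n = 2 variables
a = np.array([0.5, -1.0])
b = 2.0
y_tilde = predict(X, a, b)
print(y_tilde)               # one prediction per sample
```

The single expression `X @ a + b` replaces a loop over the samples; `b` is broadcast to every row.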

## Training
- Loss function:
    - for each sample i, $ i \in [1, m] $:
    - $ d_i = (\tilde{y}_i - y_i)^2 $

- Total cost function:
    - also called simply "cost function"
    - $ J = \frac{1}{m} \sum\limits_{i=1}^m (\tilde{y}_i - y_i)^2 $
    - *J* depends on a, b, x, y; since the data (x, y) are fixed, we treat *J* as a function of the parameters a, b
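
The cost function translates directly into NumPy: square the per-sample differences $d_i$, then average them. A small sketch with made-up numbers (the helper name `cost` is illustrative):

```python
import numpy as np

def cost(y_tilde, y):
    """Mean squared error J = (1/m) * sum_i (y_tilde_i - y_i)^2."""
    d = (y_tilde - y) ** 2      # per-sample losses d_i
    return d.mean()             # average over the m samples

y = np.array([1.0, 2.0, 3.0])          # actual values
y_tilde = np.array([1.5, 2.0, 2.0])    # model predictions
J = cost(y_tilde, y)
print(J)
```

Here the per-sample losses are 0.25, 0 and 1, so J is their mean, 1.25 / 3. scikit-learn's `sklearn.metrics.mean_squared_error` computes the same quantity.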
    
- Training process:
    - minimize the cost function:
        - we're looking for the parameters a, b that lead to the minimum of J
        - written as $ \arg\min\limits_{a,b} J $
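
For linear regression specifically, $\arg\min J$ also has a closed-form solution (ordinary least squares). As a sketch, not necessarily how the lecture computes it, `np.linalg.lstsq` finds a and b directly once a column of ones is appended to absorb b; the data below is made up:

```python
import numpy as np

# Illustrative noiseless data generated from y = 3x - 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x - 1

# Design matrix [x, 1]: the column of ones multiplies the intercept b
A = np.column_stack([x, np.ones_like(x)])

# Least squares minimizes sum_i (a*x_i + b - y_i)^2, hence also J
(a, b), residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(a, b)
```

Gradient descent reaches the same minimum iteratively, which matters when the closed form is too expensive or does not exist (as for logistic regression).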

## Gradient Descent
- Input a, b; output J (for fixed data, J is a function of a and b)
- The plot of J(a, b) is a paraboloid (3D parabola)
    - it has exactly one minimum
        - and we can see it on the plot
- Intuition:
    - if the plot were a real object (say, a sheet of some sort), we could slide a ball bearing on it
    - after a while, the ball bearing would settle at the 'bottom' due to gravity
    - we can 'simulate' this: **gradient descent**
- Reminder: gradient:
    - multi-dimensional derivative:

$$ \nabla J = \begin{pmatrix} \frac{\partial J}{\partial a} \\ \frac{\partial J}{\partial b} \end{pmatrix} $$
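
Differentiating $J = \frac{1}{m}\sum_{i=1}^m (a x_i + b - y_i)^2$ gives $\frac{\partial J}{\partial a} = \frac{2}{m}\sum_{i=1}^m (\tilde{y}_i - y_i)\,x_i$ and $\frac{\partial J}{\partial b} = \frac{2}{m}\sum_{i=1}^m (\tilde{y}_i - y_i)$. Gradient descent repeatedly steps against the gradient: $a \leftarrow a - \alpha \frac{\partial J}{\partial a}$, $b \leftarrow b - \alpha \frac{\partial J}{\partial b}$. A minimal sketch for one input variable; the learning rate `alpha`, the number of iterations and the starting point are assumptions chosen for this toy data, not values from the lecture:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.05, n_iter=2000):
    """Fit y_tilde = a*x + b by gradient descent on J = mean((y_tilde - y)^2)."""
    a, b = 0.0, 0.0                            # arbitrary starting point on the paraboloid
    m = len(x)
    for _ in range(n_iter):
        error = (a * x + b) - y                # y_tilde_i - y_i for every sample
        grad_a = 2.0 / m * np.sum(error * x)   # dJ/da
        grad_b = 2.0 / m * np.sum(error)       # dJ/db
        a -= alpha * grad_a                    # step downhill, against the gradient
        b -= alpha * grad_b
    return a, b

x = np.linspace(0, 5, 50)
y = 2 * x + 1                                  # illustrative noiseless data
a, b = gradient_descent(x, y)
print(a, b)
```

If `alpha` is too large the steps overshoot the bottom and diverge; if it is too small the ball bearing takes very long to settle.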

## DEMO
### Multiple Linear regression:

```python
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
```

```python
# whitespace-separated file without a header row; the raw string avoids an invalid-escape warning
housing = pd.read_csv("data/housing.data", header=None, sep=r"\s+")
housing.columns = ["crime_rate", "zoned_land", "industry", "bounds_river",
                   "nox_conc", "rooms", "age", "distance", "highways", "tax", "pt_ratio",
                   "b_estimator", "pop_status", "price"]
```

```python
housing
```

| | crime_rate | zoned_land | industry | bounds_river | nox_conc | rooms | age | distance | highways | tax | pt_ratio | b_estimator | pop_status | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 501 | 0.06263 | 0.0 | 11.93 | 0 | 0.573 | 6.593 | 69.1 | 2.4786 | 1 | 273.0 | 21.0 | 391.99 | 9.67 | 22.4 |
| 502 | 0.04527 | 0.0 | 11.93 | 0 | 0.573 | 6.120 | 76.7 | 2.2875 | 1 | 273.0 | 21.0 | 396.90 | 9.08 | 20.6 |
| 503 | 0.06076 | 0.0 | 11.93 | 0 | 0.573 | 6.976 | 91.0 | 2.1675 | 1 | 273.0 | 21.0 | 396.90 | 5.64 | 23.9 |
| 504 | 0.10959 | 0.0 | 11.93 | 0 | 0.573 | 6.794 | 89.3 | 2.3889 | 1 | 273.0 | 21.0 | 393.45 | 6.48 | 22.0 |


## Create a Model
- Same approach as in the 2D example, now with all 13 attributes as predictors

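```python
housing_model = LinearRegression()
# every column except the target is used as a predictor
predictor_attributes = housing.drop("price", axis = 1)
housing_model.fit(predictor_attributes, housing.price)
print(housing_model.coef_)       # one coefficient per attribute (a_1 ... a_n)
print(housing_model.intercept_)  # b
```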

```
[-1.08011358e-01  4.64204584e-02  2.05586264e-02  2.68673382e+00
 -1.77666112e+01  3.80986521e+00  6.92224640e-04 -1.47556685e+00
  3.06049479e-01 -1.23345939e-02 -9.52747232e-01  9.31168327e-03
 -5.24758378e-01]
36.4594883850897
```


```python
# predict the price of 10 random houses and compare with their actual prices
test_houses = housing.sample(10)
predicted = housing_model.predict(test_houses.drop("price", axis = 1))
print(predicted)
print(test_houses.price)
```

```
[23.37308644 19.29559075 18.52177132  6.4519857  27.41266734 26.12796681
 27.2136458  22.14837562 23.98742856 21.28152535]
64     33.0
396    12.5
422    20.8
388    10.2
91     22.0
504    22.0
288    22.3
349    26.6
62     22.2
50     19.7
Name: price, dtype: float64
```


## Regression with Outliers:
- As we saw, the data has outliers:
    - a few points which are far from the others
- Our goal is to exclude outliers:
    - there are several methods:
        - one very common - RANSAC (RANdom SAmple Consensus)
- Algorithm:
    1. Fit a model to a random subsample (assumed to contain only 'inliers')
    2. Test all data points and include those which are 'near' the model:
        - small enough error; the tolerance is provided by the developer
    3. Fit the model again, using all the included points
    4. Estimate the error of the refitted model and keep the best model found so far
    5. Repeat steps 1-4 until the performance reaches a threshold or a maximum number of iterations is reached
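
The steps above can be sketched from scratch for a straight line. This is a simplified illustration, not scikit-learn's exact implementation: the subsample size, tolerance, number of trials and the scoring rule (most inliers, ties broken by lower error) are assumptions, and the data is synthetic:

```python
import numpy as np

def ransac_line(x, y, n_sample=10, threshold=1.0, n_trials=100, seed=0):
    """Simplified RANSAC for a straight line y = a*x + b."""
    rng = np.random.default_rng(seed)
    best = None  # (number of inliers, error, a, b, inlier mask)
    for _ in range(n_trials):  # 5. repeat for a fixed number of trials
        # 1. fit a model to a random subsample, hoping it contains only inliers
        idx = rng.choice(len(x), size=n_sample, replace=False)
        a, b = np.polyfit(x[idx], y[idx], deg=1)
        # 2. include every point whose error is within the tolerance
        mask = np.abs(a * x + b - y) < threshold
        if mask.sum() < 2:
            continue  # too few points to refit a line
        # 3. fit the model again on all included points
        a, b = np.polyfit(x[mask], y[mask], deg=1)
        # 4. estimate the error of the refitted model on its inliers
        error = np.mean((a * x[mask] + b - y[mask]) ** 2)
        # keep the candidate with the most inliers; break ties by lower error
        if best is None or (mask.sum(), -error) > (best[0], -best[1]):
            best = (mask.sum(), error, a, b, mask)
    return best[2], best[3], best[4]

# Illustrative data: y = 2x + 1 with small noise, plus 10 gross outliers
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)
outlier_idx = np.arange(0, 100, 10)
y[outlier_idx] += 30

a, b, mask = ransac_line(x, y)
print(a, b, mask.sum())
```

A plain least-squares fit on the same data would be pulled upwards by the outliers; here they never enter the final fit. scikit-learn's `RANSACRegressor` follows a similar scheme with more options.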
    

```python
from sklearn.linear_model import RANSACRegressor
ransac = RANSACRegressor()  # uses LinearRegression as the base model by default
ransac.fit(housing.drop("price", axis = 1), housing.price)
print(ransac.estimator_.coef_, ransac.estimator_.intercept_)
```

```
[-2.07899484e-01  6.28661556e-02 -1.15380345e-01  3.66246719e-01
 -6.96164418e+00  8.34530244e+00 -8.51878916e-02 -1.13187113e+00
  1.13685351e-01  1.28236622e-03 -7.13848885e-01  2.83922510e-02
  1.12029668e-01] -14.952407698402258
```


- We can also provide parameters, e.g. the minimum number of random samples, the maximum number of iterations, and the residual threshold (to include data points)
- We can also provide the type of model we want to perform RANSAC on
    - linear regression by default, but we may use other regression models

```python
ransac = RANSACRegressor(LinearRegression(), min_samples = 50,
                         max_trials = 100, residual_threshold = 5.0)
ransac.fit(housing.drop("price", axis = 1), housing.price)
```

```python
import matplotlib.pyplot as plt

# view inliers and outliers (inlier_mask_ only exists after fit)
inliers = housing[ransac.inlier_mask_]
outliers = housing[~ransac.inlier_mask_]
plt.scatter(inliers.rooms, inliers.price, label = "inliers")
plt.scatter(outliers.rooms, outliers.price, label = "outliers")
plt.legend()
plt.show()
```

## Polynomial Regression
- Extension of the linear regression algorithm
    - we can use the linear regression algorithm to perform polynomial regression (e.g. fitting a quadratic curve)
    - just precompute the extra columns (e.g. $x^2$, $x_1 x_2$) and fit a linear model on them:
        - example 1:
        - example 2: 

```python
from sklearn.preprocessing import PolynomialFeatures
x = np.arange(6).reshape(3, 2)    # 3 samples, 2 features: x0, x1
poly = PolynomialFeatures(2)      # all terms up to degree 2
x_transformed = poly.fit_transform(x)
print(poly.get_feature_names_out().tolist())
print(poly.n_features_in_)
print(poly.n_output_features_)
# Now we can perform linear regression with x_transformed as the input
```

```
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
2
6
```
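
End to end, polynomial regression is just `PolynomialFeatures` followed by `LinearRegression`. A sketch on synthetic quadratic data (the curve $y = 1 + 2x - 3x^2$ is made up); `include_bias=False` drops the constant column because `LinearRegression` fits its own intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Illustrative data following a quadratic curve: y = 1 + 2x - 3x^2
x = np.linspace(-2, 2, 20).reshape(-1, 1)   # one input column
y = 1 + 2 * x[:, 0] - 3 * x[:, 0] ** 2

# Precompute the columns [x, x^2]; the intercept is left to LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
print(model.coef_, model.intercept_)   # coefficients of x and x^2, then intercept
```

The model is still linear in its parameters, which is why the ordinary linear regression algorithm can fit a curve.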


## Common Mistakes:
- There are two main types of errors we can make when using regression models:
    - using the **wrong model**: see Anscombe's quartet
    - **extrapolating** without realizing it (especially if we have interacting features)
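
A small illustration of the extrapolation trap, on made-up data: the true relationship below is $y = x^2$, but only $x \in [0, 2]$ is observed, so a straight line looks acceptable inside that range and fails badly outside it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative: the true relationship is y = x^2, but we only observe x in [0, 2]
x_train = np.linspace(0, 2, 20).reshape(-1, 1)
y_train = x_train[:, 0] ** 2

model = LinearRegression().fit(x_train, y_train)

near = model.predict([[1.0]])[0]    # inside the observed range, true value: 1
far = model.predict([[10.0]])[0]    # far outside it, true value: 100
print(near, far)
```

Nothing in the fitted model warns us that x = 10 lies outside the data it was trained on; that check is up to us.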

## Logistic Regression
### Use a regression model to classify

### Classification
- Predict **one of several known classes**
    - based on the input parameters
    - e.g. classify whether a picture is of a cat or a dog
- Regression and classification make up most of the common machine learning problems
- Choosing an algorithm:
    - 'No free lunch': no single algorithm works best for every problem
    - it's best to compare several algorithms and select the best one for a particular problem
        - also, we might want to tune them first
    - Reminder: ML process
        - select features, choose a performance metric (cost function), choose a classifier, evaluate and fine-tune the performance
- Logistic regression is a classification algorithm (despite its name)
- Two classes: negative (0) and positive (1)
    - can be extended to more classes
- How does it work?
    - linear regression can give us all kinds of values
    - we want to constrain them between 0 and 1
    - approach:
        - perform linear regression: $ \tilde{y} = \beta^T x $
        - use the sigmoid function to constrain the output:

$$ \sigma (\tilde{y}) = \frac{1}{1 + e^{-\tilde{y}}} = \frac{1}{1 + e^{-\beta^T x}} $$


- Quantization: if $ \sigma > 0.5 $ return 1, and 0 otherwise
    - remember that we only need to return 0 or 1
    - we can also use the raw sigmoid outputs as probability measures (the probability that the sample belongs to class 1)
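
The sigmoid and the 0.5 quantization fit in a few lines of NumPy. A sketch with hand-picked parameters `beta` (not learned from data); a column of ones in `X` plays the role of the intercept:

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(X, beta, threshold=0.5):
    """Return (probabilities, 0/1 labels) for y_tilde = X @ beta."""
    y_tilde = X @ beta                        # linear part, as in linear regression
    p = sigmoid(y_tilde)                      # squashed into (0, 1)
    return p, (p > threshold).astype(int)     # quantization: 1 if p > 0.5, else 0

# Illustrative parameters; the first column of ones acts as an intercept
X = np.array([[1.0, -3.0],
              [1.0,  0.5],
              [1.0,  4.0]])
beta = np.array([-1.0, 2.0])
p, labels = predict_class(X, beta)
print(p, labels)
```

The middle sample has $\tilde{y} = 0$, so $\sigma = 0.5$ exactly and, following the rule "1 only if $\sigma > 0.5$", it is assigned class 0.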

## DEMO

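```python
# perform logistic regression on the iris dataset
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
iris_data, iris_labels = load_iris(return_X_y = True)
iris_train_data, iris_test, iris_train_labels, iris_test_labels = train_test_split(iris_data, iris_labels, test_size = 0.2)
model = LogisticRegression(C = 1e6, max_iter = 1000)
model.fit(iris_train_data, iris_train_labels)
```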

```python
# test output classes or probabilities
print(model.predict(iris_test))
print(model.predict_proba(iris_test))
```

- In the model, there's a 'mysterious' parameter C
    - it controls regularization: C is the inverse of the regularization strength (more on this next time)
    - a large C means practically no regularization
        - we just take the data 'as-is', with no other constraints

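
To see what C does, one can fit the same model with a few values of C and compare the overall size of the learned coefficients. A sketch assuming scikit-learn's default L2 penalty; the particular C values and `max_iter` are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

norms = {}
for C in [0.01, 1.0, 100.0]:
    # smaller C = stronger regularization = coefficients pulled towards zero
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)   # overall size of the learned coefficients
    print(C, norms[C])
```

The coefficient norm grows with C; the `C = 1e6` used in the demo sits far at the weakly regularized end.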
### Many Classes
- Two main approaches
    - One-vs-all: several predictors
        - one binary predictor for each class vs. the others
    - Multinomial: a single model calculates the probability of each class
- scikit-learn takes care of multiple classes: multinomial logistic regression by default
    - we don't even need to transform the labels
    - this applies to all classifiers in the library
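
Both approaches are available in scikit-learn. A sketch on the iris data: the default `LogisticRegression` (multinomial with its default solver) versus an explicit one-vs-all wrapper, `OneVsRestClassifier`; `max_iter` is raised only to avoid convergence warnings:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # labels are 0, 1, 2: used as-is

# Multinomial: a single model computing the probabilities of the 3 classes
multinomial = LogisticRegression(max_iter=1000).fit(X, y)

# One-vs-all: one binary logistic regression per class, wrapped explicitly
one_vs_rest = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(one_vs_rest.estimators_))        # one predictor per class
print(multinomial.predict_proba(X[:3]))    # each row: probabilities of the 3 classes
print(one_vs_rest.predict_proba(X[:3]))    # one-vs-all scores, renormalized to sum to 1
```

Both models return one probability per class for every sample; they differ in how those probabilities are obtained (one joint model versus several independent binary ones).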