In [12]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import *
import matplotlib.pyplot as plt
%matplotlib inline

# Linear Algebra: Especially use of numpy for vectors and matrices. Difference between 1D and 2D arrays. 
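
A small sketch of the 1D versus 2D distinction mentioned in the heading. The array names are illustrative only and do not come from the course material.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])    # 1D array, shape (3,)
row = v.reshape(1, -1)           # 2D row vector, shape (1, 3)
col = v.reshape(-1, 1)           # 2D column vector, shape (3, 1)

# Transposing a 1D array changes nothing; transposing a 2D array swaps its axes.
print(v.T.shape, row.T.shape)

inner = v @ v                    # 1D @ 1D gives a scalar (dot product)
outer = col @ row                # (3, 1) @ (1, 3) gives a 3x3 matrix
print(inner, outer.shape)
```

The practical consequence is that many scikit-learn estimators expect a 2D feature matrix, so a single feature stored as a 1D array usually needs `reshape(-1, 1)` first.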

In [3]:
# Matrix multiplication in numpy
import numpy as np
SIZE = 200
A = np.random.rand(SIZE, SIZE)
B = np.random.rand(SIZE, SIZE)
out1 = A.dot(B)

https://github.com/learn-co-curriculum/dsc-lineq-numpy-lab/tree/solution

[Apply linear algebra to fit a function to data, describing linear mappings between input and output variables, and indicate how linear algebra is related to regression modeling](https://learn.co/tracks/data-science-career-v2/module-4-a-complete-data-science-project-using-multiple-regression/section-26-linear-algebra/regression-analysis-using-linear-algebra-and-numpy-code-along)

https://github.com/learn-co-curriculum/dsc-linalg-regression-lab/tree/solution
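
One way to see how linear algebra relates to regression is the normal equation. The following is a minimal sketch on synthetic data (the data and variable names are assumptions, not taken from the linked labs): it solves for the intercept and slope directly and checks the result against `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, size=50)   # true b = 2, m = 3, plus small noise

# Design matrix: a column of ones for the intercept, then x.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) beta = X^T y for beta = [b, m].
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq solves the same least-squares problem in a more numerically robust way.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_normal)   # approximately [2, 3]
```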

# Calculus and Cost functions: 

In [5]:
import math

def errors(x_values, y_values, m, b):
    y_line = (b + m*x_values)
    return (y_values - y_line)

def squared_errors(x_values, y_values, m, b):
    return np.round(errors(x_values, y_values, m, b)**2, 2)

def residual_sum_squares(x_values, y_values, m, b):
    return round(sum(squared_errors(x_values, y_values, m, b)), 2)

def root_mean_squared_error(x_values, y_values, m, b):
    # RMSE is the square root of the mean squared error: divide by n before taking the root
    return round(math.sqrt(sum(squared_errors(x_values, y_values, m, b)) / len(x_values)), 2)

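### - What is the gradient descent algorithm?

An iterative way to find the best-fit line, that is, the values of `m` and `b` above that minimize the cost (for example the RSS). At each step the parameters move a small amount in the direction opposite to the slope of the cost curve.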

In [9]:
def updated_b(b, learning_rate, cost_curve_slope):
    change_to_b = -1 * learning_rate * cost_curve_slope
    return change_to_b + b

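def gradient_descent(x_values, y_values, steps, current_b, learning_rate, m):
    # Note: slope_at is not defined in these notes. It must be defined elsewhere and
    # return a dict with a 'slope' key (the slope of the RSS curve at current_b).
    cost_curve = []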
    for i in range(steps):
        current_cost_slope = slope_at(x_values, y_values, m, current_b)['slope']
        current_rss = residual_sum_squares(x_values, y_values, m, current_b)
        cost_curve.append({'b': current_b, 'rss': round(current_rss,2), 'slope': round(current_cost_slope,2)})
        current_b = updated_b(current_b, learning_rate, current_cost_slope)
    return cost_curve

In [8]:
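import numpy as np
import seaborn as sns

def gradientDescent(x, y, theta, alpha, m, numIterations):
    # x: (m, n) design matrix, y: (m,) targets, theta: (n,) initial weights,
    # alpha: learning rate, m: number of examples
    xTrans = x.transpose()
    costs = []
    # Print progress about 40 times; max(1, ...) avoids a modulo by zero on short runs
    print_every = max(1, numIterations // 40)
    for i in range(numIterations):
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # avg cost per example (the 2 in 2*m doesn't really matter here,
        # but to be consistent with the gradient, I include it)
        cost = np.sum(loss ** 2) / (2 * m)
        if i % print_every == 0:
            print("Iteration %d | Cost: %f" % (i, cost))
            print(theta)
        costs.append(cost)
        # avg gradient per example
        gradient = np.dot(xTrans, loss) / m
        # update
        theta = theta - alpha * gradient
    # Plot the cost at each iteration to check convergence
    sns.scatterplot(x=list(range(len(costs))), y=costs)

    return theta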

[More complicated version](https://github.com/learn-co-curriculum/dsc-applying-gradient-descent-lab/tree/solution)

[From Class](https://github.com/learn-co-students/dc-ds-100719/blob/master/module-4/week-1/Day-2-GradientDescent/Math_Gradient_Descent.ipynb)
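
For comparison, here is a self-contained sketch that updates both `m` and `b` using the analytic gradient of the mean squared error. It is an illustration under simple assumptions (noise-free toy data, fixed learning rate), not the implementation used in the linked labs; the function name `gradient_descent_line` is hypothetical.

```python
import numpy as np

def gradient_descent_line(x, y, learning_rate=0.01, steps=5000):
    """Fit y ~ m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = y - (m * x + b)
        # Partial derivatives of mean((y - (m*x + b))**2) with respect to m and b.
        grad_m = -2.0 / n * np.sum(x * residuals)
        grad_b = -2.0 / n * np.sum(residuals)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x                      # points exactly on the line with b = 1, m = 2
m_hat, b_hat = gradient_descent_line(x, y)
print(round(m_hat, 3), round(b_hat, 3))
```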

### - What is the learning rate (step size) in gradient descent?

The learning rate is `alpha` in the code above. "The size of these steps is called the learning rate. With a high learning rate we can cover more ground each step, but we risk overshooting the lowest point since the slope of the hill is constantly changing. With a very low learning rate, we can confidently move in the direction of the negative gradient since we are recalculating it so frequently. A low learning rate is more precise, but calculating the gradient is time-consuming, so it will take us a very long time to get to the bottom."
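
The effect of the step size can be sketched on a one-parameter toy cost, f(b) = (b - 3)^2, whose minimum is at b = 3. The function and the three learning rates are chosen purely for illustration.

```python
def descend(learning_rate, steps=20, start=0.0):
    """Minimize f(b) = (b - 3)**2 with fixed-step gradient descent."""
    b = start
    for _ in range(steps):
        slope = 2.0 * (b - 3.0)          # derivative of (b - 3)**2
        b = b - learning_rate * slope
    return b

too_small = descend(0.01)   # moves in the right direction but is still far from 3
reasonable = descend(0.4)   # ends very close to 3
too_large = descend(1.1)    # every step overshoots further, so the iterates diverge
print(too_small, reasonable, too_large)
```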

# Extension to linear models

### - Bias-Variance trade-off: Make sure you understand why this is a "trade-off".

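* Bias and variance tend to move in opposite directions: making a model more flexible lowers bias but raises variance, and vice versa. That is why it is a trade-off.
* An underfitting line is an example of a high-bias, low-variance model.
* An overfitting line is an example of a high-variance, low-bias model.
* Low-variance, low-bias models are the goal, but in practice reducing one usually increases the other.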
* Variance is the amount by which the prediction function would change if we estimated it using a different training data set.
* Bias is the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.

### - Polynomial regression: how this might be relevant in the context of the bias-variance trade-off.


* Polynomial regression allows a better fit to data that isn't well predicted by a linear model.
* The risk of polynomial regression is that it is easier to overfit the data, so it's important to consider the bias-variance trade-off and perform proper cross-validation.
* High-degree polynomial models tend to have low bias and high variance.
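
The trade-off can be sketched by fitting polynomials of increasing degree and comparing training and test error. The data below is synthetic (a noisy sine curve) and the degrees are arbitrary choices for illustration; `np.polyfit` stands in for `PolynomialFeatures` plus a linear model to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    results[degree] = (train_mse, test_mse)
    print(degree, round(train_mse, 4), round(test_mse, 4))
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. Test error typically falls at first (less bias) and then rises again once the model starts fitting noise (more variance), which is why the degree should be chosen on held-out data.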

[Bias-Variance Tradeoff - Lab](https://github.com/learn-co-students/dsc-bias-variance-trade-off-lab-dc-ds-100719/tree/solution)

In [13]:
# Fit MinMaxScaler on the training data only, then transform it.
# X_train and X_test are assumed to come from an earlier train_test_split call;
# they are not defined in this cell.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Transform the test data (X_test) using the same fitted scaler
X_test_scaled = scaler.transform(X_test)
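
A self-contained version of the scaling step, using a synthetic feature matrix so that `X_train` and `X_test` exist. The data and the `test_size` and `random_state` values are assumptions made only for this sketch.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))   # hypothetical feature matrix
y = rng.normal(size=100)                          # hypothetical target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)    # learn min/max from the training data only
X_test_scaled = scaler.transform(X_test)          # reuse the training min/max

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

Because the scaler is fitted on the training split only, scaled test values can fall slightly outside the range 0 to 1. That is expected and avoids leaking test information into the model.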

### - Ridge and Lasso models: How are they similar to each other and how are they different? Also compare these models with linear regression. Understand Ridge and Lasso in the context of bias-variance. For example: "Lasso decreases variance at the expense of adding a little bit of bias, etc."


* https://github.com/learn-co-students/dsc-ridge-and-lasso-regression-dc-ds-100719
* https://github.com/learn-co-students/dsc-ridge-and-lasso-regression-lab-dc-ds-100719/tree/solution

https://www.analyticsvidhya.com/blog/2016/01/complete-tutorial-ridge-lasso-regression-python/
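
A rough comparison of the three models on synthetic data where only two of ten features matter. The data and the `alpha` values are arbitrary choices for illustration, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
# Only the first two of the ten features actually influence y.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)    # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.2).fit(X, y)     # L1 penalty: can set coefficients exactly to zero

for name, model in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    n_zero = int(np.sum(model.coef_ == 0))
    norm = float(np.linalg.norm(model.coef_))
    print(name, "L2 norm:", round(norm, 3), "zero coefs:", n_zero)
```

Both penalties shrink the coefficients relative to plain linear regression, which lowers variance and adds some bias. The difference is that Ridge keeps every feature with a smaller weight, while Lasso tends to drop irrelevant features entirely, so it also acts as a feature selector.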

### - Feature selection: Forward and Backward feature selection.


Stepwise selection with p-values
* https://github.com/learn-co-students/dsc-model-fit-linear-regression-dc-ds-100719
* https://github.com/learn-co-students/dsc-model-fit-linear-regression-lab-dc-ds-100719/tree/solution

* Build a linear regression model with interactions and polynomial features
* [Use AIC and BIC to select the best value for the regularization parameter](https://github.com/learn-co-students/dsc-extensions-to-linear-models-lab-dc-ds-100719/tree/solution)
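
Forward selection can be sketched as a greedy loop. The version below uses AIC as the stopping criterion instead of p-values, and plain NumPy least squares instead of statsmodels, to stay self-contained; the helper names `aic` and `forward_selection` are hypothetical.

```python
import numpy as np

def aic(X_subset, y):
    """AIC (up to an additive constant) of an OLS fit with an intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X_subset])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    return n * np.log(rss / n) + 2 * design.shape[1]

def forward_selection(X, y):
    """Greedily add the feature that lowers AIC the most; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best = aic(X[:, selected], y)          # intercept-only model to start
    while remaining:
        scores = {j: aic(X[:, selected + [j]], y) for j in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break
        best = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=200)
selected = forward_selection(X, y)
print(selected)
```

Backward selection runs the same idea in reverse: start with all features and repeatedly remove the one whose removal improves the criterion most.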

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
## for correlation matrices
import seaborn as sns
%matplotlib inline
## for linear models
import statsmodels.api as sm


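# Fitting the actual model
# X (feature DataFrame), y (target) and x_cols (list of feature names)
# are assumed to be defined earlier in the notebook
X = sm.add_constant(X)
model = sm.OLS(y, X, hasconst=True)
result = model.fit()
labels = ['intercept'] + x_cols
print(labels)
result.summary(xname=labels)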

# Logistic Regression

### - ROC curves: especially understand how we draw them by changing the threshold.

* https://github.com/learn-co-students/dsc-roc-curves-and-auc-dc-ds-100719
* [In this lab you further explored ROC curves and AUC, drawing graphs and then interpreting these results to lead to a more detailed and contextualized understanding of your model's accuracy. With Example Of Changing Threshold](https://github.com/learn-co-students/dsc-roc-curves-and-auc-lab-dc-ds-100719/tree/solution)
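
Drawing a ROC curve by hand amounts to sweeping the decision threshold and recording one (false positive rate, true positive rate) pair per threshold. A minimal sketch with made-up labels and scores:

```python
import numpy as np

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])                     # hypothetical labels
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])   # hypothetical predicted probabilities

def roc_point(y_true, y_score, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    y_pred = (y_score >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn), tp / (tp + fn)

# Sweep the threshold from high to low; each threshold gives one point on the curve.
for threshold in (1.0, 0.75, 0.5, 0.25, 0.0):
    fpr, tpr = roc_point(y_true, y_score, threshold)
    print(threshold, fpr, tpr)
```

A threshold above every score predicts no positives and gives the point (0, 0); a threshold of 0 predicts everything positive and gives (1, 1). Lowering the threshold moves along the curve between those two corners.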

### - Understand the model: log-odds, likelihood function and maximum likelihood.
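
A short sketch of the three terms in this heading, on a tiny made-up data set. Logistic regression models the log-odds of the positive class as a linear function of the features; the likelihood measures how probable the observed labels are under given weights; maximum likelihood picks the weights that make that value as large as possible. The function names here are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    return np.log(p / (1.0 - p))

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of labels y under a logistic model with weights beta."""
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Tiny hypothetical data set: an intercept column plus one feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

beta_guess = np.array([0.0, 1.0])
z = X @ beta_guess
print(np.allclose(log_odds(sigmoid(z)), z))        # the log-odds equal the linear predictor
print(log_likelihood(beta_guess, X, y))            # these weights explain the labels better...
print(log_likelihood(np.array([0.0, 0.0]), X, y))  # ...than all-zero weights do
```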

### - Confusion matrices: recall, precision, true positive rate, false negative rate, F1 score.
https://skymind.ai/wiki/accuracy-precision-recall-f1
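
The metrics in this heading all come from the four cells of the confusion matrix. A minimal sketch with made-up labels and predictions:

```python
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])   # hypothetical labels
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])   # hypothetical predictions

tp = int(np.sum((y_pred == 1) & (y_true == 1)))
fp = int(np.sum((y_pred == 1) & (y_true == 0)))
fn = int(np.sum((y_pred == 0) & (y_true == 1)))
tn = int(np.sum((y_pred == 0) & (y_true == 0)))

precision = tp / (tp + fp)              # of the predicted positives, how many are correct
recall = tp / (tp + fn)                 # true positive rate
false_negative_rate = fn / (fn + tp)    # equals 1 - recall
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
print(tp, fp, fn, tn, precision, recall, false_negative_rate, f1)
```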

### - AUC: how do you compare two different classification algorithms by AUC?
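
AUC can be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, so the model with the larger AUC ranks the classes better across all thresholds. A sketch with two hypothetical models scored on the same labels:

```python
import numpy as np

def auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    diffs = pos[:, None] - neg[None, :]          # every positive/negative pair
    return (np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)) / diffs.size

y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores_a = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])   # hypothetical model A
scores_b = np.array([0.2, 0.1, 0.6, 0.3, 0.7, 0.8, 0.4, 0.9])    # hypothetical model B

print(auc(y_true, scores_a), auc(y_true, scores_b))
```

Here model B separates the classes perfectly while model A misranks some pairs, so B has the higher AUC. An AUC of 0.5 corresponds to random ranking.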

### - What is an imbalanced dataset? What are the techniques to solve this problem?
https://towardsdatascience.com/methods-for-dealing-with-imbalanced-data-5b761be45a18
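
A dataset is imbalanced when one class is much rarer than the other. Random oversampling of the minority class is one common remedy; others include undersampling the majority class, class weights, and synthetic sampling such as SMOTE. A minimal sketch of random oversampling on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.array([1] * 10 + [0] * 90)       # 10% positives: an imbalanced data set

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Random oversampling: draw minority rows with replacement until the classes match.
resampled_minority = rng.choice(minority_idx, size=len(majority_idx), replace=True)
balanced_idx = np.concatenate([majority_idx, resampled_minority])

X_balanced, y_balanced = X[balanced_idx], y[balanced_idx]
print(np.bincount(y), np.bincount(y_balanced))
```

Resampling should be applied to the training split only, so that the test set still reflects the real class proportions.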

### - BONUS 
https://www.reddit.com/r/learnmachinelearning/comments/8ic97h/what_is_one_hot_encoding_and_when_is_it_beneficial/
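
One-hot encoding turns a categorical column into one indicator column per category. A small sketch with `pd.get_dummies` on a made-up DataFrame:

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"], "size": [1, 2, 3, 4]})

encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())
```

Each row has exactly one of the `color_*` indicators set. For linear models with an intercept, passing `drop_first=True` drops one indicator per feature to avoid perfectly collinear columns.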