## Exam Review


### Exploratory Data Analysis

* Formulate Goal
* Collect data
* Specify and explore variables
* Check assumptions

#### Descriptive Statistics
* Mean, Median and Mode
* Variance and standard deviation
    - Population versus Sample
* Percentiles
* Covariance and Correlation
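
As a quick refresher on the statistics listed above, here is a minimal NumPy sketch. The arrays `x` and `y` are made-up illustrative values, not course data. The `ddof` argument switches between the population formula (divide by n) and the sample formula (divide by n - 1).

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 8.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0, 9.0])

mean_x = x.mean()
median_x = np.median(x)

# Mode: the most frequent value
values, counts = np.unique(x, return_counts=True)
mode_x = values[counts.argmax()]

# Population versus sample variance
pop_var = x.var(ddof=0)      # divides by n
sample_var = x.var(ddof=1)   # divides by n - 1
sample_std = x.std(ddof=1)

# Percentiles (here the first and third quartiles)
q1, q3 = np.percentile(x, [25, 75])

# Sample covariance and Pearson correlation between x and y
cov_xy = np.cov(x, y)[0, 1]
corr_xy = np.corrcoef(x, y)[0, 1]
```

The correlation is the covariance divided by the product of the two standard deviations, which is why it always lies between -1 and 1.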

#### Visualization 

* Scatterplot
* Histogram
* Pairplots
* Box plots 
* Distribution plots

#### Data Preprocessing

* Missing Values
    - Imputing
* Scaling
    - Normalization 
    - Standardization
* Encoding Categorical variables
    - Label encoding
    - One-Hot encoding
* Training, Validation and Testing Data
    - Regression versus classification
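
The preprocessing steps above can be sketched with plain NumPy. The `age` and `colors` arrays are hypothetical columns invented for illustration, and mean imputation is just one possible imputation strategy.

```python
import numpy as np

# Hypothetical numeric column with one missing value
age = np.array([20.0, 30.0, np.nan, 50.0])

# Imputing: replace missing values with the mean of the observed values
age_imputed = np.where(np.isnan(age), np.nanmean(age), age)

# Normalization (min-max scaling): rescale to the interval [0, 1]
age_norm = (age_imputed - age_imputed.min()) / (age_imputed.max() - age_imputed.min())

# Standardization: rescale to mean 0 and standard deviation 1
age_std = (age_imputed - age_imputed.mean()) / age_imputed.std()

# Hypothetical categorical column
colors = np.array(["red", "green", "blue", "green"])

# Label encoding: one integer per category (categories are sorted alphabetically)
categories, labels = np.unique(colors, return_inverse=True)

# One-hot encoding: one indicator column per category
one_hot = np.eye(len(categories))[labels]

# Simple random split of the row indices into training and testing sets
rng = np.random.default_rng(0)
idx = rng.permutation(len(age_imputed))
train_idx, test_idx = idx[:3], idx[3:]
```

In practice the imputation and scaling statistics would be computed on the training rows only and then applied to the validation and test rows, so that no information leaks from the held-out data.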


### Learning Methods for Regression and Classification

* Supervised Learning for tabular data: columns are the dependent and independent variables, the rows are the observations (examples).
* Regression: real-valued dependent variable, real-valued or categorical independent variables
* Classification: categorical dependent variable, real-valued or categorical independent variables 

#### Modeling Data

* Learn model parameters by **fitting** model to the training data
* Extract estimated parameters from model
* Predict the test data using the fitted model
* Determine how good the model is

 

#### Linear Regression
  
<div style="font-size: 115%;"> $\{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\}$, $x_i \in R^d$, $y_i \in R$ </div>

<div style="font-size: 115%;">
$$Y = \beta_0 + \beta_1{X_1} + \beta_2{X_2} +...+\beta_p{X_p} + \epsilon $$
</div>



* Parameters: intercept and coefficients
* Numerical dependent variable, numerical or categorical independent variable(s)
* Simple, Multivariate and Polynomial Linear Regression
* Ordinary Least Squares
    - Residuals
    - Assumptions
* Maximum Likelihood Estimation
    - Likelihood function versus PMF or PDF
    - Method to solve for maximum
* Closed form solutions for parameters
    - Design Matrix and Normal equation
* Interpretation of the parameters
* Best fitting line
* Goodness of fit
    - R-squared
    - Mean Squared Error and Root Mean Squared Error
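
The closed-form solution and the goodness-of-fit measures above can be illustrated as follows. This is a minimal sketch on synthetic data; the "true" intercept 2 and slope 3 are assumptions chosen only to generate the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Synthetic data: y = 2 + 3x + noise (assumed values for illustration)
x = rng.uniform(0, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=n)

# Design matrix: a column of ones for the intercept, then the predictor
X = np.column_stack([np.ones(n), x])

# Normal equation: (X^T X) beta = X^T y, solved without forming the inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Predictions and residuals
y_hat = X @ beta_hat
residuals = y - y_hat

# Goodness of fit
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
r_squared = 1 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
```

Here `beta_hat[0]` estimates the intercept and `beta_hat[1]` the slope. Under the usual assumption of normally distributed errors, the ordinary least squares estimate coincides with the maximum likelihood estimate.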


### Probability

* Random Variable
* Joint, conditional, and marginal distribution of RVs
* Independence 
* Expectation of RV
    - Discrete and Continuous
    - Expectation of a function of an RV
* Bayes Theorem
* Distributions
    * PMF, PDF, CDF, PPF, RVS
    * Bernoulli, Binomial
    * Normal, Uniform, Exponential
    * Central Limit Theorem
    * Theoretical mean and variance versus random generation
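
To compare theoretical moments with random generation, and to see the Central Limit Theorem at work, one could run a simulation like the sketch below. It uses only NumPy's random generator; the parameter values (10 trials, p = 0.3, exponential scale 2) are arbitrary choices for illustration. The PMF/PDF/CDF/PPF/RVS names in the list correspond to method names on `scipy.stats` distributions, which are not used here.

```python
import numpy as np

rng = np.random.default_rng(42)

# Binomial(n_trials, p): theoretical mean n*p and variance n*p*(1-p)
n_trials, p = 10, 0.3
draws = rng.binomial(n_trials, p, size=100_000)

theory_mean = n_trials * p
theory_var = n_trials * p * (1 - p)
sim_mean = draws.mean()
sim_var = draws.var()

# Central Limit Theorem: means of m exponential draws are approximately
# normal with mean `scale` and standard deviation scale / sqrt(m)
scale = 2.0   # exponential mean = scale, variance = scale**2
m = 50
sample_means = rng.exponential(scale, size=(20_000, m)).mean(axis=1)

clt_mean_theory = scale
clt_std_theory = scale / np.sqrt(m)
```

A histogram of `sample_means` would look close to a bell curve even though the underlying exponential distribution is strongly skewed.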
    

#### Logistic Regression and Multinomial Regression


<div style="font-size: 115%;"> $\{(x_1,y_1),(x_2,y_2),...,(x_n,y_n)\}$, $x_i \in R^d$, $y_i \in \{0,1\}$ </div>


<div style="font-size: 115%;">

$$ Y \sim \text{Bernoulli}(p(X))$$
$$ p(X) = P(Y = 1 \mid X) = \text{logistic}(\beta_0 + \beta_1 X) = \frac{e^{\beta_0 + \beta_1 X}}{1+e^{\beta_0 + \beta_1 X}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$$
</div>

* Parameters: intercept and coefficients of linear function of the data  
* Assume dependent variable has a Bernoulli distribution  
* Logistic (sigmoid) function maps the linear function of the data, which can take any real value, to a probability between 0 and 1  
* Classification of dependent variable that has 2 classes  
* Predicts probability that a data point belongs to one of the two categories  
* No closed form of MLE for the parameters  
* Logit function: log odds, inverse of the logistic function  
* Interpretation of the coefficients: each coefficient is the change in the log odds of $Y = 1$ for a one-unit increase in its independent variable  
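
The relationship between the logistic function, the logit, and the log-odds interpretation of the coefficients can be checked numerically. In this sketch the values of `beta0` and `beta1` are hypothetical, not fitted to any data.

```python
import numpy as np

def logistic(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log odds: the inverse of the logistic function."""
    return np.log(p / (1.0 - p))

# Hypothetical parameters for illustration
beta0, beta1 = -1.0, 0.5

x = np.array([0.0, 1.0, 2.0, 3.0])
p = logistic(beta0 + beta1 * x)   # P(Y = 1 | X = x)

# Applying the logit recovers the linear function of x
log_odds = logit(p)

# Each one-unit increase in x changes the log odds by beta1 ...
log_odds_steps = np.diff(log_odds)

# ... which multiplies the odds by exp(beta1)
odds_ratio = np.exp(beta1)
```

Note that the probabilities themselves do not change by a constant amount per unit of `x`; only the log odds do.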

* Goodness of fit
    * Confusion Matrix
        - Accuracy, Error Rate, Precision, Recall, F1
        - ROC Curve and Area Under Curve

        
* Extend to more than two classes using the softmax function
* Generalized Linear Model (GLM)
    - Specify distribution of dependent variable
    - Specify link function (i.e. the inverse of the function that transforms the linear combination of the independent variables into the mean of the dependent variable)
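
The confusion-matrix metrics listed under goodness of fit can be computed by hand as a check on one's understanding. The labels and predictions below are made up for illustration, with class 1 treated as the positive class.

```python
import numpy as np

# Made-up true labels and predicted labels
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# Cells of the 2x2 confusion matrix
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
error_rate = 1 - accuracy
precision = tp / (tp + fp)   # of predicted positives, the fraction that are correct
recall = tp / (tp + fn)      # of actual positives, the fraction that are found
f1 = 2 * precision * recall / (precision + recall)
```

F1 is the harmonic mean of precision and recall, so it is high only when both are high.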
    

### Linear Algebra

* Matrix multiplication and dot product
* Eigenvalue and eigenvector
* Inverse and Moore-Penrose Inverse
* Eigen decomposition
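
A short NumPy sketch of these operations follows. The matrices `A` and `B` are arbitrary small examples; `A` is symmetric, which is why `np.linalg.eigh` is used here rather than the general `np.linalg.eig`.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, -1.0])

# Dot product and matrix-vector multiplication
v_dot_v = v @ v
Av = A @ v   # equals 1 * v, so v is an eigenvector with eigenvalue 1

# Eigenvalues (ascending) and orthonormal eigenvectors of a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(A)

# Eigen decomposition: A = Q diag(lambda) Q^T
A_rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T

# Ordinary inverse of a square, non-singular matrix
A_inv = np.linalg.inv(A)

# Moore-Penrose pseudoinverse of a non-square matrix
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
B_pinv = np.linalg.pinv(B)   # shape (2, 3)
```

Because `B` has full column rank, `B_pinv @ B` is the 2x2 identity, which is the property the normal equation in linear regression relies on.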

### Neural Network Basics

* Layers: Input, hidden, output
* Nodes: transform their input using an activation function
* Weights: dot product with input data or output of previous layer
    - Weights (together with the biases) are the parameters adjusted during training to improve network performance
* Links: connect the nodes
* Activation function: introduce non-linearity
    - ReLU, sigmoid, softmax, tanh
* Loss function: compares output prediction with truth to determine the error in the prediction
    - Mean Squared Error: Linear Regression
    - Binary Cross Entropy: Logistic Regression
    - Cross Entropy: Multinomial Regression
* Backpropagation: Uses the chain rule to compute the gradient of the loss with respect to each weight, i.e. how much each weight contributes to the error
* Optimization: Weight update
    * Stochastic Gradient Descent: updates the weights in small steps along the negative gradient computed on a mini-batch; the step size is set by the learning rate
    * Adam
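
To make the pieces above concrete, here is a minimal sketch of a single sigmoid neuron (no hidden layer) trained with binary cross entropy. The data and labels are synthetic, and for simplicity the gradient is computed on the full batch at every step rather than on mini-batches as in stochastic gradient descent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, y, eps=1e-12):
    # eps guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))        # 8 synthetic examples, 3 inputs
y = (X[:, 0] > 0).astype(float)    # synthetic binary labels

w = np.zeros(3)   # weights
b = 0.0           # bias
lr = 0.5          # learning rate

losses = []
for _ in range(50):
    # Forward pass: dot product with the weights, then the activation
    z = X @ w + b
    p = sigmoid(z)
    losses.append(binary_cross_entropy(p, y))

    # Backward pass (chain rule): for sigmoid + BCE, dL/dz = (p - y) / n
    grad_z = (p - y) / len(y)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum()

    # Gradient descent update
    w -= lr * grad_w
    b -= lr * grad_b
```

With all weights at zero the neuron predicts 0.5 for every example, so the first loss equals log(2); the loss then falls as the weights are updated.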

### Multilayer Perceptron for Logistic and Multinomial Regression

Artificial Neural Network with one or two hidden layers applied to tabular data

#### PyTorch

* Utilizes a computational graph that holds the gradients of the tensors  
* Define Model class with forward function.  
* Uses `__call__()` so the model acts as a function  
* Must convert numpy arrays to tensors  
* Define Loss function and optimizer  
* Training loop
    - Clear gradients
    - Execute forward pass to produce output predictions
    - Determine Loss on training data
    - Execute backward pass
    - Update weights
* Execute forward pass with test data without gradients
* Convert tensors to numpy arrays to use sklearn functions to determine model accuracy
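
The workflow above might look like the following sketch. The data are synthetic, and the class name `MLP`, the layer sizes, the learning rate, and the number of epochs are illustrative assumptions rather than the course's settings. Accuracy is computed with NumPy here; `sklearn.metrics.accuracy_score` would give the same number.

```python
import numpy as np
import torch
from torch import nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Synthetic tabular data: 200 rows, 4 features, binary label
X_np = rng.normal(size=(200, 4)).astype(np.float32)
y_np = (X_np[:, 0] + X_np[:, 1] > 0).astype(np.float32)

# Convert numpy arrays to tensors (first 150 rows train, last 50 test)
X_train = torch.from_numpy(X_np[:150])
y_train = torch.from_numpy(y_np[:150]).unsqueeze(1)
X_test = torch.from_numpy(X_np[150:])
y_test = y_np[150:]

class MLP(nn.Module):
    """One hidden layer; forward returns logits, not probabilities."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.hidden = nn.Linear(n_in, n_hidden)
        self.out = nn.Linear(n_hidden, 1)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

model = MLP(4, 8)
loss_fn = nn.BCEWithLogitsLoss()   # sigmoid + binary cross entropy
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

losses = []
for epoch in range(200):
    optimizer.zero_grad()             # clear gradients
    logits = model(X_train)           # forward pass via __call__
    loss = loss_fn(logits, y_train)   # loss on training data
    loss.backward()                   # backward pass
    optimizer.step()                  # update weights
    losses.append(loss.item())

# Forward pass on test data without tracking gradients
with torch.no_grad():
    probs = torch.sigmoid(model(X_test))

# Convert back to numpy and threshold at 0.5
y_pred = (probs.numpy().ravel() > 0.5).astype(np.float32)
accuracy = float((y_pred == y_test).mean())
```

For a multinomial problem the output layer would have one unit per class and the loss would be `nn.CrossEntropyLoss`, which expects integer class labels and applies the softmax internally.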

### Support Vector Machines

* Can be used for Classification and Regression
* Find hyperplane that maximally separates the data
* Optimization problem using Lagrange Multipliers
* Maximum margin determined by similarity of examples (i.e. dot product)
* Use slack variables for slightly non-separable data

#### Kernel Method and Kernel Trick for Non-linear Data

* Project data to higher dimension
* Solve the optimization problem in the higher dimension by calculating the dot products
* Kernel functions allow you to calculate the dot product in the higher dimension without actually having to do the transformation.
    - Polynomial, radial basis function (RBF) and hyperbolic tangent (sigmoid) kernels
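
The kernel trick can be verified on a small case. The sketch below uses the degree-2 homogeneous polynomial kernel on 2-D inputs, for which the explicit feature map `phi` is known; the function names and the `gamma` default are choices made for this illustration.

```python
import numpy as np

def phi(x):
    """Explicit map from 2-D to 3-D for the degree-2 polynomial kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel: (x . z)^2, computed in the original space."""
    return (x @ z) ** 2

def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel; its feature space is infinite-dimensional."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Dot product after explicitly transforming to the higher dimension
explicit = phi(x) @ phi(z)

# Same value from the kernel, without ever computing phi
implicit = poly_kernel(x, z)
```

Both routes give the same number, but the kernel needs only a 2-D dot product. For the RBF kernel no finite explicit map exists, so the kernel is the only practical way to work in that feature space.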

### Format of Exam

* JupyterLab notebook, like the exercises
* Open notes
* Can use internet
* All work must be your own
* 10am-11:50am
* You can take the exam at a location of your choosing.
* Mixture of short coding questions and some short answer using text cells.