# Machine Learning

#### Basics
- Suite of algorithms and techniques that learn from data, find patterns, and make predictions
- Useful for multivariate systems that are too complicated for classical statistics
- Two main types
    - Supervised Learning
        - learning from labeled examples
    - Unsupervised Learning
        - learning from unlabeled data to create categories/clusters

## Classification

#### Linear Regression

#### Bias
- Selection Bias
    - where did the data come from, and what are you missing
    - your sample vs. the population
- Publication Bias
    - only positive/significant results published
    - highly influenced by the $p < 0.05$ significance threshold ($\alpha = 0.05$)
- Non-Response Bias
    - people who respond differ systematically from people who don't
- Length Bias
    - especially important with time based measurements
    - bias towards individuals (data points) that remain in a specific state longer
        - e.g. you miss the two-day case and catch the 20-year case, and this happens repeatedly
- Calculation
    - difference between your estimation and the "truth"
        - the "truth" is almost never known, so bias usually can't be calculated accurately
- MSE (mean squared error)
    - $MSE = variance + bias^2$
    - decreasing bias often means increasing model complexity, which tends to increase variance
    - it's a tradeoff: decreasing bias may increase variance enough to make the MSE very large
    - want to minimize MSE, rather than bias alone
- Fisher Weighting
    - a way to combine multiple independent unbiased estimators into one number
    - take a weighted average
        - the individual weights add up to one
        - each weight is inversely proportional to the variance of its estimator
- Nate Silver Weighting
    - take each variable into account based on particular circumstances
        - past reliability, known biases, etc.
- Bonferroni Correction
    - divide $\alpha$ by the number of tests to get the per-test significance threshold
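
A few of the quantities in the list above can be checked numerically. The block below is a minimal sketch using only the standard library; the function names (`mse_decomposition`, `inverse_variance_average`, `bonferroni_alpha`) and the simulated "shrunken mean" estimator are illustrative assumptions, not part of these notes. The simulation sidesteps the usual problem of not knowing the "truth" by choosing it in advance.

```python
import random
import statistics


def mse_decomposition(estimates, truth):
    """Return (mse, variance, bias) for repeated estimates of a known truth."""
    bias = statistics.fmean(estimates) - truth
    # Population variance (divide by n) makes the identity below exact.
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return mse, variance, bias


def inverse_variance_average(estimates, variances):
    """Combine independent unbiased estimates, weighting each by 1/variance."""
    raw = [1.0 / v for v in variances]
    total = sum(raw)
    weights = [r / total for r in raw]  # normalized so they sum to one
    combined = sum(w * e for w, e in zip(weights, estimates))
    return combined, weights


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical setting where the truth is known: a deliberately shrunken sample mean.
rng = random.Random(0)
truth = 10.0
estimates = [
    0.9 * statistics.fmean([rng.gauss(truth, 2.0) for _ in range(5)])
    for _ in range(2000)
]
mse, variance, bias = mse_decomposition(estimates, truth)
print(mse, variance + bias ** 2)  # the two numbers agree up to rounding

# Two estimates of the same quantity; the noisier one gets the smaller weight.
combined, weights = inverse_variance_average([10.0, 12.0], [1.0, 4.0])
print(combined, weights)

print(bonferroni_alpha(0.05, 10))
```

The shrunken estimator is biased on purpose, so the decomposition has a nonzero $bias^2$ term to show. In the weighting example the estimate with a quarter of the variance receives four times the weight.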

- Regression toward the mean (RTTM)
    - e.g. predicting sons' heights from fathers' heights
        - a son won't be exactly as tall as his father; on average his height falls between his father's height and the population mean
        - any measurement is a combination of luck and skill (luck = chance, skill = effect)
        - example: praising good performance seems to lead to worse performance, and punishing bad performance seems to lead to better performance
            - RTTM predicts that, in many cases, bad performances get better and good ones get worse just by chance
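
Regression toward the mean is easy to see in a simulation. The sketch below assumes a toy model in which each observed score is a fixed "skill" plus independent "luck" drawn fresh each round; the function name `simulate_rttm`, the normal distributions, and the 10% cutoff are all illustrative choices.

```python
import random
import statistics


def simulate_rttm(n=10_000, fraction=0.1, seed=0):
    """Two noisy measurements of the same underlying skill for n individuals."""
    rng = random.Random(seed)
    skill = [rng.gauss(0.0, 1.0) for _ in range(n)]
    round1 = [s + rng.gauss(0.0, 1.0) for s in skill]  # skill + luck
    round2 = [s + rng.gauss(0.0, 1.0) for s in skill]  # same skill, fresh luck

    # Pick the best and worst performers based on round 1 only.
    k = int(n * fraction)
    order = sorted(range(n), key=lambda i: round1[i])
    bottom, top = order[:k], order[-k:]

    def group_mean(values, indices):
        return statistics.fmean([values[i] for i in indices])

    return {
        "top_round1": group_mean(round1, top),
        "top_round2": group_mean(round2, top),
        "bottom_round1": group_mean(round1, bottom),
        "bottom_round2": group_mean(round2, bottom),
        "overall_mean": statistics.fmean(round1),
    }


result = simulate_rttm()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

No praise or punishment is applied anywhere in the simulation, yet the top group scores lower in round 2 and the bottom group scores higher, with both landing between their round 1 score and the overall mean.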

#### OLS (Ordinary Least Squares)
- a procedure for fitting a linear model by minimizing the sum of squared residuals (OLS is not the model itself; it is the procedure that generates the linear model)
- **always plot the residuals!**
    - no pattern is a good thing
    - heteroscedasticity: bigger spread on one side
    - nonlinear: distinct pattern to the residuals (U-shape or similar)
- Goodness of fit
    - $R^2$
        - explained or 'accounted for' variance
        - amount of variance captured by the model
    - a poor measure of how well a model predicts
        - $R^2$ never goes down (and almost always goes up) as you add more variables, even useless ones
        - cross validation is better
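
The pieces above (fit, residuals, $R^2$, cross validation) can be sketched for the one-predictor case in plain Python. The helper names and the five data points are made up for illustration, and leave-one-out is just one of several cross-validation schemes.

```python
def fit_line(x, y):
    """OLS fit of y = intercept + slope * x for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def residuals(x, y, slope, intercept):
    """Observed minus fitted values; these are what you should plot."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(y, resid):
    """Share of the variance in y accounted for by the fitted line."""
    my = sum(y) / len(y)
    ss_res = sum(r ** 2 for r in resid)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def loo_cv_mse(x, y):
    """Leave-one-out CV: refit without each point, then predict that point."""
    errors = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append((y[i] - (intercept + slope * x[i])) ** 2)
    return sum(errors) / len(errors)


# Hypothetical data, roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, y)
resid = residuals(x, y, slope, intercept)
in_sample_mse = sum(r ** 2 for r in resid) / len(resid)

print(slope, intercept)
print(resid)                      # look for patterns here
print(r_squared(y, resid))
print(in_sample_mse, loo_cv_mse(x, y))
```

The leave-one-out error is never smaller than the in-sample error for an OLS fit, which is one reason it gives a more honest picture of predictive performance than $R^2$.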

#### Linear Regression Tips
- When in doubt, take the log of predictor variables
- Avoid collinearity
    - using independent variables that are highly correlated with each other
    - tends to inflate regression coefficients, give the estimates high variance, and make them unstable
    - the variance inflation factor (VIF) helps to measure collinearity (1-2 means very little to no collinearity, while over 20 is extreme)
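
For intuition about the variance inflation factor: the VIF of a predictor is $1/(1 - R^2)$, where $R^2$ comes from regressing that predictor on the other predictors. With only two predictors this reduces to the squared correlation between them, which is the case sketched below. The function names and the simulated predictors are illustrative assumptions.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with two predictors, R^2 is their squared correlation."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


rng = random.Random(0)
x1 = [rng.gauss(0.0, 1.0) for _ in range(500)]
independent = [rng.gauss(0.0, 1.0) for _ in range(500)]   # unrelated to x1
near_copy = [a + 0.1 * rng.gauss(0.0, 1.0) for a in x1]   # x1 plus a little noise

vif_independent = vif_two_predictors(x1, independent)
vif_near_copy = vif_two_predictors(x1, near_copy)
print(vif_independent)  # close to 1: little to no collinearity
print(vif_near_copy)    # far above 20: extreme collinearity
```

With more than two predictors the same formula applies, but $R^2$ has to come from a multiple regression of each predictor on all the others.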

### Logistic Regression
- Good for predicting binary outcomes
    - think of the logistic S-curve: the bottom of the curve is the first outcome (value 0) and the top of the curve is the second outcome (value 1)
- Odds ratio
    - probability of an outcome is $p$
    - odds of the outcome are $p/(1-p)$
        - example: odds of 10 to 1, so odds = 10
        - solving for the probability: $10 = p/(1-p) \rightarrow 10 - 10p = p \rightarrow 10 = 11p \rightarrow p = 10/11 \approx 0.91$
        - if $p$ is very small, $1-p$ is about 1, so the odds are about equal to the probability, but only in this case
    - odds ratio = $\frac{p_1/(1-p_1)}{p_2/(1-p_2)}$
        - where $p_1$ and $p_2$ are the probabilities of the outcome under the two conditions (or groups) being compared
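
The conversions between probability and odds, and the S-curve itself, are short enough to write out. This is a minimal sketch; the function names are illustrative, and `logistic` is the standard logistic function applied to the log-odds.

```python
import math


def odds_from_prob(p):
    """Odds of an outcome with probability p."""
    return p / (1.0 - p)


def prob_from_odds(odds):
    """Invert odds = p / (1 - p) to recover p."""
    return odds / (1.0 + odds)


def odds_ratio(p1, p2):
    """Ratio of the odds implied by two probabilities."""
    return odds_from_prob(p1) / odds_from_prob(p2)


def logistic(z):
    """Logistic S-curve: maps log-odds z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


print(prob_from_odds(10))        # 10/11, about 0.91
print(odds_from_prob(0.01))      # about 0.0101: odds and probability nearly equal for small p
print(odds_ratio(0.5, 0.2))      # odds of 1 versus odds of 0.25
print(logistic(0.0))             # log-odds of 0 corresponds to p = 0.5
print(logistic(math.log(10)))    # log-odds of log(10) gives back about 0.91
```

The last line ties the two ideas together: feeding the log of the odds into the logistic function returns the probability those odds correspond to.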