### Bias-Variance Trade-Off:
- **Error due to Bias:**
    - The error due to bias is taken as the difference between the expected (or average) prediction of our model and the correct value which we are trying to predict. Of course you only have one model so talking about expected or average prediction values might seem a little strange. However, imagine you could repeat the whole model building process more than once: each time you gather new data and run a new analysis creating a new model. Due to randomness in the underlying data sets, the resulting models will have a range of predictions. Bias measures how far off in general these models' predictions are from the correct value.

- **Error due to Variance:** 
    - The error due to variance is taken as the variability of a model prediction for a given data point. Again, imagine you can repeat the entire model building process multiple times. The variance is how much the predictions for a given point vary between different realizations of the model.

- Essentially, bias is how removed a model's predictions are from correctness, while variance is the degree to which these predictions vary between model iterations.

-  Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data (underfitting).
- Variance is the algorithm's tendency to learn random things irrespective of the real signal by fitting highly flexible models that follow the error/noise in the data too closely (overfitting).
<img src="./images/Bias-Variance-1.png" style="width: 300px;"/>
<img src="./images/Bias-Variance-3.png" style="width: 300px;"/>
<img src="./images/Bias-Variance-2.png" style="width: 300px;"/>
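
The "repeat the whole model building process" thought experiment above can be simulated directly. The sketch below is only an illustration under assumed settings: the sine target, the noise level, the sample size, and the two toy models (a constant predictor and a 1-nearest-neighbor predictor) are all hypothetical choices, not part of the notes above.

```python
import math
import random
import statistics

def true_f(x):
    # Assumed "correct" function we are trying to learn
    return math.sin(2 * math.pi * x)

def sample_dataset(rng, n=30, noise_sd=0.3):
    # Gather a fresh noisy data set, as in the thought experiment
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_constant(xs, ys, x0):
    # Very rigid model: always predicts the mean of y (tends to underfit)
    return statistics.fmean(ys)

def predict_1nn(xs, ys, x0):
    # Very flexible model: copies the y of the nearest training point
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias_and_variance(predict, x0, n_repeats=2000, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_repeats):
        xs, ys = sample_dataset(rng)
        preds.append(predict(xs, ys, x0))
    bias = statistics.fmean(preds) - true_f(x0)   # average prediction minus truth
    variance = statistics.pvariance(preds)        # spread of predictions across models
    return bias, variance

x0 = 0.25  # the point at which we compare predictions
bias_const, var_const = bias_and_variance(predict_constant, x0)
bias_1nn, var_1nn = bias_and_variance(predict_1nn, x0)
print(f"constant model: bias={bias_const:.3f}, variance={var_const:.3f}")
print(f"1-NN model:     bias={bias_1nn:.3f}, variance={var_1nn:.3f}")
```

Under these assumptions the constant model should show the larger bias and the 1-NN model the larger variance.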



### My definition of the bias-variance trade-off

- Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data (underfitting). High bias suggests more assumptions about the form of the target function. Parametric algorithms tend to have high bias, which makes them fast to learn and easier to understand but generally less flexible. In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions built into the algorithm. For example, a high-bias model would fit a straight line to data that actually follow a curve. Examples of high-bias machine learning algorithms include Linear Regression, Linear Discriminant Analysis and Logistic Regression.

- Error due to variance is the degree to which predictions vary between model iterations. Low bias suggests fewer assumptions about the form of the target function. Examples of low-bias (and typically high-variance) machine learning algorithms include Decision Trees, k-Nearest Neighbors and Support Vector Machines. High-variance models capture too much of the noise along with the signal. Thus, each time such a model is trained on a new sample, the result can be very different and specific to that sample.






### True Error VS. Residuals

In statistics and optimization, errors and residuals are two closely related and easily confused measures of the deviation of an observed value of an element of a statistical sample from its "theoretical value". The error (or disturbance) of an observed value is the deviation of the observed value from the (unobservable) true value of a quantity of interest (for example, a population mean), and the residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest (for example, a sample mean). 
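
A small simulation can make the distinction concrete. In this sketch the population mean is known only because we generate the data ourselves; the value 170 and the spread of 10 are made-up numbers for illustration.

```python
import random
import statistics

rng = random.Random(42)
population_mean = 170.0  # assumed; unobservable with real data
sample = [rng.gauss(population_mean, 10.0) for _ in range(20)]
sample_mean = statistics.fmean(sample)

# Errors use the true (normally unknown) value
errors = [x - population_mean for x in sample]
# Residuals use the estimate computed from the sample itself
residuals = [x - sample_mean for x in sample]

sum_errors = sum(errors)        # generally not zero
sum_residuals = sum(residuals)  # zero up to floating-point rounding
print(f"sum of errors:    {sum_errors:.4f}")
print(f"sum of residuals: {sum_residuals:.4f}")
```

Residuals around a sample mean always sum to zero, while the errors do not have to. Each error differs from its residual by the same amount, `sample_mean - population_mean`.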

### Consistent VS. Unbiased Estimator

- An estimator is consistent if, as the sample size increases, the estimates (produced by the estimator) "converge" to the true value of the parameter being estimated. To be slightly more precise - consistency means that, as the sample size increases, the sampling distribution of the estimator becomes increasingly concentrated at the true parameter value.
- An estimator is unbiased if, on average, it hits the true parameter value. That is, the mean of the sampling distribution of the estimator is equal to the true parameter value.
- The two are not equivalent: Unbiasedness is a statement about the expected value of the sampling distribution of the estimator. Consistency is a statement about "where the sampling distribution of the estimator is going" as the sample size increases.
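
One way to see that the two properties differ is the variance estimator that divides by `n` instead of `n - 1`: it is biased for small samples, yet it still converges to the true value as the sample grows. The sketch below assumes normally distributed data with a true variance of 4; the sample sizes and repeat counts are arbitrary choices.

```python
import random
import statistics

TRUE_VARIANCE = 4.0  # data are drawn from a normal distribution with sd = 2

def variance_divide_by_n(sample):
    # Biased (but consistent) estimator
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def variance_divide_by_n_minus_1(sample):
    # Unbiased estimator
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

def average_estimate(estimator, n, n_repeats, rng):
    # Approximates the mean of the estimator's sampling distribution
    estimates = []
    for _ in range(n_repeats):
        sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.fmean(estimates)

rng = random.Random(0)
biased_small = average_estimate(variance_divide_by_n, n=5, n_repeats=20000, rng=rng)
unbiased_small = average_estimate(variance_divide_by_n_minus_1, n=5, n_repeats=20000, rng=rng)
biased_large = average_estimate(variance_divide_by_n, n=2000, n_repeats=50, rng=rng)

print(f"n=5,    divide by n:   {biased_small:.2f}")    # noticeably below 4
print(f"n=5,    divide by n-1: {unbiased_small:.2f}")  # close to 4 on average
print(f"n=2000, divide by n:   {biased_large:.2f}")    # close to 4 again
```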


### Standard Error VS Standard Deviation

- Which one to report depends on the message. If the message you want to carry is about the spread and variability of the data, then standard deviation is the metric to use. If you are interested in the precision of the means or in comparing and testing differences between means, then standard error is your metric.
- So standard deviation describes the variability of the individual observations while standard error shows the variability of the estimator.
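
A quick sketch of the difference, using simulated data (the mean of 100 and standard deviation of 15 are assumed values): the standard deviation stays roughly the same as the sample grows, while the standard error of the mean, computed as SD divided by the square root of n, shrinks.

```python
import math
import random
import statistics

rng = random.Random(1)

def sd_and_se(n):
    sample = [rng.gauss(100.0, 15.0) for _ in range(n)]
    sd = statistics.stdev(sample)  # variability of individual observations
    se = sd / math.sqrt(n)         # variability of the sample mean
    return sd, se

sd_small, se_small = sd_and_se(25)
sd_large, se_large = sd_and_se(2500)
print(f"n=25:   SD={sd_small:.2f}, SE={se_small:.2f}")
print(f"n=2500: SD={sd_large:.2f}, SE={se_large:.2f}")
```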

### Confidence Interval VS. Prediction Interval

- **Prediction Interval:** 
    - About 95% (or whatever level you decide to go with) of the individual y-values observed at a given X-value are expected to fall within this interval around the linear regression line.
- **Confidence Interval**: 
    - If we repeated the sampling and fitting many times, about 95% of the confidence intervals built this way would contain the true mean of y at that X-value, i.e. the true best-fit line for the population.
- Meaning, the confidence interval gives us the boundary of where the true population line is likely to be, so it only covers the uncertainty in the estimated line of best fit. The prediction interval also takes into account the true error term, that is, the random scatter of individual observations around the line that the model cannot explain. Because of this extra randomness, the prediction interval is always wider than the confidence interval.
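
The difference in width can be checked numerically. The sketch below fits a simple linear regression by hand on simulated data (the true line `y = 2 + 0.5x` and noise level are assumptions) and uses the standard textbook formulas for both intervals. To stay within the standard library it uses the normal quantile instead of the t quantile, which is only a reasonable approximation for a large sample like this one.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(7)
n = 200
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1.0) for x in xs]  # assumed true line plus noise

# Ordinary least squares for one predictor
x_bar, y_bar = fmean(xs), fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
intercept = y_bar - slope * x_bar

# Residual standard error (estimate of the spread of the true error term)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(r * r for r in residuals) / (n - 2))

z = NormalDist().inv_cdf(0.975)  # normal approximation to the t quantile

def intervals(x0):
    fit = intercept + slope * x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = z * s * math.sqrt(leverage)      # uncertainty in the fitted line only
    pi_half = z * s * math.sqrt(1 + leverage)  # adds the scatter of a new observation
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)

conf_int, pred_int = intervals(5.0)
print(f"95% confidence interval at x=5: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"95% prediction interval at x=5: ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

The extra `1 +` under the square root is the error term, and it is why the prediction interval is the wider of the two.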


### False Positive VS. False Negative
**False Negative**
- A false negative is a test result that indicates a person does not have a disease or condition when the person actually does have it, according to the National Institutes of Health (NIH). False negative test results can occur in many different medical tests, from tests for pregnancy, tuberculosis or Lyme disease to tests for the presence of drugs or alcohol in the body.
- A Type 2 error happens if we fail to reject the null when it is not true. This is a false negative, like an alarm that fails to sound when there is a fire.

**False Positive**
- Correspondingly, a false-positive test result indicates that a person has a specific disease or condition when the person actually does not have it. An example of a false positive is when a particular test designed to detect melanoma, a type of skin cancer, tests positive for the disease, even though the person does not have cancer.
- We commit a Type 1 error if we reject the null hypothesis when it is true. This is a false positive, like a fire alarm that rings when there's no fire.


**What's worse? It depends on the situation:**
- For example, in cancer screening, we would prefer false positives to false negatives. We would rather tell people they have cancer when they don't than tell people they don't have cancer when they do.
- For spam filtering, we would prefer false negatives. We would rather label a spam email as non-spam than label a good email as spam.
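
The "it depends" point can be sketched by counting both error types and weighting them with a cost. The labels and the cost values below are made up for illustration; real costs would come from the application.

```python
def count_errors(actual, predicted):
    # 1 = positive (has the condition / is spam), 0 = negative
    false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return false_positives, false_negatives

def total_cost(false_positives, false_negatives, cost_fp, cost_fn):
    return false_positives * cost_fp + false_negatives * cost_fn

# Hypothetical labels and predictions
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 0, 1, 1]

fp, fn = count_errors(actual, predicted)
print(fp, fn)  # 2 false positives, 1 false negative

# Cancer screening: a missed case (false negative) is assumed to be 10x worse
cancer_cost = total_cost(fp, fn, cost_fp=1, cost_fn=10)
# Spam filtering: a lost good email (false positive) is assumed to be 10x worse
spam_cost = total_cost(fp, fn, cost_fp=10, cost_fn=1)
print(cancer_cost, spam_cost)
```

The same predictions are penalized very differently depending on which error the application considers worse.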

### Specificity VS. Sensitivity

**What is a Sensitive Test?**
- The sensitivity of a test (also called the true positive rate) is defined as the proportion of people with the disease who will have a positive result. In other words, a highly sensitive test is one that correctly identifies patients with a disease. A test that is 100% sensitive will identify all patients who have the disease. It’s extremely rare that any clinical test is 100% sensitive. A test with 90% sensitivity will identify 90% of patients who have the disease, but will miss 10% of patients who have the disease.
- A **highly sensitive test** can be useful for ruling out a disease if a person has a negative result. For example, a negative result on a pap smear probably means the person does not have cervical cancer. The acronym widely used is SnNout (high Sensitivity, Negative result = rule out).
- If a person has a disease, how often will the test be positive (true positive rate)? 
- Put another way, if the test is highly sensitive and the test result is negative, you can be nearly certain that the person doesn't have the disease.

**What is a Specific Test?**
- The specificity of a test (also called the True Negative Rate) is the proportion of people without the disease who will have a negative result. In other words, the specificity of a test refers to how well a test identifies patients who do not have a disease. A test that has 100% specificity will identify 100% of patients who do not have the disease. A test that is 90% specific will correctly identify 90% of patients who do not have the disease, but will wrongly return a positive result for the other 10% of patients who do not have the disease.
- **Tests with a high specificity** (a high true negative rate) are most useful when the result is positive. A highly specific test can be useful for ruling in patients who have a certain disease.
- If a person does not have the disease how often will the test be negative (true negative rate)?
- In other terms, if the test result for a highly specific test is positive you can be nearly certain that they actually have the disease.
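
Both rates follow directly from the four confusion-matrix counts. The counts below are hypothetical and chosen to match the 90% examples above.

```python
def sensitivity(true_positives, false_negatives):
    # Of the people WITH the disease, what fraction tested positive?
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    # Of the people WITHOUT the disease, what fraction tested negative?
    return true_negatives / (true_negatives + false_positives)

# Hypothetical study: 100 people with the disease, 200 without
tp, fn = 90, 10
tn, fp = 180, 20

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}")  # 0.90: misses 10% of people with the disease
print(f"specificity = {spec:.2f}")  # 0.90: wrongly flags 10% of people without it
```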

### Accuracy VS Precision


- **Accuracy** refers to the closeness of a measured value to a standard or known value. For example, if in the lab you obtain a weight measurement of 3.2 kg for a given substance, but the actual or known weight is 10 kg, then your measurement is not accurate. In this case, your measurement is not close to the known value.


- **Precision** refers to the closeness of two or more measurements to each other. Using the example above, if you weigh a given substance five times, and get 3.2 kg each time, then your measurement is very precise. Precision is independent of accuracy. You can be very precise but inaccurate, as described above. You can also be accurate but imprecise.


- For example, if on average, your measurements for a given substance are close to the known value, but the measurements are far from each other, then you have accuracy without precision.


- A good analogy for understanding accuracy and precision is to imagine a basketball player shooting baskets. If the player shoots with accuracy, his aim will always take the ball close to or into the basket. If the player shoots with precision, his aim will always take the ball to the same location which may or may not be close to the basket. A good player will be both accurate and precise by shooting the ball the same way each time and each time making it in the basket. 
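
As a rough numeric sketch of the weighing example: accuracy can be summarized by how far the average measurement is from the known value, and precision by how tightly the measurements cluster. The second list of measurements is invented for illustration.

```python
import statistics

TRUE_WEIGHT_KG = 10.0

precise_but_inaccurate = [3.2, 3.2, 3.2, 3.2, 3.2]
accurate_but_imprecise = [7.0, 13.0, 9.0, 11.5, 9.5]  # hypothetical readings

def summarize(measurements):
    offset = abs(statistics.fmean(measurements) - TRUE_WEIGHT_KG)  # small = accurate
    spread = statistics.stdev(measurements)                        # small = precise
    return offset, spread

offset_p, spread_p = summarize(precise_but_inaccurate)
offset_a, spread_a = summarize(accurate_but_imprecise)
print(f"precise but inaccurate: offset={offset_p:.2f} kg, spread={spread_p:.2f} kg")
print(f"accurate but imprecise: offset={offset_a:.2f} kg, spread={spread_a:.2f} kg")
```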

### Frequentist Reasoning VS. Bayesian Reasoning

- **Frequentist Reasoning**
    - I can hear the phone beeping. I also have a mental model which helps me identify the area from which the sound is coming. Therefore, upon hearing the beep, I infer the area of my home I must search to locate the phone.


- **Bayesian Reasoning**
    - I can hear the phone beeping. Now, apart from a mental model which helps me identify the area from which the sound is coming, I also know the locations where I have misplaced the phone in the past. So, I combine my inferences using the beeps and my prior information about the locations I have misplaced the phone in the past to identify an area I must search to locate the phone.


- The essential difference between Bayesian and Frequentist statisticians is in how probability is used. Frequentists use probability only to model certain processes broadly described as "sampling." Bayesians use probability more widely to model both sampling and other kinds of uncertainty.
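
A coin-flip version of the phone example, as a minimal sketch: the flips play the role of the beeps (data), and a Beta prior plays the role of "where I usually lose the phone". The observed counts and the Beta(10, 10) prior are assumptions chosen for illustration.

```python
# Data: 7 heads in 10 flips (hypothetical)
heads, flips = 7, 10

# Frequentist estimate: use the data only (maximum likelihood)
mle = heads / flips

# Bayesian estimate: combine the data with a prior belief.
# Beta(10, 10) is an assumed prior saying "the coin is probably close to fair".
prior_a, prior_b = 10, 10
posterior_a = prior_a + heads
posterior_b = prior_b + (flips - heads)
posterior_mean = posterior_a / (posterior_a + posterior_b)

print(f"frequentist estimate of P(heads): {mle:.3f}")
print(f"Bayesian posterior mean:          {posterior_mean:.3f}")
```

The Bayesian estimate is pulled from the data-only value toward the prior's 0.5. With more flips the data would dominate and the two estimates would move closer together.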

### Precision VS. Recall
<img src="Confusion-Matrix.png">

Think of it like this:
- **PRECISION** is computed over the predicted positives.
- **RECALL** is computed over the actual positives.

- **Precision**: Out of all the patients that we predicted have cancer (or 1), what fraction actually have cancer?
    - TRUE POSITIVES/PREDICTED POSITIVES -> TRUE POSITIVES/(TRUE POSITIVES + FALSE POSITIVES)
    - Using the box, this is represented as row 1
- **Recall**: Out of all the patients that actually have cancer (or 1), what fraction did we correctly detect as having cancer?
    - TRUE POSITIVES/ACTUAL POSITIVES -> TRUE POSITIVES/(TRUE POSITIVES + FALSE NEGATIVES)
    - Using the box, this is represented as column 1
    
- The trade-off is that precision looks only at the predictions: how well did we predict? So even if we didn't flag ALL the actual positives, if the positive predictions we did make were correct, we have good precision.
- Recall, however, looks at all the actual values (that are 1) and checks how many of those we predicted correctly. It doesn't penalize over-predicting: if we predicted that all the values are 1, recall would be perfect because every actual 1 was predicted as 1, even though precision would suffer.
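
The two extremes described above can be checked with a few lines. The labels are made up: 4 actual positives out of 10, one "cautious" model that flags a single patient, and one model that flags everyone.

```python
def precision_recall(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # guard: no positive predictions
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # guard: no actual positives
    return precision, recall

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Cautious model: only one positive prediction, and it is correct
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Predict-everything model: every patient is flagged as positive
flag_all = [1] * 10

print(precision_recall(actual, cautious))  # (1.0, 0.25): high precision, low recall
print(precision_recall(actual, flag_all))  # (0.4, 1.0): low precision, perfect recall
```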

### Bagging VS. Boosting

- **Bagging**: Bagging attempts to reduce the chance of overfitting complex models.
    - It trains a large number of "strong" learners in parallel.
    - A strong learner is a model that's relatively unconstrained.
    - Bagging then combines all the strong learners together in order to "smooth out" their predictions.
- **Boosting**: Boosting attempts to improve the predictive flexibility of simple models.
    - It trains a large number of "weak" learners in sequence.
    - A weak learner is a constrained model (e.g. you could limit the max depth of each decision tree).
    - Each one in the sequence focuses on learning from the mistakes of the one before it.
    - Boosting then combines all the weak learners into a single strong learner.
- While bagging and boosting are both ensemble methods, they approach the problem from opposite directions. Bagging uses complex base models and tries to "smooth out" their predictions, while boosting uses simple base models and tries to "boost" their aggregate complexity.
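
The two directions can be sketched from scratch on a toy 1-D regression problem. This is only an illustration under assumptions, not how any particular library implements them: the "strong" learner is a 1-nearest-neighbor regressor, the "weak" learner is a one-split decision stump, boosting is done by fitting each stump to the residuals of the previous ones, and the data, learning rate, and model counts are arbitrary.

```python
import math
import random
import statistics

def fit_1nn(xs, ys):
    # "Strong" (unconstrained) learner: memorizes the training points
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def fit_stump(xs, ys):
    # "Weak" (constrained) learner: a single split with one mean per side
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm

def bagging(xs, ys, n_models, rng):
    # Train strong learners independently on bootstrap samples, then average
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: statistics.fmean([m(x) for m in models])

def boosting(xs, ys, n_rounds, learning_rate=0.5):
    # Train weak learners in sequence, each on the mistakes left so far
    base = statistics.fmean(ys)
    residuals = [y - base for y in ys]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

rng = random.Random(3)
xs = [rng.uniform(0, 6) for _ in range(40)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]  # assumed toy data

bagged = bagging(xs, ys, n_models=25, rng=rng)
boost_1 = boosting(xs, ys, n_rounds=1)
boost_20 = boosting(xs, ys, n_rounds=20)

print(f"bagged 1-NN prediction at x=3.0: {bagged(3.0):.3f}")
print(f"training MSE, 1 stump:   {mse(boost_1, xs, ys):.3f}")
print(f"training MSE, 20 stumps: {mse(boost_20, xs, ys):.3f}")
```

Adding stumps lowers the training error (the "boost" in aggregate complexity), while the bagged prediction is an average over many memorizing models (the "smoothing").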