StatQuest Notes

A. 2020/02/29 - StatQuest: The standard error (A+B_Tipping.ipynb) https://www.youtube.com/watch?v=XNgt7F6FqDU

Standard error is essentially, a standard deviation of the mean of bootstrap samples (many samples from the same population)
Sampling with replacement means if I have a sample of [1, 2, 8], my first sample could be [1, 2, 2] (repeated sampling of the same measurement), and my second sample can be [2, 2, 2]

B. 2020/02/29 - StatQuest: Confidence Intervals (A+B_Tipping.ipynb) https://www.youtube.com/watch?v=TqOeMYtOc1w

Sample mean is not the mean of the whole population, but we can use bootstrapping to determine reasonable values of the mean for the whole population
A 95% confidence interval is just an interval that covers 95% of the bootstrapped means
Two-tailed t-tests are used to determine whether there is a statistically significant difference between two population's means while one-tailed t-tests are used to determine whether a population's mean is statistically larger than or lesser than another population's mean

R^2 is the percentage of variation explained by the relationship between two variables
R^2 is easier to interpret because it's not obvious that R=0.7 is twice as good as R=0.5, but R^2=0.7 is what it looks like, 1.4 times as good as R^2=0.5

D. 2020/03/06 - What is a statistical distribution? https://www.youtube.com/watch?v=oI3hZJqXJuc

Advantage of curve is not limited by width of bins and can estimate the measurements

D. 2020/03/06 - Histograms, Clearly Explained https://www.youtube.com/watch?v=qBigTkBLU6g

Histograms are bins of measurements plotted to see how often a measurement has occurred

E. 2020/03/06 - The Normal Distribution, Clearly Explained!!! https://www.youtube.com/watch?v=rzFX5NWojp0

Normal distributions are always centered on the average value
The width of the curve is defined by the standard deviation
Normal curves are drawn such that 95% of the measurements fall between +/-2 standard deviations around the mean

F. 2020/03/11 - StatQuest: Principal Component Analysis (PCA), Step-by-Step https://www.youtube.com/watch?v=FgakZw6K1QQ&t=11s

Linear combination is simply a ratio between the two variables to make a PC
Singular Vector or Eigenvector is the 1 unit long vector consisting of x part units and y part units
The Loading Score is the scaled version of the linear combination
The Eigenvalue is the sum of squared distances for a PC
Square root of eigenvalue is the Singular Value for a PC
PC2 is the orthogonal version of PC1 if it's only 2D
Eigenvalues divided by sample size - 1 yields variation

Probabilities are the areas under a fixed distribution. pr(data|distribution)
Likeihoods are the y-axis values for fixed data points with distributions that can be moved. L(distribution|data)

A 0.05 threshold for p-values means that 5% of the experiments, where the only differences come from weird random things, will generate p-value smaller than 0.05.
Getting a small p-value when there is no difference is called a False Positive.
P-value quantify how confident we should be that one item is different from another item
Hypothesis testing means determining whether two items are different
The null hypothesis is that the two items are the same
While p-values help decide whether two items are different, it doesn't tell us how different they are

I. 2020/04/11 - StatQuest - Sample Size and Effective Sample Size, Clearly Explained

Technical replicates only count when we want to describe a method
If we can calculate the correlation, we can calculate the effective sample size
Effective sample size = number of samples / (1 + (number of samples - 1) * correlation

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.gitignore		.gitignore
A+B_Tipping_Errorbars+T-Test.ipynb		A+B_Tipping_Errorbars+T-Test.ipynb
README.md		README.md

Provide feedback