
Machine Learning Notes

Vincent Nguyen edited this page Oct 3, 2022 · 2 revisions

Machine Learning

Machine learning is a branch of artificial intelligence that imitates the way humans learn in order to find and predict patterns in data.

Supervised learning is the approach in machine learning of training on labeled datasets:

  • Classification
  • Regression (prediction)

Unsupervised learning is the approach in machine learning of analyzing and clustering unlabeled data:

  • Clustering
  • Association
  • Dimensionality reduction

Data Analysis Points & Terms

Cross Validation is the technique of repeatedly training on part of the data and testing on the held-out rest, used to compare different machine learning models and see which is best for the situation
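A minimal sketch of k-fold cross validation, comparing two toy "models" (a mean predictor and a median predictor) invented purely for illustration:

```python
# k-fold cross validation: split the data into k folds, train on k-1 folds,
# test on the remaining fold, and average the test errors across folds.
# The two "models" below are hypothetical stand-ins for real learners.

def kfold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

def mean_model(train_y):
    """Toy model: always predict the training mean."""
    m = sum(train_y) / len(train_y)
    return lambda x: m

def median_model(train_y):
    """Toy model: always predict the training median."""
    s = sorted(train_y)
    return lambda x: s[len(s) // 2]

def cv_error(y, model_builder, k=5):
    """Average squared test error across the k folds."""
    errors = []
    for train, test in kfold_indices(len(y), k):
        predict = model_builder([y[i] for i in train])
        errors += [(predict(None) - y[i]) ** 2 for i in test]
    return sum(errors) / len(errors)

# Data with one outlier: cross validation shows the median predictor
# generalizes better than the mean predictor here.
y = [2.0, 2.1, 1.9, 2.2, 10.0, 2.0, 1.8, 2.1, 2.0, 1.9]
print(cv_error(y, mean_model), cv_error(y, median_model))
```

The model with the lower cross-validated error is the one you would pick for this data.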

Confusion Matrix is a table of predicted vs. actual outcomes used to determine the performance of a classification algorithm
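A minimal sketch of building a 2x2 confusion matrix for a binary classifier; the label and prediction lists are made-up illustration data:

```python
# Tally the four cells of a binary confusion matrix:
# TP = actual positive predicted positive, TN = actual negative predicted negative,
# FP = actual negative predicted positive, FN = actual positive predicted negative.

def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_matrix(actual, predicted))  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```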

Data Points

Sensitivity is the data point that tells us the percent of the actual positives that were correctly identified.

Specificity is the data point that tells us the percent of the actual negatives that were correctly identified.

Sensitivity & Specificity

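These two can be sketched directly from the confusion matrix counts; the counts below are made-up illustration values:

```python
# Sensitivity (true positive rate) and specificity (true negative rate)
# computed from confusion matrix cells.

def sensitivity(tp, fn):
    """Fraction of actual positives the model correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of actual negatives the model correctly identified."""
    return tn / (tn + fp)

tp, fn, tn, fp = 90, 10, 80, 20
print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.8
```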

Bias is the behavior in which a machine learning method cannot capture the true relationship in the data, producing "biased" results.

Variance is the difference in a model's fit between datasets (commonly the training and actual datasets)

Odds is the ratio of something happening to something not happening. Probability is the ratio of something happening to everything that could happen.

odds = probability / (1-probability)

Finding the log(odds) helps to make the comparison of odds more symmetrical
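A minimal sketch of odds and log(odds) from a probability, following odds = probability / (1 - probability); the probabilities are arbitrary examples:

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds."""
    return math.log(odds(p))

# Raw odds are asymmetric: 0.75 gives odds 3.0 while 0.25 gives odds 1/3,
# yet the two cases are mirror images. Their log(odds) are symmetric
# around 0, which is why log(odds) makes comparisons easier.
print(odds(0.75), odds(0.25))          # 3.0 vs ~0.333
print(log_odds(0.75), log_odds(0.25))  # equal magnitude, opposite sign
```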

Linear Regression (Least Squares)

Formula for Linear Fit Accuracy (sum of squared residuals)

LESS IS BETTER.

The sum of squared residuals is the method/formula that determines the accuracy of the estimated model. It also shows the discrepancy between the model and the data.

For a horizontal line y = b:

(b - y1)² + (b - y2)² + (b - y3)² + ... = sum of squared residuals

For exact values with y = mx + b:

((m*x1 + b) - y1)² + ((m*x2 + b) - y2)² + ((m*x3 + b) - y3)² + ... = sum of squared residuals
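The formula above can be sketched directly; the data points and line parameters are made-up illustration values:

```python
# Sum of squared residuals for a line y = m*x + b against data points:
# square each vertical distance from the line to a point, then sum them.

def ssr(xs, ys, m, b):
    """Sum of squared residuals between the line m*x + b and the data."""
    return sum(((m * x + b) - y) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]

# LESS IS BETTER: a line close to the data gives a small SSR,
# a poor line gives a larger one.
print(ssr(xs, ys, 2.0, 0.0))  # close fit, small SSR
print(ssr(xs, ys, 0.0, 4.0))  # horizontal line, larger SSR
```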

  • Using the average data value and finding the sum of the squared residuals from the average line, we can find SS(mean), the "sum of squares around the mean":

    SS(mean) = (data - mean)²
    Var(mean) = SS(mean) / n

    Variation around the mean = (data - mean)² / n, i.e. variance is the average of the sum of squares.

  • R² tells us the amount of variation that can be explained by a data category on the graph:

    R² = (Var(mean) - Var(fit)) / Var(mean)
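The R² formula above can be sketched end to end; the data points and the fitted line are made-up illustration values:

```python
# R² = (Var(mean) - Var(fit)) / Var(mean): the fraction of the variation
# around the mean that the fitted line explains.

def var_around(ys, predictions):
    """Average squared distance from the data to the given predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Variation around the mean line:
mean_y = sum(ys) / len(ys)
var_mean = var_around(ys, [mean_y] * len(ys))

# Variation around a fitted line; y = 2x fits this data exactly,
# so Var(fit) is 0 and R² comes out to 1.
fit = [2.0 * x for x in xs]
var_fit = var_around(ys, fit)

r_squared = (var_mean - var_fit) / var_mean
print(r_squared)  # 1.0 — the line explains all of the variation
```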