This repository includes data science optimisation concepts
-
1- Loss function: Also known as the objective or cost function, it quantifies the disparity between predicted and actual values. We denote the loss function by $L(\hat{f},f)$. For most optimization algorithms, it is desirable to have a loss function that is globally continuous and differentiable. Two common loss functions are:
- Mean absolute error, also known as the absolute loss ($\ell_1$ loss), i.e., $L(\hat{f},f) = |\hat{f} - f|$.
- Mean squared error, also known as the squared error loss ($\ell_2$ loss), i.e., $L(\hat{f},f) = \Vert \hat{f} - f \Vert_2^2$.

Mean absolute error is not differentiable at the origin, but it is less sensitive to outliers. The squared error cost function, together with linear regression, always produces a bowl-shaped (convex) cost surface.
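As a minimal sketch of the two losses (the function names `mae` and `mse` are illustrative, not from a specific library), note how a single outlier inflates the squared loss far more than the absolute loss:

```python
import numpy as np

def mae(f_hat, f):
    """Mean absolute error (l1 loss): mean of |f_hat - f|."""
    return np.mean(np.abs(f_hat - f))

def mse(f_hat, f):
    """Mean squared error (l2 loss): mean of (f_hat - f)**2."""
    return np.mean((f_hat - f) ** 2)

f = np.array([1.0, 2.0, 3.0, 100.0])      # last value is an outlier
f_hat = np.array([1.0, 2.0, 3.0, 4.0])

print(mae(f_hat, f))  # prints 24.0
print(mse(f_hat, f))  # prints 2304.0
```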
-
2- Optimization Algorithm: A method or strategy to minimize the objective function. Gradient descent (GD) is an optimisation algorithm to find the values of the parameters (or coefficients) that minimize the loss, via the iterative update $\omega \leftarrow \omega - \alpha \frac{\partial}{\partial \omega} L(\widehat{f},f)$.
Here:
- $\alpha$ is the learning rate that controls the step size.
- $\frac{\partial}{\partial \omega} L(\widehat{f},f)$ is the adjustment term.

Each update takes a step in the direction of steepest descent.
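The update rule can be sketched in a few lines. Below is a minimal, illustrative example (not a library API) that minimizes the 1-D quadratic loss $L(\omega) = (\omega - 3)^2$, whose derivative is $2(\omega - 3)$:

```python
# Minimal gradient descent sketch on L(w) = (w - 3)**2, dL/dw = 2*(w - 3).
# All names here are illustrative.
def gradient_descent(grad, w0, alpha, n_steps):
    w = w0
    for _ in range(n_steps):
        w = w - alpha * grad(w)   # step in the direction of steepest descent
    return w

grad = lambda w: 2 * (w - 3)
w_min = gradient_descent(grad, w0=0.0, alpha=0.1, n_steps=100)
print(round(w_min, 4))  # prints 3.0
```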
When you are updating multiple coefficients, e.g., the slope and the intercept in linear regression, update them simultaneously, using gradients computed at the current parameter values rather than partially updated ones.
If the learning rate $\alpha$ is too small, convergence is slow; if it is too large, the updates can overshoot the minimum and the algorithm may fail to converge, or even diverge.

Fixed learning rate: Near a local minimum, derivatives become smaller, and the update steps become smaller. So the GD algorithm can reach the minimum without decreasing the learning rate, even with a fixed $\alpha$.
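This shrinking-step behaviour is easy to observe. A small sketch (illustrative names, same quadratic loss $L(\omega) = (\omega - 3)^2$ as above) records the step size $\alpha \cdot |\mathrm{d}L/\mathrm{d}\omega|$ at each iteration with a fixed learning rate:

```python
# With a fixed learning rate, the step alpha * |grad| shrinks automatically
# as w approaches the minimum of L(w) = (w - 3)**2.
alpha, w = 0.1, 0.0
grad = lambda w: 2 * (w - 3)

steps = []
for _ in range(5):
    step = alpha * grad(w)
    w = w - step
    steps.append(abs(step))

print(steps)  # each step is smaller than the last
```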
Batch gradient descent: A challenge is that it can be computationally expensive, because every single update uses the entire training set to compute the gradient.
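A minimal batch gradient descent sketch for linear regression (the tiny dataset and variable names are made up for illustration) shows why: both gradients average over the full training set on every iteration, which scales with the dataset size.

```python
import numpy as np

# Tiny dataset generated from y = 2x + 1.
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b, alpha = 0.0, 0.0, 0.05
for _ in range(5000):
    err = w * X + b - y
    grad_w = 2 * np.mean(err * X)   # full-batch gradient: touches every sample
    grad_b = 2 * np.mean(err)
    w, b = w - alpha * grad_w, b - alpha * grad_b   # simultaneous update

print(round(w, 3), round(b, 3))  # prints 2.0 1.0
```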