**Popular Loss Functions for Classification:**
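* Cross Entropy Loss (torch.nn.CrossEntropyLoss): Used for multi-class classification tasks. It applies a softmax to the raw output scores (logits) $x$ and takes the negative log-probability of the true class. Note that torch.nn.CrossEntropyLoss expects raw logits and applies the log-softmax internally, so the model should not add its own softmax layer before it.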

> $ \text{CE}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_{j} \exp(x[j])}\right) $
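
As a rough illustration of the formula above, here is a minimal pure-Python sketch for a single sample. The function name `cross_entropy` and the example logits are made up for this sketch; it is not how torch.nn.CrossEntropyLoss is implemented, only the same arithmetic written out.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

logits = [2.0, 1.0, 0.1]                  # hypothetical raw scores for 3 classes
loss_correct = cross_entropy(logits, 0)   # true class has the highest score
loss_wrong = cross_entropy(logits, 2)     # true class has the lowest score
print(loss_correct, loss_wrong)
```

The loss is small when the true class already has the largest logit and grows as the true class's score falls behind the others.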


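* Binary Cross Entropy Loss (torch.nn.BCELoss): For binary classification tasks where each sample belongs to one of two classes. It is the two-class special case of cross-entropy loss. Here $x$ is the predicted probability of the positive class (for example, a sigmoid output) and $y \in \{0, 1\}$ is the label.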
  
> $\text{BCE}(x, y) = -y \cdot \log(x) - (1 - y) \cdot \log(1 - x)$
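
The sketch below writes the BCE formula out in plain Python and checks the "special case" claim numerically: for a single logit $z$, BCE on $\sigma(z)$ should match a two-class cross-entropy on the logits $[0, z]$. All function names and the clamping constant `eps` are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(p, y, eps=1e-12):
    """BCE for one sample; p is a probability in [0, 1], y is 0 or 1."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def two_class_cross_entropy(logits, target):
    """Softmax cross-entropy restricted to two logits."""
    log_sum_exp = math.log(sum(math.exp(z) for z in logits))
    return log_sum_exp - logits[target]

z = 1.5  # hypothetical logit for the positive class
bce_pos = binary_cross_entropy(sigmoid(z), 1)
ce_pos = two_class_cross_entropy([0.0, z], 1)
bce_neg = binary_cross_entropy(sigmoid(z), 0)
ce_neg = two_class_cross_entropy([0.0, z], 0)
print(bce_pos, ce_pos, bce_neg, ce_neg)
```

Both pairs agree up to floating-point error, because a softmax over $[0, z]$ assigns probability $\sigma(z)$ to the second class.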

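* Focal Loss (often used with Imbalanced Datasets): Down-weights well-classified examples by the factor $(1 - p_t)^\gamma$, so training focuses on hard-to-classify examples. This helps with class imbalance; with $\gamma = 0$ it reduces to the ordinary cross-entropy loss.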

![image.png](attachment:image.png)

> $ \text{FL}(p_t) = -(1 - p_t)^\gamma \cdot \log(p_t)$, where $p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases}$, $p$ is the predicted probability of the positive class, and $\gamma \geq 0$ is the focusing parameter.
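
To see the effect of the $(1 - p_t)^\gamma$ factor, the following sketch evaluates the formula above for one easy and one hard example. It covers only the unweighted binary form given here; the function name, the default `gamma=2.0`, and the example probabilities are assumptions chosen for illustration.

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss for one sample; p is the predicted P(y = 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy = focal_loss(0.9, 1)        # confident and correct: p_t = 0.9
hard = focal_loss(0.1, 1)        # confident and wrong:   p_t = 0.1
easy_ce = focal_loss(0.9, 1, gamma=0.0)  # gamma = 0 gives plain cross-entropy
print(easy, hard, easy_ce)
```

The easy example's loss shrinks by a factor of $(1 - 0.9)^2 = 0.01$ relative to cross-entropy, while the hard example keeps most of its loss.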


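* Hinge Loss (SVM Loss): Penalizes predictions that fall on the wrong side of the decision boundary or inside the margin, which pushes the classifier toward a large margin between classes. Typically used in Support Vector Machines (SVMs). The binary form is shown here; multi-class variants extend the same idea.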
> For an intended output $t= \pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as
> $$\ell(y)=\max (0,1-t \cdot y)$$
> Note that $y$ should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in linear SVMs, $y=\mathbf{w} \cdot \mathbf{x}+b$, where $(\mathbf{w}, b)$ are the parameters of the hyperplane and $\mathbf{x}$ is the input variable(s).
> When $t$ and $y$ have the same sign (meaning $y$ predicts the right class) and $|y| \geq 1$, the hinge loss $\ell(y)=0$. When they have opposite signs, $\ell(y)$ increases linearly with $|y|$. When they have the same sign but $|y|<1$, the loss is still positive, $\ell(y) = 1 - |y|$: the prediction is correct, but not by enough margin.
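
The three cases described above can be checked with a one-line implementation. This is a minimal sketch; the name `hinge_loss` and the sample scores are illustrative only.

```python
def hinge_loss(score, target):
    """Binary hinge loss; target is +1 or -1, score is the raw classifier output."""
    return max(0.0, 1.0 - target * score)

outside_margin = hinge_loss(2.0, 1)    # correct, |y| >= 1
inside_margin = hinge_loss(0.5, 1)     # correct, but |y| < 1
wrong_side = hinge_loss(-1.0, 1)       # wrong sign
print(outside_margin, inside_margin, wrong_side)
```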




**Popular Loss Functions for Regression:**
* Mean Squared Error (MSE): The standard loss function used in regression tasks that penalizes larger errors more than smaller ones.

* Mean Absolute Error (MAE): More robust to outliers than MSE because it averages the absolute (rather than squared) differences between predictions and actual values.

* Huber Loss (Smooth L1 Loss): A combination of MSE and MAE that is quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than MSE.

* Quantile Loss: Penalizes over- and under-predictions asymmetrically so that the model estimates a chosen conditional quantile. Fitting several quantiles expresses the uncertainty of the prediction.

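* LogCosh Loss: The logarithm of the hyperbolic cosine of the prediction error. It behaves like MSE for small errors and like MAE for large ones, giving a smooth, twice-differentiable alternative to Huber loss.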
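
The regression losses above can be compared on a small example with one outlier. The sketch below uses plain Python lists; the function names, the default `delta=1.0` and `q=0.5`, and the data are all assumptions made for illustration rather than any library's API.

```python
import math

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        # quadratic inside [-delta, delta], linear outside
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total / len(y_true)

def quantile_loss(y_true, y_pred, q=0.5):
    # under-prediction (t > p) costs q per unit, over-prediction costs (1 - q)
    return sum(max(q * (t - p), (q - 1) * (t - p))
               for t, p in zip(y_true, y_pred)) / len(y_true)

def log_cosh(y_true, y_pred):
    total = 0.0
    for t, p in zip(y_true, y_pred):
        r = abs(t - p)
        # log(cosh(r)) rewritten to avoid overflow for large r
        total += r + math.log1p(math.exp(-2.0 * r)) - math.log(2.0)
    return total / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]   # last prediction is off by 10 (an outlier)

for name, fn in [("MSE", mse), ("MAE", mae), ("Huber", huber),
                 ("Quantile(0.5)", quantile_loss), ("LogCosh", log_cosh)]:
    print(name, fn(y_true, y_pred))
```

The single outlier dominates MSE because its error is squared, while MAE, Huber, and LogCosh grow only linearly with it. With `q=0.5` the quantile loss is half the MAE; other values of `q` tilt the penalty toward over- or under-prediction.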