### Common Mathematical Functions in Machine Learning

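Mathematical functions are essential in machine learning, especially in neural networks and deep learning, where they serve as activation functions that introduce non-linearity, as kernels, and as distance or similarity measures, and where their shape affects how easily a model can be optimized. The sections below describe the most commonly used functions, with their formulas, ranges, and typical use cases.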

---

### 1. **Linear Function**

- **Mathematical Formula**:  
  $$
  f(x) = ax + b
  $$
- **Range**: $(-\infty, \infty)$
- **Purpose**: A linear function describes a straight-line relationship between the input $x$ and the output $f(x)$. It is used to model relationships where the dependent variable changes linearly with the independent variable.
- **Use Cases**:  
  - **Linear Regression**: To model the relationship between a dependent and independent variable with a straight-line fit.
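
As a rough illustration of the straight-line fit mentioned above, the sketch below estimates $a$ and $b$ by ordinary least squares. The helper name `fit_line` and the toy data are assumptions made for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope a and intercept b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b = mean_y - a * mean_x
    return a, b


# Toy data generated from f(x) = 2x + 1 (illustrative values only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
```

On noise-free data such as this, the recovered slope and intercept should match the generating line.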

---

### 2. **Sigmoid Function (Logistic Function)**

- **Mathematical Formula**:  
  $$
  S(x) = \frac{1}{1 + e^{-x}}
  $$
- **Range**: $(0, 1)$
- **Purpose**: The sigmoid function outputs a value between 0 and 1. It is widely used in classification problems to model probabilities.
- **Use Cases**:  
  - **Binary Classification**: In algorithms like **Logistic Regression** to predict the probability of a class.
  - **Neural Networks**: As an activation function in the output layer for binary classification tasks.
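
A minimal sketch of the sigmoid in plain Python follows. The two-branch form is one common way to avoid overflow in `exp` for inputs of large magnitude; it is an implementation choice assumed here, not part of the definition.

```python
import math


def sigmoid(x):
    """Logistic sigmoid S(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        # exp(-x) <= 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) < 1, so this algebraically equal form is safe.
    z = math.exp(x)
    return z / (1.0 + z)
```

Mathematically the output never reaches 0 or 1, but in floating point it rounds to those values for inputs of very large magnitude.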

---

### 3. **Hyperbolic Tangent (tanh) Function**

- **Mathematical Formula**:  
  $$
  \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
  $$
- **Range**: $(-1, 1)$
- **Purpose**: The **tanh** function is a rescaled and shifted sigmoid, $\tanh(x) = 2S(2x) - 1$, with outputs between -1 and 1. Because its outputs are centered on zero, the activations passed to the next layer have roughly zero mean, which often makes gradient-based training more efficient than with sigmoid.
- **Use Cases**:  
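  - **Hidden Layers in Neural Networks**: Often preferred over sigmoid in hidden layers because zero-centered outputs tend to speed up convergence. Like sigmoid, it still saturates for large $|x|$, so vanishing gradients remain possible.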
  - **Non-linear transformation** of data for better representation.
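
The relationship between tanh and sigmoid can be checked numerically. The sketch below assumes the identity $\tanh(x) = 2S(2x) - 1$ and compares it with `math.tanh` at a few sample points; the function names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    # tanh expressed as a rescaled, shifted sigmoid.
    return 2.0 * sigmoid(2.0 * x) - 1.0


# Largest absolute difference from the library tanh over a few sample points.
samples = (-2.0, -0.5, 0.0, 0.5, 2.0)
max_diff = max(abs(tanh_via_sigmoid(x) - math.tanh(x)) for x in samples)
```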

---

### 4. **Rectified Linear Unit (ReLU)**

- **Mathematical Formula**:  
  $$
  f(x) = \max(0, x)
  $$
- **Range**: $[0, \infty)$
- **Purpose**: **ReLU** outputs the input directly if it is positive; otherwise, it outputs zero. It is cheap to compute, and because its gradient is exactly 1 for positive inputs, it mitigates the vanishing gradient problem that saturating activations such as sigmoid and tanh can cause during backpropagation in deep neural networks.
- **Use Cases**:  
  - **Hidden Layers in Neural Networks**: ReLU is the default activation function for many deep learning models due to its efficiency and simplicity.
  - **Convolutional Neural Networks (CNNs)** for feature extraction.
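
A small sketch of ReLU and its gradient is given below. ReLU is not differentiable at $x = 0$; using a gradient of 0 there is a common convention and is an assumption of this example.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU; the value at x == 0 is set to 0 by convention."""
    return 1.0 if x > 0 else 0.0
```

The constant gradient of 1 for positive inputs is what keeps the backpropagated signal from shrinking layer after layer.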

---

### 5. **Leaky ReLU**

- **Mathematical Formula**:  
  $$
  f(x) = \begin{cases} 
  x & \text{if } x > 0 \\
  \alpha x & \text{if } x \leq 0
  \end{cases}
  $$
  where $\alpha$ is a small constant (e.g., 0.01).
- **Range**: $(-\infty, \infty)$
- **Purpose**: Leaky ReLU keeps a small, non-zero slope $\alpha$ for negative inputs, which mitigates the "dying ReLU" problem, where a neuron that outputs zero for all inputs receives zero gradient and stops learning.
- **Use Cases**:  
  - **Hidden Layers in Neural Networks**: Especially useful in deep networks where ReLU might cause certain neurons to die during training.
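
The piecewise definition translates directly into code. This sketch uses the example value $\alpha = 0.01$ from above as a default; the function names are illustrative.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for x > 0, small slope alpha otherwise."""
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.01):
    """Gradient is 1 for x > 0 and alpha otherwise, so it is never zero."""
    return 1.0 if x > 0 else alpha
```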
  
---

### 6. **Softmax Function**

- **Mathematical Formula**:  
  $$
  \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}
  $$
- **Range**: $(0, 1)$ for each output, and the outputs sum to 1.
- **Purpose**: Softmax is used in **multi-class classification** to convert raw outputs (logits) from a model into a probability distribution. It ensures that the sum of the probabilities of all classes equals 1.
- **Use Cases**:  
  - **Output Layer in Neural Networks** for multi-class classification tasks.
  - **Categorical Cross-Entropy Loss**: Softmax outputs are typically paired with this loss, which compares the predicted distribution with the true class labels during training.
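
The sketch below shows one way to compute softmax and the cross-entropy of its output for a single example. Subtracting the maximum logit before exponentiating is a standard stability trick that leaves the result unchanged; the logits and the helper names are assumptions for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    m = max(logits)
    # Shifting by the max keeps every exponent <= 0, avoiding overflow.
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Categorical cross-entropy for one example with a known true class."""
    return -math.log(probs[target_index])


# Hypothetical logits for a three-class problem; class 0 is taken as true.
probs = softmax([2.0, 1.0, 0.1])
loss = cross_entropy(probs, 0)
```

The larger a logit, the larger its share of the probability mass, and the loss shrinks as the probability assigned to the true class approaches 1.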

---

### 7. **Softplus Function**

- **Mathematical Formula**:  
  $$
  f(x) = \ln(1 + e^x)
  $$
- **Range**: $(0, \infty)$
- **Purpose**: The **Softplus** function is a smooth approximation of ReLU. It replaces ReLU's kink at $x = 0$ with a smooth curve whose derivative is the sigmoid function, so the gradient is defined and non-zero everywhere.
- **Use Cases**:  
  - **Activation Function** in deep networks when a smooth version of ReLU is needed.
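
A direct translation of $\ln(1 + e^x)$ overflows for large $x$. The sketch below uses the algebraically equivalent form $\max(x, 0) + \ln(1 + e^{-|x|})$, which is one common way to keep the computation stable; this rewrite is an implementation choice assumed here.

```python
import math


def softplus(x):
    """Softplus ln(1 + e^x), computed as max(x, 0) + log1p(exp(-|x|))."""
    # exp(-|x|) is always <= 1, so it cannot overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

For large positive inputs the result is close to $x$, and for large negative inputs it is close to 0, which is how softplus tracks ReLU away from the origin.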

---

### 8. **Swish Function**

- **Mathematical Formula**:  
  $$
  f(x) = x \cdot \sigma(x)
  $$
  where $\sigma(x)$ is the sigmoid function.
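- **Range**: approximately $[-0.278, \infty)$; the minimum is reached near $x \approx -1.278$.
- **Purpose**: The **Swish** function is a newer activation function that combines features of ReLU and sigmoid. It is smooth and non-monotonic, close to ReLU for large positive inputs and slightly negative for moderately negative ones. It has been reported to match or outperform ReLU on some deep networks.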
- **Use Cases**:  
  - **Hidden Layers in Neural Networks** for improved model performance.
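
Unlike ReLU, Swish dips below zero before returning toward it for very negative inputs. The sketch below locates that minimum with a coarse grid search; the grid bounds and step size are arbitrary choices for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def swish(x):
    """Swish: the input scaled by its own sigmoid."""
    return x * sigmoid(x)


# Coarse grid search on [-5, 0] with step 0.001 for the minimizing input.
grid = [-5.0 + 0.001 * i for i in range(5001)]
x_min = min(grid, key=swish)
y_min = swish(x_min)
```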
  
---

### 9. **Exponential Linear Unit (ELU)**

- **Mathematical Formula**:  
  $$
  f(x) = \begin{cases} 
  x & \text{if } x > 0 \\
  \alpha (e^x - 1) & \text{if } x \leq 0
  \end{cases}
  $$
  where $\alpha$ is a constant, typically 1.
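- **Range**: $(-\alpha, \infty)$ for $\alpha > 0$
- **Purpose**: ELU is similar to ReLU but uses an exponential curve for negative values, which saturates smoothly at $-\alpha$. Negative inputs therefore still receive a non-zero gradient, which helps avoid the dying neuron problem and can lead to faster convergence.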
- **Use Cases**:  
  - **Hidden Layers in Neural Networks** for deep learning architectures where ReLU's zero gradient for negative inputs might cause neurons to stop learning.
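
A minimal sketch of ELU follows, using the typical value $\alpha = 1$ as the default. `math.expm1` computes $e^x - 1$ accurately for small $|x|$; using it is an implementation detail assumed here.

```python
import math


def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * math.expm1(x)
```

For very negative inputs the output approaches $-\alpha$ and never goes below it.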

---

### 10. **Gaussian Function**

- **Mathematical Formula**:  
  $$
  f(x) = e^{-\frac{x^2}{2\sigma^2}}
  $$
  where $\sigma > 0$ is the width (standard deviation) parameter.
- **Range**: $(0, 1]$
- **Purpose**: The **Gaussian function** is commonly used as a kernel in machine learning algorithms and for smoothing. With a normalizing constant, it is also the density of the normal distribution.
- **Use Cases**:  
  - **Gaussian Naive Bayes**: In classification tasks when features follow a Gaussian distribution.
  - **Kernel Functions** in **Support Vector Machines (SVM)** for non-linear classification.
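
As a kernel, the Gaussian is applied to the Euclidean distance between two vectors, giving a similarity in $(0, 1]$. The sketch below assumes that form; the function names and the default width are illustrative.

```python
import math


def gaussian(x, sigma=1.0):
    """Unnormalized Gaussian exp(-x^2 / (2 sigma^2))."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel: the Gaussian of the distance between u and v."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-dist_sq / (2.0 * sigma * sigma))
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart, at a rate controlled by the width parameter.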

---

### 11. **Hard Sigmoid**

- **Mathematical Formula**:  
  $$
  f(x) = \min(\max(0, 0.2x + 0.5), 1)
  $$
- **Range**: $[0, 1]$
- **Purpose**: The **Hard Sigmoid** function is a computationally efficient approximation of the sigmoid function, providing a simpler calculation.
- **Use Cases**:  
  - **Efficient Neural Networks** where computational speed is crucial, and exact sigmoid function is not necessary.
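
The formula above needs only a multiply, an add, and two comparisons. The sketch below implements it and includes the exact sigmoid for comparison; other piecewise-linear variants with different slopes also exist, and the slope 0.2 is taken from the formula given here.

```python
import math


def hard_sigmoid(x):
    """Piecewise-linear sigmoid approximation, clipped to [0, 1]."""
    return min(max(0.0, 0.2 * x + 0.5), 1.0)


def sigmoid(x):
    """Exact sigmoid, for comparison only."""
    return 1.0 / (1.0 + math.exp(-x))
```

Unlike the exact sigmoid, the approximation reaches 0 and 1 exactly once the input is far enough from zero.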

---

### 12. **Cosine Function**

- **Mathematical Formula**:  
  $$
  f(x) = \cos(x)
  $$
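- **Range**: $[-1, 1]$
- **Purpose**: The **cosine** of the angle between two vectors is used as a **similarity measure**: $\cos\theta = \frac{u \cdot v}{\|u\| \, \|v\|}$. It depends only on the directions of the vectors, not on their magnitudes.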
- **Use Cases**:  
  - **Cosine Similarity**: In text mining and Natural Language Processing (NLP) to measure the similarity between two vectors (e.g., document vectors).
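
Cosine similarity is the cosine of the angle between two vectors, computed as their dot product divided by the product of their norms. The sketch below assumes equal-length, non-zero vectors; the function name is illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (both assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1, regardless of their lengths.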

---

### Summary of Common Functions in Machine Learning

- **Linear Functions**: Used in simple regression tasks to model linear relationships.
- **Activation Functions**: Sigmoid, Tanh, ReLU, Leaky ReLU, Softmax, Softplus, Swish, and ELU are non-linear functions used in neural networks to introduce non-linearity and enable better learning of complex data representations.
- **Distance and Similarity Functions**: The cosine function underlies similarity metrics such as cosine similarity, which compares the directions of two vectors.
- **Smooth Activation Functions**: Functions like Softplus, Swish, and ELU provide smooth transitions, helping with optimization in deep learning.

These functions play vital roles in different machine learning models, helping them learn from data more effectively and model complex patterns in the data.