# Activation Functions

An activation function in deep learning is a mathematical function that introduces non-linearity to the output of a neural network layer. Without it, a stack of linear layers would collapse into a single linear transformation. The activation function determines the output of a neuron, allowing the network to learn complex patterns and make non-linear decisions. Here are some widely used activation functions, along with their main points and a small code snippet for each:

### 1. Sigmoid Function:
   - Outputs values between 0 and 1, which can be interpreted as probabilities.
   - Smooth gradient, but it saturates for inputs of large magnitude, which makes it prone to the vanishing gradient problem.
   - Used in the output layer for binary classification problems.

   ```python
   import numpy as np

   def sigmoid(x):
       return 1 / (1 + np.exp(-x))
   ```
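
The vanishing gradient point can be seen from the sigmoid's derivative, `s * (1 - s)`, which peaks at 0.25 for `x = 0` and shrinks toward zero as `|x|` grows. The sketch below is only an illustration; `sigmoid_grad` is a helper name introduced here, and the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid expressed through its own output
    s = sigmoid(x)
    return s * (1 - s)

xs = np.array([-10.0, 0.0, 10.0])
grads = sigmoid_grad(xs)
print(grads)  # the middle value is the largest; the outer two are close to zero
```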

### 2. ReLU (Rectified Linear Unit):
   - Outputs the input directly if it is positive, otherwise outputs zero.
   - Cheap to compute, and it mitigates the vanishing gradient problem because the gradient is exactly 1 for every positive input.
   - Widely used in hidden layers of deep neural networks.

   ```python
   import numpy as np

   def relu(x):
       return np.maximum(0, x)
   ```

### 3. Leaky ReLU:
   - Similar to ReLU, but allows a small slope for negative values.
   - Addresses the "dying ReLU" problem where neurons can become inactive.
   - Keeps a small non-zero gradient (`alpha`) for negative inputs, so those units can keep learning.

   ```python
   import numpy as np

   def leaky_relu(x, alpha=0.01):
       return np.where(x > 0, x, x * alpha)
   ```
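
The "dying ReLU" issue comes down to the gradient: ReLU passes no gradient for negative inputs, while Leaky ReLU passes `alpha`. A minimal sketch comparing the two, assuming the usual convention that the gradient at exactly zero is treated like the negative side; `relu_grad` and `leaky_relu_grad` are illustrative helper names.

```python
import numpy as np

def relu_grad(x):
    # Gradient of ReLU: 1 for positive inputs, 0 otherwise
    return np.where(x > 0, 1.0, 0.0)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 1.0, 3.0])
relu_g = relu_grad(x)
leaky_g = leaky_relu_grad(x)
print(relu_g)   # zero gradient wherever x is negative
print(leaky_g)  # small but non-zero gradient wherever x is negative
```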

### 4. Tanh (Hyperbolic Tangent):
   - Outputs values between -1 and 1.
   - Similar in shape to the sigmoid function, but centered at zero.
   - Used in recurrent neural networks (RNNs) and hidden layers.

   ```python
   import numpy as np

   def tanh(x):
       return np.tanh(x)
   ```

### 5. Softmax:
   - Used in the output layer for multi-class classification problems.
   - Rescales the outputs into non-negative values that sum to 1, so they can be read as class probabilities.
   - Suitable for mutually exclusive classes.

   ```python
   import numpy as np

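   def softmax(x):
       # Subtracting the maximum does not change the result
       # but keeps np.exp from overflowing on large inputs
       exps = np.exp(x - np.max(x))
       return exps / np.sum(exps)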
   ```
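
The `softmax` above normalizes over the whole array, which suits a single vector of scores. For a batch with one row per sample, the normalization usually needs to run along the class axis instead. The variant below is a sketch under that assumption; `softmax_batch` and its `axis` parameter are names introduced here.

```python
import numpy as np

def softmax_batch(x, axis=-1):
    # Subtract the per-row maximum so np.exp never overflows
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    # Normalize each row separately
    return exps / np.sum(exps, axis=axis, keepdims=True)

logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # a naive exp would overflow on this row
probs = softmax_batch(logits)
print(probs.sum(axis=1))  # each row sums to 1
```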



# Loss Functions

A loss function in deep learning is a mathematical function that quantifies the discrepancy between the predicted output of a neural network and the true target values. It measures how well the network is performing on a given task and provides the signal used to adjust the network's parameters during training. Here are some widely used loss functions, along with their main points and a small code snippet for each:

### 1. Mean Squared Error (MSE):
   - Calculates the average squared difference between predicted and true values.
   - Sensitive to outliers due to the squared term.
   - Commonly used for regression problems.

   ```python
   import numpy as np

   def mean_squared_error(y_true, y_pred):
       return np.mean((y_true - y_pred) ** 2)
   ```

### 2. Binary Cross-Entropy:
   - Measures the dissimilarity between predicted probabilities and true binary labels.
   - Suitable for binary classification problems.
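   - Penalizes confident wrong predictions heavily: the loss grows without bound as the predicted probability of the true class approaches 0, whereas the squared error on a probability is at most 1.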

   ```python
   import numpy as np

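   def binary_cross_entropy(y_true, y_pred, eps=1e-12):
       # Clip predictions away from 0 and 1 so np.log never receives 0
       y_pred = np.clip(y_pred, eps, 1 - eps)
       return np.mean(-y_true * np.log(y_pred) - (1 - y_true) * np.log(1 - y_pred))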
   ```
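
To compare cross-entropy with squared error on a single sample, the sketch below evaluates both for a positive example (true label 1) as the predicted probability drops. The probability values are arbitrary illustrations.

```python
import numpy as np

# Predicted probability of the positive class for a sample whose true label is 1
p = np.array([0.5, 0.1, 0.01])

bce = -np.log(p)      # per-sample binary cross-entropy when y_true = 1
se = (1.0 - p) ** 2   # per-sample squared error when y_true = 1

for prob, b, s in zip(p, bce, se):
    print(f"p={prob:<5} cross-entropy={b:.3f} squared error={s:.3f}")
```

The squared error stays below 1, while the cross-entropy keeps growing as the prediction becomes more confidently wrong.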

### 3. Categorical Cross-Entropy:
   - Evaluates the dissimilarity between predicted class probabilities and true one-hot encoded labels.
   - Applicable for multi-class classification problems.
   - Encourages the network to assign high probabilities to the correct class.

   ```python
   import numpy as np

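   def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
       # y_true and y_pred have shape (n_samples, n_classes)
       # Clip predictions away from 0 so np.log never receives 0
       y_pred = np.clip(y_pred, eps, 1.0)
       return np.mean(-np.sum(y_true * np.log(y_pred), axis=1))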
   ```
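
Labels often arrive as integer class indices rather than one-hot vectors. A small usage sketch, assuming three classes and made-up predictions, that one-hot encodes the labels with `np.eye` and applies the same formula:

```python
import numpy as np

labels = np.array([0, 1])            # integer class indices (illustrative)
y_onehot = np.eye(3)[labels]         # one row per sample, one column per class
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])  # made-up predicted probabilities

# Only the probability assigned to the correct class contributes to the loss
loss = np.mean(-np.sum(y_onehot * np.log(y_pred), axis=1))
print(loss)
```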

### 4. Kullback-Leibler Divergence (KL Divergence):
   - Measures how one probability distribution differs from a reference distribution; it is not symmetric, so KL(p || q) generally differs from KL(q || p).
   - Often used in variational autoencoders (VAEs) or generative models.
   - Encourages the predicted distribution to match the true distribution.

   ```python
   import numpy as np

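   def kl_divergence(p, q, eps=1e-12):
       # Clip both distributions so the ratio and the log stay finite
       p = np.clip(p, eps, 1.0)
       q = np.clip(q, eps, 1.0)
       return np.sum(p * np.log(p / q))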
   ```
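
KL divergence depends on the order of its arguments, which matters when deciding which distribution plays the role of the reference. A short sketch with two arbitrary two-outcome distributions; the helper `kl` is a local name introduced here and uses the plain formula, assuming strictly positive probabilities.

```python
import numpy as np

def kl(p, q):
    # Plain KL(p || q); assumes every entry of p and q is strictly positive
    return np.sum(p * np.log(p / q))

p = np.array([0.9, 0.1])
q = np.array([0.5, 0.5])

kl_pq = kl(p, q)
kl_qp = kl(q, p)
print(kl_pq, kl_qp)  # the two directions give different values
```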

### 5. Huber Loss:
   - Combines the characteristics of squared error and absolute error.
   - Less sensitive to outliers compared to MSE.
   - Used for robust regression; `delta` sets the error size at which the loss switches from quadratic to linear.

   ```python
   import numpy as np

   def huber_loss(y_true, y_pred, delta=1.0):
       error = y_true - y_pred
       quadratic_term = 0.5 * error**2
       absolute_term = delta * (np.abs(error) - 0.5 * delta)
       return np.mean(np.where(np.abs(error) <= delta, quadratic_term, absolute_term))
   ```
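
The difference in outlier sensitivity shows up when one prediction is far off. The sketch below repeats `huber_loss` so that it runs on its own and compares it with MSE on made-up data where the last prediction misses by 10.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    quadratic_term = 0.5 * error**2
    absolute_term = delta * (np.abs(error) - 0.5 * delta)
    return np.mean(np.where(np.abs(error) <= delta, quadratic_term, absolute_term))

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 14.0])  # the last prediction is an outlier

mse = np.mean((y_true - y_pred) ** 2)
huber = huber_loss(y_true, y_pred)
print(mse, huber)  # MSE is dominated by the outlier; Huber grows only linearly with it
```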

These are just a few examples of loss functions used in deep learning. Each loss function has its own characteristics and is suitable for different types of problems and network architectures. The choice of the loss function depends on the nature of the task and the desired behavior of the neural network during training.


# Optimizers

In deep learning, optimizers are algorithms used to adjust the parameters (weights and biases) of a neural network during training. They aim to minimize the loss function by iteratively updating the network's parameters based on the gradients of the loss with respect to those parameters. Here are some widely used optimizers along with their main points:

### 1. Stochastic Gradient Descent (SGD):
   - Updates the parameters using the gradients of the loss with respect to the parameters.
   - Performs updates for each training example (or a mini-batch) individually.
   - Simple and widely used optimizer.
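
The update rule itself is a single line. The sketch below applies it to a toy objective, `f(w) = sum(w**2)`, whose gradient `2 * w` is known exactly; in real training the gradient would instead be estimated from one example or a mini-batch. `sgd_step` and the learning rate are illustrative choices.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # Move the parameters a small step against the gradient
    return w - lr * grad

w = np.array([1.0, -2.0])
for _ in range(100):
    grad = 2 * w          # gradient of the toy objective sum(w**2)
    w = sgd_step(w, grad)

print(w)  # both entries have moved very close to the minimum at 0
```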

### 2. Adam (Adaptive Moment Estimation):
   - Combines the advantages of AdaGrad and RMSProp algorithms.
   - Adapts the learning rate for each parameter using running estimates of the first and second moments of its gradients.
   - Efficient and commonly used optimizer.
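
A minimal NumPy sketch of one Adam update, assuming the commonly used default coefficients (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`). `adam_step` is an illustrative helper, not a library function, and the toy objective `sum(w**2)` is chosen only because its gradient is easy to write down.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 2001):       # t starts at 1 so the bias correction is defined
    grad = 2 * w               # gradient of the toy objective sum(w**2)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.01)

print(w)
```

On the very first step the bias-corrected ratio reduces to the sign of the gradient, so every parameter moves by roughly `lr` regardless of how large its gradient is.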


### 3. RMSProp (Root Mean Square Propagation):
   - Adapts the learning rate based on the recent magnitude of gradients for each parameter.
   - Divides the learning rate by the square root of a running (exponentially decaying) average of squared gradients.
   - Well-suited for dealing with sparse data or noisy gradients.
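
A sketch of the RMSProp update in NumPy. The decay rate `rho=0.9` and the small constant `eps` are common choices assumed here; `rmsprop_step` is an illustrative helper name.

```python
import numpy as np

def rmsprop_step(w, grad, sq_avg, lr=0.01, rho=0.9, eps=1e-8):
    # Exponentially decaying average of squared gradients
    sq_avg = rho * sq_avg + (1 - rho) * grad ** 2
    # Scale each parameter's step by the root of its own average
    w = w - lr * grad / (np.sqrt(sq_avg) + eps)
    return w, sq_avg

w = np.array([1.0, -2.0])
sq_avg = np.zeros_like(w)
for _ in range(1000):
    grad = 2 * w               # gradient of the toy objective sum(w**2)
    w, sq_avg = rmsprop_step(w, grad, sq_avg)

print(w)
```

Because the step is normalized by the recent gradient magnitude, the parameters end up hovering near the minimum with steps on the order of `lr` rather than settling exactly on it.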

  
### 4. Adagrad (Adaptive Gradient Algorithm):
   - Adapts the learning rate based on the historical squared gradients for each parameter.
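   - Divides the learning rate by the square root of a running sum of squared gradients, so the effective learning rate only shrinks over time.
   - Suitable for sparse data: parameters tied to infrequent features accumulate less gradient and keep a larger effective learning rate.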
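
A sketch of the Adagrad update, using a constant gradient of 1 so the decay of the effective learning rate is easy to follow: after `t` steps the accumulated sum is `t`, and the effective rate is `lr / sqrt(t)`. `adagrad_step` and the chosen values are illustrative.

```python
import numpy as np

def adagrad_step(w, grad, sq_sum, lr=0.1, eps=1e-8):
    # Accumulate squared gradients over the whole history (never decays)
    sq_sum = sq_sum + grad ** 2
    w = w - lr * grad / (np.sqrt(sq_sum) + eps)
    return w, sq_sum

w = np.array([0.0])
sq_sum = np.zeros_like(w)
effective_lrs = []
for _ in range(4):
    grad = np.array([1.0])     # constant gradient, chosen to make the decay visible
    w, sq_sum = adagrad_step(w, grad, sq_sum, lr=0.1)
    effective_lrs.append(0.1 / np.sqrt(sq_sum[0]))

print(effective_lrs)  # each entry is smaller than the one before
```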

### 5. Adamax:
   - Variant of Adam that replaces the second-moment estimate with an exponentially weighted infinity norm of past gradients.
   - Performs well on models with sparse gradients or large parameter updates.
   - Effective for deep learning models with recurrent structures.
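
A sketch of the Adamax update. Compared with Adam, the running average of squared gradients is replaced by a decayed running maximum of absolute gradients. The `eps` term is an assumption added here to guard against division by zero, and `adamax_step` is an illustrative helper name.

```python
import numpy as np

def adamax_step(w, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients, as in Adam
    u = np.maximum(beta2 * u, np.abs(grad))   # exponentially weighted infinity norm
    # Only the first moment needs bias correction
    w = w - (lr / (1 - beta1 ** t)) * m / (u + eps)
    return w, m, u

w = np.array([1.0, -2.0])
m = np.zeros_like(w)
u = np.zeros_like(w)
for t in range(1, 2001):
    grad = 2 * w               # gradient of the toy objective sum(w**2)
    w, m, u = adamax_step(w, grad, m, u, t, lr=0.01)

print(w)
```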

   

# Evaluation Metrics 

Evaluation metrics in deep learning are used to assess the performance of a trained model on a specific task. These metrics provide quantitative measures of how well the model is performing, which makes it possible to compare models and tune hyperparameters. Here are some widely used evaluation metrics, along with their main points and a small code snippet for each:

### 1. Accuracy:
   - Measures the proportion of correctly classified samples.
   - Commonly used for classification problems with balanced class distribution.

   ```python
   import numpy as np

   def accuracy(y_true, y_pred):
       correct = np.sum(y_true == y_pred)
       total = len(y_true)
       return correct / total
   ```
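
The caveat about balanced classes matters in practice. In the made-up example below, 95 of 100 samples belong to class 0, and a trivial model that always predicts class 0 still reaches high accuracy while finding none of the positive samples.

```python
import numpy as np

y_true = np.array([0] * 95 + [1] * 5)   # heavily imbalanced labels (illustrative)
y_pred = np.zeros_like(y_true)          # a trivial model that always predicts class 0

acc = np.mean(y_true == y_pred)
# Share of actual positives that the model identified
recall_pos = np.sum((y_true == 1) & (y_pred == 1)) / np.sum(y_true == 1)

print(acc, recall_pos)
```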

### 2. Precision, Recall, and F1-Score:
   - Precision: Measures the proportion of true positive predictions out of all positive predictions.
   - Recall: Measures the proportion of true positive predictions out of all actual positive samples.
   - F1-Score: Harmonic mean of precision and recall, balances both metrics.

   ```python
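   from sklearn.metrics import precision_score, recall_score, f1_score

   # Example binary labels (illustrative values, not from a real model)
   y_true = [1, 0, 1, 1, 0, 1]
   y_pred = [1, 0, 0, 1, 1, 1]

   precision = precision_score(y_true, y_pred)
   recall = recall_score(y_true, y_pred)
   f1 = f1_score(y_true, y_pred)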
   ```
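
The same three metrics can be computed directly from the counts of true positives, false positives, and false negatives. The NumPy sketch below assumes binary labels with 1 as the positive class; `precision_recall_f1` is an illustrative helper, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))  # predicted positive, actually positive
    fp = np.sum((y_true == 0) & (y_pred == 1))  # predicted positive, actually negative
    fn = np.sum((y_true == 1) & (y_pred == 0))  # predicted negative, actually positive
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1

y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 1, 1])
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```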