## Introduction



Machine Learning (ML) is a field of artificial intelligence (AI) that focuses on the development of algorithms and models that enable computers to learn patterns and make predictions or decisions without being explicitly programmed. The goal of machine learning is to develop systems that can learn from data and improve their performance over time.

### Key Concepts in Machine Learning:

1. **Types of Machine Learning:**
   - **Supervised Learning:** The model is trained on a labeled dataset, where the input data and corresponding output labels are provided. The goal is to learn a mapping from inputs to outputs.
   - **Unsupervised Learning:** The model is given unlabeled data, and it tries to find patterns or structures within the data without explicit output labels.
   - **Reinforcement Learning:** The model learns by interacting with an environment and receiving feedback in the form of rewards or penalties based on its actions.

2. **Types of Models:**
   - **Linear Regression:** Used for predicting a continuous outcome based on one or more input features.
   - **Logistic Regression:** Used for binary classification problems, where the output is a probability of belonging to a particular class.
   - **Decision Trees:** Tree-like models that make decisions by splitting on input features; used for both classification and regression.
   - **Random Forests:** Ensembles of decision trees whose predictions are averaged or voted on; they typically generalize better than a single tree.
   - **Support Vector Machines (SVM):** Used for classification and regression tasks by finding a hyperplane that best separates the data points.

3. **Neural Networks and Deep Learning:**
   - **Artificial Neural Networks (ANN):** Loosely inspired by the human brain, consisting of interconnected nodes (neurons) organized in layers.
   - **Deep Learning:** Involves neural networks with multiple hidden layers, enabling the model to learn hierarchical representations of data.

4. **Evaluation Metrics:**
   - Common metrics for classification problems include accuracy, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve.
   - For regression problems, metrics include mean squared error (MSE), mean absolute error (MAE), and R-squared.

5. **Overfitting and Underfitting:**
   - **Overfitting:** Occurs when a model performs well on the training data but poorly on new, unseen data.
   - **Underfitting:** Occurs when a model is too simple to capture the underlying patterns in the data.
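
The difference between underfitting and overfitting can be illustrated with a small sketch. It assumes synthetic data (noisy samples of a sine curve) and uses polynomial degree as a stand-in for model complexity; the data, the degrees, and the helper names `make_data` and `mse` are illustrative choices, not part of the notes above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(pi * x) on [-1, 1] (synthetic, for illustration only)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

train_errors, test_errors = {}, {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit; a higher degree means a more flexible model
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_errors[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_errors[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree={degree}: train MSE={train_errors[degree]:.4f}, "
          f"test MSE={test_errors[degree]:.4f}")
```

Training error can only go down as the degree increases, because each higher-degree model contains the lower-degree ones. A degree-1 fit is too simple for a sine curve (underfitting). Whether the most flexible fit also does worse on the test set depends on the noise and the sample size, which is why the comparison is made on held-out data.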

### Machine Learning Workflow:

1. **Data Collection:**
   - Gathering relevant and representative data for training and evaluation.

2. **Data Preprocessing:**
   - Cleaning, transforming, and organizing the data to make it suitable for training models.

3. **Feature Engineering:**
   - Selecting or creating relevant features (input variables) for the model.

4. **Model Selection and Training:**
   - Choosing a suitable algorithm and training the model on the training dataset.

5. **Evaluation:**
   - Assessing the model's performance on a separate validation or test dataset.

6. **Hyperparameter Tuning:**
   - Adjusting model hyperparameters to improve performance.

7. **Deployment:**
   - Implementing the model in a real-world environment for making predictions on new data.

Machine learning is applied in various domains, including image and speech recognition, natural language processing, recommendation systems, autonomous vehicles, and healthcare. Popular machine learning libraries in Python include Scikit-Learn, TensorFlow, and PyTorch.

## AI vs ML vs DS


AI (Artificial Intelligence), ML (Machine Learning), and DS (Data Science) are related but distinct fields within the broader domain of data and computational intelligence. Here's a brief overview of each:

### Artificial Intelligence (AI):

1. **Definition:**
   - AI refers to the development of computer systems that can perform tasks that typically require human intelligence. It aims to create machines capable of reasoning, problem-solving, perception, understanding natural language, and learning.

2. **Scope:**
   - AI is a broad field that encompasses various approaches, including rule-based systems, expert systems, symbolic AI, and machine learning.

3. **Techniques:**
   - In addition to machine learning, AI involves techniques such as natural language processing (NLP), computer vision, robotics, and knowledge representation.

4. **Applications:**
   - AI applications include virtual assistants, chatbots, image and speech recognition, autonomous vehicles, game playing, and expert systems.

### Machine Learning (ML):

1. **Definition:**
   - ML is a subset of AI that focuses on the development of algorithms and models that allow computers to learn patterns from data and make predictions or decisions without being explicitly programmed.

2. **Learning Paradigms:**
   - ML includes various learning paradigms such as supervised learning, unsupervised learning, reinforcement learning, and semi-supervised learning.

3. **Techniques:**
   - ML techniques include linear regression, decision trees, support vector machines, neural networks, clustering, and dimensionality reduction.

4. **Applications:**
   - ML is applied in diverse domains, including image and speech recognition, recommendation systems, fraud detection, autonomous vehicles, and healthcare.

### Data Science (DS):

1. **Definition:**
   - DS is a multidisciplinary field that involves extracting insights and knowledge from data using a combination of domain knowledge, statistical methods, programming, and machine learning.

2. **Components:**
   - DS includes data cleaning, exploration, visualization, feature engineering, statistical analysis, and the development of predictive models using machine learning.

3. **Skills:**
   - Data scientists need a blend of skills in programming (e.g., Python or R), statistics, domain expertise, and machine learning.

4. **Applications:**
   - DS is applied across industries for tasks such as predictive modeling, business intelligence, fraud detection, optimization, and decision support.

### Relationships:

- **Overlap:**
  - ML is a crucial component of both AI and DS. ML techniques and algorithms are often used in AI systems, and data scientists leverage ML for predictive modeling and analysis.

- **Integration:**
  - AI and DS can be considered integrated fields, where AI systems often involve the application of DS techniques for data-driven decision-making.

In summary, AI is the broadest concept and includes ML as one of its key components. DS focuses on extracting insights from data and uses ML as one of its main toolsets, alongside statistics and visualization. All three fields are interrelated and contribute to the development of intelligent systems and solutions.

## Types of ML

Machine learning (ML) is categorized into several types based on the learning approach and the nature of the available data. The three main types of machine learning are:

1. **Supervised Learning:**
   - **Definition:** In supervised learning, the model is trained on a labeled dataset, where the input data is paired with the corresponding output labels. The goal is to learn a mapping from inputs to outputs.
   - **Use Cases:**
     - Classification: Predicting the category or class of an input.
     - Regression: Predicting a continuous numerical value.
   - **Examples:**
     - Linear Regression, Logistic Regression, Support Vector Machines, Decision Trees, Neural Networks.

2. **Unsupervised Learning:**
   - **Definition:** In unsupervised learning, the model is given unlabeled data and is tasked with finding patterns or structures within the data without explicit output labels.
   - **Use Cases:**
     - Clustering: Grouping similar data points together.
     - Dimensionality Reduction: Reducing the number of features while retaining important information.
   - **Examples:**
     - K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE).

3. **Reinforcement Learning:**
   - **Definition:** Reinforcement learning involves an agent interacting with an environment and learning to make decisions by receiving feedback in the form of rewards or penalties.
   - **Components:**
     - Agent: The learner or decision-maker.
     - Environment: The external system with which the agent interacts.
     - Rewards/Penalties: Feedback received by the agent based on its actions.
   - **Examples:**
     - Q-Learning, Deep Q Networks (DQN), Policy Gradient methods.
   - **Use Cases:**
     - Game playing, robotics, autonomous systems.
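
As a concrete instance of unsupervised learning, the sketch below clusters unlabeled points with K-Means from scikit-learn. The two synthetic blobs and the choice of `n_clusters=2` are assumptions made for illustration; in practice the number of clusters is usually not known in advance.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Two well-separated synthetic blobs; no labels are passed to the model
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[10.0, 10.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Group the points into two clusters based only on their coordinates
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

print("Cluster sizes:", np.bincount(cluster_ids))
print("Cluster centers:\n", kmeans.cluster_centers_)
```

The cluster ids are arbitrary (0 and 1 could be swapped); only the grouping itself carries meaning.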

Additionally, there are some other specialized types of machine learning:

4. **Semi-Supervised Learning:**
   - Combines elements of supervised and unsupervised learning. The model is trained on a dataset that contains both labeled and unlabeled examples.

5. **Self-Supervised Learning:**
   - The model derives its supervision signal from the data itself, for example by predicting a masked or held-out part of each input, so no manually labeled data is required.

6. **Transfer Learning:**
   - Pre-training a model on a large dataset and fine-tuning it for a specific task. This leverages knowledge gained from a source task to improve performance on a target task.

7. **Ensemble Learning:**
   - Combines the predictions of multiple models to improve overall performance and robustness.

8. **Online Learning:**
   - The model is continuously updated as new data becomes available, adapting to changing conditions over time.

9. **Batch Learning:**
   - The model is trained offline on a complete, fixed dataset. To incorporate new data, it is typically retrained on the old and new data together and then redeployed.

Understanding the different types of machine learning is crucial for selecting the most appropriate approach for a given task or problem. Each type has its strengths and weaknesses, and the choice depends on the characteristics of the data and the goals of the application.

# Supervised Learning

Supervised learning is a type of machine learning where the model is trained on a labeled dataset, meaning that the input data is paired with corresponding output labels. The goal of supervised learning is to learn a mapping from inputs to outputs, making it suitable for tasks where the algorithm needs to make predictions or classifications.

### Key Components of Supervised Learning:

1. **Labeled Dataset:**
   - The training dataset consists of examples where each input is associated with a corresponding output label. The labeled data is used to train the model to make predictions on new, unseen data.

2. **Training Process:**
   - During the training process, the model learns the patterns and relationships between input features and output labels. The goal is to minimize the difference between the predicted outputs and the actual labels.

3. **Prediction/Inference:**
   - Once trained, the model can make predictions or classifications on new, unseen data. The model generalizes its learned patterns to make accurate predictions on inputs it hasn't seen during training.

### Types of Supervised Learning:

1. **Classification:**
   - **Definition:** In classification, the model predicts the category or class of a given input. The output is a discrete label from a predefined set of classes.
   - **Examples:**
     - Binary Classification: Predicting whether an email is spam or not.
     - Multi-Class Classification: Identifying the type of flower from a set of categories.

2. **Regression:**
   - **Definition:** In regression, the model predicts a continuous numerical value or quantity. The output is a real number rather than a class label.
   - **Examples:**
     - Predicting house prices based on features like square footage, number of bedrooms, etc.
     - Forecasting stock prices based on historical data.

### Steps in Supervised Learning:

1. **Data Collection:**
   - Gather a labeled dataset that includes input features and corresponding output labels.

2. **Data Preprocessing:**
   - Clean and prepare the data, handle missing values, and scale or normalize features if necessary.

3. **Feature Selection/Engineering:**
   - Select relevant features or create new features to improve the model's performance.

4. **Splitting the Data:**
   - Split the dataset into training and testing sets to assess the model's performance on unseen data.

5. **Choosing a Model:**
   - Select an appropriate supervised learning algorithm based on the nature of the task (classification or regression) and the characteristics of the data.

6. **Training the Model:**
   - Train the model on the training dataset, adjusting the model's parameters to minimize the difference between predicted and actual values.

7. **Evaluation:**
   - Assess the model's performance on the testing dataset using evaluation metrics appropriate for the task (accuracy, precision, recall, F1 score for classification; mean squared error for regression).

8. **Hyperparameter Tuning:**
   - Fine-tune the model's hyperparameters to optimize performance.

9. **Prediction/Deployment:**
   - Deploy the trained model for making predictions on new, unseen data.
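
The steps above can be sketched end to end with scikit-learn. This is a minimal illustration rather than a recommended setup: the Iris dataset, the 75/25 split, and logistic regression are assumptions chosen only to keep the example short, and hyperparameter tuning and deployment are left out.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Steps 1-3: load a labeled dataset (features X, class labels y)
X, y = load_iris(return_X_y=True)

# Step 4: hold out a test set so evaluation uses unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Steps 5-6: choose a model and train it; scaling is fitted on the training set only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Step 7: evaluate on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of the preprocessing step, which would otherwise leak information into training.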

Popular algorithms for supervised learning include:

- **Linear Regression**
- **Logistic Regression**
- **Support Vector Machines (SVM)**
- **Decision Trees**
- **Random Forests**
- **Gradient Boosting algorithms (e.g., XGBoost, LightGBM)**
- **Neural Networks (Deep Learning)**

Supervised learning is widely used in various domains, including image and speech recognition, natural language processing, medical diagnosis, finance, and many others.

### Features:
- **Qualitative** - categorical data (nominal, ordinal)
- **Quantitative** - numerical data (continuous, discrete)
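
Qualitative features usually have to be converted to numbers before a model can use them. The sketch below shows one common approach with scikit-learn: one-hot encoding for a nominal feature and ordered integer codes for an ordinal one. The example values (`colors`, `sizes`) and the ordering small < medium < large are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Nominal feature: categories with no natural order
colors = np.array([["red"], ["green"], ["blue"], ["green"]], dtype=object)
# Ordinal feature: categories with a meaningful order
sizes = np.array([["small"], ["large"], ["medium"], ["small"]], dtype=object)

# One-hot encoding: one 0/1 column per category (columns sorted alphabetically)
one_hot = OneHotEncoder().fit_transform(colors).toarray()

# Ordinal encoding: integer codes that follow the order given in `categories`
size_encoder = OrdinalEncoder(categories=[["small", "medium", "large"]])
size_codes = size_encoder.fit_transform(sizes)

print(one_hot)
print(size_codes.ravel())
```

One-hot encoding avoids implying an order between colors, while the ordinal codes keep the information that "large" is greater than "small".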

### Tasks:
1. **Classification** - predict discrete classes
   - Binary classification
   - Multiclass classification
2. **Regression** - predict continuous values


Here's a tabular comparison highlighting the key differences between binary classification and multiclass classification:

| Feature                             | Binary Classification                       | Multiclass Classification                    |
|-------------------------------------|--------------------------------------------|-----------------------------------------------|
| **Number of Classes**               | Two (Positive and Negative)               | Three or more (e.g., Class 1, Class 2, Class 3) |
| **Output Layer in Neural Networks** | Single neuron with sigmoid activation    | Multiple neurons with softmax activation      |
| **Evaluation Metrics**              | Accuracy, Precision, Recall, F1 Score, ROC-AUC | Extended or computed separately for each class |
| **Algorithms Handling Multiclass**  | Most classifiers apply directly (e.g., logistic regression, SVM) | Inherently binary models (e.g., SVM) need a strategy such as one-vs-all or one-vs-one; others (decision trees, softmax-based models) handle it natively |
| **Examples**                        | Spam detection, Disease diagnosis         | Handwritten digit recognition, Language identification |

Understanding these differences is crucial when designing and implementing classification models, as the choice between binary and multiclass classification depends on the specific requirements of the problem at hand.

Binary classification and multiclass classification are two types of classification tasks in supervised machine learning. Let's explore each of them:

### Binary Classification:

**Definition:**
Binary classification is a type of supervised learning task where the goal is to categorize input instances into one of two possible classes or categories. The output is a binary decision, often expressed as class labels like "positive" and "negative," "spam" and "non-spam," or "1" and "0."

**Examples:**
1. Spam Detection: Classifying emails as either spam or not spam.
2. Disease Diagnosis: Determining whether a patient has a specific medical condition or not.
3. Credit Approval: Approving or rejecting a credit application based on risk assessment.

**Algorithms for Binary Classification:**
1. Logistic Regression
2. Support Vector Machines (SVM)
3. Decision Trees and Random Forests
4. Naive Bayes
5. Neural Networks (with a binary output layer)

### Multiclass Classification:

**Definition:**
Multiclass classification, also known as multinomial classification, is a type of supervised learning task where the goal is to assign input instances to one of three or more classes or categories. Each instance can belong to only one class, and the output is a single class label.

**Examples:**
1. Handwritten Digit Recognition: Classifying digits (0-9) based on images.
2. Species Classification: Identifying the species of a plant or animal from a set of possible categories.
3. Language Identification: Determining the language of a given text from multiple language options.

**Algorithms for Multiclass Classification:**
1. Logistic Regression (extended to multiclass via a multinomial/softmax formulation or a one-vs-all scheme)
2. Support Vector Machines (combined with one-vs-all or one-vs-one strategies)
3. Decision Trees and Random Forests
4. K-Nearest Neighbors (KNN)
5. Naive Bayes
6. Neural Networks (with a softmax output layer)

### Key Differences:

1. **Number of Classes:**
   - Binary classification has two classes (positive/negative, spam/non-spam), while multiclass classification has three or more classes.

2. **Output Layer in Neural Networks:**
   - In neural networks designed for binary classification, the output layer typically has one neuron with a sigmoid activation function. For multiclass classification, the output layer has multiple neurons with a softmax activation function.

3. **Evaluation Metrics:**
   - For binary classification, metrics include accuracy, precision, recall, F1 score, and ROC-AUC. In multiclass classification, these metrics are often extended or computed separately for each class.

4. **Algorithms Handling Multiclass:**
   - Some models extend directly to multiple classes (for example, logistic regression via a multinomial/softmax formulation, decision trees, and neural networks), while inherently binary models such as support vector machines are usually combined with a strategy like one-vs-all or one-vs-one.
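
The difference between the two output layers can be made concrete with a short NumPy sketch. The logit values below are arbitrary numbers chosen for illustration, not outputs of a trained network.

```python
import numpy as np

def sigmoid(z):
    """Map a single logit to a probability for the positive class."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Map a vector of logits to a probability distribution over classes."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

# Binary classification: one output neuron, one probability
binary_logit = 0.0
p_positive = sigmoid(binary_logit)

# Multiclass classification: one output neuron per class
class_logits = np.array([2.0, 1.0, 0.1])
class_probs = softmax(class_logits)
predicted_class = int(np.argmax(class_probs))

print("P(positive):", p_positive)
print("Class probabilities:", class_probs, "sum =", class_probs.sum())
print("Predicted class index:", predicted_class)
```

With a sigmoid output, the probability of the negative class is implied as `1 - p_positive`; with softmax, the probabilities of all classes are produced together and sum to 1.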

Both binary and multiclass classification are widely used in various applications, and the choice between them depends on the nature of the problem and the number of distinct classes involved.

# Regression

Regression is a type of supervised machine learning task where the goal is to predict a continuous numerical output or target variable based on one or more input features. The output in regression is a real-valued quantity, making it suitable for tasks such as predicting prices, temperatures, sales, or any other continuous variable.

### Key Concepts in Regression:

1. **Target Variable:**
   - In regression, the variable that we want to predict is called the target variable or dependent variable. It is denoted as \(y\).

2. **Input Features:**
   - The input features, denoted as \(X\), are the independent variables used to make predictions about the target variable.

3. **Prediction Function:**
   - The goal is to learn a mapping or function \(f\) that can predict the target variable \(y\) based on the input features \(X\). The prediction function can be represented as \(y = f(X)\).

4. **Training Process:**
   - During the training process, the model learns the relationships between the input features and the target variable from a labeled dataset.

5. **Types of Regression:**
   - **Linear Regression:** Assumes a linear relationship between the input features and the target variable. The prediction function is a linear combination of the input features.
   - **Polynomial Regression:** Allows for nonlinear relationships by introducing polynomial terms into the regression equation.
   - **Ridge Regression and Lasso Regression:** Regularized linear regression methods that prevent overfitting by adding regularization terms.
   - **Decision Tree Regression:** Uses decision trees to model complex relationships.
   - **Random Forest Regression:** Ensemble of decision trees for improved accuracy and robustness.

### Steps in Regression:

1. **Data Collection:**
   - Gather a dataset with labeled examples, where the input features are associated with corresponding target values.

2. **Data Preprocessing:**
   - Clean and preprocess the data, handle missing values, and scale or normalize features if necessary.

3. **Feature Selection/Engineering:**
   - Select relevant features or create new features to improve the model's performance.

4. **Splitting the Data:**
   - Divide the dataset into training and testing sets to assess the model's performance on unseen data.

5. **Choosing a Model:**
   - Select an appropriate regression algorithm based on the nature of the task and the characteristics of the data.

6. **Training the Model:**
   - Train the model on the training dataset, adjusting the model's parameters to minimize the difference between predicted and actual values.

7. **Evaluation:**
   - Assess the model's performance on the testing dataset using evaluation metrics like mean squared error (MSE), mean absolute error (MAE), or \(R^2\) score.

8. **Hyperparameter Tuning:**
   - Fine-tune the model's hyperparameters to optimize performance.

9. **Prediction/Deployment:**
   - Deploy the trained model for making predictions on new, unseen data.

### Example:

Consider a simple linear regression task where we want to predict house prices based on the square footage of the house. The dataset includes pairs of (square footage, house price), and the goal is to learn a model that can predict house prices for new houses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
square_footage = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])
house_prices = np.array([245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000])

# Reshape the data
square_footage = square_footage.reshape((-1, 1))

# Create and train the linear regression model
model = LinearRegression()
model.fit(square_footage, house_prices)

# Make predictions for new houses
new_square_footage = np.array([1600, 1800, 2000]).reshape((-1, 1))
predicted_prices = model.predict(new_square_footage)

print("Predicted Prices:", predicted_prices)
```

This is a simplified example, but it illustrates the basic steps involved in a regression task. The model learns a linear relationship between square footage and house prices, allowing it to make predictions for new houses.

### Loss Function

\[ \text{loss} = \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

A loss function, also known as a cost function or objective function, is a key component in the training of machine learning models. It quantifies the difference between the predicted values generated by the model and the actual values (ground truth) present in the training dataset. The goal during training is to minimize this loss function, making the model's predictions as close as possible to the actual values.

The choice of a loss function depends on the type of task the machine learning model is designed to perform (e.g., classification, regression). Here are some commonly used loss functions for different tasks:

### Regression Loss Functions:

1. **Mean Squared Error (MSE):**
   - **Definition:** MSE is the average of the squared differences between predicted and actual values.
   - **Formula:** \[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
   - **Use Case:** Commonly used in linear regression and other regression tasks.

2. **Mean Absolute Error (MAE):**
   - **Definition:** MAE is the average of the absolute differences between predicted and actual values.
   - **Formula:** \[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]
   - **Use Case:** Reported in the same units as the target variable, which makes it easier to interpret than MSE; it is also less sensitive to outliers.

3. **Huber Loss:**
   - **Definition:** Combines the characteristics of MSE and MAE, providing a balance between robustness and sensitivity to outliers.
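
Huber loss is listed above without a formula. In its standard form, with residual \(r_i = y_i - \hat{y}_i\) and a threshold \(\delta > 0\), it is quadratic for small residuals and linear for large ones:

\[ L_\delta(r_i) = \begin{cases} \frac{1}{2} r_i^2 & \text{if } |r_i| \le \delta \\ \delta \left( |r_i| - \frac{1}{2}\delta \right) & \text{otherwise} \end{cases} \]

A minimal NumPy sketch of this definition follows; the function name `huber_loss` and the default `delta=1.0` are choices made for this example.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * residual ** 2
    linear = delta * (residual - 0.5 * delta)
    return float(np.mean(np.where(residual <= delta, quadratic, linear)))

# Small residual (0.5): falls in the quadratic, MSE-like region
small_error = huber_loss([1.0], [0.5])
# Large residual (3.0): falls in the linear, MAE-like region
large_error = huber_loss([0.0], [3.0])

print("Huber loss, residual 0.5:", small_error)
print("Huber loss, residual 3.0:", large_error)
```

For the large residual, squared error would contribute 0.5 * 3² = 4.5, while Huber loss contributes 2.5, which is why it is less affected by outliers.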

### Classification Loss Functions:

1. **Binary Cross-Entropy Loss (Log Loss):**
   - **Definition:** Used in binary classification problems. Measures the average negative log-likelihood of the true labels given the predicted probabilities.
   - **Formula:** \[ \text{Binary Cross-Entropy} = - \frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) \]

2. **Categorical Cross-Entropy Loss (Softmax Loss):**
   - **Definition:** Used in multiclass classification problems. Extends binary cross-entropy to more than two classes.
   - **Formula:** \[ \text{Categorical Cross-Entropy} = - \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij}) \]
     - \(C\) is the number of classes, and \(y_{ij}\) is 1 if the true class of example \(i\) is \(j\) and 0 otherwise.

3. **Hinge Loss (SVM Loss):**
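   - **Definition:** Used in support vector machines (SVM) for binary classification. Encourages correct classification with a margin. Here the labels are encoded as \(y_i \in \{-1, +1\}\) and \(\hat{y}_i\) is the raw (unthresholded) model score, not a probability.
   - **Formula:** \[ \text{Hinge Loss} = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i \cdot \hat{y}_i) \]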
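
The categorical cross-entropy and hinge loss formulas above can be written directly in NumPy. This is a sketch with made-up inputs; it assumes one-hot labels for cross-entropy, labels in {-1, +1} with raw scores for hinge loss, and a small `eps` to avoid taking the log of zero.

```python
import numpy as np

def categorical_cross_entropy(y_one_hot, probs, eps=1e-12):
    """Average negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(y_one_hot * np.log(probs), axis=1)))

def hinge_loss(y_true, scores):
    """Average hinge loss; y_true must be -1 or +1, scores are raw model outputs."""
    return float(np.mean(np.maximum(0.0, 1.0 - y_true * scores)))

# Two examples, three classes: true classes are 0 and 1
y_one_hot = np.array([[1, 0, 0], [0, 1, 0]])
probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
cce = categorical_cross_entropy(y_one_hot, probs)

# Three examples: confidently correct, correct but inside the margin (twice)
labels = np.array([1, -1, 1])
scores = np.array([2.0, -0.5, 0.3])
hinge = hinge_loss(labels, scores)

print("Categorical cross-entropy:", cce)
print("Hinge loss:", hinge)
```

In the hinge example, the first point has a margin of at least 1 and contributes nothing. The other two are classified correctly but with a margin below 1, so they still contribute to the loss.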

### Other Loss Functions:

1. **Kullback-Leibler Divergence (KL Divergence):**
   - **Definition:** Measures how one probability distribution diverges from a second, expected probability distribution. Used in tasks such as variational autoencoders (VAEs).

2. **Triplet Loss:**
   - **Definition:** Used in triplet networks for tasks like face recognition. Encourages the model to reduce the distance between similar instances and increase the distance between dissimilar ones.
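
For discrete distributions, KL divergence is \( D_{KL}(P \,\|\, Q) = \sum_i p_i \log(p_i / q_i) \). The sketch below computes it in NumPy; the function name `kl_divergence` and the two example distributions are illustrative, and the code assumes \(q_i > 0\) wherever \(p_i > 0\).

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)

print("KL(P || Q):", kl_pq)
print("KL(Q || P):", kl_qp)
print("KL(P || P):", kl_divergence(p, p))
```

The two directions give different values, so KL divergence is not symmetric and is not a distance in the usual sense. It is zero only when the two distributions are identical.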

Choosing an appropriate loss function is essential for the success of a machine learning model, as it guides the training process towards learning meaningful patterns in the data. The specific characteristics of the task and the model architecture influence the choice of the loss function.

### Mean Squared Error (MSE)

L2 loss, also known as Mean Squared Error (MSE) or Euclidean loss, is a common loss function used in regression tasks. It quantifies the average squared difference between predicted values and actual values. The formula for L2 loss or MSE is as follows:

\[ \text{L2 Loss (MSE)} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

Here:
- \( n \) is the number of examples in the dataset.
- \( y_i \) is the actual or true value for the \( i \)-th example.
- \( \hat{y}_i \) is the predicted value for the \( i \)-th example.

The L2 loss penalizes larger errors more heavily than smaller errors because it squares the differences. Minimizing the L2 loss during training corresponds to minimizing the average squared difference between predictions and actual values.

### Characteristics of L2 Loss:

1. **Squared Differences:**
   - L2 loss computes the squared differences between predicted and actual values.

2. **Differentiability:**
   - The L2 loss function is differentiable, allowing the use of gradient-based optimization algorithms like gradient descent for model training.

3. **Sensitivity to Outliers:**
   - L2 loss is sensitive to outliers. Large errors have a substantial impact on the loss, and the model may be overly influenced by outliers.
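
Because L2 loss is differentiable, it can be minimized with gradient descent. The sketch below fits a line \(\hat{y} = wx + b\) by repeatedly stepping against the MSE gradients \(\partial \text{MSE} / \partial w = \frac{2}{n} \sum (\hat{y}_i - y_i) x_i\) and \(\partial \text{MSE} / \partial b = \frac{2}{n} \sum (\hat{y}_i - y_i)\). The data (a noiseless line with slope 2 and intercept 1), the learning rate, and the step count are assumptions chosen so the example converges quickly.

```python
import numpy as np

# Synthetic, noiseless data from the line y = 2x + 1
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0

w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.1

for _ in range(5000):
    error = (w * x + b) - y              # prediction minus target
    grad_w = 2.0 * np.mean(error * x)    # dMSE/dw
    grad_b = 2.0 * np.mean(error)        # dMSE/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_mse = float(np.mean(((w * x + b) - y) ** 2))
print(f"w={w:.4f}, b={b:.4f}, MSE={final_mse:.2e}")
```

On real, noisy data the loss would not reach zero, and the learning rate would need to be chosen with more care; libraries such as scikit-learn solve linear regression in closed form instead.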

### Usage in Regression:

L2 loss is commonly used in regression problems, including linear regression and neural network regression tasks. In linear regression, the goal is to find the line that minimizes the sum of squared differences between the predicted and actual values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Example data
actual_values = np.array([2, 4, 6, 8, 10])
predicted_values = np.array([1.8, 3.5, 6.3, 8.2, 9.7])

# Calculate L2 loss (MSE) using NumPy
mse = np.mean((actual_values - predicted_values)**2)

# Alternatively, calculate L2 loss using scikit-learn
mse_sklearn = mean_squared_error(actual_values, predicted_values)

print("L2 Loss (MSE) using NumPy:", mse)
print("L2 Loss (MSE) using scikit-learn:", mse_sklearn)
```

In the above example, lower L2 loss values indicate a better fit of the model to the data.

### Note:
While L2 loss is widely used, it's essential to be aware of its sensitivity to outliers. In the presence of outliers, other loss functions like Huber loss or robust regression may be considered to provide a more balanced approach to handling extreme values.

### Mean Absolute Error (MAE)

Mean Absolute Error (MAE) is a commonly used loss function in regression tasks. It measures the average absolute difference between predicted values and actual values. The formula for MAE is as follows:

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

Here:
- \( n \) is the number of examples in the dataset.
- \( y_i \) is the actual or true value for the \( i \)-th example.
- \( \hat{y}_i \) is the predicted value for the \( i \)-th example.

### Characteristics of MAE:

1. **Absolute Differences:**
   - MAE computes the absolute differences between predicted and actual values.

2. **Robustness to Outliers:**
   - Unlike Mean Squared Error (MSE), MAE is less sensitive to outliers. It provides a more balanced measure of error in the presence of extreme values.

3. **Interpretability:**
   - MAE is more interpretable than MSE because it represents the average absolute error, which is in the same units as the target variable.
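
The robustness claim can be checked numerically. In the sketch below, the same predictions are scored against a clean target and against a target in which one value has been replaced by an outlier; all numbers are made up for illustration.

```python
import numpy as np

def mse(actual, predicted):
    return float(np.mean((actual - predicted) ** 2))

def mae(actual, predicted):
    return float(np.mean(np.abs(actual - predicted)))

predicted = np.array([2.5, 3.5, 6.5, 7.5, 10.5])
clean_actual = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
outlier_actual = np.array([2.0, 4.0, 6.0, 8.0, 30.0])  # last value is an outlier

mse_clean, mae_clean = mse(clean_actual, predicted), mae(clean_actual, predicted)
mse_outlier, mae_outlier = mse(outlier_actual, predicted), mae(outlier_actual, predicted)

print(f"Clean data:   MSE={mse_clean:.2f}, MAE={mae_clean:.2f}")
print(f"With outlier: MSE={mse_outlier:.2f}, MAE={mae_outlier:.2f}")
```

A single outlier raises MSE from 0.25 to 76.25 (about 300 times), while MAE rises from 0.5 to 4.3 (under 9 times), because squaring amplifies the one large error.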

### Usage in Regression:

MAE is particularly useful in situations where outliers may have a significant impact on the model's performance, and a more robust loss function is desired. It is suitable for tasks where the emphasis is on the magnitude of errors rather than their squared values.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Example data
actual_values = np.array([2, 4, 6, 8, 10])
predicted_values = np.array([1.8, 3.5, 6.3, 8.2, 9.7])

# Calculate MAE using NumPy
mae = np.mean(np.abs(actual_values - predicted_values))

# Alternatively, calculate MAE using scikit-learn
mae_sklearn = mean_absolute_error(actual_values, predicted_values)

print("MAE using NumPy:", mae)
print("MAE using scikit-learn:", mae_sklearn)
```

In the above example, lower MAE values indicate a better fit of the model to the data. The interpretation of MAE is straightforward, as it represents the average absolute error across all examples.

### Note:
MAE weights every error in proportion to its magnitude, so it does not single out large errors the way MSE does, and it is not differentiable at zero error. In scenarios where large errors should count more, or where over- and under-prediction should be penalized differently, other loss functions like Huber loss or quantile loss may be considered.


### Binary Cross-Entropy Loss (Log Loss)

Binary Cross-Entropy Loss, commonly known as Log Loss, is a loss function used in binary classification tasks. It measures the performance of a classification model whose output is a probability value between 0 and 1. The formula for Binary Cross-Entropy Loss is as follows:

\[ \text{Binary Cross-Entropy Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) \]

Here:
- \( n \) is the number of examples in the dataset.
- \( y_i \) is the actual label for the \( i \)-th example (0 or 1).
- \( \hat{y}_i \) is the predicted probability that the example belongs to class 1.

### Characteristics of Binary Cross-Entropy Loss:

1. **Probability Interpretation:**
   - The loss is based on the logarithm of predicted probabilities, penalizing models more when they confidently predict the wrong class.

2. **Differentiability:**
   - Binary Cross-Entropy Loss is differentiable, making it suitable for optimization using gradient-based methods.
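
To see how the loss penalizes confident mistakes, the sketch below computes the per-example loss for a true label of 1 at three predicted probabilities. The probabilities and the clipping constant `eps` are illustrative; clipping is a common safeguard against `log(0)` and is an assumption of this sketch.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Per-example binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# The true label is 1 in all three cases; only the predicted probability changes
y_true = np.array([1, 1, 1])
y_prob = np.array([0.9, 0.5, 0.1])

per_example = binary_cross_entropy(y_true, y_prob)
for p, loss in zip(y_prob, per_example):
    print(f"predicted P(class 1)={p:.1f} -> loss={loss:.3f}")
```

The loss grows slowly while the prediction is roughly right and sharply as the model becomes confident in the wrong class: predicting 0.1 for a positive example costs about 2.30, compared with about 0.11 for predicting 0.9.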

### Usage in Binary Classification:

Binary Cross-Entropy Loss is commonly used as the loss function for binary classification problems, where the goal is to predict whether an instance belongs to one of two classes (0 or 1). The loss function is particularly well-suited for models that output probabilities, such as logistic regression or neural networks with a sigmoid activation function in the output layer.

```python
import numpy as np
from sklearn.metrics import log_loss

# Example data
actual_labels = np.array([1, 0, 1, 1, 0])
predicted_probabilities = np.array([0.8, 0.3, 0.9, 0.65, 0.2])

# Calculate Binary Cross-Entropy Loss using NumPy
log_loss_value = -np.mean(actual_labels * np.log(predicted_probabilities) + (1 - actual_labels) * np.log(1 - predicted_probabilities))

# Alternatively, calculate Binary Cross-Entropy Loss using scikit-learn
log_loss_sklearn = log_loss(actual_labels, predicted_probabilities)

print("Binary Cross-Entropy Loss using NumPy:", log_loss_value)
print("Binary Cross-Entropy Loss using scikit-learn:", log_loss_sklearn)
```

In the above example, lower Binary Cross-Entropy Loss values indicate a better fit of the model to the data. The loss is minimized when the predicted probabilities align well with the actual labels.

### Note:
Binary Cross-Entropy Loss is an important loss function for binary classification, and it is commonly used as an optimization objective during the training of classification models.

## Metrics of Performance

Performance metrics are used to evaluate the effectiveness of machine learning models. The choice of metrics depends on the type of task (classification, regression, clustering, etc.) and the specific goals of the model. Here are some commonly used performance metrics for different tasks:

### Classification Metrics:

1. **Accuracy:**
   - **Definition:** Proportion of correctly classified instances out of the total number of instances.
   - **Formula:** \[ \text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}} \]

2. **Precision:**
   - **Definition:** Proportion of true positive predictions out of the total positive predictions.
   - **Formula:** \[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives + False Positives}} \]

3. **Recall (Sensitivity or True Positive Rate):**
   - **Definition:** Proportion of true positive predictions out of the total actual positives.
   - **Formula:** \[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives + False Negatives}} \]

4. **F1 Score:**
   - **Definition:** Harmonic mean of precision and recall.
   - **Formula:** \[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision + Recall}} \]

5. **Area Under the Receiver Operating Characteristic Curve (ROC AUC):**
   - **Definition:** Area under the ROC curve, which plots the true positive rate against the false positive rate across classification thresholds and summarizes the trade-off between them.

6. **Area Under the Precision-Recall Curve (PR AUC):**
   - **Definition:** Area under the precision-recall curve, useful for imbalanced datasets.
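
As a rough illustration, the classification metrics above can be computed with scikit-learn. The labels, probabilities, and the 0.5 decision threshold below are made-up assumptions. `average_precision_score` is used here as one common summary of the precision-recall curve, not the only way to compute PR AUC.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score,
    average_precision_score,
)

# Made-up labels and predicted probabilities for class 1.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1])

# Threshold-based metrics need hard predictions (0.5 is an assumed threshold).
y_pred = (y_prob >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# Ranking-based metrics use the probabilities directly.
roc_auc = roc_auc_score(y_true, y_prob)
pr_auc = average_precision_score(y_true, y_prob)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("ROC AUC:", roc_auc)
print("Average precision (PR curve summary):", pr_auc)
```

With this data there are 3 true positives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all come out to 0.75.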

### Regression Metrics:

1. **Mean Absolute Error (MAE):**
   - **Definition:** Average absolute difference between predicted and actual values.
   - **Formula:** \[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

2. **Mean Squared Error (MSE):**
   - **Definition:** Average squared difference between predicted and actual values.
   - **Formula:** \[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

3. **Root Mean Squared Error (RMSE):**
   - **Definition:** Square root of the MSE.
   - **Formula:** \[ \text{RMSE} = \sqrt{\text{MSE}} \]

4. **R-squared (Coefficient of Determination):**
   - **Definition:** Proportion of the variance in the dependent variable that is predictable from the independent variables.
   - **Formula:** \[ R^2 = 1 - \frac{\text{Sum of Squared Residuals}}{\text{Total Sum of Squares}} \]
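
The four regression formulas translate almost directly into NumPy. The following is a minimal sketch with made-up values; the variable names are illustrative.

```python
import numpy as np

# Made-up actual and predicted values.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

residuals = y_true - y_pred

mae = np.mean(np.abs(residuals))   # Mean Absolute Error
mse = np.mean(residuals ** 2)      # Mean Squared Error
rmse = np.sqrt(mse)                # Root Mean Squared Error

ss_res = np.sum(residuals ** 2)                  # Sum of Squared Residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # Total Sum of Squares
r2 = 1 - ss_res / ss_tot                         # R-squared

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R-squared:", r2)
```

For this data the MAE is 0.5, the MSE is 0.375, and R-squared is roughly 0.88.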

### Clustering Metrics:

1. **Silhouette Score:**
   - **Definition:** Measures how similar an object is to its own cluster compared to other clusters.
   - **Range:** [-1, 1], where a higher score indicates better-defined clusters.

2. **Calinski-Harabasz Index:**
   - **Definition:** Ratio of the between-cluster variance to the within-cluster variance.
   - **Higher Values:** Indicate better-defined clusters.
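
Both clustering metrics are available in scikit-learn. The sketch below assumes two synthetic, well-separated blobs and K-Means with two clusters; the data, the seed, and the choice of K-Means are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Two synthetic, well-separated 2D blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

# Cluster the points, then score the resulting assignment.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)
ch = calinski_harabasz_score(X, labels)

print("Silhouette Score:", sil)
print("Calinski-Harabasz Index:", ch)
```

Because the blobs are far apart relative to their spread, the silhouette score should land close to 1. Overlapping clusters would push it toward 0.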

### Other Metrics:

1. **Mean Squared Logarithmic Error (MSLE):**
   - **Definition:** Measures the mean squared logarithmic difference between the predicted and actual values.

2. **Cohen's Kappa:**
   - **Definition:** Measures the agreement between predicted and actual classifications, adjusted for chance.

3. **Jaccard Index (Intersection over Union):**
   - **Definition:** Measures the similarity between two sets, often used in image segmentation tasks.

4. **Matthews Correlation Coefficient (MCC):**
   - **Definition:** Measures the correlation between predicted and actual binary classifications.
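
These metrics also have scikit-learn implementations. The sketch below uses made-up binary labels and made-up non-negative regression values (MSLE requires non-negative inputs).

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_log_error,
    cohen_kappa_score,
    jaccard_score,
    matthews_corrcoef,
)

# Made-up binary classification results: 3 TP, 1 FP, 1 FN, 3 TN.
y_true_cls = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_cls = np.array([1, 0, 1, 0, 0, 1, 1, 0])

kappa = cohen_kappa_score(y_true_cls, y_pred_cls)
jaccard = jaccard_score(y_true_cls, y_pred_cls)
mcc = matthews_corrcoef(y_true_cls, y_pred_cls)

# Made-up non-negative regression values for MSLE.
y_true_reg = np.array([3.0, 5.0, 2.5, 7.0])
y_pred_reg = np.array([2.5, 5.0, 3.0, 8.0])
msle = mean_squared_log_error(y_true_reg, y_pred_reg)

print("Cohen's Kappa:", kappa)
print("Jaccard Index:", jaccard)
print("MCC:", mcc)
print("MSLE:", msle)
```

For these labels, Cohen's Kappa and MCC are both 0.5, and the Jaccard Index is 3 / (3 + 1 + 1) = 0.6.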

The choice of metrics depends on the specific goals of the machine learning task, and it is common to use a combination of metrics to provide a comprehensive evaluation of model performance.



## Machine Learning Introduction 📚

### What is Machine Learning? 📚
Author: Ayush Singh [Newera (https://youtube.com/c/neweraa)] 📚

Machine learning refers to computer programs that use algorithms to analyze data and make intelligent predictions based on that data without being explicitly programmed.

Slightly more formal definition:-

Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.

Arthur Samuel, 1959

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Tom Mitchell, 1997

Applications of Machine Learning:

- Self-driving cars
- Real estate
- Stock price prediction
- Much more...

### How does it work? 📚

1. Study the problem.
2. Train the algorithm.
3. Evaluate it.
4. If it is good, launch it.
5. Otherwise, analyze the errors.


### Types of Machine Learning Systems 📚

Supervised Learning: We feed the algorithm labeled data, so we know what the output should look like, and there is a known relationship between the input values "X" and the output values "Y".

Unsupervised Learning: We feed the algorithm unlabeled data, so we do not know what the output should look like and no relationship between input and output variables is given. The algorithm has to recognize patterns in the data itself. Different algorithms exist for this, and we will study them later.


### Supervised Learning Problems 📚
Regression: for continuous data.

Example: a person's height could be any value (within the range of human heights), not just certain fixed heights.

Classification: for categorical data.

Discrete data is counted rather than measured, so the output is one of a fixed set of classes.


### Dividing Our Data 📚
We divide our data into two sets, a training set and a testing set. The reason is simple: we want to evaluate the model on data it has not seen during training, so we hold out a test set for that purpose.
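
A minimal sketch of such a split, assuming scikit-learn is available. The toy arrays, the 80/20 ratio, and the `random_state` value are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy dataset: 10 examples with 2 features each, and binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the examples for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training set shape:", X_train.shape)
print("Testing set shape:", X_test.shape)
```

The model is fit only on `X_train` and `y_train`, while `X_test` and `y_test` are kept aside for the final evaluation.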


### Overfitting the Training Data 📚
Overfitting means that the model performs well on the training data but does not generalize well to new data.

Possible solutions are to gather more data, to reduce the noise in the training data, or to use regularization (which we will study later on).
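
The gap between training and test error can be observed in a small experiment. The sketch below is only an illustration under assumed conditions: synthetic noisy sine data, polynomial models fitted with NumPy, and arbitrarily chosen degrees 1, 3, and 12.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Synthetic data: a sine curve plus Gaussian noise (an assumed toy problem).
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(200)    # larger held-out test set

def mse(model, x, y):
    return np.mean((model(x) - y) ** 2)

results = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit of the given degree on the training data only.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"test MSE = {results[degree][1]:.4f}")
```

The training error can only shrink as the degree grows, because each higher-degree model contains the lower-degree ones. A high-degree model whose training error is far below its test error is showing the overfitting pattern described above.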


### Underfitting the Training Data 📚
Underfitting means that the model performs badly even on the training data, so it will usually also perform badly on the testing data.



### Notations 📚
X means input features.

y means output values (labels or targets).

m means the number of training examples.
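
In code, this notation usually maps onto array shapes. The sketch below uses made-up house data (area and number of rooms as features, price as the output). The values and the extra symbol `n` for the number of features are assumptions for illustration.

```python
import numpy as np

# X: input features, one row per training example (made-up values).
X = np.array([
    [1400, 3],
    [1600, 3],
    [1700, 4],
    [1875, 4],
])

# y: output values, one per training example (made-up prices).
y = np.array([245, 312, 279, 308])

# m: number of training examples; n: number of features (n is an added symbol).
m, n = X.shape

print("m (training examples):", m)
print("n (features):", n)
print("y shape:", y.shape)
```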