<a href="https://colab.research.google.com/github/Arunkarthik-K/AI_Learning_PB/blob/main/AI_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Artificial Intelligence (AI)**

---


Artificial Intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn like humans. These machines can perform tasks that typically require human intelligence, such as visual perception, speech recognition, decision-making, and language translation.


## **Types of AI**

### **Narrow AI (Weak AI)**
* Designed and trained for a specific task.
* Examples: Virtual assistants (like Siri), recommendation systems.

### **General AI (Strong AI)**
* Has the ability to understand, learn, and apply knowledge across a wide range of tasks, much like a human.
* This type of AI does not yet exist.

### **Superintelligent AI**
* An AI that surpasses human intelligence and capabilities.
* This is theoretical and poses ethical and safety concerns.


# **Machine Learning (ML)**

---


Machine learning is a subset of AI that uses algorithms and statistical models to enable computers to perform specific tasks without explicit instructions, relying on patterns and inference instead.

## **Types of Machine Learning**
### **Supervised Learning**
The model is trained on a labeled dataset, meaning that each training example is paired with an output label.
### **Unsupervised Learning**
The model is given data without explicit instructions on what to do with it. It must find patterns and relationships in the data.
### **Reinforcement Learning**
The model learns by interacting with an environment and receiving rewards or penalties for the actions it performs.

# **Supervised Learning**


---



Supervised learning is a type of machine learning where the algorithm learns from labeled training data to make predictions or decisions.

Here are the main types of supervised learning:

### 1. **Classification**
In classification tasks, the goal is to predict a categorical label for a given input. The output is discrete, meaning the possible outcomes are predefined classes.

Examples include:
* **Binary Classification:** The model predicts one of two possible classes. For example, spam detection in emails (spam or not spam).
* **Multiclass Classification:** The model predicts one of three or more possible classes. For example, handwritten digit recognition (digits 0-9).

### 2. **Regression**
In regression tasks, the goal is to predict a continuous numerical value based on the input data.

Examples include:
* **Linear Regression:** Predicting house prices based on features like size, location, and number of bedrooms.
* **Polynomial Regression:** A generalization of linear regression to model non-linear relationships.

### 3. **Time Series Prediction**
Time series prediction involves forecasting future values based on previously observed values. This is often considered a specialized type of regression where the data points are time-ordered.

Examples include:
* **Stock Price Prediction:** Predicting future stock prices based on historical data.
* **Weather Forecasting:** Predicting future weather conditions based on past observations.

### 4. **Ranking**
Ranking tasks involve ordering items based on their relevance or preference. This is commonly used in search engines and recommendation systems.

Examples include:
* **Search Engine Results:** Ranking web pages based on their relevance to a search query.
* **Recommender Systems:** Ranking movies or products based on user preferences.

### 5. **Anomaly Detection**
Anomaly detection tasks focus on identifying rare items, events, or observations that differ significantly from the majority of the data.

Examples include:
* **Fraud Detection:** Identifying fraudulent transactions in a dataset of financial transactions.
* **Network Security:** Detecting unusual patterns that may indicate a security breach.

### 6. **Instance Segmentation**
Instance segmentation tasks involve classifying each pixel of an image into different object instances. This is commonly used in computer vision.

Examples include:
* **Object Detection:** Identifying and classifying objects within an image.
* **Medical Imaging:** Segmenting tumors or other structures within medical scans.

### 7. **Ordinal Regression**
Ordinal regression deals with predicting an ordinal variable, where the order between categories is significant but the exact difference between them is not.

Examples include:
* **Customer Satisfaction:** Predicting ratings like 'very dissatisfied', 'dissatisfied', 'neutral', 'satisfied', and 'very satisfied'.
* **Education Levels:** Predicting education levels like 'high school', 'bachelor’s', 'master’s', and 'PhD'.

Each type of supervised learning task requires different techniques and algorithms tailored to the nature of the input data and the prediction goals.


## **Classification Models**


---



In classification tasks, various models can be used depending on the complexity and nature of the data. Here are some common classification models and their corresponding metrics, along with a brief explanation of their architecture:

### 1. **Logistic Regression**
**Architecture:**
* **Input Layer:** Features of the data.
* **Weights and Biases:** Each feature is assigned a weight, and a bias term is added.
* **Sigmoid Function:** The weighted sum of inputs is passed through a sigmoid function to output a probability value between 0 and 1.
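
The three steps above can be sketched in plain Python. This is a minimal illustration, not a trained model: the weights, bias, and feature values are made-up numbers, and `predict_proba` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical parameters, chosen only for illustration (not fitted to data).
weights = [0.8, -0.4]
bias = 0.1
features = [1.5, 2.0]

probability = predict_proba(features, weights, bias)
# A common (but adjustable) decision threshold is 0.5.
label = 1 if probability >= 0.5 else 0
print(round(probability, 3), label)
```

In a real model the weights and bias would be learned by minimizing a loss such as log loss on labeled data.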

**Metrics:**
* **Accuracy:** The ratio of correctly predicted instances to the total instances (Correct Predictions/All Predictions).
* **Precision:** The ratio of true positive instances to the total predicted positives (True Positives/(True Positives + False Positives)).
* **Recall (Sensitivity):** The ratio of true positive instances to the actual positives (True Positives/(True Positives + False Negatives)).
* **F1 Score:** The harmonic mean of precision and recall.
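
As a small worked example, the four metrics above can be computed directly from true and predicted labels. The label lists are made up for illustration, and `classification_metrics` is a hypothetical helper rather than a library function.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```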

### 2. **Support Vector Machine (SVM)**
**Architecture:**
* **Input Layer:** Features of the data.
* **Kernel Trick:** Data is transformed into a higher-dimensional space (if needed) using kernel functions.
* **Hyperplane:** A hyperplane is constructed to separate different classes. The objective is to maximize the margin between the classes.

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* **ROC-AUC:** Area under the Receiver Operating Characteristic curve.

### 3. **Decision Tree**
**Architecture:**
* **Nodes and Branches:** The tree is made up of decision nodes (where a feature is tested) and leaf nodes (where a decision is made).
* **Splitting Criteria:** Nodes are split based on criteria like Gini impurity or entropy (information gain).
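
The splitting criteria named above can be sketched as follows. The "spam"/"ham" labels and the candidate split are invented for illustration; a real tree learner would evaluate many candidate splits and keep the best one.

```python
import math
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 for a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits; 0 for a pure node."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Hypothetical node with a balanced class mix and one candidate split.
parent = ["spam"] * 4 + ["ham"] * 4
left = ["spam"] * 3 + ["ham"] * 1
right = ["spam"] * 1 + ["ham"] * 3

print(gini_impurity(parent), entropy(parent), information_gain(parent, left, right))
```

A split with higher information gain (or a larger drop in Gini impurity) produces purer child nodes.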

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* **Confusion Matrix:** Provides a summary of prediction results on a classification problem.

### 4. **Random Forest**
**Architecture:**
* **Ensemble of Trees:** Multiple decision trees are built on different subsets of the data.
* **Bagging:** Random samples of data are used to train each tree.
* **Voting:** Each tree votes, and the class with the majority votes is chosen.

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* ROC-AUC

### 5. **k-Nearest Neighbors (k-NN)**
**Architecture:**
* **Instance-based Learning:** No explicit training phase; the algorithm stores training examples.
* **Distance Metric:** Calculates the distance (e.g., Euclidean) between a new data point and all training examples.
* **Voting:** The most common class among the k-nearest neighbors is chosen.
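
A minimal sketch of the store, measure, and vote procedure described above, using Euclidean distance. The 2-D points and the labels "A" and "B" are made up, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Two made-up clusters in 2-D.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

Because every prediction scans all stored examples, this brute-force version becomes slow on large datasets; libraries typically use spatial indexes to speed up the neighbor search.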

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* Confusion Matrix

### 6. **Naive Bayes**
**Architecture:**
* **Probabilistic Model:** Based on Bayes' Theorem with the assumption of feature independence.
* **Likelihood Calculation:** The likelihood of the input features under each class is combined with the class prior to obtain the probability of each class given the features.
* **Decision Rule:** The class with the highest posterior probability is chosen.

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* ROC-AUC

### 7. **Artificial Neural Networks (ANN)**
**Architecture:**
* **Input Layer:** Takes the feature vector as input.
* **Hidden Layers:** One or more layers where each neuron applies a weighted sum followed by a non-linear activation function (e.g., ReLU).
* **Output Layer:** Outputs class probabilities (typically using softmax for multi-class classification).

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* ROC-AUC

### 8. **Convolutional Neural Networks (CNN)**
**Architecture:**
* **Input Layer:** Takes the image data as input.
* **Convolutional Layers:** Slide small learnable filters (kernels) over the image to produce feature maps that highlight local patterns such as edges and textures.
* **Pooling Layers:** Downsample the feature maps, progressively reducing their spatial size to lower the network's complexity and computational cost. (**Types:** Max Pooling and Average Pooling).
* **Fully Connected Layers:** Final layers where all neurons are connected, leading to a softmax output layer for classification.
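
To make the convolution and pooling steps concrete, here is a small pure-Python sketch. The 5x5 image and the 2x2 edge filter are hand-picked for illustration; in a real CNN the filter values are learned, and the helper names `conv2d` and `max_pool2d` are hypothetical.

```python
def conv2d(image, kernel):
    """Valid cross-correlation with stride 1 (the operation deep learning
    libraries conventionally call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows or columns that do not fill a
    complete window are dropped."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# Made-up image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-picked filter that responds where brightness increases left to right.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)   # 4x4 map, large only at the edge
pooled = max_pool2d(feature_map)      # 2x2 map after downsampling
print(feature_map)
print(pooled)
```

The feature map is non-zero only in the column where the vertical edge sits, and pooling keeps that response while halving each spatial dimension.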

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* ROC-AUC

### 9. **Gradient Boosting Machines (e.g., XGBoost, LightGBM)**
**Architecture:**
* **Ensemble of Trees:** Similar to random forest but trees are built sequentially.
* **Boosting:** Each tree corrects the errors of the previous trees.
* **Weighted Voting:** The final prediction is a weighted sum of the predictions from all trees.

**Metrics:**
* Accuracy
* Precision
* Recall
* F1 Score
* ROC-AUC

Each of these models has its unique architecture tailored to capture different aspects of the data, and the choice of model depends on the specific requirements and nature of the classification task at hand.


## **Regression Models**


---


Regression models aim to predict continuous outcomes based on input features. Here are some common regression models and their corresponding metrics, along with a brief explanation of their architecture:


### 1. **Linear Regression**
**Architecture:**
* **Input Layer:** Features of the data.
* **Weights and Biases:** Each feature is assigned a weight, and a bias term is added.
* **Linear Combination:** The weighted sum of inputs plus the bias is calculated to predict the output.

**Metrics:**
* **Mean Absolute Error (MAE):** The average absolute difference between predicted and actual values.
* **Mean Squared Error (MSE):** The average squared difference between predicted and actual values.
* **Root Mean Squared Error (RMSE):** The square root of the MSE, expressed in the same units as the target. It approximates the standard deviation of the prediction errors when their mean is close to zero.
* **R-squared (R²):** The proportion of the variance in the dependent variable that is predictable from the independent variables.
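
The four regression metrics above can be computed by hand as follows. The target and prediction values are made up for illustration, and `regression_metrics` is a hypothetical helper name.

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, and R-squared from paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}

# Made-up values: two predictions are off by 1, two are exact.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]

scores = regression_metrics(y_true, y_pred)
print(scores)
```

Note that R² is undefined when all true values are identical (the total variation is zero); this sketch does not handle that case.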

### 2. **Polynomial Regression**
**Architecture:**
* **Input Layer:** Features of the data.
* **Polynomial Features:** Input features are transformed into polynomial features of a specified degree.
* **Weights and Biases:** Each polynomial feature is assigned a weight, and a bias term is added.
* **Linear Combination:** The weighted sum of polynomial features plus the bias is calculated to predict the output.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 3. **Decision Tree Regression**
**Architecture:**
* **Nodes and Branches:** The tree is made up of decision nodes (where a feature is tested) and leaf nodes (where a prediction is made).
* **Splitting Criteria:** Nodes are split based on criteria like mean squared error reduction.
* **Leaf Nodes:** Each leaf node represents the predicted value for the data points that reach that node.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 4. **Random Forest Regression**
**Architecture:**
* **Ensemble of Trees:** Multiple decision trees are built on different subsets of the data.
* **Bagging:** Random samples of data are used to train each tree.
* **Averaging:** The final prediction is the average of the predictions from all trees.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 5. **Support Vector Regression (SVR)**
**Architecture:**
* **Input Layer:** Features of the data.
* **Kernel Trick:** Data is transformed into a higher-dimensional space (if needed) using kernel functions.
* **Support Vectors:** Only a subset of the training data (support vectors) are used to define the regression function.
* **Hyperplane:** A hyperplane is constructed to minimize the error within a certain threshold (epsilon).

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 6. **k-Nearest Neighbors (k-NN) Regression**
**Architecture:**
* **Instance-based Learning:** No explicit training phase; the algorithm stores training examples.
* **Distance Metric:** Calculates the distance (e.g., Euclidean) between a new data point and all training examples.
* **Averaging:** The average of the k-nearest neighbors' values is used as the prediction.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 7. **Gradient Boosting Regression (e.g., XGBoost, LightGBM)**
**Architecture:**
* **Ensemble of Trees:** Similar to random forest but trees are built sequentially.
* **Boosting:** Each tree corrects the errors of the previous trees.
* **Weighted Sum:** The final prediction is a weighted sum of the predictions from all trees.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 8. **Artificial Neural Networks (ANN) for Regression**
**Architecture:**
* **Input Layer:** Takes the feature vector as input.
* **Hidden Layers:** One or more layers where each neuron applies a weighted sum followed by a non-linear activation function (e.g., ReLU).
* **Output Layer:** Outputs a continuous value.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 9. **Convolutional Neural Networks (CNN) for Regression**
**Architecture:**
* **Input Layer:** Takes the image or spatial data as input.
* **Convolutional Layers:** Slide small learnable filters (kernels) over the image to produce feature maps that highlight local patterns such as edges and textures.
* **Pooling Layers:** Downsample the feature maps, progressively reducing their spatial size to lower the network's complexity and computational cost. (**Types:** Max Pooling and Average Pooling).
* **Fully Connected Layers:** Final layers where all neurons are connected, leading to a continuous output value.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 10. **Ridge Regression (L2 Regularization)**
**Architecture:**
* **Linear Regression Base:** Same as linear regression but with an added penalty term.
* **Regularization Term:** A penalty proportional to the square of the magnitude of coefficients to prevent overfitting.

**Metrics:**
* MAE
* MSE
* RMSE
* R²

### 11. **Lasso Regression (L1 Regularization)**
**Architecture:**
* **Linear Regression Base:** Same as linear regression but with an added penalty term.
* **Regularization Term:** A penalty proportional to the absolute value of the magnitude of coefficients, which can lead to sparse models (feature selection).
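
The difference between the L2 (ridge) and L1 (lasso) penalty terms can be seen by writing the two loss functions side by side. This is a sketch under assumptions: the strength parameter is called `alpha` here, the base loss is plain MSE, and the weights and data are invented. Libraries differ in how they scale these terms.

```python
def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def ridge_loss(y_true, y_pred, weights, alpha):
    """MSE plus an L2 penalty: alpha times the sum of squared weights."""
    return mse(y_true, y_pred) + alpha * sum(w * w for w in weights)

def lasso_loss(y_true, y_pred, weights, alpha):
    """MSE plus an L1 penalty: alpha times the sum of absolute weights."""
    return mse(y_true, y_pred) + alpha * sum(abs(w) for w in weights)

# Hypothetical model state, for illustration only.
weights = [3.0, -0.5, 0.0]
y_true = [1.0, 2.0]
y_pred = [1.5, 2.0]
alpha = 0.1

print(ridge_loss(y_true, y_pred, weights, alpha))
print(lasso_loss(y_true, y_pred, weights, alpha))
```

The squared penalty punishes the large weight (3.0) far more than the small one, while the absolute-value penalty grows linearly, which is what tends to push small weights all the way to zero.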

**Metrics:**
* MAE
* MSE
* RMSE
* R²

Each regression model has its specific architecture designed to capture different relationships in the data, and the choice of model depends on the complexity and nature of the regression task at hand.


## **Deep Learning**


---


Deep learning is a subset of machine learning involving neural networks with many layers (hence "deep") to model and understand complex patterns in data. It is powerful for tasks involving large datasets and intricate relationships, such as image and speech recognition, natural language processing, and autonomous driving. Here are the main types of deep learning architectures and their corresponding metrics:


### **[Activation functions in Neural Networks](https://www.geeksforgeeks.org/activation-functions-neural-networks/)**
In the process of building a neural network, one of the choices you get to make is what Activation Function to use in the hidden layer as well as at the output layer of the network.

**Elements of a Neural Network**
* **Input Layer:** This layer accepts the input features. It brings information from the outside world into the network. No computation is performed here; the nodes simply pass the features on to the hidden layer.
* **Hidden Layer:** The nodes of this layer are not exposed to the outside world; they are part of the abstraction provided by the network. The hidden layer performs computation on the features received from the input layer and passes the result to the output layer.
* **Output Layer:** This layer presents the information learned by the network to the outside world.

The activation function is applied to the weighted sum of a neuron's inputs plus its bias, and it determines how strongly the neuron is activated. Its purpose is to introduce non-linearity into the output of a neuron.

Each neuron in a network has weights, a bias, and an activation function. During training, the weights and biases are updated based on the error at the output, a process known as back-propagation. Differentiable activation functions make back-propagation possible, because their gradients are needed to propagate the error back and update the weights and biases.

**Why do we need Non-linear activation function?**

A neural network without a non-linear activation function is essentially just a linear regression model, no matter how many layers it has. The activation function applies a non-linear transformation to its input, which makes the network capable of learning and performing more complex tasks.

**Variants of Activation Function:**

**Linear Function**
* **Equation:** A linear function has the equation of a straight line, i.e. y = x.
* No matter how many layers the network has, if all of them are linear, the output of the last layer is just a linear function of the input to the first layer.
* **Range:** -inf to +inf
* **Uses:** The linear activation function is typically used in only one place: the output layer.
* **Issues:** The derivative of a linear function is a constant that no longer depends on the input x, so the function cannot introduce any non-linear behavior into the model.

**Sigmoid Function**
* It is a function whose graph is "S"-shaped.
* **Equation:** $A = \frac{1}{1 + e^{-x}}$
* **Nature:** Non-linear. For x between -2 and 2 the curve is very steep, so small changes in x bring about large changes in y.
* **Value Range:** 0 to 1
* **Uses:** Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Because the sigmoid output lies between 0 and 1, the prediction can be taken as 1 if the value is greater than 0.5 and 0 otherwise.

**Tanh Function**
* The tanh (hyperbolic tangent) function often works better than the sigmoid function in hidden layers. It is a scaled and shifted version of the sigmoid function, so each can be derived from the other.
* **Equation:** $f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1$, or equivalently $\tanh(x) = 2 \cdot \mathrm{sigmoid}(2x) - 1$
* **Value Range:** -1 to +1
* **Nature:** Non-linear
* **Uses:** Usually used in the hidden layers of a neural network. Because its values lie between -1 and 1, the mean of the hidden layer's outputs comes out at or very close to 0, which helps center the data and makes learning much easier for the next layer.

**RELU Function**
* It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly in the hidden layers of neural networks.
* **Equation:** A(x) = max(0, x). It outputs x if x is positive and 0 otherwise.
* **Value Range:** [0, inf)
* **Nature:** Non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons activated by the ReLU function.
* **Uses:** ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only some of the neurons are activated, which makes the network sparse and efficient to compute.

In simple terms, networks that use ReLU usually train much faster than those that use sigmoid or tanh.

**Softmax Function**

The softmax function generalizes the sigmoid function to more than two classes, which makes it handy for multi-class classification problems.

* **Nature:** Non-linear
* **Uses:** Usually used when handling multiple classes; it is commonly found in the output layer of image classification models. Softmax exponentiates the score for each class and divides by the sum of the exponentials, so every output lies between 0 and 1 and the outputs sum to 1.
* **Output:** Softmax is best used in the output layer of a classifier, where the goal is to obtain the probability of each class for a given input.
* As a basic rule of thumb, if you don't know which activation function to use, use ReLU: it is a general-purpose activation function for hidden layers and is used in most cases these days.
* If the output is for binary classification, the sigmoid function is a very natural choice for the output layer.
* If the output is for multi-class classification, softmax is very useful for predicting the probability of each class.
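
The activation functions described in this section can be written in a few lines of plain Python. This is an illustrative sketch: real frameworks ship vectorized, numerically hardened versions, and the input values below are arbitrary.

```python
import math

def sigmoid(x):
    """S-shaped curve with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Scaled and shifted sigmoid with outputs in (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

def relu(x):
    """Pass positive inputs through unchanged and clamp the rest to 0."""
    return max(0.0, x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow without changing the result.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))              # midpoint of the S curve
print(tanh(0.7), math.tanh(0.7)) # the identity agrees with the library tanh
print(relu(-3.0), relu(2.5))
print(softmax([1.0, 2.0, 3.0]))  # largest score gets the largest probability
```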

### **[Optimization Rule in Deep Neural Networks](https://www.geeksforgeeks.org/optimization-rule-in-deep-neural-networks/)**

In machine learning, optimizers and loss functions are two components that help improve the performance of the model. By calculating the difference between the expected and actual outputs of a model, a loss function evaluates the effectiveness of a model. Among the loss functions are log loss, hinge loss, and mean square loss. By modifying the model’s parameters to reduce the loss function value, the optimizer contributes to its improvement. RMSProp, ADAM, and SGD are a few examples of optimizers. The optimizer’s job is to determine which combination of the neural network’s weights and biases will give it the best chance to generate accurate predictions.

There are various optimization techniques for updating model weights and adapting learning rates, such as Gradient Descent, Stochastic Gradient Descent, Stochastic Gradient Descent with momentum, Mini-Batch Gradient Descent, AdaGrad, RMSProp, AdaDelta, and Adam. These techniques play a critical role in training neural networks: they improve the model by adjusting its parameters to minimize the value of the loss function. Choosing the best optimizer depends on the application.

Before we proceed, it’s essential to acquaint yourself with a few terms:

1. **An epoch** is one complete pass of the algorithm over the entire training dataset; the number of epochs is how many such passes are made.
2. **Batch size** is the number of samples used for each update of the model parameters.
3. **A sample** is a single record of data in a dataset.
4. **Learning rate** is a parameter that determines the scale of model weight updates.
5. **Weights and biases** are learnable parameters in a model that regulate the signal between two neurons.
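
To show how these terms fit together, here is a minimal mini-batch gradient descent loop that fits a one-feature linear model. Everything in it is an assumption made for illustration: the toy data follows y = 2x + 1 exactly, the loss is mean squared error, and the function name and default hyperparameters are hypothetical.

```python
import random

def train_linear_model(xs, ys, epochs=500, batch_size=2,
                       learning_rate=0.05, seed=0):
    """Fit y = weight * x + bias with mini-batch gradient descent on MSE."""
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0          # learnable parameters
    indices = list(range(len(xs)))   # one index per sample

    for _ in range(epochs):          # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w, grad_b = 0.0, 0.0
            for i in batch:
                error = (weight * xs[i] + bias) - ys[i]
                # Gradient of the batch-mean squared error.
                grad_w += 2.0 * error * xs[i] / len(batch)
                grad_b += 2.0 * error / len(batch)
            # The learning rate scales each parameter update.
            weight -= learning_rate * grad_w
            bias -= learning_rate * grad_b
    return weight, bias

# Toy dataset generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

weight, bias = train_linear_model(xs, ys)
print(round(weight, 3), round(bias, 3))
```

With four samples and a batch size of 2, each epoch performs two parameter updates. Optimizers such as momentum, RMSProp, and Adam replace the plain update step with one that also tracks running statistics of the gradients.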

### **Convolution, Padding, Stride, and Pooling in CNN**
[Refer to this article](https://medium.com/analytics-vidhya/convolution-padding-stride-and-pooling-in-cnn-13dc1f3ada26)

### 1. **Feedforward Neural Networks (FNN)**
Feedforward Neural Networks, also known as Multi-Layer Perceptrons (MLPs), are the simplest type of artificial neural network.

**Architecture:**
* **Input Layer:** Receives the input data.
* **Hidden Layers:** One or more layers where each neuron receives input from the previous layer and passes the output to the next layer.
* **Output Layer:** Produces the final prediction or classification.

**Metrics:**
* **Loss Function:** Mean Squared Error (MSE) for regression, Cross-Entropy Loss for classification.
* **Accuracy:** For classification tasks.
* **MAE, RMSE:** For regression tasks.

### 2. **Convolutional Neural Networks (CNN)**
CNNs are designed for processing structured grid data like images.

**Architecture:**
* **Convolutional Layers:** Slide small learnable filters (kernels) over the image to produce feature maps that highlight local patterns such as edges and textures.
* **Pooling Layers:** Downsample the feature maps, progressively reducing their spatial size to lower the network's complexity and computational cost. (**Types:** Max Pooling and Average Pooling).
* **Fully Connected Layers:** Combine features learned by convolutional layers for final classification or regression.

**Applications:** Image and video recognition, object detection, image segmentation.

**Metrics:**
* **Accuracy, Precision, Recall, F1 Score:** For classification tasks.
* **Intersection over Union (IoU):** For object detection and segmentation.
* **MSE, RMSE:** For regression tasks.
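
Intersection over Union for two axis-aligned bounding boxes can be computed as in the sketch below. The `(x_min, y_min, x_max, y_max)` box format is an assumption for this example; detection libraries use several different conventions.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Width and height are clamped to 0 when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

# Two 2x2 boxes that share a 1x1 corner: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
# Boxes that do not touch at all.
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))
```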

### 3. **Recurrent Neural Networks (RNN)**
RNNs are designed for sequential data and time series analysis.

**Architecture:**
* **Recurrent Layers:** Neurons have connections to the previous time step, allowing the network to maintain a memory of past inputs.

**Variants:**
* **Long Short-Term Memory (LSTM):** Addresses the vanishing gradient problem by using gates to manage memory.
* **Gated Recurrent Unit (GRU):** A simplified version of LSTM with fewer gates.

**Applications:** Language modeling, machine translation, speech recognition.

**Metrics:**
* **Perplexity:** For language modeling.
* **BLEU Score:** For machine translation.
* **Accuracy, Precision, Recall, F1 Score:** For classification tasks.
* **MSE, RMSE:** For regression tasks.

### 4. **Autoencoders**
Autoencoders are neural networks used for unsupervised learning of efficient codings.

**Architecture:**
* **Encoder:** Encoder maps the input to a latent-space representation.
* **Decoder:** Decoder reconstructs the input from the latent-space representation.

**Variants:**
* **Denoising Autoencoders:** Trained to remove noise from data.
* **Variational Autoencoders (VAE):** Introduce probabilistic components to the latent space for generating new data.

**Applications:** Dimensionality reduction, anomaly detection, data denoising, generative modeling.

**Metrics:**
* **Reconstruction Error:** Measures the difference between the input and its reconstruction.
* **Latent Space Regularization (for VAEs):** Assesses how well the latent space captures the data distribution.

### 5. **Generative Adversarial Networks (GANs)**
GANs consist of two networks, a generator and a discriminator, that compete against each other.

**Architecture:**
* **Generator:** Creates fake data from random noise.
* **Discriminator:** Tries to distinguish between real and fake data.
* **Adversarial Training:** The generator improves to fool the discriminator, and the discriminator improves to better identify fake data.

**Applications:** Image generation, data augmentation, style transfer.

**Metrics:**
* **Inception Score (IS):** Evaluates the quality of generated images.
* **Fréchet Inception Distance (FID):** Measures the similarity between real and generated data distributions.

### 6. **Transformers**
Transformers are designed for handling sequential data with attention mechanisms, often used in natural language processing.

RNNs often have short memories and start to forget earlier inputs, especially in long sentences. Transformers can handle longer input sequences because their self-attention layers relate every word to every other word directly and can process the words in parallel.

**Architecture:**
* **Self-Attention Mechanism:** Allows the model to weigh the importance of different parts of the input sequence.
* **Encoder-Decoder Structure:** Commonly used in tasks like machine translation.
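
The core of the self-attention mechanism is scaled dot-product attention, sketched below in plain Python. This is a deliberately simplified version: the queries, keys, and values are all set equal to the input vectors, whereas a real Transformer first applies learned projection matrices and uses multiple attention heads. The three 2-D "token" vectors are made up.

```python
import math

def softmax(scores):
    """Turn a list of scores into weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens
    (no learned projections; a simplification for illustration)."""
    dim = len(tokens[0])
    outputs, all_weights = [], []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        mixed = [sum(w * value[j] for w, value in zip(weights, tokens))
                 for j in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up embeddings: tokens 0 and 1 are similar, token 2 is different.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how much one token attends to every token in the sequence; token 0 places more weight on the similar token 1 than on token 2.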

**Variants:**
* **BERT (Bidirectional Encoder Representations from Transformers):** Pre-trained on large text corpora for various NLP tasks.
* **GPT (Generative Pre-trained Transformer):** Focuses on text generation.

**Applications:** Text generation, translation, summarization, question answering.

**Metrics:**
* **BLEU Score:** For machine translation.
* **ROUGE Score:** For summarization tasks.
* **Perplexity:** For language models.
* **Accuracy, F1 Score:** For classification tasks.

### 7. **Graph Neural Networks (GNN)**
GNNs are designed for graph-structured data.

**Architecture:**
* **Graph Convolutional Layers:** Apply convolution operations to graph data.
* **Message Passing:** Nodes aggregate information from their neighbors to update their representations.
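
One round of message passing can be sketched with simple mean aggregation. This is a bare-bones illustration: a real GNN layer would also apply learned weight matrices and a non-linearity, and the three-node graph and its features are invented.

```python
def message_passing_step(features, adjacency):
    """Update each node to the mean of its own and its neighbors' feature vectors."""
    updated = {}
    for node, own in features.items():
        neighborhood = [own] + [features[n] for n in adjacency[node]]
        updated[node] = [
            sum(vec[j] for vec in neighborhood) / len(neighborhood)
            for j in range(len(own))
        ]
    return updated

# Hypothetical path graph: a - b - c
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 1.0]}

updated = message_passing_step(features, adjacency)
print(updated)
```

After one step, node "b" carries information from both of its neighbors; stacking more steps lets information travel across more hops of the graph.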

**Applications:** Social network analysis, molecular graph analysis, recommendation systems.

**Metrics:**
* **Node Classification Accuracy:** For node classification tasks.
* **Mean Average Precision (MAP):** For link prediction tasks.

### 8. **Capsule Networks**
Capsule networks aim to better model spatial hierarchies in data compared to traditional CNNs.

**Architecture:**
* **Capsules:** Groups of neurons that capture specific properties of objects.
* **Dynamic Routing:** Mechanism to determine the relationships between lower-level and higher-level capsules.

**Applications:** Image recognition, object segmentation.

**Metrics:**
* **Accuracy, Precision, Recall, F1 Score:** For classification tasks.
* **IoU:** For object segmentation.

### 9. **Self-Supervised Learning**
Self-supervised learning leverages large amounts of unlabeled data by generating labels from the data itself.

**Architecture:**
* **Pretext Tasks:** Design tasks where the model predicts part of the data from other parts, like predicting missing parts of an image or future frames in a video.

**Applications:** Pre-training for various tasks, enhancing learning from limited labeled data.

**Metrics:**
* **Task-Specific Metrics:** Depending on the downstream task, such as accuracy for classification or BLEU for translation.

Deep learning models are highly flexible and can be adapted to a wide range of tasks, making them a cornerstone of modern artificial intelligence applications.


## **Natural Language Processing (NLP)**


---



Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and human languages. It encompasses various tasks aimed at enabling computers to understand, interpret, and generate human language in a way that is both meaningful and useful. Here are the main types of NLP tasks along with corresponding metrics commonly used to evaluate their performance:

### 1. **Text Classification**
Text classification involves assigning predefined categories or labels to text documents.

**Tasks**

* **Sentiment Analysis:** Determining the sentiment expressed in a piece of text (positive, negative, neutral).
* **Topic Classification:** Assigning topics or categories to documents.
* **Spam Detection:** Identifying spam emails or messages.

**Metrics:**
* **Accuracy:** Ratio of correctly classified instances to the total instances.
* **Precision:** Proportion of correctly predicted positive cases among all predicted positive cases.
* **Recall:** Proportion of correctly predicted positive cases among all actual positive cases.
* **F1 Score:** Harmonic mean of precision and recall, balancing both metrics.
* **ROC-AUC:** Area under the Receiver Operating Characteristic curve, useful for imbalanced datasets.

### 2. **Named Entity Recognition (NER)**
NER identifies and classifies named entities (e.g., names of persons, organizations, locations) in text.

**Metrics:**
* **Precision:** Percentage of correctly identified named entities out of all entities identified.
* **Recall:** Percentage of correctly identified named entities out of all actual named entities.
* **F1 Score:** Harmonic mean of precision and recall, balancing both metrics.
* **Accuracy:** Ratio of correctly identified named entities to total entities.

### 3. **Text Generation**
Text generation involves creating coherent and contextually relevant text based on a given prompt or input.

**Metrics:**
* **Perplexity:** Measure of how well a probability distribution predicts a sample; lower values indicate better performance.
* **BLEU Score (BiLingual Evaluation Understudy Score):** Measures the n-gram overlap between generated text and one or more reference texts (precision-oriented).
* **ROUGE Score (Recall-Oriented Understudy for Gisting Evaluation):** Evaluates the quality of summaries and machine-generated text.
* **Human Evaluation:** Subjective assessment by human judges on the quality and coherence of generated text.
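
Perplexity is the exponential of the average negative log-probability that the model assigns to each token. A minimal sketch, assuming we already have the per-token probabilities (the lists below are made-up values for illustration):

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(average negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical probabilities a model assigned to the correct next token
confident = [0.5, 0.5, 0.5, 0.5]
uncertain = [0.1, 0.1, 0.1, 0.1]
print(perplexity(confident))  # approximately 2
print(perplexity(uncertain))  # approximately 10
```

A perplexity of about 2 can be read as the model being, on average, as uncertain as a fair choice between two tokens.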

### 4. **Machine Translation**
Machine translation translates text from one language to another automatically.

**Metrics:**
* **BLEU Score:** Measures the similarity of generated translations to reference translations.
* **TER (Translation Edit Rate):** Measures the number of edits needed to transform a machine-generated translation into a reference translation.
* **METEOR (Metric for Evaluation of Translation with Explicit ORdering):** Measures the quality of machine translation by comparing it to a set of reference translations.
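
The core ingredient of BLEU is the clipped ("modified") n-gram precision. The sketch below computes only that ingredient for a single reference; it is not a full BLEU implementation (it omits the brevity penalty and the geometric mean over n-gram orders), and the helper names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as many times as it occurs in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    if not cand_counts:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


reference = "the cat is on the mat".split()
candidate = "the the the cat mat".split()
p1 = modified_precision(candidate, reference, 1)
p2 = modified_precision(candidate, reference, 2)
print(p1, p2)  # 0.8 and 0.25
```

Clipping is what stops a degenerate translation that repeats a common word from scoring well: "the" appears three times in the candidate but is credited only twice.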

### 5. **Text Summarization**
Text summarization generates concise summaries from longer texts while retaining the main points and meaning.

**Metrics:**
* **ROUGE Score:** Evaluates the quality of summaries by comparing them against reference summaries.
* **BLEU Score:** Measures the similarity of generated summaries to reference summaries.
* **Content-Based Metrics:** Assess how well the generated summary covers the important information from the source text.
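
Where BLEU is precision-oriented, ROUGE is recall-oriented: it asks how much of the reference summary is covered by the generated one. A minimal sketch of ROUGE-1 (unigram overlap) with whitespace tokenization, which is a simplifying assumption; real evaluations usually rely on an established ROUGE package with stemming options.

```python
from collections import Counter


def rouge_1(summary, reference):
    """Unigram-overlap recall, precision, and F1 between two strings."""
    summary_counts = Counter(summary.lower().split())
    reference_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((summary_counts & reference_counts).values())
    recall = overlap / sum(reference_counts.values()) if reference_counts else 0.0
    precision = overlap / sum(summary_counts.values()) if summary_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_1("the cat sat", "the cat sat on the mat")
print(scores)  # recall 0.5, precision 1.0
```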

### 6. **Question Answering**
Question Answering systems provide precise answers to questions posed in natural language.

**Metrics:**
* **Exact Match (EM):** Percentage of questions for which the model's answer matches the exact ground-truth answer.
* **F1 Score:** Measures overlap between the model's answer and the ground-truth answer using token-level precision and recall.
* **BLEU Score:** Measures the overlap between the generated answer and reference answers.
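
Exact Match and token-level F1 can be sketched as follows. The normalization here (lowercasing and whitespace splitting) is a simplifying assumption; benchmark scripts typically also strip punctuation and articles.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (simplified normalization)."""
    return text.lower().strip().split()


def exact_match(prediction, truth):
    return normalize(prediction) == normalize(truth)


def token_f1(prediction, truth):
    """Harmonic mean of token-level precision and recall."""
    pred, gold = normalize(prediction), normalize(truth)
    common = sum((Counter(pred) & Counter(gold)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(gold)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"))         # True
print(exact_match("in Paris France", "Paris"))  # False
print(token_f1("in Paris France", "Paris"))  # 0.5
```

The last example shows why F1 is reported alongside EM: an answer that contains the right span plus extra words gets partial credit instead of zero.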

### 7. **Sentiment Analysis**
Sentiment Analysis determines the sentiment expressed in a piece of text, such as positive, negative, or neutral.

**Metrics:**
* **Accuracy:** Ratio of correctly classified instances to the total instances.
* **Precision:** Proportion of correctly predicted positive (or negative) sentiment cases among all predicted positive (or negative) cases.
* **Recall:** Proportion of correctly predicted positive (or negative) sentiment cases among all actual positive (or negative) cases.
* **F1 Score:** Harmonic mean of precision and recall, balancing both metrics.
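* **ROC-AUC:** Area under the Receiver Operating Characteristic curve, summarizing how well the model ranks positives above negatives across all thresholds. On heavily imbalanced datasets, the area under the precision-recall curve is often more informative.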

### 8. **Language Modeling**
Language modeling predicts the probability of a word given its context in a sentence or document.

**Metrics:**
* **Perplexity:** Measure of how well a probability distribution predicts a sample; lower values indicate better performance.
* **Accuracy:** Percentage of correctly predicted words in a sequence.
* **BLEU Score:** Measures the similarity of generated text to reference text.

### 9. **Semantic Parsing**
Semantic Parsing converts natural language into a formal representation (such as logical forms or SQL queries) that computers can understand and execute.

**Metrics:**
* **Accuracy:** Percentage of correctly parsed sentences.
* **Precision and Recall:** Measure how well the predicted formal representation matches the true representation.

### 10. **Speech Recognition**
While not strictly text-based, speech recognition converts spoken language into text and is closely related to NLP tasks.

**Metrics:**
* **Word Error Rate (WER):** The number of word substitutions, deletions, and insertions needed to turn the predicted transcript into the reference transcript, divided by the number of words in the reference.
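
WER is a word-level edit distance normalized by the reference length. A minimal sketch using the standard dynamic-programming edit distance; the function name and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the quick brown fox", "the quick brown dog jumps")
print(wer)  # 0.5: one substitution and one insertion over four reference words
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.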

Each NLP task requires specific metrics to evaluate its performance accurately, considering factors like context, language nuances, and the task's objectives. These metrics help assess the effectiveness and reliability of NLP models in real-world applications.


# **Unsupervised Learning**


---



Unsupervised learning involves training models on data without labeled responses, aiming to uncover hidden patterns or intrinsic structures in the data. Here are the main types of unsupervised learning:

### 1. **Clustering**
Clustering methods aim to partition the data into distinct groups (clusters) such that data points in the same group are more similar to each other than to those in other groups.
* **K-Means Clustering:** Divides data into K clusters by minimizing the variance within each cluster.
* **Hierarchical Clustering:** Creates a tree of clusters by either agglomerative (bottom-up) or divisive (top-down) approaches.
* **DBSCAN (Density-Based Spatial Clustering of Applications with Noise):** Forms clusters based on the density of data points, capable of finding arbitrarily shaped clusters and handling noise.
* **Gaussian Mixture Models (GMM):** Assumes that the data is generated from a mixture of several Gaussian distributions.
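
To make the K-Means idea concrete, here is a minimal pure-Python sketch of the assign-then-update loop. It assumes a naive initialization (the first `k` points become the starting centroids) to keep the example deterministic; real implementations use smarter schemes such as k-means++ and multiple restarts.

```python
def kmeans(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [points[i] for i in range(k)]  # assumption: naive initialization
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Two visually obvious groups (toy data)
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Each update step can only lower (or keep) the total within-cluster squared distance, which is the variance objective mentioned above.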

### 2. **Dimensionality Reduction**
Dimensionality reduction techniques transform data into a lower-dimensional space, preserving as much information as possible.
* **Principal Component Analysis (PCA):** Projects data onto the directions of maximum variance.
* **t-Distributed Stochastic Neighbor Embedding (t-SNE):** Reduces dimensions while preserving local structure and relationships in the data.
* **Linear Discriminant Analysis (LDA):** Finds a linear combination of features that best separates multiple classes. Note that LDA needs class labels, so it is a supervised technique; it is listed here only because it is commonly used for dimensionality reduction alongside PCA.
* **Autoencoders:** Neural networks that learn to encode data into a lower-dimensional representation and decode it back.
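
As a small illustration of PCA's "direction of maximum variance", the sketch below handles the two-dimensional case only, where the principal axis of the covariance matrix has a closed-form angle. This shortcut is an assumption made to stay within the standard library; general PCA uses an eigendecomposition or SVD of the centered data.

```python
import math


def first_principal_component_2d(points):
    """Return the unit direction of maximum variance for 2-D points and
    the 1-D projection of the centered points onto that direction."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    var_x = sum(x * x for x, _ in centered) / (n - 1)
    var_y = sum(y * y for _, y in centered) / (n - 1)
    cov_xy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the leading eigenvector of [[var_x, cov_xy], [cov_xy, var_y]]
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    projected = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, projected


# Perfectly correlated toy data lying on the line y = x
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, projected = first_principal_component_2d(points)
print(direction)   # close to (0.707, 0.707), i.e. the diagonal
print(projected)   # the same four points described by a single coordinate
```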

### 3. **Association Rule Learning**
Association rule learning identifies interesting relationships between variables in large datasets.
* **Apriori Algorithm:** Identifies frequent itemsets and derives association rules from them.
* **Eclat Algorithm:** Uses a depth-first search approach to find frequent itemsets.
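
Both algorithms rely on the notions of support (how often an itemset appears) and confidence (how often a rule holds when its antecedent appears). A minimal sketch of these two measures on made-up shopping baskets; it does not implement the Apriori or Eclat search itself.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / antecedent_support


# Hypothetical market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))       # 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # about 0.667
```

Apriori and Eclat differ in how they enumerate candidate itemsets efficiently, but both keep only itemsets whose support exceeds a chosen threshold.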

### 4. **Anomaly Detection**
Anomaly detection methods identify rare items, events, or observations that deviate significantly from the majority of the data.
* **Isolation Forest:** Randomly partitions data and isolates anomalies which require fewer partitions.
* **One-Class SVM:** Learns a boundary around the normal data in a kernel feature space and flags points that fall outside it as anomalies.
* **Elliptic Envelope:** Fits a multivariate Gaussian distribution to the data and flags outliers based on Mahalanobis distance.

### 5. **Neural Network-Based Approaches**
Neural networks can also be used for unsupervised learning tasks, often incorporating some of the above techniques.
* **Self-Organizing Maps (SOM):** Uses neural networks to produce a low-dimensional representation of the input space, often used for visualization.
* **Generative Adversarial Networks (GANs):** Consist of a generator and a discriminator network that learn to generate new data samples similar to the training data.

### 6. **Matrix Factorization**
Matrix factorization techniques decompose a matrix into a product of matrices, useful in collaborative filtering and recommendation systems.
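* **Singular Value Decomposition (SVD):** Decomposes a matrix into three matrices (two orthogonal matrices and a diagonal matrix of singular values); keeping only the largest singular values yields a low-rank approximation that reduces dimensionality.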
* **Non-negative Matrix Factorization (NMF):** Decomposes a matrix into non-negative factors, suitable for interpretability in applications like topic modeling.

### 7. **Density Estimation**
Density estimation methods model the probability distribution of the data.
* **Kernel Density Estimation (KDE):** Estimates the probability density function of a random variable.
* **Gaussian Mixture Models (GMM):** Also used for clustering, GMM can model the overall distribution of data as a mixture of Gaussian distributions.
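
Kernel density estimation places a small "bump" (kernel) on every sample and averages them. A minimal one-dimensional sketch with a Gaussian kernel; the bandwidth value and the sample data are arbitrary choices for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x as the average
    of Gaussian kernels centered on each sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


samples = [1.0, 1.2, 0.8, 5.0, 5.1]  # toy data with two groups
pdf = gaussian_kde(samples, bandwidth=0.5)
print(pdf(1.0), pdf(3.0), pdf(5.0))  # high near the groups, low in between
```

The bandwidth controls smoothness: a small value produces a spiky estimate, a large one blurs the two groups together.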

### 8. **Topological Data Analysis (TDA)**
TDA uses techniques from topology to study the shape of data.
* **Persistent Homology:** Measures and analyzes features such as connected components, loops, and voids across different scales in the data.

Each type of unsupervised learning serves a unique purpose and is chosen based on the specific characteristics and goals of the data analysis task at hand.


# **Prompt Engineering**


---

Prompt engineering is a technique in natural language processing (NLP) that involves designing the prompts or instructions given to a language model to influence its behavior and output. It is particularly useful with large pre-trained language models, such as transformer-based models like GPT-3, because it steers their responses toward a desired task without retraining the model or changing its weights.

**Importance of Prompt Engineering**

Language models like GPT-3 are incredibly powerful in generating human-like text based on the context provided in their input (prompt). However, without careful engineering of these prompts, their outputs can vary widely and may not consistently meet specific task requirements. Prompt engineering addresses this challenge by guiding the model's responses through structured and well-crafted inputs.

### **Types of Prompt Engineering**
1. **Task Specification**
  * **Description:** Directly specify the task or desired outcome in the prompt.
  * **Example:** For sentiment analysis, the prompt might explicitly state: "Classify the sentiment of the following text as positive or negative."

2. **Prompt Format**
  * **Description:** Structure the prompt in a specific format to guide the model's response.
  * **Example:** For text completion, the prompt might start with "Complete the sentence: " and the model is expected to fill in the blank.
3. **Contextual Cues**
  * **Description:** Include relevant context or hints within the prompt to guide the model's understanding.
  * **Example:** Providing background information or context about the topic before asking the model to generate a summary or answer questions.
4. **Control Tokens**
  * **Description:** Introduce special tokens or markers within the prompt to influence specific aspects of the model's behavior.
  * **Example:** Using control tokens to adjust the style or formality of the generated text, such as "Formal" or "Casual" markers.
5. **Fine-Tuning Parameters**
  * **Description:** Adjust decoding settings at inference time, or fine-tune the model itself, to prioritize certain characteristics of the output. Strictly speaking, these complement prompt design rather than being part of the prompt.
  * **Example:** Adjusting the temperature parameter during generation to make responses more deterministic or more varied, or fine-tuning the model on additional training data (which, unlike prompting, changes the model's weights).
6. **Human-in-the-Loop (HITL)**
  * **Description:** Incorporate human feedback or corrections iteratively to refine prompts and improve model performance.
  * **Example:** Iteratively adjusting prompts based on human evaluation of generated outputs to achieve higher quality results.
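
Several of the techniques above (task specification, prompt format, contextual cues, and control markers) can be combined in a simple prompt template. The sketch below is purely illustrative: `build_prompt`, its parameters, and the `[STYLE: ...]` marker are hypothetical names, not part of any model's API, and whether a given model honors such markers has to be checked empirically.

```python
def build_prompt(task, text, context=None, style=None, output_format=None):
    """Assemble a prompt string from optional building blocks."""
    parts = []
    if style:
        parts.append(f"[STYLE: {style}]")                 # control marker (hypothetical convention)
    if context:
        parts.append(f"Context: {context}")               # contextual cue
    parts.append(f"Task: {task}")                         # task specification
    if output_format:
        parts.append(f"Answer format: {output_format}")   # prompt format
    parts.append(f"Text: {text}")
    return "\n".join(parts)


prompt = build_prompt(
    task="Classify the sentiment of the following text as positive or negative.",
    text="The battery lasts all day.",
    style="Formal",
    output_format="one word",
)
print(prompt)
```

Keeping the template in a function makes the human-in-the-loop step easier: each revision of the prompt is a small, reviewable code change.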

**How Prompt Engineering Works**
* **Precision and Control:** By carefully crafting prompts, NLP practitioners can steer language models towards producing outputs that are more accurate and relevant to specific tasks or contexts.
* **Adaptability:** Different tasks and domains require different prompt engineering strategies. For instance, a prompt designed for sentiment analysis will differ from one used for text generation or translation.
* **Iterative Improvement:** Through experimentation and iteration, prompt engineering allows for refining prompts based on observed model behavior and performance. This iterative process helps in optimizing model outputs over time.

**Practical Applications**

Prompt engineering finds applications in various NLP tasks and domains:
* **Text Generation:** Crafting prompts that guide the model to generate coherent and contextually relevant text.
* **Machine Translation:** Designing prompts that facilitate accurate translation between languages.
* **Question Answering:** Structuring prompts to elicit precise answers to user queries.
* **Summarization:** Using prompts to generate concise summaries from longer texts.

**Challenges**
* **Prompt Design Complexity:** Designing effective prompts requires understanding both the capabilities of the language model and the nuances of the task at hand.
* **Interpreting Model Responses:** Ensuring that the generated outputs are not only accurate but also align with user expectations and requirements can be challenging.

**Conclusion**

Prompt engineering plays a pivotal role in maximizing the utility and accuracy of large-scale language models in real-world applications. By strategically designing prompts, practitioners can harness the full potential of these models to perform complex NLP tasks effectively and efficiently.

# **General Artificial Intelligence (Gen AI)**

---



General artificial intelligence (Gen AI), more commonly called artificial general intelligence (AGI), refers to the concept of developing AI systems that exhibit human-like intelligence across a wide range of tasks and domains, rather than being narrowly focused on specific tasks or applications. Note that the abbreviation "Gen AI" is also widely used for generative AI, a different topic covered in the next section; in this section it means general AI. Gen AI aims to mimic the broad cognitive abilities of humans, such as perception, reasoning, learning, problem-solving, and natural language understanding. Here's a detailed explanation of Gen AI and its types:

## **Understanding Gen AI**
Gen AI seeks to create AI systems capable of:
* **Flexibility:** Performing well across diverse tasks and domains.
* **Adaptability:** Learning from new experiences and adjusting to novel situations.
* **Generalization:** Applying knowledge and skills learned in one context to solve problems in different contexts.

Unlike narrow AI, which excels at specific tasks like playing chess or recognizing images, Gen AI aims to achieve a level of cognitive versatility and autonomy comparable to human intelligence. While achieving true Gen AI remains a long-term goal, researchers and developers are making strides in advancing AI capabilities towards this vision.

## **Types of Gen AI**
1. **Cognitive Computing**
  * **Description:** Cognitive computing systems aim to simulate human thought processes, such as perception, reasoning, and decision-making.
  * **Examples:** IBM Watson, which processes natural language and generates insights from data, is an example of cognitive computing applied in healthcare, finance, and other industries.
2. **Self-Learning Systems**
  * **Description:** These systems improve their performance over time by continuously learning from new data and experiences.
  * **Examples:** Deep learning models used in speech recognition, image processing, and autonomous vehicles are self-learning systems that refine their abilities with more exposure to data.
3. **Transfer Learning**
  * **Description:** Transfer learning involves leveraging knowledge gained from one task or domain to improve performance in another related task or domain.
  * **Examples:** Pre-trained language models like BERT and GPT, which are initially trained on large datasets for general language understanding, can be fine-tuned for specific NLP tasks such as sentiment analysis or question answering.
4. **Adaptive Intelligence**
  * **Description:** Adaptive intelligence refers to AI systems that can dynamically adjust their behavior and responses based on changes in their environment or user interactions.
  * **Examples:** Personalized recommendation systems in e-commerce and streaming platforms that adapt recommendations based on user preferences and behavior patterns.
5. **Autonomous Agents**
  * **Description:** Autonomous agents are AI systems capable of acting independently in complex environments, making decisions and taking actions to achieve specific goals.
  * **Examples:** Autonomous vehicles that navigate roads safely and make real-time decisions based on traffic conditions and obstacles.
6. **Creativity and Generative Models**
  * **Description:** AI systems that can generate creative outputs, such as art, music, or stories, often through generative models.
  * **Examples:** Generative Adversarial Networks (GANs) used for generating realistic images, or text-based models like GPT-3 capable of generating coherent and contextually relevant text.

### **Challenges and Considerations**
* **Ethical Implications:** As Gen AI evolves, ethical considerations regarding privacy, bias, accountability, and the impact on jobs and society become increasingly important.
* **Technical Complexity:** Achieving Gen AI requires advances in areas such as unsupervised learning, reasoning, common-sense understanding, and human-machine interaction.
* **Integration with Human Expertise:** Effective collaboration between AI systems and human experts is crucial for solving complex problems and leveraging AI's strengths in decision-making and problem-solving.

### **Future Directions**
The pursuit of Gen AI involves interdisciplinary research spanning computer science, cognitive science, neuroscience, and ethics. While significant progress has been made in narrow AI applications, the development of true Gen AI remains a grand challenge with the potential to revolutionize industries, healthcare, education, and many aspects of daily life.

In summary, Gen AI represents the aspiration to create AI systems that possess broad intelligence and versatility akin to human cognition, capable of adapting and excelling in diverse tasks and environments.


# **Generative AI**


---


Generative AI refers to a subset of artificial intelligence focused on creating new content, such as images, text, music, and even videos, that mimic or are inspired by human creativity. Unlike traditional AI models that are designed for specific tasks (like classification or prediction), generative AI models are capable of generating entirely new data based on patterns and examples in the training data. Here's a detailed explanation of generative AI and its types:


## **Understanding Generative AI**
Generative AI models are primarily based on machine learning techniques that enable them to learn from large datasets and generate outputs that resemble human-created content. These models often use neural networks, especially generative models like Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and autoregressive models.

## **Types of Generative AI**
1. **Generative Adversarial Networks (GANs)**
  * **Description:** GANs consist of two neural networks: a generator and a discriminator. The generator creates new data instances, such as images, from random noise or a seed, while the discriminator tries to distinguish between real data and data generated by the generator.
  * **Applications:** GANs are widely used for generating high-quality images, realistic textures, and even videos. They are also applied in domains such as image synthesis, style transfer, and data augmentation.
  * **Example:** Creating photorealistic images of non-existent human faces (as seen in projects like This Person Does Not Exist).

2. **Variational Autoencoders (VAEs)**
  * **Description:** VAEs are generative models that learn a latent representation of input data, which can then be used to generate new data points. They consist of an encoder network that maps input data to a latent space and a decoder network that reconstructs the input from the latent space.
  * **Applications:** VAEs are used for generating diverse and structured outputs, such as images, by sampling from the learned latent space. They are also employed in anomaly detection and data generation tasks.
  * **Example:** Generating new images based on learned representations of a dataset (e.g., generating variations of handwritten digits).

3. **Autoregressive Models**
  * **Description:** Autoregressive models generate data sequentially, where each element in the sequence is generated based on previously generated elements. They model the conditional probability of each element given the previous elements.
  * **Applications:** Autoregressive models are used in generating sequences such as text, audio, and time-series data. They are effective in capturing dependencies within sequences and generating coherent outputs.
  * **Example:** Generating natural language text, where each word is generated based on the preceding words (e.g., language models like GPT-3).

4. **Flow-based Models**
  * **Description:** Flow-based models transform a simple base distribution (such as a Gaussian) into a more complex data distribution through a series of invertible transformations. Because every step is invertible, the exact likelihood of a data point can be computed, and the latent variable has the same dimensionality as the data.
  * **Applications:** Flow-based models are used for generating images, video, and audio samples, and for density estimation tasks where exact likelihoods are needed.
  * **Example:** Generating high-resolution images or realistic animations using learned transformations of input data.
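
The autoregressive idea described in item 3 (generate each element conditioned on the previous ones) can be illustrated with a tiny bigram model. This is a deliberately simple stand-in, assuming a three-sentence toy corpus; models like GPT-3 condition on a long context with a neural network rather than on only the previous word.

```python
import random
from collections import defaultdict


def train_bigram(corpus):
    """Record, for every token, the tokens observed immediately after it."""
    table = defaultdict(list)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev].append(nxt)
    return table


def generate(table, rng, max_tokens=10):
    """Sample one token at a time, each conditioned on the previous token."""
    token, output = "<s>", []
    for _ in range(max_tokens):
        token = rng.choice(table[token])  # sample from P(next | previous)
        if token == "</s>":
            break
        output.append(token)
    return " ".join(output)


corpus = ["the cat sat on the mat", "the dog sat on the rug", "the cat chased the dog"]
table = train_bigram(corpus)
rng = random.Random(0)  # fixed seed so repeated runs give the same sample
sample = generate(table, rng)
print(sample)
```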

## **Applications of Generative AI**
* **Image Synthesis:** Creating realistic images, textures, and scenes for applications in art, design, and entertainment.
* **Text Generation:** Generating human-like text, including poetry, stories, and dialogue for chatbots and creative writing applications.
* **Music Composition:** Creating new compositions and generating melodies, harmonies, and rhythms based on learned patterns from existing music.
* **Video Generation:** Producing video sequences and animations, ranging from simple animations to complex visual effects.

## **Challenges and Considerations**
* **Quality vs. Diversity:** Balancing the quality and diversity of generated outputs remains a challenge, where models must generate both realistic and diverse content.
* **Ethical and Legal Concerns:** Issues such as copyright infringement, misuse of generated content, and ethical considerations in AI-generated content need careful attention.
* **Computational Resources:** Generative AI models often require substantial computational resources and training data, making scalability and efficiency important considerations.

## **Future Directions**
Generative AI continues to advance rapidly, driven by innovations in deep learning architectures, increased computational power, and larger and more diverse datasets. Future research focuses on improving the realism, diversity, and controllability of generated content, as well as addressing ethical and societal implications.

In summary, generative AI represents a powerful capability in artificial intelligence, enabling machines to create new and meaningful content across various domains. From generating art and music to enhancing creative workflows and personalizing user experiences, generative AI is poised to revolutionize how we interact with technology and create content in the digital age.


## __Architectures of Models Listed in the Resume__

### __InceptionResNetV2__

The InceptionResNetV2 model is a deep convolutional neural network that combines the strengths of the Inception architecture and Residual Networks (ResNets). Here’s a brief explanation of its architecture:

* __Inception Modules:__ Perform parallel convolutions with different filter sizes to capture features at multiple scales.

* __Residual Connections:__ Provide shortcut paths to help gradients flow and ease training of deep networks.

* __Reduction Modules:__ Downsample feature maps to reduce spatial dimensions and increase depth.

* __Stem Block:__ Initial series of convolutions to reduce input image dimensions and increase depth.

* __Inception-ResNet Blocks:__ Combines Inception modules with residual connections, divided into A, B, and C types.

* __Grid Size Reduction:__ Uses specialized reduction blocks to transition between network stages.

* __Final Layers:__ Includes global average pooling, a fully connected layer, and a softmax layer for classification.

### __RoBERTa__

* __Transformer Architecture:__ Based on the Transformer architecture, utilizing self-attention mechanisms for capturing relationships between words.

* __Pre-training Strategy:__ Trained on large-scale datasets with the masked language modeling (MLM) objective to learn contextual representations; unlike BERT, it does not use next sentence prediction (NSP).

* __Robust Pre-training:__ Extends BERT by leveraging larger batch sizes, longer training duration, and more data to achieve improved performance.

* **Byte-Level BPE Tokenization:** Uses a byte-level Byte-Pair Encoding (BPE) vocabulary, as in GPT-2, so any input text can be encoded into subword units without unknown tokens.

* **Training Enhancements:** Incorporates dynamic masking during pre-training and removes the NSP task to enhance model performance and efficiency.

* **Fine-tuning:** Adaptable to downstream tasks through fine-tuning, demonstrating strong performance across a range of natural language understanding tasks.

### __BERT (Bidirectional Encoder Representations from Transformers)__

* **Transformer Architecture:** Utilizes the Transformer architecture, focusing on self-attention mechanisms to capture word relationships.

* **Bidirectional Context:** Trained to understand context from both left and right directions simultaneously, improving understanding of word meanings.

* **Masked Language Model (MLM):** Pre-training involves masking some words in a sentence and predicting them to learn contextual embeddings.

* **Next Sentence Prediction (NSP):** Also pre-trained to predict if one sentence follows another, aiding in understanding sentence relationships.

* **Pre-training:** Conducted on large-scale text corpora to learn deep contextual representations.

* **Fine-tuning:** Adaptable to various downstream tasks (e.g., Q&A, classification) through fine-tuning on task-specific data.
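
The masked language modeling objective can be sketched as a data-preparation step: hide some tokens and keep the originals as prediction targets. This simplified version always substitutes `[MASK]`; the actual BERT recipe also sometimes keeps the original token or swaps in a random one, which is omitted here. The function name and whitespace tokenization are assumptions for illustration.

```python
import random


def mask_tokens(tokens, rng, mask_prob=0.15, mask_token="[MASK]"):
    """Replace a random subset of tokens with a mask symbol.

    Returns the masked sequence and a parallel list of labels, where a
    label is the original token at masked positions and None elsewhere.
    """
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)   # position is ignored by the loss
    return masked, labels


tokens = "the quick brown fox jumps over the lazy dog".split()
rng = random.Random(42)
masked, labels = mask_tokens(tokens, rng)
print(masked)
print(labels)
```

Calling `mask_tokens` again for each training pass produces a different mask every time, which is the idea behind the dynamic masking that RoBERTa uses.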

### __SMOTE (Synthetic Minority Over-sampling Technique)__

* **Oversampling Technique:** Generates synthetic samples for minority class by interpolating between existing minority class instances.

* **Addressing Class Imbalance:** Specifically designed to mitigate class imbalance in datasets where minority class instances are significantly fewer than majority class.

* **Algorithm Mechanism:** Creates synthetic examples along line segments joining k-nearest neighbors of each minority class instance in feature space.

* **Implementation:** Implemented using k-nearest neighbors algorithm to determine where synthetic samples are created in feature space.

* **Advantages:** Helps improve model performance on imbalanced datasets by providing more representative training data for minority class.

* **Considerations:** Requires careful tuning of parameters such as the number of neighbors (k) and the amount of oversampling to avoid overfitting.
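
The interpolation step at the heart of SMOTE can be sketched in a few lines. This is a minimal illustration, assuming numeric feature tuples and Euclidean distance; the helper names are hypothetical, and in practice a maintained implementation such as the one in the imbalanced-learn package would normally be used.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sample(minority, k, rng):
    """Create one synthetic point on the line segment between a random
    minority sample and one of its k nearest minority neighbors."""
    i = rng.randrange(len(minority))
    base = minority[i]
    others = [p for j, p in enumerate(minority) if j != i]
    neighbors = sorted(others, key=lambda p: squared_distance(base, p))[:k]
    neighbor = rng.choice(neighbors)
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbor))


# Toy minority-class points (hypothetical data)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
rng = random.Random(0)
synthetic = [smote_sample(minority, k=2, rng=rng) for _ in range(5)]
print(synthetic)
```

Because every synthetic point lies between two real minority points, the new samples stay inside the region the minority class already occupies.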

### **Adam optimizer**
The Adam (Adaptive Moment Estimation) optimizer is a popular algorithm used in training deep learning models. It combines the advantages of two other extensions of stochastic gradient descent (SGD), namely AdaGrad and RMSProp.

* **Adaptive Learning Rates:** Adam computes individual adaptive learning rates for different parameters from estimates of first and second moments of the gradients.

**Advantages:**

* **Efficient:** Computationally efficient with little memory requirement.
* **Adaptive Learning Rates:** Provides adaptive learning rates for each parameter.
* **Robust:** Works well in practice and is often used as a default optimizer.

Adam is widely used due to its efficiency and effectiveness in handling sparse gradients and noisy problems often encountered in deep learning tasks.
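
The update rule can be sketched for a single scalar parameter. The sketch follows the commonly published form of Adam (exponential moving averages of the gradient and squared gradient, with bias correction); the default hyperparameters and the toy objective are illustrative choices.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function given its gradient, using Adam updates."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first moment: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # second moment: running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # should end up close to 3
```

Dividing by the square root of the second-moment estimate is what gives each parameter its own effective learning rate: parameters with consistently large gradients take proportionally smaller steps.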