# What is Gen AI?
-   Generative AI involves models that can generate new data resembling a given dataset. 
-   These models are trained to understand patterns in data and create outputs such as text, images, music, or other forms of content.

#### AI Journey
![image.png](attachment:image.png)

### Different types of AI:
The different types of AI include:

-   Diagnostic/descriptive AI: Focuses on assessing the correctness of behavior by analyzing historical data to understand what happened and why.

-   Predictive AI: Concerned with forecasting future outcomes based on historical and current data.

-   Prescriptive AI: Focuses on determining the optimal course of action by providing recommendations based on data analysis.

- Generative/cognitive AI: Involved in producing various types of content, such as code, articles, images, and more.

- Reactive AI: Designed to respond to specific inputs with predetermined responses.

- Limited memory AI: Uses past experiences to inform current decisions.

- Theory of Mind AI: Advanced type of AI that aims to understand human emotions, beliefs, and intentions.

- Self-aware AI: A hypothetical, most advanced form of AI that would have its own consciousness and self-awareness.

- Narrow AI (Weak AI): Designed to perform a specific task or a limited range of tasks.

- General AI (Strong AI): Can understand, learn, and apply knowledge across a wide range of tasks, much as a human can.

# What are language models?
-   Generative AI applications are powered by language models, which are a specialized type of machine learning model that you can use to perform natural language processing (NLP) tasks, including:
    -   Determining sentiment or otherwise classifying natural language text.
    -   Summarizing text.
    -   Comparing multiple text sources for semantic similarity.
    -   Generating new natural language.
-   A language model (LM) is a type of machine learning model trained to understand, generate, and manipulate natural language. 
-   It predicts the next word or sequence of words in a given context, making it essential for tasks involving text.
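
To make next-word prediction concrete, here is a minimal sketch of a bigram model, one of the simplest possible language models. It only counts which word follows which in a tiny made-up corpus; the corpus and the helper names `predict_next` and `next_word_probabilities` are assumptions for illustration, and real language models are far larger neural networks.

```python
from collections import Counter, defaultdict

# Toy corpus, made up for illustration; real models train on far more text.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]

# Count how often each word is followed by each other word.
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        bigram_counts[current_word][next_word] += 1


def predict_next(word):
    """Return the most frequent follower of `word`, or None if it has none."""
    followers = bigram_counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def next_word_probabilities(word):
    """Map each observed follower of `word` to its relative frequency."""
    followers = bigram_counts.get(word, Counter())
    total = sum(followers.values())
    return {w: count / total for w, count in followers.items()}


print(predict_next("sat"))             # on
print(next_word_probabilities("cat"))  # {'sat': 0.5, 'ate': 0.5}
```

The same idea, predicting a probability distribution over the next token given the context, carries over to neural language models, which replace the count table with learned parameters.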

# What are transformer models?
-   The Transformer model is a deep learning architecture introduced by Vaswani et al. in 2017 in the paper "Attention Is All You Need."
-   It addressed many limitations of earlier architectures such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs), including their difficulty capturing long-range dependencies.

### Key concepts of the Transformer model:
-   Attention Mechanism:
    -   The self-attention mechanism allows the model to focus on different parts of the input sequence when making predictions.
    -   For example, in a sentence like "The cat sat on the mat," the word "cat" might focus more on "sat" and less on "mat."
    -   This mechanism makes the Transformer effective at understanding long-range dependencies in sequences.
- Encoder-Decoder Structure: 
    - Encoder: Takes an input sequence of tokens and processes it to create a contextualized representation of the entire sequence.
    - Decoder: Generates an output sequence by interpreting the encoded input sequence produced by the encoder.
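
The self-attention idea above can be sketched in a few lines of plain Python. This is a simplified illustration, not a full Transformer layer: it uses the token vectors directly as queries, keys, and values (a real model applies learned projections first), and the 4-dimensional embeddings are hand-picked, hypothetical values.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with inputs as queries, keys, and values.

    Returns (outputs, weights); weights[i][j] is how much token i attends to token j.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # The output is a weighted average of all token vectors.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, vectors)) for d in range(dim)]
        )
    return outputs, weights


tokens = ["the", "cat", "sat"]
# Hypothetical embeddings, chosen so that "cat" and "sat" point in similar directions.
embeddings = [
    [0.1, 0.0, 0.2, 0.1],
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
]

outputs, weights = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

With these toy vectors, the row for "cat" gives more weight to "sat" than to "the", because their vectors are more similar. Each row of weights sums to 1.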

![image.png](attachment:image.png)

Steps performed in a Transformer model:
1.   The model is trained with a large volume of natural language text, often sourced from the internet or other public sources of text.
2.  The sequences of text are broken down into tokens (for example, individual words). The encoder block then processes these token sequences using a technique called attention to determine relationships between tokens, such as which tokens influence the presence of other tokens in a sequence and which tokens are commonly used in the same context.
3.  The output from the encoder is a collection of vectors (multi-valued numeric arrays) in which each element of the vector represents a semantic attribute of the tokens. These vectors are referred to as embeddings.
4.  The decoder block works on a new sequence of text tokens and uses the embeddings generated by the encoder to generate an appropriate natural language output.
5.  For example, given an input sequence like "When my dog was", the model can use the attention technique to analyze the input tokens and the semantic attributes encoded in the embeddings to predict an appropriate completion of the sentence, such as "a puppy".

### Tokenization
-   Tokenization is the process of breaking down text into smaller units called tokens.
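
As a rough illustration, the sketch below splits a sentence into word and punctuation tokens and maps each distinct token to an integer ID. The helper names `tokenize` and `build_vocab` are hypothetical, and the word-level splitting is a simplification: production language models typically use subword tokenizers such as byte-pair encoding.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(tokens):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


text = "The cat sat on the mat."
tokens = tokenize(text)
vocab = build_vocab(tokens)
token_ids = [vocab[token] for token in tokens]

print(tokens)     # ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']
print(token_ids)  # [0, 1, 2, 3, 0, 4, 5]
```

Note that the repeated word "the" maps to the same ID both times; the model works with these IDs rather than with raw text.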


### Embeddings
-   Embeddings are a powerful concept in machine learning that involves representing complex data (like text, images, or audio) as numerical vectors.
-   These vectors capture the essence of the data, allowing machines to understand and process information in a way that's more meaningful than raw data.

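```python
# Word embedding using Word2Vec (requires the gensim package)
from gensim.models import Word2Vec

# Sample sentences, already tokenized into lists of words
sentences = [["I", "love", "NLP"], ["Transformers", "are", "amazing"]]

# Train a Word2Vec model that produces 50-dimensional vectors.
# min_count=1 keeps every word, which this tiny corpus needs.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, workers=2)

# Get the embedding for a word (a NumPy array of 50 floats)
embedding = model.wv["NLP"]
print(embedding)
```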
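
One common way to use embeddings is to compare them with cosine similarity: vectors that point in similar directions score close to 1. The sketch below uses only the standard library, and the 3-dimensional vectors in `toy_embeddings` are hypothetical values chosen by hand, not output from a trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: "cat" and "dog" are placed close together, "car" far away.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_embeddings["cat"], toy_embeddings["dog"])
cat_car = cosine_similarity(toy_embeddings["cat"], toy_embeddings["car"])

print(f"cat vs dog: {cat_dog:.2f}")
print(f"cat vs car: {cat_car:.2f}")
```

With these toy values, "cat" is much more similar to "dog" than to "car". Embeddings learned from a large corpus tend to show the same kind of pattern for related words, although a model trained on only a couple of sentences will not.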


### Attention
-   Attention is a mechanism that allows a model to focus on specific parts of the input when making predictions. 
-   It helps the model decide which words (or tokens) in a sequence are most relevant to the current task. 
-   This is especially important in tasks like machine translation, text summarization, and generative text, where understanding relationships between tokens is key.

Process:
1. A sequence of token embeddings is fed into the attention layer. Each token is represented as a vector of numeric values.
2. The goal in a decoder is to predict the next token in the sequence, which will also be a vector that aligns to an embedding in the model’s vocabulary.
3. The attention layer evaluates the sequence so far and assigns weights to each token to represent their relative influence on the next token.
4. The weights can be used to compute a new vector for the next token with an attention score. Multi-head attention uses different elements in the embeddings to calculate multiple alternative tokens.
5. A fully connected neural network uses the scores in the calculated vectors to predict the most probable token from the entire vocabulary.
6. The predicted output is appended to the sequence so far, which is used as the input for the next iteration.
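
The loop described in these steps can be sketched as follows. This is a heavily simplified, untrained toy: it uses a single attention head with no learned query/key/value projections, and `embedding_table` and `output_weights` are random stand-ins for parameters a real model would learn. Because nothing is trained, the predicted tokens are arbitrary; the point is only the flow of data.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained toy model behaves reproducibly

vocab = ["when", "my", "dog", "was", "a", "puppy"]
dim = 4

# Random stand-ins for learned parameters (assumption for illustration).
embedding_table = {tok: [random.uniform(-1, 1) for _ in range(dim)] for tok in vocab}
output_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in vocab]


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_next_token(sequence):
    """Return (most probable next token, probabilities over the vocabulary)."""
    # Step 1: look up an embedding vector for each token so far.
    vectors = [embedding_table[tok] for tok in sequence]
    # Step 3: weight every token by its relevance to the latest token.
    query = vectors[-1]
    weights = softmax([dot(query, key) / math.sqrt(dim) for key in vectors])
    # Step 4: combine the token vectors into one context vector.
    context = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    # Step 5: a fully connected layer scores every vocabulary entry.
    probabilities = softmax([dot(row, context) for row in output_weights])
    best = max(range(len(vocab)), key=lambda i: probabilities[i])
    return vocab[best], probabilities


sequence = ["when", "my", "dog", "was"]
for _ in range(2):
    token, _probabilities = predict_next_token(sequence)
    sequence.append(token)  # Step 6: the output becomes part of the next input

print(sequence)
```

In a trained model, the embeddings and output weights would be learned so that, given "when my dog was", a continuation like "a puppy" receives high probability.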