<a href="https://colab.research.google.com/github/anastaszi/GenAI/blob/main/introduction_to_llms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Large Language Model Basics
This notebook is a primer on large language models (LLMs). It briefly covers:
- How LLMs are different from other language models
- Why LLMs can be so powerful
- A brief synopsis of the transformer architecture
- An introduction to flow chaining with examples

If you're already familiar with the topics above, feel free to skip this notebook. If you're looking for the fundamentals of NLP as a whole, please refer to the ODSC NLP course.

## How do LLMs Differ from Other Language Models?


LLMs, or Large Language Models, differ from other language models primarily in their size and complexity. LLMs are characterized by their immense scale, with billions of parameters, which allows them to capture and learn from a vast amount of textual data. This large-scale training enables LLMs to generate more coherent and contextually relevant responses compared to smaller language models.

Additionally, LLMs often employ advanced architectures, such as Transformer models, that utilize self-attention mechanisms to better understand the relationships between words and phrases in a given text. This architecture allows LLMs to capture long-range dependencies and generate more accurate and coherent outputs. See below for a visual representation of this architecture.

Another crucial aspect of LLMs is their training process. They are trained on large, diverse datasets, which helps them acquire a wide range of knowledge and linguistic patterns. This training process enables LLMs to perform well on various natural language processing tasks, including text completion, summarization, translation, and question answering.

Overall, the significant differences between LLMs and other language models lie in their scale, complexity, and advanced architectures, which contribute to their improved performance in understanding and generating human-like text.

## Why are LLMs so Powerful?

The increased power of LLMs compared to other language models primarily stems from their scale and the amount of training data they are exposed to. By training on vast amounts of diverse text data, LLMs acquire a broad understanding of language patterns, grammar, and semantics. This enables them to generate more contextually accurate and coherent responses.

The large number of parameters in LLMs allows for a higher degree of complexity in modeling language. These models can capture intricate relationships between words and phrases, including long-range dependencies, which leads to more nuanced and sophisticated outputs. LLMs are capable of understanding context, making inferences, and generating text that closely resembles human-generated content.

Furthermore, the sheer size of LLMs allows them to encapsulate a wealth of knowledge. They learn from a diverse range of sources, including books, articles, and websites, and can recall and apply that knowledge when generating responses. This broad knowledge base enhances their ability to provide informative and accurate answers across various domains.

The power of LLMs also lies in their versatility. They can be fine-tuned on specific tasks or domains, which further enhances their performance and adaptability. This flexibility makes LLMs applicable to a wide range of applications, from chatbots and virtual assistants to content generation and language translation.


## The Transformer Architecture

The **Transformer architecture** and the **attention mechanism** are key components of modern language models, including LLMs.

1. Transformer Architecture: The Transformer architecture is a neural network model that revolutionized natural language processing tasks. It was introduced in the paper [*Attention is All You Need*](https://arxiv.org/abs/1706.03762) in 2017. Unlike previous models that relied on recurrent or convolutional layers, the Transformer employs a purely attention-based approach. The core idea of the Transformer is to process entire sequences of words simultaneously rather than sequentially. It consists of two main components: an encoder and a decoder. The encoder takes an input sequence and processes it in parallel, encoding the information about the relationships between the words. The decoder then generates an output sequence based on the encoded input.

2. Attention Mechanism: The attention mechanism is a fundamental component of the Transformer architecture. It allows the model to focus on specific parts of the input sequence while processing each word. The attention mechanism calculates attention weights for each word in the input sequence, indicating its relevance to the current word being processed. The attention mechanism works by computing the similarity between a target word and all the words in the input sequence. This similarity, often measured using dot product or other scoring functions, determines the importance or weight assigned to each word. The attention weights are then used to compute a weighted sum of the input words, which provides a context vector for the current word in the sequence.
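The attention computation described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the implementation used by any particular model: it assumes a single attention head, uses made-up two-dimensional embeddings, and skips the learned query, key, and value projections that a real Transformer applies first. The helper names `softmax` and `attend` are chosen here for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # Similarity between the query and every key, via a scaled dot product.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Context vector: the attention-weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Toy embeddings for a three-word sequence (made-up numbers).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Attend from the first word to every word in the sequence.
weights, context = attend(embeddings[0], embeddings, embeddings)
print("attention weights:", weights)
print("context vector:", context)
```

The weights show how much each word contributes to the context vector for the first word; words whose embeddings are more similar to the query receive larger weights.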

By incorporating the attention mechanism, the Transformer can capture long-range dependencies and give more importance to relevant words in the input sequence. This attention-based approach has proven highly effective in language modeling tasks, allowing models to generate more coherent and contextually relevant outputs. Something important to understand is that the output of models built on this architecture usually varies between runs. For fixed weights and inputs, the Transformer's forward pass is deterministic; the variability comes from decoding, where the next token is typically sampled from the model's predicted probability distribution rather than always being the single most likely token. This is why you could ask an LLM the same thing three times and get three different answers.
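One common source of this run-to-run variability is how the next token is chosen from the model's scores. The sketch below is a simplified illustration with a made-up three-word vocabulary and made-up scores (logits); `sample_token` is a hypothetical helper, not a library function. With a temperature of 0 it always picks the top-scoring token, while a positive temperature samples from the softmax distribution, so repeated calls can differ.

```python
import math
import random


def sample_token(logits, temperature, rng):
    """Pick one token index from raw scores (logits)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature sampling: rescale the scores, then sample in proportion
    # to the (unnormalized) softmax weights.
    scaled = [score / temperature for score in logits]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Made-up vocabulary and scores for the next word after "The sky is".
vocab = ["blue", "clear", "falling"]
logits = [2.0, 1.5, 0.5]

rng = random.Random(0)  # seeded only so the demo is repeatable
greedy = [vocab[sample_token(logits, 0, rng)] for _ in range(5)]
sampled = [vocab[sample_token(logits, 1.0, rng)] for _ in range(5)]
print("greedy: ", greedy)
print("sampled:", sampled)
```

Greedy decoding returns the same token every time, whereas sampling can return any token with non-zero probability.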

![transformer_architecture](./images/transformer_architecture.png)

## Applications of LLMs
While the power of LLMs lies in their ability to adapt to various situations, there are a handful of common applications that we'll reference several times during the course. For a given task, there's always the option to use an LLM out of the box (no prompting, fine-tuning, etc.).

#### **Zero-shot learning**
Zero-shot learning refers to the ability of these language models to generate responses or perform tasks for which they have not been explicitly trained or provided with any examples. Zero-shot learning leverages the general knowledge and understanding of language encoded in the LLMs to generate contextually appropriate responses in unseen scenarios.

Let's look at an example. We'll be leveraging the transformers library from Hugging Face, namely its `pipeline` function, a high-level API that is useful for getting started with language models quickly.

In [None]:
from transformers import pipeline

# Use the BART-large model fine-tuned on MNLI for zero-shot classification
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli",
                      framework="pt")



In [None]:
sequence_to_classify = "one day I will see the world"
candidate_labels = ['travel', 'cooking', 'dancing']
classifier(sequence_to_classify, candidate_labels)

{'sequence': 'one day I will see the world',
 'labels': ['travel', 'dancing', 'cooking'],
 'scores': [0.9938652515411377, 0.0032737920992076397, 0.0028610059525817633]}
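The result above is a plain dictionary, so it can be handled with ordinary Python. The sketch below reuses the output shown above and adds a hypothetical helper, `top_label`, with an arbitrary confidence threshold chosen for illustration. The `hypotheses` list reflects what is assumed to be the pipeline's default behavior: each candidate label is turned into a sentence of the form "This example is {label}." and the NLI model scores whether the input entails it.

```python
def top_label(result, threshold=0.5):
    """Return the best-scoring label, or None if its score is below the threshold."""
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return label if score >= threshold else None


# The classifier output shown above, copied as a plain dictionary.
result = {
    "sequence": "one day I will see the world",
    "labels": ["travel", "dancing", "cooking"],
    "scores": [0.9938652515411377, 0.0032737920992076397, 0.0028610059525817633],
}

# Assumed default template: one hypothesis sentence per candidate label.
hypotheses = [f"This example is {label}." for label in result["labels"]]

print(hypotheses)
print(top_label(result))
```

A threshold like this is one simple way to fall back to "no label" when none of the candidates fits the input well.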

#### **Few-shot learning**
Similar to zero-shot learning, few-shot learning refers to the capability of these language models to generalize and learn from a limited number of examples or prompts for new tasks or classes. Unlike traditional machine learning algorithms that require a large amount of labeled data for training, few-shot learning enables LLMs to quickly adapt and generate responses based on a small set of examples.

In few-shot learning, the LLM is provided with a small number of labeled examples for a specific task or class, typically placed directly in the prompt. These examples act as demonstrations rather than as a training set in the usual sense: the model's parameters are not updated. Instead, the model leverages its pre-existing knowledge and language understanding to infer the pattern from these limited examples and then generates a response or performs the desired task.

With LLMs, this behavior is often called in-context learning. It is related to meta-learning, where a model is trained to adapt quickly to new tasks or classes from a few examples: in both cases the aim is to capture the underlying patterns and similarities between the examples and use that knowledge to make predictions or generate relevant text.

In [None]:
prompt = """ For each tweet, classify the sentiment:

Tweet: "I hate spinach"
Sentiment: negative
###
Tweet: "I love apples"
Sentiment: positive
###
Tweet: "The sky is blue"
Sentiment: neutral
###
Tweet: "Eating watermelon in the summer brings me immense joy"
Sentiment:"""
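The prompt above follows a regular pattern (instruction, labeled examples separated by `###`, then an unlabeled query), so it can also be assembled programmatically. The sketch below is one possible way to do that; `build_few_shot_prompt` is a hypothetical helper written for this illustration, not part of the transformers library.

```python
def build_few_shot_prompt(instruction, examples, query, separator="###"):
    """Assemble an instruction, labeled examples, and an unlabeled query into one prompt."""
    blocks = [f'Tweet: "{text}"\nSentiment: {label}' for text, label in examples]
    # The final block is left without a label so the model completes it.
    blocks.append(f'Tweet: "{query}"\nSentiment:')
    return instruction + "\n\n" + f"\n{separator}\n".join(blocks)


examples = [
    ("I hate spinach", "negative"),
    ("I love apples", "positive"),
    ("The sky is blue", "neutral"),
]

built_prompt = build_few_shot_prompt(
    "For each tweet, classify the sentiment:",
    examples,
    "Eating watermelon in the summer brings me immense joy",
)
print(built_prompt)
```

Building the prompt from a list makes it easy to swap examples in and out and see how the model's answers change.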

In [None]:
few_shot_learning_pipeline = pipeline(task="text-generation", model="EleutherAI/gpt-neo-1.3B",
                                      max_new_tokens=10, framework="pt")


In [None]:
# "###" separates the examples in our prompt, so we pass its token id as the
# end-of-sequence token: generation stops if the model emits "###" again
eos_token_id = few_shot_learning_pipeline.tokenizer.encode("###")

In [None]:
print(few_shot_learning_pipeline(prompt, eos_token_id=eos_token_id)[0]["generated_text"])

Setting `pad_token_id` to `eos_token_id`:21017 for open-end generation.


 For each tweet, classify the sentiment:

Tweet: "I hate spinach"
Sentiment: negative 
###
Tweet: "I love apples"
Sentiment: positive 
###
Tweet: "The sky is blue"
Sentiment: neutral
###
Tweet: "Eating watermelon in the summer brings me immense joy"
Sentiment: positive 

As the world struggles to cope
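In the output above, the model answers "positive" but then keeps generating unrelated text, because it never emitted the `###` separator before reaching the token limit. A small post-processing step can isolate the answer. The sketch below is one hedged way to do it; `extract_answer` is a hypothetical helper, and the strings are shortened copies of the prompt and output shown above.

```python
def extract_answer(generated_text, prompt, separator="###"):
    """Return the first non-empty line the model produced after the prompt."""
    # The text-generation output repeats the prompt, so strip it first.
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    # Ignore anything after the separator, in case the model started a new example.
    completion = completion.split(separator)[0]
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    return lines[0] if lines else ""


prompt_tail = 'Tweet: "Eating watermelon in the summer brings me immense joy"\nSentiment:'
generated = prompt_tail + " positive \n\nAs the world struggles to cope"

answer = extract_answer(generated, prompt_tail)
print(answer)
```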


#### **Fine-tuning**

Fine-tuning refers to the process of further training a pre-trained language model on a specific task or domain using a smaller, task-specific dataset. Fine-tuning allows LLMs to specialize and adapt their knowledge to better perform a specific task or improve performance in a particular domain.

During fine-tuning, the pre-trained LLM, which has already learned general language patterns and semantics from a large corpus of data, is exposed to a new dataset that is specific to the target task or domain. This dataset is typically smaller in size and labeled with task-specific annotations or examples.

The fine-tuning process involves updating the parameters of the LLM using the new task-specific data while preserving the knowledge acquired during pre-training. The model is trained to refine its understanding and generate more accurate outputs for the specific task or domain it is being fine-tuned for.

By fine-tuning, LLMs can adapt their language generation capabilities to specific applications such as text completion, sentiment analysis, or named entity recognition. This process allows the LLM to leverage its pre-existing knowledge while aligning its performance with the requirements of the task or domain at hand.

Fine-tuning is beneficial because it allows LLMs to quickly adapt to new tasks or domains without starting the training process from scratch. The pre-trained LLMs serve as a strong foundation, and fine-tuning facilitates the transfer of knowledge to specific tasks, leading to improved performance and efficiency.
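The idea of starting from pre-trained parameters instead of from scratch can be illustrated with a deliberately tiny model. The sketch below is a toy analogy, not an LLM: it "pre-trains" a one-variable linear model on a general task, then continues training on three examples of a related task. All data, step counts, and the learning rate are made up for illustration.

```python
def mse(w, b, data):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(w, b, data, steps, lr=0.1):
    """Plain gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


# "Pre-training": many examples of a general task, y = 2x + 1.
general_data = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]
pretrained = train(0.0, 0.0, general_data, steps=500)

# "Fine-tuning": three examples of a related task, y = 2x + 1.5, and only 10 steps.
task_data = [(-1.0, -0.5), (0.0, 1.5), (1.0, 3.5)]
fine_tuned = train(*pretrained, task_data, steps=10)
from_scratch = train(0.0, 0.0, task_data, steps=10)

fine_tuned_loss = mse(*fine_tuned, task_data)
from_scratch_loss = mse(*from_scratch, task_data)
print("fine-tuned loss:  ", fine_tuned_loss)
print("from-scratch loss:", from_scratch_loss)
```

With the same small budget of data and steps, the run that starts from the pre-trained parameters ends with a lower loss on the new task, which is the intuition behind fine-tuning.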

**Note:** We'll discuss fine-tuning in great detail in lesson 3.

## Flow Chaining
While these models can do some pretty incredible things with just the tasks they're trained on, the models can also be chained together to produce more complex pipelines. Flow chaining, in the context of LLMs, refers to the process of generating a sequence of outputs by iteratively feeding the model's previous output as input for the next step. It allows for the generation of longer and more coherent pieces of text, extending beyond a single prompt or query.

In flow chaining, the initial input prompt or query is given to the LLM to generate an output. Instead of stopping at the initial response, the generated text is then appended or concatenated to the input as a continuation, forming a new prompt. This concatenated prompt is then passed back to the model for further generation. This process is repeated iteratively to generate longer pieces of text, building a coherent flow of information or narrative.
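The loop described above can be sketched as follows. To keep the example runnable without downloading a model, `toy_generate` is a hypothetical stand-in for an LLM call that simply emits the next word of a fixed sentence; in practice it would be replaced by a call to a real text-generation model. `run_flow` is likewise a name chosen for this illustration.

```python
def run_flow(generate, prompt, steps):
    """Repeatedly append the model's output to the prompt and generate again."""
    text = prompt
    for _ in range(steps):
        continuation = generate(text)
        # The previous output becomes part of the next input.
        text = text + continuation
    return text


def toy_generate(text):
    """Hypothetical stand-in for an LLM: emits the next word of a fixed story."""
    story = ["Once", "upon", "a", "time", "there", "was", "a", "model"]
    position = len(text.split())
    return " " + story[position] if position < len(story) else ""


result = run_flow(toy_generate, "Once upon", steps=3)
print(result)
```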

Flow chaining can be beneficial in scenarios where a continuous and contextually consistent output is desired, such as story generation, dialogue systems, or document completion. By using the previous outputs as part of the input, the LLM can maintain consistency and coherence throughout the generated text.


![flow_chaining](./images/flow_chaining.png)

Consider the architecture above, and say that it depicts the flow for a Q&A bot. The goal of this application is to answer questions about various documents. However, many models have token limits, which means users can't just feed 5+ page documents into a Q&A model and expect a response. To work around this limitation, we can chain multiple models together: first use a summarization model to summarize the documents, then feed those summaries into the Q&A model to generate a result. This way we can provide a reasonable amount of context per document without exceeding the model's token limit.
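The shape of such a chain can be sketched with placeholder functions. In the example below, `summarize` and `answer` are hypothetical stand-ins for a summarization model and a Q&A model (they use simple string operations so the code runs without any downloads), and the documents are made up. Only the structure of `qa_chain` is the point: summarize each document, join the summaries into a short context, then ask the question against that context.

```python
def summarize(document, max_words=8):
    """Hypothetical stand-in for a summarization model: keep the first few words."""
    return " ".join(document.split()[:max_words])


def answer(question, context):
    """Hypothetical stand-in for a Q&A model: return the first context line
    that shares a longer word with the question."""
    keywords = {w.strip("?.,").lower() for w in question.split() if len(w.strip("?.,")) > 3}
    for line in context.splitlines():
        line_words = {w.strip("?.,").lower() for w in line.split()}
        if keywords & line_words:
            return line
    return "I don't know."


def qa_chain(question, documents):
    """Chain the two steps: summarize every document, then answer from the summaries."""
    summaries = [summarize(doc) for doc in documents]
    context = "\n".join(summaries)  # short enough to fit a token limit
    return answer(question, context)


# Made-up long documents: one informative sentence followed by filler text.
documents = [
    "Transformers process whole sequences in parallel using attention. " + "Further detail. " * 50,
    "Spinach is a leafy green vegetable rich in iron. " + "Further detail. " * 50,
]

reply = qa_chain("What is spinach?", documents)
print(reply)
```

Swapping the two placeholder functions for real model calls would keep the chain itself unchanged.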

Applications of flow chaining go far beyond this simple example. In lesson four we will dive deeper into chains, their capabilities, and frameworks like [LangChain](https://github.com/hwchase17/langchain) that are built to support these larger LLM chains.