<a href="https://colab.research.google.com/github/ajw109/SkdimoreCodes/blob/main/Chapter1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Key Terms and Definitions**

This is a list of the key terms and definitions that I have picked up through my study of the fundamentals of language models and prompt engineering. These definitions were predominantly derived from the book *Hands-On Large Language Models*.
<br>

* <u>Language AI:</u> A sub-field of artificial intelligence (AI) that focuses on developing technologies capable of understanding, processing, and generating human language.

* <u>Natural Language Processing:</u> Can be used interchangeably with Language AI and is abbreviated as "NLP".

* <u>Large Language Model:</u> A sophisticated AI model within the realm of Language AI, specifically engineered to analyze and, in some cases, generate human language using patterns and insights derived from large amounts of data.

* <u>Representation vs Generation Language Models:</u>
  * <u>Representation models:</u> Large language models (LLMs) that do not generate text but are commonly used for task-specific use cases, like classification. They focus on creating embeddings, and are referred to as encoder-only models.
  * <u>Generation models:</u> LLMs that generate text, like generative pre-trained transformer (GPT) models. They focus primarily on generating text and are not trained to generate embeddings. They are referred to as decoder-only models or completion models.

* <u>Embeddings:</u> Vector representations of data that attempt to capture its meaning.

* <u>Node:</u> A computational unit within the model's architecture that takes inputs, applies weights to them, performs a calculation, and produces an output.

* <u>Weights:</u> The strength and direction of the influence one node has on another.

* <u>Recurrent Neural Networks (RNNs):</u>
  * Variants of neural networks that can model sequences as an additional input.
  * RNNs are used for two tasks:
    * Encoding: representing an input sentence
    * Decoding: generating an output sentence
  * Autoregressive: each previous output token is used as input to generate the next token.

* <u>Attention:</u> Allows a model to focus on parts of the input sequence that are relevant to one another and amplify their signal.

* <u>Transformer:</u>
  * Solely based on the attention mechanism; it removes the recurrent network.
  * Trains in parallel.
  * Encoder and decoder components are stacked on top of each other.
  * Remains autoregressive, needing to consume each generated word before generating a new word.
  * The encoder block consists of two parts:
    * Self-attention
    * Feedforward neural network
  * The decoder has an additional attention layer that attends to the output of the encoder.

* <u>Self-attention:</u>
  * Can attend to different positions within a single sequence, thereby more easily and accurately representing the input.
  * Instead of processing one token at a time, it can be used to look at the entire sequence in one go.

* <u>Masked Language Modeling:</u> This method masks a part of the input for the model to predict.

* <u>Transfer Learning:</u> Involves using knowledge gained from a pre-trained model, then fine-tuning the model for a specific task. Allows LLMs to efficiently adapt to specific applications.

* <u>Generative Pre-trained Transformer (GPT):</u> Uses a decoder-only architecture and removes the encoder-attention block.

* <u>Generative LLMs:</u>
  * Take in some text, and attempt to complete it.
  * Can also be trained as a chatbot.
    * Takes in a user query (prompt)
    * Outputs a response that would most likely follow that prompt.

* <u>Context Window/Length:</u> The maximum number of tokens the model can process.

* <u>Foundation Models:</u> Open source base models that can be fine-tuned for specific tasks.

* <u>Traditional Machine Learning:</u> Training a model for a specific target task, like classification or regression.

* <u>Two-step Approach to Training LLMs:</u>
  * Pretraining (Language Modeling)
  * Fine-tuning (Specific Task)

* <u>Supervised Learning:</u> Uses labeled data, where each input has a corresponding correct output, to train a model to make predictions.

* <u>Unsupervised Learning:</u> Uses unlabeled data to discover patterns and relationships without predefined outputs.

* <u>Semantic Search:</u> Involves understanding the meaning and intent behind a user's query, rather than just matching keywords. It uses NLP to analyze the meaning of words and phrases in a search query and find the most relevant results.

* <u>Closed Source/Proprietary Models:</u> Owned and controlled by a specific entity, often a company, and their usage is typically governed by licensing terms. Their weights and architecture are not shared with the public.

* <u>Public/Open LLMs:</u> Open for anyone to access, use, modify, and distribute, often under a license that permits these actions. They share their weights and architecture with the public.

* <u>API:</u> An API, or Application Programming Interface, defines how different software components or systems can communicate and exchange data or services.

* <u>Backend Packages:</u> Packages without a GUI that are created for efficiently loading and running any LLM on your device.
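
To make the Embeddings and Semantic Search entries above more concrete, here is a minimal sketch of ranking documents by meaning rather than by keyword overlap. It assumes cosine similarity as the comparison measure, which is a common choice but not one stated in my notes. The three-dimensional vectors, the document titles, and the helper names `cosine_similarity` and `semantic_search` are all made up for illustration; a real embedding model would produce much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-written embeddings, for illustration only.
document_embeddings = {
    "How to bake sourdough bread": [0.9, 0.1, 0.0],
    "Training a neural network": [0.1, 0.9, 0.2],
    "Best hiking trails nearby": [0.0, 0.2, 0.9],
}


def semantic_search(query_embedding, embeddings, top_k=1):
    """Return the titles of the top_k documents most similar to the query."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [title for title, _ in ranked[:top_k]]


# Pretend this vector is the embedding of the query "deep learning tutorial".
query_embedding = [0.2, 0.8, 0.1]
results = semantic_search(query_embedding, document_embeddings, top_k=2)
print(results)
```

Note that the query shares no keywords with the best-matching title; the match comes only from the vectors pointing in a similar direction.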

**Prompt Engineering**

* <u>Temperature:</u>
  * Controls the randomness or creativity of the text generated.
  * A temperature of 0 generates the same response every time, while a higher value allows less probable words to be generated.
  * High temperature: results in a more diverse output / stochastic behavior.
  * Low temperature: creates a more deterministic output.  
  <br>
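
The effect of temperature described above can be sketched with a softmax over a few made-up scores. This is only an illustration under stated assumptions: the logits are invented, `softmax_with_temperature` is a hypothetical helper rather than a library function, and the sketch rejects a temperature of 0 because dividing by zero is undefined (the fully deterministic case corresponds to always picking the highest-scoring token).

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [score / temperature for score in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # sharper, more deterministic
high = softmax_with_temperature(logits, 2.0)  # flatter, more diverse

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

With the low temperature, almost all of the probability lands on the top-scoring token; with the high temperature, the less probable tokens get a noticeably larger share.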

### **A Brief Timeline**

* <u>Bag-of-Words Model</u>
  * A representation model used for representing unstructured text.
  * Steps:
    * Step 1: Tokenization
    * Step 2: Combine all unique words from each sentence to create a vocabulary
    * Step 3: Count how often a word in each sentence appears
  * Outcome: Creates a representation of text in the form of numbers, also called a vector representation.
  * Issue: Ignores the semantic nature of text.

* <u>word2vec</u>
  * Released in 2013, was one of the first successful attempts at capturing the meaning of text in embeddings.
  * A word always has the same embedding regardless of the context it is used in, which presents an issue.

* <u>BERT:</u>
  * Bidirectional Encoder Representations from Transformers.
  * Released by Google in 2018.
  * Encoder-only architecture that focuses on representing language.
  * The encoder blocks are self-attention followed by feedforward neural networks.
  * CLS: Classification token, represents the entire input.
  * Major advancement: Contextual embeddings that vary depending on the word's usage in a sentence.

* <u>GPT (Generative Pre-trained Transformer)</u>
  * Decoder-only transformer model released by OpenAI.
  * Focuses on text generation tasks.
  * Major contribution: Unified pretraining and fine-tuning framework for NLP.
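
The three Bag-of-Words steps from the timeline above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming whitespace tokenization and lowercasing (real tokenizers are more involved); the helper names and the two example sentences are chosen only for demonstration.

```python
def tokenize(sentence):
    """Step 1: split a sentence into lowercase tokens on whitespace."""
    return sentence.lower().split()


def build_vocabulary(sentences):
    """Step 2: combine the unique words from all sentences into a sorted vocabulary."""
    return sorted({token for s in sentences for token in tokenize(s)})


def bag_of_words(sentence, vocabulary):
    """Step 3: count how often each vocabulary word appears in the sentence."""
    tokens = tokenize(sentence)
    return [tokens.count(word) for word in vocabulary]


sentences = ["That is a cute dog", "My cat is cute"]
vocabulary = build_vocabulary(sentences)
vectors = [bag_of_words(s, vocabulary) for s in sentences]

print(vocabulary)
for vector in vectors:
    print(vector)
```

Each sentence becomes a vector of counts with one position per vocabulary word. Word order and meaning are lost, which is the issue noted in the Bag-of-Words entry.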

### **Generating My First Text**

<u>When you use an LLM, two models are loaded:</u>
* The generative model itself
* Its underlying tokenizer

<u>Steps:</u>
1. Open Google Colab.
2. Switch your runtime to GPU type T4.
  *  Go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4
3. Install the dependencies for chapter 1 of *Hands-On Large Language Models*.
4. Load the model onto the GPU for faster inference.
5. Wrap the model and tokenizer in a pipeline to encapsulate the model, tokenizer, and text generation process into a single function.
6. Create a prompt as a user and give it to the model.

In [None]:
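%%capture
# Install the dependencies for chapter 1.
# %%capture must be the first line of the cell, and the version specifiers
# are quoted so the shell does not treat ">=" as an output redirect.
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"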

In [None]:
# The first step is to load our model onto the GPU for faster inference.
# Note that we load the model and tokenizer separately (although that isn't always necessary).

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

In [None]:
# Although we can now use the model and tokenizer directly,
# it's much easier to wrap it in a pipeline object.
# transformers.pipeline encapsulates the model, tokenizer, and text
# generation process into a single function

from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    # When False, only the model's output is returned, not the prompt
    return_full_text=False,
    # Maximum number of tokens the model will generate
    max_new_tokens=500,
    # When False, the model always selects the most probable next token
    do_sample=False
)

In [None]:
# Finally, we create our prompt as a user and give it to the model:
# The prompt (user input / query)

messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])
