<a href="https://colab.research.google.com/github/drpetros11111/Hands-On-Large-Language-Models/blob/drpp_study/chapter_1_introduction_to_language_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Chapter 1 - Introduction to Language Models</h1>
<i>Exploring the exciting field of Language AI</i>


<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961"><img src="https://img.shields.io/badge/Buy%20the%20Book!-grey?logo=amazon"></a>
<a href="https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/"><img src="https://img.shields.io/badge/O'Reilly-white.svg?logo=data:image/svg%2bxml;base64,PHN2ZyB3aWR0aD0iMzQiIGhlaWdodD0iMjciIHZpZXdCb3g9IjAgMCAzNCAyNyIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGNpcmNsZSBjeD0iMTMiIGN5PSIxNCIgcj0iMTEiIHN0cm9rZT0iI0Q0MDEwMSIgc3Ryb2tlLXdpZHRoPSI0Ii8+CjxjaXJjbGUgY3g9IjMwLjUiIGN5PSIzLjUiIHI9IjMuNSIgZmlsbD0iI0Q0MDEwMSIvPgo8L3N2Zz4K"></a>
<a href="https://github.com/HandsOnLLM/Hands-On-Large-Language-Models"><img src="https://img.shields.io/badge/GitHub%20Repository-black?logo=github"></a>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/HandsOnLLM/Hands-On-Large-Language-Models/blob/main/chapter01/Chapter%201%20-%20Introduction%20to%20Language%20Models.ipynb)

---

This notebook is for Chapter 1 of the [Hands-On Large Language Models](https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961) book by [Jay Alammar](https://www.linkedin.com/in/jalammar) and [Maarten Grootendorst](https://www.linkedin.com/in/mgrootendorst/).

---

<a href="https://www.amazon.com/Hands-Large-Language-Models-Understanding/dp/1098150961">
<img src="https://raw.githubusercontent.com/HandsOnLLM/Hands-On-Large-Language-Models/main/images/book_cover.png" width="350"/></a>


### [OPTIONAL] - Installing Packages on <img src="https://colab.google/static/images/icons/colab.png" width=100>

If you are viewing this notebook on Google Colab (or any other cloud vendor), you need to **uncomment and run** the following code block to install the dependencies for this chapter:

---

💡 **NOTE**: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to
**Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4**.

---

In [None]:
# %%capture
# Quote the version specifiers so the shell does not treat ">" as output redirection
# !pip install "transformers>=4.40.1" "accelerate>=0.27.2"

# Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary).

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



# Using Hugging Face to load LLMs
This snippet uses the Hugging Face Transformers library to load a pre-trained language model, `microsoft/Phi-3-mini-4k-instruct`.


---


Here's a step-by-step explanation:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
This line imports the necessary classes from the Transformers library.

## AutoModelForCausalLM

This is a general-purpose class for loading causal language models (models that predict the next token in a sequence). It reads the checkpoint's configuration and instantiates the matching model architecture.

## AutoTokenizer

This class loads the tokenizer associated with the model.

A tokenizer converts text into token IDs, the integer inputs the model operates on, and converts generated token IDs back into text.
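To make the text-to-numbers step concrete, here is a minimal sketch of a toy tokenizer. It is an illustration only: `ToyTokenizer` is a hypothetical class that splits on whitespace, whereas the real Phi-3 tokenizer uses a learned subword vocabulary.

```python
class ToyTokenizer:
    """Whitespace tokenizer with a tiny vocabulary (illustrative, not Phi-3's)."""

    def __init__(self, corpus: str):
        # ID 0 is reserved for words that are not in the vocabulary.
        self.token_to_id = {"<unk>": 0}
        for word in sorted(set(corpus.lower().split())):
            self.token_to_id[word] = len(self.token_to_id)
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}

    def encode(self, text: str) -> list[int]:
        # Text -> list of integer token IDs.
        return [self.token_to_id.get(w, 0) for w in text.lower().split()]

    def decode(self, ids: list[int]) -> str:
        # List of integer token IDs -> text.
        return " ".join(self.id_to_token[i] for i in ids)


toy = ToyTokenizer("create a funny joke about chickens")
ids = toy.encode("Create a funny joke")
print(ids)              # [4, 1, 5, 6]
print(toy.decode(ids))  # create a funny joke
```

The model only ever sees the list of integers; the tokenizer is what maps those integers back to readable text after generation.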

    model = AutoModelForCausalLM.from_pretrained(...)
    
This line loads the pre-trained Phi-3-mini-4k-instruct model.

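## "microsoft/Phi-3-mini-4k-instruct"

This specifies the model name or path. Here it is a repository ID on the Hugging Face Hub, so the weights are downloaded on first use and cached locally.

## device_map="cuda"

This places the model on the CUDA GPU, which speeds up inference. There is no automatic fallback to the CPU, so loading fails on a machine without a CUDA-capable GPU.

## torch_dtype="auto"

This loads the weights in the data type recorded in the checkpoint instead of converting them to PyTorch's default `float32`.

## trust_remote_code=True

This allows Transformers to download and run custom Python code stored in the model repository, which some models need because their architecture is defined there rather than in the library. Only enable it for repositories you trust.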

    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

This line loads the tokenizer for the Phi-3-mini-4k-instruct model.

It uses the same model name to ensure that the tokenizer is compatible with the model.

In essence, this code snippet loads a pre-trained language model and its tokenizer, making it ready for use in your code.

The model is loaded onto the GPU for faster processing.

This sets the stage for generating text or performing other tasks with the language model.
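Because `device_map="cuda"` assumes a GPU is present, it can help to choose the value at runtime. The following is a minimal sketch, not part of the original notebook: `choose_device_map` is a hypothetical helper, and the `importlib` guard is there only so the snippet also runs where PyTorch is not installed.

```python
import importlib.util


def choose_device_map() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so no GPU can be used from Python.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device_map = choose_device_map()
print(device_map)
```

The result could then be passed as `device_map=device_map` to `from_pretrained`. Keep in mind that running a model of this size on the CPU will be considerably slower.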

Although we can now use the model and tokenizer directly, it's much easier to wrap them in a `pipeline` object:

In [None]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

# Text generation pipeline
This snippet creates a text generation pipeline using the Hugging Face Transformers library.


---


Here's a step-by-step explanation:

    from transformers import pipeline

This line imports the pipeline function from the Transformers library.

The pipeline function provides a convenient way to use pre-trained models for various NLP tasks.

    generator = pipeline(...)

This line creates a pipeline object called generator that is specifically configured for text generation.

Let's look at the arguments passed to the pipeline function:

    "text-generation"

This specifies the task that the pipeline will perform, which is text generation.

    model=model

This provides the pre-trained model that was previously loaded using AutoModelForCausalLM.from_pretrained.

    tokenizer=tokenizer

This provides the tokenizer that was previously loaded using AutoTokenizer.from_pretrained.

    return_full_text=False

This tells the pipeline to return only the newly generated text, without the prompt prepended to it.

    max_new_tokens=500

This caps the number of tokens the pipeline generates per call. Tokens in the prompt do not count toward this limit.

    do_sample=False

This disables sampling, so at every step the pipeline picks the single most likely next token (greedy decoding) instead of drawing one at random from the predicted distribution.

As a result, the same prompt always produces the same output.

In essence, this code snippet creates a text generation pipeline using a pre-trained model and tokenizer.

It configures the pipeline to generate up to 500 new tokens and to always pick the most likely next token.

This pipeline can then be used to generate text by providing it with a prompt or input text.
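The three generation settings above can be illustrated with a toy generator. This is a sketch under stated assumptions: `NEXT_TOKEN_PROBS` is a made-up probability table and `toy_generate` is a hypothetical function, neither of which reflects how Phi-3 or the pipeline is implemented internally. It only mimics the meaning of `do_sample`, `max_new_tokens`, and `return_full_text`.

```python
import random

# Hypothetical next-token probabilities (not taken from any real model).
NEXT_TOKEN_PROBS = {
    "the": {"chicken": 0.6, "band": 0.3, "<eos>": 0.1},
    "chicken": {"joined": 0.7, "the": 0.2, "<eos>": 0.1},
    "joined": {"the": 0.8, "<eos>": 0.2},
    "band": {"<eos>": 0.9, "the": 0.1},
}


def toy_generate(prompt_tokens, max_new_tokens, do_sample=False,
                 return_full_text=False, seed=None):
    rng = random.Random(seed)
    new_tokens = []
    current = prompt_tokens[-1]
    for _ in range(max_new_tokens):  # hard cap on newly generated tokens
        probs = NEXT_TOKEN_PROBS[current]
        if do_sample:
            # Sampling: draw the next token according to its probability.
            current = rng.choices(list(probs), weights=list(probs.values()))[0]
        else:
            # Greedy decoding: always take the most likely next token.
            current = max(probs, key=probs.get)
        if current == "<eos>":
            break
        new_tokens.append(current)
    # Optionally prepend the prompt, mirroring return_full_text=True.
    return prompt_tokens + new_tokens if return_full_text else new_tokens


print(toy_generate(["the"], max_new_tokens=4))
# ['chicken', 'joined', 'the', 'chicken']
print(toy_generate(["the"], max_new_tokens=2, return_full_text=True))
# ['the', 'chicken', 'joined']
```

With `do_sample=False` repeated calls return the same tokens, while `do_sample=True` can return a different sequence on each call unless a seed is fixed.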

Finally, we create our prompt as a user and give it to the model:

In [None]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])

The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
`get_max_cache()` is deprecated for all Cache classes. Use `get_max_cache_shape()` instead. Calling `get_max_cache()` will raise error from v4.48


 Why did the chicken join the band? Because it had the drumsticks!


# Prompt Handling
This snippet uses the previously created generator pipeline to generate a funny joke about chickens from a user prompt.


---


Here's a breakdown:

    messages = [...]

This line defines the prompt in the chat format the pipeline accepts: a list of messages, where each message is a dictionary with two keys:

## "role"

This indicates who the message comes from, which is "user" in this case.

## "content"

This contains the actual prompt or query, which is "Create a funny joke about chickens."
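Before the model sees these messages, they are flattened into a single prompt string using the tokenizer's chat template. The sketch below is a hand-written approximation of a Phi-3-style layout, offered as an assumption for illustration: `render_chat` is a hypothetical function, and the authoritative template ships with the tokenizer and is applied through `tokenizer.apply_chat_template`.

```python
def render_chat(messages, add_generation_prompt=True):
    """Flatten chat messages into one prompt string (approximate Phi-3-style layout)."""
    parts = []
    for message in messages:
        # Each turn is wrapped in a role marker and an end-of-turn marker.
        parts.append(f"<|{message['role']}|>\n{message['content']}<|end|>\n")
    if add_generation_prompt:
        # Cue the model that the assistant's reply comes next.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]
print(render_chat(messages))
```

The role markers are what let an instruction-tuned model tell the user's request apart from its own reply.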

    output = generator(messages)

This line calls the generator pipeline with the messages list as input.

The pipeline processes the prompt and generates the output, which is stored in the output variable.

    print(output[0]["generated_text"])

This line extracts the generated text from the output variable and prints it to the console.

Since the output is a list containing a dictionary, output[0] accesses the first element (which is the dictionary), and ["generated_text"] accesses the value associated with the "generated_text" key, which is the actual generated text.
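The indexing can be tried without loading a model by mocking the structure. In this sketch, `output` is a hand-built stand-in that copies the joke printed above; it is not produced by the pipeline.

```python
# Hand-built stand-in for the pipeline's return value: a list with one dictionary.
output = [
    {"generated_text": " Why did the chicken join the band? Because it had the drumsticks!"}
]

first_result = output[0]                # the dictionary for the first (and only) result
joke = first_result["generated_text"]   # the generated string
joke = joke.strip()                     # drop the leading space seen in the printed output
print(joke)
```

Calling `strip()` is optional and only tidies the leading whitespace that appears in the generated text.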

In essence, this code snippet sends a user prompt asking for a funny joke about chickens to the text generation pipeline.

The model generates the joke token by token, conditioned on the prompt, and the result is then printed to the console.