
# Detailed Explanation of the LLaMA3 Notebook

This notebook loads the Meta Llama 3 8B Instruct model through the Hugging Face `transformers` library, wraps it in a small text generation class, and builds an interactive chat widget on top of it. The sections below walk through each cell in order, explaining the code and the concepts behind it.

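## Installing the Necessary Libraries

We start by installing the packages the notebook depends on: `transformers`, `torch`, `bitsandbytes` (for 4-bit quantization), and `accelerate`. The following cells then install the Hugging Face Hub client and log in with an access token.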


In [None]:
# Install the model, tensor, quantization, and device-placement libraries
!pip install transformers torch bitsandbytes accelerate


Collecting bitsandbytes
  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)
Collecting accelerate
  Downloading accelerate-0.30.1-py3-none-any.whl (302 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
Collecting nvidia-cudnn-cu12==8.9.2.26 (from torch)
  Using cached nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (7

In [None]:
# Install the Hugging Face Hub client, which provides the huggingface-cli command
!python -m pip install huggingface_hub




In [None]:
# Log in to the Hugging Face Hub with an access token (needed to download the gated Llama 3 weights)
!huggingface-cli login



    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# Text Generation Pipeline with Transformers

## Detailed Explanation

This cell defines `TextGenerationPipeline`, a small wrapper class that loads a pre-trained causal language model and its tokenizer with the Hugging Face `transformers` library and generates text from a list of chat messages. The key components are explained below:

### Imports
- **transformers**: The main library for working with pre-trained transformer models.
- **torch**: A deep learning library used here for tensor operations.
- **AutoModelForCausalLM** and **AutoTokenizer**: Specific classes from the `transformers` library used to load pre-trained language models and tokenizers.
- **re**, **json**: Standard Python libraries for regular expressions and JSON manipulation.
- **IPython.display**: Used for displaying rich media (e.g., Markdown) in Jupyter Notebooks.
- **ipywidgets**: A library for creating interactive widgets in Jupyter Notebooks.

### Class: `TextGenerationPipeline`

#### `__init__` Method
- **Parameters**:
  - `model_id`: The identifier for the pre-trained model to be used.
  - `torch_dtype`: The data type for tensors (default is `torch.bfloat16`).
  - `load_in_4bit`: A flag indicating whether to load the model with 4-bit precision (default is `False`).

#### `load_model_and_tokenizer` Method
- Loads the pre-trained model and tokenizer based on the provided `model_id`.
- Returns the loaded model and tokenizer.

#### `format_messages` Method
- **Parameters**:
  - `messages`: A list of dictionaries, each representing a message with `role` and `content`.
- **Returns**:
  - A single formatted string that concatenates all messages, prefixed by their roles.
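
To make the return value concrete, here is a minimal standalone sketch of the same role-prefixed formatting. The sample messages are made up for illustration.

```python
def format_messages(messages):
    # One "role: content" line per message, joined by newlines
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

sample_messages = [  # hypothetical conversation, for illustration only
    {"role": "system", "content": "Answer questions"},
    {"role": "user", "content": "What is a tokenizer?"},
]

prompt = format_messages(sample_messages)
print(prompt)
# system: Answer questions
# user: What is a tokenizer?
```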

#### `generate_text` Method
- **Parameters**:
  - `messages`: A list of dictionaries, each representing a message with `role` and `content`.
  - `max_new_tokens`: The maximum number of new tokens to generate (default is `256`).
  - `temperature`: The sampling temperature (default is `0.6`). Higher values mean more random generations.
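  - `top_p`: The cumulative probability threshold for nucleus sampling (default is `0.9`). Only the most likely tokens whose probabilities add up to this threshold are candidates for sampling.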
- **Process**:
  - Formats the input messages into a prompt.
  - Tokenizes the prompt.
  - Generates text based on the input prompt and specified parameters.
  - Decodes and returns the generated text, excluding the prompt part.
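
The effect of `temperature` and `top_p` can be illustrated without a model. The sketch below is a conceptual, pure-Python version of temperature scaling followed by nucleus filtering over a handful of made-up logits. The function names are hypothetical, and the real `generate` implementation in `transformers` works on tensors and differs in detail.

```python
import math
import random

def nucleus_distribution(logits, temperature=0.6, top_p=0.9):
    """Return {token_index: probability} after temperature scaling and top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution, values above 1 flatten it
    scaled = [x / temperature for x in logits]
    # Softmax, subtracting the maximum for numerical stability
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose cumulative probability reaches top_p
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}

def sample_token(logits, temperature=0.6, top_p=0.9):
    """Draw one token index from the filtered distribution."""
    dist = nucleus_distribution(logits, temperature, top_p)
    indices = list(dist)
    return random.choices(indices, weights=[dist[i] for i in indices], k=1)[0]

example_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for four tokens
print(sorted(nucleus_distribution(example_logits, temperature=1.0)))  # [0, 1, 2]
print(sorted(nucleus_distribution(example_logits, temperature=0.5)))  # [0, 1]
```

Lowering the temperature concentrates probability on the top tokens, so fewer of them are needed to reach the `top_p` threshold.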

### Usage
- Initialize the `TextGenerationPipeline` with the desired model.
- Use the `generate_text` method to produce text based on given input messages.

Because the model ID, tensor data type, and 4-bit flag are constructor arguments, the same class can load other causal language models from the Hugging Face Hub.
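
Note that the `role: content` lines built by `format_messages` are a generic plain-text format. Llama 3 Instruct models were trained on a header-based chat layout, which `tokenizer.apply_chat_template` in `transformers` normally produces. The sketch below builds that layout by hand purely to show its shape. The helper name `build_llama3_prompt` is hypothetical and not part of the notebook, and in practice the tokenizer's own template is the safer choice.

```python
def build_llama3_prompt(messages):
    """Hypothetical helper: lay out chat messages in the Llama 3 Instruct header format."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, and an end-of-turn marker
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(f"{message['content'].strip()}<|eot_id|>")
    # Open an assistant turn so the model continues with its reply
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

chat = [  # made-up messages, for illustration only
    {"role": "system", "content": "Answer questions"},
    {"role": "user", "content": "Hello"},
]
llama3_prompt = build_llama3_prompt(chat)
print(llama3_prompt)
```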

In [None]:
# Define a small wrapper that loads a causal language model and generates chat replies
import transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import re
import json
from IPython.display import display, Markdown, clear_output
import ipywidgets as widgets

class TextGenerationPipeline:
    def __init__(self, model_id, torch_dtype=torch.bfloat16, load_in_4bit=False):
        self.model_id = model_id
        self.torch_dtype = torch_dtype
        self.load_in_4bit = load_in_4bit
        self.model, self.tokenizer = self.load_model_and_tokenizer()

    def load_model_and_tokenizer(self):
        # Passing load_in_4bit directly is deprecated; use a BitsAndBytesConfig instead
        quantization_config = BitsAndBytesConfig(load_in_4bit=True) if self.load_in_4bit else None
        model = AutoModelForCausalLM.from_pretrained(
            self.model_id,
            torch_dtype=self.torch_dtype,
            quantization_config=quantization_config
        )
        tokenizer = AutoTokenizer.from_pretrained(self.model_id)
        return model, tokenizer

    def format_messages(self, messages):
        # Build one "role: content" line per message
        return "\n".join([f"{message['role']}: {message['content']}" for message in messages])

    def generate_text(self, messages, max_new_tokens=256, temperature=0.6, top_p=0.9):
        prompt = self.format_messages(messages)
        # Tokenize the prompt and move the tensors to the device the model lives on
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)

        outputs = self.model.generate(
            inputs.input_ids,
            attention_mask=inputs.attention_mask,
            max_new_tokens=max_new_tokens,
            eos_token_id=self.tokenizer.eos_token_id,
            pad_token_id=self.tokenizer.eos_token_id,  # Set pad_token_id to eos_token_id
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
        )
        # Decode only the newly generated tokens, skipping the prompt tokens
        new_tokens = outputs[0][inputs.input_ids.shape[-1]:]
        return self.tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Interactive Chat Interface with Text Generation Pipeline

## Detailed Explanation

This cell defines `InteractiveChat`, a widget-based chat interface that sends user input to the text generation pipeline and displays the generated responses. The key components are explained below:

### Class: `InteractiveChat`

#### `__init__` Method
- **Parameters**:
  - `pipeline`: An instance of the `TextGenerationPipeline` class used to generate text responses.
- **Attributes**:
  - `self.pipeline`: Stores the provided text generation pipeline instance.
  - `self.messages`: Initializes a list to store the chat history, starting with a system message.
  - `self.input_box`: Creates a text input widget for the user to type questions.
  - `self.output_area`: Creates an output area widget to display responses.
  - `self.progress_label`: Creates a label widget to display the progress status.
- **Display**:
  - Displays the input box, progress label, and output area widgets in the notebook interface.
- **Event Handling**:
  - Sets up an event listener on the input box to handle user input submission (`self.input_box.on_submit`).

#### `on_submit` Method
- **Parameters**:
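  - `change`: The `Text` widget that fired the submit event. `on_submit` passes the widget itself to the callback, so the user input is read from its `value` attribute.
- **Process**:
  - Retrieves the user input from the widget's `value` attribute.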
  - Clears the input box after submission.
  - If the user input is "exit", the interaction ends, and the input box is closed.
  - Adds the user input to the chat history (`self.messages`).
  - Updates the progress label to indicate that a response is being generated.
  - Displays the user question in the output area.
  - Generates a response using the text generation pipeline.
  - Displays the generated response or an error message in the output area.
  - Updates the progress label to indicate that the response generation is complete.
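
The core of these steps can be tried outside a notebook. The following is a minimal widget-free sketch, assuming a stand-in generator function in place of the real pipeline. Both `handle_user_input` and `echo_generate` are hypothetical names introduced here for illustration.

```python
def handle_user_input(messages, user_input, generate):
    """Process one chat turn; return the reply, or None when the user types 'exit'."""
    if user_input.lower() == "exit":
        return None
    # Record the user's turn, then ask the generator for a reply to the full history
    messages.append({"role": "user", "content": user_input})
    return generate(messages)

def echo_generate(messages):
    # Hypothetical stand-in for pipeline.generate_text: echoes the last message
    return f"You said: {messages[-1]['content']}"

history = [{"role": "system", "content": "Answer questions"}]
reply = handle_user_input(history, "Hello", echo_generate)
print(reply)  # You said: Hello
```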

#### `formatted_response` Method
- **Parameters**:
  - `output_string`: The generated text response from the model.
- **Process**:
  - Extracts code blocks from the response using regular expressions.
  - Formats the code blocks for Markdown display.
  - Displays the formatted response as Markdown in the notebook.
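
As an illustration of the extraction and tagging steps, here is a standalone variant that returns the formatted string instead of displaying it, and that leaves fences already naming a language untouched. The function name `tag_code_blocks` is hypothetical, and the sample response is made up.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so no literal fence appears in the code

def tag_code_blocks(text, default_language="python"):
    """Add a language tag to fenced code blocks whose opening fence has none."""
    def add_tag(match):
        body = match.group(1)
        first_line = body.split("\n", 1)[0]
        if first_line.strip():
            # The opening fence already carries a language (or inline text); leave it alone
            return match.group(0)
        return FENCE + default_language + body + FENCE

    # DOTALL lets "." span newlines so multi-line blocks are matched
    return re.sub(FENCE + r"(.*?)" + FENCE, add_tag, text, flags=re.DOTALL)

response = "Try this:\n" + FENCE + "\nprint('hi')\n" + FENCE
tagged = tag_code_blocks(response)
print(tagged)
```

In a notebook, the returned string could then be passed to `display(Markdown(...))` as the method above does.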

### Usage
- Initialize the `InteractiveChat` class with an instance of the `TextGenerationPipeline`.
- The interface allows users to type questions and receive generated responses interactively.
- Each submission replaces the contents of the output area with the latest question and response. The accumulated chat history is kept in `self.messages` and sent to the model on every turn.

This setup provides an interactive way to engage with a text generation model, making it suitable for workshops, demonstrations, and educational purposes.

In [None]:
# Define the widget-based chat interface on top of TextGenerationPipeline
class InteractiveChat:
    def __init__(self, pipeline):
        self.pipeline = pipeline
        self.messages = [{"role": "system", "content": "Answer questions"}]
        self.input_box = widgets.Text(
            placeholder='Type your question here...',
            description='Your input:',
            style={'description_width': 'initial'},
            continuous_update=False
        )
        self.output_area = widgets.Output()
        self.progress_label = widgets.Label(value="")

        display(self.input_box, self.progress_label, self.output_area)

        self.input_box.on_submit(self.on_submit)

    def on_submit(self, change):
        # on_submit passes the Text widget itself, so .value holds the submitted text
        user_input = change.value
        self.input_box.value = ''  # Clear the input box after submission
        if user_input.lower() == "exit":
            self.input_box.close()
            self.progress_label.value = "Interaction ended."
            return

        self.messages.append({"role": "user", "content": user_input})
        self.progress_label.value = "Generating response..."

        with self.output_area:
            clear_output(wait=True)
            print(f"★ Question: {user_input}")

        try:
            model_response = self.pipeline.generate_text(self.messages)
        except Exception:
            model_response = None  # Reported by the error branch below

        with self.output_area:
            if model_response is not None:
                # Keep the reply in the history so later turns see the full conversation
                self.messages.append({"role": "assistant", "content": model_response})
                display(Markdown(f"#### ★ Question: {user_input} \n #### ➤ Response"))
                self.formatted_response(model_response)
                self.progress_label.value = "Response generated."
            else:
                print("Something went wrong!")
                self.progress_label.value = "Error in generating response."

    def formatted_response(self, output_string):
        code_blocks = re.findall(r'```(.*?)```', output_string, re.DOTALL)
        formatted_display = output_string
        for block in code_blocks:
            # Only tag blocks whose opening fence does not already name a language
            if block.split("\n", 1)[0].strip() == "":
                formatted_display = formatted_display.replace("```" + block + "```", "```python" + block + "```")
        return display(Markdown(formatted_display))

In [None]:
# Load Llama 3 8B Instruct in 4-bit precision
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
pipeline = TextGenerationPipeline(model_id, load_in_4bit=True)

# Run the interactive chat
interactive_chat = InteractiveChat(pipeline)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
`low_cpu_mem_usage` was None, now set to True since model is quantized.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Text(value='', continuous_update=False, description='Your input:', placeholder='Type your question here...', s…

Label(value='')

Output()