## Open notebook in:
| Colab | Gradient |
|:------|:---------|
| [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/https://github.com/Nicolepcx/transformers-the-definitive-guide/blob/main/CH02/ch02_llama_index_llama3.ipynb) | [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com//github.com/Nicolepcx/transformers-the-definitive-guide/blob/main/CH02/ch02_llama_index_llama3.ipynb) |

# About this notebook


In this notebook you perform:
- Named Entity Recognition
- Text Summarization

In this notebook you download a publicly accessible file from Google Drive and process it with LlamaIndex. You load the Llama 3 model with [quantization](https://huggingface.co/docs/bitsandbytes/main/en/index), which yields a less resource-hungry version of the model for these tasks.
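
To give a rough sense of why quantization makes the model less resource-hungry, the sketch below estimates the memory needed for the weights alone at different precisions. It assumes roughly 8 billion parameters and ignores activations, the KV cache, and quantization metadata; `weight_memory_gib` is a hypothetical helper, not part of any library used in this notebook.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed to hold the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: roughly 8 billion parameters for the 8B model variant.
NUM_PARAMS = 8e9

for label, bits in [("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>9}: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
```

Under these assumptions, 4-bit weights need about a quarter of the memory of 16-bit weights, which is what lets the model fit on a single smaller GPU.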


# Installs

In [1]:
!pip -q install llama-index-llms-huggingface==0.1.5 \
                llama-index-embeddings-huggingface==0.2.0 \
                loralib==0.1.2 \
                sentencepiece==0.1.99 \
                bitsandbytes==0.43.0 \
                accelerate==0.28.0 \
                llama-index==0.10.33

# Imports

In [None]:
!pip install flash-attn --no-build-isolation -q

In [None]:
import os
import requests
import torch
import transformers
from textwrap import TextWrapper

from huggingface_hub import HfApi, HfFolder

from transformers import (AutoModelForCausalLM,
                          AutoTokenizer,
                          PreTrainedTokenizer,
                          PreTrainedModel,
                          BitsAndBytesConfig,
                          pipeline
                        )

from llama_index.core import (SummaryIndex,
                              VectorStoreIndex,
                              SimpleDirectoryReader,
                              StorageContext,
                              load_index_from_storage,
                              Settings,
                              PromptTemplate
)

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.llms import ChatMessage

In [None]:
def print_wrapper(print):
    """Adapted from: https://stackoverflow.com/questions/27621655/how-to-overload-print-function-to-expand-its-functionality/27621927"""

    def function_wrapper(text):
        if not isinstance(text, str):
            text = str(text)
        wrapper = TextWrapper()
        return print("\n".join([wrapper.fill(line) for line in text.split("\n")]))

    return function_wrapper

print = print_wrapper(print)
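
The wrapper above relies on `TextWrapper` filling each input line separately, so existing line breaks survive while long lines are folded at the default width of 70 characters. A minimal standalone sketch of that behavior follows; `wrap_preserving_newlines` is a hypothetical name used only for illustration.

```python
from textwrap import TextWrapper


def wrap_preserving_newlines(text: str, width: int = 70) -> str:
    """Wrap each input line separately so existing line breaks survive."""
    wrapper = TextWrapper(width=width)
    return "\n".join(wrapper.fill(line) for line in text.split("\n"))


# One long line, an empty line, and a short line
sample = "word " * 30 + "\n\nsecond paragraph"
wrapped = wrap_preserving_newlines(sample)
print(wrapped)
```

Because `TextWrapper` breaks long words by default, a token longer than the width, such as a long URL, is split across lines in the printed output.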

In [None]:
def download_file(url, destination_folder):
    """
    Download a file from a URL to the specified destination folder.
    Attempts to use the original filename from the Content-Disposition header.
    """
    # Ensure the destination folder exists
    os.makedirs(destination_folder, exist_ok=True)

    # Get the file content from the URL; the timeout avoids hanging indefinitely
    response = requests.get(url, allow_redirects=True, timeout=30)
    response.raise_for_status()  # Raise an exception for HTTP errors

    # Try to fetch the filename from the content disposition header
    content_disposition = response.headers.get('content-disposition')
    filename = ""
    if content_disposition and 'filename=' in content_disposition:
        # Take the value up to the next parameter, then drop quotes and path parts
        raw_name = content_disposition.split('filename=')[1].split(';')[0]
        filename = os.path.basename(raw_name.strip().strip('"'))
    if not filename:
        # If no usable filename is found in the headers, default to a filename
        filename = "default_filename.txt"

    # Create the full path for the local file
    local_file_path = os.path.join(destination_folder, filename)

    # Write the file content in binary mode to the local file
    with open(local_file_path, 'wb') as f:
        f.write(response.content)

    return local_file_path
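
As an alternative to splitting the header string by hand, the standard library's `email.message.Message` can parse a `Content-Disposition` value. The sketch below is one possible approach, not the method used in this notebook; `filename_from_content_disposition` and its `default` argument are hypothetical names.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="default_filename.txt"):
    """Extract a safe filename from a Content-Disposition header value."""
    if not header_value:
        return default
    msg = Message()
    msg["Content-Disposition"] = header_value
    filename = msg.get_filename()  # None when no filename parameter is present
    # basename() drops any directory components the server may have sent
    safe_name = os.path.basename(filename) if filename else ""
    return safe_name or default


print(filename_from_content_disposition('attachment; filename="medical_record.txt"'))
print(filename_from_content_disposition(None))
```

Stripping directory components matters because the filename comes from a remote server and is joined onto a local path.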

In [None]:
# Hugging Face access token
hf_token = "your_access_token"

# HfFolder to save the token for subsequent API calls
HfFolder.save_token(hf_token)

In [None]:
# Info about the chat template for Llama 3: https://github.com/meta-llama/llama-recipes
system_prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
                    You are a helpful, respectful, and honest assistant.
                    <|eot_id|><|start_header_id|>user<|end_header_id|>
                """

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("{query_str}<|eot_id|><|start_header_id|>assistant<|end_header_id|>")
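
To see how the two pieces above fit together, the sketch below joins a system prompt and a wrapped query into one prompt string using plain string formatting. This is an approximation for illustration: the exact way `HuggingFaceLLM` joins the two parts may differ, the indentation whitespace of the triple-quoted string is omitted, and `build_prompt` is a hypothetical helper.

```python
# Same special tokens as in the cell above, without the indentation whitespace
SYSTEM_PROMPT = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"
    "You are a helpful, respectful, and honest assistant.\n"
    "<|eot_id|><|start_header_id|>user<|end_header_id|>\n"
)
QUERY_WRAPPER = "{query_str}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"


def build_prompt(query: str) -> str:
    """Concatenate the system block and the wrapped user query."""
    return SYSTEM_PROMPT + QUERY_WRAPPER.format(query_str=query)


prompt = build_prompt("Who is the CEO of Apple?")
print(prompt)
```

The prompt ends with the assistant header, so the model continues by generating the assistant turn and stops when it emits `<|eot_id|>`.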

In [None]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, token=hf_token)

stopping_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# BitsAndBytes configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False, # You can optionally load it in 8bit
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

llm = HuggingFaceLLM(
    model_name=model_id,
    max_new_tokens=512,
    model_kwargs={
        "token": hf_token,
        "quantization_config": bnb_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model_id,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
    device_map="auto",
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
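
The `generate_kwargs` above enable sampling with `temperature=0.6` and `top_p=0.9`. As a rough illustration of what these two settings do, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) sampling over a toy list of logits. It is a simplified stand-in for illustration, not the code path the model library actually runs, and `sample_top_p` is a hypothetical name.

```python
import math
import random


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=random):
    """Sample one token index using temperature scaling and nucleus filtering."""
    # Temperature scaling followed by a numerically stable softmax
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Sample from the kept tokens; choices() normalizes the weights
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


print(sample_top_p([5.0, 1.0, 0.0, -1.0]))
```

A temperature below 1 sharpens the distribution, and the top-p cutoff then discards the unlikely tail, so outputs vary between runs but stay close to the most probable continuations.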


In [None]:
Settings.llm = llm

In [None]:
input_text = """
Tim Cook is CEO of Apple. Apple is an American multinational
corporation and technology company headquartered in Cupertino,
California, in Silicon Valley.
"""

text = f"Find all entities in the following \n\n {input_text}, and return only the entities."

print(text)

Find all entities in the following


Tim Cook is CEO of Apple. Apple is an American multinational
corporation and technology company headquartered in Cupertino,
California, in Silicon Valley.
, and return only the entities.


In [None]:
messages = [
    ChatMessage(role="system", content="You are an entity expert and can find all entities in a text."),
    ChatMessage(role="user", content=text),
]
response = llm.chat(messages)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [None]:
print(response)

assistant: assistant

Here are the entities found in the text:

* Tim Cook (Person)
* Apple (Organization)
* California (Location)
* Cupertino (Location)
* Silicon Valley (Location)
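
If the entities are needed as structured data rather than free text, the bullet list can be parsed with a regular expression. The sketch below assumes the model keeps the `* Text (Label)` format shown above, which is not guaranteed because sampling is enabled; `parse_entities` is a hypothetical helper.

```python
import re

# Matches lines such as "* Tim Cook (Person)"
ENTITY_LINE = re.compile(r"^\*\s+(?P<text>.+?)\s+\((?P<label>[^()]+)\)\s*$")


def parse_entities(response_text):
    """Return (entity, label) pairs from a bulleted model response."""
    entities = []
    for line in response_text.splitlines():
        match = ENTITY_LINE.match(line.strip())
        if match:
            entities.append((match.group("text"), match.group("label")))
    return entities


sample_response = """Here are the entities found in the text:

* Tim Cook (Person)
* Apple (Organization)
* California (Location)
* Cupertino (Location)
* Silicon Valley (Location)
"""
print(parse_entities(sample_response))
```

Lines that do not match the pattern, such as the introductory sentence, are skipped.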


In [None]:
# List of publicly shared Google Drive file URLs
urls = [
    "https://drive.google.com/uc?export=download&id=1EhXzZd2YHs0qzMwxF3EzaljMfVgLDYEK",
]

# Destination folder
destination_folder = "data"

# Download each file
for url in urls:
    print(f"Downloading from {url}...")
    file_path = download_file(url, destination_folder)
    print(f"Saved to {file_path}")


Downloading from https://drive.google.com/uc?export=download&id=1EhXzZ
d2YHs0qzMwxF3EzaljMfVgLDYEK...
Saved to data/medical_record.txt


# Index Setup

In [None]:
documents = SimpleDirectoryReader('./data').load_data()
len(documents)

1

In [None]:
# Embedding model: set this explicitly,
# otherwise LlamaIndex will ask you for OpenAI credentials
embed_model = HuggingFaceEmbedding(
    model_name="hkunlp/instructor-large"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [None]:
Settings.llm = llm
Settings.embed_model = embed_model
Settings.num_output = 256
Settings.context_window = 4096
Settings.chunk_size = 512
Settings.chunk_overlap = 64
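
The `chunk_size` and `chunk_overlap` settings control how a document is split before indexing. The sketch below shows the basic idea with a fixed sliding window over a token list. It is a simplification: LlamaIndex's own splitter also takes sentence boundaries and a real tokenizer into account, and `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=64):
    """Split a token list into windows that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


# Stand-in for a tokenized document: 1200 integer "tokens"
tokens = list(range(1200))
chunks = chunk_with_overlap(tokens)
print([len(chunk) for chunk in chunks])
```

The overlap means the end of one chunk is repeated at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.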

In [None]:
vector_index = VectorStoreIndex.from_documents(documents)
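
Conceptually, a vector index embeds each chunk and, at query time, returns the chunks whose embeddings are most similar to the query embedding. The sketch below illustrates that retrieval step with cosine similarity over toy 3-dimensional vectors. The vectors, the helper names, and `k=2` are illustrative assumptions, and non-zero vectors are assumed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" for three chunks and one query
chunk_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]
print(top_k(query_vec, chunk_vecs))
```

The retrieved chunks are then passed to the LLM as context for answering the query.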

In [None]:
print(
    vector_index.as_query_engine(
        llm=llm,
    ).query("Provide a short summary of the patient record of Pamela Rogers")
)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.




Here is a short summary of the patient record of Pamela Rogers:

Pamela Rogers, a 56-year-old woman, was admitted to the emergency
department with a chief complaint of chest pains. She reported
experiencing dull and aching chest pain, which radiates to her neck,
accompanied by shortness of breath. The pain occurs approximately once
a week, usually after working in her garden or engaging in physical
activity. The patient has a history of hypertension, diagnosed 3 years
ago, but has never been told she has heart problems. She does not
smoke or have diabetes. Her physical examination revealed normal vital
signs, no abnormal findings on her skin, HEENT, and neurological
examination, and a grade 2/6 systolic decrescendo murmur in the second
right intercostal space.
