
# LlamaIndex Demo - By Seth Steele
---

This is a simple demo of retrieval-augmented generation (RAG) on Llama-2.


## 1. Change to GPU runtime
Click on "Runtime" -> "Change runtime type" and make sure "T4 GPU" is selected (the only GPU available on the free plan).

## 2. Install and log in to the HuggingFace transformers library

The following snippet of code will:
1. Install the transformers and accelerate libraries that we will use to access and run the Llama model.
2. Initiate a login to your HuggingFace account.
3. Install LlamaIndex and the remaining packages needed to run our Llama-2 LLM.

The second step is necessary because, whilst Llama is an open-source model, access to it is still restricted to those who have been approved by Meta. Instructions for getting access to Llama and granting that access to your HuggingFace account can be found here: https://ai.meta.com/llama/get-started/
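
A token pasted directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch of one alternative, the hypothetical helper `resolve_hf_token` below (not part of the original demo) reads the token from the `HF_TOKEN` environment variable and only falls back to an explicitly passed value. Treating a value that starts with "INSERT" as a placeholder is an assumption made for this example.

```python
import os


def resolve_hf_token(explicit_token=None, env_var="HF_TOKEN"):
    """Return a Hugging Face token, preferring the environment over a literal.

    Hypothetical helper for illustration only.
    """
    token = os.environ.get(env_var) or explicit_token
    # Reject a missing token or the unedited placeholder string.
    if not token or token.startswith("INSERT"):
        raise ValueError(
            f"No Hugging Face token found; set {env_var} or pass one explicitly."
        )
    return token
```

With this in place, `hf_token = resolve_hf_token()` could replace the hard-coded string in the cell below.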


In [None]:
hf_token = "INSERT HUGGING FACE KEY HERE"

# Log in with the token defined above (IPython expands $hf_token in shell commands).
!huggingface-cli login --token $hf_token

# Libraries for loading, quantizing and fine-tuning the model.
!pip3 install transformers
!pip3 install accelerate
!pip3 install bitsandbytes
!pip3 install datasets
!pip3 install peft
!pip3 install trl

# LlamaIndex and the integrations used in this demo.
!pip3 install llama-index
!pip3 install llama-index-llms-anthropic
!pip3 install llama-index-llms-huggingface
!pip3 install llama-index-embeddings-huggingface
!pip3 install llama-index-readers-file

import os
from pathlib import Path

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, pipeline, logging
from trl import SFTTrainer

from llama_index.core import PromptTemplate
from llama_index.core import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.llms.anthropic import Anthropic
from llama_index.readers.file import XMLReader

# Mount Google Drive so the XML documents are available under /content/drive.
from google.colab import drive
drive.mount('/content/drive')

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
Collecting llama-index-llms-anthropic
  Downloading llama_index_llms_anthropic-0.1.5-py3-none-any.whl (4.4 kB)
Collecting anthropic<0.18.0,>=0.17.0 (from llama-index-llms-anthropic)
  Downloading anthropic-0.17.0-py3-none-any.whl (848 kB)
Collecting llama-index-core<0.11.0,>=0.10.1 (from llama-index-llms-anthropic)
  Downloading llama_index_core-0.10.17-py3-none-any.whl (15.3 MB)
Collecting httpx<1,>=0.23.0 (from anthropic<0.18.0,>=0.17.0->llama-index-llms-anthropic)
  Downloading httpx-0.27.0-py3-no

**Note** - after installing the accelerate library, you may have to restart the runtime (click "Runtime" -> "Restart runtime") before the subsequent code will run.

## 3. Set up the LLM

These settings load the 7-billion-parameter chat model of Llama-2 with 4-bit (NF4) quantization, so that it fits in the memory of the T4 GPU.
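
To see why 4-bit loading matters on a T4 (16 GB of memory), here is a rough back-of-the-envelope sketch. It counts the weights only and uses the nominal figure of 7 billion parameters. It ignores quantization constants, activations, the key/value cache and CUDA overhead, so real usage will be higher. The function name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # nominal "7 billion" parameters; an approximation

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # half-precision weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"float16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:   ~{nf4_gb:.1f} GB")
```

Under these assumptions the half-precision weights alone come to roughly 14 GB, leaving almost no headroom on the T4, whereas the 4-bit weights need about a quarter of that.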

In [None]:
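# Run the 4-bit matrix multiplications in half precision.
compute_dtype = torch.float16

baseModel = "meta-llama/Llama-2-7b-chat-hf"

# Load the weights as 4-bit NF4 with double quantization to reduce GPU memory use.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = AutoModelForCausalLM.from_pretrained(
    baseModel,
    quantization_config=quant_config,
    device_map={"": 0},  # place the whole model on GPU 0
)

llm.config.use_cache = False  # the key/value cache is not needed while training
llm.config.pretraining_tp = 1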


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
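# Dataset used for fine-tuning, and the name under which the result is saved.
datasetName = "ApoAlquaary/sau_university"
dataset = load_dataset(datasetName, split="train")

new_model = "llama-2-7b-chat-academy-test"

# Llama-2 has no padding token, so reuse the end-of-sequence token.
tokenizer = AutoTokenizer.from_pretrained(baseModel, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# LoRA adapter settings. The base model is a causal LM, so the task type is CAUSAL_LM.
peft_params = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)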

training_params = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_steps=25,
    logging_steps=25,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant",
    report_to="tensorboard"
)

trainer = SFTTrainer(
    model=llm,
    train_dataset=dataset,
    peft_config=peft_params,
    dataset_text_field="text",
    max_seq_length=None,
    tokenizer=tokenizer,
    args=training_params,
    packing=False,
)

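# Run the fine-tuning; without this call the adapter saved below would be untrained.
trainer.train()

trainer.model.save_pretrained(new_model)
trainer.tokenizer.save_pretrained(new_model)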

# Wrap the saved model for LlamaIndex, using Llama-2's [INST] chat format for queries.
HFllm = HuggingFaceLLM(
    model_name=new_model,
    tokenizer_name=new_model,
    query_wrapper_prompt=PromptTemplate("<s> [INST] {query_str} [/INST] "),
    context_window=3900,
    model_kwargs={"token": hf_token, "quantization_config": quant_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)

# Bundle the LLM with a local embedding model for indexing and querying.
service_context = ServiceContext.from_defaults(llm=HFllm, embed_model="local:BAAI/bge-small-en-v1.5")
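
The `query_wrapper_prompt` passed to `HuggingFaceLLM` above wraps every query in Llama-2's `[INST]` chat tags before it reaches the model. The sketch below shows only the substitution step, using plain `str.format` on the same template string. It is not how LlamaIndex's `PromptTemplate` is implemented, and the helper `wrap_query` is a hypothetical name.

```python
# Same template string as in the cell above.
QUERY_WRAPPER = "<s> [INST] {query_str} [/INST] "


def wrap_query(query_str):
    """Insert the user's query into the Llama-2 instruction template."""
    return QUERY_WRAPPER.format(query_str=query_str)


wrapped = wrap_query("What is RAG?")
print(repr(wrapped))  # prints '<s> [INST] What is RAG? [/INST] '
```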


## 4. Load the data and build an index

The following code creates an index over the XML documents in our test database on Google Drive.

Consult the publicatons.xlsx file for more information on the data in the index.
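
For intuition about what a vector index does, here is a toy sketch: each document is turned into a vector, and a query is answered by returning the documents whose vectors are most similar to the query's vector. The sketch uses word counts and cosine similarity as a stand-in for a neural embedding model. The example documents and all function names are hypothetical, and this is not how `VectorStoreIndex` is implemented.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Store each document alongside its vector."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(index, query, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


# Hypothetical documents, for illustration only.
docs = [
    "connectionist simulation of attitude learning",
    "a survey of coral reef ecology",
    "asymmetries in positive and negative evaluations",
]
toy_index = build_index(docs)
top = retrieve(toy_index, "attitude learning simulation", top_k=1)
print(top)
```

In the real pipeline the retrieved text is then passed to the LLM as context for answering the query.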

In [None]:
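import os

def absoluteFilePaths(directory):
    """Return the absolute path of every file under `directory`, recursively."""
    files = []
    for dirpath, _, filenames in os.walk(directory):
        for f in filenames:
            files.append(os.path.abspath(os.path.join(dirpath, f)))
    return files

XMLfiles = absoluteFilePaths("/content/drive/Shareddrives/Darwin Team E/xml")

loader = XMLReader()
documentsXML = []
for file in XMLfiles:
    # Accumulate the documents from every file; plain assignment would keep only the last file.
    documentsXML.extend(loader.load_data(file=Path(file)))

# Embed the documents and build the vector index.
index = VectorStoreIndex.from_documents(documentsXML, service_context=service_context)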

['/content/drive/Shareddrives/Darwin Team E/xml/1699.xml', '/content/drive/Shareddrives/Darwin Team E/xml/5399.xml', '/content/drive/Shareddrives/Darwin Team E/xml/190.xml', '/content/drive/Shareddrives/Darwin Team E/xml/297.xml', '/content/drive/Shareddrives/Darwin Team E/xml/3611.xml', '/content/drive/Shareddrives/Darwin Team E/xml/2358.xml', '/content/drive/Shareddrives/Darwin Team E/xml/1955.xml', '/content/drive/Shareddrives/Darwin Team E/xml/2633.xml', '/content/drive/Shareddrives/Darwin Team E/xml/3466.xml', '/content/drive/Shareddrives/Darwin Team E/xml/5469.xml', '/content/drive/Shareddrives/Darwin Team E/xml/4006.xml', '/content/drive/Shareddrives/Darwin Team E/xml/313.xml', '/content/drive/Shareddrives/Darwin Team E/xml/3385.xml', '/content/drive/Shareddrives/Darwin Team E/xml/4932.xml', '/content/drive/Shareddrives/Darwin Team E/xml/4676.xml', '/content/drive/Shareddrives/Darwin Team E/xml/3726.xml', '/content/drive/Shareddrives/Darwin Team E/xml/5762.xml', '/content/drive/

## 5. Use the model to respond to a query
In this section we write out our query and then get the model to respond.

The following line simply sets our query; change it to whatever you would like to ask the model.

In [None]:
prompt = "Tell me about a paper titled Connectionist simulation of attitude learning: Asymmetries in the acquisition of positive and negative evaluations by JR Eiser?"

These final lines of code then generate a response, first with a query engine and then with a chat engine.

In [None]:
# One-off question answering over the index.
query_engine = index.as_query_engine(verbose=True)
response = query_engine.query(prompt)
print(response)

# The same question through the conversational chat engine.
chat_engine = index.as_chat_engine(verbose=True)
response = chat_engine.chat(prompt)
print(response)



Based on the provided context information, the paper titled "Connectionist simulation of attitude learning: Asymmetries in the acquisition of positive and negative evaluations" by JR Eiser is associated with the following information:

* Title: Connectionist simulation of attitude learning: Asymmetries in the acquisition of positive and negative evaluations
* Author: JR Eiser
* Journal: Journal of Experimental Psychology: Learning, Memory, and Cognition
* Year: 1998
* Volume: 24
* Issue: 3
* Page range: 659-665
* Publication date: 2004
* Publication status: Published
* Record made publicly available: March 23, 2005
* Location: Netherlands
* Language: eng
* Pagination: 659-665
* Keywords: Attitude learning, connectionism, evaluation, positive, negative, asymmetries

The paper discusses the use of connectionist models to simulate the acquisition of positive and negative evaluations, and how these models can help to explain the asymmetries in the acquisition of
Thought: I n

KeyError: 'tool'