Log in to Hugging Face so we can access Llama 2! (very exciting) Llama 2 is a gated model, so the account behind the token must already have been granted access to it.

In [24]:
import os
from dotenv import load_dotenv
load_dotenv()

from huggingface_hub import login
login(token=os.getenv("TOKEN"))

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to C:\Users\jerii\.cache\huggingface\token
Login successful
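
`os.getenv("TOKEN")` returns `None` when the variable is missing, and `login(token=None)` then falls back to an interactive prompt instead of failing right away. A minimal sketch of a guard, assuming the same `TOKEN` variable name used in the `.env` file; `require_env` is a hypothetical helper, not part of `huggingface_hub`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail loudly if it is unset or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Usage in the notebook (assumes load_dotenv() has already run):
# login(token=require_env("TOKEN"))
```

This way a missing `.env` entry shows up as a clear error in the cell rather than a login prompt.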


Make sure PyTorch can see the GPU, so the model can run on CUDA instead of the CPU!

In [2]:
import torch
print(torch.__version__)
print(torch.cuda.is_available())

2.2.2+cu118
True
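
The `+cu118` suffix means this is a PyTorch build compiled against CUDA 11.8, and `True` means a CUDA device was found. Since the quantization settings later depend on how much VRAM the card has, it can help to print that too. A small sketch; `describe_device` is a hypothetical helper, and it only looks at device 0:

```python
def describe_device() -> str:
    """Describe the device PyTorch would use, without assuming torch or CUDA is present."""
    try:
        import torch
    except ImportError:
        return "cpu (torch is not installed)"
    if not torch.cuda.is_available():
        return "cpu (CUDA is not available)"
    props = torch.cuda.get_device_properties(0)
    vram_gib = props.total_memory / 1024**3
    return f"cuda:0 {props.name} ({vram_gib:.1f} GiB VRAM)"


print(describe_device())
```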


Load the LLM and wrap every query in the Llama 2 chat prompt template.

In [3]:
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core import PromptTemplate

SYSTEM_PROMPT = """You are an AI assistant that answers questions in a friendly manner, based on the given source documents. Here are some rules you always follow:
- Generate human readable output, avoid creating output with gibberish text.
- Generate only the requested output, don't include any other language before or after the requested output.
- Never say thank you, that you are happy to help, that you are an AI agent, etc. Just answer directly.
- Generate professional language typically used in business documents in North America.
- Never generate offensive or foul language.
"""

model = "meta-llama/Llama-2-7b-chat-hf"
query_wrapper_prompt = PromptTemplate(
    "[INST]<<SYS>>\n" + SYSTEM_PROMPT + "<</SYS>>\n\n{query_str}[/INST] "
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=model,
    model_name=model,
    device_map="auto",
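    # change these settings below depending on your GPU.
    # note: passing `load_in_4bit` like this is deprecated; newer transformers versions want
    # "quantization_config": BitsAndBytesConfig(load_in_4bit=True) in its place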
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
model.safetensors.index.json: 100%|██████████| 26.8k/26.8k [00:00<00:00, 26.8MB/s]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
model-00001-of-00002.safetensors: 100%|██████████| 9.98G/9.98G [04:24<00:00, 37.8MB/s]
model-00002-of-00002.safetensors: 100%|██████████| 3.50G/3.50G [01:28<00:00, 39.4MB/s]
Downloading shards: 100%|██████████| 2/2 [05:53<00:00, 176.99s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:10<00:00,  5.05s/it]
generation_config.json: 100%|██████████| 188/188 [00:00<00:00, 188kB/s]
tokenizer_config.json: 100%|██████████| 1.62k/1.62k [00:00<?, ?B/s]
token
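
The download log above also hints at why 4-bit loading matters. The two shards add up to 9.98 GB + 3.50 GB = 13.48 GB, and assuming the checkpoint is stored in float16 (2 bytes per weight), that is roughly 6.74 billion weights. A back-of-the-envelope sketch of the weight size at other precisions; it ignores quantization overhead, activations, and the KV cache, so real usage will be higher. `estimate_weight_gb` is a hypothetical helper:

```python
# Shard sizes reported by the download log, in GB (10**9 bytes).
SHARD_SIZES_GB = [9.98, 3.50]
BYTES_PER_FP16_WEIGHT = 2  # assumption: the checkpoint is stored in float16


def estimate_weight_gb(num_weights: float, bits_per_weight: int) -> float:
    """Rough size of the weights alone, in GB, at the given precision."""
    return num_weights * bits_per_weight / 8 / 1e9


checkpoint_gb = sum(SHARD_SIZES_GB)
num_weights = checkpoint_gb * 1e9 / BYTES_PER_FP16_WEIGHT

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: ~{estimate_weight_gb(num_weights, bits):.2f} GB")
```

So the weights alone need about 13.5 GB in float16 but only about 3.4 GB at 4 bits, which is what lets the model fit on a smaller consumer GPU.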

### loading documents

Load documents using `SimpleDirectoryReader` ([docs](https://docs.llamaindex.ai/en/stable/examples/data_connectors/simple_directory_reader/)).

This is pretty much all that's needed.

In [4]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_dir="./data")

Check that the file actually loaded. I'm using D&D 5e, which is 180 pages, and the reader returns one document per page here, so `len(dnd_docs)` should return 180. :D

In [6]:
dnd_docs = reader.load_data()
len(dnd_docs)

180
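
A count of 180 only tells us the total. If `./data` ever holds more than one file, a per-file breakdown is a better check. A small sketch, assuming each document carries a `file_name` key in its metadata (treat the key name as an assumption and inspect `dnd_docs[0].metadata` to confirm); `pages_per_file` is a hypothetical helper:

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_file(docs) -> Counter:
    """Count how many documents came from each source file, using each document's metadata."""
    return Counter(doc.metadata.get("file_name", "<unknown>") for doc in docs)


# Stand-in objects so the sketch runs on its own; in the notebook, pass dnd_docs instead.
fake_docs = [
    SimpleNamespace(metadata={"file_name": "rules.pdf", "page_label": str(page)})
    for page in range(1, 4)
]
print(pages_per_file(fake_docs))
```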

We need an embedding model before indexing.

In [8]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

modules.json: 100%|██████████| 349/349 [00:00<00:00, 348kB/s]
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
config_sentence_transformers.json: 100%|██████████| 124/124 [00:00<?, ?B/s] 
README.md: 100%|██████████| 94.8k/94.8k [00:00<00:00, 917kB/s]
sentence_bert_config.json: 100%|██████████| 52.0/52.0 [00:00<?, ?B/s]
config.json: 100%|██████████| 743/743 [00:00<?, ?B/s] 
model.safetensors: 100%|██████████| 133M/133M [00:04<00:00, 27.8MB/s] 
tokenizer_config.json: 100%|██████████| 366/366 [00:00<?, ?B/s] 
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 2.35MB/s]
tokenizer.json: 100%|██████████| 711k/711k [00:00<00:00, 4.42MB/s]
special_tokens_map.json: 100%|██████████| 125/125 [00:00<?, ?B/s] 
1_Pooling/config.json: 100%|██████████| 190/190 [00:00<00:00, 189kB/s]


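We use LlamaIndex's `Settings` to set our LLM and embedding model globally. *(look into this!)* And yes, we need embeddings to index: `VectorStoreIndex` embeds every chunk of text, so that at query time it can embed the question and retrieve the chunks closest to it.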

In [9]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
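
To make the "why embeddings?" part concrete, here is a toy sketch of similarity-based retrieval in plain Python. It is an illustration of the idea only, not how LlamaIndex implements it, and the vectors are made up; `cosine_similarity` and `top_k` are hypothetical helpers:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict, k: int = 2) -> list[str]:
    """Return the k chunk ids whose vectors are most similar to the query vector."""
    ranked = sorted(
        chunk_vecs,
        key=lambda cid: cosine_similarity(query_vec, chunk_vecs[cid]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional vectors; bge-small-en-v1.5 actually produces 384 dimensions.
chunks = {
    "classes": [0.9, 0.1, 0.0],
    "spells": [0.1, 0.9, 0.1],
    "equipment": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.0]
print(top_k(query, chunks))
```

The query vector points mostly in the same direction as the "classes" chunk, so that chunk ranks first and would be handed to the LLM as context.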

### indexing

In [10]:
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(dnd_docs, show_progress=True)

Parsing nodes: 100%|██████████| 180/180 [00:00<00:00, 241.30it/s]
Generating embeddings: 100%|██████████| 320/320 [00:04<00:00, 68.89it/s]
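
Note the numbers in the progress bars: 180 documents went in, but 320 embeddings came out. Before embedding, LlamaIndex splits each document into smaller chunks (nodes), so a long page can produce more than one. A toy sketch of fixed-size chunking with overlap; it splits on whitespace words and ignores tokens and sentence boundaries, so it is a simplified stand-in and not LlamaIndex's splitter. `chunk_words` and its parameters are hypothetical:

```python
def chunk_words(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


page = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(page, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.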


### query

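Query the index. The query engine retrieves the most relevant chunks and passes them to the LLM along with the question.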

In [11]:
query_engine = index.as_query_engine()

In [22]:
response = query_engine.query("What are the classes and subclasses a player can play as?")

In [23]:
print(response)

Based on the provided context and rules, a player can play as the following classes and subclasses:

1. Cleric: A priestly champion who wields divine magic in service of a higher power. Subclasses include:
	* Channel Divinity: A cleric who can channel divine energy directly from their deity.
	* Life: A cleric who specializes in healing and restoration magic.
	* Light: A cleric who specializes in protective and beneficial magic.
2. Fighter: A master of martial combat, skilled with a variety of weapons and armor. Subclasses include:
	* Battle Master: A fighter who excels in combat maneuvers and tactical thinking.
	* Frontline Fighter: A fighter who specializes in melee combat and taking damage for their allies.
	* Shield Master: A fighter who specializes in defensive magic and shields.
3. Rogue: A scoundrel who uses stealth and trickery to overcome obstacles and enemies. Subclasses include:
	* Assassin: A rogue who specializes in killing and stealth.
	* Th
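
Two things stand out in the answer above. It stops mid-word ("Th"), which is what running into `max_new_tokens=256` looks like, and some of the listed subclasses do not look right (Channel Divinity is a cleric class feature, not a subclass). The model only sees a handful of retrieved chunks, so it is worth checking what it was actually given. A small sketch, assuming the response exposes `source_nodes` with a `score` and a `node` each, and that the nodes carry a `page_label` metadata key; `summarize_sources` is a hypothetical helper:

```python
from types import SimpleNamespace


def summarize_sources(response, preview_chars: int = 80) -> list[tuple]:
    """Return (score, page_label, text preview) for each retrieved chunk behind a response."""
    rows = []
    for hit in response.source_nodes:
        text = hit.node.get_content()
        rows.append((hit.score, hit.node.metadata.get("page_label"), text[:preview_chars]))
    return rows


# Stand-in response so the sketch runs on its own; in the notebook, pass `response` instead.
fake_node = SimpleNamespace(metadata={"page_label": "12"}, get_content=lambda: "Example chunk text")
fake_response = SimpleNamespace(source_nodes=[SimpleNamespace(score=0.71, node=fake_node)])
for row in summarize_sources(fake_response):
    print(row)
```

If the retrieved pages do not cover the class list, raising `similarity_top_k` in `index.as_query_engine(...)` or raising `max_new_tokens` are the first knobs to try.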
