# LangChain + LLM + ChromaDB

In [1]:
# SETUP

!pip3 install transformers
!pip3 install einops
!pip3 install accelerate
!pip3 install unstructured-pytesseract
!pip3 install unstructured-inference
!pip3 install sentence_transformers
!pip3 install chromadb

#!pip3 install protobuf==3.20.*

!pip3 install langchain



In [2]:
!pip3 install unstructured
!pip3 install pillow_heif
#!pip3 install cmake
!pip3 install pikepdf pypdf
#!pip3 install python-poppler

Collecting pikepdf
  Downloading pikepdf-8.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB)
Collecting pypdf
  Downloading pypdf-4.2.0-py3-none-any.whl (290 kB)
Installing collected packages: pypdf, pikepdf
Successfully installed pikepdf-8.15.0 pypdf-4.2.0


In [3]:
# IMPORTS

from transformers import AutoTokenizer
import transformers
import torch

import langchain

langchain.__version__

'0.1.16'

## LLM

In [4]:
# LLM using HuggingFace GPT2

from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    model_kwargs={
        "max_length": 1024,
        'do_sample': True,
        'top_k': 10,
        'num_return_sequences': 2,
        #'device_map': 'auto',
        'trust_remote_code': True,
        'torch_dtype': torch.bfloat16
    },
    pipeline_kwargs={"max_new_tokens": 50},
    device=0, # With GPU
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
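
The `do_sample` and `top_k` values passed above request top-k sampling: at each step only the `k` highest-scoring tokens stay in the running, and one of them is drawn at random in proportion to its probability. The sketch below illustrates the idea in plain Python on a toy score table. The function `top_k_sample` and the example scores are hypothetical, and this is not how `transformers` implements sampling internally.

```python
import math
import random
from typing import Optional


def top_k_sample(logits: dict[str, float], k: int = 10,
                 rng: Optional[random.Random] = None) -> str:
    """Draw one token from the k highest-scoring candidates."""
    if not logits or k < 1:
        raise ValueError("need at least one candidate and k >= 1")
    rng = rng or random.Random()
    # Keep only the k best candidates.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Softmax over the survivors (shifted by the max for numerical stability).
    max_logit = top[0][1]
    weights = [math.exp(score - max_logit) for _, score in top]
    tokens = [token for token, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy scores, invented for illustration only.
toy_logits = {"blade": 5.0, "sword": 4.0, "hammer": -1.0}
print(top_k_sample(toy_logits, k=2, rng=random.Random(0)))
```

With `k=2`, the lowest-scoring candidate can never be chosen; with `k=1`, sampling collapses to always picking the best-scoring token.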


## DOCUMENTS & SPLITTER & VECTORSTORE
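
The indexing cell in this section splits the PDF text with `chunk_size=512` and `chunk_overlap=64`. As a rough illustration of what those two numbers mean, here is a minimal sketch that cuts a string into fixed-size character windows, each sharing its first 64 characters with the end of the previous one. It is a simplification: `CharacterTextSplitter` first splits on a separator and then merges the pieces, so its chunk boundaries will differ. The helper name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 64) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_size < 1 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size >= 1 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Synthetic text, used only to show the window arithmetic.
sample = "".join(str(i % 10) for i in range(1000))
pieces = chunk_text(sample)
print([len(piece) for piece in pieces])
```

The overlap exists so that a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.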

In [5]:
# INDEXING

from langchain_community.document_loaders import OnlinePDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain

# 1. LOAD
loader = OnlinePDFLoader("https://dn720004.ca.archive.org/0/items/bloodborne-collectors-edition-guide/BLOODBORNE%20Collector%E2%80%99s%20Edition%20Guide_text.pdf")
document = loader.load()

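# 2. SPLIT
# chunk_size and chunk_overlap are measured in characters.
text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=64)
documents = text_splitter.split_documents(document)

# 3. EMBED & STORE
# With no arguments, HuggingFaceEmbeddings loads sentence-transformers/all-mpnet-base-v2.
embeddings = HuggingFaceEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)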

# RETRIEVAL & GENERATION
qa = ConversationalRetrievalChain.from_llm(
    llm,
    vectorstore.as_retriever(),
    return_source_documents=True,
)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.



## QUESTION TO LLM = MY_QUERY + CONTEXT_from_VECTORSTORE -> ANSWER FROM LLM
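
The heading above summarizes the retrieval-augmented flow: the query is embedded, the most similar stored chunks are fetched, and both are pasted into one prompt for the LLM. The plain-Python sketch below walks through those steps with hand-made two-dimensional vectors. Several things in it are assumptions made for illustration: the helper names, the toy vectors, the use of cosine similarity (the metric actually used depends on how the vector store is configured), and the `Question:` / `Helpful Answer:` ending of the prompt. Only the opening instruction sentence is taken from the printed output in this notebook.

```python
import math

PROMPT_HEADER = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer."
)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Paste the retrieved chunks and the question into a single prompt."""
    context = "\n\n".join(chunks)
    return f"{PROMPT_HEADER}\n\n{context}\n\nQuestion: {question}\nHelpful Answer:"


# Toy store: (chunk text, hand-made embedding). Real embeddings have hundreds of dimensions.
store = [
    ("The normal mode of Ludwig's Holy Blade is fast but weak.", [0.9, 0.1]),
    ("Special Weakness: Righteous", [0.1, 0.9]),
    ("Item Drops: Twin Bloodstone Shards x1", [0.5, 0.5]),
]
query_vec = [1.0, 0.0]  # stands in for the embedded query
top_chunks = retrieve(query_vec, store, k=2)
prompt = build_prompt("How to get Ludwig's holy sword?", top_chunks)
print(prompt)
```

Because the LLM only sees what retrieval returns, the answer quality is bounded by the quality of the retrieved chunks.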

In [6]:
# QUESTION TO LLM

import warnings
warnings.filterwarnings('ignore')

chat_history = []

# Example query prompt:

my_query = "How to get Ludwig's holy sword?"

result = qa.invoke({"question": my_query, "chat_history": chat_history})
print(result["answer"])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Normal Mode

As mentioned in the Kirkhammer section, the normal mode of Ludwig's Holy Blade is fast but weak. One thing to consider about the R1 chain is that the fourth attack is a thrust. This has a couple of interesting implications. First, it's good damage against enemies that are weak against thrusts. Second, it can miss quite easily against some enemies, particularly large humanoid foes. If you fight without locking-on to the enemy, you'll need to aim that particular attack very carefully.

You'll find this — on a corpse in the Orphanage of the Msc Cathedral Ward p142. Although you won't be able to reach it until very late, picking it up allows you to purchase Twin Bloodstone Shards from the shop. This will allow you to experiment with a wider variety of weapons. The Sedatives it unlocks are also useful, as any hu
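
The printed answer above begins with the prompt instructions and the retrieved context, which suggests that the returned text contains the prompt followed by the model's continuation. One way to keep only the continuation is to cut everything up to the last answer marker. The sketch below assumes the prompt ends with the marker `Helpful Answer:`; both that marker and the helper name `extract_answer` are assumptions, so check the actual prompt template before relying on it.

```python
def extract_answer(generated: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is absent, the input is returned unchanged apart from
    surrounding whitespace.
    """
    _, found, tail = generated.rpartition(marker)
    return tail.strip() if found else generated.strip()


# Made-up generation, shaped like a prompt echo followed by a continuation.
raw = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Some retrieved context.\n\n"
    "Question: Where is the item?\n"
    "Helpful Answer: It is described in the guide."
)
print(extract_answer(raw))
```

In the notebook this would be applied to `result["answer"]` before printing.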

In [7]:
# QUESTION TO LLM

import warnings
warnings.filterwarnings('ignore')

chat_history = []

# Example query prompt:

my_query = "Weakness from father gascoigne"

result = qa.invoke({"question": my_query, "chat_history": chat_history})
print(result["answer"])


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Basic Information | Special Weakness: Righteous

Forsaken Cainhurst Castle | Strong

Item Drops: Twin Bloodstone Shards x1 [28%], Bloodstone Chunk x1 [1.526]

Chalice Dungeon Base Level

Item Drops: Random Nourishing gem based on area x1 [896], Twin Bloodstone Shards x1 [18%], Bloodstone Chunk x1 [1.526]

Basic Information | Item Drops

Oedon Writhe x1 [100%] One Third of Umbilical Cord [10096]*

*Blood Moon only

BLOODBORNE COLLECTOR'S EDITION GUIDE

Basic Information | Special Weakness: Righteous

Cainhurst Castle

[Remaining retrieved context: unreadable OCR fragments from the guide's tables]