## Ingesting PDF

In [1]:

# %pip install -q unstructured langchain
# %pip install -q "unstructured[all-docs]"

In [2]:
from langchain_community.document_loaders import UnstructuredPDFLoader
from langchain_community.document_loaders import OnlinePDFLoader

In [3]:
local_path = 'blade runner 2049.pdf'

# Load the local PDF
if local_path:
    loader = UnstructuredPDFLoader(file_path=local_path)
    data = loader.load()
else:
    print("Upload a PDF file")

detectron2 is not installed. Cannot use the hi_res partitioning strategy. Falling back to partitioning with the fast strategy.
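
The `if local_path:` check above only confirms that the string is non-empty. A minimal sketch of a stricter check, assuming you want to confirm the file exists before calling the loader, could look like the following. The helper name `is_loadable_pdf` is hypothetical and not part of LangChain or unstructured.

```python
from pathlib import Path
import tempfile


def is_loadable_pdf(path_str: str) -> bool:
    """Return True only if path_str names an existing file ending in .pdf."""
    if not path_str:
        return False
    path = Path(path_str)
    return path.is_file() and path.suffix.lower() == ".pdf"


# Demonstrate with a throwaway file instead of a real screenplay PDF.
with tempfile.TemporaryDirectory() as tmp:
    real_pdf = Path(tmp) / "sample.pdf"
    real_pdf.write_bytes(b"%PDF-1.4\n")  # placeholder header bytes only
    found = is_loadable_pdf(str(real_pdf))
    missing = is_loadable_pdf(str(Path(tmp) / "missing.pdf"))

print(found, missing)
```

This only checks the path and extension; it does not verify that the file is a valid PDF.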


In [4]:
# Preview the first loaded Document (with the loader's default mode this may be the whole PDF, not a single page)
data[0].page_content



## Vector Embeddings

In [5]:
# !ollama pull nomic-embed-text

In [6]:
!ollama list

NAME         	ID          	SIZE  	MODIFIED    
llama2:latest	78e26419b446	3.8 GB	3 weeks ago	


In [7]:
# %pip install -q chromadb
# %pip install -q langchain-text-splitters

In [8]:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

In [9]:
# Split the loaded documents into overlapping chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
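
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping fixed-width chunking. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter tries separators such as paragraph and line breaks first); the function `split_with_overlap` is a hypothetical stand-in that only illustrates how consecutive chunks share characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; neighbours share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
```

With `chunk_size=7500` and `chunk_overlap=100` as in the cell above, each chunk would repeat roughly the last 100 characters of its predecessor, which helps keep sentences that straddle a boundary retrievable.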

In [10]:
# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks, 
    embedding=OllamaEmbeddings(model="llama2", show_progress=True),
    collection_name="local-rag"
)

OllamaEmbeddings: 100%|██████████| 20/20 [02:17<00:00,  6.86s/it]
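
Conceptually, a vector store keeps one embedding per chunk and answers a query by returning the chunks whose vectors lie closest to the query vector. The sketch below illustrates that idea with toy three-dimensional vectors and cosine similarity. It is an assumption-laden illustration, not Chroma's implementation: Chroma uses an approximate index, and its distance metric may differ. The names `cosine_similarity`, `top_k`, and `toy_store` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k stored vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up vectors; real embeddings have hundreds or thousands of dimensions.
toy_store = [
    ("chunk about replicants", [0.9, 0.1, 0.0]),
    ("chunk about weather", [0.0, 0.2, 0.9]),
    ("chunk about memories", [0.7, 0.6, 0.1]),
]
nearest = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(nearest)
```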


In [18]:
# import tensorflow as tf

# # Check if GPU is available
# if tf.config.list_physical_devices('GPU'):
#     print("GPU is available")
# else:
#     print("GPU is NOT available")


## Retrieval

In [12]:
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

In [13]:
# LLM from Ollama
local_model = "llama2"
llm = ChatOllama(model=local_model)

In [14]:
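# MultiQueryRetriever treats each line of the LLM's reply as a separate search query,
# so this prompt must ask for question variants rather than for an answer.
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database. By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of distance-based similarity search. Provide these alternative questions separated by newlines.
Original question: {question}""",
)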


In [15]:
retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
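
As a rough mental model of the `MultiQueryRetriever` above: the LLM's reply is split into lines, each non-empty line is run as its own search, and the results are merged without duplicates. The sketch below imitates that flow with plain Python. The search function and index are fake stand-ins (`fake_search`, `fake_index`), and the real class may differ in details such as how it deduplicates documents.

```python
def parse_query_lines(llm_output: str) -> list[str]:
    """Treat every non-empty line of the LLM reply as one search query."""
    return [line.strip() for line in llm_output.splitlines() if line.strip()]


def multi_query_retrieve(llm_output, search):
    """Run each generated query and merge the hits, keeping first-seen order."""
    seen = set()
    merged = []
    for query in parse_query_lines(llm_output):
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical LLM reply and a fake index keyed by the exact query text.
fake_llm_output = "What is the film's theme?\n\nWhich ideas does the movie explore?\n"
fake_index = {
    "What is the film's theme?": ["chunk-1", "chunk-2"],
    "Which ideas does the movie explore?": ["chunk-2", "chunk-3"],
}


def fake_search(query):
    return fake_index.get(query, [])


result = multi_query_retrieve(fake_llm_output, fake_search)
print(result)
```

Because every line becomes a query, a reply with many lines triggers one embedding call per line, which is one plausible reading of the repeated `OllamaEmbeddings` progress bars in the outputs.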

In [16]:
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
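
The first two stages of the chain gather retrieved documents and the user's question, then fill the RAG template. A minimal sketch of that assembly step, assuming you want the chunk texts joined with blank lines, is shown below. `Doc` is a hypothetical stand-in for LangChain's `Document`, and `build_prompt` is not a LangChain function; the real chain passes the retriever's output to `ChatPromptTemplate` directly.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a retrieved document; only the text field is modelled."""
    page_content: str


TEMPLATE = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""


def build_prompt(docs, question):
    """Join the chunk texts and substitute them into the RAG template."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, question=question)


docs = [Doc("K is a blade runner."), Doc("Joi is a hologram.")]
rendered = build_prompt(docs, "Who is K?")
print(rendered)
```

Joining `page_content` explicitly keeps metadata and object wrappers out of the prompt, which can make the context easier for a small local model to read.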

In [17]:
chain.invoke(input(""))

 Explain the theme of the movie?


OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.98s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.41s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:03<00:00,  3.01s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.42s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:03<00:00,  3.01s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.40s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.92s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.44s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.96s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.45s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.92s/it]


'The movie "Blade Runner 2049" explores several themes, including:\n\n1. Humanity and what it means to be alive: The film raises questions about what makes us human, whether it\'s our emotions, memories, or experiences. Joi, the AI hologram girlfriend, is programmed to emulate human emotions and behaviors, but she is not truly alive. K, the blade runner, is tasked with "retiring" advanced androids (replicants) that have developed emotions and a sense of self, which challenges his understanding of what it means to be human.\n2. Identity and self-discovery: The movie delves into the concept of identity and how it\'s shaped by memories, experiences, and societal expectations. K discovers that he may not be a "real" blade runner, but rather a replicant himself, which raises questions about his own identity and purpose in life.\n3. Memory and history: The film explores the role of memory in shaping our understanding of ourselves and our world. Joi is designed to remember and recreate K\'s m

In [19]:
chain.invoke(input(""))

 Who are the characters? 


OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.90s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.42s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.96s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.42s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.96s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.84s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.79s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.79s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.84s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.79s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.78s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.80s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.79s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.78s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [00:02<00:00,  2.43s/it]
OllamaEmbeddings: 100%|██████████| 1/1 [

'Answer: The characters in Blade Runner 2049 are:\n\n1. K (played by Ryan Gosling): A former blade runner who unearths a long-buried secret that could plunge society into chaos.\n2. Rick Deckard (played by Harrison Ford): A retired blade runner who is brought back to the force to track down K and help him find the missing replicant.\n3. Niander Wallace (played by Jared Leto): The CEO of the megacorporation that created the replicants, who has a hidden agenda for his creation.\n4. Luv (played by Sylvia Hoeks): A powerful and mysterious replicant who works for Wallace and is determined to eliminate K.\n5. Joshi (played by David Dastmalchian): A bumbling and awkward bioengineer who helps K in his quest.\n6. The Voice (played by Carl Lumbly): A mysterious figure who provides guidance and information to K throughout the film.'

In [41]:
# Delete the "local-rag" collection from the vector store
vector_db.delete_collection()