## Data ingestion steps
1. Load the PDF
2. Split the PDF into chunks

In [2]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader('./files/computer-vision.pdf')

In [3]:
docs = loader.load()
print(docs[0].page_content)

arXiv:2506.03150v1  [cs.CV]  3 Jun 2025
IllumiCraft: Unified Geometry and Illumination
Diffusion for Controllable Video Generation
Yuanze Lin♣ Yi-Wen Chen♮ Yi-Hsuan Tsai♢
Ronald Clark♣ Ming-Hsuan Yang♠♡
♣University of Oxford ♠UC Merced ♮NEC Labs America ♢Atmanity Inc. ♡Google DeepMind
Project Page: https://yuanze-lin.me/IllumiCraft_page
Input  Video          Background                                      Relit  Video  Frames        
“...,  strong lighting”
“..., cool-blue spotlights pierce mist, moody depth”
“..., natural light”
“..., bright white spotlights cut through grey haze, stark contrast”
Figure 1: Given a prompt and input video, IllumiCraft edits scene illumination conditioned on the
static background image. It handles a variety of illumination scenarios, including spotlight effects.
Abstract
Although diffusion-based models can generate high-quality and high-resolution
video sequences from textual or image inputs, they lack explicit integration of geo-
metric cues when contro

In [4]:
print(len(docs))

25


In [5]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)

final_docs = text_splitter.split_documents(docs)
print(len(final_docs))

130
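
How `chunk_size` and `chunk_overlap` interact can be sketched with a plain sliding window over characters. This is a simplified illustration, not what `RecursiveCharacterTextSplitter` does internally: the real splitter also tries to break on separators such as paragraphs and line breaks, which this sketch ignores. The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Neighbouring windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one. That overlap keeps a sentence that straddles a boundary at least partly present in both chunks.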


## Create embeddings

In [6]:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small", dimensions=1024)

# Sanity check: embed one chunk and confirm the vector has 1024 dimensions.
doc_embedding = embeddings.embed_query(final_docs[0].page_content)
print(len(doc_embedding))

1024


## Adding the data to the vector store
Two parts of a vector store matter here:
1. The index, which stores the embedding vectors
2. Similarity search, which finds the stored vectors closest to a query vector
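
To make the two ideas concrete, here is a toy index in pure Python that stores vectors and does an exact, brute-force search by squared L2 distance. It is only a sketch of the concept behind a flat L2 index. The class name `FlatL2Index` and its methods are hypothetical and are not the FAISS API.

```python
class FlatL2Index:
    """Toy exact index: keeps every vector and scans all of them on each search."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vector: list[float]) -> int:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.vectors.append(list(vector))
        return len(self.vectors) - 1  # position of the vector in the index

    def search(self, query: list[float], k: int) -> list[tuple[int, float]]:
        """Return (position, squared L2 distance) for the k nearest vectors."""
        if len(query) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(query)}")
        scored = [
            (i, sum((a - b) ** 2 for a, b in zip(vec, query)))
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1])  # smallest distance first
        return scored[:k]


toy_index = FlatL2Index(dim=2)
for vec in ([0.0, 0.0], [1.0, 1.0], [5.0, 5.0]):
    toy_index.add(vec)

hits = toy_index.search([0.9, 1.2], k=2)
print([position for position, _ in hits])
# [1, 0]
```

The search returns integer positions, not documents. A vector store therefore also needs a mapping from index position to a stored document.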

In [7]:
import faiss
from langchain_community.vectorstores import FAISS
from langchain_community.docstore.in_memory import InMemoryDocstore

# Exact (brute-force) L2 index; the dimension must match the embedding size (1024).
index = faiss.IndexFlatL2(1024)


In [8]:
vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={}
)

In [9]:
add_docs_result = vector_store.add_documents(final_docs)
print(len(add_docs_result))

130


In [10]:
# Maps each FAISS index position (0..129) to the id of a document in the docstore.
docstore_ids = vector_store.index_to_docstore_id
print(docstore_ids.keys())

dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129])


In [11]:
search_results = vector_store.similarity_search("A dark wooden desk", k=3)
print(search_results)

for doc in search_results:
    print(doc)
    print('\n\n')

[Document(id='cab2ea4d-7cbf-400a-84c6-0e57df169268', metadata={'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:74238ef)', 'creationdate': '', 'author': 'Yuanze Lin; Yi-Wen Chen; Yi-Hsuan Tsai; Ronald Clark; Ming-Hsuan Yang', 'doi': 'https://doi.org/10.48550/arXiv.2506.03150', 'license': 'http://arxiv.org/licenses/nonexclusive-distrib/1.0/', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'title': 'IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation', 'trapped': '/False', 'arxivid': 'https://arxiv.org/abs/2506.03150v1', 'source': './files/computer-vision.pdf', 'total_pages': 25, 'page': 21, 'page_label': '22'}, page_content='“Adark wooden desk, …, soft-edged white beams overlapping on a cyan backdrop”\n“A stationary airplane wing, …, crisp bright spotlights, minimal, neutral-gallery atmosphere”\nOurs            Background       Input Video                    Ours          

In [12]:
retriever = vector_store.as_retriever(search_kwargs={"k": 3})

In [13]:
retrieved_results = retriever.invoke("A dark wooden desk")
# print(retrieved_results)

for doc in retrieved_results:
    print(doc)
    print('\n\n')

page_content='“Adark wooden desk, …, soft-edged white beams overlapping on a cyan backdrop”
“A stationary airplane wing, …, crisp bright spotlights, minimal, neutral-gallery atmosphere”
Ours            Background       Input Video                    Ours            Background       Input Video
Figure 13: Visual results of IllumiCraft. Our method produces high-fidelity, prompt-aligned videos
that adapt to diverse lighting conditions, including dramatic spotlight effects.
22' metadata={'producer': 'pikepdf 8.15.1', 'creator': 'arXiv GenPDF (tex2pdf:74238ef)', 'creationdate': '', 'author': 'Yuanze Lin; Yi-Wen Chen; Yi-Hsuan Tsai; Ronald Clark; Ming-Hsuan Yang', 'doi': 'https://doi.org/10.48550/arXiv.2506.03150', 'license': 'http://arxiv.org/licenses/nonexclusive-distrib/1.0/', 'ptex.fullbanner': 'This is pdfTeX, Version 3.141592653-2.6-1.40.25 (TeX Live 2023) kpathsea version 6.3.5', 'title': 'IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation', 'tr

## Data retrieval pipeline
1. User query should retrieve the context from the vector store
2. Update the prompt with the query and context
3. Send the input to the LLM and generate the output.
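
The three steps above can be sketched end to end without any external service. Everything in this block is a stand-in chosen for illustration: `retrieve` ranks by word overlap instead of embeddings, `fake_llm` does not call a model, and the three-sentence corpus is made up for the example.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Stand-in retriever: rank chunks by how many query words they contain."""
    words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda chunk: len(words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Fill the question and the retrieved context into one prompt string."""
    context = "\n\n".join(context_chunks)
    return f"Question: {question}\nContext: {context}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Stand-in for the model call; it only reports the prompt size."""
    return f"(model reply to a {len(prompt)}-character prompt)"


corpus = [
    "IllumiCraft edits scene illumination in videos.",
    "FAISS stores embedding vectors in an index.",
    "Chunks are created with a text splitter.",
]
question = "How does IllumiCraft change illumination?"

context_chunks = retrieve(question, corpus, k=1)        # step 1: retrieve context
final_prompt = build_prompt(question, context_chunks)   # step 2: update the prompt
answer = fake_llm(final_prompt)                         # step 3: call the model
print(final_prompt)
print(answer)
```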

In [14]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [15]:
from dotenv import load_dotenv
import os
load_dotenv()

True

In [16]:
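# load_dotenv() has already copied GROQ_API_KEY from .env into os.environ.
# Fail early with a clear message if it is missing.
if not os.getenv('GROQ_API_KEY'):
    raise RuntimeError('GROQ_API_KEY is not set; add it to your .env file')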

In [17]:
from langchain_groq import ChatGroq
llm = ChatGroq(model='qwen-qwq-32b')

In [18]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

prompt = ChatPromptTemplate.from_messages([
    # Instructions go in the system message.
    ('system',
     "You are an assistant for question-answering tasks. "
     "Use the following pieces of retrieved context to answer the question. "
     "If you don't know the answer, just say that you don't know. "
     "Use three sentences maximum and keep the answer concise."),
    # The question and the retrieved context are filled in per request.
    ('human', "Question: {question}\nContext: {context}\nAnswer:"),
])


In [19]:
rag_chain = (
    { "context": retriever | format_docs, "question": RunnablePassthrough() }
    | prompt
    | llm
    | StrOutputParser()
)

In [20]:
from pprint import pprint
llm_out = rag_chain.invoke("Who is Apoorva Shukla")
pprint(llm_out)


('\n'
 '<think>\n'
 "Okay, let's see. The question is asking about who Apoorva Shukla is. The "
 'user provided some context from academic papers, but looking through the '
 "entries, I don't see any mention of Apoorva Shukla in the listed references. "
 'The authors listed include names like Nam, Lee, Gutierrez, Kim, Zhang, Luan, '
 "Wang, Bala, Snavely, and others, but no Shukla. Maybe there's a typo or "
 "maybe the context provided doesn't include their work. Since none of the "
 "cited papers mention Apoorva Shukla, I can't find any information here. I "
 "should inform the user that the answer isn't available with the given "
 'context.\n'
 '</think>\n'
 '\n'
 "I don't know. The provided context does not mention Apoorva Shukla, and none "
 'of the cited papers or authors listed here include them.')
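
The output above includes the model's `<think>...</think>` reasoning ahead of the actual answer. One possible way to keep only the answer is a small post-processing step, sketched below with a regular expression. The helper name `strip_reasoning` is hypothetical, and the sketch assumes the reasoning is always wrapped in a matching pair of `<think>` tags.

```python
import re

# Non-greedy match so that several <think> blocks are removed independently.
THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Remove <think>...</think> spans and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "\n<think>\nThe context does not mention this person.\n</think>\n\nI don't know."
print(strip_reasoning(raw))
# I don't know.
```

A function like this could plausibly be piped after `StrOutputParser()` in the chain, on the assumption that the installed LangChain version coerces plain functions into runnables.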

