In [87]:
!pip install google-generativeai langchain langchain-google-genai



In [88]:
!pip install langchain-community pypdf
!pip install -qU langchain-chroma



# Loading documents

In [89]:
from langchain_community.document_loaders import PyPDFLoader

file_path = '/content/World_War_1_The_Great_War_and_its_Impact_OA_edition.pdf'
loader = PyPDFLoader(file_path)
docs = loader.load()  # one Document per PDF page
len(docs)

348

In [90]:
print(f"{docs[0].page_content[:200]}\n")
print(docs[0].metadata)

 
  
 
Aalborg Universitet
World War 1
The Great War and its Impact
Dosenrode, Søren
Creative Commons License
CC BY-NC-ND 4.0
Publication date:
2018
Document Version
Publisher's PDF, also known as Ver

{'source': '/content/World_War_1_The_Great_War_and_its_Impact_OA_edition.pdf', 'page': 0, 'page_label': '1'}


# Splitting

In [91]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

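text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # maximum characters per chunk
    chunk_overlap=200,     # characters shared between consecutive chunks
    add_start_index=True,  # store each chunk's character offset in its metadata
    # Separators are matched literally unless is_separator_regex=True is passed,
    # so the lookbehind entry is treated as plain text, not as a sentence-boundary regex.
    separators=["\n\n", "\n", r"(?<=\. )", " ", ""],
)

all_splits = text_splitter.split_documents(docs)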
len(all_splits)

1055
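
To make `chunk_size`, `chunk_overlap`, and the recorded start index concrete, here is a minimal sketch of a fixed-size sliding window. It is a simplification, not LangChain's algorithm: `RecursiveCharacterTextSplitter` also tries to break on the listed separators, which this sketch ignores. The helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[tuple[int, str]]:
    """Return (start_index, chunk) pairs from a fixed-size sliding window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for start, chunk in chunks:
    print(start, chunk)
```

Each chunk repeats the last four characters of the previous one, which is the role the 200-character overlap plays above: a sentence cut at a chunk boundary still appears whole in at least one chunk.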

# Embeddings

In [92]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

import os

# Initialize the embeddings model; read the API key from the environment
# rather than hardcoding a secret in the notebook.
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key=os.environ["GOOGLE_API_KEY"],
)

In [93]:
vector_1 = embeddings.embed_query(all_splits[0].page_content)
vector_2 = embeddings.embed_query(all_splits[1].page_content)

assert len(vector_1) == len(vector_2)
print(f"Generated vectors of length {len(vector_1)}\n")
print(vector_1[:10])

Generated vectors of length 768

[0.015674669295549393, -0.07411897927522659, 0.0036870783660560846, -0.015573435463011265, 0.0753054991364479, 0.04701641947031021, 0.057334691286087036, -0.02546210028231144, 0.022449776530265808, 0.028969120234251022]


In [94]:
from langchain_chroma import Chroma

vector_store = Chroma(embedding_function=embeddings)

Having instantiated our vector store, we can now index the documents.

In [95]:
ids = vector_store.add_documents(documents=all_splits)

# Usage

Embeddings typically represent text as a "dense" vector such that texts with similar meanings are geometrically close. This lets us retrieve relevant passages just by passing in a question, without knowing the specific key terms used in the document.
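
As a rough illustration of "geometrically close", the sketch below ranks a few texts by cosine similarity to a query vector. The three-dimensional vectors are made up for the example (real embeddings here have 768 dimensions), and `cosine_similarity` is a hypothetical helper, not part of LangChain or Chroma.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
toy_vectors = {
    "causes of the war": [0.9, 0.1, 0.0],
    "origins of the conflict": [0.8, 0.2, 0.1],
    "recipe for apple pie": [0.0, 0.1, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

# Sort texts from most to least similar to the query.
ranked = sorted(
    toy_vectors,
    key=lambda text: cosine_similarity(query_vector, toy_vectors[text]),
    reverse=True,
)
for text in ranked:
    print(f"{cosine_similarity(query_vector, toy_vectors[text]):.3f}  {text}")
```

The two war-related texts rank above the unrelated one even though they share few words, which is the behavior the retrieval step relies on.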

Return documents based on similarity to a string query:

In [99]:
results = vector_store.similarity_search("War")

print(results[0])

page_content='1914-1919 (pp. 135-163). Cambridge, UK: Cambridge University Press.
Leach, Edmund (2000). “The nature of war.” In S. Hugh-Jones and J. 
Laidlaw (Eds.), The essential Edmund Leach. Vol. 1. Anthropology  
and society (pp. 343-357). New Haven, Ct.: Yale University Press 
[original paper in 1965]
Lebow, Richard N. (2014). “What have we learned from World War I?” 
International Relations, 28, 2, 245-250.
Lewin, Kurt. (1917). „Kriegeslandschaft“. Zeitschrift für angewandte 
Psychologie, 12, 440-447.
Liu, James. H., and Sibley, Chris G. (2015). “Representations of world his-
tory”. In Gordon Sammut, Elena Andreouli, George Gaskell and Jaan 
Valsiner (Eds.), The Cambridge Handbook of Social Representations 
(pp. 269-279), Cambridge, UK: Cambridge University Press.
Macek, Ivana (2000). War within: everyday life in Sarajevo under siege. 
Uppsala Studies in Cultural Anthropology 29. Uppsala: University  
of Uppsala Press.
Marková, Ivana (2012). “Social Representations as anthropolog

Async query:

In [100]:
results = await vector_store.asimilarity_search("What is the reason behind this War?")

print(results[0])

page_content='42
bers. Why this massive interest? Two good reasons for the interest in the 
origins of the War might be suggested. The first is the notion that World 
War 1 marked a caesura in history, not only because of the enormous de -
struction caused by the War – including ten million dead soldiers – but 
also because the War set the scene for the catastrophes to come in the next 
three decades: how were such disastrous forces let loose, and why this del -
uge upon Europe? The second notion is the course of events 1914-18. No 
one had foreseen a war of such disastrous dimensions. How could trained 
politicians, diplomats, and generals miscalculate the consequences of their 
decisions? How could so many be so wrong? 
Historians often distinguish between two categories of explanations for the 
causes of World War 1; the first is those related to the events of July 1914. 
During the July Crisis of 1914, politicians and military leaders made deci -' metadata={'page': 45, 'page_label'
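
One reason to prefer the async call is that several queries can be in flight at once. The sketch below shows the pattern with `asyncio.gather`; `fake_search` is a hypothetical stand-in for an async search call, used so the example runs without a vector store or API key.

```python
import asyncio


async def fake_search(query: str) -> str:
    """Hypothetical stand-in for an async vector-store search."""
    await asyncio.sleep(0.01)  # simulate waiting on I/O
    return f"top result for {query!r}"


async def run_queries(queries: list[str]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_search(q) for q in queries))


# In a plain script; inside a notebook cell, use `await run_queries(...)` instead,
# because the notebook already has a running event loop.
results = asyncio.run(run_queries(["causes of the war", "Treaty of Versailles"]))
print(results)
```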

Return scores:

In [101]:
# Note that providers implement different scores; the score here
# is a distance metric that varies inversely with similarity.

results = vector_store.similarity_search_with_score("World war")
doc, score = results[0]
print(f"Score: {score}\n")
print(doc)

Score: 0.6034525632858276

page_content='World War 1
The Great War and its Impact
The Great War belongs to one of modern world history’s most important periods, rivaled  
only by the Thirty Years’ War and its peace agreement in Westphalia in 1648. The results  
of the Great War left their impact on the whole world, including:
•	 	the	move	of	economic	power	from	Europe	to	the	United	States	of	America	
•	 	the	construction	of	the	Middle	East	after	the	fall	of	the	Ottoman	Empire
•	 	the	transferal	of	German	colonies	to	the	victorious	powers	via	the	League	  
of Nations
•	 	the	Russian	Revolution,	and
•	 	 	the	developments	in	inter alia Germany smoothing the path for World War 2  
and the Cold War. 
The traces of the War can be followed well into the twenty-first century, too. 
In this book, we will analyze the War itself and trace its impacts. This will be done using  
an interdisciplinary approach: the best way to get a broad understanding of the War as the' metadata={'page': 347, 'page
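
The cell's comment says the score is a distance that varies inversely with similarity. The sketch below illustrates that relationship under two assumptions that are not confirmed by this notebook: that the store reports squared Euclidean (L2) distance, which is our understanding of Chroma's default, and that vectors are scaled to unit length. In that case the distance equals `2 - 2 * cosine`, so a smaller score means a closer match. The vectors and helper names are invented for the example.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query = normalize([1.0, 0.2, 0.0])
near = normalize([0.9, 0.3, 0.1])   # toy vector pointing in a similar direction
far = normalize([0.0, 0.2, 1.0])    # toy vector pointing elsewhere

for name, vec in [("near", near), ("far", far)]:
    print(f"{name}: distance={squared_l2(query, vec):.3f}  cosine={dot(query, vec):.3f}")
```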