# Indexes

Indexes offer a way to structure documents so that LLMs can interact with them. The most common use is “retrieval”: given a user query, fetch the documents most relevant to that query so that they can be passed to the LLM as context.

LangChain primarily focuses on constructing indexes with the goal of using them as a Retriever.
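
To make the retrieval idea concrete, here is a minimal sketch in plain Python. It is not LangChain's implementation: `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers, the documents are made up for illustration, and relevance is scored by simple word overlap rather than by embeddings.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return up to k documents that share the most words with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    # Drop documents with no shared words, then rank best match first.
    # sorted() is stable, so tied documents keep their original order.
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(query, context_docs):
    """Place the retrieved documents ahead of the question as context."""
    context = "\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Illustrative documents (not taken from the speech used below).
documents = [
    "The president nominated a judge to the Supreme Court.",
    "The budget includes new funding for roads and bridges.",
    "The judge has support from former judges of both parties.",
]

query = "Who supports the judge?"
relevant = retrieve(query, documents)
prompt = build_prompt(query, relevant)
print(prompt)
```

The resulting `prompt` string is what would be sent to an LLM. A real retriever would typically rank by embedding similarity instead of shared words, but the shape of the flow is the same: query in, relevant documents out, documents passed along as context.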

In [None]:
!pip install chromadb

Question answering over documents consists of four steps:

- Create an index
- Create a Retriever from that index
- Create a question answering chain
- Ask questions!

In [3]:
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator


loader = TextLoader('state_of_the_union.txt', encoding='utf8')
index = VectorstoreIndexCreator().from_loaders([loader])

Using embedded DuckDB without persistence: data will be transient


In [6]:
query = "What did the president say about Ketanji Brown Jackson"
index.query(query)

" The president said that Ketanji Brown Jackson is one of the nation's top legal minds, a consensus builder, and has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans."

In [7]:
query = "What did the president say about Ketanji Brown Jackson"
index.query_with_sources(query)

{'question': 'What did the president say about Ketanji Brown Jackson',
 'answer': " The president said that he nominated Circuit Court of Appeals Judge Ketanji Brown Jackson, one of the nation's top legal minds, to continue Justice Breyer's legacy of excellence, and that she has received a broad range of support from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.\n",
 'sources': 'state_of_the_union.txt'}

Okay, so what’s actually going on? How is this index getting created?
A lot of the magic is hidden inside VectorstoreIndexCreator. What is it doing?
After the documents are loaded, three main steps take place:

- Splitting documents into chunks
- Creating embeddings for each chunk
- Storing documents and embeddings in a vectorstore
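
The three steps above can be sketched with a toy example in plain Python. This is an illustration under stated assumptions, not what VectorstoreIndexCreator actually runs: `split_text`, `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, the "embedding" is just a word-count vector standing in for a real embedding model, and the store is an in-memory list rather than a real vectorstore.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=8):
    """Step 1: split text into chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def embed(text):
    """Step 2: toy embedding, a sparse word-count vector (assumption:
    a real pipeline would call an embedding model here)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Step 3: keep each chunk next to its embedding and search by similarity."""

    def __init__(self):
        self.entries = []  # list of (chunk, embedding) pairs

    def add(self, chunks):
        for chunk in chunks:
            self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(self.entries,
                        key=lambda entry: cosine(query_vector, entry[1]),
                        reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


text = ("Indexes structure documents for retrieval. "
        "A vectorstore keeps each chunk next to its embedding. "
        "Queries are embedded and compared against stored vectors.")

chunks = split_text(text, chunk_size=8)
store = ToyVectorStore()
store.add(chunks)
top = store.search("Where is the embedding of a chunk kept?")
print(top)
```

At query time the question is embedded the same way as the chunks, and the closest chunks are returned. Splitting by a fixed word count is the simplest possible choice; it can cut sentences in half, which is one reason real text splitters are more careful about boundaries.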

In [9]:
index.query("How is Zelensky related to the speech in this document?")

" President Zelenskyy is mentioned in the speech as an example of the Ukrainian people's courage and determination in the face of Russian aggression."

In [10]:
index.query("What did Zelenskyy say?")

' Zelenskyy said that "Light will win over darkness" and that Putin may circle Kyiv with tanks, but he will never gain the hearts and souls of the Ukrainian people. He will never extinguish their love of freedom. He will never weaken the resolve of the free world.'