<a href="https://colab.research.google.com/github/SamurAIGPT/LlamaIndex-course/blob/main/fundamentals/Fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LlamaIndex fundamentals

In this lesson we discuss the fundamentals of LlamaIndex and its core components.

This lesson covers the following:

1. Nodes
2. Document loaders
3. Indexes
4. Retrievers
5. Query Engines

### Nodes

A node is the fundamental unit of LlamaIndex: a data structure that holds a piece of text.

Given a document, you can split it into multiple chunks and store each chunk in a node.
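
To make the idea concrete, here is a minimal sketch in plain Python. `ToyNode` and `split_into_nodes` are hypothetical names made up for this illustration, not LlamaIndex classes, and the fixed character-based chunk size is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a node: a chunk of text plus metadata."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_into_nodes(document: str, chunk_size: int = 40) -> List[ToyNode]:
    """Split a document into fixed-size character chunks, one chunk per node."""
    nodes = []
    for start in range(0, len(document), chunk_size):
        chunk = document[start:start + chunk_size]
        # Record where the chunk came from so it can be traced back later.
        nodes.append(ToyNode(text=chunk, metadata={"start_char": str(start)}))
    return nodes


document = "LlamaIndex splits long documents into smaller chunks so they fit in a prompt."
toy_nodes = split_into_nodes(document, chunk_size=40)
for node in toy_nodes:
    print(node.metadata["start_char"], repr(node.text))
```

Joining the text of all nodes gives back the original document, which is the property a chunker of this kind should preserve.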

### Document Loader

A document loader in LlamaIndex is an interface for extracting data from a source. The source can be a webpage, a YouTube video, a PDF, and so on.

LlamaIndex supports many document loaders, and we will study some of them for our use case.
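
As a rough illustration of what a loader does, the sketch below reads every `.txt` file in a directory into a simple record. The function `load_text_files` and the record keys `text` and `source` are assumptions made for this example; they are not part of LlamaIndex.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def load_text_files(directory: str) -> List[Dict[str, str]]:
    """Read every .txt file in a directory into a {"text", "source"} record."""
    docs = []
    for path in sorted(Path(directory).glob("*.txt")):
        docs.append({"text": path.read_text(encoding="utf-8"), "source": path.name})
    return docs


# Demo on a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("First document.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("Second document.", encoding="utf-8")
    Path(tmp, "notes.md").write_text("Ignored: not a .txt file.", encoding="utf-8")
    loaded = load_text_files(tmp)

print([d["source"] for d in loaded])
```

A real loader does the same job for its own source type: it hides how the data is fetched and hands back text with enough metadata to identify where it came from.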

### Indexes

An index in LlamaIndex is a data structure that organizes and stores information from various data sources, making it easier to search. An index is built over a set of nodes.

LlamaIndex offers different types of indexes, which we will study in later lessons.
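
The sketch below shows one way to picture an index built over nodes: each node's text is stored next to a vector computed from it. `ToyVectorIndex` and the word-count `embed` function are hypothetical simplifications; a real vector index would use learned embeddings from a model rather than word counts.

```python
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (an assumption for illustration)."""
    return Counter(text.lower().split())


class ToyVectorIndex:
    """Hypothetical index: stores each node's text next to its vector."""

    def __init__(self, node_texts: List[str]) -> None:
        self.entries = [(text, embed(text)) for text in node_texts]

    def __len__(self) -> int:
        return len(self.entries)


toy_index = ToyVectorIndex([
    "the economy added jobs this year",
    "we will lower the cost of prescription drugs",
])
print(len(toy_index), toy_index.entries[0][1]["the"])
```

The point of precomputing the vectors is that a later search only has to embed the query, not every node again.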

### Retrievers

A retriever in LlamaIndex fetches a set of nodes from an index based on a given query. It works like a search tool that finds the relevant information in a large dataset to answer your question.

There are different types of retrievers in LlamaIndex, which we will study in later lessons.
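
A similarity-based retriever can be sketched as "score every node against the query, then keep the best k". The code below is a toy version under stated assumptions: `embed`, `cosine`, and `retrieve` are hypothetical helpers, and word counts stand in for real embeddings.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, node_texts: List[str], top_k: int = 2) -> List[Tuple[float, str]]:
    """Return the top_k (score, text) pairs, most similar first."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(text)), text) for text in node_texts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


node_texts = [
    "inflation raised the cost of food and gas",
    "we will invest in roads and bridges",
    "lower the cost of child care for families",
]
results = retrieve("cost of food", node_texts, top_k=2)
for score, text in results:
    print(round(score, 3), text)
```

The node about roads and bridges shares no words with the query, so it scores zero and is left out of the top two.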

### Query Engines

A query engine in LlamaIndex takes a user's query, interacts with the underlying data structures (such as indexes), and returns a synthesized response.

LlamaIndex offers different types of query engines, which we will study in later lessons.
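
Conceptually, a query engine chains two steps: retrieve the relevant nodes, then hand them to an LLM together with the question. The sketch below assumes a hypothetical `ToyQueryEngine`, a keyword-overlap retriever, and a `stub_llm` that only reports what it received, so no model or API key is involved.

```python
from typing import Callable, List


def keyword_retriever(query: str, node_texts: List[str], top_k: int = 2) -> List[str]:
    """Rank nodes by how many query words they share (toy scoring)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        node_texts,
        key=lambda text: len(query_words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call: reports the size of the prompt."""
    return f"[stub LLM] received {len(prompt.splitlines())} prompt lines"


class ToyQueryEngine:
    """Hypothetical engine: retrieve context, build a prompt, call the LLM."""

    def __init__(self, node_texts: List[str], llm: Callable[[str], str], top_k: int = 2) -> None:
        self.node_texts = node_texts
        self.llm = llm
        self.top_k = top_k

    def query(self, question: str) -> str:
        contexts = keyword_retriever(question, self.node_texts, self.top_k)
        prompt = "Context:\n" + "\n".join(contexts) + "\n\nQuestion: " + question
        return self.llm(prompt)


engine_texts = [
    "inflation raised the cost of food",
    "we will build roads",
    "the cost of insulin is too high",
]
toy_engine = ToyQueryEngine(engine_texts, llm=stub_llm, top_k=2)
answer = toy_engine.query("what is the cost of insulin")
print(answer)
```

Swapping `stub_llm` for a function that calls a real model is the only change needed to turn the stub answer into a synthesized one, which is why the retriever and the LLM are passed in as separate pieces.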

Let's work through all of these concepts with an example.

### Install LlamaIndex and dependencies


In [None]:
!pip install llama_index langchain

### Download the data to index

We use the State of the Union text as the source document. Nothing is trained here: the text is indexed so that an OpenAI model can answer questions about it.


In [None]:
!wget https://raw.githubusercontent.com/hwchase17/chat-your-data/master/state_of_the_union.txt
!mkdir -p data
!mv state_of_the_union.txt data/

### Load the input text into LlamaIndex

We can do this with a simple document loader, as discussed above. Here we use SimpleDirectoryReader, which reads the files in a directory.

In [3]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('./data').load_data()

### Split the data into nodes

As discussed, a node is the fundamental data structure that holds the input. We take the documents loaded above and split them into multiple nodes using the code below.

In [4]:
from llama_index.node_parser import SimpleNodeParser
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)
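
Node parsers commonly let neighbouring chunks overlap so that a sentence cut at a chunk boundary still appears whole in one of them. The sketch below illustrates that idea only; `chunk_words` is a hypothetical helper that counts words, and it is not a description of how SimpleNodeParser is implemented.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "one two three four five six seven eight nine ten eleven twelve"
overlapping_chunks = chunk_words(sample, chunk_size=8, overlap=2)
for chunk in overlapping_chunks:
    print(chunk)
```

Here the words "seven eight" end the first chunk and start the second, which is the overlap at work.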

### Create an Index

Now that the nodes are created, we can build an index on top of them. We will use VectorStoreIndex, which creates an embedding for the text in each node and stores the embeddings in a vector store (in memory by default). Embeddings are covered in more detail in the first lesson.

In [12]:
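import os

from llama_index import LLMPredictor, VectorStoreIndex
from langchain import OpenAI

# Replace "api-key" with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "api-key"
# Build the index: each node is embedded and stored for similarity search
index = VectorStoreIndex(nodes)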

### Create a retriever

We will use VectorIndexRetriever, which retrieves the top k most similar nodes for a query. For this example we set k=2.

In [13]:
from llama_index.retrievers import VectorIndexRetriever

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

### Create a query engine

Now we can construct a query engine over our retriever to start making queries.

In [14]:
from llama_index.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(
    retriever=retriever
)

### Now make a query

In [15]:
response = query_engine.query("What did the author do growing up?")
print(response)


The author grew up in a family where they had to adjust to the rising cost of food, gas, housing, and other expenses. They experienced the struggles of their father leaving home to find work.
