## Basics of LlamaIndex

> https://gpt-index.readthedocs.io/en/stable/getting_started/starter_example.html

---

In [1]:
import sys
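# make the local `creds` module (Azure credentials, imported below) importable; adjust this path for your machine
sys.path.append("/Users/shaunaksen/Documents/personal-projects/Natural-Language-Processing/LLM Concepts/mini_projects/chatgpt_clone")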

In [2]:
import logging
import os
from llama_index.llms import AzureOpenAI
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index import (
    VectorStoreIndex, SimpleDirectoryReader, ServiceContext,
    set_global_service_context, StorageContext, load_index_from_storage,
    Document
)
from creds import AZURE_API_BASE, AZURE_API_KEY, AZURE_API_VERSION

In [3]:
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [4]:
llm = AzureOpenAI(
    deployment_name='gpt-4-32k',
    model='gpt-4-32k',
    api_key=AZURE_API_KEY,
    azure_endpoint=AZURE_API_BASE,
    api_version=AZURE_API_VERSION,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=AZURE_API_KEY,
    azure_endpoint=AZURE_API_BASE,
    api_version=AZURE_API_VERSION,
)

In [5]:
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
)

set_global_service_context(service_context)

In [6]:
# non-streaming
resp = llm.complete("Paul Graham is ")
print(resp)

INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/gpt-4-32k/chat/completions?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
an English-born computer scientist, entrepreneur, venture capitalist, author, and essayist. He is best known as the co-founder of the influential startup accelerator and seed capital firm Y Combinator, which has funded and mentored numerous successful startups, including Dropbox, Airbnb, and Reddit.

Graham was born in Weymouth, England, in 1964, and moved to the United States with his family when he was a child. He studied at Cornell University, where he earned a Bachelor of Arts in Philosophy and a Bachelor of Science in Computer Science. He later earned a Master of Science in Computer Science from Harvard University and a Ph.D. in Applied Sciences, also from Harvard.

In the 1990s, Graham co-founded Viaweb, an early e-commerce platform that allowed users to create online stores. Viaweb was acquired by Yahoo! in 1998 and becam

In [7]:
documents = SimpleDirectoryReader(input_dir="./data/").load_data(show_progress=True)

Loading files: 100%|██████████| 1/1 [00:00<00:00, 346.87file/s]


In [8]:
documents[0].metadata

{'file_path': 'data/paul_graham_essay.txt',
 'file_name': 'paul_graham_essay.txt',
 'file_type': 'text/plain',
 'file_size': 75042,
 'creation_date': '2023-11-24',
 'last_modified_date': '2023-11-24',
 'last_accessed_date': '2023-11-24'}

In [9]:
index = VectorStoreIndex.from_documents(documents=documents)

INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"


This builds an index over the documents in the `data` folder
(which in this case consists of just the essay text, but could contain many documents).

In [10]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")

INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/gpt-4-32k/chat/completions?api-version=2023-03-15-preview "HTTP/1.1 200 OK"


In [11]:
print(response)

Growing up, the author mainly worked on writing and programming outside of school. They wrote short stories and experimented with programming on the IBM 1401 using an early version of Fortran. Later, they started programming on a TRS-80, creating simple games, a program to predict model rocket heights, and a word processor.
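Behind `query_engine.query`, the query engine first retrieves the chunks most similar to the question and only then calls the LLM with them. The sketch below imitates just the retrieval half in pure Python. It assumes a toy bag-of-words vector in place of `text-embedding-ada-002` and ranks chunks by cosine similarity; `embed`, `cosine`, and `retrieve` are hypothetical helpers, not LlamaIndex APIs, and the three chunks are made-up sentences loosely paraphrasing the answer above. The value `top_k=2` is meant to mirror what we believe is the default `similarity_top_k` of `as_query_engine()`, but treat that as an assumption.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query (ties keep input order)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


chunks = [
    "I wrote short stories and programmed an IBM 1401 in Fortran.",
    "Years later I started Viaweb with a friend.",
    "As a teenager I wrote simple games on a TRS-80.",
]
print(retrieve("What games did the author write on the TRS-80?", chunks))
```

A real embedding model captures meaning rather than exact word overlap (so "program" and "programmed" would be close), which is why the notebook delegates this step to Azure OpenAI.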


By default, the data you just loaded is stored in memory as a series of vector embeddings. You can save time (and requests to OpenAI) by saving the embeddings to disk. That can be done with this line:

In [12]:
index.storage_context.persist()  # writes to ./storage by default

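Of course, persisting only pays off if you load the saved index back instead of rebuilding it. The next cell builds and persists the index only when `./storage` is missing, and otherwise loads it from disk.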

In [13]:
if not os.path.exists("storage"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist()

else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context)

INFO:llama_index.indices.loading:Loading all indices.


In [None]:
# either way we can now query the index
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
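Persisting saves requests because the embeddings are computed once and then read back from disk. The sketch below reproduces the same persist-or-load pattern with the standard library only. It assumes a hypothetical `fake_embed` function standing in for the embedding API and a single JSON file standing in for the `./storage` directory, and it counts embedding calls so you can see that the second run makes none. It does not reproduce LlamaIndex's actual storage format.

```python
import json
import os
import tempfile

EMBED_CALLS = 0  # counts how many times the (fake) embedding model is called


def fake_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding API call: returns a tiny vector."""
    global EMBED_CALLS
    EMBED_CALLS += 1
    return [float(len(text)), float(text.count(" "))]


def build_or_load(texts: list[str], persist_path: str) -> dict[str, list[float]]:
    """Load embeddings from persist_path if it exists; otherwise compute and save them."""
    if os.path.exists(persist_path):
        with open(persist_path, encoding="utf-8") as f:
            return json.load(f)
    index = {t: fake_embed(t) for t in texts}
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index


texts = ["hi my name is mini", "i am 28 yrs old"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.json")
    first = build_or_load(texts, path)   # computes and persists: 2 embedding calls
    second = build_or_load(texts, path)  # loads from disk: no new calls
    print(EMBED_CALLS, first == second)
```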

## High Level Concepts

> https://gpt-index.readthedocs.io/en/stable/getting_started/concepts.html

---

### Retrieval Augmented Generation (RAG)

LLMs are trained on enormous bodies of data, but they aren’t trained on your data. Retrieval-Augmented Generation (RAG) solves this problem by adding your data to the data LLMs already have access to. You will see references to RAG frequently in the LlamaIndex documentation.

In RAG, your data is loaded and prepared for queries or “indexed”. User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.

Even if what you’re building is a chatbot or an agent, you’ll want to know RAG techniques for getting data into your application.

![](https://gpt-index.readthedocs.io/en/stable/_images/basic_rag.png)
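The final RAG step, sending the retrieved context and the query to the LLM together with a prompt, amounts to filling in a template. Here is a minimal sketch; the template wording is hypothetical (it only resembles the style of LlamaIndex's default question-answering prompt), and `build_prompt` is an illustrative helper, not a library function.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context}\n"
    "---------------------\n"
    "Using only the context above, answer the query.\n"
    "Query: {query}\n"
    "Answer: "
)


def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Join the retrieved chunks into one context string and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return QA_TEMPLATE.format(context=context, query=query)


prompt = build_prompt(
    "What did mini eat?",
    ["i ate tender coconut ice cream today.", "hi my name is mini"],
)
print(prompt)
```

The resulting string is the kind of input you could pass to `llm.complete(prompt)` from the first part of this notebook; a query engine does this assembly for you.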

#### Stages within RAG

There are five key stages within RAG, which in turn will be a part of any larger application you build. These are:

- Loading: this refers to getting your data from where it lives – whether it’s text files, PDFs, another website, a database, or an API – into your pipeline. LlamaHub provides hundreds of connectors to choose from.

- Indexing: this means creating a data structure that allows for querying the data. For LLMs this nearly always means creating vector embeddings, numerical representations of the meaning of your data, as well as numerous other metadata strategies to make it easy to accurately find contextually relevant data.

- Storing: once your data is indexed you will almost always want to store your index, as well as other metadata, to avoid having to re-index it.

- Querying: for any given indexing strategy there are many ways you can utilize LLMs and LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid strategies.

- Evaluation: a critical step in any pipeline is checking how effective it is relative to other strategies, or when you make changes. Evaluation provides objective measures of how accurate, faithful and fast your responses to queries are.

![](https://gpt-index.readthedocs.io/en/stable/_images/stages.png)
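Evaluation can start with something as simple as checking whether the retriever surfaces the right chunk. The sketch below computes two common retrieval metrics, hit rate and mean reciprocal rank (MRR), over hypothetical queries for which we already know the id of the relevant chunk; the ids and rankings are made up for illustration. Faithfulness and speed, also mentioned above, need different measurements that are not shown here.

```python
def hit_rate(results: list[list[str]], expected: list[str], k: int) -> float:
    """Fraction of queries whose expected id appears in the top-k results."""
    hits = sum(1 for ranked, gold in zip(results, expected) if gold in ranked[:k])
    return hits / len(expected)


def mean_reciprocal_rank(results: list[list[str]], expected: list[str]) -> float:
    """Average of 1/rank of the expected id (0 when it is not retrieved)."""
    total = 0.0
    for ranked, gold in zip(results, expected):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(expected)


# Hypothetical retriever output: ranked chunk ids for each of three queries.
results = [["n1", "n3"], ["n2", "n1"], ["n3", "n2"]]
expected = ["n1", "n1", "n4"]  # the chunk id judged relevant for each query

print(hit_rate(results, expected, k=2))         # 2 of 3 queries hit
print(mean_reciprocal_rank(results, expected))  # (1 + 1/2 + 0) / 3
```

Tracking numbers like these before and after a change (a new chunk size, a different embedding model) is what makes the comparison objective.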

### Loading Stage

__Documents/Nodes__:

> https://gpt-index.readthedocs.io/en/stable/module_guides/loading/documents_and_nodes/root.html

---

A Document is a generic container around any data source - for instance, a PDF, an API output, or retrieved data from a database. They can be __constructed manually__, or __created automatically via our data loaders__. By default, a Document stores text along with some other attributes. Some of these are listed below.

- metadata - a dictionary of annotations that can be appended to the text.
- relationships - a dictionary containing relationships to other Documents/Nodes.


A Node represents a “chunk” of a source Document, whether that is a text chunk, an image, or other. Similar to Documents, they contain metadata and relationship information with other nodes.


Nodes are first-class citizens in LlamaIndex. You can choose to define Nodes and all their attributes directly. You may also choose to “parse” source Documents into Nodes through our NodeParser classes. By default, every Node derived from a Document inherits the same metadata from that Document (e.g. a “file_name” field in the Document is propagated to every Node).
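To make the Document/Node relationship concrete, here is a plain-Python model of it: a `Document` carrying `metadata`, and a parser that cuts its text into fixed-size `Node` chunks, copying the document's metadata into each node and recording a `source` relationship back to the parent. The class and field names are simplified assumptions for illustration; LlamaIndex's own `Document`, `TextNode`, and NodeParser classes carry more fields and use their own relationship types.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Node:
    text: str
    metadata: dict
    relationships: dict  # e.g. {"source": <parent doc_id>}


def parse_into_nodes(doc: Document, chunk_chars: int = 40) -> list[Node]:
    """Split a document into fixed-size character chunks (a deliberately naive splitter)."""
    nodes = []
    for start in range(0, len(doc.text), chunk_chars):
        nodes.append(
            Node(
                text=doc.text[start:start + chunk_chars],
                metadata=dict(doc.metadata),  # each node inherits a copy of the parent's metadata
                relationships={"source": doc.doc_id},
            )
        )
    return nodes


doc = Document(
    text="i ate tender coconut ice cream today. i also like coconut water",
    metadata={"file_name": "mini.txt"},
)
nodes = parse_into_nodes(doc)
for node in nodes:
    print(node.metadata, node.relationships["source"] == doc.doc_id, repr(node.text))
```

Copying the metadata (rather than sharing one dict) means editing one node's metadata later does not silently change its siblings.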

__Connectors:__

 A data connector (often called a Reader) ingests data from different data sources and data formats into Documents and Nodes.
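A reader's job is mostly bookkeeping: open each file, keep its text, and attach metadata such as the file name and size, much like the metadata `SimpleDirectoryReader` returned for the essay earlier. The sketch below is a hypothetical minimal reader for `.txt` files using only the standard library and returning plain dicts; it is not LlamaIndex's implementation and ignores the many other formats real connectors support.

```python
import tempfile
from pathlib import Path


def read_text_directory(input_dir: str) -> list[dict]:
    """Load every .txt file in input_dir into a {'text', 'metadata'} dict."""
    documents = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        documents.append(
            {
                "text": path.read_text(encoding="utf-8"),
                "metadata": {
                    "file_path": str(path),
                    "file_name": path.name,
                    "file_size": path.stat().st_size,  # size in bytes
                },
            }
        )
    return documents


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("hi my name is mini", encoding="utf-8")
    Path(tmp, "ignored.csv").write_text("a,b\n1,2", encoding="utf-8")  # skipped: not .txt
    docs = read_text_directory(tmp)
    print([d["metadata"]["file_name"] for d in docs])
```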



In [14]:
text_list = ["hi my name is mini", "i am 28 yrs old", "i ate tender cocounut ice cream today. i also like coconut water"]
documents = [Document(text=t) for t in text_list]

In [15]:
index = VectorStoreIndex.from_documents(documents)

INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"


In [16]:
query_engine = index.as_query_engine()
response = query_engine.query("What did the mini eat?")


INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/gpt-4-32k/chat/completions?api-version=2023-03-15-preview "HTTP/1.1 200 OK"


In [17]:
from llama_index.node_parser import SentenceSplitter

In [18]:
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents=documents)

In [19]:
for node_ in nodes:
    print(node_.text)

hi my name is mini
i am 28 yrs old
i ate tender cocounut ice cream today. i also like coconut water
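Each of the three short documents came back as exactly one node, most likely because `SentenceSplitter` packs sentences into chunks up to its chunk size, and the default (1024 tokens, as far as we know) is far larger than these sentences. The sketch below imitates that packing with a hypothetical word budget instead of tokens: whole sentences are added greedily to the current chunk, and a sentence that would overflow it starts a new chunk. The real `SentenceSplitter` also handles chunk overlap and token-based sizing, which this sketch leaves out.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pack_sentences(text: str, max_words: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own (oversized) chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "i ate tender coconut ice cream today. i also like coconut water"
print(pack_sentences(text, max_words=1024))  # one chunk: the text is far below the budget
print(pack_sentences(text, max_words=8))     # two chunks, one per sentence
```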


In [20]:
index = VectorStoreIndex(nodes)
query_engine = index.as_query_engine()
response = query_engine.query("What did the mini eat?")

INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-03-15-preview "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://ml-dev-3.openai.azure.com//openai/deployments/gpt-4-32k/chat/completions?api-version=2023-03-15-preview "HTTP/1.1 200 OK"


In [21]:
print(response)

Mini ate tender coconut ice cream today.


### Indexing Stage

An `Index` is a __data structure that allows us to quickly retrieve relevant context for a user query__. For LlamaIndex, it’s the core foundation for retrieval-augmented generation (RAG) use-cases.
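An index does not have to be a vector store. To show the "data structure for fast retrieval" idea in its simplest form, here is a hypothetical keyword (inverted) index: it maps each word to the set of node ids that contain it, so a query only touches the posting lists for its own words instead of scanning every node. This is a swapped-in illustration, loosely in the spirit of LlamaIndex's keyword-table indices, and not how `VectorStoreIndex` works.

```python
import re
from collections import Counter, defaultdict


class KeywordIndex:
    """Toy inverted index: word -> set of node ids that contain it."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, node_id: str, text: str) -> None:
        """Store the node text and register it under each of its words."""
        self.nodes[node_id] = text
        for word in self._words(text):
            self.postings[word].add(node_id)

    def query(self, question: str, top_k: int = 1) -> list[str]:
        """Rank nodes by how many query words they share; only posting lists are scanned."""
        votes = Counter()
        for word in self._words(question):
            for node_id in self.postings.get(word, ()):
                votes[node_id] += 1
        return [self.nodes[node_id] for node_id, _ in votes.most_common(top_k)]


index = KeywordIndex()
texts = ["hi my name is mini", "i am 28 yrs old", "i ate tender coconut ice cream today"]
for i, text in enumerate(texts):
    index.add(f"n{i}", text)
print(index.query("what ice cream did mini eat"))
```

Exact keyword matching misses paraphrases ("dessert" would not match "ice cream"), which is the main reason the examples above rely on embedding-based vector indices instead.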

