In [40]:
%%capture
%pip install llama-index llama-index-embeddings-cohere==0.2.0

In [41]:
import os

from getpass import getpass
import nest_asyncio

from dotenv import load_dotenv

nest_asyncio.apply()

load_dotenv()

True

In [3]:
# .get() returns None when the variable is unset, so the getpass fallback can run
CO_API_KEY = os.environ.get("CO_API_KEY") or getpass("Enter your Cohere API key: ")

# 🗂️ Indexing

An `Index` is a data structure that allows for the quick retrieval of relevant context for a user query. 

It is the foundation for retrieval-augmented generation (RAG) use cases. Indexes are built from `Documents` and are used to build Retrievers, Query Engines, and Chat Engines, all of which enable question answering and chat over your data.

- 📂 After loading your data, you're ready to construct an `Index`.

- 🌐 **Vector Store Index:** The most common Index type. It segments your `Documents` into `Nodes` and generates vector embeddings for each node's text, prepping them for LLM queries.

- 🔄 **Vector Store Index Process:** Parse raw texts into document objects, split document objects into chunks/nodes, then convert all your nodes into embeddings and store them in a vector database.
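
The steps in the list above can be sketched without any external service. This is a toy illustration only, not how LlamaIndex implements them: `chunk_text`, `toy_embed`, and the in-memory `store` list are hypothetical stand-ins for a node parser, an embedding model, and a vector database, and the "embedding" is just a word-count vector.

```python
from collections import Counter


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into chunks of at most `chunk_size` words (toy node parser)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Word-count vector over a fixed vocabulary (stand-in for a real embedding model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def dot(a: list[float], b: list[float]) -> float:
    """Plain dot product, used here as a simple relevance score."""
    return sum(x * y for x, y in zip(a, b))


# 1. Raw text -> chunks ("nodes")
raw_text = "the stoics prized courage and wisdom the gurus taught service and compassion"
chunks = chunk_text(raw_text, chunk_size=6)

# 2. Chunks -> vectors, kept next to their text in a list (toy "vector store")
vocab = sorted(set(raw_text.split()))
store = [(chunk, toy_embed(chunk, vocab)) for chunk in chunks]

# 3. Query -> vector -> highest-scoring chunk
query_vector = toy_embed("courage and wisdom", vocab)
best_chunk, _ = max(store, key=lambda item: dot(query_vector, item[1]))
print(best_chunk)
```

A real embedding model captures meaning rather than exact word overlap, but the shape of the pipeline (chunk, embed, store, compare) is the same.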

### ⚙️ Embedding Text

First, let's see what an embedding is.


In [4]:
from llama_index.embeddings.cohere import CohereEmbedding

embed_v3 = CohereEmbedding(model_name="embed-english-v3.0")

embed_v3_light = CohereEmbedding(model_name="embed-english-light-v3.0")

embed_v2 = CohereEmbedding(model_name="embed-english-v2.0") 

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /opt/conda/envs/lil_llama_index/lib/python3.10/site-
[nltk_data]     packages/llama_index/core/_static/nltk_cache...
[nltk_data]   Package punkt_tab is already up-to-date!


#### You can also run embedding models locally by using a model from Hugging Face. Check the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for what's hot.

```python
# Install once from a notebook cell:
# %pip install llama-index-embeddings-huggingface

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

hf_embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
```

#### If you're running locally on a CPU, though, you may want to use `FastEmbed`. These models are lightweight, quantized, and optimized for CPU. Here are the [supported models](https://qdrant.github.io/fastembed/examples/Supported_Models/).

This is how you can instantiate a `FastEmbed` model:

```python
# Install once from a notebook cell:
# %pip install llama-index-embeddings-fastembed

from llama_index.embeddings.fastembed import FastEmbedEmbedding

embed_model = FastEmbedEmbedding(model_name="BAAI/bge-large-en-v1.5-quantized")
```

In [5]:
string = "A"

string_2 = "This is a complete sentence."

string_3 = """In the pursuit of a life well-lived, one must recognize the transient nature of the 
material world and the enduring value of virtue. The Sikh Gurus taught us that the Divine Light 
resides within all, and thus, we are united in our essence beyond the superficial distinctions of 
caste, creed, or status. Similarly, the Stoics emphasized the cultivation of inner virtues such as courage, 
temperance, and wisdom, understanding that true freedom lies in mastery over one's own perceptions and actions. 
As we navigate the vicissitudes of life, let us remember that our choices are our own, and in choosing virtue, 
we align ourselves with the cosmic order and the teachings of the Gurus. It is through selfless service, 
compassion, and the relentless pursuit of truth that we may attain a state of inner peace and contribute 
to the harmony of the world, embodying the principles of both Sikhism and Stoicism in our daily lives
"""

In [6]:
example_embedding = embed_v3.get_text_embedding(string)

In [7]:
len(example_embedding)

1024

In [8]:
def get_embedding_dimensions(embed_model, list_of_strings):
    embeddings = embed_model.get_text_embedding_batch(list_of_strings)   
    embed_lens = []
    for embedding in embeddings:
        embed_lens.append(len(embedding))
    return embed_lens

In [9]:
get_embedding_dimensions(embed_v3, [string, string_2, string_3])

[1024, 1024, 1024]

In [10]:
get_embedding_dimensions(embed_v3_light, [string, string_2, string_3])

[384, 384, 384]

In [11]:
get_embedding_dimensions(embed_v2, [string, string_2, string_3])

[4096, 4096, 4096]

In [12]:
embed_v3.similarity(
    embed_v3.get_text_embedding("""In embracing both the wisdom of the Sikh Gurus and the Stoic philosophers, 
                              we find a path to tranquility by accepting what is beyond our control and focusing 
                              our efforts on living virtuously and with purpose."""), 
    embed_v3.get_text_embedding(string_2),
    mode="cosine"
    )

0.18940321498701687
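
For intuition about the score above, here is a minimal sketch of how cosine similarity is computed from two vectors. The helper name `cosine_similarity` is illustrative and is not the library's internal function. The result ranges from -1 (opposite directions) through 0 (unrelated) to 1 (same direction), so a value near 0.19 indicates the two texts are only weakly related.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1, regardless of length
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors score 0
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Vectors pointing in opposite directions score close to -1
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

Because the score depends only on direction, a short sentence and a long paragraph can be compared directly even though their raw text lengths differ.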

# Create an Index

First, let's get some data.

In [13]:
import requests

def load_text_from_url(url: str, timeout: float = 30.0) -> str:
    """
    Fetches and returns the text content from the specified URL.

    Parameters:
    - url: The URL of the text file to fetch.
    - timeout: Seconds to wait for the server before giving up.

    Returns:
    - The text content of the file.

    Raises:
    - requests.RequestException: if the request fails or returns an HTTP error status.
    """
    # Let failures propagate: returning an error message as a string would
    # silently be indexed (or sliced to nothing) as if it were the document.
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
    return response.text

url = "https://www.gutenberg.org/files/10763/10763.txt"

text_content = load_text_from_url(url)

⏳ Generating embeddings can be time-consuming, especially for large volumes of text, because of the many API calls required.

Now, create an index by passing a **list of Documents**. To save time and cost, we will use only a 10,000-character slice of the document.

In [44]:
from llama_index.core import Document, VectorStoreIndex

full_document = Document(text=text_content)

partial_document = Document(text=text_content[50000:60000])
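
One thing to watch with a slice like `text_content[50000:60000]`: Python slicing never raises when the range runs past the end of a string. It silently returns an empty string, and an empty `Document` has nothing to embed. A minimal guard might look like the sketch below; `take_slice` is a hypothetical helper, not part of LlamaIndex.

```python
def take_slice(text: str, start: int, end: int) -> str:
    """Return text[start:end], raising if the slice has no usable content."""
    excerpt = text[start:end]
    if not excerpt.strip():
        raise ValueError(
            f"Slice [{start}:{end}] is empty; the source text has only {len(text)} characters."
        )
    return excerpt


# A long enough text yields the requested 10 characters
sample_text = "x" * 120
excerpt = take_slice(sample_text, 50, 60)

# A short text (for example, an error message instead of the book) fails loudly
try:
    take_slice("Failed to load content", 50000, 60000)
except ValueError as err:
    failure_message = str(err)
    print(failure_message)
```

Checking `len(text_content)` before slicing serves the same purpose in an interactive session.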

The `VectorStoreIndex` in LlamaIndex can be created in two ways: `from_documents` and `from_vector_store`.

- `from_documents`: when you have a set of documents that you want to index. This method takes these documents, computes their embeddings, and stores them in the vector store. 

- `from_vector_store`: when you already have computed embeddings that are stored in an external vector store (like Qdrant). This method connects to the external vector store and uses the pre-computed embeddings for the index. 



In [45]:
index = VectorStoreIndex.from_documents(
    # remember, you must pass a list of documents!
    [partial_document], 
    embed_model=embed_v3,
    show_progress=True)

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

ValueError: Cannot build index from nodes with no content. Please ensure all nodes have content.

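This `ValueError` means the nodes passed to the index contained no text. Here, the most likely cause is that `text_content[50000:60000]` was empty, for example because the download failed or the fetched text was shorter than 50,000 characters. Check `len(text_content)` before slicing.

Note that you can also build an index over a **list of `Node` objects**.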


In [46]:
from llama_index.core.node_parser import SentenceSplitter

# instantiate a node parser
splitter = SentenceSplitter(
    chunk_size=512,
    chunk_overlap=16,
    paragraph_separator="\n\n\n\n",
)

# pass a list of documents to the node parser
nodes = splitter.get_nodes_from_documents([partial_document])

# create the index from the nodes
index_from_nodes = VectorStoreIndex(
    nodes,
    embed_model=embed_v3,
    show_progress=True
    )

ValueError: Cannot build index from nodes with no content. Please ensure all nodes have content.
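
To see what `chunk_size` and `chunk_overlap` control, here is a simplified sketch of overlapping windows. It is an assumption-laden toy: `SentenceSplitter` counts tokens and prefers to break at sentence boundaries, whereas the hypothetical `sliding_chunks` below cuts fixed-size windows over a plain list.

```python
def sliding_chunks(tokens: list, chunk_size: int, chunk_overlap: int) -> list[list]:
    """Cut `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


tokens = list(range(10))
chunks = sliding_chunks(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window repeats the last item of the previous one, which is the role overlap plays in a real splitter: context near a chunk boundary appears in both neighbouring chunks.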

Let's build on this pattern in the next lesson, where we'll store and persist our index for future use.