In [None]:
%%capture
!pip install llama-index==0.10.25 llama-index-embeddings-cohere llama-index-llms-cohere qdrant-client llama-index-vector-stores-qdrant 

In [None]:
import os
from getpass import getpass

In [None]:
# .get() returns None instead of raising KeyError when the variable is unset
CO_API_KEY = os.environ.get("CO_API_KEY") or getpass("Enter your Cohere API key: ")

# 🗄️ Storing

Loading and indexing data costs time and money.

By default, indexed data is stored in memory. You can persist it instead, which avoids the time and cost of re-indexing. The simplest way to do this is **persisting to disk**.

Each index exposes a `storage_context`, whose `.persist()` method writes all the data to disk at the specified location.

In [None]:
from pathlib import Path

def create_directory(directory_name):
    path = Path(directory_name)
    path.mkdir(parents=True, exist_ok=True)
    print(f"Directory '{directory_name}' created successfully.")

create_directory("data")

create_directory("persisted_storage")

In [None]:
!wget -P data https://www.gutenberg.org/cache/epub/10763/pg10763.txt

Now that we've downloaded the data, let's:

1) Load as Document
2) Parse as Nodes
3) Create index

In [None]:
# load as document
from llama_index.core import SimpleDirectoryReader

file_path = "data/pg10763.txt"

document = SimpleDirectoryReader(input_files=[file_path], filename_as_id=True).load_data()

In [None]:
# parse as nodes
from llama_index.core.node_parser import SentenceSplitter

sentence_splitter = SentenceSplitter(
    chunk_size=512, 
    chunk_overlap=16,
    paragraph_separator="\n\n\n\n" 
)

nodes = sentence_splitter.get_nodes_from_documents(document)
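
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: it counts whitespace-separated words, whereas `SentenceSplitter` counts tokens and tries to respect sentence boundaries. The function name `chunk_words` and the sample text are hypothetical.

```python
def chunk_words(text, chunk_size, chunk_overlap):
    """Split text into word chunks; consecutive chunks share `chunk_overlap` words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

example_chunks = chunk_words("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)
print(example_chunks)
```

Each chunk starts with the last word of the previous one, so text near a boundary is not cut off from its context.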

In [None]:
# create index
from llama_index.embeddings.cohere import CohereEmbedding
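from llama_index.core import VectorStoreIndex

# pass the key explicitly so this also works when it was entered via getpass
embed_model = CohereEmbedding(cohere_api_key=CO_API_KEY, model_name="embed-english-v3.0")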

index = VectorStoreIndex(nodes, embed_model=embed_model)

Use the `.persist()` method of the index's `storage_context` to store the indexed data on disk. You can then avoid re-loading and re-indexing your data by loading the persisted index.

In [None]:
index.storage_context.persist(persist_dir="persisted_storage")
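
The pattern this enables is "load if persisted, otherwise build and persist". The sketch below shows that control flow with a plain JSON file standing in for the index. The names `load_or_build` and `expensive_build` and the `index.json` file name are hypothetical and are not part of LlamaIndex.

```python
import json
import tempfile
from pathlib import Path

def load_or_build(persist_dir, build_fn):
    """Return (data, was_loaded): reuse persisted data when present, else build and save it."""
    path = Path(persist_dir) / "index.json"  # hypothetical file name
    if path.exists():
        return json.loads(path.read_text()), True
    data = build_fn()  # the slow, costly step (for example, embedding every node)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data, False

build_calls = []

def expensive_build():
    build_calls.append(1)  # record each time the costly step actually runs
    return {"chunks": ["chunk one", "chunk two"]}

with tempfile.TemporaryDirectory() as tmp_dir:
    first, first_loaded = load_or_build(tmp_dir, expensive_build)
    second, second_loaded = load_or_build(tmp_dir, expensive_build)

print(first_loaded, second_loaded, len(build_calls))
```

The second call reads from disk, so the costly build step runs only once.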

# 🗃️ Storage Context

`StorageContext` in `LlamaIndex` is a core abstraction that revolves around the storage of `Nodes`, indices, and vectors. 

It is a utility container that includes the following:

 - `docstore`: A [`BaseDocumentStore`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/docstore/types.py) for storing nodes.

 - `index_store`: A [`BaseIndexStore`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/storage/index_store/types.py#L13) for storing indices.

 - `vector_store`: A [`VectorStore`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/vector_stores/simple.py) for storing vectors.

 - `graph_store`: A [`GraphStore`](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/graph_stores/simple.py) for storing knowledge graphs.

The `StorageContext` can be created with default settings, which gives you a document store, index store, vector store, and graph store. It can also be persisted to a directory and rebuilt from that directory later, which saves the time and cost of re-indexing.

In [None]:
from llama_index.core import StorageContext, load_index_from_storage

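# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="persisted_storage")

# load the persisted index, passing the same embedding model used to build it
index = load_index_from_storage(storage_context, embed_model=embed_model)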

# ☁️ Using a Vector Database

We'll use Qdrant as our vector database of choice throughout this course.

To use Qdrant to store embeddings from the `VectorStoreIndex`, you need to:

- Initialize the Qdrant client

- Create a `Collection` to store your data in Qdrant

- Assign Qdrant as the `vector_store` in a `StorageContext`

- Initialize your `VectorStoreIndex` using that `StorageContext`

Below, we initialize a `QdrantClient` for interacting with Qdrant, an open-source vector store. Passing `path` runs Qdrant in local mode, which stores the data on disk without a separate server.


In [None]:
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

create_directory("persisted_storage/qdrant")

# initialize qdrant client
client = qdrant_client.QdrantClient(
    path="persisted_storage/qdrant"
)

vector_store = QdrantVectorStore(
    client=client, 
    collection_name="it_can_be_done",
    prefer_grpc=True
)

# assign qdrant vector store to storage context
storage_context = StorageContext.from_defaults(
    vector_store=vector_store)

# create the index
index = VectorStoreIndex(
    nodes,
    embed_model=embed_model,
    storage_context=storage_context
)

# You can persist the `qdrant` index to disk for future use
index.storage_context.persist(persist_dir="persisted_storage/qdrant")

# 🪃 Retrieval

A `Retriever` is an interface exposed by the `Index`. The `Index` stores your data, and its `Retriever` fetches the stored data that is relevant to a query.


LlamaIndex provides [many different types of retrievers](https://github.com/run-llama/llama_index/tree/main/llama-index-core/llama_index/core/retrievers) to fetch relevant information from ingested data based on a given query.

Some examples include:

### Vector Retriever

The vector retriever uses vector similarity search to find the most relevant nodes (chunks of text) based on the query embedding. It requires a vector store, such as Qdrant, to store and search through the node embeddings.

### [Fusion Retriever](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/retrievers/fusion_retriever.py)

The fusion retriever generates multiple queries from the original query, performs retrieval over an ensemble of retrievers for each query, and then fuses and reranks the results across all queries. This aims to better capture the query intent through query rewriting and ensembling.
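
To make the "fuse and rerank" step concrete, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge several ranked lists. It is an illustration rather than the library's implementation, and whether it matches the fusion mode you end up using is an assumption. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the node IDs are illustrative.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of IDs: each ID earns 1 / (k + rank) per list it appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] += 1.0 / (k + rank)
    # highest fused score first; ties are broken by ID so the order is deterministic
    return sorted(scores, key=lambda node_id: (-scores[node_id], node_id))

# one ranked list per (query, retriever) pair
rankings = [
    ["n1", "n2", "n3"],
    ["n2", "n1", "n4"],
    ["n2", "n3", "n5"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)
```

Nodes that rank well across many lists rise to the top, even if no single list ranks them first.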

### [Recursive Retriever](https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/retrievers/recursive_retriever.py)

The recursive retriever allows for hierarchical retrieval by first retrieving coarse nodes and then recursively retrieving finer-grained nodes within those coarse nodes. This can be useful for multi-level indexing and retrieval.
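
The coarse-to-fine idea can be sketched in a few lines. This is a toy illustration, not the `RecursiveRetriever` API: the word-overlap scorer stands in for embedding similarity, and `overlap_score`, `recursive_retrieve`, and the sample sections are all hypothetical.

```python
def overlap_score(query, text):
    """Toy relevance score: number of distinct query words that appear in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def recursive_retrieve(query, sections, top_sections=1, top_chunks=1):
    """Pick the best coarse sections by summary, then the best chunks inside them."""
    best_sections = sorted(
        sections, key=lambda s: overlap_score(query, s["summary"]), reverse=True
    )[:top_sections]
    # only chunks that belong to the selected sections are considered
    candidates = [chunk for s in best_sections for chunk in s["chunks"]]
    return sorted(candidates, key=lambda c: overlap_score(query, c), reverse=True)[:top_chunks]

sections = [
    {"summary": "poems about courage and perseverance",
     "chunks": ["keep going when the road is hard", "courage grows with every attempt"]},
    {"summary": "poems about success and work",
     "chunks": ["success comes from steady work", "rest is earned after work"]},
]
print(recursive_retrieve("success through work", sections))
```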

You can also combine retrievers in interesting ways and build out more advanced retrieval strategies, as we will see later in this course.


### In the example here, we're using a Vector Retriever

- 🔍 When searching, your query is also converted into a vector embedding.

- 🗂️ The `VectorStoreIndex` then scores each stored embedding by its similarity to the query embedding (for example, cosine similarity) and ranks the results.

- 🔝 Top-k semantic retrieval is the simplest way to query a vector index.

- ⩬ You can also apply a similarity threshold (e.g., only return results that are more similar than some value).
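
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how LlamaIndex or Qdrant implement search: it uses cosine similarity on tiny hand-written vectors, assumes no vector is all zeros, and the names `cosine_similarity`, `top_k`, and `toy_vectors` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, vectors, k, threshold=None):
    """Rank stored vectors by similarity to the query, optionally dropping low scores."""
    scored = [(node_id, cosine_similarity(query_vec, vec)) for node_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    if threshold is not None:
        scored = [pair for pair in scored if pair[1] >= threshold]
    return scored[:k]

toy_vectors = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
toy_query = [1.0, 0.0]

print([node_id for node_id, _ in top_k(toy_query, toy_vectors, k=2)])
print([node_id for node_id, _ in top_k(toy_query, toy_vectors, k=2, threshold=0.75)])
```

With the threshold set, fewer than `k` results can come back, because low-scoring matches are dropped rather than padded in.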


In [None]:
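retriever = index.as_retriever(similarity_top_k=5)

In [None]:
from llama_index.core.postprocessor import SimilarityPostprocessor

retrieved_nodes = retriever.retrieve("What lessons can be learned from the poems about success?")

# `as_retriever` ignores a `similarity_threshold` argument, so filter by score afterwards;
# the score scale depends on the embedding model, so lower the cutoff if nothing passes
SimilarityPostprocessor(similarity_cutoff=0.75).postprocess_nodes(retrieved_nodes)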

But, chances are you don't just want the returned documents. You want the documents to be synthesized into a response. 

So, let's build on this pattern in the next lesson and see how we can get a response based on those retrieved documents.

In [None]:
# close the client so you're not locked out of the index
client.close()