## Mistral-7B Retrieval Augmented Generation (RAG) ⚙️ 🗃️

As the applications of Large Language Models (LLMs) continue to grow, companies and users are increasingly seeking out ways to understand and extract value from their proprietary data by using LLMs. However, security and privacy are serious concerns that have made companies reluctant to expose their sensitive proprietary data to external models. 

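There are two main ways to address this. The first is to build an LLM from scratch or fine-tune an open-source LLM on the proprietary data, both of which can be expensive and time-consuming. The second is to build a RAG framework, which leaves the model's weights unchanged and instead supplies the model with relevant excerpts of the data at query time.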

Simply put, RAG lets users query a document or data source and receive relevant responses.
A RAG framework powered by a large language model (LLM) splits the data into chunks, generates embeddings for those chunks, and stores the embeddings in a vector database. At query time it embeds the query, runs a similarity search over the vector database to find the most relevant chunks, and then sends the query text together with those chunks to the LLM, which generates a response.
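The pipeline described above can be sketched end to end in plain Python. This is a toy under loud assumptions: a bag-of-words term count stands in for a real embedding model, a Python list stands in for the vector database, and the LLM call is omitted, so the sketch stops at the prompt that would be sent to the model. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding' (assumption): lowercase word counts instead of a neural model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks: list[str]) -> list[tuple[Counter, str]]:
    """Stand-in for the vector database: one (embedding, chunk) pair per chunk."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(index: list[tuple[Counter, str]], query: str, k: int = 2) -> list[str]:
    """Embed the query and return the k chunks with the highest cosine similarity."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    """The LLM receives the query text plus the retrieved chunks, not the embeddings."""
    bullets = "\n".join(f"- {chunk}" for chunk in context)
    return f"Context:\n{bullets}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "Qdrant stores embeddings for similarity search.",
    "Ollama runs the Mistral model locally.",
    "The weather was mild that spring.",
]
index = build_index(chunks)
question = "Which tool runs Mistral locally?"
prompt = build_prompt(question, retrieve(index, question, k=1))
print(prompt)
```

A real framework swaps each piece for a production component (a neural embedding model, a vector database, an LLM call), but the data flow is the same.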

```python
!nvidia-smi
```

```text
Wed Jan 17 22:10:32 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA TITAN RTX               Off | 00000000:00:05.0 Off |                  N/A |
| 41%   40C    P8              17W / 280W |   9121MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```
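The output shows 9121 MiB of the TITAN RTX's 24576 MiB already in use before the model is loaded. The sketch below pulls those two numbers out of the table row and computes what is left (24576 - 9121 = 15455 MiB); `parse_memory` is a hypothetical helper. Whether Mistral-7B fits in the remainder depends on the quantization Ollama uses, which this notebook does not show.

```python
import re

# Memory-usage row copied verbatim from the nvidia-smi output above.
ROW = "| 41%   40C    P8              17W / 280W |   9121MiB / 24576MiB |      0%      Default |"


def parse_memory(row: str) -> tuple[int, int]:
    """Return (used_mib, total_mib) from an nvidia-smi table row (hypothetical helper)."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", row)
    if match is None:
        raise ValueError("no 'usedMiB / totalMiB' field in row")
    return int(match.group(1)), int(match.group(2))


used, total = parse_memory(ROW)
free = total - used
print(f"used={used} MiB  total={total} MiB  free={free} MiB")
```

For scripted checks, `nvidia-smi --query-gpu=memory.used,memory.total --format=csv` reports the same figures as CSV, which is sturdier than parsing the human-readable table.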
                                                                    

#### 1. Import packages

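- 🦙 `llama-index` is a framework for connecting LLMs to your own data, covering loading, indexing, and querying it

- 🗄️ `qdrant_client` is the Python client for Qdrant, a vector database and vector similarity search engine for storing, searching, and managing embeddings

- 🤖 `Ollama` (used through `llama_index.llms.Ollama`) runs open-source models such as Mistral on the local machine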
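To make "vector similarity search" concrete, here is a minimal in-memory sketch of what a vector database collection does: store points made of an id, a vector, and a payload, and return the points closest to a query vector by cosine similarity. `ToyVectorStore` and its methods are hypothetical and are not Qdrant's API; a real engine such as Qdrant uses approximate nearest-neighbour indexes (HNSW) instead of scoring every point.

```python
import math
from dataclasses import dataclass, field


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ToyVectorStore:
    """Hypothetical in-memory collection of (id, vector, payload) points."""

    dim: int
    points: dict[int, tuple[list[float], dict]] = field(default_factory=dict)

    def upsert(self, point_id: int, vector: list[float], payload: dict) -> None:
        """Insert a point, or overwrite the existing point with the same id."""
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.points[point_id] = (vector, payload)

    def search(self, query: list[float], limit: int = 3) -> list[tuple[float, int, dict]]:
        """Exact (brute-force) top-k search by cosine similarity."""
        scored = [
            (_cosine(query, vector), point_id, payload)
            for point_id, (vector, payload) in self.points.items()
        ]
        scored.sort(key=lambda hit: hit[0], reverse=True)
        return scored[:limit]


store = ToyVectorStore(dim=3)
store.upsert(1, [1.0, 0.0, 0.0], {"text": "chunk about Qdrant"})
store.upsert(2, [0.0, 1.0, 0.0], {"text": "chunk about Ollama"})
store.upsert(3, [0.9, 0.1, 0.0], {"text": "another Qdrant chunk"})
hits = store.search([1.0, 0.0, 0.0], limit=2)
print([(round(score, 3), pid) for score, pid, _ in hits])
```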

```python
# Import modules (these import paths follow the pre-0.10 llama_index package layout)
from pathlib import Path

import qdrant_client
from llama_index import download_loader
from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.llms import Ollama
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
```

#### 2. Loading the data and initializing the service context

```python
# Fetch the UnstructuredReader loader from LlamaHub and load the text file
UnstructuredReader = download_loader("UnstructuredReader")
loader = UnstructuredReader()
docs = loader.load_data(file=Path('../data/data.txt'))
```

```text
[nltk_data] Downloading package punkt to /home/ubuntu/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/ubuntu/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
```
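`load_data` returns whole documents; before embedding, LlamaIndex splits them into smaller chunks (nodes) so that retrieval can return just the relevant passages. Below is a minimal sliding-window chunker, assuming word-based sizes for simplicity (LlamaIndex's default splitter measures chunks in tokens and tries to keep sentences intact); `chunk_words`, `chunk_size`, and `overlap` are hypothetical names.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
```

The overlap means a sentence that straddles a chunk boundary stays partly visible in both neighbouring chunks, at the cost of storing a few words twice.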


```python
# Run Qdrant in local mode, persisting its data under ../data/qdrant_data
client = qdrant_client.QdrantClient(path="../data/qdrant_data")

# Vector store backed by the "mistral_data" collection
vector_store = QdrantVectorStore(client=client, collection_name="mistral_data")

# Context responsible for storing the nodes, indices, and vectors (vectors go to Qdrant)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```
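Because `QdrantClient` is given `path=` rather than a server URL, Qdrant runs in local mode and keeps the `mistral_data` collection on disk, so the embeddings survive a kernel restart. The sketch below mimics only that idea by persisting one named collection of vectors as a JSON file; `save_collection`, `load_collection`, and the file layout are hypothetical and are not Qdrant's storage format.

```python
import json
import tempfile
from pathlib import Path


def save_collection(root: Path, name: str, points: dict[int, list[float]]) -> Path:
    """Persist one named collection (id -> vector) as <root>/<name>.json (hypothetical layout)."""
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{name}.json"
    # JSON object keys must be strings, so integer ids are stringified here...
    target.write_text(json.dumps({str(pid): vec for pid, vec in points.items()}))
    return target


def load_collection(root: Path, name: str) -> dict[int, list[float]]:
    """Reload a collection written by save_collection; an unknown name loads as empty."""
    target = root / f"{name}.json"
    if not target.exists():
        return {}
    # ...and converted back to integers here.
    return {int(pid): vec for pid, vec in json.loads(target.read_text()).items()}


# Usage: round-trip a collection through a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "qdrant_data"
    save_collection(root, "mistral_data", {1: [0.1, 0.2], 2: [0.3, 0.4]})
    restored = load_collection(root, "mistral_data")
    missing = load_collection(root, "other_collection")

print(restored, missing)
```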

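```python
# Initialize Ollama and the ServiceContext
# Mistral is served by a locally running Ollama instance (pull it first with `ollama pull mistral`)
llm = Ollama(model="mistral")
# embed_model="local" uses an embedding model that runs on the local machine
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
```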
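`ServiceContext` bundles the LLM with an embedding model. With `embed_model="local"`, the llama_index version used here loads a HuggingFace embedding model on this machine, so the text is embedded locally rather than sent to an external API. An embedding model maps text to a fixed-length vector; a crude stand-in is the hashing trick sketched below. `hash_embed` is hypothetical and captures only word overlap, not meaning, unlike a neural embedding model.

```python
import hashlib
import math
import re


def hash_embed(text: str, dim: int = 16) -> list[float]:
    """Map text to a fixed-length, L2-normalised vector via the hashing trick."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # sha256 gives a bucket that is stable across runs (built-in hash() is salted per process)
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


a = hash_embed("Mistral runs locally")
b = hash_embed("MISTRAL runs LOCALLY")
print(len(a), a == b)
```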


```python
# Create the VectorStoreIndex and the query engine
# from_documents chunks and embeds the documents, then stores the vectors in Qdrant
index = VectorStoreIndex.from_documents(docs, service_context=service_context, storage_context=storage_context)
# The query engine retrieves relevant chunks and streams the LLM's answer
query_engine = index.as_query_engine(streaming=True)
```
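At query time the query engine retrieves the most similar chunks and passes them to the LLM together with the question, and everything must fit in the model's context window. Below is a minimal sketch of greedy context packing, assuming the chunks arrive ranked best first and approximating tokens by whitespace-separated words; `pack_context`, the prompt wording, and `budget_words` are hypothetical and are not LlamaIndex's actual prompt template.

```python
def pack_context(ranked_chunks: list[str], question: str, budget_words: int = 100) -> str:
    """Greedily add chunks (best first) to the prompt until the word budget would be exceeded."""
    header = f"Answer using only the context below.\nQuestion: {question}\nContext:\n"
    used = len(header.split())
    kept: list[str] = []
    for chunk in ranked_chunks:
        cost = len(chunk.split())
        if used + cost > budget_words:
            break  # stop at the first chunk that does not fit, preserving rank order
        kept.append(chunk)
        used += cost
    return header + "\n".join(kept)


prompt = pack_context(["a b c", "d e f g", "h"], "why", budget_words=16)
print(prompt)
```

Stopping at the first chunk that does not fit, rather than skipping it and trying lower-ranked ones, keeps the retrieval ranking intact at the cost of sometimes leaving part of the budget unused.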

```python
# Perform a query; with streaming=True the answer is printed in the next cell
response = query_engine.query("what is the story about?")
```

```python
# Print the streamed response as it is generated
response.print_response_stream()
```

 The story revolves around Elara, a young woman with mysterious magical abilities in the town of Eldoria. She is known for her enchanting beauty and captivating connection to ancient magic. One day, a stranger named Seraphim arrives in Eldoria seeking her help to unlock a long-lost magic that could save his homeland from a devastating curse. Elara and Seraphim embark on a perilous journey together, encountering various trials and mythical creatures along the way. Their bond transcends boundaries as they forge a new understanding of each other's worlds. The story culminates in a final battle against a malevolent being to break the curse, ultimately freeing Seraphim's homeland while Elara seeks to reconcile her altered identity and find a new purpose for her magic.
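With `streaming=True`, `query()` returns a streaming response and `print_response_stream()` prints the answer piece by piece as the LLM generates it, instead of waiting for the whole text. The consuming side of that pattern is a loop over a generator that writes and flushes each token; `fake_token_stream` and `print_stream` are hypothetical stand-ins, not LlamaIndex functions.

```python
import io
import sys
import time
from typing import Iterator, TextIO


def fake_token_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for an LLM that yields its answer one word at a time."""
    for word in text.split():
        if delay:
            time.sleep(delay)  # simulate generation latency between tokens
        yield word + " "


def print_stream(tokens: Iterator[str], out: TextIO = sys.stdout) -> str:
    """Write each token as soon as it arrives, then return the full text."""
    parts = []
    for token in tokens:
        out.write(token)
        out.flush()  # push partial output immediately instead of waiting for a full buffer
        parts.append(token)
    out.write("\n")
    return "".join(parts)


buffer = io.StringIO()
full_text = print_stream(fake_token_stream("Streaming shows tokens as they arrive"), out=buffer)
```

Passing a `StringIO` as `out` captures the stream for inspection; leaving the default writes to the notebook's standard output, which is what a user watching the answer appear would see.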