## Mistral-7B Retrieval Augmented Generation (RAG) ⚙️ 🗃️

As the applications of Large Language Models (LLMs) continue to grow, companies and users are increasingly looking for ways to extract value from their proprietary data with LLMs. However, security and privacy concerns have made companies reluctant to expose sensitive proprietary data to external models.

There are two ways to address this. One is to build an LLM from scratch or fine-tune an open source LLM on the proprietary data, which can be both expensive and time consuming. The other is to build a RAG framework.

Simply put, RAG lets users query a data source and receive a relevant response.
A RAG framework, powered by a large language model (LLM), takes a data source, splits it into chunks, generates embeddings from the chunks, and stores the embeddings in a vector database. At query time it embeds the query, runs a similarity search across the vector database to find the most relevant chunks, and then sends the query together with those chunks to the LLM, which generates a response.
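The retrieval half of that flow can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, the `store` list stands in for a vector database, and the three sample chunks are made up. None of these names come from `llama-index` or `qdrant`.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list, top_k: int = 2) -> list:
    """Return the top_k chunks whose vectors are most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query: str, chunks: list) -> str:
    """Combine the retrieved chunks and the query into one prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up sample chunks; a real pipeline would produce these from the documents.
chunks = [
    "Qdrant stores embeddings and supports similarity search.",
    "Mistral 7B is an open source language model.",
    "The GPU has 24 GB of memory.",
]
# The "vector database": each chunk is stored next to its vector.
store = [(chunk, embed(chunk)) for chunk in chunks]

query = "which database stores embeddings"
top_chunks = retrieve(query, store, top_k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

In the notebook itself, a learned embedding model and Qdrant take over the roles of `embed` and `store`, and the assembled prompt goes to Mistral rather than being printed.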

In [2]:
!nvidia-smi

Mon Jan 15 21:53:52 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA TITAN RTX               Off | 00000000:00:05.0 Off |                  N/A |
|ERR!   40C    P8              N/A /  N/A |   8960MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

#### 1. Import packages

- 🦙 `llama-index` is a framework for fast retrieval and querying of data

- 🗄️ `qdrant` is a vector database and vector similarity search engine for storing, searching and managing embeddings

In [None]:
# Import modules (import paths as used by llama-index releases before 0.10)
from llama_index.llms import Ollama
import qdrant_client
from llama_index import VectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.storage.storage_context import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore

In [None]:
# Load all documents from the data directory
reader = SimpleDirectoryReader(input_dir="/home/ubuntu/Mistral-7B-RAG/data")
docs = reader.load_data()

# Create an on-disk Qdrant client and a vector store backed by it
client = qdrant_client.QdrantClient(path="../data/qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="corpus_data")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
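Loaded documents are generally not embedded whole; they are first split into smaller chunks so that retrieval can return focused passages. The sketch below shows one simple way to do that, using fixed-size word windows with a small overlap. It is an assumption-laden illustration, not the splitter `llama-index` uses internally, and `split_into_chunks` with its `chunk_size` and `overlap` parameters is a hypothetical helper.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = "one two three four five six seven eight nine ten"
pieces = split_into_chunks(sample, chunk_size=4, overlap=1)
for piece in pieces:
    print(piece)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some words twice.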

In [None]:
# Initialize the Mistral model served by Ollama and a ServiceContext that computes embeddings locally
llm = Ollama(model="mistral")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

In [None]:
# Build the vector index from the documents and create a streaming query engine
index = VectorStoreIndex.from_documents(docs, service_context=service_context, storage_context=storage_context)
query_engine = index.as_query_engine(streaming=True)

In [None]:
# Run a query and print the response as it streams in
response = query_engine.query("which of the models performed best")
response.print_response_stream()
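Because the query engine was created with `streaming=True`, the answer arrives piece by piece rather than as one finished string, and `print_response_stream()` prints each piece as it comes. The idea can be sketched with a plain generator. Here `fake_token_stream` is a hypothetical stand-in for the model and `print_stream` is an illustrative helper, neither of which is part of `llama-index`.

```python
from typing import Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM that yields its answer piece by piece."""
    for word in text.split():
        yield word + " "


def print_stream(tokens: Iterator[str]) -> str:
    """Print each token as soon as it arrives and return the full text."""
    collected = []
    for token in tokens:
        # flush=True makes the token visible immediately instead of buffering it.
        print(token, end="", flush=True)
        collected.append(token)
    print()
    return "".join(collected)


full_text = print_stream(fake_token_stream("tokens arrive one at a time"))
```

Streaming does not make generation faster overall, but the reader sees the first words sooner, which matters for longer answers.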

In [None]:
# Run a second query; the response is printed in the next cell
response = query_engine.query("On a scale of 1-10 how well is the document written?")

In [None]:
print(response)