# Advanced RAG using `llama-index`

##### Install dependencies (`llama-index`, `chromadb`, and the Ollama and embedding integrations)

In [None]:
!pip install llama-index chromadb llama-index-vector-stores-chroma llama-index-llms-ollama langchain-community llama-index-embeddings-langchain

### Step 1 Load the External Knowledge Base into a document store

In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("KB").load_data()
len(documents)

85

##### Step 1a Create a **ChromaDB** *(Vector DB)* instance and initialize a `storage_context`

In [3]:
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

chroma_client = chromadb.PersistentClient()
chroma_collection = chroma_client.get_or_create_collection("concepts-in-tamil")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

##### Step 1b Set Ollama as the model provider for both embedding and response generation

In [4]:
# Global settings
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from llama_index.embeddings.langchain import LangchainEmbedding

Settings.llm = Ollama(model='dolphin-mistral:latest', request_timeout=60.0)
lc_ollama_embeddings = OllamaEmbeddings(model="dolphin-mistral:latest")
Settings.embed_model = LangchainEmbedding(lc_ollama_embeddings)

### Step 2 & Step 3 Generate Index based on the knowledge base and store the index in vector DB

In [5]:
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

Building this index took around 4m 27s.
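Because the collection lives in a `chromadb.PersistentClient`, a later session may be able to attach to the stored vectors instead of paying the build cost again. The following is a minimal sketch, not a definitive recipe: it assumes the same collection name and default storage location as Step 1a, and that `Settings.embed_model` has already been configured as in Step 1b. `load_existing_index` is a hypothetical helper name introduced here for illustration.

```python
def load_existing_index(collection_name="concepts-in-tamil"):
    """Attach to an already-populated Chroma collection instead of re-embedding.

    Hypothetical helper: assumes the collection was filled by an earlier
    VectorStoreIndex.from_documents(...) run with the same embedding model.
    """
    # Imports stay inside the helper so that defining it has no side effects.
    import chromadb
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.chroma import ChromaVectorStore

    client = chromadb.PersistentClient()  # same default location as Step 1a
    collection = client.get_or_create_collection(collection_name)
    vector_store = ChromaVectorStore(chroma_collection=collection)
    # Wraps the stored embeddings in an index object; the documents are not re-read.
    return VectorStoreIndex.from_vector_store(vector_store)
```

With the global settings in place, `index = load_existing_index()` would then stand in for the `from_documents` call above.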

### Step 4 & 5 Query the index with the user's search string

In [6]:
query_engine = index.as_query_engine()
response = query_engine.query("What was Microsoft's cloud revenue in fiscal year 2023?")
print(response)

 From the provided context information, we cannot precisely determine Microsoft's total cloud revenue for fiscal year 2023. However, we can find some information related to specific aspects of their business that involve clouds services. For instance, we know they earned "Revenue from Server products and cloud services" which includes Azure and other cloud services as well as their SQL Server, Windows Server, Visual Studio, System Center, and GitHub.

In the 2023 fiscal year, the revenue generated from this segment is reported to be $146,052 million. Although it does not explicitly mention "cloud revenue," a significant portion of this revenue comes from cloud-related offerings. In order to pinpoint exactly how much of this total revenue can be attributed specifically to their Azure and other cloud services, you would need more detailed financial data which is beyond the scope of provided context information.


### Step 6 Customization if required

##### To get improved results, we can customize the index retriever to fetch more context and also ask the LLM to summarize the answer more thoroughly.
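To see why fetching more context can change the answer, here is a toy sketch of top-k retrieval by cosine similarity. It is an illustration only, not LlamaIndex's internals: the chunk names and 2-D vectors are made up, and `cosine_similarity` and `top_k` are hypothetical helpers. The `similarity_top_k` parameter plays the role of `k` here; as far as I know it defaults to 2 in LlamaIndex.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k):
    """Return the names of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunks,
        key=lambda name: cosine_similarity(query_vec, chunks[name]),
        reverse=True,
    )
    return ranked[:k]

# Made-up 2-D "embeddings", for illustration only.
toy_chunks = {
    "cloud revenue note": [0.9, 0.1],
    "segment table": [0.8, 0.3],
    "linkedin revenue": [0.5, 0.5],
    "legal proceedings": [0.1, 0.9],
}
query = [1.0, 0.0]

narrow = top_k(query, toy_chunks, k=2)  # a small context window
wide = top_k(query, toy_chunks, k=3)    # one more chunk reaches the LLM
print(narrow)
print(wide)
```

A larger `k` lets lower-ranked chunks into the prompt, which helps when the passage holding the answer is not among the very top matches, at the cost of a longer prompt.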

In [9]:
query_engine = index.as_query_engine(similarity_top_k=5, response_mode="tree_summarize")
response = query_engine.query("What was Microsoft's cloud revenue in fiscal year 2023?")
print(response)

 According to the provided information, Microsoft's cloud revenue in fiscal year 2023 was $111.6 billion. This amount is primarily included in Server products and cloud services, Office products and cloud services, LinkedIn, and Dynamics in the table above. To be more specific, this revenue figure is found on page 77 of the annual report under the section titled "Revenue from LinkedIn, including Talent Solutions, Marketing Solutions, Premium Subscriptions, and Sales Solutions," where it states that Microsoft's "Microsoft Cloud revenue... for fiscal year 2023 was $111.6 billion."
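The `tree_summarize` response mode used above can be pictured as a bottom-up reduction over the retrieved chunks. The sketch below is a simplified illustration under stated assumptions, not the library's implementation: `tree_summarize`, `fake_summarize`, and the `fan_in` parameter are hypothetical names, and the stand-in summarizer only joins strings where a real pipeline would call the LLM.

```python
def tree_summarize(chunks, summarize, fan_in=2):
    """Summarize groups of `fan_in` texts repeatedly until one text remains.

    Assumes `chunks` is non-empty and `fan_in` is at least 2.
    """
    level = list(chunks)
    while len(level) > 1:
        level = [
            summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]

def fake_summarize(texts):
    # Stand-in for an LLM call: joins the inputs so the tree shape stays visible.
    return "(" + " + ".join(texts) + ")"

# Five retrieved chunks, matching similarity_top_k=5 in the cell above.
result = tree_summarize(["c1", "c2", "c3", "c4", "c5"], fake_summarize)
print(result)  # (((c1 + c2) + (c3 + c4)) + ((c5)))
```

Each pair of parentheses marks one summarization call, so the final answer is built from intermediate summaries rather than from a single pass over all five chunks.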


For more customization options, see the [LlamaIndex customization guide](https://docs.llamaindex.ai/en/stable/getting_started/customization.html).