# RAG Process - Inference (New version)

This notebook shows how to use Retrieval Augmented Generation (RAG) on the Domino platform to answer questions about information that OpenAI's models were not trained on and therefore cannot answer out of the box. It also demonstrates the following enhancements:

1. During the “retrieve” step, Domino’s Vector Access Layer enforces enterprise-ready credential management and security procedures, records audit-trail entries, and tracks the data’s lineage.
1. In the “augment” step, the text chunks retrieved from the Vector Database (found by comparing the embedded question with the stored embeddings) are inserted into the prompt alongside the user's question, and the prompt is sent to the LLM.
1. In the “generate” step, the LLM answers from the augmented prompt, and the workflow passes its response back to the user. Routing the call through the new Domino AI Gateway lets enterprises use prompts while maintaining data governance, and grounding the answer in retrieved context reduces the likelihood of hallucinations.
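The three steps above can be sketched in plain Python. This is a toy, not Domino's or LangChain's implementation: a word-overlap score stands in for embedding similarity, the chunks are short excerpts of the financial filing retrieved later in this notebook, and `retrieve`, `augment`, and `generate` are hypothetical helpers (`generate` is a stub where the notebook calls an AI Gateway chat endpoint).

```python
# Toy retrieve -> augment -> generate loop (hypothetical, for illustration only).
# Word overlap stands in for vector similarity; generate() is a stub LLM.

DOCS = [
    "Total gross margin percentage 44.1 % in 2023",
    "Services gross margin 60,345 in 2023",
    "Products gross margin decreased during 2023 compared to 2022",
]


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k chunks that share the most words with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]


def augment(question: str, chunks: list[str]) -> str:
    """Insert the retrieved *text* (not its embeddings) into the prompt."""
    context = "\n\n".join(chunks)
    return f"Question: {question}\n=========\n{context}\n=========\nAnswer in Markdown:"


def generate(prompt: str) -> str:
    """Hypothetical stub: a real workflow sends `prompt` to a chat model endpoint."""
    return f"[model answer based on a {len(prompt)}-character prompt]"


question = "What was the total gross margin percentage in 2023?"
chunks = retrieve(question, DOCS)
prompt = augment(question, chunks)
print(chunks[0])  # Total gross margin percentage 44.1 % in 2023
print(generate(prompt))
```

The key point the sketch makes is that embeddings are only used to *find* relevant chunks; the LLM itself receives plain text.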

### Load the needed libraries

In [17]:
import os
import warnings
from getpass import getpass  # only needed if you uncomment the API-key prompts below

from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
from langchain_community.chat_models import ChatMlflow
from langchain_community.embeddings import MlflowEmbeddings
from langchain_pinecone import PineconeVectorStore
from pinecone import Pinecone
from domino_data.vectordb import domino_pinecone3x_init_params, domino_pinecone3x_index_params

warnings.filterwarnings('ignore')

### Load environment variables

Notice that the API keys you would normally need are commented out. Because model access goes through Domino AI Gateway endpoints and vector store access goes through Domino Vector Data Sources, the API keys are stored and managed securely by Domino administrators.

In [2]:

#os.environ['OPENAI_API_KEY'] = getpass("Enter OpenAI API key:")
#os.environ['PINECONE_API_KEY'] = getpass("Enter Pinecone API key:")

#OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') 
#PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
PINECONE_ENV = os.getenv('PINECONE_API_ENV')
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

### Create an embeddings object that embeds queries through a Domino AI Gateway endpoint in LangChain

In [3]:
embeddings = MlflowEmbeddings(
    target_uri=os.environ["DOMINO_MLFLOW_DEPLOYMENTS"],
    endpoint="embedding-ada-002ja2",
)

### Initialize the Pinecone vector store using a Domino Vector Data Source

In [4]:
# Domino Vector Data Source name
datasource_name = "mrag-fin-docs-ja"
# Load the Domino Pinecone Data Source configuration
pc = Pinecone(**domino_pinecone3x_init_params(datasource_name))

# Load the Pinecone index through the Data Source
index_name = "mrag-fin-docs"
index = pc.Index(**domino_pinecone3x_index_params(datasource_name, index_name))
text_field = "text"  # metadata key that holds each chunk's raw text
vectorstore = PineconeVectorStore(
    index=index,
    embedding=embeddings,  # queries are embedded via the Domino AI Gateway endpoint
    text_key=text_field,
)
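The `text_field` argument names the metadata key under which each chunk's raw text was stored at ingestion time. At query time the vector store pulls that text back out of each match and wraps it in a document, skipping matches that lack it. The sketch below is a simplified approximation of that step, not the library's code; `Doc`, `match_to_doc`, and the sample match are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (hypothetical)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def match_to_doc(match: dict, text_key: str = "text") -> Optional[Doc]:
    """Turn one query match into a Doc, or return None if it has no stored text."""
    metadata = dict(match.get("metadata", {}))  # copy so the match is not mutated
    if text_key not in metadata:
        return None  # nothing to put into the prompt for this match
    text = metadata.pop(text_key)
    return Doc(page_content=text, metadata=metadata)


# Hypothetical match, shaped like a Pinecone query result
match = {
    "id": "chunk-42",
    "score": 0.87,
    "metadata": {"text": "Total gross margin $ 169,148", "page": 12},
}
doc = match_to_doc(match)
print(doc.page_content)  # Total gross margin $ 169,148
print(doc.metadata)      # {'page': 12}
```

If ingestion stored the text under a different key, `text_field` must be changed to match, otherwise every match is skipped and the prompt receives no context.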

### Check the index's current stats as a simple checkpoint

You'll see that the index reports a ```total_vector_count```, which is the number of vectors currently stored in it.

In [5]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 361}},
 'total_vector_count': 361}
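Two checks are worth making before querying: the index is not empty, and its dimension matches the embedding endpoint's output size, since Pinecone rejects query vectors of the wrong length. A minimal sketch, assuming the stats have been copied into a plain dict and that the embedding endpoint serves `text-embedding-ada-002` (1,536 dimensions, consistent with the stats above); `check_index` and `EXPECTED_DIM` are hypothetical names.

```python
# The describe_index_stats() output shown above, copied into a plain dict
stats = {
    "dimension": 1536,
    "index_fullness": 0.0,
    "namespaces": {"": {"vector_count": 361}},
    "total_vector_count": 361,
}

# Assumption: the embedding endpoint wraps text-embedding-ada-002 (1,536 dimensions)
EXPECTED_DIM = 1536


def check_index(stats: dict, expected_dim: int) -> int:
    """Raise if the index looks unusable for querying; return its vector count."""
    if stats["dimension"] != expected_dim:
        raise ValueError(f"index dimension {stats['dimension']} != embedding size {expected_dim}")
    total = stats["total_vector_count"]
    if total == 0:
        raise ValueError("index is empty; populate it before running queries")
    per_namespace = sum(ns["vector_count"] for ns in stats["namespaces"].values())
    if per_namespace != total:
        raise ValueError("namespace counts do not add up to total_vector_count")
    return total


print(check_index(stats, EXPECTED_DIM))  # 361
```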

### Create the Prompt Template

In [6]:
prompt_template = """You are an AI assistant with expertise in financial analysis. You are given the following extracted parts of a document and a question.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about financial analysis, politely inform the user that you are tuned to only answer questions pertaining to financial analysis.
Question: {question}
=========
{context}
=========
Answer in Markdown:
"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["question", "context"])
# Pass the custom prompt to the chain's "stuff" step
chain_type_kwargs = {"prompt": PROMPT}
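With `chain_type="stuff"`, the chain places all retrieved chunks into the single `{context}` slot and sends one prompt to the model. The sketch below approximates that formatting with `str.format`; `TEMPLATE` is a shortened copy of `prompt_template`, `stuff_prompt` is a hypothetical helper, and the blank-line separator between chunks is an assumption about LangChain's default rather than a confirmed detail.

```python
# Shortened copy of prompt_template; the placeholders match the notebook's
TEMPLATE = """You are an AI assistant with expertise in financial analysis.
Question: {question}
=========
{context}
=========
Answer in Markdown:
"""


def stuff_prompt(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Join every retrieved chunk into {context} and fill the template."""
    return TEMPLATE.format(question=question, context=separator.join(chunks))


chunks = [
    "Total gross margin $ 169,148 $ 170,782 $ 152,836",
    "Total gross margin percentage 44.1 % 43.3 % 41.8 %",
]
prompt = stuff_prompt("What was the gross margin in FY23?", chunks)
print(prompt)
```

Because every chunk goes into one prompt, the number of retrieved chunks and their size together must fit within the model's context window.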

### Use the Domino AI Gateway Endpoint via the LangChain ChatMlflow object

To use a different model, change the ```endpoint``` parameter to the name of another AI Gateway endpoint.

In [11]:
rag_llm = ChatMlflow(
    target_uri=os.environ["DOMINO_MLFLOW_DEPLOYMENTS"],
    endpoint="chat-gpt4-ja",  # AI Gateway endpoint name
    temperature=1.0,  # lower values (e.g. 0.0) give more repeatable answers
)

### Instantiate the RetrievalQA chain to answer questions from the embedded data in the vector store

In [12]:
qa_chain = RetrievalQA.from_chain_type(
    llm=rag_llm,  # AI Gateway endpoint
    chain_type="stuff",  # put all retrieved chunks into a single prompt
    chain_type_kwargs=chain_type_kwargs,  # custom prompt defined above
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),  # top 5 chunks from the Domino Pinecone Data Source
    return_source_documents=True,
)
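`search_kwargs={"k": 5}` asks the retriever for the five stored chunks whose embeddings are most similar to the query embedding. The sketch below shows that selection by brute force over toy 3-dimensional vectors, assuming cosine similarity; the real index's metric is fixed when the Pinecone index is created, and a managed index uses its own search structures rather than a full scan. `cosine`, `top_k`, and the vectors are hypothetical.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], stored: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query, best first."""
    best = heapq.nlargest(k, stored, key=lambda item: cosine(query, item[1]))
    return [chunk_id for chunk_id, _ in best]


# Toy 3-d "embeddings" (ada-002 vectors have 1,536 dimensions)
stored = [
    ("a", [1.0, 0.0, 0.0]),
    ("b", [0.9, 0.1, 0.0]),
    ("c", [0.2, 1.0, 0.0]),
    ("d", [0.0, 0.0, 1.0]),
    ("e", [0.7, 0.7, 0.0]),
    ("f", [0.3, 0.0, 0.9]),
    ("g", [-1.0, 0.0, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], stored, k=5))  # ['a', 'b', 'e', 'f', 'c']
```

A larger `k` gives the model more context but also more irrelevant text; the least similar chunks still make it into the prompt.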

### Ask a question about the documents and run the chain

In [14]:
user_question = input("Please ask your financial analysis question:")
# RetrievalQA expects its input under the "query" key
result = qa_chain.invoke({"query": user_question})

Please ask your financial analysis question: What was the gross income amount and percentage as share of total revenues in FY23


### Retrieve the result

In [15]:
result['result']

'The gross income amount for FY23 was $169,148 million. This represented 44.1% as a share of total revenues in the same fiscal year.'

### Display Source Documents retrieved from the vector store and used for the answer

In [16]:
result['source_documents'][0].page_content

'Gross Margin\nProducts and Services gross margin and gross margin percentage for 2023, 2022 and 2021 were as follows (dollars in millions):\n2023 2022 2021\nGross margin:\nProducts $ 108,803 $ 114,728 $ 105,126 \nServices 60,345 56,054 47,710 \nTotal gross margin $ 169,148 $ 170,782 $ 152,836 \nGross margin percentage:\nProducts 36.5 % 36.3 % 35.3 %\nServices 70.8 % 71.7 % 69.7 %\nTotal gross margin percentage 44.1 % 43.3 % 41.8 %\nProducts Gross Margin\nProducts gross margin decreased during 2023 compared to 2022 due to the weakness in foreign currencies relative to the U.S. dollar and lower Products\nvolume, partially of fset by cost savings and a dif ferent Products mix.\nProducts gross margin percentage increased during 2023 compared to 2022 due to cost savings and a different Products mix, partially offset by the weakness in\nforeign currencies relative to the U.S. dollar and decreased leverage.\nServices Gross Margin'
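The answer calls the figure “gross income”, while the retrieved chunk reports it as total gross margin. A quick arithmetic check, using only figures copied from the chunk above, confirms the answer is consistent with its source: the segment margins add up to the total, and backing out each segment's revenue from its margin percentage reproduces the 44.1% total. Because the percentages are rounded to one decimal, the implied revenue is only approximate.

```python
# Figures copied from the retrieved chunk above (dollars in millions, 2023)
products_gm, services_gm = 108_803, 60_345
products_pct, services_pct = 0.365, 0.708  # segment gross margin percentages

total_gm = products_gm + services_gm  # should match "Total gross margin $ 169,148"

# revenue = gross margin / gross margin percentage, summed over both segments
implied_revenue = products_gm / products_pct + services_gm / services_pct
total_pct = total_gm / implied_revenue

print(total_gm, round(total_pct * 100, 1))  # 169148 44.1
```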