* Name: Arindam Choudhury

> # 3. NvidiaLLM_main

> #### Install the following libraries to run this notebook

In [1]:
# !pip install langchain
# !pip install gradio
# !pip install -qU langchain-openai
# !pip install langchain_google_genai
# !pip install faiss-cpu
# !pip install sentence-transformers

> #### Import Necessary Libraries

In [2]:
from langchain_google_genai import GoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.embeddings import HuggingFaceEmbeddings, GooglePalmEmbeddings
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
from langchain_openai import OpenAI, OpenAIEmbeddings
import pandas as pd
import gradio as gr
import os
from IPython.display import Markdown, display

> #### Google API Key Setup Guide

##### You need a Google API key to use the Google PaLM generative LLM and the Google embeddings

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

<a class="button button-primary" href="https://makersuite.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In Colab, add the key to the secrets manager under the "🔑" in the left panel. Give it the name `GOOGLE_API_KEY`.

Once you have the API key, pass it to the SDK. You can do this in two ways:

* Put the key in the `GOOGLE_API_KEY` environment variable (the SDK will automatically pick it up from there).
* Pass the key to `genai.configure(api_key=...)`

To read the key in your own code, use `os.getenv('GOOGLE_API_KEY')` to fetch the environment variable.
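
As a minimal sketch of the environment-variable route, the key can be read once and checked before any client is created. The helper name `load_api_key` is hypothetical (it is not part of any SDK); it only wraps `os.getenv` and fails early when the variable is missing.

```python
import os


def load_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the environment variable `name`.

    Hypothetical helper for illustration: raises if the variable is unset or empty.
    """
    key = os.getenv(name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {name} is not set")
    return key
```

With the variable set, `GOOGLE_API_KEY = load_api_key()` could stand in for a hard-coded string, which keeps the key out of the saved notebook.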

> #### Connect Google Drive

In [3]:
# from google.colab import userdata, drive
# drive.mount('/content/drive', force_remount=True)

In [4]:
GOOGLE_API_KEY               = 'AI-'
os.environ["OPENAI_API_KEY"] = 'sk-K'
#save_data_folder             = '/content/drive/MyDrive/Arindam_Choudhury_NvidiaLLM/NVIDIA_DATA_FOLDER/'
save_data_folder             = './NVIDIA_DATA_FOLDER/'
vactor_db_name_MiniLM        = 'FIAS_INDEX_VDB_MiniLM'
vactor_db_name_mpnet         = 'FIAS_INDEX_VDB_mpnet'
vactor_db_name_google        = 'FIAS_INDEX_VDB_google'
vactor_db_name_G_Palm        = 'FIAS_INDEX_VDB_G_Palm'
vactor_db_name_openai        = 'FULL_FIAS_INDEX_VDB_openai'
Sample_Question              = 'Sample_Question.csv'

> #### Load Sample Questions file (To test the model)

In [5]:
QA_df = pd.read_csv(save_data_folder + Sample_Question)
QA_df.drop("Unnamed: 0", axis=1, inplace=True)

> #### Define LLM Models and Embeddings

In [6]:
LLM_PaLM2_text  = GoogleGenerativeAI(model="models/text-bison-001", google_api_key=GOOGLE_API_KEY, temperature=1)
#LLM_OpenAI      = OpenAI(model_name="gpt-3.5-turbo-instruct")
LLM_Gemini_pro  = GoogleGenerativeAI(model="gemini-pro", google_api_key=GOOGLE_API_KEY, temperature=1)
LLM_Gemini_pro1 = GoogleGenerativeAI(model="gemini-1.0-pro", google_api_key=GOOGLE_API_KEY, temperature=1)

In [7]:
EMBEDDING_MiniLM = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-l6-v2", multi_process=True, model_kwargs={'device': 'cpu'})
EMBEDDING_mpnet  = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2", multi_process=True, model_kwargs={'device': 'cpu'})
EMBEDDING_Google = GoogleGenerativeAIEmbeddings(model="models/embedding-001", task_type = "RETRIEVAL_QUERY", google_api_key=GOOGLE_API_KEY)
EMBEDDING_G_PALM = GooglePalmEmbeddings(model="models/embedding-gecko-001", google_api_key=GOOGLE_API_KEY)
#EMBEDDING_OpenAI = OpenAIEmbeddings(model="text-embedding-3-small")


> #### SET MODEL AND EMBEDDING TO USE

In [8]:
LLM = LLM_PaLM2_text
EMBEDDING = EMBEDDING_MiniLM

if EMBEDDING == EMBEDDING_MiniLM:
    vactor_db_name = vactor_db_name_MiniLM; SEARCH_TYPE = 'mmr';        KWARGS_DOC = 4
if EMBEDDING == EMBEDDING_mpnet:
    vactor_db_name = vactor_db_name_mpnet;  SEARCH_TYPE = 'similarity'; KWARGS_DOC = 4
if EMBEDDING == EMBEDDING_Google:
    vactor_db_name = vactor_db_name_google; SEARCH_TYPE = 'mmr';        KWARGS_DOC = 4
if EMBEDDING == EMBEDDING_G_PALM:
    vactor_db_name = vactor_db_name_G_Palm; SEARCH_TYPE = 'mmr';        KWARGS_DOC = 4
# if EMBEDDING == EMBEDDING_OpenAI:
#     vactor_db_name = vactor_db_name_openai; SEARCH_TYPE = 'similarity'; KWARGS_DOC = 4

VECTORDB  = FAISS.load_local(save_data_folder + vactor_db_name, EMBEDDING)
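
The chain of `if` statements above leaves `vactor_db_name` undefined when no branch matches. One possible alternative, sketched here under the assumption that each embedding is selected by a short string label (the labels and the `settings_for` helper are invented for this example), is a lookup table that fails loudly on an unknown choice. The index names and search settings are copied from the cells above.

```python
# Per-embedding retrieval settings, copied from the if-chain in the cell above.
# The short keys ("MiniLM", "mpnet", ...) are labels invented for this sketch.
EMBEDDING_SETTINGS = {
    "MiniLM": {"db_name": "FIAS_INDEX_VDB_MiniLM", "search_type": "mmr", "k": 4},
    "mpnet": {"db_name": "FIAS_INDEX_VDB_mpnet", "search_type": "similarity", "k": 4},
    "google": {"db_name": "FIAS_INDEX_VDB_google", "search_type": "mmr", "k": 4},
    "G_Palm": {"db_name": "FIAS_INDEX_VDB_G_Palm", "search_type": "mmr", "k": 4},
}


def settings_for(embedding_key: str) -> dict:
    """Look up the settings for one embedding, raising on an unknown key."""
    try:
        return EMBEDDING_SETTINGS[embedding_key]
    except KeyError:
        raise ValueError(f"Unknown embedding: {embedding_key!r}") from None


settings = settings_for("MiniLM")
print(settings["db_name"], settings["search_type"], settings["k"])
```

The returned values would then feed `FAISS.load_local` and `as_retriever` in the same way as `vactor_db_name`, `SEARCH_TYPE`, and `KWARGS_DOC` do in this notebook.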

> #### Question Answer PROMPT

In [9]:
prompt_template = """

Your task is to answer the Question based only on the Context documents.

While answering the question, take your time to read the Context documents and provide as much detail as possible from the CONTEXT.

Do NOT use any salutation (e.g. Hi, Hey, Hello) or the name of a person in the answer.

Do NOT add URL links at the beginning of the answer and do NOT add README file or GitHub links in the answer.

Always format the answer for better readability when applicable.

If you are unable to answer the Question from the Context documents, say that you don't know the answer and do not make up an answer.

Context: {context}

Question: {input} """
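
To illustrate how the two placeholders are filled, here is a rough sketch that uses plain `str.format` as a stand-in for LangChain's `PromptTemplate`. The shortened template and the two context strings are placeholders made up for this example, and joining the chunks with a blank line is an assumption about how the documents are combined.

```python
# Shortened stand-in for the notebook's prompt; only the two placeholders matter here.
template = "Context: {context}\n\nQuestion: {input}"

# Placeholder text standing in for the documents returned by the retriever.
retrieved_chunks = [
    "RAPIDS is a suite of open-source libraries.",
    "RAPIDS is built on CUDA.",
]

# Assumption: the retrieved documents are joined with a blank line between them.
filled = template.format(
    context="\n\n".join(retrieved_chunks),
    input="What is NVIDIA RAPIDS?",
)
print(filled)
```

The model only ever sees the filled string, so anything missing from the retrieved context is unavailable to it when answering.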

> #### Function: Create Retrieval chain using LangChain

In [10]:
def get_qa_chain(model, prompt):
    # Report what the chain is built from
    print("Vector DB size: ", VECTORDB.index.ntotal)
    print("LLM :", model)
    print("Embedding :", EMBEDDING)

    # Retriever over the FAISS index; fetches KWARGS_DOC documents per query
    retriever = VECTORDB.as_retriever(search_type=SEARCH_TYPE, search_kwargs={"k": KWARGS_DOC})
    # Prompt with {context} (retrieved documents) and {input} (user question)
    qa_chain_prompt    = PromptTemplate(template = prompt, input_variables = ["context", "input"])
    # Put all retrieved documents into the prompt, then connect retriever and LLM
    combine_docs_chain = create_stuff_documents_chain(model, qa_chain_prompt)
    retrieval_chain    = create_retrieval_chain(retriever, combine_docs_chain)
    return retrieval_chain

retrieval_chain  = get_qa_chain(LLM, prompt_template)

Vector DB size:  808127
LLM : GoogleGenerativeAI
Params: {'model': 'models/text-bison-001', 'temperature': 1.0, 'top_p': None, 'top_k': None, 'max_output_tokens': None, 'candidate_count': 1}
Embedding : client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  (2): Normalize()
) model_name='sentence-transformers/all-MiniLM-l6-v2' cache_folder=None model_kwargs={'device': 'cpu'} encode_kwargs={} multi_process=True show_progress=False


> #### Function: Invoke LLM to Get the Answer

In [11]:
def get_ans_from_llm(retrieval_chain, question):
    result = retrieval_chain.invoke({"input": question})
    return result['answer']
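
The function above relies on the chain returning a dictionary with an `answer` key. A small sketch with a stub chain shows that contract without calling any LLM. `FakeRetrievalChain` is a hypothetical test double, and the `input`/`context`/`answer` layout is an assumption about the result shape, based on the `result['answer']` lookup used in this notebook.

```python
class FakeRetrievalChain:
    """Stand-in for the LangChain retrieval chain; returns a dict with an 'answer' key."""

    def invoke(self, inputs: dict) -> dict:
        question = inputs["input"]
        return {
            "input": question,
            "context": [],  # a real chain would put the retrieved documents here
            "answer": f"(stub answer to: {question})",
        }


def get_ans_from_llm(retrieval_chain, question):
    # Same logic as the notebook function: invoke the chain, keep only the answer text.
    result = retrieval_chain.invoke({"input": question})
    return result["answer"]


print(get_ans_from_llm(FakeRetrievalChain(), "What is NVIDIA RAPIDS?"))
```

A stub like this can be handy for checking the Gradio wiring before spending API quota on real calls.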

> #### TESTING CELL

In [12]:
# import langchain
# langchain.debug = True
question = "What is NVIDIA RAPIDS?"
# question = "What is the NVIDIA CUDA Toolkit?"
# question = "How can I install the NVIDIA CUDA Toolkit on Windows?"
# question = "What is Megatron 530B LLM?"
# question = "Why do I need nvCOMP"
#question = "What is DeepStream SDK"
result = retrieval_chain.invoke({"input": question})
Markdown(result["answer"])

RAPIDS™ is a suite of open-source libraries that accelerate end-to-end data science and analytics pipelines, delivering up to 10x faster performance for data preparation, machine learning inference and training, and graph analytics. Built on CUDA®, RAPIDS provides a unified experience for data scientists, engineers, and analysts to accelerate their workflows on NVIDIA GPUs.

> #### Print Answers from the Sample Question File

In [13]:
# for i, QA in QA_df.iterrows():
#     result = retrieval_chain.invoke({"input": QA["Question"]})
#     QA_df.iloc[i,1] = result["answer"]

#     format_print = f"""
#     {i+1}. Question: {QA['Question']}
#     Answer: {result['answer']}
#     {'-' * 100} """
#     print(format_print)
#     break

> #### Do you want to save these Answers?

In [14]:
# QA_df.to_csv(save_data_folder + Sample_Question, columns=["Question", "Answer"])
#QA_df

> #### Function: Gradio URL for Interactive Tool

In [1]:
FAQs = [
    "What is NVIDIA?",
    "What is NVIDIA RAPIDS?",
    "What is the NVIDIA CUDA Toolkit?",
    "How can I install the NVIDIA CUDA Toolkit on Windows?",
    "What is the difference between NVIDIA's BioMegatron and Megatron 530B LLM?",
    "What is Megatron 530B LLM?",
    "What is BioMegatron?",
    "Why do I need nvCOMP?",
    "What is the difference between a GPU and a CPU?",
    "Why do I need a GPU?",
]

def greet(question):
    return get_ans_from_llm(retrieval_chain, question)

with gr.Blocks() as demo:
    response  = gr.Textbox(label="Answer:")
    question  = gr.Textbox(label="Question:")
    image = save_data_folder + "nvidia.png"
    gr.Image(image)
    gr.Markdown("Ask me a question related to NVIDIA and I'll try to answer by referring to the SDKs.")
    iface = gr.Interface(fn=greet, inputs=question, outputs=response, theme=gr.themes.Glass(),
                examples=FAQs,allow_flagging="never")
demo.launch(share=True)

NameError: name 'gr' is not defined