# NVIDIA AI Foundation Endpoints 

> [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) give users easy access to NVIDIA hosted API endpoints for NVIDIA AI Foundation Models like Mixtral 8x7B, Llama 2, Stable Diffusion, etc. These models, hosted on the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), are optimized, tested, and hosted on the NVIDIA AI platform, making them fast and easy to evaluate, further customize, and seamlessly run at peak performance on any accelerated stack.
> 
> With [NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/), you can get quick results from a fully accelerated stack running on [NVIDIA DGX Cloud](https://www.nvidia.com/en-us/data-center/dgx-cloud/). Once customized, these models can be deployed anywhere with enterprise-grade security, stability, and support using [NVIDIA AI Enterprise](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/).
> 
> These models can be easily accessed via the [`langchain-nvidia-ai-endpoints`](https://pypi.org/project/langchain-nvidia-ai-endpoints/) package, as shown below.

This example goes over how to use LangChain to interact with the supported [NVIDIA Reranker Model](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/nvolve-40k) for [retrieval-augmented generation](https://developer.nvidia.com/blog/build-enterprise-retrieval-augmented-generation-apps-with-nvidia-retrieval-qa-embedding-model/) via the `NVIDIAEmbeddings` class.

For more information on accessing the chat models through this api, check out the [ChatNVIDIA](../chat/nvidia_ai_endpoints) documentation.

## Installation

In [8]:
# %pip install --upgrade --quiet langchain-nvidia-ai-endpoints
# %pip install --upgrade --quiet langchain langchain-community langchain-text-splitters
# %pip install --upgrade --quiet faiss-cpu

## Setup

**To get started:**

1. Create a free account with the [NVIDIA NGC](https://catalog.ngc.nvidia.com/) service, which hosts AI solution catalogs, containers, models, etc.

2. Navigate to `Catalog > AI Foundation Models > (Model with API endpoint)`.

3. Select the `API` option and click `Generate Key`.

4. Save the generated key as `NVIDIA_API_KEY`. From there, you should have access to the endpoints.

In [9]:
import getpass
import os

## API Key can be found by going to NVIDIA NGC -> AI Foundation Models -> (some model) -> Get API Code or similar.
## 1K free queries to any endpoint.

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset") 
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

Valid NVIDIA_API_KEY already in environment. Delete to reset


## Initialization

Let's list out some of the models we will be using for this notebook:

In [10]:
from langchain_nvidia_ai_endpoints import (
    ChatNVIDIA,
    NVIDIAEmbeddings,
    NVIDIARerank,
)

all_models = ChatNVIDIA.get_available_models(list_all=True)

[m for m in all_models if m.client in ("NVIDIARerank", "NVIDIAEmbeddings") or "mixtral" in m.id.lower()]

[Model(id='ai-mixtral-8x7b-instruct', model_type='chat', api_type=None, kwargs={'model_name': 'mistralai/mixtral-8x7b-instruct-v0.1', 'max_tokens': 1024}, client='ChatNVIDIA', path='a1e53ece-bff4-44d1-8b13-c009e5bf47f6'),
 Model(id='mixtral_8x7b', model_type='chat', api_type='aifm', kwargs={}, client='ChatNVIDIA', path='8f4118ba-60a8-4e6b-8574-e38a4067a4a3'),
 Model(id='ai-embed-qa-4', model_type='embedding', api_type=None, kwargs={'model_name': 'NV-Embed-QA'}, client='NVIDIAEmbeddings', path='09c64e32-2b65-4892-a285-2f585408d118'),
 Model(id='nvolveqa_40k', model_type='embedding', api_type='aifm', kwargs={}, client='NVIDIAEmbeddings', path='091a03bb-7364-4087-8090-bd71e9277520'),
 Model(id='ai-rerank-qa-mistral-4b', model_type='ranking', api_type=None, kwargs={'model_name': 'nv-rerank-qa-mistral-4b:1'}, client='NVIDIARerank', path='0bf77f50-5c35-4488-8e7a-f49bb1974af6')]

Among the list above, we should be able to see the following models:
- `ai-mixtral-8x7b-instruct`: A NIM-containerized Mixtral-8x7b model which we will use as our LLM backbone via `ChatNVIDIA`.
- `ai-embed-qa-4`: A NIM-containterized query-answer embedding model based on the e5-large architecture which we will use to generate embeddings via `NVIDIAEmbeddings`.
- `ai-rerank-qa-mistral-4b`: A NIM-containerized mistral-backed question-answer reranking model which we will use to rank question-answer pairs via `NVIDIARerank`.

In this notebook, we will focus on the **Reranking Model** which evaluates the relevance of passages in making decisions about a query. They are a common component of a retrieval-augmented generation pipeline and allow you to access quick relevance scores to help rank, order, filter, or otherwise process your retrieval. 

Let's initialize these models for use later:

In [11]:
from langchain_nvidia_ai_endpoints import NVIDIARerank

llm = ChatNVIDIA(model="ai-mixtral-8x7b-instruct")
embedder = NVIDIAEmbeddings(model="ai-embed-qa-4")
reranker = NVIDIARerank(model="ai-rerank-qa-mistral-4b")

In [12]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

documents = TextLoader("../../modules/state_of_the_union.txt",).load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
for idx, text in enumerate(texts):
    text.metadata["id"] = idx

retriever = FAISS.from_documents(texts, embedder).as_retriever(search_kwargs={"k": 10})

query = "What did the president say about Ketanji Brown Jackson"
docs = retriever.get_relevant_documents(query)

print("\nDoc Snippets:")
for doc in docs:
    print(repr(doc.page_content[:100])+"...")
    print({k:v for k,v in doc.dict().items() if k != "page_content"})


Doc Snippets:
'One of the most serious constitutional responsibilities a President has is nominating someone to ser'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 73}, 'type': 'Document'}
'As I said last year, especially to our younger transgender Americans, I will always have your back a'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 79}, 'type': 'Document'}
'And I know you’re tired, frustrated, and exhausted. \n\nBut I also know this. \n\nBecause of the progres'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 55}, 'type': 'Document'}
'A former top litigator in private practice. A former federal public defender. And from a family of p'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 74}, 'type': 'Document'}
'This was a bipartisan effort, and I want to thank the members of both parties who worked to make it '...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 2

In [13]:
top_docs = reranker.compress_documents(docs, query, top_k=5)

print("Query:", query)

print("\nMost Relevant Chunks:")
for doc in top_docs:
    print(repr(doc.page_content[:100])+"...")
    print({k:v for k,v in doc.dict().items() if k != "page_content"})

print("\n'Relevant' Documents:")
for doc in top_docs:
    if doc.metadata.get('relevance_score') > 0:
        print(doc.page_content)

Query: What did the president say about Ketanji Brown Jackson

Most Relevant Chunks:
'One of the most serious constitutional responsibilities a President has is nominating someone to ser'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 73, 'relevance_score': 0.43701171875}, 'type': 'Document'}
'A former top litigator in private practice. A former federal public defender. And from a family of p'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 74, 'relevance_score': -7.38671875}, 'type': 'Document'}
'I spoke with their families and told them that we are forever in debt for their sacrifice, and we wi'...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 67, 'relevance_score': -14.3359375}, 'type': 'Document'}
'He will never extinguish their love of freedom. He will never weaken the resolve of the free world. '...
{'metadata': {'source': '../../modules/state_of_the_union.txt', 'id': 18, 'relevance_score': -14.7109375}, '

In [14]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.chains import RetrievalQA

compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "What did the president say about Ketanji Jackson Brown"
)
print("Most Relevant Documents:", [doc.metadata["id"] for doc in compressed_docs])

chain = RetrievalQA.from_chain_type(llm=llm, retriever=compression_retriever)
chain.invoke(query)

Most Relevant Documents: [73, 74, 67, 68, 53]


{'query': 'What did the president say about Ketanji Brown Jackson',
 'result': ' At her confirmation hearing to become an Associate Justice on the Supreme Court, President Joe Biden expressed his strong support for Judge Ketanji Brown Jackson. He praised her extensive qualifications, experience, and temperament, noting that she is "one of our nation\'s brightest legal minds" and "eminently qualified" for the role. President Biden emphasized that her confirmation would be a historic milestone, as Judge Jackson would be the first Black woman to serve on the Supreme Court.\n\nHere is a quote from the President\'s statement at the confirmation hearing:\n\n"Judge Jackson’s nomination is a testament to her character and to her brilliant legal mind. She is, without a doubt, one of our nation’s brightest legal minds. One of our most impressive legal scholars. A consensus builder. A devoted public servant. And someone who will be the first Black woman to serve on the United States Supreme Court