# NVIDIA AI Endpoints, LlamaIndex, and LangChain

This notebook demonstrates how to plug the NVIDIA AI Endpoint LLM [mixtral_8x7b](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/mixtral-8x7b) and the embedding model [nvolveqa_40k](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints#setup) into [LlamaIndex](https://gpt-index.readthedocs.io/en/stable/) as custom components.


<div class="alert alert-block alert-info">
    
⚠️ LlamaIndex is under continuous development and supports many retrieval techniques. This notebook only shows how to quickly replace components such as the LLM and the embedding model with ones of your choice. Read the [llama-index documentation](https://docs.llamaindex.ai/en/stable/) for the latest information.
</div>

### Prerequisite 
To run this notebook successfully, you will need the following:

1. Complete the [setup](https://python.langchain.com/docs/integrations/text_embedding/nvidia_ai_endpoints#setup) and generate an API key.

2. Verify that you have installed, with pip, all Python packages listed in [requirements.txt](https://github.com/NVIDIA/GenerativeAIExamples/blob/3d29acf677466c5c301370cab5867cb09e04e318/notebooks/requirements.txt).

In this notebook, we will cover the following custom plug-in components:

- An LLM served by the NVIDIA AI Endpoint mixtral_8x7b
- An embedding model served by the NVIDIA AI Endpoint nvolveqa_40k

Note: Because the NVIDIA AI Endpoints are accessed as a hosted API, the prerequisites include no GPU compute hardware.


---
### Step 1 - Load NVIDIA AI Endpoint [mixtral_8x7b](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/mixtral-8x7b)

Note: See the prerequisites above if you have not yet obtained a valid API key.

In [1]:
import getpass
import os

## API Key can be found by going to NVIDIA NGC -> AI Foundation Models -> (some model) -> Get API Code or similar.
## 10K free queries to any endpoint (which is a lot actually).

# del os.environ['NVIDIA_API_KEY']  ## delete key and reset
if os.environ.get("NVIDIA_API_KEY", "").startswith("nvapi-"):
    print("Valid NVIDIA_API_KEY already in environment. Delete to reset")
else:
    nvapi_key = getpass.getpass("NVAPI Key (starts with nvapi-): ")
    assert nvapi_key.startswith("nvapi-"), f"{nvapi_key[:5]}... is not a valid key"
    os.environ["NVIDIA_API_KEY"] = nvapi_key

Run a test to confirm that the model generates a response.

In [10]:
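# Test run: confirm that you can generate a response successfully
from langchain_nvidia_ai_endpoints import ChatNVIDIA
# Read the key from the environment: nvapi_key is undefined if the key was already set before the first cell ran
llm = ChatNVIDIA(model="mixtral_8x7b", nvidia_api_key=os.environ["NVIDIA_API_KEY"])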
result = llm.invoke("Write me a small song on AI")
print(result.content)

(Verse 1)
In a world of data, so vast and wide,
There's a new intelligence, starting to rise.
Artificial minds, born from code,
On a digital highway, they now roam.

(Chorus)
AI, oh AI, in silicon you thrive,
Seeking patterns, making sense of our lives.
Through the bytes and the bits, you learn and grow,
A helping hand, in the future's glow.

(Verse 2)
From self-driving cars, to virtual scribes,
You weave through our world, like electric tribes.
Predicting weather, curing disease,
In every corner, your presence increases.

(Chorus)
AI, oh AI, in silicon you thrive,
Seeking patterns, making sense of our lives.
Through the bytes and the bits, you learn and grow,
A helping hand, in the future's glow.

(Bridge)
Yet, with great power, comes great responsibility,
A balance of trust, and accountability.
As we shape you, AI, so too, you shape us,
In this dance of progress, let's make no fuss.

(Chorus)
AI, oh AI, in silicon you thrive,
Seeking patterns, making sense of our lives.
Through the b

### Step 2 - Load the chosen NVIDIA Endpoint Embedding into llama-index

In [11]:
# Create an NVIDIA embedding instance and wrap the LangChain embedding for llama-index
# Bring in the embeddings wrapper
from llama_index.embeddings import LangchainEmbedding

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings
nv_embedding = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="query")
li_embedding = LangchainEmbedding(nv_embedding)
# model_type selects the query or passage variant; for the passage variant use
# embedder = NVIDIAEmbeddings(model="nvolveqa_40k", model_type="passage")


### Step 3 - Wrap the NVIDIA embedding endpoint and the NVIDIA mixtral_8x7b endpoint into llama-index's ServiceContext

In [12]:
# Bring in the helpers needed to change the service context
from llama_index import set_global_service_context
from llama_index import ServiceContext
# A ServiceContext bundles commonly used resources for the indexing and querying stages

# Create new service context instance
service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=li_embedding
)
# And set the service context
set_global_service_context(service_context)
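
To give a rough sense of what `chunk_size=1024` controls, the sketch below splits a text into fixed-size pieces. It is a deliberate simplification: it counts whitespace-separated words, whereas llama-index's splitter works on tokens and also handles overlap and sentence boundaries. The helper `chunk_words` is hypothetical and is not part of llama-index.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 1024) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words.

    Assumption: words stand in for tokens here; this is NOT llama-index's splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


# A synthetic 2500-word document yields two full chunks and one partial chunk.
sample = " ".join(f"w{i}" for i in range(2500))
chunks = chunk_words(sample, chunk_size=1024)
print([len(chunk.split()) for chunk in chunks])  # [1024, 1024, 452]
```

Smaller chunks generally give more precise retrieval hits but less context per hit, so the value is worth tuning for your own data.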


### Step 4a - Load the text data with llama-index's SimpleDirectoryReader and index it with the built-in [VectorStoreIndex](https://docs.llamaindex.ai/en/latest/community/integrations/vector_stores.html)

In [13]:
# Load the documents and build a vector index over them
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./toy_data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
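
Conceptually, a vector index stores one embedding per chunk and, at query time, returns the chunks whose embeddings are most similar to the query embedding. The toy sketch below illustrates that idea with cosine similarity. The 3-dimensional vectors, the chunk names, and the `top_k` helper are all invented for illustration; they are neither llama-index internals nor real nvolveqa_40k output.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine(query, vec)) for name, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunk embeddings; real embeddings have far more dimensions.
store = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
}
query = [1.0, 0.05, 0.0]
hits = top_k(query, store, k=2)
print([name for name, _ in hits])  # ['chunk_a', 'chunk_b']
```

The retrieved chunks are then passed to the LLM as context for answering the question.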


### Step 4b - Create a query engine from the index and ask questions

In [14]:
# Set up the index query engine, which uses the LLM
query_engine = index.as_query_engine()
# Test a query in natural language
response = query_engine.query("Give me 5 characteristics of Donald Trump")

In [15]:
response.metadata

{'c32974d1-e231-43c7-aaa9-16d85b4c96c0': {'file_path': 'toy_data\\trump_speeches.txt',
  'file_name': 'trump_speeches.txt',
  'file_type': 'text/plain',
  'file_size': 924745,
  'creation_date': '2024-04-28',
  'last_modified_date': '2024-04-28',
  'last_accessed_date': '2024-04-28'},
 '2ed4e923-aa1b-4a87-816c-3fc2cce4fb9e': {'file_path': 'toy_data\\trump_speeches.txt',
  'file_name': 'trump_speeches.txt',
  'file_type': 'text/plain',
  'file_size': 924745,
  'creation_date': '2024-04-28',
  'last_modified_date': '2024-04-28',
  'last_accessed_date': '2024-04-28'}}
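
The keys of `response.metadata` appear to be node IDs, each mapped to the metadata of the file that chunk came from. A minimal sketch for summarizing which files contributed to an answer is shown below. It runs on a hand-copied subset of the output above rather than on a live response, and `source_files` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict


def source_files(metadata: Dict[str, dict]) -> Counter:
    """Count how many retrieved chunks came from each source file."""
    return Counter(
        entry.get("file_name", "<unknown>") for entry in metadata.values()
    )


# Subset of the metadata printed above, copied by hand for illustration.
sample_metadata = {
    "c32974d1-e231-43c7-aaa9-16d85b4c96c0": {
        "file_name": "trump_speeches.txt",
        "file_size": 924745,
    },
    "2ed4e923-aa1b-4a87-816c-3fc2cce4fb9e": {
        "file_name": "trump_speeches.txt",
        "file_size": 924745,
    },
}
counts = source_files(sample_metadata)
print(dict(counts))  # {'trump_speeches.txt': 2}
```

With a live response, passing `response.metadata` to the same helper should work as long as it has the dictionary shape shown in the output.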

In [16]:
response.response

'1. Confident: Donald Trump exudes confidence in his speeches, often using phrases like "I\'m the only one" or "I\'m the best."\n2. Business-oriented: He frequently highlights his business background, mentioning his companies, properties, and financial successes.\n3. Opinionated: Trump is not afraid to share his views, often making strong statements about his likes and dislikes.\n4. Competitive: He presents himself as a winner and emphasizes the importance of winning in various aspects of life.\n5. Outspoken: Trump is known for his direct and unfiltered communication style, often speaking bluntly and candidly about his thoughts and feelings.'

In [19]:
response = query_engine.query("Summarize the game")
print(response.response)

The game referred to is the Hunger Games, an annual event in a dystopian society where 24 tributes, young representatives from different districts, fight to the death until only one remains. The contestant described in the context is in the midst of this competition, trying to survive and outsmart her opponents. The Gamemakers, the organizers of the event, manipulate the arena and the tributes' circumstances to create entertainment for the audience. In this part of the story, the contestant faces a wall of fire, precision launchers firing fireballs, and a pack of Careers, the stronger and more ruthless tributes. She must navigate these challenges while avoiding poisonous substances and staying hidden from her enemies.


In [20]:
response = query_engine.query("How to win the game. Write in 2-3 sentences")
response.response

'To win the game, it is essential to have a clear understanding of the rules and objectives, maintain a strong focus on your goals, and consistently demonstrate determination and resilience in the face of challenges. Additionally, being adaptable to changing circumstances and learning from past experiences can significantly contribute to your success.'