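# Using NVIDIA's LLM API Catalog Connector

This notebook walks through the basic usage of the `NVIDIA` connector.

With this connector, you can connect to and generate from compatible models available in the NVIDIA [API Catalog](https://build.nvidia.com/explore/discover), such as:

- Google's [gemma-7b](https://build.nvidia.com/google/gemma-7b)
- Mistral AI's [mistral-7b-instruct-v0.2](https://build.nvidia.com/mistralai/mistral-7b-instruct-v2)
- And more!

We will start by making sure `llama-index` and the related packages are installed.

> Note: Currently, only models with a base URL of `https://integrate.api.nvidia.com/v1` are compatible with this connector.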


In [None]:
!pip install llama-index-llms-nvidia llama-index-embeddings-openai llama-index-readers-file

Collecting llama-index-embeddings-openai
  Using cached llama_index_embeddings_openai-0.1.7-py3-none-any.whl.metadata (603 bytes)
Using cached llama_index_embeddings_openai-0.1.7-py3-none-any.whl (6.0 kB)
Installing collected packages: llama-index-embeddings-openai
Successfully installed llama-index-embeddings-openai-0.1.7


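## API keys and boilerplate

In the next cell, we run some boilerplate so that the examples execute smoothly in a notebook environment.

We also provide our API keys.

> Note: You can create your own NVIDIA API key using the "Get API Key" button in the code example window.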


In [None]:
# llama-parse is async-first; running async code in a notebook requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()

import os

# OpenAI API key, used for embeddings
os.environ["OPENAI_API_KEY"] = "sk-"

# NVIDIA API Playground API key, used for the LLM
os.environ["NVIDIA_API_KEY"] = "nvapi-"
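
Keys typed directly into a cell are easy to leak when a notebook is shared. As a minimal sketch, one could read each key from the environment and fall back to an interactive prompt when only the placeholder is present. The helper name `resolve_key` is hypothetical and not part of LlamaIndex; the prefixes are the ones shown in the cell above.

```python
import getpass
import os


def resolve_key(name, prefix, env=os.environ, prompt=getpass.getpass):
    """Return the key stored under `name`, prompting for it if it is missing.

    `resolve_key` is a hypothetical helper, for illustration only.
    """
    value = env.get(name, "")
    if not value.startswith(prefix) or value == prefix:
        # Missing or placeholder value: ask the user instead of hard-coding it.
        value = prompt(f"Enter {name}: ")
    if not value.startswith(prefix):
        raise ValueError(f"{name} should start with {prefix!r}")
    env[name] = value
    return value
```

Calling `resolve_key("NVIDIA_API_KEY", "nvapi-")` before creating the LLM would then leave the key in `os.environ` without it appearing in the saved notebook.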

## Loading the NVIDIA LLM

Now we can load our `NVIDIA` LLM by passing in a model name. Model names are listed in the [documentation](https://docs.api.nvidia.com/nim/reference/).

> Note: The default model is `mistralai/mistral-7b-instruct-v0.2`.


In [None]:
from llama_index.llms.nvidia import NVIDIA
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

llm = NVIDIA(model="mistralai/mistral-7b-instruct-v0.2")

Settings.llm = llm

We can check which model our `llm` object is currently associated with by looking at its `.model` attribute.


In [None]:
llm.model

'mistralai/mistral-7b-instruct-v0.2'

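## Loading an API catalog LLM

We can also load a model using its API catalog address.

Let's use `gemma-7b` as an example!

1. Go to the [model page](https://build.nvidia.com/google/gemma-7b)
2. Find the address in the `model` parameter (e.g. `"google/gemma-7b"`)
3. Verify that its `base_url` is `"https://integrate.api.nvidia.com/v1"`
4. Use `NVIDIA(model="model_name_here")` to point the connector at that model (e.g. `NVIDIA(model="google/gemma-7b")`)

Let's see this in code.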


In [None]:
llm = NVIDIA(model="google/gemma-7b")
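
Step 3 of the list above can also be checked in code. The following is a small standard-library sketch; the function name `is_compatible` is an assumption made for illustration, and the expected base URL is the one quoted in the note at the top of this notebook.

```python
from urllib.parse import urlparse

# Base URL quoted in the compatibility note at the top of this notebook.
COMPATIBLE_BASE_URL = "https://integrate.api.nvidia.com/v1"


def is_compatible(base_url: str) -> bool:
    """Return True if `base_url` points at the compatible API catalog endpoint."""
    expected = urlparse(COMPATIBLE_BASE_URL)
    actual = urlparse(base_url.rstrip("/"))  # tolerate a trailing slash
    return (actual.scheme, actual.netloc, actual.path) == (
        expected.scheme,
        expected.netloc,
        expected.path,
    )


print(is_compatible("https://integrate.api.nvidia.com/v1/"))  # True
print(is_compatible("https://example.com/v1"))  # False
```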

Let's confirm that our `NVIDIA` LLM is associated with the correct model!


In [None]:
llm.model

'google/gemma-7b'

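## Basic functionality

Now we can explore the different ways the connector can be used within the LlamaIndex ecosystem!

Before we start, let's set up a list of `ChatMessage` objects, which is the expected input for some of the methods.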


In [None]:
from llama_index.core.llms import ChatMessage, MessageRole

chat_messages = [
    ChatMessage(
        role=MessageRole.SYSTEM, content=("You are a helpful assistant.")
    ),
    ChatMessage(
        role=MessageRole.USER,
        content=("What are the most popular house pets in North America?"),
    ),
]

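We will follow the same basic pattern for each example:

1. Point our `NVIDIA` LLM at the model we want
2. See how to use the endpoint to accomplish the desired task!


### Completion: `.complete()`

We can use `.complete()`/`.acomplete()` (which take a string) to get a response from the selected model.

Let's use our default model for this task.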


In [None]:
completion_llm = NVIDIA()

We can verify that this is the expected default by checking the `.model` attribute.


In [None]:
completion_llm.model

'mistralai/mistral-7b-instruct-v0.2'

Let's call `.complete()` on our model with the string `"Hello!"` as input and observe the response.


In [None]:
completion_llm.complete("Hello!")

CompletionResponse(text=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So, feel free to ask me anything!\n\nIf you're looking for some general information, I can help you with that too. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what would you like to know? Let me know and I'll do my best to help you out!\n\nIf you have any specific question or topic in mind, please let me know and I'll be glad to help you out. If you want some general information, I can provide you with that as well. For example, I can tell you about the weather, current events, or provide definitions for various words and concepts. I can also help you with math problems, translate words and phrases, and even tell you a joke or two!\n\nSo, what wo

As LlamaIndex expects, we receive a `CompletionResponse` in return.
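
The generated text lives in the response's `text` field, as the output above shows, and it starts with a space. A minimal sketch of pulling out a clean string follows. It uses a stand-in dataclass rather than the real `CompletionResponse`, so it runs without an API key; `StandInResponse` and `clean_text` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StandInResponse:
    """Stand-in exposing the same `text` field as the response shown above."""

    text: str


def clean_text(response) -> str:
    # Model output often starts with a space or ends with blank lines.
    return response.text.strip()


response = StandInResponse(text=" Hello there! How can I help you today?\n\n")
print(clean_text(response))
```

With the real object, `clean_text(completion_llm.complete("Hello!"))` should behave the same way, since only the `text` attribute is accessed.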


#### Async completion: `.acomplete()`

There is also an async implementation that can be used in the same way!


In [None]:
await completion_llm.acomplete("Hello!")

CompletionResponse(text=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So feel free to ask me anything!\n\nIf you're looking for a specific topic, just let me know and I'll do my best to provide you with accurate and up-to-date information. And if you have any requests for fun facts or trivia, I'm happy to oblige!\n\nSo, what would you like to know today? Let me help make your day a little brighter! 😊", additional_kwargs={}, raw={'id': 'chatcmpl-8ce881c1-a47b-43aa-afd8-9e9addf26ce9', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=" Hello there! How can I help you today? I'm here to answer any questions you might have or provide information on a wide range of topics. So feel free to ask me anything!\n\nIf you're looking for a specific topic, just let
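
One reason to prefer the async variant is that several requests can be in flight at once. The sketch below shows the pattern with `asyncio.gather`. `fake_acomplete` is a hypothetical stand-in for `completion_llm.acomplete` so that the example runs without network access.

```python
import asyncio


async def fake_acomplete(prompt: str) -> str:
    """Hypothetical stand-in for `completion_llm.acomplete`; makes no network call."""
    await asyncio.sleep(0.01)  # simulate waiting on the API
    return f"echo: {prompt}"


async def complete_all(prompts):
    # gather() runs the coroutines concurrently and preserves the input order.
    return await asyncio.gather(*(fake_acomplete(p) for p in prompts))


results = asyncio.run(complete_all(["Hello!", "Goodbye!"]))
print(results)
```

Inside a notebook cell, `await complete_all([...])` can be used directly instead of `asyncio.run(...)`.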

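### Chat: `.chat()`

Now we can do the same thing with the `.chat()` method. This method expects a list of chat messages, so we will use the list we created above.

We will use the `mistralai/mixtral-8x7b-instruct-v0.1` model for this example.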


In [None]:
chat_llm = NVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1")

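Now all we need to do is call `.chat()` on our list of `ChatMessage` objects and observe the response.

You will also notice that we can pass in a few extra keyword arguments to influence generation. In this case, we use `seed` to influence the sampling and `stop` to tell the model to stop generating once it reaches certain tokens.

> Note: You can find the additional kwargs a model endpoint supports in the API documentation for the selected model. For example, Mixtral's API documentation is [here](https://docs.api.nvidia.com/nim/reference/mistralai-mixtral-8x7b-instruct-infer).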


In [None]:
chat_llm.chat(chat_messages, seed=4, stop=["cat", "cats", "Cat", "Cats"])

ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" In North America, the most popular types of house pets are:\n\n1. Dogs: Man's best friend is the most popular pet in North America. They are known for their loyalty, companionship, and the variety of breeds that cater to different lifestyles and preferences.\n\n2. Cats", additional_kwargs={}), raw={'id': 'chatcmpl-b6ef95ca-e023-4dc8-8ee9-843f214169e9', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=" In North America, the most popular types of house pets are:\n\n1. Dogs: Man's best friend is the most popular pet in North America. They are known for their loyalty, companionship, and the variety of breeds that cater to different lifestyles and preferences.\n\n2. Cats", role='assistant', function_call=None, tool_calls=None))], 'created': 1713474655, 'model':
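
Note that the sample output above still contains the word "cater" and ends with "Cats", so the endpoint does not appear to apply `stop` as a plain substring cut. If an exact cut is needed, it can be applied on the client after the response arrives. The following is a sketch of that idea only, not a description of how the API handles `stop`; `trim_at_stop` is a hypothetical helper.

```python
def trim_at_stop(text, stop, include_stop=False):
    """Cut `text` at the earliest occurrence of any string in `stop`.

    Client-side illustration only; the API applies `stop` on the server.
    """
    hits = [(text.find(s), s) for s in stop if s and s in text]
    if not hits:
        return text
    # Earliest match wins; on a tie, prefer the longest stop string.
    index, matched = min(hits, key=lambda hit: (hit[0], -len(hit[1])))
    end = index + len(matched) if include_stop else index
    return text[:end]


stops = ["cat", "cats", "Cat", "Cats"]
sample = "1. Dogs are loyal.\n\n2. Cats are independent."
print(repr(trim_at_stop(sample, stops)))
print(repr(trim_at_stop(sample, stops, include_stop=True)))
```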

As expected, we receive a `ChatResponse` in return.


#### Async chat: `.achat()`

There is also an async implementation of `.chat()`, which can be called as follows.


In [None]:
await chat_llm.achat(chat_messages)

ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=' The most popular house pets in North America are dogs and cats. According to the American Pet Products Association (APPA), as of 2021, approximately 69 million homes in the United States own a pet, and 63.4 million of those households have a dog, while 42.7 million have a cat. Birds, small mammals, reptiles, and fish are also popular pets, but to a lesser extent.', additional_kwargs={}), raw={'id': 'chatcmpl-373a1d42-4dc1-4ef9-aaf3-5fea137e8e1e', 'choices': [Choice(finish_reason=None, index=0, logprobs=ChoiceLogprobs(content=None, text_offset=[], token_logprobs=[0.0, 0.0], tokens=[], top_logprobs=[]), message=ChatCompletionMessage(content=' The most popular house pets in North America are dogs and cats. According to the American Pet Products Association (APPA), as of 2021, approximately 69 million homes in the United States own a pet, and 63.4 million of those households have a dog, while 42.7 million

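### Streaming: `.stream_chat()`

We can also use the models found on `build.nvidia.com` for streaming use cases!

Let's pick another model and observe its behavior. We will use Google's `gemma-7b` model for this task.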


In [None]:
stream_llm = NVIDIA(model="google/gemma-7b")

Let's call our model with `.stream_chat()`, which again expects a list of `ChatMessage` objects, and capture the response.


In [None]:
streamed_response = stream_llm.stream_chat(chat_messages)

In [None]:
streamed_response

<generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_llm_chat.<locals>.wrapped_gen at 0x7dd89853e320>

As we can see, the response is a generator that yields the streamed response.

Let's look at the final response once generation has finished.


In [None]:
last_element = None
for last_element in streamed_response:
    pass

print(last_element)

assistant: **Top Popular House Pets in North America:**

**1. Dogs:**
* Estimated 63.4 million pet dogs in households (2023)
* Known for their loyalty, companionship, and trainability

**2. Cats:**
* Estimated 38.4 million pet cats in households (2023)
* Known for their independence, affection, and low-maintenance nature

**3. Fish:**
* Estimated 14.5 million pet fish in households (2023)
* Popular for their tranquility, beauty, and variety of species

**4. Small mammals (guinea pigs, hamsters, rabbits):**
* Estimated 14.4 million pet small mammals in households (2023)
* Known for their playful and affectionate nature

**5. Birds:**
* Estimated 13.3 million pet birds in households (2023)
* Known for their beauty, song, and intelligence

**Other popular pets:**

* Tortoises and reptiles
* Hamsters and rodents
* Invertebrates (such as spiders and hermit crabs)

**Factors influencing pet popularity:**

* **Lifestyle and living situation:** Urban dwellers are more likely to have cats, whil
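
Draining the generator, as above, only shows the final result. To display text while it arrives, one can print the incremental piece carried by each streamed item, which LlamaIndex exposes as `delta` on streamed chat responses. The sketch below uses stand-in objects so that it runs without an API key; `StandInChunk` and `fake_stream` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class StandInChunk:
    """Stand-in for a streamed chat response; `delta` is the newly generated text."""

    delta: str


def fake_stream():
    # Hypothetical replacement for `stream_llm.stream_chat(chat_messages)`.
    for piece in ["Dogs", ", ", "cats", " and ", "fish."]:
        yield StandInChunk(delta=piece)


pieces = []
for chunk in fake_stream():
    print(chunk.delta, end="", flush=True)  # show text as it arrives
    pieces.append(chunk.delta)
print()

full_text = "".join(pieces)
```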

#### Async streaming: `.astream_chat()`

There is also an async equivalent of streaming, which is used in much the same way as the synchronous implementation.


In [None]:
streamed_response = await stream_llm.astream_chat(chat_messages)

In [None]:
streamed_response

<async_generator object llm_chat_callback.<locals>.wrap.<locals>.wrapped_async_llm_chat.<locals>.wrapped_gen at 0x787709eea460>

In [None]:
last_element = None
async for last_element in streamed_response:
    pass

print(last_element)

assistant: Sure, here are the most popular house pets in North America:

1. Dogs
2. Cats
3. Fish
4. Small Mammals
5. Birds
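
A top-level `async for` works in a notebook cell, but in a plain Python script it has to live inside a coroutine. The following is a minimal sketch of that shape. `fake_astream` is a hypothetical stand-in for the async generator returned by `astream_chat`.

```python
import asyncio


async def fake_astream():
    """Hypothetical stand-in for the async generator returned by `astream_chat`."""
    for piece in ["1. Dogs", "\n", "2. Cats"]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield piece


async def collect(stream):
    # Consume the async generator and join the pieces into one string.
    parts = []
    async for piece in stream:
        parts.append(piece)
    return "".join(parts)


text = asyncio.run(collect(fake_astream()))
print(text)
```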


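## Streaming query engine responses

Let's look at a slightly more involved example that uses a query engine!

We will start by loading some data (we will use [The Hitchhiker's Guide to the Galaxy](https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt)).


### Loading the data

Let's first create a directory to hold our data.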


In [None]:
!mkdir -p 'data/hhgttg'

We will download our data from the source above.


In [None]:
!wget 'https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt' -O 'data/hhgttg/hhgttg.txt'

--2024-04-01 14:39:38--  https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt
Resolving web.eecs.utk.edu (web.eecs.utk.edu)... 160.36.127.165
Connecting to web.eecs.utk.edu (web.eecs.utk.edu)|160.36.127.165|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1534289 (1.5M) [text/plain]
Saving to: ‘data/hhgttg/hhgttg.txt’


2024-04-01 14:39:39 (6.75 MB/s) - ‘data/hhgttg/hhgttg.txt’ saved [1534289/1534289]
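
On systems without `wget`, the same two steps (create the directory, download the file) can be done with the Python standard library. The following is a sketch, with `fetch` as a hypothetical helper name. The demonstration at the end uses a fake opener and a made-up URL so that it runs offline.

```python
import io
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch(url, dest, opener=urllib.request.urlopen):
    """Download `url` to `dest`, creating parent directories (like `mkdir -p`)."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with opener(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Offline demonstration with a fake opener, so no network access is needed.
with tempfile.TemporaryDirectory() as tmp:
    saved = fetch(
        "https://example.invalid/file.txt",
        Path(tmp) / "data" / "file.txt",
        opener=lambda url: io.BytesIO(b"hello"),
    )
    content = saved.read_bytes()
print(content)
```

For this notebook, the call would be `fetch("https://web.eecs.utk.edu/~hqi/deeplearning/project/hhgttg.txt", "data/hhgttg/hhgttg.txt")`.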



We need an embedding model for this step! We will use OpenAI's `text-embedding-3-small` model and store it in our `Settings`.


In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding

openai_embedding = OpenAIEmbedding(model="text-embedding-3-small")

Settings.embed_model = openai_embedding

Now we can load our documents and build an index using the `OpenAIEmbedding()` we created above.


In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data/hhgttg").load_data()
index = VectorStoreIndex.from_documents(documents)
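
Conceptually, a vector index stores one embedding per chunk of text and answers a query by ranking the chunks by similarity to the query's embedding. The toy sketch below illustrates that idea with made-up three-dimensional vectors. It is not how `VectorStoreIndex` is implemented, and the chunk texts and numbers are invented for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Toy "embeddings"; real ones come from the embedding model.
chunks = {
    "The answer is forty-two.": [0.9, 0.1, 0.0],
    "Always know where your towel is.": [0.1, 0.9, 0.1],
    "Vogon poetry is terrible.": [0.0, 0.2, 0.9],
}


def top_k(query_vector, k=2):
    # Rank chunk texts by similarity to the query vector, best first.
    ranked = sorted(
        chunks, key=lambda text: cosine(query_vector, chunks[text]), reverse=True
    )
    return ranked[:k]


print(top_k([1.0, 0.0, 0.0], k=1))
```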

Now we can create a simple query engine with the `streaming` parameter set to `True`.


In [None]:
streaming_qe = index.as_query_engine(streaming=True)

Let's send a query to the query engine and stream the response.


In [None]:
streaming_response = streaming_qe.query(
    "What is the significance of the number 42?",
)

In [None]:
streaming_response.print_response_stream()

The significance of the number 42 is a central theme in "The Hitchhiker's Guide to the Galaxy" by Douglas Adams. The book is a comedic science fiction satire that follows the adventures of two intergalactic travelers, Arthur Dent and Ford Prefect, as they try to escape the destruction of Earth and uncover the true meaning of the number 42.

Throughout the book, the number 42 is presented as the ultimate answer to the ultimate question of life, the universe, and everything. The question itself is never explicitly stated, but it is implied to be a deeply profound and existential one that has been sought after by philosophers, scientists, and thinkers throughout history.

The idea of the number 42 as the ultimate answer is a playful jab at the idea of seeking ultimate knowledge and understanding, which is often seen as an impossible task. The number 42 is also a reference to the famous "42" answer in the "The Hitchhiker's Guide to the Galaxy" by Douglas Adams, which is a comedic science f

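## Connecting to local NIMs

In addition to connecting to hosted [NVIDIA NIMs](https://ai.nvidia.com), this connector can be used to connect to local microservice instances. This lets you run your application locally when necessary.

For instructions on setting up a local microservice instance, see https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/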


In [None]:
from llama_index.llms.nvidia import NVIDIA

llm = NVIDIA(model="...").mode("nim", base_url="https://localhost.../v1")
llm.available_models
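
Before pointing the connector at a local instance, it can help to confirm that the endpoint responds. The sketch below assumes the local microservice exposes an OpenAI-style `GET /models` route under its base URL, which is an assumption rather than something stated in this notebook. `list_model_ids` is a hypothetical helper, and the URL and payload in the demonstration are made up so that it runs offline.

```python
import io
import json
import urllib.request


def list_model_ids(base_url, opener=urllib.request.urlopen):
    """Return model ids from `<base_url>/models` (assumed OpenAI-style route)."""
    with opener(base_url.rstrip("/") + "/models") as response:
        payload = json.load(response)
    return [entry["id"] for entry in payload.get("data", [])]


# Offline demonstration: the URL and the payload below are placeholders.
fake_payload = json.dumps({"data": [{"id": "example/model-a"}]}).encode()
ids = list_model_ids(
    "http://localhost:8000/v1", opener=lambda url: io.BytesIO(fake_payload)
)
print(ids)
```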