<a href="https://colab.research.google.com/github/jerryjliu/llama_index/blob/main/docs/docs/examples/llm/rungpt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# RunGPT
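RunGPT is an open-source, cloud-native serving framework for large multimodal models (LMMs). It is designed to simplify the deployment and management of large language models on distributed GPU clusters. RunGPT aims to be a one-stop solution: a single, accessible place that gathers techniques for optimizing large multimodal models and makes them easy for anyone to use. RunGPT already supports many LLMs, such as LLaMA, Pythia, StableLM, Vicuna, and MOSS, as well as large multimodal models such as MiniGPT-4 and OpenFlamingo.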


```python
# Import the required libraries
import numpy as np
import pandas as pd
```


If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.


```python
%pip install llama-index-llms-rungpt
```

```python
!pip install llama-index
```

You need to install the `rungpt` package in your Python environment with `pip install`.


```python
!pip install rungpt
```

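After installation, any model supported by RunGPT can be deployed with a single command. The command downloads the target language model from an open-source platform and serves it on a local port, where it can be accessed over HTTP or gRPC. You will most likely run this command in a terminal rather than in a Jupyter notebook.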


```python
!rungpt serve decapoda-research/llama-7b-hf --precision fp16 --device_map balanced
```
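Before calling the server from LlamaIndex, it can help to confirm that something is actually listening on the local port the service was deployed to. The sketch below uses only Python's standard `socket` module. `is_port_open` is a hypothetical helper, and the host `127.0.0.1` and port `51002` are placeholder assumptions, so substitute whatever address your `rungpt serve` process reports.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, host unreachable, etc.
        return False


# Placeholder address (assumption): use the host/port your RunGPT server prints on startup
RUNGPT_HOST = "127.0.0.1"
RUNGPT_PORT = 51002

if __name__ == "__main__":
    print(is_port_open(RUNGPT_HOST, RUNGPT_PORT))
```

A successful connection only shows that the port accepts TCP connections; it does not prove that the model has finished loading or that the HTTP/gRPC API is responding correctly.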

## Basic Usage
#### Call `complete` with a prompt


```python
from llama_index.llms.rungpt import RunGptLLM

# Uses the client's default connection settings for the local RunGPT server
llm = RunGptLLM()
prompt = "What public transportation might be available in a city?"
response = llm.complete(prompt)
```

```python
print(response)
```

I don't want to go to work, so what should I do?
I have a job interview on Monday. What can I wear that will make me look professional but not too stuffy or boring?


#### Call `chat` with a list of messages


```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM

# A short conversation; the final user message is the question to answer
messages = [
    ChatMessage(
        role=MessageRole.USER,
        content="Now, I want you to do some math for me.",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
    ),
    ChatMessage(
        role=MessageRole.USER,
        content="How many points determine a straight line?",
    ),
]
llm = RunGptLLM()
response = llm.chat(messages=messages, temperature=0.8, max_tokens=15)
```

```python
print(response)
```
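The `chat` call above passes `temperature=0.8` and `max_tokens=15`. In typical LLM serving, `max_tokens` caps the length of the generated reply, while `temperature` controls how random the sampling of each next token is. The sketch below shows the usual mechanism, a temperature-scaled softmax over next-token scores. It is a generic illustration rather than RunGPT's own code, and the logits are made-up numbers.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw next-token scores into probabilities, scaled by `temperature` (> 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]
for t in (0.5, 0.8, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token (more deterministic output); higher temperatures flatten the distribution (more varied output).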

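## Streaming

Streaming is a way of processing data as it arrives, instead of waiting until all of it is available. It is especially useful for large volumes of data or real-time data. For an LLM, it means you can start consuming the response while the model is still generating it; RunGPT exposes this through the `stream_complete` and `stream_chat` endpoints.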
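To make the idea concrete without a running server, here is a pure-Python sketch of consuming a token stream incrementally. `fake_token_stream` and `accumulate` are hypothetical names: the first stands in for a model server, and the second tracks both the newly arrived chunk (the delta) and the text received so far. LlamaIndex streaming responses carry similar information (for example `delta` and `text` fields), but check the RunGPT integration for its exact behavior.

```python
from typing import Iterator, List, Tuple


def fake_token_stream(tokens: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a model server: yields one token at a time."""
    for tok in tokens:
        yield tok


def accumulate(stream: Iterator[str]) -> Iterator[Tuple[str, str]]:
    """Yield (delta, text_so_far) as each chunk arrives, without waiting for the end."""
    text = ""
    for delta in stream:
        text += delta
        yield delta, text


for delta, text in accumulate(fake_token_stream(["Two", " points", "."])):
    print(repr(delta), "->", repr(text))
```

Because both functions are generators, each chunk is handled as soon as it is produced, which is exactly what lets a UI display partial answers.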


#### Use the `stream_complete` endpoint


```python
prompt = "What public transportation might be available in a city?"
response = RunGptLLM().stream_complete(prompt)
# Print each streamed chunk as it arrives
for item in response:
    print(item.text)
```

#### Use the `stream_chat` endpoint


```python
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.llms.rungpt import RunGptLLM

messages = [
    ChatMessage(
        role=MessageRole.USER,
        content="Now, I want you to do some math for me.",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, content="Sure, I would like to help you."
    ),
    ChatMessage(
        role=MessageRole.USER,
        content="How many points determine a straight line?",
    ),
]
response = RunGptLLM().stream_chat(messages=messages)
```

```python
# Print each streamed chat message as it arrives
for item in response:
    print(item.message)
```