Implementation of [HuggingGPT](https://github.com/microsoft/JARVIS). HuggingGPT is a system that connects LLMs (such as ChatGPT) with the ML community (Hugging Face).

+ 🔥 Paper: https://arxiv.org/abs/2303.17580
+ 🚀 Project: https://github.com/microsoft/JARVIS
+ 🤗 Space: https://huggingface.co/spaces/microsoft/HuggingGPT

## Set up tools

We set up the tools available from [Transformers Agent](https://huggingface.co/docs/transformers/transformers_agents#tools). They include a library of tools supported by Transformers and some customized tools, such as an image generator, a video generator, and a text downloader.

In [1]:
from transformers import load_tool



In [2]:
hf_tools = [load_tool(tool_name) for tool_name in [
    "document-question-answering", 
    "image-captioning", 
    "image-question-answering", 
    "image-segmentation", 
    "speech-to-text", 
    "summarization", 
    "text-classification", 
    "text-question-answering", 
    "translation", 
    "huggingface-tools/text-to-image", 
    "huggingface-tools/text-to-video", 
    "text-to-speech", 
    "huggingface-tools/text-download", 
    "huggingface-tools/image-transformation"
    ]
]

ImageTransformationTool implements a different description in its configuration and class. Using the tool configuration description.
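The task-planning prompt printed in the run log lists each tool as `name: description`. The sketch below shows one way such a list could be built. `ToolStub` and its placeholder descriptions are hypothetical stand-ins used so the snippet runs without downloading any models; it assumes only that each tool exposes `name` and `description` attributes.

```python
from dataclasses import dataclass


@dataclass
class ToolStub:
    """Hypothetical stand-in exposing only `name` and `description`."""
    name: str
    description: str


def describe_tools(tools):
    """Return one 'name: description' string per tool."""
    return [f"{tool.name}: {tool.description}" for tool in tools]


# Placeholder descriptions, not the real tool descriptions.
stub_tools = [
    ToolStub("image_generator", "Creates an image from a text prompt."),
    ToolStub("video_generator", "Creates a video from a text prompt."),
]
tool_lines = describe_tools(stub_tools)
print(tool_lines)
```

If the loaded tools expose the same two attributes, `describe_tools(hf_tools)` should give a quick overview of what the controller can choose from.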


## Set up model and HuggingGPT

We create an instance of HuggingGPT and use ChatGPT as the controller that coordinates the tools loaded above.

In [3]:
from langchain.llms import OpenAI
from langchain.experimental import HuggingGPT

In [7]:
llm = OpenAI(model_name="gpt-3.5-turbo")
agent = HuggingGPT(llm, hf_tools)

%env OPENAI_API_BASE=http://localhost:8000/v1

env: OPENAI_API_BASE=http://localhost:8000/v1
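`%env` is an IPython magic, so it only works inside a notebook or IPython session. As a minimal sketch, the same variable can be set from plain Python with `os.environ`. Whether the client picks the value up depends on when it reads the environment, so setting it before constructing the LLM is the safer order; that ordering is an assumption here, not something the notebook confirms.

```python
import os

# Assumption: the endpoint is the local OpenAI-compatible server used in this
# notebook, and the client library reads OPENAI_API_BASE from the environment.
os.environ["OPENAI_API_BASE"] = "http://localhost:8000/v1"

print(os.environ["OPENAI_API_BASE"])
```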


## Run an example

Given a text, show a related image and video.

In [10]:
agent.run("please show me a video and an image of (based on the text) 'a boy is running'")



> Entering new TaskPlaningChain chain...
Prompt after formatting:
System: #1 Task Planning Stage: The AI assistant can parse user input to several tasks: [{"task": task, "id": task_id, "dep": dependency_task_id, "args": {"input name": text may contain <resource-dep_id>}}]. The special tag "dep_id" refer to the one generated text/image/audio in the dependency task (Please consider whether the dependency task generates resources of this type.) and "dep_id" must be in "dep" list. The "dep" field denotes the ids of the previous prerequisite tasks which generate a new resource that the current task relies on. The task MUST be selected from the following tools (along with tool description, input name and output type): ['document_qa: This is a tool that answers a question about an document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text t

100%|██████████| 50/50 [00:30<00:00,  1.62it/s]


running image_generator({'prompt': 'a boy is running'})


100%|██████████| 25/25 [00:01<00:00, 14.92it/s]




> Entering new ResponseGenerationChain chain...
Prompt after formatting:
The AI assistant has parsed the user input into several tasks and executed them. The results are as follows:
video_generator({'prompt': 'a boy is running'})
status: completed
result: d39969.mp4
image_generator({'prompt': 'a boy is running'})
status: completed
result: 6bb46f.png

Please summarize the results and generate a response.

> Finished chain.


'The AI assistant successfully executed multiple tasks based on the user input of "a boy is running." A video generator created a video with the filename d39969.mp4 and an image generator created a PNG image with the filename 6bb46f.png. Both tasks were completed successfully.'
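The task-planning prompt in the log above describes a JSON task list with `task`, `id`, `dep`, and `args` fields, where an argument may hold a `<resource-dep_id>` tag that points at the output of an earlier task. The sketch below illustrates how such a plan could be executed in dependency order. It is not the library's implementation: the example plan, the stand-in tools in `TOOLS`, the `run_plan` helper, and the convention that `-1` in `dep` means "no dependency" are all assumptions made for illustration.

```python
import json
import re
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical planner output in the format the prompt describes.
plan_json = """
[
  {"task": "image_generator", "id": 0, "dep": [-1],
   "args": {"prompt": "a boy is running"}},
  {"task": "image_captioner", "id": 1, "dep": [0],
   "args": {"image": "<resource-0>"}}
]
"""

# Stand-in tools: real tools would run models; these return labelled strings.
TOOLS = {
    "image_generator": lambda prompt: f"image({prompt})",
    "image_captioner": lambda image: f"caption of {image}",
}

RESOURCE_TAG = re.compile(r"<resource-(\d+)>")


def run_plan(plan_text):
    """Run every task after its prerequisites and return results keyed by id."""
    tasks = {task["id"]: task for task in json.loads(plan_text)}
    # Map each task id to its prerequisites; -1 is assumed to mean "none".
    graph = {
        tid: {dep for dep in task["dep"] if dep != -1}
        for tid, task in tasks.items()
    }
    results = {}
    for tid in TopologicalSorter(graph).static_order():
        task = tasks[tid]
        args = {}
        for name, value in task["args"].items():
            # Swap a resource tag for the output of the task it refers to.
            match = RESOURCE_TAG.fullmatch(value)
            args[name] = results[int(match.group(1))] if match else value
        results[tid] = TOOLS[task["task"]](**args)
    return results


results = run_plan(plan_json)
print(results)
```

A topological sort is used here so that a task never runs before the tasks listed in its `dep` field. In the run above the two tasks are independent, so either order would be valid.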