# FastChat

FastChat is an open-source platform for training, serving, and evaluating chatbots based on large language models (LLMs). It provides state-of-the-art training and evaluation methods, a distributed multi-model serving system, OpenAI-compatible RESTful APIs, and a web UI for interacting with the models.

### Core Features
- Training and Evaluation: Trains models with state-of-the-art methods and evaluates them on sets of challenging multi-turn, open-ended questions. Evaluation is automated by using strong LLMs as judges of the quality of the models' responses.
- Multi-Model Serving System: Provides a distributed system that serves multiple models simultaneously, so users can interact with different LLMs side by side.
- Web UI and APIs: Includes a web UI for user interaction and OpenAI-compatible RESTful APIs, so it can act as a local drop-in replacement for the OpenAI APIs.

### Integration with AutoGen
AutoGen can use FastChat to launch endpoints and run inference on various models locally. This lets it interact with multiple LLMs on a local machine and perform tasks such as text completion and chat completion through one interface, which makes it suitable for LLM applications.

### 1. Download Model Checkpoints and FastChat

In [None]:
# Download checkpoints: model checkpoints, such as those of ChatGLM2-6B, need to be downloaded and set up.
# Note: the weights are stored with Git LFS, so git-lfs must be installed for the clone to fetch them.
git clone https://huggingface.co/THUDM/chatglm2-6b

# Clone FastChat: FastChat needs to be cloned and configured to function as a local drop-in replacement for OpenAI APIs.
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install --upgrade pip  # enable PEP 660 support
pip3 install -e ".[model_worker,webui]"

# OR install the released package instead of cloning (quoted so the shell does not expand the brackets)
pip3 install "fschat[model_worker,webui]"

# If you're on Mac, install the build dependencies before the pip steps above
brew install rust cmake
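
Before starting the servers, it can help to confirm that the install succeeded. The sketch below assumes FastChat is importable as the `fastchat` package, which is the name used by the `python -m fastchat.serve...` commands in the next step; `is_installed` is a hypothetical helper name, not part of FastChat.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in the current environment."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    # "fastchat" is the assumed import name of the package installed above.
    print("fastchat installed:", is_installed("fastchat"))
```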

### 2. Initiate Server
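Start the controller, a model worker, and the OpenAI-compatible API server. Each command keeps running in the foreground, so launch each one in its own terminal and resolve any errors it reports before moving on.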


In [None]:
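# Run each command in a separate terminal; all three are long-running processes.
python -m fastchat.serve.controller
# --model-path is the checkpoint directory cloned in step 1; adjust it if you run this from another directory.
python -m fastchat.serve.model_worker --model-path chatglm2-6b
python -m fastchat.serve.openai_api_server --host localhost --port 8000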
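
Once all three processes are running, a quick check is to ask the API server which models it serves. The sketch below assumes the server follows the OpenAI convention of exposing `GET /v1/models` and returning a JSON object whose `data` list holds entries with an `id` field. `models_url`, `parse_model_ids`, and `fetch_model_ids` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.request


def models_url(api_base: str) -> str:
    """Build the model-listing URL from a base such as http://localhost:8000/v1."""
    return api_base.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list:
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_base: str, timeout: float = 5.0) -> list:
    """Query a running server; raises urllib.error.URLError if it is unreachable."""
    with urllib.request.urlopen(models_url(api_base), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

With the servers from this step running, `fetch_model_ids("http://localhost:8000/v1")` should include `chatglm2-6b` if the worker registered with the controller.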

### 3. Interact with Model using AutoGen
Once the servers are up and running, models can be directly accessed through the `openai-python` library as well as `autogen.oai.Completion` and `autogen.oai.ChatCompletion`.

In [None]:
from autogen import oai

# create a text completion request
response = oai.Completion.create(
    config_list=[{
        "model": "chatglm2-6b",
        "api_base": "http://localhost:8000/v1",
        "api_type": "open_ai",
        "api_key": "NULL", # just a placeholder
    }],
    prompt="Hi",
)
print(response)


# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[{"model": "chatglm2-6b", "api_base": "http://localhost:8000/v1", "api_type": "open_ai", "api_key": "NULL"}],
    messages=[{"role": "user", "content": "Hi"}]
)
print(response)
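
Because the server exposes an OpenAI-compatible API, the same chat request can also be sent as plain HTTP, which is useful for debugging without any client library. This is a minimal sketch that assumes the standard OpenAI `POST /v1/chat/completions` route and response shape (`choices[0].message.content`); `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(api_base, model, messages, api_key="NULL"):
    """Prepare (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        api_base.rstrip("/") + "/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder key, as above
        },
        method="POST",
    )


def chat(api_base, model, messages, timeout=60.0):
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(api_base, model, messages)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With the servers running, `chat("http://localhost:8000/v1", "chatglm2-6b", [{"role": "user", "content": "Hi"}])` sends the same message as the chat completion request above.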

### 4. Interacting with Multiple Local LLMs
AutoGen can be configured with several local LLMs at once, and the inference code specifies which models to use.
To serve multiple models, launch the multi-model worker with one `--model-path` and `--model-names` pair per model, then send requests through AutoGen.

In [None]:
python -m fastchat.serve.multi_model_worker --model-path lmsys/vicuna-7b-v1.3 --model-names vicuna-7b-v1.3 --model-path chatglm2-6b --model-names chatglm2-6b

In [None]:
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
        {
            "model": "vicuna-7b-v1.3",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hi"}]
)
print(response)
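
When several models share one endpoint, the configuration entries differ only in the model name, so they can be generated. The sketch below builds the same `config_list` as above and shows how to pick a single entry. It assumes that AutoGen tries the entries in order and uses the first one that succeeds, so passing a one-entry list is a way to target a specific model; verify that behavior against your AutoGen version. `make_config_list` and `select_model` are hypothetical helper names.

```python
def make_config_list(models, api_base="http://localhost:8000/v1"):
    """Build one AutoGen-style config entry per locally served model."""
    return [
        {
            "model": name,
            "api_base": api_base,
            "api_type": "open_ai",
            "api_key": "NULL",  # just a placeholder, as in the examples above
        }
        for name in models
    ]


def select_model(config_list, model):
    """Return a one-entry config list for the named model."""
    selected = [cfg for cfg in config_list if cfg["model"] == model]
    if not selected:
        raise ValueError(f"no config entry for model {model!r}")
    return selected


config_list = make_config_list(["chatglm2-6b", "vicuna-7b-v1.3"])
vicuna_only = select_model(config_list, "vicuna-7b-v1.3")
```

`vicuna_only` can then be passed as `config_list` to `oai.ChatCompletion.create` in place of the full list.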

## Summary

FastChat is a comprehensive platform for deploying and interacting with language models, and AutoGen builds on it to run LLM applications locally. By integrating AutoGen with FastChat, users can interact with models like ChatGLM2-6B, run inference, and manage multiple local LLMs, all through the OpenAI-compatible API that FastChat provides.