# Inference with SWIFT

SWIFT is an open-source framework from ModelScope for training, inference, evaluation, and deployment of large models. It covers the complete pipeline from model training to application.

This tutorial walks through inference with SWIFT, covering the installation steps and an inference example. We will use Yi-1.5-6B-Chat for demonstration.


## Installation

First, we need to install the necessary dependencies.

(Optional) You can set the global pip mirror to speed up downloads:


In [None]:
!pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

Install ms-swift:


In [None]:
!pip install 'ms-swift[llm]' -U
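
After installing, it can be useful to confirm which version ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of SWIFT.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a pip package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# 'ms-swift' is the distribution name used in the pip command above.
version = installed_version('ms-swift')
if version is None:
    print('ms-swift is not installed in this environment')
else:
    print(f'ms-swift version: {version}')
```

If the check reports that the package is missing inside a notebook, restarting the kernel after installation is usually worth trying first.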

## Start Inference

Before starting inference, make sure your machine has enough system memory, GPU memory, and disk space for the model. If it does not, loading or running the model may fail, for example with an out-of-memory error.

| Model          | GPU Memory Usage | Disk Usage |
| -------------- | ---------------- | ---------- |
| Yi-1.5-6B-Chat | 11.5 GB          | 14.7 GB    |
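
The disk requirement in the table can be checked before downloading anything. The sketch below uses `shutil.disk_usage` from the standard library; the helper names are hypothetical, the default path `'.'` is only a placeholder for wherever models are downloaded on your machine, and reading the table's figure as GiB is an assumption.

```python
import shutil

GIB = 2**30  # assumption: the table's disk figure is interpreted as GiB


def free_disk_gib(path='.'):
    """Free space, in GiB, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / GIB


def has_enough_disk(required_gib, path='.'):
    """True if the filesystem containing `path` has at least `required_gib` free."""
    return free_disk_gib(path) >= required_gib


required = 14.7  # disk usage listed for Yi-1.5-6B-Chat in the table above
print(f'free disk: {free_disk_gib():.1f} GiB, required: {required} GiB')
print('enough disk space' if has_enough_disk(required) else 'not enough disk space')
```

This check deliberately avoids touching the GPU, so it is safe to run before the `CUDA_VISIBLE_DEVICES` step.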


First, set the `CUDA_VISIBLE_DEVICES` environment variable so that only GPU 0 is visible to the process:


In [None]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
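
`CUDA_VISIBLE_DEVICES` only takes effect if it is set before CUDA is initialized in the process, which is why this cell comes before the model is loaded. As a rough sketch, the setting and the free memory on each visible GPU can be inspected as follows. Both helper names are hypothetical, and the memory query assumes PyTorch is installed (it returns an empty list otherwise).

```python
import os


def visible_device_ids():
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    raw = os.environ.get('CUDA_VISIBLE_DEVICES')
    if raw is None:
        return None
    return [part.strip() for part in raw.split(',') if part.strip()]


def free_gpu_memory_gib():
    """Free memory in GiB per visible GPU; empty list if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    result = []
    for index in range(torch.cuda.device_count()):
        free_bytes, _total_bytes = torch.cuda.mem_get_info(index)
        result.append(free_bytes / 2**30)
    return result


os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # same setting as in the cell above
print('visible GPUs:', visible_device_ids())
print('free GiB per visible GPU:', free_gpu_memory_gib())
```

Comparing the reported free memory with the GPU figure in the table gives an early hint of whether the model will fit.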

Next, load the model and tokenizer:


In [None]:
from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType, get_default_template_type,
)
from swift.utils import seed_everything

# Select model type, here we use Yi-1.5-6B-Chat
model_type = ModelType.yi_1_5_6b_chat
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')  # Template type

# Load model and tokenizer
kwargs = {}
model, tokenizer = get_model_tokenizer(model_type, model_kwargs={'device_map': 'auto'}, **kwargs)

# Set generation config
model.generation_config.max_new_tokens = 128

# Get template
template = get_template(template_type, tokenizer)

# Set random seed
seed_everything(42)
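
Fixing the seed matters because text generation may involve random sampling, so two runs can otherwise produce different responses. The idea can be illustrated with Python's standard `random` module. This is only an analogy under that assumption, not a description of what `seed_everything` does internally, and `sample_with_seed` is a hypothetical helper.

```python
import random


def sample_with_seed(seed, count=3):
    """Draw `count` pseudo-random numbers from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


first_run = sample_with_seed(42)
second_run = sample_with_seed(42)
other_seed = sample_with_seed(43)

print('same seed, same draws:', first_run == second_run)
print('different seed, same draws:', first_run == other_seed)
```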

Now, let's perform inference:


In [None]:
# Prepare input query
query = 'Hello!'

# Perform inference using the template
response, history = inference(model, template, query)

# Print query and response
print(f'query: {query}')
print(f'response: {response}')

The above code will output something like this:

```
query: Hello!
response: Hi! How can I help you today?
```
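
Since `inference` returns a `history` alongside the response, a multi-turn conversation can be built by feeding that history back in on the next turn. The sketch below shows only the threading pattern: `run_conversation` and `echo_infer` are hypothetical, and `echo_infer` is a stand-in for the real model call so the example runs without a GPU. The history layout used here (a list of query/response pairs) is likewise an assumption.

```python
def run_conversation(infer_fn, queries):
    """Send queries one at a time, passing the running history back in on each turn."""
    history = None
    responses = []
    for query in queries:
        response, history = infer_fn(query, history)
        responses.append(response)
    return responses, history


def echo_infer(query, history):
    """Stand-in for the real model call; hypothetical, for illustration only."""
    history = list(history or [])
    response = f'(turn {len(history) + 1}) you said: {query}'
    history.append([query, response])
    return response, history


responses, history = run_conversation(echo_infer, ['Hello!', 'What can you do?'])
for query, response in history:
    print(f'query: {query}')
    print(f'response: {response}')
```

With the real model, `infer_fn` could be something like `lambda q, h: inference(model, template, q, h)`, assuming `inference` accepts the history as its fourth argument; check the signature in your installed SWIFT version before relying on it.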


With this, you have learned how to run inference on a Yi series model using SWIFT. If you encounter any issues, consult the SWIFT official documentation or the [Yi-1.5-6B-Chat model page on ModelScope](https://www.modelscope.cn/models/01-ai/Yi-1.5-6B-Chat) for more help.