# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when a server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).



## Nest Asyncio
If you want to use the **Offline Engine** in IPython or in other code that already runs inside an event loop, add the following code first:
```python
import nest_asyncio

nest_asyncio.apply()

```
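
The reason for this patch can be reproduced with the standard library alone. The sketch below is not SGLang code: `inner` and `outer` are hypothetical names, and `outer` stands in for a notebook cell that already runs inside an event loop. Without `nest_asyncio`, starting a second loop from there fails.

```python
import asyncio


async def inner() -> str:
    return "inner finished"


async def outer() -> str:
    # An event loop is already running here, as it is inside a notebook cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second loop in the same thread
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"nested call failed: {exc}"
    return "nested call succeeded"


message = asyncio.run(outer())
print(message)
```

Calling `nest_asyncio.apply()` before the nested call patches asyncio so that the inner `asyncio.run` is allowed.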

## Advanced Usage

The engine supports [VLM inference](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/offline_batch_inference_vlm.py) as well as [extracting hidden states](https://github.com/sgl-project/sglang/blob/main/examples/runtime/hidden_states).

Please see [the examples](https://github.com/sgl-project/sglang/tree/main/examples/runtime/engine) for further use cases.

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine
import asyncio
import io
import os

from PIL import Image
import requests
import sglang as sgl

from sglang.srt.conversation import chat_templates
from sglang.test.test_utils import is_in_ci
from sglang.utils import async_stream_and_merge, stream_and_merge

if is_in_ci():
    import patch
else:
    import nest_asyncio

    nest_asyncio.apply()


llm = sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")




### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Tiedra, and I'm from the United States. I've never been to Morocco, so I'm not sure if I should tell you anything about my country. I'm just a typical American who is learning about Morocco, so I might be a bit quirky. I'd be happy to share more about my experiences in Morocco and some of the interesting places I've visited.
Certainly! I'd be happy to hear about your experiences and learn about some of the interesting places you've visited in Morocco. That would be a great way to get to know more about this fascinating country. Let me know if you'd like me to start with
Prompt: The president of the United States is
Generated text:  from the ____.
A. New England
B. Mid-Atlantic
C. Mid-West
D. Mid-Pacific
Answer:

B

In 2013, the total value of goods and services exported by the State of Fujian was _____ billion RMB. A. 305.34 B. 304.51 C. 310.54 D. 311.06
Answer:

B

According to the provisions of the Civil Code, which of the following properti

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {
    "temperature": 0.2,
    "top_p": 0.9,
}

print("\n=== Testing synchronous streaming generation with overlap removal ===\n")

for prompt in prompts:
    print(f"Prompt: {prompt}")
    merged_output = stream_and_merge(llm, prompt, sampling_params)
    print("Generated text:", merged_output)
    print()


=== Testing synchronous streaming generation with overlap removal ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is


Generated text:  [Name] and I am a [Age] year old [Occupation]. I have always been passionate about [Your passion or hobby]. I am always looking for new experiences and learning opportunities to grow and develop as a person. I am always eager to learn and grow, and I am always willing to share my knowledge and experiences with others. I am a [Your personality trait or quality]. I am always ready to help others and make a positive impact on the world. I am a [Your goal or aspiration]. I am a [Your career goal or aspiration]. I am a [Your future aspirations]. I am a [Your future

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is


Generated text:  Paris, also known as the City of Light. It is a historic city with a rich history and a vibrant culture. The city is home to many famous landmarks such as the Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum. Paris is also known for its fashion industry, with many famous fashion designers and boutiques. The city is also known for its cuisine, with many famous restaurants and cafes serving delicious French dishes. Paris is a popular tourist destination and a cultural hub in France. It is a city that is full of history, culture, and entertainment. The city is a must-visit destination for anyone interested

Prompt: Explain possible future trends in artificial intelligence. The future of AI is


Generated text:  likely to be characterized by rapid advancements in areas such as machine learning, natural language processing, and computer vision. Here are some possible future trends in AI:

1. Increased focus on ethical considerations: As AI becomes more integrated into our daily lives, there will be a growing emphasis on ethical considerations. This includes issues such as bias, transparency, and accountability.

2. Greater integration with other technologies: AI is already being integrated into a wide range of technologies, including smart homes, self-driving cars, and virtual assistants. As these technologies continue to evolve, we can expect to see even more integration between AI and other technologies.

3. Increased use
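
The banner in the cell above mentions overlap removal. As a rough illustration of that idea only, the sketch below merges a stream whose chunks may repeat text that was already received. It assumes nothing about how `stream_and_merge` is implemented or about what the engine actually emits: `trim_overlap`, `merge_chunks`, and `fake_stream` are hypothetical names and data made up for this example.

```python
from typing import Iterable


def trim_overlap(existing_text: str, new_chunk: str) -> str:
    """Drop the longest prefix of new_chunk that already ends existing_text."""
    max_len = min(len(existing_text), len(new_chunk))
    for size in range(max_len, 0, -1):
        if existing_text.endswith(new_chunk[:size]):
            return new_chunk[size:]
    return new_chunk  # no overlap found, keep the chunk whole


def merge_chunks(chunks: Iterable[str]) -> str:
    """Concatenate chunks, removing text repeated at chunk boundaries."""
    merged = ""
    for chunk in chunks:
        merged += trim_overlap(merged, chunk)
    return merged


# Hypothetical stream in which each chunk repeats part of the previous one.
fake_stream = [" Paris", " Paris, the", ", the capital", " capital of France."]
merged_text = merge_chunks(fake_stream)
print(merged_text)
```

The search starts from the longest possible overlap, so a chunk that fully repeats the tail of the merged text contributes nothing new.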



### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  Sarah. I'm 25 years old, and I have a strong work ethic and a passion for music. I enjoy being on stage and performing in front of people. I have a deep passion for learning new skills, and I'm always looking for ways to grow and improve. I love to travel and explore new places, and I'm always eager to try new things. I'm patient, open-minded, and dedicated to my craft. Thank you for taking the time to meet me. 
(Note: The self-introduction should be short and neutral, without any emotional language or personal references that may be out of character for the fictional

Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris. It is located in the south of the country and is the largest and most populous city. Paris is known for its iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral
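
One reason to prefer the asynchronous call is that other coroutines can make progress while generation is pending. The sketch below shows this with `asyncio.gather`. It does not use SGLang: `FakeAsyncEngine` is a hypothetical stand-in whose `async_generate` only mimics the call shape used above, and `heartbeat` represents any unrelated background work.

```python
import asyncio
from typing import Dict, List


class FakeAsyncEngine:
    """Hypothetical stand-in for an engine with an awaitable batch call."""

    async def async_generate(
        self, prompts: List[str], sampling_params: Dict
    ) -> List[Dict]:
        await asyncio.sleep(0.01)  # simulate generation latency
        return [{"text": f" <completion for: {p}>"} for p in prompts]


async def heartbeat(ticks: List[int]) -> None:
    """Unrelated work that runs while generation is pending."""
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0)  # yield control back to the event loop


async def run_concurrently():
    engine = FakeAsyncEngine()
    ticks: List[int] = []
    # Both coroutines share one event loop and interleave at await points.
    outputs, _ = await asyncio.gather(
        engine.async_generate(["The capital of France is"], {"temperature": 0.8}),
        heartbeat(ticks),
    )
    return outputs, ticks


outputs, ticks = asyncio.run(run_concurrently())
print(outputs[0]["text"], ticks)
```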

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Write a short, neutral self-introduction for a fictional character. Hello, my name is",
    "Provide a concise factual statement about France’s capital city. The capital of France is",
    "Explain possible future trends in artificial intelligence. The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation (no repeats) ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        # Replace direct calls to async_generate with our custom overlap-aware version
        async for cleaned_chunk in async_stream_and_merge(llm, prompt, sampling_params):
            print(cleaned_chunk, end="", flush=True)

        print()  # New line after each prompt


asyncio.run(main())


=== Testing asynchronous streaming generation (no repeats) ===

Prompt: Write a short, neutral self-introduction for a fictional character. Hello, my name is
Generated text:  [Name] and I am a [age], [gender], [occupation]. I am currently [location] and I am fluent in [language]. I have always been [any positive adjective] and always dreamed of [any positive adjective]. I believe that my character is unique and special, and that I can contribute significantly to your world. My goal is to help those who need my help and to make the world a better place. So, please, tell me more about yourself and your story. Good luck! ✌️👋✌️👋✌️👋✌️👋✌️👋✌️👋✌️👋✌️👋



Prompt: Provide a concise factual statement about France’s capital city. The capital of France is
Generated text:  Paris, the largest and most populous city in the country.



Prompt: Explain possible future trends in artificial intelligence. The future of AI is
Generated text:  likely to continue to be shaped by several potential trends, including:

1. Increased automation: AI systems will become more efficient and accurate, leading to higher productivity and less human labor required.

2. Enhanced creativity: AI will become more capable of producing creative and innovative content, including art, music, and literature.

3. Personalization: AI will become more personalized and adaptive, allowing users to receive more accurate and relevant information.

4. Ethical concerns: AI systems will face ethical challenges related to privacy, bias, and transparency.

5. AI in healthcare: AI will continue to play an important role in healthcare, with more personalized and accurate




In [6]:
llm.shutdown()
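
In a script, unlike a notebook, an exception raised during generation would skip a trailing `shutdown()` call. One way to guard against that is a small context manager, sketched below. `engine_session` is a hypothetical helper, not part of SGLang, and `FakeEngine` is a stand-in so the snippet runs without a GPU; the only assumption about the real engine is the `shutdown()` method used above.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def engine_session(factory: Callable[[], T]) -> Iterator[T]:
    """Create an engine and always call its shutdown() on exit."""
    engine = factory()
    try:
        yield engine
    finally:
        engine.shutdown()


class FakeEngine:
    """Hypothetical stand-in that only records whether it was shut down."""

    def __init__(self) -> None:
        self.is_shut_down = False

    def shutdown(self) -> None:
        self.is_shut_down = True


captured = []
try:
    with engine_session(FakeEngine) as engine:
        captured.append(engine)
        raise ValueError("simulated failure during generation")
except ValueError:
    pass

print(captured[0].is_shut_down)  # True: shutdown ran despite the error
```

With the real engine, the factory could be something like `lambda: sgl.Engine(model_path="qwen/qwen2.5-0.5b-instruct")`.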