# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server, which is useful when an additional HTTP server would add unnecessary complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

Additionally, you can build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, can be found in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
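As a rough illustration of the custom-server idea, the sketch below wraps an engine-like object in a plain request handler. `StubEngine`, `handle_request`, and the payload keys are hypothetical names invented for this example. The stub only mimics a `generate(prompt, sampling_params)` call and a `{"text": ...}` output, and the single-prompt return shape is an assumption, so the snippet runs without a GPU. In a real server you would pass an `sgl.Engine` instance instead and put a web framework in front of the handler; the linked `custom_server.py` remains the authoritative example.

```python
import json


class StubEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs anywhere."""

    def generate(self, prompt, sampling_params=None):
        # Assumed single-prompt return shape: a dict with a "text" key.
        return {"text": f" <completion for {prompt!r}>"}


def handle_request(engine, body: str) -> str:
    """Turn a JSON request body into a JSON response using the engine."""
    payload = json.loads(body)
    prompt = payload["prompt"]
    sampling_params = payload.get("sampling_params", {})
    output = engine.generate(prompt, sampling_params)
    return json.dumps({"prompt": prompt, "text": output["text"]})


response = handle_request(StubEngine(), json.dumps({"prompt": "Hello, my name is"}))
print(response)
```

Keeping the handler independent of any web framework makes it easy to test and to reuse behind different transports.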

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-27 11:43:31 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.36it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print("===============================")
    print(f"Prompt: {prompt}\nGenerated text: {output['text']}")

Prompt: Hello, my name is
Generated text:  Jo and I am excited to be a new member of the Lion's Club of Golden Bay.
I have been a resident of Takaka for nearly 30 years and have always been involved in the community in one way or another.
I am looking forward to being involved with the Lion's Club and contributing to the many wonderful projects and events that you are involved in. I am particularly interested in supporting youth and community projects.
I am a keen outdoors person and enjoy hiking, tramping and mountain biking in the beautiful hills of the Tasman region. I also love spending time with my family and friends, and enjoying the laid-back atmosphere of our
Prompt: The president of the United States is
Generated text:  the head of state and head of government of the United States, indirectly elected to a four-year term by the people through the Electoral College. The officeholder serves as both the commander-in-chief of the Armed Forces and the head of the Executive branch of
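For offline jobs it is often convenient to persist results rather than print them. The sketch below pairs each prompt with its output and serializes the pairs as JSON Lines. `to_jsonl` and the sample data are hypothetical; the only thing taken from the example above is that each output is a dict with a `"text"` key.

```python
import json


def to_jsonl(prompts, outputs):
    """Serialize prompt/completion pairs as one JSON object per line."""
    if len(prompts) != len(outputs):
        raise ValueError("prompts and outputs must have the same length")
    lines = [
        json.dumps({"prompt": p, "text": o["text"]})
        for p, o in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Hand-written stand-in for the list returned by llm.generate(prompts, ...).
sample_prompts = ["The capital of France is", "The future of AI is"]
sample_outputs = [{"text": " Paris."}, {"text": " bright."}]
print(to_jsonl(sample_prompts, sample_outputs))
```

The explicit length check guards against silently dropping results, since `zip` stops at the shorter sequence.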

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()


=== Testing synchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Ben, and I am a software engineer with a passion for building web applications that are both functional and beautiful. I have a strong foundation in software development and have worked on various projects, from small startups to large-scale enterprise applications. I am excited to bring my skills and experience to your project.
My expertise includes:
I am a strong believer in the importance of user experience and user interface design. I believe that a well-designed application should not only be functional but also visually appealing and easy to use. I am proficient in a variety of design tools, including Sketch, Figma, Adobe XD, and InVision.
I am a skilled programmer

Prompt: The capital of France is
Generated text:  a city of stunning beauty, rich history, and world-class culture. From the iconic Eiffel Tower to the charming cafes and bistros, there's no shortage of things to see and do in Paris.
Here are some of the top attractions and activities to add to your Paris itinerary:
1. The Eiffel Tower: No trip to Paris is complete without a visit to the iconic Eiffel Tower. You can take the stairs or elevator to the top for breathtaking views of the city.
2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art

Prompt: The future of AI is
Generated text:  built on the shoulders of giants – researchers, engineers, and practitioners who have spent decades working to advance the field. In this panel discussion, we'll explore the history of AI, key milestones, and the pioneers who made significant contributions to its development. Join us as we reflect on the past, celebrate the present, and look to the future of AI.
Andy Rubins Co-Founder, Google X
Andy Rubin is an American computer programmer, entrepreneur, and inventor. He is the founder of Danger Inc., the creator of the T-Mobile Sidekick, and the co-founder and former Chief Technical Officer (CTO) of Google's Android
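If you need the complete text as well as the live output, you can accumulate the chunks while printing them. The sketch below assumes, as in the output above, that each chunk's `"text"` holds only the newly generated piece; if your SGLang version yields cumulative text instead, keep the last chunk rather than joining. `collect_stream` and `fake_stream` are hypothetical helpers, and `fake_stream` stands in for `llm.generate(prompt, sampling_params, stream=True)`.

```python
def collect_stream(chunks, echo=True):
    """Print incremental chunks as they arrive and return the joined text."""
    pieces = []
    for chunk in chunks:
        piece = chunk["text"]
        if echo:
            print(piece, end="", flush=True)
        pieces.append(piece)
    if echo:
        print()
    return "".join(pieces)


def fake_stream():
    """Hypothetical generator that mimics incremental streaming chunks."""
    for piece in [" Ben", ",", " and", " I", " am", " a", " software", " engineer"]:
        yield {"text": piece}


full_text = collect_stream(fake_stream())
```

Because `collect_stream` accepts any iterable of chunk dicts, the same helper can be passed the real streaming generator without changes.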




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print(f"\nPrompt: {prompt}")
        print(f"Generated text: {output['text']}")


asyncio.run(main())


=== Testing asynchronous batch generation ===



Prompt: Hello, my name is
Generated text:  Natanael Almeida. I'm a Brazilian musician, producer and composer. I play the guitar, piano and keyboards. I have been playing music since I was a child and I'm passionate about creating and performing music.
I have been influenced by various styles, from classical to rock, pop and electronic music. My music is a fusion of different styles and I love experimenting with new sounds and techniques.
I have been working as a musician and producer for many years, and I have had the opportunity to work with many different artists and projects. I have also been involved in various music production and composition projects, creating music for films,

Prompt: The capital of France is
Generated text:  a must-visit destination for many travelers. Paris is renowned for its stunning architecture, art museums, fashion, cuisine, and romantic atmosphere. Here are some tips for visiting Paris, a city that will leave you enchanted and wanting to return.
1. Plan
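The benefit of the asynchronous API is that other coroutines can make progress while generation is awaited. The sketch below illustrates this with `asyncio.gather`. `fake_async_generate` is a hypothetical stand-in for `llm.async_generate` that sleeps instead of running a model, and `heartbeat` is an arbitrary unrelated task invented for the example.

```python
import asyncio


async def fake_async_generate(prompts, sampling_params=None):
    """Hypothetical stand-in: pretends to generate one output per prompt."""
    await asyncio.sleep(0.05)  # simulate inference latency
    return [{"text": f" <completion {i}>"} for i, _ in enumerate(prompts)]


async def heartbeat(ticks, log):
    """Unrelated work that runs while generation is pending."""
    for i in range(ticks):
        log.append(f"tick {i}")
        await asyncio.sleep(0.01)


async def run_overlapped(prompts):
    log = []
    # gather() schedules both coroutines and waits for both to finish.
    outputs, _ = await asyncio.gather(
        fake_async_generate(prompts),
        heartbeat(3, log),
    )
    return outputs, log


outputs, log = asyncio.run(
    run_overlapped(["Hello, my name is", "The capital of France is"])
)
print(outputs, log)
```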

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())


=== Testing asynchronous streaming generation ===

Prompt: Hello, my name is
Generated text:  Syed Rizwan Khan. I am a Muslim. My wife, Kareena, is Hindu. Our 7-year-old son, Haroon, is a Muslim. I was arrested at New York City's John F. Kennedy Airport on November 20, 2001, because the U.S. government is afraid of me. I am afraid. I am afraid of being separated from my wife and my son. I am afraid of being tortured. I am afraid of being sent back to India where I may be killed.
I am a peaceful man. I love my family. I love my country. I am a doctor.

Prompt: The capital of France is
Generated text:  a city that is steeped in history, art, fashion, and romance. Paris is a city that is full of iconic landmarks, museums, and cultural attractions. Visitors can explore the famous Eiffel Tower, the Louvre Museum, Notre Dame Cathedral, and the Arc de Triomphe. The city is also famous for its fashion, with world-renowned designers like Chanel, Dior, and Louis Vuitton having stores and boutiques in the city. Paris is also known for its beautiful parks and gardens, such as the Luxembourg Gardens and the Tuileries Garden. The city has a lively nightlife, with many bars,

Prompt: The future of AI is
Generated text:  human
Andrew Ng, a leading AI expert and educator, talks about the importance of human-centered AI that serves people and society.
Andrew Ng, a leading AI expert and educator, talks about the importance of human-centered AI that serves people and society.
The future of AI is human
Andrew Ng, a leading AI expert and educator, talks about the importance of human-centered AI that serves people and society.
Andrew Ng is a well-known expert in artificial intelligence (AI) and education. He is the co-founder of Coursera, an online learning platform that partners with top universities to offer courses and degree programs. Ng is also the founder
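The cell above streams the prompts one after another. As a sketch of an alternative, several streams can be consumed concurrently, each in its own task, with the full text collected per prompt. `fake_async_stream`, `collect`, and `stream_all` are hypothetical names; the stub is an async generator function called directly, whereas with the real engine you would first `await llm.async_generate(prompt, sampling_params, stream=True)` as in the cell above to obtain the generator. Incremental chunk text is assumed.

```python
import asyncio


async def fake_async_stream(prompt):
    """Hypothetical async generator mimicking incremental streaming chunks."""
    for piece in [" one", " two", " three"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        yield {"text": piece}


async def collect(prompt):
    """Consume one stream and return (prompt, full generated text)."""
    pieces = []
    async for chunk in fake_async_stream(prompt):
        pieces.append(chunk["text"])
    return prompt, "".join(pieces)


async def stream_all(prompts):
    # Each prompt's stream is consumed in its own task; results keep input order.
    return await asyncio.gather(*(collect(p) for p in prompts))


results = asyncio.run(stream_all(["Hello, my name is", "The capital of France is"]))
for prompt, text in results:
    print(f"{prompt!r} -> {text!r}")
```

Collecting rather than printing inside `collect` avoids interleaving chunks from different prompts on the terminal.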




In [6]:
llm.shutdown()
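In a script, as opposed to a notebook, an exception raised before `shutdown()` would skip the cleanup. One way to guard against that is a small context manager, sketched below. `engine_session` and `StubEngine` are hypothetical; the stub exposes only the `shutdown()` method used above, and with a real engine `make_engine` could be something like `lambda: sgl.Engine(model_path=...)`.

```python
from contextlib import contextmanager


@contextmanager
def engine_session(make_engine):
    """Create an engine and guarantee shutdown() runs, even on errors."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


class StubEngine:
    """Hypothetical stand-in exposing only the shutdown() method used above."""

    def __init__(self):
        self.closed = False

    def shutdown(self):
        self.closed = True


with engine_session(StubEngine) as engine:
    pass  # call engine.generate(...) here with a real engine

print(engine.closed)  # True
```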