# Offline Engine API

SGLang provides a direct inference engine that runs without an HTTP server. This is useful when an additional HTTP server would only add complexity or overhead. There are two general use cases:

- Offline Batch Inference
- Custom Server on Top of the Engine

This document focuses on offline batch inference and demonstrates four inference modes:

- Non-streaming synchronous generation
- Streaming synchronous generation
- Non-streaming asynchronous generation
- Streaming asynchronous generation

You can also build a custom server on top of the SGLang offline engine. A detailed example, written as a Python script, is available in [custom_server](https://github.com/sgl-project/sglang/blob/main/examples/runtime/engine/custom_server.py).
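
As a rough illustration of what a custom server layer on top of the engine might look like, the sketch below wraps an engine-like object in a plain request handler. `EchoEngine` is a hypothetical stand-in so the snippet runs without a model; in practice you would pass an `sgl.Engine` instance. The JSON field names (`prompts`, `sampling_params`) and the `handle_generate` helper are assumptions of this sketch, not part of SGLang or of the linked `custom_server.py`.

```python
import json


class EchoEngine:
    """Hypothetical stand-in for sgl.Engine so the sketch runs without a model."""

    def generate(self, prompts, sampling_params=None):
        # Mimics the return shape used in this document: one dict with a
        # "text" key per prompt, in prompt order.
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


def handle_generate(engine, raw_body: bytes) -> bytes:
    """Turn a JSON request body into a JSON response using the engine."""
    request = json.loads(raw_body)
    prompts = request["prompts"]
    sampling_params = request.get("sampling_params", {})
    outputs = engine.generate(prompts, sampling_params)
    response = [
        {"prompt": prompt, "text": output["text"]}
        for prompt, output in zip(prompts, outputs)
    ]
    return json.dumps(response).encode("utf-8")


body = json.dumps({"prompts": ["The capital of France is"]}).encode("utf-8")
reply = json.loads(handle_generate(EchoEngine(), body))
print(reply)
```

Keeping the handler a pure function of `(engine, request body)` makes it easy to plug into whichever web framework you prefer.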

## Offline Batch Inference

The SGLang offline engine supports batch inference with efficient scheduling.

In [1]:
# launch the offline engine

import sglang as sgl
from sglang.utils import print_highlight
import asyncio

llm = sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")

INFO 11-22 11:30:50 weight_utils.py:243] Using model weights format ['*.safetensors']


Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:02<00:00,  1.36it/s]



### Non-streaming Synchronous Generation

In [2]:
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

outputs = llm.generate(prompts, sampling_params)
for prompt, output in zip(prompts, outputs):
    print_highlight("===============================")
    print_highlight(f"Prompt: {prompt}\nGenerated text: {output['text']}")
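
For offline batch jobs it is often convenient to persist results rather than print them. The sketch below assumes, as the loop above suggests, that `llm.generate` returns one dict with a `"text"` key per prompt, in prompt order. The `to_jsonl` helper and the sample outputs are hypothetical and only illustrate the pairing.

```python
import json


def to_jsonl(prompts, outputs):
    """Pair each prompt with its generated text, one JSON object per line."""
    if len(prompts) != len(outputs):
        raise ValueError("expected exactly one output per prompt")
    lines = [
        json.dumps({"prompt": prompt, "text": output["text"]})
        for prompt, output in zip(prompts, outputs)
    ]
    return "\n".join(lines)


# Stand-in for the list returned by llm.generate(prompts, sampling_params);
# the texts here are made up for illustration.
example_prompts = ["Hello, my name is", "The capital of France is"]
example_outputs = [{"text": " Ada."}, {"text": " Paris."}]

jsonl = to_jsonl(example_prompts, example_outputs)
print(jsonl)
```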

### Streaming Synchronous Generation

In [3]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing synchronous streaming generation ===")

for prompt in prompts:
    print_highlight(f"\nPrompt: {prompt}")
    print("Generated text: ", end="", flush=True)

    for chunk in llm.generate(prompt, sampling_params, stream=True):
        print(chunk["text"], end="", flush=True)
    print()

Generated text: Lucy, and I am a Senior at the University of Virginia. I'm a double major in Economics and Psychology with a minor in Business.

Outside of the classroom, I'm an avid traveler and enjoy exploring new places. I'm also a huge dog lover and volunteer at the local animal shelter.

In terms of career aspirations, I'm interested in pursuing a career in finance or consulting. I'm excited to learn more about these fields and connect with like-minded professionals.

I'm looking forward to connecting with you and learning more about your experiences! What are your interests and career goals?

Nice to meet you! I'm currently working as a consultant




Generated text: a must-visit destination for anyone who loves history, art, fashion, and food. The city is steeped in culture and has a unique charm that is hard to resist. Here are some of the top things to do and see in Paris:

The Eiffel Tower: This iconic landmark is a must-visit for any traveler to Paris. You can take the elevator to the top for stunning views of the city, or dine at the Michelin-starred restaurant on the first floor.

The Louvre Museum: Home to some of the world's most famous paintings, including the Mona Lisa, the Louvre is a must-visit




Generated text: not only about machines learning to think like humans, but also about humans learning to think like machines

Artificial Intelligence (AI) has been around for decades, but its rapid development in recent years has sparked a surge of interest in the field. As AI continues to advance, we are faced with a multitude of questions: How will AI change the world? Will it replace human workers? Can AI truly think like a human?

The answer to the last question is a resounding "no," at least not yet. While AI systems have made tremendous progress in simulating human-like intelligence, they still lack the complex cognitive abilities and nuanced emotional experiences
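
If you need the complete text as well as the incremental display, you can accumulate the chunks while printing them. This sketch assumes each `chunk["text"]` carries only the newly generated piece, as the streamed output in this section suggests. If your SGLang version yields the cumulative text instead, keep only the last chunk. `fake_stream` is a hypothetical stand-in for `llm.generate(prompt, sampling_params, stream=True)`, and `collect_stream` is an illustrative helper.

```python
def fake_stream(prompt):
    """Hypothetical stand-in for llm.generate(prompt, sampling_params, stream=True)."""
    for piece in [" Paris", ",", " a", " city", " of", " light", "."]:
        yield {"text": piece}


def collect_stream(chunks, on_text=None):
    """Consume a stream of chunks and return the full generated text."""
    pieces = []
    for chunk in chunks:
        if on_text is not None:
            on_text(chunk["text"])  # e.g. print each piece as it arrives
        pieces.append(chunk["text"])
    return "".join(pieces)


full_text = collect_stream(
    fake_stream("The capital of France is"),
    on_text=lambda text: print(text, end="", flush=True),
)
print()
```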




### Non-streaming Asynchronous Generation

In [4]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous batch generation ===")


async def main():
    outputs = await llm.async_generate(prompts, sampling_params)

    for prompt, output in zip(prompts, outputs):
        print_highlight(f"\nPrompt: {prompt}")
        print_highlight(f"Generated text: {output['text']}")


asyncio.run(main())
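
One reason to prefer the asynchronous API is that several requests can be awaited together. The sketch below uses `asyncio.gather` to run independent batches with different sampling parameters. `FakeAsyncEngine` is a hypothetical stand-in so the code runs without a model, and the sketch assumes the real engine accepts overlapping `async_generate` calls.

```python
import asyncio


class FakeAsyncEngine:
    """Hypothetical stand-in exposing an async_generate coroutine like the one above."""

    async def async_generate(self, prompts, sampling_params=None):
        await asyncio.sleep(0.01)  # pretend the model is working
        return [{"text": f" <completion of {p!r}>"} for p in prompts]


async def run_jobs(engine, jobs):
    """Await several batches concurrently; results come back in job order."""
    tasks = [engine.async_generate(prompts, params) for prompts, params in jobs]
    return await asyncio.gather(*tasks)


jobs = [
    (["Hello, my name is"], {"temperature": 0.8, "top_p": 0.95}),
    (["The capital of France is", "The future of AI is"], {"temperature": 0.0}),
]
results = asyncio.run(run_jobs(FakeAsyncEngine(), jobs))
for (prompts, _), outputs in zip(jobs, results):
    for prompt, output in zip(prompts, outputs):
        print(prompt, "->", output["text"])
```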

### Streaming Asynchronous Generation

In [5]:
prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = {"temperature": 0.8, "top_p": 0.95}

print_highlight("\n=== Testing asynchronous streaming generation ===")


async def main():
    for prompt in prompts:
        print_highlight(f"\nPrompt: {prompt}")
        print("Generated text: ", end="", flush=True)

        generator = await llm.async_generate(prompt, sampling_params, stream=True)
        async for chunk in generator:
            print(chunk["text"], end="", flush=True)
        print()


asyncio.run(main())

Generated text: Mia. I'm a 16-year-old student from Sydney, Australia. I've been studying English as a second language for about 6 months now. I really enjoy learning English and I think it's a beautiful language. I'm planning to travel to the United States next year for a gap year, so I want to improve my English skills as much as possible before I go.

My favorite subjects in school are English literature, history, and science. I also enjoy playing the piano and reading books. I'm a bit shy, but I'm trying to become more confident in speaking English.

I'm looking for someone to practice my English




Generated text: a city that is rich in history, architecture, art, and culture. It is a place that has been the center of many significant events throughout the centuries and has been the seat of power for some of the most influential figures in European history. From the iconic Eiffel Tower to the world-renowned Louvre Museum, Paris is a city that is steeped in history and is a must-visit destination for anyone interested in the past.

Paris, the capital of France, is a city that has a long and complex history. The city has been inhabited by various cultures throughout the centuries, from the Gauls to the Romans, and




Generated text: here, and it's going to change everything.

Artificial intelligence (AI) has been a staple of science fiction for decades, but it's no longer just a fantasy. AI is now a reality, and it's transforming industries, changing the way we live and work, and opening up new opportunities for innovation and growth.

From virtual assistants like Siri and Alexa to self-driving cars and medical imaging, AI is already making a significant impact on our daily lives. But the future of AI is even more exciting, with advancements in areas like:

Machine Learning: AI systems that can learn from data and improve their performance over time.

Natural Language Processing
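
The loop in this section streams one prompt at a time. A possible variation is to run one stream per prompt concurrently and collect each result, sketched below. `FakeStreamingEngine` is a hypothetical stand-in that mirrors the call pattern used here (awaiting `async_generate(..., stream=True)` returns an async generator). Whether the real engine interleaves streams this way is an assumption.

```python
import asyncio


class FakeStreamingEngine:
    """Hypothetical stand-in: awaiting async_generate returns an async generator."""

    async def async_generate(self, prompt, sampling_params=None, stream=False):
        # The stand-in always streams; it upper-cases the prompt word by word.
        async def chunks():
            for word in prompt.split():
                await asyncio.sleep(0)  # hand control back to the event loop
                yield {"text": " " + word.upper()}

        return chunks()


async def stream_to_text(engine, prompt, sampling_params):
    """Consume one stream and return the concatenated text."""
    generator = await engine.async_generate(prompt, sampling_params, stream=True)
    pieces = []
    async for chunk in generator:
        pieces.append(chunk["text"])
    return "".join(pieces)


async def stream_all(engine, prompts, sampling_params):
    """Run one stream per prompt concurrently; results keep prompt order."""
    return await asyncio.gather(
        *(stream_to_text(engine, p, sampling_params) for p in prompts)
    )


texts = asyncio.run(
    stream_all(FakeStreamingEngine(), ["hello there", "the future"], {"temperature": 0.8})
)
print(texts)
```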




In [6]:
llm.shutdown()
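
In a script, as opposed to a notebook, an exception between engine creation and `llm.shutdown()` would skip the shutdown call. One way to guard against that is a small context manager, sketched below. `running_engine` is a hypothetical helper, and `FakeEngine` stands in for `sgl.Engine` so the example runs without a model.

```python
from contextlib import contextmanager


class FakeEngine:
    """Hypothetical stand-in with the generate/shutdown methods used above."""

    def __init__(self):
        self.is_shut_down = False

    def generate(self, prompts, sampling_params=None):
        return [{"text": " ..."} for _ in prompts]

    def shutdown(self):
        self.is_shut_down = True


@contextmanager
def running_engine(make_engine):
    """Create an engine and guarantee shutdown() runs, even on errors."""
    engine = make_engine()
    try:
        yield engine
    finally:
        engine.shutdown()


with running_engine(FakeEngine) as engine:
    outputs = engine.generate(["Hello, my name is"])

print(engine.is_shut_down)
```

With a real engine, the factory argument could be something like `lambda: sgl.Engine(model_path="meta-llama/Meta-Llama-3.1-8B-Instruct")`.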