BentoML - v1.1.4

@aarnphm aarnphm released this 30 Aug 01:17
· 597 commits to main since this release
7a83d99

🍱 To better support LLM serving through response streaming, we are proud to introduce experimental support for server-sent events (SSE) streaming in this release of BentoML v1.1.4 and OpenLLM v0.2.27. See an example service definition for SSE streaming with Llama2.

  • Added response streaming through SSE to the bentoml.io.Text IO Descriptor type.
  • Added async generator support to both API Server and Runner to yield incremental text responses.
  • Added native SSE streaming support to ☁️ BentoCloud.
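
The pattern behind these changes can be illustrated without BentoML itself. The following is a minimal standard-library sketch, not BentoML's implementation or API: an async generator yields incremental text, and each chunk is framed as a server-sent event (`data:` lines terminated by a blank line). The names `generate_tokens`, `to_sse_event`, `stream_response`, and `collect` are hypothetical and exist only for this illustration.

```python
import asyncio
from typing import AsyncGenerator, List


async def generate_tokens(prompt: str) -> AsyncGenerator[str, None]:
    """Hypothetical stand-in for a model: yields one word at a time."""
    for word in f"Echo: {prompt}".split():
        # Hand control back to the event loop between chunks,
        # as a real model call would while waiting for the next token.
        await asyncio.sleep(0)
        yield word + " "


def to_sse_event(chunk: str) -> str:
    """Frame a text chunk as one SSE event.

    Each line of the chunk gets its own 'data:' field, and a blank
    line marks the end of the event.
    """
    lines = chunk.split("\n")
    return "".join(f"data: {line}\n" for line in lines) + "\n"


async def stream_response(prompt: str) -> AsyncGenerator[str, None]:
    """Yield SSE-framed events as soon as each chunk is available."""
    async for chunk in generate_tokens(prompt):
        yield to_sse_event(chunk)


async def collect(prompt: str) -> List[str]:
    """Helper for demonstration: gather all events into a list."""
    return [event async for event in stream_response(prompt)]


if __name__ == "__main__":
    for event in asyncio.run(collect("hello world")):
        print(repr(event))
```

In a real service the events would be written to the HTTP response as they are produced rather than collected into a list; `collect` is only there to make the sketch easy to run and inspect.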

🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.

  • Added /v1/generate_stream endpoint for streaming responses from LLMs.

    curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
      "prompt": "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
      "llm_config": {
        "use_llama2_prompt": false,
        "max_new_tokens": 4096,
        "early_stopping": false,
        "num_beams": 1,
        "num_beam_groups": 1,
        "use_cache": true,
        "temperature": 0.89,
        "top_k": 50,
        "top_p": 0.76,
        "typical_p": 1,
        "epsilon_cutoff": 0,
        "eta_cutoff": 0,
        "diversity_penalty": 0,
        "repetition_penalty": 1,
        "encoder_repetition_penalty": 1,
        "length_penalty": 1,
        "no_repeat_ngram_size": 0,
        "renormalize_logits": false,
        "remove_invalid_values": false,
        "num_return_sequences": 1,
        "output_attentions": false,
        "output_hidden_states": false,
        "output_scores": false,
        "encoder_no_repeat_ngram_size": 0,
        "n": 1,
        "best_of": 1,
        "presence_penalty": 0.5,
        "frequency_penalty": 0,
        "use_beam_search": false,
        "ignore_eos": false
      },
      "adapter_name": null
    }'
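
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. This is an illustrative client under stated assumptions, not an official OpenLLM client: the URL and top-level payload keys are copied from the curl example, the helper names `build_request`, `iter_text`, and `main` are hypothetical, and it is an assumption that the server accepts an `llm_config` containing only a subset of the fields shown above. The wire format of the streamed body is not described in these notes, so the sketch simply decodes and prints bytes as they arrive.

```python
import codecs
import io
import json
import urllib.request
from typing import Iterator

# URL taken from the curl example; adjust host and port for your deployment.
URL = "http://0.0.0.0:3000/v1/generate_stream"


def build_request(prompt: str, **llm_config) -> urllib.request.Request:
    """Build the POST request with the same top-level keys as the curl example."""
    payload = {"prompt": prompt, "llm_config": llm_config, "adapter_name": None}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def iter_text(stream: io.BufferedIOBase, chunk_size: int = 64) -> Iterator[str]:
    """Yield decoded text as bytes arrive, without waiting for the full body."""
    # An incremental decoder copes with multi-byte characters split across reads.
    decoder = codecs.getincrementaldecoder("utf-8")()
    while True:
        chunk = stream.read1(chunk_size)  # returns whatever is available, up to chunk_size
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


def main() -> None:
    """Call this with a server running at URL to print the response as it streams."""
    request = build_request(
        "### Instruction:\n What is the definition of time (200 words essay)?\n\n### Response:",
        max_new_tokens=4096,
        temperature=0.89,
    )
    with urllib.request.urlopen(request) as response:
        for text in iter_text(response):
            print(text, end="", flush=True)
```

Reading with `read1` plays the role of curl's `-N` flag here: output is printed as soon as data is available instead of being buffered until the response completes.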

What's Changed

New Contributors

Full Changelog: v1.1.3...v1.1.4