Feature/ait 51 token streaming granular history #3014
Conversation
```javascript
// ✅ Do this - publish without await for maximum throughput
for await (const event of stream) {
  if (event.type === 'token') {
    channel.publish('token', event.text);
  }
}
```
Do we have any guidance on how users are meant to handle the result of the publish in this scenario? In some failure modes (e.g. a bunch of messages end up queued client-side and then fail because the connection becomes SUSPENDED, but the user just ploughs on publishing subsequent messages) they might end up with gaps in the published token stream.
(Or, perhaps an even more realistic scenario: some publishes are rejected due to rate limits but we plough ahead with subsequent publishes, some of which might succeed once the rate limiting subsides)
We are considering a page about discontinuity handling generally, and I think we can consider how to tackle this problem as part of that, but it needs some more thinking. I'll make a note. If you have any ideas on how to handle it, I'm all ears :)
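In the meantime, purely as an illustrative sketch (not part of the PR's docs), one way to surface rejected publishes rather than silently dropping tokens might be the following; the failedTokens array and the recovery step are assumptions:

```javascript
// Sketch only: keep fire-and-forget publishing for throughput, but attach an
// error handler so rejected publishes (e.g. rate limits, or messages failed
// while the connection is SUSPENDED) are recorded rather than lost silently.
const failedTokens = [];

for await (const event of stream) {
  if (event.type === 'token') {
    channel.publish('token', event.text).catch((err) => {
      failedTokens.push({ text: event.text, err });
    });
  }
}

if (failedTokens.length > 0) {
  // Application-specific recovery, e.g. republish the gap or flag the
  // discontinuity to subscribers.
  console.warn(`${failedTokens.length} token publishes failed`, failedTokens[0].err);
}
```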
```javascript
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

const responses = new Map();
```
A "Track responses by ID" comment, as above, would be useful here I think.
```javascript
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

// Track responses by ID
const responses = new Map();
```
I'm not sure that it makes sense to suggest storing the partial responses in the case where we don't have explicit start and stop events given that the storage will potentially grow unboundedly. I'd suggest perhaps only showing the Map solution in the explicit start / stop events case and perhaps here just log the response ID alongside the message. Or have I missed something?
I included it because I wanted to illustrate that responses could be multiplexed on the channel (see "even when delivered concurrently" above, although we will likely have a specific page for this concept in more detail). I think in this case it's okay - the example is intended to be illustrative (and I wanted it to show how the client would append tokens for the same response together). In a real app, you would likely have more complex solutions if the data could genuinely grow large enough to cause memory issues (e.g. local storage and loading only the data into memory that is currently visible at your scroll position, and so on).
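As a hedged sketch of that multiplexing point (not the PR's actual example, though it reuses the responseId header from the quoted code), appending tokens per response might look like:

```javascript
// Sketch only: accumulate tokens per response so that concurrent responses
// multiplexed on the same channel stay separated. Assumes the publisher sets
// a responseId header on each token message.
await channel.subscribe('token', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const current = responses.get(responseId) ?? '';
  responses.set(responseId, current + message.data);
});
```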
```javascript
// Handle response stop
await channel.subscribe('stop', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const finalText = responses.get(responseId);
  // ... render or otherwise use finalText
});
```
Perhaps (assuming that the idea of the responses map is just to accumulate response content during generation) remove from responses?
Could do, although per the comment above, the example is intended to be illustrative, and if you want to render the messages they need to be stored somewhere (and I think it's out of scope for this page to discuss strategies for managing and displaying unbounded data in web apps generally).
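For reference, a minimal sketch of the suggested cleanup (the renderCompletedResponse helper is hypothetical):

```javascript
// Sketch only: drop the accumulated text once the response completes, so the
// Map only holds in-flight responses. Whether to delete or keep the entry
// depends on how the app renders completed responses.
await channel.subscribe('stop', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const finalText = responses.get(responseId);
  renderCompletedResponse(responseId, finalText); // hypothetical rendering helper
  responses.delete(responseId);
});
```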
GregHolmes left a comment:
I've only made a couple of minor suggestions about starting the sentences earlier on. Other than that, I think you've got this spot on.
Add intro describing the pattern, its properties, and use cases.
Includes continuous token streams, correlating tokens for distinct responses, and explicit start/end events.
Splits each token streaming approach into distinct patterns and shows both the publish and subscribe side behaviour alongside one another.
Includes hydration with rewind and hydration with persisted history + untilAttach. Describes the pattern for handling in-progress live responses with complete responses loaded from the database.
Merged e5ea672 into AIT-129-AIT-Docs-release-branch.
Is it in this doc or another doc that we would discuss streaming with ephemeral messages?
Description
Adds a "Token Streaming" section to the AIT docs with a page for token streaming with a message per token.
Covers:
- Continuous token streams
- Correlating tokens for distinct responses
- Explicit start/end events
- Hydration with rewind
- Hydration with persisted history + untilAttach

Note that the 100 message rewind limit will change soon, and these docs will be updated to reflect that.
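As a hedged illustration of the untilAttach hydration approach listed above (assuming ably-js and a 'token' message name; not the PR's actual example):

```javascript
// Sketch only: attach and subscribe first, then page back through persisted
// history up to the attach point so history and live messages join up without gaps.
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

await channel.subscribe('token', (message) => {
  // Live tokens arriving after attach.
});

let page = await channel.history({ untilAttach: true });
while (page) {
  for (const message of page.items) {
    // Historical tokens, newest first.
  }
  page = page.hasNext() ? await page.next() : null;
}
```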
Checklist