chat: bring own database guide #2949
c4e4962 to fc6ba27
AndyTWF left a comment
We need to update page data to make this visible on the nav.
| This article covers the following options: | ||
| 1. Using [outbound webhooks](/docs/platform/integrations/webhooks). This can be an [HTTP endpoint](/docs/platform/integrations/webhooks/generic), [AWS Lambda](/docs/platform/integrations/webhooks/lambda), [Azure Function](/docs/platform/integrations/webhooks/azure), [Google Cloud Function](/docs/platform/integrations/webhooks/gcp-function) and others. Messages arrive in your own system as they are published to a room. |
Rather than list 3 types of serverless function, perhaps call out one of them and maybe mention something like Kafka instead?
Kafka is part of "outbound streaming". I've added examples to that list too. If you think it reads or looks better without the services lists, I can remove both of them.
I think just one example per section, e.g. Lambda and Kafka, is probably enough.
| Use `channel.message` as the event type. | ||
| You need to consider: | ||
| - Redundancy. In case of failure, Ably will retry delivering the message to your webhook, but only for a short period of time. |
We'd need to check this, I'm not sure we do?
We do, I checked before writing this.
But it's just 2-3 retries over about 1 minute at most. The docs say 5 minutes for batched webhooks, but you mentioned we kind of want to stay away from the batched format in webhooks, so that doesn't matter. I had a paragraph suggesting it initially.
If Ably is not able to eventually send the message to the webhook, how can the developer know that? Is there a way to retrieve webhook failed messages?
It's logged on the log metachannel [meta]log
I think the [meta]log channel is the place to find these errors. Mentioned it in another section, I could add a link here too.
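Because Ably retries failed webhook deliveries, the same message can arrive more than once, so a webhook handler should be idempotent. Below is a minimal sketch of such a handler. It assumes Ably's enveloped webhook format, where each entry in `payload.items` is named `channel.message` and carries the channel name in `data.channelId` and the Pub/Sub messages in `data.messages`; check the webhooks docs for the exact payload shape. `handleWebhook` and the `save` callback are hypothetical names, and the in-memory `seen` set stands in for a unique key in your database.

```javascript
// Remembers which (serial, version.serial) pairs have been processed, so a
// retried delivery is skipped. In-memory for illustration only; a real system
// would rely on a unique key in its database instead.
const seen = new Set();

// Assumed payload shape (enveloped format):
// { items: [{ name: "channel.message", data: { channelId, messages: [...] } }] }
function handleWebhook(payload, save) {
  let saved = 0;
  for (const item of payload.items ?? []) {
    if (item.name !== "channel.message") continue; // ignore other event types
    for (const message of item.data?.messages ?? []) {
      const key = `${message.serial}|${message.version?.serial ?? ""}`;
      if (seen.has(key)) continue; // already handled, e.g. a retried delivery
      seen.add(key);
      save(item.data.channelId, message);
      saved += 1;
    }
  }
  return saved;
}
```

Failures that survive the retries still need to be picked up from the `[meta]log` channel, as noted above.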
| 3. If you want to store reaction summaries, always update the reactions field when receiving a reaction summary update (action `4` or `message.summary`). | ||
| <Code> | ||
| ```typescript |
Suggested change, from:
| ```typescript |
to:
| ```javascript |
Done, and removed the `: Message` type annotation from the example.
We could keep the : Message, its just for the language selector :)
| ## Decoding and storing messages | ||
| Regardless of the delivery mechanism, you will need to decode the received messages into Chat messages. Details of the mapping from Ably Pub/Sub messages to Chat messages are available in the [Integrations](/docs/chat/integrations) documentation. |
This part is confusing for me. If I'm an Ably Chat user, using the Ably Chat SDK... why would I have to decode anything? I should be already managing Chat messages, right?
This is where we are right now. Integrations (webhooks, outbound streaming, etc.) are Pub/Sub features, so you need to do the encoding and decoding yourself.
In the future it would be nice if those were chat-specific to make it simpler but they're not right now.
The future agreed direction is that the actual payloads will be the same across products, but that the SDKs (e.g. Chat) will have methods that convert these to their local representations.
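Until the SDKs can convert integration payloads directly, the decoding step might look like the sketch below. It assumes that a Chat message travels as a Pub/Sub message with its text and metadata in `data`, its headers in `extras.headers`, and its version in `version.serial`; the authoritative mapping is in the [Integrations](/docs/chat/integrations) documentation. The guide only confirms that action `4` means `message.summary`, so the other numeric codes in `ACTIONS` are an assumption to verify. `decodeChatMessage` is a hypothetical helper, not an Ably SDK function.

```javascript
// Assumed numeric action codes. Only 4 -> "message.summary" is confirmed by the
// guide; verify the others against the Integrations documentation.
const ACTIONS = ["message.create", "message.update", "message.delete", "meta", "message.summary"];

// Hypothetical decoder from an integration-delivered Pub/Sub message to a plain
// object with Chat message fields.
function decodeChatMessage(roomName, msg) {
  const action =
    typeof msg.action === "number" ? (ACTIONS[msg.action] ?? "unknown") : msg.action;
  return {
    roomName,
    serial: msg.serial,
    action,
    clientId: msg.clientId,
    text: msg.data?.text ?? "",
    metadata: msg.data?.metadata ?? {},
    headers: msg.extras?.headers ?? {},
    // Fall back to the message serial when no version is present.
    versionSerial: msg.version?.serial ?? msg.serial,
    timestamp: msg.timestamp,
  };
}
```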
| 1. Save it to your own database. You can index by `serial`; this is the globally unique identifier for a message, and is also used to sort messages in the canonical global order. | ||
| 2. If a message with the same `serial` already exists, you have received an update, delete, or reaction summary update. To decide whether to update the stored message, compare the `version.serial` of the version you have received with that of the version in your database: a lexicographically higher `version.serial` means a newer version. | ||
| 3. If you want to store reaction summaries, always update the reactions field when receiving a reaction summary update (action `4` or `message.summary`). |
What is a "reaction summary update"? Is it a special message type, with different metadata? Is there a link to related documentation?
It's a message with action message.summary. Over the wire it's a full regular message, but at the time of feature implementation in Chat this was a partial message that only had the summary (and identifying basics like timestamp and serial).
The chat SDK still exposes these via room.messages.reactions.subscribe() as a reactions summary event.
In Pub/Sub you get them via channel.subscribe().
Over integrations they look like messages with action message.summary.
Not sure what doc to point you to, probably https://ably.com/docs/chat/rooms/message-reactions, and maybe also annotations: https://ably.com/docs/messages/annotations#subscribe-to-annotation-summaries-.
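The three storage steps quoted above can be sketched as a single function over an in-memory `Map` keyed by `serial`. It assumes messages have already been decoded into objects with `serial`, `action`, `versionSerial` and, for summary updates, `reactions`; `applyMessage` is a hypothetical name. A production version would perform the same comparison atomically in the database rather than as a read followed by a write.

```javascript
// Applies one decoded message to `store` (a Map keyed by serial) and reports
// what happened. The field names are assumptions about your decoded shape.
function applyMessage(store, msg) {
  const existing = store.get(msg.serial);

  // Step 1: a serial we have not seen before is simply inserted.
  if (!existing) {
    store.set(msg.serial, { ...msg });
    return "inserted";
  }

  // Step 3: a reaction summary update only refreshes the reactions field.
  if (msg.action === "message.summary") {
    existing.reactions = msg.reactions;
    return "summary";
  }

  // Step 2: a lexicographically higher version.serial is a newer version.
  if (msg.versionSerial > existing.versionSerial) {
    store.set(msg.serial, { ...msg, reactions: existing.reactions });
    return "updated";
  }
  return "ignored"; // same or older version, e.g. a duplicate delivery
}
```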
| Benefits of using an Ably queue: | ||
| - You can consume it from your servers, meaning overall this is fault-tolerant. Ably takes care of the complexity of maintaining a queue. | ||
| - You can use multiple queues and configure which channels go to which queue (use `.*::\$chat$` regex to match all chat rooms). |
If we don't talk about directives, replace this with "use your own regular expression to match all your chat rooms" or something similar.
I could just remove the parenthesis if we really want to avoid this. Or go with what @AndyTWF suggested: suggest prefixing the room names if they want to.
The way we're trying to nudge users is to group chat rooms etc under a common namespace, e.g. chat:*
I've removed all mentions of `::$chat` and now suggest using a prefix if they need to filter.
| Benefits: | ||
| - Full control over publishing. | ||
| - Opportunity to add extra validation before publishing to Ably. | ||
| - You can publish messages directly via the Chat REST API, and avoid having to encode/decode Chat Messages to and from Ably Pub/Sub messages. You can bypass using an SDK entirely or you can use the Chat SDK for publishing. |
I'd say just use the Chat SDK; all the encode/decode part sounds confusing to me (as an external developer).
Not having to do it is a benefit when you publish via your own servers or fetch via history.
But we don't have this benefit in the other methods of integrating right now.
I'd leave this paragraph in, perhaps mention this in the next section as well. WDYT?
The main problem I see is that, as a developer who wants to use Ably Chat, it is difficult to understand why I should care about Pub/Sub. I want to use chat; how it is implemented in Ably should be transparent to me. Having to encode or decode from Pub/Sub sounds complex and unnecessary. I just want to manage my messages.
We have a progressive disclosure of complexity that we aim for - there will be some use-cases where we have to introduce Pub/Sub to the mix.
In this case, once we do the SDK changes to take integration payloads as they are and turn them directly into chat messages... people won't need to worry about Pub/Sub.
| You need to consider: | ||
| - You need to handle updates and deletes on your own. | ||
| - Storing message reactions can be difficult since you will not have access to the aggregate (summaries) Ably provides. |
Do we offer an API for this in the SDK or REST APIs?
You can:
- Fetch history (includes summaries)
- Fetch single message by serial (includes summaries)
- Fetch "own summary" but not really useful here
But in this context they're all a big annoyance since you need to actively fetch instead and you need to decide when to do it.
| - For each message, only the latest version of the message is returned. | ||
| - You will need to decide when and which rooms to import messages from. | ||
| - You can import the same room multiple times (deduplicate by `serial` and `version.serial`), but you will need to always fetch from the first message to make sure you don't miss any updates or deletes of older messages. | ||
Also, every message retrieved via the history API is a billable history, so implementing this is not free.
Should I add a note about this? I think perhaps this is better suited for the history page or pricing docs?
Every message sent via an integration is too - so history isn't alone in this regard
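Importing from history can be repeated safely if every fetched message is merged by `serial` and `version.serial`. A sketch under these assumptions: `fetchPage(cursor)` is a hypothetical wrapper around your history call that resolves to `{ items, next }`, with `next` falsy on the last page, and items are already decoded into objects with `serial` and `versionSerial`. The loop walks every page on each run, as the guide recommends, so updates to older messages are not missed; the order of items within a page does not affect the result.

```javascript
// Walks every page of a room's history and merges it into `store`
// (a Map keyed by serial). Returns how many entries were inserted or updated.
async function importRoom(fetchPage, store) {
  let cursor = null;
  let changed = 0;
  do {
    const page = await fetchPage(cursor);
    for (const msg of page.items) {
      const existing = store.get(msg.serial);
      // History returns only the latest version, so keep it unless we already
      // hold the same or a newer version.
      if (!existing || msg.versionSerial > existing.versionSerial) {
        store.set(msg.serial, msg);
        changed += 1;
      }
    }
    cursor = page.next;
  } while (cursor);
  return changed;
}
```

Running it a second time over unchanged history writes nothing, though, as noted above, each fetched message is still billable.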
215c729 to 7fb2c92
I've pushed the last round of changes. Added a link in the navigation. Changed the title and file to "Export chat messages", which sounds clearer to me than "bring your database".
(I'll squash the commits before merge)
| meta_keywords: "chat, data, export, stream, storage, Ably, chat SDK, realtime messaging, dependability, cost optimisation" | ||
| --- | ||
| Ably Chat is designed to be a simple and easy to use realtime chat solution that handles any scale from 1:1 and small group chats to large livestream chats with millions of users. |
Suggested change, from:
| Ably Chat is designed to be a simple and easy to use realtime chat solution that handles any scale from 1:1 and small group chats to large livestream chats with millions of users. |
to:
| Ably Chat is a simple and easy to use realtime chat solution that handles any scale from 1:1 and small group chats to large livestream chats with millions of users. |
applied, thanks!
| Ably holds data for the purpose of providing realtime experiences. While Ably Chat provides flexible data retention for messages (30 days by default, up to a year on request), applications often need longer-term storage or additional control over their data. |
Suggested change, from:
| Ably holds data for the purpose of providing realtime experiences. While Ably Chat provides flexible data retention for messages (30 days by default, up to a year on request), applications often need longer-term storage or additional control over their data. |
to:
| Ably holds data for the purpose of providing realtime experiences. While Ably Chat provides flexible data retention for messages (30 days by default, up to a year on request), some applications may need longer-term storage or additional control over their data. |
More of a nitpick, so feel free to ignore :)
applied, thanks!
| ## Different ways to export data from Ably Chat | ||
| We will explain each in detail, and provide code examples for each. This is an overview of the different ways to export data from Ably Chat. |
I think this line isn't needed. Perhaps just keep the line "This article covers the following options"?
removed
| 1. Save it to your own database. You can index by `serial`; this is the globally unique identifier for a message, and is also used to sort messages in the canonical global order. | ||
| 2. If a message with the same `serial` already exists, you have received an update, delete, or reaction summary update. To decide whether to update the stored message, compare the `version.serial` of the version you have received with that of the version in your database: a lexicographically higher `version.serial` means a newer version. | ||
| 3. If you want to store reaction summaries, always update the reactions field when receiving a reaction summary update (action `4` or `message.summary`). |
Perhaps we should add some links here to direct users to sections on message serials/versions?
Not quite sure what to link to, and where to put it exactly. Maybe https://ably.com/docs/chat/rooms/messages#ordering-update-delete ?
added a link in a "read more about ..." sentence. This section was changed a lot since this comment though.
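Because a reaction summary is tied to the message `serial` rather than to a message version, it can be stored apart from the message versions. The sketch below keeps both the current summary per message and an append-only log of summaries over time, and records summaries only from `message.summary` events. It assumes decoded objects with `action`, `serial` and a `summary` field (for example taken from the Pub/Sub message's `annotations.summary`); `recordSummary` and the in-memory structures are hypothetical.

```javascript
// Current state: one summary per (roomName, serial), overwritten each time.
const latestSummaries = new Map();
// Snapshots: every summary received, with the time it arrived.
const summaryLog = [];

function recordSummary(roomName, msg, receivedAt = Date.now()) {
  // Only message.summary events update summaries; summaries carried on other
  // actions are ignored.
  if (msg.action !== "message.summary") return false;
  const key = JSON.stringify([roomName, msg.serial]); // composite unique key
  latestSummaries.set(key, msg.summary);
  summaryLog.push({ roomName, serial: msg.serial, summary: msg.summary, receivedAt });
  return true;
}
```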
| Ably holds data for the purpose of providing realtime experiences. While Ably Chat provides flexible data retention for messages (30 days by default, up to a year on request), some applications may need longer-term storage or additional control over their data. | ||
| This guide presents different ways to use Ably Chat and store chat data in your own systems, which can help you meet your data retention requirements, as well as help you build more complex use cases such as search, analytics, and more. | ||
I think there's a section (or sections) that needs to go here that presents the big picture and key decision points associated with this guide.
- What might be my motivation for doing this? (e.g. is my database the ultimate source of truth, do I just want to have an audit of what goes through Ably)
- What are the tradeoffs of each of these positions - reliability, audit, scale etc
- If I choose to have my own database, do I want it to play a role in chat hydration?
- If my DB is going to be the source of truth, how do I handle "recent history" that's not landed in Ably yet.
- What schema do I need in my database?
| ```javascript | ||
| const saveOrUpdateMessage = async (message) => { | ||
| // Check if the message already exists in your own database by `serial` | ||
| const existingMessage = await getMessageBySerial(message.serial); |
Reflecting on this example - for many database engines this would be racy and may throw errors (e.g. if a message and an update happen in very quick succession). Whether or not to update would usually happen atomically at the database layer e.g. SQL's INSERT ... ON DUPLICATE KEY UPDATE
I can see how people might be tempted to just add implementations for getMessageBySerial, saveMessage and updateMessage instead of actually writing a saveOrUpdateMessage for their own system.
That wasn't the purpose of the example when I wrote it. I meant this example to show the logic: how should you decide insert, update or discard.
I'm not sure how to fix it:
- explain in the text that this is an example to illustrate how to compare versions/etc, not meant to be a code template
- change it to use SQL and some assumed schema (I can mention said schema early in the article)? That might not be great for those who don't use SQL but I guess it'll be readable enough to be understood by anyone.
- something else?
After thinking for a while, I think the best thing to do is to recommend:
- storing all versions of a message
- if storing reactions summaries is important, use a separate table (or whatever) indexed by message.serial only. version has nothing to do with summaries anyway.
- if they want reactions history the only good way right now is to store raw annotations; challenge is that they need to process them correctly (for example: ignore deletes and inserts that have no effect, correctly apply unique and distinct rules); alternative is to store a log of summaries over time - so store summaries when a message.summary event is published and always ignore summaries on updates
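Following the suggestion to make the insert-or-update decision atomic at the database layer, here is a sketch for PostgreSQL. It builds parameterized statements in the `{ text, values }` form accepted by clients such as node-postgres. The schema is hypothetical: a `messages` table holding the latest version of each message and a `message_versions` table holding every version, with `version_serial` declared `COLLATE "C"` so that SQL string comparison is plain byte-wise lexicographic order. MySQL users would reach for `INSERT ... ON DUPLICATE KEY UPDATE` instead.

```javascript
// Hypothetical schema (PostgreSQL):
//   CREATE TABLE messages (
//     serial TEXT PRIMARY KEY,
//     version_serial TEXT COLLATE "C" NOT NULL,
//     room_name TEXT NOT NULL,
//     body JSONB NOT NULL
//   );
//   CREATE TABLE message_versions (
//     serial TEXT NOT NULL,
//     version_serial TEXT COLLATE "C" NOT NULL,
//     body JSONB NOT NULL,
//     PRIMARY KEY (serial, version_serial)
//   );

// Latest version: insert, or overwrite only when the incoming version is newer.
// The WHERE clause makes the comparison part of the same atomic statement.
function upsertLatest(msg) {
  return {
    text: `INSERT INTO messages (serial, version_serial, room_name, body)
VALUES ($1, $2, $3, $4)
ON CONFLICT (serial) DO UPDATE
SET version_serial = EXCLUDED.version_serial, body = EXCLUDED.body
WHERE messages.version_serial < EXCLUDED.version_serial`,
    values: [msg.serial, msg.versionSerial, msg.roomName, JSON.stringify(msg)],
  };
}

// Full history: one row per version; a repeated delivery becomes a no-op.
function insertVersion(msg) {
  return {
    text: `INSERT INTO message_versions (serial, version_serial, body)
VALUES ($1, $2, $3)
ON CONFLICT (serial, version_serial) DO NOTHING`,
    values: [msg.serial, msg.versionSerial, JSON.stringify(msg)],
  };
}
```

If the chat-state table and the version table must never disagree, both statements can be run in one transaction.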
| All webhook integrations allow you to use a regex filter on the channel name to control which channels the webhook should be triggered for. Use a common prefix in the name of chat rooms that you want to trigger a webhook for, and use the prefix as the filter. | ||
| Use `channel.message` as the event type. |
Should we be discussing presence and channel lifecycle in this guide?
I think that's a bit out of scope.
Maybe mention that those exist? I can imagine people might want to keep a history of when users come online so saving presence updates can be useful for that use case but I'd focus on messages for this guide? I'd say there can be another one on saving presence history if we think it's important?
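The webhook channel filter discussed above is a regular expression matched against channel names. Assuming chat rooms share a common `chat:` prefix (the example namespace suggested in this thread), a filter such as `^chat:` selects them. The snippet below only checks which names that pattern matches; it does not configure anything in Ably, and `isExportedRoom` is a hypothetical helper.

```javascript
// Hypothetical naming convention: every chat room name starts with "chat:".
const CHAT_ROOM_FILTER = /^chat:/;

function isExportedRoom(channelName) {
  return CHAT_ROOM_FILTER.test(channelName);
}

for (const name of ["chat:support-42", "chat:live-event", "notifications:user-1"]) {
  console.log(name, isExportedRoom(name));
}
```

Anchoring the pattern with `^` matters: without it, a name such as `mychat:room` would match too.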
| Pros: | ||
| - Use your existing queue system to process and save messages from Ably. | ||
| - You have control over saving messages to your own database. |
Isn't this also a pro of webhooks?
Maybe poor wording.
I was thinking that if you stream to your own queuing system, you can then configure your own retries, retention period, how long the queues can get... and consume at your own pace, hence more control of the actual queue->database ingestion.
With webhooks you need to do something when the hooks get called, or else you will likely miss the message (bar a few retries). Sure, you can queue it from a webhook, but if you can queue directly, why not do that?
I've changed it to actually say what I meant it to say
b9acefa to 755cd89
aeae63d to 9d9ab8a
| - **Compliance and legal requirements**: Meet data retention policies, maintain audit trails for support conversations, or fulfill regulatory requirements. | ||
| - **Analytics and business intelligence**: Build dashboards, train ML models, analyze customer sentiment, or track support quality metrics. | ||
| - **Enhanced functionality**: Implement features that need the chat history, such as search. | ||
| - **Data sovereignty**: Maintain your own database as the canonical source of truth. |
Perhaps just single source of truth here rather than data sovereignty - the latter has a wider meaning, that the data generated within a country abides by that nations laws/frameworks. So more closely aligned with compliance/legal than whose database is the authority on what's "right".
changed
| Consider the following when exporting chat data: | ||
| - **Database schema**: Design your schema to allow you to easily build the features you need, keeping in mind scale and reliability. | ||
| - **Version history requirements**: Decide whether you need to store all versions of messages or just the latest version (see [Decision 1](#decision-1) below). If you need all versions, you'll need to design your schema accordingly. If you want to retrieve messages as shown in a chat window (latest version of each message), consider duplicating the latest message data in a separate table to avoid filtering at query time. |
The last sentence here I think needs to be clearer that this only applies if you've got "all versions".
Should we recommend a pattern where you have one table for the "latest" of everything, then store the entire version history separately (i.e. place the "main table' emphasis on the chat state and not the audit)
Maybe this is a detail they can work out. In some cases it's better to just iterate over. In the case of search they might use an external service for the index anyway and if that's setup to uniquely index by serial they'll just always have the latest version without extra faff.
I was on the fence initially, but I'll remove mention of "separate table".
I'll keep this bullet point simple as it already points at a bigger section on the issue.
| - **Indexing for search**: consider indexing only the latest version of each message, or have a way to indicate which search results represent current versions. | ||
| - **Concurrent writes**: New messages, updates, and deletes will arrive concurrently, so your database system must handle this. Depending on your database, consider reducing roundtrips, managing locks, and handling race conditions. | ||
| - **Scale and reliability trade-offs**: Depending on the scale of your application, you need to consider how you will scale up and down the parts that handle the ingestion of messages from Ably. | ||
| - **Data latency and consistency**: If you publish directly to Ably (recommended), there will be a small delay between a message being published and it arriving in your database via integrations. Think of how to mitigate if needed for situations where the most up-to-date data is required. |
Are there any recommendations we can provide here rather than just "think of how"? We need to present from a position of expertise.
I almost removed this sentence. It's important to call out the short delay but I don't think it'll be an issue or unexpected in any way.
I'm not so sure what to recommend. I imagine it's simply not an issue for most cases. One can fetch message by serial if needed. Example: they have a feature which allows users to attach an image to a message post-publish. It works via a POST multi-part which has the msg serial. On their server-side they need to check existence of the message. If ingestion is slow (unlikely to be slow to the point of users doing an action, but for the example's sake), local existence check fails, and as an extra step after local check fails, they could do a GET by serial to ably? This can be filtered by if serial>(some timestamp x minutes ago) to save requests for bad serials.
Any thoughts?
The alternative (publish via their internal infrastructure) is the solution here I think? It's the trade off between being the source of truth (and so in the write path) or reducing latency and accepting you won't always have the latest data.
@splindsay-92 That is the only way to always have the latest data. It comes with more downsides: failure modes, how to do updates and deletes, and so on.
I'm not sure if to call this out more, as the sentence starts with "if you publish directly to Ably". I think it's clear enough that if you don't publish directly to Ably you don't have this problem (but inherit others, and there's a section for that).
Maybe this?
Suggested change, from:
| - **Data latency and consistency**: If you publish directly to Ably (recommended), there will be a small delay between a message being published and it arriving in your database via integrations. Think of how to mitigate if needed for situations where the most up-to-date data is required. |
to:
| - **Data latency and consistency**: If you publish directly to Ably (recommended), there will be a small delay between a message being published and it arriving in your database via integrations. If you need your database to be the source of truth consider [publishing via your own servers](#publish-via-own). |
So I think this point is important, and I'd expand on it a bit more.
We can be clear that we recommend you publish via Ably if your application is highly latency sensitive, but doing so will incur an eventual consistency cost (your servers may be out of date relative to clients for a short time, until they consume the new message over the integration).
If you must have strong consistency, then you should publish via your own backend, incurring a small latency cost but guaranteeing your server always maintains the latest state.
Our recommendation isn't fixed, which is what the above implies; in fact, if you do need strong consistency, we would recommend the opposite, I think? :)
@splindsay-92 I've rephrased; what do you think of it now? I just imply that by default publishing is via Ably (which is likely the case) and offer publishing via your own servers as a solution to data-in-your-db latency, if needed.
Removed the (recommended), it was a bit out of place.
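A minimal sketch of the fallback described in the first comment of this thread (local check first, then a lookup by serial only for recent serials). The store interface, the remote lookup and all helper names are hypothetical, and it assumes, unverified here, that a serial begins with a zero-padded millisecond timestamp followed by `-`, which is what makes the "only look up recent serials" filter possible.

```typescript
// Hypothetical names throughout: MessageStore, RemoteLookup and messageExists are not Ably SDK APIs.

interface MessageStore {
  // Returns true if the message is already in your own database.
  has(roomName: string, serial: string): Promise<boolean>;
}

// Stand-in for a "get message by serial" request to Ably; its real shape is an assumption.
type RemoteLookup = (roomName: string, serial: string) => Promise<boolean>;

// ASSUMPTION: a serial starts with a zero-padded millisecond timestamp followed by "-".
function serialTimestampMs(serial: string): number | undefined {
  const prefix = serial.split("-")[0];
  return /^\d+$/.test(prefix) ? Number(prefix) : undefined;
}

// Only recent serials can plausibly still be in flight through the integration.
function worthRemoteLookup(serial: string, maxLagMs: number, nowMs: number): boolean {
  const ts = serialTimestampMs(serial);
  return ts !== undefined && nowMs - ts <= maxLagMs;
}

async function messageExists(
  store: MessageStore,
  remote: RemoteLookup,
  roomName: string,
  serial: string,
  maxLagMs: number,
  nowMs: number = Date.now(),
): Promise<boolean> {
  // 1. Cheap local check first.
  if (await store.has(roomName, serial)) return true;
  // 2. Skip the remote request for malformed or old serials.
  if (!worthRemoteLookup(serial, maxLagMs, nowMs)) return false;
  // 3. The message may not have been ingested yet; ask the remote source.
  return remote(roomName, serial);
}
```

Bounding the lookup by age keeps bad or stale serials from turning into extra requests, while still covering the short window in which ingestion may lag behind publishing.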
| There are two decisions to make when saving messages. | ||
| #### Decision 1: full version history or just the latest version? <a id="decision-1"/> |
This should be an H3-level section heading.
Also, a capital letter on Full?
done
| 1. Do you need to store only the current state of the reactions, historic snapshots of the current state, or the full history of all individual reactions? | ||
| - If you only need the current state (latest summary), simply save the values provided in the latest message with action `message.summary`. Uniquely index by `roomName` and `serial`. | ||
| - If you need to store historic snapshots, but don't require every single change, you can simply store all `message.summary` events for every message. | ||
| - If you need a full history of all reactions, you need to store individual reaction events (called [_"raw reactions"_](https://ably.com/docs/chat/rooms/message-reactions#raw-reactions) in Chat). To re-create the current state you need to apply the same logic Ably applies to each type of reaction. Some published reactions will have no effect on the summary, such as double publishing a reaction of type unique, or removing a reaction that does not exist. Due to the number of individual reaction events, keep in mind the cost of storage, compute, and your Ably per-message cost |
Do we document this logic anywhere, like how we actually construct the summary? Otherwise this is a bit hand-wavy.
They're described here: https://ably.com/docs/messages/annotations#annotation-types
It's not a step-by-step guide on how to replicate it, but it clearly defines the summarisation method.
added link
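A minimal sketch of replaying raw reactions for one message to rebuild its summary, for a single reaction type. It assumes "unique"-style semantics as we understand them (one reaction per client per message, a new reaction replacing the previous one, a remove taking effect only if it matches the client's current reaction) and that events are applied in publish order; check these assumptions against the annotation types documentation linked above before relying on it.

```typescript
// ASSUMED "unique" semantics (verify against Ably's annotation types docs):
// - each client holds at most one reaction per message;
// - adding a different reaction replaces the client's previous one;
// - a remove only takes effect if it matches the client's current reaction.
// Events must be applied in the order they were published.

type RawReaction = { type: "add" | "remove"; clientId: string; name: string };
type Summary = Record<string, { total: number; clientIds: string[] }>;

class UniqueReactionState {
  // clientId -> name of the reaction that client currently holds
  private current = new Map<string, string>();

  // Returns false when the event has no effect on the summary.
  apply(event: RawReaction): boolean {
    const existing = this.current.get(event.clientId);
    if (event.type === "add") {
      if (existing === event.name) return false; // double publish: no change
      this.current.set(event.clientId, event.name);
      return true;
    }
    if (existing !== event.name) return false; // removing a reaction that does not exist
    this.current.delete(event.clientId);
    return true;
  }

  // Aggregates per-client state into per-reaction totals and client lists.
  summary(): Summary {
    const out: Summary = {};
    this.current.forEach((name, clientId) => {
      if (!out[name]) out[name] = { total: 0, clientIds: [] };
      out[name].total += 1;
      out[name].clientIds.push(clientId);
    });
    return out;
  }
}
```

The boolean returned by `apply` makes the "no effect" cases from the guide explicit, which is useful if you only want to persist events that actually changed the summary.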
| You need to consider: | ||
| - **Redundancy**: In case of failure, Ably will retry delivering the message to your webhook, but only for a short period. You can see errors in the [`[meta]log` channel](/docs/platform/errors#meta). | ||
| - **Ordering**: Messages can arrive out-of-order. You can sort them using their `serial` and `version.serial` properties. However, if you're storing reaction summaries, out-of-order delivery can cause an older summary to overwrite a newer one, causing temporary inconsistencies. |
In theory this shouldn't happen, as all summaries are published from one region. Perhaps worth omitting?
Yeah. Done.
Well, kind of. I actually did think of that initially; it's the retries that are the trouble. Leave it in?
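A minimal sketch of handling retried or out-of-order deliveries by keeping only the newest version of each message. It assumes each payload exposes `roomName`, `serial` and `version.serial`, and that version serials compare lexicographically; the in-memory `Map` stands in for a table uniquely indexed on `(roomName, serial)`, and `LatestVersionStore` is a hypothetical name.

```typescript
// ASSUMED payload shape; only the fields needed for ordering are modelled.
interface ChatMessageRecord {
  roomName: string;
  serial: string;
  version: { serial: string };
  text: string;
}

class LatestVersionStore {
  // Stand-in for a table with a unique index on (roomName, serial).
  private rows = new Map<string, ChatMessageRecord>();

  private key(roomName: string, serial: string): string {
    return JSON.stringify([roomName, serial]);
  }

  // Returns true if the incoming version was written.
  upsert(incoming: ChatMessageRecord): boolean {
    const k = this.key(incoming.roomName, incoming.serial);
    const stored = this.rows.get(k);
    // ASSUMPTION: version serials sort lexicographically, newer after older.
    // Duplicates and stale versions arriving after a newer one are dropped.
    if (stored && stored.version.serial >= incoming.version.serial) return false;
    this.rows.set(k, incoming);
    return true;
  }

  get(roomName: string, serial: string): ChatMessageRecord | undefined {
    return this.rows.get(this.key(roomName, serial));
  }
}
```

In a SQL database the same guard can live in the write itself, for example PostgreSQL's `INSERT ... ON CONFLICT (room_name, serial) DO UPDATE ... WHERE excluded.version_serial > messages.version_serial` (hypothetical table and column names), so a retried older delivery never overwrites a newer row.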
| - **Consistency**: Missing webhook calls will lead to inconsistencies between your database and Ably, which can be difficult to resolve. Detect if this happens using the `[meta]log` channel and use the [history endpoint](#history-endpoint) to backfill missing data. |
I think we should call out Ably's 4 pillars here to provide some reassurance that this very rarely happens.
Or perhaps it's worth framing it around "webhook endpoint unavailable".
I meant it as the "your webhook is down" situation.
"Webhook calls that fail" makes it clear that we don't mean the webhook won't be called. "Missing" kind of says we won't make the call.
| - If you predict that your reaction summaries will get clipped and you need to store the list of clientIds who reacted, consider storing individual reaction events (called _"raw reactions"_ in Chat). | ||
| - The totals in clipped summaries represent the grand total of reactions, not just those included in the truncated clientId list. | ||
| If you do not need to store message reactions, you can simply discard them. Never store the `reactions` (or `annotations`) field and ignore messages with action `message.summary`. |
| If you do not need to store message reactions, you can simply discard them. Never store the `reactions` (or `annotations`) field and ignore messages with action `message.summary`. | |
| If you do not need to store message reactions, you can simply discard them, i.e., by not storing the `reactions` (or `annotations`) fields and ignoring any messages with action `message.summary`. |
Feel free to ignore/re-word, but the original sentences sounded a bit strange to me :)
thanks!
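A minimal sketch of discarding reaction data before saving, as the suggestion above describes. The payload shape (a string `action` plus optional `reactions` / `annotations` fields) is an assumption; adapt the field names to what your integration actually delivers. `toStorable` is a hypothetical helper name.

```typescript
// ASSUMED payload shape; field names may differ in your integration's payload.
interface IncomingMessage {
  action: string;
  serial: string;
  text?: string;
  reactions?: unknown;
  annotations?: unknown;
}

// Returns the record to store, or undefined if the event should be ignored.
function toStorable(msg: IncomingMessage): IncomingMessage | undefined {
  // Summaries only carry reaction state, so skip them entirely.
  if (msg.action === "message.summary") return undefined;
  // Copy, then drop reaction fields so they never reach the database.
  const record: IncomingMessage = { ...msg };
  delete record.reactions;
  delete record.annotations;
  return record;
}
```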
| - **Indexing for search**: consider indexing only the latest version of each message, or have a way to indicate which search results represent current versions. | ||
| - **Concurrent writes**: New messages, updates, and deletes will arrive concurrently, so your database system must handle this. Depending on your database, consider reducing roundtrips, managing locks, and handling race conditions. | ||
| - **Scale and reliability trade-offs**: Depending on the scale of your application, you need to consider how you will scale up and down the parts that handle the ingestion of messages from Ably. | ||
| - **Data latency and consistency**: If you publish directly to Ably (recommended), there will be a small delay between a message being published and it arriving in your database via integrations. Think of how to mitigate if needed for situations where the most up-to-date data is required. |
splindsay-92
left a comment
More food for thought, but when I was reading the sections on "Using a webhook via integration rules", etc., it got me thinking:
As a customer reading this, would it be better for the titles to be something like "If you need guaranteed ordering", and then talk about Ably Queues and list the downsides, etc.? This way, the customer looks for the title that fits their problem or use case, then reads about how we can best solve it. :)
AndyTWF
left a comment
Almost there - just one last thing :)
| ## Filtering rooms and event types <a id="filtering"/> | ||
| Integration rules allow you to filter which Ably channels are forwarded to your own system using a regular expression on the channel name. This is a simple way to reduce the volume of messages you need to process by only receiving messages from the chat rooms you are interested in. Use a common prefix in the name of chat rooms that you want to trigger an integration rule for, and use the prefix as the filter. |
| Integration rules allow you to filter which Ably channels are forwarded to your own system using a regular expression on the channel name. This is a simple way to reduce the volume of messages you need to process by only receiving messages from the chat rooms you are interested in. Use a common prefix in the name of chat rooms that you want to trigger an integration rule for, and use the prefix as the filter. | |
| Integrations allow you to filter which Ably channels are forwarded to your own system using a regular expression on the channel name. This is a simple way to reduce the volume of messages you need to process by only receiving messages from the chat rooms you are interested in. Use a common prefix in the name of chat rooms that you want to trigger an integration rule for, and use the prefix as the filter. |
(And ditto in other places, as "rules" means namespaces in many cases.)
good spot! thanks!
done
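A minimal sketch of a prefix filter like the one described above, tested locally before pasting the pattern into an integration's channel filter. It assumes a chat room named `roomName` is backed by an Ably channel named `roomName::$chat`, and `support:` is a hypothetical room-name prefix.

```typescript
// ASSUMPTION: a Chat room "roomName" maps to the Ably channel "roomName::$chat".
const roomChannel = (roomName: string): string => `${roomName}::$chat`;

// Hypothetical prefix: only rooms named "support:<id>" should be forwarded.
// "$" is escaped in "\$chat" because an unescaped "$" is a regex end anchor.
const SUPPORT_ROOMS_FILTER = /^support:.*::\$chat$/;

function isForwarded(channelName: string): boolean {
  return SUPPORT_ROOMS_FILTER.test(channelName);
}
```

Anchoring the pattern at both ends keeps it from matching rooms that merely contain the prefix somewhere in the name.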
AndyTWF
left a comment
Happy once other reviewers' comments are resolved.
splindsay-92
left a comment
LGTM, just needs the commits to be squashed, then let's merge it!
Description
Add chat guide to save messages to your own database. https://ably.atlassian.net/browse/CHA-1145
Checklist