Delta sync #1382
Comments
We can cover this requirement by:
It is really great to see this request coming from the community, as it validates that this use case can be useful for people. The general target will be to support diffs in OfflineClient. Going to create a Roadmap issue soon so we can put some timing on when and how things will be delivered. @ntziolis Do you have time to collaborate on requirements/approach for this?
Forgot to mention that we are not targeting only the client side and JS.
This is just awesome. I'd love to assist in this. My background is in building replication engines, so let me know how I can participate. I think true offline capabilities that do NOT force single-cloud vendor lock-in or custom solutions are the last major piece in the GraphQL-all-the-way puzzle. Since I now seem to be talking to the like-minded, I want to run something by you that I have been thinking about for a long time now: our goal should be to handle data the same way on the client side as on the server side. For me, the backend today already starts client side; really everything that retrieves and stores data I see as the backend of my actual app. And it seems strange that we use different tool chains and APIs to handle the data. So the goal should be to have at least a subset of what is available server side on the client side as well, with the same API. Further, if such a layer is configurable in the sense of which operations (e.g. filters like equals, contains, etc.) are available and how they are exposed (down to customising the name and structure), it would be possible to provide at least a subset of the functionality available on the server, which would also allow sending any query to the server to ensure the user operates on the latest set of data. What do you think about something like this? Or how would you go about consuming the cached data in a meaningful manner when trying to execute filtering etc. client side?
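To make the "same filter API on client and server" idea concrete, here is a minimal sketch of a backend-agnostic filter subset. All names (`FilterOp`, `applyFilter`, etc.) are illustrative assumptions, not an existing API; the point is that one filter object could be evaluated locally against cached rows or serialized into a server query.

```typescript
// Hypothetical sketch, not an existing API: a minimal filter subset
// ("equals", "contains") usable on client and server alike.
type FilterOp = { eq?: unknown; contains?: string };
type Filter<T> = Partial<Record<keyof T & string, FilterOp>>;

function matches<T extends Record<string, unknown>>(row: T, filter: Filter<T>): boolean {
  return Object.entries(filter).every(([field, op]) => {
    if (!op) return true;
    const value = row[field];
    if (op.eq !== undefined && value !== op.eq) return false;
    if (op.contains !== undefined &&
        !(typeof value === "string" && value.includes(op.contains))) return false;
    return true;
  });
}

function applyFilter<T extends Record<string, unknown>>(rows: T[], filter: Filter<T>): T[] {
  return rows.filter((row) => matches(row, filter));
}

// The same filter object could be sent to the server for fresh data,
// or applied locally to the cache while offline.
const tasks = [
  { id: "1", title: "write proposal", done: false },
  { id: "2", title: "review PR", done: true },
];
const open = applyFilter(tasks, { done: { eq: false } });
```

Making the set of operations configurable, as suggested above, would then be a matter of restricting which `FilterOp` keys a given backend advertises.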
In regards to delta sync, I wanted to get started with an initial list of factors and scenarios. The following list keeps in mind both data storage and transfer requirements. I see 3 main factors in play (standard replication stuff):
Server metadata storage strategies for sync:
Having the ability to choose between the two approaches is crucial so that sync does not require a specific implementation or downstream datasource capabilities to work, while it should absolutely be possible to leverage them when available. Feel free to let me know if I'm overshooting.
This is very much the target of this. We might provide some out-of-the-box deployment options later, but the target here is to provide a flexible package that works out of the box with the existing backends. Really like the ideas. There is no overshooting; from my point of view, so many people have been looking for anything like this for some time.
Yes. This pretty much sums it up, and it is possible now in Apollo. We can have offline pagination and online pagination, etc. This is quite a challenging task, as it will involve:
For the moment we are mostly focusing on giving fully featured offline behavior and a great user experience around it. Developers should be able to work seamlessly with any GraphQL objects (files, subscriptions, etc.). Once we do that right, the next stage will be to go towards storage improvements and deltas. I will need more time to write a proposal on how DeltaSync will work. Then we can collect some feedback from industry and create individual GitHub issues for collaboration. In relation to the second comment: I will need to put together a diagram of how this will work and write a proposal to open the conversation. This is too large a topic to simply draft on a single GitHub issue.
@ntziolis Thanks for reaching out. I'm going to work on a general proposal for diffing capability so we can collaborate better. |
This is exactly what Meteor.js does if you haven't checked it out. They let the client subscribe to a data set then transparently stream that data into a client side MongoDB implementation (minimongo) that matches the server's mongo client API. The server then tracks active subscriptions against the mongodb oplog and sends any diffs down to subscribed clients. It is totally transparent and reactive for the client. The client and server can easily share code because they have the same database API. While it has its downsides (mongo lockin, scaling, performance, not very actively developed anymore) nothing has managed to match the meteor dx so far in my opinion. I think the idea of the client having the same data API as the server is key to unlocking a lot of code reuse and really powerful features. Meteor's architecture might be a good place to look for some inspiration. I'm excited to see where this project goes! I'm evaluating using Apollo and Prisma for my react-native project and this seems like the missing piece of the puzzle. Unfortunately I don't have enough experience with apollo/graphql yet to contribute much but I would like to help wherever I can. |
@AndrewMorsillo 100% agree that Meteor is exactly where we want to end up from a dev experience perspective. In fact my team used Meteor to build our first 3 enterprise SaaS solutions, but by now we have migrated them to GraphQL (key reasons were seamless external REST service integration, manageability, long-term framework support, no lock-in to specific technologies/frameworks on the backend, and scale issues). The goal here is to build a data-backend-independent version of what Meteor delivers in regards to data handling server- and client-side. Once that exists, everyone can build their own providers for their data backend without tech stack lock-in.
@ntziolis I'm in the same boat as you. I'm switching from meteor to graphql for the same reasons in the next iteration of my project. Agreed 100% on the goal. Providing what you get from meteor in a more open backend agnostic fashion will be the ultimate dream for js development. |
I think the best way to start with this is simply to enable the application to Query specific data on the server and Subscribe for results when:
We currently have

Proposal for client

The client can have new methods for registering queries/subscriptions:
For example:

```typescript
// Wait with request after becoming online
public initialDelay: number = 0;

// Interval used for polling
public interval?: number;

// Even some extra metadata
public requiresWifi: boolean;
```

Developers will be able to trigger Query Refresh manually (and force subscriptions to reconnect):
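To illustrate how these proposed options might be consumed, here is a small self-contained sketch. The field names mirror the proposal (`initialDelay`, `interval`, `requiresWifi`), but the scheduler itself (`scheduleRefresh`) is a hypothetical stand-in, not part of any existing API.

```typescript
// Illustrative sketch of consuming the proposed options. The option names
// come from the proposal above; everything else is hypothetical.
interface RefreshOptions {
  initialDelay: number;   // wait (ms) after coming back online before first request
  interval?: number;      // polling interval (ms); undefined disables polling
  requiresWifi?: boolean; // extra metadata: only refresh on Wi-Fi
}

function scheduleRefresh(
  opts: RefreshOptions,
  refetch: () => void,
  isWifi: () => boolean
): () => void {
  let timer: ReturnType<typeof setInterval> | undefined;
  const start = setTimeout(() => {
    if (opts.requiresWifi && !isWifi()) return;
    refetch(); // initial refresh after the delay
    if (opts.interval !== undefined) {
      timer = setInterval(refetch, opts.interval); // then poll
    }
  }, opts.initialDelay);
  // Return a cancel function so the caller can stop the refresh cycle
  return () => {
    clearTimeout(start);
    if (timer !== undefined) clearInterval(timer);
  };
}
```

A manual "Query Refresh" trigger, as mentioned above, would then simply call `refetch()` directly outside of this schedule.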
Related work
Open for comments, opinions and contributions |
Totally agree with doing this step by step and using a client-side-only approach first, one that doesn't require server-side changes. In addition, we should make this as transparent as possible. Looking at what
I think effectively what we want is what We could achieve this by wrapping the In regards to the subscriptions connected to a watchQuery:
We will follow up with a server-side Node.js package, but IMHO it is best to start with client-side usage first and try it out to see if we even need anything from the server, or whether it can be done in framework user space.
Yes. This pretty much sums up the intentions here 💯
Love it. Going to work on the base for that and post an update in the coming days.
This is already there, however it is a very naive approach and we do not resubscribe on app restart.
Awesome idea! I totally forgot about the fact that those should be interconnected. |
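The "naive approach" mentioned above (no resubscribe on app restart) could be addressed by persisting the set of registered subscriptions and replaying them on startup. The sketch below is purely illustrative; the registry and function names are assumptions, not an existing API.

```typescript
// Hypothetical sketch: persist registered subscriptions so they can be
// replayed after an app restart instead of being silently lost.
const registry = new Map<string, string>(); // subscription name -> query text

function registerSubscription(
  name: string,
  query: string,
  subscribe: (query: string) => void
): void {
  registry.set(name, query); // would be written to durable storage in practice
  subscribe(query);
}

function resubscribeAll(subscribe: (query: string) => void): void {
  // Called once on app startup, after the registry is loaded from storage
  for (const query of registry.values()) subscribe(query);
}
```

Tying this to the query-refresh mechanism discussed above would let a restart both resubscribe and trigger an initial refresh, so no change window is missed.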
@wtrocki regarding |
@alidcastano The current status should be seen as a stepping stone. Reusing an existing DB project is absolutely something we are looking into, keeping in mind that the end goals are to:
To your question: I'm still researching existing browser-based in-memory DBs for fit with this project, so if anyone has pointers to projects not mentioned in the list below, please feel free to pile on:
Update: |
👍 for not imposing specific technologies. Part of the appeal of the Apollo tools is that they can be glued together for slightly different stacks/use cases. |
@xtagon I just started looking into this space myself, so there may be technical nuances I'm not seeing, but in general data synchronization and conflict resolution are hard problems to solve; why not use an existing, battle-tested solution in the interim? It'll be the difference between being able to use a production-ready solution next week versus next year. apollo-server's PubSub implementation, for example, just provides an abstraction layer to which the community can add their own tech-specific implementations, the Redis package being the most popular one right now. I can understand not wanting to use a framework-specific solution (such as Redux for caching, which I'm glad Apollo moved away from), but there's lots of great work (and ecosystems!) in this space in JS land; why not take advantage of them? Are there some incompatibilities I'm not aware of? It seems like apollo-cache-persist already exposes the necessary API for it.
@ntziolis regarding |
@ntziolis here's a comment I found that adds to your database list: prisma/prisma#1659 (comment) |
@alidcastano Thank you so much for listing that out. I will research the list of the databases that were provided. |
After a quick check, I think we can list 2 categories of solutions:
Both will have some advantages and disadvantages. |
@alidcastano Thank you for the link; I saw that previously but couldn't find it anymore, so thank you! Just to be clear, my statements were oriented towards the end goal and should not imply that in the interim it wouldn't be a good idea to bridge with an existing technology. I'm all for using an interim solution until we have this ironed out in a non-technology-specific manner. My main concerns for choosing an interim solution are:
So as @wtrocki has rightfully put it, there are 2 distinct problems (interim and long term), and the tech choices for both are driven by different factors, which will likely result in different choices for each. Just to shed more light on why a GraphQL-based replication protocol makes sense:
Totally correct; the difference is the feasibility and complexity of such an implementation. For example, the CouchDB protocol makes certain assumptions about DB hooks that simply do not exist on most commonly used SQL DBs, making it impossible to implement the protocol there. Last, I want to say: I understand solving replication is a hard problem. In fact I have been building replication engines for the past 10 years, and prior to GraphQL I hadn't seen a technology that would allow for a generic-stack solution; hence I'm committed to building this out precisely because it's a hard problem. Also I'm weird and I just love replication :)
@ntziolis appreciate you writing this out; it exactly aligns with what I've been learning these last two days, so I'm glad it's nicely summarized here for others. I'll also add this quote I found from an interview with one of the PouchDB maintainers:
The PouchDB/CouchDB combo just seems like the most out-of-the-box, production-ready solution for offline support, which is the reason I mentioned it. I personally prefer SQL, but there doesn't seem to be an equivalent stack for it yet (I wonder if there's a particular reason for that). Regarding your third point, about exposing database-specific APIs: to what extent can that even be avoided? If you look at ORMs like Knex.js, for example, even they expose certain fields/methods that are only available in certain SQL dialects. Totally agree with
Generally, there are replication solutions for SQL DBs available that work really well, but they are extremely cost-prohibitive (often priced per replication endpoint) and functionality-wise more geared towards internal enterprise use cases.
I was referring to what level of service can be expected from the replication backend. Some replication use cases possible in Pouch will be hard to generalize hence will not be supported.
This is why we need to decouple the offline engine from the "dialect" being used. For the offline engine we need generalized requirements that allow it to work with all kinds of data backends; this is more about the form and types of possible filters, NOT about how they are expressed. The dialect used to request data, however, should be up to each project. And I think herein lies the beauty of GraphQL, as it allows each model (or even query) to use its own dialect. Think Prisma vs Sequelize: each provides its own way to specify filters and pagination, but both are still proper GraphQL.
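The engine/dialect split described here can be sketched as one generic filter description translated into different output shapes. The two target shapes below are loosely inspired by Prisma-style and Sequelize-style filter arguments; neither is the real API of those projects, and all names are illustrative.

```typescript
// Illustrative sketch only: one generic filter, two "dialects".
type GenericFilter = { field: string; op: "eq" | "contains"; value: string };

function toNestedDialect(f: GenericFilter): Record<string, unknown> {
  // e.g. { title: { contains: "PR" } }  (Prisma-like nesting, assumed shape)
  return { [f.field]: { [f.op]: f.value } };
}

function toWhereDialect(f: GenericFilter): Record<string, unknown> {
  // e.g. { where: { title: { $contains: "PR" } } }  (Sequelize-like, assumed shape)
  return { where: { [f.field]: { ["$" + f.op]: f.value } } };
}

const filter: GenericFilter = { field: "title", op: "contains", value: "PR" };
```

The offline engine would only ever see `GenericFilter`; each project plugs in its own translation to whatever dialect its GraphQL schema exposes.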
100% correct. My goal wouldn't be to ever target 100% use-case coverage, maybe not even 90%, but instead the most common use cases, to finally enable existing app stacks to add offline capabilities even if it's nowhere near 100%. The alternative right now is 0% offline capability for most of these projects.
Really nice ideas in this thread, so I want to summarize everything and create actionable items from this super-thread. Over the next days I'm going to create the following issues:
Small update for this. Currently, we have:
We have integrated the libraries into popular community packages, and this has enabled us to really tackle this issue. For the moment the only challenge is to pick the right options for the backend.

Options

The challenge we have now is to see whether we should stick with a single open source project that will enable offline diff capabilities. We cannot rely 100% on subscriptions, as users should be able to get the changes even when they weren't subscribed at the time. This is where streaming platforms like Kafka come in as a much better alternative to AMQ.

Event Sourcing / Event Log using Kafka

Kafka is designed to handle changelogs in a very efficient way.

Building a generic solution for event streaming with filter support

Generally, we could check whether we can build some pluggable library able to work with any general-purpose pub/sub mechanism, plus storage that holds data partitioned by well-known filter categories. The limitation of this is that filters need to be fully known up front; introducing new filters will require a lot of additional processing to aggregate the data again.

Apply event log on the actual data set

Adding lastModified or similar on the actual table can give developers the ability to get a diff of the changes.
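The last option (lastModified on the actual data set) is simple enough to sketch end to end. The code below is an illustrative assumption of how such a delta query could work, with soft deletes so removals replicate too; the names (`Row`, `deltaSince`) are hypothetical.

```typescript
// Sketch of the "apply event log on the actual data set" option:
// stamp each row with lastModified and soft-delete instead of removing,
// so a client can fetch only rows changed since its last sync.
interface Row {
  id: string;
  title: string;
  lastModified: number; // epoch millis of last write
  deleted?: boolean;    // soft delete so deletions replicate too
}

function deltaSince(rows: Row[], since: number): Row[] {
  return rows.filter((r) => r.lastModified > since);
}

const table: Row[] = [
  { id: "1", title: "a", lastModified: 100 },
  { id: "2", title: "b", lastModified: 250 },
  { id: "3", title: "c", lastModified: 300, deleted: true },
];
// Client last synced at t=200, so it receives one update and one deletion
const delta = deltaSince(table, 200);
```

This covers clients that were offline (and unsubscribed) during the change window, which is exactly the gap subscriptions alone cannot fill.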
@ssd71 This is the top-level issue that we moved from Offix, covering the work you are doing. As you can see, this is a very old requirement coming from the community, and it is really exciting to see us finally moving it forward (in a different repo). I have added it so we can post top-level progress on the work we have done.
Graphback.dev now supports data synchronization in a beta phase. Please check our documentation.
A generalized way to handle changes in data on the server while the client was offline, without refetching all data after reconnect, as well as transparently executing initial data loading without requiring the user to already have all data locally for the app to function.
I completely understand that this is not a client-only feature, but this library seemed to be the best place to start a discussion on how we would go about establishing a standardized way to achieve this capability (server-to-client data replication) when leveraging a GraphQL backend.
Describe the solution you'd like
Describe alternatives you've considered