App Architecture for User Data #9
@kirlat and @irina060981 please take a look at this and let me know your thoughts and questions. There are a fair number of assumptions in here about the different pieces of the architecture for working with user data, authenticated services, etc., which likely also need some discussion.
Hello, Bridget and Kirill!
So we will have several instances of IndexedDB: one for each page (text) where it is used, for each browser, and for each environment. The only thing that could connect all of them is the user's identification data. On the other hand, we should take advantage of having locally saved data.
Thanks for laying this down! There are several very important decisions we have to make that, I believe, will define how successful our development will be in the future. Because of this, I would like to open a discussion of several architectural issues that we face. Our architecture will depend on the decisions we make on these (and other) issues.

A. Working with both webextension and Safari app extension code. I've got a good sense of how difficult and time-consuming it can be to support two codebases that do pretty much the same thing but use different technology stacks, even if the difference is not so significant (a different background code). So I think our goal should be to minimize re-implementation of similar code. We are using different architectural solutions: the embedded lib is client-side code; the webextension has client-side code in an isolated environment plus a protected background script; the Safari app extension is the same as the webextension except that instead of a background script we have an app extension written in Swift; the PWA is similar to the webextension in a way. We should try to have one piece of authentication/authorization code that works for all clients (if possible, because there are some challenges here). JS seems best for this, as it is the common denominator for all our clients.

B. If we want to be successful on mobile, we need to think about how to minimize data throughput. Mobile data is still slow, I believe, even in developed countries. So we should:

C. Different security environments. The webextension, Safari app extension, and PWA can be considered trusted (in a way), but the embedded lib is not. The library runs in the same environment as the other scripts loaded by the page, and any malicious script (loaded by the page unintentionally or injected into it) can get access to all data of the embedded library (please correct me if I'm wrong here). This means we cannot trust the embedded lib to store any secrets.

This also means a different security architecture for the library than for the rest of the apps. Unfortunately, this contradicts (A). We should really think about possible solutions here. Considering all that, it is tempting to shift some of our current logic to the server side. Right now, during a lexical query request, we retrieve the lemma from one source and lemma translations from another, and then execute several definition requests. If we shifted this to the server we could:
Of course, there are several obvious drawbacks to that solution.
I'm not sure what the best solution would be, but it is very tempting, I think, to move business logic somewhere it can be implemented once and shared by all clients (preferably in a protected environment), and where it can also be updated without the need to update each client implementation. Maybe there are solutions other than server-side ones that could help us implement this? Maybe we can use a service worker to host all our business and authorization logic? It seems to be supported by Chrome, FF, and Safari to different extents. It's JS, and we can share it between all clients. But I am not 100% sure what limitations we would have there. What are your thoughts on this? I think we should define our general strategy before going into details. Thanks!
I agree with @irina060981 that we could probably add an authorization token as a parameter of client adapter queries (or add authorization info to queries in some other way). That should be relatively simple. If we want to have it optional, we can probably use a mixin for the authorization logic. Regarding IndexedDB and the same-origin policy, we could probably put the database into a service worker or a background script (if service worker functionality is adequate for us, then it's the best option because we can use it in Safari, it seems). So in a content script or the embedded lib, when we need data, we send a message (a DOM event, probably, as a universal solution) to the SW (i.e. service worker); the SW will query IndexedDB, and the remote server if necessary, and respond with the data in a response message. This way all data will be shared and in sync across different tabs. I'm for moving as much business logic out of the UI Controller as possible. I think the role of a UI Controller should be only to coordinate UI elements and provide their interactions. Business logic should live somewhere else. I like the concept of Queries, so we can probably use data queries the same way we use lexical queries. If that is not enough, we can introduce a specialized data controller.
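The request/response message flow described above could look roughly like this sketch. The message types, the `MessageBus`, and the Map standing in for IndexedDB are all illustrative assumptions, not code from the Alpheios repositories; in the real extension the bus would be `runtime.sendMessage`, DOM events, or service-worker `postMessage`.

```javascript
// A toy in-memory bus standing in for runtime.sendMessage / DOM events /
// service-worker postMessage, so the sketch runs anywhere.
class MessageBus {
  constructor () { this.handlers = [] }
  subscribe (fn) { this.handlers.push(fn) }
  async send (message) {
    for (const fn of this.handlers) {
      const response = await fn(message)
      if (response !== undefined) return response
    }
  }
}

// "Background/SW" side: owns the store (IndexedDB in the real extension;
// a Map here so the sketch is self-contained).
function createDataService (bus, store) {
  bus.subscribe(async (msg) => {
    if (msg.type !== 'USER_DATA_REQUEST') return
    if (!store.has(msg.word)) {
      // Cache miss: in the real code this would query the remote user-data API.
      store.set(msg.word, { word: msg.word, important: false })
    }
    return { type: 'USER_DATA_RESPONSE', payload: store.get(msg.word) }
  })
}

// "Content script / embedded lib" side: asks for data without knowing
// whether it came from the local DB or the remote server.
async function getWordItem (bus, word) {
  const reply = await bus.send({ type: 'USER_DATA_REQUEST', word })
  return reply.payload
}
```

The key property is that the content-side code has no direct IndexedDB handle, so all tabs see one consistent store owned by the background/SW side.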
Service workers could be ideal for providing a first level of caching for the apps. I can't find any clear info about whether we can use them in webextensions or not. But there also seems to be no info about it being prohibited either...
Yes, I think we would be able to work with IndexedDB in the background script, so that all content stored by the background script is in the same db, regardless of what page the user was on when using the extension. And for the PWA we would use a service worker. (We may be able to use service workers in webextensions, but it's not really clear to me. I do know that Google has stated one of the goals for the webextension manifest v3 as "Modernizing to align with new web capabilities, such as supporting Service Workers as a new type of background process".)
I need to think about the authentication issues with the embedded library vs the PWA. As I articulated in the release scope comments in Slack, for user data storage I am leaning towards using the AWS Serverless Stack, which includes AWS API Gateway, AWS Lambda, and either AWS DynamoDB or S3 or both. We would use OAuth2 and Auth0's API authorization flows with JWT to protect access to the AWS API Gateway for user data storage/retrieval. I believe I want to stick with a microservices approach and client-side authentication. Some links that might be helpful here: https://yos.io/2017/09/03/serverless-authentication-with-jwt/
I was thinking that the WordListController and WordListComponent (as well as a WordListItem component) would go in the components repository. WordList and WordListItem would be data model objects in the data-models repository. We could start development with them in a separate repository, but per our refactoring goals, we are trying to reduce the number of dependencies. Plus, I think for any Alpheios application the wordlist is a core component.
Certainly UserDataQuery and DBSyncController are separate from UIController. Whether they belong in core components is a little less clear to me. It depends in part, I think, on whether we can make this functionality available to the embedded library in a secure way or not.
The questions about combining server requests and optimizing data syncing all require a little more thought. Will try to respond further on these soon.
Hello, Bridget and Kirill!
A client-side solution (as we have now) has some advantages for desktop usage:
Maybe it could be useful to create a light version for mobile (with a special marker for mobile), because if someone tries to use it with a poor connection, they could choose to get only morph data (for example) and use it normally. To be honest, in my practice I have had a lot of experience with the classic client-server architecture (like Kirill suggested) and used to think it was the only good way. And I have had experience with server overload problems and the constant growth of upgrade costs. So when I first saw this client-side implementation I was surprised; now I can see the advantages of this approach. It seems to me that if the Alpheios extension were used in the study process and, for example, a whole class started using it at the same time, it would not be easy on the server.
Irina, I agree with everything you said! There is no ideal solution here, and every approach will probably have its advantages and drawbacks. I like the client-based architecture better for what we do (as, I understand, you do 🙂), but I see some potential issues with it that we may face later. So I thought that if we discuss them now, we can probably find some approaches to make them more bearable. Even if there is no solution, we would still keep those issues in mind while writing our code, and that will help us create better code, I believe. Once we are fully aware of the problems, we can try to minimize their consequences. From what I've learned so far, we would probably still have to maintain at least three versions of the authentication code (webextension+PWA / Safari / embedded lib) (sigh 😞).
I love our discussions and consider them a very important part of our workflow :) I think that the approach with JWT tokens (thank you, Bridget, for the links; I have used tokens before but had never read such a clear description as in the first one) could be very helpful. On the security side, there is a not-very-recent article about the security questions here, but it could be helpful; it suggests using the Chrome Identity API. From the article:
Do you have experience with it?
I am using … But the Identity API works only in background-related pages, not in client-side scripts. So it's not an option for the embedded lib (which has to use a different authentication workflow anyway; more on that later). And for Safari it's a no-go too 🙁. However, there is more to it: the encryption libraries that we use to generate parts of our requests (like random byte array generators and hash functions) tend to be environment-specific too 😢.
Some refs: for the webextension (all browsers) and the PWA we'll probably use what is called the "Authorization Code Grant Flow with PKCE": https://auth0.com/docs/api-auth/grant/authorization-code-pkce. For the embedded lib (since it cannot be trusted and we cannot store secrets in a client-side script) the best choice is the "Implicit Grant Flow": https://auth0.com/docs/api-auth/grant/implicit. And the authentication/authorization code in Safari probably has to be within the app extension, which means a different codebase.
Thank you, Kirill, for the explanations!
For the MacOS application, I think it needs this. And it is a new challenge for making the Safari App Extension the next state of the art, I think. 🙂
Some additional thoughts based upon our discussion at today's check-in:

Whether or not it ends up being possible to avoid cross-domain restrictions on IndexedDB for the webextension, for the embedded library and reader applications we know we will have cross-domain restrictions. So the design has to take that into account.

Since we want to support a single user account across multiple applications (webextension, mobile reader, etc.), the remote user data store will be the authoritative source of the user data. The IndexedDB can be used as a local cache to support fast and offline access, but it will always need to be updated from the remote user data store in order to provide a fully up-to-date view of the user's data.

We must have an API that protects us from the need to duplicate the business logic around retrieving remote data and merging it with the local IndexedDB. Any client-side feature, such as a word list, should not need to know the details of where the data is coming from. This is the point of the DBSyncController in the proposed design above.

While we can store entire complete Homonym (or other Alpheios data-model) objects in the user data stores (both remote and local), and may decide to do so in some cases for performance reasons or to support offline access, the main purpose of the user data store is to store information that is unique to an individual user's experience with the Alpheios applications. We probably do not want to be duplicating data that comes from our remote services across each and every user data store, of which, in the case of the local IndexedDB, there could be multiple for each domain the user visits. Storing the data in structures that can be directly serialized to/from the Alpheios Data Model objects is appealing, but if we do this we need a way to easily identify the state of such a data model object and whether or not it can or needs to be filled in with data from remote services.
It might also be that the persistent structure of a user data object is a subset of what is stored in the local IndexedDB. The DBSyncController might be responsible for deciding which properties of an Alpheios Data Model object to populate from where. The DBSyncController could then also implement ClientAdapter interfaces so that it can be used as a source for LexicalQuery data. For example, I could see a scenario like the following, with a fresh start:

- WordListController requests the WordList from the DBSyncController. At this point, the WordList data in the LocalDB is identical to that in the RemoteDB.
- The user clicks on a word on the WordList, and the UI initiates a LexicalQuery.
- The LexicalQuery asks the DBSyncController for the word.
- Upon completion of the LexicalQuery, the WordListController updates the WordListItem with the full Homonym.
- Later, the user clicks on a word which is in the WordList and which already has a full Homonym stored for it in the LocalDB, and the UI initiates a LexicalQuery.
- The LexicalQuery asks the DBSyncController for the word.

Although the need to support versioning of service results is probably a lower priority, we could add additional business logic into both the DBSyncController and the LexicalQuery to check version flags on a Homonym's component parts against service output, to find out whether the local store needs to be updated. But if the local IndexedDB is understood to be temporary, incomplete storage, and the RemoteDB doesn't store full Homonym data, then this is maybe less of a concern.

@kirlat and @irina060981 does this make sense to both of you? What potential pitfalls do you see in it?
as an alternative/addendum to this statement:
I could see the code getting messy if the DBSyncController has to know too much about the individual data model objects. So an alternative might be to have objects which are candidates for remote storage implement a toPersistentJSON method, or the like, which could be used to create the minimal version for remote storage, and set an inComplete flag on those of its components which are not full representations.
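As a sketch of that idea (all field names and the class shape here are assumptions, not the actual Alpheios data-model code): the object itself decides which subset is worth persisting and flags stripped components, so the DBSyncController never needs to know its internals.

```javascript
// Hypothetical WordListItem illustrating the toPersistentJSON idea.
class WordListItem {
  constructor (targetWord, languageCode, homonym = null) {
    this.targetWord = targetWord
    this.languageCode = languageCode
    this.homonym = homonym // full lexical data, possibly large
    this.important = false
  }

  toPersistentJSON () {
    return {
      version: 1, // data-structure version, per the discussion below
      targetWord: this.targetWord,
      languageCode: this.languageCode,
      important: this.important,
      // Persist only a stub of the homonym, flagged so that a later
      // LexicalQuery knows it must be refilled from remote services.
      homonym: this.homonym
        ? { lemma: this.homonym.lemma, inComplete: true }
        : null
    }
  }
}
```

The sync layer can then treat every storable object uniformly: call `toPersistentJSON()`, store the result, and check `inComplete` flags on read.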
Another thing to think about: all user data objects should probably be versioned themselves, so that we can deal gracefully with future data structure changes. E.g., so that if need be, we can quickly differentiate between a WordListItem version 1.0 and a WordListItem version 2.0 without having to examine the data structure.
👍 for data versioning. It might also be beneficial to version the REST API of remote services. If we use GraphQL we won't need this, as they suggest introducing new fields as the preferred way of versioning: https://graphql.org/learn/best-practices/#versioning. For versioning of JS objects such as WordListItem, it would probably be cleaner to integrate version info into their class names (i.e. have a separate class for each new version) rather than having a version field inside a class and some conditional logic in methods that rely on the version field value. The latter can easily become convoluted. What do you think? For comparing data objects, there is a …
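The version-in-class-name idea could look like the following sketch (class and field names are illustrative, not from the actual codebase): each format revision gets its own class, and a small factory picks the right one from the serialized version field, instead of one class branching internally on a version value.

```javascript
// One class per data-structure version; no conditional logic inside methods.
class WordListItemV1 {
  static fromJSON (json) { return new WordListItemV1(json.targetWord) }
  constructor (targetWord) { this.targetWord = targetWord }
}

class WordListItemV2 {
  static fromJSON (json) { return new WordListItemV2(json.targetWord, json.languageCode) }
  constructor (targetWord, languageCode) {
    this.targetWord = targetWord
    this.languageCode = languageCode // field added in v2
  }
}

const itemClasses = new Map([[1, WordListItemV1], [2, WordListItemV2]])

// The only place that looks at the version field.
function restoreWordListItem (json) {
  const cls = itemClasses.get(json.version)
  if (!cls) throw new Error(`Unsupported WordListItem version: ${json.version}`)
  return cls.fromJSON(json)
}
```

This keeps each version's logic self-contained, at the cost of some duplication between version classes.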
I have some thoughts here too. We have different data to arrange locally and remotely:
We have several storages that should be synchronized somehow:
All 5 items could change the data in the first 4 items (according to the previous list). I think that we need some central data controller here (maybe DBSync, or maybe simply DataSync) that will have rules to sync data using the following conditions:
And it should have access to the remote UserDatabase API, to IndexedDB methods, and to Vuex data update events, in both directions (write/read). And it should be able to be imported into the content part or the background part. And we can't define an obvious priority for remote or local data, because some data has its source remotely and some locally. I agree, such a controller could become a really long coded file/environment.
And I think such a DataController should be used inside the LexicalQuery, similar to Bridget's proposal:
Each updated part of the data creates/updates a wordItem instance inside the wordList instance (with the current context data for the current lexical request); it is uploaded to Vuex (and from Vuex to the UI components) and re-saved with the updated data both locally and remotely. And on first page load we need to load the current wordlist (after authorization).
And when a user changes the data (places an important flag, for example, or adds a new context usage) or deletes it
Also, a user could remove some worditem. I think there could be very different scenarios, which could be implemented one by one.
I think we should divide all sync procedures by the type of the data (similar to the lexical request):
And create sync rules for each one according to
inside some controller
I am not sure how I feel about that. I guess another option here is to use Protocol Buffers (https://codeclimate.com/blog/choose-protocol-buffers/). It sounds like the solution they provide for data versioning issues in service interactions is similar to that of the GraphQL approach, i.e. by relying on only adding, not removing or changing, fields. As this came up for me while thinking about the exchange of data to/from the CRUD microservice for the remote DB for the wordlists, it seems that maybe that is exactly the problem Protocol Buffers were designed to address. Do either of you have experience with them?
There are some very good points here, and I think we need to be careful about the scope of this data controller, limiting it to persistent data access and not involving it in application state data. If we assume that all persistent storage (including the local IndexedDB solution in "persistent", even if that is debatable) will require user authentication, then we could call it UserDataSyncController or something like that. On the point about being able to be imported into the content or background part, I will copy what I just put in the Slack discussion here: for the different interfaces to IndexedDB in the webextension and the embedded lib, ideally I think this should work similarly to what we have already discussed needing for the Auth object. I.e., we need an abstraction that allows the rest of the application not to care whether this is happening in the background or on the content side, and then an implementation of that abstraction that gets handed to the UIController's constructor.
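The injected-abstraction pattern described here could be sketched as follows (all class names are hypothetical; the real Auth object and UIController constructor may differ). The application codes against one interface, and the environment-specific implementation is chosen at startup.

```javascript
// Common interface: the rest of the application only sees these methods.
class UserDataStore {
  async get (key) { throw new Error('Not implemented') }
  async put (key, value) { throw new Error('Not implemented') }
}

// Stand-in implementation. In the webextension this subclass would proxy
// messages to the background script; in the PWA it might talk to a
// service-worker-owned IndexedDB; here it is a Map so the sketch runs anywhere.
class InMemoryDataStore extends UserDataStore {
  constructor () { super(); this.map = new Map() }
  async get (key) { return this.map.get(key) }
  async put (key, value) { this.map.set(key, value) }
}

// The UIController receives the store via its constructor and never needs
// to know whether it runs against background, SW, or in-page storage.
class UIController {
  constructor (dataStore) {
    this.dataStore = dataStore
  }

  async saveWordItem (item) {
    await this.dataStore.put(item.targetWord, item)
  }
}
```

Each client (webextension, embedded lib, PWA) would then instantiate the appropriate subclass and pass it in, keeping the environment differences in one place.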
I have not worked with Protobuf, but I have heard good things about it. I think it should be nearly ideal for inter-service communications. I probably misunderstood your point about WordListItem versioning. I was thinking you were talking about versioning it for use within an application (i.e. that we might have some modules/components using both V1 and V2 of it at the same time), not about transferring it over the network. 🙂 I think protobuf might be beneficial for storing data too, in some situations.
It is important, in my opinion, that we don't end up with a huge do-it-all data controller, as it might grow into something that is hard to maintain. To avoid this we probably should:
Ah yes, sorry I wasn't clear about that. I don't think that a single version of the application should be actively trying to save multiple versions at the same time, but it might need to be able to read older versions. That is, a newer version of the application shouldn't break if it encounters data that was saved by an older version. |
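One common way to make newer code read older data, compatible with the version-field idea discussed above, is a chain of per-version migrations applied on read. This is only an illustrative sketch (the v1-to-v2 change and the `'lat'` default are invented for the example, not real Alpheios data shapes):

```javascript
// Each entry upgrades data from its key version to the next one.
const migrations = {
  // Invented example: v1 records had no languageCode, so default it.
  1: (json) => ({ ...json, version: 2, languageCode: 'lat' })
}

const CURRENT_VERSION = 2

// Walk the migration chain until the record is at the current version.
// Older application versions simply never see newer data in this scheme;
// newer versions can always read what older ones saved.
function upgradeToCurrent (json) {
  let data = json
  while (data.version < CURRENT_VERSION) {
    const migrate = migrations[data.version]
    if (!migrate) throw new Error(`No migration from version ${data.version}`)
    data = migrate(data)
  }
  return data
}
```

A record saved by an old client is upgraded transparently the first time a new client loads it, so the stored data never has to be rewritten in bulk.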
Agree with these points.
This was implemented in the 3.0 release. Future work on user data management will be discussed separately.
The following is a proposal for the application architecture design for managing user data.
The need is to have a way to work with data sources efficiently locally, while keeping data in sync across multiple application instances.
The requirements for the user word-in-context lists are used as the example use case here, but the idea is to develop an architecture which is flexible enough to handle various data types and data sources, and which works across applications (Webextension, Embedded Library, etc)
For example, a user might do lookups on both a mobile device and on the desktop and each should be updating the user's wordlist to add the words as they are looked up. Similar requirements will be in place for user preferences and other sorts of user data.
I'm proposing a design which uses:
In the above diagram, some of the steps are represented as synchronous when they will need to be asynchronous but the basic flow is this:
[001] - [002] Upon application initialize, controllers subscribe to events which interact with data
[003] - User requests a word list display by clicking a button on the word list tab
[004] - Wordlist Vue component requests Wordlist data from the UIController
[004] - UI Controller delegates the request to the WordListController
[005] - Wordlist Controller requests data from a UserDataQuery object
[006] - UserDataQuery object requests data from the DBSyncController
[008] - [020] DbSyncController interacts with remote and local data sources to retrieve and merge data
(Here, the assumption is that we might have a ProtectedClientAdapter which knows how to interact with data sources that require authentication. The exact details of that still need to be worked out, but the idea is to isolate the business logic around authentication/authorization from that of managing and merging data sources; in other scenarios the DbSyncController could use the regular ClientAdapter to retrieve data from non-protected sources.)
[021] DbSyncController returns the fully merged data set to the WordListController
[022-023] WordListController instantiates the WordList data model objects and supplies them to the UIController
[024] UIController updates the data sent to the WordList view
The WordList can also be updated by events which are not specific requests to the WordList view component. For example, the requirements call for every word being looked up to be added to the user's word list. In [001] the WordListController subscribes to the MORPH_DATA_READY event, which happens upon word lookup. The UIController might also subscribe to a WORDLIST_DATA_READY event, which happens whenever WordList data is updated.
[026] User initiates a word lookup
[027] UIController requests data from the LexicalQuery
[028] LexicalQuery publishes its MORPH_DATA_READY event
[029] WordListController receives the MORPH_DATA_READY event, updates the WordList data model object and then initiates a request to the DBSyncController to store the updated data.
[030] - [043] The DBSyncController interacts with the remote and local data stores to update the data (in reality the update events would probably be asynchronous, but they are shown synchronously in the diagram)
[044] WordListController publishes a [WORDLIST_DATA_UPDATE] event
[045] UIController receives the [WORDLIST_DATA_UPDATE] event and updates the WordList view accordingly so that when the user accesses it next it is up to date
The DBSyncController could implement different approaches to synchronizing with the remote data store depending upon where the code is running. If in a PWA, for example, it could use Service Workers and Background Sync to queue up requests when the user is offline, options which are not currently available to the webextension.
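The event-driven part of the flow above ([001]-[002] and [026]-[045]) can be reduced to a runnable sketch. The event names come from the proposal; the `EventBus`, the controller shape, and the Map standing in for the DBSyncController are assumptions for illustration only.

```javascript
// Minimal publish/subscribe bus.
class EventBus {
  constructor () { this.listeners = new Map() }
  sub (name, fn) {
    if (!this.listeners.has(name)) this.listeners.set(name, [])
    this.listeners.get(name).push(fn)
  }
  pub (name, data) {
    for (const fn of this.listeners.get(name) || []) fn(data)
  }
}

class WordListController {
  constructor (bus, store) {
    this.bus = bus
    this.store = store // stands in for the DBSyncController's data layer
    // [001]-[002]: subscribe to data events at initialization
    bus.sub('MORPH_DATA_READY', (morphData) => this.onMorphData(morphData))
  }

  onMorphData (morphData) {
    // [029]-[043]: update the word list and persist via the data layer
    this.store.set(morphData.targetWord, morphData)
    // [044]: notify subscribers (e.g. the UIController) that the list changed
    this.bus.pub('WORDLIST_DATA_UPDATE', { words: [...this.store.keys()] })
  }
}
```

A UIController would subscribe to `WORDLIST_DATA_UPDATE` ([045]) and refresh the WordList view, so neither controller holds a direct reference to the other.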