Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-93] Proposal for non-blocking SQL Lab persistence #21385

Closed
justinpark opened this issue Sep 8, 2022 · 7 comments
Closed

[SIP-93] Proposal for non-blocking SQL Lab persistence #21385

justinpark opened this issue Sep 8, 2022 · 7 comments
Labels
sip Superset Improvement Proposal

Comments

@justinpark
Copy link
Member

justinpark commented Sep 8, 2022

[SIP-93] Proposal for async SQL Lab persistence

Motivation

At Airbnb we have a performance (lag) issue using SQLLAB_BACKEND_PERSISTENCE feature.
Since SQLLAB_BACKEND_PERSISTENCE designed in a blocking way (as shown in the following), user can be stuck while tab_state table in busy state. (even unable to run a query while the tabstateview endpoint timed out)

export function setActiveQueryEditor(queryEditor) {
return function (dispatch) {
const sync = isFeatureEnabled(FeatureFlag.SQLLAB_BACKEND_PERSISTENCE)
? SupersetClient.post({
endpoint: encodeURI(`/tabstateview/${queryEditor.id}/activate`),
})
: Promise.resolve();
return sync
.then(() => dispatch({ type: SET_ACTIVE_QUERY_EDITOR, queryEditor }))
.catch(response => {
if (response.status !== 404) {

In order to improve this lag issue, we'd like to have a non-blocking mechanism for this db syncing workflow. i.e. like g-doc auto saving. moreover we'd like to design a hybrid solution (i.e. keep using redux-localstorage and save same snaphot in remote) to keep the advantage (offline saving) of using local storage.

Proposed Change

The proposed solution is simple. We would like to keep using redux-localstorage as well as the current localStorage mechanism (as SQLLAB_BACKEND_PERSISTENCE turned OFF). Like QueryAutoSync does, we would like to build a component that observes the changes of queryEditor and then periodically sync the updated queryEditor snapshot from redux state using existing /tabstateview/ api.
For the payload wise, we'd like to use extra_json in tab_state table to store the snapshot(by json format) of queryEditor as stored in localStorage (rather than mapping each value in a proper table column like sql, label, query_limit).
As existing persistence feature does, we will design similar hydration flow in frontend to initialize the sqlLab state from the backend. With this json blob it will simply convert to json and then merge with the existing local state.

New or Changed Public Interfaces

We will replace prefetch(/tabstateview/ call) block code in sqllab actions on SQLLAB_BACKEND_PERSISTENCE (or introduce new SQLLAB_BACKEND_PERSISTENCE_ASYNC FLAG if preferred) and need to pass the active_tab value including extra_json in this case.

New dependencies

I do not for-see any new dependencies.

Migration Plan and Compatibility

I do not for-see any migration required.

Rejected Alternatives

None.

to: @etr2460 @ktmud @john-bodley
cc: @betodealmeida

@john-bodley john-bodley changed the title [SIP-88] Proposal for non-blocking sqlLab persistence [SIP-88] Proposal for non-blocking SQL Lab persistence Sep 8, 2022
@ktmud
Copy link
Member

ktmud commented Sep 8, 2022

I like this idea. Can you elaborate more on which states will be pulled to local storage and which states will be kept exclusively on remote? I'd imagine we probably shouldn't keep even a temporary local copy of big payloads such as query results.

image

For states that can be loaded fresh upon page load--such as tables and databases, they should also be skipped for localStorage in this async approach.

image

There are also some duplicates in saved states, too, that we may want to clean up:

image

I also noticed we are saving previous queries of current active tab even though they may never be used again. Should be able to save a lot of space if we just clean them up.


To summarize, I think it'd be useful to start cleaning up current saved states when SQLLAB_BACKEND_PERSISTENCE is off, before migrating to this hybrid approach.

@justinpark
Copy link
Member Author

justinpark commented Sep 8, 2022

I like this idea. Can you elaborate more on which states will be pulled to local storage and which states will be kept exclusively on remote?

Good point. There's a helper function for this and planning to reuse this function to trim the unnecessary items.
And it will only push the draft (editor) states (the gap before saved in server) to the local storage. Table (schema) metadata are exclusively stored on remote.

export function clearQueryEditors(queryEditors) {
return queryEditors.map(editor =>
// only return selected keys
Object.keys(editor)
.filter(key => PERSISTENT_QUERY_EDITOR_KEYS.has(key))

I'd imagine we probably shouldn't keep even a temporary local copy of big payloads such as query results.

agree. I want to remove the current local storage copy of query results (The query results will retain while tab is opened)
but only stores the resultsKey that requests the cached result from remote.

To summarize, I think it'd be useful to start cleaning up current saved states

yes. It will trim the unnecessary saved states during this job.

@john-bodley john-bodley changed the title [SIP-88] Proposal for non-blocking SQL Lab persistence [SIP-93] Proposal for non-blocking SQL Lab persistence Nov 16, 2022
@michael-s-molina
Copy link
Member

Interesting idea @justinpark . I wonder if we should provide both sync and async persistence options or just provide the async version as the evolution of the feature. If it's the latter, we could reuse the same feature flag given that we're planning to clean up the current state, which will modify the existing feature anyway.

@geido
Copy link
Member

geido commented Nov 22, 2022

Nice proposal @justinpark

@mistercrunch
Copy link
Member

Took a look today, this is comprehensive, seems reasonable and I don't think there's anything controversial here.

+1

Though I'd love to get @betodealmeida 's blessing since I believe he implemented the original SQLLAB_BACKEND_PERSISTENCE

@justinpark
Copy link
Member Author

justinpark commented Nov 30, 2022

If it's the latter, we could reuse the same feature flag given that we're planning to clean up the current state, which will modify the existing feature anyway.

@michael-s-molina
I agree with the idea if no concerns found with this movement. I can make the improvement under same feature flag (SQLLAB_BACKEND_PERSISTENCE)

@justinpark justinpark moved this from PRE-DISCUSSION to ACTIVE DISCUSSION in SIPs (Superset Improvement Proposals) Apr 25, 2023
@justinpark justinpark moved this from ACTIVE DISCUSSION to VOTING (in the mailing list) in SIPs (Superset Improvement Proposals) May 15, 2023
@john-bodley john-bodley added the sip Superset Improvement Proposal label Jun 7, 2023
@justinpark justinpark moved this from [VOTE] thread opened to [RESULT][VOTE] Approved in SIPs (Superset Improvement Proposals) Oct 6, 2023
@justinpark justinpark moved this from [RESULT][VOTE] Approved to In Development in SIPs (Superset Improvement Proposals) Oct 27, 2023
@michael-s-molina
Copy link
Member

Closing the SIP as it was approved.

@justinpark justinpark moved this from In Development to Implemented / Done in SIPs (Superset Improvement Proposals) Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sip Superset Improvement Proposal
Projects
Status: Implemented / Done
Development

No branches or pull requests

6 participants