Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(reports): Clear last value when state is WORKING #19941

Merged

Conversation

john-bodley
Copy link
Member

@john-bodley john-bodley commented May 4, 2022

SUMMARY

This PR fixes an issue with the alert feature, specifically the execution log reports different alert values for different states within the same execution run. Per this Slack thread the underlying issue is the report schedule synopsis—from which the execution log is created, i.e., a point-in-time snapshot—is when the state is WORKING the last_value and last_value_row_json need to be cleared otherwise they incorrectly reflect the values of the previous execution run.

I’m actually not sure whether there’s any value logging the WORKING state to the execution log. Does it provide any relevant information to the user?

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before Migration

Screen Shot 2022-05-03 at 8 06 09 PM

After Migration

Screen Shot 2022-05-03 at 8 06 37 PM

TESTING INSTRUCTIONS

Added unit tests and migration. The migration successfully ran in under a minute on a replica of Airbnb's production database with ~ 50,000 execution logs.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley requested a review from a team as a code owner May 4, 2022 03:17
@@ -94,35 +94,39 @@ def __init__(
self._start_dttm = datetime.utcnow()
self._execution_id = execution_id

def set_state_and_log(
def update_report_schedule_and_log(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt that update_report_schedule_and_log was more reflective of the function logic.

self._report_schedule.last_state = state
self._report_schedule.last_eval_dttm = dttm
self._report_schedule.last_eval_dttm = datetime.utcnow()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to pass in the timestamp which isn't used elsewhere.

@@ -132,7 +136,7 @@ def create_log(
end_dttm=datetime.utcnow(),
value=self._report_schedule.last_value,
value_row_json=self._report_schedule.last_value_row_json,
state=state,
state=self._report_schedule.state,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to pass in the state given the report schedule record has been updated with the new state.

@codecov
Copy link

codecov bot commented May 4, 2022

Codecov Report

Merging #19941 (17ed438) into master (aff10a7) will decrease coverage by 0.19%.
The diff coverage is 86.66%.

@@            Coverage Diff             @@
##           master   #19941      +/-   ##
==========================================
- Coverage   66.52%   66.33%   -0.20%     
==========================================
  Files        1714     1712       -2     
  Lines       65052    64049    -1003     
  Branches     6722     6722              
==========================================
- Hits        43279    42489     -790     
+ Misses      20061    19848     -213     
  Partials     1712     1712              
Flag Coverage Δ
hive 53.67% <20.00%> (+0.76%) ⬆️
mysql 82.03% <86.66%> (+0.09%) ⬆️
postgres 82.09% <86.66%> (+0.10%) ⬆️
presto 53.53% <20.00%> (+0.77%) ⬆️
python 82.52% <86.66%> (+0.11%) ⬆️
sqlite 81.83% <86.66%> (+0.08%) ⬆️
unit 49.03% <0.00%> (+0.99%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/reports/commands/execute.py 91.60% <86.66%> (+0.09%) ⬆️
superset/views/users/api.py 86.20% <0.00%> (-2.69%) ⬇️
superset/datasets/commands/importers/v0.py 91.47% <0.00%> (-2.15%) ⬇️
superset/connectors/base/models.py 86.70% <0.00%> (-1.95%) ⬇️
superset/tasks/scheduler.py 67.39% <0.00%> (-1.50%) ⬇️
superset/models/helpers.py 90.40% <0.00%> (-0.74%) ⬇️
superset/sql_validators/presto_db.py 82.43% <0.00%> (-0.47%) ⬇️
superset/initialization/__init__.py 91.42% <0.00%> (-0.30%) ⬇️
superset/reports/dao.py 83.20% <0.00%> (-0.27%) ⬇️
superset/views/utils.py 80.69% <0.00%> (-0.24%) ⬇️
... and 43 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aff10a7...17ed438. Read the comment docs.

@john-bodley john-bodley force-pushed the john-bodley--fix-report-last-value branch from 39d2abf to 6903c48 Compare May 4, 2022 21:38
@john-bodley john-bodley requested a review from ktmud May 5, 2022 00:11
@ktmud
Copy link
Member

ktmud commented May 5, 2022

Is the report scheduler currently adding two execution logs for every execution, one in WORKING state, one in COMPLETE? Was the original intention to update current WORKING execution to COMPLETE (or FAILED) once an execution is done?

@john-bodley john-bodley requested a review from etr2460 May 6, 2022 17:18
.all()
):
schedule.last_value = None
schedule.last_value_row_json = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley do you think there's a chance that if there is a report that's in a working state, that we could potentially wind up altering the value after the report has completed? Just based off your migration timing of around a minute and most of these reports take under a minute to run, we may wind up with some reports that wind up with a completed state and no value, i.e., the commit happens here after the state has changed on the report side. Just wondering if it's safer not to change reports that are in progress.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eschutho I guess with any migration there's a risk of a race condition happening whilst the migration is running. I've replicated the session.commit() to ensure that there's also a commit directly after the ReportSchedule which likely only has a handful of reports in WORKING state.

Copy link
Member

@eschutho eschutho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Just one question, but not a blocker.

@eschutho
Copy link
Member

eschutho commented May 9, 2022

Is the report scheduler currently adding two execution logs for every execution, one in WORKING state, one in COMPLETE?

Yes, the logs are immutable, so you'll get two logs, one for each state.

Was the original intention to update current WORKING execution to COMPLETE (or FAILED) once an execution is done?

No, I don't believe this was the intention.

@eschutho
Copy link
Member

eschutho commented May 9, 2022

I’m actually not sure whether there’s any value logging the WORKING state to the execution log. Does it provide any relevant information to the user?

@john-bodley I suppose it could give the user a start time. It's usually the same as the scheduled time, but if the report is taking a long time to run, it could indicate to the user that something is happening.

@ktmud
Copy link
Member

ktmud commented May 10, 2022

No, I don't believe this was the intention.

Then I don't understand why we have an end_dttm (and "duration") for each log? If we are not updating the log record, i.e., it is a "fire and forgot" operation, shouldn't a start_ddm be enough? The log list table is also less readable when you have duplicate records with same execution id, same timestamps but one has empty values and a spinner icon that never stops.

@john-bodley
Copy link
Member Author

I think this PR makes things less wrong so it's directionally correct. There's likely merit in rethinking what the desirable UX is from a reporting standpoint, though that can be handled in a different PR.

@john-bodley john-bodley merged commit 89a844a into apache:master May 13, 2022
@john-bodley john-bodley deleted the john-bodley--fix-report-last-value branch May 13, 2022 04:54
philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
* fix(reports): Clear last value when state is WORKING

* Update cbe71abde154_fix_report_schedule_and_log.py

Co-authored-by: John Bodley <john.bodley@airbnb.com>
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants