fix: Add migration to add created_by_fk as explicit owner for charts and datasets #20617

Merged: 2 commits merged on Jul 26, 2022

john-bodley
Member

@john-bodley john-bodley commented Jul 6, 2022

SUMMARY

#19854 introduced frontend logic which checks that (for Alpha users) only owners can edit datasets. The issue is that the definition of ownership differs between the frontend and the backend. Specifically, the frontend merely checks whether the user is listed as an owner (per the owners relationship), whereas the backend uses the check_ownership method, which checks (in addition to the owner and owners relationships) whether the user is the creator. This discrepancy means that creators who are not explicitly listed as owners are unable to edit their datasets.
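
To make the mismatch concrete, here is a minimal sketch of the two checks. The `User` and `Dataset` classes and both function names are hypothetical stand-ins for illustration; they are not Superset's actual models or the real check_ownership implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    id: int


@dataclass
class Dataset:
    # Hypothetical stand-in for a dataset model with a creator and explicit owners.
    created_by: Optional[User]
    owners: List[User] = field(default_factory=list)


def frontend_is_owner(user: User, dataset: Dataset) -> bool:
    """Frontend-style check: only the explicit owners list counts."""
    return any(owner.id == user.id for owner in dataset.owners)


def backend_is_owner_legacy(user: User, dataset: Dataset) -> bool:
    """Legacy backend-style check: explicit owners plus the creator."""
    owner_ids = {owner.id for owner in dataset.owners}
    if dataset.created_by is not None:
        owner_ids.add(dataset.created_by.id)
    return user.id in owner_ids


# A historical dataset whose creator was never added to the owners list.
creator = User(id=1)
legacy_dataset = Dataset(created_by=creator)

frontend_allows = frontend_is_owner(creator, legacy_dataset)
backend_allows = backend_is_owner_legacy(creator, legacy_dataset)
```

Under these assumptions the backend would accept the edit while the frontend blocks it, which is the situation described above.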

Unwinding the logic surfaces additional observations regarding ownership for various assets, i.e., charts, dashboards, datasets, and reports:

  1. The DAO logic correctly adds the creator as an explicit owner (example), so this isn't an issue for any asset which was created via a DAO command.
  2. Prior to the DAO logic, only dashboards contained pre_add logic which added the creator as an explicit owner, meaning that this issue only impacts historical charts and datasets.
  3. The check_ownership method contained a check for an owner relationship; however, grepping the code per git grep "\bowner = relationship\b" yielded no matches, i.e., the check served no purpose.

Given these insights this PR performs the following:

  1. It adds a database migration to add all creators as owners (if not already listed) for both charts and datasets. Granted, this may be a little presumptuous, i.e., we may be re-adding someone as an "owner" who was previously removed; however, there's no way to determine the intent behind the current database state.
  2. It updates the check_ownership method to remove checking the owner (obsolete) and created_by (addressed in the migration) fields.
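
Roughly, the backfill in item 1 amounts to an insert-from-anti-join per asset type. The sketch below is not the actual Alembic migration: it runs against an in-memory SQLite database, uses a hypothetical helper named `add_creators_as_owners`, and covers only the slices/slice_user pair with simplified schemas. The `created_by_fk IS NOT NULL` guard is an assumption of this sketch, not something stated in the PR.

```python
import sqlite3


def add_creators_as_owners(conn, asset_table, assoc_table, asset_fk):
    """Insert (asset, creator) pairs that are missing from the association table.

    Table and column names are interpolated directly, so only pass trusted,
    hard-coded identifiers (as a migration would).
    """
    conn.execute(
        f"""
        INSERT INTO {assoc_table} ({asset_fk}, user_id)
        SELECT a.id, a.created_by_fk
        FROM {asset_table} AS a
        LEFT OUTER JOIN {assoc_table} AS au
          ON a.id = au.{asset_fk} AND a.created_by_fk = au.user_id
        WHERE au.{asset_fk} IS NULL
          AND a.created_by_fk IS NOT NULL  -- assumption: skip assets with no creator
        """
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE slices (id INTEGER PRIMARY KEY, created_by_fk INTEGER);
    CREATE TABLE slice_user (
        id INTEGER PRIMARY KEY, slice_id INTEGER, user_id INTEGER
    );
    -- Chart 1: creator already an owner. Chart 2: owned by someone else.
    -- Chart 3: no recorded creator.
    INSERT INTO slices (id, created_by_fk) VALUES (1, 10), (2, 20), (3, NULL);
    INSERT INTO slice_user (slice_id, user_id) VALUES (1, 10), (2, 99);
    """
)

add_creators_as_owners(conn, "slices", "slice_user", "slice_id")
rows = conn.execute(
    "SELECT slice_id, user_id FROM slice_user ORDER BY slice_id, user_id"
).fetchall()
# rows -> [(1, 10), (2, 20), (2, 99)]
```

Running the helper a second time inserts nothing further, since every creator is then already matched by the join.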

Finally a few things worth noting:

  1. The frontend likely should rely on an API call to check ownership to adhere to the DRY principle and ensure that the frontend and backend logic for ownership is consistent.
  2. The backend SQLAlchemy pre_add, pre_update, etc. checks should likely be deprecated given the logic also resides in the DAO commands. This really muddies the code logic and makes it hard to grok how things work (or should work).
  3. I'm not really sure why the DAO validate command also populates fields, i.e., adds owners. This seems somewhat contradictory and non-intuitive, i.e., validation shouldn't update the model. I gather it was likely for efficiency reasons, but I sense there is merit in rewriting the DAO logic to break up the validation and population phases.
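
As a rough illustration of item 3, validation and population could be split into separate phases along these lines. Every name here (`CreateDatasetCommand`, `populate`, `CommandValidationError`) is hypothetical and chosen for the sketch; this is not how the existing DAO commands are structured.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class CommandValidationError(Exception):
    """Raised when the command's inputs are invalid (hypothetical)."""


@dataclass
class CreateDatasetCommand:
    actor_id: int
    table_name: str
    owner_ids: List[int] = field(default_factory=list)

    def validate(self) -> None:
        """Check inputs only; never mutate the command's properties."""
        if not self.table_name.strip():
            raise CommandValidationError("table_name must not be empty")

    def populate(self) -> None:
        """Fill in derived fields, e.g. make the creator an explicit owner."""
        if self.actor_id not in self.owner_ids:
            self.owner_ids.append(self.actor_id)

    def run(self) -> Dict[str, object]:
        self.validate()
        self.populate()
        # A real command would hand off to the DAO here; the sketch just
        # returns the properties that would be persisted.
        return {"table_name": self.table_name, "owner_ids": list(self.owner_ids)}


command = CreateDatasetCommand(actor_id=7, table_name="sales")
command.validate()
owners_after_validate = list(command.owner_ids)  # still empty
result = command.run()
```

The point of the split is that calling `validate()` on its own has no side effects, so the ownership logic lives in exactly one place.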

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Tested the migration and confirmed that,

SELECT 
  s.id, 
  s.created_by_fk
FROM 
  slices s
LEFT OUTER JOIN 
  slice_user su
ON 
 s.id = su.slice_id AND 
 s.created_by_fk = su.user_id  
WHERE 
  su.slice_id IS NULL

,

SELECT 
  sd.id, 
  sd.created_by_fk
FROM 
   sl_datasets sd
LEFT OUTER JOIN 
  sl_dataset_users sdu
ON 
 sd.id = sdu.dataset_id AND 
 sd.created_by_fk = sdu.user_id  
WHERE 
  sdu.dataset_id IS NULL

and

SELECT 
  t.id, 
  t.created_by_fk
FROM 
  tables t
LEFT OUTER JOIN 
  sqlatable_user su
ON 
 t.id = su.table_id AND 
 t.created_by_fk = su.user_id  
WHERE 
  su.table_id IS NULL

returned no rows.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@john-bodley john-bodley requested a review from a team as a code owner July 6, 2022 06:22
@john-bodley john-bodley changed the title fix: Add migration to add created_by_fk as owner fix: Add migration to add created_by_fk as explicit owner for charts and datasets Jul 6, 2022
@codecov

codecov bot commented Jul 6, 2022

Codecov Report

Merging #20617 (da6cecd) into master (f0ca158) will decrease coverage by 11.95%.
The diff coverage is 0.00%.

❗ Current head da6cecd differs from pull request most recent head fe93669. Consider uploading reports for the commit fe93669 to get more accurate results

@@             Coverage Diff             @@
##           master   #20617       +/-   ##
===========================================
- Coverage   66.79%   54.84%   -11.96%     
===========================================
  Files        1753     1752        -1     
  Lines       65618    65613        -5     
  Branches     6952     6938       -14     
===========================================
- Hits        43831    35986     -7845     
- Misses      20023    27867     +7844     
+ Partials     1764     1760        -4     
Flag      Coverage Δ
hive      ?
mysql     ?
postgres  ?
presto    53.69% <0.00%> (-0.06%) ⬇️
python    58.06% <0.00%> (-24.82%) ⬇️
sqlite    ?
unit      50.68% <0.00%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                  Coverage Δ
superset/views/base.py                          55.70% <0.00%> (-19.66%) ⬇️
superset/utils/dashboard_import_export.py       0.00% <0.00%> (-100.00%) ⬇️
superset/key_value/commands/upsert.py           0.00% <0.00%> (-89.14%) ⬇️
superset/key_value/commands/update.py           0.00% <0.00%> (-88.89%) ⬇️
superset/key_value/commands/delete.py           0.00% <0.00%> (-85.30%) ⬇️
superset/db_engines/hive.py                     0.00% <0.00%> (-85.19%) ⬇️
superset/key_value/commands/delete_expired.py   0.00% <0.00%> (-80.77%) ⬇️
superset/dashboards/commands/importers/v0.py    15.62% <0.00%> (-76.25%) ⬇️
superset/datasets/commands/update.py            25.88% <0.00%> (-68.10%) ⬇️
superset/datasets/commands/create.py            30.18% <0.00%> (-67.86%) ⬇️
... and 298 more

Member

@ktmud ktmud left a comment

The change makes sense to me.

)
.filter(SqlaTableUser.table_id == None),
)
)
Member

Should we do the same for the new dataset models, too? The shadow-writing is already in progress.

dataset_user_association_table = sa.Table(
    "sl_dataset_users",
    Model.metadata,  # pylint: disable=no-member
    sa.Column("dataset_id", sa.ForeignKey("sl_datasets.id"), primary_key=True),
    sa.Column("user_id", sa.ForeignKey("ab_user.id"), primary_key=True),
)

Member Author

Thanks @ktmud for the tip. I've updated the migration and re-tested.

@ktmud
Member

ktmud commented Jul 6, 2022

we may be re-adding someone as an "owner" who was previously removed

Since the problem seems to be that a legacy dataset not created via the DAO now has an empty owners list because the creator was never explicitly added, I wonder if it would be safer to set the creator as the sole owner only for datasets/slices where the owners list is empty?

@john-bodley
Member Author

@ktmud per your comment,

Since the problem seems to be that a legacy dataset not created via the DAO now has an empty owners list because the creator was never explicitly added, I wonder if it would be safer to set the creator as the sole owner only for datasets/slices where the owners list is empty?

I hear what you're saying, though the old check_ownership logic always treated the creator as an owner regardless of who was listed, so I think you could argue either way as to which approach is best (acknowledging that neither is perfect). The current migration ensures parity with the old code logic, whereas your suggestion likely reduces churn, i.e., it avoids re-adding a creator who was previously removed as an owner.
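
For illustration, the two options differ only in how they treat assets that already have some owner. A minimal sketch over plain lists of user ids (both function names are hypothetical, and neither is the migration code itself):

```python
from typing import List, Optional


def always_add_creator(owners: List[int], creator: Optional[int]) -> List[int]:
    """Current migration's behaviour: the creator always ends up as an owner."""
    if creator is None or creator in owners:
        return list(owners)
    return [*owners, creator]


def add_creator_if_unowned(owners: List[int], creator: Optional[int]) -> List[int]:
    """Suggested alternative: only backfill when the owners list is empty."""
    if owners or creator is None:
        return list(owners)
    return [creator]


# An asset with no owners: both strategies agree.
unowned_a = always_add_creator([], 1)
unowned_b = add_creator_if_unowned([], 1)

# An asset owned by user 2 but created by user 1: the strategies diverge.
owned_a = always_add_creator([2], 1)
owned_b = add_creator_if_unowned([2], 1)
```

In the second case the first strategy restores the creator (matching the old check_ownership semantics), while the second leaves the existing owners untouched.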

@john-bodley
Member Author

ping @ktmud

@john-bodley john-bodley requested a review from ktmud July 20, 2022 03:07
@ktmud
Member

ktmud commented Jul 20, 2022

Do we need a rebase? A new migration was merged recently.

@john-bodley john-bodley merged commit e1094e2 into apache:master Jul 26, 2022
@john-bodley john-bodley deleted the john-bodley--fix-ownership branch July 26, 2022 17:22
@AAfghahi
Member

AAfghahi commented Aug 1, 2022

Hello, just flagging this: when I ran the migrations against a database that contains only the examples, I had to drop the database and then re-populate it.

@ktmud
Member

ktmud commented Aug 1, 2022

@AAfghahi can you share the error messages you got?

@AAfghahi
Member

AAfghahi commented Aug 1, 2022

Hey, sorry, I thought I had written in here, but apparently I forgot to press comment.

Unfortunately, I did not save the error message. However, I remember that it would hit this migration and then error out because sl_dataset_user (which I think is the new fk) was null. The SQLAlchemy error said that it was most likely a database error.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024