Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Dataset tagging import/update api #22304

Draft
wants to merge 10,000 commits into
base: master
Choose a base branch
from

Conversation

cccs-RyanK
Copy link
Contributor

SUMMARY

This work is related to a previous PR which added the ability to tag datasets in the backend. This PR adds the support tobe able to add tags to datasets via the API.

Currently there is an ongoing PR to add tags to the front-end.

It is behind the TAGGING_SYSTEM feature flag

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Tests added to tests/integration_tests/datasets/api_tests.py

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags: TAGGING_SYSTEM
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Dec 1, 2022

Codecov Report

Merging #22304 (f2fda0b) into master (c09b8c6) will decrease coverage by 0.31%.
The diff coverage is 90.00%.

❗ Current head f2fda0b differs from pull request most recent head 5eb40ce. Consider uploading reports for the commit 5eb40ce to get more accurate results

@@            Coverage Diff             @@
##           master   #22304      +/-   ##
==========================================
- Coverage   67.18%   66.87%   -0.31%     
==========================================
  Files        1903     1844      -59     
  Lines       74275    70433    -3842     
  Branches     8110     7670     -440     
==========================================
- Hits        49898    47099    -2799     
+ Misses      22265    21346     -919     
+ Partials     2112     1988     -124     
Flag Coverage Δ
hive 52.48% <32.50%> (?)
mysql 78.00% <90.00%> (-1.35%) ⬇️
postgres 78.06% <90.00%> (-1.36%) ⬇️
presto 52.37% <32.50%> (?)
python 81.26% <90.00%> (+1.71%) ⬆️
sqlite 76.41% <47.50%> (-1.52%) ⬇️
unit 50.88% <40.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/tags/utils.py 82.92% <82.92%> (ø)
superset/datasets/commands/importers/v1/utils.py 63.00% <93.75%> (+9.07%) ⬆️
superset/datasets/dao.py 94.80% <100.00%> (+3.50%) ⬆️
superset/datasets/schemas.py 97.43% <100.00%> (+1.20%) ⬆️
superset/tags/exceptions.py 100.00% <100.00%> (+20.00%) ⬆️
superset/tags/models.py 93.79% <100.00%> (+0.35%) ⬆️

... and 1084 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@@ -148,6 +156,26 @@ def import_dataset(
return dataset


def extract_tags(config: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[List[str]]]:
tags = None
if "tags" in config.keys() and feature_flag_manager.is_feature_enabled(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a broader usage in our app of is_feature_enabled coming from this import directly from superset import is_feature_enabled, any reason why not using that instead?

@@ -167,6 +169,11 @@ def update(
if "metrics" in properties:
cls.update_metrics(model, properties.pop("metrics"), commit=commit)

if "tags" in properties and feature_flag_manager.is_feature_enabled(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

return config, tags


def import_tags(dataset: SqlaTable, existing: bool, tags: Optional[List[str]]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this method, it's only doing something if tags has value in it, so why is tags Optional then? Can we just mark it as required and not call this method if tag if empty for example? or do you see a reason to keep it as is?

@@ -148,6 +156,26 @@ def import_dataset(
return dataset


def extract_tags(config: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[List[str]]]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get a unit test for this util function

return config, tags


def import_tags(dataset: SqlaTable, existing: bool, tags: Optional[List[str]]) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get unit test for this function

from superset.tags.models import get_tag, ObjectTypes, Tag, TaggedObject, TagTypes


def validate_custom_tags(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test?



def add_custom_object_tags(
tags: List[str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test?

rusackas and others added 14 commits June 29, 2023 12:01
Co-authored-by: Aleksey Karpov <al.karpov@okko.tv>
…tils/client-ws-app (apache#24501)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… in /superset-websocket (apache#24522)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…websocket (apache#24524)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Justin Park <justinpark@apache.org>
@cccs-rc cccs-rc force-pushed the Dataset-Tagging-Import/Update-API branch from 4e2b091 to 0505f88 Compare March 6, 2024 19:09
@github-actions github-actions bot added doc Namespace | Anything related to documentation embedded plugins dependencies:npm github_actions Pull requests that update GitHub Actions code packages dependencies:python labels Mar 6, 2024
@geido geido marked this pull request as draft March 8, 2024 18:57
@geido
Copy link
Member

geido commented Mar 8, 2024

Hey @cccs-RyanK I moved this to a draft as it seems it requires a rebase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies:npm dependencies:python doc Namespace | Anything related to documentation embedded github_actions Pull requests that update GitHub Actions code packages plugins review:draft size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet