Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add unique constraint to tagged_objects #26654

Merged
merged 16 commits into from
Jan 19, 2024
Merged
21 changes: 18 additions & 3 deletions superset/daos/tag.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,14 +51,29 @@
object_type: ObjectType, object_id: int, tag_names: list[str]
) -> None:
tagged_objects = []
for name in tag_names:

# striping and de-dupping
clean_tag_names: set[str] = {tag.strip() for tag in tag_names}

for name in clean_tag_names:
type_ = TagType.custom
tag_name = name.strip()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have led to dups, if you add "hello" and "hello " it would probably trigger an error. Wasn't the root cause but might as well clean it up.

tag = TagDAO.get_by_name(tag_name, type_)
tag = TagDAO.get_by_name(name, type_)
tagged_objects.append(
TaggedObject(object_id=object_id, object_type=object_type, tag=tag)
)

# Check if the association already exists
existing_tagged_object = (
db.session.query(TaggedObject)
.filter_by(object_id=object_id, object_type=object_type, tag=tag)
.first()
)

if not existing_tagged_object:
tagged_objects.append(

Check warning on line 73 in superset/daos/tag.py

View check run for this annotation

Codecov / codecov/patch

superset/daos/tag.py#L73

Added line #L73 was not covered by tests
TaggedObject(object_id=object_id, object_type=object_type, tag=tag)
)

db.session.add_all(tagged_objects)
db.session.commit()

Expand Down
7 changes: 7 additions & 0 deletions superset/migrations/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,10 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import os
import sys

# hack to be able to import / reuse migration_utils.py in revisions
module_dir = os.path.dirname(os.path.realpath(__file__))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a breakthrough, I tried before to get some sort of migration_utils.py module going to reuse code across revisions, and never was able to do it. Here's the hack that enables it. This is great because constraint handling and such get verbose if done right, and all the dialect-specific stuff shouldn't be repeated in each revision. Hoping we can grow migration_utils in the future to simplify and improve revisions.

sys.path.append(module_dir)
46 changes: 46 additions & 0 deletions superset/migrations/migration_utils.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from alembic.operations import BatchOperations, Operations

Check warning on line 18 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L18

Added line #L18 was not covered by tests

naming_convention = {

Check warning on line 20 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L20

Added line #L20 was not covered by tests
"fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
"uq": "uq_%(table_name)s_%(column_0_name)s",
}


def create_unique_constraint(

Check warning on line 26 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L26

Added line #L26 was not covered by tests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we had some trouble in the past with some databases where we couldn't delete the constraint because the constraint name couldn't be inferred. Would it be possible to explicitly pass a constraint name here and in drop_unique_constraint to prevent this from happening again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember seeing this before too, and three's also some instance of us using a naming_convention parameter in alembic. I thought about using this feature but wasn't clear on how it would affect things... Let me try something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the naming_convention feature should do the trick, it's in the latest commit. https://alembic.sqlalchemy.org/en/latest/ops.html#alembic.operations.AbstractOperations.batch_alter_table.params.naming_convention

op: Operations, index_id: str, table_name: str, uix_columns: list[str]
) -> None:
with op.batch_alter_table(

Check warning on line 29 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L29

Added line #L29 was not covered by tests
table_name, naming_convention=naming_convention
) as batch_op:
batch_op.create_unique_constraint(index_id, uix_columns)

Check warning on line 32 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L32

Added line #L32 was not covered by tests


def drop_unique_constraint(op: Operations, index_id: str, table_name: str) -> None:
dialect = op.get_bind().dialect.name

Check warning on line 36 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L35-L36

Added lines #L35 - L36 were not covered by tests

with op.batch_alter_table(

Check warning on line 38 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L38

Added line #L38 was not covered by tests
table_name, naming_convention=naming_convention
) as batch_op:
if dialect == "mysql":

Check warning on line 41 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L41

Added line #L41 was not covered by tests
# MySQL requires specifying the type of constraint
batch_op.drop_constraint(index_id, type_="unique")

Check warning on line 43 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L43

Added line #L43 was not covered by tests
else:
# For other databases, a standard drop_constraint call is sufficient
batch_op.drop_constraint(index_id)

Check warning on line 46 in superset/migrations/migration_utils.py

View check run for this annotation

Codecov / codecov/patch

superset/migrations/migration_utils.py#L46

Added line #L46 was not covered by tests
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
import enum

import migration_utils as utils
import sqlalchemy as sa
from alembic import op
from sqlalchemy import Column, Enum, Integer, MetaData, Table
from sqlalchemy.sql import and_, func, select

# revision identifiers, used by Alembic.
revision = "96164e3017c6"
down_revision = "59a1450b3c10"


class ObjectType(enum.Enum):
# pylint: disable=invalid-name
query = 1
chart = 2
dashboard = 3
dataset = 4


# Define the tagged_object table structure
metadata = MetaData()
tagged_object_table = Table(
"tagged_object",
metadata,
Column("id", Integer, primary_key=True),
Column("tag_id", Integer),
Column("object_id", Integer),
Column("object_type", Enum(ObjectType)), # Replace ObjectType with your Enum
)

index_id = "uix_tagged_object"
table_name = "tagged_object"
uix_columns = ["tag_id", "object_id", "object_type"]


def upgrade():
bind = op.get_bind() # Get the database connection bind

# Reflect the current database state to get existing tables
metadata.reflect(bind=bind)

# Delete duplicates if any
min_id_subquery = (
select(
[
func.min(tagged_object_table.c.id).label("min_id"),
tagged_object_table.c.tag_id,
tagged_object_table.c.object_id,
tagged_object_table.c.object_type,
]
)
.group_by(
tagged_object_table.c.tag_id,
tagged_object_table.c.object_id,
tagged_object_table.c.object_type,
)
.alias("min_ids")
)

delete_query = tagged_object_table.delete().where(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a little scary, needs review, but should be ok

tagged_object_table.c.id.notin_(select([min_id_subquery.c.min_id]))
)

bind.execute(delete_query)

# Create unique constraint
utils.create_unique_constraint(op, index_id, table_name, uix_columns)


def downgrade():
utils.drop_unique_constraint(op, index_id, table_name)
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""merging two heads

Revision ID: 15a2c68a2e6b
Revises: ('96164e3017c6', 'a32e0c4d8646')
Create Date: 2024-01-18 12:12:52.174742

"""

# revision identifiers, used by Alembic.
revision = "15a2c68a2e6b"
down_revision = ("96164e3017c6", "a32e0c4d8646")

import sqlalchemy as sa
from alembic import op


def upgrade():
pass


def downgrade():
pass
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""merging

Revision ID: 1cf8e4344e2b
Revises: ('e863403c0c50', '15a2c68a2e6b')
Create Date: 2024-01-19 08:42:37.694192

"""

# revision identifiers, used by Alembic.
revision = "1cf8e4344e2b"
down_revision = ("e863403c0c50", "15a2c68a2e6b")

import sqlalchemy as sa
from alembic import op


def upgrade():
pass


def downgrade():
pass
Loading
Loading