Add unique constraints to schema yaml file #132

stuartmcalpine · 2024-05-31T10:15:20Z

The schema.yaml file now lists the unique column constraints; the creation script reads them during database creation.

I have also added owner and owner_type to the unique constraints of the dataset table.

JoanneBogart

This will do the job, but I think the schema and code would benefit from some reorganization. The goal should be to make the code as generic as possible, so that, for example, adding another table or adding a constraint to a table that didn't have any before doesn't require any changes in the code. I think this can be done by organizing the schema by table, something like

tables:
    execution:
        column_definitions:
            <stuff>
    execution_alias:
        column_definitions:
            <more stuff>
        unique_constraints:
            <other stuff>
    provenance:
        <and so forth>

The tables element isn't strictly necessary but makes it a little clearer what's going on and also leaves a place for non-table-specific information in case we need to add some in the future. Then the code can look like

# load the schema
for tbl in tables:
    <process the columns>
    if "unique_constraints" in tbl.keys():
        <process uniqueness constraints>
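As a runnable illustration of that loop, here is a minimal sketch. It assumes the YAML has already been loaded into a plain dict with the table-by-table layout above; the column and constraint contents and the helper name summarize_tables are hypothetical placeholders, not the registry's real schema or code.

```python
# Hypothetical, already-loaded schema (for example, the parsed schema.yaml).
schema_data = {
    "tables": {
        "execution": {
            "column_definitions": {"execution_id": {"type": "Integer"}},
        },
        "execution_alias": {
            "column_definitions": {"alias": {"type": "String"}},
            "unique_constraints": {"alias_unique": ["alias"]},
        },
    }
}


def summarize_tables(schema):
    """Walk every table generically; no table name is hard-coded."""
    summary = {}
    for table_name, table_def in schema["tables"].items():
        columns = list(table_def["column_definitions"])
        # Optional section: only present when the table declares it.
        constraints = list(table_def.get("unique_constraints", {}))
        summary[table_name] = {
            "columns": columns,
            "unique_constraints": constraints,
        }
    return summary


summary = summarize_tables(schema_data)
```

Adding a table, or adding a first constraint to an existing table, then only changes the data and not the loop.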

The only thing I see which wouldn't immediately fit into this arrangement is the special handling for dependencies, depending on whether a production schema is present or not (does that happen only with sqlite?). We might be able to avoid special code by adding another optional key under a table, say "sqlite" or maybe "no_production" (used for now only by dependency) which describes what's needed. If that's too awkward, there would have to be some conditional code in the table-handling which would only be activated when table is "dependency", but a lot less than there is now.

stuartmcalpine · 2024-06-03T12:41:54Z

Have made the schema.yaml a bit more generic as suggested.

Creating the schema no longer needs an individual function for each table; there is a generic _BuildTable function that takes everything from the yaml.

There is still a bit of a special case for the dependency table in the schema creation, _FixDependencyColumns, but I think it's ok. This reduces the code a good bit, which is nice.

JoanneBogart

The new handling of tables looks good, but more can be done to make this more general. See inline comments for a couple suggestions.

JoanneBogart · 2024-06-03T18:25:33Z

scripts/create_registry_schema.py

-_Execution(schema)
-_ExecutionAlias(schema)
-_Provenance(schema)
+for table_name in [


A construction like this, where the code specifies table names, shouldn't be necessary. There could be a routine get_table_names() which would look them up from schema_data.
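A minimal sketch of such a routine, assuming schema_data is the loaded yaml as a dict with a top-level "tables" key; that layout and the example contents are assumptions for illustration, not the actual code in create_registry_schema.py.

```python
def get_table_names(schema_data):
    """Return the table names declared in the schema, in file order."""
    return list(schema_data["tables"].keys())


# Hypothetical loaded schema; the per-table contents are omitted here.
schema_data = {
    "tables": {"execution": {}, "execution_alias": {}, "provenance": {}}
}

table_names = get_table_names(schema_data)
for table_name in table_names:
    # The generic table builder would be called here.
    pass
```

Because Python dicts preserve insertion order, the names come back in the order they appear in the file, which matters if tables must be created in a particular sequence.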

JoanneBogart · 2024-06-03T18:54:37Z

scripts/create_registry_schema.py

+        "__tablename__": table,
+    }
+
+    if (


This needs to be done in a more general fashion; there are several cases it doesn't handle:

multiple uniqueness constraints

multiple indices

one or more indices but no uniqueness constraints.

The first thing to do is to come up with a better representation of these things in the schema yaml file which supports multiple indices and constraints in a single table. We don't have any need for that now but someday we might. Write a routine which can handle a list of uniqueness constraints, outputting the sort of metadata sqlalchemy expects, and another for a list of indices.

Then, rather than having if ... elif ... elif... to handle all the cases, it would be somewhat nicer (but not crucial) to add on metadata for uniqueness constraints if there are any, then the same thing for indices.
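One way this could look, as a sketch only. It assumes each table may carry optional "unique_constraints" and "indexes" sections that map a name to a list of columns; those key names, the function names, and the example columns other than owner and owner_type are hypothetical. To stay self-contained, the routines return plain (kind, name, columns) tuples. In the real script each entry would presumably become a sqlalchemy UniqueConstraint(*columns, name=name) or Index(name, *columns) inside the table's __table_args__.

```python
def unique_constraint_specs(table_def):
    """Return one (name, columns) pair per declared uniqueness constraint."""
    declared = table_def.get("unique_constraints", {})
    return [(name, tuple(cols)) for name, cols in declared.items()]


def index_specs(table_def):
    """Return one (name, columns) pair per declared index."""
    declared = table_def.get("indexes", {})
    return [(name, tuple(cols)) for name, cols in declared.items()]


def table_args_specs(table_def):
    """Add constraint entries if any, then index entries if any."""
    specs = []
    for name, cols in unique_constraint_specs(table_def):
        specs.append(("unique", name, cols))
    for name, cols in index_specs(table_def):
        specs.append(("index", name, cols))
    return specs


# Hypothetical table definitions covering the cases listed above.
dataset = {
    "unique_constraints": {
        "dataset_u_owner": ["name", "owner", "owner_type"],
        "dataset_u_path": ["relative_path"],
    },
    "indexes": {"dataset_i_owner": ["owner"]},
}
only_indexes = {"indexes": {"ix_a": ["a"], "ix_b": ["b"]}}

dataset_specs = table_args_specs(dataset)
index_only_specs = table_args_specs(only_indexes)
```

Each section is handled independently, so no if/elif chain over the combinations is needed and a table with neither section simply yields an empty list.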

stuartmcalpine · 2024-06-04

I think I have this now; it should be possible to add an arbitrary number of unique constraints and indexes to each table.

I've also added a doc string to the schema.yaml file

JoanneBogart

Looks good!
One minor suggestion (optional): to avoid possible future confusion you might change all occurrences of indexs to indexes since that's an actual word.

Add unique constraints to schema yaml file

1118b04

stuartmcalpine mentioned this pull request May 31, 2024

Change unique constraints, and make then flexible #125

Closed

stuartmcalpine requested a review from JoanneBogart May 31, 2024 10:15

JoanneBogart requested changes May 31, 2024

Make the schema.yaml more generic

ca87303

stuartmcalpine requested a review from JoanneBogart June 3, 2024 12:42

JoanneBogart requested changes Jun 3, 2024

stuartmcalpine added 2 commits June 4, 2024 17:01

Allow for multiple unique constraints and column indexes

c4070ae

Fix typo

2212d3a

stuartmcalpine requested a review from JoanneBogart June 4, 2024 15:04

JoanneBogart approved these changes Jun 4, 2024

stuartmcalpine added 3 commits June 5, 2024 22:59

Fix conflicts

c08edfb

Update version and changelog

b519e24

Fix

9f1c20b

stuartmcalpine merged commit 987067a into main Jun 5, 2024
20 checks passed

stuartmcalpine deleted the u/stuart/update_dataset_constraints branch June 5, 2024 21:13
