-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unique constraints to schema yaml file #132
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will do the job, but I think the schema and code would benefit from some reorganization. The goal should be to make the code as generic as possible, so that, for example, adding another table or adding a constraint to a table that didn't have any before doesn't require any changes in the code. I think this can be done by organizing the schema by table, something like
tables:
execution:
column_definitions:
<stuff>
execution_alias:
column_definitions:
<more stuff>
unique_constraints:
<other stuff>
provenance:
<and so forth>
The tables
element isn't strictly necessary but makes it a little clearer what's going on and also leaves a place for non-table-specific information in case we need to add some in the future. Then the code can look like
# load the schema
for tbl in tables:
<process the columns>
if "unique_constraints" in tbl.keys():
<process uniqueness constraints>
The only thing I see which wouldn't immediately fit into this arrangement is the special handling for dependencies, depending on whether a production schema is present or not (does that happen only with sqlite?). We might be able to avoid special code by adding another optional key under a table, say "sqlite" or maybe "no_production" (used for now only by dependency
) which describes what's needed. If that's too awkward, there would have to be some conditional code in the table-handling which would only be activated when table is "dependency", but a lot less than there is now.
Have made the schema.yaml a bit more generic as suggested. Creating the schema now no longer needs individual functions for each table, there is a generic Still a bit of a special case for the dependency table in the schema creation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new handling of tables looks good, but more can be done to make this more general. See inline comments for a couple suggestions.
scripts/create_registry_schema.py
Outdated
_Execution(schema) | ||
_ExecutionAlias(schema) | ||
_Provenance(schema) | ||
for table_name in [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A construction like this, where the code specifies table names, shouldn't be necessary. There could be a routine get_table_names()
which would look them up from schema_data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
scripts/create_registry_schema.py
Outdated
"__tablename__": table, | ||
} | ||
|
||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be done in a more general fashion; there are several cases it doesn't handle:
- multiple uniqueness constraints
- multiple indices
- one or more indices but no uniqueness constraints.
The first thing to do is to come up with a better representation of these things in the schema yaml file which supports multiple indices and constraints in a single table. We don't have any need for that now but someday we might. Write a routine which can handle a list of uniqueness constraints, outputting the sort of metadata sqlalchemy expects, and another for a list of indices.
Then, rather than having if
... elif
... elif
... to handle all the cases, it would be somewhat nicer (but not crucial) to add on metadata for uniqueness constraints if there are any, then the same thing for indices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I have this now, should be able to add an arbitrary number of unique constraints and indexes to each table.
I've also added a doc string to the schema.yaml file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
One minor suggestion (optional): to avoid possible future confusion you might change all occurrences of indexs
to indexes
since that's an actual word.
The
schema.yaml
file now has the list of unique column constraints, this is read during database creation in the creation script.Also have added
owner
andowner_type
to the unique constraints of thedataset
table.