Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store associated production schema when creating non-production schema #124

Merged
merged 14 commits into from
May 31, 2024

Conversation

JoanneBogart
Copy link
Collaborator

Linkage between a non-production schema and a production schema is determined when the non-production schema is created. Save the production schema name in the Provenance table of the non-production schema during creation. Fetch it when a connection to a schema is made.

production but is called something else) and registering "legacy" datasets:
* eliminate old script create_registry_db.py
* add column to provenance table to keep track of associated production db
* make all column names lowercase (easier to deal with in psql)
* hide imports of git as much as possible.  It's not required for typical
  user, only for those creating schemas
* try to keep line length <= 80 in most cases
Copy link
Collaborator

@stuartmcalpine stuartmcalpine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine overall, left some comments in the review.

In addition:

  • Remove GitPython from dependencies from pyproject.toml?
  • CI test for new column in prov?
  • Update changelog and code version
  • Update database version

type: "StringShort"
description: "Describes the software that can read the dataset (e.g., 'gcr-catalogs', 'skyCatalogs')"
cli_optional: True
access_api_configuration:
type: "String"
description: "Additional (text) info which may be needed by access_API"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lowercase the API in the description also

if db_connection.dialect == "sqlite":
prov_name = "provenance"
else:
prov_name = ".".join([self._schema, "provenance"])

# prov_table = self._metadata.tables[prov_name]
if prov_name in self._metadata.tables and get_db_version:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More general comment, is there any reason why prov_name shouldn't in the list of tables, i.e.,

if prov_name in self._metadata.tables

If this is not true, shouldnt an error be raised?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so. This is not code I changed. Perhaps the thinking was then that there should be support for old-style databases with no provenance table, but there is no reason for it now.

if db_connection.dialect == "sqlite":
prov_name = "provenance"
else:
prov_name = ".".join([self._schema, "provenance"])

# prov_table = self._metadata.tables[prov_name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left over comment? Can delete?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

stmt = stmt.order_by(prov_table.c.provenance_id.desc())
with self._engine.connect() as conn:
results = conn.execute(stmt)
conn.commit()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need commit for queries (I think)

Can also remove the commit below if true

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Now removed.

@@ -242,12 +250,17 @@ def _insert_provenance(
One of "create", "migrate"
comment : str, optional
Briefly describe reason for new version
associated_production : str, defaults to "production"
Namae of production schema, if any, this schema may reference
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Name"

@JoanneBogart
Copy link
Collaborator Author

JoanneBogart commented May 29, 2024

I think I've addressed everything except for adding a unit test for the new db field. I'm not sure such a test would accomplish much. In fact I'd like to get rid of test_database.test_db_version (the only way it can fail is if you forget - as I did - to update the test. It doesn't really test the code). I'd also like to get rid of test_database_functions.test_db_version. Here there is something to test - whether the database version of the current code is compatible with the database you're connecting to - but the test as it stands now, since the code version running the query is known to be the same as the one that created the db - doesn't really check that.

Instead there should be a check when people use the code for compatibility: major versions of db should match and minor version db for the code running should be >= minor version of the db it's connecting to. I'll try to get that in this PR.

@stuartmcalpine
Copy link
Collaborator

I'd also like to get rid of test_database_functions.test_db_version.

I'm happy for that "test" to go, it doesn't do anything as you say

@JoanneBogart
Copy link
Collaborator Author

For now I've just eliminated those two tests. While trying to add the code to make the db version check for all users I discovered it was more work than I thought because (unless I'm misunderstanding) the fetch of version happens every time a table is established. It really is only needed once per connection. That will take more time and thought than I want to put in right now.
So this is perhaps ready for review now.

@JoanneBogart JoanneBogart merged commit d77691b into main May 31, 2024
8 checks passed
@JoanneBogart JoanneBogart deleted the u/jrbogart/specify_prod_schema branch May 31, 2024 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants