Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Fix schema checks in database updates, again #974

Merged
merged 1 commit into from
Mar 15, 2024

Conversation

kim
Copy link
Contributor

@kim kim commented Mar 14, 2024

It turns out that the changes introduced in #734 do not result in more reliable detection of incompatible schema updates. This is because the datastructures involved can be converted into each other, but that conversion is not bijective.

Fix this by manually adjusting the schema of the existing table to be comparable to the proposed table.

Also log details about a schema mismatch to the user-retrievable database log, in unified diff format.

Description of Changes

Please describe your change, mention any related tickets, and so on here.

API and ABI breaking changes

If this is an API or ABI breaking change, please apply the
corresponding GitHub label.

Expected complexity level and risk

How complicated do you think these changes are? Grade on a scale from 1 to 5,
where 1 is a trivial change, and 5 is a deep-reaching and complex change.

This complexity rating applies not only to the complexity apparent in the diff,
but also to its interactions with existing and future code.

If you answered more than a 2, explain what is complex about the PR,
and what other components it interacts with in potentially concerning ways.

@kim kim requested review from mamcx and jdetter March 14, 2024 10:37
.iter()
.cloned()
.sorted_by_key(|x| x.col_pos)
.collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like the only change necessary is to sort the columns here, all the other stuff is done in TableSchema::from_def, so I propose that instead, we create TableSchema:sort_columns(mut self) that also recompute the row_type and keep the rest the same:

fn compute_row_type(columns: &[ColumnSchema]) -> ProductType {
        ProductType::new(
            columns
                .iter()
                .map(|c| ProductTypeElement {
                    name: Some(c.col_name.clone()),
                    algebraic_type: c.col_type.clone(),
                })
                .collect(),
        )
    }

    pub fn sort_columns(&mut self) {
        self.columns.sort_by_key(|x| x.col_pos);
        self.row_type = Self::compute_row_type(&self.columns);
    }

then is only:

        if let Some(known_schema) = known_tables.remove(proposed_table_name) {
            let mut known_schema =
                TableSchema::from_def(known_schema.table_id, TableDef::from(known_schema.into_owned()));
            known_schema.sort_columns();

            let proposed_schema = TableSchema::from_def(known_schema.table_id, proposed_schema_def);

            if proposed_schema != known_schema {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not correct, from_def also adds additional “generated” definitions and doesn’t correctly remove them.

Please feel free to fix the datatypes instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that in from_def the generated get duplicates from the ones supplied by TableDef?

Anyway, if is the case then approve this PR and later fix it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, from_def assumes that the given TableDef cannot know about those generated values (because the module cannot know about them). However, converting an existing TableSchema into a TableDef will leave the generated ones in place. So if we turn that into a TableSchema again, there will be duplicates.

I added this to the docstring.

It is hard to tell which conversions should be legal -- my expectation would be that TableSchema -> TableDef should "forget" about anything a TableDef coming from a module would not contain. I would also expect that TableSchema::new ensures the same ordering as TableSchema::from_def.

It is also odd that all collections are Vecs, when semantically they should be sets. I suppose that's because what is stored in the system tables is canonical ultimately.

One approach could be to make validated correct all of those potential pitfalls, and return a newtype. So we know we're comparing Validated(TableSchema) == Validated(Tableschema).

Copy link
Contributor

@mamcx mamcx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Fix the issues with round-trips is for a later PR

@jdetter jdetter added the test-with-bots Recommend to test under higher CCU label Mar 14, 2024
It turns out that the changes introduced in #734 do not result in more
reliable detection of incompatible schema updates. This is because the
datastructures involved can be converted into each other, but that
conversion is not bijective.

Fix this by manually adjusting the schema of the existing table to be
comparable to the proposed table.

Also log details about a schema mismatch to the user-retrievable database log,
in unified diff format.
@kim kim force-pushed the kim/fix-db-updates-again branch from 5bc6d2c to bf633b5 Compare March 15, 2024 06:57
@kim kim added this pull request to the merge queue Mar 15, 2024
Merged via the queue into master with commit e9db89e Mar 15, 2024
6 checks passed
@kim kim deleted the kim/fix-db-updates-again branch March 18, 2024 08:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test-with-bots Recommend to test under higher CCU
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants