Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dolt table import requires an empty column for AUTO_INCREMENT columns #5848

Closed
alecstein opened this issue May 1, 2023 · 5 comments
Closed
Assignees
Labels
bug Something isn't working cli enhancement New feature or request

Comments

@alecstein
Copy link
Contributor

alecstein commented May 1, 2023

This works.

create table bodegas (
  bodega_id INT AUTO_INCREMENT,
  bodega_name VARCHAR(200),
  primary key (`bodega_id`)
  );
  
insert into bodegas values (NULL, 'The Bodega of Cats')

This also works.

create table bodegas (
  bodega_id INT AUTO_INCREMENT,
  bodega_name VARCHAR(200),
  primary key (`bodega_id`)
  );
  
insert into bodegas (bodega_name) values ('The Bodega of Cats')

However this does not work:

create table bodegas (
  bodega_id INT AUTO_INCREMENT,
  bodega_name VARCHAR(200),
  primary key (`bodega_id`)
  );

bodegas.csv

bodega_name
The Bodega of Cats
> dolt table import -u bodegas.csv
Error determining the output schema.
cause: input primary keys do not match primary keys of existing table
When attempting to move data from csv file:bodega.csv to bodegas, could not determine the output schema.
Schema File: ""
explicit pks: ""

To restate the point more concisely, you can do insert into bodegas (bodega_name) values ('The Bodega of Cats') without drawing a foul, but you can't insert the same data

bodega_name
The Bodega of Cats

from a CSV.

You should be allowed to insert files via CSV without providing the bodega_id column.

@alecstein alecstein added the bug Something isn't working label May 1, 2023
@alecstein alecstein changed the title dolt table import requires an empty column fro AUTO_INCREMENT columns dolt table import requires an empty column for AUTO_INCREMENT columns May 1, 2023
@alecstein alecstein added the enhancement New feature or request label May 1, 2023
@fulghum
Copy link
Contributor

fulghum commented May 2, 2023

It seems like dolt table import's logic to map from the source csv to the destination table schema is smart enough to know that any columns with default values don't need to be included in the csv. Changing the repro in issue #5855 to use a column default (instead of a trigger) allowed dolt table import to happily import the csv.

dolt table import doesn't seem to make the same decision for an auto_increment column though. If we're going to do that for cols with default values, it seems fair to do the same for auto_increment columns.

The docs for dolt table import don't explain this detail today either:

If the schema for the existing table does not match the schema for the new file, the import will be aborted by default. To overwrite both the table and the schema, use -c -f.

Seems like we should make two changes:

  • treat auto_increment columns as optional to specify in the input file for dolt table import, just like we do for default columns.
  • update the docs for dolt table import to explain how we handle columns with default values that don't need to be supplied.

@timsehn timsehn added the cli label May 3, 2023
@nicktobey nicktobey self-assigned this May 3, 2023
@nicktobey
Copy link
Contributor

This appears to be a case of overzealous checks. import.go::newImportSqlEngineMover has a check that every PK exists in the csv and errors if it doesn't. Removing that check prints a warning further down the line:

Warning: There are fewer columns in the import file's schema than the table's schema.
If unintentional, check for any typos in the import file's header.

But all rows get successfully added.

We probably want to fix this check to skip auto incrementing rows.

What's curious is why the workaround from #5855 worked, if this check exists. It turns out, our SQL engine appears to consider a result table column to not be a primary key if it has a default value!

@alecstein
Copy link
Contributor Author

What's curious is why the workaround from #5855 worked, if this check exists. It turns out, our SQL engine appears to consider a result table column to not be a primary key if it has a default value!

You can correct me if I'm wrong, but here's how I interpret the CSV import logic: "make sure that the CSV has all the necessary primary key columns, except if it has a default value for one of those columns. In that case, replace the column with the default value." This is equivalent to saying "if it has a default value, consider this column to not be a primary key, for the purpose of this initial check." Which seems reasonable to me.

However I still think that the "correct" way to import a CSV is to check that the inserted row has all of its primary key values upon inserting, and not upon reading. Whether this can be done efficiently I don't really know.

@nicktobey
Copy link
Contributor

There's definitely a bug on our end where tables with default values have an incorrect schema. I filed it as dolthub/go-mysql-server#1753

I agree with your reasoning here though: we perform this check as soon as the CSV is read, but it's overbroad and we should instead be running checks like this when we're actually performing the inserts... and we are. This check appears to be redundant, in order to fail early and return a more helpful error message. But it's not helpful if it's wrong.

I could relax the check, but I don't think we're going to be able to fix #5855 unless we remove it entirely. So that's what I'm going to do.

@nicktobey
Copy link
Contributor

Should be fixed as of #5881

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cli enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants