Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate column tags, use column name for identity in all cases #3963

Closed
druvv opened this issue Jul 29, 2022 · 4 comments · Fixed by #5922
Closed

Eliminate column tags, use column name for identity in all cases #3963

druvv opened this issue Jul 29, 2022 · 4 comments · Fixed by #5922
Labels
bug Something isn't working diff enhancement New feature or request merge Issues relating to merge

Comments

@druvv
Copy link
Contributor

druvv commented Jul 29, 2022

Today, Dolt has an internal tag system that is used to assign identities to columns. The tag system helps Dolt diff and merge columns even when they have been renamed or modified.

For example, let's say I have a users table with a name as the primary key and a phone number stored as an integer. If I rename the phone number column, the internal tag of that column remains the same.

$ dolt sql -q "CREATE table users (name VARCHAR(100) PRIMARY KEY, phone BIGINT);"
$ dolt sql -q "INSERT INTO users VALUES ('druvv', 4224224242);"
$ dolt sql -q "SELECT * from users;"
+-------+------------+
| name  | phone      |
+-------+------------+
| druvv | 4224224242 |
+-------+------------+
$ dolt schema tags
+-------+--------+-------+
| table | column | tag   |
+-------+--------+-------+
| users | name   | 9815  |
| users | phone  | 15960 |
+-------+--------+-------+
$ dolt sql -q "ALTER TABLE users RENAME column phone to phone_number"
$ dolt schema tags
+-------+--------------+-------+
| table | column       | tag   |
+-------+--------------+-------+
| users | name         | 9815  |
| users | phone_number | 15960 |
+-------+--------------+-------+

When I merge the users table with the renamed column against another users table with the original columns names, the columns containing the phone numbers are matched by their tags. Their row values are merged.

$ dolt commit -am "initial data"
commit rqnmsovvbk8nokh2nphbaf4d2pslfasg (HEAD -> main)
Author: druvv <dhruv@dolthub.com>
Date:  Thu Jul 28 16:41:22 -0700 2022

        initial data
$ dolt checkout -b other
Switched to branch 'other'
$ dolt sql -q "INSERT INTO users VALUES ('vader', 9999999999)";
$ dolt commit -am "add vader"
commit dmfftn1quviend96n8ml1qpi7u77b8ul (HEAD -> other)
Author: druvv <dhruv@dolthub.com>
Date:  Thu Jul 28 16:42:23 -0700 2022

        add vader
$ dolt checkout main
$ dolt sql -q "ALTER TABLE users RENAME column phone to phone_number"
commit rinom96u0ilhutuh84gs812lqcmh88ak (HEAD -> main)
Author: druvv <dhruv@dolthub.com>
Date:  Thu Jul 28 16:45:16 -0700 2022

        rename column phone to phone_number
        
$ dolt merge other
Updating rinom96u0ilhutuh84gs812lqcmh88ak..dmfftn1quviend96n8ml1qpi7u77b8ul
users | 1 +
1 tables changed, 1 rows added(+), 0 rows modified(*), 0 rows deleted(-)
$ dolt sql -q "SELECT * from users;"
+-------+--------------+
| name  | phone_number |
+-------+--------------+
| druvv | 4224224242   |
| vader | 9999999999   |
+-------+--------------+

Depending on the sequence of schema alterations that occur on separate branches. The tags of a column that should be mergeable diverge and the merge aborts. For example, let's continue from above:

$ dolt sql -q "ALTER TABLE users ADD COLUMN side VARCHAR(10) DEFAULT 'dark side'"
$ dolt sql -q "ALTER TABLE users MODIFY COLUMN phone_number CHAR(10);"
$ dolt commit -am "add side column and change phone number to char"
commit psdr3929odno8gl1sohbca6hst25rmsi (HEAD -> main)
Merge: rinom96u0ilhutuh84gs812lqcmh88ak dmfftn1quviend96n8ml1qpi7u77b8ul
Author: druvv <dhruv@dolthub.com>
Date:  Thu Jul 28 16:54:57 -0700 2022

        add side column and change phone number to char

$ dolt checkout -b other2
$ dolt sql -q "INSERT INTO users VALUES ('anakin', 1111111111);"
$ dolt sql -q "ALTER TABLE users MODIFY COLUMN phone_number CHAR(10);"
$ dolt commit -am "add anakin and modify column to char"
commit pu9vvg15elualdqve5rgltmlgdikrjes (HEAD -> other2)
Author: druvv <dhruv@dolthub.com>
Date:  Thu Jul 28 16:55:42 -0700 2022

        add anakin and modify column to char
$ dolt checkout main
$ dolt merge other2
Updating psdr3929odno8gl1sohbca6hst25rmsi..pu9vvg15elualdqve5rgltmlgdikrjes
Merge aborted due to error
cause: schema conflict found, merge aborted. Please alter schema to prevent schema conflicts before merging.
       schema conflicts for table users:
        two columns with the same name 'phone_number' have different tags

The statement that led to this error is:
dolt sql -q "ALTER TABLE users ADD COLUMN side VARCHAR(10) DEFAULT 'dark side'"

The reason why the phone_number column's tag differs between the main branch other2 is due to the side column being added prior to the phone_number column's modification. Tags are generated in a psuedo-random manner and they are seeded with the underlying representation of the existing columns. Because the existing columns changed on one branch but not the other, the generated tags differed.

In the near future, we plan to remove the tag system and improve this unexpected behavior. We encourage you to leave feedback below or a reaction if you run into this issue.

@druvv
Copy link
Contributor Author

druvv commented Sep 19, 2022

A simpler reproduction of this issue is the following:

dolt init
dolt sql -q "CREATE table t (pk int PRIMARY KEY);"
dolt commit -Am "create table with key"

dolt checkout -b "right"
dolt sql -q "alter table t add column col1 int;"
dolt sql -q "alter table t add column col2 int;"
dolt commit -Am "add columns in order"

dolt checkout main
dolt sql -q "alter table t add column col2 int;"
dolt sql -q "alter table t add column col1 int;"
dolt commit -Am "add columns in reverse order"

> dolt merge right
Updating b7mkl40n5706vev52gf8v8u6cgisppbr..3t83jotilrp15nc7rovdraff5lhimb03
Unable to stage changes: add and commit to finish merge
error: Failed to commit changes.
cause: schema conflict found, merge aborted. Please alter schema to prevent schema conflicts before merging.
       schema conflicts for table t:
       	two columns with the same name 'col2' have different tags. See https://github.com/dolthub/dolt/issues/3963
       	two columns with the same name 'col1' have different tags. See https://github.com/dolthub/dolt/issues/3963

@timsehn timsehn added merge Issues relating to merge diff labels Oct 5, 2022
@druvv
Copy link
Contributor Author

druvv commented Nov 19, 2022

We added the ability to override the tag system when necessary. This will allow you to unblock yourself if a merge error like this occurs. The command is called dolt schema update-tag. Here is an example on how it can be used:

@test "column_tags: create table on two separate branches, merge them together by updating tags" {

@zachmu zachmu changed the title two columns with the same name have different tags Eliminate column tags, use column name for identity in all cases Feb 11, 2023
@zachmu
Copy link
Member

zachmu commented Feb 11, 2023

I have a simpler ask for this: make tags a storage only detail that never impact the logic of diff and merge. Diff and merge always match by column name.

@timsehn
Copy link
Sponsor Contributor

timsehn commented May 10, 2023

@fulghum is going to fix the specific case of two similarly names and schema'ed tables added on different branches that end up with different tags should merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working diff enhancement New feature or request merge Issues relating to merge
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants