Merged PRs

dolt

7126: fixes dolt version out of date warning
Changes the dolt out-of-date version warning so that it checks GitHub when the current build version is ahead of the stored latest release version. This fixes inconsistencies when the current build is a version that was released within the past week.
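The comparison described above can be sketched as follows. This is only an illustration, not Dolt's implementation: the function names are hypothetical, and it assumes plain dotted numeric versions such as 1.29.3.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '1.29.3' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def should_recheck_latest_release(build_version: str, stored_latest: str) -> bool:
    """Hypothetical helper: True when the running build is newer than the
    stored latest release, meaning the stored value is stale and should be
    refreshed before any out-of-date warning is shown."""
    return parse_version(build_version) > parse_version(stored_latest)
```

Comparing integer tuples rather than raw strings matters for versions such as 1.10.0 and 1.9.0, where a string comparison would order them incorrectly.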
7125: Clear out reflog contents consistently after GC
Previously, running GC cleared the in-memory reflog data buffer so that only the single most recent entry was kept. For a sql-server, this meant you could log back in and see one entry in the reflog. A sql CLI command runs in a new process that no longer has the reflog data buffer in memory, so it saw an empty reflog. As a result, the reflog looked slightly different after GC depending on whether you were connecting to a sql-server or using the sql CLI (or silently connecting to a sql-server through the sql CLI command when running in local-remote mode).
To smooth this small inconsistency out, the reflog data buffer is now completely cleared out during GC.
7123: dolt table import: json,csv: Support BOM file headers.
The semantics are as follows:
For CSV files, the default import is an uninterpreted character encoding where newline has to match 0xa and the delimiters have to match. In general Dolt expects UTF8, but non-UTF8 characters in string fields can make it through to the imported table for encodings which are close enough to ASCII, for example. If there is a UTF8, UTF16LE or UTF16BE BOM header, then character decoding of the input stream switches to the indicated encoding.
For JSON files, the default import is UTF8 character encoding. If there is a UTF8, UTF16LE or UTF16BE BOM header, then character decoding of the input stream switches to the indicated encoding.
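A minimal Python sketch of the BOM handling described above for JSON input. It is not Dolt's code (Dolt is written in Go), and the function names `detect_bom` and `decode_json_input` are hypothetical. It only covers the three BOMs named in this PR.

```python
import codecs

# BOM byte sequences mapped to the encoding each one indicates.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_json_input(data: bytes) -> str:
    """Decode JSON bytes: honor a BOM if present, otherwise assume UTF-8."""
    encoding, skip = detect_bom(data)
    return data[skip:].decode(encoding or "utf-8")
```

For CSV input the same detection would apply, but with no BOM the bytes would be left uninterpreted instead of being decoded as UTF-8.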
7118: Allow automatic merging in the presence of collation changes.
This allows automatic merging in the case where:
- One branch changes the collation of the column.
- The other branch modifies cells in that column.
It's still a requirement that only one branch is allowed to modify the column definition. So for instance, if one branch changes the collation, and the other branch widens the column, that will still be a schema merge conflict. There's no reason we can't allow it, but the logic is more complicated so I'm saving it for a follow-up PR.
7104: Feature: Support BLOB/TEXT columns in unique indexes, without requiring a prefix length
Allows TEXT and BLOB columns to be used in unique keys, without requiring that a prefix length be specified. This causes the secondary index to store a hash of the content, which is used to enforce the uniqueness constraint. This is useful to enforce uniqueness over very long fields without having to specify a threshold with a prefix length.
This feature is supported by MariaDB and PostgreSQL, but not by MySQL. A new SQL system variable strict_mysql_compatibility is also introduced in case customers want to opt out of extensions like this and stick to the exact behavior of MySQL. The default value of strict_mysql_compatibility is false.
Unique secondary indexes using content-hashed fields have several restrictions, such as not being eligible for use in range scans or in any scans that require a specific order.
There are two remaining tasks to wrap up this feature. Neither one is a correctness issue that would cause incorrect data to be added to the index, so they seemed like good candidates for follow-up PRs.
- Use the real content value in uniqueness constraint error messages – When a unique key violation error is thrown from a content-hashed secondary index, the error message contains the hashed content value instead of the real content value. This makes the error message difficult to use, but doesn't affect correctness, or errors from unique indexes that don't use content-hashed fields.
- Validate real content value on hash collision – When a hash collision occurs, we should fall back to comparing the full content to make sure it's not a false positive, but this is not implemented yet. A collision should be extremely unlikely, and it does not affect unique indexes that don't use content-hashed fields. Without this check we're still enforcing uniqueness; there's just a small risk of a false positive where we'd incorrectly identify two values as the same because their SHA1 hashes are the same.
Depends on: dolthub/go-mysql-server#2186
Related to: #7040
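To illustrate the idea behind content-hashed unique indexes, here is a toy Python sketch. It is not how Dolt stores indexes. The class name is hypothetical, and the only detail taken from the description above is that a SHA1 hash of the content stands in for the content itself.

```python
import hashlib


class ContentHashedUniqueIndex:
    """Toy unique index that keeps a SHA-1 digest in place of each full value."""

    def __init__(self):
        self._digests = set()

    def insert(self, value: bytes) -> None:
        digest = hashlib.sha1(value).digest()  # always 20 bytes, however long the value is
        if digest in self._digests:
            # Only the digest is available here, which mirrors the first
            # follow-up task: the error cannot show the real content value.
            # There is also no comparison against the full content, which
            # mirrors the second follow-up task about hash collisions.
            raise ValueError(f"duplicate unique key: {digest.hex()}")
        self._digests.add(digest)
```

Because the index holds only fixed-size digests, it can answer "has this exact value been seen?" but cannot serve range scans or ordered scans, which matches the restrictions listed above.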

go-mysql-server

2193: Set the original_name field in response metadata in addition to the name field
A customer reported that the MySQL C++ Connector library was unable to retrieve column name information from a Dolt sql-server. Comparing wire captures from MySQL and Dolt showed the cause: the MySQL C++ Connector library pulls the column name from the original_name field, not from the name field.
I've updated the unit tests that assert the expected response metadata fields are populated, and I'll follow up next with some changes in the Dolt repo to our C++ Connector library acceptance tests so that they use response metadata and assert that it is filled in.
After that, it would be good to proactively look at any other response metadata fields that we aren't setting. For example, the Flags field seems important to fill in correctly for tooling to use.
2186: Feature: Support BLOB/TEXT columns in unique indexes, without requiring a prefix length
Allows TEXT and BLOB columns to be used in unique keys, without requiring that a prefix length be specified. This causes the secondary index to store a hash of the content, instead of the content itself, and then that hash is used to enforce the uniqueness constraint. This is useful to enforce uniqueness over very long fields without having to specify a threshold with a prefix length.
This feature is supported by MariaDB and PostgreSQL, but not by MySQL. A new SQL system variable strict_mysql_compatibility is also introduced in case customers want to opt out of extensions like this and stick to the exact behavior of MySQL. The default value of strict_mysql_compatibility is false.
Unique secondary indexes using content-hashed fields have several restrictions, such as not being eligible for use in range scans or in any scans that require a specific order.
The GMS in-memory secondary index implementation takes a simple approach: it doesn't actually hash-encode the content-hashed fields, and instead includes the full column value. This is consistent with how the GMS in-memory index implementation handles other features such as prefix lengths, which are also a no-op, with the full content stored in the secondary index.
Dolt integration: #7104
Related to: #7040

Closed Issues

6709: dolt_merge() MySql return content missing column names since Dolt 1.11.1
7116: Dolt Checkout: -B Support

Latency

Read Tests	MySQL	Dolt	Multiple
covering_index_scan	2.11	2.86	1.4
groupby_scan	13.22	17.32	1.3
index_join	1.34	5.0	3.7
index_join_scan	1.27	2.14	1.7
index_scan	34.33	55.82	1.6
oltp_point_select	0.17	0.43	2.5
oltp_read_only	3.3	7.7	2.3
select_random_points	0.32	0.72	2.2
select_random_ranges	0.39	0.87	2.2
table_scan	34.33	55.82	1.6
types_table_scan	75.82	161.51	2.1
reads_mean_multiplier			2.1

Write Tests	MySQL	Dolt	Multiple
oltp_delete_insert	5.67	5.99	1.1
oltp_insert	2.86	2.97	1.0
oltp_read_write	7.43	14.73	2.0
oltp_update_index	2.86	3.07	1.1
oltp_update_non_index	2.91	2.97	1.0
oltp_write_only	4.03	7.3	1.8
types_delete_insert	5.67	6.55	1.2
writes_mean_multiplier			1.3

Overall Mean Multiple	1.7
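The figures in the write table are consistent with each multiple being the Dolt latency divided by the MySQL latency, rounded to one decimal place, and with each mean being the arithmetic mean of those rounded multiples. The Python sketch below reproduces the write-side numbers under that assumption. The method is inferred from the table, not taken from the benchmark tooling.

```python
from statistics import mean

# (test name, MySQL latency, Dolt latency), copied from the write table above.
WRITE_TESTS = [
    ("oltp_delete_insert", 5.67, 5.99),
    ("oltp_insert", 2.86, 2.97),
    ("oltp_read_write", 7.43, 14.73),
    ("oltp_update_index", 2.86, 3.07),
    ("oltp_update_non_index", 2.91, 2.97),
    ("oltp_write_only", 4.03, 7.3),
    ("types_delete_insert", 5.67, 6.55),
]


def multiple(mysql_latency: float, dolt_latency: float) -> float:
    """Dolt latency as a multiple of MySQL latency, to one decimal place."""
    return round(dolt_latency / mysql_latency, 1)


write_multiples = {name: multiple(m, d) for name, m, d in WRITE_TESTS}
writes_mean = round(mean(list(write_multiples.values())), 1)

# Assumption: the overall figure averages the two per-table means,
# with the reads mean (2.1) taken directly from the read table.
overall_mean = round(mean([2.1, writes_mean]), 1)
```

Under the same assumption, averaging all 18 individual multiples would give about 1.8, so the reported 1.7 appears to be the mean of the two per-table means.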


1.29.3
