Long identifier truncation improvements by labkey-adam · Pull Request #6564 · LabKey/platform

labkey-adam · 2025-04-15T14:58:15Z

Rationale

Not heeding database-specific truncation rules for identifiers has led to several issues related to long column, table, and alias names, particularly those with non-ASCII characters. Here are two example issues that are resolved by this PR:

Issue 52714: Problems with >63 byte at UTF-8 non-ASCII characters in field names
Issue 52210: LKSM: Not showing fields with cyrillic characters in sample type grid

In these cases, DomainImpl.generateStorageColumnName() would call an AliasManager.decideAlias() variant that usually returned the name as the storage column name without first making it "legal," which led to silent truncation to 63 UTF-8 bytes on PostgreSQL. The first fix attempt was to call a different method to ensure makeLegalName() was invoked when generating storage names, but this led to some undesirable behavior, such as provisioned columns named "Group" and "User" being stored as "group_" and "user_" (they're keywords on PostgreSQL). That in turn confused index creation, which of course should know the storage names but currently doesn't.

IMO, provisioned column storage names should match the column names as much as possible (I'm less concerned about this for aliases). We're already quoting them appropriately. So the fix here is to stop using AliasManager to generate storage column names; instead, a new simple class, StorageNameGenerator, is now responsible for generating legal and unique storage column names. It calls the dialect-specific truncation method and uniquifies names with a suffix counter, but otherwise leaves the characters as is. This means we'll start seeing special characters in provisioned tables' column names.
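For readers who want a concrete picture of "truncate, then uniquify with a suffix counter," here is a minimal sketch. It is not the StorageNameGenerator code from this PR: the class name StorageNameSketch, its constructor, and the BiFunction stand-in for the dialect-specific truncation method are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical sketch only; names and signatures are assumptions, not the PR's API.
class StorageNameSketch
{
    private final Set<String> claimed = new HashSet<>();
    private final int maxLength;
    // Stand-in for the dialect-specific truncation: (name, limit) -> truncated name
    private final BiFunction<String, Integer, String> truncate;

    StorageNameSketch(int maxLength, BiFunction<String, Integer, String> truncate)
    {
        this.maxLength = maxLength;
        this.truncate = truncate;
    }

    // Returns a name that fits the limit and has not been handed out before.
    // Characters are left as is; only length and uniqueness are enforced.
    String generate(String columnName)
    {
        String candidate = truncate.apply(columnName, maxLength);
        int counter = 1;
        while (!claimed.add(candidate))
        {
            // Reserve room for the ASCII suffix before truncating again
            String suffix = "_" + counter++;
            candidate = truncate.apply(columnName, maxLength - suffix.length()) + suffix;
        }
        return candidate;
    }
}
```

With a limit of 10 and a simple character-based truncation, asking twice for "Group" yields "Group" and then "Group_1"; the keyword is not rewritten to "group_", since quoting is assumed to happen elsewhere.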

Changes

Introduce StringUtilsLabKey.truncateStartToUtf8ByteLimit() that truncates from the right end of the string. Add tests. Simplify overly complex truncateToUtf8ByteLimit().
Move the truncation and "make legal" methods to SqlDialect to allow these to be dialect-specific
Implement correct truncation in PostgreSQL dialect
Pass in a @NotNull SqlDialect to more AliasManager calls
Introduce FallBackDialect for the unfortunate cases where a SqlDialect is not provided to AliasManager; it implements conservative truncation rules that work across all supported databases
Eliminate useLegacyMaxLength flag
Force callers to specify the number of characters they need reserved for suffixes, etc. Previous code made blanket assumptions that were often incorrect, unnecessary, and/or redundant.
Switch to surrogate-pair-aware truncation so we don't end up with half characters in our names
Widen StorageColumnName and mvIndicatorStorageColumnName to ensure even the longest generated names will fit
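The byte-limit and surrogate-pair points above can be illustrated with a short sketch. This is not the StringUtilsLabKey implementation; the class Utf8TruncationSketch and the method keepPrefixWithinUtf8Bytes are hypothetical names. The idea shown is to walk the string by code point, so a surrogate pair is either kept whole or dropped whole, and to stop before the UTF-8 byte budget would be exceeded.

```java
// Hypothetical sketch only; not the methods added to StringUtilsLabKey in this PR.
class Utf8TruncationSketch
{
    // Number of UTF-8 bytes needed to encode one Unicode code point
    static int utf8Length(int codePoint)
    {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }

    // Keeps the longest prefix of s that encodes to at most maxBytes UTF-8 bytes.
    // Iterating by code point means a surrogate pair is never split in half.
    static String keepPrefixWithinUtf8Bytes(String s, int maxBytes)
    {
        int bytes = 0;
        int i = 0;
        while (i < s.length())
        {
            int cp = s.codePointAt(i);
            int len = utf8Length(cp);
            if (bytes + len > maxBytes)
                break;
            bytes += len;
            i += Character.charCount(cp); // 2 chars for a surrogate pair, else 1
        }
        return s.substring(0, i);
    }
}
```

For example, a name of 40 Cyrillic letters needs 80 UTF-8 bytes, so with a 63-byte limit the sketch keeps 31 letters (62 bytes). A caller that needs room for a suffix would pass a smaller budget, such as 63 minus the suffix length.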

labkey-jeckels

Minor suggestions. I haven't tested the code.

api/src/org/labkey/api/util/StringUtilsLabKey.java

api/src/org/labkey/api/data/dialect/PostgreSql91Dialect.java

api/src/org/labkey/api/query/AliasManager.java

api/src/org/labkey/api/util/StringUtilsLabKey.java

labkey-adam · 2025-05-02T01:01:33Z

Merged to #6498

labkey-matthewb added 30 commits March 25, 2025 14:34

I don't think we need placeholder Results(null)

9d6d158

null check

c9fb866

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

dfea40d

DatabaseIdentifier

f38667d

ColumnInfo.getAlias() ColumnInfo.getSelectName()

DatabaseIdentifier

2c736b2

use SimpleFilter.getSQLFragment(tableInfo)

b0d91ee

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

95a1806

TableInfo.getMetaDataName()

20abb10

comment

7f7b305

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

8c7ed0b

rm _defaultTableInfo

c5286c1

sweep for usages of makeLegalIdentiier() and append(alias)

faca06c

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

ede8f90

DatabaseIdentifier.getString() -> getid()

aac3d9b

some fuzz testing

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

461433f

more fuzz testing

a1f201d

fix double-quoting (makeLegalIdentifier) of names in specimen land

fix exp.xml

002f9a4

fix un-tabled columinfo

232cf39

oops

304f9c9

sql generation

b35c115

makeDatabaseIdentifier()

047ab74

oor

72d451f

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

9d51f8d

!equals()

9cbe193

search for "+ col.getAlias() +" (implicit .toString())

27f124f

support STR_TABLE_ALIAS in SimpleFilter.SQLClause

f9cb64c

support STR_TABLE_ALIAS in SimpleFilter.SQLClause

63bff51

SampleDatasetTable fix

1e56ac1

getProperties()

c2b9144

newCohortLabel

7ca1a4b

labkey-adam added 5 commits April 20, 2025 08:07

More lenient claimName()

3628122

Merge remote-tracking branch 'origin/develop' into fb_long_identifiers

fb47046

Minor cleanup of specimen domain kinds

24dd8a9

IntelliJ tricked me into deleting these...

c378814

Remove temporary assertions

aa3fb0c

labkey-adam requested review from labkey-jeckels and labkey-matthewb April 21, 2025 14:17

labkey-adam added 5 commits April 21, 2025 11:06

StorageNameGenerator junit test

ea235c6

Merge remote-tracking branch 'origin/develop' into fb_long_identifiers

a88e780

Expand StorageColumnName and mvIndicatorStorageColumnName to eliminate awkward truncation rule

db1c79c

Remove unused specimen index methods

d468977

Remove misplaced comment

643bfc6

labkey-jeckels approved these changes Apr 22, 2025

labkey-adam and others added 12 commits April 22, 2025 11:49

StringUtilsLabKey: add cyrillic example, better UTF-8 byte truncation method names, new surrogate-pair-aware char truncation methods, test for broken surrogates. Also, code review feedback.

3358b28

Merge remote-tracking branch 'origin/develop' into fb_long_identifiers

aedc668

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

1388e9b

Fix merge conflicts

841ede3

columnName -> columnNameFragment

97063a4

Comments and visibility

03ad95f

Mostly spelling

ceea609

remove AntTask debug hack

06c4650

Imports

23415fd

Merge remote-tracking branch 'origin/develop' into fb_long_identifiers

867293b

Merge remote-tracking branch 'origin/develop' into fb_databaseidentifier

c6a8b7c

Merge branch 'fb_databaseidentifier' into fb_long_identifiers

d786698

labkey-adam mentioned this pull request Apr 30, 2025

Merge fb_long_identifiers changes into fb_databaseidentifier #6616

Merged

Get platform building again

2f8fa85

labkey-adam closed this May 2, 2025

labkey-adam deleted the fb_long_identifiers branch May 3, 2025 14:40

labkey-adam wants to merge 89 commits into develop from fb_long_identifiers