feat(core): hash entity identifiers if too long for db engine #846
Merged
AlessandroPomponio merged 1 commit on Apr 14, 2026
Conversation
Signed-off-by: Alessandro Pomponio <alessandro.pomponio1@ibm.com>
Member
We should probably note in the title and description which DB has the restriction, e.g. MySQL has a limit of x on primary keys. Since entity id is primary … we should also note whether or not SQLite has a similar issue.
Member
Author
@michael-johnston let me know if this is better
michael-johnston
approved these changes
Apr 14, 2026
Summary
This PR adds automatic hashing for long entity identifiers to prevent database field length violations. The entity identifier is used as a primary key in SQLSampleStore. When an entity identifier exceeds 700 characters, it is now hashed using SHA-256 so that it stays within the 768-character limit that InnoDB (MySQL) imposes on indexed columns, while remaining unique and deterministic.
NOTE: this issue stems from a limitation of the InnoDB engine on MySQL when used with the utf8mb4 charset. Attempting to create a VARCHAR column longer than 768 characters with an index on it (such as a primary key) fails with a "key too long" error. InnoDB limits index keys to 3072 bytes and utf8mb4 uses up to 4 bytes per character, so 3072/4 = 768 characters is the maximum length possible.
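The arithmetic behind the 768-character figure can be checked with a small sketch. The constant and function names below are hypothetical and only illustrate the calculation; they are not part of this PR. The sketch assumes the worst case of 4 bytes per character under utf8mb4 and a 3072-byte index key limit.

```python
# Assumed figures: a 3072-byte InnoDB index key limit and a worst case of
# 4 bytes per character under utf8mb4. All names here are hypothetical.
INNODB_MAX_KEY_BYTES = 3072
UTF8MB4_MAX_BYTES_PER_CHAR = 4

# Longest VARCHAR (in characters) that can be fully indexed in the worst case.
MAX_INDEXED_CHARS = INNODB_MAX_KEY_BYTES // UTF8MB4_MAX_BYTES_PER_CHAR


def worst_case_key_bytes(num_chars: int) -> int:
    """Bytes needed to index a column of num_chars characters in the worst case."""
    return num_chars * UTF8MB4_MAX_BYTES_PER_CHAR


def fits_in_index(num_chars: int) -> bool:
    """True if a column of num_chars characters stays within the key limit."""
    return worst_case_key_bytes(num_chars) <= INNODB_MAX_KEY_BYTES


if __name__ == "__main__":
    # 69 is the hashed identifier length, 700 the hashing threshold.
    for length in (69, 700, 768, 769):
        print(length, worst_case_key_bytes(length), fits_in_index(length))
```

Under these assumptions, both the 700-character threshold and the 69-character hashed form leave headroom below the limit, while a 769-character column does not fit.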
Resolves #382
Files Changed
📄 orchestrator/schema/entity.py
Modified the entity_identifier_from_properties_and_values function to handle long identifiers. The function now checks if the generated identifier exceeds 700 characters (a safe threshold below the 768-character database limit). If it does, the identifier is hashed using SHA256 and prefixed with "hash-" to indicate it's a hashed value. Short identifiers remain human-readable for debugging purposes.
📄 tests/schema/test_entity.py
Added comprehensive test coverage for the new identifier hashing functionality:
test_entity_identifier_short_not_hashed: Verifies short identifiers remain unchanged and human-readable
test_entity_identifier_long_hashed: Confirms long identifiers (4539+ chars) are properly hashed with the "hash-" prefix and stay within database limits (69 chars total)
test_entity_identifier_different_points_different_identifiers: Ensures different input points produce unique identifiers for both short and long cases
test_entity_identifier_threshold_boundary: Tests edge cases at exactly the 700-character threshold to verify the cutoff works correctly
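The hashing behavior described for orchestrator/schema/entity.py can be illustrated with a minimal sketch. This is not the code from the PR: the helper name shorten_identifier and the constants are hypothetical, and the sketch assumes the digest is hex-encoded, which is consistent with the 69-character total mentioned in the tests (5 characters for "hash-" plus 64 hex characters).

```python
import hashlib

# Hypothetical names; the 700-character threshold and the "hash-" prefix are
# the values stated in the PR description.
MAX_IDENTIFIER_LENGTH = 700
HASH_PREFIX = "hash-"


def shorten_identifier(identifier: str) -> str:
    """Return the identifier unchanged if it is short enough, else a hashed form.

    Assumption: the SHA-256 digest is hex-encoded (64 characters), so the
    hashed form is always 69 characters long.
    """
    if len(identifier) <= MAX_IDENTIFIER_LENGTH:
        # Short identifiers stay human-readable.
        return identifier
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    return f"{HASH_PREFIX}{digest}"


if __name__ == "__main__":
    at_threshold = "x" * MAX_IDENTIFIER_LENGTH        # exactly 700: kept as-is
    over_threshold = "x" * (MAX_IDENTIFIER_LENGTH + 1)  # 701: hashed
    print(shorten_identifier(at_threshold) == at_threshold)
    print(len(shorten_identifier(over_threshold)))
```

Because SHA-256 is deterministic, the same long identifier always maps to the same hashed value, and distinct inputs map to distinct values barring a hash collision.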