Exception due to unicode on testnet #297
This is a HUGE pain in the neck. The tx metadata that is being inserted is
which is being encoded as JSON:
which of course is not valid UTF-8, and hence the database rejects it. Even worse, the above is encoded as UTF-8 by the Haskell |
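As an illustration of the underlying conflict (a minimal Python sketch, not the project's Haskell code): the NUL character is a perfectly legal JSON escape, so any conforming encoder will happily emit it and any conforming decoder will round-trip it, yet PostgreSQL's jsonb type refuses to store it.

```python
import json

# U+0000 is a legal JSON escape, so a conforming encoder emits it...
s = "hello\u0000world"
encoded = json.dumps(s)
print(encoded)  # "hello\u0000world"  (the six-character escape \u0000)

# ...and a conforming decoder round-trips it without complaint:
assert json.loads(encoded) == s

# But PostgreSQL's jsonb type rejects \u0000, so inserting this value
# into a jsonb column fails with an error along the lines of:
#   ERROR:  unsupported Unicode escape sequence
#   DETAIL: \u0000 cannot be converted to text.
```

So the document is valid JSON end to end; it is only the jsonb storage layer that objects.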
Let me reiterate what a HUGE pain in the neck this is. The string in question is decoded to UTF-8 by the Haskell I can catch and log the exception:
but then there is a second exception:
This suggests that I need to validate the JSON more fully before insertion.
We can store invalid UTF8 in PostgreSQL, but only as |
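A minimal sketch of that alternative (Python for illustration; column names and helper functions here are hypothetical): since bytea stores arbitrary byte strings with no text-validity rules, the JSON text can be serialized to UTF-8 bytes and stored verbatim, NUL escapes and all.

```python
import json

def metadata_to_bytea(value):
    """Serialize metadata to raw bytes suitable for a bytea column.

    bytea imposes no text validity rules, so JSON text containing a
    \\u0000 escape (which jsonb rejects) survives unchanged."""
    return json.dumps(value).encode("utf-8")

def metadata_from_bytea(raw):
    """Recover the original metadata from the stored bytes."""
    return json.loads(raw.decode("utf-8"))

meta = {"msg": "ends in NUL\u0000"}
raw = metadata_to_bytea(meta)
assert b"\\u0000" in raw                 # the escape is preserved in the bytes
assert metadata_from_bytea(raw) == meta  # lossless round-trip
```

The trade-off, as noted in the thread, is that the database loses all JSON structure: no jsonb operators or indexing, just an opaque blob.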
I have tried a manual UTF8 decode:
and that too has the same problem.
From: https://stackoverflow.com/questions/31671634/handling-unicode-sequences-in-postgresql
I currently think this is due to the way Haskell/Persistent serializes this to send to Postgres. I am trying to validate this theory.
Coming up with a neat fix for this is not going to be easy.
I was looking into this and it seems these properties hold:
The only temporary solution is to encode it to JSON, and then to |
Couldn't you store two columns: the bytea, and the JSON when it doesn't contain the non-text character? That way anyone is free to use whatever they need.
@erikd To avoid a leaky abstraction, the exact values (as given by the user) must be stored. This roughly means either a blob (= no structure in the DB) or JSON string values that somehow preserve the original meaning: for instance, base64-encoded string values in the DB (recovering the structure at the expense of extra processing in the application to translate from base64 back to the actual value).
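The base64 option mentioned above could look roughly like this (a Python sketch; the function names are mine, not from the codebase): every string in the metadata is base64-encoded before storage, so the stored JSON is jsonb-safe while keeping its shape, and the application decodes on the way out.

```python
import base64
import json

def encode_strings(v):
    """Recursively base64-encode every string (keys and values) so the
    JSON stored in jsonb never contains a NUL, yet keeps its structure."""
    if isinstance(v, str):
        return base64.b64encode(v.encode("utf-8")).decode("ascii")
    if isinstance(v, list):
        return [encode_strings(x) for x in v]
    if isinstance(v, dict):
        return {encode_strings(k): encode_strings(val) for k, val in v.items()}
    return v  # numbers, bools, None pass through untouched

def decode_strings(v):
    """Inverse transform: recover the exact user-supplied values."""
    if isinstance(v, str):
        return base64.b64decode(v).decode("utf-8")
    if isinstance(v, list):
        return [decode_strings(x) for x in v]
    if isinstance(v, dict):
        return {decode_strings(k): decode_strings(val) for k, val in v.items()}
    return v

meta = {"label": "tricky\u0000value", "n": 7}
safe = encode_strings(meta)
assert "\u0000" not in json.dumps(safe)  # jsonb-safe on the way in
assert decode_strings(safe) == meta      # exact values recovered on the way out
```

The cost is exactly the one the comment describes: jsonb queries now see base64 blobs, so any in-database filtering on string content is lost.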
TxMetadata is stored as JSON and that JSON is stored in a 'jsonb' column in PostgreSQL. However, there are limitations to that Postgres 'jsonb' data type. Specifically, it cannot contain Unicode NUL characters. This temporary fix simply drops TxMetadata JSON objects that would otherwise be rejected by Postgres. Hopefully a better solution will be dreamt up and implemented later. Temporary workaround fix for: #297
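The detection side of that workaround can be sketched as follows (Python for illustration; `contains_nul` and `insertable` are hypothetical names, not the actual Haskell functions): scan every string in the decoded metadata for U+0000 and drop the object when one is found.

```python
import json

def contains_nul(v):
    """Return True if any string anywhere in the decoded metadata
    (keys or values, at any nesting depth) contains U+0000,
    which Postgres jsonb cannot store."""
    if isinstance(v, str):
        return "\u0000" in v
    if isinstance(v, list):
        return any(contains_nul(x) for x in v)
    if isinstance(v, dict):
        return any(contains_nul(k) or contains_nul(val) for k, val in v.items())
    return False

def insertable(metadata_json):
    """Mirror of the temporary fix: keep metadata only when it is
    jsonb-safe; otherwise it is dropped before insertion."""
    return not contains_nul(json.loads(metadata_json))

assert insertable('{"msg": "fine"}')
assert not insertable('{"msg": "bad\\u0000"}')
```

Note the check must run on the *decoded* values: scanning the raw JSON text for the literal six characters `\u0000` would miss equivalent encodings such as `\u0000` split across surrogate-free escapes or a raw NUL byte.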
The temporary workaround fix (simply dropping metadata objects that cannot be inserted into Postgres) has been merged to master.
This is how it looks now with the temp fix:
Is the docker image fixed, and which version contains the fix?
It's not fixed, but |
Closing. |
On the testnet network, cardano-db-sync at tags/5.0.0 in extended mode is throwing exceptions at block 1816809 due to unicode parsing/handling issues. It cannot sync past this block.