"\u0000" in sql will lead to invalid message format error #1115

Closed
frogcjn opened this issue Aug 23, 2016 · 13 comments

frogcjn commented Aug 23, 2016

INSERT INTO "content"(
    "name"
)
VALUES
    (
        '"1234\u0000"',
    );

OK with the command line.
Not OK with node-postgres.

@frogcjn frogcjn changed the title \u0000 in sql will lead to invalid message format error "\u0000" in sql will lead to invalid message format error Aug 23, 2016

vitaly-t commented Aug 25, 2016

That's invalid SQL that the command-line tools automatically fix for you, but this library won't.

PostgreSQL requires that string literals containing Unicode escape sequences start with E (case-insensitive):

INSERT INTO content(name) VALUES(E'\u0020\u0000')
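
For illustration, a minimal node-postgres sketch of the same idea (the connection settings and the query value are assumptions, not part of this issue). Note the doubled backslash, so the Unicode escape reaches the server instead of being interpreted by JavaScript first:

const { Client } = require('pg');

(async () => {
  const client = new Client(); // connection settings assumed to come from environment variables
  await client.connect();

  // The doubled backslash keeps the escape sequence intact in the JS string,
  // so the server receives E'\u0041' and interprets the escape itself:
  const res = await client.query("SELECT E'\\u0041' AS value");
  console.log(res.rows[0].value); //=> 'A'

  await client.end();
})();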


brianc commented Aug 25, 2016

thanks @vitaly-t!

@brianc brianc closed this as completed Aug 25, 2016
@harshitgupta

But the error message that we are getting, 'invalid message format', doesn't help. You have to run the query in some SQL client to find out the problem, so I believe there is scope for improvement here.


vitaly-t commented Oct 31, 2018

@brianc, @charmander

Guys, I'm afraid this issue was closed prematurely. There is indeed a bug in the driver related specifically to the Unicode symbol \u0000, which results in the error invalid message format being thrown.

This one works fine:

select E'\u0001' as value

And these ones throw that error:

select E'\u0000' as value

select '\0' as value

This needs to be re-opened and escalated as a bug.

@nazarhussain

I agree that the error message from node-postgres is misleading. But I am afraid this issue is directly related to the PostgreSQL server.

psql (10.5, server 9.6.9)
Type "help" for help.

postgres=# select E'\u0000' as value;
ERROR:  invalid Unicode escape value at or near "E'\u0000"
LINE 1: select E'\u0000' as value;
               ^
postgres=# select E'\u0001' as value;
 value
-------
 \x01
(1 row)

You may notice that the invalid Unicode value error also appears directly on the PostgreSQL server.

The main problem is that PostgreSQL does not consider \u0000 a valid Unicode character, while Node.js considers it valid.

My assumption is that this comes from the zero-terminated-string philosophy of the C language, but it's just a wild guess.
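
As a quick illustration of that difference (plain Node.js, nothing pg-specific): JavaScript treats the NUL character as an ordinary string element, while a PostgreSQL text value can never contain a zero byte.

const s = 'abc\u0000def';

console.log(s.length);                // 7 – the NUL counts as a regular character
console.log(s.charCodeAt(3));         // 0
console.log(Buffer.from(s, 'utf-8')); // <Buffer 61 62 63 00 64 65 66> – a literal zero byte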

@vitaly-t

@nazarhussain That error looks different, and makes me think it just doesn't like the E prefix for some reason. I believe psql usually provides the prefixes on its own.

Also, I wonder how pgAdmin deals with this, because it works fine there.

@nazarhussain

@vitaly-t I also tried the same with Valentina Studio, which uses a JDBC adapter, and got the same error that psql shows. So I still believe it's a problem with the PostgreSQL server. I haven't tried pgAdmin, but it may be doing some escaping before running the query, as it's a web interface.


sehrope commented Oct 31, 2018

The error message difference is due to the lack of escaping of the backslash in the JS code, so the \u0000 is evaluated in the client, not by the server. This makes the query string sent to the backend contain an invalid zero byte, and thus it gets rejected differently.

Here are the two situations:

// Single backslash evaluated to a zero byte locally which is rejected at message level by server:
> client.query("SELECT E'\u0000'").catch((err) => console.error('%s', err)).then(console.log)
error: invalid message format

// Double backslash gets sent to the server correctly
> client.query("SELECT E'\\u0000'").catch((err) => console.error('%s', err)).then(console.log)
error: invalid Unicode escape value at or near "E'\u0000"

You can see it in the lengths of the strings too:

// Single backslash
> console.log('%s', "SELECT E'\u0000'".length);
11
// Double backslash
> console.log('%s', "SELECT E'\\u0000'".length);
16

This does indicate a bug, but not one directly related to these error messages. The driver shouldn't be sending strings with zero bytes to the backend, as that's invalid and a potential security issue. I'm going to open a separate issue for rejecting those queries in the client, as they violate the FEBE protocol.
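
As a sketch of what such a client-side rejection could look like (hypothetical code, not pg's actual fix or API): the simple-query protocol carries the SQL text as a NUL-terminated string, so a small guard over the query text is enough to turn the protocol error into a clear local one.

// Hypothetical guard: an embedded zero byte would corrupt the NUL-terminated
// query string inside the protocol message, so reject it before sending.
function assertNoNullBytes(sql) {
  if (sql.indexOf('\u0000') !== -1) {
    throw new Error('SQL text must not contain a zero byte (\\u0000)');
  }
  return sql;
}

assertNoNullBytes("SELECT E'\\u0000'"); // ok – the escape stays literal and is sent to the server
assertNoNullBytes("SELECT E'\u0000'");  // throws locally instead of "invalid message format"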


vitaly-t commented Oct 31, 2018

@sehrope Does this mean the client should escape those on its own? If not, how do you think this should be handled?

This happens far more often than one might think. When you have an app that relies on the Buffer type and the bytea data type, the symbol \u0000 may often appear after hex/UTF-8 conversions.


sehrope commented Oct 31, 2018

If the intention is to store that literal string with the backslash, then the backslash should be escaped.

If the intention is to have the string with the zero byte itself saved, then I don't think that's possible. There's no valid way to have a zero byte in there at all. It's either going to be an invalid Unicode string or an invalid message containing a zero byte.

Note that this is a difference in the set of valid values for strings in PostgreSQL and JS. The database does not allow zero bytes but JS does.
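
For what it's worth, binary data containing zero bytes can still reach the server when it is passed as a bound parameter rather than spliced into the SQL text; pg accepts a Buffer for a bytea parameter. A sketch, assuming a hypothetical table blobs(data bytea) and a connected client:

// Sketch only: assumes a connected `client` from pg and a table blobs(data bytea).
async function storeBlob(client) {
  const payload = Buffer.from([0x73, 0x00, 0x6d, 0x65]); // contains a zero byte

  // The Buffer travels as a bound parameter value, never as part of the SQL text,
  // so no zero byte ends up inside the query string itself:
  await client.query('INSERT INTO blobs (data) VALUES ($1)', [payload]);

  const { rows } = await client.query('SELECT data FROM blobs');
  console.log(rows[0].data); //=> <Buffer 73 00 6d 65>
}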


vitaly-t commented Oct 31, 2018

Let's consider a practical example:

const buffer = Buffer.alloc(12);

buffer.write("some", "utf-8");
buffer.write(" data", 5, "utf-8");

const columnData = '\\x' + buffer.toString('utf-8');

const visualized = JSON.stringify(columnData);

console.log(visualized);
//=> "\\xsome\u0000 data\u0000\u0000"

We can see that the column data that needs to go to the server in fact contains multiple \u0000 entries. And if we try sending it to a bytea column, the client-side driver will intercept it and throw invalid message format.

Does this mean the client needs to replace each \ with \\? How else can we get the data into the server? This is a practical, if simplified, example.



UPDATE

Actually, there may be some confusion between using b.toString('hex') and b.toString('utf-8'). The latter would need escaping, but not the former.

@charmander

@vitaly-t '\x…' is a hex representation of a bytea, so buffer.toString('utf-8') is incorrect.
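
To illustrate the correction with the same buffer as above: toString('hex') produces plain ASCII with no \u0000 anywhere, so the resulting literal is safe to send.

const buffer = Buffer.alloc(12);

buffer.write('some', 'utf-8');
buffer.write(' data', 5, 'utf-8');

// Hex encoding turns every byte, including the zero bytes, into two ASCII characters:
const columnData = '\\x' + buffer.toString('hex');

console.log(columnData);
//=> \x736f6d650020646174610000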

@nazarhussain

@sehrope The Unicode standard says 0000–007F is a valid range for C0 controls and Basic Latin: https://www.unicode.org/charts/PDF/U0000.pdf

Do you know which Unicode standard PostgreSQL follows, for which 0000 is an invalid character? Is there any reference or explanation directly from PostgreSQL as to why they don't follow the standard and make this an exception?
