Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments for MySQL, ASCII Null, and Data Migration #934

Open
phinjensen opened this issue Nov 11, 2017 · 6 comments
Open

Comments for MySQL, ASCII Null, and Data Migration #934

phinjensen opened this issue Nov 11, 2017 · 6 comments

Comments

@phinjensen
Copy link
Contributor

phinjensen commented Nov 11, 2017

Comments for https://www.endpointdev.com/blog/2014/02/mysql-ascii-null-and-data-migration/
By Mark Johnson

To enter a comment:

  1. Log in to GitHub
  2. Leave a comment on this issue.
@phinjensen
Copy link
Contributor Author

original author: Greg Sabino Mullane
date: 2014-02-26T16:24:20-05:00

It gets even worse for MySQL. Postgres (rightly) will refuse to even store a null character in a text (or varchar) field because it makes no sense:

greg=# select length('07989409006' || chr(0) || '989409') AS len;
ERROR: null character not permitted

@phinjensen
Copy link
Contributor Author

original author: Jon Jensen
date: 2014-02-26T17:00:32-05:00

Hmm. I disagree with you on the "makes no sense" part. Text fields encoding ASCII or UTF-8 should be able to store an ASCII NUL. It's a valid ASCII character, and UTF-8 is a superset of ASCII.

PostgreSQL 8.1 allowed it:

select length('abc' || chr(0) || 'def');
length

  7

(1 row)

But as you show, at some point it stopped being allowed. What standard is that supposedly adhering to? Feels like a regression to me, not an improvement.

I thought maybe it only applied to UTF-8, but I tried it in a C-locale cluster in PostgreSQL 9.3 and it's rejected there too:

select length('abc' || E'\0' || 'def');

ERROR: invalid byte sequence for encoding "SQL_ASCII": 0x00

Which, again, feels like a regression to me. NUL aka chr(0) is a valid ASCII and UTF-8 character.

@phinjensen
Copy link
Contributor Author

original author: Adrian Klaver
date: 2014-02-26T19:19:16-05:00

If you want to know the reason it was changed look at this thread:

http://www.postgresql.org/message-id/200803031528.m23FS4G9040185@wwwmaster.postgresql.org

@phinjensen
Copy link
Contributor Author

original author: Jon Jensen
date: 2014-02-26T19:32:36-05:00

Thanks for the pointer. That's not what I would have expected from the PostgreSQL project: easy fix rather than correct more involved fix. Oh well. Someone has to do the work and apparently nobody needed NUL bad enough to do the work.

@phinjensen
Copy link
Contributor Author

original author: Greg Sabino Mullane
date: 2014-02-26T21:19:48-05:00

Well I meant makes no sense from a SQL perspective, not a strict Unicode one. One really ought to use bytea if you have nulls in your strings. Sure, Postgres could handle it, but think how many things would break, from Postgres functions, to drivers, to apps.

@phinjensen
Copy link
Contributor Author

original author: Jon Jensen
date: 2014-02-26T23:38:13-05:00

Well, we don't have to imagine -- it actually worked fine before. I don't recall seeing any indication in the mailing list thread that an actual failure on NUL was known, just concern that there could be one somewhere.

Only in the world of C is ASCII NUL such a strange character. Granted, PostgreSQL is written in C, but so are Perl, Python, and Ruby, and they all explicitly support NUL in strings.

By way of comparison, an obscure feature such as IEEE floating point -0.0 is supported in PostgreSQL, and that is far less useful to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant