Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot handle multibyte characters (ie. Japanese) #134

Closed
dkastl opened this issue Jan 22, 2015 · 23 comments
Closed

Cannot handle multibyte characters (ie. Japanese) #134

dkastl opened this issue Jan 22, 2015 · 23 comments

Comments

@dkastl
Copy link

dkastl commented Jan 22, 2015

It seems, that multibyte characters are not supported at all.

The following example uses Japanese characters:

With PSQL:

testdb=# SELECT '日本語' AS japanese;
 japanese 
----------
 日本語
(1 row)

With PGCLI:

testdb> SELECT '日本語' AS japanese;
'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

You will receive the same error for every query, that has to deal with a table containing multibyte characters.

@amjith
Copy link
Member

amjith commented Jan 23, 2015

Confirmed that this is a bug. I'll have to drill down and find out where that error is coming from (possibly in psycopg2).

Sorry about the trouble. I'll see if I can fix it soon.

Thank you for reporting.

@amjith
Copy link
Member

amjith commented Jan 23, 2015

This works correctly in Python 3. I think I just need to wrap the inputs in unicode and it should work.

@amjith
Copy link
Member

amjith commented Jan 25, 2015

I did some testing and found out that this is an inconsistency in psycopg2. I've filed a bug with psycopg2 (psycopg/psycopg2#282).

But this bug doesn't affect the sql statments that return unicode values that are stored in the database.

For ex:

misago_testforum> insert into abc values ('日本語', '日本語')
INSERT 0 1
misago_testforum> select * FROM abc;
+-----+-----+
| a   | b   |
|-----+-----|
| 日本語 | 日本語 |
+-----+-----+
SELECT 2

So if multi-byte char values are stored in the database and we retrieve them, then they are rendered correctly. But it fails to render correctly when we do a no op such as

misago_testforum> SELECT '日本語' AS japanese;
'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

@dkastl
Copy link
Author

dkastl commented Jan 25, 2015

Thank you for looking at this!

My initial test was with a table that contained Japanese characters, so if that works now, that's great!
The sample query I posted was just an easy way to reproduce it without a table.

I can help to test, if it's working, but can I just install the latest version with pip install pgcli?

@thedrow
Copy link

thedrow commented Jan 25, 2015

It seems that this is a postgresql feature. Not a bug.

@amjith
Copy link
Member

amjith commented Jan 25, 2015

@dkastl I haven't released the latest version to PyPI.

You can install pgcli directly from this git repo using the following command:

pip install -e git+git@github.com:amjith/pgcli.git#egg=pgcli

Can you please try that and let me know if that fixes your issue?

Thanks!

@dkastl
Copy link
Author

dkastl commented Jan 26, 2015

@amjith Thank you! It worked now as you said, when I select from an actual table.

So for "normal" tables it works, but for tables with an hstore column inside, which again contains Japanese, the error seems to remain.

Well, this is now a not so common use case, and it was just a coincidence, that I tested it with a table, that uses hstore, so this probably affects only very few people.

But for completeness, here the SQL to reproduce it:

CREATE EXTENSION hstore;

CREATE TABLE test (
    id serial PRIMARY KEY,
    name varchar,
    attributes hstore
);

INSERT INTO test (id, name, attributes) VALUES 
    (1, 'Alphabet', 'key => "alphabet"'), 
    (2, '日本語',   'key => "alphabet"'),
    (3, '日本語',   'key => "日本語"')
;

Then the query for id 1 and 2 work, the query for id 3 fails:

testdb> SELECT * FROM test WHERE id != 3;
+------+----------+-------------------+
|   id | name     | attributes        |
|------+----------+-------------------|
|    1 | Alphabet | "key"=>"alphabet" |
|    2 | 日本語      | "key"=>"alphabet" |
+------+----------+-------------------+
SELECT 2

testdb> SELECT * FROM test WHERE id = 3;
'ascii' codec can't decode byte 0xe6 in position 8: ordinal not in range(128)

Using psql console works for all cases.

@amjith
Copy link
Member

amjith commented Jan 26, 2015

I'm not too familiar with the hstore. I know it's the no-sql feature of postgres. Does it mean that the 'attribute' column won't have a datatype (such as varchar?).

Because the author of psycopg2 shed some light on why the simple case of SELECT '日本語' AS japanese; didn't work. It's because postgres returns the type as 'unknown'.

I'll have to check what is the type returned by hstore. I'll see how to handle this properly.

Thanks for the detailed test case. I appreciate it very much. :)

@dkastl
Copy link
Author

dkastl commented Jan 26, 2015

Well, I just use hstore sometimes because it's convenient for certain use cases, but I don't know about details. When you look at the hstore documentation, then there are various ways to return the hstore content using functions, for example svals(attributes) and then it works correctly.

> SELECT svals(attributes) FROM test WHERE id = 3;
+---------+
| svals   |
|---------|
| 日本語     |
+---------+

The following query actually works, if you cast as text:

testdb> SELECT '日本語'::text AS japanese;
+------------+
| japanese   |
|------------|
| 日本語        |
+------------+

So the explanation makes sense and it seems to be a problem of unknown datatype.

If you cast the hstore column as text it also works:

> SELECT attributes::text FROM test WHERE id = 3;
+--------------+
| attributes   |
|--------------|
| "key"=>"日本語" |
+--------------+

Would it make sense to cast any unknown datatype as text to get rid of the error?

@amjith amjith closed this as completed in 750206c Jan 26, 2015
@amjith
Copy link
Member

amjith commented Jan 26, 2015

I've fixed it temporarily by adding a unicode type-caster for hstore. But I've also requested psycopg2 author to default to unicode instead of utf-8 string. Can you try installing the code from master and try again?

pip install -e git+git@github.com:amjith/pgcli.git#egg=pgcli

Thanks!

@dkastl
Copy link
Author

dkastl commented Jan 26, 2015

Thanks!
Could you tell me, how I can make sure, to pull the most recent version?
I think just running pip install didn't update.

$ sudo pip uninstall pgcli
Can't uninstall 'pgcli'. No files were found to uninstall.

The way I uninstalled it before didn't work.
So I just run again

sudo pip install -e git+https://github.com/amjith/pgcli.git#egg=pgcli
Obtaining pgcli from git+https://github.com/amjith/pgcli.git#egg=pgcli
  Updating ./src/pgcli clone
  Running setup.py (path:/home/daniel/src/pgcli/setup.py) egg_info for package pgcli

  Installing extra requirements: 'egg'
Requirement already satisfied (use --upgrade to upgrade): click>=3.2 in /usr/local/lib/python2.7/dist-packages (from pgcli)
Requirement already satisfied (use --upgrade to upgrade): Pygments>=2.0 in /usr/local/lib/python2.7/dist-packages (from pgcli)
Requirement already satisfied (use --upgrade to upgrade): jedi==0.8.1 in /usr/local/lib/python2.7/dist-packages (from pgcli)
Requirement already satisfied (use --upgrade to upgrade): prompt-toolkit==0.26 in /usr/local/lib/python2.7/dist-packages (from pgcli)
Requirement already satisfied (use --upgrade to upgrade): psycopg2>=2.5.4 in /usr/local/lib/python2.7/dist-packages (from pgcli)
Requirement already satisfied (use --upgrade to upgrade): sqlparse>=0.1.14 in /usr/local/lib/python2.7/dist-packages (from pgcli)
Requirement already satisfied (use --upgrade to upgrade): docopt in /usr/local/lib/python2.7/dist-packages (from prompt-toolkit==0.26->pgcli)
Requirement already satisfied (use --upgrade to upgrade): six>=1.8.0 in /usr/local/lib/python2.7/dist-packages (from prompt-toolkit==0.26->pgcli)
Requirement already satisfied (use --upgrade to upgrade): wcwidth in /usr/local/lib/python2.7/dist-packages (from prompt-toolkit==0.26->pgcli)
Installing collected packages: pgcli
  Running setup.py develop for pgcli

    Creating /usr/local/lib/python2.7/dist-packages/pgcli.egg-link (link to .)
    pgcli 0.13.0 is already the active version in easy-install.pth
    Installing pgcli script to /usr/local/bin

    Installed /home/daniel/src/pgcli
Successfully installed pgcli
Cleaning up...

... but it seems it didn't update or it still doesn't work to run SELECT * FROM test WHERE id = 3;.

@amjith
Copy link
Member

amjith commented Jan 26, 2015

I think you might need a -U option to make sure it is upgraded.

sudo pip install -U -e git+https://github.com/amjith/pgcli.git#egg=pgcli

@dkastl
Copy link
Author

dkastl commented Jan 26, 2015

Hmm, did not work after upgrading (which required the flag --force-reinstall to upgrade).
But let's see if psycopg2 can provide a solution for this. I don't think hstore support is a blocking issue.

@amjith
Copy link
Member

amjith commented Jan 26, 2015

I've reopened the issue.

This is what I did to test the hstore issue after the fix.

misago_testforum> \d test
+-----------+-------------------+---------------------------------------------------+
| Column    | Type              | Modifiers                                         |
|-----------+-------------------+---------------------------------------------------|
| id        | integer           | not null default nextval('test_id_seq'::regclass) |
| name      | character varying |                                                   |
| attribute | hstore            |                                                   |
+-----------+-------------------+---------------------------------------------------+
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

misago_testforum> SELECT * FROM test
+------+----------+-------------------+
|   id | name     | attribute         |
|------+----------+-------------------|
|    1 | Alphabet | "key"=>"alphabet" |
|    2 | 日本語      | "key"=>"alphabet" |
|    3 | 日本語      | "key"=>"日本語"      |
+------+----------+-------------------+
SELECT 3
misago_testforum>

Can you tell me what's the structure of the table you're trying this on?

@dkastl
Copy link
Author

dkastl commented Jan 26, 2015

It's exactly the same example you're using as described in a previous comment.
Maybe I'm just not able to upgrade using the Git repository.
So if it works for you, I would close the ticket.

@amjith
Copy link
Member

amjith commented Jan 26, 2015

I'll try and release the new version soon. I'm waiting to merge in a big pull request that adds schema support. Otherwise I'd release it today. Sorry about the wait.

@dkastl
Copy link
Author

dkastl commented Jan 26, 2015

No problem at all. I was just interested trying out pgcli. The schema support is even more important for me. Thanks a lot for your work!

@fzerorubigd
Copy link

I don't know if this is relevant or not, but pressing any key in Persian layout, crash pgcli , Can not even type a single persian character.

postgres> Traceback (most recent call last):
  File "/usr/local/bin/pgcli", line 9, in <module>
    load_entry_point('pgcli==0.13.0', 'console_scripts', 'pgcli')()
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 610, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 590, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 782, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click/core.py", line 416, in invoke
    return callback(*args, **kwargs)
  File "/home/develop/src/pgcli/pgcli/main.py", line 327, in cli                                                                                 
    pgcli.run_cli()
  File "/home/develop/src/pgcli/pgcli/main.py", line 179, in run_cli
    document = cli.read_input(on_exit=AbortAction.RAISE_EXCEPTION)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/__init__.py", line 251, in read_input
    next(g)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/__init__.py", line 348, in _read_input
    self._redraw()
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/__init__.py", line 191, in _redraw
    self.renderer.render(self)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/renderer.py", line 469, in render
    output.flush()
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/terminal/vt100_output.py", line 150, in flush
    self.stdout.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u063a' in position 4: ordinal not in range(128)

Using the current master 27a363e in a Debian 7 (whezy) in a docker container.

@amjith
Copy link
Member

amjith commented Jan 28, 2015

It looks like the error might be coming from prompt_toolkit library.

Since you have prompt_toolkit installed (as part of the dependency), can you try launching ptpython. It's a python repl that comes bundled with prompt_toolkit. If you get the same problem with ptpython, I'll at least know it's contained within the prompt_toolkit lib.

@fzerorubigd
Copy link

Yes. I think its prompt_toolkit

In [2]: Traceback (most recent call last):
  File "/usr/local/bin/ptpython", line 9, in <module>
    load_entry_point('prompt-toolkit==0.26', 'console_scripts', 'ptpython')()
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/contrib/entry_points/ptpython.py", line 77, in run
    startup_paths=startup_paths, always_multiline=always_multiline)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/contrib/repl.py", line 189, in embed
    cli.start_repl(startup_paths=startup_paths)ode (off) [F7] Multiline (off) - CPython 2.7.3                                                    
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/contrib/repl.py", line 48, in start_repl
    on_exit=AbortAction.RAISE_EXCEPTION)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/__init__.py", line 251, in read_input
    next(g)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/__init__.py", line 348, in _read_input
    self._redraw()
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/__init__.py", line 191, in _redraw
    self.renderer.render(self)
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/renderer.py", line 469, in render
    output.flush()
  File "/usr/local/lib/python2.7/dist-packages/prompt_toolkit/terminal/vt100_output.py", line 150, in flush
    self.stdout.write(data)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0635' in position 22: ordinal not in range(128)

Do I need to report it some where else?

@amjith
Copy link
Member

amjith commented Jan 28, 2015

This is prompt-toolkit's github issues page: https://github.com/jonathanslenders/python-prompt-toolkit/issues

I'd file it myself, but without the ability to reproduce it, I'm not sure I can provide enough details.

@amjith
Copy link
Member

amjith commented Mar 4, 2015

@dkastl Previously when I fixed the unicode issue with hstore, I didn't fix it the right way. That's why you weren't able to get it to work. I was basically hardcoding the oid for hstore which is wrong. Because the oid for hstore will be different for each database.

I have now fixed the issue (the right way) and released a new version (0.16.1) to PyPI. If you do a pip install -U pgcli it should get you the new version with the fix.

Sorry about the wrong fix. :(

@dkastl
Copy link
Author

dkastl commented Mar 4, 2015

@amjith Thanks a lot!
It seems to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants