Displaying Unicode Text in irb

Chong-Yee Khoo edited this page Nov 2, 2013 · 2 revisions

If you find that Unicode text is not showing up properly in irb, try the following (instructions for Mac OS X, rbenv):

The Problem

iTerm2 was set up properly, and the shell would happily accept text containing Unicode characters. However, irb would not.

Launch irb and try pasting some Unicode text, such as "Déšť dští ve Španělsku zvlášť tam kde je pláň":

$ irb
>> "D dtve panlsku zvl tam kde je "
=> "D dtve panlsku zvl tam kde je "

Notice the missing text? All the characters with diacriticals are gone.

Check what encoding irb is using for its text.

>> puts __ENCODING__
US-ASCII
=> nil

Maybe we should try starting irb with the --encoding flag?

$ irb -EUTF-8
>> puts __ENCODING__
UTF-8
=> nil

This sets the encoding to UTF-8, but irb is still not happy.
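Ruby tracks several encodings at once, and `__ENCODING__` reports only one of them. The following is a minimal sketch for listing them side by side; `encoding_report` is a hypothetical helper name made up for this page, not part of Ruby or irb.

```ruby
# Collect the encodings Ruby is currently working with.
# encoding_report is a hypothetical helper, not a Ruby built-in.
def encoding_report
  {
    source:   __ENCODING__,              # encoding of the current source file or irb session
    external: Encoding.default_external, # default for data coming from outside (follows the locale unless overridden, e.g. with -E)
    internal: Encoding.default_internal, # nil unless transcoding on input was requested
    locale:   Encoding.find("locale")    # encoding implied by the locale environment
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

If `source` and `external` both say UTF-8 and pasted text is still mangled, the encoding settings are probably not the culprit, which matches what happens here.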

Try pasting "西班牙的雨大多數落在平原上" into irb:

>> "    "

The solution requires recompiling Ruby so that it links against a specific library (readline), and making sure an environment variable is set properly.

Compile Ruby with readline

irb as shipped is compiled against libedit on Mac OS X, which means it doesn't handle Unicode text very well. We need to recompile Ruby against readline instead of libedit. Follow the instructions set out in [this blogpost](https://coderwall.com/p/wdm-_q).

In my case,

$ cd ~/.rbenv/versions/2.0.0-p247/lib/ruby/2.0.0/x86_64-darwin12.5.0/
$ otool -L readline.bundle

produced

/usr/lib/libedit.3.dylib (compatibility version 2.0.0, current version 3.0.0)

The reference to libedit.3.dylib shows that Ruby was using libedit instead of readline. libedit doesn't know how to deal with Unicode text, apparently.

The solution suggested is to recompile Ruby using readline instead, as described in the [ruby-build wiki](https://github.com/sstephenson/ruby-build/wiki).

$ brew install readline
$ RUBY_CONFIGURE_OPTS=--with-readline-dir="$(brew --prefix readline)" rbenv install 2.0.0-p247

After this, otool reports

/usr/local/opt/readline/lib/libreadline.6.2.dylib (compatibility version 6.0.0, current version 6.2.0)

Ruby is now using readline, and irb should now display Unicode text properly.
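The same check can be made from inside Ruby, without otool. This sketch relies on the readline extension exposing `Readline::VERSION`, which, as far as I know, reads "EditLine wrapper" when Ruby is linked against libedit and a version number such as "6.2" when linked against GNU readline. `line_editor_for` is a hypothetical helper name, and newer Rubies that substitute a pure-Ruby implementation for readline will simply land in the `:other` branch.

```ruby
# Hypothetical helper: classify the string reported by Readline::VERSION.
def line_editor_for(version)
  # libedit's readline-compatibility layer identifies itself as "EditLine wrapper"
  return :libedit if version.include?("EditLine")
  :other # GNU readline (or a replacement) reports a plain version string
end

begin
  require "readline"
  version = Readline::VERSION
  puts "Readline::VERSION = #{version.inspect} (#{line_editor_for(version)})"
rescue LoadError, NameError
  # The readline library is not available in this Ruby build
  puts "readline is not available in this Ruby"
end
```

If this still prints `libedit` after the rebuild, check that irb is being run from the newly installed rbenv version.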

Make Sure $LANG is Set Properly

If the above instructions don't solve the problem, make sure that the $LANG environment variable has a sensible value.

The $LANG environment variable is normally set depending on the Language and Region settings in System Preferences (Mac OS X).

$ echo $LANG

should print the locale that corresponds to the region setting, for example en_GB.UTF-8.
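Ruby sees the same variable through `ENV`, so the check can also be done from a script. This is only a sketch: `utf8_locale?` is a hypothetical helper that looks at the codeset part of a value such as "en_GB.UTF-8", and it inspects `$LANG` alone, ignoring `LC_ALL` and `LC_CTYPE`, which take precedence over it when set.

```ruby
# Hypothetical helper: does a locale string such as "en_GB.UTF-8" name a UTF-8 codeset?
def utf8_locale?(value)
  return false if value.nil? || value.empty?
  # Take the part after the first ".", then drop any "@modifier" suffix
  codeset = value.split(".", 2)[1].to_s.split("@").first.to_s
  # Accept both "UTF-8" and "utf8" spellings
  codeset.delete("-").casecmp("utf8").zero?
end

lang = ENV["LANG"]
puts "LANG = #{lang.inspect}, UTF-8 locale? #{utf8_locale?(lang)}"
puts "Encoding.default_external = #{Encoding.default_external}"
```

A blank or non-UTF-8 `$LANG` here would be a good hint that the shell configuration needs attention.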

In my setup, however, I use a custom "Region" setting, and $LANG was blank. I edited my .zshrc file to include the following:

export LANG="en_GB.UTF-8"

and hey presto - irb now displays Unicode text!

$ irb
>> "Es blüht so grün wie Blüten blüh'n im Frühling".encoding
=> #<Encoding:UTF-8>
>> "Lenn délen édes éjen édent remélsz".unicode_titlecase
=> "Lenn Délen Édes Éjen Édent Remélsz"