Skip to content
This repository has been archived by the owner on May 30, 2023. It is now read-only.

Use UTF-8 encoding for std{in,out,err} #11168

Closed

Conversation

execjosh
Copy link
Contributor

This fixes issue #11162.

File constructor takes a QTextCodec *, codec; but, if codec is NULL, then it assumes "binary" mode, which causes non-ASCII characters to be converted to NUL (\0) in File::write.

This change passes the codec for UTF-8 to the File constructor for the std{in,out,err} instances, thus opening them in text mode.

@execjosh
Copy link
Contributor Author

@metaskills: Could you try this patch on your end? It seems to be working for me :)

# stdout.coffee
require("system").stdout.write "ascii 日本語 ✓"
phantom.exit 0
$ ./bin/phantomjs stdout.coffee | hexdump -C
00000000  61 73 63 69 69 20 e6 97  a5 e6 9c ac e8 aa 9e 20  |ascii ......... |
00000010  e2 9c 93                                          |...|
00000013

@metaskills
Copy link
Contributor

@execjosh It does work!!! Cant wait till 1.9.1 :)

@metaskills
Copy link
Contributor

FYI, I found out that the built version in master does not "flush" stdout till the process exists. For example:

$ ./bin/phantomjs examples/stdin-stdout-stderr.coffee
^C

Where the 1.9 downloaded version does this.

$ phantomjs examples/stdin-stdout-stderr.coffee
Hello, system.stdout.write!
Hello, system.stdout.writeLine!
Hello, system.stderr.write!
Hello, system.stderr.writeLine!
system.stdin.readLine(): 
^C

@metaskills
Copy link
Contributor

I can report that one of these two commits has caused the problem with flushing. So if I revert back to da71c5f and build, I get the expected behavior.

@execjosh
Copy link
Contributor Author

Hmm... 54f1f60 shouldn't have any functional effects; therefore, the culprit must be e212d7e. Hmm...

I'm rebuilding the latest qt in phantomjs right now; so, it'll be a bit before I can build again.

Looking at the commit itself, though, it should have been split into two: one to refactor the File construction; and, one to use the Terminal's codec.

@execjosh
Copy link
Contributor Author

Okay, well, it looks like File's implementation uses a QTextStream, which buffers even if the QFile backend specifies QIODevice::Unbuffered. Hmm...

@execjosh
Copy link
Contributor Author

It should work as expected now!

@JamesMGreene
Copy link
Collaborator

Thanks for the quick turnaround, @execjosh! 👍

@execjosh
Copy link
Contributor Author

I wasn't completely satisfied with the fact that the Terminal encoding could go out of sync with system.std{in,out,err}; so, I went ahead and hacked in a signal.

Also, the examples/outputEncoding.{coffee,js} wasn't working for my local build. After some sniffing around, I found that the text codecs were only being linked on Windows. I've gotten the link part to work with 6bc6d63; though, I had to manually build src/qt/src/plugins/codecs to get the build to pass.

Can anyone help fix the qt build to automatically make the codecs?

@metaskills
Copy link
Contributor

I wasn't completely satisfied with the fact that the Terminal encoding
could go out of sync with system.std{in,out,err};

I had thought about the same thing. Glad you are addressing it. I could not build this either. I ended up with the following error. Are you saying that I need to do something before or after the build fail to build again?

g++ -headerpad_max_install_names -sectcreate __TEXT __info_plist Info.plist -arch i386 -o ../bin/phantomjs MachIPC.o phantom.o callback.o webpage.o webserver.o main.o csconverter.o utils.o networkaccessmanager.o cookiejar.o filesystem.o system.o env.o terminal.o encoding.o config.o childprocess.o repl.o gif_err.o gifalloc.o egif_lib.o gif_hash.o quantize.o gifwriter.o mongoose.o linenoise.o utf8.o qcommandline.o minidump_file_writer.o convert_UTF.o md5.o string_conversion.o crash_generation_client.o exception_handler.o minidump_generator.o dynamic_images.o breakpad_nlist_64.o bootstrap_compat.o file_id.o macho_id.o macho_utilities.o macho_walker.o string_utilities.o moc_phantom.o moc_callback.o moc_webpage.o moc_webserver.o moc_networkaccessmanager.o moc_cookiejar.o moc_filesystem.o moc_system.o moc_env.o moc_terminal.o moc_config.o moc_childprocess.o moc_repl.o moc_qcommandline.o qrc_phantomjs.o qrc_ghostdriver.o qrc_WebKit.o qrc_InspectorBackendStub.o    -L/Users/kencollins/Repositories/phantomjs/src/qt/lib -L/Users/kencollins/Repositories/phantomjs/src/qt/plugins/codecs -lqcncodecs -lqjpcodecs -lqkrcodecs -lqtwcodecs -lQtWebKit -L/Users/kencollins/Repositories/phantomjs/src/qt/lib -framework ApplicationServices -framework CoreFoundation -framework Security -framework Carbon -framework AppKit -framework SystemConfiguration -framework CoreServices -lQtGui -lQtNetwork -lQtCore -lz -lm 
ld: warning: directory not found for option '-L/Users/kencollins/Repositories/phantomjs/src/qt/plugins/codecs'
ld: library not found for -lqcncodecs
collect2: ld returned 1 exit status
make[1]: *** [../bin/phantomjs] Error 1
make: *** [sub-src-phantomjs-pro-make_default-ordered] Error 2

@execjosh
Copy link
Contributor Author

Okay, just as I suspected. The ./build.sh script isn't building the codecs.

I had to build them manually, like this:

$ pushd src/qt/src/plugins/codecs && make && popd

The codecs are automagically placed in src/qt/plugins/codecs and bin/phantomjs can then be linked.

@execjosh
Copy link
Contributor Author

Okay, I think I fixed the text codec build stuff (had to add a step to src/qt/preconfig.sh). It builds on my mac; but, I have not tested it on linux* nor Windows (I think it should work, though).

Can anyone help out and confirm that the linux* and Windows builds didn't get broken, and that the mac build works, too? It'd probably be best to run ./build.sh in a fresh clone, though.

@execjosh
Copy link
Contributor Author

I'm building on an Ubuntu box right now. I don't have the ability to build on Windows, though; so, I hope someone can help out on that front :)

@execjosh
Copy link
Contributor Author

Looks to me like the build is fine for Ubuntu :)

@metaskills
Copy link
Contributor

I can confirm the build is fine and as expected on OS X.

@ariya
Copy link
Owner

ariya commented Mar 30, 2013

It's too big of a change to be backported to 1.9.1. I think it needs to be splitted into two parts: critical fixes for the encoding itself and everything else (including new codes etc). For the latter, please file separate issues so that they can be tracked.

@execjosh
Copy link
Contributor Author

execjosh commented Apr 2, 2013

@ariya: Here's the list of problems and their consequences that I had jumbled into this pull request.

  • File treats binary mode as ASCII (or actually Latin-1, I think)
  • Consequence: UTF-8 and other encodings get garbled
  • File's text mode uses QTextStream
  • Consequence: writes are always buffered, even if the wrapped QFile was opened as unbuffered–this behavior is inconsistent with binary mode
  • System doesn't specify an encoding for std{in,out,err}
  • Consequence: ASCII-only binary mode garbles UTF-8 text
  • The encoding of a File instance cannot be changed
  • Consequence: Encoding can only be specified at time of construction
  • Terminal doesn't emit a signal when its encoding gets changed
  • Consequence: Nobody can receive notification of the change
  • On Mac, CJK text codecs are not available with the default build as-is
  • Consequence: East Asian encodings cannot be used

It makes the most sense for the std{in,out,err} encodings to always be in sync with the Terminal module. However, for this pull request, the std{in,out,err} encodings can be set to UTF-8 by default and File can be taught how to automagically flush when necessary in text mode. Then, the CJK codecs and Terminal encoding synchronization can each be in separate pull requests.

Does that sound like what you had in mind?

If the wrapped `QFile` was opened with `QIODevice::Unbuffered`, any
writes should be unbuffered.  However, as currently implemented,
using `QTextStream` when the `File` is in "text" mode causes all
reads/writes to be buffered.

This modification forces a flush in `File::write` if the wrapped
`QFile` was opened with `QIODevice::Unbuffered`.
This fixes issue ariya#11162.

`File` constructor takes a `QTextCodec *`, codec; but, if codec is
`NULL`, then it assumes "binary" mode, which causes non-ASCII
characters to be converted to NUL (`\0`) in `File::write`.

This change passes the codec for UTF-8 to the `File` constructor for
the `std{in,out,err}` instances, thus opening them in *text mode*.
@metaskills
Copy link
Contributor

I just can't wait to get these fixes in the wild. I have an open ticket on my mocha-phantomjs project to utilize the stdout features. It will make a lot of our code go away and open doors for things that we could not do before. Thanks to everyone for their hard work!

@JamesMGreene
Copy link
Collaborator

Not to tangent too much, but since he is being active here:
@execjosh: Any chance you want to tackle #10980 in another PR? :)

@ariya
Copy link
Owner

ariya commented Apr 4, 2013

@execjosh Thanks for the patch restructuring. I'll give a try!

@execjosh
Copy link
Contributor Author

execjosh commented Apr 4, 2013

You know, looking at #10973, there might need to be a way to set stdout to binary mode. The way I have it set up now is to always be in binary mode; but, after this change, it will always be in text mode.

The switch can be a part of implementing #10973, though, I think. Since we don't have anything like node's Buffer, it might even work as-is.

@execjosh
Copy link
Contributor Author

execjosh commented Apr 4, 2013

@JamesMGreene: That is definitely the next step for not only the standard streams; but, our filesystem API in general! I had attempted a year ago. However, implementing everything in one blow was a bigger bite than I could chew. I'd like to give it another go :)

@execjosh
Copy link
Contributor Author

@ariya: Does it work well for you? Synchronizing with the Terminal's encoding depends on this; so, once this gets your approval, then I'll open another pull request with the synchronization code.

@ariya
Copy link
Owner

ariya commented Apr 13, 2013

Sorry for the delay. Everything looks good so I merged it with minor tweaks (1) 4-space indentation (2) link to the issue in the first commit for a good cross-ref.

Thanks!

@ariya ariya closed this Apr 13, 2013
@execjosh execjosh deleted the fix-issue-11162-stdout-stderr-encoding branch April 13, 2013 13:25
@execjosh
Copy link
Contributor Author

Great! Sorry about the indentation problem. By the way, do you prefer the full URL in the commit message, as opposed to just the number?

@ariya
Copy link
Owner

ariya commented Apr 14, 2013

I think number is necessary so that it's referenced from the issue. The full URL can be there as a convenience bonus (when checking the log outside GitHub viewer).

ariya pushed a commit that referenced this pull request Apr 29, 2013
ariya pushed a commit that referenced this pull request Apr 29, 2013
If a `File` is in "text" mode, then it has an encoding.  This
encoding defaults to UTF-8; however, it can be set only at time of
construction (by using `fs.open`).

This modification allows the user to change the encoding on-the-fly
for "text" mode `File` instances.

See #11234 #11234
Spin off from #11168 #11168
ariya pushed a commit that referenced this pull request Apr 29, 2013
@grawk
Copy link

grawk commented May 8, 2013

Pardon my ignorance if the information is elsewhere, but when will 1.9.1 be released? I've added myself to the mailing list for future updates.
EDIT:
According to the mailing list, 1.9.1 is released. So I guess my question now is, when can I install 1.9.1 via brew or nom?

JamesMGreene referenced this pull request Jun 25, 2013
See [issue 333][1] and pull request #192.

**Caveat**

`File::read` currently takes no parameters and is equivalent to a
"`readAll`".  This will be changed later to match [IO/A Spec's
`Stream#read`][2]; but, should still be noted.

[1]: http://code.google.com/p/phantomjs/issues/detail?id=333
[2]: http://wiki.commonjs.org/wiki/IO/A#Instance_Methods
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants