Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS.Cmd \n -> \r\n conversions on Windows #65

Open
hcarty opened this issue Oct 28, 2016 · 3 comments
Open

OS.Cmd \n -> \r\n conversions on Windows #65

hcarty opened this issue Oct 28, 2016 · 3 comments

Comments

@hcarty
Copy link
Contributor

hcarty commented Oct 28, 2016

Under Windows, using the following programs the \n in data is converted to \r\n when the two programs are compiled and run as main.exe test.exe:

screen shot 2016-10-28 at 3 43 04 pm

Under something more Unix-y there is no issue, I'm guessing because there is no difference there between "text" and "binary" IO.

main.ml

open Rresult
open Bos

let () =
  let exe = Fpath.of_string Sys.argv.(1) |> R.error_msg_to_invalid_arg in
  let cmd = Cmd.(v (p exe)) in
  let data = "Hello world\n\x00\x00booooo" in
  let cmd_stdin = OS.Cmd.in_string data in
  let run = OS.Cmd.run_io ~err:OS.Cmd.err_stderr cmd cmd_stdin in
  match OS.Cmd.out_string run with
  | Ok (output, (_, `Exited 0)) ->
    if output = data then
      print_endline "Good"
    else (
      print_endline "Not good";
      Printf.printf "%S\n" data;
      Printf.printf "%S\n" output
    )
  | _ -> print_endline "Bad"

test.ml

open Rresult
open Bos

let () =
  OS.File.read OS.File.dash
  |> R.error_msg_to_invalid_arg
  |> print_string
@dbuenzli
Copy link
Owner

dbuenzli commented Oct 30, 2016

Isn't the problem here that you are using Pervasives.print_string in test.ml ? I guess this is what does the translation.

I just checked and the OS.Cmd combinators only use Unix.openfile (which AFAIK is oblivious of the distinction between text and binary) and OS.File only uses open_in_bin.

@dbuenzli
Copy link
Owner

In fact you are using OS.File.read with OS.File.dash which in this case will not use open_in_bin but simply use Pervasives.stdin. So I guess this is the actual source of the problem.

I could use Pervasives.set_binary_mode_in on Pervasives.stdin before doing the read but there's no way to query the status to set it back to its previous state. I guess these are the only things that can be done:

  1. Simply document this fact and let users call set_binary_mode_{in,out} themselves.
  2. Call set_binary_mode_{in,out} on stdin and stdout in OS.File's IO functions and document the side effect performed on the channel.

What would Windows users expect ?

I kind of think that these automatic translations are more harmful than beneficial so I'm tempted by 2. With 1. it basically means that non Windows user have to think about this and they will not.

/cc @fdopen @dra27

@dbuenzli dbuenzli added the bug label Oct 30, 2016
@dra27
Copy link

dra27 commented Nov 1, 2016

The source of this problem is print_string - all the calls in main.ml use binary Unix.file_descr so test.ml should only receive \n on stdin, therefore the fact that stdin is in text mode won't matter. However, since stdout is in text mode in test.ml, print_string will translate \n\r\n which main.ml reads back without translation. However, if main.ml actually had "Hello world\r\n\x00\x00booooo" then this example would work, but for scary and unstable reasons! (and weird examples like "Hello world\r\n\n\x00\x00booooo" would not work)

I think the expectation would be that stdhandles are treated in the same way as files, certainly. I don't have a reasoned opinion on which way it should be, though, I'm afraid! There is already a disparity between how Unix and Pervasives do this, for example:

Unix.(write stdout "\n" 0 1) |> ignore;
Pervasives.(output stdout "\n" 0 1) |> ignore

writes "\n\r\n" to stdout on Windows.

I expect that the rationale for this may be that the standard library behaves like the underlying C runtime (where \n really should mean "do whatever is necessary to make a line-end" and in Microsoft C by default does translations) vs the Unix library which aims to behave the same way on all platforms as far as is reasonably possible.

You can detect whether a channel is in binary mode via an exposed runtime function, but you have to use a slightly unscrupulous, if entirely safe, C stub to get at it:

#include <caml/memory.h>

CAMLextern int caml_channel_binary_mode (void *);

CAMLprim value caml_ml_channel_binary_mode(value vchannel)
{
  CAMLparam1(vchannel);
  CAMLreturn(Val_bool(caml_channel_binary_mode(*((void **) (Data_custom_val(vchannel))))));
}

which can then be used with

external get_binary_mode_in : in_channel -> bool = "caml_ml_binary_mode"
external get_binary_mode_out : out_channel -> bool = "caml_ml_binary_mode"

It's a surprising omission from Pervasives...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants