Skip to content
This repository has been archived by the owner on Mar 7, 2024. It is now read-only.

REPL on windows doesn't recognize newlines #1

Closed
tizoc opened this issue Apr 10, 2017 · 48 comments
Closed

REPL on windows doesn't recognize newlines #1

tizoc opened this issue Apr 10, 2017 · 48 comments
Labels

Comments

@tizoc
Copy link
Member

tizoc commented Apr 10, 2017

No description provided.

@tizoc
Copy link
Member Author

tizoc commented Apr 10, 2017

@rkoeninger when you run this code in SBCL, whats the output? (you will see a bunch of warnings, ignore those, what is important is the result of the call to read-byte)

(setq *standard-two-way* (make-two-way-stream *standard-input* *standard-output*))
(setq *stinput* *standard-two-way*)
(read-byte *stinput* nil -1)

@tizoc tizoc added the bug label Apr 10, 2017
@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

@rkoeninger have you discovered anything new related to this? Also, you don't have this problem in the CLisp port, right?

@rkoeninger
Copy link
Member

@tizoc I'm going to look at this next.

And no, the CLisp port doesn't have this problem.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Ok, great. Btw, on my system the result of (read-byte *stinput* nil -1) then <enter> is 10. If it is different in your system then we have found the problem.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Well, it can be different, any combination of 13 and 10 is expected.

@rkoeninger
Copy link
Member

@tizoc If I start sbcl and enter (read-byte *STANDARD-INPUT* nil -1) on the repl and hit <enter>, nothing happens, it just goes to the next line.

If it type a and then hit <enter>, it prints 97, as expected.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

@rkoeninger you hit enter to complete the expression or you mean enter after the expression has been submitted to the repl?

It has to be: (read-byte *STANDARD-INPUT* nil -1)<enter><enter>

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Btw are you using any kind of alternative terminal program?

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

If it turns out the output is correct, here is another thing to try:

(define toplineread_loop
  -1 [] -> (exit 0)
  Byte _ -> (error "line read aborted")  where (= Byte (hat))
  Byte Bytes -> (let Line (compile (/. X (<st_input> X)) Bytes (/. E nextline))
                     It (record-it Bytes)
                     _ (output "~%Line: ~R~%Bytes: ~R~%" Line Bytes) \\ ++LINE ADDED++
                  (if (or (= Line nextline) (empty? Line))
                      (toplineread_loop (read-byte (stinput))
                                        (append Bytes [Byte]))
                      (@p Line Bytes)))
      where (element? Byte [(newline) (carriage-return)])
  Byte Bytes -> (toplineread_loop (read-byte (stinput))
                                  (if (= Byte -1)
                                      Bytes
                                      (append Bytes [Byte]))))

If the reason is not that the newline character is not recognized correctly, then it has to do with the parser not being able to handle it (probably because of some extraneous byte that gets inserted in the input).

@rkoeninger
Copy link
Member

I've been using ConEmu. The (READ-BYTE *STANDARD-INPUT* NIL -1) returns 10 for me on the standard Windows cmd terminal.

But the Shen REPL still doesn't handle newlines even when run from cmd.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Ok, good, just eliminating possibilities.

@rkoeninger
Copy link
Member

I wrote your above code in the kl because I didn't want to re-build the shen:

(defun shen.toplineread_loop (V3817 V3818)
    (cond
        ((and (= -1 V3817) (= () V3818))
         (exit 0))
        ((= V3817 (shen.hat))
         (simple-error "line read aborted"))
        ((element? V3817 (cons (shen.newline) (cons (shen.carriage-return) ())))
         (let Line (compile (lambda X (shen.<st_input> X)) V3818 (lambda E shen.nextline))
         (let It (shen.record-it V3818)
 \\ ADDED LINE BELOW
         (let V1234234 (do (print "Line: ") (do (print Line) (do (print "Bytes: ") (print V3818))))
         (if (or (= Line shen.nextline) (empty? Line))
           (shen.toplineread_loop (read-byte (stinput)) (append V3818 (cons V3817 ())))
           (@p Line V3818))))))
        (true
         (shen.toplineread_loop
           (read-byte (stinput))
           (if (= V3817 -1) V3818 (append V3818 (cons V3817 ())))))))

I start the shen repl, type (+ 1 2) and hit <enter>, it spits out "Line: "shen.nextline"Bytes: "[40 0 43 0 32 0 49 0 32 0 50 0 41 0] and doesn't print the sum or do anything else, just like before.

Using cmd just in case.

@rkoeninger
Copy link
Member

In the CLisp port, the above version of shen.toplineread_loop spits out "Line: "[[+ 1 2]]"Bytes: "[40 43 32 49 32 50 41].

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Well, there is the problem. It was not that the bytes had the value multiplied by 10, it is that between reads a NULL byte is inserted. This is what causes the parser to fail.

iirc you were running SBCL 1.3.15, I just checked the release notes of 1.3.16 and I don't see anything related to I/O bug fixes.

In my own install this:

(list (read-byte *stinput* nil -1) (read-byte *stinput* nil -1))<enter><enter><enter>

results in:

(10 10)

No weird bytes added in between.

Not sure what causes it, but we are getting closer.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

I'm now thinking that your SBCL install may be using a multibyte STDIN, could that be it? Trying to find if there is a way to control that in SBCL.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

@rkoeninger whats your output for this?

(stream-external-format *standard-input*)

mine is:

(:UTF-8 :REPLACEMENT #\REPLACEMENT_CHARACTER)

@rkoeninger
Copy link
Member

(:UCS-2LE :REPLACEMENT #\REPLACEMENT_CHARACTER). I figured it was 16-bit.

Can I insert something like (setf *stream-external-format* *standard-input* :utf-8) somewhere?

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Do you launch Shen from the console?

What I found (on random threads by Python and Go programmers having encoding issues in the console) is that you can change the encoding to UTF-8 by doing:

chcp 65001

So doing so before launching Shen may help.

I don't know if the encoding of STDIN can be changed after the process has started (or how it can be done in SBCL)

@rkoeninger
Copy link
Member

I don't think chcp 65001 helped.

I'm not a CommonLisp person, but I'm imaging wrapping the stdin stream in another stream that translates to a particular encoding.

Or having a different version of read-byte that reads two bytes, ignoring the second.

Does this make sense?

(IF (FIND :WIN32 *FEATURES*)) ; or identify a 16-bit encoding
  (DEFUN read-byte (S) (LET ((B (READ-BYTE S NIL -1))) (READ-BYTE S NIL -1) B)))
  (DEFUN read-byte (S) (READ-BYTE S NIL -1)))

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Thats not going to work, because it only "fixes" the problem for the console input, breaking it for everything else (including STDIN when it is not the terminal). It may also work in your environment but cause problems for other users.

Btw, the chcp command would be for launching SBCL, not Shen. I'm guessing here, but I think whatever SBCL decides is the port encoding at first, will be inherited by the Shen binary, and not change between runs, thats why the binary provided by Mark works fine for you, while the one you are compiling yourself doesn't (the port code didn't change anything related IO, the only relevant thing changing between a binary and the other is the environment in which it was compiled).

So, somehow, the SBCL input encoding has to change before generating the Shen binary, not after.

@rkoeninger
Copy link
Member

I ran chcp before build, then tried running shen.exe.

There's no way to say something like (setq *stinput* (with-format :ascii *standard-input*))?

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

I don't know, I tried to find a way to do it and the best thing I found was the chcp command that fixed the issue to other people for what they were doing. If I have time during the weekend I will investigate it more.

It is also not entirely clear to me how the terminal decides which encoding to send to the program, the older Shen.exe, and your CLisp compiled one do not have this problem, their STDIN is in a single-byte encoding. And what is more important, the old Shen.exe uses almost the same code, what is different is the environment in which it was compiled (that is, whatever version of Windows and SBCL Mark used at the time), there was no code to set the encoding of STDIN.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

@rkoeninger can you try obtaining the SBCL version from the old Shen binary? Being that it was compiled without the changes that get rid of the SBCL flags, --version should still work on it (or maybe it uses /version on windows, no idea).

If we know the version, we can then see if something changed between SBCL versions regarding terminal I/O. It will also let you test the current code compiled with that older version of SBCL to see how it behaves.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Also, when running that Shen binary, whats the output of: (STREAM-EXTERNAL-FORMAT (value *stinput*))

@rkoeninger
Copy link
Member

rkoeninger commented Apr 13, 2017

the sbcl that comes in the 19.2 download is version 1.1.1. When I try to run it, I get:

Expression unexpectedly false: win32-os.c:465
 ... VirtualAlloc(addr, len, (mem_info.State == MEM_RESERVE)? MEM_COMMIT: MEM_RESERVE, PAGE_EXECUTE_READWRITE)
     ===> returned #X00000000,
     (in thread 00000000) ... Win32 thinks:
     ===> code 487, message => Attempt to access invalid address.

 ... CRT thinks:
     ===> code 0, message => No error
ensure_space: failed to validate 943718400 bytes at 0x22300000
(hint: Try "ulimit -a"; maybe you should increase memory limits.)

(STREAM-EXTERNAL-FORMAT (value *stinput*)) gives [UCS-2LE REPLACEMENT �funex1345�]. I embedded that in the kl.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Well, thats very interesting, it means that the encoding of STDIN is the same as the one compiled with a newer SBCL version (that throws a big part of my theory out of the window, what still stands is the difference in SBCL versions used to compile it). I will investigate later and see if I find what changed between 1.1.1 and 1.3.x.

@rkoeninger
Copy link
Member

rkoeninger commented Apr 13, 2017

Oh, sorry the [UCS-2LE...] output was from the version i've been trying to build. I guess that didn't tell us anything we didn't know earlier.

I don't know what the pre-built shen.exe would say.

Trying to run the shen.exe that comes with the standard download gives me the ugly error above.

...i thought it worked before? I'll have to try it on my other machine, or on something that doesn't have sbcl installed separately.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

I found this: https://bugs.launchpad.net/sbcl/+bug/1660906

It is a fix introduced 1.3.15 with the description bug fix: better console IO on Windows.

I don't think that is the issue, leaving it here just for the record.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

Aaaaand, in version 1.1.2: enhancement: Console I/O streams use UCS-2.

This is why the old binary works and the new one doesn't.

I guess this is the code that does that:

https://github.com/sbcl/sbcl/blob/202e00628820daf9a6d141daea222273cb252f6d/src/code/fd-stream.lisp#L2557-L2566

I don't know what the solution is yet but I'm starting to get a better idea of how to approach it.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

And the change: sbcl/sbcl@eac461c

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

@rkoeninger just pushed a new branch named fix-win-console-io, can you try that one?

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

If that doesn't work we have to figure out another way to make a separate stream for STDIN. I know binding it to the file descriptor with id 0 will work on unix-like systems, not sure if it will work on windows.

@rkoeninger
Copy link
Member

rkoeninger commented Apr 13, 2017

It built, and when I ran shen.exe:

couldn't read from #<SB-SYS:FD-STREAM for "descriptor 0" {10001F0FE3}>: Access is denied.

And then it crashed.

@tizoc
Copy link
Member Author

tizoc commented Apr 13, 2017

I found something very weird. If I type anything else other than <enter> as the first character, it breaks in the same way you just saw (but without crashing). If I instead start the interaction with an <enter> (without typing anything else before), then the REPL works fine.

@rkoeninger
Copy link
Member

For me, shen.exe fails before I have the opportunity to provide any input.

If I type (READ-BYTE (SB-SYS:MAKE-FD-STREAM 0 :INPUT T :ELEMENT-TYPE '(UNSIGNED-BYTE 1) :EXTERNAL-FORMAT :UTF-8)) NIL -1) on the sbcl repl, i get the Access Denied and another repl prompt.

@rkoeninger
Copy link
Member

rkoeninger commented Apr 13, 2017

This library seems to have the ability to wrap a stream in another stream with a different format:

http://weitz.de/flexi-streams/#make-flexi-stream

It's built on gray streams:

https://github.com/sbcl/sbcl/blob/master/src/pcl/gray-streams.lisp

@tizoc
Copy link
Member Author

tizoc commented Apr 14, 2017

@rkoeninger I made some changes, one that fixes the intermittent io errors I was having on OSX when using the new branch, another one that may fix the crash you were having on windows, give it another try when you get a chance.

And just in case, don't think I'm not considered your suggestions, it is just that I'm trying to solve this at the right level, wrapping the streams will cause different issues because doing so just masks the problem instead of solving it, and as a result we may end having different problems (like for example, what happens when STDIN is not the console, but a redirected file).

@rkoeninger
Copy link
Member

fix-win-console-io no longer builds on my windows machine:

; caught ERROR:
;   during macroexpansion of
;   (EXTERN-ALIEN "_get_osfhandle" (FUNCTION INT INT-PTR)). Use *BREAK-ON-SIGNALS*
;   to intercept.
;
;    unknown alien type: COMMON-LISP-USER::INT-PTR

@tizoc
Copy link
Member Author

tizoc commented Apr 14, 2017

@rkoeninger mmm.... it is supposed to be defined on windows, maybe it needs some prefix (one I don't know of).

Does it work if you change INT-PTR to (SIGNED 64) (this is assuming your windows is 64 bits, if not, use 32 instead).

@tizoc
Copy link
Member Author

tizoc commented Apr 14, 2017

In the SBCL REPL, does this work? (sb-win32:get-osfhandle 0)

Edited: the prefix is sb-win32 instead of sb!win32

@rkoeninger
Copy link
Member

(SIGNED 64) allows it to build, but when I run shen.exe, i get copies of bad thing to be a type specifier: "EXCEPTION_ACCESS_VIOLATION" flying down the screen.

(sb!win32:get-osfhandle 0) raises Package SB!WIN32 does not exist..

@tizoc
Copy link
Member Author

tizoc commented Apr 14, 2017

@rkoeninger I borrowed a machine with windows, I can now try stuff without bothering you (thanks for your patience btw). I explained the situation in SBCL's irc, but the answers are not very helpful (it is basically, "well, shen shouldn't be doing that"), I'm not even being told if what I'm trying to do is possible at all or plain nonsense.

@tizoc
Copy link
Member Author

tizoc commented Apr 14, 2017

@rkoeninger just pushed a new branch fix-win-console-io-again. I don't like that fix at all, but it seems to work. Ctrl+C and Ctrl+D don't work like in other OSes, but I don't know if thats because of how the windows terminal works or something else (this is not something that older versions of Shen handled).

@rkoeninger
Copy link
Member

When I try to build the fix-win-console-io-again branch, I get:

Heap exhausted during garbage collection: 0 bytes available, 16 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB   LUB  !move  Alloc  Waste   Trig    WP  GCs Mem-age
   0:     0     0     0     0     0     0     0     0     0        0     0 10737418    0   0  0.0000
   1: 18423     0     0     0  9852     0     0     0     4 322732016 98320 10737418    0   0  1.0000
   2: 32459 32458     0     0 15639   118    12     0    48 516265008 453584  2000000 15604   0  0.0339
   3:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   4:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   5:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   6:     0     0     0     0  1107   354     0     0     0 47874048     0  2000000 1062   0  0.0000
   Total bytes allocated    = 1073189920
   Dynamic-space-size bytes = 1073741824
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = true
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 14784(tid 00000000011F71B0):
Heap exhausted, game over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> quit
Really quit? [y] y

@tizoc
Copy link
Member Author

tizoc commented Apr 15, 2017

Sorry, I messed something up when writing the code again on my machine (what I did in the windows machine worked, I just could not commit from there so I just rewrote the code here, but made a mistake).

Already pushed the fixed version.

@rkoeninger
Copy link
Member

With your latest change, it builds and the shen repl handles input as expected.

I can even do Ctrl + D and hit Enter and it exits the shen repl.

Also, the test suite passes just fine (~5.1s), so I guess file reads are still good.

Good job, man! Thanks for all your time and effort on this!

@tizoc
Copy link
Member Author

tizoc commented Apr 15, 2017

I can even do Ctrl + D and hit Enter and it exits the shen repl.

Oh, thats great. The "windows" machine I was testing on was a VirtualBox VM running on someone else's Linux computer, I guess thats why those keys were being weird when I tested.

Btw, don't merge anything yet, I'm thinking of a cleaner solution that doesn't have to special-case windows or check for *stinput*, but it requires a small change in the kernel. I have to go now, but once it is done I will let you know so that you can give it a try and tell me your opinion.

@tizoc
Copy link
Member Author

tizoc commented Apr 24, 2017

This should be fixed now.

@tizoc tizoc closed this as completed Apr 24, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants