
Setting codepage UTF-8 (65001) produces garbled double quotes in GHC output #824

Closed
Lokathor opened this issue Aug 20, 2015 · 21 comments

@Lokathor

When stack sets the code page in cmd.exe to try to make GHC's output display correctly, it actually makes things worse: the fancy curly quotes that GHC uses end up totally garbled.

Example of the garbled text:

D:\Dropbox\dev\fbmessageparse\src\Main.hs:11:21:
Not in scope: ΓÇÿhGetContentsΓÇÖ
Perhaps you meant ΓÇÿgetContentsΓÇÖ (imported from Prelude)

I'm using Windows 7 64-bit, which might be the problem, since the Windows builds of stack are done on Windows 8.1; but the code pages should be consistent enough across Windows versions.
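The garbled sequences are consistent with the UTF-8 bytes of GHC's quotes being decoded as code page 437 (typically the default OEM code page of a US-English cmd.exe) somewhere along the way: ‘ (U+2018) is E2 80 98 in UTF-8, and CP437 maps those three bytes to Γ, Ç and ÿ. The minimal sketch below reproduces that pattern under this assumption; `utf8Bytes`, `cp437` and `mojibake` are hypothetical helpers written for illustration, and the CP437 table covers only the bytes needed here.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Encode one Char as UTF-8 bytes (hand-rolled to stay dependency-free).
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = map fromIntegral [0xC0 .|. (n `shiftR` 6), cont 0]
  | n < 0x10000 = map fromIntegral [0xE0 .|. (n `shiftR` 12), cont 6, cont 0]
  | otherwise   = map fromIntegral [0xF0 .|. (n `shiftR` 18), cont 12, cont 6, cont 0]
  where
    n = ord c
    -- continuation byte carrying the six bits starting at shift k
    cont k = 0x80 .|. ((n `shiftR` k) .&. 0x3F)

-- Decode a single byte as code page 437. Only ASCII and the bytes that
-- occur in GHC's curly quotes are mapped; anything else becomes '?'.
cp437 :: Word8 -> Char
cp437 b
  | b < 0x80  = chr (fromIntegral b)
  | otherwise = maybe '?' id (lookup b table)
  where
    table = [(0x80, '\x00C7'), (0x98, '\x00FF'), (0x99, '\x00D6'), (0xE2, '\x0393')]

-- Write text out as UTF-8, then read the bytes back as CP437.
mojibake :: String -> String
mojibake = map cp437 . concatMap utf8Bytes
```

For example, `mojibake "‘foo’"` yields `"ΓÇÿfooΓÇÖ"`, the same shape as the error message above.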

@snoyberg
Contributor

Please see #793 and https://ghc.haskell.org/trac/ghc/ticket/10762. The reason for this change wasn't display, but to work around a GHC bug. I've already gotten the necessary changes merged upstream in GHC.

@borsboom
Contributor

@Lokathor I don't have a Windows 7 machine to test on (Windows 10 has never exhibited this problem for me), but I just pushed a commit that may help. Can you try it?

@Lokathor
Author

Well, I checked out stack and tried to build it from the repo using my current version. Things seemed good for a while, until it was time to build the stack package itself:

stack-0.1.3.1: configure
Configuring stack-0.1.3.1...
ghc: warning: _tzset from msvcrt is linked instead of __imp__tzset
stack-0.1.3.1: build
Preprocessing library stack-0.1.3.1...

D:\Dropbox\dev\stack\src\Stack\Docker.hs:66:18:
Could not find module ΓÇÿSystem.Posix.SignalsΓÇÖ
Perhaps you meant System.Posix.Internals (from base-4.8.1.0)
Use -v to see a list of the files searched for.

D:\Dropbox\dev\stack\src\Stack\Exec.hs:25:18:
Could not find module ΓÇÿSystem.Posix.ProcessΓÇÖ
Perhaps you meant
System.Posix.Types (from base-4.8.1.0)
System.Win32.Process (needs flag -package-key Win32-2.3.1.0@Win32_JH0ECVJdFmmG0JOvttvGqi)
Use -v to see a list of the files searched for.
ghc: warning: _tzset from msvcrt is linked instead of __imp__tzset
Completed all 125 actions.

-- While building package stack-0.1.3.1 using:
C:\Users\Daniel\AppData\Local\Programs\stack\x86_64-windows\ghc-7.10.2\bin\runhaskell.exe -package=Cabal-1.22.4.0 -clear-package-db -global-package-db -package-db=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\pkgdb\ C:\Users\Daniel\AppData\Local\Temp\stack7928\Setup.hs --builddir=.stack-work\dist\x86_64-windows\Cabal-1.22.4.0\ build exe:stack --ghc-options -hpcdir .stack-work\dist\x86_64-windows\Cabal-1.22.4.0\hpc.hpc\ -ddump-hi -ddump-to-file
Process exited with code: ExitFailure 1

@borsboom
Contributor

@Lokathor I've fixed the build error, but at this point I very much doubt my fix will actually work on Windows.

@snoyberg
Contributor

I think the change I've just pushed will solve the problem, can you give it a shot?

@borsboom
Contributor

I'm curious if the fix works for symbols that contain Unicode. My hunch is that it won't (they'll still be garbage, since the encoding of stdout/stderr is UTF-8 on Windows at this point, so transliteration won't be performed when writing to them). Not sure if it's worth worrying about, since hopefully messing with the codepage/encoding won't be necessary anymore once the next version of GHC is released (although I suppose we might want to keep it enabled for older GHCs, maybe?)
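For context, GHC's `mkTextEncoding` accepts a `//TRANSLIT` suffix, which replaces characters the target encoding cannot represent (with `?` or an approximation, depending on the encoding) instead of throwing an error. A minimal sketch of switching a handle to the transliterating variant of its current encoding follows; `translitName` and `useTranslit` are hypothetical names, not taken from stack's source. It also illustrates the point above: if the handle's encoding is UTF-8, every character is encodable, so `UTF-8//TRANSLIT` never has anything to replace.

```haskell
import System.IO

-- Turn an encoding name such as "CP437" or "UTF-8//IGNORE" into its
-- transliterating form, e.g. "CP437//TRANSLIT".
translitName :: String -> String
translitName name = takeWhile (/= '/') name ++ "//TRANSLIT"

-- Switch a text-mode handle to the transliterating variant of the
-- encoding it currently uses. Binary-mode handles are left untouched.
useTranslit :: Handle -> IO ()
useTranslit h = do
  menc <- hGetEncoding h
  case menc of
    Nothing  -> return ()
    Just enc -> do
      -- 'show' on a TextEncoding yields its name, e.g. "UTF-8"
      enc' <- mkTextEncoding (translitName (show enc))
      hSetEncoding h enc'
```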

@Lokathor
Author

Did the fix add a new dependency? When I tried again just now, it couldn't even get to the part where stack is built, because a package before it failed:

D:\Dropbox\dev\stack>stack build
Setting codepage to UTF-8 (65001) to ensure correct output from GHC
Didn't see ignore-0.1.0.0 in your package indices. Updating and trying again.
Updating package index Hackage (mirrored at https://github.com/commercialhaskell

Fetched package index.
Populated index cache.
pcre-light-0.4.0.3: download
Glob-0.7.5: download
system-filepath-0.4.13.4: download
pcre-light-0.4.0.3: configure
system-filepath-0.4.13.4: configure
system-filepath-0.4.13.4: build
Glob-0.7.5: configure
Glob-0.7.5: build
Glob-0.7.5: install
system-filepath-0.4.13.4: install
Progress: 3/
-- While building package pcre-light-0.4.0.3 using:
C:\Users\Daniel\AppData\Local\Programs\stack\x86_64-windows\ghc-7.10.2\bin\runhaskell.exe -package=Cabal-1.22.4.0 -clear-package-db -global-package-db -package-db=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\pkgdb\ C:\Users\Daniel\AppData\Local\Temp\stack1268\pcre-light-0.4.0.3\Setup.lhs --builddir=.stack-work\dist\x86_64-windows\Cabal-1.22.4.0\ configure --user --package-db=clear --package-db=global --package-db=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\pkgdb\ --dependency=base=base-4.8.1.0-5e8cb96faebe2db97f24c6e11c6070d6 --dependency=bytestring=bytestring-0.10.6.0-e962539fa73878c53cfd606fc18d1ab5 --libdir=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\lib --bindir=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\bin --datadir=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\share --docdir=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\doc\pcre-light-0.4.0.3 --htmldir=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\doc\pcre-light-0.4.0.3 --haddockdir=C:\Users\Daniel\AppData\Roaming\stack\snapshots\x86_64-windows\lts-3.0\7.10.2\doc\pcre-light-0.4.0.3
Process exited with code: ExitFailure 1
Logs have been written to: D:\Dropbox\dev\stack\.stack-work\logs\pcre-light-0.4.0.3.log

Configuring pcre-light-0.4.0.3...
Setup.lhs: Missing dependency on a foreign library:
* Missing C library: pcre
This problem can usually be solved by installing the system package that
provides this library (you may need the "-dev" version). If the library is
already installed but in a non-standard location then you can use the flags
--extra-include-dirs= and --extra-lib-dirs= to specify where it is.
ghc: warning: _tzset from msvcrt is linked instead of __imp__tzset


@borsboom
Contributor

That was added by #831 and we're planning to remove it.

@snoyberg
Contributor

Perhaps on windows we should change the code page but not the character encoding. That will make the dump files and logs utf8, but use proper replacement characters.

The error you're running into with install is due to #831, see the discussion there.
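The proposal separates two things: the console code page, which is OS-level state that child processes such as GHC share, and the Handle encoding, which exists only inside stack's own process. A minimal sketch of changing only the first follows. It assumes the `Win32` package's `System.Win32.Console` bindings (`getConsoleCP`, `getConsoleOutputCP`, `setConsoleCP`, `setConsoleOutputCP`); `withUtf8CodePage` is a hypothetical name and this is not stack's actual implementation.

```haskell
{-# LANGUAGE CPP #-}
#if defined(mingw32_HOST_OS)
import Control.Exception (bracket)
import System.Win32.Console (getConsoleCP, getConsoleOutputCP, setConsoleCP, setConsoleOutputCP)
#endif

-- Run an action with the console input and output code pages set to
-- UTF-8 (65001), restoring the previous values afterwards, even if the
-- action throws. Handle encodings in this process are left alone.
-- On non-Windows systems this simply runs the action.
withUtf8CodePage :: IO a -> IO a
#if defined(mingw32_HOST_OS)
withUtf8CodePage action =
  bracket
    (do old <- (,) <$> getConsoleCP <*> getConsoleOutputCP
        setConsoleCP 65001
        setConsoleOutputCP 65001
        return old)
    (\(inCP, outCP) -> setConsoleCP inCP >> setConsoleOutputCP outCP)
    (const action)
#else
withUtf8CodePage action = action
#endif
```

A GHC child process spawned inside `withUtf8CodePage` would then see code page 65001, while stack's own `stdout`/`stderr` keep whatever encoding they already had.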

@borsboom
Contributor

> Perhaps on windows we should change the code page but not the character encoding. That will make the dump files and logs utf8, but use proper replacement characters.

I'm pretty sure we're talking about the same thing, but just to be entirely clear, do you mean these two hSetEncoding lines?

Just want to understand the implications. First: is a handle's character encoding inherited by sub-processes? I don't think so, since handles don't have an encoding at the OS level (it's just part of GHC's I/O subsystem). So changing these would make no difference at all to what the ghc process writes to stdout/stderr? And the ghc process will write UTF-8 regardless, because we've set the code page to? If that's all correct, then I definitely agree with your suggestion (our code will then read the UTF-8 from GHC, decode it to Text, and then it'll be transliterated appropriately when we re-write it to stderr).
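The reading side of the flow described above (the child writes bytes, and the parent alone decides how to decode them) could look roughly like the following. `readProcessUtf8` is a hypothetical helper built on the `process` package's `createProcess`; it decodes to `String` rather than `Text` only to keep the sketch dependency-free.

```haskell
import Control.Exception (evaluate)
import System.IO
import System.Process

-- Run a command and decode its stdout as UTF-8. Only bytes cross the
-- pipe: the child never sees this process's Handle encoding, so the
-- decoding is chosen here, independent of our locale or console code page.
readProcessUtf8 :: FilePath -> [String] -> IO String
readProcessUtf8 cmd args = do
  (_, Just hout, _, ph) <- createProcess (proc cmd args) { std_out = CreatePipe }
  hSetEncoding hout utf8
  out <- hGetContents hout
  _ <- evaluate (length out)  -- force the whole lazy read before waiting
  _ <- waitForProcess ph
  return out
```

Writing the resulting string to a `stderr` whose encoding has a `//TRANSLIT` suffix would then replace anything the console cannot show, rather than failing.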

@snoyberg
Contributor

Bingo

On Sun, Aug 23, 2015, 8:53 PM Emanuel Borsboom notifications@github.com
wrote:

Perhaps on windows we should change the code page but not the character
encoding. That will make the dump files and logs utf8, but use proper
replacement characters.

I'm pretty sure we're talking about the same thing, but just to be
entirely clear I mean these two hSetEncoding lines

hSetEncoding stdout utf8

.

Just want to understand the implications. First: is a handle's character
encoding inherited by sub-processes? I don't think so, since handles don't
have an encoding at the OS level (it's just part of GHC I/O subsystem). So
changing these would make no difference at all in terms of what the ghc
process writes to stdout/stderr? And the ghc process will write UTF-8
regardless, because we've set the code page to? If that's all correct, then
for sure I agree with your suggestion (our code will then read the UTF-8
from GHC, decode it to Text, and then it'll be transliterated appropriately
when we re-write it to stderr).



@borsboom
Contributor

Ok, that change is made.

@borsboom
Contributor

@Lokathor The pcre dependency introduced in #831 has now been removed, so you should be able to build.

@Lokathor
Author

I was able to build the current revision and it causes GHC's special quotes to display properly.

example:

Old stack:
D:\Dropbox\dev\stack\.stack-work\install\x86_64-windows\lts-3.0\7.10.2\bin>stack exec ghc -- --make Main.hs
Setting codepage to UTF-8 (65001) to ensure correct output from GHC
[1 of 1] Compiling Main ( Main.hs, Main.o )

Main.hs:5:8: Not in scope: ΓÇÿfooΓÇÖ

Newly built stack:
D:\Dropbox\dev\stack\.stack-work\install\x86_64-windows\lts-3.0\7.10.2\bin>stack2.exe exec ghc -- --make Main.hs
Caching build plan
[1 of 1] Compiling Main ( Main.hs, Main.o )

Main.hs:5:8: Not in scope: `foo'
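With the new build, `foo` is shown as `foo' rather than the garbled form of ‘foo’, i.e. with an ASCII rendering of GHC's curly quotes. The substitution itself is simple; a hypothetical helper applying it to a line of output is sketched below. It only illustrates the mapping and says nothing about how GHC or stack decide when to use it.

```haskell
-- Replace GHC's Unicode quotes with the ASCII pair seen in the
-- "Newly built stack" output above; other characters pass through.
asciiQuotes :: String -> String
asciiQuotes = map swap
  where
    swap '\x2018' = '`'
    swap '\x2019' = '\''
    swap c        = c
```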

@borsboom
Contributor

Thanks for confirming.

@dtaskoff

I got this yesterday, both with the official stack release (1.5.1) and after trying stack upgrade --git:

Setting codepage to UTF-8 (65001) to ensure correct output from GHC
Cabal file warning in ...cabal: Ignoring unknown section type: custom-setup
Cabal file warning in ...cabal: Ignoring unknown section type: custom-setup
Cabal file warning in ...cabal: Ignoring unknown section type: custom-setup
Cabal file warning in ...cabal: Ignoring unknown section type: custom-setup
Invalid package ID: "base-4.9.1.0 bytestring-0.10.8.1"

@snoyberg
Contributor

snoyberg commented Aug 29, 2017 via email

@dtaskoff

dtaskoff commented Aug 29, 2017

Okay, I'll first try the workaround mentioned here and if it doesn't work, I'll open a new issue.

@Ciantic

Ciantic commented Jun 20, 2018

I haven't done anything fancy; I just installed a new stack on a new computer.

Some examples below:

stack build
WARNING: caWcARhNeIN Gi:s  coaucth oef  idsa toeu:t  oCf: /dUasteer:s /Cj:a/rUispe/rAsp/pjDaartiap//LAopcpaDla/tPar/oLgorcaamls//Psrtoagcrka/mxs8/6s_t6a4c-kw/ixn8d6o_w6s4/-gwhicn-d8o.w0s./2g\hlci-b8\.p0a.c2k\algieb.\cpoancfk.adg\ep.accoknafg.ed.\cpaacchkea
ggeh.cc awcihlel
 gshece  wainl lo lsde ev iaenw  oolfd  tvhiiesw  poafc ktahgies  dpba.c kUasgee  'dgbh.c- pUksge  r'egchacc-hpek'g  troe cfaicxh.e
' to fix.
WARNING: cache is out of date: C:/Users/jarip/AppData/Local/Programs/stack/x86_64-windows/ghc-8.0.2\lib\package.conf.d\package.cache
ghc will see an old view of this package db. Use 'ghc-pkg recache' to fix

It mutates the text on each run:

WARNING: cache is out of date: WCA:R/NUIsNeGr:s /cjaacrhiep /iAsp poDuta toaf/ Ldoactael:/ PCr:o/gUrsaemrss//sjtaaricpk//Axp8p6D_a6t4a-/wLioncdaolw/sP/grhco-g8.r0ams./2s\tlaicbk/x\8p6ackag_e6.4c-ownifn.ddo\wpsa/cgkhacg-e8..c0a.c2h\el
igbh\cp awciklalg es.eceo nafn. do\lpda cvkiaegwe .ocfa cthhei
sg hpca cwkialgle  sdebe.  aUns eo l'dg hvci-epwk go fr etchaicsh ep'a ctkoa gfei xd.b

I also tried reinstalling stack and removing C:\sr, but it just keeps talking in languages I don't comprehend.

@snoyberg
Contributor

This also looks unrelated to the issue at hand. That said, I think this is something that's fixed on master. Can you try stack upgrade --git and see if the problem persists? If so, please open a new issue.

@Kenzku

Kenzku commented Apr 4, 2022

Hi,

I tried Haskell today and saw something like this as well: the output looks fine if I copy it over to a text editor, but it looks very different in the Mac terminal (see the attachment).

ghci> :info Bool
type Bool :: *
data Bool = False | True
-- Defined in ‘ghc-prim-0.7.0:GHC.Types’
instance Eq Bool -- Defined in ‘ghc-prim-0.7.0:GHC.Classes’
instance Ord Bool -- Defined in ‘ghc-prim-0.7.0:GHC.Classes’
instance Enum Bool -- Defined in ‘GHC.Enum’
instance Show Bool -- Defined in ‘GHC.Show’
instance Read Bool -- Defined in ‘GHC.Read’
instance Bounded Bool -- Defined in ‘GHC.Enum’

[Screenshot 2022-04-04 at 16 16 45]

Any thoughts?
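One way to narrow down a report like this is to check which encodings GHC's runtime picked for the standard handles, since on macOS and Linux these follow the locale settings. A hypothetical diagnostic sketch using only `System.IO` (`localeEncoding` and `hGetEncoding`):

```haskell
import System.IO

-- Describe the encoding of each standard handle plus the locale default.
-- "binary" means the handle has no text encoding set.
encodingReport :: IO [(String, String)]
encodingReport = do
  hs <- mapM describe [("stdin", stdin), ("stdout", stdout), ("stderr", stderr)]
  return (("locale", show localeEncoding) : hs)
  where
    describe (name, h) = do
      menc <- hGetEncoding h
      return (name, maybe "binary" show menc)
```

In GHCi, `encodingReport >>= mapM_ print` prints the values. If they all report UTF-8 but the terminal still shows odd glyphs, the cause is more likely in the terminal's own encoding or font settings than in GHC's output.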
