Use GHC_CHARENC=UTF-8 when calling GHC #5742

Closed
nad opened this issue Jan 15, 2022 · 3 comments
Labels: backend: ghc (Haskell code generation backend, "MAlonzo"); type: enhancement (issues and pull requests about possible improvements); unicode

Comments

nad commented Jan 15, 2022

We use hGetContents in callCompiler':

(_, out, err, p) <-
  liftIO $ createProcess
             (proc cmd args) { std_err = CreatePipe
                             , std_out = CreatePipe
                             }

-- In -v0 mode we throw away any progress information printed to
-- stdout.
case out of
  Nothing  -> __IMPOSSIBLE__
  Just out -> forkTCM $ do
    -- The handle should be in text mode.
    liftIO $ hSetBinaryMode out False
    progressInfo <- liftIO $ hGetContents out
    mapM_ (reportSLn "compile.output" 1) $ lines progressInfo

errors <- liftIO $ case err of
  Nothing  -> __IMPOSSIBLE__
  Just err -> do
    -- The handle should be in text mode.
    hSetBinaryMode err False
    hGetContents err

The handles out and err use the default (locale) encoding. That is perhaps appropriate, if we know nothing about the encoding used by the process that we call. We only use callCompiler' to call GHC. Does GHC use the locale encoding when writing to stdout or stderr?

nad added the labels status: info-needed and unicode on Jan 15, 2022
nad added this to the 2.6.3 milestone on Jan 15, 2022

nad commented Jan 15, 2022

If the environment variable GHC_CHARENC is set to UTF-8, then GHC appears to use UTF-8 for stdout and stderr:

https://github.com/ghc/ghc/blob/0dc723957d0fdb5909f145405b775efea0fe2f6e/ghc/Main.hs#L113-L118

https://github.com/ghc/ghc/blob/0dc723957d0fdb5909f145405b775efea0fe2f6e/libraries/ghc-boot/GHC/HandleEncoding.hs#L9-L17

The variable GHC_CHARENC does not seem to be mentioned in the GHC documentation, but I still think we should make use of it. It seems to be available from at least GHC 8.
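A minimal sketch of what this could look like, assuming the behaviour described above (GHC switching stdout and stderr to UTF-8 when GHC_CHARENC is UTF-8). The names withUtf8CharEnc and runGhcUtf8 are hypothetical and are not part of Agda; the sketch uses plain IO and forkIO rather than TCM and forkTCM.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import System.Environment (getEnvironment)
import System.Exit (ExitCode)
import System.IO (hGetContents, hSetEncoding, utf8)
import System.Process
  ( CreateProcess (..), StdStream (CreatePipe)
  , createProcess, proc, waitForProcess )

-- | Force GHC_CHARENC=UTF-8 in an environment, replacing any existing entry.
withUtf8CharEnc :: [(String, String)] -> [(String, String)]
withUtf8CharEnc vars =
  ("GHC_CHARENC", "UTF-8") : filter ((/= "GHC_CHARENC") . fst) vars

-- | Hypothetical stand-in for callCompiler': run a command with
-- GHC_CHARENC=UTF-8 and decode both pipes as UTF-8.
runGhcUtf8 :: FilePath -> [String] -> IO (ExitCode, String, String)
runGhcUtf8 cmd args = do
  parentEnv <- getEnvironment
  (_, mout, merr, p) <- createProcess (proc cmd args)
    { std_out = CreatePipe
    , std_err = CreatePipe
    , env     = Just (withUtf8CharEnc parentEnv)
    }
  case (mout, merr) of
    (Just out, Just err) -> do
      -- Match the encoding that the child is assumed to use.
      hSetEncoding out utf8
      hSetEncoding err utf8
      -- Drain stdout in a separate thread so that a full stderr pipe
      -- cannot block the child while we are still reading stdout.
      outVar <- newEmptyMVar
      _ <- forkIO $ do
        s <- hGetContents out
        _ <- evaluate (length s)  -- force the lazy read to completion
        putMVar outVar s
      errors <- hGetContents err
      _ <- evaluate (length errors)
      output <- takeMVar outVar
      code <- waitForProcess p
      return (code, output, errors)
    _ -> ioError (userError "runGhcUtf8: expected pipes for stdout and stderr")
```

Setting the variable only in the child's environment leaves the encoding of the calling process untouched.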

nad changed the title from "Should callCompiler' use a different encoding?" to "Use GHC_CHARENC=UTF-8 when calling GHC" on Jan 15, 2022
nad added the labels backend: ghc and type: enhancement, and removed the label status: info-needed, on Jan 15, 2022

nad commented Jan 15, 2022

Perhaps callCompiler or callCompiler' is used by other packages. To avoid changing their behaviour, I suggest that we add an optional encoding argument to these procedures.
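One possible shape for such an optional argument, as a sketch only: the helpers prepareHandle and readAllWith are hypothetical names, and Nothing is assumed to mean "keep the locale encoding", so existing callers would see no change.

```haskell
import System.IO

-- | Hypothetical helper: put a handle in text mode, then apply an
-- optional encoding. 'Nothing' keeps the default (locale) encoding.
prepareHandle :: Maybe TextEncoding -> Handle -> IO ()
prepareHandle menc h = do
  -- Text mode is set first: hSetBinaryMode h False resets the handle
  -- to the locale encoding, so the override has to come afterwards.
  hSetBinaryMode h False
  mapM_ (hSetEncoding h) menc

-- | Read everything from a handle using the optional encoding.
readAllWith :: Maybe TextEncoding -> Handle -> IO String
readAllWith menc h = prepareHandle menc h >> hGetContents h
```

A caller that sets GHC_CHARENC=UTF-8 for the child would then use readAllWith (Just utf8) on out and err, while readAllWith Nothing reproduces the current behaviour.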

@andreasabel

> The variable GHC_CHARENC does not seem to be mentioned in the GHC documentation,

I reported this upstream: https://gitlab.haskell.org/ghc/ghc/-/issues/20963
