.. author:: Alp Mestanogullari
.. date-accepted::
.. ticket-url::
.. implemented::
Define and use a hierarchy of data types for all the errors, warnings and suggestions that GHC emits. Only render the error values in GHC-the-program, at the very last minute, to allow GHC API users to grab those error values instead of the current error documents. This new representation would be a lot easier to work with for IDE tooling developers and GHC API users in general: pattern matching on the new data types would replace parsing of the error messages.
Until now, the errors that GHC emits have been represented as mere textual documents. They often involve fancy layout constructs, but GHC's code never manipulates errors, warnings or suggestions as values of some algebraic data types that are later rendered in the user's terminal: errors only ever exist as documents.
This means that developers of IDE-style tooling (e.g. Haskell IDE Engine) have to parse the error messages and warnings to implement some of their features. This is not ideal, and is made worse by the possibility that some error messages might change in their contents or formatting from one GHC version to another. It would be a lot simpler for those developers to put their hands on good old Haskell values that describe the errors, warnings and suggestions that GHC reports for a given Haskell module. The IDE tools and GHC API programs in general would then be able to inspect those values and extract any relevant information without having to parse any text, by simply traversing ADTs (e.g. collect suggestions and offer a feature that applies them all with a simple command or keystroke). The ``ghc`` library would also come with code for rendering those errors, making it easy for GHC API consumers to reuse all of GHC's error printing infrastructure.
Note: the textual rendering of the errors, warnings and suggestions should remain identical. This proposal is really about GHC's "internal" representation of errors (it is exposed to GHC API users, so not entirely internal).
The current representation of errors and warnings in GHC is based on the following data types::

    data ErrMsg = ErrMsg {
        errMsgSpan        :: SrcSpan,
        errMsgContext     :: PrintUnqualified,
        errMsgDoc         :: ErrDoc,
        errMsgShortString :: String,
        errMsgSeverity    :: Severity,
        errMsgReason      :: WarnReason
        }

    data ErrDoc = ErrDoc {
        errDocImportant     :: [MsgDoc],
        errDocContext       :: [MsgDoc],
        errDocSupplementary :: [MsgDoc]
        }

    type MsgDoc = SDoc

    -- for completeness: warnings are represented with the same type
    -- as errors.
    type WarnMsg = ErrMsg

where ``SDoc`` is a type from GHC's pretty printing infrastructure that represents configurable textual documents.
GHC then maintains a bag of ``ErrMsg`` and a bag of ``WarnMsg`` as compilation proceeds, and reports them when appropriate::

    data TcLclEnv = TcLclEnv
      { ...
      , tcl_errs :: TcRef Messages  -- Place to accumulate errors
      , ...
      }

    type Messages        = (WarningMessages, ErrorMessages)
    type WarningMessages = Bag WarnMsg
    type ErrorMessages   = Bag ErrMsg

We propose to replace ``ErrDoc`` with several algebraic data types, each representing the errors/warnings that might arise from a given GHC subsystem. For example (simplified)::

    data RenamerError
      = NotInScope OccName [Name]  -- unknown name, suggestions
      | ...

    data TypecheckerError
      = OccursCheck Type Type
      | ...

    ...

We could even split error types further if necessary, making it a slightly more elaborate/deep hierarchy. The exact shape of the said hierarchy has yet to be determined, as it will be best informed by staring at the error generation code that GHC has today.
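As an illustration only, the sketch below models two such subsystem error types in a standalone Haskell module. ``OccName``, ``Name`` and ``Type`` are simplified stand-ins (plain wrappers around strings), ``DuplicateBinding`` and ``renamerSuggestions`` are hypothetical, and none of this is GHC's actual code. It shows how a tool could read the suggestions out of a ``NotInScope`` value by pattern matching, rather than by parsing the rendered message.

```haskell
-- Simplified stand-ins for GHC's own types (hypothetical, for illustration).
newtype OccName = OccName String deriving (Eq, Show)
newtype Name = Name String deriving (Eq, Show)
data Type = TyVar String | TyCon String [Type] deriving (Eq, Show)

-- One error type per subsystem; constructors carry structured data.
data RenamerError
  = NotInScope OccName [Name] -- unknown name, suggested alternatives
  | DuplicateBinding OccName  -- hypothetical extra constructor
  deriving (Eq, Show)

data TypecheckerError
  = OccursCheck Type Type -- e.g. a ~ [a], which would need an infinite type
  deriving (Eq, Show)

-- A tool reads the suggestions directly instead of parsing message text.
renamerSuggestions :: RenamerError -> [Name]
renamerSuggestions (NotInScope _ suggestions) = suggestions
renamerSuggestions (DuplicateBinding _) = []
```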
We would also provide a toplevel error sum type, ``GHCError``, but only in the driver::

    data GHCError
      = PsError ParseError
      | RnError RenamerError
      | TcError TypecheckerError
      | ...

Each subsystem would stick to its dedicated error type, and the driver could then easily wrap those up appropriately to get a ``GHCError``. This would allow us to store, and more generally treat uniformly, errors from different subsystems at the top of the call stack (therefore avoiding potential import cycles), while GHC API users would be free to handle the error values differently.
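A minimal sketch of the driver-side wrapping, under the assumption that each phase is a pure function returning ``Either`` (the real phases run in monads and accumulate many errors). The phase bodies, the error payloads and the name ``frontend`` are placeholders; ``first`` from ``Data.Bifunctor`` in ``base`` does the wrapping.

```haskell
import Data.Bifunctor (first)

-- Hypothetical, heavily simplified subsystem error types.
data ParseError = UnexpectedToken String deriving (Eq, Show)
data RenamerError = NotInScope String [String] deriving (Eq, Show)
data TypecheckerError = OccursCheck String String deriving (Eq, Show)

-- The top-level sum type, defined only in the driver.
data GHCError
  = PsError ParseError
  | RnError RenamerError
  | TcError TypecheckerError
  deriving (Eq, Show)

-- Each phase only knows about its own error type (placeholder logic).
parse :: String -> Either ParseError [String]
parse src
  | null src = Left (UnexpectedToken "<eof>")
  | otherwise = Right (words src)

rename :: [String] -> Either RenamerError [String]
rename = traverse check
  where
    check n
      | n `elem` ["x", "map"] = Right n
      | otherwise = Left (NotInScope n [])

-- The driver wraps each phase's error into GHCError as it chains them.
frontend :: String -> Either GHCError [String]
frontend src = do
  toks <- first PsError (parse src)
  first RnError (rename toks)
```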
We would quite likely want to define many helpful little functions to assist in the construction of error values for each subsystem, e.g. ``notInScopeError :: OccName -> Name -> RenamerError`` or ``occursCheckError :: Type -> Type -> TypecheckerError``, with some possibly requiring a suitable monadic context to gather extra information to stick in the error value. We may or may not want to move that logic to a dedicated place.
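The sketch below shows what such a helper might look like when it needs context. ``RnM`` is a hypothetical toy reader monad standing in for the renamer's monad and exposing only the names in scope, and this monadic ``notInScopeError`` is an illustrative variant rather than the signature given above. The similarity test is deliberately naive (same length, at most one differing character); it is not the fuzzy matching GHC actually uses for suggestions.

```haskell
newtype OccName = OccName String deriving (Eq, Show)
newtype Name = Name String deriving (Eq, Show)

data RenamerError = NotInScope OccName [Name] deriving (Eq, Show)

-- Toy reader monad (hypothetical): the environment is the list of
-- names currently in scope.
newtype RnM a = RnM {runRnM :: [Name] -> a}

instance Functor RnM where
  fmap f (RnM g) = RnM (f . g)

instance Applicative RnM where
  pure x = RnM (const x)
  RnM f <*> RnM g = RnM (\env -> f env (g env))

instance Monad RnM where
  RnM g >>= k = RnM (\env -> runRnM (k (g env)) env)

getInScope :: RnM [Name]
getInScope = RnM id

-- Hypothetical smart constructor: builds the error value and gathers
-- suggestions from the monadic context.
notInScopeError :: OccName -> RnM RenamerError
notInScopeError occ@(OccName o) = do
  inScope <- getInScope
  let close (Name n) =
        length n == length o
          && length (filter id (zipWith (/=) n o)) <= 1
  pure (NotInScope occ (filter close inScope))
```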
We could then have an ``IsError`` class, defined as follows::

    class IsError e where
      errorMessage :: e -> ErrDoc

coming with instances for all the subsystem-specific error types, as well as ``GHCError``, so that any value of one of those types can be used directly to generate an error document without having to resort to any kind of wrapping/unwrapping. This would all be equivalent to the entire ``ErrDoc`` building code that GHC has right now. Most of that code would in fact simply move without undergoing any serious change.
Error consumers (the GHC program, GHC API users) could be presented with ``ParseError``, ``RenamerError``, ``TypecheckerError`` or ``GHCError`` values, depending on the parts of the API that are used. They would be free to just call ``errorMessage`` on them to generate error message documents, or to do something more interesting with some or all of the error values, using good old pattern matching to provide a specific interpretation for the errors of interest.
For error producers, the main change is that the different subsystems will define error types and helper functions to build error values, along with the (existing) error document building code.
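To illustrate how ``IsError`` could serve both producers and consumers, here is a standalone sketch. ``ErrDoc`` is reduced to three lists of ``String`` lines instead of ``SDoc`` documents, the message texts are approximations, and ``renderErrDoc`` and ``describe`` are hypothetical names. The ``GHCError`` instance simply delegates, and the consumer special-cases one error while falling back to the generic rendering for everything else.

```haskell
-- Simplified stand-in for GHC's ErrDoc: plain lines instead of SDocs.
data ErrDoc = ErrDoc
  { errDocImportant :: [String]
  , errDocContext :: [String]
  , errDocSupplementary :: [String]
  }

data RenamerError = NotInScope String [String]
data TypecheckerError = OccursCheck String String
data GHCError = RnError RenamerError | TcError TypecheckerError

class IsError e where
  errorMessage :: e -> ErrDoc

-- Producers: each subsystem keeps its document-building code here.
instance IsError RenamerError where
  errorMessage (NotInScope occ suggs) =
    ErrDoc ["Variable not in scope: " ++ occ] [] ["Suggested: " ++ s | s <- suggs]

instance IsError TypecheckerError where
  errorMessage (OccursCheck a t) =
    ErrDoc ["Occurs check: cannot construct the infinite type: " ++ a ++ " ~ " ++ t] [] []

-- GHCError delegates, so no manual unwrapping is needed.
instance IsError GHCError where
  errorMessage (RnError e) = errorMessage e
  errorMessage (TcError e) = errorMessage e

-- Rendering to text happens only at the very end.
renderErrDoc :: ErrDoc -> String
renderErrDoc (ErrDoc imp ctx sup) = unlines (imp ++ ctx ++ sup)

-- Consumer: pattern match on one error of interest, render the rest.
describe :: GHCError -> String
describe (RnError (NotInScope occ (s : _))) = "Quick fix: replace " ++ occ ++ " with " ++ s
describe err = renderErrDoc (errorMessage err)
```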
It is important to note that ``errorMessage`` ties this proposal back to the existing system. Right now, GHC immediately emits error messages (i.e. a textual representation of the errors) and has a lot of code for rendering all the relevant information (e.g. expressions or types) along with some helpful messages. This proposal merely suggests that we keep this code but call it much later, when GHC is done with the module and the compilation has either failed (for errors) or succeeded with warnings (which we need to report too). GHC would simply keep around all the relevant information that the textual rendering of those errors requires, as values of suitably defined algebraic data types, with all the expressions, types, contexts, suggestions and more stored in fields of those ADTs.
If necessary, we could define a separate hierarchy for warnings and update the definitions of ``ErrorMessages`` and ``WarningMessages`` given earlier as follows::

    -- defined as a subsystem-driven hierarchy (like for errors) if required
    data GHCWarning
      = UnnecessaryImport ModuleName
      | ...

    warningMessage :: GHCWarning -> ErrDoc

    type ErrorMessages   = Bag GHCError
    type WarningMessages = Bag GHCWarning
    type Messages        = (WarningMessages, ErrorMessages) -- as before

(The alternative being to just store ``GHCError`` values in both bags and augment ``GHCError`` with a constructor dedicated to warnings.)
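A small sketch of the separate-warnings variant, with bags modelled as plain lists and ``String`` standing in for ``ErrDoc``. The constructors, message texts and the helpers ``renderMessages`` and ``compilationFailed`` are illustrative assumptions. Rendering only happens in ``renderMessages``, once the module has been processed.

```haskell
data GHCError = NotInScope String deriving (Eq, Show)
data GHCWarning = UnnecessaryImport String deriving (Eq, Show)

-- Bags are modelled as plain lists in this sketch.
type ErrorMessages = [GHCError]
type WarningMessages = [GHCWarning]
type Messages = (WarningMessages, ErrorMessages)

-- String stands in for ErrDoc here.
errorMessage :: GHCError -> String
errorMessage (NotInScope n) = "error: Variable not in scope: " ++ n

warningMessage :: GHCWarning -> String
warningMessage (UnnecessaryImport m) = "warning: The import of " ++ m ++ " is redundant"

-- The only place where values are turned into text.
renderMessages :: Messages -> [String]
renderMessages (warns, errs) = map warningMessage warns ++ map errorMessage errs

-- Tools can still ask structural questions before rendering anything.
compilationFailed :: Messages -> Bool
compilationFailed (_, errs) = not (null errs)
```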
These types would only be used in the driver... except that the ``TcRn`` monad and other pieces of code seem to rely on ``Messages``. This suggests that we might have to either parametrize ``Messages`` on the concrete warning and error types, or replace those uses of ``Messages`` with an accumulation of values of the corresponding subsystem's error and warning types, essentially reimplementing the ``Messages`` machinery. The option of just using ``GHCError`` in all those places is dismissed because of the import cycles that this would come with.
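The parametrized option could look roughly like the following sketch, in which ``Messages`` (a record of lists here, not a pair of bags) is a ``Bifunctor``, so the driver can lift a subsystem's messages into the top-level types with ``bimap`` or ``second``. All type and constructor names other than ``Messages`` are hypothetical.

```haskell
import Data.Bifunctor (Bifunctor (..))

-- Messages parametrized over the warning and error types.
data Messages w e = Messages
  { msgWarnings :: [w]
  , msgErrors :: [e]
  }
  deriving (Eq, Show)

instance Bifunctor Messages where
  bimap f g (Messages ws es) = Messages (map f ws) (map g es)

instance Semigroup (Messages w e) where
  Messages w1 e1 <> Messages w2 e2 = Messages (w1 ++ w2) (e1 ++ e2)

-- Subsystem-specific types (hypothetical, simplified).
data RnWarning = UnusedBinding String deriving (Eq, Show)
data RnError = NotInScope String deriving (Eq, Show)
data TcError = OccursCheck String String deriving (Eq, Show)

-- Driver-level types, defined where no import cycle can arise.
data GHCWarning = RnWarn RnWarning deriving (Eq, Show)
data GHCError = RnErr RnError | TcErr TcError deriving (Eq, Show)

-- The driver lifts each subsystem's messages into the top-level types.
liftRn :: Messages RnWarning RnError -> Messages GHCWarning GHCError
liftRn = bimap RnWarn RnErr

liftTc :: Messages GHCWarning TcError -> Messages GHCWarning GHCError
liftTc = second TcErr
```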
Finally, we would have to update some error reporting infrastructure to work with ``GHCError`` or ``IsError e => e`` values instead of ``ErrDoc``. That is the point at which the actual rendering of error messages would happen under this proposal, right before calling the code that logs the said errors.
A consequence of implementing this proposal would be that the ``Messages`` type that GHC API users consume would now carry error and warning values that they can render but also inspect, without parsing. A lot of the work would be about actually moving all the error rendering code away from where we create errors, and defining suitable types that carry the data around until it is time to report the errors to the user.
By turning errors into proper values, tooling authors would be able to get rid of their error parsing code and finally be able to concisely inspect, render or "customize" error messages. This is the main attraction of this proposal. However, we list below a few compelling applications that are at best very cumbersome to write with the current error documents and which would be made a lot easier if the current proposal is accepted.
- An IDE tool might want to gather the bindings and their types, as listed by GHC in some type errors, to allow editors to display those in a tooltip or minibuffer, helping the developer figure out the right combination of those bindings by visually placing this information next to the code.
- An IDE tool might use the suggestions that GHC would embed in error values to present automated refactoring options to the user (e.g. enabling language extensions, importing a module, fixing a typo in the spelling of a name).
- Display squiggles of different colors depending on the nature of the error (e.g. "not in scope" errors in some color, "couldn't match" errors in another, and so on, or perhaps discriminating in larger groups). This could help Haskell developers focus on one of those groups (e.g. typos in names, missing imports) before proceeding with the more subtle type errors.
- An IDE tool might ask GHC to defer expensive analyses typically done during error message construction (e.g. computing valid hole fits) and instead query GHC for the analysis result asynchronously (or even only when requested by the user), shrinking the edit/typechecking iteration time.
- If GHC ever wants to assign error codes to all the possible errors that it can produce (e.g. to give an in-depth explanation of all errors and possible solutions in some error reference document), we could easily derive, or manually implement, sensible schemes from the error data types.
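As a sketch of the refactoring, colouring and error-code applications above, assuming a simplified two-level hierarchy, the functions below classify errors, assign codes and extract quick fixes purely by pattern matching. The ``Category`` type and the ``RN-``/``TC-`` code format are made up for illustration; nothing here is an existing GHC scheme.

```haskell
data RenamerError = NotInScope String [String] | DuplicateBinding String
data TypecheckerError = OccursCheck String String | CouldNotMatch String String
data GHCError = RnError RenamerError | TcError TypecheckerError

-- Hypothetical coarse groups an editor could colour differently.
data Category = ScopeProblem | TypeProblem deriving (Eq, Show)

category :: GHCError -> Category
category (RnError _) = ScopeProblem
category (TcError _) = TypeProblem

-- Hypothetical hand-written scheme: one stable code per constructor.
errorCode :: GHCError -> String
errorCode (RnError (NotInScope _ _)) = "RN-001"
errorCode (RnError (DuplicateBinding _)) = "RN-002"
errorCode (TcError (OccursCheck _ _)) = "TC-001"
errorCode (TcError (CouldNotMatch _ _)) = "TC-002"

-- Quick fixes come straight from the suggestions stored in the values.
quickFixes :: GHCError -> [String]
quickFixes (RnError (NotInScope _ suggs)) = suggs
quickFixes _ = []
```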
One drawback is that the wrapping in ``ParseError``, ``GHCError`` and friends can be a bit verbose, becoming more verbose still as we introduce additional levels to the hierarchy. This can be mitigated by going for a rather flat hierarchy like the one presented above, with ``GHCError`` at the top, the subsystem-specific error types below and nothing else: the different constructors of those subsystem-specific error types would contain all the relevant information for a specific error that GHC can emit, and the indirection would stop there. This option seems to be a good compromise, which is why it is the one we used in the previous section.
The major cost of implementing this proposal is the sheer amount of refactoring that will be necessary to emit error values and move the rendering to much later, essentially delegating this work to each subsystem and combining everything in the implementation of ``errorMessage``.
Some complications are anticipated around the ``TcRn`` monad (and perhaps others), because this is a case of two subsystems (the renamer and the typechecker) using common code. A straightforward update to the ``Messages``-related types used by ``TcRn`` would leave us with those two subsystems depending on ``GHCError``, introducing import cycles. A better idea would probably involve parametrizing ``TcRn`` (and the other types involved here) by the error/warning type(s) stored/accumulated, with functions to change the error type. This would surely involve quite a bit of refactoring, but would give us precise types as far as errors are concerned, in a similar spirit to how ``mtl`` users write ``MonadError SomeErrorType m => ...`` to say that a piece of code possibly throws errors of type ``SomeErrorType``.
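A sketch of that idea, using a toy writer-style monad ``M e`` in place of ``TcRn``. The real ``TcRn`` is a reader over mutable references, so this only models the typing discipline. ``M``, ``addErr``, ``mapErrors`` and the phase functions are hypothetical names; ``mapErrors`` plays the role of a function to change the error type, so that only driver code needs to mention ``GHCError``.

```haskell
-- Toy monad accumulating errors of type e alongside a result.
newtype M e a = M {runM :: ([e], a)}

instance Functor (M e) where
  fmap f (M (es, a)) = M (es, f a)

instance Applicative (M e) where
  pure a = M ([], a)
  M (es1, f) <*> M (es2, a) = M (es1 ++ es2, f a)

instance Monad (M e) where
  M (es1, a) >>= k = let M (es2, b) = k a in M (es1 ++ es2, b)

addErr :: e -> M e ()
addErr e = M ([e], ())

-- Change the error type, e.g. to embed a subsystem's errors in a larger type.
mapErrors :: (e -> e') -> M e a -> M e' a
mapErrors f (M (es, a)) = M (map f es, a)

data RnError = NotInScope String deriving (Eq, Show)
data TcError = OccursCheck String String deriving (Eq, Show)
data GHCError = RnErr RnError | TcErr TcError deriving (Eq, Show)

-- Each subsystem only mentions its own error type.
renameVar :: String -> M RnError String
renameVar v = do
  if v == "x" then pure () else addErr (NotInScope v)
  pure v

checkVar :: String -> M TcError ()
checkVar v = addErr (OccursCheck v ("[" ++ v ++ "]"))

-- Only the driver knows about GHCError.
pipeline :: String -> M GHCError ()
pipeline v = do
  v' <- mapErrors RnErr (renameVar v)
  mapErrors TcErr (checkVar v')
```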
We considered open variants of this design, where we do not build sum types all the way up to ``GHCError``, but where ``GHCError`` is an existential wrapper around an open union of error types that provide suitable instances::

    class HasErrMsg e where
      errorMessage :: e -> ErrDoc

    data GHCError where
      GHCError :: (Typeable e, HasErrMsg e) => e -> GHCError

This would allow us to work around the whole import cycles problem, at the price of being a lot more cumbersome to use: error consumers would have to use ``Typeable`` to implement specific behaviours for some types of errors. This price is likely a higher one to pay in the long run than the refactorings necessary to implement this proposal, or than the import cycles that would come with using this proposal's ``GHCError`` everywhere, which we would work around when implementing the current proposal.
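For comparison, here is a standalone sketch of the open alternative. ``String`` stands in for ``ErrDoc`` and the two error types, ``render`` and ``suggestions`` are hypothetical. Rendering works uniformly through the class, but extracting, say, the suggestions of a not-in-scope error requires a ``cast`` (from ``Data.Typeable``) against a type the consumer has to know about in advance, which is the cumbersomeness described above.

```haskell
{-# LANGUAGE GADTs #-}

import Data.Typeable (Typeable, cast)

class HasErrMsg e where
  errorMessage :: e -> String -- String stands in for ErrDoc

-- Open union: any type with the right instances can be wrapped.
data GHCError where
  GHCError :: (Typeable e, HasErrMsg e) => e -> GHCError

data NotInScope = NotInScope String [String]

instance HasErrMsg NotInScope where
  errorMessage (NotInScope n _) = "Variable not in scope: " ++ n

data OccursCheck = OccursCheck String String

instance HasErrMsg OccursCheck where
  errorMessage (OccursCheck a t) = "Occurs check: " ++ a ++ " ~ " ++ t

-- Rendering is easy: the class method is available for every payload.
render :: GHCError -> String
render (GHCError e) = errorMessage e

-- Inspection is clumsier: the consumer must try a cast to a known type.
suggestions :: GHCError -> [String]
suggestions (GHCError e) = case cast e of
  Just (NotInScope _ suggs) -> suggs
  Nothing -> []
```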
We have not fully fleshed out the entire list of error types that would have to be defined, since we believe this is something that will be best done by scanning GHC's code, looking for functions that emit error messages and trying to adapt them to emit a suitable error value. This however did not seem very relevant to describing the idea behind this proposal, as it is mostly about determining what constructors we should have in the "leaf error types" and which pieces of data have to be stored in those constructors, while the proposal puts forward an idea that does not strictly depend on the concrete errors that are constructed, stored and reported.
Well-Typed LLP will implement this proposal with financial support from Richard Eisenberg, under NSF grant number 1704041.