Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package bsc's pretty-printer and parser into standalone libraries #546

Open
RyanGlScott opened this issue Feb 15, 2023 · 0 comments
Open

Comments

@RyanGlScott
Copy link
Contributor

This is a proposal to change the way that bsc packages up its Haskell code slightly to allow easier reuse by external tools that need to consume or produce Bluespec code. I've already discussed parts of this proposal with various folks at Bluespec Inc., so this issue serves to lay out everything in one place.

Motivation

Bluespec is a convenient target language for high-level languages that want to compile down to hardware. In particular, I would like to be able to generate Bluespec code from a Haskell library, and some day I would also like to be able to ingest the generated code for analysis purposes. To put it another way, I need Haskell code for pretty-printing and parsing Bluespec.

To my knowledge, the only fully fledged Haskell code for doing this lives in bsc itself. It is certainly possible to extract the relevant subset of bsc's code needed for my purposes and turn it into a standalone library, and that is what I have done in the language-bluespec library. However, this is not a perfect solution, as I have now effectively created a fork of bsc. If there are any subsequent changes to the Bluespec language, this means that I would have to keep this fork in sync with new changes to bsc, which doesn't sound very scalable in the long term.

Proposal

I propose to:

  • Create a .cabal file that includes all of the AST-related data types for representing BH and BSV code, along with their Pretty instances. One possible name for this library could be bsc-syntax.
  • Create another .cabal file (which depends on the previous one) for parsing BH and BSV code. One possible name for this library could be bsc-parse.
  • If there is any code that these libraries rely on that does not obviously belong in either library, put that in a separate .cabal file of its own. Possible names for this library could be bsc-util or bsc-core.

These .cabal files likely would not be used by bsc's own build system, as it can continue to build them with Make as before. The benefit of putting things into .cabal files would come after uploading these .cabal-ized libraries to Hackage, where external users could then depend on them.

Future work

Things that are not in the immediate scope of this proposal (but perhaps would be worth considering later):

Redesigning the module hierarchy

Currently, bsc uses a mostly flat module naming hierarchy (see here). This is not the convention that most libraries on Hackage use nowadays, however. When I factored out code into language-bluespec, I renamed modules to things that include a Language.Bluespec.* prefix.

That being said, this is not a strict requirement. We can continue to use the module names as they currently exist in bsc for now. They might look a little odd when juxtaposed with other module names, but they will still work.

Making a "surface syntax" AST

Currently, the various AST data types have constructors that are only reachable in certain compiler passes. For instance, see this comment on IdK:

bsc/src/comp/CSyntax.hs

Lines 170 to 174 in 034050d

data IdK
= IdK Id
| IdKind Id Kind
-- this should not exist after typecheck
| IdPKind Id PartialKind

IdPKind doesn't exist in the surface syntax of BH, and as such, it serves no utility for my personal needs. @quark17 suggested that it might be worthwhile to have distinct AST types that represent the raw, parsed syntax, which would then be turned into the current AST data types during compilation. That way, the "surface syntax" AST would only include things that are directly relevant to parsing and pretty-printing, which could potentially cut down on the surface area of the library.

That being said, doing this work would be somewhat involved and likely above my pay grade. I'm going to exclude it from this proposal, but it might be worth doing at some point.

Prior art

@quark17 notes here that there is a long-term goal of making the code in bsc more modular. I view this proposal as an important step towards that goal, as the pretty-printing and parsing portions of bsc are relatively self-contained pieces of functionality that could be re-used in other projects.

#166 puts every Haskell module in bsc into a single .cabal file in pursuit of being able to more easily integrate with package management tools. This proposal is largely working towards the same end goal, but instead of putting everything into one .cabal file, this proposal splits up the functionality into multiple smaller libraries. I don't believe this proposal is incompatible with #166, however—it would just mean that a "top-level" .cabal file that includes everything in bsc would need to depend on the smaller libraries.

Unresolved questions

Maintenance

Making .cabal files for everything is only one step of the process. The next step would be to make Hackage releases of the libraries to Hackage, most likely in conjunction with bsc releases. I am not clear on who would own this part of the proposal, but perhaps it makes the most sense for the bsc release manager to do this part.

Dealing with global state

Some parts of the bsc codebase depend on global state, which presents a challenge for refactoring. For example, consider this part of the code that relates to how identifiers are pretty-printed:

bsc/src/comp/IdPrint.hs

Lines 70 to 82 in bd141b5

-- hack: suppress the package name for operators
getBSVIdString :: Id -> String
getBSVIdString a = (getBSVIdStringz a)
getBSVIdStringz :: Id -> String
getBSVIdStringz a
| getIdBase a == fsEmpty = internalError "CVPrint.getIdStr: empty identifier"
| getIdQual a == fsEmpty = getIdBaseStringz a
| not (isIdChar (head (getIdBaseStringz a))) = getIdBaseStringz a -- operators
| (not show_qual) && (getIdQual a == fsPrelude) =
getIdBaseStringz a -- suppress "Prelude::" unless flag is on
| (not show_qual) && (getIdQual a == fsPreludeBSV) =
getIdBaseStringz a -- suppress "Prelude::" unless flag is on
| otherwise = getIdQualString a ++ "::" ++ getIdBaseStringz a

The key part is show_qual, which is defined as:

bsc/src/comp/IdPrint.hs

Lines 18 to 19 in bd141b5

show_qual :: Bool
show_qual = "-show-qualifiers" `elem` progArgs

And where progArgs is defined as:

{-# NOINLINE progArgs #-}
progArgs :: [String]
progArgs = unsafePerformIO $ do
args <- getArgs
bscopts <- getEnvDef "BSC_OPTIONS" ""
return $ (words bscopts) ++ args
getEnvDef :: String -> String -> IO String
getEnvDef e d = CE.catch (getEnv e) handler
where
handler :: CE.SomeException -> IO String
handler _ = return d

Eek! progArgs is reading the value of an environment variable and then using unsafePerformIO to return it in a pure context. This seems like something that we would not want to include in a general-purpose library—or, at the very least, we would want to make this option more explicit. I am unclear on what the best way to handle this should be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant