Tracking PR for v0.10.0 release #1340
Open: bobbinth wants to merge 104 commits into main from next
* chore: fix no-std errors
* fix: Falcon DSA decorators and tests
This commit introduces two things of interest:
1. The `miette` crate, with dependency configuration to support no-std builds. This crate provides the foundation of the diagnostics infrastructure that will be used in later commits. It is primarily based around a `Diagnostic` trait, with a derive macro similar to that of `thiserror` for decorating error types with diagnostics data. It also provides some primitives for source spans and source file information, which tie into those diagnostics to print source snippets in errors with spans.
2. The `diagnostics` module, which in addition to re-exporting the parts of `miette` that we want to use, also defines some utility types that make expressing common error structures more convenient. It also defines the `SourceFile` alias, which we will use throughout the crate when referencing the source file from which some construct was derived.
This commit adds `thiserror` to the workspace for use in defining error types. It is a bit odd, however, due to some background context you will want to know:
Currently, the `std::error::Error` trait is only stable in libstd, and its use in libcore is unstable, behind the `error_in_core` feature. This makes defining `Error` types very awkward in general. As a result, `thiserror` currently depends on libstd. However, the crate author, dtolnay, has expressed that once `error_in_core` stabilizes, `thiserror` will support no-std environments. It is expected that this will stabilize soon-ish, but there are no definite dates on feature stabilization.
Even though `thiserror` ostensibly requires libstd, it is actually trivial to support no-std with it, either by simply choosing _not_ to emit the `std::error::Error` trait implementation when building for no-std, or by enabling the `error_in_core` feature when building with nightly. The crate author has expressed that they would rather not maintain that as a build option, since `error_in_core` is so close to stabilization.
So to bridge the gap, I've forked `thiserror` and implemented the necessary modifications, and done the same for `miette`, which is used for diagnostics and depends on `thiserror` internally. In the future, when `thiserror` natively supports no-std builds, we can remove these forked dependencies in favor of mainline. In the meantime, we benefit from the ergonomic improvements `thiserror` brings to defining error types, and it allows us to use `miette` as well.
NOTE: This commit has `use` statements, and has types/functions that do not exist in the source tree yet. This is because this commit is being introduced retroactively. This commit leaves the `parser` module disconnected from the actual module hierarchy.
This commit implements a new LALR(1) grammar and parser for the Miden Assembly language, which will replace the existing MASM parsing code. It consists of the following components:
* The grammar, expressed using `lalrpop` (which supports no-std builds, if you were wondering). This grammar is LALR(1) in formal grammar parlance, and can be found in `assembly/src/parser/grammar.lalrpop`. Many common validations and optimizations are performed during parsing, as we can restrict the space of what is possible to express in the grammar itself, rather than having to implement it manually via recursive descent.
* The lexer, found in `assembly/src/parser/token.rs`, which makes use of `logos` (also no-std compatible), a well-established and fast lexer generator. It is defined in terms of a `Token` type, on which the lexer trait is derived using a set of rules attached to each variant of the `Token` enum. There are various ways you can approach defining lexers, but in our case I opted for a stricter definition, in which the full MASM instruction set is tokenized as keywords, rather than parsing the instruction names later in the pipeline. This means that typos are caught immediately during parsing, with precise locations and diagnostics which tell the user what tokens are expected at the erroneous location.
* The parser interface, found in `assembly/src/parser/mod.rs`, which is basically a thin wrapper around instantiating the lexer and named source file, and invoking the generated LALRPOP parser.
* A set of types and a trait for expressing source-spanned types. The `SourceSpan` type expresses the range of bytes covered by a token in the source file from which it was parsed, and is composed of two `u32` indices. The lexer emits these indices for each token. The grammar can then make use of those indices to construct a `SourceSpan` for each production as it sees fit. The `Spanned` trait is implemented for types which have an associated `SourceSpan`; typically these would be the types making up the AST, but it is also useful for propagating source locations through the pipeline as the AST is lowered. Lastly, the `Span` type allows wrapping third-party types such that they implement `Spanned`, e.g. `Span<u32>` is a spanned `u32` value, which would otherwise be impossible to associate with a `SourceSpan`.
* Two error types, `LexerError` and `ParsingError`; the former can be converted into the latter. These make use of the new diagnostics and `thiserror` infrastructure and make for a good illustration of how ergonomic such types can be with those additions.
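As a rough illustration of the span types described above, here is a minimal, std-only sketch. The names `SourceSpan`, `Spanned`, and `Span` come from the commit message, but every field, method, and derive shown is an assumption made for illustration and may not match the actual definitions in the crate.

```rust
use std::ops::{Deref, Range};

/// Byte range covered by a token, stored as two `u32` indices.
/// (Field names and methods are assumptions, not the crate's API.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SourceSpan {
    start: u32,
    end: u32,
}

impl SourceSpan {
    pub fn new(range: Range<u32>) -> Self {
        debug_assert!(range.start <= range.end);
        SourceSpan { start: range.start, end: range.end }
    }

    /// Converts the span into a range usable for slicing the source text.
    pub fn as_range(&self) -> Range<usize> {
        (self.start as usize)..(self.end as usize)
    }
}

/// Implemented by anything that knows where in the source it came from.
pub trait Spanned {
    fn span(&self) -> SourceSpan;
}

/// Wraps a value of a type we do not own so that it carries a span.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span<T> {
    span: SourceSpan,
    inner: T,
}

impl<T> Span<T> {
    pub fn new(span: SourceSpan, inner: T) -> Self {
        Span { span, inner }
    }
}

impl<T> Spanned for Span<T> {
    fn span(&self) -> SourceSpan {
        self.span
    }
}

/// Lets a `Span<T>` be used wherever a `&T` is expected.
impl<T> Deref for Span<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner
    }
}
```

With something like this, a parsed literal such as the `42` in `push.42` could be carried as a `Span<u32>`, and a diagnostic could later slice the original source text with `span().as_range()` to show the offending snippet.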
This commit introduces a simple implementation of Prettier-style source code formatting infrastructure, based on Philip Wadler's algorithm and extended with some extra features recently described in a blog post by GitHub user justinpombrio. This commit does not make use of the infrastructure yet; that will come in later PRs which introduce changes to the AST.
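For readers unfamiliar with Wadler-style pretty printing, the core idea can be sketched in a few dozen lines. This is a simplified, stack-based variant written for illustration only: the `Doc` type, the helper functions, and `render` are hypothetical names, it omits the extra features mentioned above, and it is not the implementation added by this commit.

```rust
/// A document: text, soft line breaks, indentation, and groups.
#[derive(Debug, Clone)]
pub enum Doc {
    Text(String),
    /// Renders as a space if the enclosing group fits, else as a newline.
    Line,
    Concat(Box<Doc>, Box<Doc>),
    Nest(usize, Box<Doc>),
    Group(Box<Doc>),
}

pub fn text(s: &str) -> Doc { Doc::Text(s.to_string()) }
pub fn concat(a: Doc, b: Doc) -> Doc { Doc::Concat(Box::new(a), Box::new(b)) }
pub fn nest(n: usize, d: Doc) -> Doc { Doc::Nest(n, Box::new(d)) }
pub fn group(d: Doc) -> Doc { Doc::Group(Box::new(d)) }

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode { Flat, Break }

/// Checks whether the pending work fits in `remaining` columns,
/// looking ahead only as far as the next newline.
fn fits(mut remaining: isize, mut stack: Vec<(usize, Mode, &Doc)>) -> bool {
    while let Some((indent, mode, doc)) = stack.pop() {
        if remaining < 0 {
            return false;
        }
        match doc {
            Doc::Text(s) => remaining -= s.chars().count() as isize,
            Doc::Line => match mode {
                Mode::Flat => remaining -= 1,
                Mode::Break => return true,
            },
            Doc::Concat(a, b) => {
                stack.push((indent, mode, &**b));
                stack.push((indent, mode, &**a));
            }
            Doc::Nest(n, d) => stack.push((indent + *n, mode, &**d)),
            Doc::Group(d) => stack.push((indent, Mode::Flat, &**d)),
        }
    }
    remaining >= 0
}

/// Lays out `doc` within `width` columns, breaking groups that do not fit.
pub fn render(doc: &Doc, width: usize) -> String {
    let mut out = String::new();
    let mut col = 0usize;
    let mut stack = vec![(0usize, Mode::Break, doc)];
    while let Some((indent, mode, doc)) = stack.pop() {
        match doc {
            Doc::Text(s) => {
                out.push_str(s);
                col += s.chars().count();
            }
            Doc::Line => match mode {
                Mode::Flat => {
                    out.push(' ');
                    col += 1;
                }
                Mode::Break => {
                    out.push('\n');
                    out.push_str(&" ".repeat(indent));
                    col = indent;
                }
            },
            Doc::Concat(a, b) => {
                stack.push((indent, mode, &**b));
                stack.push((indent, mode, &**a));
            }
            Doc::Nest(n, d) => stack.push((indent + *n, mode, &**d)),
            Doc::Group(d) => {
                // Try the group flat; fall back to breaking its lines.
                // Cloning the stack is wasteful but keeps the sketch short.
                let mut probe = stack.clone();
                probe.push((indent, Mode::Flat, &**d));
                let fits_flat = fits(width as isize - col as isize, probe);
                let chosen = if fits_flat { Mode::Flat } else { Mode::Break };
                stack.push((indent, chosen, &**d));
            }
        }
    }
    out
}
```

The same document renders on one line when the width allows it, and falls back to an indented multi-line layout when it does not, which is the property that makes this approach attractive for formatting nested blocks.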
This commit adds the implementations of the AST types which are new:
* `Form`, represents items which are valid at the top level of a MASM module. The parser produces a vector of `Form` when parsing a module, which is then later translated into a `Module` (coming in a later commit) during semantic analysis. After semantic analysis, this type is never used again.
* `Block`, represents a block in Miden Assembly, i.e. a flat sequence of operations. These are akin to "regions" in compiler parlance, a subtle extension of basic blocks that allows instructions to have nested regions/blocks, whereas strict basic blocks in a typical SSA compiler do not permit nesting in this way. Since we have structured control flow operations, our blocks have region-like semantics.
* `Op`, represents the full MASM instruction set, unlike `Instruction`, which represents the subset without control flow ops.
* `Constant` and `ConstantExpr`, which represent the subset of the syntax for constant expressions and definitions. Unlike the previous parser and AST, we do not evaluate constants during parsing - except to do infallible constant folding where possible - but instead do it later during semantic analysis when we have the full set of constant definitions on hand. This lets constant definitions appear anywhere in the source file, and in any order as long as there are no cyclic dependencies.
* `Immediate<T>`, which represents instruction immediates generically, and in a form that supports the superposition of literal values and constant identifiers anywhere that immediates are allowed. These are then resolved to concrete values during semantic analysis. Immediates are thus represented as `Immediate<T>` in the AST universally, except in a small number of cases where we may only want to allow literals.
* `Ident`, which represents a cheaply-clonable identifier in the source code (not quite interned, but close in many cases). When parsing a string into an `Ident`, it imposes the general set of validation rules which apply to bare (unquoted) identifiers, such as those used for module names or import aliases. An `Ident` can be constructed without enforcing those rules, as is the case for `ProcedureName`, which uses `Ident` internally but enforces a looser set of rules so as to support quoted identifiers in the source code. `Ident` is used anywhere an identifier is represented in the syntax tree.
NOTE: These modules are disconnected from the module hierarchy in this commit, and may reference types that are not listed, or types which have familiar names but which will have new implementations in later commits. Please keep that in mind during review.
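The deferred constant evaluation described above can be sketched as follows. This is an illustrative approximation under stated assumptions: the variant names, the `ConstError` type, the use of `u64` values, and the `eval`/`resolve` functions are all hypothetical, and the real `ConstantExpr` and `Immediate<T>` types carry more information (spans, more operators).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// A tiny constant-expression language (assumed shape, for illustration).
#[derive(Debug, Clone)]
pub enum ConstantExpr {
    Literal(u64),
    Var(String),
    Add(Box<ConstantExpr>, Box<ConstantExpr>),
}

/// Either a literal or the name of a constant, resolved after parsing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Immediate<T> {
    Value(T),
    Constant(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConstError {
    Undefined(String),
    Cycle(String),
    Overflow,
}

/// Evaluates `expr` against the full set of definitions. Because lookup
/// happens by name, definitions may appear in any order; `visiting`
/// tracks the names currently being evaluated so cycles are reported.
pub fn eval(
    expr: &ConstantExpr,
    defs: &BTreeMap<String, ConstantExpr>,
    visiting: &mut BTreeSet<String>,
) -> Result<u64, ConstError> {
    match expr {
        ConstantExpr::Literal(value) => Ok(*value),
        ConstantExpr::Add(lhs, rhs) => {
            let lhs = eval(lhs, defs, visiting)?;
            let rhs = eval(rhs, defs, visiting)?;
            lhs.checked_add(rhs).ok_or(ConstError::Overflow)
        }
        ConstantExpr::Var(name) => {
            let def = defs
                .get(name)
                .ok_or_else(|| ConstError::Undefined(name.clone()))?;
            if !visiting.insert(name.clone()) {
                return Err(ConstError::Cycle(name.clone()));
            }
            let value = eval(def, defs, visiting);
            visiting.remove(name);
            value
        }
    }
}

/// Resolves an immediate to a concrete value during "semantic analysis".
pub fn resolve(
    imm: &Immediate<u64>,
    defs: &BTreeMap<String, ConstantExpr>,
) -> Result<u64, ConstError> {
    match imm {
        Immediate::Value(value) => Ok(*value),
        Immediate::Constant(name) => {
            eval(&ConstantExpr::Var(name.clone()), defs, &mut BTreeSet::new())
        }
    }
}
```

The point of the design is visible in `resolve`: the parser only needs to record whether it saw a literal or a name, and the question of what the name means is answered once all definitions are known.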
This commit introduces the `Visit` and `VisitMut` traits and associated helpers, which can be used to succinctly express analysis and rewrite passes on the Miden Assembly syntax tree. No such passes are implemented in this commit, but will be defined in subsequent commits.
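To give a sense of how such a visitor is typically structured, here is a minimal sketch of the read-only half of the pattern. The `Op` enum below is a toy stand-in for the real syntax tree, and the method names and the `InstCounter` pass are assumptions; only the general shape (trait methods with default bodies that delegate to free traversal functions) is the point.

```rust
/// Toy syntax tree: plain instructions plus structured control flow.
pub enum Op {
    Inst(String),
    If { then_blk: Vec<Op>, else_blk: Vec<Op> },
    While(Vec<Op>),
}

/// Each method has a default that just keeps walking, so a pass only
/// overrides the node kinds it cares about.
pub trait Visit {
    fn visit_op(&mut self, op: &Op) {
        visit_op(self, op)
    }

    fn visit_inst(&mut self, _inst: &str) {}
}

/// Default traversal, exposed as a free function so that an overriding
/// `visit_op` can still call it to continue into child nodes.
pub fn visit_op<V: Visit + ?Sized>(visitor: &mut V, op: &Op) {
    match op {
        Op::Inst(inst) => visitor.visit_inst(inst),
        Op::If { then_blk, else_blk } => {
            for child in then_blk.iter().chain(else_blk) {
                visitor.visit_op(child);
            }
        }
        Op::While(body) => {
            for child in body {
                visitor.visit_op(child);
            }
        }
    }
}

/// Example analysis pass: counts instructions at any nesting depth.
pub struct InstCounter(pub usize);

impl Visit for InstCounter {
    fn visit_inst(&mut self, _inst: &str) {
        self.0 += 1;
    }
}
```

A mutable counterpart would look the same with `&mut Op` in place of `&Op`, which is what allows rewrite passes to be expressed just as succinctly as analyses.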
This commit is the first in a sequence of commits that refactor the `assembly` crate to use the new parser, introduce the remaining AST changes, and then propagate those changes while also refactoring parts of the compilation pipeline that can take advantage of new features and analysis that were previously not available. This commit specifically refactors/rewrites the set of types which represent various details about procedures and procedure "aliases", i.e. re-exported procedures. There are some new types implemented as well, to better represent the specificity of a particular procedure identifier, and to build on other types in a more structured fashion. NOTE: This commit references things which are not yet implemented in the source tree; this is intentional, so as to let you focus on this set of related changes abstractly, and then be able to review other later changes with this context in mind.
This commit builds on changes to the procedure types to represent a richer set of targets, with varying degrees of specificity:
* The `MastRoot` variant remains unchanged, but gains a source span for use later during compilation.
* The `ProcedureName` variant represents a local name.
* The `ProcedurePath` variant represents an unresolved projection of an imported module/function. It depends on the current module context to resolve.
* The `AbsoluteProcedurePath` variant represents a resolved projection of an imported module/function. This type is used when we have resolved a `ProcedurePath` to an imported module, and thus know the absolute path of the imported function. However, the distinction between an absolute path which is "fully-resolved", i.e. not an alias, and an absolute path which is "partially-resolved", i.e. possibly-aliased, is not represented here. Instead, that distinction is implicit depending on the phase of compilation we are in. This will become clearer in later commits.
Later, this type will be used to represent the targets of any instruction which references a callable (name or MAST root).
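The four variants listed above could be modeled roughly as follows. The variant names are taken from the commit message; the enum name `InvocationTarget` appears in a later commit message in this PR, while the payload types, the `resolve` function, and the import map are simplifying assumptions (the real type carries spans and a digest rather than raw bytes and strings).

```rust
use std::collections::BTreeMap;

/// Targets of an invocation, from least to most resolved.
/// Payloads are simplified stand-ins for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InvocationTarget {
    /// A procedure identified directly by its MAST root.
    MastRoot([u8; 32]),
    /// A name local to the current module.
    ProcedureName(String),
    /// `alias::name`, where `alias` still has to be looked up in the
    /// imports of the current module.
    ProcedurePath { module: String, name: String },
    /// `name` in the module at an absolute library path.
    AbsoluteProcedurePath { path: String, name: String },
}

/// Hypothetical resolution step: maps a `ProcedurePath` to an
/// `AbsoluteProcedurePath` using the current module's imports
/// (alias -> absolute module path). Other variants pass through.
/// Returns `None` if the alias is not imported.
pub fn resolve(
    target: &InvocationTarget,
    imports: &BTreeMap<String, String>,
) -> Option<InvocationTarget> {
    match target {
        InvocationTarget::ProcedurePath { module, name } => {
            let path = imports.get(module)?;
            Some(InvocationTarget::AbsoluteProcedurePath {
                path: path.clone(),
                name: name.clone(),
            })
        }
        other => Some(other.clone()),
    }
}
```

Note that nothing in the type records whether an absolute path might still point at a re-export; as the commit message says, that is implied by the compilation phase rather than encoded in the enum.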
This commit refactors the `LibraryPath` and `LibraryNamespace` types to build on the `Ident` type, support `Spanned`, and to provide better building blocks for other types, such as `FullyQualifiedProcedureName`. In addition, the structure of the `library` namespace is cleaned up, and the `MaslLibrary` type has its internals rewritten to make use of the new parsing infrastructure. The `Library` trait was refactored to remove the associated `ModuleIter` type, as many forms of iterators have "unnameable" types which cannot be easily expressed when defining a trait implementation. Instead, we make use of RPIT (return-position impl trait) to achieve the same goal, while retaining the ability to impose some useful constraints on the type.
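The trade-off described here can be shown with a small sketch. It assumes a toolchain where return-position `impl Trait` is allowed in trait methods (stable since Rust 1.75); the `Module` and `VecLibrary` types, the `modules` method name, and the `ExactSizeIterator` bound are illustrative assumptions rather than the actual `Library` trait.

```rust
/// Stand-in for a parsed module (assumed shape).
pub struct Module {
    pub path: String,
}

/// No associated `ModuleIter` type: the method returns "some iterator",
/// and the bound still constrains what callers may rely on.
pub trait Library {
    fn modules(&self) -> impl ExactSizeIterator<Item = &Module>;
}

/// A library backed by a plain vector.
pub struct VecLibrary {
    modules: Vec<Module>,
}

impl VecLibrary {
    pub fn new(modules: Vec<Module>) -> Self {
        VecLibrary { modules }
    }
}

impl Library for VecLibrary {
    // The concrete type here is `std::slice::Iter<'_, Module>`, but an
    // implementation could just as well return an adapter chain whose
    // type contains closures and therefore cannot be named.
    fn modules(&self) -> impl ExactSizeIterator<Item = &Module> {
        self.modules.iter()
    }
}

/// Generic code only sees the bounds promised by the trait.
pub fn module_paths(lib: &impl Library) -> Vec<&str> {
    lib.modules().map(|module| module.path.as_str()).collect()
}
```

With an associated type, every implementor would have to spell out its iterator type in full; with the `impl` return, it only has to produce something that satisfies the bound.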
This commit refactors the `Instruction` syntax tree type in the following ways:
* Remove the various `Call*`, `Exec*`, etc. instruction variants in favor of `Call`, `Exec`, `SysCall`, and `ProcRef`, all of which now take an `InvocationTarget`.
* Replace all immediate values with `Immediate<T>`.
* Introduce wrapper types where necessary to support the `Serializable` and `Deserializable` traits, and to shield us from breaking upstream changes to those types (to some degree). See `SignatureKind` and `DebugOptions` specifically.
* Give the `AdviceInjectorNode`, `DebugOptions`, and other enumerated types explicit physical representation and discriminant values, so that we can safely serialize/deserialize the discriminant tags.
* Allow expressing a slightly wider range of variation in the `DebugOptions` syntax, which is then converted to the more explicit representation during semantic analysis.
* Make use of the new `PrettyPrint` trait for formatting.
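The point about explicit representation and discriminant values can be illustrated with a small sketch. The `DebugKind` enum, its variants, and the `tag`/`from_tag` helpers are hypothetical and exist only to show the technique; they are not the discriminants used by `AdviceInjectorNode` or `DebugOptions`.

```rust
/// Hypothetical enum with a fixed one-byte representation and pinned
/// discriminants, so the serialized tag never depends on declaration
/// order or compiler layout decisions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum DebugKind {
    StackAll = 0,
    StackTop = 1,
    MemAll = 2,
}

impl DebugKind {
    /// The byte written when serializing.
    pub fn tag(self) -> u8 {
        self as u8
    }

    /// The inverse mapping used when deserializing; unknown tags are
    /// rejected rather than transmuted.
    pub fn from_tag(tag: u8) -> Option<Self> {
        match tag {
            0 => Some(DebugKind::StackAll),
            1 => Some(DebugKind::StackTop),
            2 => Some(DebugKind::MemAll),
            _ => None,
        }
    }
}
```

Pinning the values means that inserting a new variant in the middle of the enum later does not silently change the meaning of previously serialized data.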
This is the last commit which will contain changes to the abstract syntax tree, and is the one which answers the question: "what does a parsed module look like?". This commit introduces a new `Module` type, which supersedes the previous `ProgramAst`, `ModuleAst` and `Module` types, providing all of the information you might want to know about a given module, as well as supporting the core functionality necessary to parse, serialize, and pretty print modules. A `Module` has a `ModuleKind`, which identifies what type of module it represents: an executable (e.g. what was previously a `ProgramAst`), a library (e.g. what was previously a `ModuleAst`), or a kernel (which previously had no specific representation). The `ModuleKind` dictates the semantics of the module, but in practice all the "kinds" of modules are virtually identical except for these slight semantic differences, which is why they have been unified here. These semantics are validated by the semantic analysis pass, which catches the set of things you are not allowed to do in certain types of modules, e.g. exporting from an executable module, syscall/call in a kernel module, `begin` in a library module. Lastly, this commit removes the old `ModuleImports` type, which is superseded by these changes (as well as subsequent ones in other parts of the assembler). In its place is a new `Import` type which represents all of the details about a specific module import. Each `Module` has a set of `Import`s associated with it, which it uses to resolve names, as well as to determine syntax-level inter-module dependencies.
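The kind-dependent checks mentioned above might look something like the following sketch. It encodes only the three example rules given in the commit message; the `Item` and `SemanticError` types and the `validate` function are hypothetical, and the real semantic analysis pass is certainly more thorough.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModuleKind {
    Executable,
    Library,
    Kernel,
}

/// Toy view of the constructs the rules care about (assumed shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Item {
    Export(String),
    Begin,
    Call(String),
    SysCall(String),
    Exec(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SemanticError {
    ExportInExecutable(String),
    BeginInLibrary,
    CallInKernel(String),
    SysCallInKernel(String),
}

/// Applies the three example rules: no exports from an executable,
/// no `begin` in a library, no call/syscall in a kernel. All other
/// combinations are accepted by this sketch.
pub fn validate(kind: ModuleKind, items: &[Item]) -> Vec<SemanticError> {
    let mut errors = Vec::new();
    for item in items {
        match (kind, item) {
            (ModuleKind::Executable, Item::Export(name)) => {
                errors.push(SemanticError::ExportInExecutable(name.clone()));
            }
            (ModuleKind::Library, Item::Begin) => {
                errors.push(SemanticError::BeginInLibrary);
            }
            (ModuleKind::Kernel, Item::Call(name)) => {
                errors.push(SemanticError::CallInKernel(name.clone()));
            }
            (ModuleKind::Kernel, Item::SysCall(name)) => {
                errors.push(SemanticError::SysCallInKernel(name.clone()));
            }
            _ => {}
        }
    }
    errors
}
```

Because the module representation is shared, the differences between executables, libraries, and kernels reduce to a table of rules like this one rather than three separate AST types.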
This commit allows enabling/disabling the storage of debug info and source code in serialized modules and various AST structures.
…lated to cycle check
…e column builder (#1258)
…onacci Signed-off-by: GopherJ <alex_cj96@foxmail.com>
Signed-off-by: GopherJ <alex_cj96@foxmail.com>
This is the tracking PR for the v0.10.0 release.