diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index dce864eb7..f1c15eecb 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -2,10 +2,11 @@ ## Our Pledge -In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making -participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, -disability, ethnicity, gender identity and expression, level of experience, nationality, personal appearance, race, -religion, or sexual identity and orientation. +In the interest of fostering an open and welcoming environment, we as contributors and +maintainers pledge to making participation in our project and our community a +harassment-free experience for everyone, regardless of age, body size, disability, +ethnicity, gender identity and expression, level of experience, nationality, personal +appearance, race, religion, or sexual identity and orientation. ## Our Standards @@ -22,39 +23,47 @@ Examples of unacceptable behavior by participants include: * The use of sexualized language or imagery and unwelcome sexual attention or advances * Trolling, insulting/derogatory comments, and personal or political attacks * Public or private harassment -* Publishing others' private information, such as a physical or electronic address, without explicit permission +* Publishing others' private information, such as a physical or electronic address, without + explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Our Responsibilities -Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate -and fair corrective action in response to any instances of unacceptable behavior. 
+Project maintainers are responsible for clarifying the standards of acceptable behavior and +are expected to take appropriate and fair corrective action in response to any instances of +unacceptable behavior. -Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, -and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor -for other behaviors that they deem inappropriate, threatening, offensive, or harmful. +Project maintainers have the right and responsibility to remove, edit, or reject comments, +commits, code, wiki edits, issues, and other contributions that are not aligned to this Code +of Conduct, or to ban temporarily or permanently any contributor for other behaviors that +they deem inappropriate, threatening, offensive, or harmful. ## Scope -This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project -or its community. Examples of representing a project or community include using an official project e-mail address, posting -via an official social media account, or acting as an appointed representative at an on-line or off-line event. Representation -of a project may be further defined and clarified by project maintainers. +This Code of Conduct applies both within project spaces and in public spaces when an +individual is representing the project or its community. Examples of representing a project +or community include using an official project e-mail address, posting via an official +social media account, or acting as an appointed representative at an on-line or off-line +event. Representation of a project may be further defined and clarified by project +maintainers. ## Enforcement -Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at -osscodeofconduct@telliam.com. 
The project team will review and investigate all complaints, and will respond in a way that -it deems appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the -reporter of an incident. Further details of specific enforcement policies may be posted separately. +Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by +contacting the project team at osscodeofconduct@telliam.com. The project team will review +and investigate all complaints, and will respond in a way that it deems appropriate to the +circumstances. The project team is obligated to maintain confidentiality with regard to the +reporter of an incident. Further details of specific enforcement policies may be posted +separately. -Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions -as determined by other members of the project's leadership. +Project maintainers who do not follow or enforce the Code of Conduct in good faith may face +temporary or permanent repercussions as determined by other members of the project's +leadership. ## Attribution -This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at -[http://contributor-covenant.org/version/1/4][version] +This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, +available at [http://contributor-covenant.org/version/1/4][version] [homepage]: http://contributor-covenant.org [version]: http://contributor-covenant.org/version/1/4/ diff --git a/GOVERNANCE.md b/GOVERNANCE.md index c1cfe2795..87fffe992 100644 --- a/GOVERNANCE.md +++ b/GOVERNANCE.md @@ -3,60 +3,67 @@ This document describes the governance model for this project. (This is adapted from the template provided by http://oss-watch.ac.uk/resources/benevolentdictatorgovernancemodel ) ## Overview -This project is led by a benevolent dictator and managed by the community. 
That is, the community actively contributes -to the day-to-day maintenance of the project, but the general strategic line is drawn by the benevolent dictator. In -case of disagreement, they have the last word. It is the benevolent dictator's job to resolve disputes within the -community and to ensure that the project is able to progress in a coordinated way. In turn, it is the community's job -to guide the decisions of the benevolent dictator through active engagement and contribution. +This project is led by a benevolent dictator and managed by the community. That is, the +community actively contributes to the day-to-day maintenance of the project, but the general +strategic line is drawn by the benevolent dictator. In case of disagreement, they have the +last word. It is the benevolent dictator's job to resolve disputes within the community and +to ensure that the project is able to progress in a coordinated way. In turn, it is the +community's job to guide the decisions of the benevolent dictator through active engagement +and contribution. ## Roles and responsibilities ### Benevolent dictator (project lead) -Typically, the benevolent dictator, or project lead, is self-appointed. However, because the community always has the -ability to fork, this person is fully answerable to the community. The project lead's role is a -[difficult one](http://producingoss.com/html-chunk/social-infrastructure.html#benevolent-dictator-qualifications): they -set the strategic objectives of the project and communicate these clearly to the community. They also have to understand -the community as a whole and strive to satisfy as many conflicting needs as possible, while ensuring that the project -survives in the long term. - -In many ways, the role of the benevolent dictator is less about dictatorship and more about diplomacy. 
The key is to -ensure that, as the project expands, the right people are given influence over it and the community rallies behind the -vision of the project lead. The lead's job is then to ensure that the committers (see below) make the right decisions -on behalf of the project. Generally speaking, as long as the committers are aligned with the project's strategy, the -project lead will allow them to proceed as they desire. +Typically, the benevolent dictator, or project lead, is self-appointed. However, because the +community always has the ability to fork, this person is fully answerable to the community. +The project lead's role is a [difficult one](http://producingoss.com/html-chunk/social-infrastructure.html#benevolent-dictator-qualifications): +they set the strategic objectives of the project and communicate these clearly to the +community. They also have to understand the community as a whole and strive to satisfy as +many conflicting needs as possible, while ensuring that the project survives in the long +term. + +In many ways, the role of the benevolent dictator is less about dictatorship and more about +diplomacy. The key is to ensure that, as the project expands, the right people are given +influence over it and the community rallies behind the vision of the project lead. The +lead's job is then to ensure that the committers (see below) make the right decisions on +behalf of the project. Generally speaking, as long as the committers are aligned with the +project's strategy, the project lead will allow them to proceed as they desire. ### Committers -Committers are contributors who have made several valuable contributions to the project and are now relied upon to -both write code directly to the repository and screen the contributions of others. In many cases they are programmers -but it is also possible that they contribute in a different role. 
Typically, a committer will focus on a specific aspect -of the project, and will bring a level of expertise and understanding that earns them the respect of the community and -the project lead. The role of committer is not an official one, it is simply a position that influential members of the -community will find themselves in as the project lead looks to them for guidance and support. - -Committers have no authority over the overall direction of the project. However, they do have the ear of the project -lead. It is a committer's job to ensure that the lead is aware of the community's needs and collective objectives, and -to help develop or elicit appropriate contributions to the project. Often, committers are given informal control over -their specific areas of responsibility, and are assigned rights to directly modify certain areas of the source code. -That is, although committers do not have explicit decision-making authority, they will often find that their actions -are synonymous with the decisions made by the lead. +Committers are contributors who have made several valuable contributions to the project and +are now relied upon to both write code directly to the repository and screen the +contributions of others. In many cases they are programmers but it is also possible that +they contribute in a different role. Typically, a committer will focus on a specific aspect +of the project, and will bring a level of expertise and understanding that earns them the +respect of the community and the project lead. The role of committer is not an official one, +it is simply a position that influential members of the community will find themselves in as +the project lead looks to them for guidance and support. + +Committers have no authority over the overall direction of the project. However, they do +have the ear of the project lead. 
It is a committer's job to ensure that the lead is aware +of the community's needs and collective objectives, and to help develop or elicit +appropriate contributions to the project. Often, committers are given informal control over +their specific areas of responsibility, and are assigned rights to directly modify certain +areas of the source code. That is, although committers do not have explicit decision-making +authority, they will often find that their actions are synonymous with the decisions made by +the lead. ### Contributors -Contributors are community members who either have no desire to become committers, or have not yet been given the -opportunity by the benevolent dictator. They make valuable contributions, such as those outlined in the list below, -but generally do not have the authority to make direct changes to the project code. Contributors engage with the -project through communication tools, such as email lists, and via reports and -[patches](http://oss-watch.ac.uk/resources/softwarepatch) attached to issues in the issue tracker, as detailed in -our [community tools document](http://oss-watch.ac.uk/resources/communitytools). - -Anyone can become a contributor. There is no expectation of commitment to the project, no specific skill requirements -and no selection process. To become a contributor, a community member simply has to perform one or more actions that -are beneficial to the project. +Contributors are community members who either have no desire to become committers, or have +not yet been given the opportunity by the benevolent dictator. They make valuable +contributions, such as those outlined in the list below, but generally do not have the +authority to make direct changes to the project code. 
Contributors engage with the project +through communication tools, such as email lists, and via reports and [patches](http://oss-watch.ac.uk/resources/softwarepatch) +attached to issues in the issue tracker, as detailed in our [community tools document](http://oss-watch.ac.uk/resources/communitytools). -Some contributors will already be engaging with the project as users, but will also find themselves doing one or more -of the following: +Anyone can become a contributor. There is no expectation of commitment to the project, no +specific skill requirements and no selection process. To become a contributor, a community +member simply has to perform one or more actions that are beneficial to the project. +Some contributors will already be engaging with the project as users, but will also find +themselves doing one or more of the following: * Supporting new users (current users often provide the most effective new user support) * Reporting bugs * Identifying requirements @@ -67,44 +74,52 @@ of the following: * Fixing bugs * Adding features -As contributors gain experience and familiarity with the project, they may find that the project lead starts relying -on them more and more. When this begins to happen, they gradually adopt the role of committer, as described above. +As contributors gain experience and familiarity with the project, they may find that the +project lead starts relying on them more and more. When this begins to happen, they +gradually adopt the role of committer, as described above. ### Users -Users are community members who have a need for the project. They are the most important members of the community: -without them, the project would have no purpose. Anyone can be a user; there are no specific requirements. +Users are community members who have a need for the project. They are the most important +members of the community: +without them, the project would have no purpose. Anyone can be a user; there are no specific +requirements. 
-Users should be encouraged to participate in the life of the project and the community as much as possible. User -contributions enable the project team to ensure that they are satisfying the needs of those users. Common user -activities include (but are not limited to): +Users should be encouraged to participate in the life of the project and the community as +much as possible. User contributions enable the project team to ensure that they are +satisfying the needs of those users. Common user activities include (but are not limited to): * Evangelizing about the project * Informing developers of project strengths and weaknesses from a new user's perspective * Providing moral support (a 'thank you' goes a long way) * Providing financial support -Users who continue to engage with the project and its community will often find themselves becoming more and more -involved. Such users may then go on to become contributors, as described above. +Users who continue to engage with the project and its community will often find themselves +becoming more and more involved. Such users may then go on to become contributors, as +described above. ### Support -All participants in the community are encouraged to provide support for new users within the project management -infrastructure. This support is provided as a way of growing the community. Those seeking support should recognize that -all support activity within the project is voluntary and is therefore provided as and when time allows. A user requiring -guaranteed response times or results should therefore seek to purchase a support contract from a vendor. (Of course, -that vendor should be an active member of the community.) However, for those willing to engage with the project on its -own terms, and willing to help support other users, the community support channels are ideal. +All participants in the community are encouraged to provide support for new users within the +project management infrastructure. 
This support is provided as a way of growing the +community. Those seeking support should recognize that all support activity within the +project is voluntary and is therefore provided as and when time allows. A user requiring +guaranteed response times or results should therefore seek to purchase a support contract +from a vendor. (Of course, that vendor should be an active member of the community.) +However, for those willing to engage with the project on its own terms, and willing to help +support other users, the community support channels are ideal. ### Contribution process -Anyone can contribute to the project, regardless of their skills, as there are many ways to contribute. For instance, a -contributor might be active on the project mailing list and issue tracker, or might supply -[patches](http://oss-watch.ac.uk/resources/softwarepatch). The various ways of contributing are described in more detail -in our [roles in open source document](http://oss-watch.ac.uk/resources/rolesinopensource). +Anyone can contribute to the project, regardless of their skills, as there are many ways to +contribute. For instance, a contributor might be active on the project mailing list and +issue tracker, or might supply [patches](http://oss-watch.ac.uk/resources/softwarepatch). +The various ways of contributing are described in more detail in our +[roles in open source document](http://oss-watch.ac.uk/resources/rolesinopensource). ### Decision-making process -The benevolent dictatorship model does not need a formal conflict resolution process, since the project lead's word is -final. If the community chooses to question the wisdom of the actions of a committer, the project lead can review their -decisions by checking the communication archives, and either uphold or reverse them. +The benevolent dictatorship model does not need a formal conflict resolution process, since +the project lead's word is final. 
If the community chooses to question the wisdom of the +actions of a committer, the project lead can review their decisions by checking the +communication archives, and either uphold or reverse them. diff --git a/OneFlow/ReadMe.md b/OneFlow/ReadMe.md index 642a6e2e2..186ade1bc 100644 --- a/OneFlow/ReadMe.md +++ b/OneFlow/ReadMe.md @@ -1,8 +1,8 @@ # GIT OneFlow support scripts The scripts in this folder are used for release and feature branch management. This repository follows the [OneFlow](https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow#develop-finishing-a-release-branch) -model and work-flow. With one active long term branch 'develop'. The master -branch is present and long term but is not active, it only points to the latest -official release (including preview releases) of the project. This is a convenience -to allow getting the latests released source quickly. Generally the scripts used here -are only for release managers and are not required (or even an option) for most contributors. +model and work-flow, with one active long-term branch: 'develop'. The master branch is +present and long term but is not active; it only points to the latest official release +(including preview releases) of the project. This is a convenience to allow getting the +latest released source quickly. Generally the scripts used here are only for release +managers and are not required (or even an option) for most contributors. diff --git a/docfx/CommandLine/api/index.md b/docfx/CommandLine/api/index.md index 20356254d..6de856d79 100644 --- a/docfx/CommandLine/api/index.md +++ b/docfx/CommandLine/api/index.md @@ -1,3 +1,3 @@ # About -Ubiquity.NET.CommandLines contains general extensions for .NET. to support command line +`Ubiquity.NET.CommandLine` contains general extensions for .NET
to support command line parsing using `System.CommandLine` diff --git a/docfx/CommandLine/index.md b/docfx/CommandLine/index.md index 20356254d..e35c04e85 100644 --- a/docfx/CommandLine/index.md +++ b/docfx/CommandLine/index.md @@ -1,3 +1,3 @@ # About -Ubiquity.NET.CommandLines contains general extensions for .NET. to support command line +`Ubiquity.NET.CommandLine` contains general extensions for .NET to support command line parsing using `System.CommandLine` diff --git a/docfx/ReadMe.md b/docfx/ReadMe.md index d7377923f..d9249112c 100644 --- a/docfx/ReadMe.md +++ b/docfx/ReadMe.md @@ -4,43 +4,41 @@ > to the doc generation and the process for maintainers of this library (Who clearly don't > have great memories or are otherwise easily confused. :nerd_face:) -DocFX is used to generate the documentation for this library. There is confusion on -what the "statictoc" template means and requires. It is ***LITERALLY*** that, the -Table of Contents (TOC) is statically generated such that the entire site is servable -from a file path. This ***DOES NOT*** mean that the default+modern template is -unusable for hosted static site scenarios like 'gh-pages' in GitHub. It only means -that the TOC support will ***require*** a hosted site to provide the contents needed -by the generated TOC client side scripting. That's it. Don't fear the built-in -templates (Despite the lack of decent docs explaining the details [Yeah, this -project previously fell into those gaps and even constructed a complete custom template -to deal with it... Sigh, what a waste of time... :facepalm: ]) +DocFX is used to generate the documentation for this library. There is confusion on what the +"statictoc" template means and requires. It is ***LITERALLY*** that: the Table of Contents +(TOC) is statically generated such that the entire site is servable from a file path. This +***DOES NOT*** mean that the default+modern template is unusable for hosted static site +scenarios like 'gh-pages' in GitHub.
It only means that the TOC support will +***require*** a hosted site to provide the contents needed by the generated TOC client side +scripting. That's it. Don't fear the built-in templates (Despite the lack of decent docs +explaining the details [Yeah, this project previously fell into those gaps and even +constructed a complete custom template to deal with it... Sigh, what a waste of time... +:facepalm: ]) ## Changes Over Time -DocFX has obsoleted the `docfxconsole` NuGet package that was used to run docfx for -a project via MSBUILD. Instead it focused on a .NET tool to do it all via the -command line. Ultimately the docfx.json serves as the "project" file for the -different site builds. The PowerShell script `Build-Docs.ps1` was updated to use -the new tool directly. Using that script should have little or no impact on the -overall flow. There is a "no-targets" project in the solution to enable easier access to the -input files but does not itself, generate any docs - it's just a placeholder. +DocFX has obsoleted the `docfxconsole` NuGet package that was used to run docfx for a +project via MSBUILD. Instead it focused on a .NET tool to do it all via the command line. +Ultimately the docfx.json serves as the "project" file for the different site builds. +The PowerShell script `Build-Docs.ps1` was updated to use the new tool directly. Using that +script should have little or no impact on the overall flow. There is a "no-targets" project +in the solution to enable easier access to the input files, but it does not itself generate +any docs - it's just a placeholder. ## Files used by the docs generation -There are a lot of files used to generate the docs and the concept of a Table of -Contents (TOC) gets confusing fast when using docfx. So this tries to explain them -all. +There are a lot of files used to generate the docs and the concept of a Table of Contents +(TOC) gets confusing fast when using docfx. So this tries to explain them all.
### .gitignore -This marks the files that are generated as ignored for GIT operations (Don't -include generated sources in the repo - the automated build will generate them). -Some specific files are excluded from this but most of the api-* folders are -ignored. +This marks the files that are generated as ignored for GIT operations (Don't include +generated sources in the repo - the automated build will generate them). Some specific files +are excluded from this but most of the api-* folders are ignored. ### docfx.json -This file serves as the "project" file for docs generation. Sadly, docfx has -deprecated old `docfxconsole` that supported creation of a csproj project file to -generate the docs from. So this file serves as the only equivalent of a project -file. Unfortunately, it is in JSON format and unlike any other project is unusable -directly in an IDE as they don't understand the format of such a thing. +This file serves as the "project" file for docs generation. Sadly, docfx has deprecated the +old `docfxconsole` that supported creation of a csproj project file to generate the docs +from. So this file serves as the only equivalent of a project file. Unfortunately, it is in +JSON format and, unlike any other project file, is unusable directly in an IDE, as IDEs +don't understand the format. ### favicon.ico This provides the standard web browser icon for the site as a whole. @@ -49,37 +47,33 @@ This provides the standard web browser icon for the site as a whole. Markdown for the index (Home) of this site. ### toc.yml -YAML file containing the links for the Table of contents for the SITE as a whole -(across all child pages/folders). This is for the TOP row navigation on the site. -(It has NOTHING to do with the left navigation for any given sub folder, other than -having the same name, confusingly.) +YAML file containing the links for the Table of contents for the SITE as a whole (across all +child pages/folders).
This is for the TOP row navigation on the site. (It has NOTHING to do +with the left navigation for any given sub folder, other than having the same name, +confusingly.) >NOTE -> The TOC.YML file format used in these topics is DIFFERENT from what is auto -> generated. +> The TOC.YML file format used in these topics is DIFFERENT from what is auto generated. ### Folders There are a few folders containing input for the site generation. #### api-* -These folders contain the Generated contents for each project as (YAML) metadata -files parsed and generated from the source. +These folders contain the generated contents for each project as (YAML) metadata files +parsed and generated from the source. ##### api-*/Index.md -This contains the main landing page for a given library it has the top bar -navigation from [toc.yml](#tocyml) AND the left navigation from the generated -[toc.yml](#generated-left-nav-tocyml) for this API library. (Confusingly, a file -with the same name, but a completely different purpose!) +This contains the main landing page for a given library; it has the top bar navigation from +[toc.yml](#tocyml) AND the left navigation from the generated +[toc.yml](#generated-left-nav-tocyml) for this API library. (Confusingly, a file with the +same name, but a completely different purpose!) #### Generated left nav TOC.YML -This version of the overloaded toc.yml file name is used to produce the left -navigation for an API surface. This is generated from the source files and normally -contains the set of namespaces, types, enums etc... Each library project generates -it's own version of this file. Since this is generated it is listed in the -[.gitignore](#gitignore) file. +This version of the overloaded toc.yml file name is used to produce the left navigation for +an API surface. This is generated from the source files and normally contains the set of +namespaces, types, enums etc... Each library project generates its own version of this file.
+Since this is generated, it is listed in the [.gitignore](#gitignore) file. #### Library Content -These folders (named after the `*` portion of the [api-*](#api-*) folder names -contains manually written additional files, articles, samples etc... related to a -given library. - +These folders (named after the `*` portion of the [api-*](#api-*) folder names) contain +manually written additional files, articles, samples, etc. related to a given library. diff --git a/docfx/antlr-utils/api/index.md b/docfx/antlr-utils/api/index.md index c43360f9a..ff7b78337 100644 --- a/docfx/antlr-utils/api/index.md +++ b/docfx/antlr-utils/api/index.md @@ -1,3 +1,2 @@ # Ubiquity.NET.ANTLR.Utils -This namespace supports all of extensions to support use of ANTLR as a parsing -engine. +This namespace contains the extensions that support use of ANTLR as a parsing engine. diff --git a/docfx/extensions/api/index.md b/docfx/extensions/api/index.md index 251494e7b..a453a5d2d 100644 --- a/docfx/extensions/api/index.md +++ b/docfx/extensions/api/index.md @@ -20,14 +20,15 @@ multiple other Ubiquity.NET projects. expressions to validate input parameters. * DictionaryBuilder to enable dictionary initializer style initialization of `ImmutableDictionary<TKey,TValue>` with significantly reduced overhead. - - This leverages an `ImmutableDictionary.Builder` under the hood to build the dictionary. - When the `ToImmutable()` method is called the builder is converted to the immutable state without any - overhead of a copy or re-construction of hash tables etc... + - This leverages an `ImmutableDictionary.Builder` under the hood to build + the dictionary. When the `ToImmutable()` method is called the builder is converted to + the immutable state without any overhead of a copy or re-construction of hash tables + etc... * KvpArrayBuilder to enable array initializer style initialization of `ImmutableArray<KeyValuePair<TKey,TValue>>` with significantly reduced overhead.
- - This leverages an `ImmutableArray.Builder` under the hood to build the array directly. - When the `ToImmutable()` method is called the builder is converted to the immutable state without any - overhead of a copy. - - Since this is an array and not a dictionary there is no overhead for allocating, initializing or copying - any hash mapping for the keys. + - This leverages an `ImmutableArray.Builder` under the hood to build the array + directly. When the `ToImmutable()` method is called the builder is converted to the + immutable state without any overhead of a copy. + - Since this is an array and not a dictionary there is no overhead for allocating, + initializing or copying any hash mapping for the keys. diff --git a/docfx/extensions/index.md b/docfx/extensions/index.md index 455637038..7e6c50857 100644 --- a/docfx/extensions/index.md +++ b/docfx/extensions/index.md @@ -1,7 +1,7 @@ # About Ubiquity.NET.Extensions contains general extensions for .NET. This is -a bit of a [grab bag](https://www.merriam-webster.com/dictionary/grab%20bag) of functionality used by -but not actually part of multiple other Ubiquity.NET projects. +a bit of a [grab bag](https://www.merriam-webster.com/dictionary/grab%20bag) of +functionality used by but not actually part of multiple other Ubiquity.NET projects. ## Key support * Computing a hash code for a ReadOnlySpan of bytes using @@ -10,22 +10,23 @@ but not actually part of multiple other Ubiquity.NET projects. - This is useful for implementing the RAII pattern in .NET. * MustUseReturnValueAttribute that is compatible with the [MustUseRetVal](https://github.com/mykolav/must-use-ret-val-fs) package. -* StringNormalizer extensions to support converting line endings of strings - for interoperability. +* StringNormalizer extensions to support converting line endings of strings for + interoperability. * Fluent style parameter value validation extensions. 
- - These are useful when passing parameters to a function that produces a - result that is fed to the base constructor. These are also useful in body - expressions to validate input parameters. + - These are useful when passing parameters to a function that produces a result that is + fed to the base constructor. These are also useful in body expressions to validate + input parameters. * DictionaryBuilder to enable dictionary initializer style initialization of `ImmutableDictionary` with significantly reduced overhead. - - This leverages an `ImmutableDictionary.Builder` under the hood to build the dictionary. - When the `ToImmutable()` method is called the builder is converted to the immutable state without any - overhead of a copy or re-construction of hash tables etc... + - This leverages an `ImmutableDictionary.Builder` under the hood to build + the dictionary. When the `ToImmutable()` method is called the builder is converted to + the immutable state without any overhead of a copy or re-construction of hash tables + etc... * KvpArrayBuilder to enable array initializer style initialization of `ImmutableArray<KeyValuePair<TKey, TValue>>` with significantly reduced overhead. - - This leverages an `ImmutableArray.Builder` under the hood to build the array directly. - When the `ToImmutable()` method is called the builder is converted to the immutable state without any - overhead of a copy. - - Since this is an array and not a dictionary there is no overhead for allocating, initializing or copying - any hash mapping for the keys. + - This leverages an `ImmutableArray.Builder` under the hood to build the array + directly. When the `ToImmutable()` method is called the builder is converted to the + immutable state without any overhead of a copy. + - Since this is an array and not a dictionary there is no overhead for allocating, + initializing or copying any hash mapping for the keys. 
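The `DictionaryBuilder` bullets above describe the standard `ImmutableDictionary.Builder` mechanism from `System.Collections.Immutable`. A minimal sketch of that underlying pattern (the `DictionaryBuilder` wrapper itself is part of Ubiquity.NET.Extensions and its exact shape may differ; only BCL APIs are shown here):

```csharp
using System.Collections.Immutable;

// The builder accumulates entries with mutable-dictionary cost; ToImmutable()
// then freezes the builder's internal tree in place, with no copy and no
// re-construction of the hash mapping.
var builder = ImmutableDictionary.CreateBuilder<string, int>();
builder.Add("one", 1);
builder.Add("two", 2);
ImmutableDictionary<string, int> map = builder.ToImmutable();
```

This is the same reason `KvpArrayBuilder` is cheaper still: an `ImmutableArray.Builder` has no hash mapping to build at all.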
diff --git a/docfx/interop-helpers/api/index.md b/docfx/interop-helpers/api/index.md index 3babdecbc..dbfea6c2f 100644 --- a/docfx/interop-helpers/api/index.md +++ b/docfx/interop-helpers/api/index.md @@ -1,28 +1,27 @@ # About -Ubiquity.NET.InteropHelpers helper support common to low level interop libraries. -While this library is intended to support the Ubiquity.NET.Llvm interop requirements -there isn't anything bound to that library in the support here. That is it is -independent and a useful library for any code base providing interop support. +`Ubiquity.NET.InteropHelpers` provides helper support common to low level interop libraries. While +this library is intended to support the Ubiquity.NET.Llvm interop requirements there isn't +anything bound to that library in the support here. That is, it is independent and a useful +library for any code base providing interop support. # Key Features * String handling - * A lot of interop deals with strings in some form or another and handling them - is a major amount of effort for most interop libraries. The support provided - here enables lazy evaluation/marshalling and encoding of native strings and - managed strings. These allow a simple `byte[]` to store a native string and - ONLY marshals to a UTF16 managed string once when needed. This allows storing - and passing strings in their native form for FAST retrieval from a native call - and then providing that same string as an `in` parameter in another call. All - without the need to marshal from native and then back again just for the call. - This is a MAJOR performance enhancement for APIs that deal in strings. + * A lot of interop deals with strings in some form or another and handling them is a + major amount of effort for most interop libraries. The support provided here enables + lazy evaluation/marshalling and encoding of native strings and managed strings. 
These + allow a simple `byte[]` to store a native string and ONLY marshal to a UTF16 managed + string once when needed. This allows storing and passing strings in their native form + for FAST retrieval from a native call and then providing that same string as an `in` + parameter in another call. All without the need to marshal from native and then back + again just for the call. This is a MAJOR performance enhancement for APIs that deal in + strings. * Delegates and NativeCallbacks as Function pointers - * Function pointers are a new feature of C# that makes for very high performance - interop scenarios. However, sometimes the callback for a function pointer - actually needs additional data not part of the parameters of the function to - work properly. This library provides support for such scenarios where a - delegate is used to "capture" the data while still supporting AOT scenarios. - (NOTE: Marshal.GetFunctionPointerForDelegate() must dynamically emit a thunk - that contains the proper signature and the captured "this" pointer so is NOT - AOT friendly) The support offered in this library, though a bit more tedious, - is AOT friendly. + * Function pointers are a new feature of C# that makes for very high performance interop + scenarios. However, sometimes the callback for a function pointer actually needs + additional data not part of the parameters of the function to work properly. This + library provides support for such scenarios where a delegate is used to "capture" the + data while still supporting AOT scenarios. (NOTE: `Marshal.GetFunctionPointerForDelegate()` + must dynamically emit a thunk that contains the proper signature and the captured + "this" pointer so it is NOT AOT friendly.) The support offered in this library, though a + bit more tedious, is AOT friendly. 
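The delegate-capture point above can be illustrated with BCL-only APIs. A minimal sketch of the AOT-friendly shape being described (this is NOT the library's actual API; the names `CallbackSketch`, `OnEvent`, and `Capture` are hypothetical, and only `System.Runtime.InteropServices` types are real):

```csharp
using System;
using System.Runtime.InteropServices;

// Instead of Marshal.GetFunctionPointerForDelegate(), which must emit a runtime
// thunk to capture the delegate's target (and so is not AOT friendly), a static
// [UnmanagedCallersOnly] method receives the "captured" state explicitly via an
// opaque context pointer wrapping a GCHandle.
internal static unsafe class CallbackSketch
{
    [UnmanagedCallersOnly]
    private static int OnEvent(nint context, int value)
    {
        // Recover the captured managed state from the context pointer.
        var state = (Func<int, int>)GCHandle.FromIntPtr(context).Target!;
        return state(value);
    }

    // Hypothetical helper producing the (function pointer, context) pair a
    // native registration API would take.
    internal static (nint Fn, GCHandle Context) Capture(Func<int, int> callback)
    {
        var handle = GCHandle.Alloc(callback);
        return ((nint)(delegate* unmanaged<nint, int, int>)&OnEvent, handle);
    }
}
```

The caller owns the `GCHandle` and must `Free()` it once the native side can no longer invoke the callback, which is exactly the kind of "bit more tedious" bookkeeping the text refers to.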
diff --git a/docfx/interop-helpers/index.md b/docfx/interop-helpers/index.md index cd527b041..dbfea6c2f 100644 --- a/docfx/interop-helpers/index.md +++ b/docfx/interop-helpers/index.md @@ -1,26 +1,27 @@ # About -Ubiquity.NET.InteropHelpers helper support common to low level interop libraries. While this -library is intended to support the Ubiquity.NET.Llvm interop requirements there isn't anything -bound to that library in the support here. That is it is independent and a useful library for -any code base providing interop support. +`Ubiquity.NET.InteropHelpers` provides helper support common to low level interop libraries. While +this library is intended to support the Ubiquity.NET.Llvm interop requirements there isn't +anything bound to that library in the support here. That is, it is independent and a useful +library for any code base providing interop support. # Key Features * String handling - * A lot of interop deals with strings in some form or another and handling them - is a major amount of effort for most interop libraries. The support provided here - enables lazy evaluation/marshalling and encoding of native strings and managed strings. - These allow a simple `byte[]` to store a native string and ONLY marshals to a UTF16 - managed string once when needed. This allows storing and passing strings in their - native form for FAST retrieval from a native call and then providing that same string - as an `in` parameter in another call. All without the need to marshal from native and - then back again just for the call. This is a MAJOR performance enhancement for APIs - that deal in strings. + * A lot of interop deals with strings in some form or another and handling them is a + major amount of effort for most interop libraries. The support provided here enables + lazy evaluation/marshalling and encoding of native strings and managed strings. These + allow a simple `byte[]` to store a native string and ONLY marshal to a UTF16 managed + string once when needed. 
This allows storing and passing strings in their native form + for FAST retrieval from a native call and then providing that same string as an `in` + parameter in another call. All without the need to marshal from native and then back + again just for the call. This is a MAJOR performance enhancement for APIs that deal in + strings. * Delegates and NativeCallbacks as Function pointers * Function pointers are a new feature of C# that makes for very high performance interop scenarios. However, sometimes the callback for a function pointer actually needs - additional data not part of the parameters of the function to work properly. This library - provides support for such scenarios where a delegate is used to "capture" the data while - still supporting AOT scenarios. (NOTE: Marshal.GetFunctionPointerForDelegate() must - dynamically emit a thunk that contains the proper signature and the captured "this" pointer - so is NOT AOT friendly) The support offered in this library, though a bit more tedious, - is AOT friendly. + additional data not part of the parameters of the function to work properly. This + library provides support for such scenarios where a delegate is used to "capture" the + data while still supporting AOT scenarios. (NOTE: `Marshal.GetFunctionPointerForDelegate()` + must dynamically emit a thunk that contains the proper signature and the captured + "this" pointer so it is NOT AOT friendly.) The support offered in this library, though a + bit more tedious, is AOT friendly. + diff --git a/docfx/llvm/ReleaseNotes.md b/docfx/llvm/ReleaseNotes.md index e36986ce4..89085864f 100644 --- a/docfx/llvm/ReleaseNotes.md +++ b/docfx/llvm/ReleaseNotes.md @@ -1,45 +1,45 @@ # Release Notes -# V20.1.8.Alpha -Major re-work to support LLVM 20 and .NET 9 and later with focus on performance and heading towards AOT. -Major changes include: +# V20.1.8 +Major re-work to support LLVM 20 and .NET 9 and later with focus on performance and heading +towards AOT. 
Major changes include: * OrcJIT v2 support - Including functioning lazy materialization from the AST - Even on Microsoft Windows! :wink: * Opaque pointers - - Underlying LLVM uses only opaque pointers however these wrappers account for - that as much as possible without significant impact on calling code. The wrappers, - when used with debug information, support tracking the LLVM type of the `pointee` - for you in most cases. Though if not using any debug information or otherwise dealing - in the raw types applications will need to keep track of the type of a pointer - instead of relying on the LLVM IR to do that for you. + - Underlying LLVM uses only opaque pointers; however, these wrappers account for that as + much as possible without significant impact on calling code. The wrappers, when used + with debug information, support tracking the LLVM type of the `pointee` for you in + most cases. Though if not using any debug information or otherwise dealing in the raw + types, applications will need to keep track of the type of a pointer instead of relying + on the LLVM IR to do that for you. * Dropped reference equality to support multi-threaded nature of OrcJIT. - - Things got complicated and broke around chapter 5 of the Kaleidoscope tutorials. - The basic problem with interning is that it doesn't account for ownership. In fact - it downright ignores the point. This is a serious problem when dealing with a - multi-threaded JIT engine as you might end up disposing something you own that - was transferred to the native API or worse an alias is resolved to an owned - instance which is then destroyed - OOPS! [Bad idea - seemed like a good idea - at the time! :facepalm: ] + - Things got complicated and broke around chapter 5 of the Kaleidoscope tutorials. The + basic problem with interning is that it doesn't account for ownership. In fact it + downright ignores the point. 
This is a serious problem when dealing with a + multi-threaded JIT engine as you might end up disposing something you own that was + transferred to the native API or worse an alias is resolved to an owned instance which + is then destroyed - OOPS! [Bad idea - seemed like a good idea at the time! + :facepalm: ] * Consumers need to consider IDispose and "ownership" in general - - Usually this is as simple as a `using` statement to properly handle - cleanup in scope if there is an exception. Sometimes it takes a bit more - thought to handle properly. (.NET, C# and the general community have no concept of the - common native code "move semantics" pattern) + - Usually this is as simple as a `using` statement to properly handle cleanup in scope + if there is an exception. Sometimes it takes a bit more thought to handle properly. + (.NET, C# and the general community have no concept of the common native code + "move semantics" pattern) * A significant impact of ownership is in the LLVM transition to DebugRecords - - While the core API has not changed, the ownership of the classes has and - in particular that a DIBuilder is now a "ref like" type (`ref struct` in C#). + - While the core API has not changed, the ownership of the classes has and in particular + that a `DIBuilder` is now a standalone type and NOT accessible as a member of a + `Module`. - Such a type can reference the module and compile unit but is NOT owned by the module and therefore not exposed as a property for it. ## Breaking changes -This is a major release and there are a LOT of changes though they are all fairly -small. Ownership is the big change as things have changed in the underlying LLVM and -this library has shifted to NOT support cached instances as it confuses the question -of ownership (more specifically the release of resources). A look at the samples and -test code will show that the core of the library didn't change but some things about -how you use it did. 
Additional support packages were created to move common re-usable -code out of the samples and now exist as part of the same release. [See: the [index -for these docs](../index.md) for more details] +This is a major release and there are a LOT of changes though they are all fairly small. +Ownership is the big change as things have changed in the underlying LLVM and this library +has shifted to NOT support cached instances as it confuses the question of ownership (more +specifically the release of resources). A look at the samples and test code will show that +the core of the library didn't change but some things about how you use it did. Additional +support packages were created to move common re-usable code out of the samples and now exist +as part of the same release. (See the [index for these docs](../index.md) for more details.) ### New projects If you are just starting out and not upgrading to this release then @@ -52,13 +52,14 @@ If you are just starting out and not upgrading to this release then With the 10.* release the Ubiquity.NET.Llvm.* libs have made a number of breaking changes. While these are mostly small and easily adapted to, they are still a breaking change. Thus, these changes were held to only occur on a Major release. Despite the pain of updating code -we think the changes are worth the effort to create a cleaner simpler and more consistent library. +we think the changes are worth the effort to create a cleaner, simpler, and more consistent +library. ### Library name changes -With the 10.* release the names of the assemblies, and therefore the packages, are all changed. -This was done to unify them all under a common organization name to allow use of the facilities -provided by NuGet for organizations and to help clarify these libraries from some other similar -projects no longer maintained. +With the 10.* release the names of the assemblies, and therefore the packages, are all +changed. 
This was done to unify them all under a common organization name to allow use of +the facilities provided by NuGet for organizations and to help distinguish these libraries from +some other similar projects no longer maintained. | Old Name | New Name | |---------------------------|--------------| @@ -67,16 +68,17 @@ projects no longer maintained. | Ubiquity.NET.Llvm | Ubiquity.Net.Llvm | ### Library initialization -The library initialization and target registration was changed to better reflect the requirements -and proper sequence of usage. In particular the `Ubiquity.Net.Llvm.Interop.Library.InitializeLLVM()` -static function now returns an `ILibLLVM` interface (that is still disposable). All target registration -is done through this interface. This prevents accidental use of the registration methods **BEFORE** +The library initialization and target registration was changed to better reflect the +requirements and proper sequence of usage. In particular the +`Ubiquity.Net.Llvm.Interop.Library.InitializeLLVM()` static function now returns an +`ILibLLVM` interface (that is still disposable). All target registration is done through +this interface. This prevents accidental use of the registration methods **BEFORE** initializing the library (as that's a guaranteed app crash!) -### C#8 and non-Nullable references -With the 10.* release the Ubiquity.NET.Llvm.* libs all updated to target .NET Standard 2.1 and C#8. This allows -use of nullable types to make nullability more explicit. This necessitated a few minor breaking changes in the -object model surface. +### C# 8 and non-Nullable references +With the 10.* release the Ubiquity.NET.Llvm.* libs were all updated to target .NET Standard 2.1 +and C# 8. This allows use of nullable types to make nullability more explicit. This +necessitated a few minor breaking changes in the object model surface. | Name | Description | |-----------------|--------------| 
### Renamed instruction predicate enumerator values -The comparison instruction predicates `Ubiquity.NET.Llvm.Instructions.[Predicate|IntPredicate]`were renamed for greater -consistency and clarity (Some of the float predicates had 'Than' in the name while the integer counterparts did not. (See: -[Bug #152](https://github.com/UbiquityDotNET/Llvm.NET/issues/152) for details.) +The comparison instruction predicates +`Ubiquity.NET.Llvm.Instructions.[Predicate|IntPredicate]` were renamed for greater +consistency and clarity. (Some of the float predicates had 'Than' in the name while the +integer counterparts did not; see [Bug #152](https://github.com/UbiquityDotNET/Llvm.NET/issues/152) +for details.) | Old Name | New Name | |------------------------|--------------| @@ -112,97 +116,108 @@ Some APIs had inconsistent, misspelled or confusing names and were updated. | Module.AddFunction | Module.CreateFunction() | The Create vs Add between debug info and raw native was always confusing | ### Types no longer disposable -Some types are no longer disposable. The LLVMObjectRef type uses SafeHandle, which eliminates the need -for Dispose for top level global objects. The following types no longer support IDispose: +Some types are no longer disposable. The LLVMObjectRef type uses SafeHandle, which +eliminates the need for Dispose for top level global objects. The following types no longer +support IDisposable: * Transforms.ModulePassManager * Transforms.FunctionPassManager * ObjectFile.TargetBinary ### Altered Behavior #### Context.CreateStructType() -As part of resolving [bug #184](https://github.com/UbiquityDotNET/Llvm.NET/issues/184) the CreateStructType -methods were re-evaluated and found lacking in functionality (the bug) and clarity. The docs were misleading -and the implementations overly restrictive in some cases. Thus these have been re-worked to make it more clear -when a Sized vs. 
Opaque structure type is created, in particular, for **ALL** overloads taking a 'packed' parameter -a sized type is created, even if the size is 0 because no members are provided. This allows creation of named or -anonymous empty structs, used in many languages. To create a named opaque type then the overload with just the -name is used. This isn't expected to impact many consumers, other than the tests, but it is a breaking change. +As part of resolving [bug #184](https://github.com/UbiquityDotNET/Llvm.NET/issues/184) the +CreateStructType methods were re-evaluated and found lacking in functionality (the bug) and +clarity. The docs were misleading and the implementations overly restrictive in some cases. +Thus these have been re-worked to make it more clear when a Sized vs. Opaque structure type +is created, in particular, for **ALL** overloads taking a 'packed' parameter a sized type is +created, even if the size is 0 because no members are provided. This allows creation of +named or anonymous empty structs, used in many languages. To create a named opaque type then +the overload with just the name is used. This isn't expected to impact many consumers, other +than the tests, but it is a breaking change. #### Context.CreateConstantString() -The behavior of Context.CreateConstantString(string) has changed slightly. It now constructs a valid C string with -a null terminator, which is generally what would be expected of something called "string". (The (string,bool) -overload remains, to allow apps to be explicit with intent) Additionally, the ConstantDataSequential.IsString -property now reflects whether the string is a C string (terminating null but no embedded nulls) and the -ConstantDataSequential.IsI8Sequence was added to provide the previous behavior of IsString, which was simply that -the underlying sequence element type was i8 (with or without a terminator) +The behavior of Context.CreateConstantString(string) has changed slightly. 
It now constructs +a valid C string with a null terminator, which is generally what would be expected of +something called "string". (The (string,bool) overload remains, to allow apps to be explicit +with intent) Additionally, the ConstantDataSequential.IsString property now reflects whether +the string is a C string (terminating null but no embedded nulls) and the +ConstantDataSequential.IsI8Sequence was added to provide the previous behavior of IsString, +which was simply that the underlying sequence element type was i8 (with or without a +terminator). ### Removed redundant APIs -LLVM has made additional APIs available in the standard LLVM-C library that are either identical to or functionality -equivalent to APIs that were custom in previous versions of the Ubiquity.NET.Llvm DLLs. This is only observable at -the interop library layer where some of the custom APIs were removed and replaced with the official ones. +LLVM has made additional APIs available in the standard LLVM-C library that are either +identical to or functionally equivalent to APIs that were custom in previous versions of +the `Ubiquity.NET.Llvm` DLLs. This is only observable at the interop library layer where +some of the custom APIs were removed and replaced with the official ones. | Removed custom API | New Official API | |--------------------|------------------| | LibLLVMFoo [TBD] | LLVMFoo [TBD] | ### Disabled ORCJIT LazyFunction binding -Unfortunately, the ORCJIT truly lazy function generation callback support is currently disabled. LLVM itself is -transitioning to the ORCJIT v2 and in the process broke the lazy function binding support (At least for Windows+COFF). -Previously a workaround for the issue of the COFF exports was applied in the Llvm.NET ORCJIT library code for symbol -lookups. 
However, with ORCJIT v2 the JIT itself is doing lookups and it does so only for external symbols assuming the -symbols it generates internally will be exports, but are not (at least for COFF modules anyway). For more details see -the LLVM bugs [25493](https://bugs.llvm.org/show_bug.cgi?id=25493) and [28699](https://bugs.llvm.org/show_bug.cgi?id=28699) +Unfortunately, the ORCJIT truly lazy function generation callback support is currently +disabled. LLVM itself is transitioning to the ORCJIT v2 and in the process broke the lazy +function binding support (At least for Windows+COFF). Previously a workaround for the issue +of the COFF exports was applied in the Llvm.NET ORCJIT library code for symbol lookups. +However, with ORCJIT v2 the JIT itself is doing lookups and it does so only for external +symbols assuming the symbols it generates internally will be exports, but are not (at least +for COFF modules anyway). For more details see the LLVM bugs: +* [25493](https://bugs.llvm.org/show_bug.cgi?id=25493) +* [28699](https://bugs.llvm.org/show_bug.cgi?id=28699) ## v8.0.1 ### Bug Fixes - | Bug | Description | |-------|--------------| | [151](https://github.com/UbiquityDotNET/Llvm.NET/issues/151) | Updated DebugFunctionType signature to use interface instead of concrete type | | [152](https://github.com/UbiquityDotNET/Llvm.NET/issues/152) | Corrected docs copy/paste error [renaming part of the issue is left for the next major release as that is a breaking change] | ### Additional changes -Additionally the internal build scripts were updated to simplify the consistent cross solution versioning. 
Previously, -a complex process of building a dummy project to generate a data file was used, however that was no longer necessary -as the [CSemVer.Build.Tasks ](https://github.com/UbiquityDotNET/CSemVer.GitBuild) package can figure out all except -the CiBuildIndex, which, for this project, is an ISO-8601 formatted time-stamp (of the latest commit for automated -builds or the build start for local developer builds) +Additionally the internal build scripts were updated to simplify the consistent cross +solution versioning. Previously, a complex process of building a dummy project to generate a +data file was used, however that was no longer necessary as the +[CSemVer.Build.Tasks ](https://github.com/UbiquityDotNET/CSemVer.GitBuild) package can +figure out all except the CiBuildIndex, which, for this project, is an ISO-8601 formatted +time-stamp (of the latest commit for automated builds or the build start for local developer +builds) ## v8.0.0 ### Ubiquity.NET.Llvm.Interop (New library) -Ubiquity.NET.Llvm 8.0 adds a new library (Ubiquity.NET.Llvm.Interop) that contains the raw P/Invoke -APIs and support needed to inter-operate with the native library. The NuGet package -for the interop library includes the native code binaries as they are tightly coupled. -This package contains the native LibLLVM.dll and the P/Invoke interop support layers. -Ubiquity.NET.Llvm uses this library to define a clean projection of LLVM for .NET consumers. -This will, hopefully, allow for future development and enhancement of the Ubiquity.NET.Llvm -object model without changing the underlying P/Invoke layers. (e.g. -the Ubiquity.NET.Llvm.Interop can "snap" to LLVM versions, but the Ubiquity.NET.Llvm model can have -multiple incremental releases) This isn't a hard/fast rule as it is possible that -getting new functionality in the object model requires new custom extensions. At -this point in time both libraries are built together and share build numbers. 
-Though, that may change in the future. +Ubiquity.NET.Llvm 8.0 adds a new library (Ubiquity.NET.Llvm.Interop) that contains the raw +P/Invoke APIs and support needed to inter-operate with the native library. The NuGet package +for the interop library includes the native code binaries as they are tightly coupled. This +package contains the native LibLLVM.dll and the P/Invoke interop support layers. +`Ubiquity.NET.Llvm` uses this library to define a clean projection of LLVM for .NET +consumers. This will, hopefully, allow for future development and enhancement of the +`Ubiquity.NET.Llvm` object model without changing the underlying P/Invoke layers. (e.g. the +`Ubiquity.NET.Llvm.Interop` can "snap" to LLVM versions, but the `Ubiquity.NET.Llvm` model +can have multiple incremental releases) This isn't a hard/fast rule as it is possible that +getting new functionality in the object model requires new custom extensions. At this point +in time both libraries are built together and share build numbers. Though, that may change +in the future. #### Auto-generated P/Invoke -LLVM-C API now includes most of the debug APIs so, significantly fewer custom -extensions are needed (That's a good thing!). To try and keep things simpler this -moves the interop back to using code generation for the bulk of the P/Invoke interop. -However, unlike the first use of generation, the [LLVMBindingsGenerator](https://github.com/UbiquityDotNET/Llvm.NET/tree/master/src/Interop/LlvmBindingsGenerator) -is much more targeted and includes specialized handling to prevent the need for -additional "by-hand" tweaking of the generated code, such as: +LLVM-C API now includes most of the debug APIs so, significantly fewer custom extensions are +needed (That's a good thing!). To try and keep things simpler this moves the interop back to +using code generation for the bulk of the P/Invoke interop. 
However, unlike the first use of +generation, the [LLVMBindingsGenerator](https://github.com/UbiquityDotNET/Llvm.NET/tree/master/src/Interop/LlvmBindingsGenerator) +is much more targeted and includes specialized handling to prevent the need for additional +"by-hand" tweaking of the generated code, such as: 1. Marshaling of strings with the many ways to dispose (or not) a returned string -2. LLVMBool vs LLVMStatus +2. `LLVMBool` vs `LLVMStatus` 3. "smart ref" handle types, including aliases that should not be released by client code. The generated code is combined with some fixed support classes to create a new -Ubiquity.NET.Llvm.Interop Library and NuGet Package. +`Ubiquity.NET.Llvm.Interop` Library and NuGet Package. ### New features * ObjectFile Support - * Ubiquity.NET.Llvm.ObjectFile namespace contains support for processing object files using LLVM + * Ubiquity.NET.Llvm.ObjectFile namespace contains support for processing object files + using LLVM. * Eager compilation JIT * The OrcJIT now supports eager and lazy compilation for Windows platforms * Full initialization for all the latest supported targets @@ -210,10 +225,10 @@ Ubiquity.NET.Llvm.Interop Library and NuGet Package. * Added accessors to allow retrieval/addition of metadata on instructions ### Breaking Changes -This is a Major release and, as such, can, and does, have breaking changes. While there -are several such changes the actual impact to a code base is fairly trivial. Most are -driven by either obsolescence of functionality in LLVM or general naming cleanup in the -Ubiquity.NET.Llvm library: +This is a Major release and, as such, can, and does, have breaking changes. While there are +several such changes the actual impact to a code base is fairly trivial. Most are driven by +either obsolescence of functionality in LLVM or general naming cleanup in the +`Ubiquity.NET.Llvm` library: 1. New namespace and assembly for some classes (Ubiquity.NET.Llvm.Interop) 1. 
Ubiquity.NET.Llvm.DisposableAction -> Ubiquity.NET.Llvm.Interop.DisposableAction @@ -228,25 +243,25 @@ Ubiquity.NET.Llvm library: in code. 4. Legacy JIT engine support is dropped. ORCJit is the only supported JIT engine 1. Removed Ubiquity.NET.Llvm.JIT.EngineKind - 2. Removed all use of Ubiquity.NET.Llvm.JIT.IJitModuleHandle. Handles are now just an integral value + 2. Removed all use of Ubiquity.NET.Llvm.JIT.IJitModuleHandle. Handles are now just an + integral value. 3. Removed Ubiquity.NET.Llvm.LegacyExecutionEngine -5. Context.CreateBasicBlock() now only creates detached blocks, if append to a function - is desired, there is a method on IrFunction to create and append a block. - 1. CreateBasicBlock signature changed to remove the function and block parameters +5. Context.CreateBasicBlock() now only creates detached blocks; if appending to a function is + desired, there is a method on IrFunction to create and append a block. + 1. CreateBasicBlock signature changed to remove the function and block parameters 6. PassManager, ModulePassManager, and FunctionPassManager are IDisposable to help apps ensure that a function pass manager, which is bound to a module, is destroyed before the module it is bound to. Failure to do so can result in app crashes from access violations in the native LLVM code. 7. Module - 1. MakeShared and shared refs of modules is removed. (This was created for - OrcJIT use of shared_ptr under the hood, which is no longer used. OrcJit now uses the - same ownership transfer model as the legacy engines. E.g. the ownership for the module + 1. MakeShared and shared refs of modules is removed. (This was created for OrcJIT use of + `shared_ptr` under the hood, which is no longer used. OrcJit now uses the same + ownership transfer model as the legacy engines. E.g. the ownership for the module is transferred to the JIT engine) - 2. 
Module is now Disposable backed by a safe handle, this allows for detaching and
-       invalidating the underlying LLVMModuleRef when the module is provided to the JIT
-    3. CreateFunction() signature changed, Dropped the default null node parameters
-       not supported by the LLVM-C implementation.
-
+    2. `Module` is now `IDisposable` backed by a safe handle; this allows for detaching and
+       invalidating the underlying LLVMModuleRef when the module is provided to the JIT.
+    3. CreateFunction() signature changed; dropped the default null node parameters not
+       supported by the LLVM-C implementation.
8. Renamed Function class to IrFunction to avoid potential collision with common language
   keywords
9. Renamed Select to SelectInstruction to avoid potential collision with language keyword
@@ -263,36 +278,41 @@ Ubiquity.NET.Llvm library:
13. DIBuilder
    1. CreateFunction() signature changed; dropped the default null node parameters not
       supported by the LLVM-C implementation.
-    2. DIBuilder.CreateReplaceableCompositeType() and CreateUnionType() signatures changed to
-       include unique ID
-        1. The id is set to default to string.Empty so this should largely go without actually
-           breaking anything
+    2. DIBuilder.CreateReplaceableCompositeType() and CreateUnionType() signatures changed
+       to include a unique ID
+        1. The id is set to default to string.Empty so this should largely go without
+           actually breaking anything
    3. CreateBasicType added a DebugInfoFlags parameter
-    4. CreateEnumerationType removed uniqueId string parameter as it isn't supported by LLVM-C
-    5. Obsoleted CreateStructType signature taking `DINodeArray` in favor of `IEnumerable`
-14. Ubiquity.NET.Llvm.DebugInfo.ExpressionOp names changed to correct PascalCasing and eliminate
-    underscores in the value names for better consistency and style compliance.
-15. Renamed some Ubiquity.NET.Llvm.DebugInfo.SourceLanguage vendor specific values to conform with
-    underlying LLVM names
+    4. 
CreateEnumerationType removed uniqueId string parameter as it isn't supported by + LLVM-C. + 5. Obsoleted CreateStructType signature taking `DINodeArray` in favor of + `IEnumerable`. +14. Ubiquity.NET.Llvm.DebugInfo.ExpressionOp names changed to correct PascalCasing and + eliminate underscores in the value names for better consistency and style compliance. +15. Renamed some Ubiquity.NET.Llvm.DebugInfo.SourceLanguage vendor specific values to + conform with underlying LLVM names 1. RenderScript -> GoogleRenderScript 2. Delphi -> BorlandDelphi -16. Renamed or removed some of the Ubiquity.NET.Llvm.DebugInfo.Tag values to better reflect underlying - LLVM names and avoid potential language keyword conflicts. +16. Renamed or removed some of the Ubiquity.NET.Llvm.DebugInfo.Tag values to better reflect + underlying LLVM names and avoid potential language keyword conflicts. 1. Label -> TagLabel 2. PtrToMemberType -> PointerToMemberType 3. Removed AutoVariable, ArgVariable, Expression, UserBase, LoUser and MipsLoop as they don't exist in the LLVM support. 17. InstructionBuilder - 1. Obsoleted Methods that don't support opaque pointers in preparation for LLVM's transition - 2. Changed MemCpy, MemMove, and MemSet signatures to remove alignment as LLVM intrinsic no - longer includes an alignment parameter. It is applied as a parameter attribute for each - of the pointer parameters (source and destination). + 1. Obsoleted Methods that don't support opaque pointers in preparation for LLVM's + transition. + 2. Changed MemCpy, MemMove, and MemSet signatures to remove alignment as LLVM intrinsic + no longer includes an alignment parameter. It is applied as a parameter attribute for + each of the pointer parameters (source and destination). 18. Ubiquity.NET.Llvm.JIT.IExecutionEngine - 1. Replaced AddModule with AddEagerlyCompiledModule to make it more explicit on the behavior + 1. Replaced AddModule with AddEagerlyCompiledModule to make it more explicit on the + behavior. 19. 
Ubiquity.NET.Llvm.ILazyCompileExecutionEngine
-    1. Replaced AddModule [From IExecutionEngine] with AddLazyCompiledModule to make it explicit
-    2. Removed DefalultSymbolResolver from interface as it should not have been in the interface
-       to start with.
+    1. Replaced AddModule [From IExecutionEngine] with AddLazyCompiledModule to make it
+       explicit.
+    2. Removed DefaultSymbolResolver from interface as it should not have been in the
+       interface to start with.
20. Deleted Ubiquity.NET.Llvm.LegacyExecutionEngine
21. Ubiquity.NET.Llvm.JIT.OrcJit - updated to reflect changes in the IExecutionEngine and
    ILazyCompileExecutionEngine interfaces.
diff --git a/docfx/llvm/articles/InternalDetails/index.md b/docfx/llvm/articles/InternalDetails/index.md
index dc3beb6a6..e461584d8 100644
--- a/docfx/llvm/articles/InternalDetails/index.md
+++ b/docfx/llvm/articles/InternalDetails/index.md
@@ -1,16 +1,13 @@
 # Internal details
-This section is focused on providing internal details of the Ubiquity.NET.Llvm implementation for
-developers contributing to the contents of the Ubiquity.NET.Llvm library itself. If you are only
-interested in using the Ubiquity.NET.Llvm APIs you don't need this information, though it may
-satisfy curiosity 8^).
+This section is focused on providing internal details of the Ubiquity.NET.Llvm
+implementation for developers contributing to the contents of the Ubiquity.NET.Llvm library
+itself. If you are only interested in using the `Ubiquity.NET.Llvm` APIs you don't need this
+information, though it may satisfy curiosity 8^).

## Generate Handles
-The source for the handles is generated from the headers by the LibLLVM repository build. They
-are created and published in the `Ubiquity.NET.Llvm.Interop.Handles` NuGet package. This package
-has dependencies on types in the `Ubiquity.NET.Llvm.Interop` namespace and library so there's
-naturally some tensions or issues with coherency there. The intent is to move ALL of the LIBLLVM
-support into this repo. 
But doing so means careful use of the build to ensure only the parts that
-have changed are built. (Specifically, that changes to the managed code portions of the wrappers
-DO NOT re-build the LLVM library. [THat's a HUGE beast that takes significant resources to build,
-but changes rarely so a rebuild after a release should be kept to a minimum.]) Until the
-unification happens, the tension exists.
+The source for the handles is generated from the headers by the LibLLVM repository build.
+They are created by the `LlvmBindingsGenerator` from the headers contained in the
+`Ubiquity.NET.LibLLVM` package. The LibLLVM package is a bundle of the RID-neutral headers
+along with any RID-specific headers. It is ultimately an "Uber" package that references the
+RID-specific native libraries. This keeps the size of each package down to meet NuGet
+standards.
diff --git a/docfx/llvm/articles/InternalDetails/llvm-handles.md b/docfx/llvm/articles/InternalDetails/llvm-handles.md
index 75d1bdd1f..7a71fa3a2 100644
--- a/docfx/llvm/articles/InternalDetails/llvm-handles.md
+++ b/docfx/llvm/articles/InternalDetails/llvm-handles.md
@@ -15,11 +15,9 @@ title: LLVM-C Handle Wrappers
 This occurs when a child of a resource contains a reference to the parent. In such a case
 the handle should be considered like an alias and not disposed.

-The Handle implementations in Ubiquity.NET.Llvm follow consistent patterns for
-implementing each form of handle. All handle types are generated from the native C++
-headers. Thus, they area source only NuGet package built along with the native
-extended C API library.
-Ultimately, the handles are reduced to two forms:
+The Handle implementations in Ubiquity.NET.Llvm follow consistent patterns for implementing
+each form of handle. All handle types are generated from the native C++ headers contained in
+the `Ubiquity.NET.LibLLVM` package. 
Ultimately, the handles are reduced to two forms: 1) Requires caller to release them - Lifetime of the thing the handle refers to is controlled by the caller - Release is implemented by standard .NET pattern with [IDisposable](xref:System.IDisposable) @@ -27,44 +25,46 @@ Ultimately, the handles are reduced to two forms: - Lifetime of the thing the handle refers to is controlled by the container >[!NOTE] -> The generated sources are not useful outside of the `Ubiquity.NET.Llvm.Interop` as -> they use classes within that as a base class. The use of code generation for the -> handles in a different repo is a bit fragile as the generated handles are derived -> from and depend on support in a different consuming repository. This is a result of -> the historical split of the native code libraries. The build of that, takes a MUCH -> longer time AND requires distinct runners for each RID supported. While there is -> thinking about how to unify these repositories, that isn't done yet. The focus is -> on getting the support for LLVM20.x and especially the JIT support. [It's been a -> long run with LLVM10 as the only option.] +> The generated sources are not useful outside of the `Ubiquity.NET.Llvm.Interop` as they +> use classes within that as a base class. These are generated manually via the +> `Generate-HandleWrappers.ps1` script. This is done once for any updates to the LibLLVM +> package to ensure the handles are kept up to date with the underlying native library. ### Contextual handles and Aliases -These handles are never manually released or disposed, though releasing their containers will -make them invalid. The general pattern for implementing such handles is to use a generated -struct that is marked as implementing the `IContextHandle` interface. This interface -is ONLY used during marshalling where the concrete type `THandle` is known and therefore does -NOT require any boxing. The struct is essentially a strongly typed alias for an nint value. 
-Contiguous sequences of these handles are re-interpret castable to a sequence of nint. (The
-interop support uses this for efficient marshalling of arrays.)
+These handles are never manually released or disposed, though releasing their containers
+will make them invalid. The general pattern for implementing such handles is to use a
+generated struct that is marked as implementing the `IContextHandle<THandle>` interface.
+This interface is ONLY used during marshalling where the concrete type `THandle` is known
+and therefore does NOT require any boxing. The struct is essentially a strongly typed alias
+for an nint value. Contiguous sequences of these handles are re-interpret castable to a
+sequence of nint. (The interop support uses this for efficient marshalling of arrays.)

### Global Handles

-Global handles require the caller to explicitly release the resources. In
-Ubiquity.NET.Llvm.Interop these are managed with the .NET [SafeHandle](xref:System.Runtime.InteropServices.SafeHandle)
-types through an LLVM interop specific derived type `GlobalHandleBase`. Since these types are
-derived from a `SafeHandle` they are cleaned up with the standard .NET [IDisposable](xref:System.IDisposable).
+Global handles require the caller to explicitly release the resources. These types all
+implement `IDisposable` even though they are value types. This ensures consistency of
+destruction at the API level. It's ALWAYS done via a call to the `Dispose()` method. Since
+these handles are value types they are immutable, and the `Dispose()` method is not, and
+cannot be, idempotent. This is generally handled in wrapper classes that ARE mutable and
+replace the wrapped handle with a default value on `Dispose()` or when "moved" (usually to
+native code). IFF the wrapper supports "move" semantics, then the `Dispose()` call is
+idempotent. Calling Dispose() may be a NOP. 
This ensures that applications need not worry
+about move semantics and just call `Dispose()` [usually implicitly via a `using` statement].
+Thus, even if an exception occurred and the move didn't complete, the resource is properly
+disposed of.

-All resource handles in `Ubiquity.NET.Llvm,Interop` requiring explicit release are handled consistently
-using the generated handle types as a distinct type derived from `GlobalHandleBase`
+All resource handles in `Ubiquity.NET.Llvm.Interop` requiring explicit release are handled
+consistently using the generated handle types as a distinct type implementing `IDisposable`

Global handles that also have an alias include a declaration of the alias type and allow
conversion to the unowned form of the handle.

### Global Alias handles

-Global alias handles are a specialized form of global handles where they do not participate in
-ownership control/release. These are commonly used when a child of a global container exposes
-a property that references the parent container. In such cases the reference retrieved from the
-child shouldn't be used to destroy the parent when no longer used.
+Global alias handles are a specialized form of global handles where they do not participate
+in ownership control/release. These are commonly used when a child of a global container
+exposes a property that references the parent container. In such cases the reference
+retrieved from the child shouldn't be used to destroy the parent when no longer used.

In Ubiquity.NET.Llvm.Interop this is represented as an unowned context handle, that is alias
-handles are the same as a context handle. There is no way to convert from an unowned alias to
-an owned global handle (The other way around is allowed and supported)
+In Ubiquity.NET.Llvm.Interop this is represented as an unowned context handle; that is, alias
+handles are the same as a context handle. 
There is no way to convert from an unowned alias
+to an owned global handle (The other way around is allowed and supported.)
diff --git a/docfx/llvm/articles/InternalDetails/marshal-LLVMBool.md b/docfx/llvm/articles/InternalDetails/marshal-LLVMBool.md
index 29e8909c3..267692a7a 100644
--- a/docfx/llvm/articles/InternalDetails/marshal-LLVMBool.md
+++ b/docfx/llvm/articles/InternalDetails/marshal-LLVMBool.md
@@ -1,24 +1,25 @@
 # Marshaling LLVMBool
-LLVMBool is a typdef in the LLVM-C API that is both simple and problematic. In it's
-simplest sense an LLVMBool is a representation of a bi-modal value. However, the
-problematic part is that the semantics for the value are different depending on any
-given API. That is, in some cases LLVMBool != 0 is a failure case, and others it is
-a success! The confusion stems from LLVMBool serving a dual role:
+LLVMBool is a typedef in the LLVM-C API that is both simple and problematic. In its simplest
+sense an LLVMBool is a representation of a bi-modal value. However, the problematic part is
+that the semantics for the value are different depending on any given API. That is, in some
+cases LLVMBool != 0 is a failure case, and others it is a success! The confusion stems from
+LLVMBool serving a dual role:

1. A real boolean true/false
2. A status code where 0 == success and non-zero indicates an error

This duality is confusing and can lead to subtle errors in usage of APIs if translated
directly into language projections. This makes hands-off automatic generation of P/Invoke
-calls to LLVM either impossible or error prone. Thus, Ubiquity.NET.Llvm uses manually updated P/Invoke
-calls that were initially auto generated to get things started but not maintained via any
-generation tools. In the case of LLVMBool Ubiquity.NET.Llvm uses distinct types for the different
-semantics and declares the interop signatures with the form appropriate to the function
-being called. 
The two types are LLVMStatus and standard `System.Boolean` or `bool` in C#
+calls to LLVM either impossible or error prone. Thus, Ubiquity.NET.Llvm uses manually
+updated P/Invoke calls that were initially auto-generated to get things started but not
+maintained via any generation tools. In the case of `LLVMBool`, `Ubiquity.NET.Llvm.Interop`
+uses distinct types for the different semantics and declares the interop signatures with the
+form appropriate to the function being called. The two types are `LLVMStatus` and standard
+`System.Boolean` or `bool` in C#.

## LLVMStatus

This is a status value where 0 == Success and non-zero is a failure or false status.
-LLVMStatus is used whenever the 0 == success semantics apply to the API. For example:
+`LLVMStatus` is used whenever the 0 == success semantics apply to the API. For example:

```C#
[DllImport( LibraryPath, EntryPoint = "LLVMWriteBitcodeToFD", CallingConvention = CallingConvention.Cdecl )]
diff --git a/docfx/llvm/articles/InternalDetails/marshal-string.md b/docfx/llvm/articles/InternalDetails/marshal-string.md
index 0d1f6e95d..29e356b54 100644
--- a/docfx/llvm/articles/InternalDetails/marshal-string.md
+++ b/docfx/llvm/articles/InternalDetails/marshal-string.md
@@ -1,24 +1,13 @@
 # Marshaling strings in Ubiquity.NET.Llvm
-LLVM provides strings in several forms and this leads to complexities for
-P/Invoke signatures as sometimes the strings require some form of release
-and in other cases, they do not. Standard .NET marshaling of strings makes
-some assumptions with regard to strings as a return type that make the LLVM
-APIs difficult. (e.g. in some LLVM APIs the returned string must be released
-via LLVMDisposeMessage() or some other call, while in other cases it is just
-a pointer to an internal const string that does not need any release.) 
+LLVM provides strings in several forms and this leads to complexities for P/Invoke +signatures as sometimes the strings require some form of release and in other cases, they do +not. Standard .NET marshaling of strings makes some assumptions with regard to strings as a +return type that make the LLVM APIs difficult. (e.g. in some LLVM APIs the returned string +must be released via LLVMDisposeMessage() or some other call, while in other cases it is +just a pointer to an internal const string that does not need any release.) To resolve these issues and make the requirements explicitly clear and consistent -Ubiquity.NET.Llvm.Interop uses custom marshaling of the strings to mark the exact behavior directly -on the P/Invoke signature so it is both clear and easy to use for the upper layers -(it's just a `System.String`) +`Ubiquity.NET.Llvm.Interop` uses custom marshaling of the strings to mark the exact behavior +directly on the P/Invoke signature so it is both clear and easy to use for the upper layers +(In most cases this is a `LazyEncodedString` but for a few it's just a `System.String`) -## Generated String Marshalers -The [LlvmBindingsGenerator](https://github.com/UbiquityDotNET/Llvm.NET/tree/master/src/Interop/LlvmBindingsGenerator) -Creates concrete custom marshalers for every string disposal type supported. To -keep things simple and eliminate redundancies, the generated marshalers all derive from -a common base type CustomStringMarshalerBase. - -### Marshaling configuration -LLVMBindingsGenerator supports a flexible configuration to identify which functions require which -form of marshaling. 
For strings this is an instance of the `StringMarshalInfo` -class diff --git a/docfx/llvm/articles/Samples/index.md b/docfx/llvm/articles/Samples/index.md index c00976b3d..ec5d84884 100644 --- a/docfx/llvm/articles/Samples/index.md +++ b/docfx/llvm/articles/Samples/index.md @@ -1,7 +1,8 @@ # Samples -Ubiquity.NET.Llvm provides multiple samples to aid in understanding how to use the Ubiquity.NET.Llvm library. -These samples are designed and intended to illustrate some aspect(s) of using Ubiquity.NET.Llvm itself -and are not generally considered production quality. They serve to illustrate usage of some aspect with as -little extraneous overhead as possible. The Kaleidoscope examples all use a common runtime, which is not -necessary to understand usage of the Ubiquity.NET.Llvm library. Though they can serve as an example of how -to implement something similar. +Ubiquity.NET.Llvm provides multiple samples to aid in understanding how to use the +`Ubiquity.NET.Llvm` library. These samples are designed and intended to illustrate some +aspect(s) of using Ubiquity.NET.Llvm itself and are not generally considered production +quality. They serve to illustrate usage of some aspect with as little extraneous overhead +as possible. The Kaleidoscope examples all use a common runtime, which is not necessary to +understand usage of the Ubiquity.NET.Llvm library. Though they can serve as an example of +how to implement something similar. diff --git a/docfx/llvm/index.md b/docfx/llvm/index.md index 692c14e0b..7454dd76e 100644 --- a/docfx/llvm/index.md +++ b/docfx/llvm/index.md @@ -1,80 +1,83 @@ # Ubiquity.NET.Llvm -Ubiquity.NET.Llvm is a managed wrapper around an extended LLVM-C API including an Object Oriented model that closely matches -the underlying LLVM internal object model. This allows for building code generation, JIT and other utilities leveraging LLVM -from .NET applications. 
+Ubiquity.NET.Llvm is a managed wrapper around an extended LLVM-C API including an Object +Oriented model that closely matches the underlying LLVM internal object model. This allows +for building code generation, JIT and other utilities leveraging LLVM from .NET applications. ## Guiding principles - 1. Mirror the underlying LLVM model as much as possible while - providing a well behaved .NET projection including: + 1. Mirror the underlying LLVM model as much as possible while providing a well behaved + .NET projection including: 1. Class names and hierarchies 2. Object identity and reference equality - 3. [Fluent](https://en.wikipedia.org/wiki/Fluent_interface) APIs when plausible and appropriate - 2. Hide low-level interop details and the raw LLVM-C API. - The native model for LLVM is a C++ class hierarchy and not the LLVM-C API used for most - language/runtime bindings. Ubiquity.NET.Llvm is designed to provide an OO model that faithfully reflects the - underlying LLVM model while fitting naturally into .NET programming patterns. + 3. [Fluent](https://en.wikipedia.org/wiki/Fluent_interface) APIs when plausible and + appropriate. + 2. Hide low-level interop details and the raw LLVM-C API. + - The native model for LLVM is a C++ class hierarchy and not the LLVM-C API used for + most language/runtime bindings. `Ubiquity.NET.Llvm` is designed to provide an OO + model that faithfully reflects the underlying LLVM model while fitting naturally + into .NET programming patterns. 3. Leverage existing LLVM-C APIs underneath whenever possible 1. Extend only when needed with custom wrappers - 4. FxCop/Code Analysis Clean + 4. StyleCop/Code Analysis Clean ## Features * LLVM Cross target code generation from .NET code -* JIT engine support for creating dynamic domain specific language - runtimes with JIT support. +* JIT engine support for creating dynamic domain specific language runtimes with JIT support. 
* Ahead of time compilation with support for Link time optimization and debug information * Object model that reflects the underlying LLVM classes >[!Important] -> It is important to point out that the Ubiquity.NET.Llvm documentation is not a substitute -> for the official LLVM documentation itself. That is, the content here is focused on -> using Ubiquity.NET.Llvm and how it maps to the underlying LLVM. The LLVM documentation is, -> generally speaking, required reading to understand Ubiquity.NET.Llvm. The topics here often -> contain links to the official LLVM documentation to help in further understanding the -> functionality of the library. +> It is important to point out that the `Ubiquity.NET.Llvm` documentation is not a +> substitute for the official LLVM documentation itself. That is, the content here is +> focused on using `Ubiquity.NET.Llvm` and how it maps to the underlying LLVM. The LLVM +> documentation is, generally speaking, required reading to understand Ubiquity.NET.Llvm. +> The topics here often contain links to the official LLVM documentation to help in +> further understanding the functionality of the library. ## Breaking changes from prior versions In Version 20.1.0 a number of issues were resolved using newer .NET as well as in the LLVM design itself that allows for a fundamentally new implementation. While there isn't a LOT of -code that consumers have to change (See the samples and compare against older versions) there -are important factors to consider in the new library: +code that consumers have to change (See the samples and compare against older versions) +there are important factors to consider in the new library: 1) Ownership - - The previous variants of the library did NOT generally consider ownership carefully. It - routinely provided types that under some circumstances require disposal, and others did - not (Alias). This caused problems for the interning of projected types as the behavior - of the first instance interned was used. 
(Usually leading to leaks or strange
crashes at obscure unrelated times that made testing extremely difficult [Worst case
scenario, it works fine in all in-house testing but breaks in the field!).
+    - The previous variants of the library did NOT generally consider ownership carefully.
+      It routinely provided types that under some circumstances require disposal, and others
+      did not (Alias). This caused problems for the interning of projected types as the
+      behavior of the first instance interned was used. (Usually leading to leaks or strange
+      crashes at obscure unrelated times that made testing extremely difficult. [Worst-case
+      scenario: it works fine in all in-house testing but breaks in the field!])
3) No Interning of projected types
    - Projected types are no longer interned, this dramatically increases performance and
      reduces the complexity of maintenance of this library. Generally it should have little
      impact as anything that produces an alias where the type might in other cases require
-      the owner to dispose it should now produce an interface that is not disposable. Anything
-      the caller owns IS an IDisposable.
-    - Move semantics are handled internally where the provided instance is invalidated but
-      the Dispose remains a safe NOP. This helps prevent leaks or confusion when transfer is
-      unable to complete due to an exception. The caller still owns the resource. Either way,
-      Dispose() is called to clean it up, which is either a safe NOP, or an actual release of
-      the native resource.
+      the owner to dispose it should now produce an interface that is not disposable.
+      Anything the caller owns IS an `IDisposable`.
+    - Move semantics are handled internally where the provided instance is invalidated
+      but the Dispose remains a safe NOP. This helps prevent leaks or confusion when
+      transfer is unable to complete due to an exception. The caller still owns the
+      resource. 
Either way, `Dispose()` is called to clean it up, which is either a + safe NOP, or an actual release of the native resource. 2) Assumption of Reference Equality 1) In the new library there is NO guarantee of reference equality for reference types. - Such types MAY be value equal if they refer to the same underlying native instance. ### Ownership and IDisposable -When dealing with native interop the concept of ownership is of critical importance. The underlying -resources are NOT controlled by a Garbage collector, and therefore require care to avoid access violations -and other app crash scenarios. This library aims to make that much easier by using IDisposable for these -scenarios. It is ***HIGHLY*** recommended to use the [IDisposableAnalyzers](https://www.nuget.org/packages/IDisposableAnalyzers/) -in ANY project that consumes this library. (It was/is used internally to find and fix issues across the -library that were tedious to identify otherwise). +When dealing with native interop the concept of ownership is of critical importance. The +underlying resources are NOT controlled by a Garbage collector, and therefore require care +to avoid access violations and other app crash scenarios. This library aims to make that +much easier by using IDisposable for these scenarios. It is ***HIGHLY*** recommended to use +the [IDisposableAnalyzers](https://www.nuget.org/packages/IDisposableAnalyzers/) in ANY +project that consumes this library. (It was/is used internally to find and fix issues across +the library that were tedious to identify otherwise). #### Ownership transfer (move semantics) -Sometimes an API will transfer ownership to a containing type or native code in general. In C++ terminology -that is known as 'move semantics' and typically handled with `std::move()` but .NET and C# have no such -concept. 
To make life easier and keep usage of disposable types consistent, when a method follows the move
-semantics it should be documented as such and, more importantly, it will set the value provided as invalid
-BUT calling `Dispose()` is still a NOP. This keeps usage consistent even if ownership is transferred.
-Attempting to use an instance after it is transferred will result in an `ObjectDisposedException`.
+Sometimes an API will transfer ownership to a containing type or native code in general. In
+C++ terminology that is known as 'move semantics' and typically handled with `std::move()`,
+but .NET and C# have no such concept. To make life easier and keep usage of disposable types
+consistent, when a method follows move semantics it should be documented as such and,
+more importantly, it will set the value provided as invalid BUT calling `Dispose()` is still
+a NOP. This keeps usage consistent even if ownership is transferred. Attempting to use an
+instance after it is transferred will result in an `ObjectDisposedException`.

Example from [OrcV2VeryLazy](xref:orcjitv2-very-lazy) sample application
``` C#
@@ -88,25 +91,29 @@ jit.MainLib.Define(fooMu);
```

### Unowned references (alias)
-For an unowned reference to an underlying resource an interface is defined such as [IModule](xref:Ubiquity.NET.Llvm.IModule).
-When a property returns an interface only it is not Disposable and ownership remains with the source.
-Care is required on the part of a consumer to not store that instance anywhere and treat it as if it was a
-`ref struct` (That is, only held on the stack). While the GC is free to clean up such an instance at any time
-this prevents attempts to use the interface after the containing object is destroyed.
+For an unowned reference to an underlying resource an interface is defined such as
+[IModule](xref:Ubiquity.NET.Llvm.IModule). When a property returns an interface only, it is
+not Disposable and ownership remains with the source. Care is required on the part of a
+consumer to not store that instance anywhere and to treat it as if it were a `ref struct`
+(that is, only held on the stack). While the GC is free to clean up such an instance at any
+time, this prevents attempts to use the interface after the containing object is destroyed.

### Equality
-In prior releases of this library a complex scheme of interning projection wrappers was used to support
-reference equality. When you had an instance of class 'foo' you could just compare it to any other using reference
-equality. For any two that referred to the same native instance they'd be the same object. While this had convenience
-for the user it had a multitude of hidden flaws. The biggest is the concept of ownership. [See discussion above]. If
-objects are interned then you would end up with whatever instance was first created, ignoring the ownership completely.
-If the first instance was an unowned alias, then it would leak as nothing owns it... If it was NOT an alias, then,
-when retrieved from interning when an alias is needed to be the result, you could end up with premature disposal...
-It was all confusing on whether you are supposed to call Dispose() or not. (Exact opposite of recommended best practice
-for IDisposable).
+In prior releases of this library a complex scheme of interning projection wrappers was used
+to support reference equality. When you had an instance of class 'foo' you could just
+compare it to any other using reference equality. For any two that referred to the same
+native instance they'd be the same object. While this was convenient for the user, it had a
+multitude of hidden flaws. The biggest is the concept of ownership. [See discussion above].
+If objects are interned, then you would end up with whatever instance was first created,
+ignoring the ownership completely. If the first instance was an unowned alias, then it would
+leak as nothing owns it... If it was NOT an alias, but interning returned it where an alias
+was needed as the result, you could end up with premature disposal...
+It was all confusing as to whether you were supposed to call Dispose() or not. (The exact
+opposite of recommended best practice for IDisposable.)

-Thus, this version of the library eliminates the confusion and complexity by use of objects that are disposable,
-interfaces and a usage pattern that ensures Dispose() is idempotent and a NOP when already disposed. In the current
-release no interning is performed, and instead wrapping types implement [`IEquatable`](xref:System.IEquatable`1)
-to allow value equality to compare the underlying native handle and resolve them as the same underlying instance or
-not.
+Thus, this version of the library eliminates the confusion and complexity by use of
+disposable objects, interfaces, and a usage pattern that ensures `Dispose()` is idempotent
+and a NOP when already disposed. In the current release no interning is performed, and
+instead wrapping types implement [`IEquatable`](xref:System.IEquatable`1) to allow value
+equality to compare the underlying native handle and resolve them as the same underlying
+instance or not.
diff --git a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.DebugInfo.md b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.DebugInfo.md
index bc751f6dc..d30bb3c26 100644
--- a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.DebugInfo.md
+++ b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.DebugInfo.md
@@ -5,14 +5,15 @@ remarks: *content
This namespace contains all the support for the LLVM representation of debugging information.

## Differences from previous release
-A critical difference is that a [Module](xref:Ubiquity.NET.Llvm.Module) does NOT own a [DIBuilder](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder)
-That was a customized extension that was more accidental as a result of the former releases using object
-interning.
However, once that was removed it was found that Module instances were attempting to hold fields
-or properties of things that were NOT part of the underlying native object. So, the pattern of use was
-changed to better match how the underlying LLVM API worked. In particular a DIBuilder is a `ref struct`
-that is ONLY allowed on the stack as it is intended for short term use. It can (and does) own a
-[DICompileUnit](xref:Ubiquity.NET.Llvm.DebugInfo.DICompileUnit) and it can reference the module it was
-created from. Since it is a `ref struct` it is not allowed to store instances of a DIBuilder in the heap in
-any way. ([Kaleidoscope Chapter 9](xref:Kaleidoscope-ch9) provides a sample of use in a visitor pattern where the
-instance is provided as an `in` parameter to `ref readonly` functions. This ability was added to the visitor
-pattern specifically for this case.)
+A critical difference is that a [Module](xref:Ubiquity.NET.Llvm.Module) does NOT own a
+[DIBuilder](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder). That idea in previous releases was
+a customized extension, more accidental than intentional, resulting from the use of object
+interning. However, once that was removed it was found that Module instances were
+attempting to hold fields or properties of things that were NOT part of the underlying
+native object. So, the pattern of use was changed to better match how the underlying LLVM
+API worked. In particular, a `DIBuilder` is intended for short-term use. It can (and does)
+own a [DICompileUnit](xref:Ubiquity.NET.Llvm.DebugInfo.DICompileUnit) and it can reference
+the module it was created from. ([Kaleidoscope Chapter 9](xref:Kaleidoscope-ch9) provides a
+sample of use in a visitor pattern where the instance is provided as a parameter to
+functions. This ability of providing a "context" was added to the visitor pattern
+specifically for this case.)
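The short-term usage pattern described above might look like the following sketch. This is illustrative only — the member names and signatures (`CreateBitcodeModule`, `CreateCompileUnit`, `Finish`, etc.) are assumptions based on the text above and typical LLVM debug-info builders, not a verbatim copy of the library's API surface:

```csharp
// Illustrative sketch only: names and signatures are assumptions, not the exact
// Ubiquity.NET.Llvm API. The point is the lifetime pattern: the DIBuilder is
// created for a short burst of work, owns the DICompileUnit, and references the
// module it was created from.
using var context = new Context();
using var module = context.CreateBitcodeModule( "sample" );

var diBuilder = new DIBuilder( module );
DICompileUnit cu = diBuilder.CreateCompileUnit( SourceLanguage.C, "sample.c", "producer" );

// ... pass diBuilder down as a parameter to the code emitting debug info
// (as the Kaleidoscope Chapter 9 visitor does via its "context") ...

diBuilder.Finish(); // complete the debug info before the builder goes away
```

The key takeaway is that nothing stores the builder in a field; it lives only for the duration of the debug-info emission.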
diff --git a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Metadata.md b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Metadata.md index 31abb0a62..0532eaa2f 100644 --- a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Metadata.md +++ b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Metadata.md @@ -2,15 +2,15 @@ uid: Ubiquity.NET.Llvm.Metadata remarks: *content --- -This namespace hosts all of the wrappers for the LLVM Metadata. The namespace contains the root -of the metadata type system [IrMetadata](xref:Ubiquity.NET.Llvm.Metadata.IrMetadata). +This namespace hosts all of the wrappers for the LLVM Metadata. The namespace contains the +root of the metadata type system [IrMetadata](xref:Ubiquity.NET.Llvm.Metadata.IrMetadata). >[NOTE] -> The name [IrMetadata](xref:Ubiquity.NET.Llvm.Metadata.IrMetadata) is used to help deal with a -> number of naming issues and conflicts with existing types or namespaces. +> The name [IrMetadata](xref:Ubiquity.NET.Llvm.Metadata.IrMetadata) is used to help deal +> with a number of naming issues and conflicts with existing types or namespaces. > ([CA1724](https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1724) > is not your friend on this one. The name `Metadata` conflicts with namespace > [System.Runtime.Remoting.Metadata](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.remoting.metadata?view=netframework-4.8.1), -> which apparently is ONLY a legacy desktop framework namespace, but the tooling still complains -> about a conflict between the namespace name and a type name). To resolve this the type is named -> as closely as possible without causing the conflict. +> which apparently is ONLY a legacy desktop framework namespace, but the tooling still +> complains about a conflict between the namespace name and a type name). To resolve this +> the type is named as closely as possible without causing the conflict. 
diff --git a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.ObjectFile.md b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.ObjectFile.md index c16fb052a..ed1333029 100644 --- a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.ObjectFile.md +++ b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.ObjectFile.md @@ -4,5 +4,6 @@ remarks: *content --- This namespace provides wrappers for the LLVM ObjectFile manipulation support. While it is mostly used for object files, it also applies to libraries as well as LLVM IR. In LLVM terms -these are all categorized as a [TargetBinary](xref:Ubiquity.NET.Llvm.ObjectFile.TargetBinary). +these are all categorized as a +[TargetBinary](xref:Ubiquity.NET.Llvm.ObjectFile.TargetBinary). diff --git a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Transforms.Legacy.md b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Transforms.Legacy.md index cd08876b7..f020be7e6 100644 --- a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Transforms.Legacy.md +++ b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.Transforms.Legacy.md @@ -2,8 +2,9 @@ uid: Ubiquity.NET.Llvm.Transforms.Legacy remarks: *content --- -This namespace contains the wrappers for supporting the "legacy" pass management. This is NOT -normally used by applications as the new pass manager support is built into [Module](xref:Ubiquity.NET.Llvm.Module) -and [Function](xref:Ubiquity.NET.Llvm.Values.Function) via one of the overloads of -`TryRunPasses(...)`. Generally the legacy pass manager support is only used for final target -code generation and not exposed for LLVM-C consumption and therefore not of any real use. +This namespace contains the wrappers for supporting the "legacy" pass management. This is +NOT normally used by applications as the new pass manager support is built into +[Module](xref:Ubiquity.NET.Llvm.Module) and [Function](xref:Ubiquity.NET.Llvm.Values.Function) +via one of the overloads of `TryRunPasses(...)`. 
Generally the legacy pass manager support
+is only used for final target code generation and not exposed for LLVM-C consumption and
+therefore not of any real use.
diff --git a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.md b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.md
index a4cc5b295..7a60def3a 100644
--- a/docfx/llvm/namespaces/Ubiquity.NET.Llvm.md
+++ b/docfx/llvm/namespaces/Ubiquity.NET.Llvm.md
@@ -2,6 +2,6 @@ uid: Ubiquity.NET.Llvm
remarks: *content
---
-This is the root namespace of all of the library support, it contains the entirety of the wrapped
-projection of LLVM for .NET consumers. There are several core items in this namespace as well as
-distinct child namespaces for various purposes.
+This is the root namespace of all of the library support; it contains the entirety of the
+wrapped projection of LLVM for .NET consumers. There are several core items in this
+namespace as well as distinct child namespaces for various purposes.
diff --git a/docfx/templates/Ubiquity/readme.md b/docfx/templates/Ubiquity/readme.md
index 3846976be..46f326c75 100644
--- a/docfx/templates/Ubiquity/readme.md
+++ b/docfx/templates/Ubiquity/readme.md
@@ -1,25 +1,25 @@
# Ubiquity DOCFX template
-This template adds support to the syntax highlighting provided by [HightlightJS](https://highlightjs.readthedocs.io/en/latest/supported-languages.html).
-The languages added are for ANTLR (Which seems bizarre it isn't already covered
-given the esoteric nature of some of the supported languages...) and of course the
-`Kaleidoscope` language, which was made up entirely for the purposes of LLVM
-tutorials. (No surprise that one isn't supported) [Though it oddly IS supported
-directly in the [Local MarkDig based editor](https://github.com/MadsKristensen/MarkdownEditor2022)
-used to edit these files... [Go Figure! :shrug: ]
+This template adds support to the syntax highlighting provided by
+[HighlightJS](https://highlightjs.readthedocs.io/en/latest/supported-languages.html).
+The languages added are for ANTLR (which seems bizarre it isn't already covered given the
+esoteric nature of some of the supported languages...) and of course the `Kaleidoscope`
+language, which was made up entirely for the purposes of LLVM tutorials. (No surprise that
+one isn't supported.) Though it oddly IS supported directly in the
+[Local MarkDig based editor](https://github.com/MadsKristensen/MarkdownEditor2022) used to
+edit these files, but NOT the native VS markdown editor... Go figure! :shrug:

## layout/_master.tmpl
This is mostly borrowed from the official DocFX `modern` template. However, the build
-version number was added to the footer. Unfortunately no simpler means was found to do
-that.
+version number was added to the footer. Unfortunately, no simpler means was found to do that.

## Theming
-This template also updates the theme for SVG image backgrounds to improve
-readability of the parse diagrams. It also updates the HighlightJS classifiers
-theming to better show various parts of the ANTLR language.
+This template also updates the theme for SVG image backgrounds to improve readability of the
+parse diagrams. It also updates the HighlightJS classifiers theming to better show various
+parts of the ANTLR language.

-The default theming is limited, especially for the ANTLR language, as it doesn't
-support distinction between function definitions and invocations. (Both have
-essentially the default text formatting so it's like none is applied). HighlightJS
-has rather limited "scopes" and mapping to them for a language like ANTLR4
-Lex/Grammar is a challenge. It is NOT an imperative language (but it does generate
-to that) but the HighlightJS is mostly focused on those.
+The default theming is limited, especially for the ANTLR language, as it doesn't support
+distinction between function definitions and invocations. (Both have essentially the default
+text formatting so it's like none is applied.)
HighlightJS has rather limited "scopes" and
+mapping to them for a language like ANTLR4 Lex/Grammar is a challenge. It is NOT an
+imperative language (though it does generate to one), but HighlightJS is mostly focused on
+those.
diff --git a/src/Interop/InteropTests/ABI/libllvm-c/ReadMe.md b/src/Interop/InteropTests/ABI/libllvm-c/ReadMe.md
index 2d2f3d2e0..61f3236ea 100644
--- a/src/Interop/InteropTests/ABI/libllvm-c/ReadMe.md
+++ b/src/Interop/InteropTests/ABI/libllvm-c/ReadMe.md
@@ -1,14 +1,14 @@
# LibLLVM Interop Tests
-These tests focus on validating the extended C API unique to LibLLVM. They do NOT
-test the official LLVM-C API nor it's interop bindings. (That would be excessive and
-likely redundant as LLVM itself tests the APIs better, and the callers are tested in
-the wrapper libraries in this repo.) The LibLLVM methods are unique to this library
-and need testing at this low level to ensure that the upper layers can rely on them
-to behave as expected. There is no native code testing of these APIs (at least not at
-this point) so this is the only "line of defense" for them.
+These tests focus on validating the extended C API unique to LibLLVM. They do NOT test the
+official LLVM-C API nor its interop bindings. (That would be excessive and likely redundant
+as LLVM itself tests the APIs better, and the callers are tested in the wrapper libraries in
+this repo.) The LibLLVM methods are unique to this library and need testing at this low
+level to ensure that the upper layers can rely on them to behave as expected. There is no
+native code testing of these APIs (at least not at this point) so this is the only
+"line of defense" for them.

>[!IMPORTANT]
-> As of this writing most are completely empty tests that don't do anything. These
-> will come online in subsequent builds. In many cases the value of tests at this
-> low level is debatable. Is it worth 300 lines of complex setup to test one function
-> call?
Especially when the upper layers do exercise the functionality?
+> As of this writing most are completely empty tests that don't do anything. These will come
+> online in subsequent builds. In many cases the value of tests at this low level is
+> debatable. Is it worth 300 lines of complex setup to test one function call? Especially
+> when the upper layers do exercise the functionality?
diff --git a/src/Interop/LlvmBindingsGenerator/ReadMe.md b/src/Interop/LlvmBindingsGenerator/ReadMe.md
index e0eda9138..d08a9b1a3 100644
--- a/src/Interop/LlvmBindingsGenerator/ReadMe.md
+++ b/src/Interop/LlvmBindingsGenerator/ReadMe.md
@@ -1,35 +1,37 @@
# LibLLVMBindingsGenerator
As the name implies this is a code generator for use in building the LLVM libraries.
-Historically it did more than it does now but that use proved to be difficult to maintain (It
-generated all the P/Invoke code from a custom representation in YAML. In the end that was just
-extra work as not much of generating interop for the LLVM-C was automatable. (A LOT required
-reading the docs, and often the LLVM source code to determine if it was an ownership transfer
-(move) and, if so, how to dispose of the resource once done with it. All of that is expressible
-IN C# already as part of P/Invoke generation so there wasn't much point in continuing with
-that. (Though there is something to be said for use as a starting point...)
+Historically it did more than it does now, but that use proved to be difficult to maintain:
+it generated all the P/Invoke code from a custom representation in YAML. In the end that
+was just extra work, as not much of generating interop for the LLVM-C API was automatable.
+A LOT required reading the docs, and often the LLVM source code, to determine if it was an
+ownership transfer (move) and, if so, how to dispose of the resource once done with it. All
+of that is expressible IN C# already as part of P/Invoke generation, so there wasn't much
+point in continuing with that.
(Though there is something to be said for use as a starting
+point...)

## Split implementation
-This app was subsequently split into to implementations that now exist in distinct repositories
-1) Generates the EXPORTS.g.def for the Windows DLL generation from the LLVM + LIBLLVM headers
+This app was subsequently split into two implementations that now exist in distinct
+repositories:
+1) Generate the EXPORTS.g.def for the Windows DLL generation from the LLVM + LIBLLVM headers
1) This version lives in the [LibLLVM repository](https://github.com/UbiquityDotNET/Llvm.Libs)
-2) Generates the "safe handle" C# code from the LLVM + LIBLLVM headers
+2) Generate the "safe handle" C# code from the LLVM + LIBLLVM headers
1) This version is the one in this repository this document discusses

### Common implementation
-While there is a common implementation between the implementations (They started as simply the
-same code and commenting out the functionality not desired) they have and will diverge over
-time, though anything in the core parsing of headers and general code generation from templates
-is likely to remain. (It may be viable to support a common library for this scenario but this
-is ONLY necessary when the native side of the interop library changes)
+While there is a common implementation between the implementations (they started as simply
+the same code with the undesired functionality commented out), they have and will diverge
+over time, though anything in the core parsing of headers and general code generation from
+templates is likely to remain. (It may be viable to support a common library for this
+scenario but this is ONLY necessary when the native side of the interop library changes.)

## Usage
> [!IMPORTANT]
> This project has a dependency on the `CppSharp` library which ONLY supports the `X64`
> architecture but the generated wrappers are NOT dependent on a particular architecture.
> This limits the environments that can be used to generate the sources.
To simplify that, -> the generated sources are placed into source control but generated off-line by a developer. -> A developer machine doing this ***MUST*** be X64 or this tool can't run. This is a limitation -> defined by a dependent library. +> the generated sources are placed into source control but generated off-line by a +> developer. A developer machine doing this ***MUST*** be X64 or this tool can't run. This +> is a limitation defined by a dependent library. `LlvmBindingsGenerator -l -e -h [-Diagnostics ]` @@ -52,15 +54,15 @@ This library will generate the handle file directly. Therefore ROSLYN source gen not used. >[!IMPORTANT] ->The generated files are not usable on their own. They depend on the additional types found in -> Ubiquity.NET.Llvm.Interop assembly. The design of this app assumes that is where the results -> are used. ANY USE outside of that context is ***STRONGLY*** discouraged and explicitly NOT -> supported. +>The generated files are not usable on their own. They depend on the additional types found +> in Ubiquity.NET.Llvm.Interop assembly. The design of this app assumes that is where the +> results are used. ANY USE outside of that context is ***STRONGLY*** discouraged and +> explicitly NOT supported. #### Roslyn Source Generators - 'There be dragons there!' Roslyn allows source generators directly in the compiler making for a feature similar to C++ -template code generation AT compile time. However, there's a couple of BIG issues with that for -this particular code base. +template code generation AT compile time. However, there's a couple of BIG issues with that +for this particular code base. 1) Non-deterministic ordering, or more specifically for this app, no way to declare the dependency on ***outputs*** of one generator as the ***input*** for another. 2) Dependencies for project references @@ -70,62 +72,62 @@ this particular code base. with them as project references... 
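To make the generator-dependency issue discussed here concrete, the following is a hedged sketch of the pairing involved. The attributes (`NativeMarshalling`, `CustomMarshaller`, `LibraryImport`) are the standard .NET interop source-generation APIs; the handle type, marshaller, and native export names are made up for illustration and are NOT the actual generated code:

```csharp
// Sketch: why the built-in LibraryImport generator must see the generated handle
// type. LLVMTestRef and its marshaller are hypothetical stand-ins for the
// generated handle sources; the attributes are real .NET interop APIs.
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.Marshalling;

// The generated handle type. The NativeMarshalling attribute is the piece the
// built-in interop generator has to see at compile time.
[NativeMarshalling( typeof( LLVMTestRefMarshaller ) )]
public readonly record struct LLVMTestRef( nint Handle );

[CustomMarshaller( typeof( LLVMTestRef ), MarshalMode.Default, typeof( LLVMTestRefMarshaller ) )]
public static class LLVMTestRefMarshaller
{
    public static nint ConvertToUnmanaged( LLVMTestRef managed ) => managed.Handle;
    public static LLVMTestRef ConvertToManaged( nint unmanaged ) => new( unmanaged );
}

public static partial class NativeMethods
{
    // Without the NativeMarshalling attribute above (or a MarshalUsing on every
    // use site), the generator cannot marshal LLVMTestRef and bails out.
    [LibraryImport( "LibLLVM", StringMarshalling = StringMarshalling.Utf8 )]
    public static partial LLVMTestRef LibLLVMCreateTestRef( string name );
}
```

Nothing here runs without the native library, of course; the sketch only shows the compile-time relationship between the generated handle code and the built-in interop generator.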
Specifically, in this code, the built-in generator that otherwise knows nothing about the
-handle generation, needs to see and use the **OUTPUT** of the handle source generation. (It's
-not just a run ordering problem as ALL generators see the same input text!)
+handle generation, needs to see and use the **OUTPUT** of the handle source generation.
+(It's not just a run ordering problem as ALL generators see the same input text!)
[See: [Discussion on ordering and what a generator "sees"](https://github.com/dotnet/roslyn/discussions/57912#discussioncomment-1682779)
[See: [Roslyn issue #57239](https://github.com/dotnet/roslyn/issues/57239)]]

The interop code uses the LibraryImportAttribute for AOT support of ALL of the interop APIs
-declared. Thus, at compile time the interop source generator **MUST** be able to see the used,
-specifically, it must have access to the `NativeMarshalling` attribute for all the handle
-types. Otherwise, it doesn't know how to marshal the type and bails out. It is possible to
-"overcome" this with an explicit `MarshalUsingAttribute` on every parameter or return type
-but that's tedious. Tedious, typing is what source generators and templates are supposed to
-remove. Thus, this library will host the source generator (like a unit test would) and
-generates the handle sources **BEFORE** they are compiled in the project. Thus, the generated
-source files will contain the marshaling attributes so that the interop source generator knows
-how to generate the correct code.
+declared. Thus, at compile time the interop source generator **MUST** be able to see the
+code used. Specifically, it must have access to the `NativeMarshalling` attribute for all
+the handle types. Otherwise, it doesn't know how to marshal the type and bails out. It is
+possible to "overcome" this with an explicit `MarshalUsingAttribute` on every parameter or
+return type, but that's tedious. Tedious typing is what source generators and templates are
+supposed to remove.
Thus, this library will host the source generator (like a unit test
+would) and generate the handle sources **BEFORE** they are compiled in the project. That way,
+the generated source files will contain the marshaling attributes so that the interop source
+generator knows how to generate the correct code.

>To be crystal clear - The problem is **NOT** one of generator run ordering, but on the
> ***dependency of outputs***. By design, Roslyn source generators can only see the original
> source input, never the output of another generator. Most don't, and never will, care. The
-> handle generation, in this case, does. Solving that generically in a performant fashion is a
-> ***HARD*** problem indeed... Not guaranteed impossible, but so far no-one has come up with a
-> good answer to the problem. Even C++ has this issue with templates+concepts+CRTP; and that
-> language has had source generating templates as a direct part of the language for several
-> decades now.
+> handle generation, in this case, does. Solving that generically in a performant fashion is
+> a ***HARD*** problem indeed... Not guaranteed impossible, but so far no-one has come up
+> with a good answer to the problem. Even C++ has this issue with templates+concepts+CRTP;
+> and that language has had source generating templates as a direct part of the language for
+> several decades now.

[See also: [Using the CRTP and C++20 Concepts to Enforce Contracts for Static Polymorphism](https://medium.com/@rogerbooth/using-the-crtp-and-c-20-concepts-to-enforce-contracts-for-static-polymorphism-a27d93111a75) ]
[See also: [Rules for Roslyn source generators](https://github.com/dotnet/roslyn/blob/main/docs/features/incremental-generators.cookbook.md)]

#### Alternate solutions considered and rejected
1) Running the source generator directly in the project
- 1) This is where the problem on non-deterministic ordering and visibility of the generated
- code was discovered. Obviously (now anyway!) this won't work.
+ 1) This is where the problem of non-deterministic ordering and visibility of the
+ generated code was discovered. Obviously (now anyway!) this won't work.
2) Use a source generator in a separate assembly
- 1) This solves the generator output dependency problem but introduces a new problem of how
- the build infrastructure for these types manage NuGet versions
+ 1) This solves the generator output dependency problem but introduces a new problem of
+ how the build infrastructure for these types manages NuGet versions.
2) Additionally, this adds complexity of a second native dependency on the library
- exporting the native functionality. (Should there be two copies? How does code in each
- refer to the one instance?...)
+ exporting the native functionality. (Should there be two copies? How does code in
+ each refer to the one instance?...)
3) Call the source generator from within this app to control the ordering
- 1) This at least could get around the ordering/dependency problem as it would guarantee the
- custom generator runs before the built-in one.
- 2) However, this runs afoul of the binary dependency problem... Not 100% insurmountable but
- the number of caveats on the Roslyn Source Generator side of things grows to a
+ 1) This at least could get around the ordering/dependency problem as it would guarantee
+ the custom generator runs before the built-in one.
+ 2) However, this runs afoul of the binary dependency problem... Not 100% insurmountable
+ but the number of caveats on the Roslyn Source Generator side of things grows to a
significant factor.

#### The final choice
Keep using this app as a generator for the handle types. This used to work, and still does.
-However, this doesn't solve the problem of expressing managed code things in a custom language
-(YAML) but it's at least a rather simplistic expression for the handles.
And arguably less
-complicated then all the subtleties of using a Roslyn Source generator for this sort of one off
-specialized code generation.
+However, this doesn't solve the problem of expressing managed code things in a custom
+language (YAML), but it's at least a rather simplistic expression for the handles. And
+arguably less complicated than all the subtleties of using a Roslyn Source generator for
+this sort of one-off specialized code generation.

-Solving the problem of expressing P/Invokes is simply to just manage that directly. It seemed
-like a good idea to automate the tedium of generating those. Sadly, there are so many
+Solving the problem of expressing P/Invokes is simply to manage that directly. It
+seemed like a good idea to automate the tedium of generating those. Sadly, there are so many
subtleties of "special cases" that involve reading the docs (or source code) before you can
-correctly implement it. In the end, there's no value in expressing all that subtlety in anything
-other than C#.
+correctly implement it. In the end, there's no value in expressing all that subtlety in
+anything other than C#.

This also keeps the door open to use the native AST from within the source generator or an
analyzer to perform additional checks and ensure the hand written code matches the actual
diff --git a/src/Interop/LlvmBindingsGenerator/Templates/readme.md b/src/Interop/LlvmBindingsGenerator/Templates/readme.md
index 6139a7694..a750cdb56 100644
--- a/src/Interop/LlvmBindingsGenerator/Templates/readme.md
+++ b/src/Interop/LlvmBindingsGenerator/Templates/readme.md
@@ -1,14 +1,15 @@
# Ubiquity.NET.Llvm.Interop Generation
The code generation for the Ubiquity.NET.Llvm.Interop namespace leverages [CppSharp] for
-parsing and processing the LLVM-C (and custom extension) headers. The actual code generation is
-done using a custom system of T4 templates.
While CppSharp has a code generation system it is -focused primarily on projecting the full C++ type system (including implementing derived types -in C#!). However, the generation is pretty inflexible when it comes to the final form of the -output in C# and how it handles marshaling. Ubiquity.NET.Llvm uses custom handle types for all -references in the C API along with custom string marshaling to handle the various kinds of -string disposal used in the C API. Unfortunately, CppSharp wasn't flexible enough to handle -that with it's built-in generation. Thus, the Ubiquity.NET.Llvm.Interop bindings are generated -using customized support based on a few T4 templates. +parsing and processing the LLVM-C (and custom extension) headers. The actual code generation +is done using a custom system of T4 templates. While CppSharp has a code generation system +it is focused primarily on projecting the full C++ type system (including implementing +derived types in C#!). However, the generation is pretty inflexible when it comes to the +final form of the output in C# and how it handles marshaling. Ubiquity.NET.Llvm uses custom +handle types for all references in the C API along with custom string marshaling to handle +the various kinds of string disposal used in the C API. Unfortunately, CppSharp wasn't +flexible enough to handle that with it's built-in generation. Thus, the +`Ubiquity.NET.Llvm.Interop` bindings are generated using customized support based on a few +T4 templates. ## T4 Templates ### ContextHandleTemplate.tt @@ -21,24 +22,24 @@ Provides a template for all Global handles (see below for details of handles) Handles for LLVM are just opaque pointers. They generally come in one of two forms. - 1. Context owned - Where there is always a well known owner that ultimately is responsible for - disposing/releasing the resource. - 1. A specialized and common use of this is that of an unowned alias to a global resource. 
- This occurs when a child of a global resource contains a reference to the parent. In
- such a case the handle should be considered like an alias and not disposed.
- 2. Global resources
- Where there is no parent child ownership relationship and callers must manually release
- the resource.
+ 1. Context owned
+ 1. There is always a well-known owner (the "Context") that ultimately is responsible
+ for disposing/releasing the resource.
+ 1. A specialized and common use of this is that of an unowned alias to a global
+ resource. This occurs when a child of a global resource contains a reference to the
+ parent. In such a case the handle should be considered an alias and not disposed.
+ 2. Global resources
+ 1. There is no parent-child ownership relationship and callers must manually release
+ the resource.

The Handle implementations here follow consistent patterns for implementing each form of
handle.

### Contextual handles
-These handles are never manually released or disposed, though releasing their containers will
-make them invalid. The general pattern for implementing such handles is taken care of by the
-T4 template `ContextHandleTemplate.tt`
+These handles are never manually released or disposed, though releasing their containers
+will make them invalid. The general pattern for implementing such handles is taken care of
+by the T4 template `ContextHandleTemplate.tt`

### Global Handles
Global handles require the caller to explicitly release the resources.
@@ -48,14 +49,15 @@ LLVM requiring explicit release are handled consistently by the T4 template
`GlobalHandleTemplate.tt`

#### Global Alias handles
-Global alias handles are a specialized form of global handles where they do not participate in
-ownership control/release. These are commonly used when a child of a global container exposes
-a property that references the parent container.
In such cases the reference retrieved from
-the child shouldn't be used to destroy the child or the parent when no longer used.
+Global alias handles are a specialized form of global handles where they do not participate
+in ownership control/release. These are commonly used when a child of a global container
+exposes a property that references the parent container. In such cases the reference
+retrieved from the child shouldn't be used to destroy the child or the parent when no longer
+used.

In Ubiquity.NET.Llvm this is represented as a distinct context handle type that has implicit
-casting to allow for simpler usage scenarios. (That, is an alias can cast to an unowned global
-handle when needed to allow passing it in to native APIs without taking ownership) Most APIs
-will have the alias type as the signature, especially for [In] parameters. This helps to
-reinforce the intended semantics for the parameter. To make life easy there is an implicit
-cast from the global handle to an alias (which is just a value type) when needed.
+casting to allow for simpler usage scenarios. (That is, an alias can cast to an unowned
+global handle when needed to allow passing it in to native APIs without taking ownership.)
+Most APIs will have the alias type as the signature, especially for [In] parameters. This
+helps to reinforce the intended semantics for the parameter. To make life easy there is an
+implicit cast from a global handle to an alias (which is just a value type) when needed.
diff --git a/src/Interop/readme-first.md b/src/Interop/readme-first.md
index f076d1e88..15f4b8e42 100644
--- a/src/Interop/readme-first.md
+++ b/src/Interop/readme-first.md
@@ -1,70 +1,65 @@
# Interop Support
-This folder contains the low level LLVM direct interop support. It requires some
-specialized build ordering in particular the generated handle wrappers are needed to
-build the interop library, which is, currently^1^, a dependency of the LLVM wrapper
-library.
+This folder contains the low level LLVM direct interop support. It requires some specialized
+build ordering; in particular, the generated handle wrappers are needed to build the interop
+library, which is, currently^1^, a dependency of the LLVM wrapper library.

# OBSOLECENSE NOTE
-Most of the functionality of this tool for generating the interop API signatures was
-gutted and removed. It fell into the "good idea at the time" category. But in reality
-turned out to be a greater PITA than worth. The source code committed to the repo
-now includes the C# interop code so this generator is not needed for most of that.
-It **IS** still used for the following scenarios:
+Most of the functionality of this tool for generating the interop API signatures was gutted
+and removed. It fell into the "good idea at the time" category. But in reality it turned out
+to be a greater PITA than it was worth. The source code committed to the repo now includes
+the C# interop code so this generator is not needed for most of that. It **IS** still used
+for the following scenarios:

* To generate the source for the handle types.
-  - These types are very tedious and repetitive which is the perfect use of some
-    form of template
+  - These types are very tedious and repetitive, which is the perfect use of some form of
+    template
  - Global handles are mostly covered in the `GlobalHandleBase` class
-  - Context handles are all `readonly record` types that wrap a native integer so
-    have more to generate.
+  - Context handles are all `readonly record` types that wrap a native integer so have
+    more to generate. 
+    - In the future when .NET 10 is available as Long Term Support (LTS) these may
+      change to a `ref` struct so that the compiler can validate usage as never stored.
+      They are not currently using that as they also need to support the
+      `IContextHandle` interface for marshalling. `allows ref struct` is ONLY
+      available in .NET 9 which only has Short Term Support (STS).

## Roslyn Source Generators - 'There be dragons there!'
-Roslyn allows source generators directly in the compiler making for a feature similar
-to C++ template code generation AT compile time. However, there's a couple of BIG
-issue with that for this particular code base.
-1) Non-deterministic ordering, or more specifically no way to declare the dependency
-   on outputs of one generator as the input for another.
+Roslyn allows source generators directly in the compiler making for a feature similar to C++
+template code generation AT compile time. However, there are a couple of BIG issues with
+that for this particular code base.
+1) Non-deterministic ordering, or more specifically no way to declare the dependency on
+   outputs of one generator as the input for another.
2) Dependencies for project references
-    - The generators are not general purpose they are not published or produced as a
-      NUGET package. They only would be of use as a project reference. But that
-      creates a TON of problems for the binary runtime dependencies of source
-      generators, which don't flow with them as project references...
-
-Specifically, in this code, the built-in generator that otherwise knows noting about
-the handle generation, needs to see and use the **OUTPUT** of the handle source
-generation. (It's not just a run ordering problem as ALL generators see the same
-input text!)
+    - The generators are not general purpose; they are not published or produced as a NuGet
+      package. They only would be of use as a project reference. 
But that creates a TON of
+      problems for the binary runtime dependencies of source generators, which don't flow
+      with them as project references...
+
+Specifically, in this code, the built-in generator, which otherwise knows nothing about the
+handle generation, needs to see and use the **OUTPUT** of the handle source generation.
+(It's not just a run ordering problem as ALL generators see the same input text!)

[See: [Discussion on ordering and what a generator "sees"](https://github.com/dotnet/roslyn/discussions/57912#discussioncomment-1682779)
[See: [Roslyn issue #57239](https://github.com/dotnet/roslyn/issues/57239)]]

-The interop code uses the LibraryImportAttribute for AOT support of ALL of the
-interop APIs declared. Thus, at compile time the interop source generator **MUST**
-be able to see the used, specifically, it must have access to the `NativeMarshalling`
-attribute for all the handle types. Otherwise, it doesn't know how to marshal the
-type and bails out. It is possible to "overcome" this with an explicit
-`MarshalUsingAttribute` on every parameter or return type but that's tedious.
-Tedious, typing is what source generators and templates are supposed to remove. Thus,
-this library includes the generated sources as part of the repository. A developer
-must run the `Generate-HandleWrappers.ps1` script whenever the native library is
-updated to ensure the wrappers are up to date with the native headers **BEFORE** they
-are compiled in the project. Thus, the generated source files will contain the
-marshalling attributes so that the interop source generator knows how to generate
-the correct code.
-
->To be crystal clear - The problem is **NOT** one of generator run ordering, but on
-> the ***dependency of outputs***. By design, Roslyn source generators can only see
-> the original source input, never the output of another generator. Most don't, and
-> never will, care. The handle generation, in this case, does. 
Solving that
-> generically in a performant fashion is a ***HARD*** problem indeed... Not
-> guaranteed impossible, but so far no-one has come up with a good answer to the
-> problem. Even C++ has this issue with templates+concepts+CRTP; and that language
-> has had source generating templates as a direct part of the language for several
-> decades now.
+The interop code uses the LibraryImportAttribute for AOT support of ALL of the interop APIs
+declared. Thus, at compile time the interop source generator **MUST** be able to see the
+types used; specifically, it must have access to the `NativeMarshalling` attribute for all
+the handle types. Otherwise, it doesn't know how to marshal the type and bails out. It is
+possible to "overcome" this with an explicit `MarshalUsingAttribute` on every parameter or
+return type, but that's tedious. Tedious typing is what source generators and templates are
+supposed to remove. Thus, this library includes the generated sources as part of the
+repository. A developer must run the `Generate-HandleWrappers.ps1` script whenever the
+native library is updated to ensure the wrappers are up to date with the native headers
+**BEFORE** they are compiled in the project. Thus, the generated source files will contain
+the marshalling attributes so that the interop source generator knows how to generate the
+correct code.
+
+>To be crystal clear - The problem is **NOT** one of generator run ordering, but one of the
+> ***dependency of outputs***. By design, Roslyn source generators can only see the original
+> source input, never the output of another generator. Most don't, and never will, care. The
+> handle generation, in this case, does. Solving that generically in a performant fashion is
+> a ***HARD*** problem indeed... Not guaranteed impossible, but so far no-one has come up
+> with a good answer to the problem. 
Even C++ has this issue with templates+concepts+CRTP;
+> and that language has had source generating templates as a direct part of the language for
+> several decades now.

[See also: [Using the CRTP and C++20 Concepts to Enforce Contracts for Static Polymorphism](https://medium.com/@rogerbooth/using-the-crtp-and-c-20-concepts-to-enforce-contracts-for-static-polymorphism-a27d93111a75) ]
[See also: [Rules for Roslyn source generators](https://github.com/dotnet/roslyn/blob/main/docs/features/incremental-generators.cookbook.md)]

@@ -73,36 +68,35 @@ the correct code.
1) This is where the problem on non-deterministic ordering and visibility of the
   generated code was discovered. Obviously (now anyway!) this won't work.
2) Use a source generator in a separate assembly
-    1) This solves the generator output dependency problem but introduces a new
-       problem of how the build infrastructure for these types manage NuGet versions.
-    2) Additionally, this adds complexity of a second native dependency on the DLL
-       exporting the native functionality. (Should there be two copies? How does
-       code in each refer to the one instance?...)
+    1) This solves the generator output dependency problem but introduces a new problem of
+       how the build infrastructure for these types manages NuGet versions.
+    2) Additionally, this adds the complexity of a second native dependency on the DLL
+       exporting the native functionality. (Should there be two copies? How does code in
+       each refer to the one instance?...)
3) Call the source generator from within this app to control the ordering
-    1) This at least could get around the ordering/dependency problem as it would
-       guarantee the custom generator runs before the built-in one.
-    2) However, this runs afoul of the binary dependency problem... Not 100%
-       insurmountable but the number of caveats on the Roslyn Source Generator side
-       of things grows to a significant factor. This makes it more trouble than it's
-       worth. 
+    1) This at least could get around the ordering/dependency problem as it would guarantee
+       the custom generator runs before the built-in one.
+    2) However, this runs afoul of the binary dependency problem... Not 100% insurmountable
+       but the number of caveats on the Roslyn Source Generator side of things grows to a
+       significant factor. This makes it more trouble than it's worth.

### The final choice
-Keep using this library as a generator for the handle types. This used to work, and
-still does. However, this doesn't solve the problem of expressing managed code things
-in a custom language (YAML) but it's at least a rather simplistic expression for the
-handles. And, arguably, less complicated then all the subtleties of using a Roslyn
-Source generator for this sort of one off specialized code generation.
+Keep using this library as a generator for the handle types. This used to work, and still
+does. However, this doesn't solve the problem of expressing managed code things in a custom
+language (YAML) but it's at least a rather simplistic expression for the handles. And,
+arguably, less complicated than all the subtleties of using a Roslyn Source generator for
+this sort of one-off specialized code generation.

-This also keeps the door open to use the native AST from within the source generator
-or an analyzer to perform additional checks and ensure the hand written code matches
-the actual native code... 
(Though this would involve more direct use of the Roslyn parser/analyzer and +may be best to generate an input to a proper analyzer) ## Projects ### LlvmBindingsGenerator -This is the P/Invoke generator for the handle wrappers in Ubiquity.NET.Llvm.Interop. -It uses CppSharp to parse the C or C++ headers and generate wrappers for all the -handles in the native library (Global, Context, and global aliased). +This is the P/Invoke generator for the handle wrappers in Ubiquity.NET.Llvm.Interop. It uses +`CppSharp` to parse the C or C++ headers and generate wrappers for all the handles in the +native library (Global, Context, and global aliased). #### Usage `LlvmBindingsGenerator -l -e [-o ]` @@ -114,53 +108,53 @@ handles in the native library (Global, Context, and global aliased). | InputPath | Root directory containing the "GlobalHandles.cs" and "ContextHandles.cs" files to generate the backing implementation for | OutputPath | This is the root of the location to generate the output into, normally this is the "GeneratedCode" sub folder of the Ubiquity.NET.Llvm.Interop project so the files are generated into the project | -This tool is generally only required once per Major LLVM release. (Though a Minor -release that adds new APIs would also warrant a new run) However, to ensure the code -generation tool itself isn't altered with a breaking change, the PowerShell script -takes care of running the generator to update the Ubiquity.NET.Llvm.Interop code -base on each run, even if nothing changes in the end. This is run on every automated -build before building the Ubiquity.NET.Llvm.Interop project so that the generator -is tested on every full automated build. +This tool is generally only required once per Major LLVM release. 
(Though a Minor release
+that adds new APIs would also warrant a new run.) However, to ensure the code generation
+tool itself isn't altered with a breaking change, the PowerShell script takes care of
+running the generator to update the Ubiquity.NET.Llvm.Interop code base on each run, even if
+nothing changes in the end. This is run on every automated build before building the
+`Ubiquity.NET.Llvm.Interop` project so that the generator is tested on every full automated
+build.

### Ubiquity.NET.Llvm.Interop
-This is the .NET P/Invoke layer that provides the raw API projection to .NET. The,
-majority of the code is simply P/Invokes to the native library. There are a few
-additional support classes that are consistent across variations in LLVM. While this
-library has a runtime dependency on the native LibLLVM binary there is no compile
-time dependency.
+This is the .NET P/Invoke layer that provides the raw API projection to .NET. The majority
+of the code is simply P/Invokes to the native library. There are a few additional support
+classes that are consistent across variations in LLVM. While this library has a runtime
+dependency on the native LibLLVM binary, there is no compile-time dependency.

## Building the Interop libraries
### General requirements
-There are some general steps that are required to successfully build the interop
-NuGet package and a couple of different ways to go about completing them.
- 1. Build LlvmBindingsGenerator
- 2. Run LlvmBindingsGenerator to parse the llvm headers and the extended headers
-    from the native LibLLVM.
-    1. This generates the C# interop code AND the linker DEF file used by the native
-       library and therefore needs to run before the other projects are built.
-       Generating the exports file ensures that it is always accurate and any
-       functions declared in the headers are exported so that the linker generate
-       an error for any missing implementation(s).
- 3. 
Build Ubiquity.NET.Llvm.Interop to create the interop assembly and, ultimately
-    create the final NuGet package with the native and manged code bundled together.
-
-Steps 1-2 are only needed once per build of the native libraries and are wrapped
-into the `Generate-WrappedHandles.ps1` for developer convenience. (It is otherwise
-very difficult to specify how the generator must build, and then run the generator
-***Before*** building the interop. Since it only needs to be done on an update to
-the native libraries, this is simplified by making it a developer local action.
+There are some general steps that are required to successfully build the interop NuGet
+package and a couple of different ways to go about completing them.
+1. Build LlvmBindingsGenerator
+2. Run LlvmBindingsGenerator to parse the LLVM headers and the extended headers from the
+   native LibLLVM.
+   1. This generates the C# interop code AND the linker DEF file used by the native library
+      and therefore needs to run before the other projects are built.
+   2. Generating the exports file ensures that it is always accurate and any functions
+      declared in the headers are exported. This also ensures that the linker generates
+      an error for any missing implementation(s).
+3. Build Ubiquity.NET.Llvm.Interop to create the interop assembly and, ultimately, create
+   the final NuGet package with the native and managed code bundled together.
+
+Steps 1-2 are only needed once per build of the native libraries and are wrapped into the
+`Generate-WrappedHandles.ps1` script for developer convenience. (It is otherwise very
+difficult to specify how the generator must build, and then run the generator ***Before***
+building the interop. Since it only needs to be done on an update to the native libraries,
+this is simplified by making it a developer local action.) 
(This also resolves the problem of
+`x64`-only target support.)

>[!NOTE]
-> Keeping generation as a local developer action resolves the problem of binary
-> dependencies of the generator itself. The generator depends on `CppSharp` which
-> depends on a native `libclang` and ONLY supports X64 architectures. Thus, the
-> generator itself cannot run on any target that is not X64.
+> Keeping generation as a local developer action resolves the problem of binary dependencies
+> of the generator itself. The generator depends on `CppSharp`, which depends on a native
+> `libclang` and ONLY supports X64 architectures. Thus, the generator itself cannot run on
+> any target that is not X64.

---
-^1^ There is some consideration/thinking to eliminate the interop library entirely
-and move all of it's functionality to the main LLVM.NET assembly itself. Therefore,
-no production app or library should release with that as a dependency (except
-transitively from the wrapper) as it may not exist in the future
+^1^ There is some consideration/thinking to eliminate the interop library entirely and move
+all of its functionality to the main LLVM.NET assembly itself. Therefore, no production app
+or library should release with that as a dependency (except transitively from the wrapper)
+as it may not exist in the future.

:warning: You have been warned! :warning:

diff --git a/src/Samples/CodeGenWithDebugInfo/codegeneration.md b/src/Samples/CodeGenWithDebugInfo/codegeneration.md
index 22c3ebb14..4fbf3892b 100644
--- a/src/Samples/CodeGenWithDebugInfo/codegeneration.md
+++ b/src/Samples/CodeGenWithDebugInfo/codegeneration.md
@@ -3,33 +3,33 @@ uid: code-generation-with-debug-info
---
# CodeGenWithDebugInfo
-Sample application to generate target machine code. The sample is
-provided in the [source tree](https://github.com/UbiquityDotNET/Llvm.NET/tree/master/Samples/CodeGenWithDebugInfo).
+Sample application to generate target machine code. 
The sample is provided in the
+[source tree](https://github.com/UbiquityDotNET/Llvm.NET/tree/master/Samples/CodeGenWithDebugInfo).

-This sample generates LLVM IR equivalent to what Clang will generate for a sample C
-file. While it doesn't parse the C File, this sample does show all the steps and
-techniques for using Ubiquity.NET.Llvm to generate the LLVM IR with debug information
-and, ultimately, the target machine code.
+This sample generates LLVM IR equivalent to what Clang will generate for a sample C file.
+While it doesn't parse the C file, this sample does show all the steps and techniques for
+using Ubiquity.NET.Llvm to generate the LLVM IR with debug information and, ultimately, the
+target machine code.

## Example C Code
-The CodeGenWithDebugInfo sample will generate LLVM IR and machine code for the
-following sample "C" code.
+The CodeGenWithDebugInfo sample will generate LLVM IR and machine code for the following
+sample "C" code.

>[!NOTE]
->The C code file is provided in the source tree along with a script file to compile
-> it for comparing output with Clang. The current implementation was last compared
-> with Clang 5 RC4 - any differences to the latest version of clang are expected to
-> be minor. Updating the sample to replicate the latest Clang version is left as an
-> exercise for the reader :grin:
+> The C code file is provided in the source tree along with a script file to compile it for
+> comparing output with Clang. The current implementation was last compared with Clang 5
+> RC4 - any differences to the latest version of clang are expected to be minor. 
Updating
+> the sample to replicate the latest Clang version is left as an exercise for the reader
+> :grin:

[!code-c[Main](Support Files/test.c)]

This sample supports targeting two different processor types x64 and ARM Cortex-M3

## Initializing Ubiquity.NET.Llvm
-The underlying LLVM library requires initialization for it's internal data,
-furthermore Ubiquity.NET.Llvm must load the actual underlying DLL specific to the
-current system architecture. Thus, the library as a whole requires initialization.
+The underlying LLVM library requires initialization for its internal data; furthermore,
+`Ubiquity.NET.Llvm` must load the actual underlying DLL specific to the current system
+architecture. Thus, the library as a whole requires initialization.

```C#
using static Ubiquity.NET.Llvm.StaticState;
@@ -41,59 +41,61 @@ using( InitializeLLVM() )
```

The initialization returns an IDisposable so that the calling application can
-shutdown/cleanup resources and potentially re-initialize for a different target if
-desired. This application only needs to generate one module and exit so it just
-applies a standard C# `using` scope to ensure proper cleanup.
+shutdown/cleanup resources and potentially re-initialize for a different target if desired.
+This application only needs to generate one module and exit so it just applies a standard
+C# `using` scope to ensure proper cleanup.

### Target ABI specific details
-In order to isolate the ABI specific details of the target architecture the
-application uses an interface that contains properties and methods to handle target
-specific support. Furthermore, an application may not need to use all of the possible
-target architectures so the application selects to register/initialize support for
-specific targets. This reduces startup time and resource commitments to only what is
-required by the application. In this sample that is handled in the constructor of the
-target dependent details. 
Most compiler type applications would allow command line -options for the CPU target variants and feature sets. For this sample those are just -hard coded into the target details class to keep things simple and focused on the -rest of the code generation. +In order to isolate the ABI specific details of the target architecture the application uses +an interface that contains properties and methods to handle target specific support. +Furthermore, an application may not need to use all of the possible target architectures so +the application selects to register/initialize support for specific targets. This reduces +startup time and resource commitments to only what is required by the application. In this +sample that is handled in the constructor of the target dependent details. Most compiler +type applications would allow command line options for the CPU target variants and feature +sets. For this sample those are just hard coded into the target details class to keep things +simple and focused on the rest of the code generation. [!code-csharp[Main](ITargetABI.cs#ITargetABI)] -This interface isolates the rest of the code from knowing which architecture is used, -and theoretically could include support for additional targets beyond the two in the -sample source. +This interface isolates the rest of the code from knowing which architecture is used, and +theoretically could include support for additional targets beyond the two in the sample +source. -The sample determines which target to use based on the second command line argument -to the application. +The sample determines which target to use based on the second command line argument to the +application. [!code-csharp[Main](Program.cs#TargetABISelection)] ## Creating the Module -To generate code in Ubiquity.NET.Llvm a [Module](xref:Ubiquity.NET.Llvm.Module) is -required as a container for the LLVM IR. To create a module a -[Context](xref:Ubiquity.NET.Llvm.Context) is required. 
+To generate code in Ubiquity.NET.Llvm a [Module](xref:Ubiquity.NET.Llvm.Module) is required +as a container for the LLVM IR. To create a module a [Context](xref:Ubiquity.NET.Llvm.Context) +is required. >[!NOTE] -> The Context and Module are Disposable types in Ubiquity.NET.Llvm to manage some -> complex and hidden ownership transfers that can happen with the different forms -> of JIT/Execution engines. This may not always be true in future versions of the -> library, but for now they must be disposable. +> The `Context` and `Module` are disposable types in `Ubiquity.NET.Llvm` to manage some +> complex and hidden ownership transfers that can happen with the different forms of +> JIT/Execution engines. This may not always be true in future versions of the library, but +> for now they must be disposable. >[!CAUTION] -> A Context is, by design, **NOT** a thread safe type. It is designed to contain -> various interned objects in LLVM. All modules are owned by exactly one Context. -> Applications can create any number of threads and create a context for each one. -> However, threads must not reference the context of another thread nor reference -> any of the objects created within another thread's context. This is a fundamental -> design of LLVM and reduces the complexity of attempting to manage collections of -> objects and interning them in a thread safe manner. Applications instead just -> create a context per thread if needed. - -To generate code for a particular target the application initializes the module to -include the source file name that it was generated from, the -[Triple](xref:Ubiquity.NET.Llvm.Triple) that describes the target and a target -specific [DataLayout](xref:Ubiquity.NET.Llvm.DataLayout). The sample application -extracts these from the [TargetMachine](xref:Ubiquity.NET.Llvm.TargetMachine) +> A `Context` is, by design, **NOT** a thread safe type. It is designed to contain various +> interned objects in LLVM. 
All modules are owned by exactly one Context. Applications can
+> create any number of threads and create a context for each one. However, threads must not
+> reference the context of another thread nor reference any of the objects created within
+> another thread's context. This is a fundamental design of LLVM and reduces the complexity
+> of attempting to manage collections of objects and interning them in a thread safe manner.
+> Applications instead just create a context per thread if needed. This constraint does not
+> prevent use with asynchronous operations as long as such operations NEVER touch the
+> context from multiple threads. That is, each leg of an asynchronous operation may use the
+> same `Context` but no other threads or operations may use the same one. This ensures that
+> the `Context` is only used by one thread at a time even if it is passed around between
+> threads.
+
+To generate code for a particular target the application initializes the module to include
+the source file name that it was generated from, the [Triple](xref:Ubiquity.NET.Llvm.Triple)
+that describes the target and a target specific [DataLayout](xref:Ubiquity.NET.Llvm.DataLayout).
+The sample application extracts these from the [TargetMachine](xref:Ubiquity.NET.Llvm.TargetMachine)
provided by the target details interface for the selected target.

[!code-csharp[Main](Program.cs#CreatingModule)]

@@ -101,32 +103,31 @@ provided by the target details interface for the selected target.
## Creating the DICompileUnit
LLVM Debug information is all scoped to a top level
[DICompileUnit](xref:Ubiquity.NET.Llvm.DebugInfo.DICompileUnit). There is exactly one
-DICompileUnit for a [DIBuilder](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder) and all
-debug information metadata is ultimately a child of that unit. The sample creates
-the compilation unit just after the module is created and the target specific
-information is added to it. 
In this sample there is a direct 1:1 correlation between
-the compile unit and the source file so it creates a
-[DIFile](xref:Ubiquity.NET.Llvm.DebugInfo.DIFile) for the source at the same time.
-The sample code creates the DICompileUnit when creating the bit code module. This is
-the normal pattern for creating the compile unit when generating debugging
-information. Though it is possible to create it independently but there usually
-isn't any real benefit to doing so.
+`DICompileUnit` for a [DIBuilder](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder) and all debug
+information metadata is ultimately a child of that unit. The sample creates the compilation
+unit just after the module is created and the target specific information is added to it. In
+this sample there is a direct 1:1 correlation between the compile unit and the source file
+so it creates a [DIFile](xref:Ubiquity.NET.Llvm.DebugInfo.DIFile) for the source at the same
+time.
+
+The sample code creates the `DICompileUnit` when creating the bit code module. This is the
+normal pattern for creating the compile unit when generating debugging information. Though
+it is possible to create it independently, there usually isn't any real benefit to doing
+so.

## Creating basic types with debug information
-In LLVM types are fairly minimalistic and only contain the basic structural
-information for generating the final machine code. Debug information, as metadata in
-LLVM, provides all the source level debugging information. In LLVM this requires
-creating and tracking both the native type and the Debug information metadata as
-independent object instances. In Ubiquity.NET.Llvm this is handled by a unified debug
-and type information system. That is, in Ubiquity.NET.Llvm a single class is used to
-represent types and it acts as a binder between the full debugging description of
-the type and the native LLVM minimal description. 
These types all implement a common
-interface [ITypeRef](xref:Ubiquity.NET.Llvm.Types.ITypeRef). This interface is used
-throughout Ubiquity.NET.Llvm to expose types in a consistent fashion.
-`Ubiquity.NET.Llvm` provides a set of classes for building the bound types.
-This sample uses the [DebugBasicType](xref:Ubiquity.NET.Llvm.DebugInfo.DebugBasicType).
-To define the basic types used in the generated code with appropriate debug
-information.
+In LLVM types are fairly minimalistic and only contain the basic structural information for
+generating the final machine code. Debug information, as metadata in LLVM, provides all the
+source level debugging information. In LLVM this requires creating and tracking both the
+native type and the Debug information metadata as independent object instances. In
+`Ubiquity.NET.Llvm` this is handled by a unified debug and type information system. That is,
+in Ubiquity.NET.Llvm a single class is used to represent types and it acts as a binder
+between the full debugging description of the type and the native LLVM minimal description.
+These types all implement a common interface [ITypeRef](xref:Ubiquity.NET.Llvm.Types.ITypeRef).
+This interface is used throughout Ubiquity.NET.Llvm to expose types in a consistent fashion.
+`Ubiquity.NET.Llvm` provides a set of classes for building the bound types. This sample uses
+the [DebugBasicType](xref:Ubiquity.NET.Llvm.DebugInfo.DebugBasicType) to define the basic
+types used in the generated code with appropriate debug information.

[!code-csharp[Main](Program.cs#CreatingBasicTypesWithDebugInfo)]

@@ -140,35 +141,33 @@ This constructs several basic types and assigns them to variables:
| i32Array_0_32 | array i32[0..31] | n/a

## Creating qualified types
-Creating qualified (const, volatile, etc...) and pointers is just as easy as
-creating the basic types. The sample needs a pointer to a const instance of the
-struct foo. 
A qualified type for constant foo is created first, then a pointer type
-is created for the const type.
+Creating qualified types (const, volatile, etc...) and pointers is just as easy as creating
+the basic types. The sample needs a pointer to a const instance of the struct foo. A qualified
+type for constant foo is created first, then a pointer type is created for the const type.

[!code-csharp[Main](Program.cs#CreatingQualifiedTypes)]

## Creating structure types
-As previously mentioned, the LLVM types only contain basic layout information and
-not full source level debugging information. Thus, for types there are two distinct
-descriptions, one for the LLVM native type and another for the debugging information.
-As with basic types, Ubiquity.NET.Llvm has support for defining complete information
-for composite structure types. This is done using a collection of
-[DebugMemberInfo](xref:Ubiquity.NET.Llvm.DebugInfo.DebugMemberInfo). DebugMemberInfo
-fully describes an element of a composite type including the native LLVM type as well
-as all the Debugging information metadata. A collection of these is then used to
-create the final composite type with full debug data in a simple single call. The
-sample only needs to create one such type for the `struct foo` in the example source
-code.
+As previously mentioned, the LLVM types only contain basic layout information and not full
+source level debugging information. Thus, for types there are two distinct descriptions,
+one for the LLVM native type and another for the debugging information. As with basic types,
+Ubiquity.NET.Llvm has support for defining complete information for composite structure
+types. This is done using a collection of
+[DebugMemberInfo](xref:Ubiquity.NET.Llvm.DebugInfo.DebugMemberInfo). `DebugMemberInfo` fully
+describes an element of a composite type including the native LLVM type as well as all the
+debugging information records.
A collection of these is then used to create the final
+composite type with full debug data in a simple single call. The sample only needs to create
+one such type for the `struct foo` in the example source code.

[!code-csharp[Main](Program.cs#CreatingStructureTypes)]

## Creating module metadata and global variables
-The sample code contains two global instances of `struct foo` `bar` and `baz`.
-Furthermore, bar is initialized with constant data. The sample starts by constructing
-the const array data that forms the initialized value of `bar.c`, the source only
-provides const values for the first two entries of a 32 element array. The const data
-is created via [ConstArray](xref:Ubiquity.NET.Llvm.Values.ConstantArray). The full
-initialized const data for bar is the created from
+The sample code contains two global instances of `struct foo`: `bar` and `baz`. Furthermore,
+`bar` is initialized with constant data. The sample starts by constructing the const array
+data that forms the initialized value of `bar.c`; the source only provides const values for
+the first two entries of a 32 element array. The const data is created via
+[ConstArray](xref:Ubiquity.NET.Llvm.Values.ConstantArray). The full initialized const data
+for `bar` is then created from
[Context.CreateNamedConstantStruct](xref:Ubiquity.NET.Llvm.Context.CreateNamedConstantStruct*)

[!code-csharp[Main](Program.cs#CreatingGlobalsAndMetadata)]

@@ -176,36 +175,36 @@ initialized const data for bar is the created from

[!code-csharp[Main](Program.cs#AddModuleFlags)]

Once the constant data is available an LLVM global is created for it with a name that
-matches the source name via [AddGlobal](xref:Ubiquity.NET.Llvm.Module.AddGlobal*). To
-ensure the linker lays out the structure correctly the code uses the layout
-information for the module to get the ABI required alignment for the global and sets
-the [Alignment](xref:Ubiquity.NET.Llvm.Values.GlobalObject.Alignment) property for
-the global.
Finally the debug information for the global is created as a
+matches the source name via [AddGlobal](xref:Ubiquity.NET.Llvm.Module.AddGlobal*). To ensure
+the linker lays out the structure correctly the code uses the layout information for the
+module to get the ABI required alignment for the global and sets the
+[Alignment](xref:Ubiquity.NET.Llvm.Values.GlobalObject.Alignment) property for the global.
+Finally the debug information for the global is created as a
[DIGlobalVariableExpression](xref:Ubiquity.NET.Llvm.DebugInfo.DIGlobalVariableExpression)
using
[CreateGlobalVariableExpression](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder.CreateGlobalVariableExpression*)
and finally added to the variable to complete the creation.

-For the `baz` instance the process is almost identical. The major difference is that
-the value of the structure is initialized to all zeros. That is the initialized data
-for the structure is created with [NullValueFor](xref:Ubiquity.NET.Llvm.Values.Constant.NullValueFor*),
+For the `baz` instance the process is almost identical. The major difference is that the
+value of the structure is initialized to all zeros. That is, the initialized data for the
+structure is created with [NullValueFor](xref:Ubiquity.NET.Llvm.Values.Constant.NullValueFor*),
which creates an all zero value of a type.

[!code-csharp[Main](Program.cs#CreatingGlobalsAndMetadata)]

-LLVM modules may contain additional module flags as metadata that describe how the
-module is generated or how the code generation/linker should treat the code. In this
-sample the dwarf version and debug metadata versions are set along with a
-VersionIdentString that identifies the application that generated the module.
-Additionally, any target specific metadata is added to the module.
The ordering of
-these is generally not relevant, however it is very specific in the sample to help
-ensure the generated IR is as close to the Clang version as possible making it
-possible to run llvm-dis to generate the textual IR files and compare them.
+LLVM modules may contain additional module flags as metadata that describe how the module is
+generated or how the code generation/linker should treat the code. In this sample the dwarf
+version and debug metadata versions are set along with a VersionIdentString that identifies
+the application that generated the module. Additionally, any target specific metadata is
+added to the module. The ordering of these is generally not relevant; however, it is very
+specific in the sample to help ensure the generated IR is as close to the Clang version as
+possible, making it possible to run llvm-dis to generate the textual IR files and compare
+them.

[!code-csharp[CreatingGlobalsAndMetadata](Program.cs#AddModuleFlags)]

## Declaring the functions
-The function declarations for both of the two function's is mostly the same,
-following a common pattern:
+The function declarations for the two functions are mostly the same, following a
+common pattern:
1. Create the signature with debug information
1. Create the function declaration referencing the signature
1. Add attributes appropriate for the function

@@ -215,56 +214,53 @@ is visible only locally. This is indicated by the
[Linkage.Internal](xref:Ubiquity.NET.Llvm.Values.Linkage.Internal) linkage value.

>[!NOTE]
-> The use of fluent style extension methods in the Ubiquity.NET.Llvm API helps make it
-> easy to add to or modify the attributes and linkage etc...
-
-DeclareCopyFunc() is a bit special in that it handles some target specific support in
-a generalized way. In particular the calling convention for the struct to use the
-`byval` form to pass the structure as a pointer but that the callee gets a copy of
-the original.
This, is used for some large structures and allows the target machine
-generation room to use alternate means of transferring the data. (Stack or possibly
-otherwise unused registers). For the two processors this sample supports Clang only
-uses this for the Cortex-M3 so the code calls the
-`TargetDetails.AddABIAttributesForByValueStructure`) to add the appropriate attributes
-for the target as needed.
+> The use of fluent style extension methods in the Ubiquity.NET.Llvm API helps make it easy
+> to add to or modify the attributes and linkage etc...
+
+`DeclareCopyFunc()` is a bit special in that it handles some target specific support in a
+generalized way. In particular, the calling convention for the struct uses the `byval` form
+to pass the structure as a pointer while the callee gets a copy of the original. This is
+used for some large structures and allows the target machine generation room to use
+alternate means of transferring the data (stack or possibly otherwise unused registers).
+For the two processors this sample supports, Clang only uses this for the Cortex-M3, so the
+code calls `TargetDetails.AddABIAttributesForByValueStructure` to add the appropriate
+attributes for the target as needed.

[!code-csharp[FunctionDeclarations](Program.cs#FunctionDeclarations)]

## Generating function bodies
-This is where things really get interesting as this is where the actual code is
-generated for the functions. Up to this point everything has created metadata or
-prototypes and signatures. The code generation generally follows a pattern that
-starts with creation of an entry block to initialize the parameters and then
-additional blocks for the actual code. While LLVM IR uses an SSA form with virtual
-registers, code generation, usually doesn't need to worry about that so long as it
-follows some basic rules, in particular, all of the locals are allocated a slot on
-the stack via alloca along with any parameters.
The parameters are initialized from
-the signature values. All of which is done in the entry block. LLVM has a pass
-(mem2reg) that will lower this into SSA form with virtual registers so that each
+This is where things really get interesting as this is where the actual code is generated
+for the functions. Up to this point everything has created metadata or prototypes and
+signatures. The code generation generally follows a pattern that starts with creation of an
+entry block to initialize the parameters and then additional blocks for the actual code.
+While LLVM IR uses an SSA form with virtual registers, code generation usually doesn't need
+to worry about that so long as it follows some basic rules; in particular, all of the locals
+are allocated a slot on the stack via alloca along with any parameters. The parameters are
+initialized from the signature values, all of which is done in the entry block. LLVM has a
+pass (mem2reg) that will promote this into SSA form with virtual registers so that each
generating application doesn't have to worry about conversion into SSA form.

-After the parameters are handled in the entry block, the rest of the function is
-generated based on the source language or application defined behavior. In this case
-the sample generates IR equivalent to the functions defined in the sample test.c
-file. There are a few points to make about the function generation in the sample.
+After the parameters are handled in the entry block, the rest of the function is generated
+based on the source language or application defined behavior. In this case the sample
+generates IR equivalent to the functions defined in the sample test.c file. There are a few
+points to make about the function generation in the sample.

### Generating Argument and Local variables
-As discussed the arguments and locals are allocated in the entry block however that
-only makes them usable in the function and ready for the mem2reg pass.
In particular
-there is no debug information attached to the variables. To provide debug
-information LLVM provides an intrinsic function that is used to declare the debug
-information for a variable. In Ubiquity.NET.Llvm this is emitted using the
-[InsertDeclare](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder.InsertDeclare*) method.
+As discussed, the arguments and locals are allocated in the entry block; however, that only
+makes them usable in the function and ready for the mem2reg pass. In particular there is no
+debug information attached to the variables. To provide debug information LLVM provides an
+intrinsic function that is used to declare the debug information for a variable. In
+`Ubiquity.NET.Llvm` this is emitted using the [InsertDeclare](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder.InsertDeclare*)
+method.

### Calling LLVM Intrinsics
-The generated code needs to copy some data, rather than directly doing a copy in a
-loop, the code uses the LLVM intrinsic memcopy function. This function is lowered to
-an optimized copy for the target so that applications need not worry about building
-optimal versions of IR for this common functionality. Furthermore, the LLVM intrinsic
-supports a variety of signatures for various data types all of which are hidden in
-the Ubiquity.NET.Llvm method. Rather than require callers to create a declaration of
-the correct signature the Ubiquity.NET.Llvm wrapper automatically figures out the
-correct signature from the parameters provided.
+The generated code needs to copy some data; rather than directly doing a copy in a loop, the
+code uses the LLVM intrinsic `memcpy` function. This function is lowered to an optimized copy
+for the target so that applications need not worry about building optimal versions of IR for
+this common functionality. Furthermore, the LLVM intrinsic supports a variety of signatures
+for various data types, all of which are hidden in the `Ubiquity.NET.Llvm` method.
Rather
+than require callers to create a declaration of the correct signature the `Ubiquity.NET.Llvm`
+wrapper automatically figures out the correct signature from the parameters provided.

## Final LLVM IR
```llvm
diff --git a/src/Samples/Kaleidoscope/Chapter2/Kaleidoscope-ch2.md b/src/Samples/Kaleidoscope/Chapter2/Kaleidoscope-ch2.md
index 5941f8899..9172e0565 100644
--- a/src/Samples/Kaleidoscope/Chapter2/Kaleidoscope-ch2.md
+++ b/src/Samples/Kaleidoscope/Chapter2/Kaleidoscope-ch2.md
@@ -3,21 +3,20 @@ uid: Kaleidoscope-ch2
---
# 2. Kaleidoscope: Implementing the parser
-The chapter 2 sample doesn't actually generate any code. Instead it focuses on the
-general structure of the samples and parsing of the language. The sample for this
-chapter enables all language features to allow exploring the language and how it is
-parsed to help better understand the rest of the chapters better. It is hoped that
-users of this library find this helpful.
-
-The Ubiquity.NET.Llvm version of Kaleidoscope leverages ANTLR4 to parse the language
-into a parse tree. The parse tree is visited to transform it into a an Abstract
-Syntax Tree (AST). This has several advantages including logical isolation of the
-parsing and code generation. Additionally, it provides a single formal definition
-of the grammar for the language. Understanding the language grammar from reading
-the LVM tutorials and source was a difficult task since it isn't formally defined
-in one place. (There are some EBNF like comments in the official LLVM tutorial
-code but it is spread around without much real discussion of the language the
-tutorials guide you to implement)
+The chapter 2 sample doesn't actually generate any code. Instead it focuses on the general
+structure of the samples and parsing of the language. The sample for this chapter enables
+all language features to allow exploring the language and how it is parsed, to better
+understand the rest of the chapters.
It is hoped that users of this library find this
+helpful.
+
+The Ubiquity.NET.Llvm version of Kaleidoscope leverages ANTLR4 to parse the language into a
+parse tree. The parse tree is visited to transform it into an Abstract Syntax Tree (AST).
+This has several advantages including logical isolation of the parsing and code generation.
+Additionally, it provides a single formal definition of the grammar for the language.
+Understanding the language grammar from reading the LLVM tutorials and source was a difficult
+task since it isn't formally defined in one place. (There are some EBNF-like comments in the
+official LLVM tutorial code but it is spread around without much real discussion of the
+language the tutorials guide you to implement.)

## Formal Grammar
The following sections cover the formal grammar of the Kaleidoscope language.
@@ -87,17 +86,18 @@ Identifier: [a-zA-Z][a-zA-Z0-9]*;

Number: Digits_ ('.' DecimalDigit_+)?;
```

-This includes basic numeric patterns as well as Identifiers and the symbols allowed for operators
-and keywords for the language. Subsequent chapters will introduce the meaning and use of each of
-these.
+This includes basic numeric patterns as well as Identifiers and the symbols allowed for
+operators and keywords for the language. Subsequent chapters will introduce the meaning and
+use of each of these.

#### Language Feature Defined Keywords
-Chapters 5-7 each introduce new language features that introduce new keywords into the language.
-In order to maintain a single grammar for all chapters the lexer uses a technique of ANTLR4 called
-[Semantic Predicates](https://github.com/antlr/antlr4/blob/master/doc/predicates.md).
-These are basically boolean expressions that determine if a given rule should be applied while
-parsing the input language. These are applied to the rules for the feature specific keywords. Thus,
-at runtime, if a given feature is disabled then the keyword is not recognized.
+Chapters 5-7 each introduce new language features that add new keywords to the
+language. In order to maintain a single grammar for all chapters the lexer uses a technique
+of ANTLR4 called [Semantic Predicates](https://github.com/antlr/antlr4/blob/master/doc/predicates.md).
+These are basically boolean expressions that determine if a given rule should be applied
+while parsing the input language. These are applied to the rules for the feature specific
+keywords. Thus, at runtime, if a given feature is disabled then the keyword is not
+recognized.

```antlr
IF: {FeatureControlFlow}? 'if';
@@ -111,39 +111,45 @@ BINARY: {FeatureUserOperators}? 'binary';
```

>[!NOTE]
-> There are some important distinctions in the Ubiquity.NET.Llvm implementation of Kaleidoscope, with regard to the
-> symbols allowed for user defined operators. The official LLVM version allows defining an operator '=',
-> (in chapter 6). However, in Chapter 7, when Mutable variables are introduced the '=' is reserved by the
-> language for assignment. Thus, any code written for chapter 6 with a user defined '=' operator would not
-> work in later versions. Thus, the Ubiquity.NET.Llvm version reserves the '=' in all versions, but uses the '=='
-> operator for equality comparisons. (It also adds the '++' and '--' tokens as user operators [The official
-> LLVM implementation only allows a single character as the operator lexeme])
+> There are some important distinctions in the Ubiquity.NET.Llvm implementation of
+> Kaleidoscope, with regard to the symbols allowed for user defined operators. The official
+> LLVM version allows defining an operator '=' (in chapter 6). However, in Chapter 7, when
+> mutable variables are introduced, the '=' is reserved by the language for assignment. Thus,
+> any code written for chapter 6 with a user defined '=' operator would not work in later
+> versions.
Thus, the Ubiquity.NET.Llvm version reserves the '=' in all versions, but uses +> the '==' operator for equality comparisons. (It also adds the '++' and '--' tokens as user +> operators [The official LLVM implementation only allows a single character as the operator +> lexeme]) > -> Additionally the Ubiquity.NET.Llvm implementation adds the built-in '^' operator for exponentiation. +> Additionally the Ubiquity.NET.Llvm implementation adds the built-in '^' operator for +> exponentiation. ### Parser -The parser, like the lexer, uses Semantic Predicates, which allows for dynamic adaptation of the grammar -and parser to handle variations or versions of the language. The Sample code uses the predicates to -selectively enable language features as the chapters progress, without needing to change the grammar or -generated parser code. The parser code provides a simple means of expressing the language support level. -Semantic predicates play a vital role in supporting user defined operators with user defined precedence. +The parser, like the lexer, uses Semantic Predicates, which allows for dynamic adaptation of +the grammar and parser to handle variations or versions of the language. The Sample code +uses the predicates to selectively enable language features as the chapters progress, +without needing to change the grammar or generated parser code. The parser code provides a +simple means of expressing the language support level. Semantic predicates play a vital role +in supporting user defined operators with user defined precedence. #### Parser grammar -A full tutorial on ANTLR is beyond the scope of this article but the basics should be familiar enough to -anyone acquainted with EBNF form to make enough sense out of it. Don't worry too much about the details at -this point as subsequent chapters will cover salient points as new features are enabled. 
+A full tutorial on ANTLR is beyond the scope of this article but the basics should be
+familiar enough to anyone acquainted with EBNF form to make enough sense out of it. Don't
+worry too much about the details at this point as subsequent chapters will cover salient
+points as new features are enabled.

##### Operators
-In order to support the parser detecting attempts to overload built-in operators and to handle the fact
-that some operators don't make any sense as unary operators (e.g. you can't create a user defined unary
-'=' operator. Technically, you could implement that but it would make for some confusing code. If you really
-like hard to read and comprehend code there are [other languages](https://en.wikipedia.org/wiki/Brainfuck)
-better suited to that end 8^) )
-
-To manage detection of appropriate operator tokens the grammar uses a set of parser rules that group the
-operator tokens by their allowed kinds. This allows subsequent rules to simply refer to the kind of
-operator expected and not worry about the actual tokens involved. It also allows the parser to detect
-syntax and usage errors like trying to create a user defined '+' operator.
+The parser needs to detect attempts to overload built-in operators, and it must handle the
+fact that some operators don't make any sense as unary operators. (e.g. you can't create a
+user defined unary '=' operator. Technically, you could implement that but it would make for
+some confusing code. If you really like hard to read and comprehend code there are
+[other languages](https://en.wikipedia.org/wiki/Brainfuck) better suited to that end 8^) )
+
+To manage detection of appropriate operator tokens the grammar uses a set of parser rules
+that group the operator tokens by their allowed kinds. This allows subsequent rules to
+simply refer to the kind of operator expected and not worry about the actual tokens
+involved.
It also allows the parser to detect syntax and usage errors like trying to create +a user defined '+' operator. ``` antlr // built-in operator symbols @@ -229,9 +235,9 @@ binaryop ``` ### Initializers -The Initializers rule provides a way to handle a common sequence in the language in multiple different -contexts (sort of like a function in most programming languages, in fact, ANTLR rules are implemented -in the generated parser as methods). +The Initializers rule provides a way to handle a common sequence in the language in multiple +different contexts (sort of like a function in most programming languages, in fact, ANTLR +rules are implemented in the generated parser as methods). ``` antlr // pull the initializer out to a distinct rule so it is easier to get at @@ -243,9 +249,9 @@ initializer ``` ### Primary Expressions (Atoms) -There are a number of primary expressions (also known as 'Atoms') that are not left recursive in their -definition. These are split out to a distinct rule to aid in the support of left recursion and the need -for user defined operator precedence. +There are a number of primary expressions (also known as 'Atoms') that are not left +recursive in their definition. These are split out to a distinct rule to aid in the support +of left recursion and the need for user defined operator precedence. ``` antlr // Non Left recursive expressions (a.k.a. atoms) @@ -267,9 +273,9 @@ Let's look at each of these in turn to get a better understanding of the languag ``` antlr LPAREN expression[0] RPAREN ``` -This is a simple rule for sub-expressions within parenthesis for example: `(1+2)/3` the parenthesis groups -the addition so that it occurs before the division since, normally the precedence of division is higher. 
-The parse tree for that expression looks like this:
+This is a simple rule for sub-expressions within parentheses. For example, in `(1+2)/3` the
+parentheses group the addition so that it occurs before the division, since normally the
+precedence of division is higher. The parse tree for that expression looks like this:

![Parse Tree](./parsetree-paren-expr.svg)

@@ -277,8 +283,8 @@ The parse tree for that expression looks like this:
``` antlr
Identifier LPAREN (expression[0] (COMMA expression[0])*)? RPAREN
```
-This rule covers a function call which can have 0 or more comma delimited arguments. The parse tree
-for the call `foo(1, 2, 3);` is:
+This rule covers a function call which can have 0 or more comma delimited arguments. The
+parse tree for the call `foo(1, 2, 3);` is:

![Parse Tree](./parsetree-func-call.svg)

@@ -286,22 +292,23 @@ for the call `foo(1, 2, 3);` is:
``` antlr
VAR initializer (COMMA initializer)* IN expression[0]
```
-The VarInExpression rule provides variable declaration, with optional initialization. The scope of the
-variables is that of the expression on the right of the `in` keyword. The `var ... in ...` expression is
-in many ways like a declaration of an inline function. The variables declared are scoped to the internal
-implementation of the function. Once the function produces the return value the variables no longer exist.
+The VarInExpression rule provides variable declaration, with optional initialization. The
+scope of the variables is that of the expression on the right of the `in` keyword. The
+`var ... in ...` expression is in many ways like a declaration of an inline function. The
+variables declared are scoped to the internal implementation of the function. Once the
+function produces the return value the variables no longer exist.
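+
+For example, a hypothetical use of this rule (illustrative only, not taken from the sample
+code) declares two locals whose scope is limited to the expression after `in`:
+
+``` Kaleidoscope
+# 'a' and 'b' exist only within the expression following 'in'
+def sumsq(x y) var a = x*x, b = y*y in a + b;
+```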
### ConditionalExpression ``` antlr IF expression[0] THEN expression[0] ELSE expression[0] ``` -Conditional expressions use the very common and familiar if-then-else syntax and semantics with one -notable unique quality. In Kaleidoscope ***every language construct is an expression***, there are -no statements. Expressions all produce a value. So the result of the conditional -expression is the result of the sub-expression selected based on the condition. The -condition value is computed and if the result == 0.0 (false) the `else` expression -is used to produce the final result. Otherwise, the `then` expression is executed -to produce the result. Thus, the actual semantics are more like the ternary +Conditional expressions use the very common and familiar if-then-else syntax and semantics +with one notable unique quality. In Kaleidoscope ***every language construct is an +expression***, there are no statements. Expressions all produce a value. So the result of +the conditional expression is the result of the sub-expression selected based on the +condition. The condition value is computed and if the result == 0.0 (false) the `else` +expression is used to produce the final result. Otherwise, the `then` expression is executed +to produce the result. Thus, the actual semantics are more like the ternary conditional operator found C and other languages: ``` C @@ -318,9 +325,9 @@ def fib(x) ``` ### ForInExpression -The ForInExpression provides support for classic for loop constructs. In particular it provides a variable -scope for a loop value, a condition to test when to exit the loop and an optional step value for incrementing -the loop value (default is 1.0). +The ForInExpression provides support for classic for loop constructs. In particular it +provides a variable scope for a loop value, a condition to test when to exit the loop and an +optional step value for incrementing the loop value (default is 1.0). 
``` Kaleidoscope
extern putchard(char);
@@ -333,12 +340,12 @@ printstar(100);
```

> [!NOTE]
-> Technically, there are no statements in Kaleidoscope, everything is an expression and has a value. putchard()
-> implicitly returns a value as does printstar(). (e.g. there is no void return - ALL functions implicitly
-> return a floating point value, even if it is always 0.0).
+> Technically, there are no statements in Kaleidoscope, everything is an expression and has
+> a value. putchard() implicitly returns a value as does printstar(). (e.g. there is no void
+> return - ALL functions implicitly return a floating point value, even if it is always 0.0).

-For loops with mutable values support in the language may provide a result that isn't always 0.0, for
-example:
+When mutable values support is enabled in the language, for loops may provide a result that
+isn't always 0.0, for example:

```Kaleidoscope
# Define ':' for sequencing: as a low-precedence operator that ignores operands
@@ -366,46 +373,51 @@ fibi(10);
```

## Parse Tree
-ANTLR produces a low level parse tree with nodes corresponding to each of the rules defined in the grammar.
-In most cases this is extremely verbose and more details than is actually needed for generating code. (Though,
-it can be used as-is in some cases.) Typically code generation will walk the parse tree to provide a simpler
-Abstract Syntax Tree that represents the actual language concepts independent of the syntax of the language.
-ANTLR will generate a parser based on the grammar description input file. This generated parser (and lexer)
-includes a context type for each rule of the grammar. The C# target for ANTLR generates these types as partial
-classes so they are extensible from the parser assembly without needing to derive a new type or use virtual
-methods etc.
Thus, the Kaleidoscope.Grammar assembly contains partial class extensions that provide simpler +ANTLR produces a low level parse tree with nodes corresponding to each of the rules defined +in the grammar. In most cases this is extremely verbose and more details than is actually +needed for generating code. (Though, it can be used as-is in some cases.) Typically code +generation will walk the parse tree to provide a simpler Abstract Syntax Tree that +represents the actual language concepts independent of the syntax of the language. ANTLR +will generate a parser based on the grammar description input file. This generated parser +(and lexer) includes a context type for each rule of the grammar. The C# target for ANTLR +generates these types as partial classes so they are extensible from the parser assembly +without needing to derive a new type or use virtual methods etc. Thus, the +`Kaleidoscope.Grammar` assembly contains partial class extensions that provide simpler property accessors and support methods to aid is generating the AST. -See [Kaleidoscope Parse Tree Examples](xref:Kaleidoscope-Parsetree-examples) for more information and example -diagrams of the parse tree for various language constructs. +See [Kaleidoscope Parse Tree Examples](xref:Kaleidoscope-Parsetree-examples) for more +information and example diagrams of the parse tree for various language constructs. ## Abstract Syntax Tree (AST) -To further simplify code generators the Kaleidoscope.Grammar library contains the AstBuilder type that is -an ANTLR parse tree visitor. AstBuilder will convert a raw ANTLR IParseTree into a a tree of `IAstNode` elements. -That is, it visits the declarations and definitions in the parse tree to produce a full tree of declarations -and definitions as they appeared in the source. For interactive modes - the tree will have only one top level node. -However, when parsing a whole source file, the parse tree may contain multiple declarations and definitions under -a RootNode. 
-
-The [Kaleidoscope AST](xref:Kaleidoscope-AST) is a means of simplifying the original parse tree into
-constructs that are easy for the code generation to use directly and to validate the syntax of the input source.
-In the case of Kaleidoscope there are a few types of nodes that are used to generate LLVM IR. The AstBuilder class
-is responsible for generating an AST from an ANTLR4 parse tree.
+To further simplify code generators the Kaleidoscope.Grammar library contains the AstBuilder
+type that is an ANTLR parse tree visitor. AstBuilder will convert a raw ANTLR `IParseTree`
+into a tree of `IAstNode` elements. That is, it visits the declarations and definitions in
+the parse tree to produce a full tree of declarations and definitions as they appeared in
+the source. For interactive modes, the tree will have only one top-level node. However,
+when parsing a whole source file, the parse tree may contain multiple declarations and
+definitions under a RootNode.
+
+The [Kaleidoscope AST](xref:Kaleidoscope-AST) is a means of simplifying the original parse
+tree into constructs that are easy for the code generation to use directly and to validate
+the syntax of the input source. In the case of Kaleidoscope there are a few types of nodes
+that are used to generate LLVM IR. The AstBuilder class is responsible for generating an AST
+from an ANTLR4 parse tree.

The major simplifying transformations performed in building the AST are:
 * Convert top-level functions to a pair of FunctionDeclaration and FunctionDefinition
- * Convert user defined operator definition to simple FunctionDefinition with a special name for the operator
- * Convert user defined operator expressions into simple function calls to the operator function
+ * Convert user defined operator definition to simple FunctionDefinition with a special name
+   for the operator.
+ * Convert user defined operator expressions into simple function calls to the operator
+   function.
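The declaration/definition pairing can be sketched in isolation. This is an illustrative stand-in only: node kinds are modeled as plain tuples rather than the real `IAstNode` types from `Kaleidoscope.Grammar`, and the anonymous name below is a made-up convention.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the AstBuilder simplification described above: a
// top-level expression becomes an anonymous FunctionDeclaration/FunctionDefinition
// pair. Node kinds are plain tuples here purely for illustration.
List<(string Kind, string Name, string Body)> SimplifyTopLevelExpression(string body)
{
    const string anonymousName = "anon_expr_0"; // made-up naming scheme
    return new List<(string, string, string)>
    {
        ("FunctionDeclaration", anonymousName, string.Empty), // signature only
        ("FunctionDefinition", anonymousName, body),          // carries the body
    };
}

foreach (var node in SimplifyTopLevelExpression("1 + 2 * 3"))
{
    Console.WriteLine($"{node.Kind} {node.Name}");
}
```

Because the pair shares one name, later lookups (and JIT re-definition in later chapters) can treat declarations and definitions uniformly.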
>[!NOTE]
->An interesting consequence of these transformations into the AST form is that the
->concept of user defined operators no longer exists in the AST! The AST only deals
->in function declarations, definitions and the built-in operators. All issues of
->precedence are implicitly resolved in the ordering of the nodes in the AST.
->Thus, the code generation doesn't need to consider the issue of user defined
->operators or operator precedence at all. ([Chapter 6](xref:Kaleidoscope-ch6)
->covers the details of user defined operators and how the Kaleidoscope sample
->language uses ANTLR to implement them.)
+>An interesting consequence of these transformations into the AST form is that the concept
+> of user defined operators no longer exists in the AST! The AST only deals in function
+> declarations, definitions and the built-in operators. All issues of precedence are
+> implicitly resolved in the ordering of the nodes in the AST. Thus, the code generation
+> doesn't need to consider the issue of user defined operators or operator precedence at
+> all. ([Chapter 6](xref:Kaleidoscope-ch6) covers the details of user defined operators and
+> how the Kaleidoscope sample language uses ANTLR to implement them.)

## Basic Application Architecture
@@ -418,62 +430,65 @@ applications.
4. The code generator (e.g. CodeGenerator.cs)

### Driver
-While each chapter is a bit different from the others. Many of the chapters are
-virtually identical for the driver. In particular Chapters 3-7 only really differ
-in the name of the app and window title etc...
+While each chapter is a bit different from the others, many of the chapters are virtually
+identical for the driver. In particular, Chapters 3-7 only really differ in the name of the
+app and window title etc.

[!code-csharp[Program.cs](Program.cs)]

### Read, Evaluate, Print loop
-The Kaleidoscope.Runtime library contains an abstract base class for building a
-standard REPL engine from an input TextReader.
The base class handles converting
-the input reader into a sequence of statements, and parsing them into AST nodes.
-The nodes are provided to an application provided generator that produces the
-output result. The REPL engine base uses the abstract ShowResults method to
-actually show the results.
+The Kaleidoscope.Runtime library contains an abstract base class for building a standard
+REPL engine from an input TextReader. The base class handles converting the input reader
+into a sequence of statements, and parsing them into AST nodes. The nodes are provided to an
+application-provided generator that produces the output result. The REPL engine base uses
+the abstract ShowResults method to actually show the results.

[!code-csharp[Program.cs](ReplEngine.cs)]

### Runtime Support
-The Parser contains the support for parsing the Kaleidoscope language from the REPL loop interactive
-input. The parser stack also maintains the global state of the runtime, which
-controls the language features enabled, and if user defined operators are enabled,
-contains the operators defined along with their precedence.
+The Parser contains the support for parsing the Kaleidoscope language from the REPL loop
+interactive input. The parser stack also maintains the global state of the runtime, which
+controls the language features enabled, and if user defined operators are enabled, contains
+the operators defined along with their precedence.

-After the parser is created an enumerable sequence of statements is created for the
-parser to process.
-This results in a sequence of AST nodes. After construction, the sequence is used to iterate over all of
-the nodes generated from the user input.
+After the parser is created, an enumerable sequence of statements is created for the parser
+to process. This results in a sequence of AST nodes. After construction, the sequence is
+used to iterate over all of the nodes generated from the user input.
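The statement-sequence idea can be shown with a toy splitter. This is not the Kaleidoscope.Runtime implementation; the `;` terminator and the trimming rule are simplifying assumptions just to show the lazy pipeline shape:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Toy version of the lazy statement pipeline: pull characters from a TextReader and
// yield one "statement" per ';' terminator. The real parser's rules are richer.
IEnumerable<string> Statements(TextReader reader)
{
    var buffer = new StringBuilder();
    int ch;
    while ((ch = reader.Read()) != -1)
    {
        if (ch == ';')
        {
            string text = buffer.ToString().Trim();
            buffer.Clear();
            if (text.Length > 0)
            {
                yield return text; // produced lazily, one element per iteration
            }
        }
        else
        {
            buffer.Append((char)ch);
        }
    }
}

// Each statement would then be parsed into an AST node and handed to the generator.
foreach (string stmt in Statements(new StringReader("def double(x) x*2; double(3);")))
{
    Console.WriteLine(stmt);
}
```

Nothing is read from the input until the consumer asks for the next element, which is what makes this shape work for an interactive console.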
-
-This use of an enumerator sequences is a bit of a different approach to things for running an interpreter Read,
-Evaluate Print Loop, but once you get your head around it, the sequence provides a nice clean and flexible
-mechanism for building a pipeline of transformations from the text input into the result output.
+This use of an enumerable sequence is a somewhat different approach to running
+an interpreter Read, Evaluate, Print Loop, but once you get your head around it, the sequence
+provides a nice clean and flexible mechanism for building a pipeline of transformations from
+the text input into the result output.

### CodeGenerator
-The code generator will transform the AST node into the final output for the program. For the basic samples
-(Chapter 4-7) it indicates the value of any JITed and executed top level expressions. In chapter 3 it indicates
-the name of any functions defined and the LLVM IR for them. Chapter 2 uses a generator that simply produces the
-node it was given as the app doesn't actually use LLVM (it focuses on parsing the language only and the REPL
-infrastructure). This, helps to keep the samples consistent and as similar as possible to allow direct file
-comparisons to show the changes for a particular feature. The separation of concerns also aids in making the
-grammar, runtime and code generation unit-testable without the driver application.
+The code generator will transform the AST node into the final output for the program. For
+the basic samples (Chapters 4-7) it indicates the value of any JITed and executed top level
+expressions. In chapter 3 it indicates the name of any functions defined and the LLVM IR for
+them. Chapter 2 uses a generator that simply produces the node it was given as the app
+doesn't actually use LLVM (it focuses on parsing the language only and the REPL
+infrastructure). 
This helps keep the samples consistent and as similar as possible to
+allow direct file comparisons to show the changes for a particular feature. The separation
+of concerns also aids in making the grammar, runtime and code generation unit-testable
+without the driver application.

[!code-csharp[ShowResults](CodeGenerator.cs)]

### Special case for Chapter 2
-Chapter 2 sample code, while still following the general patterns used in all of the chapters, is a bit
-unique, it doesn't actually use Ubiquity.NET.Llvm at all! Instead, it is only focused on the language and parsing.
-This helps in understanding the basic patterns of the code. Furthermore, this chapter serves as an aid in
-understanding the language itself. Of particular use is the ability to generate DGML and [blockdiag](http://blockdiag.com)
+Chapter 2 sample code, while still following the general patterns used in all of the
+chapters, is a bit unique: it doesn't actually use Ubiquity.NET.Llvm at all! Instead, it is
+only focused on the language and parsing. This helps in understanding the basic patterns of
+the code. Furthermore, this chapter serves as an aid in understanding the language itself.
+Of particular use is the ability to generate DGML and [blockdiag](http://blockdiag.com)
representations of the parse tree for a given parse.

>[!NOTE]
->All of the diagrams in these tutorials were created by generating the blockdiag files and then producing
->the SVG files from that. Having a nice visual representation of a parse tree result is helpful to understanding
->the parsing and various parse tree node types.
-
-The visual graph is also immensely valuable when making changes to the grammar so you can see the results
-of a parse and more readily understand why something isn't right. In fact, this feature was created to
-help track down bugs in the parsing for user defined operator precedence that was difficult to figure out.
-Once the visualization was available it became quite easy to see the problems. Thus, Chapter 2 is both
-a simple introductory example and a tool for use when doing more advanced language tweaking or extension.
+>All of the diagrams in these tutorials were created by generating the blockdiag files and
+> then producing the SVG files from that. Having a nice visual representation of a parse tree
+> result is helpful in understanding the parsing and various parse tree node types.
+
+The visual graph is also immensely valuable when making changes to the grammar so you can
+see the results of a parse and more readily understand why something isn't right. In fact,
+this feature was created to help track down bugs in the parsing for user defined operator
+precedence that was difficult to figure out. Once the visualization was available it became
+quite easy to see the problems. Thus, Chapter 2 is both a simple introductory example and a
+tool for use when doing more advanced language tweaking or extension.

diff --git a/src/Samples/Kaleidoscope/Chapter3.5/Kaleidoscope-ch3.5.md b/src/Samples/Kaleidoscope/Chapter3.5/Kaleidoscope-ch3.5.md
index c316e06ce..85177cac4 100644
--- a/src/Samples/Kaleidoscope/Chapter3.5/Kaleidoscope-ch3.5.md
+++ b/src/Samples/Kaleidoscope/Chapter3.5/Kaleidoscope-ch3.5.md
@@ -3,67 +3,64 @@ uid: Kaleidoscope-ch3.5
---
# 3.5 Kaleidoscope: Generating LLVM IR With optimizations
-This chapter focuses on the basics of optimization with LLVM IR. It diverges from
-the official tutorial where that mixes the optimization with the use of a JIT. This
-sub chapter is attempting to isolate those and was born as a means to test/validate
-the core library and optimization without a JIT (The JIT wrappers didn't exist yet).
+This chapter focuses on the basics of optimization with LLVM IR. It diverges from the
+official tutorial, which mixes the optimization with the use of a JIT. 
This sub chapter +is attempting to isolate those and was born as a means to test/validate the core library and +optimization without a JIT (The JIT wrappers didn't exist yet). + The general goal is to parse Kaleidoscope source code to generate a [Module](xref:Ubiquity.NET.Llvm.Module) -representing the source as optimized LLVM IR. This is broken out as a distinct -chapter to help identify the support for profiling and how it is different from the -LLVM source samples that link directly to the LLVM libraries (That is, the samples -are written in C++ AND use the C++ pass builder and management support that is NOT -exported via the stable LLVM-C API. This level of functionality is only available -as the legacy pass management system with VERY limited support in the LLVM-C API. -[It is so legacy now that almost ALL remnants of it are removed from the LLVM-C API, -not just deprecated]) +representing the source as optimized LLVM IR. This is broken out as a distinct chapter to +help identify the support for profiling and how it is different from the LLVM source samples +that link directly to the LLVM libraries (That is, the samples are written in C++ AND use +the C++ pass builder and management support that is NOT exported via the stable LLVM-C API. +This level of functionality is only available as the legacy pass management system with VERY +limited support in the LLVM-C API. [It is so legacy now that almost ALL remnants of it are +removed from the LLVM-C API, not just deprecated]) ## Code generation -The Core of this sample doesn't change much from [Chapter 3](xref:Kaleidoscope-ch3). -It simply adds module generation with optimized IR. To do that there are a few changes -to make. In fact the optimizations provided don't do much and the resulting IR is much -the same. [Coming up with a more complex Kaleidoscope sample that actually uses the -optimizations more is left as an exercise for the reader. 
:wink: ]
+The core of this sample doesn't change much from [Chapter 3](xref:Kaleidoscope-ch3). It
+simply adds module generation with optimized IR. To do that there are a few changes to make.
+In fact, the optimizations provided don't do much and the resulting IR is much the same.
+[Coming up with a more complex Kaleidoscope sample that actually uses the optimizations more
+is left as an exercise for the reader. :wink: ]

### Initialization
-The code generation maintains state for the transformation as private members. To
-support optimization generally only requires a set of named passes and to call the
-method to run the passes on a function or module. [Technically an overload provides
-the chance to set [PassBuilderOptions](xref:Ubiquity.NET.Llvm.PassBuilderOptions)
-but this sample just uses the overload that applies defaults.] The new pass
-management system uses the string names of passes instead of a distinct type and
-named methods for adding them etc...
+The code generation maintains state for the transformation as private members. Supporting
+optimization generally requires only a set of named passes and a call to the method that
+runs the passes on a function or module. [Technically an overload provides the chance to set
+[PassBuilderOptions](xref:Ubiquity.NET.Llvm.PassBuilderOptions) but this sample just uses
+the overload that applies defaults.] The new pass management system uses the string names of
+passes instead of a distinct type with named methods for adding them.

These options are initialized in a private static member for the passes.
+
[!code-csharp[Main](CodeGenerator.cs#PrivateMembers)]

### Special attributes for parsed functions
>[!WARNING]
->When performing optimizations with the new pass builder system the
->TargetLibraryInfo (Internal LLVM concept) is used to determine what the "built-in"
->functions are. 
Unfortunately, they leave little room for manipulating or customizing ->this set (In C++ there is some "wiggle room", in LLVM-C there is NO support for ->this type at all!). Unfortunately, that means that if any function happens to have ->the same name as the TargetLibraryInfo for a given Triple then it will be optimized ->AS a built-in function (even if not declared as one) unless explicitly declared as ->"not" at the call site with an attribute. This is an unfortunate state ->of affairs with the LLVM support for C++ and highly problematic for `C` based ->bindings/projections like this library. Fortunately, there is a scapegoat for this. ->The function can include a `nobuiltin` attribute at the call site to prevent the ->optimizer from assuming the call is to one of the well known built-in functions. ->This isn't used for Kaleidoscope. But does leave room for problems with names that ->match some arbitrary set of "built-in" symbols. +>When performing optimizations with the new pass builder system the TargetLibraryInfo +> (Internal LLVM concept) is used to determine what the "built-in" functions are. +> Unfortunately, they leave little room for manipulating or customizing this set (In C++ +> there is some "wiggle room", in LLVM-C there is NO support for this type at all!). +> Unfortunately, that means that if any function happens to have the same name as the +> TargetLibraryInfo for a given Triple then it will be optimized AS a built-in function +> (even if not declared as one) unless explicitly declared as "not" at the call site with +> an attribute. This is an unfortunate state of affairs with the LLVM support for C++ and +> highly problematic for `C` based bindings/projections like this library. Fortunately, +> there is a scapegoat for this. The function can include a `nobuiltin` attribute at the +> call site to prevent the optimizer from assuming the call is to one of the well known +> built-in functions. This isn't used for Kaleidoscope. 
But does leave room for problems +> with names that match some arbitrary set of "built-in" symbols. [!code-csharp[Main](CodeGenerator.cs#GetOrDeclareFunction)] - ### Function Definition -The only other major change for optimization support is to actually run the -optimizations. In LLVM optimizations are supported at the module or individual -function level. For this sample each function definition is optimized as each is -returned individually. That will change in later chapters. Thus the only real -change is after generating a new function for a given AST definition the -optimization passes are run for it. This involves calling one of the overloads -of the `TryRunPasses` function and then checking for errors. +The only other major change for optimization support is to actually run the optimizations. +In LLVM optimizations are supported at the module or individual function level. For this +sample each function definition is optimized as each is returned individually. That will +change in later chapters. Thus the only real change is after generating a new function for +a given AST definition the optimization passes are run for it. This involves calling one of +the overloads of the `TryRunPasses` function and then checking for errors. [!code-csharp[Main](CodeGenerator.cs#FunctionDefinition)] diff --git a/src/Samples/Kaleidoscope/Chapter3/Kaleidoscope-ch3.md b/src/Samples/Kaleidoscope/Chapter3/Kaleidoscope-ch3.md index b6c65f98a..6d32e3e3e 100644 --- a/src/Samples/Kaleidoscope/Chapter3/Kaleidoscope-ch3.md +++ b/src/Samples/Kaleidoscope/Chapter3/Kaleidoscope-ch3.md @@ -3,30 +3,29 @@ uid: Kaleidoscope-ch3 --- # 3. Kaleidoscope: Generating LLVM IR -This chapter focuses on the basics of transforming the ANTLR parse tree into LLVM -IR. The general goal is to parse Kaleidoscope source code to generate a +This chapter focuses on the basics of transforming the ANTLR parse tree into LLVM IR. 
The
+general goal is to parse Kaleidoscope source code to generate a
[Module](xref:Ubiquity.NET.Llvm.Module) representing the source as LLVM IR.

## Basic code flow
-The basic flow of all of these samples is the same for the LLVM+JIT variants (Even
-though this version doesn't use the JIT it is setting the stage to get there).
+The basic flow of all of these samples is the same for the LLVM+JIT variants (Even though
+this version doesn't use the JIT, it is setting the stage to get there).

-1) A new `ReplEngine` is created to handle the standard REPL support with
-   customizations.
-   1) The application uses its own implementation to handle extension points for
-      the common support.
+1) A new `ReplEngine` is created to handle the standard REPL support with customizations.
+   1) The application uses its own implementation to handle extension points for the common
+      support.
2) A `CancellationTokenSource` is created and hooked up to cancel the REPL when
-   `CTRL-C is pressed` to allow normal expectations of termination for a command
-   line application.
+   `CTRL-C` is pressed to allow normal expectations of termination for a command line
+   application.
3) Information on the specific app is reported to the console
4) LLVM is initialized
   1) The native target is registered so that at least a Local JIT is workable
5) The REPL engine is run to do the work

### Initializing Ubiquity.NET.Llvm
-The underlying LLVM library requires initialization for it's internal data,
-furthermore Ubiquity.NET.Llvm must load the actual underlying library specific to
-the current system architecture. Thus, the Ubiquity.NET.Llvm as a whole requires
+The underlying LLVM library requires initialization for its internal data; furthermore,
+`Ubiquity.NET.Llvm` must load the actual underlying library specific to the current system
+and architecture (RID). Thus, the `Ubiquity.NET.Llvm` library as a whole requires
initialization.
``` C# @@ -40,27 +39,27 @@ using( InitializeLLVM() ) } ``` -The initialization returns an IDisposable so that the calling application can -shutdown/cleanup resources and potentially re-initialize for a different target, if -desired. This application only needs to generate one module and exit so it just -applies a standard C# `using` scope to ensure proper cleanup. +The initialization returns an `IDisposable` so that the calling application can +shutdown/cleanup resources and potentially re-initialize for a different target, if desired. +This application only needs to generate one module and exit so it just applies a standard C# +`using` scope to ensure proper cleanup. ### Initializing Targets -LLVM supports a number of target architectures, however for the Kaleidoscope -tutorials the only supported target is the one the host application is running on. -So, only the native target is registered. +LLVM supports a number of target architectures, however for the Kaleidoscope tutorials the +only supported target is the one the host application is running on. So, only the native +target is registered. ``` C# RegisterNative(); ``` ### Generator and REPL loop -This chapter supports the simple expressions of the language that are parsed and -generated to an LLVM [Value](xref:Ubiquity.NET.Llvm.Values.Value). This forms the -foundation of the Kaleidoscope samples outer generation loop. Subsequent, chapters -will focus on additional functionality including JIT compilation, Debugging -information, and Native Module generation. Processing the results for this chapter, -is pretty simple, it just prints out a textual form of the generated LLVM IR. +This chapter supports the simple expressions of the language that are parsed and generated +to an LLVM [Value](xref:Ubiquity.NET.Llvm.Values.Value). This forms the foundation of the +Kaleidoscope samples outer generation loop. 
Subsequent chapters will focus on additional
+functionality including JIT compilation, debugging information, and native module
+generation. Processing the results for this chapter is pretty simple: it just prints out a
+textual form of the generated LLVM IR.

[!code-csharp[Main](ReplEngine.cs)]

@@ -75,8 +74,8 @@ These are initialized in the constructor

[!code-csharp[Main](CodeGenerator.cs#Initialization)]

-The exact set of members varies for each chapter but the basic ideas remain across
-each chapter.
+The exact set of members varies for each chapter but the basic ideas remain across each
+chapter.

|Name | Description |
|-----|-------------|
@@ -87,85 +86,81 @@ each chapter.
| NamedValues | Contains a mapping of named variables to the generated [Value](xref:Ubiquity.NET.Llvm.Values.Value) |

### Generate Method
-The Generate method is used by the REPL loop to generate the final output from a
-parse tree. The common implementation simply passes the tree to the AST generating
-parse tree visitor to generate the AST and process the AST nodes from that. Due to
-the simplicity of the Kaleidoscope language the AST is more of a List than a tree.
-In fact, the AstBuilder creates an enumerable sequence of nodes that are either a
-function declaration or a function definition. For the interactive mode only a
-single element is parsed at a time. However, when doing Ahead of Time (AOT)
-compilation in [Chapter 8](xref:Kaleidoscope-ch8) this sequence can contain many
-declarations and definitions in any order. To handle the different node types the
-generate method simply uses pattern matching to detect the type of node to dispatch to a visitor
-function for that kind of node.
+The Generate method is used by the REPL loop to generate the final output from a parse tree.
+The common implementation simply passes the tree to the AST generating parse tree visitor to
+generate the AST and process the AST nodes from that. 
Due to the simplicity of the
+Kaleidoscope language the AST is more of a List than a tree. In fact, the AstBuilder creates
+an enumerable sequence of nodes that are either a function declaration or a function
+definition. For the interactive mode only a single element is parsed at a time. However,
+when doing Ahead of Time (AOT) compilation in [Chapter 8](xref:Kaleidoscope-ch8) this
+sequence can contain many declarations and definitions in any order. To handle the different
+node types, the generate method simply uses pattern matching to detect the type of node to
+dispatch to a visitor function for that kind of node.

[!code-csharp[Main](CodeGenerator.cs#Generate)]

### Function Declarations
-Function declarations don't actually generate any code. Instead they are captured
-and added to a collection of declarations used in validating subsequent function
-calls when generating the AST for function definitions.
+Function declarations don't actually generate any code. Instead, they are captured and added
+to a collection of declarations used in validating subsequent function calls when generating
+the AST for function definitions.

[!code-csharp[Main](../Kaleidoscope.Grammar/AST/Prototype.cs)]

### Function Definition
-Functions with bodies (e.g. not just a declaration to a function defined elsewhere) are handled via the
-VisitFunctionDefinition() Method.
+Functions with bodies (i.e., not just a declaration of a function defined elsewhere) are
+handled via the VisitFunctionDefinition() method.

[!code-csharp[Main](CodeGenerator.cs#FunctionDefinition)]

-VisitFunctionDefinition() simply extracts the function prototype from the AST node. A private utility
-method GetOrDeclareFunction() is used to get an existing function or declare a new
-one.
+VisitFunctionDefinition() simply extracts the function prototype from the AST node. A
+private utility method GetOrDeclareFunction() is used to get an existing function or declare
+a new one.
[!code-csharp[Main](CodeGenerator.cs#GetOrDeclareFunction)] -GetOrDeclareFunction() will first attempt to get an existing function and if found -returns that function. Otherwise it creates a function signature type then adds a -function to the module with the given name and signature and adds the parameter -names to the function. In LLVM the signature only contains type information and no -names, allowing for sharing the same signature for completely different functions. - -The function and the expression representing the body of the function is then used -to emit IR for the function. - -The generation verifies that the function is a declaration (e.g. does not have a -body) as Kaleidoscope doesn't support any sort of overloaded functions. - -The generation of a function starts by constructing a basic block for the entry -point of the function and attaches the InstructionBuilder to the end of that block. -(It's empty so it is technically at the beginning but placing it at the end it will -track the end position as new instructions are added so that each instruction added -will go on the end of the block). At this point there will only be the one block as -the language doesn't yet have support for control flow. (That is introduced in -[Chapter 5](xref:Kaleidoscope-ch5)) - -The NamedValues map is cleared and each of the parameters is mapped in the -NamedValues map to its argument value in IR. The body of the function is visited to -produce an LLVM Value. The visiting will, in turn add instructions, and possibly -new blocks, as needed to represent the body expression in proper execution order. - -If generating the body results in an error, then the function is removed from the -parent and the exception propagates up. This allows the user to define the function -again, if appropriate. - -Finally, a return instruction is applied to return the result of the expression -followed by a verification of the function to ensure internal consistency. 
-(Generally the verify is not used in production releases as it is an expensive
-operation to perform on every function. But when building up a language generator it
-is quite useful to detect errors early.)
+GetOrDeclareFunction() will first attempt to get an existing function and if found returns
+that function. Otherwise it creates a function signature type then adds a function to the
+module with the given name and signature and adds the parameter names to the function. In
+LLVM the signature only contains type information and no names, allowing for sharing the
+same signature for completely different functions.
+
+The function and the expression representing the body of the function are then used to emit
+IR for the function.
+
+The generation verifies that the function is a declaration (i.e., does not have a body) as
+Kaleidoscope doesn't support any sort of overloaded functions.
+
+The generation of a function starts by constructing a basic block for the entry point of the
+function and attaches the InstructionBuilder to the end of that block. (It's empty so it is
+technically at the beginning, but placing it at the end means it will track the end position
+as new instructions are added so that each instruction added will go on the end of the
+block). At this point there will only be the one block as the language doesn't yet have
+support for control flow. (That is introduced in [Chapter 5](xref:Kaleidoscope-ch5))
+
+The NamedValues map is cleared and each of the parameters is mapped in the NamedValues map
+to its argument value in IR. The body of the function is visited to produce an LLVM Value.
+The visiting will, in turn, add instructions, and possibly new blocks, as needed to represent
+the body expression in proper execution order.
+
+If generating the body results in an error, then the function is removed from the parent and
+the exception propagates up. This allows the user to define the function again, if
+appropriate.
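The remove-on-error flow can be illustrated without LLVM at all. Here a dictionary stands in for the module and strings stand in for functions; only the control-flow shape (create, try to emit the body, remove on failure, rethrow) mirrors the description above, not the Ubiquity.NET.Llvm API.

```csharp
using System;
using System.Collections.Generic;

// Dictionary standing in for the LLVM module; the try/catch shape mirrors the
// remove-on-error behavior described above, not any real Ubiquity.NET.Llvm call.
var module = new Dictionary<string, string>();

void DefineFunction(string name, Func<string> emitBody)
{
    module[name] = "<declared>"; // the function exists in the module before body emission
    try
    {
        module[name] = emitBody(); // body generation may throw (e.g. unknown variable)
    }
    catch
    {
        module.Remove(name); // eject the broken function so the user can redefine it
        throw;
    }
}

DefineFunction("good", () => "ret double 4.2");
try
{
    DefineFunction("bad", () => throw new InvalidOperationException("unknown variable"));
}
catch (InvalidOperationException)
{
    // the exception still propagates; caught here only to inspect the module state
}

Console.WriteLine($"good in module: {module.ContainsKey("good")}, bad in module: {module.ContainsKey("bad")}");
```

The key point is that the module never retains a half-built function, so a later redefinition starts from a clean slate.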
+
+Finally, a return instruction is applied to return the result of the expression followed by
+a verification of the function to ensure internal consistency. (Generally the verify is not
+used in production releases as it is an expensive operation to perform on every function.
+But when building up a language generator it is quite useful to detect errors early.)

#### Top Level Expression
-Top level expressions in Kaleidoscope are transformed into an anonymous function
-definition by the AstBuilder. Since this chapter is focused on generating the IR
-module there isn't any special handling needed for a top level expression - they
-are simply just another function definition. (JIT execution of the top level
-expression comes in the next chapter)
+Top level expressions in Kaleidoscope are transformed into an anonymous function definition
+by the AstBuilder. Since this chapter is focused on generating the IR module there isn't
+any special handling needed for top level expressions: they are simply another
+function definition. (JIT execution of the top level expression comes in the next chapter)

### Constant expression
-In Kaleidoscope all values are floating point and constants are represented in LLVM
-IR as [ConstantFP](xref:Ubiquity.NET.Llvm.Values.ConstantFP). The AST provides the
-value of the constant as a C# `double`.
+In Kaleidoscope all values are floating point and constants are represented in LLVM IR as
+[ConstantFP](xref:Ubiquity.NET.Llvm.Values.ConstantFP). The AST provides the value of the
+constant as a C# `double`.
It does ***NOT*** guarantee reference equality. (It used to in
-> older versions but no longer does as that led to subtle problems with ownership
-> and multi-threaded JIT)
+> The constant value is uniqued in LLVM so that multiple calls given the same input value
+> will produce the same LLVM Value. Ubiquity.NET.Llvm honors this via value equality tests.
+> It does ***NOT*** guarantee reference equality. (It used to in older versions but no
+> longer does as that led to subtle problems with ownership and multi-threaded JIT)

### Variable reference expression
-References to variables in Kaleidoscope, like most other languages, use a name. In
-this chapter the support of variables is rather simple. The Variable expression
-generator assumes the variable is declared somewhere else already and simply looks
-up the value from the private map. At this stage of the development of Kaleidoscope
-the only place where the named values are generated are function arguments, later
-chapters will introduce loop induction variables and variable assignment. The
-implementation uses a standard "Try/Get" pattern to retrieve the value or throw an
-exception if the variable doesn't exist.
+References to variables in Kaleidoscope, like most other languages, use a name. In this
+chapter the support of variables is rather simple. The Variable expression generator assumes
+the variable is declared somewhere else already and simply looks up the value from the
+private map. At this stage of the development of Kaleidoscope the only place where named
+values are generated is function arguments; later chapters will introduce loop induction
+variables and variable assignment. The implementation uses a standard "Try/Get" pattern to
+retrieve the value or throw an exception if the variable doesn't exist.

[!code-csharp[Main](CodeGenerator.cs#VariableReferenceExpression)]

### Binary Operator Expression
-Things start to get a good bit more interesting with binary operators. 
The AST node
-for an expression is a simple empty "tagging" interface. Since the interface also
-requires the IAstNode interface it contains support for walking the chain of
-operators that form an expression in left to right order, accounting for precedence.
+Things start to get a good bit more interesting with binary operators. The AST node for an
+expression is a simple empty "tagging" interface. Since the interface also requires the
+`IAstNode` interface it contains support for walking the chain of operators that form an
+expression in left to right order, accounting for precedence.

[!code-csharp[Main](../Kaleidoscope.Grammar/AST/IExpression.cs)]

-Generation of an expression consists a simple visitor method to emit the code for
-the operands and the actual operator.
+Generation of an expression consists of a simple visitor method to emit the code for the
+operands and the actual operator.

[!code-csharp[Main](CodeGenerator.cs#BinaryOperatorExpression)]

-The process of transforming the operator starts by generating an LLVM IR Value from
-the right-hand side expression. A simple switch statement based on the token type
-of the operator is used to generate the actual LLVM IR instruction(s) for the
-operator.
+The process of transforming the operator starts by generating an LLVM IR Value from the
+right-hand side expression. A simple switch statement based on the token type of the
+operator is used to generate the actual LLVM IR instruction(s) for the operator.

-LLVM has strict rules on the operators and their values for the IR, in particular
-the types of the operands must be identical and, usually must also match the type
-of the result. For the Kaleidoscope language that's easy to manage as it only
-supports one data type. Other languages might need to insert additional conversion
-logic as part of emitting the operators. 
(Kaleidoscope does this for boolean values
-when supporting conditional control flow in [Chapter 5](xref:Kaleidoscope-ch5))
+LLVM has strict rules on the operators and their values for the IR, in particular the types
+of the operands must be identical and usually must also match the type of the result. For
+the Kaleidoscope language that's easy to manage as it only supports one data type. Other
+languages might need to insert additional conversion logic as part of emitting the
+operators. (Kaleidoscope does this for boolean values when supporting conditional control
+flow in [Chapter 5](xref:Kaleidoscope-ch5))

The Generation of the IR instructions uses the current InstructionBuilder and the
[RegisterName](xref:Ubiquity.NET.Llvm.Values.ValueExtensions.RegisterName``1(``0,System.String))
extension method to provide a name for the result in LLVM IR. The name helps with
-readability of the IR when generated in the textual form of LLVM IR assembly. A
-nice feature of LLVM is that it will automatically handle duplicate names by
-appending an integral value to the name automatically so that generators don't
-need to keep track of the names to ensure uniqueness.
+readability of the IR when generated in the textual form of LLVM IR assembly. A nice feature
+of LLVM is that it will automatically handle duplicate names by appending an integral value
+to the name so that generators don't need to keep track of the names to ensure uniqueness.

-The `Less` operator uses a floating point `unordered less than` IR instruction
-followed by an integer to float cast to translate the LLVM IR i1 result into a
-floating point value needed by Kaleidoscope.
+The `Less` operator uses a floating point `unordered less than` IR instruction followed by
+an integer to float cast to translate the LLVM IR i1 result into a floating point value
+needed by Kaleidoscope. 
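As an illustrative sketch, that lowering for `Less` amounts to a compare/cast pair like this (value names are illustrative):

```llvm
%cmptmp = fcmp ult double %lhs, %rhs
%booltmp = uitofp i1 %cmptmp to double
```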
-The `^` operator for exponentiation uses the `llvm.pow.f64` intrinsic to perform
-the exponentiation a efficiently as the back-end generator can.
+The `^` operator for exponentiation uses the `llvm.pow.f64` intrinsic to perform the
+exponentiation as efficiently as the back-end generator can.

## Examples

diff --git a/src/Samples/Kaleidoscope/Chapter4/Kaleidoscope-ch4.md b/src/Samples/Kaleidoscope/Chapter4/Kaleidoscope-ch4.md
index 88331efb6..46713cce6 100644
--- a/src/Samples/Kaleidoscope/Chapter4/Kaleidoscope-ch4.md
+++ b/src/Samples/Kaleidoscope/Chapter4/Kaleidoscope-ch4.md
@@ -3,142 +3,134 @@ uid: Kaleidoscope-ch4
---
# 4. Kaleidoscope: Adding JIT
-At this point things generally re-converge with the official LLVM tutorials
-(optimization was already covered in the previous sub-chapter.)
+At this point things generally re-converge with the official LLVM tutorials (optimization
+was already covered in the previous sub-chapter.)

## Adding JIT Compilation
-Now that the code generation produces optimized code, it is time to get to the fun
-part - executing code! The basic idea is to allow the user to type in the
-Kaleidoscope code as supported thus far and it will execute to produce a result.
-Unlike the previous chapters, instead of just printing out the LLVM IR
-representation of a top level expression this sample will execute the code and
-provide the results back to the user! (Like a real language/Tool should!)
+Now that the code generation produces optimized code, it is time to get to the fun part -
+executing code! The basic idea is to allow the user to type in the Kaleidoscope code as
+supported thus far and it will execute to produce a result. Unlike the previous chapters,
+instead of just printing out the LLVM IR representation of a top level expression this
+sample will execute the code and provide the results back to the user! (Like a real
+language/tool should!) 
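For example, a session might look something like the following (the prompt and the exact output formatting shown here are illustrative, not the sample's literal output):

```Kaleidoscope
ready> def twice(x) x * 2;
ready> twice(21);
42
```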
### Main Driver -The changes needed to the main driver are pretty simple, mostly consisting of -removing a couple lines of code that print out the LLVM IR for the module at the -end when defined. The code already supported showing the results if it was a -floating point value by checking if the generated value is a -[ConstantFP](xref:Ubiquity.NET.Llvm.Values.ConstantFP). We'll see a bit later on -why that is a ConstantFP value. +The changes needed to the main driver are pretty simple, mostly consisting of removing a +couple lines of code that print out the LLVM IR for the module at the end when defined. +The code already supported showing the results if it was a floating point value by checking +if the generated value is a [ConstantFP](xref:Ubiquity.NET.Llvm.Values.ConstantFP). We'll +see a bit later on why that is a ConstantFP value. ### Code Generator -The code generation needs an update to support using a JIT engine to generate and -execute the Kaleidoscope code provided by the user. +The code generation needs an update to support using a JIT engine to generate and execute +the Kaleidoscope code provided by the user. #### Generator fields -To begin with, the generator needs some additional members, including the JIT -engine. +To begin with, the generator needs some additional members, including the JIT engine. [!code-csharp[PrivateMembers](CodeGenerator.cs#PrivateMembers)] -The JIT engine is retained for the generator to use. The same engine is held for -the lifetime of the generator so that functions are added to the same engine and -can call functions previously added. The JIT provides a 'tracker' for every module -added, which is used to reference the module in the JIT, this is normally used to -remove the module from the JIT engine when re-defining a function. Thus, a map of -the function names and the JIT tracker created for them is maintained. 
Additionally,
-a collection of defined function prototypes is retained to enable matching a
-function call to a previously defined function. Since the JIT support uses a module
-per function approach, lookups on the current module aren't sufficient.
+The JIT engine is retained for the generator to use. The same engine is held for the
+lifetime of the generator so that functions are added to the same engine and can call
+functions previously added. The JIT provides a 'tracker' for every module added, which is
+used to reference the module in the JIT; this is normally used to remove the module from the
+JIT engine when re-defining a function. Thus, a map of the function names and the JIT
+tracker created for them is maintained. Additionally, a collection of defined function
+prototypes is retained to enable matching a function call to a previously defined function.
+Since the JIT support uses a module per function approach, lookups on the current module
+aren't sufficient.

The JIT engine uses a
[ThreadSafeContext](xref:Ubiquity.NET.Llvm.OrcJITv2.ThreadSafeContext)
and [ThreadSafeModule](xref:Ubiquity.NET.Llvm.OrcJITv2.ThreadSafeModule) to manage
-callbacks and materialization in the JIT while supporting multiple threads of
-execution. Thus the context type for all modules and generation options needs the
-new type.
+callbacks and materialization in the JIT while supporting multiple threads of execution.
+Thus the context type for all modules and generation options needs the new type.

-As described previously the names of functions the module is generated for is held
-in a dictionary with the [ResourceTracker](xref:Ubiquity.NET.Llvm.OrcJITv2.ResourceTracker)
-for that module to ensure it is 'removable'.
+As described previously the names of the functions the module is generated for are held in a
+dictionary with the [ResourceTracker](xref:Ubiquity.NET.Llvm.OrcJITv2.ResourceTracker) for
+that module to ensure it is 'removable'. 
#### Generator initialization The initialization of the generator requires updating to support the new members. [!code-csharp[Initialization](CodeGenerator.cs#Initialization)] -In particular, the static output writer is set for the JIT to use whatever writer -was provided. Normally, this is the system console but for testing it can be any -standard `TextWriter`. Then the [ThreadSafeContext](xref:Ubiquity.NET.Llvm.OrcJITv2.ThreadSafeContext) +In particular, the static output writer is set for the JIT to use whatever writer was +provided. Normally, this is the system console but for testing it can be anything derived +from `TextWriter`. Then the [ThreadSafeContext](xref:Ubiquity.NET.Llvm.OrcJITv2.ThreadSafeContext) is created for the generator and used to create the instruction builder. #### JIT Engine -The JIT engine itself is a class provided in the Kaleidoscope.Runtime library -that wraps a Ubiquity.NET.Llvm OrcJIT engine. It is NOT derived from that class as -a JIT engine is created using a "Builder" or factory pattern. So it is not possible -to create a derived type using a builder. +The JIT engine itself is a class provided in the `Kaleidoscope.Runtime` library that wraps a +Ubiquity.NET.Llvm OrcJIT engine. It is NOT derived from that class as a JIT engine is +created using a "Builder" or factory pattern. So it is not possible to create a derived type +using a builder. [!code-csharp[Kaleidoscope JIT](../../../Samples/Kaleidoscope/Kaleidoscope.Runtime/KaleidoscopeJIT.cs)] -[LLJit](xref:Ubiquity.NET.Llvm.OrcJITv2.LLJit) provides support for declaring -functions that are external to the JIT that the JIT'd module code can call ( -Absolutes). For Kaleidoscope, two such functions are defined directly in -`KaleidoscopeJIT` (`putchard` and `printd`), which is consistent with the same -functions used in the official LLVM C++ tutorial. Thus, allowing sharing of samples -between the two. These functions are used to provide rudimentary console output -support. 
+[LLJit](xref:Ubiquity.NET.Llvm.OrcJITv2.LLJit) provides support for declaring functions that
+are external to the JIT that the JIT'd module code can call (Absolutes). For Kaleidoscope,
+two such functions are defined directly in `KaleidoscopeJIT` (`putchard` and `printd`),
+which is consistent with the same functions used in the official LLVM C++ tutorial, thus
+allowing sharing of samples between the two. These functions are used to provide rudimentary
+console output support.

> [!WARNING]
-> All such methods implemented in .NET must block any exception from bubbling out
-> of the call as the JIT engine doesn't know anything about them and neither does
-> the Kaleidoscope language. Exceptions thrown in these functions would produce
-> undefined results, at best - probably crashing the application.
+> All such methods implemented in .NET must block any exception from bubbling out of the
+> call as the JIT engine doesn't know anything about them and neither does the Kaleidoscope
+> language. Exceptions thrown in these functions would produce undefined results, at best -
+> probably crashing the application.

#### Generator Dispose
-Since the JIT engine is disposable, the code generators Dispose() method must now
-call the Dispose() method on the JIT engine.
+Since the JIT engine is disposable, the code generator's Dispose() method must now call the
+Dispose() method on the JIT engine.

[!code-csharp[Dispose](CodeGenerator.cs#Dispose)]

#### Generate Method
-To actually execute the code the generated modules are added to the JIT. If the
-function is an anonymous top level expression, it is eagerly compiled and a
-delegate is retrieved from the JIT to allow calling the compiled function directly.
-The delegate is then called to get the result. Once an anonymous function produces
-a value, it is no longer used so is removed from the JIT and the result value
-returned. For other functions the module is added to the JIT and the function is
-returned. 
- -For named function definitions, the module is lazy added to the JIT as it isn't -known if/when the function is called. The JIT engine will compile modules lazy -added into native code on first use. Though, if the function is never used, then -creating the IR module was wasted. ([Chapter 7.1](xref:Kaleidoscope-ch7.1) has a -solution for even that extra overhead - truly lazy JIT). Since Kaleidoscope is -generally a dynamic language it is possible and reasonable for the user to -re-define a function (to fix an error, or provide a completely different -implementation all together). Therefore, any named functions are removed from the -JIT, if they existed, before adding in the new definition. Otherwise the JIT -resolver would still resolve to the previously compiled instance. +To actually execute the code the generated modules are added to the JIT. If the function is +an anonymous top level expression, it is eagerly compiled and a delegate is retrieved from +the JIT to allow calling the compiled function directly. The delegate is then called to get +the result. Once an anonymous function produces a value, it is no longer used so is removed +from the JIT and the result value returned. For other functions the module is added to the +JIT and the function is returned. + +For named function definitions, the module is lazy added to the JIT as it isn't known +if/when the function is called. The JIT engine will compile modules lazy added into native +code on first use. Though, if the function is never used, then creating the IR module was +wasted. ([Chapter 7.1](xref:Kaleidoscope-ch7.1) has a solution for even that extra overhead - +truly lazy JIT). Since Kaleidoscope is generally a dynamic language it is possible and +reasonable for the user to re-define a function (to fix an error, or provide a completely +different implementation all together). Therefore, any named functions are removed from the +JIT, if they existed, before adding in the new definition. 
Otherwise the JIT resolver would
+still resolve to the previously compiled instance.

[!code-csharp[Generate](CodeGenerator.cs#Generate)]

-Keeping all the JIT interaction in the generate method isolates the rest of the
-generation from any awareness of the JIT. This will help when adding truly lazy JIT
-compilation in [Chapter 7.1](xref:Kaleidoscope-ch7.1) and AOT compilation in
-[Chapter 8](xref:Kaleidoscope-ch8)
+Keeping all the JIT interaction in the generate method isolates the rest of the generation
+from any awareness of the JIT. This will help when adding truly lazy JIT compilation in
+[Chapter 7.1](xref:Kaleidoscope-ch7.1) and AOT compilation in [Chapter 8](xref:Kaleidoscope-ch8).

#### GetOrDeclareFunction()
Next is to update the GetOrDeclareFunction() to handle the new support for
-[ThreadSafeContext](xref:Ubiquity.NET.Llvm.OrcJITv2.ThreadSafeContext) and a sanity
-check for the nullability of a module.
+[ThreadSafeContext](xref:Ubiquity.NET.Llvm.OrcJITv2.ThreadSafeContext) and a sanity check
+for the nullability of a module.

[!code-csharp[Main](CodeGenerator.cs#GetOrDeclareFunction)]

#### Function Definitions
-Visiting a function definition needs to remove the previously added manual step of
-running the optimization passes. That is now handled by the Kaleidoscope JIT as a
-transformation layer. Before final target code generation is performed on a symbol
-the transforms are run to perform any modifications desired. This makes the
-optimization process a lazy operation as well as the final target machine native
-code generation. The JIT is setup with a default pass pipeline that is roughly
-equivalent to a the Clang compiler with 'O3' optimizations.
+Visiting a function definition needs to remove the previously added manual step of running
+the optimization passes. That is now handled by the Kaleidoscope JIT as a transformation
+layer. Before final target code generation is performed on a symbol the transforms are run
+to perform any modifications desired. 
This makes the optimization process a lazy operation
+as well as the final target machine native code generation. The JIT is set up with a default
+pass pipeline that is roughly equivalent to the Clang compiler with 'O3' optimizations.

[!code-csharp[Main](CodeGenerator.cs#FunctionDefinition)]

## Conclusion
-While the amount of words needed to describe the changes to support JIT execution
-here isn't exactly small, the actual code changes required really are. The Parser
-and JIT engine do all the heavy lifting. Ubiquity.NET.Llvm.JIT provides a clean
-interface to the underlying LLVM OrcJIT v2 that fits with common patterns and
-runtime support for .NET. Very cool, indeed! :nerd_face:
+While the amount of words needed to describe the changes to support JIT execution here isn't
+exactly small, the actual code changes required really are. The Parser and JIT engine do all
+the heavy lifting. Ubiquity.NET.Llvm.JIT provides a clean interface to the underlying LLVM
+OrcJIT v2 that fits with common patterns and runtime support for .NET. Very cool, indeed!
+:nerd_face:

diff --git a/src/Samples/Kaleidoscope/Chapter5/Kaleidoscope-ch5.md b/src/Samples/Kaleidoscope/Chapter5/Kaleidoscope-ch5.md
index 776764efd..63f57d07d 100644
--- a/src/Samples/Kaleidoscope/Chapter5/Kaleidoscope-ch5.md
+++ b/src/Samples/Kaleidoscope/Chapter5/Kaleidoscope-ch5.md
@@ -3,18 +3,18 @@ uid: Kaleidoscope-ch5
---
# 5. Kaleidoscope: Control Flow
-This chapter focuses on adding the support necessary to implement the if-then-else
-and for loop control flow support in the Kaleidoscope language. Without some sort
-of control flow the Kaleidoscope language is not particularly useful. So, this
-chapter completes the core language support to make it a usable language.
+This chapter focuses on adding the support necessary to implement the if-then-else and for
+loop control flow support in the Kaleidoscope language. Without some sort of control flow
+the Kaleidoscope language is not particularly useful. 
So, this chapter completes the core +language support to make it a usable language. ## if-then-else -It is worth re-visiting the discussion of the intended syntax and semantics for -conditional flow in [Chapter 2](xref:Kaleidoscope-ch2#conditionalexpression). This -will help in understanding the language functionality to implement. +It is worth re-visiting the discussion of the intended syntax and semantics for conditional +flow in [Chapter 2](xref:Kaleidoscope-ch2#conditionalexpression). This will help in +understanding the language functionality to implement. -The ultimate goal of the changes to support code generation for control flow -constructs is to transform Kaleidoscope code such as: +The ultimate goal of the changes to support code generation for control flow constructs is +to transform Kaleidoscope code such as: ```Kaleidoscope extern foo(); @@ -47,30 +47,28 @@ ifcont: ; preds = %else, %then } ``` -The entry code will convert the input x into an llvm i1 value to use as the -condition for a branch. This is done by comparing the input value of x to 0.0 to -get the condition boolean value. Then the condition is used to branch to either -the 'then' block or the 'else' block. The two target blocks contain the generated -code for the expressions for each part of the conditional and a final branch to a -continuation block. - -Since the code branch could flow into the continuation block from either the 'else' -or 'end' blocks a phi instruction is placed at the beginning of the continuation -block with appropriate values for the result from each of the two predecessor -blocks. The resulting value is then provided as the return of the function. It is -important to note that using the phi node in this fashion does not require -generating all of the code in SSA form. In fact, doing that in the front end is -strongly discouraged. 
Generally speaking, there are two primary reasons where a phi
-node may crop up:
+The entry code will convert the input x into an llvm i1 value to use as the condition for a
+branch. This is done by comparing the input value of x to 0.0 to get the condition boolean
+value. Then the condition is used to branch to either the 'then' block or the 'else' block.
+The two target blocks contain the generated code for the expressions for each part of the
+conditional and a final branch to a continuation block.
+
+Since the code branch could flow into the continuation block from either the 'else' or 'then'
+blocks a phi instruction is placed at the beginning of the continuation block with
+appropriate values for the result from each of the two predecessor blocks. The resulting
+value is then provided as the return of the function. It is important to note that using the
+phi node in this fashion does not require generating all of the code in SSA form. In fact,
+doing that in the front end is strongly discouraged. Generally speaking, there are two
+primary reasons why a phi node may crop up:

1. Mutable variables like x = 1; x = x + 1;
2. Values that are part of the structure of the language (usually for control flow)

-[Chapter 7](xref:Kaleidoscope-ch7) Covers the mutable variables case in detail and
-the techniques for generating the code without using a phi node. For cases like
-this one where it is straight forward and easy to insert the phi node directly then
-there's no reason not to. Though, the solution provided in Chapter 7 can, and does,
-eliminate the need to manually insert the phi node here as well.
+[Chapter 7](xref:Kaleidoscope-ch7) covers the mutable variables case in detail and the
+techniques for generating the code without using a phi node. For cases like this one, where
+it is straightforward and easy to insert the phi node directly, there's no reason not
+to. 
Though, the solution provided in Chapter 7 can, and does, eliminate the need to manually +insert the phi node here as well. ### Code Generation Generating the code for the condition expression follows the pattern shown above @@ -84,38 +82,36 @@ with the following high level steps: 6. Emit conditional branch to the then, else blocks 7. Switch to the then expression block 8. Emit code for the then expression -9. Capture the insertion block location as generating the then expression may add - new blocks. +9. Capture the insertion block location as generating the then expression may add new blocks. 10. Emit a branch to the if continuation block 11. Switch to the else block 12. Emit code for the else expression 13. Emit a branch to the if continuation block -14. Capture the insertion block location as generating the else expression may add - new blocks +14. Capture the insertion block location as generating the else expression may add new + blocks 15. Switch to the if continuation block -16. Emit phi node with the results of the insertion blocks and result values - captured after generating each of the sub expressions +16. Emit phi node with the results of the insertion blocks and result values captured after + generating each of the sub expressions. 17. Use the result... -That's a bit more complex than the other language constructs seen so far, but is -still pretty straight forward once you get the general gist of how LLVM IR works. -There's one extra trick repeated in steps 9 and again in 14, where after generating -the IR for the sub expressions, the current block insertion point is captured. This -is needed as the generation for the sub expression may include another conditional -expression, which may contain a conditional sub expression, ... Thus, the 'current -block' may well have changed from the starting block. 
The phi node needs the
-immediate predecessor block and the value it produced, so the current block is
-captured after generation, before switching the block to the next one for
-generation to ensure that the correct block is used with the value.
+That's a bit more complex than the other language constructs seen so far, but is still
+pretty straightforward once you get the general gist of how LLVM IR works. There's one
+extra trick repeated in steps 9 and again in 14, where after generating the IR for the sub
+expressions, the current block insertion point is captured. This is needed as the generation
+for the sub expression may include another conditional expression, which may contain a
+conditional sub expression, ... Thus, the 'current block' may well have changed from the
+starting block. The phi node needs the immediate predecessor block and the value it produced,
+so the current block is captured after generation, before switching the block to the next
+one for generation to ensure that the correct block is used with the value.

-The actual code follows the description pretty closely and should now be fairly
-easy to follow:
+The actual code follows the description pretty closely and should now be fairly easy to
+follow:

[!code-csharp[ConditionalExpression](CodeGenerator.cs#ConditionalExpression)]

## For Loop
-Now that the basics of control flow are available it is possible to leverage the
-same concepts to implement the for loop constructs for the language.
+Now that the basics of control flow are available it is possible to leverage the same
+concepts to implement the for loop constructs for the language.

The general idea is to transform the loops in Kaleidoscope such as this:

@@ -158,34 +154,31 @@ afterloop: ; preds = %loop
}
```

-Thus, the basic pattern to generate the for loop code consists of the following
-steps:
+Thus, the basic pattern to generate the for loop code consists of the following steps:

1. Create block for loop header
2. 
Switch to the loop header block
-2. Emit code to Initialize start value with starting value from initialization
- expression.
+2. Emit code to initialize the start value from the initialization expression.
4. Create block for the loop body
5. Create block for the loop end
3. Emit unconditional branch to the loop body
4. Switch to the loop body block
-7. Emit phi node for the loop value with the loop header block and initial value
- as first predecessor.
+7. Emit phi node for the loop value with the loop header block and initial value as first
+ predecessor.
9. Push a new scope for named values as the loop body represents a new scope
10. Add the variable for the loop to the current scope
11. Emit the body expression, which may create new blocks
12. Emit the code to compute the next value (e.g. next = current + step )
13. Emit code for the end condition
-14. Emit code to convert the result of the condition to an LLVM i1 for a
- conditional branch
+14. Emit code to convert the result of the condition to an LLVM i1 for a conditional branch.
15. Capture loop end block for PHI node
16. Create after loop block
-17. Emit conditional branch to the loop body block or after loop block depending
- on the result of the end condition
-18. Add an incoming predecessor to the phi node at the beginning of the loop body
- for the next loop value and the loop end block it comes from.
-19. Switch to after block
-20. Create constant value of 0.0 as the result expression of the for loop
+17. Emit conditional branch to the loop body block or after loop block depending on the
+ result of the end condition.
+18. Add an incoming predecessor to the phi node at the beginning of the loop body for the
+ next loop value and the loop end block it comes from.
+19. Switch to after block.
+20. Create constant value of 0.0 as the result expression of the for loop. 
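Steps 7 and 18 combine to build a two-predecessor phi for the loop variable; in the IR listing above it takes a shape roughly like this (block and value names are illustrative):

```llvm
loop:
  %i = phi double [ 1.000000e+00, %loopheader ], [ %nextvar, %loopend ]
```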

That's a few more steps than even the if-then-else but the basic concepts of blocks,
conditional branches and direct phi-nodes remain the same.

The code to generate a for loop follows this pattern pretty closely.

[!code-csharp[Main](CodeGenerator.cs#ForInExpression)]

-The only new functionality in that is the use of the ScopeStack class to support
-nested scopes and the named variables within them. ScopeStack is provided in the
-Ubiquity.NET.Runtime.Utils library. It is basically a stack of name to value
-mapping dictionaries. The EnterScope method will push a new dictionary on to the
-stack and return an IDisposable that will handle popping it back off. This allows
-for nested expressions to use variables in the parent scope and to override them
-with its own value too. That, is the symbols available in a loop include the loop
-variable and any variables in the parent scope, all the way back to the function
-parameters. The stack nature allows for deeper scopes to shadow the variable of the
-same name in the parent, while allowing access to all other variables from other
-scopes.
+The only new functionality in that is the use of the ScopeStack class to support nested
+scopes and the named variables within them. ScopeStack is provided in the
+`Ubiquity.NET.Runtime.Utils` library. It is basically a stack of name to value mapping
+dictionaries. The EnterScope method will push a new dictionary onto the stack and return an
+`IDisposable` that will handle popping it back off. This allows for nested expressions to
+use variables in the parent scope and to override them with their own values too. That is,
+the symbols available in a loop include the loop variable and any variables in the parent
+scope, all the way back to the function parameters. The stack nature allows for deeper
+scopes to shadow the variable of the same name in the parent, while allowing access to all
+other variables from other scopes. 
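A small example of that shadowing behavior in Kaleidoscope itself (a hypothetical snippet; `putchard` is the console output helper mentioned in Chapter 4):

```Kaleidoscope
# The loop variable x shadows the parameter x for the duration of the loop body;
# the parameter becomes visible again once the loop's scope is popped.
def shadow(x)
  for x = 1, x < 3 in
    putchard(x + 64)
```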
## Conclusion -Control flow is certainly more complex to generate than any of the other language -constructs but it relies on a few basic primitive building block patterns. Thus, it -is fairly easy to understand and implement once the basic patterns are understood. -With the inclusion of control flow the Kaleidoscope language is now a complete, -albeit simplistic, functional language. +Control flow is certainly more complex to generate than any of the other language constructs +but it relies on a few basic primitive building block patterns. Thus, it is fairly easy to +understand and implement once the basic patterns are understood. With the inclusion of +control flow the Kaleidoscope language is now a complete, albeit simplistic, functional +language. diff --git a/src/Samples/Kaleidoscope/Chapter6/Kaleidoscope-ch6.md b/src/Samples/Kaleidoscope/Chapter6/Kaleidoscope-ch6.md index 82446e02d..86a8627fe 100644 --- a/src/Samples/Kaleidoscope/Chapter6/Kaleidoscope-ch6.md +++ b/src/Samples/Kaleidoscope/Chapter6/Kaleidoscope-ch6.md @@ -3,38 +3,41 @@ uid: Kaleidoscope-ch6 --- # 6. Kaleidoscope: User Defined Operators -At this point in the progression of the tutorial, Kaleidoscope is a fully functional, albeit fairly minimal, -language. Thus far, the tutorial has avoided details of the parsing. One of the benefits of using a tool -like ANTLR4 is that you can accomplish a lot without needing to spend a lot of time thinking about the -parser too much. With user defined operators we'll break that and get down and dirty with the parser a bit -to make the operators work, in particular implementing user defined precedence. +At this point in the progression of the tutorial, Kaleidoscope is a fully functional, albeit +fairly minimal, language. Thus far, the tutorial has avoided details of the parsing. One of +the benefits of using a tool like ANTLR4 is that you can accomplish a lot without needing to +spend a lot of time thinking about the parser too much. 
With user defined operators we'll +break that and get down and dirty with the parser a bit to make the operators work, in +particular implementing user defined precedence. > [!TIP] -> The actual value of user defined operator precedence in a language is a bit debatable, and the -> initial plan for the Ubiquity.NET.Llvm tutorials was to skip this chapter as it doesn't involve any new -> LLVM IR or code generation. After the code was done to get the other chapters working - this one -> was still nagging, begging really, for a solution. The challenge to come up with a good solution -> was ultimately too tempting to resist, and we now have a full implementation with a few useful -> extensions on top! (Exponent operator '^', '=' vs '==', '++', and '--') +> The actual value of user defined operator precedence in a language is a bit debatable, and +> the initial plan for the Ubiquity.NET.Llvm tutorials was to skip this chapter as it +> doesn't involve any new LLVM IR or code generation. After the code was done to get the +> other chapters working - this one was still nagging, begging really, for a solution. The +> challenge to come up with a good solution was ultimately too tempting to resist, and we +> now have a full implementation with a few useful extensions on top! (Exponent operator '^', +> '=' vs '==', '++', and '--') ## General idea of user defined operators -User defined operators in Kaleidoscope are a bit unique. Unlike C++ and other similar languages, the -precedence of the user defined operators in Kaleidoscope are not fixed. Though, the built-in operators -all use a fixed precedence. That poses some interesting challenges for a parser as it must dynamically -adapt to the state of the language runtime as it is parsing so that it can correctly evaluate the operator -expressions. - -Making that work while using ANTLR requires looking under the hood to how ANTLR4 ordinarily handles -precedence.
A full treatise on the subject is outside the scope of this tutorial, but the -[ANTLR GitHub site](https://github.com/antlr/antlr4/blob/master/doc/left-recursion.md) -has a good description of the details of the precedence climbing approach used in ANTLR. The general idea -is that the expression rule takes an additional precedence argument and the operator expressions include -a semantic predicate that tests the current precedence level. If the current level is greater than or equal -to the current level then that operator rule expression is allowed to match the input. Otherwise, the rule -is skipped. Usually this is all hidden by the implicit support for precedence climbing and left recursion -that is built-in to ANTLR4. However, that requires fixing the precedence for operators in the grammar. -Thus, Kaleidoscope doesn't use the default left-recursion support, but does use the same concepts with -custom semantic predicates in the code behind. +User defined operators in Kaleidoscope are a bit unique. Unlike C++ and other similar +languages, the precedence of the user defined operators in Kaleidoscope is not fixed. +The built-in operators, though, all use a fixed precedence. That poses some interesting +challenges for a parser as it must dynamically adapt to the state of the language runtime as +it is parsing so that it can correctly evaluate the operator expressions. + +Making that work while using ANTLR requires looking under the hood to how ANTLR4 ordinarily +handles precedence. A full treatise on the subject is outside the scope of this tutorial, +but the [ANTLR GitHub site](https://github.com/antlr/antlr4/blob/master/doc/left-recursion.md) +has a good description of the details of the precedence climbing approach used in ANTLR. The +general idea is that the expression rule takes an additional precedence argument and the +operator expressions include a semantic predicate that tests the current precedence level.
+If the operator's precedence is greater than or equal to the current precedence level then +that operator rule expression is allowed to match the input. Otherwise, the rule is skipped. Usually this is +all hidden by the implicit support for precedence climbing and left recursion that is +built-in to ANTLR4. However, that requires fixing the precedence for operators in the +grammar. Thus, Kaleidoscope doesn't use the default left-recursion support, but does use the +same concepts with custom semantic predicates in the code behind. ``` antlr // pull the initializer out to a distinct rule so it is easier to get at @@ -65,9 +68,10 @@ expression[int _p] ``` Two custom functions are used to handle the dynamic runtime defined nature of the precedence. -1. GetPrecedence() used in the semantic predicate determines the precedence of the operator for -the current rule -2. GetNextPrecedence() is used to determine the next higher level of precedence for any child expressions +1. GetPrecedence() used in the semantic predicate determines the precedence of the operator + for the current rule. +2. GetNextPrecedence() is used to determine the next higher level of precedence for any + child expressions. These are implemented in the partial class extension of the parser: ``` C# @@ -82,15 +86,16 @@ private int GetNextPrecedence( ) } ```
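The same table-plus-predicate scheme can be sketched outside of ANTLR entirely. In this Python sketch (hypothetical names; the real code defers to DynamicRuntimeState), `parse_expr` carries the precedence argument, the `>=` test plays the role of the semantic predicate, and right-associativity falls out of `get_next_precedence` returning the same level instead of the next higher one:

``` python
# token -> (precedence, right_associative); starts with the built-in operators
operators = {'<': (10, False), '+': (20, False), '-': (20, False), '*': (40, False)}

def get_precedence(token):
    return operators[token][0] if token in operators else -1

def get_next_precedence(token):
    prec, right_assoc = operators[token]
    return prec if right_assoc else prec + 1

def parse_expr(tokens, pos=0, min_prec=0):
    """Parse tokens[pos:] into nested tuples, honoring the operator table."""
    lhs = tokens[pos]                       # primary expression (a bare atom here)
    pos += 1
    # semantic predicate: only match operators at or above the current level
    while pos < len(tokens) and get_precedence(tokens[pos]) >= min_prec:
        op = tokens[pos]
        rhs, pos = parse_expr(tokens, pos + 1, get_next_precedence(op))
        lhs = (op, lhs, rhs)
    return lhs, pos
```

Because the table is consulted on every operator token, registering a user operator at runtime (say, a right-associative `'^'` at level 50) immediately changes how subsequent input parses.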
-Up until now, this state was only used to determine the language features to enable. With dynamic precedence -for user operators, the state maintains a pair of tables of operator information that includes the symbol -for the operator and precedence: +These two functions use the current input state to identify the actual operator token. +`GetPrecedence()` does a look-ahead by one token to determine what the precedence for the +operator is. The rest of the rule is only executed if the precedence is greater than or +equal to the current precedence. The right hand side matches expressions of a higher +precedence by doing a look-behind one token to get the next precedence level. The custom +parser functions are pretty small since they defer the real work to the GlobalState instance +provided when constructing the parser. The state is an instance of the DynamicRuntimeState +class. Up until now, this state was only used to determine the language features to enable. +With dynamic precedence for user operators, the state maintains a pair of tables of operator +information that includes the symbol for the operator and precedence: ``` C# private OperatorInfoCollection UnaryOps = new OperatorInfoCollection( ); @@ -107,10 +112,11 @@ private OperatorInfoCollection BinOpPrecedence = new OperatorInfoCollection( ) }; ``` -The tables are used to determine the precedence for an operator and what the next precedence should be. -They start out with the built-in binary operators. (Kaleidoscope doesn't define any unary operators so -that table starts empty) The GetPrededence() and GetNextPrecedence() functions lookup the operators token -in the table to determine the operators associativity and its precedence. +The tables are used to determine the precedence for an operator and what the next precedence +should be. They start out with the built-in binary operators. 
(Kaleidoscope doesn't define +any unary operators so that table starts empty.) The GetPrecedence() and GetNextPrecedence() +functions look up the operator's token in the table to determine the operator's associativity +and its precedence. ``` C# public OperatorInfo GetBinOperatorInfo( int tokenType ) @@ -148,57 +154,65 @@ internal int GetNextPrecedence( int tokenType ) } ``` -This provides the core ability for looking up and handling precedence. Though, as shown so far, it is just -a rather convoluted form of what ANTLR4 gives us for free. The real point of this runtime state is the -ability of the language to _dynamically add user operators_. By adding operators to the runtime state the -lookup process will include them during parsing. +This provides the core ability for looking up and handling precedence. Though, as shown so +far, it is just a rather convoluted form of what ANTLR4 gives us for free. The real point of +this runtime state is the ability of the language to _dynamically add user operators_. By +adding operators to the runtime state the lookup process will include them during parsing. -Actually adding the operators to the table is handled in the parsing process itself using a feature of the -ANTLR generated parser called a "Parse Listener". A parse listener is attached to the parser and effectively -monitors the entire parsing process. For the user operators, the listener will listen for the specific case -of a complete definition of a user operator. When it detects such a case it will update the runtime table to -include the operator. +Actually adding the operators to the table is handled in the parsing process itself using a +feature of the ANTLR generated parser called a "Parse Listener". A parse listener is +attached to the parser and effectively monitors the entire parsing process. For the user +operators, the listener will listen for the specific case of a complete definition of a user +operator.
When it detects such a case it will update the runtime table to include the +operator. [!code-csharp[UserOperatorListener](../Kaleidoscope.Grammar/KaleidoscopeUserOperatorListener.cs)] -With the use of the listener the dynamic precedence is contained entirely in the parser. When the parse tree is -processed to produce the AST the user defined operators are transformed to simple function declarations and -function calls. This simplification allows consumers of the AST to remain blissfully ignorant of the issue of -precedence and even the existence of user defined operators. +With the use of the listener the dynamic precedence is contained entirely in the parser. +When the parse tree is processed to produce the AST the user defined operators are +transformed to simple function declarations and function calls. This simplification allows +consumers of the AST to remain blissfully ignorant of the issue of precedence and even the +existence of user defined operators. -Ordinarily it is best to design parsers without any sort of context or feedback mechanisms to keep them easier -to maintain. However, the language design of the Kaleidoscope language requires some level of feedback so that -the parser can handle precedence correctly. Using the state, listener and semantic predicates keeps the roles -of each part clear while allowing for completely isolating the feedback to the parsing alone. +Ordinarily it is best to design parsers without any sort of context or feedback mechanisms +to keep them easier to maintain. However, the language design of the Kaleidoscope language +requires some level of feedback so that the parser can handle precedence correctly. Using +the state, listener and semantic predicates keeps the roles of each part clear while +allowing for completely isolating the feedback to the parsing alone. 
### AST -When building the AST, Prototypes for user defined operators are transformed to a FunctionDeclaration +When building the AST, Prototypes for user defined operators are transformed to a +`FunctionDeclaration` + [!code-csharp[UserOperatorPrototypes](../Kaleidoscope.Grammar/AST/AstBuilder.cs#UserOperatorPrototypes)] -During construction of the AST all occurrences of a user defined operator expression are transformed into a function -call for the function that actually implements the behavior for the operator. +During construction of the AST all occurrences of a user defined operator expression are +transformed into a function call for the function that actually implements the behavior for +the operator. [!code-csharp[UserBinaryOpExpression](../Kaleidoscope.Grammar/AST/AstBuilder.cs#UserBinaryOpExpression)] [!code-csharp[UnaryOpExpression](../Kaleidoscope.Grammar/AST/AstBuilder.cs#UnaryOpExpression)] -Thus, after AST transformation completes, the user defined operators, and any issues of precedence no longer -exist! +Thus, after AST transformation completes, the user defined operators, and any issues of +precedence no longer exist! ### CodeGen and Driver -If you compare the code generation and driver code between Chapter 5 and Chapter 6 you'll see the only differences -are the change of namespace and the language level setting - it got a bump (Literally a single enum on one line of -each component). Everything else is identical. This is because the real work is on the parser and AST not the code -generation. This is where having a good parser + AST model can help keep the code generation simpler. If the parse -tree alone was used, then the code generation would need additional support similar to what is found in the AST -generation. Putting it into the AST generation keeps things much cleaner as, obviously, the support for user -defined operators and precedence has nothing to do with code generation. 
Keeping the code generation simpler is -generally a really good thing! +If you compare the code generation and driver code between Chapter 5 and Chapter 6 you'll +see the only differences are the change of namespace and the language level setting - it got +a bump (Literally a single enum on one line of each component). Everything else is identical. +This is because the real work is on the parser and AST not the code generation. This is +where having a good parser + AST model can help keep the code generation simpler. If the +parse tree alone was used, then the code generation would need additional support similar to +what is found in the AST generation. Putting it into the AST generation keeps things much +cleaner as, obviously, the support for user defined operators and precedence has nothing to +do with code generation. Keeping the code generation simpler is generally a really +good thing! That completes the support for user defined operators. ### Example -The following example is a complete program in Kaleidoscope that will generate a textual representation -of the classic Mandelbrot Set using all of the features of the language. +The following example is a complete program in Kaleidoscope that will generate a textual +representation of the classic Mandelbrot Set using all of the features of the language. [!code-Kaleidoscope[mandel](mandel.kls)] @@ -252,16 +266,18 @@ Ready> ``` ## Conclusion -Adding user defined operators with user defined precedence is fairly straight forward to implement in -terms of the code generation. No new code generation is required (Can't get any simpler than that! :grin: ). -ANTLR4 has support for left-recursion in the grammar and precedence of expressions. Even though ANTLR4 only -directly supports fixed precedence it is rather easy to extend the underlying support to handle dynamic -precedence and associativity, once the underlying mechanics are understood. 
The rest is on the AST -construction as it converts the user defined operators to function definitions and function calls. +Adding user defined operators with user defined precedence is fairly straightforward to +implement in terms of the code generation. No new code generation is required (Can't get any +simpler than that! :grin: ). ANTLR4 has support for left-recursion in the grammar and +precedence of expressions. Even though ANTLR4 only directly supports fixed precedence it is +rather easy to extend the underlying support to handle dynamic precedence and associativity, +once the underlying mechanics are understood. The rest is on the AST construction as it +converts the user defined operators to function definitions and function calls. >[!TIP] ->An early version of these samples skipped the use of an AST and used the parse tree directly. You can ->compare the history of the generators for that transition to see how the AST helps simplify the code ->generation. (Not to mention sets the stage for an otherwise unimplemented feature - truly lazy compilation, ->which is covered in [Chapter 7.1](xref:Kaleidoscope-ch7.1).) +>An early version of these samples skipped the use of an AST and used the parse tree +> directly. You can compare the history of the generators for that transition to see how the +> AST helps simplify the code generation. (Not to mention sets the stage for an otherwise +> unimplemented feature - truly lazy compilation, which is covered in +> [Chapter 7.1](xref:Kaleidoscope-ch7.1).) diff --git a/src/Samples/Kaleidoscope/Chapter7.1/Kaleidoscope-ch7.1.md b/src/Samples/Kaleidoscope/Chapter7.1/Kaleidoscope-ch7.1.md index 766fc5907..4f4fb925c 100644 --- a/src/Samples/Kaleidoscope/Chapter7.1/Kaleidoscope-ch7.1.md +++ b/src/Samples/Kaleidoscope/Chapter7.1/Kaleidoscope-ch7.1.md @@ -3,67 +3,71 @@ uid: Kaleidoscope-ch7.1 --- >[!WARNING] -> There is a fatal flaw in the current design of this support for an interactive -> runtime like Kaleidoscope thus far.
It does NOT allow for re-defining a -> function. Once it is defined, you cannot define it again or an exception -> or application crash will occur. Hopefully a future variant of this sample -> will address tracking and removing that. See [Special notes for interactive run-times](#special-notes-for-interactive-run-times) +> There is a fatal flaw in the current design of this support for an interactive runtime +> like Kaleidoscope thus far. It does NOT allow for re-defining a function. Once it is +> defined, you cannot define it again or an exception or application crash will occur. +> This is handled in Kaleidoscope by setting an option in the `DynamicRuntimeState` to +> indicate that redefinition is not supported. This is then processed in the AST conversion +> to produce an error node. This reports the redefinition as an error in the input rather +> than an exception at runtime. Hopefully a future variant of this sample will address +> tracking and removing that. See [Special notes for interactive run-times](#special-notes-for-interactive-run-times) > for more details. # 7. Kaleidoscope: Extreme Lazy JIT -In the previous chapters the code generation took an AST, converted it to LLVM IR, handed the IR -to the JIT, which then generated the native code. For a top level anonymous expression that is -pretty much all you need. But what if a function is defined but not used (yet or ever)? The -process of generating the IR, and then subsequently the native code, is all wasted overhead in -such a case. That's not really following through on the "Just-In-Time" part of the JIT. This -chapter focuses on resolving that with truly lazy JIT that doesn't even generate the LLVM IR -for a function until it is called for the first time. +In the previous chapters the code generation took an AST, converted it to LLVM IR, handed +the IR to the JIT, which then generated the native code. For a top level anonymous +expression that is pretty much all you need.
But what if a function is defined but not used +(yet or ever)? The process of generating the IR, and then subsequently the native code, is +all wasted overhead in such a case. That's not really following through on the +"Just-In-Time" part of the JIT. This chapter focuses on resolving that with truly lazy JIT +that doesn't even generate the LLVM IR for a function until it is called for the first time. ## Performance trade-offs -As with many things in software, there are trade-offs involved. In this case the trade-off is -when you JIT compile vs. lazy compile. This choice is a major element to efficient use of a JIT. -The more you have to JIT before anything can actually run the slower the application startup is. -If you defer too much then the execution slows down as everything needs to compile code. -Ultimately, there is no one "right" solution as many factors contribute to the results, including -the level of optimizations applied during generation. (e.g. it might achieve better results to -generate unoptimized code during startup, and later regenerate optimized versions of the most -frequently used code.) - -The approach to balancing the trade-offs taken in this chapter is to eagerly compile top level -expressions as it is obvious they are going to be called, and discarded afterwards. For function -definitions, it isn't clear if the functions will or won't be called. While, the code generation -could scan the function to find all functions it calls to generate them all at the same time - -there is no guarantee that the input arguments to the function will go through a path that needs -them all. Thus, for Kaleidoscope, function definitions are all lazy compiled on first use. +As with many things in software, there are trade-offs involved. In this case the trade-off +is when you JIT compile vs. lazy compile. This choice is a major element to efficient use of +a JIT. The more you have to JIT before anything can actually run the slower the application +startup is. 
If you defer too much then the execution slows down as everything needs to +compile code. Ultimately, there is no one "right" solution as many factors contribute to the +results, including the level of optimizations applied during generation. (e.g. it might +achieve better results to generate unoptimized code during startup, and later regenerate +optimized versions of the most frequently used code.) + +The approach to balancing the trade-offs taken in this chapter is to eagerly compile top +level expressions, as it is obvious they are going to be called, and discard them afterwards. +For function definitions, it isn't clear if the functions will or won't be called. While +the code generation could scan the function to find all functions it calls and generate them +all at the same time, there is no guarantee that the input arguments to the function will +go through a path that needs them all. Thus, for Kaleidoscope, function definitions are all +lazy compiled on first use. ## General Concept of Lazy Compilation -The general idea is that the language runtime registers every lazy JIT function with the JIT by -name with a callback function to handle generating code for that function. This does two things -in the JIT: +The general idea is that the language runtime registers every lazy JIT function with the JIT +by name with a callback function to handle generating code for that function. This does two +things in the JIT: 1. Adds the name to the function symbol table in the JIT - 2. Creates a stub implementation function in native code that will call back to the JIT when - application code calls the function. + 2. Creates a stub implementation function in native code that will call back to the JIT + when application code calls the function. The stub is implemented by the JIT to call back into the JIT in a way that includes the -information needed to identify the correct function to generate code for.
The JIT will do some -of it's own internal setup and then call the code generation callback registered by the runtime -code generator. This callback is what actually generates the LLVM IR, and ultimately the native -code, for the function. +information needed to identify the correct function to generate code for. The JIT will do +some of its own internal setup and then call the code generation callback registered by the +runtime code generator. This callback is what actually generates the LLVM IR, and ultimately +the native code, for the function. Once the function is generated the generator uses the JIT to update the stub so that, in the -future, it will just call to the generated function directly. One somewhat confusing aspect of -this is that there are two symbols in the JIT for what is really only one function. One, is the -stub that remains at a fixed location (to allow pointer to function patterns to work) the other -is the JIT compiled actual implementation of the function. They can't both have the same name -so the code generation for the implementation must use a unique name. +future, it will just call to the generated function directly. One somewhat confusing aspect +of this is that there are two symbols in the JIT for what is really only one function. One +is the stub that remains at a fixed location (to allow pointer to function patterns to work); +the other is the JIT compiled actual implementation of the function. They can't both have +the same name so the code generation for the implementation must use a unique name. ## Code changes for lazy JIT ### Initialization -The LLVM ORC JIT v2 uses a multi-layered system for materializing the IR and eventually the native -executable code. The Kaleidoscope JIT includes transforms of IR modules to support setting the -data layout for the module to match the JIT and also to run optimization passes on the module. -To support lazy evaluation a few such components are needed for the code generator.
These are -setup in the constructor and destroyed in the Dispose method. +The LLVM ORC JIT v2 uses a multi-layered system for materializing the IR and eventually the +native executable code. The Kaleidoscope JIT includes transforms of IR modules to support +setting the data layout for the module to match the JIT and also to run optimization passes +on the module. To support lazy evaluation a few such components are needed for the code +generator. These are set up in the constructor and destroyed in the Dispose method. [!code-csharp[PrivateMembers](CodeGenerator.cs#PrivateMembers)] @@ -73,96 +77,99 @@ setup in the constructor and destroyed in the Dispose method. ### Body implementation -Since the lazy JIT registers the callback stub with the function's name when the actual function -is generated it needs a new name for the backing body. So, we add a new helper method to -effectively clone a FunctionDefinition AST node while renaming it. This only needs a shallow -clone that changes the name so there isn't a lot of overhead for it. (Theoretically, this could -be done with a readonly struct and 'with', such an optimization is left as an exercise for the -reader :nerd_face:) +Since the lazy JIT registers the callback stub with the function's name when the actual +function is generated it needs a new name for the backing body. So, we add a new helper +method to effectively clone a FunctionDefinition AST node while renaming it. This only needs +a shallow clone that changes the name so there isn't a lot of overhead for it. +(Theoretically, this could be done with a readonly struct and 'with', such an optimization +is left as an exercise for the reader :nerd_face:)
This suffix was chosen as it includes characters not allowed within the Kaleidoscope -language so there is no possibility of a name collision. +The name used for the body is the original function name plus the suffix `$impl` tacked onto +the end. This suffix was chosen as it includes characters not allowed within the +Kaleidoscope language so there is no possibility of a name collision. ### Code generation -The next requirement is to change how we generate the functions. For an anonymous function the -generation is pretty much the same. There's really no point in going through the process of -setting up the lazy JIT when the next thing to do is get the address of the function and call it. For other definitions, though, -things get different as they are selected for lazy JIT. +The next requirement is to change how we generate the functions. For an anonymous function +the generation is pretty much the same. There's really no point in going through the process +of setting up the lazy JIT when the next thing to do is get the address of the function and +call it. For other definitions, though, things get different as they are selected for lazy +JIT. [!code-csharp[Generate](CodeGenerator.cs#Generate)] -Function definitions for lazy JIT are first cloned and renamed, as discussed previously. Then a -lazy module materializer is registered for the name of the function. This creates the stub -function exported by the function's name with a callback that knows how to generate the LLVM IR -for the function. The actual code generation call back is a local function that has captured the -AST so it initializes a new module, generates the function using the visitor pattern to -generate LLVM IR for the function into the freshly allocated module. 
(This is where keeping the -code generation ignorant of the JIT comes in handy as the same code is called to generate a -function into a module and doesn't need to care if it is eager or lazy) -The JIT implementation will do the following after the generator -callback returns: +Function definitions for lazy JIT are first cloned and renamed, as discussed previously. +Then a lazy module materializer is registered for the name of the function. This creates the +stub function exported by the function's name with a callback that knows how to generate the +LLVM IR for the function. The actual code generation callback is a local function that has +captured the AST; it initializes a new module and generates the function into it, using the +visitor pattern to produce the LLVM IR. (This is where keeping the code generation ignorant +of the JIT comes in handy as the same code is called to generate a function into a module +and doesn't need to care if it is eager or lazy.) +The JIT implementation will do the following after the generator callback returns: 1. Add the returned module to the JIT 2. Generate native code for the module 3. Get the address of the implementation function - 4. Update the stub for the function with the address of the function instead of the internal - callback. + 4. Update the stub for the function with the address of the function instead of the + internal callback. 5. return the address to the JIT engine so it can ultimately call the function and continue on it's merry way. #### Lazy Materializer -The bulk of the work is in the ORCJIT v2 implementation however kaleidoscope must "hook" into -the support there to provide a materializer that can convert the AST into an LLVM IR. Technically, -it provides an LLVM module for a symbol (the body implementation name). The JIT couldn't care -less about the AST.
The materializer will generate the IR for a given symbol by processing the
-AST into a module and providing that to the JIT.
+The bulk of the work is in the ORCJIT v2 implementation; however, Kaleidoscope must "hook"
+into the support there to provide a materializer that can convert the AST into LLVM IR.
+Technically, it provides an LLVM module for a symbol (the body implementation name). The JIT
+couldn't care less about the AST. The materializer will generate the IR for a given symbol
+by processing the AST into a module and providing that to the JIT.

[!code-csharp[AddLazyMaterializer](CodeGenerator.cs#AddLazyMaterializer)]

## Conclusion
Implementing Lazy JIT support with Ubiquity.NET.Llvm is a bit more complex, but still not
significant. It took almost as many words to describe than actual lines of code. Efficiently
-supporting lazy JIT is a much more complex matter. There are trade-offs doing things lazy, in
-particular the application can stall for a period, while the system generates new code to run
-"on the fly". Optimizations, when fully enabled, add additional time to the code generation.
-While, for some applications, it may be obvious whether these factors matter or not, in general
-it's not something that can be known, thus the quest for optimal efficiency includes decisions
-on eager vs lazy JIT as well as optimized JIT or not. This can include lazy JIT with minimal
-optimization during startup of an app. Once things are up and going the engine can come back
-to re-generate the functions with full optimization. All sorts of possibilities exist, but the
-basics of how the lazy and eager generation works doesn't change no matter what approach a given
-language or runtime wants to use. For most DSLs like Kaleidoscope these trade-offs are not
-generally relevant (Or even necessary) as the fundamental point is to simplify expression of a
-particular domain problem in domain terminology. Performance trade-offs are often not that
-important for such cases. 
(And can occasionally get in the way - See [Special notes for
-interactive run-times](#special-notes-for-interactive-run-times) below for more details)
+supporting lazy JIT is a much more complex matter. There are trade-offs to doing things
+lazily; in particular, the application can stall for a period while the system generates new
+code to run "on the fly". Optimizations, when fully enabled, add additional time to the code
+generation. While for some applications it may be obvious whether these factors matter, in
+general it's not something that can be known, so the quest for optimal efficiency includes
+decisions on eager vs. lazy JIT as well as whether or not to optimize. This can include lazy
+JIT with minimal optimization during startup of an app. Once things are up and going the
+engine can come back to re-generate the functions with full optimization. All sorts of
+possibilities exist, but the basics of how lazy and eager generation work don't change no
+matter what approach a given language or runtime wants to use. For most DSLs like
+Kaleidoscope these trade-offs are not generally relevant (or even necessary) as the
+fundamental point is to simplify expression of a particular domain problem in domain
+terminology. Performance trade-offs are often not that important for such cases. (And they
+can occasionally get in the way; see [Special notes for interactive run-times](#special-notes-for-interactive-run-times)
+below for more details.)

### Special notes for interactive run-times
-It turns out that re-definition of a lazy JIT'd function is a rather complex problem involving
-a lot of moving pieces. The IR module for the AST is lazy generated asynchronously and added to
-the JIT AFTER production by the materialization by the infrastructure. That is, outside of the
-driving application code control so it can't specify a resource tracker. Additionally, there is
-no resource tracker for a materialization unit that can remove the unit BEFORE it is run. 
+It turns out that re-definition of a lazy JIT'd function is a rather complex problem
+involving a lot of moving pieces. The IR module for the AST is lazily generated,
+asynchronously, and is added to the JIT by the infrastructure AFTER the materializer
+produces it. That is, it happens outside of the driving application's control, so the
+application can't specify a resource tracker. Additionally, there is no resource tracker for
+a materialization unit that can remove the unit BEFORE it is run.

There are at least three states of a function definition to deal with:

1) Not defined anywhere yet (First occurrence)
2) Materializer Created, but not yet materialized
3) Already materialized.

-Tracking of each is different and thus handling removal will require different implementations.
-All of which requires thread synchronization as the JIT could materialize the function at ANY
-point along the way! So it is possible that while trying to remove a definition it transitions
-from #2 to #3. Even if code for removal looked at the state first it's a classic
-[TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) problem. There is no
-mechanism in the standard OrcJIT v2 for this scenario. It is arguable what the validity of such
-a thing is for an interactive language/runtime. For any sufficiently complex thing there's at
-least two high level default questions to ask:
+Tracking of each is different and thus handling removal will require different
+implementations. All of this requires thread synchronization, as the JIT could materialize
+the function at ANY point along the way! So it is possible that while trying to remove a
+definition it transitions from #2 to #3. Even if the removal code looked at the state first,
+it's a classic [TOCTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) problem.
+There is no mechanism in the standard OrcJIT v2 for this scenario. It is arguable what the
+validity of such a thing is for an interactive language/runtime. 
For any sufficiently
+complex thing there are at least two high-level questions to ask:

-1) Is it worth the cost of implementation?
-2) Do we even know HOW to do it yet?
+1) Do we even know HOW to do it yet?
+2) Is it worth the cost of implementation?

For an interactive language/runtime like Kaleidoscope, the answer to both thus far is a
-hard 'NO'. This sort of support is best for non-interactive run-times like .NET or Java.
+hard 'NO'. This sort of support is best for non-interactive run-times like .NET or Java
+where redefinition isn't legal syntax and is caught in the parser/AST transforms.
diff --git a/src/Samples/Kaleidoscope/Chapter7/Kaleidoscope-ch7.md b/src/Samples/Kaleidoscope/Chapter7/Kaleidoscope-ch7.md
index 2912bd928..5f7a03f98 100644
--- a/src/Samples/Kaleidoscope/Chapter7/Kaleidoscope-ch7.md
+++ b/src/Samples/Kaleidoscope/Chapter7/Kaleidoscope-ch7.md
@@ -3,22 +3,24 @@ uid: Kaleidoscope-ch7
---
# 7. Kaleidoscope: Mutable Variables
-The previous chapters introduced the Kaleidoscope language and progressively implemented a variety of
-language features to make a fully featured, though simplistic, functional programming language. To a
-certain extent the choice of a functional language was a bit of a cheat. Generating LLVM IR for a
-functional language is straight forward as functional languages map very easily into the LLVM native
-[SSA form](http://en.wikipedia.org/wiki/Static_single_assignment_form). While the SSA form is very
-useful for transformations and optimizations it is sometimes overwhelming to new users of LLVM. In
-particular it may seem like LLVM doesn't support imperative languages with mutable variables or that
-you need to convert all such languages into SSA form before generating LLVM IR. That is a bit of a
-daunting task that might scare off a number of users. The good news is, there is no need for a language
-front-end to convert to SSA form directly.
+The previous chapters introduced the Kaleidoscope language and progressively implemented a
+variety of language features to make a fully featured, though simplistic, functional
+programming language. To a certain extent the choice of a functional language was a bit of a
+cheat. Generating LLVM IR for a functional language is straightforward, as functional
+languages map very easily into the LLVM native [SSA form](http://en.wikipedia.org/wiki/Static_single_assignment_form).
+While the SSA form is very useful for transformations and optimizations it is sometimes
+overwhelming to new users of LLVM. In particular it may seem like LLVM doesn't support
+imperative languages with mutable variables or that you need to convert all such languages
+into SSA form before generating LLVM IR. That is a bit of a daunting task that might scare
+off a number of users. The good news is, there is no need for a language front-end to
+convert to SSA form directly.

>[!IMPORTANT]
->***In LLVM There is no need for a language front-end to convert to SSA form directly!*** In fact,
-> ***manually*** converting to SSA form is strongly discouraged! LLVM already has very efficient,
-> and more importantly, well tested, support for converting to SSA form (though how that works might
-> be a bit surprising - read on!). The use of this support is the focus of this chapter.
+>***In LLVM there is no need for a language front-end to convert to SSA form directly!***
+> In fact, ***manually*** converting to SSA form is strongly discouraged! LLVM already has
+> very efficient, and more importantly, well tested, support for converting to SSA form
+> (though how that works might be a bit surprising - read on!). The use of this support is
+> the focus of this chapter.

## Mutable Variables in LLVM
### Mutable Variables vs. SSA, What's the big deal? 
@@ -38,8 +40,8 @@ int test(_Bool Condition)
```

The general idea of how to handle this in LLVM SSA form was already covered in [Chapter 5](xref:Kaleidoscope-ch5).
-Since there are two possible values for X when the function returns, a PHI node is inserted to merge the values.
-The LLVM IR for this would look like this:
+Since there are two possible values for X when the function returns, a PHI node is inserted
+to merge the values. The LLVM IR for this would look like this:

```llvm
@G = weak global i32 0 ; type of @G is i32*
@@ -63,35 +65,39 @@ cond_next:
}
```

-A full treatise on SSA is beyond the scope of this tutorial. If you are interested, there are plenty of
-[resources available on-line](http://en.wikipedia.org/wiki/Static_single_assignment_form). The focus for
-this chapter is on how traditional imperative language front-ends can use the LLVM support for mutable
-values without performing SSA conversion up-front. While, LLVM requires IR in SSA form (there's no such
-thing as "non-SSA mode"). Constructing the SSA form usually would require non-trivial algorithms and data
-structures, so it is both wasteful and error-prone for every front-end to have to manage implementing such
-a thing. Thus, LLVM provides a consistent and simpler solution.
+A full treatise on SSA is beyond the scope of this tutorial. If you are interested, there
+are plenty of [resources available on-line](http://en.wikipedia.org/wiki/Static_single_assignment_form).
+The focus for this chapter is on how traditional imperative language front-ends can use the
+LLVM support for mutable values without performing SSA conversion up-front. While LLVM
+requires IR in SSA form (there's no such thing as a "non-SSA mode"), constructing the SSA
+form usually requires non-trivial algorithms and data structures, so it is both wasteful and
+error-prone for every front-end to have to implement such a thing. Thus, LLVM provides a
+consistent and simpler solution. 
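That simpler solution can be previewed with a small hand-written sketch (not code from the chapter's samples; the function name and value names are illustrative): the front-end naively emits a stack slot with explicit loads and stores, and LLVM's mem2reg pass later promotes the slot to SSA registers on its behalf:

```llvm
; Naive front-end output for a mutable local: every access to %x goes
; through a stack slot. Running mem2reg promotes %x.addr to SSA form.
define i32 @increment(i32 %x) {
entry:
  %x.addr = alloca i32
  store i32 %x, i32* %x.addr
  %tmp = load i32, i32* %x.addr
  %inc = add i32 %tmp, 1
  store i32 %inc, i32* %x.addr
  %ret = load i32, i32* %x.addr
  ret i32 %ret
}
```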
### Memory in LLVM -The trick to the apparent incompatibility of SSA in LLVM IR and mutable values in imperative languages -lies in how LLVM deals with memory. While LLVM requires all register values in SSA form, it does not -require, or even permit, memory objects in SSA form. In the preceding example, access to global values G -and H are direct loads of memory. They are not named or versioned in any way. This differs from some other -compiler implementations that try to version memory objects. In LLVM, instead of encoding data-flow -analysis of memory in the IR, it is handled with Analysis Passes, which are computed on demand. This -further helps to reduce the work load of building a front-end while re-using well tested support in the -LLVM libraries. - -Given all of that, the general idea is to create a stack variable, which lives in memory, for each mutable -object in a function. Since LLVM supports loads and stores from/to memory - mutable values are fairly -straight forward. Though, they may seem terribly inefficient at first. But, fear not LLVM has a way to deal -with that. (Optimizations and efficiency is getting ahead of things a bit.) - -In LLVM, memory accesses are always explicit with load/store instructions. LLVM has no "address-of" -operator, and doesn't need one. Notice the type of the LLVM variables @G, and @H from the sample are -actually `i32*` even though the variable is defined as i32. In other words, @G (and @H) defines space for -an i32, but the actual symbolic name refers to the address for that space (e.g. it's a pointer). Stack -variables work the same way, except that instead of static allocation via a global declaration they are -declared with the [LLVM alloca instruction](xref:Ubiquity.NET.Llvm.Instructions.Alloca). +The trick to the apparent incompatibility of SSA in LLVM IR and mutable values in imperative +languages lies in how LLVM deals with memory. 
While LLVM requires all register values in SSA
+form, it does not require, or even permit, memory objects in SSA form. In the preceding
+example, accesses to the global values G and H are direct loads of memory. They are not
+named or versioned in any way. This differs from some other compiler implementations that
+try to version memory objects. In LLVM, instead of encoding data-flow analysis of memory in
+the IR, it is handled with Analysis Passes, which are computed on demand. This further helps
+to reduce the work load of building a front-end while re-using well tested support in the
+LLVM libraries.
+
+Given all of that, the general idea is to create a stack variable, which lives in memory,
+for each mutable object in a function. Since LLVM supports loads and stores from/to memory,
+mutable values are fairly straightforward. Though they may seem terribly inefficient at
+first, fear not: LLVM has a way to deal with that. (Optimizations and efficiency are
+getting ahead of things a bit.)
+
+In LLVM, memory accesses are always explicit with load/store instructions. LLVM has no
+"address-of" operator, and doesn't need one. Notice that the types of the LLVM variables @G
+and @H from the sample are actually `i32*` even though the variables are defined as i32. In
+other words, @G (and @H) defines space for an i32, but the actual symbolic name refers to
+the address of that space (i.e. it's a pointer). Stack variables work the same way, except
+that instead of static allocation via a global declaration they are declared with the
+[LLVM alloca instruction](xref:Ubiquity.NET.Llvm.Instructions.Alloca).

```llvm
define i32 @example() {
entry:
@@ -104,10 +110,10 @@ entry:
...
```

-This code shows how LLVM supports creation and manipulation of stack based variables. Stack memory allocated
-with alloca is completely generalized. you can pass the address of a stack slot to a function, store it in
-a variable, etc... 
Using alloca, the previous example could be re-written using alloca without the PHI node
-as follows:
+This code shows how LLVM supports creation and manipulation of stack based variables. Stack
+memory allocated with alloca is completely generalized. You can pass the address of a stack
+slot to a function, store it in a variable, etc. Using alloca, the previous example could be
+re-written without the PHI node as follows:

```llvm
@G = weak global i32 0 ; type of @G is i32*
@@ -134,20 +140,21 @@ cond_next:
}
```

-This example shows the general approach for handling arbitrary mutable values in LLVM IR without the need
-for PHI nodes.
+This example shows the general approach for handling arbitrary mutable values in LLVM IR
+without the need for PHI nodes.

1. Mutable Variables become a stack allocation
2. Reading the variable uses a load instruction to retrieve the value from memory
3. Updates of the variable become a store instruction to write the value to memory
4. Taking the address of a variable just uses the stack address directly

-This nicely and cleanly handles mutable variables in a fairly simple and easy to generate form. However, it
-has apparently introduced a new problem. Every variable use requires stack memory and reads/writes operate
-directly on stack memory - a major performance penalty. Fortunately, as previously hinted, LLVM has a well
-tuned optimization pass named "mem2reg" that handles this case, promoting allocas into SSA registers, inserting
-PHI nodes as necessary. For example if you run the alloca version of the IR code through the mem2reg optimization
-pass you get:
+This nicely and cleanly handles mutable variables in a fairly simple and easy to generate
+form. However, it has apparently introduced a new problem. Every variable use requires stack
+memory and reads/writes operate directly on stack memory - a major performance penalty. 
+Fortunately, as previously hinted, LLVM has a well tuned optimization pass named "mem2reg" +that handles this case, promoting allocas into SSA registers, inserting PHI nodes as +necessary. For example if you run the alloca version of the IR code through the mem2reg +optimization pass you get: ```llvm $ llvm-as < example.ll | opt -mem2reg | llvm-dis @@ -172,47 +179,54 @@ cond_next: } ``` -The mem2reg pass implements the standard "iterated dominance frontier" algorithm for building -the SSA form with specialized optimizations to speed up common degenerate cases. The mem2reg pass -is an integral part of the full solution to mutable variables. Using mem2reg is highly recommended. -There are a few conditions for using mem2reg correctly. - -1. mem2reg is based on alloca: it looks for and promotes alloca. It does not apply to globals or heap allocations. -1. mem2reg only looks for alloca instructions in the **entry block** of the function. Placing Alloca instructions for -all variables, in all scopes, in the entry block ensures they are executed only once, which makes the conversion -simpler. -1. mem2reg only promotes Alloca instructions whose only uses are direct loads and stores. If the address of the object -is passed to a function or any pointer math applied the alloca is **not** promoted. -1. mem2reg only works on Alloca instructions of first class values (such as pointers, scalars and vectors), and only if -the array size of the allocation is 1. -1. mem2reg is not capable of promoting structs or arrays to registers. (The SROA pass is more powerful and can promote structs, unions and arrays in many cases) - -These may seem onerous but are really fairly straight forward and easy to abide, the rest of this chapter -will focus on doing that with the Kaleidoscope language. 
If you are considering doing your own SSA construction,
-then please stop and consider the following aspects of the existing LLVM patterns and mem2reg:
-
-* The mem2reg and alloca pattern is proven and very well tested. The most common clients of LLVM use this
-for the bulk of their variables, bugs are found fast and early.
-* It is fast, the LLVM implementation has a number of optimizations that make it fast in common cases and
-fully general. This includes fast-paths for variables used only in a single block, variables with only a
-single assignment point, and heuristics to help avoid phi nodes when not needed.
-* It is needed for debug info generation, debug info in LLVM relies on having the address of the variable
-exposed so that debugging data is attached to it. The mem2reg+alloca pattern fits well with this debug info
-style.
-* It's really simple to do, letting you focus on the core of the front-end instead of the details of correctly
-building SSA form.
+The mem2reg pass implements the standard "iterated dominance frontier" algorithm for
+building the SSA form with specialized optimizations to speed up common degenerate cases.
+The mem2reg pass is an integral part of the full solution to mutable variables. Using
+mem2reg is highly recommended. There are a few conditions for using mem2reg correctly.
+
+1. mem2reg is based on alloca: it looks for and promotes alloca. It does not apply to
+   globals or heap allocations.
+1. mem2reg only looks for alloca instructions in the **entry block** of the function.
+   1. Placing Alloca instructions for all variables, in all scopes, in the entry block
+      ensures they are executed only once, which makes the conversion simpler.
+1. mem2reg only promotes Alloca instructions whose only uses are direct loads and stores.
+   If the address of the object is passed to a function or any pointer math is applied, the
+   alloca is **not** promoted.
+1. 
mem2reg only works on Alloca instructions of first class values (such as pointers,
+   scalars and vectors), and only if the array size of the allocation is 1.
+1. mem2reg is not capable of promoting structs or arrays to registers. (The SROA pass is
+   more powerful and can promote structs, unions and arrays in many cases)
+
+These may seem onerous but are really fairly straightforward and easy to abide by; the rest
+of this chapter will focus on doing that with the Kaleidoscope language. If you are
+considering doing your own SSA construction, then please stop and consider the following
+aspects of the existing LLVM patterns and mem2reg:
+
+* The mem2reg and alloca pattern is proven and very well tested. The most common clients of
+  LLVM use this for the bulk of their variables, so bugs are found fast and early.
+* It is fast: the LLVM implementation has a number of optimizations that make it fast in
+  common cases and fully general. This includes fast-paths for variables used only in a
+  single block, variables with only a single assignment point, and heuristics to help avoid
+  phi nodes when not needed.
+* It is needed for debug info generation: debug info in LLVM relies on having the address of
+  the variable exposed so that debugging data is attached to it. The mem2reg+alloca pattern
+  fits well with this debug info style.
+* It's really simple to do, letting you focus on the core of the front-end instead of the
+  details of correctly building SSA form.

## Generating LLVM IR for Mutable Variables
-Now that we've covered the general concepts of how LLVM supports mutable variables we can focus on implementing
-mutable variables in Kaleidoscope. This includes the following new features:
+Now that we've covered the general concepts of how LLVM supports mutable variables we can
+focus on implementing mutable variables in Kaleidoscope. This includes the following new
+features:

1. Mutate variables with an assignment operator '='
2. 
Ability to define new variables -Generally the first item is the primary feature here. Though, at this point, the Kaleidoscope language only -has variables for incoming arguments and for loop induction variables. Defining variables is just a generally -useful concept that can serve many purposes, including self documentation. The following is an example on -how these features are used: +Generally the first item is the primary feature here. Though, at this point, the +Kaleidoscope language only has variables for incoming arguments and for loop induction +variables. Defining variables is just a generally useful concept that can serve many +purposes, including self documentation. The following is an example on how these features +are used: ```Kaleidoscope # Define ':' for sequencing: as a low-precedence operator that ignores operands @@ -239,65 +253,71 @@ def fibi(x) fibi(10); ``` -In order to mutate variables the current implementation needs to change to leverage the "alloca trick". -Then support for assignment will complete the mutable variables support. +In order to mutate variables the current implementation needs to change to leverage the +"alloca trick". Then support for assignment will complete the mutable variables support. ## Adjusting Existing Variables for Mutation -Currently the symbol stack in Kaleidoscope stores LLVM Values directly. To support mutable values the - NamedValues ScopeStack needs to switch to using [Alloca](xref:Ubiquity.NET.Llvm.Instructions.Alloca). +Currently the symbol stack in Kaleidoscope stores LLVM Values directly. To support mutable +values the NamedValues ScopeStack needs to switch to using +[Alloca](xref:Ubiquity.NET.Llvm.Instructions.Alloca). + ```C# private readonly ScopeStack NamedValues; ``` ### Update Visitor for VariableReferenceExpression -The first change to the existing code generation is to update handling of variable expressions to generate -a load through the pointer created with an alloca instruction. 
This is pretty straight forward since the
-scope map now stores the alloca instructions for the variable.
+The first change to the existing code generation is to update handling of variable
+expressions to generate a load through the pointer created with an alloca instruction. This
+is pretty straightforward since the scope map now stores the alloca instructions for the
+variable.

[!code-csharp[VisitVariableExpression](CodeGenerator.cs#VariableReferenceExpression)]

### Update Visitor for ConditionalExpression
-Now that we have the alloca support we can update the conditional expression handling to remove the need
-for direct PHI node construction. This involves adding a new compiler generated local var for the result
-of the condition and storing the result value into that location for each side of the branch. Then, in the
-continue block load the value from the location so that it is available as a value for the result of the
-expression.
+Now that we have the alloca support we can update the conditional expression handling to
+remove the need for direct PHI node construction. This involves adding a new compiler
+generated local var for the result of the condition and storing the result value into that
+location for each side of the branch. Then, in the continue block, load the value from the
+location so that it is available as a value for the result of the expression.

[!code-csharp[VisitConditionalExpression](CodeGenerator.cs#ConditionalExpression)]

### Update Visitor for ForInExpression
-Next up is to update the for loop handling to use Alloca. The code is almost identical except for the
-use of load/store for the variables and removal of the manually generated PHI nodes.
+Next up is to update the for loop handling to use Alloca. The code is almost identical
+except for the use of load/store for the variables and removal of the manually generated PHI
+nodes. 
[!code-csharp[VisitForExpression](CodeGenerator.cs#ForInExpression)] ### Update Visitor for FunctionDefinition -To support mutable function argument variables the handler for functions requires a small update to create -the Alloca for each incoming argument and for each of the local variables used by the function. The AST -generation tracks the variable declarations in a function so they are all available to generate directly -into the entry block. +To support mutable function argument variables the handler for functions requires a small +update to create the Alloca for each incoming argument and for each of the local variables +used by the function. The AST generation tracks the variable declarations in a function so +they are all available to generate directly into the entry block. [!code-csharp[DefineFunction](CodeGenerator.cs#FunctionDefinition)] ### InitializeModuleAndPassManager -The last piece required for mutable variables support is to include the optimization pass to promote memory -to registers. This is always enabled, so that the proper SSA form is correctly generated. +The last piece required for mutable variables support is to include the optimization pass to +promote memory to registers. This is always enabled, so that the proper SSA form is +correctly generated. [!code-csharp[InitializeModuleAndPassManager](CodeGenerator.cs#InitializeModuleAndPassManager)] ### Add operator support for Assignment Expressions -Unlike the other binary operators assignment doesn't follow the same emit left, emit right, emit operator -sequence. This is because an expression like '(x+1) = expression' is nonsensical and therefore not allowed. -The left hand side is always a variable reference expression as the destination of a store. To handle this -special case the Generator doesn't generate for the left side, but instead looks up the Alloca for the -variable for the store. 
The generator then implements a store operation of the right hand side value to the
-Alloca for the left side.
+Unlike the other binary operators, assignment doesn't follow the same emit left, emit right,
+emit operator sequence. This is because an expression like '(x+1) = expression' is
+nonsensical and therefore not allowed. The left hand side is always a variable reference
+expression as the destination of a store. To handle this special case the Generator doesn't
+generate for the left side, but instead looks up the Alloca for the variable for the store.
+The generator then implements a store operation of the right hand side value to the `alloca`
+for the left side.

[!code-csharp[BinaryOperatorExpression](CodeGenerator.cs#BinaryOperatorExpression)]

-Now that we have mutable variables and assignment we can mutate loop variables or input parameters. For
-example:
+Now that we have mutable variables and assignment we can mutate loop variables or input
+parameters. For example:

```Kaleidoscope
# Function to print a double.
@@ -318,21 +338,22 @@ test(123);

When run, this prints `1234` and `4`, showing that the value was mutated as expected.

## User-defined Local Variables
-As described in the general syntax discussion of the Kaleidoscope language [VarInExpression](xref:Kaleidoscope-ch2#varinexpression)
-the VarIn expression is used to declare local variables for a scope. A few changes are required to support
-this language construct.
+As described in the general syntax discussion of the Kaleidoscope language
+[VarInExpression](xref:Kaleidoscope-ch2#varinexpression) the VarIn expression is used to
+declare local variables for a scope. A few changes are required to support this language
+construct.

### Add Visitor for VarInExpression
-The VarIn expression visitor needs to handle the mutability of the scoped variables. 
The basic idea for each
-VarIn expression is to push a new scope on the scope stack then walk through all the variables in the
-expression to define them and emit the expression for the initializer. After all the values are defined the
-child expression "scope" is emitted, which may contain another VarIn or loop expression. Once the emit
-completes, the variable scope is popped from the stack to restore back the previous level.
+The VarIn expression visitor needs to handle the mutability of the scoped variables. The
+basic idea for each VarIn expression is to push a new scope on the scope stack, then walk
+through all the variables in the expression to define them and emit the expression for the
+initializer. After all the values are defined the child expression "scope" is emitted, which
+may contain another VarIn or loop expression. Once the emit completes, the variable scope is
+popped from the stack to restore the previous level.

[!code-csharp[VisitVarInExpression](CodeGenerator.cs#VarInExpression)]

## Conclusion
-This completes the updates needed to support mutable variables with potentially nested scopes. All of this
-without needing to manually deal with PHI nodes or generate SSA form! Now, that's convenient!
-
-
+This completes the updates needed to support mutable variables with potentially nested
+scopes. All of this without needing to manually deal with PHI nodes or generate SSA form!
+Now, that's convenient!
diff --git a/src/Samples/Kaleidoscope/Chapter8/Kaleidoscope-ch8.md b/src/Samples/Kaleidoscope/Chapter8/Kaleidoscope-ch8.md
index 348fc41ba..418676601 100644
--- a/src/Samples/Kaleidoscope/Chapter8/Kaleidoscope-ch8.md
+++ b/src/Samples/Kaleidoscope/Chapter8/Kaleidoscope-ch8.md
@@ -3,26 +3,27 @@ uid: Kaleidoscope-ch8
---
# 8. Kaleidoscope: Compiling to Object Code
-This tutorial describes how to adapt the Kaleidoscope JIT engine into an Ahead of Time (AOT) compiler
-by generating target specific native object files. 
+This tutorial describes how to adapt the Kaleidoscope JIT engine into an Ahead of Time (AOT)
+compiler by generating target-specific native object files.

## Choosing a target
-LLVM has built-in support for cross-compilation. This allows compiling to the architecture of the platform
-you run the compiler on or, just as easily, for some other architecture. For the Kaleidoscope tutorial
-we'll focus on just the native target the compiler is running on.
+LLVM has built-in support for cross-compilation. This allows compiling to the architecture
+of the platform you run the compiler on or, just as easily, for some other architecture. For
+the Kaleidoscope tutorial we'll focus on just the native target the compiler is running on.

-LLVM uses a "Triple" string to describe the target used for code generation. This takes the form
-`<arch>-<vendor>-<os>-<environment>` (see the description of the [Triple](xref:Ubiquity.NET.Llvm.Triple) type for
-more details)
+LLVM uses a "Triple" string to describe the target used for code generation. This takes the
+form `<arch>-<vendor>-<os>-<environment>` (see the description of the [Triple](xref:Ubiquity.NET.Llvm.Triple)
+type for more details).

-Fortunately, it is normally not required to build such strings directly.
+Fortunately, it is normally not required to build such strings directly.

## Grammar
-In the preceding chapters the Kaleidoscope implementation provided an interactive JIT based on the classic
-Read Evaluate Print Loop (REPL). So the grammar focused on a top level rule "repl" that processes individual
-expressions one at a time. For native compilation this complicates the process of parsing and processing a
-complete file. To handle these two distinct scenarios the grammar has different rules. For the interactive
-scenario the previously mentioned "repl" rule is used. When parsing a full source file the "fullsrc" rule
+In the preceding chapters the Kaleidoscope implementation provided an interactive JIT based
+on the classic Read Evaluate Print Loop (REPL).
So the grammar focused on a top level rule
+"repl" that processes individual expressions one at a time. For native compilation this
+complicates the process of parsing and processing a complete file. To handle these two
+distinct scenarios the grammar has different rules. For the interactive scenario the
+previously mentioned "repl" rule is used. When parsing a full source file the "fullsrc" rule
is used as the start.

```antlr
@@ -32,78 +33,80 @@ fullsrc
    : repl*;
```

-This rule simply accepts any number of expressions so that a single source file is parsed to a single
-complete parse tree. (This particular point will become even more valuable when generating debug information
-in [Chapter 9](xref:Kaleidoscope-ch9) as the parse tree nodes contain the source location information based
-on the original input stream).
+This rule simply accepts any number of expressions so that a single source file is parsed to
+a single complete parse tree. (This particular point will become even more valuable when
+generating debug information in [Chapter 9](xref:Kaleidoscope-ch9) as the parse tree nodes
+contain the source location information based on the original input stream.)

## Code Generation Changes
-The changes in code generation are fairly straight forward and consist of the following basic steps.
+The changes in code generation are fairly straightforward and consist of the following
+basic steps.

1. Remove JIT engine support
2. Expose the generated bit code module, so it is available to the "driver".
3. Save the target machine (since it doesn't come from the JIT anymore)
4. Keep track of all generated top level anonymous expressions
-5. Once generating from the parse tree is complete generate a main() that includes calls to all the
-previously generated anonymous expressions.
-
-Most of these steps are pretty straight forward. The anonymous function handling is a bit distinct.
-Since the language syntax allows anonymous expressions throughout the source file, and they don't
-actually execute during generation - they need to be organized into an executable form. Thus, a new
-list of the generated functions is maintained and, after the tree is generated, a new main() function
-is created and a call to each anonymous expression is made with a second call to printd() to show
-the results - just like they would appear if typed in an interactive console. A trick used in the
-code generation is to mark each of the anonymous functions as private and always inline so that a
-simple optimization pass can eliminate the anonymous functions after inlining them all into the main()
-function.
+5. Once generating from the parse tree is complete, generate a main() that includes calls to
+   all the previously generated anonymous expressions.
+
+Most of these steps are pretty straightforward. The anonymous function handling is a bit
+distinct. Since the language syntax allows anonymous expressions throughout the source file,
+and they don't actually execute during generation, they need to be organized into an
+executable form. Thus, a new list of the generated functions is maintained and, after the
+tree is generated, a new main() function is created with a call to each anonymous expression
+followed by a call to printd() to show the results, just like they would appear if typed in
+an interactive console. A trick used in the code generation is to mark each of the anonymous
+functions as private and always-inline so that a simple optimization pass can eliminate the
+anonymous functions after inlining them all into the main() function.
```C# -// mark anonymous functions as always-inline and private so they can be inlined and then removed -if( isAnonymous ) +// mark anonymous functions as always-inline and private so they can be removed +if(definition.IsAnonymous) { - retVal.AddAttribute( FunctionAttributeIndex.Function, AttributeKind.AlwaysInline ) - .Linkage( Linkage.Private ); -} -else -{ - retVal.Linkage( Linkage.External ); + function.AddAttribute( FunctionAttributeIndex.Function, "alwaysinline"u8 ) + .Linkage( Linkage.Private ); + + AnonymousFunctions.Add( function ); } ``` -These settings are leveraged after generating from the tree to create the main function. A simple -loop generates a call to each expression along with the call to print the results. -> NOTE: -The always inliner will inline the functions marked as inline and the dead code elimination pass will -eliminate unused internal/private global symbols. This has the effect of generating the main function -with all top level expressions inlined and the originally generated anonymous functions removed. +These settings are leveraged after generating from the tree to create the main function. A +simple loop generates a call to each expression along with the call to print the results. + +> [!NOTE] +> The always inliner will inline the functions marked as inline and the dead code +> elimination pass will eliminate unused internal/private global symbols. This has the +> effect of generating the main function with all top level expressions inlined and the +> originally generated anonymous functions removed. [!code-csharp[Generate](CodeGenerator.cs#Generate)] -Most of the rest of the changes are pretty straightforward following the steps listed previously. +Most of the rest of the changes are pretty straightforward following the steps listed +previously. 
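To make the inline-and-eliminate trick concrete, here is a hand-written sketch of the kind
of module shape involved. This is illustrative only; the function names, constants, and
exact output are hypothetical, not the generator's actual output:

```llvm
; Illustrative sketch only - names and values are hypothetical.
; Each top level anonymous expression becomes a private, always-inline function.
define private double @anon_expr_0() alwaysinline {
entry:
  ret double 4.000000e+00
}

declare double @printd(double)

; main() simply calls each anonymous expression and prints its result.
define i32 @main() {
entry:
  %result = call double @anon_expr_0()
  %ignored = call double @printd(double %result)
  ret i32 0
}
```

After the always-inliner runs, the call in `main()` is replaced by the body of
`@anon_expr_0`, and because that private function then has no remaining uses, the dead code
elimination pass removes it from the module entirely.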
### Anonymous Function Definitions
-As previously mentioned, when generating the top level expression the resulting function is added to the
-list of anonymous functions to generate a call to it from main().
+As previously mentioned, when generating a top level expression the resulting function is
+added to the list of anonymous functions so that a call to it can be generated from main().

[!code-csharp[FunctionDefinition](CodeGenerator.cs#FunctionDefinition)]

## Driver changes
-To support generating object files the "driver" application code needs some alterations. The changes
-fall into two general categories:
+To support generating object files the "driver" application code needs some alterations. The
+changes fall into two general categories:

1. Command line argument handling
2. Generating the output files

### Adding Command Line handling
-To allow providing a file like a traditional compiler the driver app needs to have some basic
-command line argument handling. ("Basic" in this case means truly rudimentary :grin: )
+To allow providing a file like a traditional compiler, the driver app needs to have some
+basic command line argument handling. ("Basic" in this case means truly rudimentary :grin:)

Generally this just gets a viable file path to use for the source code.

[!code-csharp[ProcessArgs](Program.cs#ProcessArgs)]

### Update Main()
-The real work comes in the Main application driver, though there isn't a lot of additional code
-here either. The general plan is:
+The real work comes in the Main application driver, though there isn't a lot of additional
+code here either. The general plan is:
1. Process the arguments to get the path to compile
2. Open the file for reading
3. Create a new target machine from the default triple of the host
@@ -116,10 +119,11 @@ here either.

[!code-csharp[Main](Program.cs#Main)]

## Conclusion
-That's it - seriously!
Very little change was needed, mostly deleting code and adding the special handling -of the anonymous expressions. Looking at the changes it should be clear that it is possible to support -runtime choice between JIT and full native compilation instead of deleting the JIT code. (Implementing -this feature is "left as an exercise for the reader" :wink:) +That's it - seriously! Very little change was needed, mostly deleting code and adding the +special handling of the anonymous expressions. Looking at the changes it should be clear +that it is possible to support runtime choice between JIT and full native compilation +instead of deleting the JIT code. (Implementing this feature is "left as an exercise for the +reader" :wink:) diff --git a/src/Samples/Kaleidoscope/Chapter9/Kaleidoscope-ch9.md b/src/Samples/Kaleidoscope/Chapter9/Kaleidoscope-ch9.md index a16fb9ad5..fb535c4ca 100644 --- a/src/Samples/Kaleidoscope/Chapter9/Kaleidoscope-ch9.md +++ b/src/Samples/Kaleidoscope/Chapter9/Kaleidoscope-ch9.md @@ -3,112 +3,125 @@ uid: Kaleidoscope-ch9 --- # 9. Kaleidoscope: Adding Debug Information -So far in the progress of the Kaleidoscope tutorials we've covered the basics of the language as a JIT -engine and even added ahead of time compilation into the mix so it is a full static compiled language. But -what happens if something goes wrong in one of the programs written in Kaleidoscope? How can a developer -debug applications written in this wonderful new language? Up until now, the answer is, you can't. This -chapter will add debugging information to the generated object file so that it is available for debuggers. - -Source level debugging uses formatted data bound into the output binaries that helps the debugger map the -state of the application to the original source code that created it. The exact format of the data depends -on the target platform but the general idea holds for all of them. 
In order to isolate front-end developers
-from the actual format - LLVM uses an abstract form of debug data that is based on the common DWARF
-debugging format. Internally, the LLVM target will transform the abstract representation into the actual
-target binary form.
+So far in the progress of the Kaleidoscope tutorials we've covered the basics of the
+language as a JIT engine and even added ahead of time compilation into the mix so it is a
+fully statically compiled language. But what happens if something goes wrong in one of the
+programs written in Kaleidoscope? How can a developer debug applications written in this
+wonderful new language? Up until now, the answer is: you can't. This chapter will add
+debugging information to the generated object file so that it is available for debuggers.
+
+Source level debugging uses formatted data bound into the output binaries that helps the
+debugger map the state of the application to the original source code that created it. The
+exact format of the data depends on the target platform but the general idea holds for all
+of them. In order to isolate front-end developers from the actual format, LLVM uses an
+abstract form of debug data that is based on the common DWARF debugging format. Internally,
+the LLVM target will transform the abstract representation into the actual target binary
+form.

>[!NOTE]
-> Debugging JIT code is rather complex as it requires awareness of the runtime within the debugger to
-> manage the execution and runtime state etc... Such functionality is beyond the scope of this tutorial.
+> Debugging JIT code is rather complex as it requires awareness of the runtime within the
+> debugger to manage the execution, runtime state, etc. Such functionality is beyond the
+> scope of this tutorial.

## Why is it a hard problem?
-Debugging is a tough problem for a number of reasons, mostly revolving around optimized code. Optimizations
-make keeping source level information more difficult.
In LLVM the original source location information is
-attached to each LLVM IR instruction. Optimization passes should keep the source location for any new
-instructions created, but merged instructions only get to keep a single source location. This is generally
-the cause of the observed "jumping around" when debugging optimized code. Additionally, optimizations can
-move variables in ways that are either optimized out, shared in memory, in registers or otherwise difficult
-to track. Thus, for the purposes of this tutorial we'll skip optimizations.
+Debugging is a tough problem for a number of reasons, mostly revolving around optimized
+code. Optimizations make keeping source level information more difficult. In LLVM the
+original source location information is attached to each LLVM IR instruction. Optimization
+passes should keep the source location for any new instructions created, but merged
+instructions only get to keep a single source location. This is generally the cause of the
+observed "jumping around" when debugging optimized code. Additionally, optimizations can
+move variables around so that they end up optimized out entirely, shared in memory, kept in
+registers, or otherwise difficult to track. Thus, for the purposes of this tutorial we'll
+skip optimizations.

## Setup for emitting debug information
-Debug information in Ubiquity.NET.Llvm is created with the [DIBuilder](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder).
-This is similar to the [InstructionBuilder](xref:Ubiquity.NET.Llvm.Instructions.InstructionBuilder). Using the
-DIBuilder requires a bit more knowledge on the general concepts of the DWARF debugging format, and
-in particular the [DebuggingMetadata](xref:llvm_sourcelevel_debugging) in LLVM. In Ubiquity.NET.Llvm you need
-to, create an instance of the DIBuilder class bound to a particular module. Such a builder is disposable and
-therefore requires a call to Dispose(). Normally this is handled in a `using` expression.
-
-Another important item for debug information is called the Compilation Unit. In Ubiquity.NET.Llvm that is the
-[DICompileUnit](xref:Ubiquity.NET.Llvm.DebugInfo.DICompileUnit). The compile unit is the top level scope for
-storing debug information generally it represents the full source file that was used to create the module.
-(Though with IR linking it is plausible that a module has multiple Compile Units associated).
-Unlike a builder it isn't something that is constructed without more information.
-Therefore, Ubiquity.NET.Llvm provides overloads for the creation of a module that includes the additional data
-needed to create the DICompileUnit for you. It is important to note that a DIBuilder may have ONLY one
-DICompileUnit and that unit is used for all of the debug nodes it builds. It must be set when finalizing
-the debug information in order to properly resolve items to the compilation unit.
-
-TODO: Discuss DIBuilder as a ref struct and that it must be passed through as part of the "visitor"
-
-Another point to note is that the module ID is derived from the source file path and the source file path
-is provided so that it becomes the root compile unit.
+Debug information in Ubiquity.NET.Llvm is created with the
+[DIBuilder](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder). This is similar to the
+[InstructionBuilder](xref:Ubiquity.NET.Llvm.Instructions.InstructionBuilder). Using the
+`DIBuilder` requires a bit more knowledge on the general concepts of the DWARF debugging
+format, and in particular the [Debug Data Representation](xref:llvm_sourcelevel_debugging)
+used in LLVM. In `Ubiquity.NET.Llvm` you need to create an instance of the `DIBuilder`
+class bound to a particular module. Such a builder is disposable and therefore requires a
+call to `Dispose()`. Normally this is handled in a `using` expression.
+
+Another important item for debug information is called the Compilation Unit.
In +`Ubiquity.NET.Llvm` that is the [DICompileUnit](xref:Ubiquity.NET.Llvm.DebugInfo.DICompileUnit). +The compile unit is the top level scope for storing debug information. Generally it +represents the full source file that was used to create the module. (Though with IR linking +it is plausible that a module has multiple Compile Units associated). Unlike a builder it +isn't something that is constructed without more information. Therefore, `Ubiquity.NET.Llvm` +provides overloads for the creation of a module that includes the additional data needed to +create the DICompileUnit for you. It is important to note that a DIBuilder may have ONLY one +DICompileUnit and that unit is used for all of the debug nodes it builds. It must be set +when finalizing the debug information in order to properly resolve items to the compilation +unit. + +Another point to note is that the module ID is derived from the source file path and the +source file path is provided so that it becomes the root compile unit. >[!IMPORTANT] -> It is important to note that when using the DIBuilder it must be "finalized" in order to resolve internal -> forward references in the debug metadata. The exact details of this aren't generally relevant, just -> remember that somewhere after generating all code and debug information to call the +> It is important to note that when using the DIBuilder it must be "finalized" in order to +> resolve internal forward references in the debug metadata. The exact details of this +> aren't generally relevant, just remember that somewhere after generating all code and +> debug information to call the > [Finish](xref:Ubiquity.NET.Llvm.DebugInfo.DIBuilder.Finish(Ubiquity.NET.Llvm.DebugInfo.DISubProgram)) -> method. (In Ubiquity.NET.Llvm this method is called Finish() to avoid conflicts with the .NET runtime defined -> Finalize() and to avoid confusion on the term as the idea of "finalization" has a very different meaning -> in .NET then what applies to the DIBuilder). 
+> method. (In `Ubiquity.NET.Llvm` this method is called Finish() to avoid conflicts with the
+> .NET runtime defined Finalize() and to avoid confusion on the term, as the idea of
+> "finalization" has a very different meaning in .NET than what applies to the `DIBuilder`.)

-The tutorial takes care of finishing the debug information in the generator's Generate method after
-completing code generation for the module.
+The tutorial takes care of finishing the debug information in the generator's Generate
+method after completing code generation for the module.

[!code-csharp[Generate](CodeGenerator.cs#Generate)]

## Functions
-With the basics of the DIBuilder and DICompile unit setup for the module it is time to focus on providing
-debug information for functions. This requires adding a few extra calls to build the context of the debug
-information for the function. The DWARF debug format that LLVM's debug metadata is based on calls these
-"SubPrograms". The new code builds a representation of the file the code is contained in as a new
-[DIFile](xref:Ubiquity.NET.Llvm.DebugInfo.DIFile). In this case that is a bit redundant as all the code comes from
-a single file and the compile unit already has the file info. However, that's not always true for all
-languages (i.e. some sort of Include mechanism) so the file is created. It's not a problem as LLVM will
-intern the file definition so that it won't actually end up with duplicates.
+With the basics of the DIBuilder and DICompileUnit set up for the module it is time to focus
+on providing debug information for functions. This requires adding a few extra calls to
+build the context of the debug information for the function. The DWARF debug format, which
+LLVM's debug representation of functions is based on, calls these "SubPrograms". The new
+code builds a representation of the file the code is contained in as a new
+[DIFile](xref:Ubiquity.NET.Llvm.DebugInfo.DIFile).
In this case that is a bit redundant as
+all the code comes from a single file and the compile unit already has the file info.
+However, that's not always true for all languages (e.g. languages with some sort of include
+mechanism) so the file is created. It's not a problem as LLVM will intern the file
+definition so that it won't actually end up with duplicates.

[!code-csharp[GetIrDeclareFunction](CodeGenerator.cs#GetOrDeclareFunction)]

## Debug Locations
-The AST contains full location information for each parsed node from the parse tree. This allows building
-debug location information for each node fairly easily. The general idea is to set the location in the
-InstructionBuilder so that it is applied to all instructions emitted until it is changed. This saves on
-manually adding the location on every instruction.
+The AST contains full location information for each parsed node from the parse tree. This
+allows building debug location information for each node fairly easily. The general idea is
+to set the location in the `InstructionBuilder` so that it is applied to all instructions
+emitted until it is changed. This avoids manually adding the location to every instruction.

[!code-csharp[EmitLocation](CodeGenerator.cs#EmitLocation)]

## Function Definition
-The next step is to update the function definition with attached debug information. The definition starts
-by pushing a new lexical scope that is the functions declaration. This serves as the parent scope for all
-the debug information generated for the function's implementation. The debug location info is cleared from
-the builder to set up all the parameter variables with alloca, as before. Additionally, the debug
-information for each parameter is constructed. After the function is fully generated the debug information
-for the function is finalized, this is needed to allow for any optimizations to occur at the function
-level.
+The next step is to update the function definition with attached debug information.
The
+definition starts by pushing a new lexical scope that is the function's declaration. This
+serves as the parent scope for all the debug information generated for the function's
+implementation. The debug location info is cleared from the builder to set up all the
+parameter variables with alloca, as before. Additionally, the debug information for each
+parameter is constructed. After the function is fully generated, the debug information for
+the function is finalized; this is needed to allow for any optimizations to occur at the
+function level.

[!code-csharp[DefineFunction](CodeGenerator.cs#FunctionDefinition)]

## Debug info for Parameters and Local Variables
-Debug information for parameters and local variables is similar but not quite identical. Thus, two new
-overloaded helper methods `AddDebugInfoForAlloca` handle attaching the correct debug information for
-parameters and local variables.
+Debug information for parameters and local variables is similar but not quite identical.
+Thus, two new overloaded helper methods `AddDebugInfoForAlloca` handle attaching the correct
+debug information for parameters and local variables.

[!code-csharp[CreateEntryBlockAlloca](CodeGenerator.cs#AddDebugInfoForAlloca)]

## Conclusion
-Adding debugging information in LLVM IR is rather straight forward. The bulk of the problem is in tracking
-the source location information in the parser. Fortunately for Ubiquity.NET.Llvm version of Kaleidoscope, the ANTLR4
-generated parsers do this for us already! Thus, combining the parser with Ubiquity.NET.Llvm makes building a full
-compiler for custom languages, including debug support a lot easier. The most "complex" part is handling the
-correct ownership semantics for a DIBuilder but that is generally enforced by the compiler as it is a
-`ref struct` type.
+Adding debugging information in LLVM IR is rather straightforward. The bulk of the problem
+is in tracking the source location information in the parser.
Fortunately, for the
+`Ubiquity.NET.Llvm` version of Kaleidoscope, the ANTLR4 generated parsers do this for us
+already! Thus, combining the parser with `Ubiquity.NET.Llvm` makes building a full compiler
+for custom languages, including debug support, a lot easier. The most "complex" part is
+handling the correct ownership semantics for a `DIBuilder` but that is generally enforced by
+the compiler.
diff --git a/src/Samples/Kaleidoscope/Kaleidoscope-Overview.md b/src/Samples/Kaleidoscope/Kaleidoscope-Overview.md
index 17e8e5510..fa22031a8 100644
--- a/src/Samples/Kaleidoscope/Kaleidoscope-Overview.md
+++ b/src/Samples/Kaleidoscope/Kaleidoscope-Overview.md
@@ -6,48 +6,45 @@ uid: Kaleidoscope-ch1

# 1. Kaleidoscope: Language Introduction
The general flow of this tutorial follows that of the official [LLVM tutorial](xref:llvm_kaleidoscope_tutorial)
-and many of the samples are lifted directly from that tutorial to make it easier to
-follow along both tutorials to see how the various LLVM concepts are projected in the
+and many of the samples are lifted directly from that tutorial to make it easier to follow
+along both tutorials to see how the various LLVM concepts are projected in the
`Ubiquity.NET.Llvm` library.

>[!NOTE]
-> The samples are all setup to include `<PublishAot>True</PublishAot>` and therefore
-> support AOT code generation. To use that you only need to run
-> `dotnet publish -r <RID>` to build the native standalone version
-> of the app. This demonstrates that the libraries are AOT compatible. While this makes
-> things run faster as no JIT is used, everything is already native code, it has the
-> drawback of making the app RID specific. That is, you must AOT build for EVERY
-> supported RID target. Each usage case must make a choice and there is no single
-> "one size fits all" answer. Thus, the samples and the library itself allow for, but
-> ***Do NOT*** require AOT builds.
+> The samples are all set up to include `<PublishAot>True</PublishAot>` and therefore support
+> AOT code generation.
To use that you only need to run
+> `dotnet publish -r <RID>` to build the native standalone version of the
+> app. This demonstrates that the libraries are AOT compatible. While this makes things run
+> faster, since no .NET JIT is used and everything is already native code, it has the
+> drawback of making the app RID specific. That is, you must AOT build for EVERY supported
+> RID target. Each usage case must make a choice and there is no single "one size fits all"
+> answer. Thus, the samples and the library itself allow for, but ***Do NOT require***, AOT
+> builds.

## Overview
-Kaleidoscope is a simple functional language that is used to illustrate numerous real
-world use cases for Ubiquity.NET.Llvm for code generation and JIT execution.
+Kaleidoscope is a simple functional language that is used to illustrate numerous real world
+use cases for Ubiquity.NET.Llvm for code generation and JIT execution.

-It is worth pointing out that this example is not intended as a treatise on compiler
-design nor on language parsing. While it contains many aspects of those topics the
-tutorial is, mostly, focused on the use of Ubiquity.NET.Llvm for code generation.
-Furthermore it isn't a trans-literation of the LLVM C++ sample as that would defeat
-one of the major points of `Ubiquity.NET.Llvm` - to provide a familiar API and use
-patterns familiar to C# developers.
+It is worth pointing out that this example is not intended as a treatise on compiler design
+nor on language parsing. While it contains many aspects of those topics the tutorial is,
+mostly, focused on the use of Ubiquity.NET.Llvm for code generation. Furthermore, it isn't
+a transliteration of the LLVM C++ sample as that would defeat one of the major points of
+`Ubiquity.NET.Llvm`: to provide a familiar API and use patterns familiar to C# developers.

## General layout
-The samples are built using common core libraries and patterns.
They are explicitly
-designed to make code comparisons between chapters via your favorite code comparison
-tool. Each, chapter builds on the next so running a comparison makes it easy to see
-the changes in full context. The text of the tutorials explains why the changes are
-made and a comparison helps provide the "big picture" view.
+The samples are built using common core libraries and patterns. They are explicitly designed
+to make code comparisons between chapters easy via your favorite code comparison tool. Each
+chapter builds on the previous one, so running a comparison makes it easy to see the changes
+in full context. The text of the tutorials explains why the changes are made and a
+comparison helps provide the "big picture" view.

## Variations from the Official LLVM Tutorial
-The Ubiquity.NET.Llvm version of the Kaleidoscope series takes a different route for
-parsing from the LLVM implementation. In particular the Ubiquity.NET.Llvm version
-defines a formal grammar using [ANTLR4](http://antlr.org) with the full grammar for
-all variations of the language features in a single assembly. Ultimately the parsing
-produces an [AST](xref:Kaleidoscope-AST) so that the actual technology used for the
-parse is hidden as an implementation detail. This helps in isolating the parsing from
-the use of Ubiquity.NET.Llvm for code generation and JIT compilation for interactive
-languages.
+The Ubiquity.NET.Llvm version of the Kaleidoscope series takes a different route for parsing
+from the LLVM implementation. In particular the Ubiquity.NET.Llvm version defines a formal
+grammar using [ANTLR4](http://antlr.org) with the full grammar for all variations of the
+language features in a single assembly. Ultimately the parsing produces an
+[AST](xref:Kaleidoscope-AST) so that the actual technology used for the parse is hidden as
+an implementation detail.
This helps in isolating the parsing from the use of +`Ubiquity.NET.Llvm` for code generation and JIT compilation for interactive languages. ## The Kaleidoscope Language ### General Concepts @@ -59,42 +56,41 @@ Kaleidoscope is a simple functional language with the following major features: * For loop style control flow * User defined operators - User defined operators can specify operator precedence - - User defined precedence is arguably the most complex part of parsing and - implementing the language. Though, ANTLR4's flexibility made it fairly easy to - do once fully understood. (more details in [Chapter 6](xref:Kaleidoscope-ch6)) + - User defined precedence is arguably the most complex part of parsing and implementing + the language. Though, ANTLR4's flexibility made it fairly easy to do once fully + understood. (more details in [Chapter 6](xref:Kaleidoscope-ch6)) ### Expressions -In Kaleidoscope, everything is an expression (e.g. everything has or returns a value -even if the value is a constant 0.0). There are no statements and no "void" functions -etc... +In Kaleidoscope, everything is an expression (e.g. everything has or returns a value even +if the value is a constant 0.0). There are no statements and no "void" functions etc... #### Multi-line expressions -There are a few different ways to represent an expression that is long enough to -warrant splitting it across multiple lines when typing it out. +There are a few different ways to represent an expression that is long enough to warrant +splitting it across multiple lines when typing it out. ##### Expression Continuation Marker -One mechanism for handling multi-line expressions that is used in most shell -scripting languages is a line continuation marker. In such cases a special character -followed by a line-termination char or char sequence indicates that the expression -continues on the next line (e.g. it isn't complete yet). 
+One mechanism for handling multi-line expressions that is used in most shell scripting
+languages is a line continuation marker. In such cases a special character followed by a
+line-termination char or char sequence indicates that the expression continues on the next
+line (e.g. it isn't complete yet).

##### Expression Complete Marker
-Another approach to handling long expressions spanning multiple lines is basically
-the opposite of line continuation, expression complete markers. This marker indicates
-the end of a potentially multi-line expression. (A variant of this might require a
-line termination following the marker as with the line continuation).
+Another approach to handling long expressions spanning multiple lines is basically the
+opposite of line continuation: expression complete markers. This marker indicates the end
+of a potentially multi-line expression. (A variant of this might require a line termination
+following the marker as with the line continuation).

##### Kaleidoscope Implementation
-The original LLVM C++ implementation chose the expression completion approach using
-a semicolon as the completion. (So it seems familiar like statements in other C like
-languages) Therefore, the Ubiquity.NET.Llvm tutorial follows the same design.
-[Implementing the line continuation mechanism in Kaleidoscope is left as an exercise
-for the reader - though if you come up with a mechanism to support either that is
-determined by the calling application; PRs are welcome! :wink:]
+The original LLVM C++ implementation chose the expression completion approach using a
+semicolon as the completion. (So it seems familiar, like statements in other C-like
+languages.) Therefore, the Ubiquity.NET.Llvm tutorial follows the same design. [Implementing
+the line continuation mechanism in Kaleidoscope is left as an exercise for the reader -
+though if you come up with a mechanism to support either that is determined by the calling
+application; PRs are welcome! 
:wink:]

### Example
-The following example is a complete program in Kaleidoscope that will generate a
-textual representation of the classic Mandelbrot Set.
+The following example is a complete program in Kaleidoscope that will generate a textual
+representation of the classic Mandelbrot Set.

[!code-Kaleidoscope[mandel.kls](mandel.kls)]

@@ -102,9 +98,8 @@ When entered ( or copy/pasted) to the command line Kaleidoscope will print out t
following:

>[!NOTE]
->This example uses features of the language only enabled/discussed in Chapter 6 of
->the tutorial.The runtime from chapters 3-5 will generate errors trying to parse this
->code.
+>This example uses features of the language only enabled/discussed in Chapter 6 of the
+> tutorial. The runtime from chapters 3-5 will generate errors trying to parse this code.

```shell
Ready>mandel(-2.3, -1.3, 0.05, 0.07);
@@ -154,9 +149,8 @@ Ready>
```

## Conclusion
-Kaleidoscope is a simple language with a good deal of functionality. This serves as
-a great language to study the use of Ubiquity.NET.Llvm for code generation and
-Domain Specific Languages. While, generally speaking, the functionality of the
-`Ubiquity.NET.Llvm` version of this tutorial differs only slightly from that of the
-official LLVM version, it serves well as an example of what you can do with
-`Ubiquity.NET.Llvm.`
+Kaleidoscope is a simple language with a good deal of functionality. This serves as a great
+language to study the use of `Ubiquity.NET.Llvm` for code generation and Domain Specific
+Languages (DSLs) in general. 
While, generally speaking, the functionality of the
+`Ubiquity.NET.Llvm` version of this tutorial differs only slightly from that of the official
+LLVM version, it serves well as an example of what you can do with `Ubiquity.NET.Llvm`.
diff --git a/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/AST/Kaleidoscope-AST.md b/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/AST/Kaleidoscope-AST.md
index b8d3a978f..2f4561270 100644
--- a/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/AST/Kaleidoscope-AST.md
+++ b/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/AST/Kaleidoscope-AST.md
@@ -3,12 +3,13 @@ uid: Kaleidoscope-AST
---
# Kaleidoscope Abstract Syntax Tree
-As with many language parsing systems Kaleidoscope leverages an Abstract Syntax Tree (AST) to simplify
-generating code from the parsed language. Each type of node in the tree implements the IAstNode interface
+As with many language parsing systems Kaleidoscope leverages an Abstract Syntax Tree (AST)
+to simplify generating code from the parsed language. Each type of node in the AST
+implements the `IAstNode` interface.

-This interface provides the basic properties of any node in the tree for common uses. The Kaleidoscope
-language is a simple one and, therefore, has only a few kinds of nodes. The AST consist of the following
-basic categories of nodes:
+This interface provides the basic properties of any node in the tree for common uses. The
+Kaleidoscope language is a simple one and, therefore, has only a few kinds of nodes. The AST
+consists of the following basic categories of nodes:
* [Root Node](#root-node)
* [Function Declaration](#function-declaration)
* [Function Definition](#function-definition)
@@ -32,8 +33,8 @@ support for all nodes.

[!code-csharp[RootNode](RootNode.cs)]

## Function Definition
-FunctionDefinition, as the name implies, contains the definition of a function. This includes the signature
-and the full body of the function. 
+FunctionDefinition, as the name implies, contains the definition of a function. This
+includes the signature and the full body of the function.

[!code-csharp[FunctionDefinition](FunctionDefinition.cs)]

@@ -41,10 +42,10 @@ and the full body of the function.
[!code-csharp[Function Signatures (Prototype)](Prototype.cs)]

## Variable Declaration
-IVariableDeclaration is implemented by local variable declarations and parameter declarations. The
-interface abstracts the differences between the two types of variable declarations for most common
-cases. Most code generation or AST consumers don't care about the differences (i.e. Parameters have
-an index but locals don't)
+`IVariableDeclaration` is implemented by local variable declarations and parameter
+declarations. The interface abstracts the differences between the two types of variable
+declarations for most common cases. Most code generation or AST consumers don't care about
+the differences (i.e. parameters have an index but locals don't).

[!code-csharp[IVariableDeclaration](IVariableDeclaration.cs)]

@@ -55,30 +56,31 @@ an index but locals don't)

[!code-csharp[Parameter declarations](ParameterDeclaration.cs)]

## Expression
-Kaleidoscope is a functional language, all expressions produce a value, even if it is always zero. There
-are no statements in the language. Expressions form the core of the language and the bulk of the AST.
+Kaleidoscope is a functional language: all expressions produce a value, even if it is always
+zero. There are no statements in the language. Expressions form the core of the language and
+the bulk of the AST.

-The IExpression interface forms the common interface for all AST expression nodes
+The `IExpression` interface forms the common interface for all AST expression nodes.

[!code-csharp[IExpression](IExpression.cs)]

-While this is an empty interface, it serves to distinguish between AST nodes that are not expressions.
-Thus providing some type safety for consumers. (i.e. 
it makes no sense to have a prototype as the operand
-for a binary operator so only nodes that implement the IExpression tag interface are allowed) This isn't
-a common or generally recommended pattern for interfaces but makes sense here since some form of differentiation
-is needed.
+While this is an empty interface, it serves to distinguish expression nodes from AST nodes
+that are not expressions, providing some type safety for consumers. (i.e. it makes no sense
+to have a prototype as the operand for a binary operator, so only nodes that implement the
+`IExpression` tag interface are allowed) This isn't a common or generally recommended
+pattern for interfaces but makes sense here since some form of differentiation is needed.

### Unary Operator
-Unary operators are all user defined, so the AST simply represents them as a Function Definition. No
-additional node types are needed for unary operators in the AST.
+Unary operators are all user defined, so the AST simply represents them as a function
+definition. No additional node types are needed for unary operators in the AST.

### Binary Operator
-BinaryOperatorExpression covers the built-in operators, any user defined binary operators are transformed
-to a function declaration/definition
+`BinaryOperatorExpression` covers the built-in operators; any user defined binary operators
+are transformed to a function declaration/definition.

[!code-csharp[BinaryOperatorExpression](BinaryOperatorExpression.cs)]

-The properties are fairly self explanatory, including the kind of operator and the left and right sides of the
-operator. The normal code generator pattern for the binary operators is:
+The properties are fairly self-explanatory, including the kind of operator and the left and
+right sides of the operator. The normal code generator pattern for the binary operators is:

1. Generate code for the left side expression to a new value
2. Generate code for the right side expression to a new value
@@ -86,28 +88,32 @@ operator. 
The normal code generator pattern for the binary operators is:
4. Return the result

#### Assignment
-Assignment is a special kind of binary operator to represent "store" semantics for a variable. (e.g. mutable variables).
-Code generation for the assignment must handle the left side operand with a slightly different pattern. In particular,
-the left hand side is not an evaluated expression. Instead, it is the variable to assign the right hand value to. Thus,
-there isn't anything to evaluate for the left hand side as it is always a Variable Reference for the variable to assign
-the value to.
+Assignment is a special kind of binary operator to represent "store" semantics for a
+variable (e.g. mutable variables). Code generation for the assignment must handle the left
+side operand with a slightly different pattern. In particular, the left hand side is not an
+evaluated expression. Instead, it is the variable to assign the right hand value to. Thus,
+there isn't anything to evaluate for the left hand side as it is always a Variable Reference
+for the variable to assign the value to.

### Function Call
-Calls to functions (extern, user defined operators, or user defined functions) are represented in the AST as a
-FunctionCallExpression. The FunctionCallExpression contains the declaration of the function to call along with
-expressions for all of the arguments to the function.
+Calls to functions (extern, user defined operators, or user defined functions) are
+represented in the AST as a `FunctionCallExpression`. The `FunctionCallExpression` contains
+the declaration of the function to call along with expressions for all of the arguments to
+the function.

[!code-csharp[FunctionCallExpression](FunctionCallExpression.cs)]

### Variable Reference
-A variable reference is used to refer to a variable. In most cases this represents implicit "load" semantics for a
-variable. However, when used as the left hand side of an assignment operator, it has "store" semantics. 
+A variable reference is used to refer to a variable. In most cases this represents implicit +"load" semantics for a variable. However, when used as the left hand side of an assignment +operator, it has "store" semantics. [!code-csharp[VariableReferenceExpression](VariableReferenceExpression.cs)] ### Conditional -In Kaleidoscope conditional expressions follow the familiar if/then/else form, even though they are really more -like the ternary operator expression `( x ? y : z )` in C and related languages. +In Kaleidoscope conditional expressions follow the familiar if/then/else form, even though +they are really more like the ternary operator expression `( x ? y : z )` in C and related +languages. [!code-csharp[ConditionalExpression](ConditionalExpression.cs)] @@ -120,4 +126,3 @@ The for in expression is used to implement loops in Kaleidoscope. Var-In Expression is used to provide, potentially nested, local scopes for variables [!code-csharp[VarInExpression](VarInExpression.cs)] - diff --git a/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/ReadMe.md b/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/ReadMe.md index 060f66957..3aa9f92ee 100644 --- a/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/ReadMe.md +++ b/src/Samples/Kaleidoscope/Kaleidoscope.Grammar/ReadMe.md @@ -1,15 +1,14 @@ # Kaleidoscope Parser This library implements a Lexer, Parser and AST for the Ubiquity.NET.Llvm Kaleidoscope -tutorial. It currently leverages ANTLR4 to generate a parser and lexer core but the -actual parse technology is abstracted by the AST so it should be possible to use -any parse technology desired. (Though given what ANTLR4 provides it would take a -strong argument to use any other technology). +tutorial. It currently leverages ANTLR4 to generate a parser and lexer core but the actual +parse technology is abstracted by the AST so it should be possible to use any parse +technology desired. (Though given what ANTLR4 provides it would take a strong argument to +use any other technology). 
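To make the AST abstraction described above concrete, here is a toy, self-contained sketch of a consumer that works only against a marker-interface-based expression tree, with no dependency on any parser. The type names (`IExpression`, `ConstantExpression`, `BinaryOperatorExpression`, `Evaluator`) are hypothetical stand-ins for illustration, not the actual Kaleidoscope sample types:

```csharp
using System;

// Toy sketch (NOT the actual Kaleidoscope sample types): an expression tree
// using an empty marker interface, plus the binary-operator pattern from the
// AST docs: evaluate left, evaluate right, apply the operator, return result.
public interface IExpression { }

public sealed record ConstantExpression( double Value ) : IExpression;

public sealed record BinaryOperatorExpression( char Op, IExpression Left, IExpression Right ) : IExpression;

public static class Evaluator
{
    // A consumer of the tree; it depends only on the node types, not on how
    // the tree was produced (ANTLR4 or any other parse technology).
    public static double Evaluate( IExpression node ) => node switch
    {
        ConstantExpression c => c.Value,
        BinaryOperatorExpression b => Apply( b.Op, Evaluate( b.Left ), Evaluate( b.Right ) ),
        _ => throw new ArgumentException( "Unknown node type" )
    };

    private static double Apply( char op, double lhs, double rhs ) => op switch
    {
        '+' => lhs + rhs,
        '-' => lhs - rhs,
        '*' => lhs * rhs,
        '/' => lhs / rhs,
        _ => throw new ArgumentException( $"Unknown operator '{op}'" )
    };
}
```

A real code generator follows the same shape but emits LLVM IR for each node instead of computing values directly; the marker interface guarantees at compile time that only expression nodes can appear as operands.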
## Language Syntax
-The language syntax follows that of the official LLVM C++ tutorial though, unlike
-the C++ version, all versions of the language use the same parser library. To
-accomplish this, the grammar takes advantage of the dynamic parsing support found in
-ANTLR4 so that various language features are enabled at runtime. This keeps the
-individual chapter samples a bit cleaner and focused on the use of
-`Ubiquity.NET.Llvm` instead of parsing techniques.
+The language syntax follows that of the official LLVM C++ tutorial, though, unlike the C++
+version, all versions of the language use the same parser library. To accomplish this, the
+grammar takes advantage of the dynamic parsing support found in ANTLR4 so that various
+language features are enabled at runtime. This keeps the individual chapter samples a bit
+cleaner and focused on the use of `Ubiquity.NET.Llvm` instead of parsing techniques.
diff --git a/src/Samples/Kaleidoscope/Kaleidoscope.Runtime/Kaleidoscope-Runtime.md b/src/Samples/Kaleidoscope/Kaleidoscope.Runtime/Kaleidoscope-Runtime.md
index 4c3d2ea84..6a3205d6f 100644
--- a/src/Samples/Kaleidoscope/Kaleidoscope.Runtime/Kaleidoscope-Runtime.md
+++ b/src/Samples/Kaleidoscope/Kaleidoscope.Runtime/Kaleidoscope-Runtime.md
@@ -3,31 +3,35 @@ uid: Kaleidoscope-Runtime
---
# Kaleidoscope.Runtime Library
-The Kaleidoscope.Runtime Library provides a set of common support libraries to aid in keeping the
-tutorial chapter code focused on the code generation and JIT support in Ubiquity.NET.Llvm rather then the
-particulars of the Kaleidoscope language in general. It serves as a useful reference for the implementation
-of other custom DSLs. 
(Along with a core part of the sample infrastructure of this repository)
+The Kaleidoscope.Runtime Library provides a set of common support libraries to aid in
+keeping the tutorial chapter code focused on the code generation and JIT support in
+`Ubiquity.NET.Llvm` rather than the particulars of the Kaleidoscope language in general. It
+serves as a useful reference for the implementation of other custom DSLs. (Along with a core
+part of the sample infrastructure of this repository)

## Kaleidoscope specific REPL Loop support
-The Kaleidoscope.Runtime library contains a language/runtime specific implementation of the classic Read,
-Evaluate, Print, Loop (REPL) common for interactive/interpreted/JIT language run-times. This uses an
-asynchronous pattern and allows cancellation via a standard cancellation token. This supports a clean shutdown
-via a CTRl-C handler etc...
+The Kaleidoscope.Runtime library contains a language/runtime specific implementation of the
+classic Read, Evaluate, Print, Loop (REPL) common for interactive/interpreted/JIT language
+run-times. This uses an asynchronous pattern and allows cancellation via a standard
+cancellation token. This supports a clean shutdown via a Ctrl-C handler etc...

## Kaleidoscope JIT engine
-The JIT engine used for Kaleidoscope is based on the Ubiquity.NET.Llvm OrcJIT v2, which, unsurprisingly, uses
-the LLVM OrcJit functionality to provide On Request Compilation (ORC). For most of the chapters, the JIT uses
-a moderately lazy compilation technique where the source language is parsed, converted to LLVM IR and submitted
-to the JIT engine. The JIT engine does not immediately generate native code from the module, however. Instead
-it stores the module, and whenever compiled code calls to a symbol exported by the IR module, it will then
-generate the native code for the function "on the fly". 
This has the advantage of not paying the price of
-converting IR to native code if it is never actually used, though it does have the cost of converting the
-source language to IR, even if the code will never execute.
+The JIT engine used for Kaleidoscope is based on the Ubiquity.NET.Llvm OrcJIT v2, which,
+unsurprisingly, uses the LLVM OrcJit functionality to provide On Request Compilation (ORC).
+For most of the chapters, the JIT uses a moderately lazy compilation technique where the
+source language is parsed, converted to LLVM IR and submitted to the JIT engine. The JIT
+engine does not immediately generate native code from the module, however. Instead it stores
+the module, and whenever compiled code calls to a symbol exported by the IR module, it will
+then generate the native code for the function "on the fly". This has the advantage of not
+paying the price of converting IR to native code if it is never actually used, though it
+does have the cost of converting the source language to IR, even if the code will never
+execute.

### Really lazy compilation
-While the basic lazy compilation of IR to native code has performance benefits over a pure interpreter, it
-still has the potential for wasted overhead converting the parsed language to LLVM IR. Fortunately, the LLVM
-and Ubiquity.NET.Llvm.OrcJitv2 supports truly lazy compilation. This is done by asking the JIT to create a
-stub for a named symbol and then, whenever code calls that symbol the stub calls back to the JIT which then
-calls back the application to 'materialize' the IR, add the module to the JIT and trigger compilation to
+While the basic lazy compilation of IR to native code has performance benefits over a pure
+interpreter, it still has the potential for wasted overhead converting the parsed language
+to LLVM IR. Fortunately, LLVM and `Ubiquity.NET.Llvm.OrcJITv2` support truly lazy
+compilation. 
This is done by asking the JIT to create a stub for a named symbol and then,
+whenever code calls that symbol the stub calls back to the JIT which then calls back the
+application to 'materialize' the IR, add the module to the JIT and trigger compilation to
native. Thus, achieving true Just-In-Time compilation.
diff --git a/src/Samples/OrcV2VeryLazy/OrcV2VeryLazy.md b/src/Samples/OrcV2VeryLazy/OrcV2VeryLazy.md
index af0854922..71806b0a0 100644
--- a/src/Samples/OrcV2VeryLazy/OrcV2VeryLazy.md
+++ b/src/Samples/OrcV2VeryLazy/OrcV2VeryLazy.md
@@ -2,14 +2,13 @@ uid: orcjitv2-very-lazy
---
# ORC JIT v2 Very Lazy sample
-This sample is based on the official LLVM C sample but adapted to demonstrate the use
-of the `Ubiquity.NET.llvm` libraries. The sample builds a basic native function that
-is provided to the JIT engine. When executed that function calls to an unresolved
-function. The unresolved function body is materialized through a delegate that will
-parse the LLVM IR for the body to produce the required module. It then "emits" that
-module to the JIT engine before returning. This demonstrates how lazy JIT symbol
-resolution and materializers operate to allow use with any source. In this sample the
-source is just LLVM IR in textual form but that is not a requirement. (It helps to
-keep the sample as simple as possible without crowding it with parsing and other
-language specific cruft. For an example using a custom language AST see
+This sample is based on the official LLVM C sample but adapted to demonstrate the use of the
+`Ubiquity.NET.Llvm` libraries. The sample builds a basic native function that is provided to
+the JIT engine. When executed, that function calls to an unresolved function. The unresolved
+function body is materialized through a delegate that will parse the LLVM IR for the body to
+produce the required module. It then "emits" that module to the JIT engine before returning. 
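The "materialize on first call, cache thereafter" idea behind the stub mechanism can be sketched with a toy, self-contained class. This is NOT the OrcJIT v2 API; `LazySymbolTable` and its members are hypothetical names illustrating only the control flow:

```csharp
using System;
using System.Collections.Generic;

// Toy sketch of the "really lazy" stub idea (NOT the OrcJIT v2 API): a lookup
// returns a stub delegate; the first call through the stub invokes the
// registered materializer to produce the real implementation, which is cached
// so later calls never materialize again.
public sealed class LazySymbolTable
{
    private readonly Dictionary<string, Func<Func<double, double>>> materializers = new( );
    private readonly Dictionary<string, Func<double, double>> compiled = new( );

    // Register a symbol without doing any work yet.
    public void AddLazy( string name, Func<Func<double, double>> materializer )
        => materializers[ name ] = materializer;

    // The returned delegate plays the role of the JIT stub for the symbol.
    public Func<double, double> Lookup( string name ) => arg =>
    {
        if( !compiled.TryGetValue( name, out var impl ) )
        {
            impl = materializers[ name ]( ); // "materialize" on first call only
            compiled[ name ] = impl;
        }

        return impl( arg );
    };
}
```

Looking up a symbol is cheap; the expensive step (in the real JIT, parsing plus IR-to-native compilation) is deferred until a call actually flows through the stub, and happens at most once per symbol.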
+This demonstrates how lazy JIT symbol resolution and materializers operate to allow use with +any source. In this sample the source is just LLVM IR in textual form but that is not a +requirement. (It helps to keep the sample as simple as possible without crowding it with +parsing and other language specific cruft. For an example using a custom language AST see [Kaleidoscope Chapter 7.1](xref:Kaleidoscope-ch7.1)) diff --git a/src/Ubiquity.NET.ANTLR.Utils/ReadMe.md b/src/Ubiquity.NET.ANTLR.Utils/ReadMe.md index fb2546f37..7abac0ebf 100644 --- a/src/Ubiquity.NET.ANTLR.Utils/ReadMe.md +++ b/src/Ubiquity.NET.ANTLR.Utils/ReadMe.md @@ -1,18 +1,18 @@ # About -This library provides general extensions to ANTLR including adapter bindings -for the Ubiquity.NET.Runtime library. +This library provides general extensions to ANTLR including adapter bindings for the +`Ubiquity.NET.Runtime` library. # Key usage -* Get a SourceLocation from various ANTLR types (rule,tokens,terminals) - - This provides an adaptation to the abstract SourceLocation +* Get a SourceLocation from various ANTLR types (rule, tokens, terminals) + - This provides an adaptation to the abstract source location * Debug trace listener - Provides debug TRACE support for any parser by listening for every rule and using Debug.Trace() to generate a string representation of that rule. This is VERY useful when developing or debugging a grammar. * Adapter for parse error listeners to a unified and abstract `Ubiquity.NET.Runtime.IParseErrorListener`. - - This allows building consumers that deal with errors and remain independent of the parsing - technology. + - This allows building consumers that deal with errors and remain independent of the + parsing technology. * Extension functions that provides commonly used support for ANTLR - Get a character interval from a ParserRuleContext with support for the standard EOF rule. @@ -20,6 +20,6 @@ for the Ubiquity.NET.Runtime library. 
- Gets the source text from a rule context and recognizer that produced it. - Gets source text from a rule context and stream that it was parsed from. - Gets a unique ID for a parse tree - * Useful for building graphs of the result of parsing as many graphing representations - require a unique node id for every node in the graph. + * Useful for building graphs of the result of parsing as many graphing + representations require a unique node id for every node in the graph. diff --git a/src/Ubiquity.NET.InteropHelpers/Readme.md b/src/Ubiquity.NET.InteropHelpers/Readme.md index 5adfffb7b..71cf94c5e 100644 --- a/src/Ubiquity.NET.InteropHelpers/Readme.md +++ b/src/Ubiquity.NET.InteropHelpers/Readme.md @@ -8,20 +8,20 @@ library for any code base providing interop support. * String handling * A lot of interop deals with strings in some form or another and handling them is a major amount of effort for most interop libraries. The support provided here - enables lazy evaluation/marshalling and encoding of native strings and managed strings. - These allow a simple `byte[]` to store a native string and ONLY marshals to a UTF16 - managed string once when needed. This allows storing and passing strings in their - native form for FAST retrieval from a native call and then providing that same string - as an `in` parameter in another call. All without the need to marshal from native and - then back again just for the call. This is a MAJOR performance enhancement for APIs - that deal in strings. + enables lazy evaluation/marshalling and encoding of native strings and managed + strings. These allow a simple `byte[]` to store a native string and ONLY marshals to a + UTF16 managed string once when needed. This allows storing and passing strings in + their native form for FAST retrieval from a native call and then providing that same + string as an `in` parameter in another call. All without the need to marshal from + native and then back again just for the call. 
This is a MAJOR performance enhancement + for APIs that deal in strings. * Delegates and NativeCallbacks as Function pointers * Function pointers are a new feature of C# that makes for very high performance interop scenarios. However, sometimes the callback for a function pointer actually needs additional data not part of the parameters of the function to work properly. This library provides support for such scenarios where a delegate is used to "capture" the - data while still supporting AOT scenarios. (NOTE: Marshal.GetFunctionPointerForDelegate() - must dynamically emit a thunk that contains the proper signature and the captured - "this" pointer so is NOT AOT friendly) The support offered in this library, though a - bit more tedious, is AOT friendly. + data while still supporting AOT scenarios. (NOTE: + `Marshal.GetFunctionPointerForDelegate()` must dynamically emit a thunk that contains + the proper signature and the captured "this" pointer so is NOT AOT friendly) The + support offered in this library, though a bit more tedious, is AOT friendly. diff --git a/src/Ubiquity.NET.Llvm/OrcJITv2/ReadMe.md b/src/Ubiquity.NET.Llvm/OrcJITv2/ReadMe.md index 6d99e6bd9..7c63f5a88 100644 --- a/src/Ubiquity.NET.Llvm/OrcJITv2/ReadMe.md +++ b/src/Ubiquity.NET.Llvm/OrcJITv2/ReadMe.md @@ -1,9 +1,9 @@ # LLVM ORC JIT v2 support ## Known issues -There is a [known bug](https://github.com/llvm/llvm-project/issues/65641) with LLVM -JIT object creation. The default LLJIT exposed by the LLVM-C API library does NOT -use JitLink as recommended. Changing to support that, may alleviate the problem. But, -as of this build it can still happen. Experience has shown it only appears to hit for -a Release build with an attached debugger. So, it's a very small window of opportunity -to hit but when it does, it's a HARD crash. +There is a [known bug](https://github.com/llvm/llvm-project/issues/65641) with LLVM JIT +object creation. 
The default LLJIT exposed by the LLVM-C API library does NOT use JITLink
+as recommended. Changing to support that may alleviate the problem. But, as of this build
+it can still happen. Experience has shown it only appears to hit for a Release build with
+an attached debugger. So, it's a very small window of opportunity to hit but when it does,
+it's a HARD crash.
diff --git a/src/Ubiquity.NET.Llvm/ReadMe.md b/src/Ubiquity.NET.Llvm/ReadMe.md
index b4dee5f9e..e3d969df1 100644
--- a/src/Ubiquity.NET.Llvm/ReadMe.md
+++ b/src/Ubiquity.NET.Llvm/ReadMe.md
@@ -1,8 +1,8 @@
# About
Ubiquity.NET.Llvm provides a managed Object Oriented (OO) wrapper around an extended
C ABI for LLVM (An extended form of the official LLVM-C ABI is needed to fill in some gaps
-in the official implementation to provide a robust managed OO wrapper. The number of extensions
-required generally decreases with each release of LLVM).
+in the official implementation to provide a robust managed OO wrapper. The number of
+extensions required generally decreases with each release of LLVM).

## Key Features
* OO Wrapper around the LLVM API that closely follows the underlying C++ object model
diff --git a/src/Ubiquity.NET.Llvm/Transforms/Legacy/ReadMe.md b/src/Ubiquity.NET.Llvm/Transforms/Legacy/ReadMe.md
index ba974b6f9..757b51e6a 100644
--- a/src/Ubiquity.NET.Llvm/Transforms/Legacy/ReadMe.md
+++ b/src/Ubiquity.NET.Llvm/Transforms/Legacy/ReadMe.md
@@ -1,10 +1,11 @@
## Legacy Pass Manager support
LLVM has changed the mechanisms for optimization and passes over the years. Sadly that means
-there are two distinct sets of mechanics for the passes. Worse, that LLVM itself is apparently
-still in transition (perhaps in permanent bifurcation?) so that BOTH forms are used depending
-on the context. According to [LLVM docs](https://llvm.org/docs/WritingAnLLVMPass.html#introduction-what-is-a-pass)
-the legacy pass manager is still used for code generation. 
Though the definition of that term
-is left undefined it is assumed to mean the final stages of compilation that generate the
-target machine code. This much is at least consistent with the naming of things and support in
-the C-API. (You can really only add the target specific analysis passes to a legacy pass manager).
-For everything else you need the new pass manager support.
+there are two distinct sets of mechanics for the passes. Worse, LLVM itself is
+apparently still in transition (perhaps in permanent bifurcation?) so that BOTH forms are
+used depending on the context. According to
+[LLVM docs](https://llvm.org/docs/WritingAnLLVMPass.html#introduction-what-is-a-pass) the
+legacy pass manager is still used for code generation. Though the definition of that term is
+left undefined, it is assumed to mean the final stages of compilation that generate the
+target machine code. This much is at least consistent with the naming of things and support
+in the C-API. (You can really only add the target specific analysis passes to a legacy pass
+manager). For everything else you need the new pass manager support.
diff --git a/src/Ubiquity.NET.SrcGeneration/ReadMe.md b/src/Ubiquity.NET.SrcGeneration/ReadMe.md
index f0d39e39c..133bd78f7 100644
--- a/src/Ubiquity.NET.SrcGeneration/ReadMe.md
+++ b/src/Ubiquity.NET.SrcGeneration/ReadMe.md
@@ -44,7 +44,8 @@ While other languages are possible this is the only one currently "built-in".
   for use in determining if analyzers apply or not. (Usually analyzers are set to ignore
   generated code as it isn't something that is controlled by the developer).
* Write an auto generated comment as a scope to allow adding custom content in the comment - * Closing of the comment region doesn't occur until the return is Disposed (RAII pattern) + * Closing of the comment region doesn't occur until the return is Disposed (RAII + pattern) * Write a namespace scope * All subsequent content is indented for the scope * Closing of the scope doesn't occur until the return is Disposed (RAII pattern).
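The RAII scope pattern described above can be sketched with a minimal, self-contained writer. The names here (`CodeWriter`, `BeginNamespaceScope`) are hypothetical stand-ins, not the actual Ubiquity.NET.SrcGeneration API:

```csharp
using System;
using System.Text;

// Toy sketch of the RAII scope pattern (NOT the actual
// Ubiquity.NET.SrcGeneration API): opening a scope writes its header and
// indents subsequent content; the closing brace is not written until the
// returned value is Disposed, so a C# `using` block delimits the scope.
public sealed class CodeWriter
{
    private readonly StringBuilder buffer = new( );
    private int indentLevel;

    public void WriteLine( string text )
        => buffer.AppendLine( new string( ' ', indentLevel * 4 ) + text );

    public IDisposable BeginNamespaceScope( string name )
    {
        WriteLine( $"namespace {name}" );
        WriteLine( "{" );
        ++indentLevel;
        return new Scope( this );
    }

    public override string ToString( ) => buffer.ToString( );

    private sealed class Scope : IDisposable
    {
        private readonly CodeWriter owner;

        internal Scope( CodeWriter owner ) => this.owner = owner;

        public void Dispose( )
        {
            --owner.indentLevel;
            owner.WriteLine( "}" );
        }
    }
}
```

For example, `using( writer.BeginNamespaceScope( "Demo" ) ) { writer.WriteLine( "class C { }" ); }` produces an indented `class C { }` inside a properly closed `namespace Demo` block; nesting scopes nests the indentation automatically.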