Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Create LLVM scope class for use with LLVM libraries #83

Merged
merged 7 commits into from
Jul 15, 2022

Conversation

kparzysz-quic
Copy link

@kparzysz-quic kparzysz-quic commented Jun 27, 2022

@kparzysz-quic
Copy link
Author

kparzysz-quic commented Jun 27, 2022

Rendered (updated to include 8a739f8).

@tqchen
Copy link
Member

tqchen commented Jun 29, 2022

cc @junrushao1994

// Let's see the LLVM IR and MIR after each transformation, i.e. use
// -print-after-all in codegen.
my_target = Target("llvm -mtriple myarch-unknown-elf -llvm-options=print-after-all");
LLVMTarget llvm_target(my_target);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @kparzysz-quic . I wonder if there could be potential mis-use if a user created multiple LLVMTarget, since target is more like an config, rather than ensuring a scoping info.

A different name might help(e.g. something that implies scope), alternatively, we can also aligns to our current With convention, which clearly indicate scoping

With<LLVMTarget> llvm_scope(llvm_target);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think renaming LLVMTarget to LLVMScope is probably technically a better approach at the moment, although personally I like With a bit better (since it resembles the Python idiom, both visually and functionally).

The problem is when we want to initialize the LLVM scope by deserializing llvm::Module (either from file, or from a string). Since the target string is stored in the module, the llvm::Module needs to be created first (which means that it would be created before the scope object itself). To hide the gap between creating llvm::Module, and creating the scope object, the LLVMTarget (using old naming) implements both steps in a single function call. This call then returns both, the llvm::Module and the scope object.

The issue with With in the above situation is that it doesn't apply very nicely to this case. The return type (currently named ModuleData (for a lack of a better idea), is

std::pair<std::unique_ptr<llvm::Module>, std::unique_ptr<LLVMTarget>>

With With, it would become

std::pair<std::unique_ptr<llvm::Module>, std::unique_ptr<With<LLVMTarget>>>

which separates the "with" from its scope. This is not formally wrong, but loses the visual impact of "with".

Krzysztof Parzyszek added 3 commits June 29, 2022 12:36
Clarify that this RFC proposes a scope object that can be used later
to save/restore LLVM's options. The saving and restoring itself is
not a part of the RFC, since the `llvm` target in TVM does not yet
allow passing LLVM command line flags.
Krzysztof Parzyszek added 2 commits June 29, 2022 12:40
There was another passage about it that was ommitted from previous commits.
Copy link
Contributor

@areusch areusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

broadly LGTM, but let's let TQ/Junru and any others comment on this/approve as well. unifying the interface between TVM Target and LLVM options seems like a net win for code complexity. my one question, which i think is mainly open-ended now, is how we handle LLVM arch that have unique flags that, right now, would show up in TVM as target-specific llvm flags.

@kparzysz-quic
Copy link
Author

my one question, which i think is mainly open-ended now, is how we handle LLVM arch that have unique flags that, right now, would show up in TVM as target-specific llvm flags.

I'd have to see a practical example of that to get an idea of what's expected. For Hexagon we'd want to automatically append certain flags (that enable auto-vectorization in LLVM, for example). I had one idea for that, but it may need a revision.

@kparzysz-quic
Copy link
Author

@tqchen @junrushao1994 Do you have any additional comments?

@kparzysz-quic kparzysz-quic changed the title [RFC] Encapsulate LLVM target for use with LLVM libraries [RFC] Create LLVM scope class for use with LLVM libraries Jul 7, 2022
@tqchen
Copy link
Member

tqchen commented Jul 8, 2022

Thanks @kparzysz-quic . Sorry for the delayed reply since we are taking break here.

Overall I like the direction we are going. Just to figure out the spectrum of possible APIs

My main question is how are we going to interact with multiple ParseIR calls. Some example would be helpful. For example, is it OK for us to have nested LLVMScope

void Example() {
     LLVMScope scope1(target);
     {
          // what is the effect here, seems mod_data1 is immediate in scope
          auto mod_data1 = LLVMScope::ParseIR(name);
     }
}

I also wonder if there is a way to "defer" the scope initialization. e.g. can LLVMModule be created, stored in the LLVMScope, but the target does not take in-effect until we enter the scope. We call InitializeLLVM in the constructor of LLVMTarget, but do other things like option setting in the enter stage.Something like

void Example() {
     LLVMTarget target1(target);
     auto mod_data = LLVMTarget::ParseIR(name);
     // enter target1 scope
     With<LLVMTarget> scope1(target1);
     {
           // entering target in mod_data
          With<LLVMTarget> scope2(mod_data);
     }
}

I think the main question is how coupled the operations related to LLVM are.

@kparzysz-quic
Copy link
Author

Both LoadIR or ParseIR take an optional pre-existing LLVMContext, and return a pair {llvm::Module, LLVMScope}, where the scope object in the pair is newly constructed, and contains the given LLVMContext (or create a new one if none was given). In the first call to ParseIR, the caller can take the second element of the pair and just use it as the scope. In a subsequent call, the caller can simply discard the second element, which will then be automatically destroyed.
An example of calling ParseIR or LoadIR when a LLVM scope already exists is in HandleImport in codegen_llvm.cc.

You bring up a good point about when the scope "takes effect", or binds to LLVM. I propose that the scope becomes active when a new LLVMContext is created (i.e. the scope is really determined by LLVMContext, not by the LLVMScope). We could also define "takes effect" to mean that locks (or failure-to-create object) could be acquired. That would mean that ParseIR and LoadIR return an active scope. Also, if they are called with a context argument (meaning that they are called from an active scope), they wouldn't need to create a new context, and so they wouldn't block (if we were to add locks). In such case they would return another scope object with the same LLVMContext. This shouldn't cause any problems even if both scope objects were used, but could make the code confusing to read. If we follow the convention that the "extra" scope is to be ignored, this would avoid the confusion. There may be details to be sorted out about releasing locks, etc, but those are implementation details.

Side note: Ideally, prior to the activation of the scope, none of LLVM code or data types would be exposed to the user, but this RFC doesn't try to go that far. The main concern here is a platform on which saving/restoring LLVM flags could be done.

I also suggest to create two classes: LLVMScope and LLVMTarget. The LLVMTarget would contain all the LLVM target flags and options, while LLVMScope would deal with activation, ParseIR and LoadIR. The LLVMTarget object could be passed around, copied, etc (without concerns about locking or activating anything), and its purpose would be to have all the target flags/options in one place, and it could only exist once scope has been activated.

Something like

class LLVMTarget {
public:
  LLVMTarget(const LLVMTarget&);  // copy ctor, etc.

private:
  friend class LLVMScope;
  LLVMTarget(ctx, values extracted from tvm::Target);

  std::shared_ptr<llvm::LLVMContext> ctx_;    // set via the private ctor, passed from LLVMContext
  llvm::TargetMachine *tm;
  ...
};

class LLVMScope {
public:
  LLVMScope(const Target& target) {
    // Parse the target into individual attributes, e.g. mtriple, etc. This would allow users
    // to create LLVMScope from Target and modify it before creating LLVMTarget.
    //
    // If the target has non-empty list of LLVM flags, it would be considered "modifying"
    // the global state, otherwise it would be "read-only".  This would allow creating
    // multiple "read-only" concurrent scopes, but only one "modifying" one.
  }

  void activate() {
     ctx_ = std::shared_ptr<llvm::LLVMContext>(CreateNewContext(is_read_only));
  }

  static std::pair<llvm::Module, LLVMScope> ParseIR(std::string ir) {   // same for LoadIR
    auto ctx = std::shared_ptr<llvm::LLVMContext>(CreateNewContext(false /*assume modifying*/));
    llvm::Module m = llvm::parse_string_etc(ctx, ...)
    LLVMScope scp(ctx, get_target_string(m));   // create an already active scope
    return {m, scp};
  }

  LLVMTarget getTargetInfo() {
    ICHECK(ctx_ != nullptr) << "must activate first";
    ...
  }

private:
  static llvm::LLVMContext* CreateNewContext(bool is_read_only) {
     // Create context with appropriate locking
  }
 
  LLVMScope(std::shared_ptr<llvm::LLVMContext> ctx, const Target& target);
  std::shared_ptr<llvm::LLVMContext> ctx_;
};

@kparzysz-quic
Copy link
Author

To address the second question: we shouldn't allow nesting of LLVM scopes. The reason is that if the target doesn't specify any command line flags, the assumption is that it will rely on LLVM's defaults. If we were to nest scopes, and the outer scope did modify flags, then the inner scope (without flags) would not have a way of knowing that the global state has been altered.

@tqchen
Copy link
Member

tqchen commented Jul 8, 2022

Just to followup, why does ParseIR require activation of a scope, or is it possible that ParseIR returns a tuple of Target and Module, where activation is done separately.

I am asking this mainly to see if we can get an explicit With style API

@kparzysz-quic
Copy link
Author

kparzysz-quic commented Jul 8, 2022

Just to followup, why does ParseIR require activation of a scope, or is it possible that ParseIR returns a tuple of Target and Module, where activation is done separately.

The reason is that ParseIR needs to make calls to LLVM functions to create the llvm::Module.

Edit: I like the explicit With, but it doesn't work well with ParseIR/LoadIR. See my earlier comment.

@tqchen
Copy link
Member

tqchen commented Jul 11, 2022

The reason is that ParseIR needs to make calls to LLVM functions to create the llvm::Module.

I get this part, my main question is whether we can initializeLLVM in the Target constructor but do option resetting in enter/exit, i could miss something here.

@kparzysz-quic
Copy link
Author

The reason is that ParseIR needs to make calls to LLVM functions to create the llvm::Module.

I get this part, my main question is whether we can initializeLLVM in the Target constructor but do option resetting in enter/exit, i could miss something here.

I see what you mean. My concern is that since command line options can be used almost everywhere in LLVM (not just optimizations or code generation), we should try to do the saving as early as possible (and restoring as late as possible). Ideally all the LLVM calls in the program would be enclosed in the save/restore functions (even the bitcode reader registers a couple of command line flags).

At the same time we could make a deliberate decision to only apply the save/restore to a part of LLVM functionality, i.e. allow some kinds of LLVM function calls to happen outside of the save/restore scope. Let me know what you think.

@tqchen
Copy link
Member

tqchen commented Jul 11, 2022

Logically, there are two kinds of operations involved:

  • K0: operations that deserializes/deserializes the data structure (llvm::Module)
  • K1: operations that transforms the llvm::Module

We are certainly in agreement that all K1 should be enclosed with option save/recovery scope. Parse/Save seems to belong to K0. From a logic reasoning pov, operations in K0 should not be impacted by cli opt as a result my comment.

But I think @kparzysz-quic what you mean is that cl opt might breach into K0 as well. If that is really the case, it would indeed be helpful to take the setting/recover at the load/store level. Would be great to reason and confirm a bit. Since the ability to de-couple K0 and K1 is nice from the overall code structuring pov

@kparzysz-quic
Copy link
Author

It does, yes: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Bitcode/Reader/BitcodeReader.cpp#L91-L99

The flags above come from the latest development branch, so in order to handle all LLVM versions we'd need to check all of them since 4.0 (which I'm ok doing), and we will need to keep this code up to date with the latest developments in LLVM.

On the other hand, these aren't options that are very likely to be used by TVM users, and the issue with command line flags only really applies to deserialization (since serialization would happen after the flags had been saved). I don't think it's unreasonable to state that we only handle (K1), but we'd need to ensure that (K1) also included creation of any data structures that assist in modifying or in analysis of llvm::Module.

@tqchen
Copy link
Member

tqchen commented Jul 12, 2022

In that case, I think it would be nice to state we handle K1, and leave ONLY serialization/deserialization out. This way we can have With style API, which indicate that we are explicit entering the scope for any processing.

auto data = LLVMTarget::ParseIR(file_name);
With<LLVMTarget> scope(mod_data);

``

LLVMScope(const Target& target);
~LLVMScope();

std::pair<llvm::Module, LLVMScope> LoadIR(const std::string& file_name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the semantics of this function? Does it create a new scope(besides the current one?) A code example that demonstrates LoadIR would be helpful

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments with descriptions.

@kparzysz-quic
Copy link
Author

I updated the text of the RFC, the rendered link above, and the prototype draft PR.

@kparzysz-quic
Copy link
Author

Should we still wait for review from Junru?

@tqchen
Copy link
Member

tqchen commented Jul 15, 2022

I think we can go ahead and merge

@tqchen tqchen merged commit 22d1d11 into apache:main Jul 15, 2022
@kparzysz-quic kparzysz-quic deleted the llvm-target branch July 15, 2022 19:03
kparzysz-quic pushed a commit to apache/tvm that referenced this pull request Aug 3, 2022
This implements RFC 80. See apache/tvm-rfcs#83.

Summary of changes:
- Created an `LLVMInstance` class. Uses of LLVM functions and data struc-
tures should be contained within the lifetime of an object of this class.
LLVMInstance object contains LLVMContext, and implements member functions
to deserialize an llvm::Module.
- Created an `LLVMTarget` class. Once an LLVMInstance object has been
created, an object of LLVMTarget class can be created from TVM target
string, or Target object for "llvm" target. Once LLVM command line flags
are added to the "llvm" target, one of the goals of this object will be
to save/restore relevant LLVM global state. Another objective for the
LLVMTarget object is to be a single location for all LLVM-related
compilation structures and options (such as TargetMachine, FastMathFlags,
etc.)
xinetzone pushed a commit to daobook/tvm that referenced this pull request Nov 25, 2022
…2140)

This implements RFC 80. See apache/tvm-rfcs#83.

Summary of changes:
- Created an `LLVMInstance` class. Uses of LLVM functions and data struc-
tures should be contained within the lifetime of an object of this class.
LLVMInstance object contains LLVMContext, and implements member functions
to deserialize an llvm::Module.
- Created an `LLVMTarget` class. Once an LLVMInstance object has been
created, an object of LLVMTarget class can be created from TVM target
string, or Target object for "llvm" target. Once LLVM command line flags
are added to the "llvm" target, one of the goals of this object will be
to save/restore relevant LLVM global state. Another objective for the
LLVMTarget object is to be a single location for all LLVM-related
compilation structures and options (such as TargetMachine, FastMathFlags,
etc.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants