Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CODEGEN] Enable inline llvm asm code #1486

Merged
merged 1 commit into from
Jul 25, 2018
Merged

[CODEGEN] Enable inline llvm asm code #1486

merged 1 commit into from
Jul 25, 2018

Conversation

tqchen
Copy link
Member

@tqchen tqchen commented Jul 25, 2018

Ref #1276

Usage

We can have a pragma attribute pragma_import_llvm, which can be used in either ir builder or schedule

Example

def test_llvm_import():
    # extern "C" is necessary to get the correct signature
    # we can put inline asm here
    cc_code = """
    extern "C" float my_add(float x, float y) {
      return x + y;
    }
    """
    n = 10
    A = tvm.placeholder((n,), name='A')
    B = tvm.compute((n,), lambda *i:
                    tvm.call_pure_extern("float32", "my_add", A(*i), 1.0),
                    name='B')
    def check_llvm(use_file):
        if not tvm.module.enabled("llvm"):
            return
        if not clang.find_clang(required=False):
            print("skip because clang is not available")
            return
        temp = util.tempdir()
        ll_path = temp.relpath("temp.ll")
        # Create LLVM ir from c source code
        ll_code = clang.create_llvm(cc_code, output=ll_path)
        s = tvm.create_schedule(B.op)
        if use_file:
            s[B].pragma(s[B].op.axis[0], "import_llvm", ll_path)
        else:
            s[B].pragma(s[B].op.axis[0], "import_llvm", ll_code)
        # BUILD and invoke the kernel.
        f = tvm.build(s, [A, B], "llvm")
        ctx = tvm.cpu(0)
        # launch the kernel.
        a = tvm.nd.array(np.random.uniform(size=n).astype(A.dtype), ctx)
        b = tvm.nd.array(np.random.uniform(size=n).astype(B.dtype), ctx)
        f(a, b)
        np.testing.assert_allclose(
            b.asnumpy(), a.asnumpy() + 1.0)
    check_llvm(use_file=True)

@tqchen tqchen mentioned this pull request Jul 25, 2018
4 tasks
@tqchen
Copy link
Member Author

tqchen commented Jul 25, 2018

@masahi @yzhliu @eqy @ajtulloch please review

@masahi
Copy link
Member

masahi commented Jul 25, 2018

Interesting. I'll try using PTX's shuffle instructions and AMDGCN's DPP instructions. I have a good use case for them.

@tqchen tqchen merged commit b061eb1 into apache:master Jul 25, 2018
@aditya4d1
Copy link
Contributor

@masahi dpp instructions are cross lane ops. For this test, you may not need dpp instructions

@masahi
Copy link
Member

masahi commented Jul 25, 2018

@adityaatluri what do you mean by "for this test"? I found that MIOpen uses dpp instructions for doing winograd input/filter/output transform. I can imagine using 4 threads cooperatively to compute 4x4 tile trasnform, but I haven't figured out how exactly they do it.

@ajtulloch
Copy link
Contributor

This looks great @tqchen. One thing I'd add (which was surprising) was that generating the .ll/.bc with O3 is quite important, since LLVM generates a lot of crap before/after the inline asm block, and this doesn't get entirely stripped when it's inlined into the larger function. So I think adding options=["-O3"] or similar would be useful here, I can send a patch.

@masahi
Copy link
Member

masahi commented Jul 26, 2018

@ajtulloch I think you can pass any compiler flags you like to create_llvm() function above.

@tqchen
Copy link
Member Author

tqchen commented Jul 26, 2018

I think it is good to options=["-O3"] by default if user do not provide any option flags

@aditya4d1
Copy link
Contributor

aditya4d1 commented Jul 27, 2018

I don't know if MIOpen uses DPP, but DPP are cross lane ops and there are no programming language constructs to express them.
If you are interested in how DPP can be used for GEMM, this blog can help you:
https://adityaatluri.github.io/2018/04/29/decreasing-lds-memory-requests.html

tqchen added a commit to tqchen/tvm that referenced this pull request Aug 4, 2018
sergei-mironov pushed a commit to sergei-mironov/tvm that referenced this pull request Aug 8, 2018
gussmith23 added a commit to gussmith23/tvm that referenced this pull request May 1, 2019
This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.
gussmith23 added a commit to gussmith23/tvm that referenced this pull request May 1, 2019
This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.
gussmith23 added a commit to gussmith23/tvm that referenced this pull request May 2, 2019
This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.
gussmith23 added a commit to gussmith23/tvm that referenced this pull request May 10, 2019
This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.
gussmith23 added a commit to gussmith23/tvm that referenced this pull request May 11, 2019
This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.
tmoreau89 pushed a commit that referenced this pull request May 15, 2019
* Register and use custom datatypes in TVM

This patch adds the ability to register and use a custom datatype from Python,
using the `register_datatype` call. The datatype can then be passed as the
`dtype` parameter using the syntax `dtype="custom[<type_name>]bitsxlanes"`.

* Removes extra file

* Register custom datatypes with TVM; specify Cast and Add lowering

This commit adds functionality for registering custom datatypes with TVM, and
furthermore adding custom lowering functions to lower those custom datatypes.
This commit only adds lowering for the Cast and Add ops; more ops will be added
soon.

Check out some custom datatype samples in my repository of samples:
https://github.com/gussmith23/tvm-custom-datatype-samples

* Register and lower casts from Python

* Formatting

* Fix include; was including too much

* Add comment

* Add DatatypeRegistered

* Add storage size field to custom datatypes

This field indicates the bitwidth of the opaque block of data into which
instances of the datatype will be stored, when TVM compiles. For example, if I
create a datatype with a storage size of 16, then
- Constants of that datatype will be created as unsigned 16-bit ints
- Calls to external functions taking that datatype will pass the data as
  unsigned 16-bit ints
- External functions returning that datatype will be assumed to return unsigned
  16-bit ints.

* Change how lowering funcs (Cast and other ops) are named in registry

tvm.datatypes.lower.<target>.cast.<dst-type>.<src-type>
becomes
tvm.datatypes.lower.<target>.Cast.<dst-type>.<src-type>

And fixes some sloppy code around how the other ops were being formatted.

* Update Python register_datatype to accept storage size

* Oops, left out one cast->Cast change

* Look up storage size when parsing `custom[typename]`

When we encounter this type string in Python, it will be parsed into a Halide
type object in C++. Some of my original code supported this parsing, but we now
have to attach the storage type to the type (by setting the bits field).

* Change how external calls for casting/other ops are done

Firstly, we now use the storage size of the custom type when determining
input/output types; e.g. a cast to a custom type with storage size 16 is seen as
a call to an external function returning an opaque uint of size 16.

Secondly, write a macro to handle the other ops. Originally I thought I could
handle these at runtime, with a single `_register_op` global. I transitioned
instead to using individual `_register_Add` etc. calls generated with a macro,
but I don't remember why.

* When encountering a custom type immediate, generate UIntImm

* Translate custom types to LLVM type

* Generate correct return type in Casts

Originally I was assuming that the result type from casts was always a custom
datatype, and so I was making the Call return a UInt type.

* Use TVM-idiomatic recursion style in DatatypesLowerer

This was actually a bug, I'm pretty sure; we wouldn't have recursed deep on any
complex programs. As a result of making this change, I also uncovered another
potential bug, where the datatypes lowering pass would attempt to lower a Load
of a custom type. By commenting out the `Mutate_` for Load, I was able to stop
the error from cropping up, but frankly, I'm not satisfied with the solution;
how is it that we are able to run codegen when Loads of custom datatypes are
present in the IR? I have not written any code, to my knowledge, that will
support this. Perhaps Load does not care about the underlying datatype?

* Use CHECK

* Add comment about which Mutate_s are needed

* Add comments

* Add GetCustomDatatypeRegistered as an extern C function

* Formatting, comments, casting

* Change how datatype string is formatted

* Use bits() instead of GetStorageSize

Use bits() instead of GetStorageSize

* Change comment

* Add datatype.py

* Change registered function name (datatypes->datatype)

* Remove GetStorageSize

* Format custom datatypes like any other datatype

Specifically, we now print the bits and lanes after the `custom[...]` string.

* Correctly implement datatype lowering in Python

* Remove unneeded include

* Make function naming consistent

* Use CHECK instead of internal_assert

* Rename macro

* Formatting

* Rename functions

* Implement Cast lowering

`_datatype_register_op` is now able to lower both binary ops and Casts.

* Formatting

* Formatting

* Clang format, google style

* Fix std::string/extern "C" warnings

* Formatting

* Formatting

* Lower Allocates and Loads during datatype lowering

This should ensure that there are no custom datatypes remaining once datatype
lowering is done. This will allow us to remove the code in the LLVM codegen
which deals with custom datatypes.

* Revert additions to codegen_llvm.cc which are now unneeded

* Pass cpplint on lower_datatypes.cc

* Add clarifying comment

* Remove datatype lowering registration funcs from C++

* Add CHECKs

* Remove TODO

* Remove all references to storage size

* Move and rename function

* Rename function

* Remove done TODOs and other handled comments

* Remove irrelevant Load code and comments

* Comment out the IR node types I'm not sure about yet

* Add bfloat16 datatype unittest

* Fix MakeConstScalar

MakeConstScalar for a custom datatype will now call out to a function which can
be registered on a per-datatype basis. The function will take a double and
return the equivalent value in the custom datatype format.

Note that these code paths are not actually used or tested at the moment. I have
not yet written an example which uses const scalars of a custom datatype.

* Formatting

* Change pass name

* Allow users to register whatever lowering function they want

Tianqi pointed out that users should be able to register whatever lowering
function they want, and should not be constrained to registering lowering
functions which just call out to external libraries.

I still provide a function for making lowering functions which call out to
external libraries, for convenience.

* Add clarifying comment

* Remove unneeded comment

* Remove unneeded function

* Rename file

* Undo unnecessary change

* Undo unnecessary change

* Make naming consistent

Rename "datatypes" to "custom datatypes" in most contexts.

* Revert an artifact of old code

* Fix build warnings, add TODO

* Lint

* Remove unnecessary use of extern C by separating decl and impl

* Error checking

* Remove TODO

* Missed a name change

* Lint

* Python lint

* Correctly format datatype

* Move bfloat16 to 3rdparty

* "custom_datatypes" --> "datatype" in most places

I left the pass as "LowerCustomDatatypes" to indicate that we're not lowering
anything other than custom datatypes. Otherwise, everything else has been
changed.

* Upgrade datatype unittest

I used a float calculator to generate some real testcases for the unittest.

* Separate public includes and private implementation

Specifically, create cleaner decoupling between datatypes stuff in packed_func
and the datatype registry implementation.

* Formatting

* Limit custom datatype codes to >128

* Add TODOs

* Fix comment

* Formatting

* Clean up datatype unittest

* Remove un-exported functions in public headers; UIntImm->FloatImm

More places where I accidentally was using implementation-only functions in
public headers.

Additionally, store custom datatype immediates as FloatImms. A later change will
add new lowering logic to lower these FloatImms to UIntImms.

Plus formatting change.

* Lint

* Use FloatImm (not UIntImm) to hold immediates of custom datatypes

This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in #1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.

* Re-add a change I removed accidentally during rebase

* Cleanup

* Remove unnecessary TVM_DLLs

* Add custom datatype utilities source file to Go runtime pack

* Revert "Remove unnecessary TVM_DLLs"

This reverts commit 4b742b9.

* Mark bfloat code as TVM_DLL

* Moves custom datatype runtime utilities to c_runtime_api.cc

* Revert "Add custom datatype utilities source file to Go runtime pack"

This reverts commit aecbcde.

* Move datatype parsing to its own function

* Change comments

* Remove unneeded function

* Formatting

* Formatting

* Documentation

* Add kCustomBegin, use it for checking for custom types

* Documentation

* Formatting

* Move static definition to implementation

* Remove comment

* Decide toBeLowered before lowering arguments of Expr

In the past, e.g. when lowering custom datatypes for an Add, we would lower a
and b first, and then decide whether the resulting new Add needed to be lowered
based on the (new) types of a and b. Now, instead, we need to check the types of
a and b first (to see if they're custom types), and then lower them (so they'll
become non-custom types), and then lower the new Add.

* Revert "Move datatype parsing to its own function"

This reverts commit d554a58.

This broke parsing. Will figure this out later. There isn't a really clean way
to separate this out given how the rest of the function is written.

* Replace comment

* Documentation

* Remove comment and TVM_DLL

* Better error messages

* Remove artifact of rebase

* Separate datatypes parsing to its own function

* Add \returns

* Comment changes; add TODO

* Refactor tests
wweic pushed a commit to wweic/tvm that referenced this pull request Jun 26, 2019
* Register and use custom datatypes in TVM

This patch adds the ability to register and use a custom datatype from Python,
using the `register_datatype` call. The datatype can then be passed as the
`dtype` parameter using the syntax `dtype="custom[<type_name>]bitsxlanes"`.

* Removes extra file

* Register custom datatypes with TVM; specify Cast and Add lowering

This commit adds functionality for registering custom datatypes with TVM, and
furthermore adding custom lowering functions to lower those custom datatypes.
This commit only adds lowering for the Cast and Add ops; more ops will be added
soon.

Check out some custom datatype samples in my repository of samples:
https://github.com/gussmith23/tvm-custom-datatype-samples

* Register and lower casts from Python

* Formatting

* Fix include; was including too much

* Add comment

* Add DatatypeRegistered

* Add storage size field to custom datatypes

This field indicates the bitwidth of the opaque block of data into which
instances of the datatype will be stored, when TVM compiles. For example, if I
create a datatype with a storage size of 16, then
- Constants of that datatype will be created as unsigned 16-bit ints
- Calls to external functions taking that datatype will pass the data as
  unsigned 16-bit ints
- External functions returning that datatype will be assumed to return unsigned
  16-bit ints.

* Change how lowering funcs (Cast and other ops) are named in registry

tvm.datatypes.lower.<target>.cast.<dst-type>.<src-type>
becomes
tvm.datatypes.lower.<target>.Cast.<dst-type>.<src-type>

And fixes some sloppy code around how the other ops were being formatted.

* Update Python register_datatype to accept storage size

* Oops, left out one cast->Cast change

* Look up storage size when parsing `custom[typename]`

When we encounter this type string in Python, it will be parsed into a Halide
type object in C++. Some of my original code supported this parsing, but we now
have to attach the storage type to the type (by setting the bits field).

* Change how external calls for casting/other ops are done

Firstly, we now use the storage size of the custom type when determining
input/output types; e.g. a cast to a custom type with storage size 16 is seen as
a call to an external function returning an opaque uint of size 16.

Secondly, write a macro to handle the other ops. Originally I thought I could
handle these at runtime, with a single `_register_op` global. I transitioned
instead to using individual `_register_Add` etc. calls generated with a macro,
but I don't remember why.

* When encountering a custom type immediate, generate UIntImm

* Translate custom types to LLVM type

* Generate correct return type in Casts

Originally I was assuming that the result type from casts was always a custom
datatype, and so I was making the Call return a UInt type.

* Use TVM-idiomatic recursion style in DatatypesLowerer

This was actually a bug, I'm pretty sure; we wouldn't have recursed deep on any
complex programs. As a result of making this change, I also uncovered another
potential bug, where the datatypes lowering pass would attempt to lower a Load
of a custom type. By commenting out the `Mutate_` for Load, I was able to stop
the error from cropping up, but frankly, I'm not satisfied with the solution;
how is it that we are able to run codegen when Loads of custom datatypes are
present in the IR? I have not written any code, to my knowledge, that will
support this. Perhaps Load does not care about the underlying datatype?

* Use CHECK

* Add comment about which Mutate_s are needed

* Add comments

* Add GetCustomDatatypeRegistered as an extern C function

* Formatting, comments, casting

* Change how datatype string is formatted

* Use bits() instead of GetStorageSize

Use bits() instead of GetStorageSize

* Change comment

* Add datatype.py

* Change registered function name (datatypes->datatype)

* Remove GetStorageSize

* Format custom datatypes like any other datatype

Specifically, we now print the bits and lanes after the `custom[...]` string.

* Correctly implement datatype lowering in Python

* Remove unneeded include

* Make function naming consistent

* Use CHECK instead of internal_assert

* Rename macro

* Formatting

* Rename functions

* Implement Cast lowering

`_datatype_register_op` is now able to lower both binary ops and Casts.

* Formatting

* Formatting

* Clang format, google style

* Fix std::string/extern "C" warnings

* Formatting

* Formatting

* Lower Allocates and Loads during datatype lowering

This should ensure that there are no custom datatypes remaining once datatype
lowering is done. This will allow us to remove the code in the LLVM codegen
which deals with custom datatypes.

* Revert additions to codegen_llvm.cc which are now unneeded

* Pass cpplint on lower_datatypes.cc

* Add clarifying comment

* Remove datatype lowering registration funcs from C++

* Add CHECKs

* Remove TODO

* Remove all references to storage size

* Move and rename function

* Rename function

* Remove done TODOs and other handled comments

* Remove irrelevant Load code and comments

* Comment out the IR node types I'm not sure about yet

* Add bfloat16 datatype unittest

* Fix MakeConstScalar

MakeConstScalar for a custom datatype will now call out to a function which can
be registered on a per-datatype basis. The function will take a double and
return the equivalent value in the custom datatype format.

Note that these code paths are not actually used or tested at the moment. I have
not yet written an example which uses const scalars of a custom datatype.

* Formatting

* Change pass name

* Allow users to register whatever lowering function they want

Tianqi pointed out that users should be able to register whatever lowering
function they want, and should not be constrained to registering lowering
functions which just call out to external libraries.

I still provide a function for making lowering functions which call out to
external libraries, for convenience.

* Add clarifying comment

* Remove unneeded comment

* Remove unneeded function

* Rename file

* Undo unnecessary change

* Undo unnecessary change

* Make naming consistent

Rename "datatypes" to "custom datatypes" in most contexts.

* Revert an artifact of old code

* Fix build warnings, add TODO

* Lint

* Remove unnecessary use of extern C by separating decl and impl

* Error checking

* Remove TODO

* Missed a name change

* Lint

* Python lint

* Correctly format datatype

* Move bfloat16 to 3rdparty

* "custom_datatypes" --> "datatype" in most places

I left the pass as "LowerCustomDatatypes" to indicate that we're not lowering
anything other than custom datatypes. Otherwise, everything else has been
changed.

* Upgrade datatype unittest

I used a float calculator to generate some real testcases for the unittest.

* Separate public includes and private implementation

Specifically, create cleaner decoupling between datatypes stuff in packed_func
and the datatype registry implementation.

* Formatting

* Limit custom datatype codes to >128

* Add TODOs

* Fix comment

* Formatting

* Clean up datatype unittest

* Remove un-exported functions in public headers; UIntImm->FloatImm

More places where I accidentally was using implementation-only functions in
public headers.

Additionally, store custom datatype immediates as FloatImms. A later change will
add new lowering logic to lower these FloatImms to UIntImms.

Plus formatting change.

* Lint

* Use FloatImm (not UIntImm) to hold immediates of custom datatypes

This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.

* Re-add a change I removed accidentally during rebase

* Cleanup

* Remove unnecessary TVM_DLLs

* Add custom datatype utilities source file to Go runtime pack

* Revert "Remove unnecessary TVM_DLLs"

This reverts commit 4b742b9.

* Mark bfloat code as TVM_DLL

* Moves custom datatype runtime utilities to c_runtime_api.cc

* Revert "Add custom datatype utilities source file to Go runtime pack"

This reverts commit aecbcde.

* Move datatype parsing to its own function

* Change comments

* Remove unneeded function

* Formatting

* Formatting

* Documentation

* Add kCustomBegin, use it for checking for custom types

* Documentation

* Formatting

* Move static definition to implementation

* Remove comment

* Decide toBeLowered before lowering arguments of Expr

In the past, e.g. when lowering custom datatypes for an Add, we would lower a
and b first, and then decide whether the resulting new Add needed to be lowered
based on the (new) types of a and b. Now, instead, we need to check the types of
a and b first (to see if they're custom types), and then lower them (so they'll
become non-custom types), and then lower the new Add.

* Revert "Move datatype parsing to its own function"

This reverts commit d554a58.

This broke parsing. Will figure this out later. There isn't a really clean way
to separate this out given how the rest of the function is written.

* Replace comment

* Documentation

* Remove comment and TVM_DLL

* Better error messages

* Remove artifact of rebase

* Separate datatypes parsing to its own function

* Add \returns

* Comment changes; add TODO

* Refactor tests
wweic pushed a commit to neo-ai/tvm that referenced this pull request Jun 27, 2019
* Register and use custom datatypes in TVM

This patch adds the ability to register and use a custom datatype from Python,
using the `register_datatype` call. The datatype can then be passed as the
`dtype` parameter using the syntax `dtype="custom[<type_name>]bitsxlanes"`.

* Removes extra file

* Register custom datatypes with TVM; specify Cast and Add lowering

This commit adds functionality for registering custom datatypes with TVM, and
furthermore adding custom lowering functions to lower those custom datatypes.
This commit only adds lowering for the Cast and Add ops; more ops will be added
soon.

Check out some custom datatype samples in my repository of samples:
https://github.com/gussmith23/tvm-custom-datatype-samples

* Register and lower casts from Python

* Formatting

* Fix include; was including too much

* Add comment

* Add DatatypeRegistered

* Add storage size field to custom datatypes

This field indicates the bitwidth of the opaque block of data into which
instances of the datatype will be stored, when TVM compiles. For example, if I
create a datatype with a storage size of 16, then
- Constants of that datatype will be created as unsigned 16-bit ints
- Calls to external functions taking that datatype will pass the data as
  unsigned 16-bit ints
- External functions returning that datatype will be assumed to return unsigned
  16-bit ints.

* Change how lowering funcs (Cast and other ops) are named in registry

tvm.datatypes.lower.<target>.cast.<dst-type>.<src-type>
becomes
tvm.datatypes.lower.<target>.Cast.<dst-type>.<src-type>

And fixes some sloppy code around how the other ops were being formatted.

* Update Python register_datatype to accept storage size

* Oops, left out one cast->Cast change

* Look up storage size when parsing `custom[typename]`

When we encounter this type string in Python, it will be parsed into a Halide
type object in C++. Some of my original code supported this parsing, but we now
have to attach the storage type to the type (by setting the bits field).

* Change how external calls for casting/other ops are done

Firstly, we now use the storage size of the custom type when determining
input/output types; e.g. a cast to a custom type with storage size 16 is seen as
a call to an external function returning an opaque uint of size 16.

Secondly, write a macro to handle the other ops. Originally I thought I could
handle these at runtime, with a single `_register_op` global. I transitioned
instead to using individual `_register_Add` etc. calls generated with a macro,
but I don't remember why.

* When encountering a custom type immediate, generate UIntImm

* Translate custom types to LLVM type

* Generate correct return type in Casts

Originally I was assuming that the result type from casts was always a custom
datatype, and so I was making the Call return a UInt type.

* Use TVM-idiomatic recursion style in DatatypesLowerer

This was actually a bug, I'm pretty sure; we wouldn't have recursed deep on any
complex programs. As a result of making this change, I also uncovered another
potential bug, where the datatypes lowering pass would attempt to lower a Load
of a custom type. By commenting out the `Mutate_` for Load, I was able to stop
the error from cropping up, but frankly, I'm not satisfied with the solution;
how is it that we are able to run codegen when Loads of custom datatypes are
present in the IR? I have not written any code, to my knowledge, that will
support this. Perhaps Load does not care about the underlying datatype?

* Use CHECK

* Add comment about which Mutate_s are needed

* Add comments

* Add GetCustomDatatypeRegistered as an extern C function

* Formatting, comments, casting

* Change how datatype string is formatted

* Use bits() instead of GetStorageSize

Use bits() instead of GetStorageSize

* Change comment

* Add datatype.py

* Change registered function name (datatypes->datatype)

* Remove GetStorageSize

* Format custom datatypes like any other datatype

Specifically, we now print the bits and lanes after the `custom[...]` string.

* Correctly implement datatype lowering in Python

* Remove unneeded include

* Make function naming consistent

* Use CHECK instead of internal_assert

* Rename macro

* Formatting

* Rename functions

* Implement Cast lowering

`_datatype_register_op` is now able to lower both binary ops and Casts.

* Formatting

* Formatting

* Clang format, google style

* Fix std::string/extern "C" warnings

* Formatting

* Formatting

* Lower Allocates and Loads during datatype lowering

This should ensure that there are no custom datatypes remaining once datatype
lowering is done. This will allow us to remove the code in the LLVM codegen
which deals with custom datatypes.

* Revert additions to codegen_llvm.cc which are now unneeded

* Pass cpplint on lower_datatypes.cc

* Add clarifying comment

* Remove datatype lowering registration funcs from C++

* Add CHECKs

* Remove TODO

* Remove all references to storage size

* Move and rename function

* Rename function

* Remove done TODOs and other handled comments

* Remove irrelevant Load code and comments

* Comment out the IR node types I'm not sure about yet

* Add bfloat16 datatype unittest

* Fix MakeConstScalar

MakeConstScalar for a custom datatype will now call out to a function which can
be registered on a per-datatype basis. The function will take a double and
return the equivalent value in the custom datatype format.

Note that these code paths are not actually used or tested at the moment. I have
not yet written an example which uses const scalars of a custom datatype.

* Formatting

* Change pass name

* Allow users to register whatever lowering function they want

Tianqi pointed out that users should be able to register whatever lowering
function they want, and should not be constrained to registering lowering
functions which just call out to external libraries.

I still provide a function for making lowering functions which call out to
external libraries, for convenience.

* Add clarifying comment

* Remove unneeded comment

* Remove unneeded function

* Rename file

* Undo unnecessary change

* Undo unnecessary change

* Make naming consistent

Rename "datatypes" to "custom datatypes" in most contexts.

* Revert an artifact of old code

* Fix build warnings, add TODO

* Lint

* Remove unnecessary use of extern C by separating decl and impl

* Error checking

* Remove TODO

* Missed a name change

* Lint

* Python lint

* Correctly format datatype

* Move bfloat16 to 3rdparty

* "custom_datatypes" --> "datatype" in most places

I left the pass as "LowerCustomDatatypes" to indicate that we're not lowering
anything other than custom datatypes. Otherwise, everything else has been
changed.

* Upgrade datatype unittest

I used a float calculator to generate some real testcases for the unittest.

* Separate public includes and private implementation

Specifically, create cleaner decoupling between datatypes stuff in packed_func
and the datatype registry implementation.

* Formatting

* Limit custom datatype codes to >128

* Add TODOs

* Fix comment

* Formatting

* Clean up datatype unittest

* Remove un-exported functions in public headers; UIntImm->FloatImm

More places where I accidentally was using implementation-only functions in
public headers.

Additionally, store custom datatype immediates as FloatImms. A later change will
add new lowering logic to lower these FloatImms to UIntImms.

Plus formatting change.

* Lint

* Use FloatImm (not UIntImm) to hold immediates of custom datatypes

This change switches from using UIntImm to FloatImm for storing immediates of
custom datatypes. The value of the number is stored in a double, which should be
enough precision for now, for most custom types we will explore in the immediate
future.

In line with this change, we change the datatype lowering so that FloatImms are
lowered to UInts of the appropriate size. Originally, this was going to be done
by allowing the user to register a double->uint_<storage size>_t conversion
which would be called at compile time to convert the value from the FloatImm to
a UInt and store it in a UIntImm. After discussions with Tianqi, we decided to
take the simpler route, and lower FloatImms just as we lower all other ops: by
replacing them with Call nodes. In this case, presumably the user will Call out
to a conversion function in their datatype library.

The justification for this decision is due to the functionality added in apache#1486.
This pull request adds the ability to load LLVM bytecode in at compile time.
This applies in our case as follows:
 1. The user writes their custom datatype programs and registers their lowering
    functions in the same way we've been doing it so far. All operations over
    custom datatypes are lowered to Calls to the datatype library.
 2. The user compiles their datatype library to LLVM bytecode.
 3. At TVM compile time, the user loads the LLVM bytecode. Depending on how the
    datatype library is written, Clang should be able to perform constant
    folding over the custom datatype immediates, even if their conversions are
    done with calls to the library.

Additionally adds test to test the FloatImm codepath.

* Re-add a change I removed accidentally during rebase

* Cleanup

* Remove unnecessary TVM_DLLs

* Add custom datatype utilities source file to Go runtime pack

* Revert "Remove unnecessary TVM_DLLs"

This reverts commit 4b742b9.

* Mark bfloat code as TVM_DLL

* Moves custom datatype runtime utilities to c_runtime_api.cc

* Revert "Add custom datatype utilities source file to Go runtime pack"

This reverts commit aecbcde.

* Move datatype parsing to its own function

* Change comments

* Remove unneeded function

* Formatting

* Formatting

* Documentation

* Add kCustomBegin, use it for checking for custom types

* Documentation

* Formatting

* Move static definition to implementation

* Remove comment

* Decide toBeLowered before lowering arguments of Expr

In the past, e.g. when lowering custom datatypes for an Add, we would lower a
and b first, and then decide whether the resulting new Add needed to be lowered
based on the (new) types of a and b. Now, instead, we need to check the types of
a and b first (to see if they're custom types), and then lower them (so they'll
become non-custom types), and then lower the new Add.

* Revert "Move datatype parsing to its own function"

This reverts commit d554a58.

This broke parsing. Will figure this out later. There isn't a really clean way
to separate this out given how the rest of the function is written.

* Replace comment

* Documentation

* Remove comment and TVM_DLL

* Better error messages

* Remove artifact of rebase

* Separate datatypes parsing to its own function

* Add \returns

* Comment changes; add TODO

* Refactor tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants