Reland "[HIP] Support compressing device binary"
Original PR: llvm#67162

The commit was reverted due to undefined behavior (UB) detected by the sanitizer:

https://lab.llvm.org/buildbot/#/builders/238/builds/5955

clang/lib/Driver/OffloadBundler.cpp:1012:25: runtime error:
 load of misaligned address 0xaaaae2d90e7c for type
 'const uint64_t' (aka 'const unsigned long'), which
 requires 8 byte alignment

It was fixed by using memcpy instead of dereferencing an integer
pointer cast from an unaligned char*.
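
The fix described above can be sketched as follows. This is a minimal illustration, not the code from OffloadBundler.cpp: the helper name `loadU64` and its parameters are hypothetical. Copying the bytes with `std::memcpy` into a properly aligned local avoids the misaligned load that the sanitizer reported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the commit): reads a uint64_t stored at an
// arbitrary, possibly unaligned, byte offset inside a buffer.
inline uint64_t loadU64(const char *Data, std::size_t Size, std::size_t Offset) {
  // Guard against reading past the end of the buffer.
  assert(Offset <= Size && Size - Offset >= sizeof(uint64_t) &&
         "read out of bounds");

  // Undefined behavior when Data + Offset is not 8-byte aligned:
  //   return *reinterpret_cast<const uint64_t *>(Data + Offset);

  // Well-defined for any alignment: copy the bytes into an aligned local.
  uint64_t Value;
  std::memcpy(&Value, Data + Offset, sizeof(Value));
  return Value;
}
```

Compilers typically lower a fixed-size `memcpy` like this to a single load on targets that permit unaligned access, so the safe form usually costs nothing.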
yxsamliu authored and GZGavinZhao committed Jun 21, 2024
1 parent 669db88 commit ba2a204
Showing 16 changed files with 663 additions and 60 deletions.
31 changes: 31 additions & 0 deletions clang/docs/ClangOffloadBundler.rst
@@ -494,7 +494,38 @@ Additional Options while Archive Unbundling
clang-offload-bundler determines whether a device binary is compatible with a
target by comparing bundle IDs. Two bundle IDs are considered compatible if:

* Their offload kinds are the same
* Their target triples are the same
* Their GPUArch values are the same
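
The three conditions above amount to a field-by-field equality check. The sketch below is illustrative only: the `BundleID` struct and `isCompatible` function are hypothetical names and are not the types used by clang-offload-bundler, which parses these fields out of a bundle ID string.

```cpp
#include <cassert>
#include <string>

// Hypothetical, already-parsed view of a bundle ID (assumption for
// illustration; the real tool works on the bundle ID string itself).
struct BundleID {
  std::string OffloadKind; // e.g. "hip"
  std::string Triple;      // e.g. "amdgcn-amd-amdhsa"
  std::string GPUArch;     // e.g. "gfx906"
};

// Two bundle IDs are compatible only if all three components match.
inline bool isCompatible(const BundleID &A, const BundleID &B) {
  return A.OffloadKind == B.OffloadKind && A.Triple == B.Triple &&
         A.GPUArch == B.GPUArch;
}
```

For example, two IDs that differ only in `GPUArch` (say `gfx906` versus `gfx908`) would be reported as incompatible by this check.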

**-debug-only=CodeObjectCompatibility**
Verbose printing of matched/unmatched comparisons between bundle entry ID of
a device binary from HDA and bundle entry ID of a given target processor
(see :ref:`compatibility-bundle-entry-id`).

Compression and Decompression
=============================

``clang-offload-bundler`` provides features to compress and decompress the full
bundle, leveraging inherent redundancies within the bundle entries. Use the
``-compress`` command-line option to enable this compression capability.

The compressed offload bundle begins with a header followed by the compressed binary data:

- **Magic Number (4 bytes)**:
This is a unique identifier to distinguish compressed offload bundles. The value is the string 'CCOB' (Compressed Clang Offload Bundle).

- **Version Number (16-bit unsigned int)**:
This denotes the version of the compressed offload bundle format. The current version is ``1``.

- **Compression Method (16-bit unsigned int)**:
This field indicates the compression method used. The value corresponds to either ``zlib`` or ``zstd``, represented as a 16-bit unsigned integer cast from the LLVM compression enumeration.

- **Uncompressed Binary Size (32-bit unsigned int)**:
This is the size (in bytes) of the binary data before it was compressed.

- **Hash (64-bit unsigned int)**:
This is a 64-bit truncated MD5 hash of the uncompressed binary data. It serves for verification and caching purposes.

- **Compressed Data**:
The actual compressed binary data follows the header. Its size can be inferred from the total size of the file minus the header size.
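
The header layout listed above adds up to 20 bytes (4 + 2 + 2 + 4 + 8). A minimal sketch of reading it back is shown here. The names `CCOBHeaderFields`, `parseCCOBHeader`, and `compressedPayloadSize` are hypothetical, and the sketch assumes the integer fields are stored in host byte order, which the text above does not specify. Fields are copied with `std::memcpy` because the buffer is not guaranteed to be aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical in-memory view of the header fields described above.
struct CCOBHeaderFields {
  uint16_t Version;
  uint16_t Method;
  uint32_t UncompressedSize;
  uint64_t Hash;
};

// Magic (4) + version (2) + method (2) + uncompressed size (4) + hash (8).
constexpr std::size_t CCOBHeaderSize = 4 + 2 + 2 + 4 + 8;

// Returns true and fills Out if Data starts with a well-formed header.
// Assumption: integer fields are in host byte order.
inline bool parseCCOBHeader(const char *Data, std::size_t Size,
                            CCOBHeaderFields &Out) {
  if (Size < CCOBHeaderSize || std::memcmp(Data, "CCOB", 4) != 0)
    return false;
  std::memcpy(&Out.Version, Data + 4, sizeof(Out.Version));
  std::memcpy(&Out.Method, Data + 6, sizeof(Out.Method));
  std::memcpy(&Out.UncompressedSize, Data + 8, sizeof(Out.UncompressedSize));
  std::memcpy(&Out.Hash, Data + 12, sizeof(Out.Hash));
  return true;
}

// The compressed payload is whatever follows the fixed-size header.
inline std::size_t compressedPayloadSize(std::size_t FileSize) {
  assert(FileSize >= CCOBHeaderSize && "file shorter than header");
  return FileSize - CCOBHeaderSize;
}
```

A caller would typically reject the bundle if `Version` is not one it understands before attempting to decompress the payload.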
37 changes: 37 additions & 0 deletions clang/include/clang/Driver/OffloadBundler.h
@@ -19,18 +19,23 @@

#include "llvm/Support/Error.h"
#include "llvm/TargetParser/Triple.h"
#include <llvm/Support/MemoryBuffer.h>
#include <string>
#include <vector>

namespace clang {

class OffloadBundlerConfig {
public:
OffloadBundlerConfig();

bool AllowNoHost = false;
bool AllowMissingBundles = false;
bool CheckInputArchive = false;
bool PrintExternalCommands = false;
bool HipOpenmpCompatible = false;
bool Compress = false;
bool Verbose = false;

unsigned BundleAlignment = 1;
unsigned HostInputIndex = ~0u;
@@ -82,6 +87,38 @@ struct OffloadTargetInfo {
std::string str() const;
};

// CompressedOffloadBundle represents the format for the compressed offload
// bundles.
//
// The format is as follows:
// - Magic Number (4 bytes) - A constant "CCOB".
// - Version (2 bytes)
// - Compression Method (2 bytes) - Uses the values from
// llvm::compression::Format.
// - Uncompressed Size (4 bytes).
// - Truncated MD5 Hash (8 bytes).
// - Compressed Data (variable length).

class CompressedOffloadBundle {
private:
static inline const size_t MagicSize = 4;
static inline const size_t VersionFieldSize = sizeof(uint16_t);
static inline const size_t MethodFieldSize = sizeof(uint16_t);
static inline const size_t SizeFieldSize = sizeof(uint32_t);
static inline const size_t HashFieldSize = 8;
static inline const size_t HeaderSize = MagicSize + VersionFieldSize +
MethodFieldSize + SizeFieldSize +
HashFieldSize;
static inline const llvm::StringRef MagicNumber = "CCOB";
static inline const uint16_t Version = 1;

public:
static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
compress(const llvm::MemoryBuffer &Input, bool Verbose = false);
static llvm::Expected<std::unique_ptr<llvm::MemoryBuffer>>
decompress(const llvm::MemoryBuffer &Input, bool Verbose = false);
};

} // namespace clang

#endif // LLVM_CLANG_DRIVER_OFFLOADBUNDLER_H
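
The field-size constants in `CompressedOffloadBundle` imply a 20-byte header. The sketch below shows one way such a header could be assembled. It is not the body of `CompressedOffloadBundle::compress`: the function `makeCCOBHeader` is hypothetical, the integers are written in host byte order as an assumption, and the caller is assumed to supply an already truncated 64-bit hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: builds the fixed-size header that precedes the
// compressed data. Method is the numeric compression-format value.
inline std::string makeCCOBHeader(uint16_t Method, uint32_t UncompressedSize,
                                  uint64_t TruncatedHash) {
  const uint16_t Version = 1;
  std::string Out("CCOB", 4); // magic number

  // Append the raw bytes of a field (host byte order, an assumption).
  auto Append = [&Out](const void *P, std::size_t N) {
    Out.append(static_cast<const char *>(P), N);
  };
  Append(&Version, sizeof(Version));
  Append(&Method, sizeof(Method));
  Append(&UncompressedSize, sizeof(UncompressedSize));
  Append(&TruncatedHash, sizeof(TruncatedHash));

  // Matches MagicSize + VersionFieldSize + MethodFieldSize + SizeFieldSize +
  // HashFieldSize from the class above.
  assert(Out.size() == 4 + 2 + 2 + 4 + 8);
  return Out;
}
```

Appending field bytes one at a time, rather than writing a packed struct, sidesteps padding and alignment concerns in the output buffer.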
5 changes: 5 additions & 0 deletions clang/include/clang/Driver/Options.td
@@ -992,6 +992,11 @@ defm convergent_functions : BoolFOption<"convergent-functions",
def gpu_use_aux_triple_only : Flag<["--"], "gpu-use-aux-triple-only">,
InternalDriverOpt, HelpText<"Prepare '-aux-triple' only without populating "
"'-aux-target-cpu' and '-aux-target-feature'.">;

def offload_compress : Flag<["--"], "offload-compress">,
HelpText<"Compress offload device binaries (HIP only)">;
def no_offload_compress : Flag<["--"], "no-offload-compress">;

def cuda_include_ptx_EQ : Joined<["--"], "cuda-include-ptx=">, Flags<[NoXarchOption]>,
HelpText<"Include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
def no_cuda_include_ptx_EQ : Joined<["--"], "no-cuda-include-ptx=">, Flags<[NoXarchOption]>,