Build libdatadog from source#18
Conversation
| URI.open(url, 'rb') do |remote| # standard:disable Security/Open | ||
| Cache.tarball.open('wb') do |local| | ||
| IO.copy_stream(remote, local) | ||
| end | ||
| end |
Check failure
Code scanning / CodeQL
Use of `Kernel.open` or `IO.read` or similar sinks with a non-constant value Critical
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 2 months ago
In general, to fix this class of problem you should avoid Kernel.open/URI.open with dynamic arguments and use safer alternatives: for local files, File.* methods; for HTTP(S), either URI(...).open or a dedicated HTTP client. Specifically for this code, we should change the call from URI.open(url, 'rb') to URI(url).open('rb'). This uses URI to parse the URL and then calls open on the resulting URI object, which does not invoke Kernel.open and addresses the CodeQL warning.
Concretely, in tasks/helpers.rb, inside the fetch! method at line 217, replace:
URI.open(url, 'rb') do |remote| # standard:disable Security/Openwith:
URI(url).open('rb') do |remote|This preserves behavior (still reads the remote tarball in binary mode, still yields an IO-like remote object), removes the need for the standard:disable Security/Open suppression, and satisfies CodeQL’s recommendation. No new methods or imports are needed; URI is already available via require 'open-uri' / Ruby stdlib.
| @@ -214,7 +214,7 @@ | ||
|
|
||
| # Download and store locally (unconditionally, no checks) | ||
| def fetch! | ||
| URI.open(url, 'rb') do |remote| # standard:disable Security/Open | ||
| URI(url).open('rb') do |remote| | ||
| Cache.tarball.open('wb') do |local| | ||
| IO.copy_stream(remote, local) | ||
| end |
| # | ||
| # TODO: not sure this value is correct | ||
| def crate | ||
| 'libdd-profiling-ffi' |
There was a problem hiding this comment.
@hoolioh is this the part that you're going to improve or is there a better existing crate for me to use here?
| data-pipeline-ffi | ||
| cbindgen | ||
| ddtelemetry-ffi | ||
| crashtracker-ffi | ||
| crashtracker-collector | ||
| crashtracker-receiver | ||
| demangler | ||
| ddsketch-ffi | ||
| datadog-library-config-ffi | ||
| datadog-log-ffi | ||
| datadog-ffe-ffi |
There was a problem hiding this comment.
@hoolioh does the feature list sound sensible?
| build! *%W[ | ||
| -p #{crate} | ||
| --features #{features.join(',')} | ||
| --release |
There was a problem hiding this comment.
This --release flag should probably become a conditional to enable debug builds, at least for development.
|
|
||
| # local platform artifact build target | ||
| def target | ||
| out / Target.rust_target / 'release' |
There was a problem hiding this comment.
This release path should probably become a conditional to enable debug builds, at least for development.
| dst_pkgconfig = Vendor.target / 'lib' / 'pkgconfig' | ||
| dst_pkgconfig.mkpath | ||
|
|
||
| template_dir = Source.root / Cargo.crate | ||
|
|
||
| %w[datadog_profiling_with_rpath.pc datadog_profiling.pc].each do |pc_name| | ||
| template = template_dir / "#{pc_name}.in" | ||
| raise "pkgconfig template not found: #{template}" unless template.file? | ||
|
|
||
| content = template.read.gsub('@Datadog_VERSION@', Libdatadog::LIB_VERSION) | ||
| (dst_pkgconfig / pc_name).write(content) | ||
| puts " Generated #{pc_name}" |
| system(*cmake_configure) or raise "failed (exit #{$?.exitstatus}): #{cmake_configure.inspect}" | ||
|
|
||
| cmake_build = %W[cmake --build #{build_dir}] | ||
| system(*cmake_build) or raise "failed (exit #{$?.exitstatus}): #{cmake_build.inspect}" |
There was a problem hiding this comment.
I'm not too happy about this doing the build during the install phase (it should be its own build target) but I've kept it this way for simplicity.
| # for compiling libdatadog | ||
| pkgs.rustc | ||
| pkgs.cargo | ||
| pkgs.cmake | ||
| pkgs.autoconf | ||
| pkgs.automake | ||
| pkgs.libtool |
There was a problem hiding this comment.
There should probably be some sanity checks in the Rake tasks that check for these dependencies and alert developers if they are missing.
| task(:release) { Rake::Task["push_to_rubygems"].invoke } | ||
|
|
||
| # Load additional tasks | ||
| Dir.glob("tasks/**/*.rake").each { |r| import r } |
There was a problem hiding this comment.
Some of the existing inline Rake tasks above should either disappear (fetch?) or be moved to the tasks subdirectory.
| # After the existing dedup_headers call, clean up intra-file duplicate | ||
| # typedefs in common.h. cbindgen can emit the same typedef from | ||
| # multiple crate boundaries; the dedup_headers tool only strips | ||
| # child-vs-base duplicates, not intra-file ones. | ||
| content = base_header.read | ||
|
|
||
| # Remove multiline typedef redefinitions: a "typedef struct X X;" | ||
| # forward decl where the full "} X;" body already exists. | ||
| multiline_dupes = 0 | ||
| content.gsub!(/^typedef struct (\w+) \1;\n/) do |match| | ||
| if content.include?("} #{$1};") | ||
| multiline_dupes += 1 | ||
| "" | ||
| else | ||
| match | ||
| end | ||
| end | ||
|
|
||
| # Remove identical single-line typedef duplicates (e.g. pointer | ||
| # typedefs like "typedef struct X *Y;" emitted by two crates). | ||
| seen = Set.new | ||
| lines = content.lines | ||
| before = lines.size | ||
| lines.reject! do |line| | ||
| line.start_with?('typedef ') && line.rstrip.end_with?(';') && !seen.add?(line) | ||
| end | ||
| singleline_dupes = before - lines.size | ||
|
|
||
| if multiline_dupes > 0 || singleline_dupes > 0 | ||
| puts " Removed #{multiline_dupes} multiline and #{singleline_dupes} single-line duplicate typedef(s) from common.h" | ||
| base_header.write(lines.join) | ||
| end |
There was a problem hiding this comment.
Either I'm doing something wrong with the dedup tool or this should be fixed upstream.
Note: the issue only materialises as warnings with clang on Linux (which triggers using C99 in our dd-trace-rb extension build process); gcc is silent about it but the duplication remains.
| platform = ruby_platform | ||
| RUST.fetch(platform) do | ||
| raise "No Rust target mapping for Ruby platform #{platform.inspect}. " \ | ||
| "Known platforms: #{RUST_TARGETS.keys.inspect}" |
There was a problem hiding this comment.
Is RUST_TARGETS defined somewhere? Should it be RUST.keys.inspect?
There was a problem hiding this comment.
Good catch, I fat fingered this one.
|
|
||
| # Remove identical single-line typedef duplicates (e.g. pointer | ||
| # typedefs like "typedef struct X *Y;" emitted by two crates). | ||
| seen = Set.new |
There was a problem hiding this comment.
Set doesn't need to be explicitly required?
There was a problem hiding this comment.
| end | ||
|
|
||
| # perform build | ||
| def build |
There was a problem hiding this comment.
I'll defer to @hoolioh and @ivoanjo as they're the SMEs here, but I think a cargo build ignores LTO when lib crates are included. It's mentioned in libdd-profiling-ffi. In the builder crate in libdatadog rustc is used.
Speaking of the builder crate, @hoolioh - We discussed using the builder crate in these repos to encapsulate some of the logic. Do we still need to do work to make that possible?
There was a problem hiding this comment.
Great! That's exactly the kind of feedback I was looking for, and the kind of things this Rakefile should help abstract away (Ideally in the thinnest way, with the libdatadog part providing most of the building process save for the one that is Ruby-specific in some way)
There was a problem hiding this comment.
Speaking of the builder crate, @hoolioh - We discussed using the builder crate in these repos to encapsulate some of the logic. Do we still need to do work to make that possible?
+1 if possible I'd like to avoid all of the Ruby code we're adding here and have the build be unified via the builder crate.
My mental model for this is "the current setup with builder crate works reasonably well for what we need BUT we'd like to be able to slightly tweak the list of crates + the versions built". Can that perhaps be an alternative direction here?
There was a problem hiding this comment.
My mental model for this is "the current setup with builder crate works reasonably well for what we need BUT we'd like to be able to slightly tweak the list of crates + the versions built". Can that perhaps be an alternative direction here?
AIUI I don't see that as antithetical to this PR. The amount of "build code" currently in this Rakefile is unfortunate but that's a reflection of the current situation. I expect most of this "build code" to disappear as it gets progressively replaced by some improved libdatadog-side process (@hoolioh's task).
have the build be unified via the builder crate.
The remainder remains still, I mean the builder crate has no business knowing about the structure of this gem or Ruby specifics, so there will always be some glue that can't be unified; I'd rather that glue be well-structured rake tasks that naturally integrate with the rake build gem building and release process than the inevitable organically grown shell script.
There was a problem hiding this comment.
I'd like to slightly push back on intertwining building the .gem (an exercise in taking some existing files and packaging them in what's effectively a .tar) with actually building the Rust libdatadog artifacts. These are two quite different tasks and I would advise keeping a clear and strict separation between them.
There was a problem hiding this comment.
b) some more general principle regarding this repository building an packaging a gem with a libdatadog for Ruby vs the upstream libdatadog repository providing Rust libdatadog artifact building facilities?
I mean this -- I don't think there should be Ruby-specific code to build rust libdatadog artifacts.
The whole point of libdatadog IMO is pooling resources so we don't have N implementations of code, so I'm wary of replacing N implementations of tracing with N implementations of build systems for libdatadog artifacts.
Yes, we probably need configuration (to make a specific libdatadog artifact), but I think in this PR we're doing a lot more than configuration.
There was a problem hiding this comment.
so I'm wary of replacing N implementations of tracing with N implementations of build systems for libdatadog artifacts.
I agree, in principal.
I'm not sure the builder crate is fit for purpose, now that I think about it more. One of the things @hoolioh is going to work on is using the published crates instead of the libdatadog workspace. So what we probably want is a Rust project in this repo, like dd-trace-py and dd-trace-js have (they aren't using the crates yet, but the idea is the same) to build the libdatadog artifacts. Common Components should definitely provide necessary common tooling on top of that.
I don't object to the current PR as a temporary solution if it unblocks the exporter integration. It'll be replaced in a matter of a couple of weeks.
There was a problem hiding this comment.
IMHO I really don't think we are in disagreement, quite the contrary. We just seem to be hung up on what constitutes an intermediary step towards our end game, which is:
libdatadogprovides some equivalent of./configuremakemake installrake fetchfetches the libdatadog sourcerake compilecompiles libdatadog source and "installs" it intovendorusing whatever./configure --with-foo --without-bar+make libdatadog crashtracker-receiver+make install DESTDIR=vendor/libdatadog-v1.2.3libdatadog provides
There was a problem hiding this comment.
My idea was to use the builder crate for generating the artifacts on downstream projects. It does pretty much what this PR does and the main benefit is that it will be maintained by libdatadog so it will be updated constantly to the actual state of the repo minimizing the chances of drifting apart in terms of platform compatibility or testing.
My plan was to divide the work in two milestones:
- Modify the builder crate to build the project from a tag/reference. This should be relatively easy.
- Try to build the project using crate independent versions. This approach will be more challenging and, at this point, I don't know if I could repurpose the crate or maybe I would need to split it to reuse part of the functionality.
There was a problem hiding this comment.
it will be maintained by libdatadog so it will be updated constantly to the actual state of the repo
Modify the builder crate to build the project from a tag/reference. This should be relatively easy.
I am not sure I follow: does this build system live in the libdatadog repository (implied by "updated constantly to the actual state of the repo") or the libdatadog-rb one (implied by "build the project from a tag/reference")?
High-level usage: - `bundle exec rake compile`: perform compilation from source tarball at the gem's currently defined version - `bundle exec rake compile LIBDATADOG_SRC=/path/to/libdatadog`: perform compilation from source at described path (typically a libdatadog git worktree)
|
Closing: superseded by #33 |
Why?
Provide:
What does this PR do?
Implement a high-level easy-to-use
rake compiletask. The task can both fetch and build from a source tarball based on the version currently set or it can be pointed to an existing local source tree for quick iterative development.How to test the change?
High-level usage:
bundle exec rake compile: perform compilation from source tarball at the gem's currently defined versionbundle exec rake compile LIBDATADOG_SRC=../libdatadog: perform compilation from source at described path (typically a libdatadog git worktree: e.g.git clone github.com:DataDog/libdatadog.git ../libdatadog)These are sliced into smaller rake tasks which can be listed (along with a description) via the standard self-documenting
rake -T(orrake -Dbut these are not multiline so it's the same output; could be improved as well)Additional Notes:
cargo installor something? So I didn't put too much effort in there.common.hitself; that only got surfaced when I built against using clang, this should probably be fixed in usptream libdatadog's dedup tool but I have some post-processing fix in there to cater for that.Next step in another PR would be to modify CI so that each platform gets built from source instead of fetching the binary release and then that gets joined and packaged as a gem.
JIRA: