Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedding ASMap files as binary dump header file #28792

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fjahr
Copy link
Contributor

@fjahr fjahr commented Nov 4, 2023

Looking for conceptual feedback!

This is one of two options for how we may embed ASMap files into Bitcoin Core releases. This approach here uses a binary dump of the file which is then committed to the source code as a header file. The alternative approach is not having the data in the source code but only adding it during the build phase of a release. I initially favored the second approach but it seems harder to achieve than I originally expected and I have also started to see the downsides of this approach as well. For example, we would not have the asmap data available in dev builds, meaning those who want to use asmap with dev builds would always need to provide a file as input. Also, opening a PR for new ASMap data before the first RC allows for earlier testing of the data and the PR itself is the natural place where reviewers can give feedback. If the asmap data is only included during the build phase potential problems can only be discovered while testing RC1 and a separate issue would need to be opened to coordinate a fix. I had also not realized that the binary dump header file approach here is very similar to the embedding of the seeds.

I am still planning to demo how the embedding would work during the build phase, but since I have started to like the approach shown here a lot more lately, I thought I would demo it and hear people's opinions on it.

Currently, the binary dump header file could simply be generated using xxd -i path/to/ip_asn.map init/ip_asn.h. But I would also be open to adding an equivalent to makeseeds.py to make the process more convenient if people would prefer that.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 4, 2023

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #29641 (scripted-diff: Use LogInfo/LogDebug over LogPrintf/LogPrint by maflcko)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@fjahr fjahr mentioned this pull request Nov 4, 2023
4 tasks
@fjahr fjahr force-pushed the 2023-10-asmap-in-source branch 3 times, most recently from b8b09a7 to f9288bc Compare November 4, 2023 23:48
@sipa
Copy link
Member

sipa commented Nov 5, 2023

Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).

@maflcko
Copy link
Member

maflcko commented Nov 5, 2023

How much data would this add per year to the repo, assuming updates happen?

@sipa
Copy link
Member

sipa commented Nov 5, 2023

@maflcko The asmap file added here is around 1.6 MiB, so assuming we update for every 6 months, probably a bit over 3 MiB per year (gzip compresses asmap files only ~5%).

@luke-jr
Copy link
Member

luke-jr commented Nov 8, 2023

Why is the option of loading it from a file even in dev builds, not considered?

@fjahr
Copy link
Contributor Author

fjahr commented Nov 9, 2023

Would it make sense to only check in the asmap.dat file into the repository, and then have the build system convert it to a .h at compile time (similar to the approach used in the src/test/data directory)? That would mean there is also a directly-manipulable file in the repository (that can be used with asmap-tool etc).

Yeah, that would work as well. I have drafted that approach here: fjahr@0f9109f Details can change of course (like keeping the raw file in init) but if people prefer that approach, I will pull this into here.

@fjahr
Copy link
Contributor Author

fjahr commented Nov 9, 2023

Why is the option of loading it from a file even in dev builds, not considered?

It is considered. Loading from a file is the only option for dev builds if we go the other route mentioned in the description: "The alternative approach is not having the data in the source code but only adding it during the build phase of a release." But this also comes with the aforementioned downsides.

@willcl-ark
Copy link
Member

A general question about this approach: do most contemporary OSes handle loading large amounts of (potentially) unused data into physical memory well-enough, that they would (likely) not load this extra 1.5-3MB of data in the case that the -asmap is not being used? Perhaps this will be handled by the demand paging system? If not, it seems to be a bit of a shame to increase binary size by 20% in all cases.

It seems to make most sense to me to take the path suggested by @sipa, including the .dat file in the repo, and having it converted to a .h at compile time. It might also be nice to have a configuration flag to disable the embedded header creation for faster builds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants