Collaborator

@jurraca jurraca commented Dec 10, 2025

  • Brings back collaborative run outputs from the git history, and organizes data by year.
  • Renames old unfilled files for consistency.
  • Adds a note about data format / folder structure.

It seems 1742572800 was added only in its filled form, without the unfilled file alongside it (see #23).

sipa commented Dec 10, 2025

Concept ACK. I have not verified that the revived files match the historical ones.

README.md Outdated

ASmap files are provided in binary form, suitable for use with Bitcoin Core's `-asmap` flag.

The files are organized by year. The `latest_asmap.dat` at the root of the project maps to the latest ASmap produced in these folders. An `_unfilled.dat` map means that no attempt has been made to infer missing networks. See the [asmap-tool](https://github.com/bitcoin/bitcoin/blob/master/contrib/asmap/README.md) docs. The default is to use `--fill` when encoding the file.

@sipa sipa Dec 10, 2025

Nit: I wouldn't use "infer" here. The filling done isn't reasoning or guessing about what the AS for the missing range might be. It's picking whatever ASN minimizes asmap.dat file size, with no regard for reality, because it just doesn't matter what missing ranges are assigned to.

@jurraca jurraca force-pushed the add-historical-data branch from 6675342 to 934282d Compare December 10, 2025 17:59
@jurraca jurraca requested a review from fjahr December 11, 2025 10:49
Collaborator

fjahr commented Dec 12, 2025

Cool, I have verified that the hashes of the re-introduced files match and that the other files are indeed move-only. The renamings are consistent with our latest scheme and thus make sense.
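
A minimal sketch of how such a hash check could be reproduced locally. It assumes the 64-hex-digit values quoted in this PR are SHA-256 digests and that GNU `sha256sum` is available; the helper name `verify_sha256` is hypothetical and not part of this repository.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 when the SHA-256 digest of FILE equals EXPECTED_HEX, 1 otherwise.
# Assumption: the hashes quoted in the review comments are SHA-256 digests.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" | cut -d' ' -f1)"
  [ "$actual" = "$expected" ]
}
```

Usage would look like `verify_sha256 path/to/file.dat <expected-hash> && echo match`, with the path and hash taken from the PR that originally added the file.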

There is one more thing to do here: fixing the CI job that checks the latest file. At first I was puzzled that this job doesn't fail here, but I now think the reason is that this PR isn't based on the latest master, which includes the files from the latest run. When the PR is merged with master before CI runs, the latest run's files are still in the root and not in the 2025 folder, so the check still works as usual.

So this PR needs a rebase anyway, so that the latest files are included and moved into the 2025 folder, followed by some changes to the GH Actions job so that it takes the subfolders into account.
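
For illustration, a small sketch of how a year folder could be derived from a file's timestamp. It assumes the numeric prefix of each file name is a Unix timestamp and that the folder name is the UTC year of that timestamp; the source only states that files are "organized by year". Both helper names are hypothetical.

```shell
# year_for_timestamp TS
# Prints the UTC calendar year of a Unix timestamp.
# Requires GNU date (as on ubuntu-latest); BSD/macOS date uses `-r TS` instead of `-d @TS`.
year_for_timestamp() {
  date -u -d "@$1" +%Y
}

# year_dir_for_file NAME
# Hypothetical helper: derives the year folder from a "<timestamp>_..." file name.
year_dir_for_file() {
  local name ts
  name="$(basename "$1")"
  ts="${name%%_*}"   # text before the first underscore
  year_for_timestamp "$ts"
}
```

Under these assumptions, both timestamps mentioned in this PR (1742572800 and 1764864000) map to the 2025 folder.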

I drafted a fixed version of the GH action as part of my debugging, before I realized that the reason for the non-failure here isn't actually in the job itself. Feel free to use it, but it's only lightly tested.

name: Latest ASMap file

on:
  pull_request:
    paths:
      - '**/*.dat'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout code
      uses: actions/checkout@v3

    - name: Find latest ASMap file by timestamp (recursive)
      id: find_latest
      shell: bash
      run: |
        set -euo pipefail

        # Find all *_asmap.dat except latest_asmap.dat, anywhere in the repo
        LATEST_FILE="$(
          find . -type f -name '*_asmap.dat' ! -name 'latest_asmap.dat' -print0 \
            | xargs -0 -I{} basename "{}" \
            | sort -r \
            | head -n 1
        )"

        if [ -z "${LATEST_FILE}" ]; then
          echo "Error: No timestamped *_asmap.dat files found (excluding latest_asmap.dat)."
          exit 1
        fi

        # Resolve the path of that file 
        LATEST_PATH="$(find . -type f -name "${LATEST_FILE}" ! -name 'latest_asmap.dat' | head -n 1)"

        if [ -z "${LATEST_PATH}" ]; then
          echo "Error: Could not resolve path for latest file '${LATEST_FILE}'."
          exit 1
        fi

        echo "LATEST_FILE=${LATEST_PATH#./}" >> "$GITHUB_ENV"

    - name: Check latest_asmap.dat exists and matches the highest timestamped file
      shell: bash
      run: |
        set -euo pipefail

        if [ ! -f "latest_asmap.dat" ]; then
          echo "Error: latest_asmap.dat file is missing."
          exit 1
        fi

        if [ -z "${LATEST_FILE:-}" ]; then
          echo "Error: LATEST_FILE env var is empty (find_latest step failed)."
          exit 1
        fi

        if [ ! -f "$LATEST_FILE" ]; then
          echo "Error: Resolved latest timestamped file does not exist: $LATEST_FILE"
          exit 1
        fi

        if ! cmp -s "$LATEST_FILE" "latest_asmap.dat"; then
          echo "Error: latest_asmap.dat does not match the content of $LATEST_FILE."
          exit 1
        fi

        echo "Success"

EDIT: maybe this could be further improved by explicitly excluding the unfilled files from the matching in the find step
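
To try the same check outside of CI, the two steps can be approximated by a pair of shell functions. This is only a sketch under stated assumptions: file names start with a numeric timestamp followed by an underscore, paths contain no newlines, and the function names are hypothetical. Unlike the `sort -r` in the drafted job, it compares timestamps numerically, which only makes a difference if timestamps ever differ in digit count.

```shell
# latest_timestamped_asmap [ROOT]
# Prints the path of the *_asmap.dat file with the numerically highest
# timestamp prefix below ROOT (default: current directory), skipping
# latest_asmap.dat. Prints nothing if no such file exists.
latest_timestamped_asmap() {
  local root="${1:-.}"
  find "$root" -type f -name '*_asmap.dat' ! -name 'latest_asmap.dat' \
    | awk -F/ '
        {
          ts = $NF               # file name without directories
          sub(/_.*/, "", ts)     # keep the text before the first underscore
          if (ts + 0 > best) { best = ts + 0; path = $0 }
        }
        END { if (path != "") print path }
      '
}

# check_latest_asmap [ROOT]
# Returns 0 when ROOT/latest_asmap.dat exists and is byte-identical to the
# file found by latest_timestamped_asmap, non-zero otherwise.
check_latest_asmap() {
  local root="${1:-.}" latest
  latest="$(latest_timestamped_asmap "$root")"
  [ -n "$latest" ] && [ -f "$root/latest_asmap.dat" ] \
    && cmp -s "$latest" "$root/latest_asmap.dat"
}
```

Running `check_latest_asmap . && echo Success` from the repository root would mirror the outcome of the job, assuming the layout described in this PR.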

Each year folder contains the AS map produced by a collaborative run.
We started adding both filled and unfilled versions in mid-2025.
Previous runs are _unfilled_.
@jurraca jurraca force-pushed the add-historical-data branch from 934282d to 830469b Compare December 14, 2025 13:53
Collaborator Author

jurraca commented Dec 14, 2025

Thanks, and sorry to have sent you down that path, @fjahr. I've rebased and added the missing files from 1764864000.

The CI job you showed is good, and I didn't see a way to make it better, so I've used it as is.

> maybe this could be further improved by explicitly excluding the unfilled files from the matching in the find step

It's already excluding unfilled files from the find command with the '*_asmap.dat' pattern.
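
A quick way to convince oneself of this is to test the pattern against sample names. The sketch below uses a shell `case` statement, which applies the same glob syntax as `find -name` for these patterns. The unfilled file name used in the usage note (`<timestamp>_asmap_unfilled.dat`) is an assumption based on the `_unfilled.dat` suffix mentioned in the README, and `is_candidate` is a hypothetical helper.

```shell
# is_candidate NAME
# Returns 0 if NAME would be selected by
#   find -name '*_asmap.dat' ! -name 'latest_asmap.dat'
# and 1 otherwise. Only the base name is compared, as with find -name.
is_candidate() {
  case "$(basename "$1")" in
    latest_asmap.dat) return 1 ;;   # explicitly excluded
    *_asmap.dat)      return 0 ;;   # timestamped, filled file
    *)                return 1 ;;   # anything else, including *_unfilled.dat
  esac
}
```

For example, `is_candidate 2025/1764864000_asmap.dat` succeeds, while `is_candidate 2025/1764864000_asmap_unfilled.dat` fails because that name does not end in `_asmap.dat`.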

Collaborator

@fjahr fjahr left a comment

ACK 830469b

I will go ahead and merge this, since the re-introduced files are historical, the newer files are only moved, and the latest file is untouched. Thus, even if I made a mistake and the data changed somewhere in here, there is only a low chance of it ending up in a node that is actually in use, since anyone pulling a new file for their node should (hopefully) go with the latest one.

Collaborator

nit: Somehow this file isn't added along with the other historical files in the first commit but rather inside the move-only commit. Hash is still ok, this is 93afd5c0a82f82fca9988881f397739427c9c34bc1adb337e0d00978dcd1863a from #19

Collaborator Author

Yeah, I was confused about this. Either I missed it in the first addition, or it got dropped in the rebase, so I just added it back after confirming it was the result of #19.

Collaborator

dafaf52c1acd6e3f1e1e91d613c5fbbf7e9905e020ecfe101e381ce1a88dd84d from #16

Collaborator

d628364786e292cc6770e562b3f232cf073ce334dc07b6bea93104c1ebeb1df9 from #13

Collaborator

2155eab71080099f469c524e49ff71d264859b023431b27aa52f58315a640ff1 from #10

Collaborator

45b87814c47b5ea056c2e47fcc08a75afbb535c8bfb80d734bb364cddb432e73 from #6

@fjahr fjahr merged commit 9cdd501 into asmap:main Dec 16, 2025
1 check passed