@UebelAndre
Contributor

@UebelAndre UebelAndre commented Apr 23, 2025

This change introduces the perl_cpan_compiler rule and perl_cpan extension module.

The perl_cpan_compiler rule is backed by Carton and generates a lock file from a given cpanfile that can then be passed to perl_cpan to generate dependencies.

Known limitations:

  • XS modules are not supported.
  • Compilation is not hermetic and some host tools may be required (e.g. make) to compile requirements.

This change does not affect any existing rules, but it does add a new entry to MODULE.bazel, so it should be considered a minor change.

closes #83

@UebelAndre UebelAndre force-pushed the cpan branch 2 times, most recently from 293c30e to a12c410 Compare April 23, 2025 15:19
@UebelAndre UebelAndre marked this pull request as ready for review April 23, 2025 15:24
@UebelAndre UebelAndre requested a review from skeletonkey as a code owner April 23, 2025 15:24
@UebelAndre
Contributor Author

UebelAndre commented Apr 23, 2025

@lalten I would love it if you could take a look at the changes here since you have some experience in this domain with rules_cpan!

@UebelAndre
Contributor Author

While developing this, I had to use a bootstrap script to generate the initial lock file. I'll post it here for future use in case the compiler tool breaks and lockfiles can no longer be generated.

"""bootstrap"""

import json
import sys
import urllib.error
import urllib.request
from pathlib import Path


def deserialize_cpanfile_snapshot(content):
    """Deserialize the contents of a `cpanfile.snapshot` file.

    Args:
        content (str): The text from a `cpanfile.snapshot`

    Returns:
        dict: A mapping of the snapshot data.
    """
    results = {}

    current = ""
    container_name = ""
    for line in content.splitlines():
        text = line.strip()

        if not text or text.startswith("#"):
            continue

        if container_name and line.startswith("      "):
            key, _, value = text.partition(" ")
            results[current][container_name][key] = value
            continue

        if line.startswith("    "):
            if text.startswith("pathname:"):
                _, _, pathname = text.partition(" ")
                results[current]["pathname"] = pathname
                continue
            if text.startswith("provides:"):
                container_name = "provides"
                continue

            if text.startswith("requirements:"):
                container_name = "requirements"
                continue

        if line.startswith("  "):
            current = text
            results[current] = {
                "provides": {},
                "requirements": {},
            }
            continue

    return results


METACPAN_API_ENDPOINT = "https://fastapi.metacpan.org/release"


def _get_release(author: str, distribution: str) -> dict[str, str]:
    url = f"{METACPAN_API_ENDPOINT}/{author}/{distribution}"
    try:
        resp = urllib.request.urlopen(url).read().decode()
    except urllib.error.HTTPError as ex:
        raise RuntimeError(f"Failed to fetch {url}: {ex}") from ex
    try:
        return json.loads(resp)["release"]
    except json.JSONDecodeError as ex:
        raise RuntimeError(f"Failed to parse JSON from {url}: {ex}") from ex
    except KeyError as ex:
        raise RuntimeError(
            f"Failed to find 'release' key in JSON from {url}: {ex}. Json:\n{resp}"
        ) from ex


def sanitize_name(module):
    name, _, _ = module.rpartition("-")
    return name


def main() -> None:
    snapshot_path = Path(sys.argv[1])
    snapshot = deserialize_cpanfile_snapshot(snapshot_path.read_text())

    lockfile = {}
    for module, data in snapshot.items():
        dependencies = set()
        for req in data["requirements"]:
            for mod, mod_data in snapshot.items():
                if req in mod_data["provides"]:
                    dependencies.add(sanitize_name(mod))
                    break
        author = data["pathname"].split("/")[-2]
        release = _get_release(author, module)

        if "Path-Tiny" in module or "String-ShellQuote" in module:
            from pprint import pprint

            pprint(release)

        lockfile[sanitize_name(release["name"])] = {
            "dependencies": sorted(dependencies),
            "sha256": release["checksum_sha256"],
            "strip_prefix": module,
            "url": release["download_url"],
        }

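    # Write the lockfile next to the snapshot. Use a separate variable so the
    # `lockfile` dict is not overwritten, and build the file name before joining
    # it to the parent directory (`Path + str` is not a valid operation).
    lockfile_path = snapshot_path.parent / (snapshot_path.name + ".lock.json")
    lockfile_path.write_text(json.dumps(lockfile, indent=2, sort_keys=True) + "\n")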


if __name__ == "__main__":
    main()
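
To make the dependency-resolution step in `main()` easier to follow, here is a minimal offline sketch of it. The `SNAPSHOT` dict is hypothetical, hand-written data in the shape that `deserialize_cpanfile_snapshot` returns; the distribution names, versions, and paths are illustrative and not taken from a real snapshot. `resolve_dependencies` is a name introduced only for this example.

```python
"""Offline sketch: map each distribution's requirements to distributions."""

# Hypothetical parsed snapshot (same shape as the parser's return value).
SNAPSHOT = {
    "Foo-Bar-1.02": {
        "pathname": "A/AU/AUTHOR/Foo-Bar-1.02.tar.gz",
        "provides": {"Foo::Bar": "1.02"},
        "requirements": {"Baz::Qux": "0", "strict": "0"},
    },
    "Baz-Qux-0.5": {
        "pathname": "A/AU/AUTHOR/Baz-Qux-0.5.tar.gz",
        "provides": {"Baz::Qux": "0.5", "Baz::Qux::Util": "0.5"},
        "requirements": {},
    },
}


def sanitize_name(distribution):
    """Drop the trailing "-<version>" from a distribution name."""
    name, _, _ = distribution.rpartition("-")
    return name


def resolve_dependencies(snapshot):
    """Return {distribution: sorted list of distributions it depends on}."""
    # Index each provided module by the first distribution that provides it.
    providers = {}
    for dist, data in snapshot.items():
        for module in data["provides"]:
            providers.setdefault(module, dist)

    resolved = {}
    for dist, data in snapshot.items():
        # Requirements that no distribution in the snapshot provides
        # (for example core modules) are skipped.
        deps = {
            sanitize_name(providers[req])
            for req in data["requirements"]
            if req in providers
        }
        resolved[sanitize_name(dist)] = sorted(deps)
    return resolved


if __name__ == "__main__":
    print(resolve_dependencies(SNAPSHOT))
```

Building the `providers` index once avoids rescanning the whole snapshot for every requirement, but the result should match the nested loop in the script for snapshots like this one.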

@lalten
Contributor

lalten commented Apr 23, 2025

Could you summarize how the implementation is different from rules_cpan?
rules_cpan's "bootstrap" step is bazel run @rules_cpan//lock

I think it would be preferable to have just one way that works rather than two different implementations. I agree with #83 (comment) that we should move the repos closer together to increase discoverability and improve maintainability

@UebelAndre
Contributor Author

> Could you summarize how the implementation is different from rules_cpan? rules_cpan's "bootstrap" step is bazel run @rules_cpan//lock
>
> I think it would be preferable to have just one way that works rather than two different implementations. I agree with #83 (comment) that we should move the repos closer together to increase discoverability and improve maintainability

The main difference is that users do not need to run carton install separately to generate the original cpanfile.snapshot. Once a perl_cpan_compiler target is defined, users simply run that target to generate both the snapshot file and the Bazel lockfile.

Both implementations currently share a limitation: the dependencies need to be installable on the host, which requires some host tools. Given the interface in this PR, though, I think there's a path forward where the tool just hits the CPAN API and only operates on metadata.
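
As a rough illustration of what a metadata-only approach could look like, here is a small sketch that builds a lockfile entry from a release metadata dict without installing anything. The `name`, `checksum_sha256`, and `download_url` keys mirror what the bootstrap script reads; the `dependency` list and its `module`/`phase`/`relationship` keys are an assumption made for this example and are not confirmed anywhere in this thread. The data, URL, and function names are hypothetical.

```python
"""Sketch: build a lockfile entry from release metadata only."""

# Hypothetical release metadata. The "dependency" list shape is an
# assumption for illustration; the other keys match the bootstrap script.
RELEASE = {
    "name": "Foo-Bar-1.02",
    "checksum_sha256": "0" * 64,
    "download_url": "https://example.invalid/Foo-Bar-1.02.tar.gz",
    "dependency": [
        {"module": "Baz::Qux", "phase": "runtime", "relationship": "requires"},
        {"module": "Test::More", "phase": "test", "relationship": "requires"},
        {"module": "Opt::Extra", "phase": "runtime", "relationship": "recommends"},
    ],
}


def runtime_requirements(release):
    """Return the sorted module names that are hard runtime requirements."""
    return sorted(
        dep["module"]
        for dep in release.get("dependency", [])
        if dep.get("phase") == "runtime" and dep.get("relationship") == "requires"
    )


def lock_entry(release, dependencies):
    """Build one lockfile entry in the same shape the bootstrap script writes.

    Assumes the archive's top-level directory equals the release name.
    """
    return {
        "dependencies": sorted(dependencies),
        "sha256": release["checksum_sha256"],
        "strip_prefix": release["name"],
        "url": release["download_url"],
    }


if __name__ == "__main__":
    print(runtime_requirements(RELEASE))
    print(lock_entry(RELEASE, []))
```

Note that `runtime_requirements` yields module names, not distribution names. A real tool would still need a module-to-distribution lookup before it could fill in `dependencies`.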

Additionally, the repository rules in this PR generate a DAG of dependencies rather than a single flat target, which can be useful when debugging issues in external libraries. Down the road, this could also allow mods/annotations on the packages to inject user-defined alterations into the generated module (similar to @rules_rust//crate_universe:defs.bzl%crate.annotation).
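
To show why a per-package graph helps with debugging, here is a minimal sketch that walks a lockfile-shaped dict as a DAG. The `LOCKFILE` contents are hypothetical and only carry the `dependencies` key from the lockfile format in the bootstrap script; `build_order` and `transitive_dependencies` are names made up for this example, not part of the rules.

```python
"""Sketch: treat lockfile entries as a dependency DAG."""

from graphlib import TopologicalSorter

# Hypothetical lockfile data; only the "dependencies" key is used here.
LOCKFILE = {
    "Foo-Bar": {"dependencies": ["Baz-Qux", "Quux"]},
    "Baz-Qux": {"dependencies": ["Quux"]},
    "Quux": {"dependencies": []},
}


def build_order(lockfile):
    """Return package names ordered so dependencies come before dependents."""
    graph = {name: set(entry["dependencies"]) for name, entry in lockfile.items()}
    # TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(graph).static_order())


def transitive_dependencies(lockfile, name):
    """Return every package reachable from `name`, sorted by name."""
    seen = set()
    stack = list(lockfile[name]["dependencies"])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(lockfile[dep]["dependencies"])
    return sorted(seen)


if __name__ == "__main__":
    print(build_order(LOCKFILE))
    print(transitive_dependencies(LOCKFILE, "Foo-Bar"))
```

With a single flat target, this kind of query is not possible because the edges between packages are not recorded.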

@UebelAndre
Contributor Author

@lalten would you still be willing to do a full review if you think the direction is good?

Contributor

@lalten lalten left a comment

Great work!

@UebelAndre UebelAndre requested a review from lalten April 28, 2025 14:32
@UebelAndre
Contributor Author

@lalten back to you!

@lalten
Contributor

lalten commented Apr 28, 2025

lgtm! I think this is better than what's currently at rules_cpan so once this lands it would be cool if you could migrate the current BCR users (which is only Lcov I believe)? Then I'd archive rules_cpan and point users at this implementation.

@UebelAndre
Contributor Author

@skeletonkey are you also able to take a look?

@skeletonkey skeletonkey merged commit d4e5cdb into bazel-contrib:main Apr 29, 2025
1 check passed
@UebelAndre UebelAndre deleted the cpan branch April 29, 2025 13:01
Development

Successfully merging this pull request may close these issues.

Feature Request: Add tooling for fetching CPAN dependencies

3 participants