# Git Repository Digest Notebook 🗃️➡️📝

This notebook clones **any Git repository you can reach on your network**—whether it’s a GitHub repo, a bare URL to a Git service on your homelab, or even a `file://` path—then runs **gitingest** to produce a compact, prompt‑friendly Markdown digest.

The digest is cached in a deterministic path so that the same URL always maps to the same file:

```
http://dockerserver:8929/cjtrowbridge/memes  →  cache/dockerserver.8929/cjtrowbridge/memes/ingest.md
https://github.com/cjtrowbridge/memes        →  cache/github.com/cjtrowbridge/memes/ingest.md
```

Feel free to tweak defaults (e.g. token cut‑offs, binary filters) by passing flags to `gitingest` in the final cell.
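
As a rough illustration of passing extra flags, the sketch below assembles the `gitingest` command line as a list before handing it to `subprocess.run`. The helper name `build_gitingest_command` is hypothetical. The `--max-size` and `--exclude-pattern` flags are assumptions about the installed CLI version, so check `gitingest --help` before relying on them.

```python
from typing import List, Optional, Sequence


def build_gitingest_command(
    repo_dir: str,
    output_file: str,
    max_size_bytes: Optional[int] = None,
    exclude_patterns: Sequence[str] = (),
) -> List[str]:
    """Build an argument list for the gitingest CLI (hypothetical helper)."""
    # Base invocation, matching the call used in the final cell.
    cmd = ["gitingest", str(repo_dir), "-o", str(output_file)]
    # Assumed flag: skip files larger than this many bytes.
    if max_size_bytes is not None:
        cmd += ["--max-size", str(max_size_bytes)]
    # Assumed flag: one glob pattern per occurrence.
    for pattern in exclude_patterns:
        cmd += ["--exclude-pattern", pattern]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in place of the hard-coded list in the final cell.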


In [1]:
# --- Install dependencies ----------------------------------------------------
# Run once; comment out after first successful install if you prefer faster start‑ups.
%pip install --quiet gitingest gitpython


Note: you may need to restart the kernel to use updated packages.





In [2]:
import os
import pathlib
import shutil
import subprocess
import sys
import tempfile
import urllib.parse

def _sanitize_netloc(netloc: str) -> str:
    """Replace ':' (port separator) with '.' so the path is filesystem-safe."""
    return netloc.replace(":", ".")

def cache_path_from_url(url: str) -> pathlib.Path:
    """Map a repository URL or file:// path to its cache directory."""
    parsed = urllib.parse.urlparse(url)
    if parsed.scheme == "file":
        # Decode percent-escapes (e.g. %20) before touching the filesystem.
        local_path = pathlib.Path(urllib.parse.unquote(parsed.path)).resolve()
        # Drop the leading anchor ('/' on POSIX, the drive root on Windows).
        parts = local_path.parts[1:]
        # A Windows drive letter may survive as a 'C:' component; ':' is not
        # allowed in Windows directory names, so strip it.
        parts = [part.replace(":", "") for part in parts]
        return pathlib.Path("cache").joinpath(*parts)
    netloc = _sanitize_netloc(parsed.netloc)
    rel_path = parsed.path.lstrip("/")
    return pathlib.Path("cache").joinpath(netloc, rel_path)
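
One case the mapping above does not cover is the scp-style SSH address (`git@github.com:user/repo.git`). `urlparse` sees no network location in that form, so the whole string, colon included, ends up in the path. The sketch below shows one possible way to handle it by rewriting such addresses into `ssh://` form before deriving the cache path. The names `normalize_git_url` and `ssh_cache_path` are hypothetical, the rewritten URL is meant only as a cache key and not for passing back to `git clone`, and unlike `_sanitize_netloc` this variant drops the `user@` prefix from the directory name.

```python
import pathlib
import re
import urllib.parse

# scp-style address: optional "user@", a host of at least two characters
# (so a Windows drive letter such as "C:" is not mistaken for a host),
# then ":" and a path.
_SCP_STYLE = re.compile(
    r"^(?:(?P<user>[^@/\s]+)@)?(?P<host>[^@:/\\\s]{2,}):(?P<path>.+)$"
)


def normalize_git_url(url: str) -> str:
    """Rewrite 'user@host:path' as 'ssh://user@host/path' (hypothetical helper)."""
    if "://" in url:
        return url  # already a regular URL
    match = _SCP_STYLE.match(url)
    if match is None:
        return url  # plain local path or something unrecognised
    user = f"{match['user']}@" if match["user"] else ""
    return f"ssh://{user}{match['host']}/{match['path'].lstrip('/')}"


def ssh_cache_path(url: str) -> pathlib.Path:
    """Derive a cache directory that contains no ':' characters."""
    parsed = urllib.parse.urlparse(normalize_git_url(url))
    host = parsed.hostname or "local"
    if parsed.port:
        host = f"{host}.{parsed.port}"  # same 'host.port' convention as above
    return pathlib.Path("cache").joinpath(host, parsed.path.lstrip("/"))
```

With this sketch, `git@github.com:cjtrowbridge/memes.git` maps under `cache/github.com/cjtrowbridge/memes.git`, next to the directory an HTTPS URL for the same host would use.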


In [3]:
# --- Enter the repository URL -------------------------------------------------
git_url = input("🔗 Enter Git repository URL (can be http/https, ssh, or file://): ").strip()
if not git_url:
    sys.exit("No URL provided — aborting.")
print("Repo:", git_url)


Repo: http://docker-logic:8929/cjtrowbridge/memes.cjtrowbridge.com


In [4]:
out_dir = cache_path_from_url(git_url)
out_dir.mkdir(parents=True, exist_ok=True)
output_file = out_dir / "ingest.md"
print("Digest will be saved to:", output_file)


Digest will be saved to: cache\docker-logic.8929\cjtrowbridge\memes.cjtrowbridge.com\ingest.md
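
Because the output path is deterministic, a previous run's digest can be detected before cloning again. The notebook as written always re-clones. The following is a minimal sketch of an optional check, with `digest_is_cached` as a hypothetical helper name.

```python
import pathlib


def digest_is_cached(output_file: pathlib.Path) -> bool:
    """Return True when a non-empty digest already exists at output_file."""
    # An empty file most likely means an earlier run was interrupted,
    # so it is treated as a cache miss.
    return output_file.is_file() and output_file.stat().st_size > 0
```

One way to use it is to wrap the clone cell in `if not digest_is_cached(output_file):` and print the existing path otherwise. Deleting `ingest.md` then forces a refresh.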


In [5]:
# --- Clone + ingest -----------------------------------------------------------
with tempfile.TemporaryDirectory() as tmpdir:
    print("⏳ Cloning…")
    subprocess.run(["git", "clone", "--depth", "1", git_url, tmpdir], check=True)
    print("✅ Clone done.")
    print("⏳ Running gitingest…")
    subprocess.run(["gitingest", tmpdir, "-o", str(output_file)], check=True)
    print("✅ Digest written to", output_file)


⏳ Cloning…
