Merged
133 commits
2b9bb7d
remove requests, add aiohttp dependency
abidsikder May 5, 2025
cc68e21
make store async
abidsikder May 5, 2025
ce75866
address aiohttp deprecation warning
abidsikder May 6, 2025
6d73469
move common testing utils into single file
abidsikder May 6, 2025
916df3c
pass fuzz tested ipfsstore
abidsikder May 6, 2025
7dcc852
move key_value_list into common testing utilities
abidsikder May 6, 2025
7e0a162
compact and simplify Node's dag-cbor representation
abidsikder May 6, 2025
ae0a308
remove encryption transformers
abidsikder May 6, 2025
d72dde7
even more efficient node implementation
abidsikder May 6, 2025
0ceede9
remove node test, rely on hamt integration instead
abidsikder May 6, 2025
655f1af
add abstract method decorator, remove unnecessary line
abidsikder May 6, 2025
149bae2
add type definition to store codec input
abidsikder May 6, 2025
7dd2b9c
ignore .DS_Store
abidsikder May 6, 2025
1a0324e
async alternatives to all methods, VirtualStore with cache and in mem…
abidsikder May 6, 2025
9646367
finish reserialization and linking changes
abidsikder May 7, 2025
bfb2a2d
convert all read node uses to use virtual store
abidsikder May 7, 2025
a42bdaf
remove transformers
abidsikder May 7, 2025
f3cfd29
detach from MutableMapping interface to allow for full async creation
abidsikder May 7, 2025
0fea571
first round of passing tests with the new interface
abidsikder May 7, 2025
e1a69bd
node class convenience functions
abidsikder May 8, 2025
1aefd24
add encode/decode, bug fixes
abidsikder May 8, 2025
4659356
more bug fixes, large portion of test coverage
abidsikder May 12, 2025
2e3fe03
Fix bug with flushing in memory tree when there are internally unlink…
abidsikder May 12, 2025
cd2334a
misc design and documentation changes, no regressions demonstrated wi…
abidsikder May 13, 2025
6607848
100% test coverage for stores
abidsikder May 13, 2025
cda896b
remove need to constantly initialize with the root node for in memory…
abidsikder May 13, 2025
0ac1d3a
fix bug with not setting root node after reserializing in set_pointer
abidsikder May 13, 2025
a030c3a
simplify cache eviction calling and in memory flushing algorithm
abidsikder May 14, 2025
adeff28
remove unused method
abidsikder May 14, 2025
d956a5c
first 100% code coverage on core files
abidsikder May 14, 2025
9328add
remove remnants of lru cache insertion order manipulation
abidsikder May 15, 2025
f7f3f72
remove extraneous performance testing file
abidsikder May 16, 2025
7d79845
remove nginx since no ipfsstore internal auth, change tags to hashes
abidsikder May 19, 2025
6085db3
change all tags to hashes in action
abidsikder May 19, 2025
a11cdbc
add xarray complete, refresh ruff version
abidsikder May 19, 2025
1fe9f53
public library exports
abidsikder May 19, 2025
f367567
remove old ipfszarrr3 file
abidsikder May 19, 2025
eff2af5
Finish ZarrHAMTStore documentation
abidsikder May 19, 2025
91506f2
finish store documentation
abidsikder May 19, 2025
d453781
sample code in documentation
abidsikder May 19, 2025
ad30213
update readme
abidsikder May 19, 2025
c78f260
change from release candidate to major
abidsikder May 19, 2025
1938f7f
finish documentation
abidsikder May 19, 2025
c3203d4
fix hamt tests with the store -> cas renaming
abidsikder May 19, 2025
98ee808
complete hamt and stores code coverage
abidsikder May 19, 2025
2a10acd
finish tests with complete code coverage
abidsikder May 19, 2025
f6953ef
ignore type error
abidsikder May 19, 2025
b0bb6ae
update ipfs setup action used
abidsikder May 19, 2025
aaf722a
ruff format
abidsikder May 19, 2025
81aad06
restore 100% code coverage
abidsikder May 19, 2025
9c95b62
ruff format
abidsikder May 19, 2025
1684803
add note about async/thread safety
abidsikder May 19, 2025
b3fbbc5
remove old unnecessary commented out tests
abidsikder May 20, 2025
c428c53
add large kv set performance test
abidsikder May 20, 2025
26f9bfc
github ci hypothesis test limit increase
abidsikder May 20, 2025
c99a55c
fix duplicate node store vacate if write-enabled
abidsikder May 21, 2025
9865565
allow None to specify the default as well for base urls
abidsikder May 21, 2025
3a5c198
send blocking requests actions to separate thread
abidsikder May 21, 2025
2552c00
hypothesis test deadline increase
abidsikder May 21, 2025
ec7d712
fix typo
abidsikder May 21, 2025
0c42c78
docs
abidsikder May 21, 2025
2905bed
add get_pointer and implement metadata read cache for ZHS
abidsikder May 21, 2025
3cf5c15
increase test deadline
abidsikder May 21, 2025
a550d39
add even faster zhs metadata cache
abidsikder May 21, 2025
f2cbe62
remove extraneous comment
abidsikder May 21, 2025
ef18027
docs note about exceeded test deadlines
abidsikder May 21, 2025
e2cf99c
fix reference to old class in comment
abidsikder May 21, 2025
a958246
refactor: replace requests with aiohttp
Faolain May 22, 2025
423bd06
Merge pull request #46 from dClimate/refactor-swap-requests-with-aiohttp
Faolain May 25, 2025
2fa53b1
refactor: switch from weakref to session mgmt
Faolain May 26, 2025
bc4c42f
refactor: ensure tests pass with new architecture
Faolain May 26, 2025
20cdf92
refactor: event loop on linux issues
Faolain May 26, 2025
77ab594
fix: identation
Faolain May 26, 2025
21b8f34
fix: session errors
Faolain May 26, 2025
27d56e7
fix: indent
Faolain May 26, 2025
5703750
fix: tests
Faolain May 26, 2025
652693b
lint(ruff): remove unused imports
Faolain May 26, 2025
1b9cce1
lint(ruff): format with ruff
Faolain May 26, 2025
3e84819
Merge pull request #47 from dClimate/refactor-cleanup
Faolain May 26, 2025
c55fd4a
ci: replace manual checks with pre-commit gha and local
Faolain May 28, 2025
cad9f9e
Merge pull request #48 from dClimate/refactor-gha
Faolain May 28, 2025
37c8924
ci: remove extraneous installs for gha
Faolain May 28, 2025
0ca6651
deps: upgrade zarr, uv warning that zarr.load deleted data
Faolain May 28, 2025
b7be57a
refactor: add mypy to hamt
Faolain May 28, 2025
c15ab7d
chore: add more types
Faolain May 28, 2025
9560c26
fix: simple encrypted zarr
TheGreatAlgo May 28, 2025
bf89e29
fix: full coverage
TheGreatAlgo May 28, 2025
6739d1c
test: add typing to tests
Faolain May 29, 2025
f44ed7f
ci: integrate mypy into typing with some last additional types for tests
Faolain May 29, 2025
04a4a95
test: missing test after runtime bytes check inmemorycas
Faolain May 29, 2025
381305a
fix: ruff format
Faolain May 29, 2025
7c4f7e6
fix: metadata read cache
TheGreatAlgo May 30, 2025
0167a9e
Merge pull request #49 from dClimate/refactor-mypy
Faolain May 30, 2025
cb707fd
Merge branch 'refactor' into encryption-hamt
TheGreatAlgo May 30, 2025
5ff205a
fix: typing
TheGreatAlgo May 30, 2025
f9dd3b4
fix: update comments
TheGreatAlgo May 30, 2025
e8b4f42
test: added deterministic anchor for full test coverage
Faolain Jun 1, 2025
b4a0290
test: add deterministic anchor for cache vacate
Faolain Jun 1, 2025
78752c8
test: another deterministic anchor attempt
Faolain Jun 1, 2025
d7051a7
docs: update docstring for variable change typo
Faolain Jun 1, 2025
d4ed033
Merge pull request #50 from dClimate/encryption-hamt
Faolain Jun 1, 2025
e083212
fix: deduplicate key list in list_dir
Faolain Jun 1, 2025
2b67361
docs: added docstring for list_dirs function
Faolain Jun 1, 2025
29842a5
docs: update py_hamt/encryption_hamt_store.py
Faolain Jun 1, 2025
85e1f2e
docs: clarify docstrings
Faolain Jun 1, 2025
2a85afc
chore: add sort to ruff to standardize import order
Faolain Jun 1, 2025
345b702
refactor: add concurrency as configurable option for semaphore
Faolain Jun 2, 2025
c2e3bda
deps: remove requests -> replaced with aiohttp
Faolain Jun 2, 2025
01f4c9d
fix: add missing authorization for self instantiated aihotthp session
Faolain Jun 2, 2025
6d3178c
docs: improve docstring clarity for using session w/ kubocas
Faolain Jun 2, 2025
abcedaf
Merge pull request #54 from dClimate/refactor-add-concurrency-option
Faolain Jun 2, 2025
d50f8c0
chore: fix merge conflict
Faolain Jun 2, 2025
c11324f
test: add missing auth test
Faolain Jun 2, 2025
fdfcca6
deps: add pre-commit to dev dependencies so ai can run pre-commit
Faolain Jun 2, 2025
46399b5
refactor: update run-checks to run pre-commit instead
Faolain Jun 2, 2025
650a2a9
Merge pull request #52 from dClimate/fix-list-dir-dedupe
Faolain Jun 2, 2025
b843127
Merge pull request #55 from dClimate/fix-missing-default-auth-session
Faolain Jun 2, 2025
43fa124
Merge pull request #60 from dClimate/build-add-pre-commit
Faolain Jun 2, 2025
52464ea
test: use local daemon then fallback to docker if present otherwise s…
Faolain Jun 2, 2025
8669f9d
Merge branch 'refactor' into chore-add-import-sorting
Faolain Jun 2, 2025
e1d9334
lint(ruff): fix import order
Faolain Jun 2, 2025
40a6061
Merge pull request #53 from dClimate/chore-add-import-sorting
Faolain Jun 2, 2025
17c1ba0
fix: ensure that all tests can run depending on environment present d…
Faolain Jun 3, 2025
a3616ba
fix: ensure that all tests can run depending on environment present d…
Faolain Jun 3, 2025
5c4b043
Merge pull request #61 from dClimate/tests/add-kubo-if-not
Faolain Jun 3, 2025
14385ee
fix: remove duplicated
TheGreatAlgo Jun 3, 2025
7b95f1c
fix: readd parameter guard
Faolain Jun 4, 2025
87e3451
refactor: switch from old type alias to newer type syntax
Faolain Jun 4, 2025
1632dca
refactor: ensure run-checks will check all files not just those that …
Faolain Jun 4, 2025
7498c28
chore: remove commented out test code
Faolain Jun 4, 2025
a300c24
docs: add AGENTS.md file
Faolain Jun 4, 2025
c4ba8ea
docs: add AGENTS.md file
Faolain Jun 4, 2025
48800e1
docs: add more setup information to agents.md
Faolain Jun 4, 2025
16 changes: 15 additions & 1 deletion .github/workflows/pages-main.yaml
@@ -1,35 +1,49 @@
name: Deploy static site generated by pdoc to GitHub Pages

on:
push:
branches: ["main"]

permissions:
contents: read
pages: write
id-token: write

jobs:
build:
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}

permissions:
contents: read
pages: write
id-token: write

runs-on: ubuntu-latest

steps:
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4

- name: Setup Pages
uses: actions/configure-pages@983d7736d9b0ae728b81ab479565c72886d7745b # v5

- name: Install uv
uses: astral-sh/setup-uv@v3
uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6
with:
version: "latest"

- name: Create project environment
run: uv sync

- name: Build with pdoc
run: uv run pdoc py_hamt -o ./_site

- name: Upload artifact
# Automatically uploads an artifact from the './_site' directory by default
uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa # v3

- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e # v4
108 changes: 27 additions & 81 deletions .github/workflows/run-checks.yaml
@@ -1,108 +1,54 @@
name: Run checks
run-name: Triggered on push from ${{ github.actor }} to branch/tag ${{ github.ref_name }}
on: push
# Should be the same as py-hamt/run-checks.sh

jobs:
run_checks:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.12"
- uses: pre-commit/action@v3.0.1
test:
name: Create project environment, run all checks
needs:
- validate
runs-on: ubuntu-latest

strategy:
fail-fast: true
matrix:
python-version: ["3.12"]
steps:
- name: Checkout repo
- name: Check out repository
uses: actions/checkout@v4
- name: Set up python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Install uv
uses: astral-sh/setup-uv@v3
uses: astral-sh/setup-uv@6b9c6063abd6010835644d4c2e1bef4cf5cd0fca # v6
with:
version: "latest"

- name: Create project environment
run: uv sync

- name: Install IPFS
uses: ibnesayeed/setup-ipfs@master
uses: oduwsdl/setup-ipfs@e92fedca9f61ab9184cb74940254859f4d7af4d9 # v0.6.3
with:
ipfs_version: "0.34.1"
ipfs_version: "0.35.0"
run_daemon: true
id: ipfs_setup

- name: Install and configure Nginx
run: |
# Install Nginx
sudo apt-get update
sudo apt-get install -y nginx

# Create Nginx config for reverse proxy with auth
cat <<EOF | sudo tee /etc/nginx/sites-available/ipfs
server {
listen 5002;
server_name localhost;

# Default deny unless authenticated
set \$auth_valid 0;

location /api/v0/ {
# Enforce X-API-Key for API key auth
if (\$http_x_api_key = "test") {
set \$auth_valid 1;
}

# Check Bearer token
if (\$http_authorization = "Bearer test") {
set \$auth_valid 1;
}

# Check Basic Auth (test:test = dGVzdDp0ZXN0)
if (\$http_authorization = "Basic dGVzdDp0ZXN0") {
set \$auth_valid 1;
}

# Deny if no valid auth method
if (\$auth_valid = 0) {
return 401 "Unauthorized: Invalid or missing authentication";
}

# Proxy to IPFS RPC API
proxy_pass http://127.0.0.1:5001;
proxy_set_header Host \$host;
proxy_set_header X-Real-IP \$remote_addr;
proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto \$scheme;
}
}
EOF

# Enable the site and remove default
sudo ln -s /etc/nginx/sites-available/ipfs /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default

# Test Nginx config
sudo nginx -t

- name: Start Nginx and restart IPFS daemon
run: |
# Start Nginx
sudo systemctl start nginx

# Restart IPFS daemon to ensure it’s running
ipfs shutdown
ipfs daemon &

# Wait for IPFS and Nginx to be ready
sleep 5

- name: Run pytest with coverage
run: uv run pytest --cov=py_hamt tests/ --cov-report=xml
run: uv run pytest --ipfs --cov=py_hamt tests/ --cov-report=xml

- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v5
uses: codecov/codecov-action@18283e04ce6e62d37312384ff67231eb8fd56d24 # v5
with:
token: ${{ secrets.CODECOV_TOKEN }}

- name: Check coverage
run: uv run coverage report --fail-under=100 --show-missing

- name: Check linting with ruff
run: uv run ruff check

- name: Check formatting with ruff
run: uv run ruff format --check
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.DS_Store
pyrightconfig.json

*.prof
32 changes: 32 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,32 @@
repos:
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v4.0.0
hooks:
- id: conventional-pre-commit
stages: [commit-msg]
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-ast
- id: check-case-conflict
- id: check-merge-conflict
- id: check-toml
- id: debug-statements
- id: end-of-file-fixer
- id: mixed-line-ending
- id: trailing-whitespace
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.11.11
hooks:
- id: ruff-check
- id: ruff-format
- repo: https://github.com/ariebovenberg/slotscheck
rev: v0.17.1
hooks:
- id: slotscheck
name: slotscheck
entry: bash -c 'env PYTHONPATH=src slotscheck'
- repo: https://github.com/pre-commit/mirrors-mypy
rev: "v1.15.0"
hooks:
- id: mypy
127 changes: 127 additions & 0 deletions AGENTS.md
@@ -0,0 +1,127 @@
# Project Agents.md Guide for py-hamt

This AGENTS.md file provides comprehensive guidance for OpenAI Codex and other AI agents working with this codebase.

## Project Description

This library is a Python implementation of a HAMT, inspired by [rvagg's IAMap project written in JavaScript](https://github.com/rvagg/iamap).

py-hamt provides efficient storage and retrieval of large sets of key-value mappings in a content-addressed storage system. The main target is IPFS, and the data model used is IPLD.

dClimate primarily created this for storing large [zarrs](https://zarr.dev/) on IPFS. To see this in action, see our [data ETLs](https://github.com/dClimate/etl-scripts).

## Project Architecture

A user instantiates a ContentAddressedStore, such as KuboCAS, which is then used to instantiate a HAMT. This HAMT can then be passed to a ZarrHAMTStore to read and write zarr files. Either a locally running IPFS daemon or a remote one can be used; a remote daemon is accessed via its gateway (reads) and RPC endpoint (writes), using the IPFS default addresses. For tests, a local Docker container is spun up if Docker is available.
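
The layering described above can be illustrated with a toy sketch. It deliberately does not use py-hamt itself: `ToyCAS` and `ToyMapping` are hypothetical stand-ins (not part of the library) for a content-addressed store and a key-value structure built on top of it, and the real classes have richer interfaces.

```python
from __future__ import annotations

import asyncio
import hashlib
import json


class ToyCAS:
    """Hypothetical in-memory content-addressed store (not py-hamt's API)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    async def save(self, data: bytes) -> str:
        # The ID is derived from the content, so identical bytes share one ID.
        block_id = hashlib.sha256(data).hexdigest()
        self._blocks[block_id] = data
        return block_id

    async def load(self, block_id: str) -> bytes:
        return self._blocks[block_id]


class ToyMapping:
    """Hypothetical key-value layer that persists its state through a CAS."""

    def __init__(self, cas: ToyCAS) -> None:
        self.cas = cas
        self.root_id: str | None = None

    async def set(self, key: str, value: str) -> None:
        # Load the current state, update it, and store a new immutable root.
        state = await self._state()
        state[key] = value
        encoded = json.dumps(state, sort_keys=True).encode()
        self.root_id = await self.cas.save(encoded)

    async def get(self, key: str) -> str:
        return (await self._state())[key]

    async def _state(self) -> dict[str, str]:
        if self.root_id is None:
            return {}
        return json.loads(await self.cas.load(self.root_id))


async def demo() -> tuple[str, str | None]:
    mapping = ToyMapping(ToyCAS())
    await mapping.set("temperature/zarr.json", "{}")
    return await mapping.get("temperature/zarr.json"), mapping.root_id
```

Running `asyncio.run(demo())` returns the stored value together with the current root ID. In this toy, every `set` yields a new root ID, which is the general pattern for structures stored in content-addressed systems; how py-hamt organises its nodes internally is not shown here.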

## Project Structure for OpenAI Codex Navigation

- `/py_hamt`: Source code of the py-hamt library
- `encryption_hamt_store.py` - Example using total encryption
- `hamt.py` - where the HAMT data structure is constructed and accessed
- `store.py` - where the various stores that write the data live; KuboCAS is the one primarily used
- `zarr_hamt_store.py` - the ZarrHAMTStore class, which reads and writes zarrs directly by layering the HAMT data structure over a store, usually a content-addressed store such as KuboCAS from `store.py`
- `/tests`: Test files that should be maintained and extended where possible.

## Coding Conventions

### General Conventions for AGENTS.md Implementation

- Use Python 3.12 for all new code generated by OpenAI Codex
- The AI Agent should follow the existing code style in each file
- Agents.md requires meaningful variable and function names
- The AI Agent should add comments for complex logic as guided by Agents.md

### Python Guidelines
- Use functional, easy-to-understand functions wherever possible.
- All functions need to be fully typed with mypy.
- All functions require tests to be written and coverage must be full.
- Example usage can be found in the `/tests` folder

### Development Setup

`py-hamt` uses uv for package management. If uv is not already present (check by running `uv`), install it via

`curl -LsSf https://astral.sh/uv/install.sh | sh`

Otherwise, if uv is already present, run

`uv sync` to install all dependencies

and then

`source .venv/bin/activate` to activate the venv created by uv.

Lastly, as part of setup, run

```
pre-commit install
```

to install the pre-commit hooks.

Docker is also used by the integration tests that exercise IPFS. Try to have Docker installed so that all tests can be run with `pytest --ipfs`.

## Testing Requirements for the Agent

All tests should be created within the `/tests` directory.

The agent should run tests with the following commands:

```bash
# Run all tests
pytest --ipfs

# Run specific test file
pytest tests/<insert_file_here>

# Run tests with coverage
pytest --ipfs --cov

# If IPFS is not present use
pytest --cov
```

Test coverage should be kept as close to 100% as possible.
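
The choice between the commands above can be sketched as a small helper. This is only an illustration: `pick_pytest_command` and `ipfs_probably_available` are hypothetical names, and the availability check rests on the assumption that finding an `ipfs` or `docker` binary on `PATH` is a good enough signal.

```python
import shutil


def pick_pytest_command(ipfs_available: bool) -> list[str]:
    """Return the documented pytest invocation for the current environment."""
    if ipfs_available:
        return ["pytest", "--ipfs", "--cov"]
    return ["pytest", "--cov"]


def ipfs_probably_available() -> bool:
    # Assumption: an `ipfs` or `docker` binary on PATH means the IPFS tests
    # can run. This does not verify that a daemon is actually up.
    return any(shutil.which(tool) is not None for tool in ("ipfs", "docker"))
```

Calling `pick_pytest_command(ipfs_probably_available())` gives the argument list to run. Because the probe is only a heuristic, a run with `--ipfs` can still fail if no daemon is reachable.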

## Pull Request Guidelines for the Agent

When the Agent helps create a PR, please ensure it:

1. Includes a clear description of the changes as guided by AGENTS.md
2. References any related issues that the Agent is addressing
3. Ensures all tests pass for code generated by the Agent
4. Keeps PRs focused on a single concern as specified in AGENTS.md

## Programmatic Checks for the Agent

Before submitting changes generated by the Agent, run:

```bash
bash run-checks.sh
```

**Note:** This assumes IPFS is running, since `run-checks.sh` runs with the `--ipfs` flag. If Kubo is not running as a local daemon, or Docker was unable to start one, simply run

```
pytest --cov
```

and

```
uv run pre-commit run --all-files --show-diff-on-failure
```

instead.


If ruff reports an error that it can fix automatically, you can normally run

```
uv run ruff check --fix
```

to fix it.

All checks must pass before the Agent generated code can be merged. AGENTS.md helps ensure the Agent follows these requirements.
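
The fallback sequence described in this section (when IPFS is not available) could be scripted roughly as follows. This is a sketch, not a replacement for `run-checks.sh`: `first_failure` and `FALLBACK_CHECKS` are hypothetical names, and the command lists simply repeat the commands quoted above.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence


def first_failure(commands: Sequence[Sequence[str]]) -> Sequence[str] | None:
    """Run commands in order; return the first that exits non-zero, else None."""
    for command in commands:
        if subprocess.run(command).returncode != 0:
            return command
    return None


# The fallback checks quoted in this section, for when IPFS is unavailable.
FALLBACK_CHECKS = [
    ["pytest", "--cov"],
    ["uv", "run", "pre-commit", "run", "--all-files", "--show-diff-on-failure"],
]
```

`first_failure(FALLBACK_CHECKS)` returns `None` when every check passes. Otherwise it returns the failing command, and later commands are not run.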