Skip to content

add trustless IPFS gateway, use patched rclone, remove bitswap#646

Merged
parkan merged 12 commits into
mainfrom
feat/trustless-gateway
Mar 31, 2026
Merged

add trustless IPFS gateway, use patched rclone, remove bitswap#646
parkan merged 12 commits into
mainfrom
feat/trustless-gateway

Conversation

@parkan

@parkan parkan commented Mar 17, 2026

Copy link
Copy Markdown
Collaborator

remove the dead bitswap protocol and add an /ipfs/ endpoint implementing the trustless gateway spec via boxo/gateway

StorageBlockStore pools rclone handlers per storage and holds open a streaming file reader to serve sequential block reads efficiently -- DAG nodes return from the DB without storage I/O, and leaf blocks from the same file share a single held-open rclone stream

supports dag-scope=all/entity/block, path resolution, entity-bytes, format=car and format=raw. stale source files detected at read time are mapped to HTTP 409 via boxo's error status mechanism

@parkan parkan force-pushed the feat/trustless-gateway branch from fe93173 to adfce2b Compare March 17, 2026 09:57
@parkan parkan marked this pull request as draft March 17, 2026 10:07
@parkan parkan changed the title add trustless IPFS gateway, remove bitswap add trustless IPFS gateway, use patched rclone, remove bitswap Mar 17, 2026
@parkan parkan marked this pull request as ready for review March 17, 2026 15:33
@parkan parkan force-pushed the feat/trustless-gateway branch 3 times, most recently from 51b8666 to a8d4dd4 Compare March 17, 2026 16:01
parkan added 7 commits March 26, 2026 16:56
remove the dead bitswap protocol and add an /ipfs/ endpoint implementing
the trustless gateway spec via boxo/gateway.

StorageBlockStore pools rclone handlers per storage and holds open a
streaming file reader to serve sequential block reads efficiently --
DAG nodes return from the DB without storage I/O, and leaf blocks from
the same file share a single held-open rclone stream.

supports dag-scope=all/entity/block, path resolution, entity-bytes,
format=car and format=raw. stale source files detected at read time
are mapped to HTTP 409 via boxo's error status mechanism.
replace github.com/rclone/rclone with parkan/rclone@singularity-v1.68.0
which adds sync.Map + singleflight caching for /metadata/ API calls in
the internetarchive backend. reduces per-item metadata hits from N (one
per file) to 1.

temporary measure until metadata caching is upstreamed to rclone.
- remove uptobox backend (dropped upstream)
- fix fs.Duration type change in config overrides
- handle []string option defaults in ToCLIFlag type switch
@parkan parkan force-pushed the feat/trustless-gateway branch from ca2e329 to c55bddf Compare March 27, 2026 10:04
@parkan parkan requested a review from anjor March 27, 2026 10:27
@anjor

anjor commented Mar 27, 2026

Copy link
Copy Markdown
Collaborator

I found 3 issues worth addressing in this PR:

  1. handler/wallet/export_keys.go
    migrateWalletAssignments skips unresolved legacy wallet_assignments rows, but still drops the table afterward. That makes the first export-keys run destructive for malformed keys, failed exports, or any actor that never gets a corresponding wallets row, because the preparation-wallet link is lost permanently. I think this should either fail when unresolved rows remain, or preserve the legacy table until every row has been migrated successfully.

  2. store/storage_blockstore.go
    The cached s.active reader is shared across requests, but it was opened with the ctx from whichever request populated it. readerWithRetry stores that context and checks it on every read and reopen, so a later request can inherit a canceled or timed-out context from an earlier one and fail spuriously. Reusing the stream only within a single request, or decoupling the cached reader from request-scoped context, would avoid that.

  3. store/storage_blockstore.go
    The mutex in readFileBlock is held across handler.Read, object validation, and the actual block read. That means one slow backend read serializes every /ipfs/ request through this process-wide blockstore. Under concurrent gateway traffic this looks like a single-reader bottleneck. I think the critical section should be narrowed to cache bookkeeping only, or moved to per-request state instead of a global lock.

@parkan

parkan commented Mar 30, 2026

Copy link
Copy Markdown
Collaborator Author

I think (1) landed on the wrong PR (concerning export_keys.go), maybe some stale state on your end? should have been addressed in #650

concerns (2) and (3) are valid in specific concurrent traffic cases, mostly materializing as a throughput bottleneck across serialized reads (3) and (much less likely to happen) same-file reads with different expectations in (2); acknowledge both but don't think they strictly block the baseline functionality here

@parkan

parkan commented Mar 30, 2026

Copy link
Copy Markdown
Collaborator Author

ok actually yes it's possible to inherit a cancelled context which is quite bad, working on a fix

@parkan

parkan commented Mar 30, 2026

Copy link
Copy Markdown
Collaborator Author

ok, context ownership has been shifted to the blockstore from the request; this should decouple cross-request concerns

@parkan

parkan commented Mar 30, 2026

Copy link
Copy Markdown
Collaborator Author

no idea why the CI got stuck on this, trying to manually run

@anjor

anjor commented Mar 31, 2026

Copy link
Copy Markdown
Collaborator

Code review

Found 3 issues worth flagging (none are blockers, but all affect the gateway path):

  1. Potential nil panic on carBlock.File in readFileBlockCarBlock.File is a nullable pointer (commit c23b61c made it nullable for DAG-node blocks that have no source file). If a block has RawBlock == nil AND FileID == NULL (e.g., source file deleted, FK set to NULL), readFileBlock dereferences carBlock.File unconditionally and panics.

storage := *carBlock.File.Attachment.Storage
file := *carBlock.File
blockLen := int64(carBlock.BlockLength())

  1. Mutex held across slow I/O in readFileBlocks.mu.Lock() is held from line 71 through handler.Read() (line 99) and io.ReadFull (via readFromActive, line 119). Under concurrent gateway traffic, all /ipfs/ requests serialize behind the slowest backend read. Acknowledged in prior discussion as non-blocking for baseline, but worth tracking.

func (s *StorageBlockStore) readFileBlock(ctx context.Context, carBlock model.CarBlock, c cid.Cid) (blocks.Block, error) {
s.mu.Lock()
defer s.mu.Unlock()
storage := *carBlock.File.Attachment.Storage
file := *carBlock.File
blockLen := int64(carBlock.BlockLength())
handler, err := s.getHandler(ctx, storage)
if err != nil {
return nil, errors.WithStack(err)
}
// try to read from held-open stream
if s.active != nil && s.active.fileID == file.ID && s.active.offset == carBlock.FileOffset {
data, err := s.readFromActive(blockLen)
if err != nil {
s.closeActive()
return nil, err
}
return blocks.NewBlockWithCid(data, c)
}
// different file or non-sequential offset -- close old, open new
s.closeActive()
// use a detached context for the cached reader so it outlives the
// request that created it. closeActive() cancels it on cleanup.
readerCtx, readerCancel := context.WithCancel(context.Background())
reader, obj, err := handler.Read(readerCtx, file.Path, carBlock.FileOffset, file.Size-carBlock.FileOffset)
if err != nil {
readerCancel()
return nil, errors.WithStack(err)
}
same, explanation := storagesystem.IsSameEntry(ctx, file, obj)
if !same {
reader.Close()
readerCancel()
return nil, errors.Wrap(ErrFileHasChanged, explanation)
}
s.active = &fileReader{
fileID: file.ID,
reader: reader,
offset: carBlock.FileOffset,
cancel: readerCancel,
}
data, err := s.readFromActive(blockLen)
if err != nil {
s.closeActive()
return nil, err
}
return blocks.NewBlockWithCid(data, c)
}

  1. Pooled RCloneHandler instances never closed on shutdownStorageBlockStore.Close() only cleans up the active file reader. The handlers map holding RClone filesystem instances (which can hold persistent S3/HTTP connections) has no shutdown path. The HTTP server also doesn't call Close() on the blockstore during Shutdown().

func (s *StorageBlockStore) Close() {
s.mu.Lock()
defer s.mu.Unlock()
s.closeActive()
}

None of these block the baseline gateway functionality, but (1) is a crash risk worth guarding against.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@parkan parkan force-pushed the feat/trustless-gateway branch from 0cb65cd to a9a1c74 Compare March 31, 2026 09:19
@parkan parkan force-pushed the feat/trustless-gateway branch from a9a1c74 to 2a404a6 Compare March 31, 2026 09:23
@parkan

parkan commented Mar 31, 2026

Copy link
Copy Markdown
Collaborator Author

nil guard and blockstore close added

mutex discussed above, accept for POC

re: closing rclone Fs, Shutdown() is an implementation detail and not consistently exposed by all backends, no Close() is exposed on any of them and they should be considered stateless even if in reality there's connection multiplexing etc -- can consider special casing stateful backends like SFTP but that's out of scope here

@anjor anjor left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All three review items addressed — nil guard, blockstore close, and mutex accepted for POC. LGTM.

@parkan parkan merged commit 8257b36 into main Mar 31, 2026
9 checks passed
@parkan parkan deleted the feat/trustless-gateway branch March 31, 2026 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants