1 change: 1 addition & 0 deletions go.mod
@@ -3,6 +3,7 @@ module github.com/docker/oci
go 1.25.0

require (
	github.com/klauspost/compress v1.18.6
	github.com/opencontainers/go-digest v1.0.0
	github.com/opencontainers/image-spec v1.1.1
	github.com/rogpeppe/go-internal v1.14.1
2 changes: 2 additions & 0 deletions go.sum
@@ -1,5 +1,7 @@
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/klauspost/compress v1.18.6 h1:2jupLlAwFm95+YDR+NwD2MEfFO9d4z4Prjl1XXDjuao=
github.com/klauspost/compress v1.18.6/go.mod h1:cwPg85FWrGar70rWktvGQj8/hthj3wpl0PGDogxkrSQ=
github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
99 changes: 99 additions & 0 deletions ocifs/README.md
@@ -0,0 +1,99 @@
# ocifs

Package `ocifs` provides an [`io/fs.FS`](https://pkg.go.dev/io/fs#FS) backed by an OCI image. It downloads each layer once, builds compressed-stream and tar indices, and then serves any file in the merged image as a random-access read — without ever unpacking layers to disk.

## How it works

An OCI image is a stack of compressed tar layers. `ocifs` composes four sub-packages into a single overlay filesystem:

```
Registry
└─ blobra — range-request io.ReaderAt over each compressed layer blob
   └─ gzipr / zstdr — checkpoint index → random-access decompressed io.ReaderAt
      └─ tarfs — tar entry index → io/fs.FS over the decompressed stream
         └─ ocifs (overlay) — merges all layers with OCI whiteout semantics
```

On the first call to `New`, every layer blob is streamed once: the decompressor builds its checkpoint index while the tar scanner builds its entry index. Subsequent `Open` / `ReadAt` calls fetch only the compressed bytes covering the requested file — no full-layer downloads.

## Quick start

```go
import (
	"context"
	"fmt"
	"io/fs"

	"github.com/docker/oci"
	"github.com/docker/oci/ocifs"
)

reg := oci.New(/* ... */)
ociFS, err := ocifs.New(ctx, reg, "library/alpine", "latest")
if err != nil {
	return err
}
defer ociFS.Close()

// Walk the entire filesystem (root is always ".", never "/").
fs.WalkDir(ociFS, ".", func(path string, d fs.DirEntry, err error) error {
	fmt.Println(path)
	return err
})

// Read a single file.
data, err := ociFS.ReadFile("etc/os-release")
```

## Persisting the index

Scanning large layers on every startup is expensive. Save the index after the first `New` call and reuse it with `NewWithIndex`:

```go
// First run: build and persist.
ociFS, _ := ocifs.New(ctx, reg, repo, ref)
f, _ := os.Create("index.json")
ociFS.ImageIndex().Encode(f)
f.Close()

// Subsequent runs: restore from disk.
f, _ = os.Open("index.json")
idx, _ := ocifs.DecodeImageIndex(f)
ociFS, err := ocifs.NewWithIndex(ctx, reg, repo, ref, idx)
if errors.Is(err, ocifs.ErrIndexStale) {
	// Image was re-pushed; fall back to full scan.
	ociFS, err = ocifs.New(ctx, reg, repo, ref)
}
```

`NewWithIndex` re-fetches the manifest to verify that layer digests still match the persisted index before accepting it. If the tag was re-pushed or a layer changed, `ErrIndexStale` is returned.

## Key types

| Type | Purpose |
|------|---------|
| `FS` | The `io/fs.FS` implementation. Also implements `ReadDirFS`, `StatFS`, `ReadFileFS`, and `Lstat`. |
| `ImageIndex` | Serializable bundle of per-layer checkpoint and tar indices. |
| `LayerIndex` | Per-layer index: digest, media type, compressed size, decompressor index, tar entry list. |

## Overlay semantics

Layers are merged bottom-to-top following the [OCI image layer spec](https://github.com/opencontainers/image-spec/blob/main/layer.md):

- **Whiteout** (`.wh.<name>`): deletes `<name>` from lower layers.
- **Opaque whiteout** (`.wh..wh..opq`): hides the entire directory contents from lower layers, keeping only what the current layer adds.
- **Hardlinks**: resolved to the target entry's content at open time.
- **Symlinks**: followed up to 255 hops; circular chains return `ErrSymlinkLoop`.

Whiteout markers are excluded from `ReadDir` results. Whiteout targets that are `"."`, `".."`, or contain `/` are silently ignored.

## Supported layer types

| Media type | Decompressor |
|-----------|-------------|
| `application/vnd.oci.image.layer.v1.tar+gzip` | `gzipr` |
| `application/vnd.docker.image.rootfs.diff.tar.gzip` | `gzipr` |
| `application/vnd.oci.image.layer.v1.tar+zstd` | `zstdr` |

## Concurrency

`FS` is safe for concurrent use after construction. Each `Open` call may issue range requests to the registry; the context passed to `New` governs all such requests. Call `Close` to release decompressor pools when the `FS` is no longer needed.
50 changes: 50 additions & 0 deletions ocifs/blobra/README.md
@@ -0,0 +1,50 @@
# blobra

Package `blobra` adapts an OCI registry's range-request API to [`io.ReaderAt`](https://pkg.go.dev/io#ReaderAt), allowing decompressors (`gzipr`, `zstdr`) to fetch arbitrary byte ranges of a compressed layer blob without downloading the entire blob.

## How it works

Each `ReadAt(p, off)` call translates to a single `GetBlobRange` registry request for the half-open byte range `[off, off+len(p))`. The response body is read in full into `p`. No buffering or caching is performed — callers are responsible for requesting only the ranges they need.

## Usage

```go
import (
	"context"

	"github.com/docker/oci"
	"github.com/docker/oci/ocifs/blobra"
)

// desc is an oci.Descriptor with Digest, MediaType, and Size set.
ra := blobra.New(ctx, registry, "library/alpine", desc)

// ra now satisfies io.ReaderAt over the compressed blob.
buf := make([]byte, 512)
n, err := ra.ReadAt(buf, 1024) // fetches bytes [1024, 1536) from the registry
```

`blobra.New` performs no I/O. Registry requests are only issued by `ReadAt`.

## Key types

### `BlobRanger`

The narrow interface `blobra` requires from the registry. Any `oci.Interface` implementation satisfies it:

```go
type BlobRanger interface {
	GetBlobRange(ctx context.Context, repo string, digest oci.Digest, offset0, offset1 int64) (oci.BlobReader, error)
}
```

Using this interface rather than the full `oci.Interface` makes `blobra.Reader` easy to test with a small fake.

### `Reader`

`*Reader` implements `io.ReaderAt`. It also exposes `Size() int64` (from the descriptor) and `Descriptor() oci.Descriptor`.

## `io.ReaderAt` contract notes

- `ReadAt(p, off)` where `off >= desc.Size` returns `(0, io.EOF)` without issuing a request.
- A read that is clamped by end-of-blob (i.e. `off + len(p) > desc.Size`) returns `(n, io.EOF)` where `n < len(p)`. This signals a legitimate short read at end-of-blob.
- If the registry returns fewer bytes than the requested range, the error is surfaced as `io.ErrUnexpectedEOF` (never `io.EOF`), distinguishing a protocol violation from a legitimate end-of-blob.
106 changes: 106 additions & 0 deletions ocifs/blobra/blobra.go
@@ -0,0 +1,106 @@
// Package blobra adapts a range-capable OCI blob source to [io.ReaderAt].
//
// The package depends on [oci.Digest] and [oci.BlobReader] from the parent
// oci package, but intentionally does not depend on the sealed
// [oci.Interface] type. Instead it consumes the narrow [BlobRanger]
// interface, which any [oci.Interface] implementation satisfies and which
// test fakes can implement directly.
package blobra

import (
	"context"
	"fmt"
	"io"

	"github.com/docker/oci"
)

// Compile-time interface check.
var _ io.ReaderAt = (*Reader)(nil)

// BlobRanger is the subset of [oci.Interface] that [Reader] requires.
// Any [oci.Interface] implementation satisfies it.
//
// The range is half-open: [offset0, offset1). The response body contains
// exactly offset1-offset0 bytes when both endpoints are within the blob.
type BlobRanger interface {
	GetBlobRange(ctx context.Context, repo string, digest oci.Digest, offset0, offset1 int64) (oci.BlobReader, error)
}

// Reader serves [io.ReaderAt] calls against a single OCI blob using
// range requests against an underlying [BlobRanger].
type Reader struct {
	ctx    context.Context
	ranger BlobRanger
	repo   string
	desc   oci.Descriptor
}

// New returns a Reader that serves ReadAt calls via range requests on
// ranger. desc.Size is the blob size in bytes; it is used to detect
// end-of-blob without issuing a probe request. No I/O is performed by
// this constructor.
func New(ctx context.Context, ranger BlobRanger, repo string, desc oci.Descriptor) *Reader {
	return &Reader{
		ctx:    ctx,
		ranger: ranger,
		repo:   repo,
		desc:   desc,
	}
}

// Descriptor returns the descriptor that the Reader was constructed with.
func (r *Reader) Descriptor() oci.Descriptor {
	return r.desc
}

// Size returns the size of the underlying blob in bytes.
func (r *Reader) Size() int64 {
	return r.desc.Size
}

// ReadAt implements [io.ReaderAt]. See the package documentation for the
// full contract; in particular, a server-side truncation of a clamped
// range request surfaces as [io.ErrUnexpectedEOF] rather than [io.EOF]
// so that downstream consumers (gzipr, zstdr) cannot mistake a protocol
// violation for legitimate end-of-blob.
func (r *Reader) ReadAt(p []byte, off int64) (int, error) {
	if len(p) == 0 {
		return 0, nil
	}
	if off < 0 {
		return 0, fmt.Errorf("blobra: negative offset %d", off)
	}
	if off >= r.desc.Size {
		return 0, io.EOF
	}

	n := int64(len(p))
	if remaining := r.desc.Size - off; n > remaining {
		n = remaining
	}

	br, err := r.ranger.GetBlobRange(r.ctx, r.repo, r.desc.Digest, off, off+n)
	if err != nil {
		return 0, err
	}
	defer br.Close()

	if _, err := io.ReadFull(br, p[:n]); err != nil {
		// io.ReadFull maps a fully empty stream to io.EOF and a partial
		// stream to io.ErrUnexpectedEOF. Both indicate the registry
		// returned fewer bytes than the clamped range demanded — a
		// protocol violation. Surface it uniformly as
		// io.ErrUnexpectedEOF; never propagate io.EOF here, which is
		// reserved for off >= desc.Size.
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return 0, io.ErrUnexpectedEOF
		}
		return 0, err
	}

	if int64(len(p)) > n {
		return int(n), io.EOF
	}
	return int(n), nil
}