orcadev tool (benchmarking, dev setup, etc.) by plombardi89 · Pull Request #176 · Azure/unbounded

plombardi89 · 2026-05-21T16:45:18Z

Why this PR exists

Other developers are starting to build components that talk to Orca. They need a simple way to get Orca running in a Kubernetes cluster (kind on their laptop or an existing AKS / EKS / k3d cluster), put some test data into it, and run benchmarks or canned scenarios against it. Before this PR the answer was "read three READMEs, copy an env file, edit it, run several make targets across two directories, and hope the kind NodePorts work." That is not a coherent entrypoint.

What this PR adds

Three things, in roughly increasing order of footprint:

1. A developer tool: bin/orcadev (in hack/cmd/orcadev/)

One command-line tool for everything a developer wants to do against a running Orca:

Command	What it does
`orcadev upload`	Put a file (or N random blobs) into the origin.
`orcadev list`	Show what's in the origin.
`orcadev delete`	Remove things from the origin.
`orcadev roundtrip`	Upload a file, fetch it back through Orca, check the bytes match.
`orcadev cache list / inspect / clear`	See or remove what's in Orca's cache.
`orcadev bench`	Run a parallel-GET benchmark. Outputs human text and JSON for comparing runs.
`orcadev scenario`	Run a canned end-to-end scenario (`cold-warm`, `range-stress`, `empty-object`, `etag-change`).

The tool figures out how to reach the cluster automatically: it opens kubectl port-forward connections to the three services it needs (the Orca edge, Azurite for origin storage, LocalStack for the cache) and tears them down when the command finishes. The same commands work on kind and on any other cluster you can reach with kubectl.

2. A single install script: ./hack/orca/setup-orca.sh

One script with a small number of flags. It builds the Orca manifests on the fly, applies them to whatever kubectl context you point it at, waits for everything to be ready, and prints the next-step commands. The default install runs:

Orca itself (3 replicas).
Azurite as the Azure Blob Storage origin.
LocalStack as the S3 cache store.
Zero real cloud credentials required.

Two helper scripts handle the kind cluster lifecycle: kind-up.sh (create a kind cluster suitable for Orca) and kind-down.sh (delete it). The root Makefile exposes make orca-kind-up, make orca-kind-down, make orca-install, and make orca-reset as muscle-memory wrappers.

3. One quickstart README: hack/orca/README.md

A single document that walks a developer from "I have nothing" to "I am running benchmarks." It replaces the previous mix of quickstart.md, dev-harness.md, and a .env.example file (all deleted in this PR).

What reviewers should look at first

In order of importance:

hack/orca/README.md. This is what new developers will read. If it doesn't make sense to a colleague who has never touched Orca, the PR has failed at its primary goal. Read it top to bottom and try the commands.
hack/orca/setup-orca.sh. Especially the uninstall path (label-based delete + the --delete-namespace opt-in) and the safety rails on --build / --kind-load. This script is the one thing developers will run on their clusters; mistakes here can delete unrelated resources.
hack/cmd/orcadev/orcadev/portforward.go and port_forward_coverage_test.go. The port-forward logic is what makes bin/orcadev work the same on kind and non-kind clusters. The coverage test asserts that every subcommand that talks to a storage service opens the right forwards first; please confirm this contract feels right.
deploy/orca/04-deployment.yaml.tmpl and deploy/orca/dev/*.yaml.tmpl. Two template changes worth a careful read: the new RequireAntiAffinity knob in the Orca Deployment (kind keeps required, others get preferred), and the switch from NodePort to ClusterIP for Azurite and LocalStack.
Makefile changes to the ##@ Orca block. New / renamed targets: orca-install, orca-kind-up, orca-kind-down, orca-reset. Old aliases orca-up / orca-down are kept so existing muscle memory keeps working.
designs/orca/dev-workflow-remediation-plan.md. The plan document captured during the second review round; useful context for "why is this structured this way."

Files you can probably skim:

All *_test.go additions: structural guards, no production behavior.
hack/cmd/orcadev/orcadev/*.go outside portforward.go / preset.go: the subcommand bodies are mostly the original orcadev work that was already on this branch before the review.

How to try it

# Build the tools.
make orca-build orcadev

# Bring up kind + install Orca + Azurite + LocalStack.
make orca-kind-up

# Verify.
kubectl --context kind-orca-dev -n unbounded-kube get pods

# Seed and exercise.
bin/orcadev upload --generate --count 5 --size 10MiB
bin/orcadev roundtrip --file /tmp/some-file.bin
bin/orcadev scenario cold-warm
bin/orcadev bench --key synth1 --duration 30s

# Tear down.
make orca-kind-down

orcadev is a multi-purpose CLI for working with a running orca dev cluster. It replaces the older orcaseed seed-only tool: every orcaseed capability (synthetic-blob generation, single-file upload, origin listing, bulk delete) is reachable here as a subcommand, plus a broader debugging surface.

Subcommands
-----------

  upload      Seed the origin (file or N synthetic blobs). Supports both awss3 (LocalStack) and azureblob (Azurite / real Azure) drivers - orcaseed only spoke azureblob.
  list        Enumerate origin objects.
  delete      Bulk delete origin objects (interactive by default).
  roundtrip   Upload data, fetch through orca's edge, and compare a streaming SHA-256 of the source bytes against a streaming SHA-256 of the response. Headline correctness check. Supports --range, --repeat, --cleanup, and --dump-diff (side-by-side hex dump of the first differing bytes on mismatch).
  cache       Inspect or clear the cachestore.
    list      Enumerate cachestore chunks (raw paths).
    inspect   Given (bucket, key), compute the canonical chunk paths via internal/orca/chunk and HEAD each in the cachestore; print per-chunk presence + size.
    clear     Bulk delete chunks by prefix or by object.
  bench       Parallel GET throughput / latency benchmark. Emits human-friendly text on stdout plus optional JSON (--output json or --json-out PATH) with a log-spaced latency histogram (configurable bounds and bucket count; default 50 buckets across 100us..10s). JSON schema is versioned (schema_version=1) for cross-run comparison.
  scenario    Canned end-to-end scenarios: cold-warm, range-stress, empty-object, etag-change. Same JSON envelope as bench so a CI pipeline can chain them.

All subcommands accept --config <orca.yaml> to populate origin and cachestore coordinates from the same YAML the orca daemon consumes. Per-flag overrides win over the YAML value.

Dev-harness changes
-------------------

LocalStack now exposes a NodePort (default 30200) mirroring the existing Azurite NodePort 30100, so the host-side tool can talk to the cachestore + awss3 origin without a kubectl port-forward. The kind extraPortMappings are extended accordingly. A new render flag LocalstackNodePort is passed through hack/orca/Makefile's render-dev target.

Makefile targets renamed
------------------------

  seed-azure    -> dev-azure
  seed-generate -> data-generate
  seed-upload   -> data-upload
  seed-list     -> data-list
  seed-delete   -> data-delete

Plus six new orcadev-specific targets: roundtrip, cache-list, cache-inspect, cache-clear, bench, scenario. The SEED_ARGS variable is renamed ARGS for action-oriented clarity.

The orcaseed source tree under hack/cmd/orcaseed/ is deleted in this commit; orcaseed's TestParseSize coverage moves to orcadev's size_test.go.

Verification
------------

  go build ./...                                              clean
  go test -count=1 ./hack/cmd/orcadev/...                     clean
  golangci-lint run -c .golangci.yaml ./hack/cmd/orcadev/...  clean

Smoke-tested against the existing dev cluster: upload + list + delete cycle against Azurite works end-to-end; bench correctly reports connection refused when LocalStack is not yet on the new NodePort (existing dev clusters need a 'make -C hack/orca deploy' to pick up the NodePort change).
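
For readers unfamiliar with log-spaced histograms, the bench default (50 buckets across 100us..10s) can be illustrated with a minimal Go sketch. The names logSpacedBounds and bucketIndex are hypothetical and are not taken from orcadev; the sketch assumes geometrically spaced edges and assumes out-of-range samples are clamped into the first or last bucket.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// logSpacedBounds returns n+1 bucket edges spaced geometrically from lo to hi.
// Hypothetical helper: orcadev's real histogram code is not shown in this PR text.
func logSpacedBounds(lo, hi time.Duration, n int) []time.Duration {
	edges := make([]time.Duration, n+1)
	span := math.Log(float64(hi) / float64(lo))
	for i := 0; i <= n; i++ {
		edges[i] = time.Duration(float64(lo) * math.Exp(span*float64(i)/float64(n)))
	}
	// Pin the endpoints so float rounding cannot move them.
	edges[0], edges[n] = lo, hi
	return edges
}

// bucketIndex maps a sample to a bucket in [0, n-1], clamping out-of-range
// samples into the first or last bucket (an assumption of this sketch).
func bucketIndex(edges []time.Duration, d time.Duration) int {
	n := len(edges) - 1
	for i := 0; i < n; i++ {
		if d < edges[i+1] {
			return i
		}
	}
	return n - 1
}

func main() {
	edges := logSpacedBounds(100*time.Microsecond, 10*time.Second, 50)
	fmt.Println("edges:", len(edges), "first:", edges[0], "last:", edges[len(edges)-1])
	fmt.Println("bucket for 3ms:", bucketIndex(edges, 3*time.Millisecond))
}
```

Geometric spacing gives each bucket the same relative width, which is why a single histogram can resolve both sub-millisecond cache hits and multi-second origin fetches.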

Adds a new Step 8 to hack/orca/quickstart.md walking through the common orcadev workflows against a running dev cluster:

  8a Roundtrip - one-command SHA-256 correctness check, including a sample --dump-diff mismatch output for triage.
  8b Benchmarks - throughput + latency runs, plus the jq-based comparison workflow using --json-out + --label across iterations.
  8c Scenarios - one-line invocations of cold-warm, range-stress, empty-object, etag-change.
  8d Cache inspection - cache-list / cache-inspect / cache-clear for inspect-and-clear-between-experiments workflows.

The existing 'Tear down' section becomes Step 9. dev-harness.md's 'Exercise the cache' section gets a one-line pointer to the new step so readers landing there for benchmarking find their way to the tutorial.

Replaces the one-shot Kubernetes Jobs that previously created the LocalStack S3 buckets ('orca-cache', 'orca-origin') and the Azurite container ('orca-test') with mechanisms that re-fire on every emulator restart.

Failure mode being fixed
------------------------

LocalStack and Azurite both run with ephemeral state (emptyDir + PERSISTENCE=0 / no persistence mode). When their pods restart (OOM, eviction, manual delete, kind node restart) state is wiped. The existing 'orca-buckets-init' and 'orca-azurite-container-init' Jobs are Kubernetes Jobs - they ran once at first deploy and could not re-run after emulator restarts. Result: orca pods CrashLoopBackOff forever with 'NoSuchBucket' on the cachestore versioning probe; manual recovery was required.

Mechanism
---------

LocalStack: 'localstack-init-buckets' ConfigMap mounted at /etc/localstack/init/ready.d/init-buckets.sh (defaultMode 0755). LocalStack 3.x's native init-hooks pattern rescans this directory on every container start. The script idempotently creates both buckets and re-checks cachestore-versioning is unset (orca's versioningGate requirement).

Azurite: 'container-ensurer' sidecar in the same Pod, running a 30-second forever-loop that calls 'az storage container create' idempotently. Talks to Azurite over loopback (127.0.0.1:10000) so it doesn't depend on cluster DNS.

Both files lose their separate init-Job templates (02-init-job.yaml.tmpl, 04-azurite-init.yaml.tmpl); 'make deploy-localstack' and 'make deploy-azurite' lose the Job apply + wait steps and gain bucket/container readiness polls so operators see explicit success when the init mechanisms have run.

After this change, 'make -C hack/orca deploy-localstack' and 'deploy-azurite' are clean idempotent recovery targets: re-running either against a stale cluster heals the buckets without a full 'orca-down && orca-up'.

Verification
------------

Live-tested against an existing kind cluster:

- Applied the new LocalStack manifest; init-hook ran and created both buckets within 5s of container Ready.
- Deleted the LocalStack pod; new pod's init-hook recreated both buckets within 5s of restart.
- Applied the new Azurite manifest with the container-ensurer sidecar; sidecar created 'orca-test' within 30s.
- Deleted the Azurite pod; new sidecar recreated the container within 30s.
- Three orca pods that had been CrashLoopBackOff for 9 days with NoSuchBucket reached 3/3 Ready after a rollout-restart.

Initial sidecar memory limit of 64Mi was too low (Azure CLI is Python-based and loads ~150MB of modules); bumped to 256Mi limit / 128Mi request, sized to be comfortably above measured RSS.

Templates render and pass kubectl --dry-run validation; no Go code changed; CI surface unaffected.

…data-random

Replaces the existing 'upload --generate --prefix' flag with '--name' (same semantics in --file mode; in --generate mode the per-blob index is appended). Indices are now 1-based, so:

  upload --generate --count 1 --name foo  -> blob 'foo1'
  upload --generate --count 3 --name foo  -> blobs 'foo1','foo2','foo3'
  upload --generate --count 5             -> blobs 'synth1..synth5' (default --name is 'synth')

Adds a new make target wrapping the singleton-blob case:

  make -C hack/orca data-random NAME=foo SIZE=10MiB

which uploads exactly one random blob literally named 'foo1' of SIZE bytes. Useful for seeding a key to feed 'bench KEY=foo1' or 'roundtrip --key foo1' without dropping a file on local disk first.

Breaking change vs. the prior flag surface: --prefix is removed, not deprecated. Acceptable since PR #176 (which introduced --prefix into orcadev) has not yet merged. The default output blob-name shape changes from 'synth-0' to 'synth1' (no dash separator; indices start at 1 rather than 0).

Determinism note: under --seed, blob i's content is now derived from (seed + i) with i in 1..count, replacing the previous 0..count-1 range. Within the new world the byte-identical-across-runs guarantee holds.

Drive-by fix: awss3 Put now buffers the body into a bytes.Reader before calling PutObject. The aws-sdk-go-v2 SigV4 signer requires a seekable body to pre-compute X-Amz-Content-SHA256; the previous implementation passed an unseekable io.Reader from newRandomReader and failed at runtime with 'failed to seek body to start, request stream is not seekable'. Caught while live-testing the new data-random target against LocalStack.

Verification
------------

go build / go test / golangci-lint: clean (0 issues)

Live against Azurite NodePort 30100:

  make data-random NAME=foo SIZE=1MiB                              -> foo1 (1.00 MiB)
  make data-generate ARGS='--count 3 --name multi --size 256KiB'   -> multi1,multi2,multi3
  make data-generate ARGS='--count 2 --size 128KiB' (default name) -> synth1, synth2
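
The 1-based naming and per-blob seed rule described above can be sketched in a few lines of Go. This is an illustration only: blobSpec and planBlobs are hypothetical names, and the real orcadev code may structure this differently.

```go
package main

import (
	"fmt"
	"strconv"
)

// blobSpec describes one synthetic blob: its name and the seed for its content.
// Hypothetical type used only for this sketch.
type blobSpec struct {
	Name string
	Seed int64
}

// planBlobs returns count specs named name1..nameN (1-based indices), with
// blob i seeded from seed+i. An empty name falls back to "synth", the default
// --name value stated in the commit message.
func planBlobs(name string, count int, seed int64) []blobSpec {
	if name == "" {
		name = "synth"
	}
	specs := make([]blobSpec, 0, count)
	for i := 1; i <= count; i++ {
		specs = append(specs, blobSpec{
			Name: name + strconv.Itoa(i),
			Seed: seed + int64(i),
		})
	}
	return specs
}

func main() {
	for _, s := range planBlobs("foo", 3, 42) {
		fmt.Printf("%s (seed %d)\n", s.Name, s.Seed)
	}
}
```

Because the seed depends only on the base seed and the index, re-running with the same --seed and --count would reproduce the same blob contents under this scheme.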

…dtrip/scenario

The orca edge listener (8443) is a ClusterIP-only Service in the dev harness, so anything orcadev does that hits it (bench, roundtrip, scenario) requires a kubectl port-forward. Today operators have to run 'make -C hack/orca port-forward' in another shell first; forgetting yields 'dial tcp 127.0.0.1:8443: connect: connection refused' and a confusing user experience.

This change adds an auto-managed port-forward:

1. Before constructing edgeClient, ensureEdgeReachable probes localhost:8443 with a 500ms TCP dial.
2. If the probe succeeds (operator already has a port-forward, or anything else is bound) -> no-op, proceed.
3. If the probe fails AND --orca-url is the dev default (localhost:8443 / 127.0.0.1:8443) AND --auto-port-forward is true (default) -> spawn 'kubectl --context kind-orca-dev -n unbounded-kube port-forward svc/orca 8443:8443' as a subprocess, wait up to 10s for the 'Forwarding from' stdout sentinel, then return a deferred cleanup that SIGTERMs the subprocess.
4. If --auto-port-forward=false OR the URL is non-default -> no-op (operator clearly knows what they're doing); the original connection error surfaces through the actual request.

New flags (both persistent, both opt-out flavored):

  --auto-port-forward   (bool, default true)
  --kube-context        (string, default 'kind-orca-dev')

A one-line 'auto port-forward: localhost:8443 -> svc/orca:8443' is printed to stderr when the forward fires so operators aren't surprised. Subprocess stderr is captured into an in-memory buffer and surfaced in any error message (handles 'context not found', 'service not found', kubectl-not-on-PATH).

Tests cover the loopback probe (open and closed paths), the URL host:port parser, and the 'Forwarding from' sentinel detector. The kubectl subprocess itself is exercised live.

Verified live against the dev cluster:

- Without a port-forward: 'auto port-forward:' fires, orca reachable, subprocess cleaned up on exit.
- With operator's own 'make -C hack/orca port-forward' running: probe succeeds; no new forward spawned.
- --auto-port-forward=false: connection refused (today's behavior preserved).
- --orca-url http://nonexistent:9999: auto-forward skipped; DNS error pass-through.

Makefile help text and quickstart.md Step 8 prelude updated to note the auto-managed behaviour; the manual port-forward target remains available for long-lived sessions.
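
The probe-then-decide logic described in this commit can be sketched as follows. The function names probeTCP, isDevDefaultURL, and shouldSpawnForward are hypothetical stand-ins (the commit only names ensureEdgeReachable), and the sketch stops short of spawning kubectl; it only shows the decision that precedes it.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"time"
)

// probeTCP reports whether anything accepts a TCP connection on addr within
// the timeout. The commit describes a 500ms dial against localhost:8443.
func probeTCP(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// isDevDefaultURL reports whether rawURL targets one of the two dev-default
// host:port pairs named in the commit message.
func isDevDefaultURL(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	return u.Host == "localhost:8443" || u.Host == "127.0.0.1:8443"
}

// shouldSpawnForward encodes the decision: forward only when the probe
// failed, the flag is on, and the URL is the dev default.
func shouldSpawnForward(reachable bool, rawURL string, autoPortForward bool) bool {
	return !reachable && autoPortForward && isDevDefaultURL(rawURL)
}

func main() {
	const orcaURL = "http://localhost:8443" // scheme is an assumption of this sketch
	reachable := probeTCP("localhost:8443", 500*time.Millisecond)
	fmt.Println("edge reachable:", reachable)
	fmt.Println("would spawn kubectl port-forward:", shouldSpawnForward(reachable, orcaURL, true))
}
```

Keeping the decision in a pure function separate from the network probe is one way to make the open and closed paths unit-testable without a cluster.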

…mes opt-in

The dev harness was internally inconsistent: .env.example shipped ORIGIN_DRIVER=awss3 and the Makefile fell back to awss3, but quickstart.md instructed operators to override to azureblob and the orcadev tool's tutorial assumed an Azurite origin. Operators following quickstart-as-written ended up in azureblob mode while the rest of the harness still defaulted to awss3, producing a mismatch where orcadev's awss3 defaults seeded the wrong backend and orca returned 404 on every fetch.

Flips the default everywhere:

hack/orca/.env.example
  ORIGIN_DRIVER=awss3 -> azureblob
  ORIGIN_ID=awss3-localstack -> azureblob-azurite
  Reorder the three modes block: azureblob primary, awss3 secondary, real-Azure tertiary

hack/orca/Makefile
  ${ORIGIN_DRIVER:-awss3} -> :-azureblob
  ${ORIGIN_ID:-awss3-localstack} -> :-azureblob-azurite
  Remove the deploy-azurite-maybe target entirely; both LocalStack and Azurite are now always deployed regardless of ORIGIN_DRIVER. The cost is a few MB of memory per cluster; the benefit is eliminating the 'switched modes mid-cluster left Azurite undeployed' bug class and making every mode switch require only a ConfigMap change.

hack/cmd/orcadev/orcadev/config.go defaultGlobalFlags
  originDriver: awss3 -> azureblob
  originID: inttest-origin -> azureblob-azurite
  originBucket: orca-origin -> orca-test
  originEndpoint: localhost:30200 -> localhost:30100/devstoreaccount1/
  (Cachestore fields unchanged; awss3 fields kept so the --origin-driver=awss3 opt-in path still works without additional credential flags.)

hack/orca/dev-harness.md
  Origin modes table reordered; 'What you get' updated to always-on Azurite; bring-up sequence step 7 updated to 'deploy-azurite' (no longer conditional); 'Switching origins' inverted (was 'switch FROM awss3 TO azureblob'; now the reverse); 'Recovery' lists both deploy-localstack and deploy-azurite; troubleshooting section invokes the right mode.

hack/orca/quickstart.md
  Step 1 simplified: defaults are already azureblob-with-Azurite, so the only optional edit is LOG_LEVEL=debug. Step 2 pod-list drops the stale 'orca-buckets-init' and 'orca-azurite-container-init' Job rows (those Jobs were removed in commit 886607c; the list shows the current always-on Deployments instead).

hack/cmd/orcadev/orcadev/orcadev_test.go
  Default-driver assertion flipped to expect 'azureblob'.

Drive-by: add a !.env.example exception to .gitignore so the new defaults actually ship. Previously the .env.* rule silently swallowed .env.example, leaving the file local-only on each developer's machine and explaining why the .env.example shape has been drifting from what quickstart.md actually recommends.

Real-Azure mode unchanged. Existing operators with a custom .env in awss3 mode keep working (their .env overrides the new default); they will need to pass --origin-driver=awss3 to orcadev or update their .env to track the new default.

Verified live against the dev cluster:

  make -C hack/orca render                                               -> origin driver: azureblob
  make -C hack/orca deploy                                               -> Azurite + LocalStack + orca up
  make -C hack/orca data-random NAME=defaulttest SIZE=1MiB               -> uploads to Azurite/orca-test
  make -C hack/orca bench KEY=defaulttest1 ARGS='--duration 10s'         -> 107 MiB/s, 1078/1078 ok
  make -C hack/orca roundtrip FILE=/tmp/r.bin                            -> PASS, SHA-256 matches
  make -C hack/orca cache-list                                           -> chunks under azureblob-azurite/...
  make -C hack/orca cache-inspect BUCKET=orca-test KEY=defaulttest1      -> 1/1 chunks (100%)
  awss3 opt-in regression check                                          -> uploads, lists, deletes OK

… of cancelling them

Before this change, '--duration N' used a single context.WithTimeout for both the worker-loop gate AND every in-flight HTTP request. When the timer fired, mid-flight requests were ripped out from under the SDK and counted as 'context_canceled' / 'body_read_error' in the errors-by-code summary. A typical 16-worker run reported ~16 errors at the tail of every benchmark - benign but indistinguishable from a real failure mode.

Splits the single context into three pieces:

  gateCtx - controls new-work admission. Expires at --duration. Workers select on it before picking up another request; when it fires they stop accepting work.
  reqCtx - what HTTP calls actually use. Stays open after gateCtx closes; this is what lets in-flight requests run to natural completion.
  drainTimer (context.AfterFunc on gateCtx + time.AfterFunc on the drain budget) - cancels reqCtx after the configurable drain timeout if in-flight requests still haven't finished.

So a typical 30-second bench run now exits at ~30s if everything is healthy and at most 30s + --drain-timeout if something is genuinely hung.

New flag:

  --drain-timeout 10s   how long to wait for in-flight requests after the gate closes. Default 10s (matches g.timeout; well above the observed p99 for any reasonable run).

New JSON fields (schema_version still 1; additive non-breaking):

  config.drain_timeout_seconds   the configured drain budget
  results.gate_seconds           wall-clock the gate was open
  results.drain_seconds          wall-clock spent draining (near 0 in --requests mode since there is no timed gate phase)

elapsed_seconds remains the total (gate + drain), so consumers keying off it continue to work. Human output gains two lines mirroring the JSON fields.

In --requests N mode the new fields still behave sensibly: the last worker through the reqLimit gate calls setGateClosed(), producing gate_seconds close to elapsed_seconds and a tiny drain_seconds covering any concurrent workers still finishing their last fetch.

Verified live against the dev cluster:

  bench --duration 10s --concurrency 8 --range-size 256KiB --read-pattern random
    -> 969 requests, 0 errors, gate 10.001s, drain 59ms. The 16-error tail storm is gone.
  bench --duration 5s --concurrency 8 --drain-timeout 1ms (...)
    -> 8 errors (exactly --concurrency), drain budget exhausted. Proves the drain cap fires when in-flight requests overrun the budget.
  bench --duration= --requests 50 --concurrency 4 (...)
    -> 50 requests, 0 errors, gate 551ms, drain 9ms. --requests mode unaffected.
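
A minimal Go sketch of the gate / request / drain split is shown below. It follows the structure the commit describes (context.AfterFunc on the gate context arming a time.AfterFunc for the drain budget), but runGated, simulatedRequest, and demoRun are hypothetical names, and the simulated request stands in for orcadev's real HTTP fetch.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runGated admits new work until gate elapses, then lets in-flight work run
// for up to drain before cancelling it. work stands in for one HTTP request.
func runGated(gate, drain time.Duration, workers int, work func(context.Context) error) (ok, failed int64) {
	gateCtx, cancelGate := context.WithTimeout(context.Background(), gate)
	defer cancelGate()
	reqCtx, cancelReq := context.WithCancel(context.Background())
	defer cancelReq()

	// When the gate closes, arm the drain budget; requests are cancelled
	// only if that budget also runs out.
	stop := context.AfterFunc(gateCtx, func() { time.AfterFunc(drain, cancelReq) })
	defer stop()

	var okN, failN atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for gateCtx.Err() == nil { // gate decides admission only
				if err := work(reqCtx); err != nil { // requests use reqCtx
					failN.Add(1)
				} else {
					okN.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	return okN.Load(), failN.Load()
}

// simulatedRequest returns a fake request that takes d unless cancelled.
func simulatedRequest(d time.Duration) func(context.Context) error {
	return func(ctx context.Context) error {
		select {
		case <-time.After(d):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

// demoRun is a convenience wrapper taking milliseconds.
func demoRun(gateMs, drainMs, workMs, workers int) (ok, failed int64) {
	ms := time.Millisecond
	return runGated(time.Duration(gateMs)*ms, time.Duration(drainMs)*ms, workers,
		simulatedRequest(time.Duration(workMs)*ms))
}

func main() {
	ok, failed := demoRun(100, 1000, 5, 4)
	fmt.Printf("healthy run: ok=%d failed=%d\n", ok, failed)
	ok, failed = demoRun(50, 1, 300, 4)
	fmt.Printf("drain budget exhausted: ok=%d failed=%d\n", ok, failed)
}
```

In the second call each worker's only request outlives both the gate and the 1ms drain budget, so every worker records one cancellation, mirroring the "exactly --concurrency errors" observation in the verification notes.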

…factory

…tion

…credentials.sh

…pers)

- percentile: collapse percentile() to call percentileSorted() after the in-place sort, eliminating a copy of the ceiling-rank math.
- upload.go: drop the i := i loop-variable shadow now that the module is on go 1.26 (per-iteration variable since 1.22).
- roundtrip.go: replace the bespoke parseInt with strconv.ParseInt and the manual dash-search loop with strings.IndexByte / HasPrefix.
- hash.go: add unquoteETag() helper and reuse at the 4 origin sites and 1 edge site that previously inlined strings.Trim(s, "\"").
- list.go: route runDelete through the existing confirmPrompt helper. Introduce errConfirmAborted as a sentinel so runDelete preserves its 'aborted.' + exit 0 behaviour while cache.go's callers still surface a non-zero exit on decline. confirmPrompt also no longer prints a leading space when msg is empty.
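
The percentile refactor mentioned in the first bullet can be pictured with a small sketch. The function names come from the commit message, but the signatures, the float64 element type, and the 0..100 percentile scale are assumptions of this example rather than the actual orcadev code.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentileSorted returns the p-th percentile (p in 0..100, an assumption)
// of an already-sorted slice using ceiling-rank (nearest-rank) selection.
func percentileSorted(sorted []float64, p float64) float64 {
	if len(sorted) == 0 {
		return 0
	}
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// percentile sorts values in place and delegates, so the rank math lives in
// exactly one function.
func percentile(values []float64, p float64) float64 {
	sort.Float64s(values)
	return percentileSorted(values, p)
}

func main() {
	latencies := []float64{5, 1, 4, 2, 3}
	fmt.Println("p50:", percentile(latencies, 50))
	fmt.Println("p100:", percentileSorted(latencies, 100))
}
```

Callers that compute several percentiles over the same data can sort once and call percentileSorted repeatedly, which is the usual motivation for exposing both forms.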

…cenario env)

- io.go: add emitJSONResult[T any] that bench.emitBenchResult and scenario.emitScenarioResult now delegate to. Encoding json into stdout / --json-out path was duplicated verbatim across the two.
- cachestore.go: add buildS3Client() with the static credentials / checksum opt-out / endpoint / path-style configuration. Both newCachestoreClient and newAWSS3Origin call it; drops a half-screen of awsconfig.LoadDefaultConfig boilerplate from origin.go.
- cachestore.go: add walkS3() pagination helper consumed by both cachestoreClient.List and awss3Origin.List. The visit callback returns false to short-circuit the walk for limit-bounded callers.
- cache.go: add forEachChunk() that loops over chunk.Key indices, HEADing each path. clearByObject (cache) and clearScenarioObject (scenario) now drive the same loop. The latter previously had a hardcoded 1024-index cap; it now derives nChunks from the origin HEAD size via the new resolveObjectMetadata helper, with the 1024 fallback only used when size is unknown.
- scenario.go: add newScenarioEnv + scenarioCleanup. Each of the four scenarios was opening with the same 8 lines of origin-client / ensure-bucket / edge-client construction and the same 5-line defer-cleanup-if-not-keepData block. Both are now one-liners.
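
The visit-callback pattern described for walkS3() can be illustrated without the AWS SDK. In the sketch below, walkPages, fetchPage, fakePager, and firstN are all hypothetical names; the fake pager stands in for a paginated S3 listing, and only the short-circuit contract (visit returns false to stop) is taken from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// fetchPage returns one page of keys plus the token for the next page
// (empty when there are no more pages).
type fetchPage func(token string) (keys []string, next string, err error)

// walkPages calls visit for every key across all pages and stops early,
// without error, as soon as visit returns false.
func walkPages(fetch fetchPage, visit func(key string) bool) error {
	token := ""
	for {
		keys, next, err := fetch(token)
		if err != nil {
			return err
		}
		for _, k := range keys {
			if !visit(k) {
				return nil
			}
		}
		if next == "" {
			return nil
		}
		token = next
	}
}

// fakePager serves fixed pages; the token is the index of the next page.
// calls counts how many pages were actually fetched.
func fakePager(pages [][]string, calls *int) fetchPage {
	return func(token string) ([]string, string, error) {
		*calls++
		idx := 0
		if token != "" {
			n, err := strconv.Atoi(token)
			if err != nil {
				return nil, "", err
			}
			idx = n
		}
		if idx >= len(pages) {
			return nil, "", nil
		}
		next := ""
		if idx+1 < len(pages) {
			next = strconv.Itoa(idx + 1)
		}
		return pages[idx], next, nil
	}
}

// firstN is a limit-bounded caller: it collects at most limit keys.
func firstN(fetch fetchPage, limit int) ([]string, error) {
	if limit <= 0 {
		return nil, nil
	}
	var out []string
	err := walkPages(fetch, func(k string) bool {
		out = append(out, k)
		return len(out) < limit
	})
	return out, err
}

func main() {
	calls := 0
	pages := [][]string{{"a", "b"}, {"c", "d"}, {"e"}}
	keys, err := firstN(fakePager(pages, &calls), 3)
	fmt.Println(keys, err, "pages fetched:", calls)
}
```

The benefit of the boolean callback is that a limit-bounded listing never requests pages it does not need.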

… precedence

- bench.go: extract pickBenchRange() from the worker hot loop in runBenchLoop. The original 25-line if/else cascade is now a single call site with three intent-named branches (full / random / sequential). Adds bench_test.go::TestPickBenchRange covering each branch including the wrap-at-boundary case for sequential and a 256-draw in-bounds check for random.
- config.go: switch globalFlags.resolve from 'value differs from default => operator set the flag' to cobra Flag.Changed. This is strictly more correct: an operator who passes the literal default (e.g. --origin-bucket=orca-test) now wins over the YAML rather than silently letting the YAML override. resolve now takes a *cobra.Command instead of a context.Context (the ctx parameter was unused). Drops the apologetic 'good-enough' comment.
- orcadev.go: pass cmd directly to resolve.
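
To make the three range branches concrete, here is a hedged sketch of what a range picker with full / random / sequential behaviour might look like. The name pickRange, the byteRange type, the parameter list, and the wrap rule for sequential reads are all assumptions; the real pickBenchRange signature is not shown in the commit message.

```go
package main

import (
	"fmt"
	"math/rand"
)

// byteRange is an inclusive byte range; Full means "no Range header".
type byteRange struct {
	Start, End int64
	Full       bool
}

// pickRange chooses the range for one request.
//   - full:       rangeSize is unset or covers the whole object
//   - random:     a start offset drawn so the range stays in bounds
//   - sequential: the seqIndex-th aligned slot, wrapping at the object end
//
// randInt63n is injected so tests can make the random branch deterministic.
func pickRange(pattern string, objSize, rangeSize, seqIndex int64, randInt63n func(int64) int64) byteRange {
	if rangeSize <= 0 || rangeSize >= objSize {
		return byteRange{Full: true}
	}
	var start int64
	switch pattern {
	case "random":
		start = randInt63n(objSize - rangeSize + 1)
	default: // sequential
		slots := objSize / rangeSize // whole ranges that fit in the object
		start = (seqIndex % slots) * rangeSize
	}
	return byteRange{Start: start, End: start + rangeSize - 1}
}

func main() {
	const objSize, rangeSize = 1000, 256
	for i := int64(0); i < 5; i++ {
		fmt.Println("sequential", i, pickRange("sequential", objSize, rangeSize, i, nil))
	}
	fmt.Println("random", pickRange("random", objSize, rangeSize, 0, rand.Int63n))
	fmt.Println("full", pickRange("random", objSize, 0, 0, rand.Int63n))
}
```

Injecting the random source is one way to write an in-bounds check like the 256-draw test the commit mentions without depending on a global generator.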

The dev-harness manifests under deploy/orca/dev/ used to include two one-shot Jobs (02-init-job.yaml.tmpl, 04-azurite-init.yaml.tmpl) that created the LocalStack S3 buckets and the Azurite container at first deploy. Commit 886607c ("dev-harness: self-healing LocalStack + Azurite bucket bootstrap") deleted both Jobs and replaced them with PostStart lifecycle hooks / a sidecar driven by an inline ConfigMap so the buckets/container are re-created on every emulator restart. That commit did not update TestDevManifestsRender, which still asserted a Job kind in the rendered output, leaving 'make' broken on this branch and main since the May 21 dev-harness change. The rendered kind set is now ConfigMap + Deployment + Service; update the assertion to match and document why Job is no longer expected. Unrelated drive-by fix included here so the orcadev refactor branch runs 'make' green end-to-end.

Adds an 'orcadev' target that runs 'test' then builds bin/orcadev from ./hack/cmd/orcadev. Recipe shape mirrors the existing forge target; the ORCADEV_BIN / ORCADEV_CMD variables sit alongside the target itself rather than the global variables header so the tool's build config travels with its only consumer. Also adds the tool to the .PHONY list (next to forge) and the manually curated help text under the Build section. Deliberately NOT added to the 'all' target: orcadev is a host-side operator tool used against an already-running orca harness, not a shipped binary, so the default 'make' shouldn't pay its build cost. hack/orca/Makefile keeps its 'go run hack/cmd/orcadev' invocations so the inner dev loop stays free of a separate compile step.

Replace the previous .env + Make-targets + multiple shell scripts flow with one install script (hack/orca/setup-orca.sh) plus thin kind cluster lifecycle helpers (kind-up.sh / kind-down.sh) and one quickstart (hack/orca/README.md).

The install script is cluster-agnostic: it works against any kubectl context (kind, AKS, EKS, k3d, ...) with the same handful of flags. Defaults match the previously-supported dev shape: azureblob origin backed by in-cluster Azurite, S3 cachestore backed by in-cluster LocalStack, zero real cloud credentials required. Real-Azure mode is opted into via AZURE_STORAGE_* env vars.

orcadev grows a --preset=dev flag (the default) that bundles the well-known dev coordinates. The auto-port-forward machinery now covers svc/orca + svc/azurite + svc/localstack so the same `bin/orcadev <verb>` invocation works on every cluster flavor without depending on kind NodePort hostPort mappings. Kube context default switches from kind-orca-dev to empty (current context).

Removed: quickstart.md, dev-harness.md (folded into README.md); .env.example, orcadev-flags.sh (replaced by CLI flags + Go preset); kind-create.sh, kind-load.sh, down.sh, deploy-credentials.sh (folded into setup-orca.sh / kind-up.sh / kind-down.sh); rendered-dev/ directory (setup-orca.sh renders to a tempdir).

Root Makefile gains orca-install (current context) and orca-kind-up / orca-kind-down (with orca-up / orca-down kept as back-compat aliases). hack/orca/Makefile shrinks to the three operational verbs (status, logs, port-forward); everything else is done directly with bin/orcadev <verb>.

…ward, anti-affinity)

Addresses the seven dev-workflow review findings logged in designs/orca/dev-workflow-remediation-plan.md.

1. Drop kind NodePort: emulator Services switch to ClusterIP unconditionally; remove kind extraPortMappings. orcadev port-forwards everywhere now, so NodePort offered no value and was an AKS / shared-cluster footgun (fixed nodePort 30100/30200 could collide or be policy-blocked).
2. Hardened uninstall: setup-orca.sh --uninstall now deletes only resources matching app.kubernetes.io/name=orca or app.kubernetes.io/part-of=orca-dev labels and leaves the namespace intact. New --delete-namespace flag explicitly opts into removing the namespace (and every unrelated resource in it). orca-credentials Secret now carries the orca name label so the label-selector picks it up.
3. Universal port-forward coverage: every orcadev subcommand that touches origin / cachestore / edge now calls ensurePortForwards at the top of its RunE (upload, list, delete, cache list/inspect/clear, roundtrip, bench, scenario). Previously only roundtrip/bench/scenario opened forwards, so bin/orcadev upload --generate failed on AKS once kind NodePorts were dropped. TestEverySubcommandOpensPortForwards is the structural guard.
4. README accuracy: drop multi-object and range-large from the scenario list (not implemented), drop the same from the orcadev package docstring, update cache examples to the default azureblob container orca-test, document --delete-namespace and the existing-cluster anti-affinity behavior.
5. RequireAntiAffinity template knob in deploy/orca/04-deployment: true (kind) renders the strict requiredDuringScheduling block; false (non-kind) renders preferredDuringScheduling so clusters with fewer than 3 schedulable nodes still roll out. setup-orca.sh picks the right default by detecting kind-* contexts.
6. Tempdir trap consolidation in setup-orca.sh: single cleanup_paths stack + one EXIT trap, replacing the previous double-trap that leaked the kind image archive tempdir on --kind-load runs.
7. Safety rails: make orca-install errors when targeted at a non-kind context with the default ghcr.io/azure/orca:dev image (which wouldn't be pullable). setup-orca.sh --build without --kind-load now errors fast instead of building an image nothing uses.

Validated on kind end-to-end: fresh cluster + setup-orca.sh + orcadev upload/list/cache/roundtrip/scenario/bench. AKS validation remains a follow-up owner task per the plan.
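
The RequireAntiAffinity knob from item 5 can be pictured with a small text/template sketch. The template body below is an illustration, not the contents of deploy/orca/04-deployment.yaml.tmpl: the topologyKey, weight, and exact YAML layout are assumptions, and isKindContext mirrors in Go a check that the commit says setup-orca.sh performs in shell.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// affinityTmpl is a hypothetical fragment showing how one boolean can switch
// between strict and best-effort pod anti-affinity.
const affinityTmpl = `affinity:
  podAntiAffinity:
{{- if .RequireAntiAffinity }}
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: orca
        topologyKey: kubernetes.io/hostname
{{- else }}
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: orca
          topologyKey: kubernetes.io/hostname
{{- end }}
`

// isKindContext reports whether a kubectl context name looks like kind's
// "kind-<cluster>" convention.
func isKindContext(kubeContext string) bool {
	return strings.HasPrefix(kubeContext, "kind-")
}

// mustRender renders the fragment for the given knob value.
func mustRender(requireAntiAffinity bool) string {
	t := template.Must(template.New("affinity").Parse(affinityTmpl))
	var sb strings.Builder
	data := struct{ RequireAntiAffinity bool }{requireAntiAffinity}
	if err := t.Execute(&sb, data); err != nil {
		panic(err)
	}
	return sb.String()
}

func main() {
	for _, ctx := range []string{"kind-orca-dev", "my-aks-cluster"} {
		fmt.Printf("# context %s\n%s\n", ctx, mustRender(isKindContext(ctx)))
	}
}
```

With the preferred form, the scheduler still spreads replicas across nodes when it can, but a cluster with fewer nodes than replicas is not left with Pending pods.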

jveski

LGTM, one nit

The old roundtrip output buried the source and received SHA-256s at the right edges of two long, differently-shaped lines, so a human had to compare 64 hex characters across visual whitespace to confirm they matched. The new layout puts each sha256 on its own indented line under a short heading and appends a MATCH or MISMATCH marker so the at-a-glance verdict is unambiguous:

  source: orca-test.bin (5.00 MiB)
    sha256: 45a643a91d90c4fb...
  iter 0: status=200 bytes=5.00 MiB elapsed=151ms rate=33.08 MiB/s
    sha256: 45a643a91d90c4fb... MATCH
  PASS sha256=45a643a9... (3 iters)

The MISMATCH branch keeps the existing copy-paste summary block (MISMATCH on iter N + source + received) so operators still get full hashes for incident triage, but the per-iter MISMATCH marker now appears at the same indentation as MATCH so visual scanning catches it immediately.

Coverage: new roundtrip_output_test.go captures stderr through an os.Pipe and asserts the heading + indented sha256 + marker + PASS format end-to-end against a fake origin + httptest edge. Two new helpers (shortHash, iterLabel) get their own table tests for singular/plural and short-hash boundary cases.

Also fixes the stale setup-orca.sh "Next steps" block: it previously printed `bin/orcadev roundtrip --file /tmp/test.bin` without telling the user where /tmp/test.bin comes from. Now it prints a `dd` line first and uses /tmp/orca-test.bin to match the README example, and leads with `scenario cold-warm` which needs no seed file at all.
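
The two helpers named in this commit, shortHash and iterLabel, are not shown in the PR text. The sketch below guesses at plausible behaviour from the sample output (a truncated hash followed by "..." and a singular/plural iteration count); the signatures and truncation lengths are assumptions. hashReader illustrates the streaming SHA-256 that roundtrip is described as computing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// hashReader streams r through SHA-256 without buffering the whole body.
func hashReader(r io.Reader) (string, error) {
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// shortHash truncates a hex digest to n characters and appends "...".
// Digests of n characters or fewer are returned unchanged (assumed boundary rule).
func shortHash(digest string, n int) string {
	if len(digest) <= n {
		return digest
	}
	return digest[:n] + "..."
}

// iterLabel renders "1 iter" or "N iters".
func iterLabel(n int) string {
	if n == 1 {
		return "1 iter"
	}
	return fmt.Sprintf("%d iters", n)
}

func main() {
	src, _ := hashReader(strings.NewReader("example payload"))
	got, _ := hashReader(strings.NewReader("example payload"))
	marker := "MISMATCH"
	if src == got {
		marker = "MATCH"
	}
	fmt.Printf("  sha256: %s %s\n", shortHash(got, 16), marker)
	fmt.Printf("PASS sha256=%s (%s)\n", shortHash(src, 8), iterLabel(3))
}
```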

The previous errors were terse:

    Error: --file and --key are mutually exclusive
    Error: one of --file or --key is required

This left the user staring at the help text trying to figure out which mode they actually wanted. The new messages spell out the source-of-truth contract for each mode and, in the mutually-exclusive case, suggest the workaround ("upload it under a different name") that the user actually wants if they were trying to compare a local file against an existing origin object.

The two modes are:

    --file PATH: source-of-truth is the local file. Upload it, fetch it back through orca, compare bytes.
    --key NAME: source-of-truth is the current origin object. Read it from origin, fetch the same key through orca, compare bytes.

Combining them would be ambiguous (which is the source?) so it is rejected, but the error now says so in words the operator can act on without re-reading the long help. Adds TestRunRoundtrip_FlagErrors which locks both messages.
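
A rough sketch of the validation described above, for readers who want to see the shape of the check. The function name validateRoundtripSource is hypothetical, and the error wording is illustrative only; the exact strings locked by TestRunRoundtrip_FlagErrors are not reproduced in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// validateRoundtripSource (hypothetical name) enforces that exactly one of
// --file or --key is set. The message text is illustrative, not the PR's.
func validateRoundtripSource(file, key string) error {
	switch {
	case file != "" && key != "":
		return errors.New("--file and --key are mutually exclusive: " +
			"--file treats the local file as the source of truth, " +
			"--key treats the current origin object as the source of truth; " +
			"to compare a local file against an existing object, " +
			"upload it under a different name")
	case file == "" && key == "":
		return errors.New("one of --file or --key is required: " +
			"--file PATH uploads a local file and fetches it back through orca, " +
			"--key NAME reads an existing origin object and fetches the same key through orca")
	}
	return nil
}

func main() {
	cases := []struct{ file, key string }{
		{"/tmp/orca-test.bin", ""},
		{"", "orca-test.bin"},
		{"/tmp/orca-test.bin", "orca-test.bin"},
		{"", ""},
	}
	for _, c := range cases {
		fmt.Printf("file=%q key=%q -> %v\n", c.file, c.key,
			validateRoundtripSource(c.file, c.key))
	}
}
```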

PR #176 review nit: the edge HTTP client used the zero-value http.Client which falls back to http.DefaultTransport. That transport caps MaxIdleConnsPerHost at 2, meaning every concurrent request beyond the first two against the single orca host had to pay a fresh TCP handshake (and TLS, in production deployments).

The bench subcommand defaults to 8 workers and the README suggests --concurrency 16; scenarios spin a few dozen parallel range reads. Capping the keep-alive pool at 2 throttled all of these to the stdlib default, silently slowing benchmarks and hiding real performance characteristics.

Fix clones http.DefaultTransport (so we inherit Dial / TLS / HTTP-2 defaults), raises MaxIdleConnsPerHost to 256 (more than any realistic dev concurrency), and explicitly pins MaxConnsPerHost=0 (unlimited) so a future reviewer can see we never silently throttle the operator's --concurrency choice.

Two new tests lock the transport sizing and the 5x-timeout cap so a future refactor of newEdgeClient cannot quietly reintroduce the regression.
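
The transport change described above might look roughly like the following. This is a sketch under assumptions: newEdgeClient is named in the commit message, but its signature here is a guess, newEdgeTransport is a hypothetical helper split out for readability, and the 5x-timeout cap is not modeled because the thread does not describe it.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newEdgeTransport (hypothetical helper) clones the stdlib default transport
// so the dialer, proxy, TLS and HTTP/2 settings carry over, then resizes the
// per-host connection limits.
func newEdgeTransport() *http.Transport {
	base, ok := http.DefaultTransport.(*http.Transport)
	if !ok {
		// Something replaced http.DefaultTransport; start from a zero value.
		base = &http.Transport{}
	}
	tr := base.Clone()
	// A zero MaxIdleConnsPerHost falls back to http.DefaultMaxIdleConnsPerHost (2).
	tr.MaxIdleConnsPerHost = 256
	// 0 means no limit; set explicitly so the intent is visible in review.
	tr.MaxConnsPerHost = 0
	// The clone also inherits MaxIdleConns (100 in the stdlib default), which
	// bounds the idle pool across all hosts. The thread does not say whether
	// the PR changes it, so it is left untouched here.
	return tr
}

// newEdgeClient builds the client used for requests to the orca edge.
// The timeout parameter is an assumption about the real signature.
func newEdgeClient(timeout time.Duration) *http.Client {
	return &http.Client{Transport: newEdgeTransport(), Timeout: timeout}
}

func main() {
	c := newEdgeClient(30 * time.Second)
	tr := c.Transport.(*http.Transport)
	fmt.Println(tr.MaxIdleConnsPerHost, tr.MaxConnsPerHost, c.Timeout)
}
```

Cloning rather than building a bare http.Transport keeps the default dial and TLS handshake timeouts, which a zero-value transport would not have.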

plombardi89 requested a review from a team May 21, 2026 16:45

plombardi89 mentioned this pull request May 21, 2026

docs: add orcadev tutorial (roundtrip, bench, scenarios) to quickstart #177

Merged

plombardi89 mentioned this pull request May 21, 2026

dev-harness: self-healing LocalStack + Azurite bucket bootstrap #179

Closed

plombardi89 added 24 commits May 21, 2026 17:18

orcadev: allow bench --requests with default duration

36ffea2

orcadev: drain managed port-forward stdout after readiness

da4de6b

orcadev: enforce roundtrip --expect-etag

9f08cec

orcadev: bind cache inspect bucket before origin client creation

515fb37

orcadev: fail cold-warm when cache drop fails

750e67b

orcadev: verify range-stress response status and bytes

a14adc2

orcadev: plumb scenario chunk-size into cache clearing

9edbee0

hack/orca: keep real Azure renders off the Azurite endpoint

54c51a2

hack/orca: default credentials deployment to azureblob

760f780

hack/orca: pass .env settings through orcadev wrapper targets

9c6f97d

hack/orca: parameterize orcadev wrapper in-cluster URLs on NAMESPACE

df67acd

orcadev: extract runRoundtripWith and cover --expect-etag end-to-end

4fcb8b8

orcadev: drive cache inspect bucket override through newOriginClient factory

c762559

orcadev: extract runScenarioColdWarmWith and cover drop_cache propagation

a5c06b3

orcadev: retry port-forward once on bind-race with concurrent invocation

613e6d9

orcadev: inline resolveScenarioChunkSize

6862534

orcadev: inline prepareCacheInspectOriginBucket

912392c

orcadev: retire waitForForwarding compat wrapper

929f9e1

orcadev: document why recordDropCacheStep stays a named helper

7087c41

orcadev: reject explicit --duration 0 with a pointer to --requests

6fa7cb6

plombardi89 added 16 commits May 22, 2026 16:10

orcadev: normalizeETag trims whitespace after the weak validator prefix

3b06ad7

orcadev: verifyRangeResponse includes first-diff offset and hex window

b320b0f

orcadev: cap range-stress --size at 4 GiB in-memory ceiling

70ed939

orcadev: bound port-forward stderr buffer with a ring

a59290b

orcadev: drop redundant 1* multiplier in range-stress notice threshold

095c912

hack/orca: drop hardcoded test/test creds; defer to orcadev + deploy-credentials.sh

729d521

hack/orca: extract orcadev wrapper flag translation into a sh helper

51f95b5

hack/orca: refresh orcadev help text to mention .env auto-derivation

5419e4b

Merge remote-tracking branch 'origin/main' into phlombar/orcadev-tool

3b7c1de

plombardi89 changed the title ~~hack: introduce orcadev, dev/debug tool (subsumes orcaseed)~~ orcadev tool (benchmarking, dev setup, etc.) May 28, 2026

jveski reviewed May 28, 2026

Comment thread hack/cmd/orcadev/orcadev/edge.go

jveski previously approved these changes May 28, 2026

plombardi89 dismissed jveski’s stale review via 013261a May 28, 2026 22:20

plombardi89 added 2 commits May 28, 2026 18:45

jveski approved these changes May 29, 2026

plombardi89 merged commit 67622ee into main May 29, 2026
24 checks passed

plombardi89 deleted the phlombar/orcadev-tool branch May 29, 2026 16:36

plombardi89 merged 46 commits into main from phlombar/orcadev-tool

2 participants
