Add batchextract command for parallel package extraction #6

Merged
thesprockee merged 2 commits into main from feature/batch-extract on Mar 14, 2026

@thesprockee
Member

Summary

  • Adds cmd/batchextract — a new CLI tool that extracts all packages from a _data directory in one command
  • Uses a configurable goroutine worker pool (defaults to runtime.NumCPU())
  • Shows real-time progress ([N/M] extracting...) with a final summary
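
For readers who want the shape of the worker pool without opening the diff, here is a minimal sketch. It is not the code from cmd/batchextract: the result type, the runPool helper, and the extract callback are hypothetical stand-ins, and the fallback to runtime.NumCPU() when workers is below 1 is an assumption about how the default might be applied.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// result is a hypothetical per-package outcome; the PR's own type may differ.
type result struct {
	name  string
	files int
	err   error
}

// runPool processes every name using at most `workers` goroutines.
// extract is a stand-in for the real per-package extraction step.
func runPool(names []string, workers int, extract func(string) (int, error)) []result {
	if workers < 1 {
		workers = runtime.NumCPU() // assumed fallback, mirroring the flag default
	}

	// Pre-fill a buffered channel so workers can drain it without a feeder goroutine.
	work := make(chan string, len(names))
	for _, n := range names {
		work <- n
	}
	close(work)

	// Buffered to len(names) so workers never block on send.
	results := make(chan result, len(names))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range work {
				files, err := extract(n)
				results <- result{name: n, files: files, err: err}
			}
		}()
	}
	wg.Wait()
	close(results)

	out := make([]result, 0, len(names))
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	names := []string{"pkg-a", "pkg-b", "pkg-c"}
	res := runPool(names, 2, func(name string) (int, error) {
		return len(name), nil // pretend each package holds len(name) files
	})
	fmt.Printf("extracted %d packages\n", len(res))
}
```

Results arrive in completion order, not input order, so anything that needs a stable listing would have to sort by name afterwards.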

Usage

batchextract -data ./ready-at-dawn/_data -output ./extracted
batchextract -data ./ready-at-dawn/_data -output ./extracted -workers 8
batchextract -data ./ready-at-dawn/_data -output ./extracted -verbose

Addresses the Phase 2 gap from EXTRACTION_PLAN.md: previously, extracting a full _data directory required running evrtools -mode extract once per package.

Test plan

  • Run against a _data directory with multiple manifests
  • Verify output structure matches single-package extraction
  • Test -verbose flag shows per-package file counts
  • Test with -workers 1 (serial) and -workers 16 (parallel)

🤖 Generated with Claude Code

Extracts all packages from a _data directory in one command, using
a configurable worker pool (default: runtime.NumCPU()).

  batchextract -data ./rad/_data -output ./extracted
  batchextract -data ./rad/_data -output ./extracted -workers 8

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 14, 2026 07:17

Copilot AI left a comment

Pull request overview

Adds a new batchextract CLI command to batch-extract every package referenced by manifests in an EVR _data directory, using a configurable parallel worker pool and a simple progress display.

Changes:

  • Introduces cmd/batchextract command with flags for -data, -output, -workers, and -verbose.
  • Enumerates manifest files and extracts packages concurrently via a worker pool.
  • Aggregates results into a final summary with per-package failure reporting.

Comment thread cmd/batchextract/main.go
Comment on lines +23 to +36
var (
	dataDir    string
	outputDir  string
	workers    int
	filterType string
	verbose    bool
)

func init() {
	flag.StringVar(&dataDir, "data", "", "Path to _data directory containing manifests/ and packages/")
	flag.StringVar(&outputDir, "output", "", "Output directory for extracted files")
	flag.IntVar(&workers, "workers", runtime.NumCPU(), "Number of parallel extraction workers")
	flag.StringVar(&filterType, "filter", "", "Only extract files of this type symbol (hex, e.g. beac1969cb7b8861)")
	flag.BoolVar(&verbose, "verbose", false, "Print each package as it is extracted")
Comment thread cmd/batchextract/main.go
Comment on lines +71 to +73
	if err := os.MkdirAll(outputDir, 0755); err != nil {
		return fmt.Errorf("create output directory: %w", err)
	}
Comment thread cmd/batchextract/main.go
Comment on lines +163 to +169
	dest := filepath.Join(outputDir, name)
	if err := os.MkdirAll(dest, 0755); err != nil {
		return 0, fmt.Errorf("create output dir: %w", err)
	}

	if err := pkg.Extract(dest); err != nil {
		return 0, fmt.Errorf("extract: %w", err)
Comment thread cmd/batchextract/main.go
Comment on lines +124 to +131
	} else {
		totalFiles.Add(int64(r.files))
		if verbose {
			fmt.Printf("[%d/%d] %s (%d files)\n", done.Load(), total, r.name, r.files)
		} else {
			fmt.Printf("\r[%d/%d] extracting... ", done.Load(), total)
		}
	}
Comment thread cmd/batchextract/main.go
Comment on lines +83 to +98
	fmt.Printf("Found %d manifests, extracting with %d workers...\n", len(names), workers)
	start := time.Now()

	// Feed work through a channel
	work := make(chan string, len(names))
	for _, n := range names {
		work <- n
	}
	close(work)

	results := make(chan result, len(names))
	var wg sync.WaitGroup

	for range workers {
		wg.Add(1)
		go func() {
Comment thread cmd/batchextract/main.go
			names = append(names, e.Name())
		}
	}

…e empty manifests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@thesprockee thesprockee merged commit 6377a58 into main Mar 14, 2026
2 checks passed
@thesprockee thesprockee deleted the feature/batch-extract branch March 14, 2026 18:09
thesprockee added a commit that referenced this pull request Mar 14, 2026
* origin/main:
  Add batchextract command for parallel package extraction (#6)
  chore: prepare v1.0.0 release
  Add test coverage for analyze and inventory modes
  Add analyze and diff modes to evrtools (#10)
  Add symhash tool and pkg/hash for EVR symbol hash computation (#8)

# Conflicts:
#	cmd/symhash/main.go
#	pkg/hash/hash_test.go