Add repeatable test+benchmark workflow, CI artifact upload, and scheduler benchmarks by igor-holt · Pull Request #16 · Genesis-Conductor-Engine/Instinct

Igor Holt (igor-holt) · 2026-02-01T18:42:48Z

Motivation

Provide a repeatable, logged test and benchmark workflow so scheduler behavior can be measured and archived.

Description

Add ./scripts/run_checks.sh to run tests (go test -count=2) and Go benchmarks (-bench . -benchmem -count=2) and write output to reports/.
Add scheduler benchmarks in agent/internal/scheduler/scheduler_bench_test.go and update Makefile to include a bench target that writes reports/bench.log.
Wire CI to execute ./scripts/run_checks.sh and upload aios-layer/reports as an artifact via the workflow in .github/workflows/ci.yml.
Update README.md and .gitignore to document the new workflow and ignore reports/; add go.mod/go.sum entries pulled in by go mod tidy so tests run cleanly.

Testing

Ran go mod tidy to populate go.sum and resolve dependencies; it completed successfully.
Executed ./scripts/run_checks.sh locally, which ran go test -count=2 and the benchmarks; unit tests (scheduler) passed and benchmarks completed, with logs written to reports/test.log and reports/bench.log.
Measured benchmark results (examples captured during the run): BenchmarkCreateLease ~ 666–676 ns/op, 824 B/op, 5 allocs/op; BenchmarkReapExpired ~ 837–851 ns/op, 808 B/op, 4 allocs/op.


gemini-code-assist · 2026-02-01T18:43:22Z

Summary of Changes

Hello Igor Holt (@igor-holt), I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a repeatable testing and benchmarking workflow. A single script runs both unit tests and performance benchmarks, the GitHub Actions CI pipeline runs that script, and the generated reports are archived as build artifacts. It also adds benchmarks for the GPU scheduler's CreateLease and ReapExpired operations, so changes in their performance show up in the archived reports for the AIOS Layer.

Highlights

Automated Testing & Benchmarking Workflow: Introduced ./scripts/run_checks.sh to standardize how go test and the benchmarks (go test -bench) are run, so results are consistent and repeatable.
CI/CD Integration with Artifact Uploads: The new .github/workflows/ci.yml runs the checks script and uploads the generated reports/ directory as a CI artifact, so each run's test and benchmark output is archived.
Scheduler Performance Benchmarks: Added dedicated Go benchmarks (scheduler_bench_test.go) for the agent/internal/scheduler component, focusing on CreateLease and ReapExpired operations to measure their efficiency.


gemini-code-assist

Code Review

This pull request introduces a significant amount of new functionality, including CI/CD workflows, benchmarking, and the core logic for the AIOS agent. The changes are well-structured. My review focuses on improving correctness, robustness, and performance. Key suggestions include fixing issues in the Makefile and benchmark tests, improving error handling in the CLI and GPU discovery, and addressing code duplication and best practices in the agent's main logic. Overall, this is a solid foundation.

gemini-code-assist · 2026-02-01T18:45:57Z

+
+bench:
+	mkdir -p reports
+	go test ./agent/... -run ^$ -bench . -benchmem -count=2 | tee reports/bench.log


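The bench target pipes the output of go test to tee. Make runs each recipe line with /bin/sh, and a pipeline's exit status is that of its last command, tee, so if go test fails, make still reports success. set -o pipefail fixes this, but not every /bin/sh supports it (older dash releases, the default /bin/sh on Debian and Ubuntu, reject it), so it is safest to set SHELL to bash. The same line has a second problem: make expands $ itself, and a $ followed by a space is read as an empty variable reference, so -run ^$ -bench reaches the shell as -run ^-bench. The -bench flag is swallowed into the -run pattern and the target does not run the benchmarks. Writing the pattern as '^$$' passes a literal ^$ through to go test.

SHELL := /bin/bash

bench:
	mkdir -p reports
	set -o pipefail; go test ./agent/... -run '^$$' -bench . -benchmem -count=2 | tee reports/bench.log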

gemini-code-assist · 2026-02-01T18:45:57Z

+	gpus := []GPU{{Index: 0, Name: "gpu0", MemoryTotal: 1000}}
+	for i := 0; i < b.N; i++ {
+		sched := New(gpus)
+		_, _ = sched.CreateLease("lease", "user", time.Nanosecond)
+		sched.ReapExpired()
+	}


This benchmark includes the scheduler initialization (New) and lease creation inside the loop, so it measures the combined performance of New, CreateLease, and ReapExpired. To isolate the performance of ReapExpired, you should set up the scheduler with an expired lease outside the loop and use b.ResetTimer() before starting the measurement.

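gpus := []GPU{{Index: 0, Name: "gpu0", MemoryTotal: 1000}}
sched := New(gpus)
b.ResetTimer()
for i := 0; i < b.N; i++ {
	b.StopTimer()
	// Setup (untimed): insert a lease that is already expired.
	// Assumes the scheduler keeps leases in an unexported map keyed by ID;
	// writing it directly bypasses any GPU bookkeeping CreateLease performs.
	sched.leases["lease"] = Lease{ID: "lease", ExpiresAt: time.Now().Add(-time.Second)}
	b.StartTimer()
	sched.ReapExpired()
}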

gemini-code-assist · 2026-02-01T18:45:57Z

+func requestLease(agent, user string, duration int, launch bool) {
+	payload := leaseRequest{User: user, DurationSeconds: duration, LaunchRuntime: launch}
+	body, _ := json.Marshal(payload)
+	resp, err := http.Post(agent+"/v1/leases", "application/json", bytes.NewReader(body))
+	if err != nil {
+		fmt.Printf("error: %v\n", err)
+		return
+	}
+	defer resp.Body.Close()
+	data, _ := io.ReadAll(resp.Body)
+	fmt.Println(string(data))
+}


Errors from json.Marshal, http.Post, and io.ReadAll are either ignored or not handled properly. This can lead to silent failures. For a CLI tool, it's crucial to check all errors, print a descriptive message to stderr, and exit with a non-zero status code to signal failure to the user or calling scripts.

func requestLease(agent, user string, duration int, launch bool) {
	payload := leaseRequest{User: user, DurationSeconds: duration, LaunchRuntime: launch}
	body, err := json.Marshal(payload)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error creating request: %v\n", err)
		os.Exit(1)
	}
	resp, err := http.Post(agent+"/v1/leases", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error making request: %v\n", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error reading response: %v\n", err)
		os.Exit(1)
	}
	fmt.Println(string(data))
}

gemini-code-assist · 2026-02-01T18:45:57Z

+func sendInference(endpoint, model, prompt string) {
+	payload := map[string]any{
+		"model": model,
+		"messages": []map[string]string{
+			{"role": "user", "content": prompt},
+		},
+		"max_tokens": 64,
+		"temperature": 0.2,
+	}
+	body, _ := json.Marshal(payload)
+	client := http.Client{Timeout: time.Second * 60}
+	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
+	if err != nil {
+		fmt.Printf("error: %v\n", err)
+		return
+	}
+	defer resp.Body.Close()
+	data, _ := io.ReadAll(resp.Body)
+	fmt.Println(string(data))
+}


Similar to requestLease, errors from json.Marshal, client.Post, and io.ReadAll are not properly handled. This can cause the tool to fail silently or produce no output on error. All potential errors should be checked, with messages printed to stderr and a non-zero exit code on failure.

func sendInference(endpoint, model, prompt string) {
	payload := map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
		"max_tokens":  64,
		"temperature": 0.2,
	}
	body, err := json.Marshal(payload)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error creating request: %v\n", err)
		os.Exit(1)
	}
	client := http.Client{Timeout: time.Second * 60}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error making request: %v\n", err)
		os.Exit(1)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error reading response: %v\n", err)
		os.Exit(1)
	}
	fmt.Println(string(data))
}

gemini-code-assist · 2026-02-01T18:45:57Z

+      - uses: actions/setup-go@v5
+        with:
+          go-version: "1.22"


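actions/setup-go@v5 already caches the Go module and build caches by default (caching has been on by default since v4), so cache: true mainly makes the intent explicit. What likely matters more here is the cache key: by default setup-go hashes go.sum at the repository root, and this module appears to live under aios-layer/, so point cache-dependency-path at that go.sum.

- uses: actions/setup-go@v5
  with:
    go-version: "1.22"
    cache: true
    cache-dependency-path: aios-layer/go.sum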

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Igor Holt <iholt@mymail.aacc.edu>


Copilot · 2026-02-04T13:45:37Z

Igor Holt (@igor-holt) I've opened a new pull request, #17, to work on those changes. Once the pull request is ready, I'll request review from you.

Add benchmarked test workflow for AIOS layer

7e1771d

Igor Holt (igor-holt) added the codex label Feb 1, 2026 — with ChatGPT Codex Connector

gemini-code-assist Bot reviewed Feb 1, 2026


Update aios-layer/agent/internal/scheduler/scheduler.go

f4bf774


Igor Holt (igor-holt) assigned Copilot Feb 4, 2026

Igor Holt (igor-holt) commented Feb 4, 2026


Copilot AI mentioned this pull request Feb 4, 2026

[WIP] Update to address feedback on test and benchmark workflow #17 (draft)

Igor Holt (igor-holt) wants to merge 2 commits into main from codex/design-ai/gpu-os-layer-architecture-l2ddz3
