Add repeatable test+benchmark workflow, CI artifact upload, and scheduler benchmarks by igor-holt · Pull Request #15 · Genesis-Conductor-Engine/Instinct

Igor Holt (igor-holt) · 2026-02-01T18:42:46Z

Motivation

Provide a repeatable, logged test and benchmark workflow so scheduler behavior can be measured and archived.

Description

Add ./scripts/run_checks.sh to run tests (go test -count=2) and Go benchmarks (-bench . -benchmem -count=2) and write output to reports/.
Add scheduler benchmarks in agent/internal/scheduler/scheduler_bench_test.go and update Makefile to include a bench target that writes reports/bench.log.
Wire CI to execute ./scripts/run_checks.sh and upload aios-layer/reports as an artifact via the workflow in .github/workflows/ci.yml.
Update README.md and .gitignore to document the new workflow and ignore reports/; add go.mod/go.sum entries pulled in by go mod tidy so tests run cleanly.

Testing

Ran go mod tidy to populate go.sum and satisfy dependencies, which completed successfully.
Executed ./scripts/run_checks.sh locally, which ran go test -count=2 and the benchmarks; unit tests (scheduler) passed and benchmarks completed, with logs written to reports/test.log and reports/bench.log.
Measured benchmark results (examples captured during the run): BenchmarkCreateLease ~ 666–676 ns/op, 824 B/op, 5 allocs/op; BenchmarkReapExpired ~ 837–851 ns/op, 808 B/op, 4 allocs/op.

Codex Task

gemini-code-assist · 2026-02-01T18:43:24Z

Summary of Changes

Hello Igor Holt (@igor-holt), I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays a robust foundation for continuous quality and performance monitoring within the AIOS Layer project. It introduces a standardized, repeatable workflow for executing both unit tests and Go benchmarks, ensuring consistent evaluation of code changes. This workflow is seamlessly integrated into the project's CI pipeline, automatically running checks and archiving detailed performance reports as artifacts. Furthermore, specific benchmarks have been added for the core scheduler component, enabling precise measurement and tracking of its efficiency over time.

Highlights

Repeatable Test and Benchmark Workflow: A new script (./scripts/run_checks.sh) has been introduced to standardize the execution of Go tests and benchmarks with consistent configurations, ensuring reliable and comparable results across runs.
CI Integration with Artifact Upload: The project's CI pipeline (.github/workflows/ci.yml) now automatically triggers the test and benchmark workflow on code changes, uploading the generated reports/ directory as a CI artifact for easy access to historical performance data and test outcomes.
Dedicated Scheduler Benchmarks: New Go benchmarks (agent/internal/scheduler/scheduler_bench_test.go) have been added to specifically measure the performance of critical scheduler operations, such as CreateLease and ReapExpired, providing targeted performance insights.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | Gemini (@gemini-code-assist) | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on Gemini (@gemini-code-assist) comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a significant amount of new functionality, including a repeatable test and benchmark workflow, CI integration, and scheduler benchmarks. The overall structure is well-organized, and the separation of concerns into different packages is good. I've identified several areas for improvement, particularly around error handling in the CLI, benchmark correctness, and strengthening security aspects like lease ID generation. My review includes specific suggestions to address these points.

gemini-code-assist · 2026-02-01T18:46:09Z

+	for i := 0; i < b.N; i++ {
+		sched := New(gpus)
+		_, _ = sched.CreateLease("lease", "user", time.Second)
+	}


The benchmark setup (creating a new scheduler with New(gpus)) is inside the for loop. This means the benchmark is measuring the setup cost in every iteration, not just the CreateLease call. For an accurate benchmark of CreateLease, the scheduler should be initialized once outside the loop, and b.ResetTimer() should be called after the setup. You will also need to ensure the scheduler's state is reset for each iteration to make the benchmark valid for b.N runs.
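One way to apply this comment is sketched below. The scheduler type, its capacity limit, and the Release method are hypothetical stand-ins (the PR's real scheduler is not shown in this diff); only the shape of the CreateLease call comes from the hunk above. The runFixed helper is also an assumption of the sketch: it pins the iteration count through the test.benchtime flag so the example can run as a plain program.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"testing"
	"time"
)

// lease and scheduler are hypothetical stand-ins for the PR's scheduler
// package; only the shape of CreateLease is taken from the diff.
type lease struct {
	ID, User string
	Expires  time.Time
}

type scheduler struct {
	capacity int
	leases   map[string]lease
}

func newScheduler(capacity int) *scheduler {
	return &scheduler{capacity: capacity, leases: make(map[string]lease)}
}

// CreateLease fails once every slot is taken, so state has to be cleared
// between benchmark iterations.
func (s *scheduler) CreateLease(id, user string, ttl time.Duration) (lease, error) {
	if len(s.leases) >= s.capacity {
		return lease{}, errors.New("no capacity")
	}
	l := lease{ID: id, User: user, Expires: time.Now().Add(ttl)}
	s.leases[id] = l
	return l, nil
}

// Release is an assumed helper used to reset state outside the timed region.
func (s *scheduler) Release(id string) { delete(s.leases, id) }

// BenchmarkCreateLease builds the scheduler once and times only CreateLease.
func BenchmarkCreateLease(b *testing.B) {
	sched := newScheduler(1)
	b.ReportAllocs()
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		if _, err := sched.CreateLease("lease", "user", time.Second); err != nil {
			b.Fatal(err)
		}
		b.StopTimer() // resetting state is not part of the measured work
		sched.Release("lease")
		b.StartTimer()
	}
}

// runFixed runs fn for exactly n iterations so the example finishes quickly.
func runFixed(fn func(*testing.B), n int) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags when not running under "go test"
	if err := flag.Set("test.benchtime", fmt.Sprintf("%dx", n)); err != nil {
		panic(err)
	}
	return testing.Benchmark(fn)
}

func main() {
	res := runFixed(BenchmarkCreateLease, 1000)
	fmt.Println(res.N, "iterations,", res.NsPerOp(), "ns/op")
}
```

Stopping and starting the timer on every iteration adds wall-clock overhead of its own, so if the real scheduler can accept a unique lease ID per iteration, that may be a cheaper way to keep iterations independent.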

gemini-code-assist · 2026-02-01T18:46:09Z

+	for i := 0; i < b.N; i++ {
+		sched := New(gpus)
+		_, _ = sched.CreateLease("lease", "user", time.Nanosecond)
+		sched.ReapExpired()
+	}


This benchmark has two issues. First, like BenchmarkCreateLease, it re-initializes the scheduler in every iteration, which skews the measurement. Second, creating a lease with time.Nanosecond duration doesn't guarantee it will be expired when ReapExpired is called. A better approach is to set up the scheduler outside the loop, use b.ResetTimer(), and inside the loop, use b.StopTimer()/b.StartTimer() to exclude the setup of creating an already expired lease for each iteration.
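The approach described here could look roughly like the following sketch. The scheduler below is a hypothetical stand-in, and two details are assumptions rather than facts about the PR: that a negative TTL produces an already expired lease, and that ReapExpired returns the number of leases it removed. The runFixed helper pins the iteration count so the example runs quickly outside of go test.

```go
package main

import (
	"flag"
	"fmt"
	"testing"
	"time"
)

// scheduler is a hypothetical stand-in for the PR's scheduler type. It maps
// lease IDs to expiry times; the real implementation is not shown in the diff.
type scheduler struct {
	leases map[string]time.Time
}

func newScheduler() *scheduler {
	return &scheduler{leases: make(map[string]time.Time)}
}

// CreateLease records a lease that expires ttl from now. A negative ttl
// yields a lease that is already expired (an assumption of this sketch).
func (s *scheduler) CreateLease(id, user string, ttl time.Duration) (string, error) {
	s.leases[id] = time.Now().Add(ttl)
	return id, nil
}

// ReapExpired deletes expired leases and returns how many were removed.
// The return value is assumed; the diff calls ReapExpired without using one.
func (s *scheduler) ReapExpired() int {
	now := time.Now()
	reaped := 0
	for id, expires := range s.leases {
		if !expires.After(now) {
			delete(s.leases, id)
			reaped++
		}
	}
	return reaped
}

// BenchmarkReapExpired times only the reap; lease creation is untimed.
func BenchmarkReapExpired(b *testing.B) {
	sched := newScheduler()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		b.StopTimer() // per-iteration setup: one lease that is already expired
		if _, err := sched.CreateLease("lease", "user", -time.Second); err != nil {
			b.Fatal(err)
		}
		b.StartTimer()
		if n := sched.ReapExpired(); n != 1 {
			b.Fatalf("reaped %d leases, want 1", n)
		}
	}
}

// runFixed runs fn for exactly n iterations so the example finishes quickly.
func runFixed(fn func(*testing.B), n int) testing.BenchmarkResult {
	testing.Init() // registers the test.* flags when not running under "go test"
	if err := flag.Set("test.benchtime", fmt.Sprintf("%dx", n)); err != nil {
		panic(err)
	}
	return testing.Benchmark(fn)
}

func main() {
	res := runFixed(BenchmarkReapExpired, 1000)
	fmt.Println(res.N, "iterations,", res.NsPerOp(), "ns/op")
}
```

Checking the reaped count inside the loop makes the benchmark fail loudly if the lease was not actually expired, which is the failure mode the time.Nanosecond version could hide.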

gemini-code-assist · 2026-02-01T18:46:09Z

+				writeError(w, http.StatusForbidden, err.Error())
+				return
+			}
+			leaseID := fmt.Sprintf("lease-%d", rand.Intn(1_000_000))


Using rand.Intn for lease IDs is not robust. It can lead to collisions and is not cryptographically secure, which is a concern for a system aiming to be 'secure-by-default'. Consider using a standard UUID library (like github.com/google/uuid) to generate unique and non-sequential IDs. You will need to add the dependency and import the package.

Suggested change

-			leaseID := fmt.Sprintf("lease-%d", rand.Intn(1_000_000))
+			leaseID := uuid.NewString()
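If adding the github.com/google/uuid dependency is not desirable, a standard-library alternative is sketched below. This is not the reviewer's suggestion: it swaps in crypto/rand with hex encoding, and the newLeaseID name and the "lease-" prefix format are illustrative assumptions.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newLeaseID returns "lease-" followed by 32 hex characters derived from
// 16 bytes of cryptographically secure randomness. The name and format are
// assumptions for illustration, not the PR's actual API.
func newLeaseID() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("generating lease ID: %w", err)
	}
	return "lease-" + hex.EncodeToString(buf), nil
}

func main() {
	id, err := newLeaseID()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

With 128 random bits, collisions are not a practical concern, unlike the one-million value space of rand.Intn(1_000_000).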

gemini-code-assist · 2026-02-01T18:46:09Z

+	body, _ := json.Marshal(payload)
+	resp, err := http.Post(agent+"/v1/leases", "application/json", bytes.NewReader(body))
+	if err != nil {
+		fmt.Printf("error: %v\n", err)
+		return
+	}
+	defer resp.Body.Close()
+	data, _ := io.ReadAll(resp.Body)
+	fmt.Println(string(data))


Errors from json.Marshal and io.ReadAll are being ignored by using the blank identifier _. This is a bug and can hide issues like marshalling failures or problems reading the HTTP response body. These errors must be checked and handled properly.

	body, err := json.Marshal(payload)
	if err != nil {
		fmt.Printf("error marshalling request: %v\n", err)
		return
	}
	resp, err := http.Post(agent+"/v1/leases", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Printf("error: %v\n", err)
		return
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("error reading response: %v\n", err)
		return
	}
	fmt.Println(string(data))
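Since the same marshal, post, and read sequence appears in more than one CLI function, the checks could be gathered into one helper. The sketch below is a possible shape, not code from the PR: postJSON and demo are hypothetical names, and the demo uses a throwaway httptest server in place of the real agent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// postJSON marshals payload, POSTs it to url, and returns the response body.
// Every step that can fail returns a wrapped error instead of discarding it.
func postJSON(client *http.Client, url string, payload any) ([]byte, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, fmt.Errorf("marshalling request: %w", err)
	}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("sending request: %w", err)
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, fmt.Errorf("reading response: %w", err)
	}
	return data, nil
}

// demo posts one request to a local echo server and returns the reply.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := io.Copy(w, r.Body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
		}
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	data, err := postJSON(client, srv.URL+"/v1/leases", map[string]string{"user": "alice"})
	return string(data), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Printf("error: %v\n", err)
		return
	}
	fmt.Println(out)
}
```

Passing the client in lets one caller use a default client and another use a client with a longer timeout, without duplicating the error handling.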

gemini-code-assist · 2026-02-01T18:46:09Z

+	body, _ := json.Marshal(payload)
+	client := http.Client{Timeout: time.Second * 60}
+	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
+	if err != nil {
+		fmt.Printf("error: %v\n", err)
+		return
+	}
+	defer resp.Body.Close()
+	data, _ := io.ReadAll(resp.Body)
+	fmt.Println(string(data))


Similar to the requestLease function, errors from json.Marshal and io.ReadAll are being ignored. This can lead to silent failures and makes debugging difficult. Always check and handle errors returned by these functions.

	body, err := json.Marshal(payload)
	if err != nil {
		fmt.Printf("error marshalling request: %v\n", err)
		return
	}
	client := http.Client{Timeout: time.Second * 60}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Printf("error: %v\n", err)
		return
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("error reading response: %v\n", err)
		return
	}
	fmt.Println(string(data))

gemini-code-assist · 2026-02-01T18:46:09Z

+      - uses: actions/setup-go@v5
+        with:
+          go-version: "1.22"


To improve the performance of the CI workflow, consider enabling caching for Go modules. The actions/setup-go action provides a simple way to do this by adding the cache: true option. This will speed up subsequent runs by caching dependencies.

      - uses: actions/setup-go@v5
        with:
          go-version: "1.22"
          cache: true

gemini-code-assist · 2026-02-01T18:46:09Z

+		if len(parts) < 3 {
+			continue
+		}
+		index, err := strconv.Atoi(strings.TrimSpace(parts[0]))
+		if err != nil {
+			continue
+		}
+		name := strings.TrimSpace(parts[1])
+		mem, err := strconv.Atoi(strings.TrimSpace(parts[2]))
+		if err != nil {
+			continue


The current implementation silently skips malformed lines from nvidia-smi. This could hide potential issues with the output format or the command itself. It would be beneficial to log these parsing errors to aid in debugging. You will need to import the log package.

Suggested change

-		if len(parts) < 3 {
-			continue
-		}
-		index, err := strconv.Atoi(strings.TrimSpace(parts[0]))
-		if err != nil {
-			continue
-		}
-		name := strings.TrimSpace(parts[1])
-		mem, err := strconv.Atoi(strings.TrimSpace(parts[2]))
-		if err != nil {
-			continue
+		if len(parts) < 3 {
+			log.Printf("skipping malformed line from nvidia-smi: %q", line)
+			continue
+		}
+		index, err := strconv.Atoi(strings.TrimSpace(parts[0]))
+		if err != nil {
+			log.Printf("failed to parse GPU index from %q: %v", parts[0], err)
+			continue
+		}
+		name := strings.TrimSpace(parts[1])
+		mem, err := strconv.Atoi(strings.TrimSpace(parts[2]))
+		if err != nil {
+			log.Printf("failed to parse GPU memory from %q: %v", parts[2], err)
+			continue
+		}
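To show the logged variant in a form that can be run on its own, here is a minimal sketch of a complete parser. It assumes the nvidia-smi output is comma-separated with index, name, and total memory per line; the parseGPUs name, the separator, and the sample input are assumptions, since the surrounding function is not shown in the diff.

```go
package main

import (
	"fmt"
	"log"
	"strconv"
	"strings"
)

// GPU mirrors the struct shown elsewhere in this review.
type GPU struct {
	Index       int    `json:"index"`
	Name        string `json:"name"`
	MemoryTotal int    `json:"memory_total_mb"`
}

// parseGPUs parses lines of the assumed form "index, name, memory" and logs
// every line it has to skip instead of dropping it silently.
func parseGPUs(output string) []GPU {
	var gpus []GPU
	for _, line := range strings.Split(output, "\n") {
		if strings.TrimSpace(line) == "" {
			continue // blank lines are expected and not worth a log entry
		}
		parts := strings.Split(line, ",")
		if len(parts) < 3 {
			log.Printf("skipping malformed line from nvidia-smi: %q", line)
			continue
		}
		index, err := strconv.Atoi(strings.TrimSpace(parts[0]))
		if err != nil {
			log.Printf("failed to parse GPU index from %q: %v", parts[0], err)
			continue
		}
		name := strings.TrimSpace(parts[1])
		mem, err := strconv.Atoi(strings.TrimSpace(parts[2]))
		if err != nil {
			log.Printf("failed to parse GPU memory from %q: %v", parts[2], err)
			continue
		}
		gpus = append(gpus, GPU{Index: index, Name: name, MemoryTotal: mem})
	}
	return gpus
}

func main() {
	// Hypothetical sample input; the middle line is deliberately malformed.
	sample := "0, NVIDIA A100, 40960\nnot a gpu line\n1, NVIDIA T4, 15360\n"
	for _, g := range parseGPUs(sample) {
		fmt.Printf("%d %s %d\n", g.Index, g.Name, g.MemoryTotal)
	}
}
```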

gemini-code-assist · 2026-02-01T18:46:09Z

+type GPU struct {
+	Index       int    `json:"index"`
+	Name        string `json:"name"`
+	MemoryTotal int    `json:"memory_total_mb"`
+}


The GPU struct is defined here and also in the aios-layer/agent/internal/gpu package. This duplication can lead to maintenance issues if the struct needs to be changed in the future. Consider defining this struct in a shared internal package or having the scheduler package import the gpu package to reuse the definition.

gemini-code-assist · 2026-02-01T18:46:10Z

+		metrics := "# HELP aios_leases Active leases\n# TYPE aios_leases gauge\n"
+		metrics += fmt.Sprintf("aios_leases %d\n", len(sched.ListLeases()))
+		w.Write([]byte(metrics))


Manually constructing the Prometheus metrics exposition format as a string is brittle and hard to maintain or extend. It's recommended to use the official Go client library for Prometheus (prometheus/client_golang). It provides types for metrics (like Gauges, Counters) and handles the formatting correctly, making your metrics implementation more robust.

gemini-code-assist · 2026-02-01T18:46:10Z

+type openAIRequest struct {
+	Model    string   `json:"model"`
+	Messages []string `json:"messages"`
+}


The openAIRequest struct is defined but never used in the code. It should be removed to improve code clarity and maintainability.

Add benchmarked test workflow for AIOS layer

28cf180

Igor Holt (igor-holt) added the codex label Feb 1, 2026 — with ChatGPT Codex Connector

gemini-code-assist Bot reviewed Feb 1, 2026

Igor Holt (igor-holt) wants to merge 1 commit into main from codex/design-ai/gpu-os-layer-architecture-gxvxpb
