Copilot/sub pr 16 by igor-holt · Pull Request #18 · Genesis-Conductor-Engine/Instinct

Igor Holt (igor-holt) · 2026-02-04T13:48:25Z


Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Igor Holt <iholt@mymail.aacc.edu>

gemini-code-assist · 2026-02-04T13:49:11Z

Summary of Changes

Hello Igor Holt (@igor-holt), I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for the AIOS Layer, a system designed to provide a secure and reproducible environment for AI/GPU inference workloads. It encompasses the initial development of a Go-based host agent responsible for managing GPU resources through a lease-based scheduling mechanism and configurable policies. The changes also introduce a command-line interface for user interaction, along with extensive documentation and deployment artifacts to facilitate setup, operation, and future development.

Highlights

Core AIOS Agent Implementation: Introduced the aios-agent in Go, featuring GPU discovery, a lease-based scheduler, and policy enforcement for managing GPU resources.
Comprehensive Tooling and Documentation: Added a CLI (aiosctl), a basic web UI, detailed documentation (architecture, security, roadmap), and scripts for building, testing, and deploying the system (including Dockerfiles and systemd service).
CI/CD and Development Workflow: Established a GitHub Actions CI pipeline, a Makefile for common tasks, and contributing guidelines to streamline development.

Changelog

aios-layer/.github/workflows/ci.yml
- Added a GitHub Actions workflow to automate building, testing, and running checks for the 'aios-layer' component.
aios-layer/.gitignore
- Added entries to ignore compiled binaries (/bin) and test/benchmark reports (/reports).
aios-layer/CONTRIBUTING.md
- Introduced a contributing guide detailing development setup, code style requirements (gofmt), and pull request guidelines.
aios-layer/LICENSE
- Added the MIT License for the project, with a copyright year of 2025.
aios-layer/Makefile
- Created a Makefile to simplify common development operations such as build, test, bench, and run.
aios-layer/README.md
- Provided a comprehensive README for the AIOS Layer MVP, including a project overview, 10-minute quickstart guide, architecture summary, NVIDIA GPU integration details, security notes, and testing instructions.
aios-layer/SECURITY.md
- Established a security policy document covering vulnerability reporting, supported versions, and hardening recommendations for the aios-agent.
aios-layer/agent/internal/config/config.go
- Implemented Go structures and a loading function for managing agent configuration from YAML files, covering server, policy, scheduler, runtime, model, and metrics settings.
aios-layer/agent/internal/gpu/discovery.go
- Developed GPU discovery logic in Go, utilizing nvidia-smi to identify available GPUs and parse their details.
aios-layer/agent/internal/policy/policy.go
- Defined a policy engine in Go to validate GPU lease requests based on user quotas and maximum duration.
aios-layer/agent/internal/runtime/docker.go
- Created a Docker runtime launcher in Go to start model server containers with specified GPU access.
aios-layer/agent/internal/scheduler/scheduler.go
- Implemented a GPU scheduler in Go to manage exclusive GPU leases, track active leases, and automatically reap expired ones.
aios-layer/agent/internal/scheduler/scheduler_bench_test.go
- Added Go benchmark tests for the scheduler's CreateLease and ReapExpired functions to measure performance.
aios-layer/agent/internal/scheduler/scheduler_test.go
- Included Go unit tests for the scheduler, verifying exclusive GPU assignment and the correct reaping of expired leases.
aios-layer/agent/main.go
- Developed the main aios-agent application, which serves HTTP API endpoints for health checks, GPU inventory, lease management (create, list, delete), and Prometheus-compatible metrics.
aios-layer/cli/main.go
- Introduced aiosctl, a command-line interface tool for users to request GPU leases and send inference prompts to the model server.
aios-layer/config/aios.yaml
- Provided a default YAML configuration file for the aios-agent, specifying network settings, policy limits, Docker runtime parameters, and metrics enablement.
aios-layer/deploy/aios-agent.service
- Added a systemd service unit file to enable running the aios-agent as a background service with defined security contexts.
aios-layer/deploy/docker-compose.yml
- Created a Docker Compose file for easily deploying the aios-agent alongside a vllm model server for local development and demonstration.
aios-layer/deploy/security/apparmor.profile
- Included an AppArmor profile to enhance the security of the aios-agent by restricting its system capabilities.
aios-layer/deploy/security/seccomp.json
- Added a seccomp profile to limit the system calls available to the aios-agent, further improving its security posture.
aios-layer/docker/agent.Dockerfile
- Provided a Dockerfile for building a lightweight Docker image for the aios-agent.
aios-layer/docker/vllm.Dockerfile
- Provided a Dockerfile for building a vllm model server Docker image, configured with a default model.
aios-layer/docs/architecture.md
- Documented the architectural overview of the AIOS Layer, distinguishing between control and data planes, trust boundaries, and the policy model.
aios-layer/docs/nvidia_setup.md
- Provided a detailed guide for setting up NVIDIA drivers and the container toolkit on Ubuntu 24.04.
aios-layer/docs/operations.md
- Documented operational aspects including observability (logs, metrics), update strategies, and the plan for signed releases.
aios-layer/docs/roadmap.md
- Outlined the project roadmap, detailing plans for MVP, v1.0, multi-node support, and future platform/hardware integrations.
aios-layer/docs/threat_model.md
- Presented a threat model for the MVP, identifying potential threats, their risks, and proposed mitigations.
aios-layer/go.mod
- Initialized the Go module for the aios-layer project and declared its dependencies.
aios-layer/go.sum
- Generated the Go module checksums for dependency verification.
aios-layer/scripts/benchmark.sh
- Added a shell script to benchmark inference latency and monitor GPU utilization.
aios-layer/scripts/check_prereqs.sh
- Created a script to verify the presence of necessary prerequisites like Docker and nvidia-smi.
aios-layer/scripts/install.sh
- Provided an installation script to deploy the aios-agent as a systemd service.
aios-layer/scripts/run_checks.sh
- Developed a script to execute all Go tests and benchmarks, directing their output to report files.
aios-layer/scripts/run_demo.sh
- Created a script to set up and run the AIOS Layer demo environment using Docker Compose, including building necessary images.
aios-layer/tests/integration/demo.sh
- Added an integration test script to validate the functionality of the running aios-agent and model server by making API calls.
aios-layer/ui/index.html
- Developed a simple web-based dashboard to display GPU inventory and active leases from the aios-agent.
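
The changelog entry for aios-layer/agent/internal/gpu/discovery.go mentions parsing nvidia-smi output. As a rough illustration only, the sketch below parses CSV text of the kind produced by `nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits`. The query fields, the parseGPUs helper, and the MiB unit are assumptions; the GPU field names are taken from the benchmark code quoted later in this review, and the PR's actual parser may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// GPU uses the field names seen in the PR's benchmark code.
type GPU struct {
	Index       int
	Name        string
	MemoryTotal int // assumed to be MiB
}

// parseGPUs is a hypothetical helper that parses lines such as
// "0, NVIDIA A100-SXM4-40GB, 40960".
func parseGPUs(out string) ([]GPU, error) {
	var gpus []GPU
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.Split(line, ",")
		if len(fields) != 3 {
			return nil, fmt.Errorf("unexpected nvidia-smi line %q", line)
		}
		index, err := strconv.Atoi(strings.TrimSpace(fields[0]))
		if err != nil {
			return nil, fmt.Errorf("parse GPU index in %q: %w", line, err)
		}
		mem, err := strconv.Atoi(strings.TrimSpace(fields[2]))
		if err != nil {
			return nil, fmt.Errorf("parse memory in %q: %w", line, err)
		}
		gpus = append(gpus, GPU{Index: index, Name: strings.TrimSpace(fields[1]), MemoryTotal: mem})
	}
	return gpus, nil
}

func main() {
	// Sample text stands in for real nvidia-smi output.
	sample := "0, NVIDIA A100-SXM4-40GB, 40960\n1, NVIDIA A100-SXM4-40GB, 40960\n"
	gpus, err := parseGPUs(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, g := range gpus {
		fmt.Printf("GPU %d: %s (%d MiB)\n", g.Index, g.Name, g.MemoryTotal)
	}
}
```

Keeping the parser separate from the command invocation makes it testable on machines without a GPU.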

Activity

No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.


Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces the aios-layer project, a comprehensive system for managing AI/GPU resources. It includes a Go agent, a CLI, Docker configurations, and documentation. The overall structure is well-organized. My review focuses on improving correctness, robustness, and maintainability. Key areas for improvement include fixing a critical Docker build issue, correcting flawed benchmark tests, improving error handling in the CLI, and addressing potential race conditions and configuration oversights in the agent.

gemini-code-assist · 2026-02-04T13:51:54Z

+COPY go.mod ./
+RUN go mod download
+COPY agent ./agent
+RUN cd agent && go build -o /bin/aios-agent


The command cd agent && go build is not idiomatic here. The Go toolchain locates go.mod by searching parent directories, so the build can still succeed from within the agent subdirectory, but running go build from the WORKDIR (/src), where go.mod is located, with an explicit package path is clearer and does not depend on changing the working directory.

RUN go build -o /bin/aios-agent ./agent

gemini-code-assist · 2026-02-04T13:51:54Z

+func BenchmarkCreateLease(b *testing.B) {
+	gpus := []GPU{{Index: 0, Name: "gpu0", MemoryTotal: 1000}, {Index: 1, Name: "gpu1", MemoryTotal: 1000}}
+	for i := 0; i < b.N; i++ {
+		sched := New(gpus)
+		_, _ = sched.CreateLease("lease", "user", time.Second)
+	}
+}
+
+func BenchmarkReapExpired(b *testing.B) {
+	gpus := []GPU{{Index: 0, Name: "gpu0", MemoryTotal: 1000}}
+	for i := 0; i < b.N; i++ {
+		sched := New(gpus)
+		_, _ = sched.CreateLease("lease", "user", time.Nanosecond)
+		sched.ReapExpired()
+	}
+}


The benchmarks in this file are flawed. They re-create the scheduler inside the benchmark loop (for i := 0; i < b.N; i++), so every iteration also measures the setup cost (New(gpus)), not just the function under test. Simply hoisting the scheduler out of the loop is not enough either: BenchmarkCreateLease would then start benchmarking the error path as soon as all available GPUs are leased. Benchmarks should isolate the code being measured and correctly handle state.

func BenchmarkCreateLease(b *testing.B) {
	gpus := []GPU{{Index: 0, Name: "gpu0", MemoryTotal: 1000}}
	sched := New(gpus)
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		lease, _ := sched.CreateLease("lease", "user", time.Second)
		sched.Release(lease.ID)
	}
}

func BenchmarkReapExpired(b *testing.B) {
	gpus := []GPU{{Index: 0, Name: "gpu0", MemoryTotal: 1000}}
	sched := New(gpus)
	_, _ = sched.CreateLease("lease", "user", time.Nanosecond)
	time.Sleep(time.Millisecond) // Ensure lease is expired
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sched.ReapExpired()
	}
}

gemini-code-assist · 2026-02-04T13:51:54Z

+build:
+	go build -o bin/aios-agent ./agent
+	go build -o bin/aiosctl ./cli


The build target attempts to write binaries to the bin/ directory, but it doesn't ensure this directory exists first. If the bin/ directory is missing, the go build command will fail.

build:
	mkdir -p bin
	go build -o bin/aios-agent ./agent
	go build -o bin/aiosctl ./cli

gemini-code-assist · 2026-02-04T13:51:54Z

+func sendInference(endpoint, model, prompt string) {
+	payload := map[string]any{
+		"model": model,
+		"messages": []map[string]string{
+			{"role": "user", "content": prompt},
+		},
+		"max_tokens": 64,
+		"temperature": 0.2,
+	}
+	body, _ := json.Marshal(payload)
+	client := http.Client{Timeout: time.Second * 60}
+	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
+	if err != nil {
+		fmt.Printf("error: %v\n", err)
+		return
+	}
+	defer resp.Body.Close()
+	data, _ := io.ReadAll(resp.Body)
+	fmt.Println(string(data))
+}


Errors from json.Marshal and io.ReadAll are being ignored. This can hide bugs and lead to unexpected behavior. For a CLI tool, it's crucial to check all errors and report them to the user, preferably on stderr.

func sendInference(endpoint, model, prompt string) {
	payload := map[string]any{
		"model": model,
		"messages": []map[string]string{
			{"role": "user", "content": prompt},
		},
		"max_tokens": 64,
		"temperature": 0.2,
	}
	body, err := json.Marshal(payload)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	client := http.Client{Timeout: time.Second * 60}
	resp, err := client.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	fmt.Println(string(data))
}

gemini-code-assist · 2026-02-04T13:51:54Z

+func requestLease(agent, user string, duration int, launch bool) {
+	payload := leaseRequest{User: user, DurationSeconds: duration, LaunchRuntime: launch}
+	body, _ := json.Marshal(payload)
+	resp, err := http.Post(agent+"/v1/leases", "application/json", bytes.NewReader(body))
+	if err != nil {
+		fmt.Printf("error: %v\n", err)
+		return
+	}
+	defer resp.Body.Close()
+	data, _ := io.ReadAll(resp.Body)
+	fmt.Println(string(data))
+}


Errors from json.Marshal and io.ReadAll are being ignored. This can hide bugs and lead to unexpected behavior. For a CLI tool, it's crucial to check all errors and report them to the user, preferably on stderr.

func requestLease(agent, user string, duration int, launch bool) {
	payload := leaseRequest{User: user, DurationSeconds: duration, LaunchRuntime: launch}
	body, err := json.Marshal(payload)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	resp, err := http.Post(agent+"/v1/leases", "application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %v\n", err)
		return
	}
	fmt.Println(string(data))
}

gemini-code-assist · 2026-02-04T13:51:55Z

+type openAIRequest struct {
+	Model    string   `json:"model"`
+	Messages []string `json:"messages"`
+}


The openAIRequest struct is defined but never used in the code. It should be removed to avoid dead code.

gemini-code-assist · 2026-02-04T13:51:55Z

+func (p Policy) ValidateRequest(existingLeases int, requestedDuration int) error {
+	if p.MaxGPUsPerUser > 0 && existingLeases >= p.MaxGPUsPerUser {
+		return errors.New("gpu quota exceeded")
+	}
+	if p.MaxDurationSec > 0 && requestedDuration > p.MaxDurationSec {
+		return errors.New("requested duration exceeds policy")
+	}
+	if requestedDuration <= 0 {
+		return errors.New("invalid duration")
+	}
+	return nil
+}


The ValidateRequest function returns errors created with errors.New(). It's a better practice to define these as exported error variables. This allows callers to use errors.Is() to check for specific error types and handle them programmatically.

Suggested change

-func (p Policy) ValidateRequest(existingLeases int, requestedDuration int) error {
-	if p.MaxGPUsPerUser > 0 && existingLeases >= p.MaxGPUsPerUser {
-		return errors.New("gpu quota exceeded")
-	}
-	if p.MaxDurationSec > 0 && requestedDuration > p.MaxDurationSec {
-		return errors.New("requested duration exceeds policy")
-	}
-	if requestedDuration <= 0 {
-		return errors.New("invalid duration")
-	}
-	return nil
-}
+var (
+	ErrQuotaExceeded         = errors.New("gpu quota exceeded")
+	ErrDurationExceedsPolicy = errors.New("requested duration exceeds policy")
+	ErrInvalidDuration       = errors.New("invalid duration")
+)
+
+func (p Policy) ValidateRequest(existingLeases int, requestedDuration int) error {
+	if p.MaxGPUsPerUser > 0 && existingLeases >= p.MaxGPUsPerUser {
+		return ErrQuotaExceeded
+	}
+	if p.MaxDurationSec > 0 && requestedDuration > p.MaxDurationSec {
+		return ErrDurationExceedsPolicy
+	}
+	if requestedDuration <= 0 {
+		return ErrInvalidDuration
+	}
+	return nil
+}

gemini-code-assist · 2026-02-04T13:51:55Z

+      - uses: actions/setup-go@v5
+        with:
+          go-version: "1.22"
+      - run: go mod download


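The CI workflow can be optimized by caching Go modules to speed up subsequent builds. actions/setup-go@v5 enables module and build caching by default, but it looks for go.sum at the repository root; because this module lives under aios-layer/, consider setting cache-dependency-path to aios-layer/go.sum. Alternatively, add an explicit actions/cache step for ~/go/pkg/mod and the Go build cache before the go mod download step.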

gemini-code-assist · 2026-02-04T13:51:55Z

+	if err != nil {
+		return nil, errors.New("nvidia-smi not available")
+	}


The error returned when nvidia-smi is not found loses the original error context. Wrapping the error provides more detailed information for debugging, such as why the command failed to run (e.g., not in PATH).

return nil, fmt.Errorf("nvidia-smi not available: %w", err)

gemini-code-assist · 2026-02-04T13:51:55Z

+sleep 5
+curl -s http://127.0.0.1:8080/v1/health


Using a fixed sleep is a fragile way to wait for a service to become available. It can fail on slower systems or under load. It's more robust to use a loop that polls the health endpoint until it succeeds or a timeout is reached.

Suggested change

-sleep 5
-curl -s http://127.0.0.1:8080/v1/health
+echo "Waiting for aios-agent to be healthy..."
+for i in {1..10}; do
+    if curl -s http://127.0.0.1:8080/v1/health | grep -q "ok"; then
+        echo "aios-agent is healthy."
+        break
+    fi
+    if [ "$i" -eq 10 ]; then
+        echo "aios-agent failed to start in time." >&2
+        exit 1
+    fi
+    sleep 1
+done
+curl -s http://127.0.0.1:8080/v1/health

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Igor Holt (igor-holt) and others added 3 commits February 1, 2026 13:42

Add benchmarked test workflow for AIOS layer

7e1771d

Update aios-layer/agent/internal/scheduler/scheduler.go

f4bf774

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Igor Holt <iholt@mymail.aacc.edu>

Initial plan

049ef35

Igor Holt (igor-holt) requested a review from Copilot February 4, 2026 13:48

Igor Holt (igor-holt) assigned Copilot Feb 4, 2026

gemini-code-assist Bot reviewed Feb 4, 2026


Copilot AI reviewed Feb 4, 2026


3 participants