Benchmark load balancing #4
base: main
Conversation
…g gotoni binary on each instance using scp
…ancer.go to use sglang_metrics.go for updating availability state, remove WrapHandler to make Handler() flow more streamlined, update lb.go, and remove claude bloat.
…uest into trie before execution [ ] network policy can use the existing metric polling with timestamps to get and update real time inter-node latency metrics
… TestClusterSGLang test
…istral-7b for non-gated model serving
…er with go->rust bindings. Integrated GetTokenCount into new GORGO Policy for loadbalancer.go. Added FlushSGLangCache() in sglang_metrics.go for use in accurate logistic regression of server response times varying with token count from 100 requests in wildchat_flat.jsonl
…r tokenizer_cgo.go compilation on Linux, and include tokenizer test directly in TestClusterSGLang
… when running TestClusterSGLang. Added endpoint in loadbalancer.go for testing tokenizer functionality
…ter_test.go mods and new testing script.
…glang metrics and loadbalancer event logs
…rest cluster endpoint.
…the load balancer endpoint for metrics
…and updated project structure
…o76/gotoni into benchmark-load-balancing
…ancer/delete binary before reuploading gotoni binary
…o76/gotoni into benchmark-load-balancing Merge changes into benchmark-load-balancing
…o76/gotoni into benchmark-load-balancing hh
…o76/gotoni into benchmark-load-balancing
…o76/gotoni into benchmark-load-balancing dfkj.
Pull request overview
This PR introduces a comprehensive load balancing infrastructure for distributed GPU cluster management. It adds event-driven tracing (Perfetto-compatible), tokenizer sidecar integration (Rust), SGLang cluster management, and multiple load balancing policies (GORGO, prefix-tree, least-loaded). The PR also refactors instance state management to use the Lambda API as the source of truth instead of local database storage.
Changes:
- Added distributed tracing system with Perfetto format support
- Implemented Rust tokenizer sidecar with Unix socket communication
- Created cluster management infrastructure for SGLang deployments
- Added load balancing policies and configuration system
- Removed local DB dependency for instance state, using Lambda API instead
- Added comprehensive benchmarking tools using WildChat-1M dataset
Reviewed changes
Copilot reviewed 39 out of 42 changed files in this pull request and generated 12 comments.
Summary per file:
| File | Description |
|---|---|
| pkg/trace/trace.go | New event-driven tracing system (Perfetto format) |
| pkg/cluster/trace.go | Duplicate of pkg/trace/trace.go |
| pkg/tokenizer/*.go | Tokenizer sidecar management and client |
| pkg/tokenizer/main.rs | Rust tokenizer service with Unix socket |
| pkg/sglang/sglang.go | SGLang Docker deployment and diagnostics |
| pkg/cluster/*.go | Cluster management, metrics, build/deploy |
| pkg/config/constants.go | Centralized configuration constants |
| pkg/remote/client.go | Refactored SSH key lookup from Lambda API |
| cmd/*.go | New commands for cluster, tokenizer, LB management |
| benchmark/*.py | Geo-distributed benchmarking tools |
| go.mod | Added tokenizers dependency (with conflicts) |
| .expect("Failed to download tokenizer.json"); | ||
|
|
||
| if !response.status().is_success() { | ||
| panic!("Failed to download tokenizer: HTTP {}", response.status()); | ||
| } | ||
|
|
||
| let bytes = response.bytes().await | ||
| .expect("Failed to read tokenizer bytes"); | ||
|
|
||
| // Cache the tokenizer | ||
| std::fs::write(TOKENIZER_CACHE_PATH, &bytes) | ||
| .expect("Failed to cache tokenizer"); | ||
|
|
||
| println!("Tokenizer cached to: {}", TOKENIZER_CACHE_PATH); | ||
|
|
||
| Tokenizer::from_bytes(&bytes) | ||
| .expect("Failed to parse tokenizer.json") | ||
| } | ||
|
|
||
| #[tokio::main] | ||
| async fn main() { | ||
| println!("Loading tokenizer model: {}", MODEL_ID); | ||
|
|
||
| // Load the Mistral tokenizer | ||
| let tokenizer: Arc<Tokenizer> = Arc::new(load_or_download_tokenizer().await); | ||
| println!("Tokenizer loaded successfully"); | ||
|
|
||
| // Build the router with shared state | ||
| let tk_for_count = Arc::clone(&tokenizer); | ||
|
|
||
| let app = Router::new() | ||
| .route("/count", post(move |Json(payload): Json<TokenizeRequest>| { | ||
| let tk: Arc<Tokenizer> = Arc::clone(&tk_for_count); | ||
| async move { | ||
| let encoding = tk.encode(payload.text, false).expect("Tokenization failed"); | ||
| Json(TokenizeResponse { | ||
| count: encoding.get_ids().len(), | ||
| }) | ||
| } | ||
| })) | ||
| .route("/health", get(|| async { | ||
| Json(HealthResponse { | ||
| status: "ok".to_string(), | ||
| model: MODEL_ID.to_string(), | ||
| }) | ||
| })); | ||
|
|
||
| // Unix Domain Socket path | ||
| let socket_path = "/tmp/tokenizer.sock"; | ||
|
|
||
| // Clean up old socket if it exists | ||
| let _ = std::fs::remove_file(socket_path); | ||
|
|
||
| println!("Starting tokenizer sidecar on {}", socket_path); | ||
|
|
||
| let listener = UnixListener::bind(socket_path).expect("Failed to bind to socket"); | ||
|
|
||
| // Make socket world-readable/writable | ||
| std::fs::set_permissions(socket_path, std::fs::Permissions::from_mode(0o777)) | ||
| .expect("Failed to set socket permissions"); | ||
|
|
||
| // Serve using hyper with Unix socket | ||
| loop { | ||
| let (stream, _) = listener.accept().await.expect("Accept failed"); |
Copilot AI (Jan 29, 2026):
The tokenizer sidecar uses expect() for error handling in several places (lines 52, 59, 63, 68, 86, 107, 111, 115), which will panic the entire process if the operation fails. For a production service, these should use proper error handling with ? operator or unwrap_or_else() with logging to allow graceful degradation rather than crashing.
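For illustration, a minimal sketch of the non-panicking variant, assuming the same `reqwest`/`tokenizers` crates as the diff; `TOKENIZER_URL` here is a placeholder constant, not necessarily the PR's name for it:

```rust
use tokenizers::Tokenizer;

const TOKENIZER_URL: &str = "https://example.com/tokenizer.json"; // placeholder, not the PR's URL
const TOKENIZER_CACHE_PATH: &str = "/tmp/tokenizer.json";         // placeholder

// Propagate errors with `?` instead of panicking via expect().
async fn load_or_download_tokenizer() -> Result<Tokenizer, Box<dyn std::error::Error>> {
    let response = reqwest::get(TOKENIZER_URL).await?;
    if !response.status().is_success() {
        return Err(format!("download tokenizer: HTTP {}", response.status()).into());
    }
    let bytes = response.bytes().await?;
    std::fs::write(TOKENIZER_CACHE_PATH, &bytes)?; // cache failures now propagate
    Tokenizer::from_bytes(&bytes).map_err(|e| format!("parse tokenizer.json: {e}").into())
}

// In main, the caller can then log and exit cleanly instead of unwinding:
// let tokenizer = match load_or_download_tokenizer().await {
//     Ok(t) => std::sync::Arc::new(t),
//     Err(e) => { eprintln!("tokenizer init failed: {e}"); std::process::exit(1); }
// };
```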
In pkg/remote/client.go:

```go
// RemoveInstanceFromConfig removes an instance from the config
// Deprecated: Instance state is now managed by Lambda API, not local DB.
// This function is kept for backwards compatibility but does nothing meaningful.
func RemoveInstanceFromConfig(instanceID string) error {
	// Instance state is managed by Lambda API, not local DB
	// No action needed - when an instance is terminated via Lambda API,
	// it will no longer appear in ListRunningInstances
	return nil
}
```
Copilot AI (Jan 29, 2026):
The RemoveInstanceFromConfig function is deprecated but returns no error and does nothing. Callers expecting this function to clean up instance state will silently fail. Consider either removing this function entirely or returning an error indicating it's no longer supported to make the deprecation explicit.
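If the explicit-error route is taken, it could look something like this sketch (the sentinel name and error message are illustrative, not from the PR):

```go
package remote

import (
	"errors"
	"fmt"
)

// ErrInstanceStateRemote is an illustrative sentinel, not from the PR.
var ErrInstanceStateRemote = errors.New(
	"instance state is managed by the Lambda API; local config removal is unsupported")

// RemoveInstanceFromConfig fails loudly so callers notice the deprecation
// instead of silently succeeding.
// Deprecated: instance lifecycle is handled entirely by the Lambda API.
func RemoveInstanceFromConfig(instanceID string) error {
	return fmt.Errorf("remove instance %s: %w", instanceID, ErrInstanceStateRemote)
}
```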
In pkg/cluster/tokenizer.go:

```go
	tokenizerMu.RLock()
	available := tokenizerAvailable
	tokenizerMu.RUnlock()

	if !available {
		// Try to reconnect periodically
		go func() {
			tokenizerMu.Lock()
			defer tokenizerMu.Unlock()
			if !tokenizerAvailable {
				tokenizerAvailable = checkTokenizerHealth()
			}
		}()
		return fallbackTokenCount(text)
```
Copilot AI (Jan 29, 2026):
The pkg/cluster/tokenizer.go file has a potential race condition. The tokenizerAvailable flag is read without lock on line 94 and checked again on line 102-103, but another goroutine could modify it between these checks. The reconnection goroutine spawned on line 99 should be synchronized to prevent multiple concurrent reconnection attempts.
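One way to close the race is to make both flags atomic and gate the probe with a compare-and-swap; a hedged sketch (the `reconnecting` guard and the `sidecarTokenCount` name are additions, while `checkTokenizerHealth` and `fallbackTokenCount` are from the snippet above):

```go
package cluster

import "sync/atomic"

var (
	tokenizerAvailable atomic.Bool // replaces the mutex-guarded bool from the snippet
	reconnecting       atomic.Bool // hypothetical guard; not in the PR
)

func countTokens(text string) int {
	if tokenizerAvailable.Load() {
		return sidecarTokenCount(text) // assumed sidecar call from the PR
	}
	// Only one goroutine wins the CAS, so concurrent requests cannot
	// spawn overlapping health probes.
	if reconnecting.CompareAndSwap(false, true) {
		go func() {
			defer reconnecting.Store(false)
			tokenizerAvailable.Store(checkTokenizerHealth())
		}()
	}
	return fallbackTokenCount(text)
}
```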
In pkg/cluster/trace.go:

```go
/*
Copyright © 2025 ALESSIO TONIOLO

trace.go provides event-driven tracing for the load balancer.

Design principles (inspired by PyTorch profiler):
- NOT constantly running - only active when explicitly started
- Event-driven with hooks inserted at key points
- Produces Perfetto-compatible traces (Chrome Trace Format)
- Supports distributed tracing across nodes via header propagation
- Minimal overhead when disabled

Usage:
1. Start trace via endpoint: POST /lb/trace/start
2. Run workload (benchmark)
3. Stop trace via endpoint: POST /lb/trace/stop
4. Returns Perfetto JSON that can be loaded in chrome://tracing or ui.perfetto.dev
*/
package serve

import (
	"encoding/json"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// TraceEvent represents a single event in Chrome Trace Format (Perfetto-compatible)
// See: https://docs.google.com/document/d/1CvAClvFfyA5R-PhYUmn5OOQtYMH4h6I0nSsKchNAySU
type TraceEvent struct {
	Name      string                 `json:"name"`           // Event name
	Cat       string                 `json:"cat"`            // Category (e.g., "lb", "forward", "queue")
	Ph        string                 `json:"ph"`             // Phase: B=begin, E=end, X=complete, i=instant
	Ts        int64                  `json:"ts"`             // Timestamp in microseconds
	Dur       int64                  `json:"dur,omitempty"`  // Duration in microseconds (for ph=X)
	Pid       int                    `json:"pid"`            // Process ID (node ID hash)
	Tid       int                    `json:"tid"`            // Thread ID (request ID hash)
	Args      map[string]interface{} `json:"args,omitempty"` // Additional metadata
	ID        string                 `json:"id,omitempty"`   // For async events (flow)
	Scope     string                 `json:"s,omitempty"`    // Scope for flow events
	BindingPt string                 `json:"bp,omitempty"`   // Binding point for flow events
}

// TraceSession manages an active tracing session
type TraceSession struct {
	ID        string
	StartTime time.Time
	NodeID    string

	events   []TraceEvent
	eventsMu sync.Mutex

	// Atomic flag for fast "is tracing active" check
	active atomic.Bool

	// For distributed tracing - events received from other nodes
	remoteEvents   []TraceEvent
	remoteEventsMu sync.Mutex
}

// Tracer is the global tracer instance
type Tracer struct {
	session   *TraceSession
	sessionMu sync.RWMutex
	nodeID    string
	nodePID   int // Hash of node ID for Perfetto PID
}

// NewTracer creates a new tracer instance
func NewTracer(nodeID string) *Tracer {
	return &Tracer{
		nodeID:  nodeID,
		nodePID: hashString(nodeID) % 10000, // Keep PID small for readability
	}
}

// IsActive returns true if tracing is currently active (fast path)
func (t *Tracer) IsActive() bool {
	t.sessionMu.RLock()
	defer t.sessionMu.RUnlock()
	return t.session != nil && t.session.active.Load()
}

// Start begins a new tracing session
func (t *Tracer) Start() string {
	t.sessionMu.Lock()
	defer t.sessionMu.Unlock()

	sessionID := fmt.Sprintf("trace-%d", time.Now().UnixNano())
	t.session = &TraceSession{
		ID:        sessionID,
		StartTime: time.Now(),
		NodeID:    t.nodeID,
		events:    make([]TraceEvent, 0, 1000),
	}
	t.session.active.Store(true)

	// Record trace start event
	t.session.events = append(t.session.events, TraceEvent{
		Name: "trace_session",
		Cat:  "trace",
		Ph:   "B",
		Ts:   0, // Relative to trace start
		Pid:  t.nodePID,
		Tid:  0,
		Args: map[string]interface{}{
			"session_id": sessionID,
			"node_id":    t.nodeID,
		},
	})

	return sessionID
}

// Stop ends the tracing session and returns the Perfetto JSON
func (t *Tracer) Stop() ([]byte, error) {
	t.sessionMu.Lock()
	defer t.sessionMu.Unlock()

	if t.session == nil {
		return nil, fmt.Errorf("no active trace session")
	}

	t.session.active.Store(false)

	// Record trace end event
	endTs := time.Since(t.session.StartTime).Microseconds()
	t.session.events = append(t.session.events, TraceEvent{
		Name: "trace_session",
		Cat:  "trace",
		Ph:   "E",
		Ts:   endTs,
		Pid:  t.nodePID,
		Tid:  0,
	})

	// Merge remote events
	t.session.eventsMu.Lock()
	t.session.remoteEventsMu.Lock()
	allEvents := make([]TraceEvent, 0, len(t.session.events)+len(t.session.remoteEvents))
	allEvents = append(allEvents, t.session.events...)
	allEvents = append(allEvents, t.session.remoteEvents...)
	t.session.remoteEventsMu.Unlock()
	t.session.eventsMu.Unlock()

	// Build Perfetto trace - just the events array for direct compatibility
	// Perfetto/Chrome Trace Viewer expects a plain array or {"traceEvents": [...]}
	// We output just the array for maximum compatibility

	// Add metadata as a special "M" (metadata) event at the start
	metadataEvent := TraceEvent{
		Name: "process_name",
		Cat:  "__metadata",
		Ph:   "M",
		Ts:   0,
		Pid:  t.nodePID,
		Tid:  0,
		Args: map[string]interface{}{
			"name": t.nodeID,
		},
	}

	// Prepend metadata event
	finalEvents := make([]TraceEvent, 0, len(allEvents)+1)
	finalEvents = append(finalEvents, metadataEvent)
	finalEvents = append(finalEvents, allEvents...)

	// Clear session
	t.session = nil

	return json.MarshalIndent(finalEvents, "", " ")
}

// GetStatus returns current trace status
func (t *Tracer) GetStatus() map[string]interface{} {
	t.sessionMu.RLock()
	defer t.sessionMu.RUnlock()

	if t.session == nil {
		return map[string]interface{}{
			"active":  false,
			"node_id": t.nodeID,
		}
	}

	t.session.eventsMu.Lock()
	eventCount := len(t.session.events)
	t.session.eventsMu.Unlock()

	return map[string]interface{}{
		"active":      t.session.active.Load(),
		"session_id":  t.session.ID,
		"node_id":     t.nodeID,
		"duration_ms": float64(time.Since(t.session.StartTime).Microseconds()) / 1000.0,
		"event_count": eventCount,
		"start_time":  t.session.StartTime.Format(time.RFC3339),
	}
}

// =====================================================
// TRACE HOOKS - Insert these at key points in code
// =====================================================

// TraceRequestReceived records when a request arrives at the LB
func (t *Tracer) TraceRequestReceived(requestID string, method, path, remoteAddr string) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "request",
		Cat:  "lb",
		Ph:   "B", // Begin
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"method":      method,
			"path":        path,
			"remote_addr": remoteAddr,
		},
	})
}

// TraceRequestComplete records when a request finishes
func (t *Tracer) TraceRequestComplete(requestID string, statusCode int, servedBy string) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "request",
		Cat:  "lb",
		Ph:   "E", // End
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"status_code": statusCode,
			"served_by":   servedBy,
		},
	})
}

// TraceQueueEnter records when a request enters the queue
func (t *Tracer) TraceQueueEnter(requestID string, queueDepth int) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "queue",
		Cat:  "queue",
		Ph:   "B",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"queue_depth": queueDepth,
			"action":      "enqueue",
		},
	})
}

// TraceQueueExit records when a request leaves the queue
func (t *Tracer) TraceQueueExit(requestID string, waitTimeMs float64, exitReason string) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "queue",
		Cat:  "queue",
		Ph:   "E",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":   requestID,
			"wait_time_ms": waitTimeMs,
			"exit_reason":  exitReason, // "local_process", "forwarded_to_peer", "timeout", "cancelled"
		},
	})
}

// TraceQueueForwardAttempt records when the queue processor attempts to forward a request
func (t *Tracer) TraceQueueForwardAttempt(requestID string, queueDepth int, peerSelected bool, targetPeer string) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "queue_forward_attempt",
		Cat:  "queue",
		Ph:   "i", // Instant event
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":    requestID,
			"queue_depth":   queueDepth,
			"peer_selected": peerSelected,
			"target_peer":   targetPeer,
		},
	})
}

// TraceQueueProbe records when the queue probe runs
func (t *Tracer) TraceQueueProbe(queueDepth int, peersAvailable int, forwarded int) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "queue_probe",
		Cat:  "queue",
		Ph:   "i", // Instant event
		Tid:  0,   // Use thread 0 for system events
		Args: map[string]interface{}{
			"queue_depth":     queueDepth,
			"peers_available": peersAvailable,
			"forwarded":       forwarded,
		},
	})
}

// TraceForwardStart records when we start forwarding to a peer
// Returns a flow ID for stitching with remote trace
func (t *Tracer) TraceForwardStart(requestID, targetNode, targetIP string) string {
	if !t.IsActive() {
		return ""
	}

	flowID := fmt.Sprintf("%s-%s", requestID, targetNode)

	// Record the forward span start
	t.recordEvent(TraceEvent{
		Name: "forward_to_peer",
		Cat:  "forward",
		Ph:   "B",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"target_node": targetNode,
			"target_ip":   targetIP,
			"flow_id":     flowID,
		},
	})

	// Record flow event (arrow in Perfetto)
	t.recordEvent(TraceEvent{
		Name:  "forward_flow",
		Cat:   "forward",
		Ph:    "s", // Flow start
		ID:    flowID,
		Tid:   hashString(requestID) % 1000,
		Scope: "forward",
	})

	return flowID
}

// TraceForwardEnd records when forwarding completes
func (t *Tracer) TraceForwardEnd(requestID string, statusCode int, durationMs float64) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "forward_to_peer",
		Cat:  "forward",
		Ph:   "E",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"status_code": statusCode,
			"duration_ms": durationMs,
		},
	})
}

// TraceLocalProcessStart records the start of local SGLang processing
func (t *Tracer) TraceLocalProcessStart(requestID string) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "local_sglang",
		Cat:  "process",
		Ph:   "B",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id": requestID,
		},
	})
}

func (t *Tracer) TraceLocalProcessEnd(requestID string, statusCode int) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "local_sglang",
		Cat:  "process",
		Ph:   "E",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"status_code": statusCode,
		},
	})
}

// TracePolicyDecision records a policy decision (instant event)
func (t *Tracer) TracePolicyDecision(requestID, policy, decision string, metadata map[string]interface{}) {
	if !t.IsActive() {
		return
	}

	args := map[string]interface{}{
		"request_id": requestID,
		"policy":     policy,
		"decision":   decision,
	}
	for k, v := range metadata {
		args[k] = v
	}

	t.recordEvent(TraceEvent{
		Name: "policy_decision",
		Cat:  "policy",
		Ph:   "i", // Instant event
		Tid:  hashString(requestID) % 1000,
		Args: args,
	})
}

// TraceCapacityCheck records capacity check result
func (t *Tracer) TraceCapacityCheck(requestID string, hasCapacity bool, runningReqs, waitingReqs int) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "capacity_check",
		Cat:  "lb",
		Ph:   "i",
		Tid:  hashString(requestID) % 1000,
		Args: map[string]interface{}{
			"request_id":   requestID,
			"has_capacity": hasCapacity,
			"running_reqs": runningReqs,
			"waiting_reqs": waitingReqs,
		},
	})
}

// TraceTokenization records how long tokenization took (critical for GORGO overhead)
func (t *Tracer) TraceTokenization(requestID string, textLen, tokenCount int, durationUs int64) {
	if !t.IsActive() {
		return
	}

	t.recordEvent(TraceEvent{
		Name: "tokenize",
		Cat:  "gorgo",
		Ph:   "X", // Complete event with duration
		Tid:  hashString(requestID) % 1000,
		Dur:  durationUs,
		Args: map[string]interface{}{
			"request_id":  requestID,
			"text_len":    textLen,
			"token_count": tokenCount,
			"duration_us": durationUs,
		},
	})
}

// TraceGORGODecision records GORGO's routing decision with cost calculation
func (t *Tracer) TraceGORGODecision(requestID string, decision string, metadata map[string]interface{}) {
	if !t.IsActive() {
		return
	}

	args := map[string]interface{}{
		"request_id": requestID,
		"decision":   decision,
	}
	for k, v := range metadata {
		args[k] = v
	}

	t.recordEvent(TraceEvent{
		Name: "gorgo_decision",
		Cat:  "gorgo",
		Ph:   "i",
		Tid:  hashString(requestID) % 1000,
		Args: args,
	})
}

// =====================================================
// DISTRIBUTED TRACING - Receive events from other nodes
// =====================================================

// ReceiveRemoteEvents merges trace events from a forwarded request
func (t *Tracer) ReceiveRemoteEvents(events []TraceEvent) {
	if !t.IsActive() {
		return
	}

	t.sessionMu.RLock()
	session := t.session
	t.sessionMu.RUnlock()

	if session == nil {
		return
	}

	session.remoteEventsMu.Lock()
	session.remoteEvents = append(session.remoteEvents, events...)
	session.remoteEventsMu.Unlock()
}

// GetTraceHeaders returns headers to propagate trace context
func (t *Tracer) GetTraceHeaders(requestID string) map[string]string {
	if !t.IsActive() {
		return nil
	}

	t.sessionMu.RLock()
	defer t.sessionMu.RUnlock()

	if t.session == nil {
		return nil
	}

	return map[string]string{
		"X-Trace-ID":      t.session.ID,
		"X-Trace-Request": requestID,
		"X-Trace-Origin":  t.nodeID,
	}
}

// =====================================================
// INTERNAL HELPERS
// =====================================================

func (t *Tracer) recordEvent(event TraceEvent) {
	t.sessionMu.RLock()
	session := t.session
	t.sessionMu.RUnlock()

	if session == nil || !session.active.Load() {
		return
	}

	// Set timestamp relative to trace start
	event.Ts = time.Since(session.StartTime).Microseconds()
	event.Pid = t.nodePID

	session.eventsMu.Lock()
	session.events = append(session.events, event)
	session.eventsMu.Unlock()
}

// hashString creates a simple hash for consistent thread IDs
func hashString(s string) int {
	h := 0
	for _, c := range s {
		h = 31*h + int(c)
	}
	if h < 0 {
		h = -h
	}
	return h
}
```
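For reference, the lifecycle above can be driven directly from Go as well as via the HTTP endpoints; a minimal sketch using only the exported API shown in this file:

```go
func exampleTraceSession() ([]byte, error) {
	tracer := NewTracer("node-a")
	tracer.Start()

	// Hooks are cheap no-ops unless a session is active.
	tracer.TraceRequestReceived("req-1", "POST", "/v1/completions", "10.0.0.2:55000")
	tracer.TraceRequestComplete("req-1", 200, "node-a")

	// Stop returns the Perfetto-compatible JSON, loadable in ui.perfetto.dev.
	return tracer.Stop()
}
```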
Copilot AI (Jan 29, 2026):
The pkg/trace/trace.go and pkg/cluster/trace.go files are identical duplicates (573 lines each). This violates the DRY principle and creates maintenance burden. One should be removed and the other should be imported where needed.
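If pkg/trace keeps the canonical copy, pkg/cluster could shrink to a thin re-export instead of a 573-line duplicate; a hedged sketch (the module path is guessed from the repository name, and since the diff itself declares `package serve`, the package layout here is an assumption):

```go
package cluster

import "github.com/atoniolo76/gotoni/pkg/trace" // guessed module path

// Re-export the shared types so existing callers keep compiling.
type (
	TraceEvent = trace.TraceEvent
	Tracer     = trace.Tracer
)

func NewTracer(nodeID string) *Tracer { return trace.NewTracer(nodeID) }
```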
In pkg/tokenizer/main.rs:

```rust
    std::fs::set_permissions(socket_path, std::fs::Permissions::from_mode(0o777))
        .expect("Failed to set socket permissions");
```
Copilot AI (Jan 29, 2026):
In the Rust tokenizer sidecar, the socket permissions are set to 0o777 (world-readable/writable). This is a security risk as any user on the system can connect to the tokenizer socket and potentially cause denial of service or other issues. Consider using more restrictive permissions like 0o660 and ensuring proper group ownership.
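The tightened variant is a one-line change; a sketch assuming the same `socket_path` as the diff:

```rust
use std::os::unix::fs::PermissionsExt;

// Owner/group read-write only; pair with chown/chgrp to the service group
// so the load balancer process can still connect.
std::fs::set_permissions(socket_path, std::fs::Permissions::from_mode(0o660))
    .expect("Failed to set socket permissions");
```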
In benchmark/geo_proxy.py (outdated):

```python
            )

        except Exception as e:
            status = "error"
```
Copilot AI (Jan 29, 2026):
Variable status is not used.
In benchmark/geo_proxy.py (outdated):

```python
            self.logger.error(f"[{request_id}] Request failed: {e}")
            return web.json_response(
                {"error": {"message": str(e), "type": "server_error"}},
```
Copilot AI (Jan 29, 2026):
Variable error_msg is not used.
⬇️ Suggested change

```diff
-            self.logger.error(f"[{request_id}] Request failed: {e}")
-            return web.json_response(
-                {"error": {"message": str(e), "type": "server_error"}},
+            self.logger.error(f"[{request_id}] Request failed: {error_msg}")
+            return web.json_response(
+                {"error": {"message": error_msg, "type": "server_error"}},
```
In benchmark/analyze_trace.py:

```python
import sys
import argparse
from collections import defaultdict
from typing import Dict, List, Tuple, Optional
```
Copilot AI (Jan 29, 2026):
Import of 'Optional' is not used.
⬇️ Suggested change

```diff
-from typing import Dict, List, Tuple, Optional
+from typing import Dict, List, Tuple
```
In benchmark/geo_proxy.py:

```python
import time
import uuid
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
```
Copilot AI (Jan 29, 2026):
Import of 'Optional' is not used.
Import of 'List' is not used.
⬇️ Suggested change

```diff
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Dict, Tuple
```
In benchmark/run_guidellm_geo.sh:

```bash
# Build GuideLLM command
GUIDELLM_CMD="guidellm benchmark \
  --target http://localhost:$PROXY_PORT/v1 \
  --profile $PROFILE \
  --max-seconds $DURATION \
  --data $SCRIPT_DIR/wildchat_guidellm.jsonl \
  --data-column-mapper '{\"text_column\": \"prompt\"}' \
  --output-path $OUTPUT_DIR"

# Add rate for profiles that need it
if [ "$PROFILE" = "poisson" ] || [ "$PROFILE" = "constant" ] || [ "$PROFILE" = "concurrent" ] || [ "$PROFILE" = "throughput" ]; then
  GUIDELLM_CMD="$GUIDELLM_CMD --rate $RATE"
fi

echo "Running: $GUIDELLM_CMD"
echo ""

eval $GUIDELLM_CMD
```
Copilot AI (Jan 29, 2026):
The use of eval $GUIDELLM_CMD with command-line-controlled variables like PROFILE, RATE, DURATION, and OUTPUT_DIR creates a command injection risk: a crafted argument (for example, --profile 'poisson; rm -rf ~') would cause arbitrary shell commands to execute in the context of whoever runs this script. To prevent this, avoid eval entirely and invoke guidellm directly using a properly quoted argument list (e.g., building an array and calling it without eval) so user-supplied values cannot break out of their argument positions.
⬇️ Suggested change

```diff
-# Build GuideLLM command
-GUIDELLM_CMD="guidellm benchmark \
-  --target http://localhost:$PROXY_PORT/v1 \
-  --profile $PROFILE \
-  --max-seconds $DURATION \
-  --data $SCRIPT_DIR/wildchat_guidellm.jsonl \
-  --data-column-mapper '{\"text_column\": \"prompt\"}' \
-  --output-path $OUTPUT_DIR"
-
-# Add rate for profiles that need it
-if [ "$PROFILE" = "poisson" ] || [ "$PROFILE" = "constant" ] || [ "$PROFILE" = "concurrent" ] || [ "$PROFILE" = "throughput" ]; then
-  GUIDELLM_CMD="$GUIDELLM_CMD --rate $RATE"
-fi
-
-echo "Running: $GUIDELLM_CMD"
-echo ""
-
-eval $GUIDELLM_CMD
+# Build GuideLLM command as an argument array to avoid command injection
+GUIDELLM_ARGS=(
+  benchmark
+  --target "http://localhost:$PROXY_PORT/v1"
+  --profile "$PROFILE"
+  --max-seconds "$DURATION"
+  --data "$SCRIPT_DIR/wildchat_guidellm.jsonl"
+  --data-column-mapper '{"text_column": "prompt"}'
+  --output-path "$OUTPUT_DIR"
+)
+
+# Add rate for profiles that need it
+if [ "$PROFILE" = "poisson" ] || [ "$PROFILE" = "constant" ] || [ "$PROFILE" = "concurrent" ] || [ "$PROFILE" = "throughput" ]; then
+  GUIDELLM_ARGS+=( --rate "$RATE" )
+fi
+
+# Show the command being run
+printf 'Running: guidellm'
+for arg in "${GUIDELLM_ARGS[@]}"; do
+  printf ' %q' "$arg"
+done
+printf '\n\n'
+
+guidellm "${GUIDELLM_ARGS[@]}"
```
Sounds good, thanks Copilot.
1. Get rid of the entire benchmark folder for this PR.
2. Productionize the tokenizer policies: add graceful error handling, restrict socket access to local processes only, and fix the race condition when two concurrent requests connect.
3. Get rid of RemoveInstanceFromConfig.
…o76/gotoni into benchmark-load-balancing
Adds a new CLI command to visualize network latencies between cluster nodes:
- `gotoni lb latency --all` shows latencies from all nodes to their peers
- `--matrix` flag displays as a latency matrix for easier comparison
- Helps understand inter-region transport costs for GORGO/GORGO2 decisions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SSH keys are now looked up directly from the ~/.ssh directory by name, without requiring the local database. The Lambda API provides the SSH key name for each instance, and we find the corresponding .pem file in the user's ssh directory.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add pkg/proxy/proxy.go: Centralized HTTP proxy with GORGO-style routing
  - Global request tracking across all SGLang servers
  - Cost calculation (sketched below): queued_tokens * ms_per_token + running_tokens * ms_per_token * 0.5
  - EWMA latency tracking with dedicated probing
  - Prefix tree for KV cache-aware routing
- Add cmd/proxy.go: CLI commands for proxy management
  - proxy start/stop/status: Local proxy control
  - proxy deploy: Build, upload, and start proxy on remote instance
  - proxy servers add/remove: Manage SGLang server pool
  - proxy tune: Adjust GORGO parameters at runtime
  - proxy latency: Display latencies to all servers
  - --remote flag: Resolve instance names via Lambda API

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
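The commit's cost formula is simple enough to pin down; a minimal sketch (function and parameter names are illustrative, not from the PR):

```go
// costMs scores a server using the formula from the commit message:
// queued tokens count at full per-token latency, running tokens at half,
// presumably because running requests are already partially complete.
func costMs(queuedTokens, runningTokens int, msPerToken float64) float64 {
	return float64(queuedTokens)*msPerToken + float64(runningTokens)*msPerToken*0.5
}
```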
…+ lb installation, Update README for Research
Updated README to include './' before commands for consistency.
…equests, updated constants and README
…equests, and updated README
…o76/gotoni into benchmark-load-balancing
Create infrastructure for clusters, load balancing, tokenizers, and more. Overhaul the existing DB system with stateless calls to Lambda's backend. Add commands for interacting with the components specific to GORGO load balancing. Add policies such as prefix-tree and least-loaded for inference users. Integrate @Orgo as a provider.