[Audit][High] Mesh build failure leaves chunk stuck in invalid state #349

@github-actions

🔍 Module Scanned

src/world/ (automated audit scan)

📝 Summary

In processMeshJob(), when buildWithNeighbors() fails, the error is caught and logged, but execution continues and the chunk is still marked .mesh_ready with corrupt or missing mesh data. The chunk is not re-meshed until the player moves far enough away for it to be unloaded.

📍 Location

  • File: src/world/world_streamer.zig:482-490
  • Function/Scope: processMeshJob

🔴 Severity: High

  • Critical: Crashes, data corruption, security vulnerabilities, GPU device loss
  • High: Memory leaks, race conditions, incorrect rendering, broken features
  • Medium: Performance degradation, missing error handling, suboptimal patterns
  • Low: Code style, dead code, minor improvements

💥 Impact

When mesh building fails (e.g., due to memory allocation failure in the greedy mesher or boundary module), the chunk state transitions to .mesh_ready even though the mesh data is incomplete or corrupt. This causes:

  • Visual artifacts: missing chunks or corrupt geometry visible to players
  • Potential crashes during GPU upload when processing invalid vertex data
  • The chunk stays stuck in this state until the player moves out of render distance
  • Hard-to-reproduce bugs in production when memory pressure occurs

🔎 Evidence

// src/world/world_streamer.zig lines 482-490
if (chunk_data.chunk.state == .meshing and chunk_data.chunk.job_token == job.data.chunk.job_token) {
    chunk_data.mesh.buildWithNeighbors(&chunk_data.chunk, neighbors, self.atlas) catch |err| {
        log.log.errWithTrace("Mesh build failed for chunk ({}, {}): {}", .{ cx, cz, err });
    };  // ERROR: Execution continues after catch!
    
    if (self.mesh_queue.abort_worker) {
        chunk_data.chunk.state = .generated;
        return;
    }
    chunk_data.chunk.state = .mesh_ready;  // BUG: Always set to ready even if build failed
}

Problem: The catch block only logs the error; it does not return or otherwise change control flow. After the error is logged, execution falls through to line 490, which sets the chunk state to .mesh_ready whenever abort_worker is not set. The chunk is then marked as ready for rendering and upload even though its mesh data is invalid.
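The fall-through can be reproduced in isolation. The following is a minimal sketch, not the project's code: ChunkState, failingBuild, and processBuggy are hypothetical stand-ins that only mirror the shape of the snippet above.

```zig
const std = @import("std");

// Hypothetical stand-in for the real chunk state enum.
const ChunkState = enum { generated, meshing, mesh_ready };

// Hypothetical build step that always fails, standing in for a
// buildWithNeighbors() call that hits an allocation failure.
fn failingBuild() error{OutOfMemory}!void {
    return error.OutOfMemory;
}

// Same shape as the buggy code: the catch body handles the error but
// does not leave the function, so the assignment below still runs.
fn processBuggy(state: *ChunkState) void {
    failingBuild() catch {
        // The real code logs here; nothing changes control flow.
    };
    state.* = .mesh_ready;
}

test "catch without return falls through to mesh_ready" {
    var state: ChunkState = .meshing;
    processBuggy(&state);
    try std.testing.expectEqual(ChunkState.mesh_ready, state);
}
```

Run with zig test; the test passes, which confirms that a catch body without a return lets the function continue to the state assignment.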

🛠️ Proposed Fix

Set chunk state back to .generated on error so it can be retried, and only transition to .mesh_ready on success:

if (chunk_data.chunk.state == .meshing and chunk_data.chunk.job_token == job.data.chunk.job_token) {
    chunk_data.mesh.buildWithNeighbors(&chunk_data.chunk, neighbors, self.atlas) catch |err| {
        log.log.errWithTrace("Mesh build failed for chunk ({}, {}): {}", .{ cx, cz, err });
        // Reset to generated state so it can be retried
        chunk_data.chunk.state = .generated;
        return;
    };
    
    if (self.mesh_queue.abort_worker) {
        chunk_data.chunk.state = .generated;
        return;
    }
    chunk_data.chunk.state = .mesh_ready;
}

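Alternative ordering that checks abort_worker before starting the build, so an aborting worker skips the build entirely. This variant still handles the error with catch rather than propagating it with try, and an abort requested while the build is running is no longer detected before the state becomes .mesh_ready: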

if (chunk_data.chunk.state == .meshing and chunk_data.chunk.job_token == job.data.chunk.job_token) {
    if (self.mesh_queue.abort_worker) {
        chunk_data.chunk.state = .generated;
        return;
    }
    
    chunk_data.mesh.buildWithNeighbors(&chunk_data.chunk, neighbors, self.atlas) catch |err| {
        log.log.errWithTrace("Mesh build failed for chunk ({}, {}): {}", .{ cx, cz, err });
        chunk_data.chunk.state = .generated;
        return;
    };
    
    chunk_data.chunk.state = .mesh_ready;
}

✅ Acceptance Criteria

  • When buildWithNeighbors() fails, chunk state is reset to .generated instead of .mesh_ready
  • Failed chunks are automatically re-queued for meshing on the next update cycle
  • Error is still logged with chunk coordinates for debugging
  • Unit test added to verify error handling path (mock a failing mesh build)
  • All unit tests in src/tests.zig pass
  • No regressions in chunk loading/visual stability during gameplay
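One way to cover the error-handling criterion is to inject the build step so a test can force it to fail. This is only a sketch under stated assumptions: Chunk, BuildFn, finishMeshJob, okBuild, and failingBuild are hypothetical names, and the real processMeshJob would need a comparable seam (or a mockable mesh type) for this to apply.

```zig
const std = @import("std");

// Hypothetical, simplified stand-ins; the real Chunk and mesh types differ.
const ChunkState = enum { generated, meshing, mesh_ready };
const Chunk = struct { state: ChunkState };
const BuildFn = *const fn () anyerror!void;

fn okBuild() anyerror!void {}

fn failingBuild() anyerror!void {
    return error.OutOfMemory;
}

// Control flow of the proposed fix, with the build step injected so a
// test can substitute a failing implementation.
fn finishMeshJob(chunk: *Chunk, build: BuildFn, abort_worker: bool) void {
    if (chunk.state != .meshing) return;
    build() catch {
        chunk.state = .generated; // reset so the chunk can be retried
        return;
    };
    if (abort_worker) {
        chunk.state = .generated;
        return;
    }
    chunk.state = .mesh_ready;
}

test "failed build resets chunk to generated" {
    var chunk = Chunk{ .state = .meshing };
    finishMeshJob(&chunk, &failingBuild, false);
    try std.testing.expectEqual(ChunkState.generated, chunk.state);
}

test "successful build marks chunk mesh_ready" {
    var chunk = Chunk{ .state = .meshing };
    finishMeshJob(&chunk, &okBuild, false);
    try std.testing.expectEqual(ChunkState.mesh_ready, chunk.state);
}
```

The sketch omits the job_token comparison and logging; a real test would also want to assert that the chunk is picked up again by the meshing queue, which depends on how the streamer schedules .generated chunks.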

📚 References

  • Chunk state machine documentation in world_streamer.zig lines 6-22
  • Related error handling pattern in processGenJob() (lines 393-429) for comparison
  • Similar issue patterns found in lod_manager.zig lines 964-965 (audit recommended)

Labels: automated-audit (Issues found by automated opencode audit scans)
