Fix scale issues when listing results with tens of thousands of jobs #117
daniel-thom merged 19 commits into main
The command `torc results list <id>` timed out when a workflow had tens of thousands of results.
Pull request overview
Adds a 100K-job scale test fixture and tightens/standardizes authorization handling across HTTP endpoints to improve correctness and scalability when operating on large workflows.
Changes:
- Added 100K-job `scale_test/` workflow documentation and fixtures.
- Centralized access control checks via `authorize_*` macros and expanded API response enums with `Forbidden`/`NotFound` variants.
- Updated integration tests to reflect stricter access control behavior and corrected broken user-data/missing-resource scenarios.
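To illustrate the second change, here is a minimal sketch of how a macro-based authorization check can return early with `Forbidden` or `NotFound` variants. Every name in it (`authorize_workflow!`, `check_workflow_access`, `AccessCheck`, `GetJobResponse`) is hypothetical and the access rule is a toy; the actual macros in `src/server/http_server.rs` are not shown on this page and may look quite different.

```rust
// Hypothetical response enum with the added Forbidden/NotFound variants.
#[derive(Debug, PartialEq)]
enum GetJobResponse {
    Ok(String),
    Forbidden(String),
    NotFound(String),
}

// Hypothetical result of an access check.
#[derive(Debug, PartialEq)]
enum AccessCheck {
    Allowed,
    Denied(String),
    Missing(String),
}

// Toy rule (an assumption for this sketch): only workflow 1 exists,
// and only "alice" may access it.
fn check_workflow_access(user: &str, workflow_id: i64) -> AccessCheck {
    if workflow_id != 1 {
        return AccessCheck::Missing(format!("workflow {} not found", workflow_id));
    }
    if user == "alice" {
        AccessCheck::Allowed
    } else {
        AccessCheck::Denied(format!(
            "user {} cannot access workflow {}",
            user, workflow_id
        ))
    }
}

// Centralizes the check: on failure, the enclosing handler returns the
// matching variant of its own response enum.
macro_rules! authorize_workflow {
    ($user:expr, $workflow_id:expr, $resp:ident) => {
        match check_workflow_access($user, $workflow_id) {
            AccessCheck::Allowed => {}
            AccessCheck::Denied(reason) => return $resp::Forbidden(reason),
            AccessCheck::Missing(reason) => return $resp::NotFound(reason),
        }
    };
}

// A handler only needs one line to enforce access control.
fn get_job(user: &str, workflow_id: i64) -> GetJobResponse {
    authorize_workflow!(user, workflow_id, GetJobResponse);
    GetJobResponse::Ok(format!("job data for workflow {}", workflow_id))
}
```

With this shape, every endpoint that invokes the macro maps denied and missing resources to the same variants, which is one plausible reason to expand each response enum consistently.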
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/workflows/scale_test/output1/job_stdio/* | Adds many per-job stdio fixture outputs for the 100K scale test |
| tests/workflows/scale_test/README.md | Documents how to run and interpret the 100K-job scale test |
| tests/workflows/README.md | Adds scale_test/ entry to workflow test index |
| tests/test_user_data.rs | Fixes broken test by creating placeholder user_data rows and using lifecycle APIs |
| tests/test_slurm_regenerate.rs | Updates expectation: nonexistent workflow regenerate should return 404 |
| tests/test_jobs.rs | Clarifies status immutability test naming/messages |
| tests/test_access_groups.rs | Adjusts tests to authenticate workflow creation and adds resource-level access tests |
| tests/common.rs | Adds htpasswd users used by new resource-access tests |
| src/server/http_server.rs | Introduces authorization/error macros and applies workflow/job/resource authorization broadly |
| src/server/authorization.rs | Adds InternalError, resource-level checks, and a resource table whitelist |
| src/server/api_types.rs | Expands many API response enums with Forbidden/NotFound variants |
| src/server/api/user_data.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/schedulers.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/results.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/resource_requirements.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/files.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/events.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/compute_nodes.rs | Propagates Forbidden responses for update/delete flows |
| src/server/api/access_groups.rs | Populates new reason field in AccessCheckResponse |
| src/models.rs | Adds optional reason to AccessCheckResponse |
The per-job unblock was far too slow in large workflows. It caused runners to time out and exit.
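This page does not show what replaced the per-job unblock. One common alternative is to handle a whole batch of completed jobs in a single pass, sketched below under that assumption. The function `unblock_batch` and the in-memory `blocked_by` map are hypothetical stand-ins for whatever storage torc actually uses.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical batch unblock: removes every completed job from the
/// dependency sets in one pass and returns the jobs that became ready.
/// `blocked_by` maps a blocked job id to the job ids it still waits on.
fn unblock_batch(
    blocked_by: &mut HashMap<u64, HashSet<u64>>,
    completed: &[u64],
) -> Vec<u64> {
    // Build a set once so each membership test is O(1).
    let done: HashSet<u64> = completed.iter().copied().collect();

    let mut ready = Vec::new();
    for (job, deps) in blocked_by.iter_mut() {
        // Drop all satisfied dependencies for this job at once.
        deps.retain(|dep| !done.contains(dep));
        if deps.is_empty() {
            ready.push(*job);
        }
    }

    // Ready jobs are no longer blocked.
    for job in &ready {
        blocked_by.remove(job);
    }

    // Sort for a deterministic result; HashMap iteration order is arbitrary.
    ready.sort_unstable();
    ready
}
```

The point of the sketch is that the cost is one traversal per batch rather than one traversal per completed job, which matters when a workflow has on the order of 100,000 jobs.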
The previous behavior was causing timeouts when initializing a workflow with 100,000 jobs.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated no new comments.