


gmarciani commented Apr 2, 2025

Description

Fix the retrieval of job info by making the SSM command store its output in CloudWatch Logs, which prevents the output from being truncated.
This change fixes #376
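A minimal sketch of the mechanism, assuming boto3-style clients: `ssm.send_command` is called with `CloudWatchOutputConfig` so stdout goes to CloudWatch Logs, and the full output is read back with `logs.get_log_events`, following `nextForwardToken` until it stops changing. The log group name, the helper names, and the stdout stream naming scheme (`<command-id>/<instance-id>/<plugin>/stdout`) are assumptions for illustration, not taken from the PCUI code. The clients are passed in as arguments, so the functions run without AWS access.

```python
from typing import Any, Dict, List, Optional

# Hypothetical log group name; PCUI's actual configuration may differ.
DEFAULT_LOG_GROUP = "/hypothetical/pcui/ssm-command-output"


def build_send_command_kwargs(instance_id: str, commands: List[str],
                              log_group: str = DEFAULT_LOG_GROUP) -> Dict[str, Any]:
    """Keyword arguments for ssm.send_command with CloudWatch output enabled."""
    return {
        "InstanceIds": [instance_id],
        "DocumentName": "AWS-RunShellScript",
        "Parameters": {"commands": commands},
        # Send stdout/stderr to CloudWatch Logs instead of relying on the
        # size-limited inline output returned by GetCommandInvocation.
        "CloudWatchOutputConfig": {
            "CloudWatchLogGroupName": log_group,
            "CloudWatchOutputEnabled": True,
        },
    }


def stdout_stream_name(command_id: str, instance_id: str,
                       plugin: str = "aws-runShellScript") -> str:
    # Assumed naming scheme for the stdout stream written by SSM.
    return f"{command_id}/{instance_id}/{plugin}/stdout"


def read_full_stdout(logs_client: Any, log_group: str, stream: str) -> str:
    """Read every event in a log stream, following pagination to the end."""
    messages: List[str] = []
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {
            "logGroupName": log_group,
            "logStreamName": stream,
            "startFromHead": True,
        }
        if token is not None:
            kwargs["nextToken"] = token
        response = logs_client.get_log_events(**kwargs)
        messages.extend(event["message"] for event in response["events"])
        next_token = response["nextForwardToken"]
        # GetLogEvents returns the same token again once the end is reached.
        if next_token == token:
            break
        token = next_token
    # Assumption: events are newline-delimited chunks of the output.
    return "\n".join(messages)
```

With real clients, the call sequence would be `ssm.send_command(**build_send_command_kwargs(instance_id, ["squeue"]))`, then, once the invocation has finished, `read_full_stdout(logs, DEFAULT_LOG_GROUP, stdout_stream_name(command_id, instance_id))`.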


How Has This Been Tested?

Verified that PCUI now shows job information when more than 200 jobs are submitted.
In particular, tested with 9999 jobs, which seems to be the maximum number of jobs that Slurm can hold in the queue for a single compute node.
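To illustrate why a large job count breaks inline output, here is a hedged sketch. According to the SSM API reference, `GetCommandInvocation` returns only the first 24,000 characters of stdout in `StandardOutputContent`. The sketch assumes the job listing is JSON and uses a made-up record shape (`fake_job_listing` is hypothetical), so it does not reproduce the exact 200-job threshold observed in PCUI; it only shows that a cut-off listing stops being parseable.

```python
import json

# Inline stdout cap of GetCommandInvocation (StandardOutputContent).
INLINE_OUTPUT_LIMIT = 24_000


def fake_job_listing(n_jobs: int) -> str:
    """Hypothetical JSON job listing; real Slurm output differs in shape and size."""
    jobs = [
        {"job_id": i, "name": f"job-{i}", "partition": "queue1", "state": "PENDING"}
        for i in range(1, n_jobs + 1)
    ]
    return json.dumps({"jobs": jobs})


def parses_after_inline_truncation(n_jobs: int) -> bool:
    """True if the listing is still valid JSON after the inline cut-off."""
    truncated = fake_job_listing(n_jobs)[:INLINE_OUTPUT_LIMIT]
    try:
        json.loads(truncated)
        return True
    except json.JSONDecodeError:
        return False
```

A small listing survives the cut-off, while a 9999-job listing is cut mid-array and fails to parse. Reading the complete output from CloudWatch Logs avoids this limit.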

Current Limitation
Unit tests have been implemented but are skipped, because making them pass requires refactoring the logging packages, a more invasive change that we want to keep separate from this bugfix. This may seem unreasonable, but it is caused by PCUI's logging utilities clashing with Python's standard logging library, which interferes with pytest.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

gmarciani marked this pull request as ready for review April 2, 2025 21:02
himani2411 merged commit 4e92b51 into aws:main Apr 2, 2025
2 checks passed
gmarciani deleted the wip/mgiacomo/2025040/fix-jobinfo-1 branch April 3, 2025 02:34


Linked issue: Large numbers of jobs cause slow loading and many error messages in Job status tab
