lib/cpuinfo: Increase the file descriptors limit to handle more CPUs #263

Open: wants to merge 1 commit into the master base branch

babumoger (Contributor)

The pqos tool fails with the following errors on systems with 300 or more CPU cores.
$ pqos
NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
ERROR: Could not open /sys/fs/resctrl directory
ERROR: Failed to stop resctrl events
ERROR: Failed to start all selected OS monitoring events
Monitoring start error on core(s) 339, status 1

By default, the per-session file descriptor limit (the RLIMIT_NOFILE soft limit) is 1024. pqos monitoring opens 3 perf descriptors for each CPU, so on systems with 300 or more CPUs it exhausts the 1024-descriptor limit.

Fix the issue by detecting the number of CPUs in the system, reading the current descriptor limit with getrlimit(), and raising it with setrlimit(). The new limit is 4 times the number of CPUs: 3 descriptors per CPU for perf plus headroom for other open files.

Affected parts

  • library
  • pqos utility
  • rdtset utility
  • App QoS
  • other: (please specify)

Motivation and Context

#261

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@babumoger (Contributor, Author)

Please take a look at the code.

@rkanagar (Contributor)

Yes, we are reviewing this code. Thanks.

@rkanagar (Contributor) commented Apr 18, 2024

Hi Babu,
Please implement the changes in the attached fd_diff.txt.
Attachment: fd_diff.txt

@babumoger (Contributor, Author)

Hi Raghavan, I have implemented your changes. Please review. Thanks.

Resolved review comments (outdated):
  • lib/common.c (2 comments)
  • lib/cpuinfo.c
  • lib/os_cpuinfo.c
Commit message:

The pqos tool fails with the following errors on systems with 300 or more
CPU cores.
$ pqos
NOTE:  Mixed use of MSR and kernel interfaces to manage
       CAT or CMT & MBM may lead to unexpected behavior.
ERROR: Could not open /sys/fs/resctrl directory
ERROR: Failed to stop resctrl events
ERROR: Failed to start all selected OS monitoring events
Monitoring start error on core(s) 339, status 1

By default, the per-session file descriptor limit (the RLIMIT_NOFILE soft
limit) is 1024. pqos monitoring opens 3 perf descriptors for each CPU, so
on systems with 300 or more CPUs it exhausts the 1024-descriptor limit.

Fix the issue by detecting the number of CPUs in the system, reading the
current descriptor limit with getrlimit(), and raising it with setrlimit().
The new limit is 4 times the number of CPUs: 3 descriptors per CPU for perf
plus headroom for other open files.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: None yet
Participants: 2