<fix>[prometheus]: add label in gpu#508
zstack-robot-2 wants to merge 1 commit into feature-5.1.8-GPU-monitor
Resolves: ZSTAC-65807 Change-Id: I696675636b66726a6d6d6c68796e7a756f6a6972
Walkthrough
This change mainly modifies the GPU monitoring metrics in the Prometheus plugin. All GPU-related metrics now have an additional label.
Actionable comments posted: 2
Out of diff range and nitpick comments (2)
kvmagent/kvmagent/plugins/prometheus.py (2)
Line range hint 447-447: Do not put multiple statements on a single line.
- if ALARM_CONFIG is None: return
+ if ALARM_CONFIG is None:
+     return
Line range hint 1617-1619: Do not put multiple statements on a single line.
- if output == "" or output is None: PAGE_SIZE = 4096
+ if output == "" or output is None:
+     PAGE_SIZE = 4096
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- kvmagent/kvmagent/plugins/prometheus.py (4 hunks)
Additional Context Used
Ruff (4)
kvmagent/kvmagent/plugins/prometheus.py (4)
447-447: SyntaxError: Simple statements must be separated by newlines or semicolons
1617-1617: Multiple statements on one line (colon)
1618-1618: Multiple statements on one line (colon)
1619-1619: Multiple statements on one line (colon)
Additional comments not posted (4)
kvmagent/kvmagent/plugins/prometheus.py (4)
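1274-1281: Added the new labels gpu_serial and gpuState, and updated the description of gpu_state.
1322-1323: Added the new label gpu_serial to the gpu_rxpci_in_bytes and gpu_txpci_in_bytes metrics.
1340-1347: Added the new labels gpu_serial and gpuState to the AMD GPU metrics.
1363-1370: Make sure the convert_pci_state_to_int function correctly handles all possible PCI states.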
metrics['gpu_power_draw'].add_metric([pci_device_address, gpu_serial], float(info[0].replace('W', '').strip()))
metrics['gpu_temperature'].add_metric([pci_device_address, gpu_serial], float(info[1].strip()))
metrics['gpu_fan_speed'].add_metric([pci_device_address, gpu_serial], float(info[2].replace('%', '').strip()))
metrics['gpu_utilization'].add_metric([pci_device_address, gpu_serial], float(info[3].replace('%', '').strip()))
metrics['gpu_memory_utilization'].add_metric([pci_device_address, gpu_serial], float(info[4].replace('%', '').strip()))
gpuState, gpu_state_int_value = convert_pci_state_to_int(pci_device_address)
metrics['gpu_state'].add_metric([pci_device_address, gpuState, gpu_serial], gpu_state_int_value)
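The lines above strip a unit suffix from each field and call float() on the result, which raises ValueError if a field holds something non-numeric. A minimal sketch of a more defensive parse follows. The comma-separated input format, the sample values, and the helper names to_float and parse_gpu_fields are assumptions for illustration and are not part of the PR.

```python
def to_float(text, unit=''):
    """Strip an optional unit suffix and convert to float, or return None."""
    cleaned = text.replace(unit, '').strip() if unit else text.strip()
    try:
        return float(cleaned)
    except ValueError:
        # Non-numeric field: let the caller decide whether to skip the metric.
        return None


def parse_gpu_fields(line):
    """Parse one comma-separated line of (assumed) GPU query output."""
    info = line.split(',')
    return {
        'gpu_power_draw': to_float(info[0], 'W'),
        'gpu_temperature': to_float(info[1]),
        'gpu_fan_speed': to_float(info[2], '%'),
        'gpu_utilization': to_float(info[3], '%'),
        'gpu_memory_utilization': to_float(info[4], '%'),
    }


if __name__ == '__main__':
    print(parse_gpu_fields("75.32 W, 41, 30 %, 12 %, 5 %"))
```

A caller could then skip add_metric for any field that comes back as None instead of letting one bad field abort the whole collection pass.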
Tip
Codebase Verification
def convert_pci_state_to_int(pci_address):
    r, pci_status = bash_ro("lspci -s %s| grep -i 'ref ff'" % pci_address)
    if r == 0 and len(pci_status.strip()) != 0:
        return "critical", 0
    return "nominal", 1

The convert_pci_state_to_int function uses the lspci command to check the PCI status. If the status contains 'ref ff', it returns "critical" with a value of 0. Otherwise, it returns "nominal" with a value of 1. This function appears to handle two states: "critical" and "nominal".
Based on this implementation, it seems the function is designed to handle these specific states. However, it may not cover all possible PCI states comprehensively.
Final Response
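Make sure the convert_pci_state_to_int function correctly handles all possible PCI states. At present the function only handles the two states "critical" and "nominal", and it may need to be extended to cover all possible PCI states.
- File: kvmagent/kvmagent/plugins/prometheus.py
- Function: convert_pci_state_to_int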
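The two-state mapping described above can be separated from the shell call so that it is testable on its own. The following is a sketch under stated assumptions, not the PR's code: classify_pci_state is a hypothetical name, it takes the exit code and raw output of lspci (before any grep), and it compares in lowercase to mirror the case-insensitive `grep -i` in the original command.

```python
def classify_pci_state(return_code, lspci_output, marker='ref ff'):
    """Map lspci output to a (label, value) pair.

    Hypothetical helper: 'marker' defaults to the string used in the
    reviewed code, and the comparison is case-insensitive like grep -i.
    """
    if return_code == 0 and marker.lower() in lspci_output.lower():
        return "critical", 0
    # Anything else, including a failed lspci call, is treated as nominal,
    # which matches the two states the reviewed function returns.
    return "nominal", 1


if __name__ == '__main__':
    print(classify_pci_state(0, "01:00.0 Example device (REF FF)"))
    print(classify_pci_state(0, "01:00.0 Example device (rev a1)"))
```

Keeping the marker as a parameter makes it straightforward to add further states later without touching the code that runs lspci.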
Analysis chain
Make sure the convert_pci_state_to_int function correctly handles all possible PCI states.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the implementation of the `convert_pci_state_to_int` function to make sure it handles all PCI states correctly.
ast-grep --lang python --pattern $'def convert_pci_state_to_int($_, $_) { $$$ }'
Length of output: 548
Script:
#!/bin/bash
# Correcting the pattern to search for the `convert_pci_state_to_int` function implementation.
ast-grep --lang python --pattern 'def convert_pci_state_to_int($_) { $$$ }'
Length of output: 540
Script:
#!/bin/bash
# Correcting the pattern to search for the `convert_pci_state_to_int` function implementation.
ast-grep --lang python --pattern 'def convert_pci_state_to_int($_):'
Length of output: 70
Script:
#!/bin/bash
# Using `rg` to search for the `convert_pci_state_to_int` function in the codebase.
rg 'def convert_pci_state_to_int'
Length of output: 118
Script:
#!/bin/bash
# Extracting the implementation of the `convert_pci_state_to_int` function from the specified file.
ast-grep --lang python --pattern 'def convert_pci_state_to_int($_): { $$$ }' kvmagent/kvmagent/plugins/prometheus.py
Length of output: 118
Script:
#!/bin/bash
# Displaying the contents of the file to locate the `convert_pci_state_to_int` function.
cat kvmagent/kvmagent/plugins/prometheus.py
Length of output: 78715
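Several of the ast-grep patterns above use a brace-delimited body, which is not Python syntax and is a likely reason they did not isolate the function before the script fell back to printing the whole file. As an alternative sketch, Python's standard ast module can extract a single function's source. The helper name find_function_source is hypothetical, and this assumes the file parses under the interpreter running the script (Python 3.8 or later for ast.get_source_segment).

```python
import ast


def find_function_source(source, name):
    """Return the source text of the first function called 'name', or None."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # get_source_segment slices 'source' using the node's position info.
            return ast.get_source_segment(source, node)
    return None


if __name__ == '__main__':
    sample = (
        'x = 1\n'
        '\n'
        'def convert_pci_state_to_int(pci_address):\n'
        '    return "nominal", 1\n'
    )
    print(find_function_source(sample, 'convert_pci_state_to_int'))
```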
return "critical", 0

return 1
return "nominal", 1
The convert_pci_state_to_int function should check the contents of pci_status before returning, to ensure the state conversion is accurate.
- if r == 0 and len(pci_status.strip()) != 0:
+ if r == 0 and 'ref ff' in pci_status:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
return "critical", 0
return 1
return "nominal", 1
return "critical", 0
if r == 0 and 'ref ff' in pci_status:
return "nominal", 1
sync from gitlab !4770