Add num_chiplets by dhernandez0 · Pull Request #1018 · ROCm/MITuna

dhernandez0 · 2025-12-12T10:22:03Z

Motivation

See ROCm/rocMLIR#2168

Technical Details

See ROCm/rocMLIR#2168

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Copilot

Pull request overview

This PR adds support for the num_chiplets field and introduces attention operation support to the rocMLIR tuning framework. The changes extend the database schema and configuration handling to track chiplet information alongside existing architecture details, and add a complete implementation for attention operation tuning including config tables, jobs, and results. Additionally, it includes various code quality improvements such as refactoring variable names to follow Python conventions, improving error handling, and updating dependencies.

Key changes include:

Addition of num_chiplets field to session tracking and result export (though column definition is missing)
New attention operation support with dedicated config, job, and results tables
Addition of tuning space parameter to control configuration space exploration
Dependency updates and code quality improvements

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tuna/worker_interface.py	Added sys import and exit code handling for failed jobs via any_failed attribute
tuna/rocmlir/tuning_space.py	New enum class defining tuning space options (quick, full, exhaustive)
tuna/rocmlir/triggers.py	Added database trigger for attention operation timestamp tracking
tuna/rocmlir/rocmlir_worker.py	Added num_chiplets parsing, attention operation support, sanity checks for bad data, and improved error handling with traceback
tuna/rocmlir/rocmlir_tables.py	Major changes: added num_chiplets to session queries/exports (missing column definition), added complete attention config/job/results tables, refactored variable naming to snake_case, added CSV export functionality
tuna/rocmlir/rocmlir_lib.py	Added tuning_space argument, made mutually exclusive group required, removed -e shorthand, added attention trigger
tuna/rocmlir/export_configs.py	Added CSV export option alongside existing TSV export
tuna/rocmlir/config_type.py	Added attention to ConfigType enum and implemented str method
tuna/machine.py	Changed arch_full default from arch reference to empty string
tuna/libraries.py	Fixed typo in module docstring (liraries -> libraries)
requirements.txt	Updated multiple dependencies including cryptography, paramiko, sphinx, and added new packages (celery, flower, redis, asyncio)

Comments suppressed due to low confidence (16)

tuna/rocmlir/rocmlir_tables.py:176

These commented-out lines should either be removed or properly documented if they're intended as alternative implementation. Leaving commented code without explanation reduces code clarity.

#       with DbCursor() as cur:
#         # (may need list(cur....))
#         outcsv.writerows(cur.execute(f"select * from {self.__tablename__};"))

tuna/rocmlir/rocmlir_tables.py:593

Good improvement: the variable naming has been changed from camelCase to snake_case (one_config) to follow Python naming conventions consistently.

        one_config = f"{dataTypeString}{outDataTypeString}\
                       {transAString}{transBString}{line}".strip()
        if one_config not in configs:
          configs.append(one_config)

        # Special case to get both i8_i8 and i8_i32, w/o --data-type or output-type-map.
        if "-out_datatype" not in line and datatype == 'i8':
          outDataTypeString = "-out_datatype i32 "
          one_config = f"{dataTypeString}{outDataTypeString}\
                         {transAString}{transBString}{line}".strip()
          if one_config not in configs:
            configs.append(one_config)

tuna/rocmlir/rocmlir_tables.py:693

This debug print statement should be replaced with proper logging using self.logger.debug() to be consistent with the logging pattern used elsewhere in the codebase and to allow better control of debug output.

    print(f"Parsing line {line}")

tuna/rocmlir/rocmlir_tables.py:639

The column names use Python naming convention (snake_case) but end with uppercase letters (transpose_Q, transpose_K, transpose_V, transpose_O). Consider using fully lowercase names like transpose_q, transpose_k, transpose_v, transpose_o to be consistent with Python naming conventions, or capitalize as TransposeQ if representing matrix names.

  transpose_Q = Column(Boolean, nullable=False, server_default="0")
  transpose_K = Column(Boolean, nullable=False, server_default="0")
  transpose_V = Column(Boolean, nullable=False, server_default="0")
  transpose_O = Column(Boolean, nullable=False, server_default="0")

tuna/rocmlir/rocmlir_tables.py:748

Multiple boolean parameters (withAttnScale, withAttnBias, causal, return_lse) are hardcoded to only iterate over ['false']. This significantly limits the configuration space being explored. If these are intentionally disabled, add a comment explaining why. Otherwise, these should iterate over ['false', 'true'] like the transpose parameters.

      for datatype, transQ, transK, transV, transO, withAttnScale, withAttnBias, causal, return_lse, line in \
              itertools.product(['f32', 'f16'], ['false', 'true'],
                                ['false', 'true'], ['false', 'true'],
                                ['false', 'true'], ['false'], ['false'], ['false'], ['false'], lines):

tuna/rocmlir/rocmlir_tables.py:76

The unique constraint and query filters reference 'num_chiplets', but this column is not defined in the SessionRocMLIR class. The column should be added as a Column definition (similar to num_cu) before it can be used in the unique constraint and filters. Add a line like: num_chiplets = Column(Integer, nullable=False, server_default="1") to the SessionRocMLIR class.

  __table_args__ = (UniqueConstraint("arch",
                                     "num_cu",
                                     "num_chiplets",
                                     "rocm_v",
                                     "mlir_v",
                                     "reason",
                                     "config_type",
                                     "tuning_space",

tuna/rocmlir/rocmlir_worker.py:153

The comment "Empty command result, retrying" suggests this is a retry scenario, but TryAgain is being raised. Consider logging more context about why this is empty and whether this is expected behavior, especially since it could indicate an issue with the underlying command execution.

              self.logger.info("Empty command result, retrying.")
              raise TryAgain

tuna/rocmlir/rocmlir_worker.py:163

Good improvement: using traceback.format_exc() provides much more detailed error information than just the exception object, which will help with debugging. However, be aware this includes full stack traces which could be verbose in logs.

      self.logger.error('Exception occurred while running job %s:  %s',
                        self.job.id, traceback.format_exc())

tuna/rocmlir/rocmlir_tables.py:331

Good improvement: moving DATA_TYPES and other configuration lists from class-level variables to function-local variables improves encapsulation and makes the code more maintainable. These values are only used within the get_configurations method.

    directions = ['-F 1', '-F 2', '-F 4']
    data_types = ['conv', 'convfp16', 'convint8']
    layouts = ['NHWC', 'NCHW']

tuna/rocmlir/rocmlir_tables.py:121

These commented-out lines should either be removed or properly documented if they're intended as alternative implementation. Leaving commented code without explanation reduces code clarity.

    with open(filename, 'w', encoding='utf8') as f:
      outcsv = csv.writer(f)
#       with DbCursor() as cur:
#         # (may need list(cur....))
#         outcsv.writerows(cur.execute(f"select * from {self.__tablename__};"))

tuna/rocmlir/rocmlir_tables.py:363

Good improvement: the variable naming has been changed from camelCase to snake_case (one_config) to follow Python naming conventions consistently.

        one_config = ""
        # For datatype, check for the presence of a positional arg.
        if line[0][0] == "-":
          one_config = f"{datatype} "

        if "-F" not in line:
          one_config += f"{direction} "  # -F included in direction.
        one_config += make_option_if_not_in_line("-f", layout, line)
        one_config += make_option_if_not_in_line("-I", layout, line)
        one_config += make_option_if_not_in_line("-O", layout, line)
        one_config += line
        one_config = one_config.strip()

tuna/rocmlir/rocmlir_tables.py:119

This comment appears to contain commented-out code.

    with open(filename, 'w', encoding='utf8') as f:
      outcsv = csv.writer(f)
#       with DbCursor() as cur:

tuna/rocmlir/rocmlir_tables.py:174

This comment appears to contain commented-out code.

    with open(filename, 'w', encoding='utf8') as f:
      outcsv = csv.writer(f)
#       with DbCursor() as cur:

tuna/rocmlir/rocmlir_tables.py:679

This comment appears to contain commented-out code.

  def config_string(self):
    """Return config as a flag/value string suitable for tuningRunner.py."""
    string = ""
    #     for field, value in self.to_dict().items():

tuna/rocmlir/rocmlir_tables.py:44

Import of 'DbCursor' is not used.

from tuna.sql import DbCursor

tuna/rocmlir/rocmlir_tables.py:774

This class does not call ResultsMixin.init during initialization. (The class lacks an init method to ensure every base class init is called.)

    return configs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-12T10:30:06Z

    """Write the contents of the table as a .tsv file for perfRunner.py."""
    arch = dbt.session.arch_full
    num_cu = dbt.session.num_cu
+    num_chiplets = dbt.session.num_chiplets


The code references dbt.session.num_chiplets, but this attribute is not defined in the SessionRocMLIR class. This will cause an AttributeError at runtime.

Suggested change

num_chiplets = dbt.session.num_chiplets

num_chiplets = getattr(dbt.session, "num_chiplets", 0)

Copilot · 2025-12-12T10:30:07Z

    arch = dbt.session.arch_full
    num_cu = dbt.session.num_cu
+    num_chiplets = dbt.session.num_chiplets
    session_id = dbt.session_id


The TSV header comment should be updated to include 'num_chiplets' to match the data format being written. The header should be something like: "# arch\tnumCUs\tnumChiplets\ttestVector\tperfConfig (tuna)"

Copilot · 2025-12-12T10:30:07Z

    obj = self.dbt.results()

-    arch, num_cu, config, perf_config, tflops = obj.parse(result_str)
+    arch, num_cu, num_chiplets, config, perf_config, tflops = obj.parse(result_str)


The parse method is expected to return 6 values (arch, num_cu, num_chiplets, config, perf_config, tflops), but the ResultsMixin.parse method only splits the line by tabs. The calling code needs to match what the external tool (tuningRunner.py) actually outputs, or the unpacking will fail with a ValueError.

Suggested change

arch, num_cu, num_chiplets, config, perf_config, tflops = obj.parse(result_str)

try:

parsed = obj.parse(result_str)

if len(parsed) != 6:

self.logger.warning('Malformed result string for job_id=%s: expected 6 fields, got %d. Skipping. String: %r', self.job.id, len(parsed), result_str)

return True # To avoid an update retry.

arch, num_cu, num_chiplets, config, perf_config, tflops = parsed

except Exception as e:

self.logger.warning('Failed to parse result string for job_id=%s: %s. Skipping. String: %r', self.job.id, str(e), result_str)

return True # To avoid an update retry.

Copilot · 2025-12-12T10:30:07Z

        .filter(sess_obj.arch == entry.arch)\
        .filter(sess_obj.num_cu == entry.num_cu)\
+        .filter(sess_obj.num_chiplets == entry.num_chiplets)\


The query attempts to filter by num_chiplets, but this attribute is not defined in the SessionRocMLIR class. This will cause an AttributeError at runtime.

dhernandez0 requested review from Copilot, dorde-antic and umangyadav December 12, 2025 10:22

Add num_chiplets

5925523

Copilot started reviewing on behalf of dhernandez0 December 12, 2025 10:22 View session

dhernandez0 changed the base branch from develop to pf-tuna-rocmlir-3 December 12, 2025 10:23

dhernandez0 self-assigned this Dec 12, 2025

dorde-antic approved these changes Dec 12, 2025

View reviewed changes

Copilot AI reviewed Dec 12, 2025

View reviewed changes

dhernandez0 added 2 commits December 12, 2025 13:10

fix

e78eb6c

Add num_chiplets

b0643d1

dhernandez0 mentioned this pull request Dec 15, 2025

Use num_chiplets ROCm/rocMLIR#2168

Merged

1 task

dhernandez0 requested review from justinrosner and pabloantoniom December 15, 2025 12:33

justinrosner approved these changes Dec 15, 2025

View reviewed changes

umangyadav approved these changes Dec 15, 2025

View reviewed changes

dhernandez0 merged commit b2aafb1 into pf-tuna-rocmlir-3 Jan 12, 2026
4 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add num_chiplets#1018

Add num_chiplets#1018
dhernandez0 merged 3 commits intopf-tuna-rocmlir-3from
num_chiplets

dhernandez0 commented Dec 12, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 12, 2025

Uh oh!

Copilot AI Dec 12, 2025

Uh oh!

Copilot AI Dec 12, 2025

Uh oh!

Copilot AI Dec 12, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

	num_chiplets = dbt.session.num_chiplets
	num_chiplets = getattr(dbt.session, "num_chiplets", 0)

-    arch, num_cu, num_chiplets, config, perf_config, tflops = obj.parse(result_str)
+    try:
+      parsed = obj.parse(result_str)
+      if len(parsed) != 6:
+        self.logger.warning('Malformed result string for job_id=%s: expected 6 fields, got %d. Skipping. String: %r', self.job.id, len(parsed), result_str)
+        return True  # To avoid an update retry.
+      arch, num_cu, num_chiplets, config, perf_config, tflops = parsed
+    except Exception as e:
+      self.logger.warning('Failed to parse result string for job_id=%s: %s. Skipping. String: %r', self.job.id, str(e), result_str)
+      return True  # To avoid an update retry.

Conversation

dhernandez0 commented Dec 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

dhernandez0 commented Dec 12, 2025 •

edited

Loading