@stbaione

Motivation

To verify that our ROCm-enabled triton-inference-server and python_backend work correctly for our desired use case, we needed a test that runs a transformer-based tabular classification model on ROCm via the triton-inference-server and validates the model's outputs against a dataset for accuracy.

Technical Details

Pipeline Overview

To accomplish this, we do the following:

  • Run the FTTransform model on CPU & store the weights it was initialized with
  • Run the CPU model against a dataset of 10,000 randomly generated samples & store the outputs for each sample
  • Run the same FTTransform model on GPU via the ROCm-enabled triton-inference-server, reusing the same weights as the CPU model
  • Reuse the same 10,000 randomly generated inputs and run them against the triton-inference-server
  • Collect the outputs and compare them to the outputs from the CPU model
  • If the outputs for all samples are within tolerance, we can be confident our server & python_backend maintained accuracy
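The reference-generation steps above can be sketched roughly as follows. This is a minimal, hypothetical stand-in, not the actual generate_reference.py: the file name, feature count, and the trivial linear "model" are assumptions; the real script runs the FTTransform model itself.

```python
import os
import tempfile

import numpy as np

# Minimal sketch of the reference-generation steps (hypothetical file name
# and feature count; a fixed random projection stands in for the real model).
rng = np.random.default_rng(42)
inputs = rng.standard_normal((10_000, 16)).astype(np.float32)  # 16 features, assumed

# Stand-in "CPU model": a fixed random weight matrix, so the outputs are
# fully determined by the stored weights and inputs.
weights = rng.standard_normal((16, 2)).astype(np.float32)
cpu_outputs = inputs @ weights

# Persist weights, inputs, and outputs so the GPU run can reuse them.
path = os.path.join(tempfile.mkdtemp(), "reference.npz")
np.savez(path, weights=weights, inputs=inputs, outputs=cpu_outputs)

loaded = np.load(path)
print(loaded["inputs"].shape, loaded["outputs"].shape)
```

Persisting all three arrays in one archive keeps the GPU run honest: it must reproduce the outputs from the exact same weights and inputs, not merely from a re-initialized model.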

Artifacts

  • The following files have been added to a new examples/tab_transform_pytorch directory:
    • model.py - Defines the model we'll be running.
    • client.py - Client script used to send requests to the triton-inference-server.
    • config.pbtxt - Model configuration required by the triton-inference-server.
    • generate_reference.py - Runs the CPU model, collects the weights, inputs, and outputs for later comparison.
  • Instructions have been added to the README.md file in a new Validate Tabular Accuracy for ROCm section
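For orientation, a config.pbtxt for a python_backend model typically looks like the sketch below. The tensor names, dtypes, and dims here are assumptions for illustration, not the actual contents of the file in this PR:

```
name: "tab_transform_pytorch"
backend: "python"
max_batch_size: 0

input [
  {
    name: "INPUT0"        # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1 ]      # variable batch and feature dims, assumed
  }
]
output [
  {
    name: "OUTPUT0"       # assumed tensor name
    data_type: TYPE_FP32
    dims: [ -1, -1 ]
  }
]
```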

Test Result

Local runs showed that the CPU and GPU model outputs matched within tolerance for all samples:

Results:
  Max absolute difference:  4.17e-07
  Mean absolute difference: 7.86e-08
  Tolerance:                1.00e-05
  Samples exceeding tolerance: 0/10000

============================================================
PASS: All 10000 samples within tolerance (1e-05)
      GPU implementation matches CPU reference!
============================================================
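The summary above comes down to a straightforward element-wise comparison. A minimal sketch of that check, with synthetic arrays standing in for the saved CPU reference and the Triton responses (the variable names and noise level are illustrative assumptions):

```python
import numpy as np

# Synthetic stand-ins: cpu_outputs plays the saved CPU reference,
# gpu_outputs plays what the Triton server returned for the same inputs.
rng = np.random.default_rng(0)
cpu_outputs = rng.standard_normal((10_000, 2)).astype(np.float32)
gpu_outputs = cpu_outputs + rng.uniform(-1e-7, 1e-7, cpu_outputs.shape).astype(np.float32)

TOLERANCE = 1e-5
abs_diff = np.abs(cpu_outputs.astype(np.float64) - gpu_outputs.astype(np.float64))
per_sample_max = abs_diff.max(axis=1)              # worst element per sample
n_exceeding = int((per_sample_max > TOLERANCE).sum())

print(f"Max absolute difference:  {abs_diff.max():.2e}")
print(f"Mean absolute difference: {abs_diff.mean():.2e}")
print(f"Samples exceeding tolerance: {n_exceeding}/{len(cpu_outputs)}")
```

An absolute tolerance of 1e-5 is generous relative to float32 rounding, so differences on the order of 1e-7 (as in the measured results) pass with a wide margin.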

Submission Checklist

@stbaione stbaione self-assigned this Dec 12, 2025