
Conversation

@alextmagro
Contributor

Description

Added a combined check for scale and values.
If a scale difference is detected, the values are checked for an off-by-one boundary condition.
A mismatch tolerance allows a small fraction of scale differences (1% by default) before an error is thrown.

Additionally, issues with dgelu CPU-side jitter have been resolved with data generation fixes.
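The tolerance logic described above can be sketched roughly as follows. This is a minimal illustration: `check_scales` and its parameters are hypothetical names, and the real test compares MXFP8 `scale_inv` tensors rather than plain `int` vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <tuple>
#include <vector>

// Sketch of the combined check: an off-by-one scale difference is
// recorded as a candidate mismatch instead of failing immediately; the
// check fails only if the mismatch ratio exceeds tol (1% by default in
// the PR). Larger scale differences always fail.
bool check_scales(const std::vector<int> &test_scales,
                  const std::vector<int> &ref_scales,
                  double tol,
                  std::vector<std::tuple<size_t, int>> &mismatch_idx) {
  for (size_t i = 0; i < test_scales.size(); ++i) {
    const int diff = ref_scales[i] - test_scales[i];
    if (diff == 0) continue;
    if (std::abs(diff) == 1) {
      // Boundary case: remember the index so the values can be re-checked.
      mismatch_idx.emplace_back(i, diff);
    } else {
      return false;  // |diff| > 1 is always an error
    }
  }
  return mismatch_idx.size() <= tol * test_scales.size();
}
```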

@alextmagro
Contributor Author

I have added a refactor in the 2nd commit -- I am unsure if it is cleaner this way around, but both versions solve the issue. @wenchenvincent and @ipanfilo, please have a look at both and let me know which one looks better.

const size_t row_blocks, const size_t col_blocks, const size_t stride,
double tol, bool rowwise, std::vector<std::tuple<size_t, size_t, int>> &mismatch_idx) {
constexpr bool on_gpus = true;
if (on_gpus) output.to_cpu();
Collaborator

Getting cpu_scale_inv_ptr() below already performs to_cpu(), so the explicit call here is redundant.

if (std::abs(t_scale - r_scale) == 1) {
mismatch_idx.emplace_back(i, j, r_scale-t_scale);
} else {
ASSERT_FALSE(1) << "Error in " << name << std::endl
Collaborator

You can use GTEST_FAIL() instead of ASSERT_FALSE(1)

for (; ii_min < ii_max; ii_min++) {
size_t jj_min = j * row_blocks;
const size_t jj_max = std::min(jj_min + row_blocks, cols);
for (; jj_min < jj_max; jj_min++) {
Collaborator

Why do we have ii and jj nested loops here? One scale value refers to 32 items either in a row or in a column, but not in both.

Contributor Author

Either row_blocks or col_blocks is always 1, so we are doing the logic in one direction or the other.

Collaborator

So one of the loops is guaranteed to run exactly once, which means either col_blocks or row_blocks is always 1, right?

Contributor Author

That's right.
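The traversal being discussed can be sketched as below. This assumes the ii bounds mirror the jj bounds shown in the snippet (they are not visible in the quote), and `block_indices` is a hypothetical helper, not code from the PR: because exactly one of col_blocks and row_blocks is 1, the nested loops always walk a 1-D segment of the tensor, never a 2-D tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Collect the flat indices covered by one scale value. One of
// row_blocks/col_blocks is 32 and the other is 1, so this yields either
// a horizontal run (rowwise) or a vertical run (columnwise).
std::vector<size_t> block_indices(size_t i, size_t j,
                                  size_t row_blocks, size_t col_blocks,
                                  size_t rows, size_t cols) {
  std::vector<size_t> idx;
  size_t ii_min = i * col_blocks;
  const size_t ii_max = std::min(ii_min + col_blocks, rows);
  for (; ii_min < ii_max; ii_min++) {
    size_t jj_min = j * row_blocks;
    const size_t jj_max = std::min(jj_min + row_blocks, cols);
    for (; jj_min < jj_max; jj_min++) {
      idx.push_back(ii_min * cols + jj_min);
    }
  }
  return idx;
}
```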

Collaborator

@wangye805 left a comment

LGTM

Please merge after getting an LGTM from Ilya as well.

const size_t jj_max = std::min(jj_min + row_blocks, cols);
for (; jj_min < jj_max; jj_min++) {
const size_t data_idx = ii_min * cols + jj_min;
if (scale_diff == 1) {
Collaborator

Maybe move the if out of the loops by making float scale_value 2.0 or 0.5?
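The suggestion above, sketched on a hypothetical adjust_ref that works on plain floats (the real code casts through static_cast&lt;T&gt;): resolve scale_diff to a single multiplier once, before the loops, instead of branching per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hoist the scale_diff branch out of the element loop: a +1 scale
// difference doubles the reference data, a -1 difference halves it.
// scale_diff is +/-1 here; the caller rejects anything else.
void adjust_ref(std::vector<float> &ref_data, int scale_diff) {
  const float scale_value = (scale_diff == 1) ? 2.0f : 0.5f;
  for (size_t i = 0; i < ref_data.size(); ++i) {
    ref_data[i] *= scale_value;
  }
}
```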

} else if (scale_diff == -1) {
ref_data[data_idx] = static_cast<T>(static_cast<float>(ref_data[data_idx])/2);
} else { // Shouldn't ever reach this
ASSERT_FALSE(1) << "Error in adjust_ref, |scale_diff| > 1";
Collaborator

GTEST_FAIL() too?

Collaborator

@wenchenvincent left a comment

LGTM. Let's wait for CI before merging.

@alextmagro alextmagro merged commit b092058 into dev Oct 31, 2025
6 checks passed
@alextmagro alextmagro deleted the mxfp8_cast_test_fix branch October 31, 2025 15:43
ipanfilo pushed a commit that referenced this pull request Nov 8, 2025
* MXFP8 test scale off by 1 fix