Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug fix] Append() in Nullable columns causes nulls_ array to be incorrectly appended to if nested column Append() fails #376

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

moaazassali
Copy link

If Append() of the nested column fails, the nulls_ column will be incremented despite nothing being appended.

This causes the following issues:
1- The Size() reported by the Nullable<T> column will be wrong since it depends on the size of nulls_.
2- If you try to access this new "fake" element of the Nullable<T> column, an error will be thrown because false would be appended to nulls_ indicating that a valid non-null element exists. When At() or [] is used to access it, the actual nested column does not have this value, which causes the error to be thrown.
3- This is especially the case for columns like FixedString which could throw.

This pull request resolves that by moving the appending of nulls_ to the end of the Append() operation of the nested column. If the nested column Append() does fail, then the Append() to nulls_ is never called.

If Append() of the nested column fails, the nulls column will be incremented despite nothing being appended.

It will also be marked as not null, causing an error to be thrown if this new "fake" value in the column is accessed later.

this fixes that by moving the appending of the nulls markers to the end of the Append() operation of the nested column
if (value.has_value()) {
typed_nested_data_->Append(std::move(*value));
} else {
typed_nested_data_->Append(typename ValueType::value_type{});
}
// Must be called after Append to nested column because it might fail
ColumnNullable::Append(!value.has_value());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may also fail, so perhaps we can make ColumnNullable::Append(!value.has_value()); the first operation, wrap second part in try-catch (or with some RAII wrapper), and rollback last value in nulls_ in case of error. (perhaps by introducing protected method ColumnNullable::PopBackNull())

Copy link
Author

@moaazassali moaazassali Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the wanted behavior for Append() to fail if the nested Append() fails too? For example, if we append a 10-character string to ColumnFixedString(5), it will throw a validation error:

string.cpp, line 38:

    if (str.size() > string_size_) {
        throw ValidationError("Expected string of length not greater than "
                                 + std::to_string(string_size_) + " bytes, received "
                                 + std::to_string(str.size()) + " bytes.");
    }

So, my assumption was that Nullable(ColumnFixedString(5)) should behave the same way as the nested column and also throw an error instead of catching the error and failing silently. By moving ColumnNullable::Append(!value.has_value()); to the end, we ensure that we throw an error early, matching the behavior of the nested column without causing unintended side effects.

However, I may have misunderstood the intended behavior. Could you please clarify further?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it needs to fail, but also the behavior must be consistent. I.e. nulls_->Size() must be the same as typed_nested_data_->Size(), which is not true with current implementation (and your proposed change).

So I just propose the way to rollback changes of nulls_, since it is the easiest one to do, if there is any exception.

Copy link
Author

@moaazassali moaazassali Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please provide a specific example of how it leads to inconsistencies? I can't seem to think of one. This is my understanding:

We have 3 cases:
1- value is not null and nested Append() does not fail:

  • we call typed_nested_data_->Append(std::move(*value)); which does not fail
  • typed_nested_data_ size = N+1, nulls_ size = N
  • we call ColumnNullable::Append(!value.has_value());
  • typed_nested_data_ size = N+1, nulls_ size = N+1
  • we have consistent sizes

2- value is not null and nested Append fails:

  • we call typed_nested_data_->Append(std::move(*value)); which fails
  • an error is thrown so we exit the function
  • typed_nested_data_ size = N, nulls_ size = N
  • we have consistent sizes

3- value is null:

  • we call typed_nested_data_->Append(typename ValueType::value_type{}); which cannot fail
  • we call ColumnNullable::Append(!value.has_value());
  • typed_nested_data_ size = N+1, nulls_ size = N+1
  • we have consistent sizes

I have also tested those 3 cases and it passed them:

const auto col = new ColumnNullableT<ColumnFixedString>(5);
col->Append("hello");
CHECK(col->Size() == 1);
CHECK(col->typed_nested_data_->Size() == 1);

CHECK_THROWS(col->Append("world!"));
CHECK(col->Size() == 1);
CHECK(col->typed_nested_data_->Size() == 1);

col->Append(std::nullopt);
CHECK(col->Size() == 2);
CHECK(col->typed_nested_data_->Size() == 2);

Copy link
Collaborator

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some changes required

@Enmk
Copy link
Collaborator

Enmk commented Jul 8, 2024

Hi @moaazassali thank you for contribution! Please fix the issue mentioned in the review, and then we can merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants