Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] arrow::json::DecimalConverter should rescale values based on the explicit_schema #33200

Closed
asfimport opened this issue Oct 12, 2022 · 3 comments
Labels
Component: C++ Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Priority: Critical Type: bug
Milestone

Comments

@asfimport
Copy link
Collaborator

The C++ lib doesn't read JSON decimal values correctly based on the explicit_schema. This can be reproduced by this helloworld program: https://github.com/stiga-huang/arrow-helloworld/tree/d267862

The input JSON file has the following rows:

{"id":1,"str":"Some","price":"30.04"}
{"id":2,"str":"data","price":"1.234"} 

If we read the price column using decimal128(9, 2), the values are


      30.04,
      12.34

If we use decimal128(9, 3) instead, the values are


      3.004,
      1.234

The decimal type in the explicit_schema is set here: https://github.com/stiga-huang/arrow-helloworld/blob/d26786270e87d9ab847658ead96a96190461b98f/json_decimal_example.cc#L38

The cause is arrow::json::DecimalConverter doesn't rescale the value based on the out_type_:

  Status Convert(const std::shared_ptr<Array>& in, std::shared_ptr<Array>* out) override {
    if (in->type_id() == Type::NA) {
      return MakeArrayOfNull(out_type_, in->length(), pool_).Value(out);
    }
    const auto& dict_array = GetDictionaryArray(in);

    using Builder = typename TypeTraits<T>::BuilderType;
    Builder builder(out_type_, pool_);
    RETURN_NOT_OK(builder.Resize(dict_array.indices()->length()));

    auto visit_valid = [&builder](string_view repr) {
      ARROW_ASSIGN_OR_RAISE(value_type value,
                            TypeTraits<T>::BuilderType::ValueType::FromString(repr));
      //////////// Should rescale the value based on out_type_ here
      builder.UnsafeAppend(value);
      return Status::OK();
    };

    auto visit_null = [&builder]() {
      builder.UnsafeAppendNull();
      return Status::OK();
    };

    RETURN_NOT_OK(VisitDictionaryEntries(dict_array, visit_valid, visit_null));
    return builder.Finish(out);
  }

ARROW_ASSIGN_OR_RAISE(value_type value,
TypeTraits<T>::BuilderType::ValueType::FromString(repr));
builder.UnsafeAppend(value);

Reporter: Quanlong Huang / @stiga-huang
Assignee: Quanlong Huang / @stiga-huang

PRs and other links:

Note: This issue was originally created as ARROW-17995. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
Issue resolved by pull request 14380
#14380

@asfimport
Copy link
Collaborator Author

Quanlong Huang / @stiga-huang:
@pitrou Can we back port this to older branches like 8.0.2 and 9.0.1 ? I think it's a clean cherry-pick.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
We're unlikely to do a 9.0.1 as 10.0.0 is entering the release phase very soon. Sorry!

@raulcd raulcd modified the milestones: 9.0.1, 10.0.0 Jan 12, 2023
@wjones127 wjones127 added the Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. label Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: C++ Critical Fix Bugfixes for security vulnerabilities, crashes, or invalid data. Priority: Critical Type: bug
Projects
None yet
Development

No branches or pull requests

3 participants