Arrow is column-based, but clients often need to import external data sources that are stored in a row-based fashion. To help simplify the process, I propose we create a `RowsToBatches` utility function that takes any valid C++ range (`std::begin`/`std::end` is defined for `T`) and returns an `arrow::RecordBatchReader` (convertible to an `arrow::Table`). This is particularly useful when the data types for each column are not known at compile time, as in the case of a `std::variant`.
The interface could look like the following (simplified for clarity)
See the linked pull request for full details. The client only needs to provide their Schema and a callable that converts their structure's types into the associated Arrow types.
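To make the row-to-column idea concrete without depending on Arrow, here is a minimal sketch. It is not the proposed interface: `RowsToColumns`, `Column`, and the `bool`-returning converter are hypothetical stand-ins chosen for illustration, where the real utility would work with a schema, `arrow::ArrayBuilder`, and Arrow status values.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column builder; NOT arrow::ArrayBuilder.
using Column = std::vector<std::int64_t>;

// Transposes any range of rows (each row itself a range of cells) into columns.
// `convert(column, cell)` appends one cell to one column and returns false on
// failure, loosely mirroring the role of the converter callable described above.
template <typename Range, typename Convert>
bool RowsToColumns(const Range& rows, std::size_t num_columns, Convert convert,
                   std::vector<Column>* out) {
  std::vector<Column> columns(num_columns);
  for (auto row_it = std::begin(rows); row_it != std::end(rows); ++row_it) {
    std::size_t col = 0;
    for (auto cell_it = std::begin(*row_it); cell_it != std::end(*row_it);
         ++cell_it, ++col) {
      if (col >= num_columns) return false;  // row is wider than the schema
      if (!convert(columns[col], *cell_it)) return false;
    }
    if (col != num_columns) return false;  // row is narrower than the schema
  }
  *out = std::move(columns);
  return true;
}
```

Calling it with `std::vector<std::vector<int>>` rows and a lambda that does `column.push_back(value)` yields one `Column` per field, which is the shape a record batch needs.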
If the client type is not a C++ range, they can either add iterators or write a wrapper/adaptor that provides the iterators for the type.
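As a rough illustration of the wrapper/adaptor option, the sketch below adds `begin()`/`end()` to a container that only offers indexed access. `LegacyTable` and `LegacyTableRange` are hypothetical names invented for this example; nothing here comes from Arrow or from the pull request.

```cpp
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical client container with indexed access only, so it is not a range.
struct LegacyTable {
  std::vector<std::vector<int>> storage;
  std::size_t RowCount() const { return storage.size(); }
  const std::vector<int>& RowAt(std::size_t i) const { return storage[i]; }
};

// Adaptor that provides the iterators a range-based utility would need.
class LegacyTableRange {
 public:
  class Iterator {
   public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = std::vector<int>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = const value_type&;

    Iterator(const LegacyTable* table, std::size_t index)
        : table_(table), index_(index) {}

    reference operator*() const { return table_->RowAt(index_); }
    pointer operator->() const { return &table_->RowAt(index_); }
    Iterator& operator++() { ++index_; return *this; }
    Iterator operator++(int) { Iterator old = *this; ++index_; return old; }
    bool operator==(const Iterator& other) const {
      return table_ == other.table_ && index_ == other.index_;
    }
    bool operator!=(const Iterator& other) const { return !(*this == other); }

   private:
    const LegacyTable* table_;
    std::size_t index_;
  };

  explicit LegacyTableRange(const LegacyTable& table) : table_(&table) {}
  Iterator begin() const { return Iterator(table_, 0); }
  Iterator end() const { return Iterator(table_, table_->RowCount()); }

 private:
  const LegacyTable* table_;  // non-owning; the table must outlive the adaptor
};
```

Because the adaptor holds a non-owning pointer, it is cheap to copy, but the wrapped table has to stay alive for as long as the resulting range is being read.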
Example Usage:
auto IntConvertor = [](ArrayBuilder& array_builder, int value) {
  return static_cast<Int64Builder&>(array_builder).Append(value);
};
std::vector<std::vector<int>> data = {{1, 2, 4}, {5, 6, 7}};
auto batches = RowsToBatches(kTestSchema, std::ref(data), IntConvertor);
…ased structure into an `arrow::RecordBatchReader` or an `arrow::Table` (#34057)
*Are these changes tested?*
The following tests are provided:
- basic usage
- const ranges
- custom struct accessor
- usage with `std::variant`
* Closes: #34056
Lead-authored-by: Mike Hancock <mhancock34@bloomberg.net>
Co-authored-by: Michael Hancock <javaiscoolmike@gmail.com>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
Example Supported Types:
std::vector<std::vector<std::variant<int, bsl::string>>>
std::vector<MyRowStruct>
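For the `std::variant` case, where a cell's type is only known at run time, a converter can dispatch with `std::visit`. The following is a sketch under assumptions: it uses `std::string` in place of `bsl::string`, requires C++17, and `Cell`, `ColumnBuilder`, and `AppendCell` are hypothetical stand-ins rather than Arrow types.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// One cell of a row whose type is only known at run time.
using Cell = std::variant<int, std::string>;

// Hypothetical stand-in for a type-erased column builder; NOT an Arrow type.
using ColumnBuilder =
    std::variant<std::vector<std::int64_t>, std::vector<std::string>>;

// Appends one cell to a column. Returns false when the cell's runtime type
// does not match the column's type.
inline bool AppendCell(ColumnBuilder& builder, const Cell& cell) {
  return std::visit(
      [&builder](const auto& value) {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, int>) {
          auto* ints = std::get_if<std::vector<std::int64_t>>(&builder);
          if (ints == nullptr) return false;
          ints->push_back(value);
          return true;
        } else {
          auto* strings = std::get_if<std::vector<std::string>>(&builder);
          if (strings == nullptr) return false;
          strings->push_back(value);
          return true;
        }
      },
      cell);
}
```

In the real utility, the equivalent check would presumably be a cast to the concrete builder for the field's type, as in the `IntConvertor` example, with a type mismatch reported as an error status instead of `false`.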
Component(s)
C++