Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37978: [C++] Add support for specifying custom Array element delimiter to arrow::PrettyPrintOptions #37981

Merged
merged 12 commits into from
Oct 5, 2023

Conversation

kevingurney
Copy link
Member

@kevingurney kevingurney commented Oct 2, 2023

Rationale for this change

In order to make the arrow::PrettyPrint functionality for arrow::Array/arrow::ChunkedArray more flexible, it would be useful to be able to specify a custom element delimiter other than ",".

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom Array element delimiter, would make it possible to make the display of MATLAB arrow.array.Array objects more MATLAB-like.

For the MATLAB interface, we would like to enable display that looks something like the following (note the | between individual Array elements):

% Make a MATLAB array.
>> A = 1:5

A =

     1     2     3     4     5

% Make an Arrow array from the MATLAB array.
>> B = arrow.array(A)

B = 

    [ 1 | 2 | 3 | 4 | 5 ]

In order to support custom Array element delimiters, this pull request adds a new struct type PrettyPrintDelimiters. The PrettyPrintDelimiters type has one property element (of type std::string), which allows client code to control the delimiter used to distinguish between individual elements of an arrow::Array / arrow::ChunkedArray.

In a future pull request, we plan to add more properties like open and close to allow client code to specify the opening and closing delimiters to use when printing an arrow::Array / arrow::ChunkedArray (e.g. "<" rather than "[" and ">" rather than "]").

What changes are included in this PR?

  1. Added a new struct type PrettyPrintDelimiters with one property element (of type std::string). The element property allows client code to specify any string value as the delimiter to distinguish between individual elements of an arrow::Array or arrow::ChunkedArray when printing using arrow::PrettyPrint.
  2. Added two new properties to arrow::PrettyPrintOptions - (1) array_delimiters (of type arrow::PrettyPrintDelimiters) and chunked_array_delimiters (of type arrow::PrettyPrintDelimiters). These properties can be modified to customize how arrow::Arrow/arrow::ChunkedArray are printed when using arrow::PrettyPrint.

Are these changes tested?

Yes.

  1. Added new tests ArrayCustomElementDelimiter and ChunkedArrayCustomElementDelimiter to pretty_print_test.cc.
  2. All existing PrettyPrint-related C++ tests pass.

Are there any user-facing changes?

Yes.

  1. User's can now specify a custom element delimiter to use when printing arrow::Arrays / arrow::ChunkedArrays using arrow::PrettyPrint by modifying the array_delimiters or chunked_array_delimiters properties of arrow::PrettyPrintOptions.

Example:

auto array = ...;
auto stream = ...
arrow::PrettyPrintOptions options = arrow::PrettyPrintOptions::Defaults();
// Use " | " as the element-wise (element = scalar value) delimiter for arrow::Array.
options.array_delimiters.element = " | ";
// Use "';" as the element-wise (element = chunk) delimiter for arrow::ChunkedArray.
options.chunked_array_delimiters.element = ";";
arrow::PrettyPrint(array, options, stream);

Future Directions

  1. To keep this pull request small and focused, I intentionally chose not to include changes related to specifying custom opening and closing Array delimiters (e.g. use < and > instead of [ and ]). I've captured the idea of supporting custom opening and closing Array delimiters in [C++] Add support for specifying custom Array opening and closing delimiters to arrow::PrettyPrintDelimiters #37979. I will follow up with a future PR to address this.

Notes

  1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.
  2. @rok helpfully pointed out in [C++] Add support for specifying custom Array element delimiter to arrow::PrettyPrintOptions #37978 (comment) that a similar attempt to modify the default Array element delimiter to be ", " (note the space after the comma) was taken in [C++][Python] Add a space after commas in pretty-print output #30951. However, this issue appears to have gone stale and the PR (ARROW-15475: [C++][Python] Add a space after commas in pretty-print output #12420) that was opened also seems to have gone stale. If these changes get merged, it may make sense to close out this issue since this one seems to at least partially address it (although, it isn't exactly the same, since it doesn't change the default delimiter to be ", ". However, for PyArrow, array_delimiters.element and chunked_array_delimiters.element could just be set to ", " after merging these changes to change the default display if that is still desirable).

@kevingurney
Copy link
Member Author

I will address the C++ linting errors shortly, my apologies for the delay.

2. Fix typo. ChunkedArrayPrimitiveTypeCustomArrayDelimiter -> ChunkedArrayPrimitiveTypeCustomArrayElementDelimiter.
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

FYI: We can fix C++ format by ninja format.

cpp/src/arrow/pretty_print.h Outdated Show resolved Hide resolved
cpp/src/arrow/pretty_print.h Outdated Show resolved Hide resolved
cpp/src/arrow/pretty_print_test.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Oct 3, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Oct 3, 2023
@kevingurney
Copy link
Member Author

@kou - thanks for the helpful review!

Sorry for the delay in fixing the C++ linting errors. I am having some trouble getting ninja format to work as expected.

@kevingurney
Copy link
Member Author

kevingurney commented Oct 3, 2023

Sorry, I am still having machine configuration issues that are causing issues with fixing the C++ linting errors. I will try to get this resolved tomorrow.

@kou
Copy link
Member

kou commented Oct 3, 2023

Do you want me to run ninja format and push the fix?

@kevingurney
Copy link
Member Author

@kou - thank you for offering! That would be very helpful if it wouldn't be too much trouble.

Unfortunately I think it will take some time for me to sort out the machine configuration issues I am having.

@kou
Copy link
Member

kou commented Oct 3, 2023

Sure! But I couldn't push to the branch... Could you apply the following patch?

diff --git a/cpp/src/arrow/pretty_print_test.cc b/cpp/src/arrow/pretty_print_test.cc
index 6c57505a3..cdced96e7 100644
--- a/cpp/src/arrow/pretty_print_test.cc
+++ b/cpp/src/arrow/pretty_print_test.cc
@@ -212,38 +212,19 @@ TEST_F(TestPrettyPrint, PrimitiveTypeCustomArrayElementDelimiter) {
 
   // Short array without ellipsis
   {
-      std::vector<bool> is_valid = {true, true, false, true, false};
-      std::vector<int32_t> values = {1, 2, 3, 4, 5};
-      const char* expected = "[1 | 2 | null | 4 | null]";
-      CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
+    std::vector<bool> is_valid = {true, true, false, true, false};
+    std::vector<int32_t> values = {1, 2, 3, 4, 5};
+    const char* expected = "[1 | 2 | null | 4 | null]";
+    CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
   }
 
   // Longer array with ellipsis
   {
-      std::vector<bool> is_valid = {
-          true,
-          false,
-          true,
-          true,
-          false,
-          true,
-          false,
-          true,
-          true
-      };
-      std::vector<int32_t> values = {
-          1,
-          2,
-          3,
-          4,
-          5,
-          6,
-          7,
-          8,
-          9
-      };
-      const char* expected = "[1 | null | 3 | ... | null | 8 | 9]";
-      CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
+    std::vector<bool> is_valid = {true, false, true, true, false,
+                                  true, false, true, true};
+    std::vector<int32_t> values = {1, 2, 3, 4, 5, 6, 7, 8, 9};
+    const char* expected = "[1 | null | 3 | ... | null | 8 | 9]";
+    CheckPrimitive<Int32Type, int32_t>(options, is_valid, values, expected, false);
   }
 }
 
@@ -1081,20 +1062,20 @@ TEST_F(TestPrettyPrint, ChunkedArrayPrimitiveTypeCustomArrayElementDelimiter) {
 
   // ChunkedArray with 1 chunk
   {
-      const ChunkedArray chunked_array(chunk);
+    const ChunkedArray chunked_array(chunk);
 
-      static const char* expected = R"expected([[1 | 2 | null | 4 | null]])expected";
-      CheckStream(chunked_array, options, expected);
+    static const char* expected = R"expected([[1 | 2 | null | 4 | null]])expected";
+    CheckStream(chunked_array, options, expected);
   }
 
   // ChunkedArray with 2 chunks
   {
-      const ChunkedArray chunked_array({chunk, chunk});
+    const ChunkedArray chunked_array({chunk, chunk});
 
-      static const char* expected =
-          R"expected([[1 | 2 | null | 4 | null],[1 | 2 | null | 4 | null]])expected";
+    static const char* expected =
+        R"expected([[1 | 2 | null | 4 | null],[1 | 2 | null | 4 | null]])expected";
 
-      CheckStream(chunked_array, options, expected);
+    CheckStream(chunked_array, options, expected);
   }
 }
 

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 3, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 3, 2023
@kevingurney
Copy link
Member Author

Thanks for providing the patch @kou! I've applied it and made some edits to the tests based on your suggestions.

`chunked_array_delimiters` properties to `PrettyPrintOptions` struct
based on @kou's suggestion.
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 4, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Oct 4, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Oct 4, 2023
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Thanks! Could you update the description before we merge this?

@github-actions github-actions bot added awaiting merge Awaiting merge awaiting changes Awaiting changes and removed awaiting change review Awaiting change review awaiting merge Awaiting merge labels Oct 4, 2023
@kevingurney
Copy link
Member Author

I have updated the description. Thanks for the reminder!

At this point, I think this is ready to be merged, so I will go ahead and merge it.

Thanks very much for your help @kou!

@kevingurney
Copy link
Member Author

+1

@kevingurney kevingurney merged commit f5e40dc into apache:main Oct 5, 2023
42 checks passed
@kevingurney kevingurney deleted the GH-37978 branch October 5, 2023 16:25
@kevingurney kevingurney removed the awaiting changes Awaiting changes label Oct 5, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit f5e40dc.

There were 6 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 6 possible false positives for unstable benchmarks that are known to sometimes produce them.

@kevingurney
Copy link
Member Author

kevingurney commented Oct 6, 2023

This PR appears to have triggered a regression, but I am not clear why that would be. I wouldn't have expected the overhead of accessing a nested property on PrettyPrintOptions to be so high.

Do these benchmarks exercise the PrettyPrint code heavily?

I am new to interpreting Conbench results, so please let me know if there is more I can do to understand what is happening here.

@kou
Copy link
Member

kou commented Oct 7, 2023

I think that it's a false positive...

@kevingurney
Copy link
Member Author

OK, I will treat it as a false positive for the time being, in that case.

In the future, if we wanted, we could follow up with another PR to store the custom delimiter values in local variables inside of WriteValues. I'm not sure, but, there is a chance that might be slightly more efficient than repeatedly indexing into a nested struct.

jorisvandenbossche pushed a commit that referenced this pull request Oct 12, 2023
…losing delimiters to `arrow::PrettyPrintDelimiters` (#38187)

### Rationale for this change

This is a follow up to  #37981.

in order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array` more flexible, it would be useful to be able to specify a custom `Array` opening and closing delimiter other than `"["` and `"]"`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom opening and closing delimiter for `Array` would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

In order to support custom `Array` opening and closing delimiters, this pull request adds two new properties, `open` and `close`, to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct.

This enable use cases like the ability to display an `arrow::Array` as `<1,2,3>` instead of `[1,2,3]`, by setting `options.array_delimiters.open = "<"` and `options.array_delimiters.close = ">"`.

### What changes are included in this PR?

This pull request adds two new properties to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct:

1. `open` - the opening delimiter to use for an `Array` or `ChunkedArray` (default = `[`).
1. `close` - the closing delimiter to use for an `Array` or `ChunkedArray` (default = `]`).

### Are these changes tested?

Yes.

1. Added two new tests: (1) `ArrayCustomOpenCloseDelimiter` and (2) `ChunkedArrayCustomOpenCloseDelimiter`.
2. All existing tests related to `arrow::PrettyPrint` pass.

### Are there any user-facing changes?

Yes.

This pull request adds two new public, user-facing properties, (1) `open` (of type `std::string`) and (2) `close` (also of type `std::string`) to the `PrettyPrintDelimiters` struct. This enables client code to specify custom opening and closing delimiters to use when printing an `arrow::Array` or `arrow::ChunkedArray` by changing the values of the nested `open` and `close` properties of the `array_delimiters`/`chunked_array_delimiters` properties of `PrettyPrintOptions`.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.

### Future Directions

1. Now that client code can easily specify custom opening, closing, and element delimiters, it may make sense to address #30951.
* Closes: #37979

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
llama90 pushed a commit to llama90/arrow that referenced this pull request Oct 12, 2023
… and closing delimiters to `arrow::PrettyPrintDelimiters` (apache#38187)

### Rationale for this change

This is a follow up to  apache#37981.

in order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array` more flexible, it would be useful to be able to specify a custom `Array` opening and closing delimiter other than `"["` and `"]"`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom opening and closing delimiter for `Array` would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

In order to support custom `Array` opening and closing delimiters, this pull request adds two new properties, `open` and `close`, to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct.

This enable use cases like the ability to display an `arrow::Array` as `<1,2,3>` instead of `[1,2,3]`, by setting `options.array_delimiters.open = "<"` and `options.array_delimiters.close = ">"`.

### What changes are included in this PR?

This pull request adds two new properties to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct:

1. `open` - the opening delimiter to use for an `Array` or `ChunkedArray` (default = `[`).
1. `close` - the closing delimiter to use for an `Array` or `ChunkedArray` (default = `]`).

### Are these changes tested?

Yes.

1. Added two new tests: (1) `ArrayCustomOpenCloseDelimiter` and (2) `ChunkedArrayCustomOpenCloseDelimiter`.
2. All existing tests related to `arrow::PrettyPrint` pass.

### Are there any user-facing changes?

Yes.

This pull request adds two new public, user-facing properties, (1) `open` (of type `std::string`) and (2) `close` (also of type `std::string`) to the `PrettyPrintDelimiters` struct. This enables client code to specify custom opening and closing delimiters to use when printing an `arrow::Array` or `arrow::ChunkedArray` by changing the values of the nested `open` and `close` properties of the `array_delimiters`/`chunked_array_delimiters` properties of `PrettyPrintOptions`.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.

### Future Directions

1. Now that client code can easily specify custom opening, closing, and element delimiters, it may make sense to address apache#30951.
* Closes: apache#37979

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
… delimiter to `arrow::PrettyPrintOptions` (apache#37981)

### Rationale for this change

In order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array`/`arrow::ChunkedArray` more flexible, it would be useful to be able to specify a custom element delimiter other than `","`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom `Array` element delimiter, would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

For the MATLAB interface, we would like to enable display that looks something like the following (note the ` | ` between individual `Array` elements):

```matlab
% Make a MATLAB array.
>> A = 1:5

A =

     1     2     3     4     5

% Make an Arrow array from the MATLAB array.
>> B = arrow.array(A)

B = 

    [ 1 | 2 | 3 | 4 | 5 ]
```

In order to support custom `Array` element delimiters, this pull request adds a new `struct` type `PrettyPrintDelimiters`. The `PrettyPrintDelimiters` type has one property `element` (of type `std::string`), which allows client code to control the delimiter used to distinguish between individual elements of an `arrow::Array` / `arrow::ChunkedArray`. 

In a future pull request, we plan to add more properties like `open` and `close` to allow client code to specify the opening and closing delimiters to use when printing an `arrow::Array` / `arrow::ChunkedArray` (e.g. `"<"` rather than `"["` and `">"` rather than `"]"`).

### What changes are included in this PR?

1. Added a new `struct` type `PrettyPrintDelimiters` with one property `element` (of type `std::string`). The `element` property allows client code to specify any string value as the delimiter to distinguish between individual elements of an `arrow::Array` or `arrow::ChunkedArray` when printing using `arrow::PrettyPrint`.
2. Added two new properties to `arrow::PrettyPrintOptions` - (1) `array_delimiters` (of type `arrow::PrettyPrintDelimiters`) and `chunked_array_delimiters` (of type `arrow::PrettyPrintDelimiters`). These properties can be modified to customize how `arrow::Arrow`/`arrow::ChunkedArray` are printed when using `arrow::PrettyPrint`.

### Are these changes tested?

Yes.

1. Added new tests `ArrayCustomElementDelimiter` and `ChunkedArrayCustomElementDelimiter` to `pretty_print_test.cc`.
2. All existing `PrettyPrint`-related C++ tests pass.

### Are there any user-facing changes?

Yes.

1. User's can now specify a custom element delimiter to use when printing `arrow::Array`s / `arrow::ChunkedArray`s using  [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) by modifying the `array_delimiters` or `chunked_array_delimiters` properties of `arrow::PrettyPrintOptions`.

**Example**:

```cpp
auto array = ...;
auto stream = ...
arrow::PrettyPrintOptions options = arrow::PrettyPrintOptions::Defaults();
// Use " | " as the element-wise (element = scalar value) delimiter for arrow::Array.
options.array_delimiters.element = " | ";
// Use "';" as the element-wise (element = chunk) delimiter for arrow::ChunkedArray.
options.chunked_array_delimiters.element = ";";
arrow::PrettyPrint(array, options, stream);
```

### Future Directions

1. To keep this pull request small and focused, I intentionally chose not to include changes related to specifying custom opening and closing `Array` delimiters (e.g. use `<` and `>` instead of `[` and `]`). I've captured the idea of supporting custom opening and closing `Array` delimiters in apache#37979. I will follow up with a future PR to address this.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.
3. @ rok helpfully pointed out in apache#37978 (comment) that a similar attempt to modify the default `Array` element delimiter to be `", "` (note the space after the comma) was taken in apache#30951. However, this issue appears to have gone stale and the PR (apache#12420) that was opened also seems to have gone stale. If these changes get merged, it may make sense to close out this issue since this one seems to at least partially address it (although, it isn't exactly the same, since it doesn't change the default delimiter to be `", "`. However, for `PyArrow`, `array_delimiters.element` and `chunked_array_delimiters.element` could just be set to `", "` after merging these changes to change the default display if that is still desirable).
* Closes: apache#37978

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
… and closing delimiters to `arrow::PrettyPrintDelimiters` (apache#38187)

### Rationale for this change

This is a follow up to  apache#37981.

in order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array` more flexible, it would be useful to be able to specify a custom `Array` opening and closing delimiter other than `"["` and `"]"`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom opening and closing delimiter for `Array` would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

In order to support custom `Array` opening and closing delimiters, this pull request adds two new properties, `open` and `close`, to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct.

This enable use cases like the ability to display an `arrow::Array` as `<1,2,3>` instead of `[1,2,3]`, by setting `options.array_delimiters.open = "<"` and `options.array_delimiters.close = ">"`.

### What changes are included in this PR?

This pull request adds two new properties to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct:

1. `open` - the opening delimiter to use for an `Array` or `ChunkedArray` (default = `[`).
1. `close` - the closing delimiter to use for an `Array` or `ChunkedArray` (default = `]`).

### Are these changes tested?

Yes.

1. Added two new tests: (1) `ArrayCustomOpenCloseDelimiter` and (2) `ChunkedArrayCustomOpenCloseDelimiter`.
2. All existing tests related to `arrow::PrettyPrint` pass.

### Are there any user-facing changes?

Yes.

This pull request adds two new public, user-facing properties, (1) `open` (of type `std::string`) and (2) `close` (also of type `std::string`) to the `PrettyPrintDelimiters` struct. This enables client code to specify custom opening and closing delimiters to use when printing an `arrow::Array` or `arrow::ChunkedArray` by changing the values of the nested `open` and `close` properties of the `array_delimiters`/`chunked_array_delimiters` properties of `PrettyPrintOptions`.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.

### Future Directions

1. Now that client code can easily specify custom opening, closing, and element delimiters, it may make sense to address apache#30951.
* Closes: apache#37979

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
… delimiter to `arrow::PrettyPrintOptions` (apache#37981)

### Rationale for this change

In order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array`/`arrow::ChunkedArray` more flexible, it would be useful to be able to specify a custom element delimiter other than `","`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom `Array` element delimiter, would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

For the MATLAB interface, we would like to enable display that looks something like the following (note the ` | ` between individual `Array` elements):

```matlab
% Make a MATLAB array.
>> A = 1:5

A =

     1     2     3     4     5

% Make an Arrow array from the MATLAB array.
>> B = arrow.array(A)

B = 

    [ 1 | 2 | 3 | 4 | 5 ]
```

In order to support custom `Array` element delimiters, this pull request adds a new `struct` type `PrettyPrintDelimiters`. The `PrettyPrintDelimiters` type has one property `element` (of type `std::string`), which allows client code to control the delimiter used to distinguish between individual elements of an `arrow::Array` / `arrow::ChunkedArray`. 

In a future pull request, we plan to add more properties like `open` and `close` to allow client code to specify the opening and closing delimiters to use when printing an `arrow::Array` / `arrow::ChunkedArray` (e.g. `"<"` rather than `"["` and `">"` rather than `"]"`).

### What changes are included in this PR?

1. Added a new `struct` type `PrettyPrintDelimiters` with one property `element` (of type `std::string`). The `element` property allows client code to specify any string value as the delimiter to distinguish between individual elements of an `arrow::Array` or `arrow::ChunkedArray` when printing using `arrow::PrettyPrint`.
2. Added two new properties to `arrow::PrettyPrintOptions` - (1) `array_delimiters` (of type `arrow::PrettyPrintDelimiters`) and `chunked_array_delimiters` (of type `arrow::PrettyPrintDelimiters`). These properties can be modified to customize how `arrow::Arrow`/`arrow::ChunkedArray` are printed when using `arrow::PrettyPrint`.

### Are these changes tested?

Yes.

1. Added new tests `ArrayCustomElementDelimiter` and `ChunkedArrayCustomElementDelimiter` to `pretty_print_test.cc`.
2. All existing `PrettyPrint`-related C++ tests pass.

### Are there any user-facing changes?

Yes.

1. User's can now specify a custom element delimiter to use when printing `arrow::Array`s / `arrow::ChunkedArray`s using  [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) by modifying the `array_delimiters` or `chunked_array_delimiters` properties of `arrow::PrettyPrintOptions`.

**Example**:

```cpp
auto array = ...;
auto stream = ...
arrow::PrettyPrintOptions options = arrow::PrettyPrintOptions::Defaults();
// Use " | " as the element-wise (element = scalar value) delimiter for arrow::Array.
options.array_delimiters.element = " | ";
// Use "';" as the element-wise (element = chunk) delimiter for arrow::ChunkedArray.
options.chunked_array_delimiters.element = ";";
arrow::PrettyPrint(array, options, stream);
```

### Future Directions

1. To keep this pull request small and focused, I intentionally chose not to include changes related to specifying custom opening and closing `Array` delimiters (e.g. use `<` and `>` instead of `[` and `]`). I've captured the idea of supporting custom opening and closing `Array` delimiters in apache#37979. I will follow up with a future PR to address this.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.
3. @ rok helpfully pointed out in apache#37978 (comment) that a similar attempt to modify the default `Array` element delimiter to be `", "` (note the space after the comma) was taken in apache#30951. However, this issue appears to have gone stale and the PR (apache#12420) that was opened also seems to have gone stale. If these changes get merged, it may make sense to close out this issue since this one seems to at least partially address it (although, it isn't exactly the same, since it doesn't change the default delimiter to be `", "`. However, for `PyArrow`, `array_delimiters.element` and `chunked_array_delimiters.element` could just be set to `", "` after merging these changes to change the default display if that is still desirable).
* Closes: apache#37978

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
… and closing delimiters to `arrow::PrettyPrintDelimiters` (apache#38187)

### Rationale for this change

This is a follow up to  apache#37981.

in order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array` more flexible, it would be useful to be able to specify a custom `Array` opening and closing delimiter other than `"["` and `"]"`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom opening and closing delimiter for `Array` would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

In order to support custom `Array` opening and closing delimiters, this pull request adds two new properties, `open` and `close`, to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct.

This enable use cases like the ability to display an `arrow::Array` as `<1,2,3>` instead of `[1,2,3]`, by setting `options.array_delimiters.open = "<"` and `options.array_delimiters.close = ">"`.

### What changes are included in this PR?

This pull request adds two new properties to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct:

1. `open` - the opening delimiter to use for an `Array` or `ChunkedArray` (default = `[`).
1. `close` - the closing delimiter to use for an `Array` or `ChunkedArray` (default = `]`).

### Are these changes tested?

Yes.

1. Added two new tests: (1) `ArrayCustomOpenCloseDelimiter` and (2) `ChunkedArrayCustomOpenCloseDelimiter`.
2. All existing tests related to `arrow::PrettyPrint` pass.

### Are there any user-facing changes?

Yes.

This pull request adds two new public, user-facing properties, (1) `open` (of type `std::string`) and (2) `close` (also of type `std::string`) to the `PrettyPrintDelimiters` struct. This enables client code to specify custom opening and closing delimiters to use when printing an `arrow::Array` or `arrow::ChunkedArray` by changing the values of the nested `open` and `close` properties of the `array_delimiters`/`chunked_array_delimiters` properties of `PrettyPrintOptions`.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.

### Future Directions

1. Now that client code can easily specify custom opening, closing, and element delimiters, it may make sense to address apache#30951.
* Closes: apache#37979

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
… delimiter to `arrow::PrettyPrintOptions` (apache#37981)

### Rationale for this change

In order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array`/`arrow::ChunkedArray` more flexible, it would be useful to be able to specify a custom element delimiter other than `","`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom `Array` element delimiter, would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

For the MATLAB interface, we would like to enable display that looks something like the following (note the ` | ` between individual `Array` elements):

```matlab
% Make a MATLAB array.
>> A = 1:5

A =

     1     2     3     4     5

% Make an Arrow array from the MATLAB array.
>> B = arrow.array(A)

B = 

    [ 1 | 2 | 3 | 4 | 5 ]
```

In order to support custom `Array` element delimiters, this pull request adds a new `struct` type `PrettyPrintDelimiters`. The `PrettyPrintDelimiters` type has one property `element` (of type `std::string`), which allows client code to control the delimiter used to distinguish between individual elements of an `arrow::Array` / `arrow::ChunkedArray`. 

In a future pull request, we plan to add more properties like `open` and `close` to allow client code to specify the opening and closing delimiters to use when printing an `arrow::Array` / `arrow::ChunkedArray` (e.g. `"<"` rather than `"["` and `">"` rather than `"]"`).

### What changes are included in this PR?

1. Added a new `struct` type `PrettyPrintDelimiters` with one property `element` (of type `std::string`). The `element` property allows client code to specify any string value as the delimiter to distinguish between individual elements of an `arrow::Array` or `arrow::ChunkedArray` when printing using `arrow::PrettyPrint`.
2. Added two new properties to `arrow::PrettyPrintOptions` - (1) `array_delimiters` (of type `arrow::PrettyPrintDelimiters`) and `chunked_array_delimiters` (of type `arrow::PrettyPrintDelimiters`). These properties can be modified to customize how `arrow::Arrow`/`arrow::ChunkedArray` are printed when using `arrow::PrettyPrint`.

### Are these changes tested?

Yes.

1. Added new tests `ArrayCustomElementDelimiter` and `ChunkedArrayCustomElementDelimiter` to `pretty_print_test.cc`.
2. All existing `PrettyPrint`-related C++ tests pass.

### Are there any user-facing changes?

Yes.

1. User's can now specify a custom element delimiter to use when printing `arrow::Array`s / `arrow::ChunkedArray`s using  [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) by modifying the `array_delimiters` or `chunked_array_delimiters` properties of `arrow::PrettyPrintOptions`.

**Example**:

```cpp
auto array = ...;
auto stream = ...
arrow::PrettyPrintOptions options = arrow::PrettyPrintOptions::Defaults();
// Use " | " as the element-wise (element = scalar value) delimiter for arrow::Array.
options.array_delimiters.element = " | ";
// Use "';" as the element-wise (element = chunk) delimiter for arrow::ChunkedArray.
options.chunked_array_delimiters.element = ";";
arrow::PrettyPrint(array, options, stream);
```

### Future Directions

1. To keep this pull request small and focused, I intentionally chose not to include changes related to specifying custom opening and closing `Array` delimiters (e.g. use `<` and `>` instead of `[` and `]`). I've captured the idea of supporting custom opening and closing `Array` delimiters in apache#37979. I will follow up with a future PR to address this.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.
3. @ rok helpfully pointed out in apache#37978 (comment) that a similar attempt to modify the default `Array` element delimiter to be `", "` (note the space after the comma) was taken in apache#30951. However, this issue appears to have gone stale and the PR (apache#12420) that was opened also seems to have gone stale. If these changes get merged, it may make sense to close out this issue since this one seems to at least partially address it (although, it isn't exactly the same, since it doesn't change the default delimiter to be `", "`. However, for `PyArrow`, `array_delimiters.element` and `chunked_array_delimiters.element` could just be set to `", "` after merging these changes to change the default display if that is still desirable).
* Closes: apache#37978

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
… and closing delimiters to `arrow::PrettyPrintDelimiters` (apache#38187)

### Rationale for this change

This is a follow up to  apache#37981.

in order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array` more flexible, it would be useful to be able to specify a custom `Array` opening and closing delimiter other than `"["` and `"]"`.

For example, the MATLAB interface wraps the Arrow C++ libraries and being able to specify a custom opening and closing delimiter for `Array` would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

In order to support custom `Array` opening and closing delimiters, this pull request adds two new properties, `open` and `close`, to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct.

This enable use cases like the ability to display an `arrow::Array` as `<1,2,3>` instead of `[1,2,3]`, by setting `options.array_delimiters.open = "<"` and `options.array_delimiters.close = ">"`.

### What changes are included in this PR?

This pull request adds two new properties to the [`arrow::PrettyPrintDelimiters`](https://github.com/apache/arrow/blob/c37059ad7b87f0cbb681f6388aca0e3f02860351/cpp/src/arrow/pretty_print.h#L38) struct:

1. `open` - the opening delimiter to use for an `Array` or `ChunkedArray` (default = `[`).
1. `close` - the closing delimiter to use for an `Array` or `ChunkedArray` (default = `]`).

### Are these changes tested?

Yes.

1. Added two new tests: (1) `ArrayCustomOpenCloseDelimiter` and (2) `ChunkedArrayCustomOpenCloseDelimiter`.
2. All existing tests related to `arrow::PrettyPrint` pass.

### Are there any user-facing changes?

Yes.

This pull request adds two new public, user-facing properties, (1) `open` (of type `std::string`) and (2) `close` (also of type `std::string`) to the `PrettyPrintDelimiters` struct. This enables client code to specify custom opening and closing delimiters to use when printing an `arrow::Array` or `arrow::ChunkedArray` by changing the values of the nested `open` and `close` properties of the `array_delimiters`/`chunked_array_delimiters` properties of `PrettyPrintOptions`.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow related classes in the MATLAB interface, but it is hopefully a generic enough change that it may benefit other use cases too.

### Future Directions

1. Now that client code can easily specify custom opening, closing, and element delimiters, it may make sense to address apache#30951.
* Closes: apache#37979

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[C++] Add support for specifying custom Array element delimiter to arrow::PrettyPrintOptions
2 participants