[C++] Add support for specifying custom Array element delimiter to arrow::PrettyPrintOptions #37978

Closed
kevingurney opened this issue Oct 2, 2023 · 3 comments · Fixed by #37981


@kevingurney
Member

kevingurney commented Oct 2, 2023

Describe the enhancement requested

In order to make the arrow::PrettyPrint functionality for arrow::Array more flexible, it would be useful to be able to specify a custom Array element delimiter other than ",".

For example, the MATLAB interface wraps the Arrow C++ libraries, and being able to specify a custom Array element delimiter would make it possible to make the display of MATLAB arrow.array.Array objects more MATLAB-like.

In order to support custom Array element delimiters, we could add a new option to the arrow::PrettyPrintOptions struct named something like array_element_delimiter with type std::string.

Implementing this would, for example, make it possible to display an arrow::Array that normally displays as [1,2,3] as [1 | 2 | 3] instead, by setting array_element_delimiter = " | ".
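To make the proposal concrete, the following Arrow-free C++ sketch shows how a single `array_element_delimiter` option (the name suggested above) could drive single-line rendering. `ProposedPrintOptions` and `FormatArray` are hypothetical names, not Arrow types, and the sketch deliberately leaves out what the real `arrow::PrettyPrint` also handles (indentation, newlines, windowing of long arrays, nulls).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for arrow::PrettyPrintOptions that carries only the
// option proposed in this issue; it is not the real Arrow type.
struct ProposedPrintOptions {
  std::string array_element_delimiter = ",";
};

// Renders the values on a single line as "[v0<delim>v1<delim>...]".
// The delimiter is written only *between* elements, never after the last one.
std::string FormatArray(const std::vector<int64_t>& values,
                        const ProposedPrintOptions& options) {
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (i > 0) out << options.array_element_delimiter;
    out << values[i];
  }
  out << "]";
  return out.str();
}
```

With default options, `FormatArray({1, 2, 3}, options)` returns `[1,2,3]`; after setting `options.array_element_delimiter = " | "` it returns `[1 | 2 | 3]`. Emitting the delimiter only before elements with index greater than zero avoids a trailing delimiter.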

Component(s)

C++

@rok
Member

rok commented Oct 2, 2023

This is related but stalled: #30951.

@kevingurney
Member Author

@rok - thank you for sharing this! My apologies - I didn't see this issue before.

I just opened #37981 to add support for specifying a custom Array element delimiter. If I am not mistaken, it looks like #30951 took the approach of trying to change the default delimiter to be ", " instead of just ",". My approach is slightly different in that it adds a new property to arrow::PrettyPrintOptions which allows clients to decide what they would like to use for the delimiter rather than trying to modify the default delimiter.

If it seems OK, I will proceed with #37978 and #37981 for now to see whether the community is comfortable with this approach. If these changes end up getting merged, it may make sense to close #30951, since #37981 partially addresses it (although, admittedly, not exactly, since it doesn't change the default display to include a space after the comma).

@rok
Member

rok commented Oct 2, 2023

@kevingurney Thank you for the extensive and thoughtful explanation of your proposal! :D
I think it's best you proceed with your proposal and indeed close out #30951 once your work is merged.

kevingurney added a commit that referenced this issue Oct 5, 2023
…iter to `arrow::PrettyPrintOptions` (#37981)

### Rationale for this change

In order to make the [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) functionality for `arrow::Array`/`arrow::ChunkedArray` more flexible, it would be useful to be able to specify a custom element delimiter other than `","`.

For example, the MATLAB interface wraps the Arrow C++ libraries, and being able to specify a custom `Array` element delimiter would make it possible to make the display of MATLAB `arrow.array.Array` objects more MATLAB-like.

For the MATLAB interface, we would like to enable display that looks something like the following (note the ` | ` between individual `Array` elements):

```matlab
% Make a MATLAB array.
>> A = 1:5

A =

     1     2     3     4     5

% Make an Arrow array from the MATLAB array.
>> B = arrow.array(A)

B = 

    [ 1 | 2 | 3 | 4 | 5 ]
```

In order to support custom `Array` element delimiters, this pull request adds a new `struct` type `PrettyPrintDelimiters`. The `PrettyPrintDelimiters` type has one property `element` (of type `std::string`), which allows client code to control the delimiter used to distinguish between individual elements of an `arrow::Array` / `arrow::ChunkedArray`. 

In a future pull request, we plan to add more properties like `open` and `close` to allow client code to specify the opening and closing delimiters to use when printing an `arrow::Array` / `arrow::ChunkedArray` (e.g. `"<"` rather than `"["` and `">"` rather than `"]"`).

### What changes are included in this PR?

1. Added a new `struct` type `PrettyPrintDelimiters` with one property `element` (of type `std::string`). The `element` property allows client code to specify any string value as the delimiter to distinguish between individual elements of an `arrow::Array` or `arrow::ChunkedArray` when printing using `arrow::PrettyPrint`.
2. Added two new properties to `arrow::PrettyPrintOptions`: `array_delimiters` and `chunked_array_delimiters` (both of type `arrow::PrettyPrintDelimiters`). These properties can be modified to customize how `arrow::Array`/`arrow::ChunkedArray` objects are printed when using `arrow::PrettyPrint`.
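The two-level behavior described in the list above can be sketched without Arrow as follows. `DelimitersSketch`, `OptionsSketch`, `Join`, and `FormatChunked` are hypothetical stand-ins that only mirror the shape of `PrettyPrintDelimiters` and the new `array_delimiters`/`chunked_array_delimiters` fields; the real printer also emits newlines and indentation, which this single-line sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mirrors of the types described above; not the real Arrow API.
struct DelimitersSketch {
  std::string element = ",";  // written between consecutive items
};

struct OptionsSketch {
  DelimitersSketch array_delimiters;          // between values within a chunk
  DelimitersSketch chunked_array_delimiters;  // between chunks
};

// Joins already-formatted items as "[item0<delim>item1<delim>...]".
std::string Join(const std::vector<std::string>& items,
                 const std::string& delimiter) {
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0) out << delimiter;
    out << items[i];
  }
  out << "]";
  return out.str();
}

// A "chunked array" here is simply a vector of chunks of int64 values.
// Each chunk is rendered with the array delimiter, then the rendered chunks
// are joined with the chunked-array delimiter.
std::string FormatChunked(const std::vector<std::vector<int64_t>>& chunks,
                          const OptionsSketch& options) {
  std::vector<std::string> rendered_chunks;
  for (const auto& chunk : chunks) {
    std::vector<std::string> values;
    for (int64_t v : chunk) values.push_back(std::to_string(v));
    rendered_chunks.push_back(Join(values, options.array_delimiters.element));
  }
  return Join(rendered_chunks, options.chunked_array_delimiters.element);
}
```

For chunks `{1, 2}` and `{3}`, the defaults give `[[1,2],[3]]`, and setting `array_delimiters.element = " | "` and `chunked_array_delimiters.element = ";"` gives `[[1 | 2];[3]]`: the delimiter between values comes from `array_delimiters`, while the one between chunks comes from `chunked_array_delimiters`.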

### Are these changes tested?

Yes.

1. Added new tests `ArrayCustomElementDelimiter` and `ChunkedArrayCustomElementDelimiter` to `pretty_print_test.cc`.
2. All existing `PrettyPrint`-related C++ tests pass.
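The new tests themselves are not reproduced here. As a rough, hypothetical illustration of edge cases a custom-delimiter test might check (an empty array, a single element, a null slot), the sketch below uses a stand-in formatter, `FormatNullable`, in which `std::nullopt` plays the role of a null and is rendered as `"null"` (assumed here to match Arrow's default null representation). It requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical formatter, not Arrow's implementation. std::nullopt stands in
// for a null slot and is rendered using `null_rep`.
std::string FormatNullable(const std::vector<std::optional<int64_t>>& values,
                           const std::string& element_delimiter,
                           const std::string& null_rep = "null") {
  std::ostringstream out;
  out << "[";
  for (std::size_t i = 0; i < values.size(); ++i) {
    // The delimiter separates elements, so it never appears for 0 or 1 items.
    if (i > 0) out << element_delimiter;
    if (values[i].has_value()) {
      out << *values[i];
    } else {
      out << null_rep;
    }
  }
  out << "]";
  return out.str();
}
```

With `" | "` as the delimiter, an empty input renders as `[]`, a single value as `[1]`, and `{1, std::nullopt, 3}` as `[1 | null | 3]`.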

### Are there any user-facing changes?

Yes.

1. Users can now specify a custom element delimiter to use when printing `arrow::Array`s / `arrow::ChunkedArray`s using [`arrow::PrettyPrint`](https://github.com/apache/arrow/blob/7667b81bffcb5b361fab6d61c42ce396d98cc6e1/cpp/src/arrow/pretty_print.h#L101) by modifying the `array_delimiters` or `chunked_array_delimiters` properties of `arrow::PrettyPrintOptions`.

**Example**:

```cpp
std::shared_ptr<arrow::Array> array = ...;
std::ostream* stream = ...;
arrow::PrettyPrintOptions options = arrow::PrettyPrintOptions::Defaults();
// Use " | " as the element-wise (element = scalar value) delimiter for arrow::Array.
options.array_delimiters.element = " | ";
// Use ";" as the element-wise (element = chunk) delimiter for arrow::ChunkedArray.
options.chunked_array_delimiters.element = ";";
arrow::Status status = arrow::PrettyPrint(*array, options, stream);
```

### Future Directions

1. To keep this pull request small and focused, I intentionally chose not to include changes related to specifying custom opening and closing `Array` delimiters (e.g. use `<` and `>` instead of `[` and `]`). I've captured the idea of supporting custom opening and closing `Array` delimiters in #37979. I will follow up with a future PR to address this.

### Notes

1. This pull request was motivated by our desire to improve the display of Arrow-related classes in the MATLAB interface, but it is hopefully generic enough to benefit other use cases too.
2. @rok helpfully pointed out in #37978 (comment) that #30951 made a similar attempt to change the default `Array` element delimiter to `", "` (note the space after the comma). However, that issue appears to have gone stale, as has the PR opened for it (#12420). If these changes get merged, it may make sense to close #30951, since this PR at least partially addresses it. It is not exactly the same, because it does not change the default delimiter to `", "`. However, if that display is still desirable for `PyArrow`, `array_delimiters.element` and `chunked_array_delimiters.element` could simply be set to `", "` once these changes are merged.
* Closes: #37978

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney added this to the 14.0.0 milestone Oct 5, 2023
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024