Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Go] NewTable() and NewTableFromRecords() should return values using arrow.Table interface, not internal *simpleTable (etc.) #40261

Closed
gtomitsuka opened this issue Feb 27, 2024 · 1 comment

Comments

@gtomitsuka
Copy link
Contributor

gtomitsuka commented Feb 27, 2024

Describe the enhancement requested

Currently, the definitions in Go's arrow/array are inconsistent. Most table-related methods (e.g., AddColumn() or TableFromJSON()) return tables via the arrow.Table interface, while NewTable, NewTableFromSlice, and NewTableFromColumns return a *simpleTable.

While *simpleTable conforms to the arrow.Table interface, this makes typings harder as it's not immediately obvious whether *simpleTable conforms to arrow.Table, *arrow.Table, or any interface at all; you have to actively look through the types and figure it out yourself, leading to significant wasted time as a developer.
You can't use *simpleTable in function signatures or variables with explicit types, so it's a meaningful DX improvement, leading to better docs and IDE support for Arrow.

Upon further inspection, the same issue applies to two other functions from arrow/array:

  • NewRecord returns *simpleRecord instead of arrow.Record
  • NewRecordReader returns *simpleRecords instead of array.RecordReader

I made those changes locally on a large project and tested it, so I'm confident the changes would be purely cosmetic, not impacting any functionality (which intuitively should be true, since *simpleTable isn't exposed and conforms to all arrow.Table interfaces), though the unit tests should catch that too

I would like to take on this issue and work on it myself, given that contributors agree this would be a meaningful task to work on.

Thank you!

Component(s)

Go

@gtomitsuka gtomitsuka changed the title [Go] NewTable() and NewTableFromRecords() should return values using arrow.Table interface, not internal *simpleTable [Go] NewTable() and NewTableFromRecords() should return values using arrow.Table interface, not internal *simpleTable (etc.) Feb 27, 2024
zeroshade pushed a commit that referenced this issue Feb 29, 2024
…es (#40272)

### Rationale for this change
Exposing functions that return unexposed types in Go is considered poor practice. This approach complicates type handling, making it challenging for developers to utilize these return values in their functions. Developers must undertake the cumbersome process of identifying the applicable interface for the return type, a task that often results in significant time consumption and leads to confusing, non-informative types being suggested by godocs and IDEs.

Consider the difficulty in discerning the relationship between two return types, `*simpleTable` and `arrow.Table`, at a glance. It is not immediately clear whether they implement the same interface or are distinct entities:

<img width="1161" alt="image" src="https://github.com/apache/arrow/assets/10295671/463cd8a7-47f3-44ce-9871-2885025e5a5c">
<img width="1151" alt="image" src="https://github.com/apache/arrow/assets/10295671/4ffc049c-fb88-43fb-bd57-fc1ad5d4dc68">

Returning exposed interfaces is already commonly done in the Arrow package to ensure API consistency and usability, as evidenced in methods like `AddColumn() -> arrow.Table` and `RecordFromJSON() -> arrow.Record`. Extending this to all functions, including `NewTable`, `NewTableFromSlice`, and `NewRecord`, will standardize the codebase in line with these principles.

The use of `*simpleTable` and similar types is restricted in explicit type declarations and function signatures. Therefore, transitioning to exposed return types is a backward-compatible improvement that will lead to enhanced documentation and better support in IDEs for Arrow users.

### What changes are included in this PR?
* Change return signature of functions using the following unexposed return types:
    * `*simpleTable` --> `arrow.Table`
    * `*simpleRecord` --> `arrow.Record`
    * `*simpleRecords` --> `array.RecordReader`
* Add the function `String()`, which is implemented by `*simpleTable`, to the `arrow.Table` interface. `*simpleTable` is the only implementation of `arrow.Table`, so this requires no further changes. 

### Are these changes tested?
Yes. The relevant code is already covered by tests in `arrow/array/table_test.go` (`TestTable`) and `arrow/array/record_test.go` (`TestRecord`, `TestRecordReader`).

All tests pass (subpackages without tests omitted):
```bash
ok      github.com/apache/arrow/go/v16/arrow    0.398s
ok      github.com/apache/arrow/go/v16/arrow/array      0.600s
ok      github.com/apache/arrow/go/v16/arrow/arrio      1.544s
ok      github.com/apache/arrow/go/v16/arrow/avro       0.629s
ok      github.com/apache/arrow/go/v16/arrow/bitutil    1.001s
ok      github.com/apache/arrow/go/v16/arrow/compute    2.147s
ok      github.com/apache/arrow/go/v16/arrow/compute/exec       0.813s
ok      github.com/apache/arrow/go/v16/arrow/compute/exprs      1.900s
ok      github.com/apache/arrow/go/v16/arrow/csv        0.288s
ok      github.com/apache/arrow/go/v16/arrow/decimal128 1.356s
ok      github.com/apache/arrow/go/v16/arrow/decimal256 1.718s
ok      github.com/apache/arrow/go/v16/arrow/encoded    0.493s
ok      github.com/apache/arrow/go/v16/arrow/flight     2.845s
ok      github.com/apache/arrow/go/v16/arrow/flight/flightsql   0.512s
ok      github.com/apache/arrow/go/v16/arrow/flight/flightsql/driver    7.386s
ok      github.com/apache/arrow/go/v16/arrow/float16    0.570s
ok      github.com/apache/arrow/go/v16/arrow/internal/arrjson   0.419s
ok      github.com/apache/arrow/go/v16/arrow/internal/dictutils 0.407s
ok      github.com/apache/arrow/go/v16/arrow/internal/testing/tools     0.247s
ok      github.com/apache/arrow/go/v16/arrow/ipc        1.984s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-cat  0.530s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-file-to-stream       1.267s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-json-integration-test        1.074s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-ls   1.263s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-stream-to-file       0.935s
ok      github.com/apache/arrow/go/v16/arrow/math       0.616s
ok      github.com/apache/arrow/go/v16/arrow/memory     1.275s
ok      github.com/apache/arrow/go/v16/arrow/memory/mallocator  0.348s
ok      github.com/apache/arrow/go/v16/arrow/scalar     0.484s
ok      github.com/apache/arrow/go/v16/arrow/tensor     0.418s
ok      github.com/apache/arrow/go/v16/arrow/util       0.621s
```

### Are there any user-facing changes?
This update will not affect users directly since the modified interfaces were previously unexposed and, consequently, inaccessible from user code. This change is aimed at improving the developer experience by simplifying type usage and enhancing documentation clarity.

* GitHub Issue: #40261

Authored-by: gtomitsuka <g@gtomitsuka.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
@zeroshade zeroshade added this to the 16.0.0 milestone Feb 29, 2024
@zeroshade
Copy link
Member

Issue resolved by pull request 40272
#40272

thisisnic pushed a commit to thisisnic/arrow that referenced this issue Mar 8, 2024
…rn types (apache#40272)

### Rationale for this change
Exposing functions that return unexposed types in Go is considered poor practice. This approach complicates type handling, making it challenging for developers to utilize these return values in their functions. Developers must undertake the cumbersome process of identifying the applicable interface for the return type, a task that often results in significant time consumption and leads to confusing, non-informative types being suggested by godocs and IDEs.

Consider the difficulty in discerning the relationship between two return types, `*simpleTable` and `arrow.Table`, at a glance. It is not immediately clear whether they implement the same interface or are distinct entities:

<img width="1161" alt="image" src="https://github.com/apache/arrow/assets/10295671/463cd8a7-47f3-44ce-9871-2885025e5a5c">
<img width="1151" alt="image" src="https://github.com/apache/arrow/assets/10295671/4ffc049c-fb88-43fb-bd57-fc1ad5d4dc68">

Returning exposed interfaces is already commonly done in the Arrow package to ensure API consistency and usability, as evidenced in methods like `AddColumn() -> arrow.Table` and `RecordFromJSON() -> arrow.Record`. Extending this to all functions, including `NewTable`, `NewTableFromSlice`, and `NewRecord`, will standardize the codebase in line with these principles.

The use of `*simpleTable` and similar types is restricted in explicit type declarations and function signatures. Therefore, transitioning to exposed return types is a backward-compatible improvement that will lead to enhanced documentation and better support in IDEs for Arrow users.

### What changes are included in this PR?
* Change return signature of functions using the following unexposed return types:
    * `*simpleTable` --> `arrow.Table`
    * `*simpleRecord` --> `arrow.Record`
    * `*simpleRecords` --> `array.RecordReader`
* Add the function `String()`, which is implemented by `*simpleTable`, to the `arrow.Table` interface. `*simpleTable` is the only implementation of `arrow.Table`, so this requires no further changes. 

### Are these changes tested?
Yes. The relevant code is already covered by tests in `arrow/array/table_test.go` (`TestTable`) and `arrow/array/record_test.go` (`TestRecord`, `TestRecordReader`).

All tests pass (subpackages without tests omitted):
```bash
ok      github.com/apache/arrow/go/v16/arrow    0.398s
ok      github.com/apache/arrow/go/v16/arrow/array      0.600s
ok      github.com/apache/arrow/go/v16/arrow/arrio      1.544s
ok      github.com/apache/arrow/go/v16/arrow/avro       0.629s
ok      github.com/apache/arrow/go/v16/arrow/bitutil    1.001s
ok      github.com/apache/arrow/go/v16/arrow/compute    2.147s
ok      github.com/apache/arrow/go/v16/arrow/compute/exec       0.813s
ok      github.com/apache/arrow/go/v16/arrow/compute/exprs      1.900s
ok      github.com/apache/arrow/go/v16/arrow/csv        0.288s
ok      github.com/apache/arrow/go/v16/arrow/decimal128 1.356s
ok      github.com/apache/arrow/go/v16/arrow/decimal256 1.718s
ok      github.com/apache/arrow/go/v16/arrow/encoded    0.493s
ok      github.com/apache/arrow/go/v16/arrow/flight     2.845s
ok      github.com/apache/arrow/go/v16/arrow/flight/flightsql   0.512s
ok      github.com/apache/arrow/go/v16/arrow/flight/flightsql/driver    7.386s
ok      github.com/apache/arrow/go/v16/arrow/float16    0.570s
ok      github.com/apache/arrow/go/v16/arrow/internal/arrjson   0.419s
ok      github.com/apache/arrow/go/v16/arrow/internal/dictutils 0.407s
ok      github.com/apache/arrow/go/v16/arrow/internal/testing/tools     0.247s
ok      github.com/apache/arrow/go/v16/arrow/ipc        1.984s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-cat  0.530s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-file-to-stream       1.267s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-json-integration-test        1.074s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-ls   1.263s
ok      github.com/apache/arrow/go/v16/arrow/ipc/cmd/arrow-stream-to-file       0.935s
ok      github.com/apache/arrow/go/v16/arrow/math       0.616s
ok      github.com/apache/arrow/go/v16/arrow/memory     1.275s
ok      github.com/apache/arrow/go/v16/arrow/memory/mallocator  0.348s
ok      github.com/apache/arrow/go/v16/arrow/scalar     0.484s
ok      github.com/apache/arrow/go/v16/arrow/tensor     0.418s
ok      github.com/apache/arrow/go/v16/arrow/util       0.621s
```

### Are there any user-facing changes?
This update will not affect users directly since the modified interfaces were previously unexposed and, consequently, inaccessible from user code. This change is aimed at improving the developer experience by simplifying type usage and enhancing documentation clarity.

* GitHub Issue: apache#40261

Authored-by: gtomitsuka <g@gtomitsuka.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants