[MATLAB] Add arrow.array.ChunkedArray class #37448

Closed
sgilmore10 opened this issue Aug 29, 2023 · 1 comment · Fixed by #37525

@sgilmore10

Describe the enhancement requested

In order to add an arrow.tabular.Table class to the MATLAB Interface, we first need to add a MATLAB class representing arrow::ChunkedArrays. This is required because an arrow::Table is backed by a vector of arrow::ChunkedArrays, and the output of its column(int index) method is an arrow::ChunkedArray.

Component(s)

MATLAB

@sgilmore10

take

kou closed this as completed in #37525 Sep 3, 2023
kou pushed a commit that referenced this issue Sep 3, 2023
### Rationale for this change

In order to add an `arrow.tabular.Table` class to the MATLAB Interface, we first need to add a MATLAB class representing `arrow::ChunkedArray`s. This is required because an `arrow::Table` is backed by a vector of `arrow::ChunkedArray`s, and the output of its `column(int index)` method is an `arrow::ChunkedArray`.

### What changes are included in this PR?

1. Introduced a new class called `arrow.array.ChunkedArray`.
2. `arrow.array.ChunkedArray` has the following properties:
   1. `Type` - Data type of the `arrow.array.Array`s
   2. `Length` - Sum of the `arrow.array.Array` lengths
   3. `NumChunks` - Number of `arrow.array.Array`s
3. `arrow.array.ChunkedArray` has the following methods:
   1. `chunk(index)` - Returns the `arrow.array.Array` stored at the specified index
   2. `fromArrays(array1, array2, ..., arrayN, Type=type)` - Static method that creates a `ChunkedArray` from the arrays provided. If `Type` is provided, all arrays are expected to have the specified `Type`.

**Example Usage**

```matlab
>> a1 = arrow.array(1:100);
>> a2 = arrow.array(101:250);
>> a3 = arrow.array(251:300);

% Create a ChunkedArray from 3 Float64Arrays
>> c = arrow.array.ChunkedArray.fromArrays(a1, a2, a3)

c = 

  ChunkedArray with properties:

         Type: [1×1 arrow.type.Float64Type]
    NumChunks: 3
       Length: 300

% Extract the first chunk and compare it to a1
>> c1 = c.chunk(1);
>> tf = isequal(c1, a1)

tf =

  logical

   1

% Create an empty ChunkedArray by providing the Type name-value argument
>> c = arrow.array.ChunkedArray.fromArrays(Type=arrow.timestamp())

c = 

  ChunkedArray with properties:

         Type: [1×1 arrow.type.TimestampType]
    NumChunks: 0
       Length: 0

```
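The properties and methods listed above can be combined to walk a `ChunkedArray` chunk by chunk. The sketch below is illustrative only: it assumes `toMATLAB` is available on each chunk (used here to count elements without relying on a per-`Array` length property), assumes `arrow.int32()` as the type factory for the mismatch case, and does not assert any particular error message.

```matlab
% Build a ChunkedArray from three Float64Arrays (same setup as the example above)
a1 = arrow.array(1:100);
a2 = arrow.array(101:250);
a3 = arrow.array(251:300);
c = arrow.array.ChunkedArray.fromArrays(a1, a2, a3);

% Walk the chunks with chunk(index), which uses 1-based indexing
total = 0;
for ii = 1:c.NumChunks
    chunkArray = c.chunk(ii);
    % Assumption: toMATLAB converts the chunk to a MATLAB vector, so numel
    % gives the number of elements in this chunk
    total = total + numel(toMATLAB(chunkArray));
end

% Length is documented as the sum of the chunk lengths (100 + 150 + 50)
fprintf("Sum of chunk lengths: %d, c.Length: %d\n", total, c.Length);

% When Type is provided, every array is expected to have that Type,
% so mixing a Float64Array with an Int32 Type should be rejected
try
    arrow.array.ChunkedArray.fromArrays(a1, Type=arrow.int32());
catch err
    disp(err.message);
end
```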

### Are these changes tested?

Yes. I added a new test class called `tChunkedArray.m` that contains unit tests for the new class.

### Are there any user-facing changes?

Yes. Users can now create a `ChunkedArray` in the MATLAB Interface. 

### Future Directions

1. In this PR, we deliberately didn't include a convenience constructor function because we're not sure whether we want users to create `ChunkedArray`s themselves. We expect users will mostly encounter `ChunkedArray`s when extracting columns from `Table`s.
2. We will implement more methods on `ChunkedArray`, such as `flatten()` and `combineChunks()`.
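Until a built-in `combineChunks()` exists, a rough user-side approximation for numeric data might look like the sketch below. `combineChunksSketch` is a hypothetical helper, not part of the MATLAB Interface; it assumes each chunk supports `toMATLAB` and that `arrow.array` accepts the concatenated vector. Because it round-trips through MATLAB values, null handling is not guaranteed to be preserved. Save it as `combineChunksSketch.m`.

```matlab
function combined = combineChunksSketch(c)
% COMBINECHUNKSSKETCH Hypothetical helper (not part of the MATLAB Interface).
% Concatenates the chunks of a numeric arrow.array.ChunkedArray into a single
% arrow.array.Array by converting each chunk to MATLAB and back.
% Null values may not survive the round trip.
    if c.NumChunks == 0
        error("combineChunksSketch:EmptyInput", ...
            "Cannot combine a ChunkedArray with zero chunks.");
    end
    parts = cell(c.NumChunks, 1);
    for ii = 1:c.NumChunks
        % Force each chunk's data into a column so vertcat stacks them end to end
        parts{ii} = reshape(toMATLAB(c.chunk(ii)), [], 1);
    end
    combined = arrow.array(vertcat(parts{:}));
end
```

For example, `combineChunksSketch(arrow.array.ChunkedArray.fromArrays(arrow.array(1:3), arrow.array(4:5)))` should yield a single array holding the values 1 through 5.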
* Closes: #37448

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
kou added this to the 14.0.0 milestone Sep 3, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024