[MATLAB] Add gateway arrow.array function to create Arrow Arrays from MATLAB data #36953

Closed
sgilmore10 opened this issue Jul 31, 2023 · 1 comment · Fixed by #36978

Comments

@sgilmore10
Member

Describe the enhancement requested

As discussed in #36855, we think it would be better to move the recommended APIs for the MATLAB Interface directly under the top-level arrow.* package. This should simplify the interface and make it easier for users to switch between language bindings. We have already moved the type convenience constructors to the arrow package. Now we want to add a gateway function that creates arrays, mirroring PyArrow. As part of this change, we will modify the array constructors to accept libmexclass.proxy.Proxy objects, just as the arrow.type.<Type> constructors already do.

Component(s)

MATLAB

@sgilmore10
Member Author

take

kevingurney pushed a commit that referenced this issue Aug 3, 2023
… Arrays from MATLAB data (#36978)

### Rationale for this change

As discussed in #36855, we think it would be better to move the recommended APIs for the MATLAB Interface directly under the top-level `arrow.*` package. This should simplify the interface and make it easier for users to switch between language bindings. We have already moved the `type` convenience constructors to the `arrow` package. Now we want to add a gateway function that creates arrays, mirroring `PyArrow`. As part of this change, we will modify the array constructors to accept `libmexclass.proxy.Proxy` objects, just as the `arrow.type.<Type>` constructors already do.

### What changes are included in this PR?

1. Added `arrow.array()` gateway function that can be used to construct arrays:

```matlab
>> arrowArray = arrow.array([1 2 3 4]);
>> class(arrowArray)

ans =

    'arrow.array.Float64Array'

>> arrowArray = arrow.array(["A" "B" "C"]);
>> class(arrowArray)

ans =

    'arrow.array.StringArray'

```
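The type-based selection shown above can be pictured as a dispatch on `class(data)`. The following is only a rough sketch of that idea, not the actual `arrow.array()` source: `pickArrayClass` is a hypothetical helper, the `double` and `string` rows come from the examples above, and the `datetime` row is an assumption inferred from the `TimestampArray.fromMATLAB` example in this PR.

```matlab
% Hypothetical illustration only; this is NOT the arrow.array() implementation.
% It needs no Arrow installation: it only maps MATLAB classes to class names.
inputs = {[1 2 3 4], ["A" "B" "C"], datetime(2023, 8, 1)};
for k = 1:numel(inputs)
    fprintf("%s -> %s\n", class(inputs{k}), pickArrayClass(inputs{k}));
end

function name = pickArrayClass(data)
    % Return the name of the Arrow array class a gateway might choose for DATA.
    switch class(data)
        case 'double'
            name = "arrow.array.Float64Array";
        case 'string'
            name = "arrow.array.StringArray";
        case 'datetime'
            % Assumption: inferred from the fromMATLAB example, not confirmed.
            name = "arrow.array.TimestampArray";
        otherwise
            error("sketch:UnsupportedType", ...
                "No mapping for MATLAB class '%s'.", class(data));
    end
end
```

When saved as a script, the local function has to stay at the end of the file, as it is here. A real gateway would call the chosen class's factory rather than return its name.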

2. Added a static `fromMATLAB()` method to all subclasses of `arrow.array.Array`:

```matlab
>> array = arrow.array.StringArray.fromMATLAB(["A" "B" "C"])

array = 

[
  "A",
  "B",
  "C"
]

>> array = arrow.array.TimestampArray.fromMATLAB(datetime(2023, 8, 1))

array = 

[
  2023-08-01 00:00:00.000000
]

```

As part of this change, users can no longer use the `arrow.array.Array` subclass constructors to create arrays. Instead, they can use either `arrow.array()` or the static `fromMATLAB` method.
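For code that previously called a subclass constructor with MATLAB data, the two replacement routes might look like the sketch below. It assumes the Arrow MATLAB interface is built and on the MATLAB path, and it uses only the calls described in this PR.

```matlab
% Assumes the Arrow MATLAB interface is built and on the MATLAB path.
data = [1 2 3 4];

% Option 1: the gateway function picks the array class from the input type.
viaGateway = arrow.array(data);

% Option 2: call the static factory on the array class you want.
viaFactory = arrow.array.Float64Array.fromMATLAB(data);

% Both routes should yield the same Arrow array class.
assert(isa(viaGateway, "arrow.array.Float64Array"));
assert(strcmp(class(viaGateway), class(viaFactory)));
```

The gateway is convenient when the default type mapping is acceptable. The static method makes the target class explicit at the call site.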

### Are these changes tested?

Updated the existing tests to account for the API changes and added the following new test classes:

1. arrow/internal/validate/tType.m
2. arrow/internal/validate/tShape.m
3. arrow/internal/validate/tRealNumeric.m
4. arrow/internal/validate/tNonsparse.m
5. arrow/internal/validate/tNumeric.m
6. arrow/array/tArray.m

### Are there any user-facing changes?

Yes. The constructors of all `arrow.array.Array` subclasses now accept a scalar `libmexclass.proxy.Proxy` object instead of MATLAB data. NOTE: The MATLAB interface is still under active development.

### Future Directions

1. In a follow-up PR, we plan to add a new name-value pair to `arrow.array()` called `Type`, which can be set to an `arrow.type.Type` object. This will let users specify what kind of Arrow array they would like to create from MATLAB data.

* Closes: #36953

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@kevingurney kevingurney added this to the 14.0.0 milestone Aug 3, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
… Arrow Arrays from MATLAB data (apache#36978)
