[MATLAB] Implement Feather V1 Reader using new MATLAB Interface APIs #37041

Closed
sgilmore10 opened this issue Aug 7, 2023 · 2 comments · Fixed by #37044

Comments

@sgilmore10
Member

Describe the enhancement requested

Now that we have the basic building blocks for tabular IO in the MATLAB Interface (Array, Schema, RecordBatch), we can implement a Feather V1 reader in terms of the new APIs.

This is the first in a series of issues in which we will work on replacing the legacy Feather V1 infrastructure with a new implementation that uses the MATLAB Interface APIs. A side effect of doing this work is that we can eventually delete a lot of legacy build infrastructure and code.

Component(s)

MATLAB

@sgilmore10
Member Author

take

@sgilmore10 sgilmore10 removed their assignment Aug 7, 2023
@sgilmore10
Member Author

I meant to take ownership of #37042. Unassigning myself.

@kevingurney kevingurney self-assigned this Aug 7, 2023
kevingurney pushed a commit that referenced this issue Aug 7, 2023
…face APIs (#37043)

### Rationale for this change

Now that we have the basic building blocks for tabular IO in the MATLAB Interface (`Array`, `Schema`, `RecordBatch`), we can implement a Feather V1 writer in terms of the new APIs.

This is the first in a series of pull requests in which we will work on replacing the legacy Feather V1 infrastructure with a new implementation that uses the MATLAB Interface APIs. A side effect of doing this work is that we can eventually delete a lot of legacy build infrastructure and code.

### What changes are included in this PR?

1. Added a new class called `arrow.internal.io.feather.Writer` which can be used to write Feather V1 files. It has one public property named `Filename` and one public method named `write`.

Below is an example of its usage:

```matlab
>> T = table([1; 2; 3], single([10; 11; 12]))

T =

  3×2 table

    Var1    Var2
    ____    ____

     1       10 
     2       11 
     3       12 

>> filename = "/tmp/table.feather";
>> writer = arrow.internal.io.feather.Writer(filename)

writer = 

  Writer with properties:

    Filename: "/tmp/table.feather"

>> writer.write(T);

```

2. Added an `unwrap` method to `proxy::RecordBatch` so that the `FeatherWriter::write` method can access the underlying `RecordBatch` from the proxy.
3. Changed the `SetAccess` and `GetAccess` of the `Proxy` property on `arrow.tabular.RecordBatch` to `private` and `public`, respectively.

### Are these changes tested?

Yes, added a new test file called `tRoundTrip.m` in the `matlab/test/arrow/io/feather` folder. 

### Are there any user-facing changes?

No. 

### Future Directions

1. Add a new class for reading Feather V1 files (See #37041).
2. Integrate this class in the public `featherwrite` function.
3. Once this class is integrated with `featherwrite`, we can delete the legacy build infrastructure and source code.
* Closes: #37042 

Authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney added a commit that referenced this issue Aug 7, 2023
…face APIs (#37044)

### Rationale for this change

Now that we have the basic building blocks for tabular IO in the MATLAB Interface (`Array`, `Schema`, `RecordBatch`), we can implement a Feather V1 reader in terms of the new APIs.

This is a follow-up to #37043, where a new Feather V1 internal `Writer` object was added.

### What changes are included in this PR?

1. Added a new class called `arrow.internal.io.feather.Reader` which can be used to read Feather V1 files. It has one public property named `Filename` and one public method named `read`.

**Example Usage:**

```matlab
>> T = array2table(rand(3))       

T =

  3x3 table

     Var1        Var2       Var3  
    _______    ________    _______

    0.79221    0.035712    0.67874
    0.95949     0.84913    0.75774
    0.65574     0.93399    0.74313

>> filename = "test.feather";

>> featherwrite(filename, T)

>> reader = arrow.internal.io.feather.Reader(filename)

reader = 

  Reader with properties:

    Filename: "test.feather"

>> T = reader.read()

T =

  3x3 table

     Var1        Var2       Var3  
    _______    ________    _______

    0.79221    0.035712    0.67874
    0.95949     0.84913    0.75774
    0.65574     0.93399    0.74313
```

### Are these changes tested?

Yes.

1. Added `Reader` to `feather/tRoundTrip.m`.
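
For illustration only, a round-trip check along these lines could look like the sketch below. It is not the contents of `tRoundTrip.m`. It assumes the MATLAB Interface to Arrow is built and on the path, that `write` accepts a MATLAB `table` and `read` returns one (as in the usage examples above), and it uses a hypothetical temporary file name.

```matlab
% Sketch of a Writer/Reader round trip; not the actual tRoundTrip.m test.
% Assumes the MATLAB Interface to Arrow is on the MATLAB path.
expected = table([1; 2; 3], single([10; 11; 12]));

% Hypothetical temporary file used only for this sketch.
filename = fullfile(tempdir, "roundtrip.feather");

% Write the table to a Feather V1 file with the internal Writer.
writer = arrow.internal.io.feather.Writer(filename);
writer.write(expected);

% Read the file back with the internal Reader.
reader = arrow.internal.io.feather.Reader(filename);
actual = reader.read();

% One would expect the table to survive the round trip unchanged.
assert(isequal(actual, expected), "Round-tripped table differs from the original.");

% Remove the temporary file.
delete(filename);
```

Both classes live in the `arrow.internal` package, so this is a sketch of internal behavior and not a supported public workflow.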

### Are there any user-facing changes?

No.

These are only internal objects right now. 

### Future Directions

1. Re-implement `featherread` in terms of the new `Reader` object.
2. Remove legacy feather code and infrastructure.

### Notes

1. For conciseness, I renamed the C++ Proxy class `FeatherWriter` to `Writer` since it is already inside of a `feather` namespace / "package".
* Closes: #37041

Authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
@kevingurney kevingurney added this to the 14.0.0 milestone Aug 7, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
… Interface APIs (apache#37043)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
… Interface APIs (apache#37044)
