[MATLAB] Add CSV TableReader and TableWriter MATLAB classes #37770

Closed
kevingurney opened this issue Sep 18, 2023 · 0 comments · Fixed by #37773

Comments

@kevingurney
Member

Describe the enhancement requested

To enable initial CSV I/O support in the MATLAB interface, we should add `TableReader` and `TableWriter` MATLAB classes that work with `arrow.tabular.Table` objects.

Component(s)

MATLAB

kevingurney added a commit that referenced this issue Sep 20, 2023
…sses (#37773)

### Rationale for this change

To enable initial CSV I/O support, this PR adds `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter` MATLAB classes to the MATLAB interface.

### What changes are included in this PR?

1. Added a new `arrow.io.csv.TableReader` class
2. Added a new `arrow.io.csv.TableWriter` class

**Example**
```matlab
>> matlabTableWrite = array2table(rand(3))

matlabTableWrite =

  3×3 table

     Var1        Var2       Var3  
    _______    ________    _______

    0.91131    0.091595    0.24594
    0.51315     0.27368    0.62119
    0.42942     0.88665    0.49501

>> arrowTableWrite = arrow.table(matlabTableWrite)

arrowTableWrite = 

Var1: double
Var2: double
Var3: double
----
Var1:
  [
    [
      0.9113083542736461,
      0.5131490075412158,
      0.42942202968065213
    ]
  ]
Var2:
  [
    [
      0.09159480217154525,
      0.27367730380496647,
      0.8866478145458545
    ]
  ]
Var3:
  [
    [
      0.2459443412735529,
      0.6211893868708748,
      0.49500739584280073
    ]
  ]

>> writer = arrow.io.csv.TableWriter("example.csv")

writer = 

  TableWriter with properties:

    Filename: "example.csv"

>> writer.write(arrowTableWrite)

>> reader = arrow.io.csv.TableReader("example.csv")

reader = 

  TableReader with properties:

    Filename: "example.csv"

>> arrowTableRead = reader.read()

arrowTableRead = 

Var1: double
Var2: double
Var3: double
----
Var1:
  [
    [
      0.9113083542736461,
      0.5131490075412158,
      0.42942202968065213
    ]
  ]
Var2:
  [
    [
      0.09159480217154525,
      0.27367730380496647,
      0.8866478145458545
    ]
  ]
Var3:
  [
    [
      0.2459443412735529,
      0.6211893868708748,
      0.49500739584280073
    ]
  ]

>> matlabTableRead = table(arrowTableRead)

matlabTableRead =

  3×3 table

     Var1        Var2       Var3  
    _______    ________    _______

    0.91131    0.091595    0.24594
    0.51315     0.27368    0.62119
    0.42942     0.88665    0.49501

>> isequal(arrowTableRead, arrowTableWrite)

ans =

  logical

   1

>> isequal(matlabTableRead, matlabTableWrite)

ans =

  logical

   1
```
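The interactive session above can also be packaged as a reusable function. The sketch below is illustrative only: `csvRoundTrip` is a hypothetical helper name, and it assumes the Arrow MATLAB interface is built and on the MATLAB path. It uses only the calls shown in the session (`arrow.table`, the `TableWriter`/`TableReader` constructors taking a filename, `write`, `read`, and `table`) and writes to a temporary file instead of `example.csv` in the current folder.

```matlab
function tf = csvRoundTrip(matlabTable)
% csvRoundTrip  Hypothetical helper (not part of this PR): writes a MATLAB
% table to CSV with arrow.io.csv.TableWriter, reads it back with
% arrow.io.csv.TableReader, and reports whether the result matches.
    arrowTableWrite = arrow.table(matlabTable);

    % Write to a temporary file so the current folder is not modified.
    % onCleanup deletes the file when this function returns.
    filename = string(tempname) + ".csv";
    cleanup = onCleanup(@() delete(char(filename)));

    writer = arrow.io.csv.TableWriter(filename);
    writer.write(arrowTableWrite);

    reader = arrow.io.csv.TableReader(filename);
    arrowTableRead = reader.read();

    % Compare at both the Arrow level and the MATLAB table level,
    % mirroring the two isequal checks in the interactive session.
    tf = isequal(arrowTableRead, arrowTableWrite) && ...
         isequal(table(arrowTableRead), matlabTable);
end
```

For numeric columns like those in the example, `csvRoundTrip(array2table(rand(3)))` is expected to return `true`, matching the session above; null handling and other data types are not yet covered by the tests.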

### Are these changes tested?

Yes.

1. Added new CSV I/O tests including `test/arrow/io/csv/tRoundTrip.m` and `test/arrow/io/csv/tError.m`.
2. Both of these test classes inherit from a `CSVTest` superclass.

### Are there any user-facing changes?

Yes.

1. Users can now read and write CSV files using `arrow.io.csv.TableReader` and `arrow.io.csv.TableWriter`.

### Future Directions

1. Expose [options](https://github.com/apache/arrow/blob/main/cpp/src/arrow/csv/options.h) for controlling CSV reading and writing in MATLAB.
2. Add more read/write tests for null value handling and other datatypes beyond numeric and string values.
3. Add a `RecordBatchReader` and `RecordBatchWriter` for CSV.
4. Add support for more I/O formats like Parquet, JSON, ORC, Arrow IPC, etc.

### Notes

1. Thank you @sgilmore10 for your help with this pull request!
2. I chose to add both the `TableReader` and `TableWriter` in one pull request because it simplified testing. My apologies for the slightly lengthy pull request.
* Closes: #37770

Lead-authored-by: Kevin Gurney <kgurney@mathworks.com>
Co-authored-by: Sarah Gilmore <sgilmore@mathworks.com>
Signed-off-by: Kevin Gurney <kgurney@mathworks.com>
kevingurney added this to the 14.0.0 milestone Sep 20, 2023
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 23, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024