Skip to content

Feat: add support for CSV input data loading from file or inline#2640

Merged
tobymao merged 7 commits intomainfrom
themis/csv_test_support
May 22, 2024
Merged

Feat: add support for CSV input data loading from file or inline#2640
tobymao merged 7 commits intomainfrom
themis/csv_test_support

Conversation

@themisvaltinos
Copy link
Contributor

@themisvaltinos themisvaltinos commented May 20, 2024

This is WIP. Feedback is welcome.

  • Adds support for CSV external file loading and inline CSV rows, as well as yaml from external file.
  • Still to add tests and Docs to be updated accordingly.

EDIT (George):

This addresses the last item in #1637!

@CLAassistant
Copy link

CLAassistant commented May 20, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, looks good. Some general comments to keep in mind w.r.t. code style & conventions. The direction here is good, let's add some tests.

@georgesittas
Copy link
Contributor

@Themiscodes let's make sure to test NULL behavior in CSVs as well. In YAML, when you leave out a column from a row, set its value to null, or leaving its value unspecified (i.e. col: ), this results in a NULL value when converting it. We should think about how we want to treat empty CSV values.

The problem is that ,, is ambiguous - it can mean both "empty string" and "null value". Perhaps we want to go with the latter here and if users ever need empty strings, they'd have to use a different format or something.

@themisvaltinos
Copy link
Contributor Author

@Themiscodes let's make sure to test NULL behavior in CSVs as well. In YAML, when you leave out a column from a row, set its value to null, or leaving its value unspecified (i.e. col: ), this results in a NULL value when converting it. We should think about how we want to treat empty CSV values.

The problem is that ,, is ambiguous - it can mean both "empty string" and "null value". Perhaps we want to go with the latter here and if users ever need empty strings, they'd have to use a different format or something.

I agree that the latter option seems better. Currently empty columns are handled similar to the yaml and are set to nan. If the user wants an empty string it seems more reasonable that they should declare it explicitly.

@themisvaltinos themisvaltinos force-pushed the themis/csv_test_support branch from d91f17b to 029c79a Compare May 21, 2024 20:22
@themisvaltinos themisvaltinos marked this pull request as ready for review May 21, 2024 20:22
Copy link
Contributor

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Themiscodes thanks for addressing comments, did another pass on this. Comments are mostly about refactoring the unit tests.

Let's also test 1) specifying paths (both csv, yaml) - look into tmp_path etc in other tests 2) doing format: yaml and then supplying the rows like we usually do (rows: ...), 3) supplying a path without a format (expected is to load yaml) 4) supplying a format that is not supported

@themisvaltinos themisvaltinos force-pushed the themis/csv_test_support branch from de930df to 6f069ae Compare May 22, 2024 14:46
@themisvaltinos
Copy link
Contributor Author

@georgesittas addressed your comments

  1. specifying paths (both csv, yaml) - look into tmp_path etc in other tests 3) supplying a path without a format (expected is to load yaml)
    in: test_format_path

  2. doing format: yaml and then supplying the rows like we usually do (rows: ...),
    in: test_format_inline

  3. supplying a format that is not supported
    in: test_unsupported_format_failure

Also removed unnecessary tests as per your suggestions and parametrized dictionaries where possible.

Copy link
Contributor

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested a couple of minor refactors, but otherwise LGTM. Thanks for addressing all comments @Themiscodes! 🙌

@themisvaltinos themisvaltinos force-pushed the themis/csv_test_support branch from 87adebc to bd4d8cb Compare May 22, 2024 16:32
@georgesittas georgesittas requested a review from a team May 22, 2024 16:48
@tobymao tobymao merged commit 77a7c36 into main May 22, 2024
@tobymao tobymao deleted the themis/csv_test_support branch May 22, 2024 17:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants