[DOP-6705] Add CSV file format #69

Merged: 1 commit, Jul 13, 2023
@dolfinus (Member) commented Jul 13, 2023

Change Summary

  • Added CSV file format class and related documentation & unit tests
  • Small typing fix for the GenericOptions.parse method
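The PR does not show what the typing fix looked like. One common way to type a classmethod that accepts an instance, a dict or None and returns an instance of the *calling* subclass is a bound TypeVar. The sketch below uses hypothetical `OptionsBase` and `ReadOptions` classes and is not onetl's actual GenericOptions code:

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, TypeVar

# Hypothetical base class standing in for onetl's GenericOptions
T = TypeVar("T", bound="OptionsBase")


@dataclass
class OptionsBase:
    @classmethod
    def parse(cls: type[T], options: T | Mapping[str, Any] | None) -> T:
        """Return an instance of the calling class, whatever form the input takes."""
        if options is None:
            return cls()
        if isinstance(options, cls):
            return options
        if isinstance(options, Mapping):
            return cls(**options)
        raise TypeError(f"{cls.__name__}.parse() does not accept {type(options).__name__}")


@dataclass
class ReadOptions(OptionsBase):
    # Hypothetical subclass with a single option, for illustration only
    fetchsize: int = 100_000


opts = ReadOptions.parse({"fetchsize": 10})  # a type checker infers ReadOptions here
print(opts)
```

Binding `cls` to `type[T]` is what lets a type checker report `ReadOptions.parse(...)` as returning `ReadOptions` rather than the base class.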

It was previously planned to implement nested classes:

  • CSV.Reading(delimiter=',', inferSchema=True)
  • CSV.Writing(delimiter=',', compression='gzip')

However, every implementation I tried was either messy or tricky to get working as expected.

So instead I've merged all options into a single class. We don't process the options in any way; they are passed to Spark as-is, so there is no real reason to split them into two classes.
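A minimal sketch of the single-class idea, assuming only that pyspark's `DataFrameReader` and `DataFrameWriter` both expose `format()` and `options(**kwargs)`. `CSVSketch`, `apply_to` and `RecordingBuilder` are hypothetical names; the recording builder stands in for Spark so the example runs without a session:

```python
from __future__ import annotations

from typing import Any, Protocol


class SupportsOptions(Protocol):
    """Shape shared by pyspark's DataFrameReader and DataFrameWriter."""

    def format(self, source: str) -> SupportsOptions: ...

    def options(self, **options: Any) -> SupportsOptions: ...


class CSVSketch:
    """Hypothetical single options class: reading and writing options live together."""

    def __init__(self, **options: Any) -> None:
        # No validation and no split into read/write groups: keep options as given
        self.options = dict(options)

    def apply_to(self, builder: SupportsOptions) -> SupportsOptions:
        # The same call chain works for both a reader and a writer
        return builder.format("csv").options(**self.options)


class RecordingBuilder:
    """Test double standing in for a Spark reader/writer."""

    def __init__(self) -> None:
        self.source: str | None = None
        self.recorded: dict[str, Any] = {}

    def format(self, source: str) -> RecordingBuilder:
        self.source = source
        return self

    def options(self, **options: Any) -> RecordingBuilder:
        self.recorded.update(options)
        return self


csv_format = CSVSketch(delimiter=";", inferSchema=True, compression="gzip")
builder = csv_format.apply_to(RecordingBuilder())
print(builder.source, builder.recorded)
```

Because both directions accept the same `format(...).options(...)` chain, one object can carry the options from the rejected `CSV.Reading` and `CSV.Writing` designs.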

I've also declared several options with default values directly on the class, instead of leaving them in known_options, to improve the developer experience: the IDE can suggest attribute names, default values are printed to logs, etc.
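The difference between declared attributes and a free-form options dict can be sketched with a dataclass. The class name, the `extra` field and the default values are assumptions for illustration (the values follow Spark's documented CSV defaults), not necessarily what this PR uses:

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class CSVWithDefaults:
    # Declared attributes: IDEs can autocomplete them, and repr() shows them in logs.
    # Illustrative values (Spark's documented CSV defaults), not onetl's actual ones.
    delimiter: str = ","
    quote: str = '"'
    escape: str = "\\"
    header: bool = False
    encoding: str = "utf-8"
    # Hypothetical catch-all for any other Spark option, passed through unchanged
    extra: dict[str, Any] = field(default_factory=dict)

    def to_spark_options(self) -> dict[str, Any]:
        opts = asdict(self)
        extra = opts.pop("extra")
        # Declared attributes and extra options end up in one flat dict
        return {**opts, **extra}


fmt = CSVWithDefaults(header=True, extra={"inferSchema": True})
print(fmt)  # every default is visible, which is what makes logs informative
opts = fmt.to_spark_options()
```

Options hidden in a generic dict only show up in logs if the user set them explicitly; declared attributes always show their effective value.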

Related issue number

Checklist

  • Commit message and PR title are comprehensive
  • Keep the change as small as possible
  • Unit and integration tests for the changes exist
  • Tests pass on CI and coverage does not decrease
  • Documentation reflects the changes where applicable
  • docs/changelog/next_release/<pull request or issue id>.<change type>.rst file added describing change
    (see CONTRIBUTING.rst for details.)
  • My PR is ready to review.

@dolfinus added the ci:skip-changelog label (Add this label to skip changelog file check) on Jul 13, 2023
@dolfinus self-assigned this on Jul 13, 2023
@dolfinus temporarily deployed to test-pypi on July 13, 2023 09:02 with GitHub Actions (inactive)
@dolfinus temporarily deployed to test-pypi on July 13, 2023 09:06 with GitHub Actions (inactive)
@dolfinus marked this pull request as ready for review on July 13, 2023 09:12
codecov bot commented Jul 13, 2023

Codecov Report

Merging #69 (c64c671) into develop (f4affb6) will decrease coverage by 0.03%.
The diff coverage is 90.38%.

@@             Coverage Diff             @@
##           develop      #69      +/-   ##
===========================================
- Coverage    93.16%   93.13%   -0.03%     
===========================================
  Files          129      132       +3     
  Lines         6146     6192      +46     
  Branches      1149     1156       +7     
===========================================
+ Hits          5726     5767      +41     
- Misses         322      327       +5     
  Partials        98       98              
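The headline percentages follow from the Hits, Misses and Partials rows. A small check, assuming Codecov's default rounding (two decimal places, rounded down) and that partially covered lines count as not covered:

```python
from decimal import ROUND_DOWN, Decimal


def coverage_percent(hits: int, misses: int, partials: int) -> Decimal:
    """Codecov-style ratio: partials count against coverage (assumption)."""
    lines = hits + misses + partials
    ratio = Decimal(hits * 100) / Decimal(lines)
    # Assumes Codecov's default settings: precision 2, round down
    return ratio.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


before = coverage_percent(5726, 322, 98)  # develop (f4affb6)
after = coverage_percent(5767, 327, 98)   # #69 (c64c671)
print(before, after, after - before)
```

This reproduces 93.16%, 93.13% and the -0.03% change; the line counts also add up, since 6146 + 46 = 6192 and the 46 new lines split into 41 hits and 5 misses.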
Impacted Files                                Coverage Δ
onetl/base/base_file_format.py                100.00% <ø> (ø)
onetl/file/format/file_format.py              80.00% <80.00%> (ø)
onetl/file/format/csv.py                      95.83% <95.83%> (ø)
onetl/connection/db_connection/mongodb.py     94.09% <100.00%> (ø)
onetl/file/format/__init__.py                 100.00% <100.00%> (ø)
onetl/impl/generic_options.py                 100.00% <100.00%> (ø)

@dolfinus merged commit 2caf92f into develop on Jul 13, 2023
37 of 38 checks passed
@dolfinus deleted the feature/DOP-6705 branch on July 13, 2023 14:04
3 participants