Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOP-9787] Test strategies when DBReader reads empty source #188

Merged
merged 1 commit into from
Dec 13, 2023

Conversation

dolfinus
Copy link
Member

@dolfinus dolfinus commented Dec 13, 2023

Change Summary

If DBReader reads data from source (including empty source), the resulting dataframe contains all the data from the source at the moment of calling .run(). But some data is appended to the source after .run() is called, the result if different for different strategies:

  1. Dataframe returned within SnapshotStrategy always contains all the source data. This is because no additional filters are applied to original query, and Spark dataframes are lazy.
  2. Other strategies return dataframes which contains only data present in the source at the moment of calling .run(), but not data added to source after it. This is because filters based on current HWM value & strategy are baked into the query.

Added tests covering these edge cases.

Handling case source is empty -> HWM value is None -> cannot apply filter to query -> return e empty dataframe required adding new limit option to DBConnection.read_source_as_df. Passing limit=0 returns dataframe which is always empty, which is implementation detail.

Also improved behavior of batch strategies - if source contains no data, instead of raising ValueError just stop after first iteration, and return empty dataframe.

Related issue number

Checklist

  • Commit message and PR title is comprehensive
  • Keep the change as small as possible
  • Unit and integration tests for the changes exist
  • Tests pass on CI and coverage does not decrease
  • Documentation reflects the changes where applicable
  • docs/changelog/next_release/<pull request or issue id>.<change type>.rst file added describing change
    (see CONTRIBUTING.rst for details.)
  • My PR is ready to review.

@dolfinus dolfinus self-assigned this Dec 13, 2023
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from d9bf4c7 to 8571874 Compare December 13, 2023 08:09
@dolfinus dolfinus added the ci:skip-changelog Add this label to skip changelog file check label Dec 13, 2023
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from 8571874 to d8f41fe Compare December 13, 2023 08:13
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from d8f41fe to f12dd6b Compare December 13, 2023 08:28
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from f12dd6b to f1fe008 Compare December 13, 2023 08:36
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from f1fe008 to a2b6f51 Compare December 13, 2023 09:07
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from a2b6f51 to 30c5c26 Compare December 13, 2023 09:59
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from 30c5c26 to 29a9fe3 Compare December 13, 2023 10:00
@dolfinus dolfinus changed the title [DOP-9787] Test case then IncrementalStrategy reads empty source [DOP-9787] Test strategies when DBReader reads empty source Dec 13, 2023
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from 29a9fe3 to 94b21c3 Compare December 13, 2023 10:06
@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from 94b21c3 to 6c1d2bc Compare December 13, 2023 10:44
@dolfinus dolfinus removed the ci:skip-changelog Add this label to skip changelog file check label Dec 13, 2023
Copy link

codecov bot commented Dec 13, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (2b03409) 94.19% compared to head (f8df474) 94.30%.

Files Patch % Lines
onetl/strategy/batch_hwm_strategy.py 81.81% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #188      +/-   ##
===========================================
+ Coverage    94.19%   94.30%   +0.10%     
===========================================
  Files          204      204              
  Lines         7684     7705      +21     
  Branches      1347     1355       +8     
===========================================
+ Hits          7238     7266      +28     
+ Misses         330      326       -4     
+ Partials       116      113       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dolfinus dolfinus force-pushed the feature/DOP-9787-tests-strategy-no-data branch from e8f1751 to f8df474 Compare December 13, 2023 12:16
@dolfinus dolfinus marked this pull request as ready for review December 13, 2023 12:28
@dolfinus dolfinus merged commit 6c48fe7 into develop Dec 13, 2023
42 checks passed
@dolfinus dolfinus deleted the feature/DOP-9787-tests-strategy-no-data branch December 13, 2023 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants