Add the ability to interact between local dataframes and Delta tables… #1397

Merged
merged 2 commits into from Dec 15, 2022

dimberman
Collaborator

@dimberman dimberman commented Dec 8, 2022

… in databricks

Description

Currently, the only way to move a dataframe into a Databricks Delta table is to manually save the dataframe as a local file and then push that file into Delta using aql.load_file.

What is the new behavior?

This PR allows users to pass local dataframes into Delta tables using all of the existing methods within the Astro SDK (e.g. passing a dataframe into a transform function). Users should not need to learn any new functionality, as all of this works out of the box with existing aql-based DAGs.

Does this introduce a breaking change?

No.

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary

@codecov

codecov bot commented Dec 8, 2022

Codecov Report

Base: 88.91% // Head: 97.32% // Increases project coverage by +8.41% 🎉

Coverage data is based on head (b206bb5) compared to base (e26f9fd).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1397      +/-   ##
==========================================
+ Coverage   88.91%   97.32%   +8.41%     
==========================================
  Files          60       19      -41     
  Lines        3120      672    -2448     
  Branches      368        0     -368     
==========================================
- Hits         2774      654    -2120     
+ Misses        252       18     -234     
+ Partials       94        0      -94     
Impacted Files Coverage Δ
python-sdk/src/astro/databases/base.py
python-sdk/src/astro/databricks/api_utils.py
python-sdk/src/astro/databricks/delta.py
...dk/src/astro/databricks/load_file/load_file_job.py
python-sdk/src/astro/databricks/load_options.py
python-sdk/src/astro/utils/path.py
python-sdk/src/astro/sql/operators/export_file.py
python-sdk/src/astro/settings.py
python-sdk/src/astro/files/types/ndjson.py
python-sdk/src/astro/sql/operators/dataframe.py
... and 69 more


Collaborator

@tatiana tatiana left a comment

Thanks @dimberman, I'm happy for this to be merged once:

  • the row limit defaults to -1, with -1 meaning unlimited
  • we remove Postgres from the test name
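
To make the first condition concrete, here is a minimal sketch of the "-1 means unlimited" convention. It is not the SDK's code: `limit_rows` and `UNLIMITED` are hypothetical names, and treating other negative values as errors is an assumption made only for this illustration.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical sentinel for "no cap"; the review only states that the
# default should be -1 and that -1 means unlimited.
UNLIMITED = -1


def limit_rows(rows: Sequence[T], row_limit: int = UNLIMITED) -> List[T]:
    """Return at most row_limit rows, or every row when row_limit is -1."""
    if row_limit == UNLIMITED:
        # Default case: no truncation at all.
        return list(rows)
    if row_limit < 0:
        # Assumption for this sketch: other negative values are rejected.
        raise ValueError("row_limit must be -1 (unlimited) or non-negative")
    return list(rows[:row_limit])
```

With a default of -1, callers that never set a limit get every row, and only an explicit non-negative value truncates the result.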

…pushing local dataframes into spark delta tables
@dimberman dimberman merged commit c532efb into main Dec 15, 2022
@dimberman dimberman deleted the spark-dataframe-interaction branch December 15, 2022 16:25
@dimberman dimberman linked an issue Dec 16, 2022 that may be closed by this pull request
9 tasks
Development

Successfully merging this pull request may close these issues.

Add local dataframe to delta support
3 participants