Add the ability to load files using autoloader #1452

Merged 10 commits on Dec 21, 2022

dimberman (Collaborator) commented Dec 17, 2022

Description

What is the current behavior?

Currently, all loading is done via the COPY INTO command. This command works fine on small to mid-sized datasets but falls short on very large ones. It also lacks popular autoloader features such as incremental loading.

CLOSES: #1445

What is the new behavior?

Loading now uses autoloader by default unless a) the user sets load_options.load_mode to COPY INTO or b) the user loads a single file (since autoloader cannot handle single files).
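
The selection rule described above can be sketched roughly as follows. This is only an illustration of the decision logic, not the code in this PR: `LoadMode`, `LoadOptions`, and `resolve_load_mode` are hypothetical names, and the sketch assumes that both exceptions fall back to COPY INTO.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoadMode(Enum):
    """Hypothetical enum; the real constants in the SDK may be named differently."""

    AUTOLOADER = "autoloader"
    COPY_INTO = "copy_into"


@dataclass
class LoadOptions:
    """Hypothetical stand-in for the load options object mentioned in the description."""

    # None means the user did not request a specific mode.
    load_mode: Optional[LoadMode] = None


def resolve_load_mode(options: LoadOptions, is_single_file: bool) -> LoadMode:
    """Pick the load mode following the rule in the PR description."""
    # a) An explicit request for COPY INTO always wins.
    if options.load_mode is LoadMode.COPY_INTO:
        return LoadMode.COPY_INTO
    # b) Autoloader cannot handle a single file, so assume a fallback to COPY INTO.
    if is_single_file:
        return LoadMode.COPY_INTO
    # Otherwise autoloader is the default.
    return LoadMode.AUTOLOADER
```

With this sketch, `resolve_load_mode(LoadOptions(), is_single_file=False)` selects autoloader, while either an explicit COPY INTO setting or a single-file load selects COPY INTO.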

Does this introduce a breaking change?

Checklist

  • Created tests which fail without the change (if possible)
  • Extended the README / documentation, if necessary

codecov bot commented Dec 17, 2022

Codecov Report

Base: 97.36% // Head: 97.36% // No change to project coverage 👍

Coverage data is based on head (3d71d00) compared to base (0925b32).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1452   +/-   ##
=======================================
  Coverage   97.36%   97.36%           
=======================================
  Files          19       19           
  Lines         682      682           
=======================================
  Hits          664      664           
  Misses         18       18           


python-sdk/src/astro/databricks/load_options.py (outdated review thread, resolved)
python-sdk/src/astro/files/base.py (outdated review thread, resolved)
python-sdk/src/astro/files/base.py (outdated review thread, resolved)
python-sdk/docs/guides/databricks.rst (outdated review thread, resolved)
python-sdk/src/astro/constants.py (outdated review thread, resolved)
@dimberman dimberman merged commit 89a3639 into main Dec 21, 2022
@dimberman dimberman deleted the databricks/add-autoloader branch December 21, 2022 23:21
Issue this pull request may close: Add Autoloader support
3 participants