Add support to load from Azure blob storage into Databricks #1561

Merged
merged 8 commits into from
Mar 27, 2023

Conversation

@tatiana tatiana commented Jan 10, 2023

Closes: #1250

@tatiana tatiana changed the title [WIP] Add support for Databricks to load from Azure blob storage Add support for Databricks to load from Azure blob storage Jan 10, 2023

codecov bot commented Jan 10, 2023

Codecov Report

Patch coverage: 95.83% and project coverage change: +0.06 🎉

Comparison is base (1a3ed81) 86.35% compared to head (1b423d8) 86.42%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1561      +/-   ##
==========================================
+ Coverage   86.35%   86.42%   +0.06%     
==========================================
  Files         126      126              
  Lines        6786     6813      +27     
  Branches      670      672       +2     
==========================================
+ Hits         5860     5888      +28     
+ Misses        779      778       -1     
  Partials      147      147              
Flag Coverage Δ
PythonSDK 92.30% <95.83%> (+0.07%) ⬆️
SQL-CLI 97.67% <ø> (ø)

Impacted Files Coverage Δ
python-sdk/src/astro/files/locations/azure/wasb.py 97.61% <95.00%> (-0.82%) ⬇️
...ro/databases/databricks/load_file/load_file_job.py 96.77% <100.00%> (ø)
python-sdk/src/astro/files/locations/base.py 79.64% <100.00%> (+0.55%) ⬆️

... and 3 files with indirect coverage changes

@tatiana tatiana changed the title Add support for Databricks to load from Azure blob storage Add support to load from Azure blob storage into Databricks Jan 10, 2023
@tatiana tatiana force-pushed the python-sdk/1250 branch 2 times, most recently from 9064b44 to 93af85d Compare February 2, 2023 12:12

@dimberman dimberman left a comment

Hey @tatiana this looks great! Also fantastic that we can add new sources by just adding one function :). Can you please add an example DAG and update the documentation to discuss any steps a user would need to take to load data from Azure?

@tatiana tatiana force-pushed the python-sdk/1250 branch 2 times, most recently from 603389e to 8cd01ea Compare March 17, 2023 11:39
There is a deeper problem at the moment: our tests and implementation
rely on things being set up in the cluster in advance.

For GCS, this means the following Spark properties:
- spark.hadoop.fs.gs.auth.service.account.email
- spark.hadoop.fs.gs.project.id
- spark.hadoop.google.cloud.auth.service.account.enable
- spark.hadoop.fs.gs.auth.service.account.private.key
- spark.hadoop.fs.gs.auth.service.account.private.key.id
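
One way to make this hidden dependency visible is to check a cluster's Spark configuration for the properties listed above before running a load. The snippet below is only a sketch of that idea: `REQUIRED_GCS_SPARK_KEYS` and `missing_spark_keys` are hypothetical names that are not part of this PR or of the SDK, and the cluster configuration is assumed to be available as a plain dictionary.

```python
from typing import Iterable, List, Mapping

# Spark properties that the discussion above says must already be set on the
# cluster for GCS. The tuple name is hypothetical, used only for illustration.
REQUIRED_GCS_SPARK_KEYS = (
    "spark.hadoop.fs.gs.auth.service.account.email",
    "spark.hadoop.fs.gs.project.id",
    "spark.hadoop.google.cloud.auth.service.account.enable",
    "spark.hadoop.fs.gs.auth.service.account.private.key",
    "spark.hadoop.fs.gs.auth.service.account.private.key.id",
)


def missing_spark_keys(
    cluster_conf: Mapping[str, str],
    required: Iterable[str] = REQUIRED_GCS_SPARK_KEYS,
) -> List[str]:
    """Return the required keys that are absent or empty in cluster_conf.

    The result keeps the order of `required`, so an error message built from
    it is stable between runs.
    """
    return [key for key in required if not cluster_conf.get(key)]
```

A test fixture or a pre-flight step could call `missing_spark_keys(conf)` and fail with a clear message when the returned list is not empty, instead of failing later inside the load job.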
@tatiana tatiana marked this pull request as ready for review March 27, 2023 01:35
@tatiana tatiana merged commit fe16af2 into main Mar 27, 2023
@tatiana tatiana deleted the python-sdk/1250 branch March 27, 2023 01:35
Successfully merging this pull request may close these issues.

Databricks autoloader support for Azure blob storage
2 participants