Add azure blob storage filesystem/staging destination #592
Conversation
Just a few comments, please check.
```python
def to_adlfs_credentials(self) -> Dict[str, Any]:
    """Return a dict that can be passed as kwargs to adlfs"""
    return dict(
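The quoted diff is truncated at `return dict(`. A minimal self-contained sketch of what such a method could look like — the dataclass and field names here are hypothetical illustrations, not dlt's actual configuration class:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class AzureCredentialsSketch:
    """Hypothetical credentials holder, illustrating to_adlfs_credentials()."""
    account_name: str
    account_key: Optional[str] = None
    sas_token: Optional[str] = None

    def to_adlfs_credentials(self) -> Dict[str, Any]:
        """Return a dict that can be passed as kwargs to adlfs."""
        return dict(
            account_name=self.account_name,
            account_key=self.account_key,
            sas_token=self.sas_token,
        )


creds = AzureCredentialsSketch("myaccount", account_key="secret")
print(creds.to_adlfs_credentials())
```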
adlfs does not need this token (maybe it does that itself?):

```python
fs = fsspec.filesystem("adl", account_name=adls_account_name, account_key=adls_account_key)
print(fs.ls("dlt-ci-test-bucket/"))
```

works for me
So the SAS token is needed for Snowflake when not using a named stage: https://github.com/dlt-hub/dlt/pull/592/files#diff-577441894b0e65b848310ea282c52bab00b6f649eb2c1b9402a5423d70e4d3e4R84
Otherwise we don't need it.
@sh-rp please take a look, thanks!
@mjducutoae so blob storage will be available in alpha, most probably on Tuesday. You can take a look at the implementation here. I think it is a fairly standard way of using
Force-pushed from 1a3e42d to 284dc5f
@steinitzu: when providing the AZ credentials I am getting
when running `pytest tests/load/pipeline/test_stage_loading.py::test_all_data_types` with the `az-authorization` destination config. Can you see this locally too? I can't quite figure out which resource name is invalid.
I have set up the az integration on Snowflake and this seems to work; we only have to figure out this error for direct authentication.
@sh-rp my bad. I was stripping out the bucket name when generating the Snowflake-compatible URL. Fixed in the last commit; runs for me locally now.
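The fix described above can be sketched as follows. The helper name is hypothetical and the URL shape is an assumption based on Snowflake's `azure://<account>.blob.core.windows.net/<container>/<path>` stage-URL form, not the actual dlt code:

```python
from urllib.parse import urlparse


def to_snowflake_azure_url(bucket_url: str, account_name: str) -> str:
    """Convert an az://container/path URL to the azure:// form Snowflake expects.

    Hypothetical helper: the bug was dropping the container (netloc) part here.
    """
    parsed = urlparse(bucket_url)
    # parsed.netloc is the container (bucket) name - it must be kept in the result
    return f"azure://{account_name}.blob.core.windows.net/{parsed.netloc}{parsed.path}"


print(to_snowflake_azure_url("az://dlt-ci-test-bucket/stage", "myaccount"))
# azure://myaccount.blob.core.windows.net/dlt-ci-test-bucket/stage
```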
@steinitzu perfect, this works now. There is one failing test remaining: `pytest tests/load/pipeline/test_filesystem_pipeline.py::test_pipeline_merge_write_disposition`. I am getting the following on the console for that, so maybe you're deleting a non-empty dir which fails, or something like that:

```
('Failed to remove %s for %s', [
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/_dlt_pipeline_state',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/_dlt_pipeline_state/',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/_dlt_pipeline_state/1693412282.432455.178b58ce42.jsonl',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/_dlt_pipeline_state/1693412284.13063.fdec11df2b.jsonl',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/other_data',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/other_data/',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/other_data/1693412284.13063.a5a78fdcf1.jsonl',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/some_data',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/some_data/',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/some_data/1693412284.13063.f72071cafb.jsonl',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/some_source._dlt_loads.1693412282.432455',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/some_source._dlt_loads.1693412283.199035',
  'dlt-ci-test-bucket/test_23db9bc8501f4dc572e0c89497831423/some_source._dlt_loads.1693412284.13063'
], ResourceExistsError('This operation is not permitted on a non-empty directory.\nRequestId:d7e03825-801e-002d-305d-db4ce4000000\nTime:2023-08-30T16:18:06.3619677Z\nErrorCode:DirectoryIsNotEmpty'))
```
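One way to avoid `DirectoryIsNotEmpty` on backends that refuse to delete non-empty directories (as Azure Data Lake does here) is to remove the deepest paths first, so files and subdirectories go before the directories containing them. A minimal sketch of that ordering, not the actual dlt fix:

```python
def deletion_order(paths):
    """Sort paths so children are removed before their parents (deepest first).

    Sketch only: sorting by path depth guarantees a directory is emptied
    before the directory itself is deleted.
    """
    return sorted(paths, key=lambda p: p.rstrip("/").count("/"), reverse=True)


paths = [
    "bucket/test_x",
    "bucket/test_x/_dlt_pipeline_state",
    "bucket/test_x/_dlt_pipeline_state/1693412282.jsonl",
]
print(deletion_order(paths))
# ['bucket/test_x/_dlt_pipeline_state/1693412282.jsonl',
#  'bucket/test_x/_dlt_pipeline_state',
#  'bucket/test_x']
```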
One additional thing: Azure is super verbose on the console even if the tests do not fail; maybe there is a way to limit that a bit?
Force-pushed from 437131e to d8dbc1f
I couldn't replicate this specific error, but I was getting a failure in this test because of the listings cache, so I disabled it for all fs clients: f47473d
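For context, the failure mode of a directory-listings cache reduced to a minimal sketch — the class and names here are hypothetical illustrations, not fsspec's implementation:

```python
class ListingsCacheSketch:
    """Hypothetical filesystem wrapper whose ls() caches directory listings."""

    def __init__(self):
        self._listings = {}                       # path -> cached listing
        self._backend = {"bucket": ["a.jsonl"]}   # simulated remote storage

    def ls(self, path):
        # Cached listings are never invalidated on write - the root cause
        # of the stale view that made the test fail.
        if path not in self._listings:
            self._listings[path] = list(self._backend[path])
        return self._listings[path]

    def write(self, path, name):
        self._backend[path].append(name)  # remote changes, cache does not


fs = ListingsCacheSketch()
print(fs.ls("bucket"))   # ['a.jsonl']
fs.write("bucket", "b.jsonl")
print(fs.ls("bucket"))   # still ['a.jsonl'] - stale listing hides the new file
```

Disabling the cache trades repeated listing calls for a consistent view after writes, which is the safer default for tests that write and immediately re-list.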
The root logger is always set to
This particular log can be disabled with:

```python
log = logging.getLogger('azure.core.pipeline.policies.http_logging_policy')
log.setLevel(logging.WARNING)
```

But imo this is best solved by not overriding the root log level, then this verbosity is opt-in.
@steinitzu hmmmm, we do not touch the root logger. dlt was logging to root, but that was removed several months ago.
log_cli_level is probably messing with the root logger during the tests. Several logs are throttled in
Hey @rudolfix

```python
import dlt
import logging

log = logging.getLogger()
print(log)
dlt.pipeline('my_pipe')
print(log)
```

output ->
It's coming from airflow, just from being installed. |
Force-pushed from 6797065 to 410b1ed
Resolves #560