AzureStor::list_storage_files(cont) fails under R Server since last update #109
Can you provide some more details? Are you running the same code on both your local machine and the server? What version of R does Workbench provide? Provide a sessionInfo() output for both of these. |
@hongooi73 I'm having the same issue with the recent version 3.6.0, using the same script that runs without issue in 3.5.2. I did some testing and found that the issue occurs when the endpoint is in the format "https://storageaccount.**dfs**.core.windows.net/", but with "https://storageaccount.**blob**.core.windows.net/" it works fine. Note that our storage account is ADLS Gen2.
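For anyone wanting to repeat this comparison, here is a minimal sketch of flipping an endpoint URL between the two forms. `swap_endpoint_service()` is a hypothetical helper written for this illustration, not an AzureStor function, and it assumes the standard `*.core.windows.net` host names shown above.

```r
# Hypothetical helper (not part of AzureStor): rewrite the service part of a
# storage endpoint URL so the same account can be tried via either API.
swap_endpoint_service <- function(url, to = c("blob", "dfs")) {
  to <- match.arg(to)
  sub("\\.(dfs|blob)\\.core\\.windows\\.net",
      paste0(".", to, ".core.windows.net"),
      url)
}

dfs_url  <- "https://storageaccount.dfs.core.windows.net/"
blob_url <- swap_endpoint_service(dfs_url, to = "blob")
print(blob_url)
```

The resulting URL can be passed to `AzureStor::storage_endpoint()` in place of the original one, leaving the rest of the script unchanged.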
I'm still not able to replicate this, although I can see how various errors can happen. Make sure that the type of storage that you're trying to access matches the class of the storage endpoint object. In particular, if you do Similarly if you do Please post an exact, minimal sample of code that reproduces the problem, along with the exact error message that you're getting. |
One possible workaround is to run |
Oh, one more thing:
Try removing the leading slash in the directory name ( |
@hongooi73 Using options(azure_storage_api_version = "2020-04-08") gave the exact same error. |
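The two suggestions tried so far can be sketched as follows. The API version string is the one quoted in this thread; `strip_leading_slash()` is a hypothetical helper for illustration and `"/some/dir"` is a placeholder directory name. As noted above, pinning the API version did not change the error in this case, so treat this as a way to rule things out rather than a fix.

```r
# Hypothetical helper: drop any leading slashes from a directory name
# before passing it to list_storage_files().
strip_leading_slash <- function(path) {
  sub("^/+", "", path)
}

# Pin the storage API version for this session (value quoted in this thread),
# keeping the previous setting so it can be restored afterwards.
old_opts <- options(azure_storage_api_version = "2020-04-08")

some_dir <- strip_leading_slash("/some/dir")
print(some_dir)

# Restore the previous option value.
options(old_opts)
```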
Make sure you're not calling |
Here is a code sample masking the organization data. Thanks in advance for the help! |
Even though I'm using |
It still works perfectly over here. What is the exact error you're getting? And post your sessionInfo() along with whether you're running this locally or on a remote server. |
Here is the error. I'm running this from RStudio Workbench. When I revert to AzureStor 3.5.2, it works.
Can you try running it in a local R session? I suspect that RStudio Workbench is screwing around with the URL that's being sent to the storage server. And you ARE authenticating with the AAD token, correct? |
I'm using AAD token V1. It works on my local machine with V3.6.0, however, I rarely use it and the environment (R version and the installed packages) may be different. |
Ok, if it works on your local machine but not on RStudio Workbench, that implies there's something funny going on with the latter. If there was a problem with the package, then it wouldn't work locally as well as on Workbench. |
Do you have any thoughts on why the previous version of AzureStor would work on RStudio Workbench, but not 3.6? We have contacts with an RStudio partner, but I need to understand more in order to raise the issue.
Can you check if it works if you authenticate with a SAS? Also, does it work if you use the blob endpoint instead of ADLS2? It's accessing the same storage, but via a different API. |
I haven't tried to authenticate with SAS before. I'll have to do some research on that. Using the blob endpoint seems to work fine. Is there really any difference between using the dfs and blob endpoints?
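In case it helps with the research, authenticating with a SAS might look roughly like the sketch below. The `sas` argument of `storage_endpoint()` takes the SAS string; the environment variable `STORAGE_SAS`, the function name `list_files_with_sas()`, and its arguments are assumptions made for this example.

```r
# Sketch only: list files using a SAS instead of an AAD token.
# The function and argument names here are made up for illustration.
list_files_with_sas <- function(endpoint_url, container_name, dir,
                                sas = Sys.getenv("STORAGE_SAS")) {
  # Fail early with a clear message if no SAS is available.
  if (!nzchar(sas)) {
    stop("No SAS supplied; set STORAGE_SAS or pass `sas` explicitly")
  }
  endp <- AzureStor::storage_endpoint(endpoint_url, sas = sas)
  cont <- AzureStor::storage_container(endp, container_name)
  AzureStor::list_storage_files(cont, dir)
}
```

Calling it once with the `dfs` URL and once with the `blob` URL would cover both of the checks asked for above.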
The blob endpoint is the older one. The ADLS2 endpoint is a more up-to-date design and generally better, although the blob API does have some features that aren't (yet) in ADLS2. In particular, directory listing is much faster and less error-prone when using ADLS2. |
@hongooi73 Same error using SAS. |
@jmagosta Sorry for hijacking your issue, the error is identical and I didn't want to create a duplicate issue. |
@MichaelHannaEQT this is the issue for tracking RStudio Server/Workbench problems, no worries. @jmagosta were you having this same problem on AML? Or was it something different? |
@jmagosta your problem is unrelated to this issue, but briefly: you are NOT authenticating to storage. AzureRMR is an interface to Azure Resource Manager which is the management plane for Azure resources. Authenticating there doesn't give you access to the data plane, which is what you need to actually read your files and whatnot. If you want to access the data plane via AAD, see the vignettes or the readme for this repo. |
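To make the distinction concrete, here is a rough sketch of the two kinds of login. The first function signs in to Resource Manager (management plane) via AzureRMR; the second requests a token for the storage resource, using the same resource URL that appears in the reproduction later in this thread, and attaches it to the storage endpoint. The function names and the `TENANT_ID`, `APP_ID`, and `CLIENT_SECRET` environment variables are assumptions for illustration.

```r
# Management plane: lets you manage resources, but does not by itself
# grant access to the files inside a storage account.
management_login <- function(tenant = Sys.getenv("TENANT_ID")) {
  AzureRMR::get_azure_login(tenant)
}

# Data plane: request a token for the storage resource itself and attach it
# to the storage endpoint, then open the container.
data_plane_container <- function(endpoint_url, container_name,
                                 tenant = Sys.getenv("TENANT_ID"),
                                 app = Sys.getenv("APP_ID"),
                                 password = Sys.getenv("CLIENT_SECRET")) {
  token <- AzureAuth::get_azure_token("https://storage.azure.com",
                                      tenant = tenant, app = app,
                                      password = password)
  endp <- AzureStor::storage_endpoint(endpoint_url, token = token)
  AzureStor::storage_container(endp, container_name)
}
```

Only the object returned by the second function can be passed to `list_storage_files()`.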
@MichaelHannaEQT have you been able to resolve this yet? |
I'm also experiencing the same problem. On my local machine (RStudio Desktop), it works fine, but when using RStudio Server on an Azure ML Compute Instance, I'm getting the same error. Reverting to version 3.5.2 fixes it. Similarly, using |
@hongooi73 We had to revert back to 3.5.2 until we hear back from RStudio. Using the |
I have a tentative fix in the path-fix branch. Can you try that and see if it changes anything? |
Just tried with the path-fix branch and it works! |
@mjkanji excellent. Can you post the exact code that broke with 3.6.0 and works with the fix? I can see the change I made in 3.6.0, but I can't see why it should fail.... |
Hi @hongooi73, please see below:
> pak::pkg_install("AzureStor") # v3.6.0 from CRAN is already installed so installation is skipped
✓ Loading metadata database ... done
ℹ No downloads are needed
✓ 1 pkg + 19 deps: kept 18 [3.6s]
# Restarting R session to reload packages
Restarting R session...
> list_files = function() {
+ org_name = "..."
+ some_container = "..."
+ some_dir = "..."
+
+ endpoint <- paste0("https://", org_name, ".dfs.core.windows.net")
+
+ token <- AzureAuth::get_azure_token("https://storage.azure.com",
+ tenant=Sys.getenv("TENANT_ID"),
+ app=Sys.getenv("APP_ID"),
+ password=Sys.getenv("CLIENT_SECRET"))
+
+ endp_token = AzureStor::storage_endpoint(endpoint, token = token)
+
+ container_name <- some_container
+ cont <- AzureStor::storage_container(endp_token, container_name)
+
+ AzureStor::list_storage_files(container = cont, some_dir)
+ }
> list_files()
Using client_credentials flow
Loading cached token
Error in list_adls_files(container, ...) :
Bad Request (HTTP 400). Failed to complete Storage Services operation. Message:
Value for one of the query parameters specified in the request URI is invalid.
RequestId: ...
Time: ...
# Now installing AzureStor using the path-fix branch
> pak::pkg_install("Azure/AzureStor@path-fix")
! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
✓ Loading metadata database ... done
→ Will update 1 package.
→ Will download 1 package with unknown size.
+ AzureStor 3.6.0 → 3.6.0.9000 [bld][cmp][dl] (GitHub: 1796dd3)
! AzureStor is loaded in the current R session, you probably need to restart R
after the installation.
? Do you want to continue (Y/n) Y
ℹ Getting 1 pkg with unknown size
✓ Cached copy of AzureStor 3.6.0.9000 (source) is the latest build
✓ No downloads needed, all packages are cached
ℹ Packaging AzureStor 3.6.0.9000
✓ Packaged AzureStor 3.6.0.9000 (557ms)
ℹ Building AzureStor 3.6.0.9000
✓ Built AzureStor 3.6.0.9000 (2.8s)
✓ Installed AzureStor 3.6.0.9000 (github::Azure/AzureStor@1796dd3) (28ms)
✓ 1 pkg + 19 deps: kept 17, upd 1 [11.2s]
# Restarting R session to reload packages
Restarting R session...
> list_files = function() {
+ org_name = "..."
+ some_container = "..."
+ some_dir = "..."
+
+ endpoint <- paste0("https://", org_name, ".dfs.core.windows.net")
+
+ token <- AzureAuth::get_azure_token("https://storage.azure.com",
+ tenant=Sys.getenv("TENANT_ID"),
+ app=Sys.getenv("APP_ID"),
+ password=Sys.getenv("CLIENT_SECRET"))
+
+ endp_token = AzureStor::storage_endpoint(endpoint, token = token)
+
+ container_name <- some_container
+ cont <- AzureStor::storage_container(endp_token, container_name)
+
+ AzureStor::list_storage_files(container = cont, some_dir)
+ }
> list_files()
Using client_credentials flow
Loading cached token
name size isdir lastModified permissions etag
1 ... ... ... ... ... ...
For reference, I'm using RStudio on an Azure ML compute instance and the sessionInfo is:
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS
Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7.9000 purrr_0.3.4 readr_2.1.1
[6] tidyr_1.1.4 tibble_3.1.5 ggplot2_3.3.5 tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 cellranger_1.1.0 pillar_1.6.4 compiler_4.1.2
[5] dbplyr_2.1.1 AzureRMR_2.4.3 tools_4.1.2 lubridate_1.8.0
[9] jsonlite_1.7.3 lifecycle_1.0.1 gtable_0.3.0 pkgconfig_2.0.3
[13] rlang_1.0.0 reprex_2.0.1 AzureAuth_1.3.3 rstudioapi_0.13
[17] cli_3.1.1 DBI_1.1.2 curl_4.3.2 haven_2.4.3
[21] xml2_1.3.3 withr_2.4.3 httr_1.4.2 AzureStor_3.6.0.9000
[25] fs_1.5.2 hms_1.1.1 generics_0.1.1 vctrs_0.3.8
[29] askpass_1.1 rappdirs_0.3.3 grid_4.1.2 tidyselect_1.1.1
[33] glue_1.6.0 R6_2.5.1 fansi_1.0.2 readxl_1.3.1
[37] tzdb_0.2.0 modelr_0.1.8 magrittr_2.0.1 backports_1.4.1
[41] scales_1.1.1 ellipsis_0.3.2 rvest_1.0.2 assertthat_0.2.1
[45] colorspace_2.0-2 utf8_1.2.2 AzureGraph_1.3.2 stringi_1.7.6
[49] munsell_0.5.0 openssl_1.4.5 broom_0.7.10 crayon_1.4.2 |
Looks like another Linux/Windows |
@hongooi73 The update you made fixed the issue. Thank you for your help! |
The function worked until 10.01.2022 but has failed since then. It still works in RStudio Desktop, but not in RStudio Workbench.
I have added the code below, in which I also set "use_cache = FALSE" and renewed the token by authenticating at https://microsoft.com/devicelogin and entering a code.
Code used:
This is the error generated: