AzureStor::list_storage_files(cont) fails under R Server since last update #109

Closed
RocketRob97 opened this issue Jan 12, 2022 · 43 comments · Fixed by #111
Labels
bug Something isn't working

Comments

@RocketRob97 commented Jan 12, 2022

The function worked until 10 January 2022 (10.01.2022) but has failed since then. It still works in RStudio Desktop, but not in RStudio Workbench.

The code is below. I also set use_cache = FALSE and renewed the token by authenticating with a device code at https://microsoft.com/devicelogin.

Code used:

container <- "container-name" 
endpoint <- "https://storageacount.dfs.core.windows.net"
tenant <- "xxxx-yyyy-zzzz"
app <- "xxxx-yyyy-zzzz"

token <- AzureAuth::get_azure_token("https://storage.azure.com/", 
                                    tenant = tenant, 
                                    app = app, auth_type = "device_code", 
                                    use_cache = FALSE)
 
endp <- AzureStor::storage_endpoint(endpoint = endpoint, token = token)
cont <- AzureStor::storage_container(endp, container)
AzureStor::list_storage_files(cont)

This is the error generated:

AzureStor::list_storage_files(cont)
Error in list_adls_files(container, ...) : 
  Bad Request (HTTP 400). Failed to complete Storage Services operation. Message:
Value for one of the query parameters specified in the request URI is invalid.
RequestId:51225b87-701f-0094-16fe-06ac09000000
Time:2022-01-11T15:22:02.0858775Z.

@hongooi73 (Collaborator) commented Jan 13, 2022

Can you provide some more details? Are you running the same code on both your local machine and the server?

What version of R does Workbench provide? Please post the sessionInfo() output from both environments.

@horeees commented Jan 19, 2022

@hongooi73 I'm having the same issue with the recent version 3.6.0, using the same script that runs without issue in 3.5.2. I did some testing and found that the issue happens when the endpoint is in the format "https://storageaccount.**dfs**.core.windows.net/", but when using "https://storageaccount.**blob**.core.windows.net/", it works fine. Note that our storage account is ADLS Gen2.

@hongooi73 (Collaborator) commented Jan 19, 2022

I'm still not able to replicate this, although I can see how various errors can happen.

Make sure that the type of storage that you're trying to access matches the class of the storage endpoint object. In particular, if you do blob_endpoint("https://acctname.dfs.core.windows.net", *) where the URL contains "dfs" and not "blob", this will cause the 400 error "Value for one of the query parameters specified in the request URI is invalid."

Similarly if you do adls_endpoint("https://acctname.blob.core.windows.net", *) where the URL has "blob" and not "dfs", this will cause a 404 error "The specified container does not exist.".
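A minimal sketch of the check described above, assuming only that the service type can be read from the second label of the hostname ("blob" or "dfs"). The helpers endpoint_service() and check_endpoint() are hypothetical and not part of AzureStor. As far as I understand, storage_endpoint() already infers the endpoint class from the URL, so a check like this matters mainly when calling blob_endpoint() or adls_endpoint() directly.

```r
# Hypothetical helper (not part of AzureStor): report which storage service
# an endpoint URL points at, based on the second label of its hostname.
endpoint_service <- function(url) {
  host <- sub("^https?://([^/]+).*$", "\\1", url)   # drop scheme and any path
  labels <- strsplit(host, ".", fixed = TRUE)[[1]]
  if (length(labels) < 3) stop("unexpected endpoint URL: ", url)
  labels[2]                                          # e.g. "blob" or "dfs"
}

# Hypothetical check: fail early if the URL and the intended API disagree,
# instead of getting an HTTP 400 or 404 from the storage service later.
check_endpoint <- function(url, type = c("blob", "dfs")) {
  type <- match.arg(type)
  svc <- endpoint_service(url)
  if (svc != type)
    stop(sprintf("URL is a '%s' endpoint but a '%s' endpoint was requested", svc, type))
  invisible(url)
}

endpoint_service("https://acctname.dfs.core.windows.net")    # "dfs"
endpoint_service("https://acctname.blob.core.windows.net/")  # "blob"
```

For example, check_endpoint(url, "dfs") before adls_endpoint(url, ...) would catch the blob/dfs mix-up described above.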

Please post an exact, minimal sample of code that reproduces the problem, along with the exact error message that you're getting.

@hongooi73 (Collaborator) commented Jan 19, 2022

One possible workaround is to run options(azure_storage_api_version="2020-04-08"), which sets the API version used by the previous AzureStor release. This will show whether the change in API version is causing the problem.
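A minimal sketch of applying that workaround for a single diagnostic run and then restoring the previous setting, even if the call fails. The wrapper with_storage_api_version() is hypothetical. The usage assumes the option may be read when the endpoint object is created, so the whole endpoint/container/listing chain goes inside the call.

```r
# Hypothetical helper: evaluate `expr` with the storage API version pinned,
# then restore whatever value was set before (also on error).
with_storage_api_version <- function(version, expr) {
  old <- options(azure_storage_api_version = version)  # returns the previous value
  on.exit(options(old), add = TRUE)
  expr  # lazy evaluation: the argument is only evaluated here, after the option is set
}

# Usage, with `endpoint`, `token` and `container` as in the original report:
# with_storage_api_version("2020-04-08", {
#   endp <- AzureStor::storage_endpoint(endpoint = endpoint, token = token)
#   cont <- AzureStor::storage_container(endp, container)
#   AzureStor::list_storage_files(cont)
# })
```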

@hongooi73 (Collaborator) commented Jan 19, 2022

Oh, one more thing, regarding this:

"Here's an example that appears to authenticate, but returns null info (could be a problem with the az login timing out)"

Try removing the leading slash in the directory name (variance instead of /variance). I've tweaked the code to better handle this, but there may still be issues.
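A small sketch of that advice, using a hypothetical normalize_dir() helper that strips leading slashes but keeps "/" for the root (which I understand to be the default dir of list_storage_files()).

```r
# Hypothetical helper: remove leading slashes from a directory argument,
# keeping "/" to mean the root of the container.
normalize_dir <- function(dir) {
  stripped <- sub("^/+", "", dir)
  if (nzchar(stripped)) stripped else "/"
}

normalize_dir("/variance")                       # "variance"
normalize_dir("/folder/subfolder1/subfolder2/")  # "folder/subfolder1/subfolder2/"
normalize_dir("/")                               # "/"

# e.g. AzureStor::list_storage_files(cont, normalize_dir("/variance"))
```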

@horeees commented Jan 19, 2022

@hongooi73 Using options(azure_storage_api_version = "2020-04-08") gave the exact same error.
Error in list_adls_files(container, ...) : Bad Request (HTTP 400). Failed to complete Storage Services operation. Message: Value for one of the query parameters specified in the request URI is invalid. RequestId:39c415fe-... Time:2022-01-19T16:30:43.3998962Z.

@hongooi73 (Collaborator) commented Jan 19, 2022

Make sure you're not calling list_adls_files on a blob container object. That's what stands out from that error message.

@horeees commented Jan 19, 2022

Here is a code sample masking the organization data. Thanks in advance for the help!

az_token <-
  AzureRMR::get_azure_token(
    resource = "https://mystorage.dfs.core.windows.net/",
    tenant = tenant,
    app = app_id,
    password = app_secret,
    auth_type = "client_credentials", 
    use_cache = FALSE
  )

blob_endpoint <- AzureStor::storage_endpoint(endpoint = "https://mystorage.dfs.core.windows.net/", token = az_token)

container <- AzureStor::storage_container(blob_endpoint, cont_name)

adls_files <- AzureStor::list_storage_files(container,
                                            dir = "/folder/subfolder1/subfolder2/",
                                            info = "all",
                                            recursive = TRUE)

@horeees commented Jan 19, 2022

Even though I'm using list_storage_files, the error prints Error in list_adls_files(container, ...)

@hongooi73 (Collaborator) commented Jan 19, 2022

It still works perfectly over here.

tok <- AzureAuth::get_azure_token("https://storage.azure.com/.default", ...)
ad_endp <- storage_endpoint("https://hongstor3.dfs.core.windows.net", token=tok)
cont <- storage_container(ad_endp, "contname")
list_storage_files(cont, "/tests/testthat", recursive=TRUE, info="all")

                                     name  size isdir        lastModified permissions
1                  tests/testthat/setup.R  1715 FALSE 2021-12-10 13:45:24   rw-r-----   
2        tests/testthat/test01_resource.R  2053 FALSE 2021-12-10 13:45:24   rw-r-----   
3            tests/testthat/test02_blob.R 11902 FALSE 2021-12-10 13:45:24   rw-r-----   
4        tests/testthat/test02a_blobext.R  4944 FALSE 2021-12-10 13:45:24   rw-r-----   
5       tests/testthat/test02b_blobdirs.R  3019 FALSE 2021-12-10 13:45:24   rw-r-----   
6     tests/testthat/test02c_blobappend.R  2516 FALSE 2021-12-10 13:45:24   rw-r-----
...

What is the exact error you're getting? And post your sessionInfo() along with whether you're running this locally or on a remote server.

@horeees commented Jan 19, 2022

Here is the error. I'm running this from RStudio Workbench. When I revert to AzureStor 3.5.2, it works.

Error in list_adls_files(container, ...) : 
  Bad Request (HTTP 400). Failed to complete Storage Services operation. Message:
Value for one of the query parameters specified in the request URI is invalid.
RequestId:38f08eeb-.........
Time:2022-01-19T17:34:40.2611651Z.
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7                     pillar_1.6.2                   compiler_4.1.1                 RColorBrewer_1.1-2             AzureRMR_2.4.2                 tokenizers_0.2.1              
 [7] statquotes_0.2.2               tools_4.1.1                    bit_4.0.4                      jsonlite_1.7.2                 lifecycle_1.0.0                tibble_3.1.4                  
[13] lattice_0.20-44                pkgconfig_2.0.3                rlang_0.4.11                   Matrix_1.3-4                   AzureAuth_1.3.2                DBI_1.1.1                     
[19] curl_4.3.2                     yaml_2.2.1                     httr_1.4.2                     AzureStor_3.6.0                janeaustenr_0.1.5              dplyr_1.0.7                   
[25] stringr_1.4.0                  generics_0.1.0                 vctrs_0.3.8                     rappdirs_0.3.3                 bit64_4.0.5                   
[31] grid_4.1.1                     tidyselect_1.1.1               wordcloud_2.6                  glue_1.4.2                     R6_2.5.1                       fansi_0.5.0                   
[37] purrr_0.3.4                    magrittr_2.0.1                 SnowballC_0.7.0                ellipsis_0.3.2                 assertthat_0.2.1               tidytext_0.3.1                
[43] arrow_5.0.0                    config_0.3.1                   AzureGraph_1.3.1               utf8_1.2.2                     stringi_1.7.4                  crayon_1.4.1   

@hongooi73 (Collaborator) commented Jan 19, 2022

Can you try running it in a local R session? I suspect that RStudio Workbench is screwing around with the URL that's being sent to the storage server. And you ARE authenticating with the AAD token, correct?

@horeees commented Jan 19, 2022

I'm using AAD token V1. It works on my local machine with V3.6.0, however, I rarely use it and the environment (R version and the installed packages) may be different.

@hongooi73 (Collaborator) commented Jan 19, 2022

OK, if it works on your local machine but not on RStudio Workbench, that implies something funny is going on with the latter. If the problem were in the package itself, it would fail locally as well as on Workbench.

@horeees commented Jan 19, 2022

Do you have any thoughts on why the previous version of AzureStor would work on RStudio Workbench, but not 3.6.0? We have contacts with an RStudio partner, but I need to understand more in order to raise the issue.

@hongooi73 (Collaborator) commented Jan 19, 2022

Can you check if it works if you authenticate with a SAS?

Also, does it work if you use the blob endpoint instead of ADLS2? It's accessing the same storage, but via a different API.

@hongooi73 (Collaborator) commented Jan 19, 2022

https://community.rstudio.com/t/issues-with-azurestor-and-rstudio-workbench/126884

@horeees commented Jan 19, 2022

I haven't tried to authenticate with SAS before. I'll have to do some research on that.

Using the blob endpoint seems to work fine, is there really any difference between using the dfs vs blob endpoints?

@hongooi73 (Collaborator) commented Jan 19, 2022

The blob endpoint is the older one. The ADLS2 endpoint is a more up-to-date design and generally better, although the blob API does have some features that aren't (yet) in ADLS2. In particular, directory listing is much faster and less error-prone when using ADLS2.
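To make "same storage, different API" concrete: for a given account, the blob and ADLS Gen2 endpoint URLs differ only in the service label of the hostname. A sketch with a hypothetical account_endpoints() helper; "mystorage" is the placeholder account name from the earlier example.

```r
# Hypothetical helper: both endpoint URLs for one storage account.
# They address the same data, via the blob API and the ADLS Gen2 (dfs) API.
account_endpoints <- function(account) {
  c(blob = sprintf("https://%s.blob.core.windows.net", account),
    dfs  = sprintf("https://%s.dfs.core.windows.net", account))
}

urls <- account_endpoints("mystorage")
urls[["dfs"]]  # "https://mystorage.dfs.core.windows.net"

# Either URL can then be passed to AzureStor::storage_endpoint(url, token = token).
```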

@horeees commented Jan 19, 2022

@hongooi73 Same error using SAS.

@horeees commented Jan 19, 2022

Hey @hongooi73 - actually still in Mountain View. I get to visit Boston if I'm good. :) Maybe we should split this issue -- I'm losing track of the complaint.

@jmagosta Sorry for hijacking your issue, the error is identical and I didn't want to create a duplicate issue.

@hongooi73 (Collaborator) commented Jan 20, 2022

@MichaelHannaEQT this is the issue for tracking RStudio Server/Workbench problems, no worries.

@jmagosta were you having this same problem on AML? Or was it something different?

@hongooi73 (Collaborator) commented Jan 24, 2022

@jmagosta your problem is unrelated to this issue, but briefly: you are NOT authenticating to storage. AzureRMR is an interface to Azure Resource Manager which is the management plane for Azure resources. Authenticating there doesn't give you access to the data plane, which is what you need to actually read your files and whatnot.

If you want to access the data plane via AAD, see the vignettes or the readme for this repo.
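A minimal sketch of the distinction, assuming a service principal as in the other examples in this thread: the data-plane token is requested for the Azure Storage resource (https://storage.azure.com/, as in the original report) rather than the management endpoint that Azure Resource Manager uses. get_storage_token() is a hypothetical wrapper; tenant, app and secret are placeholders.

```r
# Token audiences for the two planes. Resource Manager (AzureRMR) works against
# the management plane; reading files needs a token for Azure Storage itself.
management_resource <- "https://management.azure.com/"
storage_resource    <- "https://storage.azure.com/"

# Hypothetical wrapper: request a data-plane token with a client secret.
# Supplying `password` makes AzureAuth use the client_credentials flow.
get_storage_token <- function(tenant, app, secret) {
  AzureAuth::get_azure_token(storage_resource, tenant = tenant, app = app,
                             password = secret)
}

# tok <- get_storage_token(tenant, app, secret)
# endp <- AzureStor::storage_endpoint("https://mystorage.dfs.core.windows.net", token = tok)
```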

@hongooi73 (Collaborator) commented Jan 24, 2022

@MichaelHannaEQT have you been able to resolve this yet?

@mjkanji commented Jan 25, 2022

I'm also experiencing the same problem. On my local machine (RStudio Desktop), it works fine, but when using RStudio Server on an Azure ML Compute Instance, I'm getting the same error.

Reverting to version 3.5.2 fixes it.

Similarly, using org.**blob**.core.windows.net instead of org.**dfs**.core.windows.net returns a valid response, though it of course recursively shows all the contents of the sub-directories instead of just the top level.

@horeees commented Jan 25, 2022

@hongooi73 We had to revert to 3.5.2 until we hear back from RStudio. Using blob.core.windows.net works fine, but we won't switch to it, to keep things consistent across the organization.

@hongooi73 added the bug label Jan 25, 2022

@hongooi73 (Collaborator) commented Jan 25, 2022

I have a tentative fix in the path-fix branch. Can you try that and see if it changes anything?

@mjkanji commented Jan 25, 2022

Just tried with the path-fix branch and it works! 🎊 🎆

@hongooi73 (Collaborator) commented Jan 25, 2022

@mjkanji excellent. Can you post the exact code that broke with 3.6.0 and works with the fix?

I can see the change I made in 3.6.0, but I can't see why it should fail....

@mjkanji commented Jan 25, 2022

Hi @hongooi73, please see below:

> pak::pkg_install("AzureStor") # v3.6.0 from CRAN is already installed, so installation is skipped
Loading metadata database ... done
No downloads are needed
1 pkg + 19 deps: kept 18 [3.6s]

# Restarting R session to reload packages
Restarting R session...

> list_files = function() {
+   org_name = "..."
+   some_container = "..."
+   some_dir = "..."
+   
+   endpoint <- paste0("https://", org_name, ".dfs.core.windows.net")
+   
+   token <- AzureAuth::get_azure_token("https://storage.azure.com",
+                                       tenant=Sys.getenv("TENANT_ID"),
+                                       app=Sys.getenv("APP_ID"),
+                                       password=Sys.getenv("CLIENT_SECRET"))
+   
+   endp_token = AzureStor::storage_endpoint(endpoint, token = token)
+   
+   container_name <- some_container
+   cont <- AzureStor::storage_container(endp_token, container_name)
+   
+   AzureStor::list_storage_files(container = cont, some_dir)
+ }
> list_files()
Using client_credentials flow
Loading cached token
 Error in list_adls_files(container, ...) : 
  Bad Request (HTTP 400). Failed to complete Storage Services operation. Message:
Value for one of the query parameters specified in the request URI is invalid.
RequestId: ...
Time: ... 

# Now installing AzureStor using the path-fix branch
> pak::pkg_install("Azure/AzureStor@path-fix")
! Using bundled GitHub PAT. Please add your own PAT using `gitcreds::gitcreds_set()`.
Loading metadata database ... done
Will update 1 package.
Will download 1 package with unknown size.
+ AzureStor 3.6.0 → 3.6.0.9000 [bld][cmp][dl] (GitHub: 1796dd3)

! AzureStor is loaded in the current R session, you probably need to restart R
after the installation.

? Do you want to continue (Y/n) Y
Getting 1 pkg with unknown size
Cached copy of AzureStor 3.6.0.9000 (source) is the latest build
No downloads needed, all packages are cached
Packaging AzureStor 3.6.0.9000
Packaged AzureStor 3.6.0.9000 (557ms)
ℹ Building AzureStor 3.6.0.9000
Built AzureStor 3.6.0.9000 (2.8s)
✓ Installed AzureStor 3.6.0.9000 (github::Azure/AzureStor@1796dd3) (28ms)
✓ 1 pkg + 19 deps: kept 17, upd 1 [11.2s]                              

# Restarting R session to reload packages
Restarting R session...

> list_files = function() {
+   org_name = "..."
+   some_container = "..."
+   some_dir = "..."
+   
+   endpoint <- paste0("https://", org_name, ".dfs.core.windows.net")
+   
+   token <- AzureAuth::get_azure_token("https://storage.azure.com",
+                                       tenant=Sys.getenv("TENANT_ID"),
+                                       app=Sys.getenv("APP_ID"),
+                                       password=Sys.getenv("CLIENT_SECRET"))
+   
+   endp_token = AzureStor::storage_endpoint(endpoint, token = token)
+   
+   container_name <- some_container
+   cont <- AzureStor::storage_container(endp_token, container_name)
+   
+   AzureStor::list_storage_files(container = cont, some_dir)
+ }
> list_files()
Using client_credentials flow
Loading cached token
  name size isdir lastModified permissions etag
1  ...  ...   ...          ...         ...  ...

For reference, I'm using RStudio on an Azure ML compute instance and the sessionInfo is:

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] forcats_0.5.1    stringr_1.4.0    dplyr_1.0.7.9000 purrr_0.3.4      readr_2.1.1     
[6] tidyr_1.1.4      tibble_3.1.5     ggplot2_3.3.5    tidyverse_1.3.1 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7           cellranger_1.1.0     pillar_1.6.4         compiler_4.1.2      
 [5] dbplyr_2.1.1         AzureRMR_2.4.3       tools_4.1.2          lubridate_1.8.0     
 [9] jsonlite_1.7.3       lifecycle_1.0.1      gtable_0.3.0         pkgconfig_2.0.3     
[13] rlang_1.0.0          reprex_2.0.1         AzureAuth_1.3.3      rstudioapi_0.13     
[17] cli_3.1.1            DBI_1.1.2            curl_4.3.2           haven_2.4.3         
[21] xml2_1.3.3           withr_2.4.3          httr_1.4.2           AzureStor_3.6.0.9000
[25] fs_1.5.2             hms_1.1.1            generics_0.1.1       vctrs_0.3.8         
[29] askpass_1.1          rappdirs_0.3.3       grid_4.1.2           tidyselect_1.1.1    
[33] glue_1.6.0           R6_2.5.1             fansi_1.0.2          readxl_1.3.1        
[37] tzdb_0.2.0           modelr_0.1.8         magrittr_2.0.1       backports_1.4.1     
[41] scales_1.1.1         ellipsis_0.3.2       rvest_1.0.2          assertthat_0.2.1    
[45] colorspace_2.0-2     utf8_1.2.2           AzureGraph_1.3.2     stringi_1.7.6       
[49] munsell_0.5.0        openssl_1.4.5        broom_0.7.10         crayon_1.4.2  

@hongooi73 (Collaborator) commented Jan 25, 2022

Looks like another Linux/Windows file.path hassle. I can repro this in R running in a WSL2 instance on my laptop.
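The thread does not show the exact change made in the fix, so the following is not AzureStor's code. Under that caveat, it only illustrates how joining path segments with file.path() can produce a doubled separator, and a hypothetical join_url_path() helper that builds a URL path the same way on every platform.

```r
# file.path() joins its arguments with "/" and does not collapse separators,
# so a directory that already starts with "/" yields a doubled slash.
file.path("contname", "/tests/testthat")  # "contname//tests/testthat"

# Hypothetical helper: trim slashes at each segment boundary, drop empty
# segments, and join with a single "/".
join_url_path <- function(...) {
  parts <- gsub("^/+|/+$", "", c(...))
  paste(parts[nzchar(parts)], collapse = "/")
}

join_url_path("contname", "/tests/testthat")  # "contname/tests/testthat"
join_url_path("a/", "/b/", "c")               # "a/b/c"
```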

@horeees commented Jan 26, 2022

@hongooi73 The update you made fixed the issue. Thank you for your help!


5 participants