
cr_deploy_r - API returned: Job exceeds maximum size of 100 KB - long R script #45

Closed · j450h1 opened this issue Feb 25, 2020 · 51 comments

@j450h1 commented Feb 25, 2020

I have an R script which is 1250 lines (115.5 KB), and after running:

CRON_SCHEDULE = '0 * * * *' #every hour
cr_deploy_r(r = here::here('all_emojis.R'),
            r_image = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'),
            schedule = CRON_SCHEDULE
)

I get this error message:

Error: API returned: Job exceeds maximum size of 100 KB.

Any suggested workarounds or is this a hard limit?

@MarkEdmondson1234 (Owner) commented Feb 25, 2020 via email

@MarkEdmondson1234 (Owner) commented Feb 25, 2020

Example code:

rcode <- cr_buildstep_r("filepath/in/source.R", name = {dockerfile}, r_source = "runtime")

build <- cr_build_yaml(steps = rcode)

# upload source to GCS
my_gcs_source <- cr_build_upload_gcs("my_folder")

# will execute R file in "my_folder/filepath/in/source.R"
cr_build(build, source = my_gcs_source)

Or use GitHub/Bitbucket: write out your cloudbuild.yml file and create a build trigger that will look for your R script.
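For the cloudbuild.yml route, a minimal sketch, assuming cr_build_write() from googleCloudRunner (check your installed version for the trigger helpers):

library(googleCloudRunner)

# build definition that reads the R file from the checked-out repo at runtime
rcode <- cr_buildstep_r("filepath/in/source.R", r_source = "runtime")
build <- cr_build_yaml(steps = rcode)

# write the yaml out and commit it to the GitHub/Bitbucket repo;
# a build trigger watching the repo will then pick it up
cr_build_write(build, file = "cloudbuild.yml")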

@j450h1 (Author) commented Feb 26, 2020

Got it. Thanks for the additional examples. I'll give it a shot.

@j450h1 (Author) commented Feb 26, 2020

Can you clarify what this value should be?

"filepath/in/source.R"

I have a folder called large in the root of my existing RStudio project. The file in this folder is called all_emojis.R

The code below gives the following build error: Error: object 'all_emojis.R' not found

rcode <- cr_buildstep_r('all_emojis.R', name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'), r_source = "runtime")

build <- cr_build_yaml(steps = rcode)

# upload source to GCS
my_gcs_source <- cr_build_upload_gcs("large")

# will execute R file in "my_folder/filepath/in/source.R"
cr_build(build, source = my_gcs_source)

If I modify it as below to add the large folder to the path, I still get a build error: Error: unexpected '/' in "/"

rcode <- cr_buildstep_r('large/all_emojis.R', name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'), r_source = "runtime")

@MarkEdmondson1234 (Owner) commented Feb 26, 2020 via email

@MarkEdmondson1234 (Owner) commented Feb 26, 2020 via email

@j450h1 (Author) commented Feb 26, 2020

Typo fixed; it is quoted. I was thinking along the same lines, but the build fails before running the R code, so how can I execute list.files()?

@MarkEdmondson1234 (Owner) commented Feb 26, 2020 via email

@j450h1 (Author) commented Feb 26, 2020

Unfortunately, the below did not do the trick (as per the commented-out code; I also tried the full path):

# Try FULL path to directory
# rcode <- cr_buildstep_r('all_emojis.R', name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'), r_source = "runtime",
#                         dir = here::here("large"))
# Nope

rcode <- cr_buildstep_r('all_emojis.R', name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'), r_source = "runtime",
                        dir = "large")

build <- cr_build_yaml(steps = rcode)

# upload source to GCS
my_gcs_source <- cr_build_upload_gcs("large")

# will execute R file in "my_folder/filepath/in/source.R"
cr_build(build, source = my_gcs_source)

@MarkEdmondson1234 (Owner) commented Feb 26, 2020 via email

@MarkEdmondson1234 (Owner) commented Feb 26, 2020 via email

@j450h1 (Author) commented Feb 26, 2020

Okay, let me try that method.

@axel-analyst commented

Hi Mark, I have almost the same problem, but I stumbled upon this error in Cloud Build:

ERROR
ERROR: build step 0 "rocker/r-base" failed: step exited with non-zero status: 1 

I tried this variation of the solution:

rcode <- cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole.R", r_source = "runtime")

build <- cr_build_yaml(steps = rcode)
httr::set_config(httr::config(http_version = 0))

# upload source to GCS
my_gcs_source <- cr_build_upload_gcs("")

# will execute R file in "my_folder/filepath/in/source.R"
cr_build(build, source = my_gcs_source)

@j450h1 (Author) commented Feb 28, 2020

@axel-analyst

Thanks, I tried your variation as well. Looks like you're going in the right direction, as I now see Google Cloud Storage as the Source under Cloud Build History:

[screenshot: Cloud Build History showing Google Cloud Storage as the Source]

Unfortunately, I'm getting a build error message similar to ones I've seen before:

ERROR: build step 0 "rocker/r-base" failed: step exited with non-zero status: 1
ERROR
Execution halted
Error: unexpected '/' in "gs:/"

Here is my variation based on your code:

library(googleCloudRunner)
#> Setting scopes to https://www.googleapis.com/auth/cloud-platform
#> Successfully auto-authenticated via {REDACTED_KEY_PATH}
library(glue)

# Uploaded all_emojis.R file to GCP bucket called {PROJECT}

PROJECT <- Sys.getenv("GCE_DEFAULT_PROJECT_ID")
rcode <- cr_buildstep_r(glue("gs://{PROJECT}/all_emojis.R"), r_source = "runtime")
#> 2020-02-28 00:58:22> Will read code in source from filepath gs:/{PROJECT}/all_emojis.R

build <- cr_build_yaml(steps = rcode)
httr::set_config(httr::config(http_version = 0))

# upload source to GCS
my_gcs_source <- cr_build_upload_gcs("")
#> 2020-02-28 00:58:22> #Upload    to  gs://{PROJECT}/20200228005822.tar.gz
#> 2020-02-28 00:58:22> Copying files from  to /deploy
#> 2020-02-28 00:58:22> Compressing files from /deploy to .tar.gz
#> 2020-02-28 00:58:22> Uploading .tar.gz to {PROJECT}/20200228005822.tar.gz
#> 2020-02-28 00:58:22 -- File size detected as 29 bytes

# will execute R file in "my_folder/filepath/in/source.R"
cr_build(build, source = my_gcs_source)
#> 2020-02-28 00:58:23> Cloud Build started - logs: 
#> https://console.cloud.google.com/gcr/builds/3132b55b-a8ae-4623-a0c8-0e06302b4edb?project=278886342413

Created on 2020-02-28 by the reprex package (v0.3.0)

Looks like step 0 as per the error is trying to execute this bash command:

Rscript -e gs://{PROJECT}/all_emojis.R

However, you can't simply execute R code directly from a Google Cloud Storage bucket location.
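(The object has to land on a local filesystem before Rscript can run it. For illustration, a sketch using googleCloudStorageR's gcs_get_object(), assuming that package is authenticated; PROJECT is the bucket name as in the snippet above:)

library(googleCloudStorageR)

# fetch the script from the bucket to a local file, then run it
gcs_get_object("all_emojis.R", bucket = PROJECT, saveToDisk = "all_emojis.R")
source("all_emojis.R")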

@MarkEdmondson1234 (Owner) commented Feb 28, 2020 via email

@MarkEdmondson1234 (Owner) commented Feb 28, 2020 via email

@MarkEdmondson1234 (Owner) commented Feb 28, 2020 via email

@MarkEdmondson1234 (Owner) commented Feb 28, 2020 via email

@axel-analyst commented

What do I also need to add to rebuild the JSON after using gcs_source()? I get another error before the last step:

> gcs_global_bucket("python-for-bigquery_cloudbuild")
Set default bucket name to 'python-for-bigquery_cloudbuild'
> rcode2<-gcs_source("searchconsole.R", bucket = gcs_get_global_bucket())
2020-02-28 10:20:25> Token exists.
2020-02-28 10:20:25> Request: https://www.googleapis.com/storage/v1/b/python-for-bigquery_cloudbuild/o/searchconsole.R?alt=media
2020-02-28 10:20:25> No checks on content due to option googleAuthR.rawResponse,
              returning raw
2020-02-28 10:20:25 -- Saved searchconsole.R to /tmp/Rtmp5cEg3F/file87158cc8d66.R (5.8 Kb)
> 
> build <- cr_build_yaml(steps = rcode2)
> httr::set_config(httr::config(http_version = 0))
> 
> my_gcs_source <- cr_build_upload_gcs("")
2020-02-28 10:20:25> #Upload    to  gs://python-for-bigquery_cloudbuild/20200228102025.tar.gz
2020-02-28 10:20:25> Copying files from  to /deploy
2020-02-28 10:20:25> Compressing files from /deploy to .tar.gz
2020-02-28 10:20:25> Uploading .tar.gz to python-for-bigquery_cloudbuild/20200228102025.tar.gz
2020-02-28 10:20:25 -- File size detected as 29 bytes
2020-02-28 10:20:25 -- Simple upload
2020-02-28 10:20:25> Token exists.
2020-02-28 10:20:25> Request: https://www.googleapis.com/upload/storage/v1/b/python-for-bigquery_cloudbuild/o/?uploadType=media&name=20200228102025.tar.gz&predefinedAcl=bucketOwnerFullControl
2020-02-28 10:20:25> Could not parse body JSON
> cr_build(build, source = my_gcs_source)
2020-02-28 10:20:25> Token exists.
2020-02-28 10:20:25> Request: https://cloudbuild.googleapis.com/v1/projects/python-for-bigquery/builds/
2020-02-28 10:20:25> Body JSON parsed to: {"steps":{"value":["library(data.table)\nlibrary(googleAuthR)\n\n\n\n\nd<-NA\n\n\n\n\ndates <- seq(as.Date('2020-02-13'),as.Date('2020-02-25'), by=1)\ndates<-as.character(dates)","for (d in dates) {\n  \n  d1<-as.integer(format(as.Date(d), '%Y%m%d')) \n  \n  options(googleAuthR.scopes.selected = 'https://www.googleapis.com/auth/webmasters')\n  gar_auth_service('python-for-bigquery.json')\n  options('searchConsoleR.client_id' = '895678869460-c31eo4ubbphvd8hm8lfcafac10f5j081.apps.googleusercontent.com')\n  options('searchConsoleR.client_secret' = '74wKaGF_JUk8qCLIcazZ1Bki')\n  \n  library(searchConsoleR)\n  \n  soft<-search_analytics('https://soft.rozetka.com.ua/', \n                         startDate = d,\n                         endDate = d, \n                         dimensions = c('country','date','device','page','query'), \n                         searchType = c('web','video', 'image'), \n                         dimensionFilterExp = NULL,\n                         aggregationType = 'auto', \n                         prettyNames = TRUE, \n                         rowLimit=999999,\n                         walk_data = 'byBatch'\n                         \n  )","hunter<-search_analytics('https://hunter.rozetka.com.ua/', \n                           startDate = d,\n                           endDate = d, \n                           dimensions = c('country','date','device','page','query'), \n                           searchType = c('web','video', 'image'), \n                           dimensionFilterExp = NULL,\n                           aggregationType = 'auto', \n                           prettyNames = TRUE, \n                           rowLimit=999999,\n                           walk_data = 'byBatch'\n                           \n  )","auto<-search_analytics('http://auto.rozetka.com.ua/', \n                         startDate = d,\n                         endDate = d, \n                         dimensions = c('country','date','device','page','query'), \n                         searchType = c('web','video', 'image'), \n                         dimensionFilterExp = NULL,\n                         aggregationType = 'auto', \n                         prettyNames = TRUE, \n                         rowLimit=999999,\n                         walk_data = 'byBatch'\n                         \n  )","apteka<-search_analytics('https://apteka.rozetka.com.ua/', \n                           startDate = d,\n                           endDate = d, \n                           dimensions = c('country','date','device','page','query'), \n                           searchType = c('web','video', 'image'), \n                           dimensionFilterExp = NULL,\n                           aggregationType = 'auto', \n                           prettyNames = TRUE, \n                           rowLimit=999999,\n                           walk_data = 'byBatch'\n                           \n  )","hard<-search_analytics('https://hard.rozetka.com.ua/', \n                         startDate = d,\n                         endDate = d, \n                         dimensions = c('country','date','device','page','query'), \n                         searchType = c('web','video', 'image'), \n                         dimensionFilterExp = NULL,\n                         aggregationType = 'auto', \n                         prettyNames = TRUE, \n                         rowLimit=999999,\n                         walk_data = 'byBatch'\n                         \n  
)","rozetka<-search_analytics('https://rozetka.com.ua/', \n                            startDate = d,\n                            endDate = d, \n                            dimensions = c('country','date','device','page','query'), \n                            searchType = c('web','video', 'image'), \n                            dimensionFilterExp = NULL,\n                            aggregationType = 'auto', \n                            prettyNames = TRUE, \n                            rowLimit=999999,\n                            walk_data = 'byBatch'\n                            \n  )","mobile<-search_analytics('https://m.rozetka.com.ua/ ', \n                           startDate = d,\n                           endDate = d, \n                           dimensions = c('country','date','device','page','query'), \n                           searchType = c('web','video', 'image'), \n                           dimensionFilterExp = NULL,\n                           aggregationType = 'auto', \n                           prettyNames = TRUE, \n                           rowLimit=999999,\n                           walk_data = 'byBatch'\n                           \n  )","bt<-search_analytics('https://bt.rozetka.com.ua/', \n                       startDate = d,\n                       endDate = d, \n                       dimensions = c('country','date','device','page','query'), \n                       searchType = c('web','video', 'image'), \n                       dimensionFilterExp = NULL,\n                       aggregationType = 'auto', \n                       prettyNames = TRUE, \n                       rowLimit=999999,\n                       walk_data = 'byBatch'\n                       \n  )","gsc_table<-rbindlist(list(soft,hunter,apteka,hard,rozetka,mobile,bt))\n  \n  options(googleAuthR.scopes.selected = 'https://www.googleapis.com/auth/bigquery')\n  gar_auth_service('scheduler.json')\n  #q_auth(path='/scheduler.json', email='kuchko@rozetka.com.ua')\n  library(bigrquery)","table<-paste('gsc',d1, sep='_') \n  \n  bq_table_upload(bq_table('rozetka-com-ua', 'GoogleSearchConsole', table), gsc_table,fields = list(\n    bq_field('country', 'string'),\n    bq_field('date', 'date'),\n    bq_field('device', 'string'),\n    bq_field('page', 'string'),\n    bq_field('query', 'string'),\n    bq_field('countryName', 'string'),\n    bq_field('clicks', 'integer'),\n    bq_field('impressions', 'integer'), \n    bq_field('ctr', 'float'), \n    bq_field('position', 'float')\n    \n    \n  ))\n  \n}\n"],"visible":false},"source":{"storageSource":{"bucket":"python-for-bigquery_cloudbuild","object":"20200228102025.tar.gz"}}}
2020-02-28 10:20:25> Request Status Code: 400
2020-02-28 10:20:25> API returned error: Invalid JSON payload received. Unknown name "value" at 'build.steps': Cannot find field.
Invalid JSON payload received. Unknown name "visible" at 'build.steps': Cannot find field.
2020-02-28 10:20:25> No retry attempted: Invalid JSON payload received. Unknown name "value" at 'build.steps': Cannot find field.
Invalid JSON payload received. Unknown name "visible" at 'build.steps': Cannot find field.
Error: API returned: Invalid JSON payload received. Unknown name "value" at 'build.steps': Cannot find field.
Invalid JSON payload received. Unknown name "visible" at 'build.steps': Cannot find field.

@MarkEdmondson1234 (Owner) commented Feb 28, 2020 via email

@axel-analyst commented

Thanks, Mark. Yes, it's simpler, but I got something wrong because I still get the argument limit error. Here is the particular code:

build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      "gsutil",
      args = c("cp",
               "gs://python-for-bigquery_cloudbuild/searchconsole.R",
               "searchconsole.R")),
    cr_buildstep_r(
      "gcr.io/gcer-public/render_rmd:master",
      r = "searchconsole.R",
    ))
)

httr::set_config(httr::config(http_version = 0))

my_gcs_source <- cr_build_upload_gcs("")


cr_build(build, source = my_gcs_source)

The error:

2020-02-28 13:00:29> Request Status Code: 400
2020-02-28 13:00:29> API returned error: invalid build: invalid .steps field: build step 1 arg 2 too long (max: 4000)
2020-02-28 13:00:29> No retry attempted: invalid build: invalid .steps field: build step 1 arg 2 too long (max: 4000)
Error: API returned: invalid build: invalid .steps field: build step 1 arg 2 too long (max: 4000)

@j450h1 (Author) commented Feb 28, 2020

I'm able to successfully download the R script from GCS, but cr_buildstep_r is not detecting the file for some reason:

ERROR: build step 1 "gcr.io/cardinal-path/hello-world:dev" failed: step exited with non-zero status: 1
ERROR
Finished Step #1
Step #1: Execution halted
Step #1: Error: object 'all_emojis.R' not found

library(googleCloudRunner)
library(glue)

PROJECT <- Sys.getenv('GCE_DEFAULT_PROJECT_ID')

large_r_job <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp",
               glue("gs://{PROJECT}/all_emojis.R"),
               "all_emojis.R")
    )
    # ,cr_buildstep_bash(
    #   bash_script = "ls -altr && Rscript -e all_emojis.R",
    #   name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'),
    #   bash_source = "runtime"
    # )
    ,cr_buildstep_r('all_emojis.R', name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'), r_source = "runtime")
  )
)

httr::set_config(httr::config(http_version = 0))

my_gcs_source <- cr_build_upload_gcs("")

cr_build(large_r_job, source = my_gcs_source)

When running the bash variation that has been commented out, we can see that all_emojis.R seems to be there...

ERROR: build step 1 "gcr.io/cardinal-path/hello-world:dev" failed: step exited with non-zero status: 1
ERROR
Finished Step #1
Step #1: Execution halted
Step #1: Error: object 'all_emojis.R' not found
Step #1: drwxr-xr-x 1 root root 4096 Feb 28 13:21 ..
Step #1: drwxr-xr-x 2 root root 4096 Feb 28 13:21 .
Step #1: -rw-r--r-- 1 root root 118268 Feb 28 13:21 all_emojis.R
Step #1: total 124

I tried the same approach of downloading to a folder called workspace (then specifying workspace/all_emojis.R), but then it doesn't detect the object workspace. I'm not sure why it calls it an object; I would expect it to say it can't find 'workspace/all_emojis.R', but that might be irrelevant.

@MarkEdmondson1234 (Owner) commented

@axel-analyst It looks like this should work:

build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      "gsutil",
      args = c("cp",
               "gs://python-for-bigquery_cloudbuild/searchconsole.R",
               "searchconsole.R")),
    cr_buildstep_r(
      "gcr.io/gcer-public/render_rmd:master",
      r = "searchconsole.R",
      r_source = "runtime"
    ))
)

The "runtime" tells the build step to not copy the file into the Cloud Build yaml, but to execute the file directly.

@axel-analyst commented

@MarkEdmondson1234 Many thanks, I just forgot to copy that from the previous attempt. Now I have a problem with finding the file after the import. The file is in Cloud Storage and is found the first time, but then:


starting build "ee8b1956-64f0-4418-982f-60b777bd244f"

FETCHSOURCE
Fetching storage object: gs://python-for-bigquery_cloudbuild/20200228130029.tar.gz#1582894829431233
Copying gs://python-for-bigquery_cloudbuild/20200228130029.tar.gz#1582894829431233...
/ [0 files][    0.0 B/   29.0 B]                                                
/ [1 files][   29.0 B/   29.0 B]                                                
Operation completed over 1 objects/29.0 B.                                       
BUILD
Starting Step #0
Step #0: Already have image (with digest): gcr.io/cloud-builders/gsutil
Step #0: Copying gs://python-for-bigquery_cloudbuild/searchconsole.R...
Step #0: / [0 files][    0.0 B/  5.8 KiB]                                                
/ [1 files][  5.8 KiB/  5.8 KiB]                                                
Step #0: Operation completed over 1 objects/5.8 KiB.                                      
Finished Step #0
Starting Step #1
Step #1: Pulling image: gcr.io/gcer-public/render_rmd:master
Step #1: master: Pulling from gcer-public/render_rmd
Step #1: 16ea0e8c8879: Pulling fs layer
Step #1: 7ce39da2c1e2: Pulling fs layer
Step #1: ff1bceed0bef: Pulling fs layer
Step #1: e36d273bec5a: Pulling fs layer
Step #1: d3acc34c6c77: Pulling fs layer
Step #1: 14d07989ce8b: Pulling fs layer
Step #1: 73b6bcbfcb26: Pulling fs layer
Step #1: 70b803ec0e47: Pulling fs layer
Step #1: 60939e511b02: Pulling fs layer
Step #1: b9e4e93c5fff: Pulling fs layer
Step #1: e36d273bec5a: Waiting
Step #1: d3acc34c6c77: Waiting
Step #1: 14d07989ce8b: Waiting
Step #1: 73b6bcbfcb26: Waiting
Step #1: 70b803ec0e47: Waiting
Step #1: 60939e511b02: Waiting
Step #1: b9e4e93c5fff: Waiting
Step #1: 16ea0e8c8879: Verifying Checksum
Step #1: 16ea0e8c8879: Download complete
Step #1: e36d273bec5a: Verifying Checksum
Step #1: e36d273bec5a: Download complete
Step #1: d3acc34c6c77: Verifying Checksum
Step #1: d3acc34c6c77: Download complete
Step #1: 7ce39da2c1e2: Verifying Checksum
Step #1: 7ce39da2c1e2: Download complete
Step #1: 14d07989ce8b: Verifying Checksum
Step #1: 14d07989ce8b: Download complete
Step #1: 73b6bcbfcb26: Verifying Checksum
Step #1: 73b6bcbfcb26: Download complete
Step #1: ff1bceed0bef: Verifying Checksum
Step #1: ff1bceed0bef: Download complete
Step #1: b9e4e93c5fff: Verifying Checksum
Step #1: b9e4e93c5fff: Download complete
Step #1: 70b803ec0e47: Verifying Checksum
Step #1: 70b803ec0e47: Download complete
Step #1: 60939e511b02: Verifying Checksum
Step #1: 60939e511b02: Download complete
Step #1: 16ea0e8c8879: Pull complete
Step #1: 7ce39da2c1e2: Pull complete
Step #1: ff1bceed0bef: Pull complete
Step #1: e36d273bec5a: Pull complete
Step #1: d3acc34c6c77: Pull complete
Step #1: 14d07989ce8b: Pull complete
Step #1: 73b6bcbfcb26: Pull complete
Step #1: 70b803ec0e47: Pull complete
Step #1: 60939e511b02: Pull complete
Step #1: b9e4e93c5fff: Pull complete
Step #1: Digest: sha256:62fd61d22ec3632f6be57a2cecc597c3c781d4503acfead4e02e46f8d694f193
Step #1: Status: Downloaded newer image for gcr.io/gcer-public/render_rmd:master
Step #1: gcr.io/gcer-public/render_rmd:master
Step #1: Error: object 'searchconsole.R' not found
Step #1: Execution halted
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/gcer-public/render_rmd:master" failed: step exited with non-zero status: 1

@MarkEdmondson1234 (Owner) commented Feb 28, 2020

@axel-analyst that is a file path issue; it's safest to use the absolute file path when downloading/executing, such as /workspace/filename.R

@j450h1 thanks, you found a bug in cr_buildstep_r when r_source = "runtime": it was executing Rscript -e filename.R, which is invalid. It should be Rscript filename.R, using -e only when inline R code is passed, e.g. Rscript -e "print('hello world')".

I have fixed this in the latest commit, so to try it out please load the dev version via remotes::install_github("MarkEdmondson1234/googleCloudRunner")

I will also add an option so you don't need to worry about how to download the R script from Cloud Storage, as the above thread shows it's non-trivial: if your R script argument starts with gs://, assume it's on a Cloud Storage bucket, and add a buildstep within the function to download and execute it in the right place.
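(Conceptually, a sketch of what that convenience could do; buildstep_r_gs() is a hypothetical helper for illustration, not the package's actual internals:)

# hypothetical sketch of the gs:// handling described above
buildstep_r_gs <- function(r, name) {
  stopifnot(grepl("^gs://", r))
  local_file <- file.path("/workspace", basename(r))
  c(
    # download the script into the shared /workspace first
    cr_buildstep("gsutil", args = c("cp", r, local_file)),
    # then execute it at runtime from that path
    cr_buildstep_r(local_file, r_source = "runtime", name = name)
  )
}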

@MarkEdmondson1234 (Owner) commented Feb 28, 2020

So given the above, here is a working example that I believe should solve both @axel-analyst's and @j450h1's use cases:

library(googleCloudRunner)

large_r_job <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp",
               "gs://mark-edmondson-public-read/schedule.R",
               "/workspace/schedule.R")
    ),
    cr_buildstep_r('/workspace/schedule.R', 
                           r_source = "runtime", 
                           name = "gcr.io/gcer-public/googleauthr-verse:latest")
  )
)

I've used a public bucket, so you should be able to run the code above. For your own use case, replace the bucket, R script location, and Docker image with your own.

@MarkEdmondson1234 (Owner) commented Feb 28, 2020

The gs:// support is now in the latest GitHub version, so the above example becomes:

library(googleCloudRunner)

large_r_job <- cr_build_yaml(
    cr_buildstep_r("gs://mark-edmondson-public-read/schedule.R", 
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
)

cr_build(large_r_job)
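To recover the original goal of an hourly schedule, a sketch assuming cr_build_make(), cr_schedule() and cr_build_schedule_http() as in the package docs (the schedule name here is hypothetical):

# turn the yaml into a Build object, then schedule it hourly
built <- cr_build_make(large_r_job)
cr_schedule("0 * * * *",
            name = "all-emojis-hourly",
            httpTarget = cr_build_schedule_http(built))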

MarkEdmondson1234 added a commit that referenced this issue Feb 28, 2020
@axel-analyst commented Feb 28, 2020

Unfortunately, I still have the problem:

Step #1: Status: Downloaded newer image for gcr.io/gcer-public/googleauthr-verse:latest
Step #1: gcr.io/gcer-public/googleauthr-verse:latest
Step #1: Error: unexpected '/' in "/"
Step #1: Execution halted
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/gcer-public/googleauthr-verse:latest" failed: step exited with non-zero status: 1

My workspace is '/home/m/' and I tried different options; as you can see, I did not change anything except the path from the use case:

build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      name="gsutil",
      args = c("cp",
               "gs://python-for-bigquery_cloudbuild/searchconsole.R",
               "/home/m/searchconsole.R")),
    cr_buildstep_r(
      name = "gcr.io/gcer-public/googleauthr-verse:latest",
      r = '/home/m/searchconsole.R',
      r_source = "runtime"
    ))
)

httr::set_config(httr::config(http_version = 0))
cr_build(build)

@MarkEdmondson1234 (Owner) commented Feb 28, 2020 via email

@j450h1 (Author) commented Feb 28, 2020

Awesome! Thanks a lot for help with the troubleshooting/debugging.

✅ Updated to the latest version of the package (GitHub version through remotes::)
✅ Successfully ran and deployed the toy R script shared in your example above
✅ Successfully ran a script called all_emojis.R uploaded to my private GCS bucket
✅ Have another use case for the emo package in R with all these checkmarks 😄

PROJECT <- Sys.getenv('GCE_DEFAULT_PROJECT_ID')
large_r_job <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp",
               glue("gs://{PROJECT}/all_emojis.R"),
               "/workspace/all_emojis.R")
    ),
    cr_buildstep_r('/workspace/all_emojis.R', 
                           r_source = "runtime", 
                           name = glue('{DOCKER_IMAGE}:{DOCKER_IMAGE_TAG}'))
  )
)

my_gcs_source <- cr_build_upload_gcs("")

cr_build(large_r_job, source = my_gcs_source)

@axel-analyst commented
Thanks for the commit, I see it, but I still get the same error.

@j450h1 (Author) commented Feb 28, 2020

@axel-analyst - simple thing, but did you double-check that you ran library(googleCloudRunner) after updating to the new package version?

@axel-analyst commented
@j450h1 thanks for the tip, but I did do that; it's not the issue:

remotes::install_github("MarkEdmondson1234/googleCloudRunner")

library(googleCloudRunner)

large_r_job <- cr_build_yaml(
  cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole1.R", 
                 name = "gcr.io/gcer-public/googleauthr-verse:latest")
)

cr_build(large_r_job)

@j450h1 (Author) commented Feb 28, 2020

Have you successfully run Mark's example as per below? I can help troubleshoot this one if you have issues running that, since it's the exact same code.

library(googleCloudRunner)

large_r_job <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp",
               "gs://mark-edmondson-public-read/schedule.R",
               "/workspace/schedule.R")
    ),
    cr_buildstep_r('/workspace/schedule.R', 
                           r_source = "runtime", 
                           name = "gcr.io/gcer-public/googleauthr-verse:latest")
  )
)

@axel-analyst commented
Yes, I tried both options after updating the package. The issue is with the workspace path, but I cannot figure out what's wrong. getwd() prints "/home/m". I tried several options in cr_build_yaml(), but the error is still there.

@MarkEdmondson1234 (Owner) commented
My workspace is '/home/m/' and tried different options and as you see I did not change anything except path from the usecase:

All Cloud Build builds run in /workspace/ - why is yours /home/m/?

@MarkEdmondson1234 (Owner) commented Feb 28, 2020

@axel-analyst Could you output what you see when you run the below (with current dev version):

build <- cr_build_yaml(
    cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole.R",
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
)

cr_build(build)

Remember to restart R in between loading the new version.

@axel-analyst commented Feb 29, 2020

Yes, I restarted and re-authenticated. Sorry, I thought that /workspace/ was the getwd() result. I changed the paths.

The code:


library(googleComputeEngineR)
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/cloud-platform")


cr_project_set('python-for-bigquery')
gce_global_zone('us-central1-a')

Sys.setenv("GCE_AUTH_FILE" = "scheduler.json")
gce_global_project('python-for-bigquery')
gcs_global_bucket('python-for-bigquery_cloudbuild')
cr_bucket_set('python-for-bigquery_cloudbuild')
cr_region_set('us-central1')
cr_email_set("scheduler@python-for-bigquery.iam.gserviceaccount.com")
library(googleCloudRunner)

build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp",
               "gs://python-for-bigquery_cloudbuild/searchconsole1.R",
               "/workspace/searchconsole1.R")
    ),
    cr_buildstep_r('workspace/searchconsole1.R', 
                   r_source = "runtime", 
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
  )
)

httr::set_config(httr::config(http_version = 0))
cr_build(build)

The result in Cloud Build:

Step #1: Digest: sha256:2b75f5357cf5c92a8e689a153e5ee0dcc1e0366edb452256c89acd2ccc968088
Step #1: Status: Downloaded newer image for gcr.io/gcer-public/googleauthr-verse:latest
Step #1: gcr.io/gcer-public/googleauthr-verse:latest
Step #1: Error: unexpected string constant in:
Step #1: "
Step #1: ""
Step #1: Execution halted
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/gcer-public/googleauthr-verse:latest" failed: step exited with non-zero status: 1

@axel-analyst commented
@axel-analyst Could you output what you see when you run the below (with current dev version):

build <- cr_build_yaml(
    cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole.R",
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
)

cr_build(build)

Remember to restart R in between loading the new version.
As for this, I just received this error:

starting build "0b30f2a9-a28f-44f1-a7ea-7de2d750385e"

FETCHSOURCE
BUILD
Starting Step #0 - "download r script"
Step #0 - "download r script": Already have image (with digest): gcr.io/cloud-builders/gsutil
Step #0 - "download r script": Copying gs://python-for-bigquery_cloudbuild/searchconsole1.R...
Step #0 - "download r script": / [0 files][    0.0 B/  6.0 KiB]                                                
/ [1 files][  6.0 KiB/  6.0 KiB]                                                
Step #0 - "download r script": Operation completed over 1 objects/6.0 KiB.                                      
Finished Step #0 - "download r script"
Starting Step #1
Step #1: Pulling image: gcr.io/gcer-public/googleauthr-verse:latest
Step #1: latest: Pulling from gcer-public/googleauthr-verse
Step #1: 8f0fdd3eaac0: Pulling fs layer
Step #1: c42f03650681: Pulling fs layer
Step #1: e8d8a2a587cb: Pulling fs layer
Step #1: 8070157c9f99: Pulling fs layer
Step #1: 0a7a0529ec26: Pulling fs layer
Step #1: 8781e7725be3: Pulling fs layer
Step #1: dfd244768473: Pulling fs layer
Step #1: 0346eddd3dca: Pulling fs layer
Step #1: ffc112aa5c49: Pulling fs layer
Step #1: f3da3d46fdec: Pulling fs layer
Step #1: 8070157c9f99: Waiting
Step #1: 0a7a0529ec26: Waiting
Step #1: 8781e7725be3: Waiting
Step #1: dfd244768473: Waiting
Step #1: 0346eddd3dca: Waiting
Step #1: ffc112aa5c49: Waiting
Step #1: f3da3d46fdec: Waiting
Step #1: 8f0fdd3eaac0: Verifying Checksum
Step #1: 8f0fdd3eaac0: Download complete
Step #1: 8070157c9f99: Verifying Checksum
Step #1: 8070157c9f99: Download complete
Step #1: 0a7a0529ec26: Verifying Checksum
Step #1: 0a7a0529ec26: Download complete
Step #1: 8781e7725be3: Verifying Checksum
Step #1: 8781e7725be3: Download complete
Step #1: e8d8a2a587cb: Verifying Checksum
Step #1: e8d8a2a587cb: Download complete
Step #1: dfd244768473: Verifying Checksum
Step #1: dfd244768473: Download complete
Step #1: c42f03650681: Verifying Checksum
Step #1: c42f03650681: Download complete
Step #1: ffc112aa5c49: Verifying Checksum
Step #1: ffc112aa5c49: Download complete
Step #1: f3da3d46fdec: Verifying Checksum
Step #1: f3da3d46fdec: Download complete
Step #1: 0346eddd3dca: Verifying Checksum
Step #1: 0346eddd3dca: Download complete
Step #1: 8f0fdd3eaac0: Pull complete
Step #1: c42f03650681: Pull complete
Step #1: e8d8a2a587cb: Pull complete
Step #1: 8070157c9f99: Pull complete
Step #1: 0a7a0529ec26: Pull complete
Step #1: 8781e7725be3: Pull complete
Step #1: dfd244768473: Pull complete
Step #1: 0346eddd3dca: Pull complete
Step #1: ffc112aa5c49: Pull complete
Step #1: f3da3d46fdec: Pull complete
Step #1: Digest: sha256:2b75f5357cf5c92a8e689a153e5ee0dcc1e0366edb452256c89acd2ccc968088
Step #1: Status: Downloaded newer image for gcr.io/gcer-public/googleauthr-verse:latest
Step #1: gcr.io/gcer-public/googleauthr-verse:latest
Step #1: Error: unexpected string constant in:
Step #1: "
Step #1: ""
Step #1: Execution halted
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/gcer-public/googleauthr-verse:latest" failed: step exited with non-zero status: 1

@j450h1 (Author) commented Feb 29, 2020

@axel-analyst

For the line below, try adding the extra forward slash (/) before workspace:

cr_buildstep_r('/workspace/searchconsole1.R',

to match with how the file is copied over from GCS:

      args = c("cp",
               "gs://python-for-bigquery_cloudbuild/searchconsole1.R",
               "/workspace/searchconsole1.R")

@axel-analyst commented Feb 29, 2020

@j450h1 thanks, I saw that and tried it - still the same error:

Step #1: Digest: sha256:2b75f5357cf5c92a8e689a153e5ee0dcc1e0366edb452256c89acd2ccc968088
Step #1: Status: Downloaded newer image for gcr.io/gcer-public/googleauthr-verse:latest
Step #1: gcr.io/gcer-public/googleauthr-verse:latest
Step #1: Error: unexpected string constant in:
Step #1: "
Step #1: ""
Step #1: Execution halted
Finished Step #1
ERROR
ERROR: build step 1 "gcr.io/gcer-public/googleauthr-verse:latest" failed: step exited with non-zero status: 1

@MarkEdmondson1234 (Owner) commented
So just to be clear: my example code works, but your example fails? If so, then I think the error is now within the R script it is trying to execute.

@axel-analyst commented
@MarkEdmondson1234 Yes, it is. The actual problem is that the script contains gar_auth_service('python-for-bigquery.json') for auth, and it cannot be executed for some reason.

@MarkEdmondson1234 (Owner) commented
Ok so you need to download the auth file into the workspace too.

@axel-analyst commented
Yes, I understand, but what would the code look like? Can I just add an additional argument, or use c(), or a new function in both options (with steps in cr_build_yaml() and directly from gs://)?

@MarkEdmondson1234 (Owner) commented
There are a few options for downloading auth files; check out the Use Cases on the website: https://code.markedmondson.me/googleCloudRunner/articles/usecases.html#polygot-cloud-builds---integrating-r-code-with-other-languages

You could either download the auth file from Cloud Storage as well in a previous buildstep, or perhaps you would like to encrypt it. The first case is shown below:

build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp",
               "gs://python-for-bigquery_cloudbuild/python-for-bigquery.json",
               "/workspace/python-for-bigquery.json")
    ),
    cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole.R",
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
  )
)
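The encryption route would look roughly like this; a sketch assuming cr_buildstep_decrypt() and a KMS keyring/key you have already set up (the keyring and key names below are hypothetical):

build <- cr_build_yaml(
  steps = c(
    # decrypt an auth file previously encrypted with gcloud kms
    cr_buildstep_decrypt(
      cipher = "python-for-bigquery.json.enc",
      plain = "/workspace/python-for-bigquery.json",
      keyring = "my-keyring",  # hypothetical keyring name
      key = "my-key"           # hypothetical key name
    ),
    cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole.R",
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
  )
)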

I think the original issue is solved now, so closing this thread.

@axel-analyst commented
I understand that I could replace it in the args, but how do I import both files (the JSON and the script) in one function?

@j450h1 (Author) commented Mar 1, 2020

You should be able to download the entire workspace folder as per this pattern, with the -r flag added:

gsutil cp -r gs://bucketname/folder-name local-location

I might be off a bit with the forward slashes (I always get those confused), but something like below:

build <- cr_build_yaml(
  steps = c(
    cr_buildstep(
      id = "download R file",
      name = "gsutil",
      args = c("cp -r",
               "gs://python-for-bigquery_cloudbuild/workspace",
               "/workspace/")
    ),
    cr_buildstep_r("gs://python-for-bigquery_cloudbuild/searchconsole.R",
                   name = "gcr.io/gcer-public/googleauthr-verse:latest")
  )
)
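One thing to check: Cloud Build passes each element of args as its own token, so the flag probably needs to be a separate element (an untested sketch of the same step):

cr_buildstep(
  id = "download folder",
  name = "gsutil",
  args = c("cp", "-r",    # flag split out as its own element
           "gs://python-for-bigquery_cloudbuild/workspace",
           "/workspace/")
)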

@axel-analyst commented Mar 2, 2020

@j450h1
Thanks, but '-r' does not work; I even saw a similar issue here about it, if I am not mistaken.

I now work in another Google Cloud project, my company's, where I am not a full admin, and I have faced some constraints there. cr_build_yaml() (as in the example above) returns an error that it cannot find the script in the bucket. But the file is there, and I can access it with googleComputeEngineR package functions. What could that be?

@MarkEdmondson1234 (Owner) commented
Perhaps your Cloud Build service email needs object.read access to the file.

@axel-analyst commented Mar 2, 2020

@MarkEdmondson1234 The issue is solved :) The packages needed to be updated.
