OFFSET SQL function yields 400 Array index 1 is out of bounds while extracting info from hacker_news dataset #2434

Open
MrCsabaToth opened this issue Sep 10, 2023 · 1 comment · May be fixed by #2433

Comments

@MrCsabaToth
Many SQL cells in various notebooks, where the instructions explore the information in the dataset, use OFFSET(1) while trying to extract the domain name stem as the source. Issue #2432 lists three labs (with their names and Cloud Skills Boost URLs) along with their notebooks, but many more notebooks are affected. Example query cell:

%%bigquery --project $PROJECT

SELECT
    ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,
    COUNT(title) AS num_articles
FROM
    `bigquery-public-data.hacker_news.full`
WHERE
    REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.com$')
    AND LENGTH(title) > 10
GROUP BY
    source
ORDER BY num_articles DESC
  LIMIT 100

Resulting error:

ERROR:
 400 Array index 1 is out of bounds (overflow)

Location: US
Job ID: 389a7292-2c3b-4f14-8129-af10d4270423

A workaround is to use SAFE_OFFSET instead of OFFSET: SAFE_OFFSET returns NULL rather than raising an error when the index is out of range. A few other notebooks already use it, as do all notebooks in the https://github.com/GoogleCloudPlatform/asl-ml-immersion/ repo. I'll amend PR #2433 with this.
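To illustrate the difference, here is a minimal Python sketch that mimics the query's REGEXP_EXTRACT / SPLIT / ARRAY_REVERSE chain. It is an emulation, not BigQuery itself: the helper name `extract_source` and the sample URLs are hypothetical, and a dot-less host such as `telecom` is only one input shape that would leave the array with a single element. I have not checked which rows in the dataset actually trigger the error.

```python
import re

# Same pattern the notebook passes to REGEXP_EXTRACT.
HOST_RE = re.compile(r".*://(.[^/]+)/")


def extract_source(url, safe):
    """Emulate ARRAY_REVERSE(SPLIT(host, '.'))[OFFSET(1)] or [SAFE_OFFSET(1)].

    Hypothetical helper for illustration only.
    """
    match = HOST_RE.search(url)
    if match is None:
        # REGEXP_EXTRACT yields NULL on no match, and NULL propagates.
        return None
    parts = match.group(1).split(".")[::-1]  # SPLIT, then ARRAY_REVERSE
    if safe:
        # SAFE_OFFSET: NULL (None here) when the index is out of range.
        return parts[1] if len(parts) > 1 else None
    # OFFSET: out-of-range index is an error (IndexError here).
    return parts[1]


print(extract_source("https://www.example.com/page", safe=False))
print(extract_source("http://telecom/page", safe=True))
try:
    extract_source("http://telecom/page", safe=False)
except IndexError as err:
    print("OFFSET-style lookup failed:", err)
```

Note that with SAFE_OFFSET such rows would be grouped under a NULL source rather than dropped, so a query may still want to filter them out explicitly.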

@MrCsabaToth MrCsabaToth changed the title OFFSET SQL function errors out while extracting info from bigquery-public-data.hacker_news.full dataset OFFSET SQL function yields 400 array index 1 is out of bounds while extracting info from hacker_news dataset Sep 10, 2023
@MrCsabaToth MrCsabaToth changed the title OFFSET SQL function yields 400 array index 1 is out of bounds while extracting info from hacker_news dataset OFFSET SQL function yields 400 Array index 1 is out of bounds while extracting info from hacker_news dataset Sep 10, 2023
MrCsabaToth added a commit to MrCsabaToth/training-data-analyst that referenced this issue Sep 10, 2023