Fix error when './data' directory already exists #2624

Open
wants to merge 2 commits into
base: master

yrribeiro

Previously, the script attempted to create the './data' directory without checking whether it already existed, which raised an `[Errno 17] File exists` error. That error prevented the data download script from proceeding any further. Because no data had been uploaded to the bucket, a second error followed when BigQuery tried to load and transform the data ([ERROR] An exception occurred: 404 Not found: URI gs://PROJECT_ID-bucket/data/online_retail.csv).

Full error traceback:

2024-05-20 13:06:22,424 [ERROR] An exception occurred: [Errno 17] File exists: './data'
2024-05-20 13:06:22,433 [INFO] Initializing BigQuery dataset.
2024-05-20 13:06:22,586 [WARNING] Dataset online_retail already exists, not creating.
2024-05-20 13:06:23,247 [INFO] BQ raw dataset load job starting...
2024-05-20 13:06:24,034 [ERROR] An exception occurred: 404 Not found: URI gs://qwiklabs-gcp-00-3f0cc69e5b28-bucket/data/online_retail.csv; reason: notFound, message: Not found: URI gs://qwiklabs-gcp-00-3f0cc69e5b28-bucket/data/online_retail.csv
Traceback (most recent call last):
  File "/home/jupyter/training-data-analyst/self-paced-labs/vertex-ai/vertex-ai-qwikstart/utils/data_download.py", line 186, in <module>
    upload_gcs2bq(args, table_schema)
  File "/home/jupyter/training-data-analyst/self-paced-labs/vertex-ai/vertex-ai-qwikstart/utils/data_download.py", line 116, in upload_gcs2bq
    destination_table = client.get_table(RAW_TABLE_ID)  # Make an API request.
  File "/opt/conda/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 1079, in get_table
    api_response = self._call_api(
  File "/opt/conda/lib/python3.9/site-packages/google/cloud/bigquery/client.py", line 827, in _call_api
    return call()
  File "/opt/conda/lib/python3.9/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
    return retry_target(
  File "/opt/conda/lib/python3.9/site-packages/google/api_core/retry.py", line 191, in retry_target
    return target()
  File "/opt/conda/lib/python3.9/site-packages/google/cloud/_http/__init__.py", line 494, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.NotFound: 404 GET https://bigquery.googleapis.com/bigquery/v2/projects/qwiklabs-gcp-00-3f0cc69e5b28/datasets/online_retail/tables/online_retail_clv_raw?prettyPrint=false: Not found: Table qwiklabs-gcp-00-3f0cc69e5b28:online_retail.online_retail_clv_raw
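
For reviewers, here is a minimal sketch of the failure mode and one way to guard against it. It is not the code from this PR's diff: the helper name `ensure_data_dir` is hypothetical, and it assumes the original script created the directory with a plain `os.mkdir`-style call. It uses a temporary directory so it can be run safely anywhere.

```python
import errno
import os
import tempfile


def ensure_data_dir(path):
    """Create `path` if it is missing; do nothing if it already exists.

    Hypothetical helper for illustration only, not taken from data_download.py.
    """
    # exist_ok=True makes the call idempotent for existing directories.
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = os.path.join(tmp, "data")

        # The first creation succeeds.
        os.mkdir(data_dir)

        # An unguarded second creation fails with EEXIST,
        # the "[Errno 17] File exists" seen in the log above.
        try:
            os.mkdir(data_dir)
        except FileExistsError as exc:
            assert exc.errno == errno.EEXIST
            print(f"unguarded mkdir failed: {exc}")

        # The guarded version is safe to call when the directory exists.
        ensure_data_dir(data_dir)
        print(f"guarded creation ok: {os.path.isdir(data_dir)}")
```

An explicit `os.path.exists` check before creating the directory would work as well. Note that `exist_ok=True` only tolerates an existing directory; if `./data` exists as a regular file, `os.makedirs` still raises `FileExistsError`.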
