Snowflake Ingestion failing on Schemas with thousands of objects #9698

Closed
shcd-garjo3 opened this issue Jan 23, 2024 · 2 comments · Fixed by #10718
Labels
accepted An Issue that is confirmed as a bug by the DataHub Maintainers. bug Bug report

Comments

@shcd-garjo3

Describe the bug
Snowflake ingestion is failing for schemas with more than 10,000 views.

To Reproduce
Steps to reproduce the behavior:

  1. Ingest a Snowflake schema with ~16,000 views, with debug logging enabled in the UI.

Expected behavior
Ingestion should complete for the schema. Instead, the following error is logged:
[2024-01-18 21:12:02,501] DEBUG {datahub.ingestion.source.snowflake.snowflake_v2:838} - Failed to get views for schema FOOBAR.BAZ due to error 090153 (22000): The result set size exceeded the max number of rows(10000) supported for SHOW statements. Use LIMIT option to limit result set to a smaller number.

This is the complete error log:

Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/snowflake/snowflake_schema.py", line 307, in get_views_for_database
cur = self.query(SnowflakeQuery.show_views_for_database(db_name))
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/snowflake/snowflake_utils.py", line 41, in query
resp = self.get_connection().cursor(DictCursor).execute(query)
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/snowflake/connector/cursor.py", line 1132, in execute
Error.errorhandler_wrapper(self.connection, self, error_class, errvalue)
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/snowflake/connector/errors.py", line 290, in errorhandler_wrapper
handed_over = Error.hand_to_other_handler(
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/snowflake/connector/errors.py", line 345, in hand_to_other_handler
cursor.errorhandler(connection, cursor, error_class, error_value)
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/snowflake/connector/errors.py", line 221, in default_errorhandler
raise error_class(
snowflake.connector.errors.ProgrammingError: 090153 (22000): The result set size exceeded the max number of rows(10000) supported for SHOW statements. Use LIMIT option to limit result set to a smaller number.
[2024-01-18 21:12:00,128] DEBUG {datahub.ingestion.source.snowflake.snowflake_schema:40} - Query : show views in schema "FOOBAR"."BAZ";
[2024-01-18 21:12:02,501] DEBUG {datahub.ingestion.source.snowflake.snowflake_v2:838} - Failed to get views for schema FOOBAR.BAZ due to error 090153 (22000): The result set size exceeded the max number of rows(10000) supported for SHOW statements. Use LIMIT option to limit result set to a smaller number.
Traceback (most recent call last):
File "/tmp/datahub/ingest/venv-snowflake-0.12.0/lib/python3.10/site-packages/datahub/ingestion/source/snowflake/snowflake_v2.py", line 820, in fetch_views_for_schema
for view in self.get_views_for_schema(schema_name, db_name):

Screenshots
N/A

Desktop (please complete the following information):

  • Running on EKS
  • Browser [Edge]
  • DataHub version 12.0.1

Additional context
Other schemas with fewer views are being ingested.

@shcd-garjo3 shcd-garjo3 added the bug Bug report label Jan 23, 2024
@hsheth2
Collaborator

hsheth2 commented Jan 25, 2024

We looked into this a bit more, and it looks like the Snowflake SHOW VIEWS query supports pagination by combining the LIMIT and FROM clauses.

Implementing this would require modifying the get_views_for_database and the show_views_for_database methods.

Is this something that you'd be up for contributing?
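
For anyone picking this up, here is a minimal sketch of the LIMIT/FROM pagination described above. It is not DataHub's actual code or the eventual fix: `show_views_page_query`, `iter_views`, and the injected `run_query` callable are hypothetical names, and it assumes each SHOW VIEWS row is a dict with a "name" key (as with DictCursor) and that rows come back ordered by name, so the last name on a page can be used as the FROM cursor for the next page.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

# The error message reports 10000 as the maximum row count for SHOW statements.
SHOW_PAGE_LIMIT = 10000


def show_views_page_query(
    db_name: str,
    schema_name: str,
    limit: int = SHOW_PAGE_LIMIT,
    after_name: Optional[str] = None,
) -> str:
    """Build one page of a SHOW VIEWS query (hypothetical helper)."""
    query = f'show views in schema "{db_name}"."{schema_name}" limit {limit}'
    if after_name is not None:
        # Assumption: doubling single quotes is sufficient escaping here.
        escaped = after_name.replace("'", "''")
        query += f" from '{escaped}'"
    return query


def iter_views(
    run_query: Callable[[str], Iterable[Dict]],
    db_name: str,
    schema_name: str,
    limit: int = SHOW_PAGE_LIMIT,
) -> Iterator[Dict]:
    """Yield every view row, fetching at most `limit` rows per SHOW statement."""
    after_name: Optional[str] = None
    while True:
        rows = list(run_query(show_views_page_query(db_name, schema_name, limit, after_name)))
        yield from rows
        if len(rows) < limit:
            # A short (or empty) page means there is nothing left to fetch.
            break
        # Resume after the last view name seen on this page.
        after_name = rows[-1]["name"]
```

In a real connector, `run_query` would wrap something like `cursor(DictCursor).execute(query)`, as seen in the traceback. Stopping on a short page costs one extra empty query when the view count is an exact multiple of the limit, which seems an acceptable trade for not needing a separate count query.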

@hsheth2 hsheth2 added the accepted An Issue that is confirmed as a bug by the DataHub Maintainers. label Jan 25, 2024
@shcd-garjo3
Author

Hi Harshal, yes, I would like to work on this. I will ping you on Slack for a quick overview of what I need to do, if that is okay.
