Feature Request: Snowflake: Multithreading for performance? #314

Open
jaredx435k2d0 opened this issue Jan 31, 2023 · 5 comments
Labels
triaged: yes (Has been approved for future implementation)

Comments

@jaredx435k2d0

Is your feature request related to a problem? Please describe.
It seems like running generate sources sends the DESCRIBE TABLE ... statements to Snowflake sequentially, one at a time. It'd be great if this went a lot faster.

Describe the solution you'd like
Would it be reasonable to queue up all those database statements up front and run through them as the results return, so that it completes much more quickly?
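
A minimal sketch of that idea, assuming a thread pool is an acceptable way to overlap the round trips. This is not how dbt-coves works today. `describe_table` and `describe_all` are hypothetical names, and the `time.sleep` call stands in for a real `DESCRIBE TABLE ...` query. How a real connection or cursor should be shared between threads is not addressed here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def describe_table(table_name: str) -> list[str]:
    """Hypothetical stand-in for one DESCRIBE TABLE round trip."""
    time.sleep(0.05)  # simulated network/warehouse latency (assumption)
    return [f"{table_name}.id", f"{table_name}.updated_at"]


def describe_all(table_names: list[str], max_workers: int = 8) -> dict[str, list[str]]:
    """Submit every statement up front and collect results as they return."""
    columns: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Queue all the work first, remembering which future belongs to which table.
        futures = {pool.submit(describe_table, name): name for name in table_names}
        # Handle each result as soon as it is ready, in completion order.
        for future in as_completed(futures):
            columns[futures[future]] = future.result()
    return columns


if __name__ == "__main__":
    tables = [f"table_{i}" for i in range(40)]
    start = time.perf_counter()
    result = describe_all(tables)
    print(f"described {len(result)} tables in {time.perf_counter() - start:.2f}s")
```

With eight workers, the simulated waits overlap instead of adding up, which is the speedup being asked for. The right `max_workers` value for a real warehouse is something that would need testing.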

Describe alternatives you've considered
Can't really think of any apart from just using it as it is and waiting much longer.

Additional context
python 3.10.9
Snowflake 7.3.1
macOS 13.2 (22D49)
output of pip freeze:
agate==1.6.3
asn1crypto==1.5.1
attrs==22.2.0
Babel==2.11.0
bump2version==1.0.1
bumpversion==0.6.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.5
commonmark==0.9.1
cryptography==36.0.2
dbt-core==1.3.2
dbt-coves==1.3.0a25
dbt-extractor==0.4.1
dbt-snowflake==1.3.0
filelock==3.9.0
future==0.18.3
hologram==0.0.15
idna==3.4
importlib-metadata==6.0.0
isodate==0.6.1
jaraco.classes==3.2.3
Jinja2==3.1.2
jsonschema==3.2.0
keyring==23.13.1
leather==0.3.4
Logbook==1.5.3
luddite==1.0.2
MarkupSafe==2.1.2
mashumaro==3.0.4
minimal-snowplow-tracker==0.0.2
more-itertools==9.0.0
msgpack==1.0.4
networkx==2.8.8
oscrypto==1.3.0
packaging==21.3
parsedatetime==2.4
pathspec==0.9.0
pretty-errors==1.2.25
prompt-toolkit==3.0.36
pycparser==2.21
pycryptodomex==3.17
pydantic==1.10.4
pyfiglet==0.8.post1
Pygments==2.14.0
PyJWT==2.6.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-slugify==7.0.0
pytimeparse==1.1.8
pytz==2022.7.1
PyYAML==6.0
questionary==1.10.0
requests==2.28.2
rich==12.6.0
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
six==1.16.0
snowflake-connector-python==2.7.12
sqlparse==0.4.3
text-unidecode==1.3
typing_extensions==4.4.0
urllib3==1.26.14
wcwidth==0.2.6
Werkzeug==2.2.2
yamlloader==1.2.2
zipp==3.12.0

@github-actions bot added the "triaged: no" (Hasn't been approved for future implementation) label on Jan 31, 2023
@jaredx435k2d0
Author

@BAntonellini
Hey, Bruno. Just wanted to bump this to get your thoughts.

@jaredx435k2d0
Author

dbt-osmosis recently implemented something like this and it helped performance immensely.

Fivetran's Salesforce schema alone, for example, has 776 tables. I have a few other large schemas.

Running dbt-osmosis on multiple schemas / DBs becomes extremely slow (hours).

@BAntonellini
Collaborator

Hey @jaredx435k2d0

We are aware this would be a good addition to dbt-coves, as it would benefit use cases like yours.

If you feel like contributing, follow our CONTRIBUTING guide and we will review your pull request.

@jaredx435k2d0
Author

Good to have it acknowledged.

If I had the skills, I'd absolutely do this myself.

I'm learning, so maybe if it's not done in a few months I'll take a stab at it.

@noel
Contributor

noel commented May 2, 2023

It is a good idea; we just have a lot on our plate and need to prioritize. I don't know many people running this against 776 tables 😱

We can prioritize with some $ :)

We also help our customers out, so consider Datacoves.com

@BAntonellini added the "triaged: yes" (Has been approved for future implementation) label and removed the "triaged: no" (Hasn't been approved for future implementation) label on Jun 6, 2023