
Add benchmarking results for S3 to BigQuery transfer #568

Merged
merged 3 commits into from Jul 26, 2022

Conversation


@utkarsharma2 utkarsharma2 commented Jul 25, 2022

Description

What is the current behavior?

Benchmarking results for the S3 to BigQuery transfer are missing.

related: #429

What is the new behavior?

Added benchmarking results for the S3 to BigQuery transfer.

Does this introduce a breaking change?

No

Checklist

  • Extended the README / documentation, if necessary


codecov bot commented Jul 25, 2022

Codecov Report

Merging #568 (2826784) into main (9c7d7cc) will decrease coverage by 0.08%.
The diff coverage is n/a.

❗ Current head 2826784 differs from the pull request's most recent head 9276b9a. Consider uploading reports for the commit 9276b9a to get more accurate results.

@@            Coverage Diff             @@
##             main     #568      +/-   ##
==========================================
- Coverage   92.68%   92.59%   -0.09%     
==========================================
  Files          40       40              
  Lines        1613     1594      -19     
  Branches      206      205       -1     
==========================================
- Hits         1495     1476      -19     
  Misses         93       93              
  Partials       25       25              
Impacted Files Coverage Δ
src/astro/databases/snowflake.py 95.20% <0.00%> (-0.49%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c7d7cc...9276b9a. Read the comment docs.

@sunank200 sunank200 left a comment


I think the results are missing for 5 GB and 10 GB.

@utkarsharma2

Note -

  1. We didn't publish results for all of the datasets because not all of them worked with both the native and the default path. Only the ones that worked are published in this PR.
  2. The goal of these benchmarks was to determine at what file size the native path should be used instead of the default path.
  3. For larger files we needed to change the dataset and the benchmarking script because of limitations in pandas and BigQuery; for that reason we ran those tests on a VM rather than the K8s cluster.

cc: @kaxil @tatiana @sunank200
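The trade-off described above (native path for large files, default path for small ones) could be sketched as a simple size-based dispatch. This is a hypothetical illustration only; the 1 GB threshold, function name, and return values are assumptions for this sketch, not astro-sdk's actual API or the threshold the benchmarks settled on:

```python
# Hypothetical sketch: choose a transfer strategy by file size.
# The threshold and names below are illustrative assumptions.

NATIVE_PATH_THRESHOLD_BYTES = 1 * 1024**3  # assume native wins above ~1 GB


def choose_load_path(file_size_bytes: int) -> str:
    """Return which transfer strategy to use for a given file size."""
    if file_size_bytes >= NATIVE_PATH_THRESHOLD_BYTES:
        # Native path: a direct service-to-service load job that avoids
        # pulling the data through the worker process.
        return "native"
    # Default path: load via pandas on the worker; per the benchmarks,
    # the native path's fixed overhead dominates for small files.
    return "default"


print(choose_load_path(10 * 1024**2))  # a 10 MB file
print(choose_load_path(5 * 1024**3))   # a 5 GB file
```

A real implementation would also need to handle files whose size is unknown up front (e.g. streamed objects), which is one reason a hard threshold alone may not be enough.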

@utkarsharma2

Since we are seeing significant disadvantages to using the native path for smaller file sizes, I added the following ticket as one possible way of dealing with the issue: #573

cc: @kaxil @tatiana

@utkarsharma2

@tatiana - #574

@utkarsharma2 utkarsharma2 merged commit 3016452 into main Jul 26, 2022
@utkarsharma2 utkarsharma2 deleted the S3ToBigqueryBenchmarking branch July 26, 2022 14:20