Add benchmarking result for Bigquery 10gb datasets #707

Merged
utkarsharma2 merged 4 commits into main from Biguery10GBBenchmark on Aug 22, 2022

utkarsharma2
Collaborator

Description

What is the current behavior?

When benchmarking with 10GB files, we ran into issues loading the data and running the benchmark.

closes: #702

What is the new behavior?

Added benchmarking numbers for the 10GB dataset. Added a new dataset that works with BigQuery.

Does this introduce a breaking change?

Nope

Checklist

  • Extended the README/documentation, if necessary

codecov bot commented Aug 21, 2022

Codecov Report

Merging #707 (10060d9) into main (251e0dc) will not change coverage.
The diff coverage is n/a.

❗ Current head 10060d9 differs from pull request most recent head 3a2166e. Consider uploading reports for the commit 3a2166e to get more accurate results

@@           Coverage Diff           @@
##             main     #707   +/-   ##
=======================================
  Coverage   93.45%   93.45%           
=======================================
  Files          42       42           
  Lines        1726     1726           
  Branches      213      213           
=======================================
  Hits         1613     1613           
  Misses         91       91           
  Partials       22       22           

| imdb/title_ratings_10mb.csv | 10MB | 19.40 |
| stackoverflow/stackoverflow_posts_1g.ndjson | 1GB | 30.26 |
| trimmed/pypi/* | 5GB | 59.90 |
| gs://astro-sdk/benchmark/trimmed/stackoverflow/10gb/ - 10 files, 1GB each | 10GB | 1.94min |
Contributor

Is this a new dataset? Maybe we should add it to the list in Dataset.md.

Collaborator

@tatiana tatiana Aug 22, 2022

Why is this a new dataset?

Why didn't the previous dataset work? Should we still address the root cause?

Once we have more understanding of these two questions, we may want to re-run the benchmark for the other databases to use this new dataset, so we can have a fair comparison.

Collaborator Author

@utkarsharma2 utkarsharma2 Aug 22, 2022

@tatiana The previous dataset had two problems:

  1. It contained invalid rows with null values.
  2. There was a type mismatch as well.

The issue was mostly related to data cleaning, which is why I chose a different dataset that had worked for me in the past.
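For anyone who wants to revisit the original dataset, here is a minimal sketch of how the two problems above (null values and type mismatches) could be detected in an NDJSON file before a load is attempted. It is not the cleaning that was done for this PR. The column names and expected types in `EXPECTED_TYPES` are hypothetical placeholders, and `find_problems` and `split_rows` are illustrative helpers, not part of the SDK.

```python
import json
from typing import Dict, Iterable, List, Tuple

# Hypothetical schema: column name -> expected Python type.
# These columns are placeholders, not the real benchmark schema.
EXPECTED_TYPES: Dict[str, type] = {"id": int, "title": str, "score": int}


def find_problems(line: str, expected: Dict[str, type]) -> List[str]:
    """Return a list of problems found in one NDJSON line (empty if clean)."""
    try:
        row = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(row, dict):
        return ["row is not a JSON object"]

    problems = []
    for column, expected_type in expected.items():
        value = row.get(column)
        if value is None:
            # Covers both an explicit null and a missing key.
            problems.append(f"{column}: null or missing")
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the column is actually expected to be a bool.
        bool_mismatch = isinstance(value, bool) and expected_type is not bool
        if bool_mismatch or not isinstance(value, expected_type):
            problems.append(
                f"{column}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return problems


def split_rows(
    lines: Iterable[str], expected: Dict[str, type]
) -> Tuple[List[str], List[Tuple[int, List[str]]]]:
    """Split lines into clean rows and (line number, problems) rejections."""
    clean, rejected = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        problems = find_problems(line, expected)
        if problems:
            rejected.append((number, problems))
        else:
            clean.append(line)
    return clean, rejected
```

Running `split_rows` over an open file handle would give a list of loadable rows plus a per-line report of why the others were rejected, which could help answer whether the root cause is worth fixing in the original dataset.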

@utkarsharma2 utkarsharma2 merged commit 577095d into main Aug 22, 2022
@utkarsharma2 utkarsharma2 deleted the Biguery10GBBenchmark branch August 22, 2022 10:49
@kaxil kaxil added this to the 1.0.1 milestone Aug 22, 2022
kaxil pushed a commit that referenced this pull request Aug 23, 2022
(cherry picked from commit 577095d)
Successfully merging this pull request may close these issues.

Fix Biguery native load for 10GB benchmark dataset
4 participants