Publish Redshift benchmark results #806
Conversation
Codecov Report
Base: 93.25% // Head: 93.30% // Increases project coverage by +0.04%.
Additional details and impacted files:
@@ Coverage Diff @@
## main #806 +/- ##
==========================================
+ Coverage 93.25% 93.30% +0.04%
==========================================
Files 47 44 -3
Lines 2046 1911 -135
Branches 256 237 -19
==========================================
- Hits 1908 1783 -125
+ Misses 107 100 -7
+ Partials 31 28 -3
☔ View full report at Codecov.
@pankajkoti why did we have to use a fake dataset for Redshift? Can you please add this to the description?
@dimberman Updated the description. Please check.
@pankajkoti Let's not merge this PR until we resolve and merge #805.
Publish Redshift benchmark results for the native and default approaches.

The existing dataset failed with a schema mismatch error while inserting rows: the pandas auto-detection created a schema with string columns typed as `varchar(256)`, limiting values to 256 bytes. Some rows contained values longer than 256 bytes, so the load was rejected with a `value too long` error.
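For context, here is a minimal sketch of the failure mode and one way around it, assuming a SQLAlchemy engine and the standard pandas `DataFrame.to_sql` path; the connection URL, table name, and column width below are illustrative assumptions, not values taken from this PR:

```python
import pandas as pd
import sqlalchemy
from sqlalchemy.types import VARCHAR

# Placeholder connection URL; a real Redshift engine needs sqlalchemy-redshift.
engine = sqlalchemy.create_engine("redshift+psycopg2://user:password@host:5439/dev")

# A single row whose value exceeds 256 bytes.
df = pd.DataFrame({"payload": ["x" * 300]})

# With an auto-detected schema the column may be created as varchar(256),
# and the 300-byte value above is rejected with a "value too long" error.
# Passing an explicit dtype mapping widens the column up front:
df.to_sql(
    "benchmark_table",          # hypothetical table name
    engine,
    if_exists="replace",
    index=False,
    dtype={"payload": VARCHAR(65535)},  # 65535 is Redshift's max varchar size
)
```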
Hence, we created a new fake dataset, stored it in S3 with files of the various sizes required for benchmarking, and updated `datasets.md` with details of the new files; a sketch of how such data could be generated follows below.

blocked by: #805
closes: #748
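For reference, a hypothetical sketch of how a fake CSV of a given value width could be generated and uploaded to S3; the bucket, key, row count, and value length are assumptions, not the actual parameters behind the published files (those are documented in `datasets.md`):

```python
import csv
import io
import random
import string

import boto3


def make_fake_csv(num_rows: int, value_len: int) -> bytes:
    """Build an in-memory CSV whose string values have a fixed byte width."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "payload"])
    for i in range(num_rows):
        payload = "".join(random.choices(string.ascii_letters, k=value_len))
        writer.writerow([i, payload])
    return buf.getvalue().encode("utf-8")


# Values longer than 256 bytes exercise the varchar(256) limit described above.
body = make_fake_csv(num_rows=100_000, value_len=300)

# Hypothetical bucket and key names.
s3 = boto3.client("s3")
s3.put_object(Bucket="my-benchmark-bucket", Key="fake/benchmark_300b.csv", Body=body)
```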