BigQuery loading via GCS

vinceatbluelabs released this 15 Nov 15:27

e5dee88

Breaking changes:

None

New features:

BigQuery import from GCS buckets (#113)
Allow Parquet records format to be specified on mvrec command line (#129)
Allow environment-based configuration of GCS creds (#120)
Do slow Redshift unload via SELECT when bucket unload not available (#117)
Load via INSERT on Redshift when scratch bucket not available (#114)

Bug fixes / reliability improvements:

Prefer file extension to dialect compression defaults in targets (#111)
Handle multiple fileobjs in DoMoveFromFileobjsSource (#107)
Fix README.md code sample errors (#106)
Also downcast constraints and statistics when downcasting field types (#103)
Add dependency fix for sudden BigQuery test failure (#109)
Allow Parquet to be used in import to BigQuery from records directories (#130)
Handle 'operation timed out' error during long Redshift unloads (#128)
Retry on more Google rate limit exceptions (#126)
Better error message when S3 _format_* file doesn't exist (#121)
Improve logging for large moves (#122)
Update PyYAML dependency to match awscli (#102)

Other updates:

Fix dependencies for Homebrew processing (#135)
Rename some internally used methods (#124) (#116)
Drop dead code (#123) (#115)
Rename module for better Mypy support (#125)
Also test Redshift without S3 scratch bucket (#118)
Introduce component test suite (#108)
Bump Python version for internal development (#100)
