BigQuery loading via GCS

@vinceatbluelabs released this 15 Nov 15:27

Breaking changes:

  • None

New features:

  • BigQuery import from GCS buckets (#113)
  • Allow Parquet records format to be specified on mvrec command line (#129)
  • Allow environment-based configuration of GCS creds (#120)
  • Do slow Redshift unload via SELECT when bucket unload is not available (#117)
  • Load via INSERT on Redshift when a scratch bucket is not available (#114)

Bug fixes / reliability improvements:

  • Prefer file extension to dialect compression defaults in targets (#111)
  • Handle multiple fileobjs in DoMoveFromFileobjsSource (#107)
  • Fix README.md code sample errors (#106)
  • Also downcast constraints and statistics when downcasting field types (#103)
  • Add dependency fix for sudden BigQuery test failure (#109)
  • Allow Parquet to be used in import to BigQuery from records directories (#130)
  • Handle 'operation timed out' error during long Redshift unloads (#128)
  • Retry on more Google rate limit exceptions (#126)
  • Better error message when S3 _format_* file doesn't exist (#121)
  • Improve logging for large moves (#122)
  • Update PyYAML dependency to match awscli (#102)
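
As a rough illustration of the "prefer file extension to dialect compression defaults" fix (#111), the idea can be sketched as below. This is not the library's actual code: the function name `choose_compression`, the `EXTENSION_TO_COMPRESSION` mapping, and the compression labels are all hypothetical. The sketch also assumes that only a recognized compression suffix overrides the default.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from file suffix to a compression label;
# the names and labels here are illustrative assumptions only.
EXTENSION_TO_COMPRESSION = {
    ".gz": "GZIP",
    ".bz2": "BZIP",
    ".lzo": "LZO",
}


def choose_compression(target_name: str,
                       dialect_default: Optional[str]) -> Optional[str]:
    """Pick the compression for a target file.

    A recognized compression suffix on the target name wins; otherwise
    the default for the records dialect is used unchanged.
    """
    suffix = PurePosixPath(target_name).suffix.lower()
    if suffix in EXTENSION_TO_COMPRESSION:
        return EXTENSION_TO_COMPRESSION[suffix]
    return dialect_default


if __name__ == "__main__":
    # The extension overrides a dialect that would otherwise not compress.
    print(choose_compression("out.csv.gz", None))
    # No recognized compression suffix, so the dialect default applies.
    print(choose_compression("out.csv", "GZIP"))
```

Whether an uncompressed-looking extension such as `.csv` should also force "no compression" is a design choice this sketch leaves to the dialect default.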

Other updates:

  • Fix dependencies for Homebrew processing (#135)
  • Rename some internally used methods (#124) (#116)
  • Drop dead code (#123) (#115)
  • Rename module for better Mypy support (#125)
  • Also test Redshift without S3 scratch bucket (#118)
  • Introduce component test suite (#108)
  • Bump Python version for internal development (#100)