lake-driver

environment setup

  • make sure you've already used the AWS CLI's `configure` command to add credentials to whatever environment you develop in
  • in the test folder, replace every occurrence of the string "build.cauldron.tools" with the name of an existing S3 bucket that your previously configured AWS credentials actually have read/write access to
  • use the LakeDriver.getConnection(...) methods to create JDBC connections
    • pass a list of TableSpecification objects defining all the "external tables" your query needs to reference
    • (optional) specify one of the following Scan classes to configure behavior
      • LakeS3GetScan uses GetObject: full tables are downloaded, and both projection and filtering are performed in memory
      • LakeS3SelectScan uses SelectObjectContent: only the required projected columns are downloaded, and filtering is done in memory
      • LakeS3SelectWhereScan (default) uses SelectObjectContent: both projection and filtering are done on AWS, the results are downloaded, and any remaining untranslated filters are applied in memory
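
The connection flow above might look like the following sketch. This is illustrative only: the TableSpecification constructor arguments, the way the scan class is passed, and the bucket URI are assumptions, not the library's exact API.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;

public class Example {
    public static void main(String[] args) throws Exception {
        // Hypothetical: describe an "external table" backed by a flat-file object in S3.
        // The actual TableSpecification fields may differ.
        TableSpecification users = new TableSpecification(
                "USERS",                        // table name referenced in SQL
                "s3://your-bucket/users.csv");  // assumed S3 object location

        // Hypothetical: select a scan strategy; LakeS3SelectWhereScan is the default,
        // so this argument could presumably be omitted.
        try (Connection conn = LakeDriver.getConnection(
                     List.of(users), LakeS3SelectWhereScan.class);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT name FROM USERS WHERE age > 21")) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}
```

With LakeS3SelectWhereScan, a query like the one above would push both the `name` projection and the `age > 21` filter down to S3 Select where translatable, downloading only matching rows.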

todo

  • improve WHERE push-down
  • performance profiling and optimization
  • smarter, more comprehensive testing
  • mixed scan mode: some table scans are better served by GET, others by SELECT
  • integrate and test Parquet compression support to save money
  • get rid of the AmazonS3URI.java dependency
  • figure out a way to get S3 Select working with AWS SDK v2

About

cheaply runs SQL queries on S3 flat-file assets
