Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

akagi

https://readthedocs.org/projects/akagi/badge/?version=latest

akagi

  • Free software: MIT license

Features

akagi enables you to access various data sources such as Amazon Redshift, Amazon S3 and Google Spreadsheet (more in future) from python.

Installation

Install via pip:

pip install akagi

or from source:

$ git clone https://github.com/ayemos/akagi akagi
$ cd akagi
$ python setup.py install

Setup

To use RedshiftDataSource, you need to set environment variable AKAGI_UNLOAD_BUCKET the name of the Amazon S3 bucket you like to use as intermediate storage of Redshift Unload command.

$ export AKAGI_UNLOAD_BUCKET=xyz-unload-bucket.ap-northeast-1

To use SpreadsheetDetaSource, you need to set environment variable GOOGLE_APPLICATION_CREDENTIAL to indicate your service account credentials file. You can get the credential from here.

Associated client has to have read access to the sheets.

$ export GOOGLE_APPLICATION_CREDENTIAL=$HOME/.credentials/service-1a2b.json

Example

RedshiftDataSource

from akagi.data_sources import RedshiftDataSource

ds = RedshiftDataSource('select * from (select user_id, path from logs.imp limit 10000')

for d in ds:
    print(d) # iterate on result

S3DataSource

from akagi.data_sources import S3DataSource

ds = S3DataSource.for_prefix(
        'image-data.ap-northeast-1',
        'data/image_net/zebra',
        file_format='binary')

for d in ds:
    print(d) # iterate on result

SpreadsheetDataSource

from akagi.data_sources import LocalDataSource

ds = SpreadsheetDataSource(
      '1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs74OgvE2upms',  # sample sheet provided by Google
      sheet_range='Class Data!A2:F31')

for d in ds:
    print(d) # iterate on result

LocalDataSource

from akagi.data_sources import LocalDataSource

ds = LocalDataSource(
      './PATH/TO/YOUR/DATA/DIR',
      file_format='csv')

for d in ds:
    print(d) # iterate on result

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

About

Codenize your datasources.

Resources

License

Packages

No packages published