Merge pull request #1 from Nordstrom/boto
0.2.0
nickbuker committed Jan 2, 2019
2 parents 6da6c43 + fdcc6a4 commit ff77eeb
Showing 7 changed files with 196 additions and 68 deletions.
20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,20 @@
# Changelog
All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.0] - 2019-01-02
### Added
- Informative error message for the `env_var` argument in the `redshift_execute_sql()` and `redshift_get_conn()` functions
- A `boto_get_creds()` function was created to allow the user to obtain credentials strings for purposes such as `COPY` and `UNLOAD` SQL statements
- Version documentation in the `CHANGELOG.md` file
### Changed
- The `create_session()` function was renamed to `boto_create_session()` to highlight that the session object can be used to access a wide variety of AWS tools

## [0.1.1] - 2018-12-14
### Fixed
- Docstrings and `README.md` documentation for S3 functions

## [0.1] - 2018-12-11
- Initial release
113 changes: 104 additions & 9 deletions README.md
@@ -44,12 +44,22 @@ Nordata is a small collection of utility functions for accessing AWS S3 and AWS
- [Deleting a list of files in S3](#s3-delete-list)
- [Deleting files matching a pattern in S3](#s3-delete-pattern)
- [Deleting all files in a directory in S3](#s3-delete-all)
- [Creating a boto3 session object (experienced users)](#boto-session)
- [Creating a bucket object (experienced users)](#get-bucket)

Boto3 (experienced users):

- [Importing boto3 functions](#boto-import)
- [Getting boto3 credentials](#boto-creds)
- [Creating a boto3 session object](#boto-session)

Transferring data between Redshift and S3:

- [Transferring data from Redshift to S3](#redshift-unload)
- [Transferring data from S3 to Redshift](#redshift-copy)

### Testing:
- [Testing Nordata](#nordata-testing)

- [Testing Nordata](#nordata-testing)

<a name="pip-installing-nordata"></a>
## Installing Nordata:
@@ -300,26 +310,111 @@ Deleting all files in a directory in S3:
resp = s3_delete(bucket='my_bucket', s3_filepath='tmp/*')
```

<a name="get-bucket"></a>
Creating a bucket object that can be manipulated directly by experienced users:

```python
bucket = s3_get_bucket(
    bucket='my_bucket',
    profile_name='default',
    region_name='us-west-2')
```

### Boto3:
<a name="boto-import"></a>
Importing boto3 functions:

```python
from nordata import boto_get_creds, boto_create_session
```

<a name="boto-creds"></a>
Retrieving boto3 credentials as a string for use in `COPY` and `UNLOAD` SQL statements:

```python
creds = boto_get_creds(
    profile_name='default',
    region_name='us-west-2',
    session=None)
```

<a name="boto-session"></a>
Creating a boto3 session object that can be manipulated directly by experienced users:

```python
session = create_session(profile_name='default', region_name='us-west-2')
session = boto_create_session(profile_name='default', region_name='us-west-2')
```

<a name="get-bucket"></a>
Creating a bucket object that can be manipulated directly by experienced users:
### Transferring data between Redshift and S3:

<a name="redshift-unload"></a>
Transferring data from Redshift to S3 using an `UNLOAD` statement (see [Redshift UNLOAD documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) for more information):
```python
bucket = s3_get_bucket(
    bucket='my_bucket',

from nordata import boto_get_creds, redshift_execute_sql


creds = boto_get_creds(
    profile_name='default',
    region_name='us-west-2')
    region_name='us-west-2',
    session=None)

sql = f'''
unload (
'select
col1
,col2
from
my_schema.my_table'
)
to
's3://mybucket/unload/my_table/'
credentials
'{creds}'
parallel off header gzip allowoverwrite;
'''

redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=False,
    return_dict=False)
```
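Because the credentials string is spliced into the statement with an f-string, the interpolation can be sanity-checked without touching AWS. A minimal sketch with placeholder values (the `AKIA_FAKE` values are invented for illustration, not real credentials):

```python
# Hedged sketch: placeholder credentials, no AWS access needed. Shows how the
# string returned by boto_get_creds() lands inside the UNLOAD statement.
creds = 'aws_access_key_id=AKIA_FAKE;aws_secret_access_key=SECRET_FAKE;token=TOKEN_FAKE'

sql = f'''
unload ('select col1, col2 from my_schema.my_table')
to 's3://mybucket/unload/my_table/'
credentials
'{creds}'
parallel off header gzip allowoverwrite;
'''

# the whole string, session token included, is embedded in the SQL
assert creds in sql
```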

<a name="redshift-copy"></a>
Transferring data from S3 to Redshift using a `COPY` statement (see [Redshift COPY documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) for more information):
```python

from nordata import boto_get_creds, redshift_execute_sql


creds = boto_get_creds(
    profile_name='default',
    region_name='us-west-2',
    session=None)

sql = f'''
copy
my_schema.my_table
from
's3://mybucket/unload/my_table/'
credentials
'{creds}'
ignoreheader 1 gzip;
'''

redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=False,
    return_dict=False)
```

<a name="nordata-testing"></a>
## Testing:
For those interested in contributing to Nordata or forking and editing the project, pytest is the testing framework used. To run the tests, create a virtual environment, install the `dev-requirements.txt`, and run the following command from the root directory of the project. The testing scripts can be found in the test/ directory.
For those interested in contributing to Nordata or forking and editing the project, pytest is the testing framework used. To run the tests, create a virtual environment, install the contents of `dev-requirements.txt`, and run the following command from the root directory of the project. The testing scripts can be found in the `test/` directory.

```bash
$ pytest
```
8 changes: 5 additions & 3 deletions nordata/__init__.py
@@ -1,18 +1,20 @@
'''Convenience wrappers for connecting to AWS S3 and Redshift'''

__version__ = '0.1.1'
__version__ = '0.2.0'


# Boto3 function
from ._boto import boto_get_creds
from ._boto import boto_create_session
# Redshift functions
from ._redshift import redshift_get_conn
from ._redshift import read_sql
from ._redshift import redshift_execute_sql
# S3 functions
from ._s3 import create_session
from ._s3 import s3_get_bucket
from ._s3 import s3_download
from ._s3 import s3_upload
from ._s3 import s3_delete


__all__ = ['_redshift', '_s3']
__all__ = ['_boto', '_redshift', '_s3']
57 changes: 57 additions & 0 deletions nordata/_boto.py
@@ -0,0 +1,57 @@
import boto3


def boto_create_session(profile_name='default', region_name='us-west-2'):
    """ Instantiates and returns a boto3 session object
    Parameters
    ----------
    profile_name : str
        profile name under which credentials are stored (default 'default' unless organization specific)
    region_name : str
        name of AWS region (default 'us-west-2')
    Returns
    -------
    boto3 session object
    Example use
    -----------
    session = boto_create_session(profile_name='default', region_name='us-west-2')
    """
    return boto3.session.Session(profile_name=profile_name, region_name=region_name)


def boto_get_creds(
        profile_name='default',
        region_name='us-west-2',
        session=None):
    """ Generates and returns an S3 credential string
    Parameters
    ----------
    profile_name : str
        profile name under which credentials are stored (default 'default' unless organization specific)
    region_name : str
        name of AWS region (default 'us-west-2')
    session : boto3 session object or None
        you can optionally provide a boto3 session object or the function can instantiate a new one if None
    Returns
    -------
    str
        credentials for accessing S3
    Example use
    -----------
    creds = boto_get_creds(
        profile_name='default',
        region_name='us-west-2',
        session=None)
    """
    if session is None:
        session = boto_create_session(profile_name=profile_name, region_name=region_name)
    access_key = session.get_credentials().access_key
    secret_key = session.get_credentials().secret_key
    token = session.get_credentials().token
    return f'''aws_access_key_id={access_key};aws_secret_access_key={secret_key};token={token}'''
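For reference, the string `boto_get_creds()` assembles is a semicolon-delimited `key=value` list, as the return statement above shows. A small illustration with placeholder values (not real credentials):

```python
# Shape of the credential string built by boto_get_creds(), using fake values.
creds = ('aws_access_key_id=AKIA_FAKE;'
         'aws_secret_access_key=SECRET_FAKE;'
         'token=TOKEN_FAKE')

# parse it back into a dict to show the three expected fields
parsed = dict(part.split('=', 1) for part in creds.split(';'))
assert parsed == {
    'aws_access_key_id': 'AKIA_FAKE',
    'aws_secret_access_key': 'SECRET_FAKE',
    'token': 'TOKEN_FAKE',
}
```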
52 changes: 2 additions & 50 deletions nordata/_s3.py
@@ -1,60 +1,12 @@
import os
import glob
import boto3
from ._boto import boto_create_session
from botocore.exceptions import ClientError
from boto3.exceptions import S3UploadFailedError
from boto3.s3.transfer import TransferConfig


def create_session(profile_name='default', region_name='us-west-2'):
    """ Instantiates and returns a boto3 session object
    Parameters
    ----------
    profile_name : str
        profile name under which credentials are stored (default 'default' unless organization specific)
    region_name : str
        name of AWS regions (default 'us-west-2')
    Returns
    -------
    boto3 session object
    Example use
    -----------
    session = create_session(profile_name='default', region_name='us-west-2')
    """
    return boto3.session.Session(profile_name=profile_name, region_name=region_name)


def _s3_get_creds(
        profile_name='default',
        region_name='us-west-2',
        session=None):
    """ Generates and returns an S3 credential string
    Parameters
    ----------
    profile_name : str
        profile name under which credentials are stores (default 'default' unless organization specific)
    region_name : str
        name of AWS regions (default 'us-west-2')
    session : boto3 session object or None
        you can optionally provide a boto3 session object or the function can instantiate a new one if None
    Returns
    -------
    str
        credentials for accessing S3
    """
    if session is None:
        session = create_session(profile_name=profile_name, region_name=region_name)
    access_key = session.get_credentials().access_key,
    secret_key = session.get_credentials().secret_key,
    token = session.get_credentials().token,
    return f'''aws_access_key_id={access_key};aws_secret_access_key={secret_key};token={token}'''


def s3_get_bucket(
        bucket,
        profile_name='default',
@@ -81,7 +33,7 @@ def s3_get_bucket(
        profile_name='default',
        region_name='us-west-2')
    """
    session = create_session(profile_name=profile_name, region_name=region_name)
    session = boto_create_session(profile_name=profile_name, region_name=region_name)
    s3 = session.resource('s3')
    my_bucket = s3.Bucket(bucket)
    try:
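Worth flagging for reviewers: the removed `_s3_get_creds()` carried a subtle bug that the new `boto_get_creds()` fixes — the trailing commas after `access_key`, `secret_key`, and `token` made each assignment a one-element tuple, so the tuple's repr leaked into the credential string. A quick demonstration with a placeholder value:

```python
# The trailing comma turns the assignment into a one-element tuple, and the
# tuple's repr then corrupts the credential string.
access_key = 'AKIA_FAKE',                      # note the trailing comma
buggy = f'aws_access_key_id={access_key}'
assert buggy == "aws_access_key_id=('AKIA_FAKE',)"

# without the comma (as in the new boto_get_creds()) the string is clean
access_key = 'AKIA_FAKE'
fixed = f'aws_access_key_id={access_key}'
assert fixed == 'aws_access_key_id=AKIA_FAKE'
```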
8 changes: 8 additions & 0 deletions test/test_boto.py
@@ -0,0 +1,8 @@
# TODO: write boto_get_creds() tests (types, errors, and str contents)
import boto3
from ..nordata import _boto as bt


def test_boto_create_session_type():
    # test whether boto_create_session() returns the proper type
    assert isinstance(bt.boto_create_session(), boto3.session.Session)
6 changes: 0 additions & 6 deletions test/test_s3.py
@@ -1,13 +1,7 @@
import pytest
import boto3
from ..nordata import _s3 as s3


def test_create_session_type():
    # test whether _create_session() returns the proper type
    assert isinstance(s3.create_session(), boto3.session.Session)


download_upload_TypeError_args = [
    (1, 'foo'),
    ('foo', 1),