# S3 - Accessing data in S3 quickly

The `S3` client is a wrapper over the standard AWS Python library, `boto3`. It adds enhancements that are relevant for data-intensive applications:

 - Supports accessing large amounts of data quickly through parallel operations (functions with the `_many` suffix). You can download at up to 20 Gbps on a large EC2 instance.
 - Improved error handling.
 - Supports versioned data through `S3(run=self)` and `S3(run=Run)`.
 - User-friendly API with minimal boilerplate.
 - Convenient API for advanced features such as range requests (downloading partial files) and object headers.
 
For instructions on how to use the class, see [Loading and Storing Data](/scaling/data).
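As a rough illustration of the parallel `_many` operations listed above, the sketch below uploads several small objects with `put_many` and reads them back with `get_many`. The helper name `roundtrip_many`, the bucket location, and the keys are all hypothetical, and the Metaflow import is deferred into `main()` so the helper itself can be tried against a stand-in client.

```python
def roundtrip_many(s3, items):
    """Upload a dict of {key: text} and read every key back.

    `s3` is expected to behave like an open Metaflow `S3` client.
    """
    # put_many takes (key, value) pairs and uploads them in parallel.
    s3.put_many(list(items.items()))
    # get_many returns one result object per requested key.
    objs = s3.get_many(list(items))
    return {obj.key: obj.text for obj in objs}


def main():
    # Imported here so the helper above does not require Metaflow to be installed.
    from metaflow import S3

    # Hypothetical location; replace it with a prefix you are allowed to write to.
    with S3(s3root="s3://my-bucket/s3-demo/") as s3:
        print(roundtrip_many(s3, {"a.txt": "first", "b.txt": "second"}))
```

Calling `main()` requires AWS credentials and a writable bucket. Using the client in a `with` block is one way to make sure it is closed when the work is done.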

In [1]:
#meta:tag=hide
from functools import partial
from nbdoc.showdoc import ShowDoc as SD
ShowDoc = partial(SD, module_nm='metaflow')

from metaflow import S3
from metaflow.datatools.s3 import S3Object

## The `S3` client

In [2]:
ShowDoc(S3, spoofstr="tmproot='.', bucket=None, prefix=None, run=None, s3root=None", show_import=True)

In [3]:
ShowDoc(S3.close)

## Downloading data

In [4]:
ShowDoc(S3.get)

In [5]:
ShowDoc(S3.get_many)

In [6]:
ShowDoc(S3.get_recursive)

In [7]:
ShowDoc(S3.get_all)

## Listing objects

In [8]:
ShowDoc(S3.list_paths)

In [9]:
ShowDoc(S3.list_recursive)

## Uploading data

In [10]:
ShowDoc(S3.put)

In [11]:
ShowDoc(S3.put_many)

In [12]:
ShowDoc(S3.put_files)

## Querying metadata

In [13]:
ShowDoc(S3.info)

In [14]:
ShowDoc(S3.info_many)

## Handling results with `S3Object`

Most operations above return `S3Object`s that encapsulate information about S3 paths and objects.

Note that these objects do not hold the data itself. The data is stored in a temporary directory and is accessible through the object's properties, such as `path`, `blob`, and `text`.
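Because the downloaded data lives in a temporary directory, one reasonable pattern is to copy it to a location you control while the client is still open. The sketch below assumes that temporary files may disappear once the client is closed; `download_to`, the bucket location, and the key are hypothetical names used only for illustration.

```python
import shutil


def download_to(s3, key, dest_path):
    """Download `key` and copy the local temporary file to `dest_path`."""
    obj = s3.get(key)
    # obj.path points at the temporary local copy of the object.
    shutil.copyfile(obj.path, dest_path)
    return dest_path


def main():
    # Imported here so the helper above does not require Metaflow to be installed.
    from metaflow import S3

    # Hypothetical location and key.
    with S3(s3root="s3://my-bucket/s3-demo/") as s3:
        # Copy the file out before the `with` block ends and the client closes.
        download_to(s3, "model.bin", "model.bin")
```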

In [15]:
ShowDoc(S3Object, spoofstr='', skip_sections='Attributes')

In [16]:
ShowDoc(S3Object.exists)

In [17]:
ShowDoc(S3Object.downloaded)

In [18]:
ShowDoc(S3Object.url)

In [19]:
ShowDoc(S3Object.prefix)

In [20]:
ShowDoc(S3Object.key)

In [21]:
ShowDoc(S3Object.path)

In [22]:
ShowDoc(S3Object.blob)

In [23]:
ShowDoc(S3Object.text)

In [24]:
ShowDoc(S3Object.size)

In [25]:
ShowDoc(S3Object.has_info)

In [26]:
ShowDoc(S3Object.metadata)

In [27]:
ShowDoc(S3Object.content_type)

In [28]:
ShowDoc(S3Object.range_info)

In [29]:
ShowDoc(S3Object.last_modified)

## Helper Objects

These objects are simple containers used to pass information to `get_many`, `put_many`, and `put_files`. You may substitute your own objects, as long as they provide the same set of attributes.
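To illustrate the point about substituting your own objects, the sketch below defines a hypothetical `ByteRange` container with the `key`, `offset`, and `length` attributes described for `S3GetObject`, and a hypothetical `chunk_requests` helper that splits a file of known size into consecutive range requests. The bucket location, key, and sizes are made up.

```python
from collections import namedtuple

# Hypothetical stand-in that exposes the same attributes as S3GetObject.
ByteRange = namedtuple("ByteRange", ["key", "offset", "length"])


def chunk_requests(key, total_size, chunk_size):
    """Split the byte range [0, total_size) into consecutive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        # The last chunk may be shorter than chunk_size.
        ByteRange(key, offset, min(chunk_size, total_size - offset))
        for offset in range(0, total_size, chunk_size)
    ]


def main():
    # Imported here so the helper above does not require Metaflow to be installed.
    from metaflow import S3

    # Hypothetical location, key, and sizes.
    with S3(s3root="s3://my-bucket/s3-demo/") as s3:
        parts = s3.get_many(chunk_requests("big.bin", 10_000_000, 4_000_000))
        for part in parts:
            print(part.key, len(part.blob))
```

With a total size of 10 bytes and a chunk size of 4, `chunk_requests` yields three requests covering offsets 0, 4, and 8, the last one only 2 bytes long.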

In [30]:
#meta:tag=hide

# TODO: Document these as proper docstrings in the source code in S3.py

class S3GetObject():
    """
    Represents a chunk of an S3 object. A range query is performed to download only a subset of data,
    `object[key][offset:offset + length]`, from S3.
    
    Attributes
    ----------
    
    key : str
        Key identifying the object. Works the same way as any `key` passed to `get` or `get_many`.
    offset : int
        A byte offset in the file.
    length : int
        The number of bytes to download.
    """
    key = None
    offset = None
    length = None
    
    def __init__(self, key=None, offset=None, length=None):
        pass
    
class S3PutObject():
    """
    Defines an object with metadata to be uploaded with `put_many` or `put_files`.
    
    Attributes
    ----------
    
    key : str
        Key identifying the object. Works the same way as `key` passed to `put` or `put_many`.
    value : str or bytes
        Object to upload. Works the same way as `obj` passed to `put` or `put_many`.
    path : str
        Path to a local file. Works the same way as `path` passed to `put_files`.
    content_type : str
        Optional MIME type for the file.
    metadata : Dict
        A JSON-encodable dictionary of additional headers to be stored
        as metadata with the file.
    """
    key = None
    value = None
    path = None
    content_type = None
    metadata = None
    
    def __init__(self, key=None, value=None, path=None, content_type=None, metadata=None):
        pass
    
ShowDoc2 = partial(SD, module_nm='metaflow.datatools.s3')

In [31]:
ShowDoc2(S3GetObject, show_import=True)

In [32]:
ShowDoc2(S3PutObject, show_import=True)