# Amphora Bulk Upload

This notebook uploads the files in a directory, including its subdirectories, to Amphora through the Amphora REST API.

### Requirements

- `requests`



### Dataset

Specify the dataset directory and the extensions of the files you want to upload to Amphora.

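**Note**: File extensions must be a list or tuple, e.g. `['.pdf']`, not a bare string such as `'.pdf'`. Matching is case-sensitive, so `.PDF` files are skipped unless `.PDF` is also listed.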
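The list/tuple requirement exists because the extensions are iterated one by one when the file filter is compiled. A minimal sketch of that filter, using the same regex construction as the upload cell (the helper name `build_extension_filter` is hypothetical), shows what goes wrong with a bare string:

```python
import re


def build_extension_filter(extensions):
    """Hypothetical helper: compile a regex matching paths ending in any given extension."""
    return re.compile('|'.join(re.escape(ext) + '$' for ext in extensions))


good = build_extension_filter(['.pdf'])
print(bool(good.search('/data/paper.pdf')))   # True
print(bool(good.search('/data/notes.txt')))   # False

# A bare string is iterated character by character ('.', 'p', 'd', 'f'),
# so any path ending in one of those characters matches.
bad = build_extension_filter('.pdf')
print(bool(bad.search('/data/readme.md')))    # True: '.md' ends in 'd'
```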

```python
DATASET_DIR = '/foo/bar'
UPLOAD_FILE_EXTENSIONS = ['.pdf']    # Must be a list/tuple.
```

### Amphora

Specify Amphora-specific properties.
Reach out to the Amphora admins for an access token.

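```python
AMPHORA_ACCESS_TOKEN = ''                    # Token issued by the Amphora admins
AMPHORA_COLLECTION = ''                      # Target collection for the uploaded resources
AMPHORA_RESOURCE_TYPE = 'Biblio: article'    # Resource type assigned to every upload
AMPHORA_RESOURCE_PUBLIC = False              # Whether uploaded resources are public
```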
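An empty token otherwise only surfaces as one error per file once the upload starts. A minimal pre-flight sketch, assuming that an empty token or an empty collection is never valid (the function `check_amphora_settings` is hypothetical, not part of any Amphora API):

```python
def check_amphora_settings(token, collection):
    """Hypothetical pre-flight check: return a list of problems with the settings."""
    problems = []
    if not token.strip():
        problems.append('AMPHORA_ACCESS_TOKEN is empty; ask the Amphora admins for a token.')
    if not collection.strip():
        problems.append('AMPHORA_COLLECTION is empty.')
    return problems


# With the default (empty) values above, both settings are reported.
for problem in check_amphora_settings('', ''):
    print(problem)
```

In the notebook, `check_amphora_settings(AMPHORA_ACCESS_TOKEN, AMPHORA_COLLECTION)` would return an empty list once both values are filled in.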

#### Internal Amphora Constants

You shouldn't need to modify these.

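```python
AMPHORA_SERVER = 'https://amphora.asu.edu'
AMPHORA_APP_PATH = 'amphora'                 # Path of the Amphora app on the server

import posixpath
try:
    from urllib.parse import urljoin         # Python 3
except ImportError:
    from urlparse import urljoin             # Python 2

# Resolves to https://amphora.asu.edu/amphora/rest/resource/
UPLOAD_API_PATH = urljoin(AMPHORA_SERVER, posixpath.join(AMPHORA_APP_PATH, 'rest/resource/'))
```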
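The endpoint is built with `urljoin`, which resolves the relative app path against the server URL like a browser would. A small sketch of that resolution, including one pitfall worth knowing if the server URL ever gains a path of its own (Python 3 shown; the helper `upload_endpoint` and the `example.org` URLs are hypothetical):

```python
import posixpath
from urllib.parse import urljoin


def upload_endpoint(server, app_path):
    """Hypothetical helper mirroring how UPLOAD_API_PATH is built above."""
    return urljoin(server, posixpath.join(app_path, 'rest/resource/'))


print(upload_endpoint('https://amphora.asu.edu', 'amphora'))
# https://amphora.asu.edu/amphora/rest/resource/

# Without a trailing slash, the last path segment of the base URL is replaced...
print(upload_endpoint('https://example.org/prefix', 'amphora'))
# https://example.org/amphora/rest/resource/

# ...with a trailing slash, it is kept.
print(upload_endpoint('https://example.org/prefix/', 'amphora'))
# https://example.org/prefix/amphora/rest/resource/
```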

### Logging

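Configure logging so that only this notebook's `amphora_upload` logger reports messages at DEBUG level and above; messages from other libraries (such as `requests`) are suppressed unless they are CRITICAL.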

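```python
import logging

logging.basicConfig()
# Suppress everything except CRITICAL messages (e.g. requests/urllib3 internals)...
logging.getLogger().setLevel(logging.CRITICAL)
# ...but let this notebook's own logger report every upload.
logger = logging.getLogger('amphora_upload')
logger.setLevel(logging.DEBUG)
```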
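This works because a logger without its own level inherits the root's CRITICAL level, while `amphora_upload` has DEBUG set explicitly; once a record propagates, it is only checked against the handlers' levels, not the root logger's level. A small self-contained check of that behaviour, capturing output in memory instead of on stderr (the `urllib3.connectionpool` message is a stand-in for library chatter):

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)      # Default format is just the message
root = logging.getLogger()
previous_level = root.level
root.addHandler(handler)
root.setLevel(logging.CRITICAL)

app_logger = logging.getLogger('amphora_upload')
app_logger.setLevel(logging.DEBUG)

try:
    # Dropped: this logger inherits CRITICAL from the root.
    logging.getLogger('urllib3.connectionpool').debug('Starting new HTTPS connection')
    # Emitted: DEBUG passes the logger's own level, then reaches the root handler.
    app_logger.debug('/data/paper.pdf: Uploaded (ID 1)')
finally:
    root.removeHandler(handler)
    root.setLevel(previous_level)

captured = stream.getvalue()
print(captured)
```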

### Upload Dataset Files

Compile the file-extension filter and create an authenticated HTTP session before starting the upload.

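```python
import os
import re
import mimetypes

import requests

# Matches any path ending in one of the configured extensions (case-sensitive).
filetype = re.compile('|'.join(re.escape(t) + '$' for t in UPLOAD_FILE_EXTENSIONS))

session = requests.Session()
# Add the token header while keeping requests' default headers (User-Agent, etc.).
session.headers.update({
    'Authorization': 'Token %s' % AMPHORA_ACCESS_TOKEN,
})
```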

#### Begin Upload!

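With the logging configuration above, each file is reported as soon as it is processed: successful uploads are logged at DEBUG level with the ID returned by Amphora, and failures are logged at ERROR level with the server's response.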

```python
for dpath, dnames, fnames in os.walk(os.path.abspath(os.path.expanduser(DATASET_DIR))):
    dnames.sort()    # In-place sort makes os.walk visit subdirectories in order

    for fname in sorted(fnames):
        fpath = os.path.join(dpath, fname)
        if not filetype.search(fpath):
            continue

        data = {
            'name': fname,
            'collection': AMPHORA_COLLECTION,
            'resource_type': AMPHORA_RESOURCE_TYPE,
            'public': AMPHORA_RESOURCE_PUBLIC,
        }

        try:
            # The context manager closes the file handle after each upload.
            with open(fpath, 'rb') as fobj:
                files = {
                    # Send only the file name, not the full local path.
                    'upload_file': (fname, fobj, mimetypes.guess_type(fpath)[0]),
                }
                response = session.post(UPLOAD_API_PATH, data=data, files=files)
            if not response.ok:    # Any non-2xx status counts as a failure
                raise Exception(response.text)
            upload_id = response.json()['id']
        except Exception as e:
            logger.error('{}: {}'.format(fpath, e))
            continue

        logger.debug('{}: Uploaded (ID {})'.format(fpath, upload_id))
```
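Before pointing the loop at the server, it may be worth previewing which files match. A minimal dry-run sketch that reuses the same directory walk and extension filter but makes no network calls (the helper `find_upload_candidates` is hypothetical; files are sorted per directory for readability):

```python
import os
import re


def find_upload_candidates(dataset_dir, extensions):
    """Hypothetical dry-run helper: list the files whose extensions match, without uploading."""
    pattern = re.compile('|'.join(re.escape(ext) + '$' for ext in extensions))
    root = os.path.abspath(os.path.expanduser(dataset_dir))
    candidates = []
    for dpath, dnames, fnames in os.walk(root):
        dnames.sort()    # Visit subdirectories in sorted order
        for fname in sorted(fnames):
            fpath = os.path.join(dpath, fname)
            if pattern.search(fpath):
                candidates.append(fpath)
    return candidates


# A directory that does not exist simply yields no candidates.
print(find_upload_candidates('/foo/bar', ['.pdf']))
```

In the notebook, `find_upload_candidates(DATASET_DIR, UPLOAD_FILE_EXTENSIONS)` lists the paths the upload loop would attempt.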