Bitstore for DataHub

BitStore is a DataHub microservice for storing blobs, i.e. files. It is a lightweight auth wrapper around an S3-compatible object store that integrates with the rest of the DataHub stack, especially the auth service.

Quick Start

Clone the repo and install

make install

Run tests

make test

Run server

python server.py

Env Vars

  • AUTH_SERVER - the fully qualified URL of the auth server. Used to look up the public key (served by the auth server itself) for communicating with the auth server.
  • Object store: connection info for the underlying S3-style object store service
    STORAGE_ACCESS_KEY_ID
    STORAGE_SECRET_ACCESS_KEY
    STORAGE_BUCKET_NAME
    
  • STORAGE_PATH_PATTERN - pattern for generating the storage path in the object store for a given file. That is, object_store_path = make_path(STORAGE_PATH_PATTERN.format(**fileinfo)). May contain any format field available for a file in the authorize API, including:
    • {path} (relative path to file in package)
    • {basename} which is the filename, extracted from the {path}
    • {dirname} which is the dirname, extracted from the {path}
    • {extension} which is the extension of the filename
    • {md5} (and {md5_hex} which is the md5 in hex form)
      Note: in addition to the file info, the owner and dataset (name) are available as {owner} and {dataset}. Examples:
    • custom/path/{owner}/{dataset}/{path} - given {owner: datahq, name: datax, path: data/file.csv}, the storage path is custom/path/datahq/datax/data/file.csv
    • {md5} - storage path is the md5 hash of the file (assuming an md5 hash is provided)
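The pattern expansion described above can be sketched in plain Python. This is only an illustration: make_fileinfo is a hypothetical helper (not part of BitStore), and whether {extension} includes the leading dot is an assumption.

```python
import posixpath


def make_fileinfo(path, md5, md5_hex, owner, dataset):
    """Collect the placeholders listed above for one file (hypothetical helper)."""
    basename = posixpath.basename(path)
    return {
        "path": path,
        "basename": basename,
        "dirname": posixpath.dirname(path),
        # Assumption: the extension keeps its leading dot, as splitext returns it.
        "extension": posixpath.splitext(basename)[1],
        "md5": md5,
        "md5_hex": md5_hex,
        "owner": owner,
        "dataset": dataset,
    }


pattern = "custom/path/{owner}/{dataset}/{path}"
info = make_fileinfo(
    "data/file.csv",
    md5="<md5>",
    md5_hex="<md5-hex>",
    owner="datahq",
    dataset="datax",
)
# str.format ignores keys the pattern does not use.
storage_path = pattern.format(**info)
print(storage_path)
```

With the example values from the list above, this prints custom/path/datahq/datax/data/file.csv.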

Note: the permissions requested from the auth server will look like:

permissions:
  datapackage-upload
service:
  SERVICE_NAME (config defined above e.g. 'rawstore')

API

Get authorized upload URL(s)

/authorize

Method: POST

Query Parameters:

  • jwt - permission token (received from /user/authorize)

Headers:

  • Auth-Token - permission token (can be used instead of the jwt query parameter)

Body:

JSON content with the following structure:

{
    "metadata": {
        "owner": "<user-id-of-uploader>",
        "name": "<data-set-unique-id>"
    },
    "filedata": {
        "<relative-path-to-file-in-package-1>": {
            "length": 1234, #length in bytes of data
            "md5": "<md5-hash-of-the-data>",
            "type": "<content-type-of-the-data>",
            "name": "<file-name>"
        },
        "<relative-path-to-file-in-package-2>": {
            "length": 4321,
            "md5": "<md5-hash-of-the-data>",
            "type": "<content-type-of-the-data>",
            "name": "<file-name>"
        }
        ...
    }
}

owner must match the userid that is in the authentication token.
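As a rough sketch, the request body could be assembled like this. describe_file is a hypothetical helper, the owner and dataset values are placeholders, and encoding the md5 as base64 (as in the HTTP Content-MD5 header) is an assumption; check what your deployment expects.

```python
import base64
import hashlib
import json


def describe_file(name, data, content_type):
    """Build one "filedata" entry for a file's raw bytes (hypothetical helper)."""
    digest = hashlib.md5(data).digest()
    return {
        "length": len(data),  # length in bytes of data
        # Assumption: base64-encoded digest, matching the Content-MD5 convention.
        "md5": base64.b64encode(digest).decode("ascii"),
        "type": content_type,
        "name": name,
    }


body = {
    # "owner" must match the user id in the authentication token.
    "metadata": {"owner": "datahq", "name": "datax"},
    "filedata": {
        "data/file.csv": describe_file("file.csv", b"a,b\n1,2\n", "text/csv"),
    },
}
print(json.dumps(body, indent=2))
```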

Returns:

Signed URLs for uploading to S3:

{
  fileData: {
    "<file-name-1>": {
      "md5-hash": "...",
      "name": "<file-name>",
      "type": "<file-type>",
      "upload_query": {
        'Content-MD5': '...',
        'Content-Type': '...',
        'acl': 'public-read',
        'key': '<path>',
        'policy': '...',
        'x-amz-algorithm': 'AWS4-HMAC-SHA256',
        'x-amz-credential': '...',
        'x-amz-date': '<date-time-in-ISO>',
        'x-amz-signature': '...'
      },
      "upload_url": "<s3-url>",
      "exists": true/false
    },
    "<file-name-2>": ...,
    ...
  }
}
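A client will probably want to skip files that are already stored. The sketch below assumes that "exists": true means the object is already in the store and needs no upload; pending_uploads is a hypothetical helper and the sample response is made up for illustration.

```python
def pending_uploads(response):
    """Yield (path, upload_url, upload_query) for files that still need uploading."""
    for path, entry in response["fileData"].items():
        # Assumption: "exists" set to true means the object is already stored.
        if entry.get("exists"):
            continue
        yield path, entry["upload_url"], entry["upload_query"]


# Made-up response with the structure shown above.
sample = {
    "fileData": {
        "data/a.csv": {
            "upload_url": "https://s3.example.com/bucket",
            "upload_query": {"key": "owner/dataset/data/a.csv"},
            "exists": False,
        },
        "data/b.csv": {
            "upload_url": "https://s3.example.com/bucket",
            "upload_query": {"key": "owner/dataset/data/b.csv"},
            "exists": True,
        },
    }
}

todo = list(pending_uploads(sample))
print([path for path, _, _ in todo])
```

The upload_query entries look like the form fields of an S3 browser-based POST upload, so each pending file would typically be sent as a multipart form POST to upload_url with those fields included.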

Get information regarding the datastore

/info

Method: GET

Query Parameters:

  • jwt - permission token (received from /user/authorize)

Headers:

  • Auth-Token - permission token (can be used instead of the jwt query parameter)

Returns:

JSON content with the following structure:

{
    "prefixes": [
        "https://datastore.openspending.org/123456789"
    ]
}

prefixes is the list of possible prefixes for an uploaded file for this user.
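One possible use of prefixes is checking whether a given file URL belongs to the user. A minimal sketch, where is_under_prefixes is a hypothetical helper and the file URLs are invented:

```python
def is_under_prefixes(url, prefixes):
    """Return True if url lies below one of the prefixes reported by /info."""
    # Compare against "<prefix>/" so that ".../123456789" does not match ".../1234567890/...".
    return any(url.startswith(prefix.rstrip("/") + "/") for prefix in prefixes)


prefixes = ["https://datastore.openspending.org/123456789"]
print(is_under_prefixes("https://datastore.openspending.org/123456789/data/file.csv", prefixes))
print(is_under_prefixes("https://datastore.openspending.org/1234567890/data/file.csv", prefixes))
```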

Check and Generate S3 Presigned URL for private objects

/presign

Method: GET

Query Parameters:

  • jwt - permission token (received from /user/authorize)
  • url - original URL for S3 object
  • ownerid - ID of the authenticated user

Headers:

  • Auth-Token - permission token (can be used instead of the jwt query parameter)

Returns:

Original or Pre-Signed S3 URL:

{
    "url": "https://s3.amazonaws.com/rawstore/ownername/dataset/maydata.csv?x=y"
}
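Because url is itself a URL, it must be percent-encoded when passed as a query parameter. A small sketch of building the request URL with the standard library; presign_request_url is a hypothetical helper and the BitStore base URL is a placeholder:

```python
from urllib.parse import urlencode


def presign_request_url(base_url, jwt, url, ownerid):
    """Build the GET URL for /presign (hypothetical helper)."""
    # urlencode percent-encodes the embedded S3 URL so it survives as a single parameter.
    query = urlencode({"jwt": jwt, "url": url, "ownerid": ownerid})
    return f"{base_url.rstrip('/')}/presign?{query}"


request_url = presign_request_url(
    "https://bitstore.example.com",  # placeholder for your BitStore deployment
    jwt="TOKEN",
    url="https://s3.amazonaws.com/rawstore/o/d/f.csv",
    ownerid="o",
)
print(request_url)
```

The token can alternatively be sent in the Auth-Token header, in which case the jwt parameter would be left out of the query.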