## Create a BDBag with `participant.tsv` and `sample.tsv` in its data directory, upload to S3, and return a signed URL

In [60]:
import os
import bagit
from shutil import rmtree, copy
from zipfile import ZipFile, ZIP_DEFLATED

To upload data to a FireCloud workspace we need two files, `participant.tsv` and `sample.tsv`. The first holds a list of unique donor UUIDs; the second holds the same donor UUIDs together with their corresponding specimen UUIDs. Together, these two columns form the composite primary key of `sample.tsv`. Its third column holds cell-suspension UUIDs, followed by other columns that I copied from the manifest downloaded from the Explorer of the HCA browser.
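The table layout described above can be sketched with the standard `csv` module. This is a minimal sketch, not the code used to produce the files in this notebook: it assumes the manifest has already been loaded as a list of dicts, and the column names `donor_uuid`, `specimen_uuid` and `cell_suspension_uuid`, as well as the helper `write_load_files`, are hypothetical. The extra manifest columns are omitted, and FireCloud's own header conventions for load files are not reproduced here.

```python
import csv
import os

# Hypothetical manifest column names; the real HCA manifest headers may differ.
DONOR = 'donor_uuid'
SPECIMEN = 'specimen_uuid'
SUSPENSION = 'cell_suspension_uuid'


def write_load_files(rows, out_dir='.'):
    """Write participant.tsv and sample.tsv from manifest rows given as dicts.

    Returns the list of unique donor UUIDs written to participant.tsv.
    """
    # participant.tsv: each donor UUID exactly once, in first-seen order.
    donors = list(dict.fromkeys(row[DONOR] for row in rows))
    with open(os.path.join(out_dir, 'participant.tsv'), 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerow([DONOR])
        writer.writerows([donor] for donor in donors)

    # sample.tsv: (donor, specimen) is the composite primary key,
    # so a repeated pair is rejected instead of silently written twice.
    seen = set()
    with open(os.path.join(out_dir, 'sample.tsv'), 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        writer.writerow([DONOR, SPECIMEN, SUSPENSION])
        for row in rows:
            key = (row[DONOR], row[SPECIMEN])
            if key in seen:
                raise ValueError(f'duplicate composite key: {key}')
            seen.add(key)
            writer.writerow([row[DONOR], row[SPECIMEN], row[SUSPENSION]])
    return donors
```

Calling `write_load_files(rows)` in the notebook's working directory would leave both `.tsv` files where the bag-building cell looks for them.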

In [61]:
bag_name = 'hca_manifest'  # name of the BDBag directory
# Start from a clean, empty directory.
if os.path.isdir(bag_name):
    rmtree(bag_name)
os.mkdir(bag_name)
# make_bag turns the directory into a bag: it creates data/ plus bagit.txt,
# bag-info.txt (here with one custom 'info' field) and initial manifests.
bag = bagit.make_bag(bag_name, {'info': 'some info'})
# Copy every .tsv file in the working directory into the bag's payload directory.
files = [f for f in os.listdir() if f.endswith('.tsv')]
for file in files:
    copy(file, os.path.join(bag_name, 'data'))
# os.listdir() returns entries in arbitrary order, so sort before comparing.
assert sorted(os.listdir(os.path.join(bag_name, 'data'))) == ['participant.tsv', 'sample.tsv']

In [62]:
# The .tsv files were added after make_bag, so regenerate the payload manifests.
bag.save(manifests=True)
assert bag.is_valid()
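`bag.save(manifests=True)` rewrites the payload manifests so they list the files copied into `data/`, and `bag.is_valid()` then checks, among other things, that each payload file's checksum matches its manifest entry. The sketch below shows only that core payload check using the standard library. It is a simplified illustration, not bagit's validator: it assumes a `manifest-sha256.txt` exists (which algorithms bagit-python writes by default depends on its version), `verify_payload` is a hypothetical helper, and tag manifests, `Payload-Oxum` and BagIt's percent-encoding of unusual file names are ignored.

```python
import hashlib
import os


def verify_payload(bag_dir, algorithm='sha256'):
    """Return True if the files under data/ exactly match manifest-<algorithm>.txt."""
    # Parse manifest lines of the form "<checksum> <path relative to bag root>".
    expected = {}
    with open(os.path.join(bag_dir, f'manifest-{algorithm}.txt'), encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\r\n')
            if not line:
                continue
            checksum, rel_path = line.split(maxsplit=1)
            expected[os.path.normpath(rel_path)] = checksum.lower()

    # Hash every payload file actually on disk, keyed by path relative to the bag root.
    on_disk = {}
    for root, _dirs, names in os.walk(os.path.join(bag_dir, 'data')):
        for name in names:
            full = os.path.join(root, name)
            digest = hashlib.new(algorithm)
            with open(full, 'rb') as fh:
                for chunk in iter(lambda: fh.read(65536), b''):
                    digest.update(chunk)
            on_disk[os.path.relpath(full, bag_dir)] = digest.hexdigest()

    # Valid only if the same set of files is present and every checksum matches.
    return expected == on_disk
```

Comparing the two dictionaries catches both missing or extra payload files and modified contents in one step.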

Helper function to add every file in a directory to a zip archive.
(See: https://stackoverflow.com/questions/1855095/how-to-create-a-zip-archive-of-a-directory)

In [63]:
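def zipdir(path, ziph):
    """Add every file under path to the open ZipFile handle ziph."""
    for root, _dirs, files in os.walk(path):
        for file in files:
            # The archive name mirrors the relative path, e.g. hca_manifest/data/sample.tsv.
            ziph.write(os.path.join(root, file))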

In [64]:
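# Write the bag directory into a deflate-compressed zip archive.
with ZipFile('manifest_bag.zip', 'w', ZIP_DEFLATED) as zipf:
    zipdir(bag_name, zipf)
# Reopen the finished archive read-only and check every member's CRC;
# testzip() returns None when no corrupt member is found.
with ZipFile('manifest_bag.zip') as zipf:
    assert zipf.testzip() is None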
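The standard library can produce a comparable archive without a hand-written walker: `shutil.make_archive` zips the subtree `base_dir`, with archive paths relative to `root_dir`, using deflate compression. This is a sketch of an alternative to `zipdir`, not what the cell above does; the helper name `zip_bag` is hypothetical. Unlike `zipdir`, `make_archive` also stores explicit directory entries such as `hca_manifest/data/`.

```python
import os
import shutil


def zip_bag(bag_dir, archive_base):
    """Zip bag_dir into <archive_base>.zip, keeping the bag's top-level folder name."""
    bag_dir = os.path.abspath(bag_dir)
    # root_dir: directory the archive paths are relative to.
    # base_dir: subtree (relative to root_dir) that is actually archived.
    return shutil.make_archive(
        archive_base,
        'zip',
        root_dir=os.path.dirname(bag_dir),
        base_dir=os.path.basename(bag_dir),
    )
```

For example, `zip_bag(bag_name, 'manifest_bag')` would write `manifest_bag.zip` with entries under `hca_manifest/`, the same top-level layout as the loop-based version.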