# Bucket List
 
The goal here is to generate a Terra data table from files that a user has already uploaded into a workspace bucket. This can already be done on a local UNIX or UNIX-like machine using shell scripts, but that isn't ideal for certain BYOD scenarios. We need a way to do it programmatically (as there might be hundreds of files) and in the cloud (as the files might be coming from Windows, or the user might not know how to run a shell script).

This notebook was compiled by Ash O'Farrell at UCSC and borrows heavily from code by Brian Hannafious at UCSC.

## Do imports (you must restart the kernel after this code block but before running the other code blocks)

In [None]:
import io
import os
from datetime import datetime
import json

from firecloud import fiss
from firecloud.errors import FireCloudServerError
import firecloud.api as fapi
import numpy as np
import pandas as pd
# import pysnooper

## User-set variables

In [None]:
# Don't forget the quotation marks!
BILLING_PROJECT_ID="biodata-catalyst"
WORKSPACE="TSV-AFY"
SUBDIRECTORY="/case1/"
TABLE_NAME="CRAMs"

Begin and end your SUBDIRECTORY variable with a forward slash. Forward slashes do not need to be escaped in a Python string, so if your files are in a folder called "testfiles", enter it as "/testfiles/".
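
As a rough illustration of how these variables are combined later in the notebook, the sketch below joins a bucket address and a subdirectory into one folder path. The helper name `join_bucket_path` and the bucket `gs://fc-example-bucket` are hypothetical and not part of the notebook. A forward slash is an ordinary character in a Python string, so the function only normalizes leading and trailing slashes.

```python
def join_bucket_path(bucket: str, subdirectory: str) -> str:
    """Join a gs:// bucket address and a subdirectory into a folder path.

    Hypothetical helper, not part of the notebook: it strips stray slashes
    so that exactly one "/" separates the parts and the result ends in "/".
    """
    folder = subdirectory.strip("/")
    base = bucket.rstrip("/")
    return f"{base}/{folder}/" if folder else f"{base}/"


# Example with a made-up bucket name.
print(join_bucket_path("gs://fc-example-bucket", "/case1/"))
```

With a helper like this, "case1", "/case1" and "/case1/" would all resolve to the same folder.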

## Set other variables and check FireCloud API

In [None]:
try:
    # WORKSPACE_BUCKET holds the address of this workspace's bucket
    bucket = os.environ["WORKSPACE_BUCKET"]
    response = fapi.list_entity_types(BILLING_PROJECT_ID, WORKSPACE)
    if response.status_code != 200:
        print("Error in FireCloud, check your billing project ID and the name of your workspace.")
    else:
        print("FireCloud has found your workspace!")
        # Full address of the folder that holds your files
        directory = bucket + SUBDIRECTORY
except NameError:
    print("Caught a NameError exception. This probably means you didn't restart the kernel after"
          " running the first block of code (the one with all the imports). Run it again, restart"
          " the kernel, then try running every block of code (including the import one) again.")

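The block above reads the bucket address from the `WORKSPACE_BUCKET` environment variable. If that variable is missing, `os.environ[...]` raises a `KeyError`, which the `except NameError` clause does not catch. The following is a minimal sketch of a more explicit lookup, assuming the variable is supplied by the notebook environment as the code above implies. The function name `get_workspace_bucket` is hypothetical.

```python
import os


def get_workspace_bucket(environ=os.environ) -> str:
    """Return the workspace bucket address or raise a readable error.

    Hypothetical helper: `environ` is any mapping, which makes the
    function easy to try out with a plain dict.
    """
    try:
        return environ["WORKSPACE_BUCKET"]
    except KeyError:
        raise RuntimeError(
            "WORKSPACE_BUCKET is not set; check that this code is running "
            "in the notebook environment of your workspace."
        ) from None
```

Passing a dict such as `{"WORKSPACE_BUCKET": "gs://fc-example-bucket"}` (a made-up address) is one way to exercise the function outside of a workspace.
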
## Display the contents of your workspace bucket (optional, you may want to skip this if you're dealing with lots of files)

In [None]:
!gsutil ls $directory

## Do magic to create a TSV file

In [None]:
# Write the full address of each file in your folder to contentlocations.txt
!gsutil ls $directory > contentlocations.txt
# Strip everything up to the last slash to get the bare file names,
# then pair each file name with its full address in your Google bucket
!cat contentlocations.txt | sed 's@.*/@@' > filenames.txt
!paste filenames.txt contentlocations.txt > combined.txt
# Set up the header that Terra requires for data tables
#headerstring = "entity:" + TABLE_NAME + "_id\tfile_location" #tab somehow gets converted to a space??
!touch temp.txt
!echo "entity:$TABLE_NAME""_id\tfile_location" >> temp.txt
!cat temp.txt combined.txt > final.tsv
# Clean up your directory
!rm filenames.txt contentlocations.txt temp.txt
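
For comparison, the same table can be assembled in plain Python once the file addresses are available as a list. This is a sketch, not the notebook's method: the function name `build_table_tsv` is hypothetical, and the addresses in the example are made up. Unlike the shell pipeline, it skips blank lines and entries ending in "/" (folders), which would otherwise produce rows with an empty ID.

```python
import csv
import io


def build_table_tsv(file_urls, table_name):
    """Build TSV text with the header 'entity:<table_name>_id<TAB>file_location'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([f"entity:{table_name}_id", "file_location"])
    for url in file_urls:
        url = url.strip()
        if not url or url.endswith("/"):
            continue  # skip blank lines and folder entries
        file_name = url.rsplit("/", 1)[-1]  # text after the last slash
        writer.writerow([file_name, url])
    return buffer.getvalue()


# Made-up addresses standing in for the output of `gsutil ls`.
example_urls = ["gs://fc-example-bucket/case1/sample1.cram",
                "gs://fc-example-bucket/case1/sample2.cram"]
print(build_table_tsv(example_urls, "CRAMs"))
```

Writing the tab through the `csv` module avoids depending on how a particular shell's `echo` treats `\t`.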

## Inspect TSV file (optional, you may want to skip this if you're dealing with lots of files)

In [None]:
!cat final.tsv
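
When there are too many rows to read by eye, a small programmatic check may be more practical. The sketch below looks for the header format used in this notebook and for rows that do not have exactly two non-empty columns. The function name `check_table_tsv` is hypothetical, and flagging duplicate IDs rests on the assumption that each row of the table should have a unique ID.

```python
def check_table_tsv(tsv_text, table_name):
    """Return a list of problems found in TSV text (an empty list if none)."""
    problems = []
    lines = tsv_text.splitlines()
    expected_header = f"entity:{table_name}_id\tfile_location"
    if not lines or lines[0] != expected_header:
        problems.append("header does not match " + repr(expected_header))
    seen = set()
    # Data rows start on line 2 of the file.
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != 2 or not all(fields):
            problems.append(f"line {number}: expected two non-empty columns")
        elif fields[0] in seen:
            problems.append(f"line {number}: duplicate ID {fields[0]!r}")
        else:
            seen.add(fields[0])
    return problems
```

One way to use it in this notebook would be `check_table_tsv(open("final.tsv").read(), TABLE_NAME)`, where an empty list means none of these problems were found.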

## Upload TSV file as a Terra data table

In [None]:
response = fapi.upload_entities_tsv(BILLING_PROJECT_ID, WORKSPACE, "final.tsv", "flexible")
fapi._check_response_code(response, 200)

## DEBUG: Copy the TSV to the workspace bucket so it can be downloaded and imported via the GUI

In [None]:
!gsutil cp final.tsv $bucket