# Treehouse Storage Management

Implement the [Treehouse Storage Management](https://docs.google.com/document/d/1otNDUQIGOY4zqPBAp4OzUnhXAmt1FHrJjqjA2jsUBrI/edit?pli=1#heading=h.ly71etsanuvd) policies for local and S3 storage in the archive.

In [1]:
import os
import subprocess
from datetime import datetime, timedelta
import pandas as pd
import boto3

bucket = "archive-treehouse-ucsc-edu"

### Load S3 Inventory
The archive bucket is configured to generate a daily inventory of all objects. Download the most recent one and load it into a DataFrame.

In [2]:
s3 = boto3.client('s3')
# Note: list_objects_v2 returns at most 1000 keys per call
response = s3.list_objects_v2(Bucket=bucket, Prefix="inventory/archive-treehouse-ucsc-edu/all/data/")
# Skip zero-byte keys: the data/ folder itself is listed as one of the keys
files = [f for f in response["Contents"] if f["Size"] > 0]
print(f"Found {len(files)} Inventories")
# Pick the most recently modified inventory file
latest_obj = max(files, key=lambda obj: obj["LastModified"])
latest = latest_obj["Key"]
# Reading an s3:// URL with pandas requires the s3fs package
inventory = pd.read_csv("s3://{}/{}".format(bucket, latest), compression="gzip",
                        names=["bucket", "key", "version", "latest", "?",
                               "size", "created", "etag", "class", "??", "???", "encryption"],
                        parse_dates=["created"])
print("Using Inventory Dated", latest_obj["LastModified"])
inventory.head()

Found 27 Inventories
Using Inventory Dated 2018-06-26 10:13:49+00:00


| | bucket | key | version | latest | ? | size | created | etag | class | ?? | ??? | encryption |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | archive-treehouse-ucsc-edu | compendium/pre_v4/README.txt | | True | False | 232 | 2018-05-30 17:30:07 | 819dab68d80ed9b2f6648a1b51485a03 | STANDARD | False | | SSE-S3 |
| 1 | archive-treehouse-ucsc-edu | compendium/pre_v4/TCGA_mutations/Census_allWed... | | True | False | 124659 | 2018-05-30 17:30:07 | cd1f09273d76eea6d77e0ac730d91b30 | STANDARD | False | | SSE-S3 |
| 2 | archive-treehouse-ucsc-edu | compendium/pre_v4/TCGA_mutations/TCGA_Broad_mu... | | True | False | 4790 | 2018-05-30 17:30:07 | 4ef5284b62a5344119acee834fc9215d | STANDARD | False | | SSE-S3 |
| 3 | archive-treehouse-ucsc-edu | compendium/pre_v4/TCGA_mutations/TCGA_NonSilen... | | True | False | 1158114 | 2018-05-30 17:30:08 | 9ef8a17da3ddb54a4f99b4b86853f22c | STANDARD | False | | SSE-S3 |
| 4 | archive-treehouse-ucsc-edu | compendium/pre_v4/TCGA_mutations/UCSF-RNAPanel... | | True | False | 2990 | 2018-05-30 17:30:11 | fb4c7ac59552b24b41343dcf49372c38 | STANDARD | False | | SSE-S3 |
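A single `list_objects_v2` call returns at most 1,000 keys, so the lookup above would silently miss newer inventories once the prefix grows past that. The following is a minimal sketch of a paginated lookup, not the notebook's own method. The helper name `latest_inventory_object` is hypothetical, and it takes the client as a parameter so it can be exercised without AWS access.

```python
def latest_inventory_object(client, bucket, prefix):
    """Return the newest non-empty object under prefix, or None if there is none.

    `client` is assumed to be a boto3 S3 client (or anything exposing
    get_paginator("list_objects_v2") with the same response shape).
    """
    paginator = client.get_paginator("list_objects_v2")
    newest = None
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that has no matching keys
        for obj in page.get("Contents", []):
            if obj["Size"] == 0:
                continue  # skip the zero-byte folder placeholder key
            if newest is None or obj["LastModified"] > newest["LastModified"]:
                newest = obj
    return newest
```

In this notebook it could be called as `latest_inventory_object(s3, bucket, "inventory/archive-treehouse-ucsc-edu/all/data/")`, using the returned object's `"Key"` and `"LastModified"` fields.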


### Validate
Make sure all our local archive files are in the inventory. Any path printed below exists locally but is missing from the inventory.

In [4]:
# Path to archive within the container we are running in
root = "/treehouse/archive"
subset = "downstream"
# Build a set once so each membership test is O(1)
inventory_keys = set(inventory["key"].values)
for path, _, files in os.walk(os.path.join(root, subset)):
    for f in files:
        relative_path = os.path.relpath(path, root) + "/" + f
        # Inventory keys encode spaces as "+"
        if relative_path.replace(" ", "+") not in inventory_keys:
            print(relative_path)

downstream/TH34_1162_S01/findings/Slides.html
downstream/TH34_1162_S01/findings/Summary_manual-edit_May24_2.html
downstream/TH34_1162_S01/findings/annotations.json
downstream/TH34_1162_S01/findings/pathway.png
downstream/TH34_1162_S01/findings/tumormap.png
downstream/TH34_1163_S01/findings/Slides.html
downstream/TH34_1163_S01/findings/Summary.html
downstream/TH34_1163_S01/findings/Summary.html.original
downstream/TH34_1163_S01/findings/pathway.png
downstream/TH34_1163_S01/findings/tumormap.png
downstream/TH34_1163_S01/findings/annotations.json
downstream/TH27_1164_S02/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH27_1164_S02/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH27_1164_S02/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH27_1164_S02/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH27_1164_S02/tertiary/treehouse-p

downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/1_convert-tpm-hugo.ipynb
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.0_tumormap-disease-cohorts.ipynb
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.2_generate-additional-cohorts.ipynb
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.5_generate-tumormap-report.ipynb
downstream/TH27_1170_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.6_MSS-mutation-data.ipynb
downstream/TH27_1170_S01

downstream/TH27_1169_S01/findings/v11.0.0/treehouse/css/summary.template.css
downstream/TH27_1169_S01/findings/v11.0.0/treehouse/css/slides.template.css
downstream/TH27_1169_S01/findings/v11.0.0/treehouse/css/HTML-Sheets-of-Paper/sheets-of-paper.css
downstream/TH27_1169_S01/findings/v11.0.0/treehouse/css/HTML-Sheets-of-Paper/sheets-of-paper-usletter.css
downstream/TH27_1169_S01/findings/v11.0.0/treehouse/js/summary.template.js
downstream/TH27_1167_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH27_1167_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH27_1167_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH27_1167_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH27_1167_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/1_convert-tpm-hugo.ipynb
downstream/TH27_1167_S01/tertiary/treehouse-protocol-11.1

downstream/TH27_1168_S01/findings/annotations.json.11.0.filledout
downstream/TH27_1168_S01/findings/annotations.json
downstream/TH27_1168_S01/findings/v11.0.0/4.1.json
downstream/TH27_1168_S01/findings/v11.0.0/2.0_tumormap-disease-cohorts.ipynb
downstream/TH27_1168_S01/findings/v11.0.0/8.5_parse-vcf.ipynb
downstream/TH27_1168_S01/findings/v11.0.0/automatedLeadsIdentified.tsv
downstream/TH27_1168_S01/findings/v11.0.0/rsem.genes.tpm.hugo.log2plus1.dedupe.tab
downstream/TH27_1168_S01/findings/v11.0.0/1.json
downstream/TH27_1168_S01/findings/v11.0.0/3_generate-thresholds.ipynb
downstream/TH27_1168_S01/findings/v11.0.0/4.0_outlier-analysis.ipynb
downstream/TH27_1168_S01/findings/v11.0.0/4.0.json
downstream/TH27_1168_S01/findings/v11.0.0/5_get-dgidb-gsea.ipynb
downstream/TH27_1168_S01/findings/v11.0.0/6.json
downstream/TH27_1168_S01/findings/v11.0.0/bam_umend_qc.json
downstream/TH27_1168_S01/findings/v11.0.0/7.json
downstream/TH27_1168_S01/findings/v11.0.0/Summary.html
downstream/TH27_1168_S

downstream/TH27_1166_S01/findings/Slides.html
downstream/TH27_1166_S01/findings/annotations.json.blank
downstream/TH27_1166_S01/findings/annotations.json.110.edited
downstream/TH27_1166_S01/findings/annotations.json
downstream/TH27_1166_S01/findings/v11.0/4.1.json
downstream/TH27_1166_S01/findings/v11.0/2.0_tumormap-disease-cohorts.ipynb
downstream/TH27_1166_S01/findings/v11.0/8.5_parse-vcf.ipynb
downstream/TH27_1166_S01/findings/v11.0/automatedLeadsIdentified.tsv
downstream/TH27_1166_S01/findings/v11.0/TH27_1166_S01_gsea_dgidb_output.xlsx
downstream/TH27_1166_S01/findings/v11.0/rsem.genes.tpm.hugo.log2plus1.dedupe.tab
downstream/TH27_1166_S01/findings/v11.0/1.json
downstream/TH27_1166_S01/findings/v11.0/3_generate-thresholds.ipynb
downstream/TH27_1166_S01/findings/v11.0/4.0_outlier-analysis.ipynb
downstream/TH27_1166_S01/findings/v11.0/4.0.json
downstream/TH27_1166_S01/findings/v11.0/5_get-dgidb-gsea.ipynb
downstream/TH27_1166_S01/findings/v11.0/6.json
downstream/TH27_1166_S01/finding

downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/1_convert-tpm-hugo.ipynb
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.0_tumormap-disease-cohorts.ipynb
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.2_generate-additional-cohorts.ipynb
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.5_generate-tumormap-report.ipynb
downstream/TH06_1171_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.6_MSS-mutation-data.ipynb
downstream/TH06_1171_S01

downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/1_convert-tpm-hugo.ipynb
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.0_tumormap-disease-cohorts.ipynb
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.2_generate-additional-cohorts.ipynb
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.5_generate-tumormap-report.ipynb
downstream/TH06_1173_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.6_MSS-mutation-data.ipynb
downstream/TH06_1173_S01

downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/1_convert-tpm-hugo.ipynb
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.0_tumormap-disease-cohorts.ipynb
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.2_generate-additional-cohorts.ipynb
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.5_generate-tumormap-report.ipynb
downstream/TH06_1175_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.6_MSS-mutation-data.ipynb
downstream/TH06_1175_S01

downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/summary.template
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/slides.template
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/annotations.json
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/0_generate-json-conf.ipynb
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/1_convert-tpm-hugo.ipynb
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.0_tumormap-disease-cohorts.ipynb
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.2_generate-additional-cohorts.ipynb
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.5_generate-tumormap-report.ipynb
downstream/TH06_1177_S01/tertiary/treehouse-protocol-11.1.0-3c51aa2/compendium-v7/2.6_MSS-mutation-data.ipynb
downstream/TH06_1177_S01

downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.h5
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/fusion.txt
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/run_info.json
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1183_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/r

downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.tsv
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.h5
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/run_info.json
downstream/THR14_1189_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kal

downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_genes.results
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_isoforms.hugo.results
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_genes.hugo.results
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.h5
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/run_info.json
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.tsv
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/fusion.txt
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1185_S01/secondary/ucsc_cgl-rnaseq-cgl-pi

downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/FusionInspector.bed
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-gene-list-filtered.final
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/Log.final.out
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/FusionInspector.junction_reads.bam
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/FusionInspector.spanning_reads.bam
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/fusion-inspector-results.final
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-non-filtered.final
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/FusionInspector.fa
downstream/THR14_1191_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/methods.json
downstream/THR14_1191_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/mini.ann.vcf
downstream/THR14_1191_S01/secondary

downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/run_info.json
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.h5
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.tsv
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/fusion.txt
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1198_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/ST

downstream/THR14_1201_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam.bai
downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam
downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.json
downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/readDist.txt
downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.tsv
downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/methods.json
downstream/THR14_1201_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-gene-list-filtered.final
downstream/THR14_1201_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/Log.final.out
downstream/THR14_1201_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-non-filtered.final
downstream/THR14_1201_S01/secondary

downstream/THR14_1204_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.tsv
downstream/THR14_1204_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/readDist.txt
downstream/THR14_1204_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/methods.json
downstream/THR14_1204_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-gene-list-filtered.final
downstream/THR14_1204_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-non-filtered.final
downstream/THR14_1204_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/Log.final.out
downstream/THR14_1204_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/methods.json
downstream/THR14_1204_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/mini.ann.vcf
downstream/THR14_1204_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/methods.json
downstream/THR14_1212_S01/secondary/md5sum-3.7.0-ccba511/md5
downstream/THR14_1212_S01/secondary/md5sum-3.7.0-ccba511/methods.json
downstream/THR14_1212_S01/secondar

downstream/THR14_1205_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/methods.json
downstream/THR14_1197_S01/secondary/md5sum-3.7.0-ccba511/md5
downstream/THR14_1197_S01/secondary/md5sum-3.7.0-ccba511/methods.json
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/methods.json
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/run_info.json
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/fusion.txt
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.h5
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.tsv
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_isoforms.results
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_genes.results
downstream/THR14_1197_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-78

downstream/THR14_1210_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam.bai
downstream/THR14_1210_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam
downstream/THR14_1210_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.tsv
downstream/THR14_1210_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/methods.json
downstream/THR14_1210_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/Log.final.out
downstream/THR14_1210_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-non-filtered.final
downstream/THR14_1210_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/star-fusion-gene-list-filtered.final
downstream/THR14_1210_S01/secondary/ucsctreehouse-fusion-0.1.0-3faac56/methods.json
downstream/THR14_1210_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/mini.ann.vcf
downstream/THR14_1210_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/methods.json
downstream/THR14_1219_S01/secondary/md5sum-3.7.0-ccba51

downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.tsv
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_isoforms.results
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_genes.results
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_isoforms.hugo.results
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_genes.hugo.results
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1222_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1222_S01/secondary/ucsc_cgl-r

downstream/THR14_1221_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/mini.ann.vcf
downstream/THR14_1221_S01/secondary/ucsctreehouse-mini-var-call-0.0.1-1976429/methods.json
downstream/THR14_1215_S01/secondary/md5sum-3.7.0-ccba511/md5
downstream/THR14_1215_S01/secondary/md5sum-3.7.0-ccba511/methods.json
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/methods.json
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_isoforms.results
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_genes.results
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_isoforms.hugo.results
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_genes.hugo.results
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/Kallisto/abundance.h5
downstream/THR14_1215_S01/secondary/ucsc_cgl-rnaseq-cgl-

downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_isoforms.results
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_genes.results
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_genes.hugo.results
downstream/THR14_1233_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-

downstream/THR14_1217_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1217_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1217_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1217_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1217_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1217_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.json
downstream/THR14_1217_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.tsv
downstream/THR14_1217_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam.bai
downstream/THR14_1217_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/readDist.txt
downstream/THR14_1217_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam
downstream/

downstream/THR14_1227_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1227_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1227_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1227_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1227_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1227_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1227_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.json
downstream/THR14_1227_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/bam_umend_qc.tsv
downstream/THR14_1227_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/sortedByCoord.md.bam.bai
downstream/THR14_1227_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/readDist.txt
downst

downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/rsem_genes.results
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_genes.hugo.results
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/RSEM/Hugo/rsem_isoforms.hugo.results
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.zip
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.zip
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R1_fastqc.html
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/fastQC/R2_fastqc.html
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/SJ.out.tab
downstream/THR14_1237_S01/secondary/ucsc_cgl-rnaseq-cgl-pipeline-3.3.4-785eee9/QC/STAR/Log.final.out
downstream/THR14_1237_S01/secondary/ucsctreehouse-bam-u
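The check above only translates spaces to `+`. Keys in CSV-format S3 inventories are URL-encoded, so other special characters would also differ from the local name. The sketch below decodes the inventory keys instead and compares against a set. The function name `missing_from_inventory` is hypothetical, and this is one possible approach under that encoding assumption rather than the notebook's own method.

```python
import os
from urllib.parse import unquote_plus


def missing_from_inventory(root, subset, inventory_keys):
    """Return local files under root/subset whose key is not in inventory_keys.

    inventory_keys is assumed to hold URL-encoded keys, as found in a
    CSV-format inventory; they are decoded once into a set for fast lookup.
    """
    known = {unquote_plus(key) for key in inventory_keys}
    missing = []
    for path, _, files in os.walk(os.path.join(root, subset)):
        for name in files:
            relative = os.path.relpath(os.path.join(path, name), root)
            # S3 keys always use "/" regardless of the local path separator
            relative = relative.replace(os.sep, "/")
            if relative not in known:
                missing.append(relative)
    return sorted(missing)
```

With the variables from this notebook, the call would be `missing_from_inventory(root, subset, inventory["key"])`.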

### Explore
Interrogate the inventory.

In [7]:
TB = 10**12  # sizes are reported in decimal terabytes
is_glacier = inventory["class"] == "GLACIER"
# Raw string avoids invalid escape sequences. This matches any key under
# downstream/ that contains ".bam", including .bam.bai index files.
is_bam = inventory.key.str.contains(r"downstream/.+?\.bam")

print("All: {} {:.3f} TB".format(inventory.shape[0], inventory["size"].sum() / TB))
print("Standard: {} {:.3f} TB".format((~is_glacier).sum(), inventory[~is_glacier]["size"].sum() / TB))
print("Glacier: {} {:.3f} TB".format(is_glacier.sum(), inventory[is_glacier]["size"].sum() / TB))
print("Secondary BAMs: {} {:.3f} TB".format(is_bam.sum(), inventory[is_bam]["size"].sum() / TB))

cutoff = datetime.now() - timedelta(days=180)
secondary_bams = inventory[(inventory.created < cutoff) & is_bam]
print("Secondary BAMs older than 180 days: {} {:.3f} TB".format(
      secondary_bams.shape[0], secondary_bams["size"].sum() / TB))

All: 145209 21.500 TB
Standard: 144372 13.655 TB
Glacier: 837 7.844 TB
Secondary BAMs: 941 9.000 TB
Secondary BAMs older than 180 days: 170 1.025 TB
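The "Secondary BAMs" filter is an unanchored search, so it matches any key under `downstream/` that contains `.bam` anywhere, which includes `.bam.bai` index files. The sketch below compares it with an anchored alternative using plain `re`, which follows the same search semantics as `str.contains`. The `strict` pattern is an illustrative assumption about what "BAM" should cover, not something the policy document is known to specify. The sample keys are taken from the validation listing above.

```python
import re

# Pattern used in the notebook: ".bam" may appear anywhere after downstream/
loose = re.compile(r"downstream/.+?\.bam")
# Hypothetical stricter variant: the key must end in ".bam"
strict = re.compile(r"^downstream/.+\.bam$")

prefix = "downstream/THR14_1201_S01/secondary/ucsctreehouse-bam-umend-qc-1.1.1-5f286d7/"
keys = [
    prefix + "sortedByCoord.md.bam",
    prefix + "sortedByCoord.md.bam.bai",
    prefix + "bam_umend_qc.json",
]

loose_hits = [key for key in keys if loose.search(key)]
strict_hits = [key for key in keys if strict.search(key)]

print(len(loose_hits), "keys match the loose pattern")
print(len(strict_hits), "keys match the strict pattern")
```

Index files are small next to the BAMs themselves, so the size totals are unlikely to change much, but the object counts would.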
