# Leonardemo (demonardo?)

### Hey, look at this notebook running Python!

In [None]:
print("Hello, Jupyter!")
print(1 + 1)
print(" ".join(map(str.upper, ["i", "said", "hello,", "jupyter"])))

### Connecting to FireCloud

Let's import the FireCloud Python API. Shout out to the folks in CGA who maintain it!

In [None]:
import firecloud.api as fc

Is FC even up? This will be a short demo if it isn't:

In [None]:
health = fc.health()
print("{} {}".format(health.status_code, health.text))

Let's list the names and namespaces of the first ten workspaces we can see:

In [None]:
workspaces = fc.list_workspaces().json()
[ws["workspace"]["name"] + "/" + ws["workspace"]["namespace"] for ws in workspaces[:10]]
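
For readers without FireCloud access, here is a minimal sketch of the same transformation on hand-written data. The `sample_workspaces` list is made up and only mimics the nested `workspace` / `name` / `namespace` shape that the cell above relies on; `workspace_label` is a hypothetical helper, not part of the FireCloud API.

```python
# Hypothetical sample data mimicking the shape used above; not a real listing.
sample_workspaces = [
    {"workspace": {"name": "Notebooks-Demo", "namespace": "broad-dsde-firecloud-billing"}},
    {"workspace": {"name": "another-workspace", "namespace": "some-billing-project"}},
]

def workspace_label(ws):
    """Return "name/namespace" for one entry of the workspace listing."""
    info = ws["workspace"]
    return info["name"] + "/" + info["namespace"]

# Slice first so that at most ten entries are formatted.
labels = [workspace_label(ws) for ws in sample_workspaces[:10]]
print(labels)
```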

Fetch our demo workspace and print its description:

In [None]:
demo_ws = fc.get_workspace("broad-dsde-firecloud-billing", "Notebooks-Demo").json()
print(demo_ws["workspace"]["attributes"]["description"])

### Manipulating workspace data

List the entity types defined in the workspace:

In [None]:
fc.list_entity_types("broad-dsde-firecloud-billing", "Notebooks-Demo").json().keys()

Get the list of participants and all their attributes from the workspace.

In [None]:
participants = fc.get_entities("broad-dsde-firecloud-billing", "Notebooks-Demo", "participant").json()
print("{} participants in this workspace".format(len(participants)))
participants

Time to get graphy. First, import pandas, and turn the returned JSON into a pandas DataFrame.

In [None]:
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
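def entity_to_row(entity):
    # Flatten one entity into a single dict: its name plus all its attributes.
    attrs = {"name": entity["name"]}
    attrs.update(entity["attributes"])
    attrs["age"] = int(attrs["age"])  # make sure age is numeric for plotting
    return attrs

# Build a real list (not a lazy map object) so pandas can consume it.
cleaned_ents = [entity_to_row(p) for p in participants]

parts_df = pd.DataFrame(cleaned_ents).set_index("name")
parts_df.head(10)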
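
To see what `entity_to_row` does without a live workspace, here is a sketch run on one made-up entity. The `gender` attribute and all of the values are invented for illustration; the real participants may carry different attributes.

```python
def entity_to_row(entity):
    # Same flattening as the cell above: name plus attributes, age as int.
    attrs = {"name": entity["name"]}
    attrs.update(entity["attributes"])
    attrs["age"] = int(attrs["age"])
    return attrs

# Hypothetical entity; "gender" and every value here are illustrative only.
sample_entity = {
    "name": "participant_001",
    "attributes": {"age": "42", "gender": "F"},
}

row = entity_to_row(sample_entity)
print(row)
```

Each row is a flat dict, so a list of them maps directly onto DataFrame columns, with `name` available to use as the index.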

What does our distribution of participant ages look like?

In [None]:
parts_df.hist(column='age')

### Doing things with VCFs

Grab a gs:// path to a VCF from the participant set in the workspace:

In [None]:
pset = fc.get_entity("broad-dsde-firecloud-billing", "Notebooks-Demo", "participant_set", "T2D_Cohort").json()
pset["attributes"]["snps_indels_svs_vcf"]
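
The download cell below needs a bucket name and an object name rather than a full gs:// path. Here is a minimal sketch of splitting one into the other, assuming the attribute value is a plain `gs://bucket/object` string. `split_gs_path` is a hypothetical helper, not part of any library, and the example path is simply the bucket and file name used elsewhere in this notebook joined together.

```python
def split_gs_path(gs_path):
    """Split "gs://bucket/path/to/object" into (bucket_name, blob_name)."""
    prefix = "gs://"
    if not gs_path.startswith(prefix):
        raise ValueError("not a gs:// path: " + gs_path)
    # Everything up to the first "/" is the bucket; the rest is the object name.
    bucket_name, _, blob_name = gs_path[len(prefix):].partition("/")
    return bucket_name, blob_name

bucket_name, blob_name = split_gs_path(
    "gs://fc-4236f90d-9fdc-4772-a3b8-f218d000b002/participants_small.vcf"
)
print(bucket_name)
print(blob_name)
```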

Use the Google Cloud Python library to download it to disk:

In [None]:
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('fc-4236f90d-9fdc-4772-a3b8-f218d000b002')
blob = bucket.get_blob('participants_small.vcf')
blob.download_to_filename("participants_small.vcf")

You can call out to bash from notebooks. Here we grep to get the sample names.

In [None]:
! grep -m1 "#CHROM" participants_small.vcf | cut -f 10- | xargs -n 1
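
The same extraction can be sketched in pure Python. In a VCF, the `#CHROM` header line is tab-separated and, when genotype data is present, the sample names start at the tenth column (after the eight fixed columns and `FORMAT`), which is what `cut -f 10-` selects. The header text below is invented sample data and `sample_names` is a hypothetical helper.

```python
def sample_names(vcf_lines):
    """Return the sample names from the first #CHROM header line, or []."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed VCF fields; samples start at column 10.
            return line.rstrip("\n").split("\t")[9:]
    return []

# Invented lines for illustration; a real file would be read with open(...).
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_A\tsample_B\n",
    "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n",
]
print(sample_names(example_lines))
```

Passing an open file handle for `participants_small.vcf` instead of `example_lines` should work the same way, since the function only iterates over lines.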

### Hail support

In [None]:
from hail import *
hc = HailContext(sc)

bucket = 'gs://fc-4236f90d-9fdc-4772-a3b8-f218d000b002'

vcf = hc.import_vcf(bucket + '/participants_small.vcf')
vcf.write(bucket + '/participants_small.vds')

In [None]:
vds = hc.read(bucket + '/participants_small.vds').split_multi().sample_qc().variant_qc()
vds.export_variants(bucket + '/variantqc.tsv', 'Variant = v, va.qc.*')
vds.write(bucket + '/participants_small.qc.vds')

In [None]:
print('count:')
print(vds.count())
print('summary report:')
print(vds.summarize().report())
print('sample annotation schema:')
print(vds.sample_schema)
print('\nvariant annotation schema:')
print(vds.variant_schema)