# How to store analysis results in BigQuery and Cloud Storage

In this notebook we use public data to perform a simple analysis, storing the resulting plot as an image file in Cloud Storage and the derived data as a BigQuery table.

<div class="alert alert-block alert-info">
<b>Tip:</b> See also the companion Terra Support article <a href='https://support.terra.bio/hc/en-us/articles/360051229072-Accessing-Advanced-GCP-features-in-Terra'>Accessing Advanced GCP features in Terra</a>.
</div>

# Setup

Edit the global variables in your clone of this notebook to refer to a native Google Cloud Platform project to which you have WRITE access.
* **The destination Cloud Storage bucket should already exist. Your pet account must have WRITE access to it.**       
[**Click for step-by-step instructions to create a bucket**](https://support.terra.bio/hc/en-us/articles/360051229072#h_01ENRE43JJYSFHNDC02YGWFYWJ)     

* **The destination BigQuery dataset should already exist. Your pet account must have WRITE access to it.**       
[**Click for step-by-step instructions to create a BQ dataset**](https://support.terra.bio/hc/en-us/articles/360051229072#h_01EPCCS08S69VE4VMT0F0NNDWR)     

* Make sure to change to your own project, bucket, and dataset names. The remaining cells can be run as-is.

In [None]:
import os
import time

import pandas as pd
import plotnine
import tensorflow as tf
from plotnine import *

In [None]:
# Set a default plot size.
plotnine.options.figure_size = (10, 6)

**Note that you will need to change the variables below to your own values** (expand the tips if you need help finding the variables)

In [None]:
# CHANGE THESE VARIABLES
PROJECT_ID = "your_GCP-native_project_ID"
BUCKET = "gs://your-bucket"
BQ_DATASET = "your_BQ_dataset"
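
Before running the remaining cells, it can help to confirm that the three variables were actually edited. The following is a minimal sketch, not part of the original notebook: `check_settings` and `PLACEHOLDERS` are hypothetical names, and the dataset pattern is a deliberately conservative assumption (letters, digits, and underscores). A trailing `/` on the bucket is flagged because the `gsutil` cell later builds the object path as `$BUCKET/$filename`.

```python
import re

# Placeholder values copied from the cell above. Treating them as
# "not yet edited" is an assumption of this sketch.
PLACEHOLDERS = {"your_GCP-native_project_ID", "gs://your-bucket", "your_BQ_dataset"}


def check_settings(project_id, bucket, bq_dataset):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    settings = [
        ("PROJECT_ID", project_id),
        ("BUCKET", bucket),
        ("BQ_DATASET", bq_dataset),
    ]
    for name, value in settings:
        if value in PLACEHOLDERS:
            problems.append(f"{name} still has its placeholder value")
    if not bucket.startswith("gs://"):
        problems.append("BUCKET should start with 'gs://'")
    if bucket.endswith("/"):
        problems.append("BUCKET should not end with '/'")
    # Conservative pattern: letters, digits, and underscores only.
    if not re.fullmatch(r"[A-Za-z0-9_]+", bq_dataset):
        problems.append("BQ_DATASET should contain only letters, digits, and underscores")
    return problems
```

Calling `check_settings(PROJECT_ID, BUCKET, BQ_DATASET)` in a new cell should return an empty list once all three values are your own. It does not check that the bucket or dataset exists, or that your pet account can write to them.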

### <font color="#FF6600">(expand for tip) </font> <font color="#445555">How to find your cloud-native project-ID</font><a class="tocSkip">

When logged in with your Terra user-ID, go to billing in the GCP console at [https://console.cloud.google.com/billing](https://console.cloud.google.com/billing)     
![finding project ID screen shot](https://storage.googleapis.com/terra-featured-workspaces/QuickStart/Advanced-GCP-features_Find-Project-ID_Step1_Screen%20shot.png)

1. Select the Organization you used when creating your cloud-native project    
2. Find the Project ID on the right  

### <font color="#FF6600">(expand for tip) </font> <font color="#445555">How to find your cloud-native BigQuery dataset</font><a class="tocSkip">

Go to [https://console.cloud.google.com/bigquery](https://console.cloud.google.com/bigquery)   

In the left column, select your cloud-native Project from the drop-down. You should see your BQ dataset listed:   

![Find BQ dataset screen shot](https://storage.googleapis.com/terra-featured-workspaces/QuickStart/Advanced-GCP-features_Find-BQ-dataset-name_Screen%20shot.png)

# Analyze public data

## Load data from BigQuery

In [None]:
df = pd.read_gbq(
    """
  SELECT
    *
  FROM
    `genomics-public-data.1000_genomes.sample_info`
"""
)

df.shape

In [None]:
df.head()

## Plot the data

In [None]:
p = (
    ggplot(df, aes(x="Main_Project_E_Centers", y="Total_Exome_Sequence"))
    + geom_boxplot()
    + theme_minimal()
)
p

## Save the plot to Cloud Storage

In [None]:
filename = "plot-from-terra-" + time.strftime("%Y%m%d-%H%M%S") + ".png"
# Open the destination in binary mode, since PNG data is bytes rather than text.
with tf.io.gfile.GFile(os.path.join(BUCKET, filename), "wb") as f:
    p.save(f, format="png", dpi=150)

Set the content type of the image to `image/png` so that we can view it directly in the Cloud Console.

In [None]:
!gsutil -m setmeta -h 'Content-Type:image/png' $BUCKET/$filename
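
As an alternative to writing the file and then correcting its metadata, the content type can be set at upload time. The sketch below is not the method this notebook uses. It assumes the `google-cloud-storage` package is installed and that `client` is a `google.cloud.storage.Client`; `upload_plot_as_png` is a hypothetical helper name. It renders the plot into memory and uploads the bytes with `content_type="image/png"`.

```python
import io


def upload_plot_as_png(plot, client, bucket_name, blob_name, dpi=150):
    """Render a plot to PNG in memory and upload it as image/png.

    Assumptions of this sketch: `plot` has a plotnine-style `save` method,
    `client` is a google.cloud.storage.Client, and `bucket_name` is the
    bucket name WITHOUT the gs:// prefix.
    """
    buffer = io.BytesIO()
    # Same save call as in the cell above, but targeting an in-memory buffer.
    plot.save(buffer, format="png", dpi=dpi)
    blob = client.bucket(bucket_name).blob(blob_name)
    # Setting the content type here makes a later `gsutil setmeta` unnecessary.
    blob.upload_from_string(buffer.getvalue(), content_type="image/png")
    return f"gs://{bucket_name}/{blob_name}"
```

In this notebook it could be called as `upload_plot_as_png(p, storage.Client(project=PROJECT_ID), BUCKET[len("gs://"):], filename)` after `from google.cloud import storage`.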

In [None]:
print(
    f"""
       The plot image can now be viewed at 
       https://console.cloud.google.com/storage/browser/{BUCKET[len('gs://'):]}?project={PROJECT_ID}
       """
)

## Write the dataframe to BigQuery

In [None]:
df.iloc[:, 0:10].to_gbq(
    ".".join([BQ_DATASET, "dataframe_from_terra_" + time.strftime("%Y%m%d_%H%M%S")]),
    project_id=PROJECT_ID,
)
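
The table name above gets a timestamp suffix, so each run writes to a new table rather than colliding with an earlier one (by default, `to_gbq` fails if the destination table already exists). The naming step can be sketched as a small helper. `timestamped_table` is a hypothetical name, and the character check is a conservative assumption of this sketch rather than BigQuery's full naming rule.

```python
import re
import time


def timestamped_table(dataset, prefix, when=None):
    """Build a 'dataset.table' destination string with a timestamp suffix.

    `when` is an optional time.struct_time; the current local time is
    used if it is omitted.
    """
    suffix = time.strftime("%Y%m%d_%H%M%S", when or time.localtime())
    table = f"{prefix}{suffix}"
    # Conservative check (an assumption of this sketch): letters, digits, underscores.
    if not re.fullmatch(r"[A-Za-z0-9_]+", table):
        raise ValueError(f"Unexpected characters in table name: {table!r}")
    return f"{dataset}.{table}"
```

For example, `timestamped_table(BQ_DATASET, "dataframe_from_terra_")` produces the same kind of destination string that is passed to `to_gbq` above.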

In [None]:
print(
    f"""
       The BigQuery table can now be viewed at
       https://console.cloud.google.com/bigquery?project={PROJECT_ID}
       """
)

# Provenance

In [None]:
import datetime

print(datetime.datetime.now())

In [None]:
!pip3 freeze
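
The full `pip3 freeze` listing can be long. As a complementary sketch, the versions of just the packages this notebook imports can be recorded with the standard library's `importlib.metadata` (Python 3.8 or later). `package_versions` is a hypothetical helper name, and the package list in the usage note is an assumption based on the imports at the top of the notebook.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, `package_versions(["pandas", "plotnine", "tensorflow"])` returns a small dictionary that can be printed alongside the timestamp above.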

Copyright 2018 The Broad Institute, Inc., Verily Life Sciences, LLC All rights reserved.

This software may be modified and distributed under the terms of the BSD license. See the LICENSE file for details.