# How to use Facets for interactive visualization of data

[Facets](https://pair-code.github.io/facets/) is part of Google's [People+AI Research Initiative (PAIR)](https://ai.google/pair).

Note: As an alternative to this notebook, you can explore the data with the [1000 Genomes Data Explorer](https://test-data-explorer.appspot.com). For other datasets, check [whether a Data Explorer exists](https://app.terra.bio/#library/datasets) for your dataset.

## Setup

First, be sure to run the notebook **`Python environment setup`** in this workspace.

Then, in this section, we:

1. Install Facets.
2. Load the needed Python packages.
3. Set the ID of the cloud project to bill for queries to BigQuery.

### Install/update packages

In [1]:
!git clone https://github.com/PAIR-code/facets

fatal: destination path 'facets' already exists and is not an empty directory.
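The `fatal` message above appears whenever the `facets` directory is already present, for example when the notebook is re-run. A minimal sketch of a re-runnable clone step is shown below; the helper name `ensure_clone` and its `run` parameter are hypothetical and not part of Facets or of this workspace.

```python
import os
import subprocess

FACETS_REPO = "https://github.com/PAIR-code/facets"


def ensure_clone(dest="facets", repo=FACETS_REPO, run=subprocess.run):
    """Clone `repo` into `dest` unless `dest` already exists.

    Returns True if a clone was started, False if it was skipped.
    `run` is injectable so the logic can be exercised without network access.
    """
    if os.path.isdir(dest):
        return False  # directory already present; nothing to do
    run(["git", "clone", repo, dest], check=True)
    return True


# In the notebook, this would replace the `!git clone` cell:
# ensure_clone()
```

Note that this only checks whether the directory exists; it does not verify that the directory holds a complete or current checkout.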


In [2]:
!jupyter nbextension install facets/facets-dist/ --user

Up to date: /home/jupyter-user/.local/share/jupyter/nbextensions/facets-dist/facets-jupyter.html

    To initialize this nbextension in the browser every time the notebook (or other app) loads:
    
          jupyter nbextension enable <the entry point> --user
    


### Load the dependencies


In [3]:
import base64
import sys
import os
import pandas as pd



In [4]:
sys.path.append(os.path.abspath('./facets/facets_overview/python/'))
from generic_feature_statistics_generator import GenericFeatureStatisticsGenerator

In [5]:
BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
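Reading `os.environ['GOOGLE_PROJECT']` directly raises a bare `KeyError` when the variable is not set, which can happen outside a workspace that defines it. The sketch below shows one way to fail with a more descriptive message; `get_billing_project` is a hypothetical helper, not an existing API.

```python
import os


def get_billing_project(env=os.environ, var="GOOGLE_PROJECT"):
    """Return the billing project ID from `env`, or raise a readable error."""
    project = env.get(var, "").strip()
    if not project:
        raise RuntimeError(
            f"Environment variable {var} is not set; set it to the Google "
            "Cloud project that should be billed for BigQuery queries."
        )
    return project


# In the notebook:
# BILLING_PROJECT_ID = get_billing_project()
```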

### Add the wrapper code

In [6]:
class FacetsOverview(object):
  def __init__(self, data):
    # Take the dataframe and compute all the inputs to the Facets Overview plots, such as:
    # - numeric variables: histogram bins, mean, min, median, max, etc.
    # - categorical variables: number of unique values, counts per category for the bar chart, top category, etc.
    gfsg = GenericFeatureStatisticsGenerator()
    self._proto = gfsg.ProtoFromDataFrames(
        [{'name': 'data', 'table': data}])
  
  def _repr_html_(self):
    protostr = base64.b64encode(self._proto.SerializeToString()).decode("utf-8")
    HTML_TEMPLATE = """<link rel="import" href="facets/facets-dist/facets-jupyter.html" >
            <facets-overview id="overview_elem"></facets-overview>
            <script>
              document.querySelector("#overview_elem").protoInput = "{protostr}";
            </script>"""
    html = HTML_TEMPLATE.format(protostr=protostr)
    return html
  
class FacetsDive(object):
  def __init__(self, data):
    self._data = data
    self.height = 1000
    
  def _repr_html_(self):
    HTML_TEMPLATE = """<link rel="import" href="facets/facets-dist/facets-jupyter.html" >
        <facets-dive id="dive_elem" height="{height}"></facets-dive>
        <script>
          document.querySelector("#dive_elem").data = {data};
        </script>"""
    html = HTML_TEMPLATE.format(data=self._data.to_json(orient='records'), height=self.height)
    return html
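Both wrappers use a fixed element id (`overview_elem`, `dive_elem`), so `document.querySelector` will find the first matching element if the same wrapper is displayed more than once in a notebook. The sketch below shows one possible variation for the Dive HTML that generates an id per call. The function `dive_html` is hypothetical, takes a list of dicts rather than a dataframe so that it needs only the standard library, and also escapes `</` as a precaution that the original wrapper does not take.

```python
import json
import uuid


def dive_html(records, height=1000, elem_id=None):
    """Build Facets Dive HTML with a per-call element id.

    `records` is a list of dicts, i.e. the parsed form of
    df.to_json(orient='records').
    """
    elem_id = elem_id or "dive_" + uuid.uuid4().hex
    # Escape "</" so a string value containing "</script>" cannot
    # terminate the script tag early. "\/" is a valid JSON/JS escape.
    data = json.dumps(records).replace("</", "<\\/")
    template = """<link rel="import" href="facets/facets-dist/facets-jupyter.html" >
        <facets-dive id="{elem_id}" height="{height}"></facets-dive>
        <script>
          document.querySelector("#{elem_id}").data = {data};
        </script>"""
    return template.format(elem_id=elem_id, height=height, data=data)


# Example with two made-up records:
html = dive_html([{"Sample": "A", "Gender": "female"},
                  {"Sample": "B", "Gender": "male"}])
```

Inside `FacetsDive._repr_html_`, the same idea would amount to generating the id once per call and formatting it into the template in place of `dive_elem`.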

# Load some public data from BigQuery

In [7]:
df = pd.io.gbq.read_gbq('''
  SELECT
    *
  FROM
    `genomics-public-data.1000_genomes.sample_info`
''',
                        project_id=BILLING_PROJECT_ID,
                        dialect='standard')

df.shape

Downloading: 100%|██████████| 3500/3500 [00:01<00:00, 1771.11rows/s]


(3500, 62)
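The query above selects all 62 columns. When only a few fields are of interest, a narrower query keeps the visualizations smaller. The sketch below builds such a query string; `build_query` is a hypothetical helper, and the column names in the example are taken from the `df.head()` output in this notebook. The helper does not validate or quote column names, so it is only suitable for trusted input.

```python
TABLE = "genomics-public-data.1000_genomes.sample_info"


def build_query(columns=None, limit=None, table=TABLE):
    """Return a standard-SQL SELECT over `table`.

    `columns` is a list of column names (None selects all columns);
    `limit` optionally caps the number of rows returned.
    """
    cols = ",\n    ".join(columns) if columns else "*"
    sql = f"SELECT\n    {cols}\n  FROM\n    `{table}`"
    if limit is not None:
        sql += f"\n  LIMIT {int(limit)}"
    return sql


query = build_query(["Sample", "Population", "Gender", "Super_Population"])
print(query)

# In the notebook, the string would be passed to the same call as above:
# df_small = pd.io.gbq.read_gbq(query, project_id=BILLING_PROJECT_ID,
#                               dialect='standard')
```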

In [8]:
df.head()

| | Sample | Family_ID | Population | Population_Description | Gender | Relationship | Unexpected_Parent_Child | Non_Paternity | Siblings | Grandparents | ... | In_Final_Phase_Variant_Calling | Has_Omni_Genotypes | Has_Axiom_Genotypes | Has_Affy_6_0_Genotypes | Has_Exome_LOF_Genotypes | EBV_Coverage | DNA_Source_from_Coriell | Has_Sequence_from_Blood_in_Index | Super_Population | Super_Population_Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HG00144 | GBR001 | GBR | British in England and Scotland | female | mother | HG00155 | | | | ... | | | | | | | | | EUR | European |
| 1 | HG00147 | GBR002a | GBR | British in England and Scotland | female | child | | | HG00146 | | ... | | | | | | | | | EUR | European |
| 2 | HG00153 | GBR003 | GBR | British in England and Scotland | female | child | | | | | ... | | True | | | True | | | | EUR | European |
| 3 | HG00248 | GBR004 | GBR | British in England and Scotland | female | child | HG00247 | | | | ... | | | | | | | | | EUR | European |
| 4 | HG00377 | HG00377 | FIN | Finnish in Finland | female | | | | | | ... | | True | | | True | | | | EUR | European |


# Facets Overview

**If you do not see Facets Overview**, click the 'Not Trusted' button in the upper-right corner of the screen and change it to 'Trusted'.

See the [IPython notebook security documentation](https://ipython.org/ipython-doc/3/notebook/security.html) for more detail about 'trusted' and 'untrusted' notebooks.

In [9]:
FacetsOverview(df)

# Facets Dive

**If you do not see Facets Dive**, click the 'Not Trusted' button in the upper-right corner of the screen and change it to 'Trusted'.

See the [IPython notebook security documentation](https://ipython.org/ipython-doc/3/notebook/security.html) for more detail about 'trusted' and 'untrusted' notebooks.

In [10]:
FacetsDive(df)

# Provenance

In [11]:
import datetime
print(datetime.datetime.now())

2020-01-13 21:41:00.498019
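`datetime.datetime.now()` returns a naive local time, so the timestamp above carries no time zone. A small sketch of a more self-describing provenance record follows; the helper `provenance_stamp` and its field names are hypothetical.

```python
import datetime
import platform
import sys


def provenance_stamp(now=None):
    """Return a dict with a UTC timestamp and basic interpreter details."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        "timestamp_utc": now.isoformat(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


print(provenance_stamp())
```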


In [12]:
!pip3 freeze

google-api-core==1.15.0
google-auth-oauthlib==0.4.1
google-cloud-bigquery==1.23.1
google-cloud-core==1.1.0
ibis-framework==1.2.0
multipledispatch==0.6.0
oauthlib==3.1.0
pandas==0.25.3
pandas-gbq==0.13.0
pydata-google-auth==0.2.1
regex==2020.1.8
requests-oauthlib==1.3.0
six==1.13.0
toolz==0.10.0
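When only a few packages matter for reproducing the notebook, their versions can also be read programmatically. The sketch below assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library; `package_versions` is a hypothetical helper, and the package names in the example are taken from the `pip3 freeze` output above.

```python
from importlib import metadata  # standard library in Python 3.8+


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(package_versions(["pandas", "pandas-gbq", "google-cloud-bigquery"]))
```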


Copyright 2018 The Broad Institute, Inc., Verily Life Sciences, LLC All rights reserved.

This software may be modified and distributed under the terms of the BSD license. See the LICENSE file for details.