# Review model results - Step 1 - Identify a sample to review

In this notebook, we review phenotypes and model results to identify one or more sample IDs of interest.

In a separate notebook, we'll take a closer look at the input data to the model for the identified sample(s).

# Setup

<div class="alert alert-block alert-warning">
    This notebook assumes:
    <ul>
        <li><b>Terra</b> is running custom Docker image <kbd>gcr.io/uk-biobank-sek-data/ml4h_terra:20200729_091732</kbd>.</li>
        <li><b>ml4h</b> is running custom Docker image <kbd>gcr.io/broad-ml4cvd/deeplearning:tf2-latest-gpu</kbd>.</li>
    </ul>
</div>

![Screen%20Shot%202020-06-22%20at%202.50.48%20PM.png](attachment:Screen%20Shot%202020-06-22%20at%202.50.48%20PM.png)

In [None]:
from ml4h.visualization_tools.facets import FacetsOverview, FacetsDive  # Interactive data exploration of tabular data.
import pandas as pd
import tensorflow as tf

In [None]:
%%javascript
// Display cell outputs to full height (no vertical scroll bar)
IPython.OutputArea.auto_scroll_threshold = 9999;

# Identify a sample to review

<div class="alert alert-block alert-info">
    Edit the CSV filepath below, if needed, to point to either a local file or one in Cloud Storage.
</div>

In [None]:
#---[ EDIT THIS VARIABLE VALUE IF YOU LIKE ]---
# TODO(paolo and team): provide CSV with phenotypes and ML results for fake samples.
MODEL_RESULTS_FILE = 'gs://uk-biobank-sek-data-us-east1/phenotypes/ml4h/ukbiobank_query_results_plus_four_fake_samples.csv'

In [None]:
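# tf.io.gfile.GFile reads both local paths and gs:// URLs.
# The context manager closes the file handle once the CSV is loaded.
with tf.io.gfile.GFile(MODEL_RESULTS_FILE) as f:
    sample_info = pd.read_csv(f)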

sample_info.shape
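
Before visualizing, it can help to confirm that the table contains the columns this notebook refers to. The following is a minimal sketch rather than part of the original workflow: `missing_columns` is a hypothetical helper, the column list is simply collected from the names mentioned in this notebook, and the small `demo` frame with made-up values stands in for `sample_info`.

```python
import pandas as pd

# Column names mentioned in this notebook's Facets Dive suggestions.
EXPECTED_COLUMNS = [
    'sample_id',
    'sex_at_birth',
    'bmi',
    'age_at_assesment',
    'LVM_prediction_sentinel_actual',
    'LVM_prediction_sentinel_prediction',
]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from df (hypothetical helper)."""
    return [col for col in expected if col not in df.columns]


# Made-up stand-in for sample_info, for illustration only.
demo = pd.DataFrame({'sample_id': ['fake_1', 'fake_2'], 'bmi': [22.5, 31.0]})
print(missing_columns(demo))
```

In the notebook itself, calling `missing_columns(sample_info)` should return an empty list if the CSV has all of the columns listed above.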

## Facets Overview

Use this visualization to get an overview of the type and distribution of sample information available.

For detailed instructions, see [Facets Overview](https://pair-code.github.io/facets/).

In [None]:
FacetsOverview(sample_info)

## Facets Dive

Use this visualization to get an overview of the distributions of values for *groups* of samples.

For detailed instructions, see [Facets Dive](https://pair-code.github.io/facets/).

**NOTE**:
* It might take a few seconds for the visualization to appear.
* If the table of contents pane is in the way of the column selector drop-down, click the button to turn the table of contents off.
* Try:
 * Binning | X-Axis: `sex_at_birth`
 * Binning | Y-Axis: `bmi`, use the 'count' drop-down to increase or decrease the number of categorical bins
 * Label By: `sample_id`
 * Color By: `age_at_assesment`
 * Scatter | X-Axis: `LVM_prediction_sentinel_actual`
 * Scatter | Y-Axis: `LVM_prediction_sentinel_prediction`
 
Zoom in and click on the sample(s) of interest. A pane on the right-hand side shows all the data for the selected sample, **including the sample_id**, which you should use for the next step.

In [None]:
FacetsDive(sample_info)
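
As a complement to clicking in the scatter plot, samples can also be shortlisted in pandas. The sketch below ranks samples by the absolute difference between the actual and predicted values. It assumes both columns are numeric. `rank_by_prediction_error` is a hypothetical helper, not part of ml4h, and the `demo` values are made up for illustration.

```python
import pandas as pd


def rank_by_prediction_error(
        df,
        actual_col='LVM_prediction_sentinel_actual',
        predicted_col='LVM_prediction_sentinel_prediction',
        n=5):
    """Return the n rows with the largest absolute prediction error (hypothetical helper)."""
    ranked = df.assign(abs_error=(df[predicted_col] - df[actual_col]).abs())
    columns = ['sample_id', actual_col, predicted_col, 'abs_error']
    return ranked.nlargest(n, 'abs_error')[columns]


# Made-up values for illustration only; pass sample_info in the notebook.
demo = pd.DataFrame({
    'sample_id': ['fake_1', 'fake_2', 'fake_3'],
    'LVM_prediction_sentinel_actual': [100.0, 120.0, 90.0],
    'LVM_prediction_sentinel_prediction': [102.0, 95.0, 91.0],
})
top = rank_by_prediction_error(demo, n=2)
print(top)
```

A large error is only one possible reason to review a sample. The interactive view remains useful for spotting outliers along other dimensions.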

# Provenance

In [None]:
import datetime
print(datetime.datetime.now())

In [None]:
%%bash
pip3 freeze

Questions about these particular notebooks? Reach out to Puneet Batra pbatra@broadinstitute.org, Paolo Di Achille pdiachil@broadinstitute.org, and Nicole Deflaux deflaux@verily.com.