# Integrative Genomics Viewer (IGV) in a Jupyter Notebook Cell

The [Integrative Genomics Viewer](http://igv.org)<sup>1</sup> (IGV) is an open source tool for visualizing and exploring genomic data. [Terra natively supports IGV](https://support.terra.bio/hc/en-us/articles/360029654831-Viewing-IGV-tracks-of-BAM-files-in-your-workspace-data) in the Workspace data tab, but it's also possible to run IGV in a Jupyter notebook using the [`igv-jupyter` extension](https://github.com/igvteam/igv-jupyter).

`igv-jupyter` wraps the embeddable JavaScript `igv.js` visualization component and runs a fully featured IGV instance live in notebook cells. Because the IGV instance can't be saved as notebook output, you won't be able to view it in Terra's Notebook `Preview` mode. You'll need to run this notebook yourself in `Edit` or `Playground` mode.

**IMPORTANT**: You may need to follow the instructions in the "Prerequisites for `igv-jupyter`" section if `igv-jupyter` has not been installed in your environment. A [request](https://github.com/DataBiosphere/terra-docker/issues/271) has been filed to add it to the base `terra-docker` environment.

Run the next cell to determine if the `igv` module provided by `igv-jupyter` exists in your cloud environment.

***

<sup>1</sup>[James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011)](https://www.nature.com/articles/nbt.1754)

In [None]:
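# The `imp` module is deprecated and was removed in Python 3.12; use importlib instead
import importlib.util

# find_spec returns None when the module cannot be found
igv_found = importlib.util.find_spec('igv') is not None
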
print(f'igv {"exists" if igv_found else "does NOT exist"} in your cloud environment.')

## Prerequisites for `igv-jupyter`
### Cloud environment startup scripts

If `igv-jupyter` has not been installed in your cloud environment, you can use an [environment startup script](https://support.terra.bio/hc/en-us/articles/360058193872-Using-a-startup-script-to-launch-a-pre-configured-cloud-environment) to install this extension when creating or updating your cloud environment.

#### Set up workspace globals

In [None]:
# Imports for interacting with the environment and the FireCloud API
import os
from firecloud import api as fapi

BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']

# Workspace metadata
WORKSPACE_NAMESPACE = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE_NAME = os.environ['WORKSPACE_NAME']
WORKSPACE_BUCKET = os.environ['WORKSPACE_BUCKET']

# Key-value pairs from the "Workspace Data" section of the Workspace "Data" tab
WORKSPACE_ATTRIBUTES = fapi.get_workspace(WORKSPACE_NAMESPACE, WORKSPACE_NAME).json().get('workspace',{}).get('attributes',{})

# The full path to the provided environment startup script
IGV_ENV_STARTUP_SCRIPT_FILENAME = 'igv_env_startup_script.sh'
# Build the gs:// URI with '/' explicitly rather than relying on the OS path separator
IGV_ENV_STARTUP_SCRIPT_PATH = f'{WORKSPACE_BUCKET}/{IGV_ENV_STARTUP_SCRIPT_FILENAME}'
print(IGV_ENV_STARTUP_SCRIPT_PATH)

#### Write the startup script to the workspace bucket

In [None]:
script_contents = '''#!/usr/bin/env bash

# Install igv to view the Integrative Genomics Viewer in a Terra notebook cell
pip install igv-jupyter --upgrade
jupyter nbextension install --py igv --user
jupyter nbextension enable --py igv
'''

with open(IGV_ENV_STARTUP_SCRIPT_FILENAME, "w") as text_file:
    text_file.write(script_contents)

!gsutil cp $IGV_ENV_STARTUP_SCRIPT_FILENAME $IGV_ENV_STARTUP_SCRIPT_PATH

#### Confirm the startup script has been written to the workspace bucket


In [None]:
!gsutil ls $IGV_ENV_STARTUP_SCRIPT_PATH

#### Update the cloud environment

Use the URI from the previous cell's output to complete the cloud environment update instructions in the [environment startup script documentation](https://support.terra.bio/hc/en-us/articles/360058193872-Using-a-startup-script-to-launch-a-pre-configured-cloud-environment).

The cloud environment will restart, installing `igv-jupyter` and configuring Jupyter for it in the process. Your existing data on the workspace [persistent disk](https://support.terra.bio/hc/en-us/articles/360047318551-Detachable-Persistent-Disks-) will be preserved during the restart. The restart will take you out of `Edit` or `Playground` mode, so you'll need to click `Edit` or `Playground` again to proceed.

Now, `igv-jupyter` will be accessible to all notebooks in this workspace.

**NOTE**: This environment startup script appears to prevent the terminal from loading. If you need access to the terminal after running this notebook, clear the `Startup script` text box and update the environment before launching the terminal.

### Enable trusted notebooks

`igv-jupyter` runs JavaScript natively in a notebook cell. In order to allow this, the notebook must be `Trusted`. 

You must be in `Edit` or `Playground` mode to see if a notebook is `Trusted`. There is a text box in the Jupyter menu bar, toward the center of the screen. If that text box reads `Not Trusted`, click the text and then `OK` in the resulting popup to make the notebook `Trusted`.

Click `Cell > Run All` in the Jupyter menu bar to run all of the cells in this notebook and try out the IGV viewer.

# Setup

## Import libraries

In [None]:
# Base-64 encoding
import base64

# Filesystem path manipulation
from pathlib import Path

# gzip compression
import gzip

# The IGV browser library
import igv

# Pandas for DataFrame functionality
import pandas as pd

## Set up utility functions

In [None]:
# Utility routine to convert a string into a data URI usable in an IGV track configuration.
def get_data_uri(s):
    """Converts a string s into a gzipped, base64-encoded data URI."""
    enc_bytes = base64.b64encode(gzip.compress(s.encode()))
    # Base64 output is pure ASCII, so decode the bytes instead of slicing their repr
    return 'data:application/gzip;base64,' + enc_bytes.decode('ascii')
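
To check that a data URI built this way still carries the original text, the encoding can be reversed. The sketch below redefines `get_data_uri` so that it runs on its own. `decode_data_uri` and `PREFIX` are hypothetical names added for illustration and are not part of `igv-jupyter`.

```python
import base64
import gzip

# Prefix used by get_data_uri for gzipped, base64-encoded payloads
PREFIX = 'data:application/gzip;base64,'

def get_data_uri(s):
    """Converts a string s into a gzipped, base64-encoded data URI."""
    return PREFIX + base64.b64encode(gzip.compress(s.encode())).decode('ascii')

def decode_data_uri(uri):
    """Hypothetical inverse of get_data_uri, intended for sanity checks only."""
    if not uri.startswith(PREFIX):
        raise ValueError('not a gzip/base64 data URI')
    payload = base64.b64decode(uri[len(PREFIX):])
    return gzip.decompress(payload).decode()

# Round-trip a single tab-delimited BED line
sample = 'chr7\t127471196\t127472363\tPos1\t0\t+\n'
uri = get_data_uri(sample)
round_trip = decode_data_uri(uri)
print(round_trip == sample)
```

A round trip like this is a quick way to confirm that track data was not altered before handing the URI to IGV.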

# Launch the IGV browser

First, launch a default `igv.Browser` instance to visualize the `hg38` reference genome and explore the UI. 

You could:

- Select a chromosome or other segment from the leftmost dropdown in the "IGV" bar.
- Specify a genomic region in the search box next to the magnifying glass icon.
- Click the "-" and "+" buttons to zoom in and out.
- Select a new reference genome from the Genome dropdown.

Some of the dropdown items spawn popups that do not render correctly in Terra, including:
- `Tracks > URL`
- `Session > URL`
- `Session > Save`

**These dropdown items will not work correctly in Terra notebook cells, and it is not recommended that you try to use them.** If you do, you may need to close the notebook and re-open it to clear them from the notebook's output.

In [None]:
b = igv.Browser({'genome': 'hg38'})
b.show()

# Load `BED` files as custom tracks

IGV allows users to upload their own track data in a variety of formats. See the [igv.js Tracks 2.0 page](https://github.com/igvteam/igv.js/wiki/Tracks-2.0) for details.

As an example, visualize some [`BED`](http://genome.ucsc.edu/FAQ/FAQformat#format1) formatted data as annotated feature tracks in IGV along with the `hg38` reference track.

In [None]:
# Sample BED file data
BED_DATA = """
chr7    127471196  127472363  Pos1  0  +
chr7    127472363  127473530  Pos2  0  +
chr7    127473530  127474697  Pos3  0  +
chr7    127474697  127475864  Pos4  0  +
chr7    127475864  127477031  Neg1  0  -
chr7    127477031  127478198  Neg2  0  -
chr7    127478198  127479365  Neg3  0  -
chr7    127479365  127480532  Pos5  0  +
chr7    127480532  127481699  Neg4  0  -
"""

TRACK_NAME = 'sample_bed_data0'

def bed_track_config(data, track_name):
    """Returns an IGV track configuration for BED file data `data` with track name `track_name`."""
    # Create a data URI from the BED file contents
    data_uri = get_data_uri(data)
    
    # Return an IGV track configuration
    return {
        'name': track_name,
        'type': 'annotation',
        'format': 'bed',
        'sourceType': 'file',
        'url': data_uri,
        'displayMode': 'EXPANDED'
    }

# Create an IGV browser instance
b = igv.Browser({'genome': 'hg38'})

# Create an IGV track configuration from the sample BED data
config = bed_track_config(BED_DATA, TRACK_NAME)
    
# Load the track configuration into the IGV browser
b.load_track(config)

# Zoom in to the region on chromosome 7 that contains the sample features
b.search('chr7:127471196-127495720')

# Show the IGV browser instance
b.show()
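
The locus passed to `b.search` above is hard-coded. One way to derive a locus from the BED text itself is sketched below. `bed_locus` and its `padding` parameter are hypothetical, not part of `igv-jupyter`, and the sketch assumes every feature lies on a single chromosome. BED start coordinates are 0-based while IGV locus strings are 1-based, so the start is shifted by one.

```python
def bed_locus(bed_text, padding=0):
    """Returns a 'chrom:start-end' locus spanning all features in bed_text.

    Hypothetical helper: assumes whitespace-delimited BED lines that all
    refer to the same chromosome.
    """
    chroms, starts, ends = set(), [], []
    for line in bed_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and UCSC header lines
        if not line or line.startswith(('#', 'track', 'browser')):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        chroms.add(fields[0])
        starts.append(int(fields[1]))
        ends.append(int(fields[2]))
    if not starts:
        raise ValueError('no BED features found')
    if len(chroms) != 1:
        raise ValueError('features span more than one chromosome')
    # Convert the 0-based BED start to a 1-based locus start
    start = max(1, min(starts) + 1 - padding)
    end = max(ends) + padding
    return f'{chroms.pop()}:{start}-{end}'

example = """
chr7    100  200  a  0  +
chr7    150  300  b  0  -
"""
locus = bed_locus(example)
print(locus)
```

The returned string could then be passed to `b.search` in place of the literal locus.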

# Conclusion

`igv-jupyter` provides a powerful, easy-to-use interface for working with IGV in a Jupyter notebook cell.  This notebook demonstrates just a small subset of its capabilities. 

If you're interested in learning more about what this extension can do, visit the [`igv-jupyter` GitHub repository](https://github.com/igvteam/igv-jupyter) and peruse the documentation.