# tomoDRGN interactive visualization

### Visualization functionality
This notebook supports interactive exploration of training output data in two forms. Its primary aim is to help users uncover correlations and interesting particle subsets for further analysis and structural hypothesis generation:
1. interactive per-particle 2D scatter plot, with axes and colormaps selectable from all columns of
    * input star file
    * all `tomodrgn analyze` outputs (latent, latent PCA, latent UMAP, latent kmeans clustering)
    * tomogram XYZ positions from a separate volume series star file
    * any `*.pkl` file found recursively within this notebook's directory that contains a numpy array whose first axis length matches the number of particles in the star file used to train the model being analyzed
2. interactive 3D plot per-particle in the source tomogram spatial context
    * axes defined by particle XYZ coordinates in each source tomogram
    * optional overlay of tomogram voxel data with slice view
    * particle colormaps and sub-selection tools from all options listed above for 2D scatter plot
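
As a minimal sketch of the `*.pkl` data source listed above, a custom per-particle array can be pickled so that its first axis matches the particle count. The file name `custom_feature.pkl`, the particle count, and the assumption that a plain `pickle.dump` of a numpy array is the expected on-disk format are all illustrative and not confirmed against the tomoDRGN source.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical number of particles; in practice this must equal the number of
# particles in the star file used to train the model being analyzed.
n_particles = 100

# Hypothetical per-particle feature: one scalar per particle (first axis = particles).
rng = np.random.default_rng(0)
custom_feature = rng.normal(size=n_particles)

# Written to a temporary directory here; in practice the file would be placed
# somewhere beneath this notebook's directory so the recursive search can find it.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'custom_feature.pkl')
with open(out_path, 'wb') as f:
    pickle.dump(custom_feature, f)

# Round-trip check: the first axis must match the particle count.
with open(out_path, 'rb') as f:
    reloaded = pickle.load(f)
```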

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tomodrgn import analysis
from tomodrgn import utils
                
import plotly.offline as py
py.init_notebook_mode()

In [None]:
# Enable interactive widgets in jupyter notebook
!jupyter nbextension enable --py widgetsnbextension

# ensure all columns in df can be viewed
pd.set_option('display.max_columns', None)

### Load all data

In [None]:
# USER INPUT
# path to volume series star file (absolute path recommended); must reference the same set of particles as the star file used for tomodrgn train_vae
volumeseries_star_path='tomodrgn/testing/data/10076_both_32_sim_vols.star'
# full name of the column containing a unique value per tomogram in the volume series star file
tomo_id_column='_rlnMicrographName'

In [None]:
df_merged = analysis.recursive_load_dataframe(volumeseries_star_path=volumeseries_star_path, tomo_id_column=tomo_id_column)

In [None]:
df_merged

### Interactive scatter plot

Interactive visualization of the latent encodings from the trained model. Each point represents one particle in the dataset, and the hover text includes that particle's index in the particle stack.
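
The scatter plot cell passes only numeric columns to the widget. A minimal sketch of that filtering step on a toy table follows; the values are made up for illustration, and only the column names come from this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_merged with one string column and two numeric columns.
df_toy = pd.DataFrame({
    '_rlnMicrographName': ['both.tomostar', 'both.tomostar', 'both.tomostar'],
    '_rlnCoordinateX': [10.0, 20.5, 31.0],
    '_UnfilteredParticleInds': [0, 1, 2],
})

# Keep only numeric dtypes, as done before building the interactive scatter plot.
df_numeric = df_toy.select_dtypes(include=np.number)
numeric_columns = list(df_numeric.columns)
```

Non-numeric columns such as the tomogram name are dropped by this step, so they are not offered as scatter plot axes or colormaps.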

In [None]:
widget = analysis.ipy_plot_interactive(df_merged.select_dtypes(include=np.number))
widget

### View particle distributions in tomogram context
Interactively explore particle distributions in their 3D tomogram context. The viewer can optionally superimpose tomogram data in slice view, and particles can be colored or selected by any numeric property in `df_merged`.
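
Selecting particles by a numeric property can also be done directly with pandas, outside the viewer. A minimal sketch on a toy table follows; the column `example_score`, the tomogram names, and the threshold are hypothetical, and only the `_rln*` column names come from this notebook.

```python
import pandas as pd

tomo_id_column = '_rlnMicrographName'

# Toy stand-in for df_merged: four particles across two tomograms.
df_toy = pd.DataFrame({
    tomo_id_column: ['a.tomostar', 'a.tomostar', 'b.tomostar', 'b.tomostar'],
    '_rlnCoordinateX': [100.0, 250.0, 80.0, 300.0],
    '_rlnCoordinateY': [120.0, 60.0, 210.0, 40.0],
    '_rlnCoordinateZ': [30.0, 45.0, 50.0, 20.0],
    'example_score': [0.1, 0.9, 0.7, 0.2],  # hypothetical numeric property
})

# Restrict to one tomogram, then keep particles above a hypothetical threshold.
in_tomogram = df_toy[tomo_id_column] == 'a.tomostar'
above_threshold = df_toy['example_score'] > 0.5
df_selected = df_toy[in_tomogram & above_threshold]

# Number of particles per tomogram, useful for choosing which tomogram to view.
particles_per_tomogram = df_toy.groupby(tomo_id_column).size()
```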

In [None]:
required_cols_for_tomogram_viz = ['_rlnCoordinateX',
                                  '_rlnCoordinateY',
                                  '_rlnCoordinateZ',
                                  tomo_id_column,
                                  '_UnfilteredParticleInds']
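# fail early with a readable message if any required column is absent
missing_cols = [col for col in required_cols_for_tomogram_viz if col not in df_merged.columns]
assert not missing_cols, f'df_merged is missing required columns: {missing_cols}'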

In [None]:
# USER INPUT

# path to folder containing (preferably deconvolved or denoised) tomograms
path_to_tomograms = '../../../data/'

# provide tomogram.mrc : tomogram.tomostar mappings
# (map each tomogram file name on disk, typically $TOMOGRAM.mrc, to its name in the input star file under the tomo_id_column header, typically $TOMOGRAM.tomostar)
tomo_star_mapping = {
    'tomogram_001.mrc': 'both.tomostar',
}

In [None]:
analysis.ipy_tomo_ptcl_viewer(path_to_tomograms=path_to_tomograms,
                              tomo_star_mapping=tomo_star_mapping,
                              tomo_id_column=tomo_id_column,
                              df_particles=df_merged)