# Getting Started with Data Augmentation

## Before you start!

- This [notebook](getting-started-with-grooming-segmentations.ipynb) assumes that the `shapeworks` conda environment has been activated using `conda activate shapeworks` in the terminal.
- See [Setting Up ShapeWorks Environment](setting-up-shapeworks-environment.ipynb) to learn how to set up your environment to start using the `shapeworks` library. Note that the prerequisite steps below use the same code to set up the environment for this notebook and import the `shapeworks` library.
- See [Getting Started with Segmentations](getting-started-with-segmentations.ipynb) to learn how to load and visualize binary segmentations.


## In this notebook, you will learn:

1. How to generate realistic synthetic data from an existing dataset using different parametric distributions.
2. How to visualize the statistical distribution of the generated data compared to the original data.


## Prerequisites

- Setting up the `shapeworks` environment. See [Setting Up ShapeWorks Environment](setting-up-shapeworks-environment.ipynb). To avoid code clutter, the `setup_shapeworks_env` function can be found in the `Examples/Python/setupenv.py` module.
- Importing `shapeworks` library. See [Setting Up ShapeWorks Environment](setting-up-shapeworks-environment.ipynb).
- Helper functions for segmentations. See [Getting Started with Segmentations](getting-started-with-segmentations.ipynb) and [Getting Started with Exploring Segmentations](getting-started-with-exploring-segmentations.ipynb).
- Helper functions for meshes. See [Getting Started with Meshes](getting-started-with-meshes.ipynb).
- Helper functions for visualization. See [Getting Started with Segmentations](getting-started-with-segmentations.ipynb), [Getting Started with Meshes](getting-started-with-meshes.ipynb), and [Getting Started with Exploring Segmentations](getting-started-with-exploring-segmentations.ipynb).
- Defining your dataset location. See [Getting Started with Exploring Segmentations](getting-started-with-exploring-segmentations.ipynb).
- Loading your dataset. See [Getting Started with Exploring Segmentations](getting-started-with-exploring-segmentations.ipynb).
- Defining parameters for `pyvista` plotter. See [Getting Started with Exploring Segmentations](getting-started-with-exploring-segmentations.ipynb).

## Note about `shapeworks` APIs

`shapeworks` functions operate in place, i.e., `<swObject>.<function>()` applies that function to the `swObject` data. To keep the original data unchanged, you must first copy it to another variable before applying the function.
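
To illustrate the in-place convention, here is a minimal sketch built on a hypothetical stand-in class, `FakeImage`. It is not part of `shapeworks`; it only mimics a method that modifies the object it is called on. `copy.deepcopy` is used for the copy, which is an assumption made for this toy class and not a statement about how `shapeworks` objects should be copied.

```python
import copy

class FakeImage:
    """Hypothetical stand-in for an object whose methods modify it in place."""

    def __init__(self, values):
        self.values = list(values)

    def binarize(self, threshold=0.5):
        # Overwrites self.values and returns self, so calls can be chained.
        self.values = [1 if v > threshold else 0 for v in self.values]
        return self

original = FakeImage([0.1, 0.7, 0.4, 0.9])
working_copy = copy.deepcopy(original)  # copy first to keep the original intact
working_copy.binarize()                 # only the copy is modified

print(original.values)
print(working_copy.values)
```

Calling `original.binarize()` directly would have overwritten the only copy of the data.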

## Notebook keyboard shortcuts

- `Esc + H`: displays a complete list of keyboard shortcuts
- `Esc + A`: insert new cell above the current cell
- `Esc + B`: insert new cell below the current cell
- `Esc + D + D`: delete current cell
- `Esc + Z`: undo
- `Shift + Enter`: run current cell and move to next
- To show a function's argument list (i.e., signature), use `(` then `shift-tab`
- Use `shift-tab-tab` to show more help for a function
- To show the help of a function, use `help(function)` or `function?`
- To show all functions supported by an object, use `dot-tab` after the variable name

## Prerequisites

### Setting up `shapeworks` environment 

Here, we will append to both your `PYTHONPATH` and your system `PATH` to set up the shapeworks environment for this notebook. See [Setting Up ShapeWorks Environment](setting-up-shapeworks-environment.ipynb) for more details.

In this notebook, we assume the following:

- This notebook is located in `Examples/Python/notebooks/tutorials`
- You have built shapeworks from source in the `build` directory within the shapeworks code directory
- You have built the shapeworks dependencies (using `build_dependencies.sh`) in the same parent directory as the shapeworks code

**Note:** If you run from a ShapeWorks installation, you don't need to set the dependencies path, and `shapeworks_bin_dir` should be set to `../../../../bin`.

In [None]:
# import relevant libraries 
import sys 
import os

# add parent-parent directory (where setupenv.py is) to python path
sys.path.insert(0,'../..')

# importing setupenv from Examples/Python
import setupenv

# indicate the bin directories for shapeworks and its dependencies
shapeworks_bin_dir   = "../../../../build/bin"
dependencies_bin_dir = "../../../../../shapeworks-dependencies/bin"

# set up shapeworks environment
setupenv.setup_shapeworks_env(shapeworks_bin_dir,  
                              dependencies_bin_dir, 
                              verbose = False)
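
If the setup step fails, a wrong relative path is a likely cause, since both directories are resolved relative to the notebook location. The following sketch checks the two paths before they are handed to `setup_shapeworks_env`. The helper name `missing_dirs` is hypothetical and not part of `setupenv`.

```python
import os

def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]

# The same relative paths as used in this notebook
candidate_dirs = [
    "../../../../build/bin",
    "../../../../../shapeworks-dependencies/bin",
]

for path in missing_dirs(candidate_dirs):
    # Print the absolute path so it is clear where the notebook is looking
    print("WARNING: directory not found: " + os.path.abspath(path))
```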

### Importing `shapeworks` library

In [None]:
# let's import shapeworks library to test whether shapeworks is now set
try:
    import shapeworks as sw
except ImportError:
    print('ERROR: shapeworks library failed to import')
else:
    print('SUCCESS: shapeworks library is successfully imported!!!')

### Import Data Augmentation Package

In [None]:
# let's import DataAugmentationUtils to test whether the data augmentation package is available
try:
    import DataAugmentationUtils
except ImportError:
    print('ERROR: DataAugmentationUtils failed to import')
else:
    print('SUCCESS: DataAugmentationUtils is successfully imported!!!')

## Defining the original dataset

### Defining dataset location

You can download exemplar datasets from the [ShapeWorks data portal](http://cibc1.sci.utah.edu:8080) after you log in. New users can [register](http://cibc1.sci.utah.edu:8080/#?dialog=register) an account for free. Please do not use an important password.

After you log in, click `Collections` on the left panel and then `use-case-data-v2`. Select the dataset you would like to download by clicking on the checkbox to the left of the dataset name. See the video below.

**This notebook assumes that you have downloaded `femur-v0` in `Examples/Python/Data`.** Feel free to use your own dataset. 


<p><video src="https://sci.utah.edu/~shapeworks/doc-resources/mp4s/portal_data_download.mp4" autoplay muted loop controls style="width:100%"></video></p>

In [None]:
# dataset name is the folder name for your dataset
datasetName  = 'femur-v0'

# path to the dataset where we can find shape data 
# here we assume shape data are given as binary segmentations
data_dir      = '../../Data/' + datasetName + '/'
    
print('Dataset Name:     ' + datasetName)
print('Directory:  ' + data_dir)

### Get file lists
Now we need the local particle files (those with `local` in their file names) and the corresponding images for the original dataset.

In [None]:
# Get image path list
img_dir = data_dir + "groomed/images/"
img_list = []
for file in os.listdir(img_dir):
    img_list.append(img_dir + file)
img_list = sorted(img_list)

# Get particles path list
model_dir =  data_dir + "shape_models/femur/1024/" 
local_particle_list = []
for file in os.listdir(model_dir):
    if "local" in file:
        local_particle_list.append(model_dir + file)
local_particle_list = sorted(local_particle_list)

print("Total shapes in original dataset: "+ str(len(img_list)))
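
The two lists are passed to the augmentation side by side, so presumably entry `i` of each list should refer to the same shape. Sorting both lists only guarantees this if the file names sort consistently. The sketch below is one way to check the pairing. The helper names and the rule that the text before the first `_` identifies a shape are assumptions for illustration, so adapt `default_key` to your own naming scheme.

```python
import os

def default_key(path):
    # Assumption for illustration: the text before the first "_" in the
    # file name identifies the shape.
    return os.path.basename(path).split("_")[0]

def find_mismatches(image_paths, particle_paths, key=default_key):
    """Return (image, particle) pairs whose keys differ; raise if lengths differ."""
    if len(image_paths) != len(particle_paths):
        raise ValueError("expected equally long lists, got %d images and %d particle files"
                         % (len(image_paths), len(particle_paths)))
    return [(img, part) for img, part in zip(image_paths, particle_paths)
            if key(img) != key(part)]

# Made-up file names, only to show the call
images = ["imgs/a01_femur.nrrd", "imgs/b02_femur.nrrd"]
particles = ["model/a01_femur_local.particles", "model/b02_femur_local.particles"]
print(find_mismatches(images, particles))
```

In this notebook, the call would be `find_mismatches(img_list, local_particle_list)`; an empty list means every pair agrees under the chosen key.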

## Run data augmentation using a Gaussian Distribution

In [None]:
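# Augmentation variables to keep constant across all samplers
num_samples = 50            # number of augmented samples to generate
num_dim = 0                 # 0: the number of PCA dimensions is selected using percent_variability
percent_variability = 0.95  # proportion of variability in the data to preserve in the embedding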
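
To make the role of `percent_variability` more concrete, the sketch below picks the smallest number of PCA modes whose cumulative variance reaches a requested fraction of the total. This is a common way to choose an embedding dimension and is shown as an assumption about the general technique, not as the code inside `DataAugmentationUtils`. The function name and the variance values are made up.

```python
def modes_for_variability(mode_variances, percent_variability):
    """Smallest number of modes whose cumulative variance reaches the requested fraction."""
    total = sum(mode_variances)
    cumulative = 0.0
    # Visit the modes from largest to smallest variance
    for count, variance in enumerate(sorted(mode_variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= percent_variability:
            return count
    return len(mode_variances)

# Made-up per-mode variances (e.g., PCA eigenvalues), for illustration only
example_variances = [6.0, 3.0, 0.7, 0.2, 0.1]
print(modes_for_variability(example_variances, 0.95))
```

With these numbers, the first two modes cover about 90% of the variance and the first three about 97%, so three modes are needed to reach 95%.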

In [None]:
output_directory = 'Output/GaussianAugmentation/'
sampler_type = "gaussian"
embedded_dim = DataAugmentationUtils.runDataAugmentation(output_directory, img_list, local_particle_list, num_samples, num_dim, percent_variability, sampler_type)
aug_data_csv = output_directory + "TotalData.csv"
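
As a rough picture of what a Gaussian sampler does, the sketch below fits a mean and standard deviation to one coordinate of the embedded data and draws new values from that normal distribution. It is a one-dimensional illustration with made-up numbers and a hypothetical function name; the sampler in `DataAugmentationUtils` may work differently.

```python
import random
import statistics

def sample_gaussian(embedded_values, num_samples, seed=0):
    """Draw samples from a normal distribution fitted to one embedding coordinate."""
    rng = random.Random(seed)                 # seeded for reproducibility
    mean = statistics.mean(embedded_values)
    std = statistics.stdev(embedded_values)   # sample standard deviation
    return [rng.gauss(mean, std) for _ in range(num_samples)]

# Made-up values standing in for one PCA coordinate of the real shapes
real_coordinate = [-1.2, -0.4, 0.1, 0.6, 0.9]
print(sample_gaussian(real_coordinate, num_samples=3))
```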

### Visualize distribution of real and augmented data

In [None]:
DataAugmentationUtils.visualizeAugmentation(aug_data_csv, 'violin')

## Run data augmentation using a Mixture of Gaussians Distribution


In [None]:
output_directory = 'Output/MixtureAugmentation/'
sampler_type = "mixture"
embedded_dim = DataAugmentationUtils.runDataAugmentation(output_directory, img_list, local_particle_list, num_samples, num_dim, percent_variability, sampler_type)
aug_data_csv = output_directory + "TotalData.csv"
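
Sampling from a mixture of Gaussians can be sketched in two steps: pick a component according to its weight, then draw from that component's normal distribution. The example below is one-dimensional and takes the mixture parameters as given. How `DataAugmentationUtils` fits the mixture to the data is not shown in this notebook, and the function name and parameter values here are hypothetical.

```python
import random

def sample_mixture(components, num_samples, seed=0):
    """Sample from a 1-D Gaussian mixture given as (weight, mean, std) tuples."""
    rng = random.Random(seed)
    weights = [weight for weight, _, _ in components]
    samples = []
    for _ in range(num_samples):
        # Step 1: choose a component with probability proportional to its weight
        _, mean, std = rng.choices(components, weights=weights, k=1)[0]
        # Step 2: draw from the chosen component
        samples.append(rng.gauss(mean, std))
    return samples

# Made-up two-component mixture, for illustration only
mixture = [(0.3, -2.0, 0.5), (0.7, 1.5, 0.8)]
print(sample_mixture(mixture, num_samples=3))
```

A mixture can represent data with several clusters, which a single Gaussian cannot.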

### Visualize distribution of real and augmented data

In [None]:
DataAugmentationUtils.visualizeAugmentation(aug_data_csv, 'violin')

## Run data augmentation using Kernel Density Estimation

In [None]:
output_directory = 'Output/KDEAugmentation/'
sampler_type = "kde"
embedded_dim = DataAugmentationUtils.runDataAugmentation(output_directory, img_list, local_particle_list, num_samples, num_dim, percent_variability, sampler_type)
aug_data_csv = output_directory + "TotalData.csv"
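
Sampling from a Gaussian kernel density estimate amounts to picking one of the real data points at random and perturbing it with Gaussian noise whose standard deviation is the kernel bandwidth. The sketch below shows this in one dimension with a fixed, made-up bandwidth. The function name is hypothetical, and the way `DataAugmentationUtils` selects its bandwidth is not covered here.

```python
import random

def sample_kde(data, num_samples, bandwidth, seed=0):
    """Sample from a 1-D Gaussian KDE: random data point plus Gaussian noise."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        center = rng.choice(data)                     # pick one real data point
        samples.append(rng.gauss(center, bandwidth))  # jitter it by the bandwidth
    return samples

# Made-up values standing in for one PCA coordinate of the real shapes
real_coordinate = [-1.2, -0.4, 0.1, 0.6, 0.9]
print(sample_kde(real_coordinate, num_samples=3, bandwidth=0.2))
```

Because every sample starts from a real data point, KDE samples stay close to the original data when the bandwidth is small.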

### Visualize distribution of real and augmented data

In [None]:
DataAugmentationUtils.visualizeAugmentation(aug_data_csv, 'violin')
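
The three runs above differ only in the sampler type and the output directory, so they could also be driven by a loop. The sketch below takes the augmentation function as an argument and calls it with the same positional arguments used in this notebook. The helper `run_all_samplers` is hypothetical, and the call on `DataAugmentationUtils` is left as a comment because it needs the dataset and the ShapeWorks environment.

```python
import os

# Output directories as used in this notebook, keyed by sampler type
OUTPUT_DIRS = {
    "gaussian": "Output/GaussianAugmentation/",
    "mixture": "Output/MixtureAugmentation/",
    "kde": "Output/KDEAugmentation/",
}

def run_all_samplers(run_fn, img_list, particle_list,
                     num_samples, num_dim, percent_variability):
    """Call run_fn once per sampler type and return the CSV path for each."""
    csv_paths = {}
    for sampler_type, output_directory in OUTPUT_DIRS.items():
        run_fn(output_directory, img_list, particle_list,
               num_samples, num_dim, percent_variability, sampler_type)
        csv_paths[sampler_type] = os.path.join(output_directory, "TotalData.csv")
    return csv_paths

# In this notebook, the call would look like:
# csv_paths = run_all_samplers(DataAugmentationUtils.runDataAugmentation,
#                              img_list, local_particle_list,
#                              num_samples, num_dim, percent_variability)
```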