# Welcome to the cinemasci tutorial workflow.  

This workflow starts by importing the cinemasci Python module and creating a viewer object.  

An existing Cinema database, `data/nyx_volume.cdb`, will be loaded into the viewer.  

It may take a few seconds for the viewer to start and load the database...

In [1]:
import cinemasci
import cinemasci.pynb
import os
from sys import platform

### Explore the database in the built-in viewer using the sliders

Run this cell and then take a minute to explore the Cinema database with the viewer sliders.  The data is from the Nyx Cosmology simulation and shows the formation of dark matter halos over time.  The variable shown is log(baryon density).  The database images show the simulation from different phi/theta angles with five selected time steps.  

Note that the databases included in this workflow are quite small to make this workflow quicker.  A Cinema database generated by a large scale simulation may contain thousands of images and much finer-grained angle divisions.  
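
The size of a Cinema database grows with the product of its parameter counts. Below is a minimal sketch of that arithmetic. It assumes a full sweep, in which every combination of parameter values has exactly one image. The function `image_count` and the sweep sizes are hypothetical and are not taken from the Nyx databases:

```python
from math import prod


def image_count(**param_counts):
    """Number of images in a full Cinema parameter sweep.

    Each keyword maps a parameter name to how many values it takes;
    a full sweep stores one image per combination of values.
    """
    return prod(param_counts.values())


# Hypothetical sweep sizes, not taken from the Nyx databases
print(image_count(phi=36, theta=18, timestep=100))  # 64800
```

Even a modest 36 × 18 angle grid over 100 time steps already gives 64,800 images. This is why large-scale runs reach thousands of images or more.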

In [4]:
# create a viewer object
viewer = cinemasci.pynb.CinemaViewer()
viewer.setUIValues({'image size': 200})
viewer.hideParameterControl('producer')

my_cdb = "data/nyx_volume.cdb"
viewer.load(my_cdb)


### Now we'll do an example analysis workflow

The next cell loads a Cinema database of 2D slices from the Nyx simulation into a second viewer object.  Note that the sliders, which are drawn from the Cinema database columns, now show only the slice number and the time step. 
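
A Cinema database stores its parameters as columns of `data.csv`. In this workflow, the image paths live in columns whose names begin with `FILE` (`FILE`, `FILE_gray`, ...). The sketch below collects the distinct values of each parameter column, which is roughly the set of positions a slider can take. It is an illustration only, not how `cinemasci.pynb.CinemaViewer` is implemented. The `parameter_values` helper and the sample CSV are hypothetical:

```python
import io

import pandas as pd


def parameter_values(df):
    """Map each parameter column to its sorted distinct values.

    Columns whose names start with 'FILE' hold image paths rather than
    parameters, so they are skipped.
    """
    return {
        col: sorted(df[col].unique().tolist())
        for col in df.columns
        if not col.startswith('FILE')
    }


# Hypothetical data.csv contents for a small slice database
csv_text = """timestep,slice,FILE
0,0,t0_s0.png
0,1,t0_s1.png
1,0,t1_s0.png
1,1,t1_s1.png
"""
df = pd.read_csv(io.StringIO(csv_text))
print(parameter_values(df))  # {'timestep': [0, 1], 'slice': [0, 1]}
```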

In [5]:
# Note: assumes previous cell has been run
import shutil

clean_slice = "data/nyx_clean_slice.cdb"

# create a viewer object
viewerS = cinemasci.pynb.CinemaViewer()
viewerS.setUIValues({'image size': 200})
viewerS.load(clean_slice)


### Make a Cinema database directory for the analysis output

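First, the cell checks whether the workflow has been run before. If it has, the old output database is removed so that the analysis starts again from the unanalyzed images in `nyx_clean_slice.cdb`. A fresh copy of that database then becomes the new output database.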

In [6]:
# Are we rerunning the workflow?  If so, remove the old output database so new images and a new data.csv are generated.
new_cdb = "nyx_new_slice.cdb/"
path_dir = "./data/"
cdb_path = os.path.join(path_dir, new_cdb)
print('New CDB will go into: ', cdb_path)

try:
    shutil.rmtree(cdb_path)
except OSError:
    # No previous output database (first run); it is created below
    print("No existing CDB at %s; creating %s" % (cdb_path, new_cdb))
# Start from a fresh copy of the unanalyzed slice database
shutil.copytree(clean_slice, cdb_path)
# Path to the new CDB's data.csv (read into a dataframe further down)
data_csv = os.path.join(cdb_path, "data.csv")

New CDB will go into:  ./data/nyx_new_slice.cdb/
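
The reset in the cell above can also be packaged as a reusable helper. Below is a minimal sketch that assumes the same directory layout; `reset_cdb` is a hypothetical name. Passing `ignore_errors=True` to `shutil.rmtree` makes the removal a no-op when the output database does not exist yet, so it takes the place of the `try`/`except`:

```python
import os
import shutil


def reset_cdb(src, dst):
    """Replace dst with a fresh copy of the Cinema database at src.

    rmtree(..., ignore_errors=True) does nothing if dst is missing,
    so the same call works on the first run and on reruns.
    Returns the path of the copied data.csv.
    """
    shutil.rmtree(dst, ignore_errors=True)
    shutil.copytree(src, dst)
    return os.path.join(dst, 'data.csv')
```

For example, `data_csv = reset_cdb(clean_slice, cdb_path)`. One trade-off: `ignore_errors=True` also hides other removal failures, such as a permission error. In that case the subsequent `copytree` fails because the destination still exists.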


### Image-based analysis leveraging Python libraries

Jupyter notebooks are becoming a common analysis tool for scientists.  This workflow takes the Cinema database of image slices from the Nyx simulation and performs some basic analysis on the images.  The database's `data.csv` is read into a pandas dataframe. NumPy and the scikit-image (`skimage`) library compute standard statistical quantities (mean, standard deviation, and Shannon entropy), and the OpenCV library finds edges and contours in the images.  These analyses produce new output variables. The OpenCV steps also produce new images for each slice: grayscale, Canny edge, and contour images.  
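
The per-slice statistics are easy to check by hand on a tiny array. The sketch below uses NumPy only. It re-implements the Shannon entropy following the definition in the scikit-image documentation for `shannon_entropy`: with the default base of 2, the entropy is `-sum(p_k * log2(p_k))` over the frequency `p_k` of each distinct pixel value. `image_stats` is a hypothetical helper and is not part of either library:

```python
import numpy as np


def image_stats(gray):
    """Mean, standard deviation, and Shannon entropy (bits) of a grayscale image."""
    gray = np.asarray(gray)
    # Frequency of each distinct pixel value
    _, counts = np.unique(gray, return_counts=True)
    p = counts / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return float(gray.mean()), float(gray.std()), entropy


# Tiny hypothetical image: half black, half white
img = np.array([[0, 0], [255, 255]], dtype=np.uint8)
print(image_stats(img))  # (127.5, 127.5, 1.0)
```

A two-valued image split evenly carries exactly one bit per pixel. A constant image has zero entropy, so the entropy column rises as a slice shows more varied structure.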

In [7]:
import cv2
import pandas as pd
from skimage import io
from skimage.measure import shannon_entropy
import numpy as np

dfslices = pd.read_csv(data_csv)

In [8]:
# Need a set of lists to create the new columns for the dataframe / eventual updated CDB
imgMean = []
imgStDev = []
imgShEntropy = []
grayFILE = []
cannyFILE = []
contoursFILE = []

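# Set up variables for the Canny edge detection (hysteresis thresholds)
lower_threshold = 5    # gradients below this are never edges
upper_threshold = 200  # gradients above this are always edges
sobel_size = 3         # aperture size of the Sobel operator

# Set up variables for the contours
cthreshold = 45        # grayscale cutoff for the binary mask
thickness = 2          # contour line width in pixels
color = [200,200,200]  # contour color as RGB (reversed to BGR when drawing)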


In [9]:
# Cycle through the CDB slices
print('Making new images, please wait...')
for f in dfslices['FILE']:
    # Load each image and convert it to grayscale
    imgpath = cdb_path + f
    src = cv2.imread(imgpath)   # original color image (BGR channel order)
    if src is None:
        raise FileNotFoundError(imgpath)
    imggray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)  # grayscale image

    # Save the grayscale image and record its path relative to the CDB
    grayfile = imgpath.replace('.png', '_gray.png')
    cv2.imwrite(grayfile, imggray)
    grayFILE.append(f.replace('.png', '_gray.png'))

    # Basic statistics on the grayscale image (already in memory, no need to reread it)
    imgMean.append(np.mean(imggray))
    imgStDev.append(np.std(imggray))
    imgShEntropy.append(shannon_entropy(imggray))

    # Canny edge detection; save the edge image and record its path
    imgCannyEdges = cv2.Canny(imggray, lower_threshold, upper_threshold, apertureSize=sobel_size, L2gradient=False)
    cannyfile = imgpath.replace('.png', '_canny_edge.png')
    cv2.imwrite(cannyfile, imgCannyEdges)
    cannyFILE.append(f.replace('.png', '_canny_edge.png'))

    # Threshold to a binary mask, find contours (OpenCV 4.x returns two values),
    # and draw them onto the color image
    ret, binary = cv2.threshold(imggray, cthreshold, 255, cv2.THRESH_BINARY)
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    imgContours = cv2.drawContours(src, contours, -1, (color[2], color[1], color[0]), thickness)
    contoursfile = imgpath.replace('.png', '_contours.png')
    cv2.imwrite(contoursfile, imgContours)
    contoursFILE.append(f.replace('.png', '_contours.png'))

print ("New images now available")

Making new images, please wait...
New images now available
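
The contour step works on a binary mask. `cv2.threshold` with `cv2.THRESH_BINARY` sets each pixel strictly greater than the threshold to the maximum value and every other pixel to 0. `cv2.findContours` then traces the boundaries of the resulting white regions. Below is a NumPy sketch of the thresholding alone; `binary_threshold` is a hypothetical name:

```python
import numpy as np


def binary_threshold(gray, thresh, maxval=255):
    """NumPy equivalent of cv2.threshold(gray, thresh, maxval, cv2.THRESH_BINARY)[1].

    Pixels strictly greater than thresh become maxval; all others become 0.
    """
    gray = np.asarray(gray)
    return np.where(gray > thresh, maxval, 0).astype(gray.dtype)


# With the cthreshold value used above (45): 45 stays dark, 46 turns white
img = np.array([[10, 45], [46, 200]], dtype=np.uint8)
print(binary_threshold(img, 45).tolist())  # [[0, 0], [255, 255]]
```

Raising `cthreshold` shrinks the white regions, so fewer and tighter contours are drawn around the densest parts of each slice.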


### Populate the new Cinema database

In the next cell, the new statistical quantities and the file names of the new grayscale, Canny edge, and contour images are added to the dataframe as new columns.  The dataframe is then written out to the `data.csv` file of the new Cinema database.  

In [10]:
# Update the dataframe with the new columns and reorder to fit Cinema Specification
dfslices['FILE_gray'] = grayFILE
dfslices['FILE_canny_edge'] = cannyFILE
dfslices['Mean'] = imgMean
dfslices['ShannonEntropy'] = imgShEntropy
dfslices['StdDev'] = imgStDev
dfslices['FILE_contours'] = contoursFILE

dfslices = dfslices[['timestep', 'slice', 'Mean', 'StdDev', 'ShannonEntropy', 
                     'FILE', 'FILE_gray', 'FILE_canny_edge', 'FILE_contours']]


In [11]:

# Write out the new data.csv 
dfslices.to_csv(data_csv, index=False)
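
Before you open the new database in a viewer, it can help to confirm that every image named in `data.csv` was actually written. Below is a minimal sketch. It assumes, as the image-processing loop does, that `FILE` column entries are paths relative to the database directory. `missing_files` is a hypothetical helper:

```python
import os

import pandas as pd


def missing_files(cdb_dir):
    """List FILE-column entries in cdb_dir/data.csv that do not exist on disk."""
    df = pd.read_csv(os.path.join(cdb_dir, 'data.csv'))
    file_cols = [c for c in df.columns if c.startswith('FILE')]
    missing = []
    for col in file_cols:
        for rel in df[col].dropna():
            # Entries are relative to the database directory
            if not os.path.exists(os.path.join(cdb_dir, rel)):
                missing.append(rel)
    return missing
```

For example, `missing_files(cdb_path)` returns an empty list when every referenced image exists.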


### View all the Cinema databases

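Using Python's `os` module, an existing HTML file, `nyx_databases.html`, is opened in Firefox. All these Cinema databases can then be explored with the browser-based Cinema:View and Cinema:Explorer viewers.  As written, the cell opens the file only on macOS (`darwin`). On other platforms, open the file in a browser manually.  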
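
The `In [12]` cell opens the page only on macOS. A cross-platform alternative, swapped in here for `os.system`, is Python's standard `webbrowser` module, which opens a URL in the default browser. `html_uri` and `open_html` are hypothetical helpers:

```python
import webbrowser
from pathlib import Path


def html_uri(path):
    """Absolute file:// URI for a local HTML file."""
    return Path(path).resolve().as_uri()


def open_html(path, opener=webbrowser.open):
    """Open a local HTML file in a browser on any platform.

    `opener` is injectable so the function can be exercised without a browser.
    """
    return opener(html_uri(path))
```

For example, `open_html('./nyx_databases.html')` uses the default browser. To keep the Firefox choice, call `webbrowser.get('firefox')`: it returns a Firefox controller when one is registered and raises `webbrowser.Error` otherwise. You can then pass `opener=webbrowser.get('firefox').open`.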

This workflow involved three Cinema databases:

- `nyx_volume.cdb`, whose images span phi/theta angles;
- the original `nyx_clean_slice.cdb`, which has the two original slices of 2D Nyx data;
- the output `nyx_new_slice.cdb`, which adds the statistical analysis variables and the new grayscale, Canny edge, and OpenCV contour images.  

Take some time to explore these databases in the viewers.  Cinema:Explorer uses a parallel coordinates plot approach to interactively explore a Cinema database.  
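
In a parallel coordinates plot, each row of `data.csv` is drawn as a polyline that crosses one vertical axis per column. Each crossing sits at the row's value scaled to that axis's own range. Below is a NumPy sketch of that per-axis scaling. It is a generic illustration, not Cinema:Explorer's actual code. `axis_positions` and the sample rows are hypothetical:

```python
import numpy as np


def axis_positions(table):
    """Scale each column of a 2-D array to [0, 1], one axis per column.

    Row i, column j gives where row i's polyline crosses axis j.
    Constant columns are placed at 0.5 to avoid dividing by zero.
    """
    table = np.asarray(table, dtype=float)
    lo = table.min(axis=0)
    hi = table.max(axis=0)
    span = hi - lo
    safe_span = np.where(span == 0, 1.0, span)
    scaled = (table - lo) / safe_span
    return np.where(span == 0, 0.5, scaled)


# Hypothetical rows: (timestep, Mean, StdDev)
rows = [[0, 10.0, 2.0], [1, 30.0, 2.0], [2, 20.0, 2.0]]
print(axis_positions(rows).tolist())
# [[0.0, 0.0, 0.5], [0.5, 1.0, 0.5], [1.0, 0.5, 0.5]]
```

Because each axis is scaled independently, columns with very different units, such as a time step and a mean intensity, can sit side by side. Crossing lines between adjacent axes then hint at how those variables relate.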

In [12]:

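print(platform)
if platform == 'darwin':
    # macOS: open the overview page in Firefox
    os.system('open -a Firefox ./nyx_databases.html')
else:
    print('Open ./nyx_databases.html in a web browser to explore the databases')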


darwin
