## Using a Voronoi diagram based approach to cluster localizations

This notebook demonstrates how to cluster localizations using a Voronoi diagram based algorithm, and how to work with the clustered data.

Note:
* This implementation of the algorithm only works in 2D.
* It always ignores localization category information.

References:
* [Levet et al., Nature Methods, 2015](http://dx.doi.org/10.1038/nmeth.3579).
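In the approach of Levet et al., the local density of each localization is the inverse of the area of its Voronoi cell. The following is a minimal sketch of that step using `scipy.spatial.Voronoi`. It is not the storm_analysis implementation, and the helpers `convex_polygon_area` and `voronoi_density` are hypothetical names. Cells on the edge of the point set are unbounded, so the sketch marks their density as NaN.

```python
import numpy
from scipy.spatial import Voronoi


def convex_polygon_area(xy):
    """Area of a convex polygon given its vertices in any order."""
    # Sort vertices by angle around the centroid; valid because Voronoi cells are convex.
    center = xy.mean(axis = 0)
    order = numpy.argsort(numpy.arctan2(xy[:, 1] - center[1], xy[:, 0] - center[0]))
    x, y = xy[order, 0], xy[order, 1]
    # Shoelace formula.
    return 0.5 * abs(numpy.dot(x, numpy.roll(y, -1)) - numpy.dot(y, numpy.roll(x, -1)))


def voronoi_density(points):
    """Hypothetical helper: 1 / (Voronoi cell area) for each 2D point.

    Points whose Voronoi cells are unbounded get NaN.
    """
    vor = Voronoi(points)
    density = numpy.full(len(points), numpy.nan)
    for i, region_index in enumerate(vor.point_region):
        region = vor.regions[region_index]
        # A vertex index of -1 marks a cell that extends to infinity.
        if len(region) == 0 or -1 in region:
            continue
        density[i] = 1.0 / convex_polygon_area(vor.vertices[region])
    return density
```

On a 3 x 3 grid with unit spacing only the center point has a bounded cell, a unit square, so its density is 1.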

### Configuration

Create an empty directory somewhere on your computer and change Python's working directory to it.

In [None]:
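import matplotlib
import matplotlib.pyplot as pyplot
import numpy
import os

# Change this path to the empty directory you created.
os.chdir("/home/hbabcock/Data/storm_analysis/jy_testing/")
print(os.getcwd())

# Fix the random seed so that the results are reproducible.
numpy.random.seed(1)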
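If you would rather not hard-code a path, a temporary directory created with the standard library works as well. This is a minimal sketch; the `voronoi_example_` prefix is arbitrary, and `tempfile.mkdtemp` does not delete the directory afterwards, so the output files remain available for inspection.

```python
import os
import tempfile

# Create a fresh, empty working directory in the system's temporary location.
work_dir = tempfile.mkdtemp(prefix = "voronoi_example_")
os.chdir(work_dir)
print(os.getcwd())
```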


### Generate data to cluster

In this example we generate the data to cluster synthetically.

In [None]:
import storm_analysis.jupyter_examples.clustering_data as clusteringData

# 40 clusters
# 1000 tracks per cluster
# 20000 background tracks
clusteringData.makeClusters("clusters.hdf5", 40, 1000, 20000)
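The internals of `makeClusters` are not shown in this notebook. As a rough illustration of this kind of data, the hypothetical `make_clustered_points` below places Gaussian clusters on a uniform background using numpy only. The field width, the cluster spread and the nm units are assumptions, and unlike `makeClusters` it returns arrays instead of writing an HDF5 file.

```python
import numpy


def make_clustered_points(n_clusters, locs_per_cluster, n_background,
                          width = 10000.0, cluster_sigma = 50.0, seed = 1):
    """Hypothetical generator: Gaussian clusters on a uniform background.

    Coordinates are assumed to be in nm inside a width x width square.
    Returns (points, labels), where labels is the cluster index or -1 for background.
    """
    rng = numpy.random.default_rng(seed)
    centers = rng.uniform(0.0, width, size = (n_clusters, 2))

    # Each cluster is an isotropic 2D Gaussian around its center.
    n_clustered = n_clusters * locs_per_cluster
    clustered = (numpy.repeat(centers, locs_per_cluster, axis = 0)
                 + rng.normal(0.0, cluster_sigma, size = (n_clustered, 2)))

    # Background localizations are spread uniformly over the whole field.
    background = rng.uniform(0.0, width, size = (n_background, 2))

    points = numpy.concatenate([clustered, background])
    labels = numpy.concatenate([numpy.repeat(numpy.arange(n_clusters), locs_per_cluster),
                                numpy.full(n_background, -1)])
    return points, labels
```

Calling `make_clustered_points(40, 1000, 20000)` gives data of the same overall size as the cell above.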

In [None]:
# Make an image from the data.
import storm_analysis.sa_utilities.hdf5_to_image as h5_image

sr_im = h5_image.render2DImage("clusters.hdf5", scale = 2, sigma = 1)

fig = pyplot.figure(figsize = (9, 6))
pyplot.imshow(sr_im, cmap = "gray")
pyplot.show()
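A simple way to picture what rendering at `scale = 2` means is to count localizations in output pixels that are half the size of a camera pixel. The hypothetical `render_histogram` below is a sketch of that idea only, assuming `x` and `y` are in camera pixels; it is not how `render2DImage` is implemented and it applies no blur.

```python
import numpy


def render_histogram(x, y, image_shape, scale = 2):
    """Count localizations per super-resolution pixel.

    x, y are assumed to be in camera pixels; each camera pixel is split
    into scale x scale output pixels. Rows correspond to y, columns to x.
    """
    ny, nx = image_shape
    im, _, _ = numpy.histogram2d(y * scale, x * scale,
                                 bins = (ny * scale, nx * scale),
                                 range = ((0, ny * scale), (0, nx * scale)))
    return im
```

Blurring the result, for example with `scipy.ndimage.gaussian_filter`, would give a smoother image closer to what the `sigma` argument suggests.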

### Cluster the data

Note:
* The results of the clustering are saved in the same HDF5 file that contains the tracks / localizations.
* Clustering is done on tracks if they are available, otherwise it is done on the localizations.

In [None]:
import storm_analysis.voronoi.voronoi_analysis as voronoiAnalysis

# The second parameter is the relative density factor.
# The third parameter is the minimum cluster size.
voronoiAnalysis.findClusters("clusters.hdf5", 0.2, 100)
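As described by Levet et al., clusters are built from localizations whose density exceeds a threshold set relative to the average density; dense localizations that are Voronoi neighbors are merged, and groups that are too small are discarded. The sketch below illustrates that idea in plain Python. It assumes the threshold is `density_factor` times the mean density, which may not match exactly how `findClusters` uses its relative density factor, and `density_clusters` is a hypothetical name.

```python
from collections import deque

import numpy


def density_clusters(density, neighbors, density_factor, min_size):
    """Group high-density points into clusters.

    density   : per-point density (e.g. 1 / Voronoi cell area); NaN counts as low.
    neighbors : neighbors[i] lists the points whose Voronoi cells touch cell i.
    Returns cluster ids, with -1 for unclustered points.
    """
    density = numpy.asarray(density, dtype = float)
    n = len(density)
    finite = numpy.isfinite(density)

    # Assumption: threshold = density_factor * mean of the finite densities.
    threshold = density_factor * numpy.mean(density[finite])
    dense = numpy.zeros(n, dtype = bool)
    dense[finite] = density[finite] > threshold

    labels = numpy.full(n, -1)
    visited = numpy.zeros(n, dtype = bool)
    next_id = 0
    for start in range(n):
        if not dense[start] or visited[start]:
            continue
        # Breadth-first search over dense Voronoi neighbors.
        visited[start] = True
        members = [start]
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbors[i]:
                if dense[j] and not visited[j]:
                    visited[j] = True
                    members.append(j)
                    queue.append(j)
        # Keep only groups with at least min_size members.
        if len(members) >= min_size:
            labels[members] = next_id
            next_id += 1
    return labels
```

For real data, Voronoi neighbors are the same as Delaunay neighbors, so `neighbors` could be built from `scipy.spatial.Delaunay(points).vertex_neighbor_vertices`, which returns `(indptr, indices)` with the neighbors of point `k` in `indices[indptr[k]:indptr[k + 1]]`.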

### RGB image of the clustering results

In [None]:
import storm_analysis.dbscan.cluster_images as clusterImages

[rgb_im, sum_im, num_clusters] = clusterImages.clusterImages("clusters.hdf5", 10, 3, scale = 2, 
                                                             show_unclustered = True)


fig = pyplot.figure(figsize = (9, 6))
# rgb_im is an RGB image, so no colormap is needed.
pyplot.imshow(rgb_im)
pyplot.show()
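The exact rendering done by `clusterImages` is not shown here. As a rough illustration of the idea, the hypothetical `color_clusters` below paints each cluster's localizations in its own random color and the unclustered ones in gray. It assumes integer pixel coordinates and cluster labels where -1 means unclustered.

```python
import numpy


def color_clusters(x, y, labels, image_shape, seed = 0):
    """Return a float RGB image with one random color per cluster.

    x, y are assumed to be integer pixel coordinates; label -1 is drawn in gray.
    """
    rng = numpy.random.default_rng(seed)
    n_clusters = int(labels.max()) + 1
    # Keep colors away from black so clusters stand out from the empty background.
    colors = rng.uniform(0.2, 1.0, size = (n_clusters, 3))
    rgb = numpy.zeros(tuple(image_shape) + (3,))
    for xi, yi, li in zip(x, y, labels):
        rgb[yi, xi] = colors[li] if li >= 0 else (0.5, 0.5, 0.5)
    return rgb
```

The result can be displayed directly with `pyplot.imshow(rgb)`, since float RGB values in [0, 1] need no colormap.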


### Create a file with some statistics for each cluster

In [None]:
import storm_analysis.dbscan.dbscan_analysis as dbscanAnalysis

stats_name = dbscanAnalysis.clusterStats("clusters.hdf5", 10)

print()
print("Cluster statistics:")
with open(stats_name) as fp:
    for line in fp:
        print(line.strip())
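`clusterStats` writes its statistics to a text file; which columns it includes is not shown here. The hypothetical `cluster_stats` below sketches the kind of per-cluster summary one might compute with numpy: the number of members, the center, and the radius of gyration (the root-mean-square distance of the members from the center). The inputs, coordinate arrays plus labels with -1 for unclustered, are assumptions.

```python
import numpy


def cluster_stats(x, y, labels, min_size = 10):
    """Return (cluster id, size, center x, center y, radius of gyration) rows."""
    rows = []
    for cid in numpy.unique(labels[labels >= 0]):
        mask = labels == cid
        n = int(mask.sum())
        if n < min_size:
            continue
        cx = float(x[mask].mean())
        cy = float(y[mask].mean())
        # Radius of gyration: RMS distance of the members from the cluster center.
        rg = float(numpy.sqrt(numpy.mean((x[mask] - cx) ** 2 + (y[mask] - cy) ** 2)))
        rows.append((int(cid), n, cx, cy, rg))
    return rows
```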

### Working with Voronoi HDF5 clusters files

In [None]:
import storm_analysis.dbscan.clusters_sa_h5py as clSAH5Py

# The file format is essentially the same as the DBSCAN clusters file, so also
# see the dbscan_clustering Jupyter notebook.
with clSAH5Py.SAH5Clusters("clusters.hdf5") as cl_h5:
    
    # Get clustering program information.
    print("Analysis info", cl_h5.getClusteringInfo())
    
    # Get the number of clusters.
    print("Total clusters", cl_h5.getNClusters())
    
    # The Voronoi analysis includes a density for each localization/track, so we'll
    # iterate over all the localizations/tracks and make a histogram of this property.
    #
    # Use skip_unclustered = False to include all the localizations/tracks that were
    # not assigned to a cluster.
    #
    print()
    density = None
    for index, cluster in cl_h5.clustersIterator(fields = ["density"], skip_unclustered = False):
        
        # Use log of the density as the spread of densities is very large, particularly
        # in the clusters.
        log_density = numpy.log(cluster["density"] + 1.0e-6)
        [hist, bins] = numpy.histogram(log_density, bins = 40, range = (-15.0, 0.0))
        
        if density is None:
            density = hist
        else:
            density += hist
            
    centers = 0.5*(bins[1:] + bins[:-1])
    pyplot.plot(centers, density)
    pyplot.xlabel("log(density + 1.0e-6), density in 1/nm^2")
    pyplot.ylabel("Counts")
    pyplot.show()
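Summing the per-cluster histograms in the loop above is only valid because every call uses the same bins (`bins = 40, range = (-15.0, 0.0)`); with a fixed range, the sum equals the histogram of all the values at once. The quick check below uses synthetic stand-in values, not real densities. Note that `numpy.histogram` silently drops values outside `range`: thanks to the added 1.0e-6 the log values are never below log(1.0e-6), about -13.8, but densities above roughly 1 per nm^2 would fall outside the plotted range.

```python
import numpy

rng = numpy.random.default_rng(0)
# Synthetic stand-ins for per-cluster log densities.
chunks = [rng.uniform(-15.0, 0.0, size = n) for n in (50, 200, 7)]

# Accumulate per-chunk histograms, as the loop above does per cluster.
total = None
for chunk in chunks:
    hist, bins = numpy.histogram(chunk, bins = 40, range = (-15.0, 0.0))
    total = hist if total is None else total + hist

# Histogram of all the values at once, using the same fixed bins.
combined, _ = numpy.histogram(numpy.concatenate(chunks), bins = 40, range = (-15.0, 0.0))
print(numpy.array_equal(total, combined))
```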
            
