# Too Many or Too Few
### Sampling Bounds for Topological Descriptors

#### Authors

#### Link to paper

This [Jupyter](https://jupyter.org/) notebook is electronic supplementary material for the article.

## Smallest Stratum Experiment
NOTE: This experiment is the one in Section 4.2 of the paper, run with the 001 approximation.

Please run all steps in `preprocess.ipynb` before running this experiment.
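
As a quick precondition check, the sketch below tests whether the input directory is present and non-empty before the experiment starts. It assumes that the preprocessing step writes its graphs into the directory later assigned to `graphs_dir` (here `graphs_001_approx`), relative to this notebook; the helper name `preprocessed_graphs_present` is hypothetical and not part of the project code.

```python
from pathlib import Path


def preprocessed_graphs_present(graphs_dir: str) -> bool:
    """Return True if graphs_dir exists and contains at least one entry."""
    path = Path(graphs_dir)
    return path.is_dir() and any(path.iterdir())


# Assumption: preprocessing populates this directory next to the notebook.
if not preprocessed_graphs_present("graphs_001_approx"):
    print("No graphs found in graphs_001_approx; run preprocess.ipynb first.")
```

The check only looks for the directory and some content in it; it does not validate the graph files themselves.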

In [None]:
from main import get_exp_graphs, exp
import time
import random
import numpy as np
from IPython.display import clear_output

# Set the approximation type for EMNIST and MPEG7;
# choices include graphs_001_approx and graphs_005_approx
graphs_dir = "graphs_001_approx"
# Matching directory where the results are written
out_graphs_dir = "output_001_approx"

# Set up and execute the experiments; start the timer for the whole run
start = time.time()
# Seeds are needed for the random experiments only
random.seed(423652346)
np.random.seed(423652346)

#### exp type is:
#       1 for stratification experiment (distribution_exp)
#       2 for random sample experiment (sample_exp)
#       3 for smallest angle experiment (smallest_angle_exp)
#       4 for a uniform random sample experiment (uniform_sample_exp)
#       5 for all four exps
exp_type = 3

#### data is:
#       1 for random
#       2 for MPEG7 (classes from PHT paper - Turner et al.)
#       3 for EMNIST
#       4 for all three
#       5 for test
data_type = 4

exp_list = get_exp_graphs(data_type,graphs_dir,out_graphs_dir)

# Run the experiments
for counter, e in enumerate(exp_list, start=1):
    print("Graph "+str(counter)+" of "+str(len(exp_list)))
    exp(e["G"], e["output_file"], exp_type, out_graphs_dir)

print("Execution time: "+str(time.time() - start)+"(s)")


# Clear the output once the experiment has finished running
clear_output()

# Prepare Experiment files for PDF representation
Once the experiment has finished running, the following code combines the experiment data for statistical analysis.

In [None]:
# Alias the import so it does not shadow the standard-library random module
from combine_data import random as combine_random, mpeg7_mnist

# Combine the experiment files for statistical analysis
combine_random("smallest_angle_exp","001","angle_stats")
mpeg7_mnist("mpeg7","smallest_angle_exp", "001", "angle_stats")
mpeg7_mnist("mnist","smallest_angle_exp", "001", "angle_stats")

# Create PDFs
Here we use R to run the statistical analysis for the combined graphs. 

In [None]:
#Load R for statistical analysis
import rpy2.rinterface
%load_ext rpy2.ipython

In [None]:
%%R
# Reads data from output/combined_data/*/
# Generates graphs in figs/smallest_angle_exp/*/
# Graphs show:
#   x-axis: number of vertices
#   y-axis: ratio of the number of strata generated using the smallest
#           stratum size to the number of strata
# Creates both PDF and PNG files

source("plotting-code/analysis_001_approx-smallest_angle_exp-smallest_angle_graphs.R")
smallest_angle_exp_stat_001()

# Figures
Combine the output PNG files into a single image for viewing.

In [None]:
from combine_data import combine_pngs

combine_pngs('smallest_angle_exp', '001')

In [None]:
%%html

<div style="position:relative; width:100%; height:0px; padding-bottom:33.33%;">
    <img style="position:absolute; left:0; top:0; width:100%; height:100%"
        src="figs/smallest_angle_exp/smallest_angle_exp_001.png" frameborder="0" allowfullscreen>
    </img>
</div>

Log-log plot of the smallest stratum size versus the number of vertices for the datasets RANDPTS, EMNIST .001, and MPEG7 .001. For all three datasets, the smallest stratum size decreases proportionately with the number of vertices. In all experiments, we observe that even for graphs with thousands of vertices, the minimum stratum size drops below $10^{-5}$. Building a descriptor set using a uniform discretization with such a small stratum size would require hundreds of thousands of descriptors, far too many to be of any use in practice. (Figure 2 in the paper.)
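
To make the scale of that descriptor count concrete, here is a minimal arithmetic sketch. It assumes that the stratum size is an arc length in radians on the unit circle of directions, and that a uniform discretization places one descriptor per sample direction with spacing no larger than the smallest stratum. The function name `uniform_direction_count` is hypothetical and not part of the experiment code.

```python
import math


def uniform_direction_count(min_stratum_size: float) -> int:
    """Number of evenly spaced directions on the unit circle needed so that
    consecutive directions are at most min_stratum_size radians apart."""
    if min_stratum_size <= 0:
        raise ValueError("min_stratum_size must be positive")
    # The full circle has circumference 2*pi; round up to cover all of it.
    return math.ceil(2 * math.pi / min_stratum_size)


for size in (1e-2, 1e-3, 1e-5):
    print(f"stratum size {size:g}: {uniform_direction_count(size)} directions")
```

Under these assumptions, a minimum stratum size of $10^{-5}$ already calls for more than 600,000 directions, in line with the "hundreds of thousands" estimate above.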