# Too Many or Too Few
### Sampling Bounds for Topological Descriptors

#### Authors

#### Link to paper

This [Jupyter](https://jupyter.org/) notebook is electronic supplementary material for the article.

## Uniform Random Sample Experiment
NOTE: This experiment corresponds to Section 4.3 of the paper and uses the `001` approximation (`graphs_001_approx`).

Please run all steps in `preprocess.ipynb` before running this experiment.

In [None]:
from main import *
from IPython.display import clear_output
# Explicit imports for the names used below (main may also re-export them)
import random
import time

import numpy as np

# Select the approximation type used for EMNIST and MPEG7:
# either "graphs_001_approx" or "graphs_005_approx"
graphs_dir = "graphs_001_approx"
# Matching directory where the results are written
out_graphs_dir = "output_001_approx"

# Set up and run the experiment
if __name__ == "__main__":
    start = time.time()
    # Seed both generators (needed only for the random experiments)
    random.seed(423652346)
    np.random.seed(423652346)

    exp_list = get_exp_graphs(4, graphs_dir, out_graphs_dir)

    for counter, e in enumerate(exp_list, start=1):
        print("Graph " + str(counter) + " of " + str(len(exp_list)))
        exp(e["G"], e["output_file"], 4, out_graphs_dir)

    print("Execution time: " + str(time.time() - start) + " (s)")

# Clear the cell output once the experiment has finished
clear_output()
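The experiment cell seeds both `random` and `np.random` because the sampled directions are random. As a minimal sketch, assuming the directions are drawn as independent uniform angles on $S^1$ (this is not the code in `main`, and the helper name `sample_directions` is hypothetical), seeded sampling of 16384 directions could look like this:

```python
import numpy as np


def sample_directions(num_directions, seed=None):
    """Draw directions uniformly at random from the unit circle S^1.

    Hypothetical helper: returns the angles (radians, in [0, 2*pi))
    and the matching unit vectors as an (num_directions, 2) array.
    """
    rng = np.random.default_rng(seed)
    # A uniform angle gives a uniform point on S^1, since arc length is proportional to angle.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=num_directions)
    directions = np.column_stack((np.cos(angles), np.sin(angles)))
    return angles, directions


angles, directions = sample_directions(16384, seed=423652346)
print(directions.shape)  # (16384, 2)
```

Because this sketch uses NumPy's `default_rng` instead of the legacy global `np.random.seed`, it will not reproduce the exact directions drawn by the experiment; it only illustrates how a fixed seed makes uniform direction sampling repeatable.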

# Prepare Experiment Files for the PDF Figures
Once the experiment has finished running, the following cell combines the experiment data for statistical analysis.

In [None]:
from uniform_sample_combine_data import *

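# Combine the per-graph experiment files for statistical analysis.
# Note: `random` here is the helper imported above, not Python's random module.
random("uniform_sample_exp", "001")                # RANDPTS
mpeg7_mnist("mpeg7", "uniform_sample_exp", "001")  # MPEG7
mpeg7_mnist("mnist", "uniform_sample_exp", "001")  # EMNIST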
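The combined data feed a plot of the hit ratio against the number of vertices. The file format written by `uniform_sample_combine_data` is not documented here, so the following sketch assumes hypothetical per-graph records with the keys `dataset`, `num_vertices`, `strata_hit`, and `strata_total`, and shows how such records could be turned into one series of (vertices, ratio) points per dataset:

```python
from collections import defaultdict


def ratio_points(records):
    """Group per-graph records into (num_vertices, hit ratio) points per dataset.

    Each record is assumed to be a dict with the keys 'dataset', 'num_vertices',
    'strata_hit' and 'strata_total' (a hypothetical format, not the repository's).
    """
    points = defaultdict(list)
    for rec in records:
        # Ratio of strata hit by the sampled directions to all strata of this graph
        ratio = rec["strata_hit"] / rec["strata_total"]
        points[rec["dataset"]].append((rec["num_vertices"], ratio))
    # Sort each series by vertex count so it can be plotted left to right.
    return {name: sorted(series) for name, series in points.items()}
```

Each returned series corresponds to one panel of the figure, with the vertex count on the x-axis and the hit ratio on the y-axis.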

# Create PDFs
Here we use R, through rpy2, to run the statistical analysis on the combined data.

In [None]:
# Load the rpy2 IPython extension so that R code can be run with %%R
import rpy2.rinterface
%load_ext rpy2.ipython

In [None]:
%%R
# Reads data from output/combined_data/*/
# Generates graphs in figs/uniform_sample_exp/*/
# Graphs show:
#   x-axis: number of vertices
#   y-axis: ratio of strata hit by the sampled directions to the total
#           number of strata
# Creates both PDF and PNG files

source("plotting-code/analysis_001_approx-uniform_sample_exp-uniform_sample.R")
uniform_sample_exp_stat_001()

# Figures
Combine the output PNG files for viewing.

In [None]:
from uniform_sample_combine_data import combine_pngs

combine_pngs('uniform_sample_exp')

In [None]:
%%html
<!-- View plots -->
<div style="position:relative; width:100%; height:0px; padding-bottom:33.33%;">
    <img style="position:absolute; left:0; top:0; width:100%; height:100%"
        src="figs/uniform_sample_exp/uniform_sample_exp_001.png"
        alt="Ratio of strata hit versus number of vertices for RANDPTS, EMNIST .001, and MPEG7 .001">
</div>

Plot of the ratio of strata hit to the total number of strata versus the number of vertices for RANDPTS, EMNIST .001, and MPEG7 .001. We select a set $\Delta$ of 16384 uniformly distributed directions from $S^1$. For each graph, we compute the Sufficient Stratification and the proportion of its strata hit by $\Delta$. We observe that in all cases, even for small graphs with 60 or more vertices, we miss more than 10% of the strata; if we want to use a small set of descriptors, we must therefore be willing to miss some strata. (Figure 3 in the paper.)
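To make the quantity in the figure concrete, the sketch below assumes a simplified model that may differ from the paper's Sufficient Stratification: a set of distinct critical angles splits $S^1$ into open arcs, and an arc counts as hit when at least one sampled direction falls strictly inside it. The names `fraction_of_arcs_hit` and `miss_probability` are hypothetical. The second function uses the fact that $N$ independent uniform directions all miss an arc of length $\ell$ with probability $(1 - \ell / 2\pi)^N$.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def fraction_of_arcs_hit(critical_angles, sample_angles):
    """Fraction of open arcs between consecutive critical angles hit by a sample.

    Simplified, hypothetical model of a stratification of S^1: the distinct
    critical angles (radians) split the circle into len(critical_angles) arcs.
    """
    # Normalize to [0, 2*pi), sort, and drop duplicate critical angles.
    crit = np.unique(np.mod(np.asarray(critical_angles, dtype=float), TWO_PI))
    if crit.size == 0:
        raise ValueError("need at least one critical angle")
    samples = np.mod(np.asarray(sample_angles, dtype=float), TWO_PI)

    # Index of the largest critical angle <= sample, so arc i is (crit[i], crit[i+1]).
    idx = np.searchsorted(crit, samples, side="right") - 1
    # Samples before crit[0] give -1 and belong to the wrap-around arc
    # (crit[-1], crit[0] + 2*pi), whose index is crit.size - 1.
    idx = np.mod(idx, crit.size)

    # A sample exactly on a critical angle lies in no open arc.
    inside = ~np.isin(samples, crit)
    return np.unique(idx[inside]).size / crit.size


def miss_probability(arc_length, num_samples):
    """Probability that num_samples independent uniform directions all miss an arc."""
    return (1.0 - arc_length / TWO_PI) ** num_samples
```

For example, `miss_probability(2 * np.pi / 16384, 16384)` is about 0.37: an arc as short as the average gap between 16384 samples is missed roughly a third of the time, which illustrates how thin strata can escape even a large set of directions.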