# Too Many or Too Few
### Sampling Bounds for Topological Descriptors

#### Authors

#### Link to paper

This [Jupyter](https://jupyter.org/) notebook is electronic supplementary material for the article.


## Preprocessing
#### The following code will download the MPEG7/EMNIST data and store it in the appropriate folders.
You can find the MPEG7 data at: http://www.dabi.temple.edu/~shape/MPEG7
    
You can find the EMNIST data at: https://www.nist.gov/itl/products-and-services/emnist-dataset

In [None]:
from get_data import *

preprocess_data(dir_list)

#### The following code will create the graphs for RANDPTS.

In [None]:
from build_graphs import *
randpts_graphs()

#### The following code will create the graphs for MPEG7 data. 

In [None]:
from build_graphs import *

# Change eps according to how close the approximation should be
eps = .001

# Change to the directory that corresponds to eps
graphs_dir = "graphs_001_approx"

mpeg7_graphs(eps, graphs_dir)

#### The following code will create the graphs for EMNIST data. 

In [None]:
from build_graphs import *

# Change eps according to how close the approximation should be
eps = .001

# Change to the directory that corresponds to eps
graphs_dir = "graphs_001_approx"

mnist_graphs(eps, graphs_dir)
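
The cells above pair eps = .001 with the directory graphs_001_approx, and the experiment cells mention graphs_005_approx as the other choice. The sketch below shows one way to keep eps and the directory name in sync. It assumes the name embeds the three decimal digits of eps, which the notebook does not state explicitly. The helper approx_dir_name is hypothetical and is not part of build_graphs.

```python
def approx_dir_name(eps, prefix="graphs"):
    """Hypothetical helper: build a directory name such as 'graphs_001_approx'.

    Assumes (not confirmed by the notebook) that the name embeds the three
    decimal digits of eps, so that .001 -> '001' and .005 -> '005'.
    """
    if not 0 < eps < 1:
        raise ValueError("eps is expected to lie strictly between 0 and 1")
    digits = f"{eps:.3f}".split(".")[1]  # e.g. 0.001 -> '001'
    return f"{prefix}_{digits}_approx"


eps = .001
graphs_dir = approx_dir_name(eps)                # 'graphs_001_approx'
out_graphs_dir = approx_dir_name(eps, "output")  # 'output_001_approx'
```

Deriving both names from a single eps value avoids editing the number in one place and forgetting the directory string in another.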

## Running experiments
Once the graphs are created, the following code will run each of the experiments for the RANDPTS, MPEG7, and EMNIST data sets.

Note that you can change graphs_dir and out_graphs_dir to select which boundary approximation the experiments use.
You can also choose to run this experiment for all datasets or just a particular dataset by specifying data_type.

### Smallest Stratum Experiment
NOTE: This experiment is Section 4.2 of the paper

In [None]:
from main import *

# modify for the approximation type used for emnist and mpeg7
# choices include graphs_001_approx and graphs_005_approx
graphs_dir = "graphs_001_approx"
# same as above but specifies where to write results
out_graphs_dir = "output_001_approx"

# main function for setting up and executing experiments
if __name__ == "__main__":
  start = time.time()
  # Set for random experiments only
  random.seed(423652346)
  np.random.seed(423652346)

  #### exp type is:
  #       1 for stratification experiment (distribution_exp)
  #       2 for random sample experiment (sample_exp)
  #       3 for smallest angle experiment (smallest_angle_exp)
  #       4 for a uniform random sample experiment (uniform_sample_exp)
  #       5 for all four exps
  exp_type = 3
  #### data is:
  #       1 for random
  #       2 for MPEG7 (classes from PHT paper - Turner et al.)
  #       3 for EMNIST
  #       4 for all three
  #       5 for test
  data_type = 4

  # Use the data_type and exp_type set above instead of hard-coded values
  exp_list = get_exp_graphs(data_type,graphs_dir,out_graphs_dir)

  counter = 1
  for e in exp_list:
    print("Graph "+str(counter)+" of "+str(len(exp_list)))
    exp(e["G"], e["output_file"], exp_type, out_graphs_dir)
    counter+=1

  print("Execution time: "+str(time.time() - start)+"(s)")
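
The integer codes for exp_type and data_type are documented only in the comments of the cell above. As a minimal sketch, the lookup tables below turn those codes into readable labels and reject unknown values before a long run starts. The labels are copied from the comments. The names EXP_TYPES, DATA_TYPES, and describe_run are hypothetical and are not part of main.

```python
# Labels copied from the comments in the experiment cell; the dictionaries
# and describe_run are hypothetical helpers, not part of main.
EXP_TYPES = {
    1: "distribution_exp",
    2: "sample_exp",
    3: "smallest_angle_exp",
    4: "uniform_sample_exp",
    5: "all four exps",
}
DATA_TYPES = {
    1: "random",
    2: "MPEG7",
    3: "EMNIST",
    4: "all three",
    5: "test",
}


def describe_run(exp_type, data_type):
    """Return a readable summary, raising ValueError on unknown codes."""
    if exp_type not in EXP_TYPES:
        raise ValueError(f"unknown exp_type: {exp_type}")
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data_type: {data_type}")
    return f"experiment={EXP_TYPES[exp_type]}, data={DATA_TYPES[data_type]}"


print(describe_run(3, 4))  # experiment=smallest_angle_exp, data=all three
```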

# Prepare Experiment files for PDF representation

In [None]:
from combine_data import *

#Combines experiment files for each approximation 
mpeg7_mnist("mpeg7","smallest_angle_exp", "001")
mpeg7_mnist("mnist","smallest_angle_exp", "001")

random("005", "smallest_angle_exp")
mpeg7_mnist("mpeg7","smallest_angle_exp", "005")
mpeg7_mnist("mnist","smallest_angle_exp", "005")

In [None]:
import pandas as pd
df = pd.DataFrame({
    'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]
})

In [None]:
import rpy2.rinterface
%load_ext rpy2.ipython

In [None]:
%%R -i df -w 5 -h 5 --units in -r 200
# import df from global environment
# make default figure size 5 by 5 inches with 200 dpi resolution

install.packages("ggplot2", repos='http://cran.us.r-project.org', quiet=TRUE)
library(ggplot2)
ggplot(df, aes(x=cups_of_coffee, y=productivity)) + geom_line()

# Figures

### Uniform Random Sample Experiment
NOTE: This experiment is Section 4.3 of the paper

In [None]:
from main import *
# modify for the approximation type used for emnist and mpeg7
# choices include graphs_001_approx and graphs_005_approx
graphs_dir = "graphs_005_approx"
# same as above but specifies where to write results
out_graphs_dir = "output_005_approx"

# main function for setting up and executing experiments
if __name__ == "__main__":
  start = time.time()
  # Set for random experiments only
  random.seed(423652346)
  np.random.seed(423652346)

  #### exp type is:
  #       1 for stratification experiment (distribution_exp)
  #       2 for random sample experiment (sample_exp)
  #       3 for smallest angle experiment (smallest_angle_exp)
  #       4 for a uniform random sample experiment (uniform_sample_exp)
  #       5 for all four exps
  exp_type = 4
  #### data is:
  #       1 for random
  #       2 for MPEG7 (classes from PHT paper - Turner et al.)
  #       3 for EMNIST
  #       4 for all three
  #       5 for test
  data_type = 4

  # Use the data_type and exp_type set above instead of hard-coded values
  exp_list = get_exp_graphs(data_type,graphs_dir,out_graphs_dir)

  counter = 1
  for e in exp_list:
    print("Graph "+str(counter)+" of "+str(len(exp_list)))
    exp(e["G"], e["output_file"], exp_type, out_graphs_dir)
    counter+=1

  print("Execution time: "+str(time.time() - start)+"(s)")

Graph 1 of 8317
Num arcs: 1560
output_005_approx/uniform_sample_exp/random/RAND_40_80.txt
Graph 2 of 8317
Num arcs: 870
output_005_approx/uniform_sample_exp/random/RAND_30_13.txt
Graph 3 of 8317
Num arcs: 1560
output_005_approx/uniform_sample_exp/random/RAND_40_90.txt
Graph 4 of 8317
Not running exp, too many arcs: 8010
Graph 5 of 8317
Not running exp, too many arcs: 6320
Graph 6 of 8317
Not running exp, too many arcs: 6320
Graph 7 of 8317
Not running exp, too many arcs: 9900
Graph 8 of 8317
Num arcs: 3540
output_005_approx/uniform_sample_exp/random/RAND_60_70.txt
Graph 9 of 8317
Not running exp, too many arcs: 6320
Graph 10 of 8317
Not running exp, too many arcs: 9900
Graph 11 of 8317
Num arcs: 3540
output_005_approx/uniform_sample_exp/random/RAND_60_60.txt
Graph 12 of 8317
Not running exp, too many arcs: 6320
Graph 13 of 8317
Num arcs: 2450
output_005_approx/uniform_sample_exp/random/RAND_50_26.txt
Graph 14 of 8317
Num arcs: 2450
output_005_approx/uniform_sample_exp/random/RAND_50_36
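
The output above interleaves graphs that were processed ("Num arcs: ...") with graphs that were skipped ("Not running exp, too many arcs: ..."). In this excerpt, every graph that ran has at most 3540 arcs and every skipped graph has at least 6320. The actual cutoff is set in the experiment code and is not shown here. As a minimal sketch, assuming the log keeps exactly these two line formats, the hypothetical helper summarize_log below tallies how many graphs ran and how many were skipped. The sample lines are copied from the output above.

```python
import re


def summarize_log(lines):
    """Count graphs that ran versus graphs skipped for having too many arcs."""
    ran_arcs, skipped_arcs = [], []
    for line in lines:
        line = line.strip()
        m = re.fullmatch(r"Num arcs: (\d+)", line)
        if m:
            ran_arcs.append(int(m.group(1)))
            continue
        m = re.fullmatch(r"Not running exp, too many arcs: (\d+)", line)
        if m:
            skipped_arcs.append(int(m.group(1)))
    return {
        "ran": len(ran_arcs),
        "skipped": len(skipped_arcs),
        "max_arcs_run": max(ran_arcs, default=None),
        "min_arcs_skipped": min(skipped_arcs, default=None),
    }


# A few lines copied from the cell output above
sample = """Graph 1 of 8317
Num arcs: 1560
output_005_approx/uniform_sample_exp/random/RAND_40_80.txt
Graph 4 of 8317
Not running exp, too many arcs: 8010
Graph 8 of 8317
Num arcs: 3540
output_005_approx/uniform_sample_exp/random/RAND_60_70.txt
Graph 9 of 8317
Not running exp, too many arcs: 6320
""".splitlines()

print(summarize_log(sample))
```

The gap between max_arcs_run and min_arcs_skipped only brackets the cutoff; it does not identify the exact threshold.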

# Prepare Experiment files for PDF representation

In [None]:
from combine_data import *

#Combines experiment files for each approximation 
mpeg7_mnist("mpeg7","uniform_sample_exp", "001")
mpeg7_mnist("mnist","uniform_sample_exp", "001")

random("005", "uniform_sample_exp")
mpeg7_mnist("mpeg7", "uniform_sample_exp", "005")
mpeg7_mnist("mnist", "uniform_sample_exp", "005")

## Figures