The Replacement Bootstrap for Dependent Data
C++ Makefile Shell Matlab
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
.idea
matlab_tools
Makefile
README.md
boot_n500_b1000_r500.tar.gz
bootstrap_methods.cpp
bootstrap_methods.h
bootstrap_methods.o
functions.cpp
functions.h
functions.o
gzstream.h
libgzstream.a
main
main.cpp
main.o
origseq_n500.dat.gz
rboot_ISIT_2015.pdf
run.sh

README.md

replacementbootstrap

The Replacement Bootstrap for Dependent Data

These files are the minimum viable files to run the algorithm from

  • The Replacement Bootstrap for Dependent Data, with Alessandro Lazaric and Daniil Ryabko, ISIT 2015

Notes

The existing code does not take advantage of any memory optimizations (preferring cache memory when possible) or parallelization (it's naively parallelizable in the number of bootstraps and alphabet size) because the utility depends on the parameter settings. Auto-detection if memory optimization or multiple-cores are possible would be a great upgrade.

Minimum viable run file

The run.sh file will demonstrate the code.

Parameters

orig_file_name

  • The sequence to bootstrap
  • Currently a binary encoded csv in the form of a gzip file.
  • Would be nice to include flags for csv or json input files as well, along with a skip header option.

output_file_name

  • Would be nice to include this as an option.

bootstraps

  • Number of bootstraps
  • default = 1,000
  • minimum = 1,000
  • maximum = 100,000

A

  • Alphabet Size must be a power of 2 for now.
  • It would be nice to include the option of optimal binning or other unsupervised quantization/discretization.
  • The constraint of power of 2 is in the existing code design and not the algorithm.
  • default = 2 (binary)
  • minimum power value = 2
  • maximum power value = 1,024

n

  • Sequence length
  • default = autodetect length based on input sequence
  • minimum = 10
  • maximum = 100000

replacement_percentage

  • The replacement percentage for the algorithm
  • default = 100
  • minimum = 0
  • maximum = 10,000

DOI