# Benchmark discretization

Time the data discretization and MI matrix calculation steps for datasets with different numbers of cells, then plot the results.

## More details
Data are generated randomly for this test. The number of calculations depends on the number of _bins_ used when the data are discretized, rather than directly on the number of cells. The number of cells affects the time to infer a network only through its effect on the number of bins.
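
One way to see why the bin count matters: once two genes have been discretized, a mutual information estimate can be computed from their joint frequency table, and the final sum visits one entry per pair of bins. The sketch below is a minimal plain-Julia illustration (assuming Julia 1.x), not the implementation used by `InformationMeasures`; the function name `mutual_information_from_bins` and its arguments are hypothetical.

```julia
# Hypothetical helper: maximum-likelihood MI (in bits) from two vectors of bin ids.
# bx and by hold bin ids in 1:nx and 1:ny, one entry per cell.
function mutual_information_from_bins(bx::Vector{Int}, by::Vector{Int}, nx::Int, ny::Int)
    n = length(bx)
    joint = zeros(Float64, nx, ny)
    for i in 1:n                      # one pass over the cells to count
        joint[bx[i], by[i]] += 1
    end
    joint ./= n                       # counts -> joint probabilities
    px = sum(joint, dims = 2)         # marginal over the bins of x
    py = sum(joint, dims = 1)         # marginal over the bins of y
    mi = 0.0
    for j in 1:ny, i in 1:nx          # nx * ny terms: cost grows with the bin count
        p = joint[i, j]
        if p > 0
            mi += p * log2(p / (px[i] * py[j]))
        end
    end
    return mi
end

mi_dependent = mutual_information_from_bins([1, 1, 2, 2], [1, 1, 2, 2], 2, 2)
mi_independent = mutual_information_from_bins([1, 1, 2, 2], [1, 2, 1, 2], 2, 2)
```

In this sketch the counting loop is linear in the number of cells, while the summation loop scales with the product of the two bin counts.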

By default, this script times data discretization using the recommended Bayesian blocks discretization algorithm. The uniform width algorithm, being simpler, is much quicker.
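
To illustrate why uniform width binning is cheap, here is a minimal plain-Julia sketch of the idea: find the range of the values, split it into equal-width intervals, and assign each value to an interval in a single pass. The function name `uniform_width_bin_ids` is hypothetical and this is not the package's own code. Bayesian blocks, by contrast, chooses variable-width bins by searching over candidate bin edges, which typically takes considerably more work per gene.

```julia
# Hypothetical helper: assign each value to one of `number_of_bins` equal-width bins.
function uniform_width_bin_ids(values::Vector{Float64}, number_of_bins::Int)
    lo, hi = extrema(values)
    width = (hi - lo) / number_of_bins
    # All values identical: put everything in the first bin.
    width == 0 && return ones(Int, length(values))
    # The maximum value would land one past the last bin, so clamp it.
    return [min(number_of_bins, floor(Int, (v - lo) / width) + 1) for v in values]
end

bin_ids = uniform_width_bin_ids([0.0, 1.0, 2.0, 3.0, 4.0], 2)
```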

Fig. 7B varies the number of genes, which affects the MI matrix calculation step. This is further explored in __Benchmark MI matrix__. For timings related to PIDC network inference, see __Benchmark network inference__.

In [None]:
# Include packages

using NetworkInference
using InformationMeasures
using PyPlot

include("../helper_functions.jl")

In [None]:
# Customize options (defaults are consistent with Fig. 7B)

algorithm = PIDCNetworkInference()
discretizer = "bayesian_blocks"
number_of_genes = 500
min_number_of_cells = 1000
max_number_of_cells = 5000
step = 1000;

In [None]:
# Time each dataset size (sizes: numbers of cells, times: elapsed seconds)

sizes, times = get_times_per_number_of_cells(algorithm, discretizer, number_of_genes, min_number_of_cells, max_number_of_cells, step)
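
The helper above is defined in `helper_functions.jl` and is not shown here. As a rough illustration of what a timing loop of this kind might look like, the sketch below generates random data for each number of cells and times an arbitrary function on it with `@elapsed`. Everything in it is an assumption for illustration: the name `time_per_number_of_cells`, the genes-by-cells layout, and the stand-in workload (a row-wise sort) are hypothetical and do not reproduce the real helper.

```julia
# Hypothetical stand-in for a timing helper: runs `work` on random data
# of increasing size and records the elapsed wall-clock time in seconds.
function time_per_number_of_cells(work, number_of_genes, cell_counts)
    elapsed_times = Float64[]
    for number_of_cells in cell_counts
        data = rand(number_of_genes, number_of_cells)   # random genes x cells matrix
        push!(elapsed_times, @elapsed work(data))
    end
    return collect(cell_counts), elapsed_times
end

# Stand-in workload: sort each gene's values (not the real discretization step).
demo_sizes, demo_times = time_per_number_of_cells(data -> sort(data, dims = 2), 10, 100:100:300)
```

Note that in Julia the first timed call usually includes compilation time, so a warm-up call before the loop gives more representative numbers.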

In [None]:
# Plot times

plot(sizes, times)
xlabel("No. of cells", fontsize = 14)
ylabel("Seconds", fontsize = 14)
# Tick positions match the default options above
xticks(collect(1000:1000:5000), fontsize = 12)
yticks(collect(0:100:300), fontsize = 12)
title("Time to infer MI network", fontsize = 16)