diff --git a/docs/src/api.md b/docs/src/api.md index a9bcb288..f5962782 100644 --- a/docs/src/api.md +++ b/docs/src/api.md @@ -1,12 +1,15 @@ # API -This is the API page of the package. For a general overview of the functionalities -consult the [ReadMe](https://github.com/JuliaDynamics/StreamSampling.jl). - -## General Functionalities +## Types ```@docs ReservoirSampler +StreamSampler +``` + +## Methods + +```@docs fit! merge! merge @@ -14,11 +17,10 @@ empty! value ordvalue nobs -StreamSampler itsample ``` -## Sampling Algorithms +## Algorithms ```@docs AlgR diff --git a/docs/src/basics.md b/docs/src/basics.md index ef6ab9b5..b33edc35 100644 --- a/docs/src/basics.md +++ b/docs/src/basics.md @@ -3,7 +3,7 @@ The `itsample` function allows to consume all the stream at once and return the sample collected: -```julia +```@example 1 using StreamSampling st = 1:100; @@ -14,9 +14,7 @@ itsample(st, 5) In some cases, one needs to control the updates the `ReservoirSampler` will be subject to. In this case you can simply use the `fit!` function to update the reservoir: -```julia -using StreamSampling - +```@example 1 st = 1:100; rs = ReservoirSampler{Int}(5); @@ -31,9 +29,7 @@ value(rs) If the total number of elements in the stream is known beforehand and the sampling is unweighted, it is also possible to iterate over a `StreamSampler` like so -```julia -using StreamSampling - +```@example 1 st = 1:100; ss = StreamSampler{Int}(st, 5, 100); @@ -50,6 +46,3 @@ r The advantage of `StreamSampler` iterators in respect to `ReservoirSampler` is that they require `O(1)` memory if not collected, while reservoir techniques require `O(k)` memory where `k` is the number of elements in the sample. - -Consult the [API page](https://juliadynamics.github.io/StreamSampling.jl/stable/api) for more information -about the package interface. 
\ No newline at end of file diff --git a/docs/src/example.md b/docs/src/example.md index b21163b4..397ec2c4 100644 --- a/docs/src/example.md +++ b/docs/src/example.md @@ -9,7 +9,7 @@ assume that the monitored statistic in this case is the mean of the data, and you want that to be lower than a certain threshold otherwise some malfunctioning is expected. -```julia +```@example 1 using StreamSampling, Statistics, Random function monitor(stream, thr) @@ -29,21 +29,21 @@ end We use some toy data for illustration -```julia +```@example 1 stream = 1:10^8; # the data stream thr = 2*10^7; # the threshold for the mean monitoring ``` Then, we run the monitoring -```julia +```@example 1 rs = monitor(stream, thr); ``` The number of observations until the detection is triggered is given by -```julia +```@example 1 nobs(rs) ``` diff --git a/docs/src/perf_tips.md b/docs/src/perf_tips.md index 8a97ed70..7fe632cf 100644 --- a/docs/src/perf_tips.md +++ b/docs/src/perf_tips.md @@ -1,5 +1,6 @@ +# Performance Tips -# Use Immutable Reservoir Samplers +## Use Immutable Reservoir Samplers By default, a `ReservoirSampler` is mutable, however, it is also possible to use an immutable version which supports @@ -9,8 +10,8 @@ hood to update the reservoir. Let's compare the performance of mutable and immutable samplers with a simple benchmark -```julia -using BenchmarkTools +```@example 1 +using StreamSampling, BenchmarkTools function fit_iter!(rs, iter) for i in iter @@ -24,11 +25,11 @@ iter = 1:10^7; Running with both version we get -```julia +```@example 1 @btime fit_iter!(rs, $iter) setup=(rs = ReservoirSampler{Int}(10, AlgRSWRSKIP(); mutable = true)) ``` -```julia +```@example 1 @btime fit_iter!(rs, $iter) setup=(rs = ReservoirSampler{Int}(10, AlgRSWRSKIP(); mutable = false)) ``` @@ -39,7 +40,7 @@ will be faster than the mutable one. Be careful though, because calling `fit!` on an immutable sampler won't modify it in-place, but only create a new updated instance. 
-# Parallel Sampling from Multiple Streams +## Parallel Sampling from Multiple Streams Let's say that you want to split the sampling of an iterator. If you can split the iterator into different partitions then you can update in parallel a reservoir sample for each partition and then @@ -47,19 +48,19 @@ merge them together at the end. Suppose for instance to have these 2 iterators -```julia +```@example 1 iters = [1:100, 101:200] ``` then you create two reservoirs of the same type -```julia +```@example 1 rs = [ReservoirSampler{Int}(10, AlgRSWRSKIP()) for i in 1:length(iters)] ``` and after that you can just update them in parallel like so -```julia +```@example 1 Threads.@threads for i in 1:length(iters) for e in iters[i] fit!(rs[i], e) @@ -70,6 +71,6 @@ end then you can obtain a unique reservoir containing a summary of the union of the streams with -```julia +```@example 1 merge(rs...) ```
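
The parallel-sampling steps in the last hunk can be gathered into one function, which makes it easier to check that the `@example` blocks fit together. This is only a sketch under stated assumptions: `sample_partitions` is a hypothetical helper name that does not exist in StreamSampling.jl, and the calls to `ReservoirSampler`, `fit!`, `merge` and `value` are the ones already shown in the docs being changed.

```julia
using StreamSampling

# `sample_partitions` is a hypothetical helper, not part of StreamSampling.jl.
# It keeps one reservoir per partition, fills the reservoirs in parallel,
# and merges them into a single reservoir at the end.
function sample_partitions(iters, k)
    # One reservoir per partition (mutable by default, so `fit!` updates in place).
    rs = [ReservoirSampler{Int}(k, AlgRSWRSKIP()) for _ in eachindex(iters)]
    # Each iteration only touches its own reservoir, so no locking is needed.
    Threads.@threads for i in eachindex(iters)
        for e in iters[i]
            fit!(rs[i], e)
        end
    end
    # Combine the per-partition reservoirs into a summary of the union.
    return merge(rs...)
end

merged = sample_partitions([1:100, 101:200], 10)
sample = value(merged)
```

The sketch deliberately uses mutable samplers. With `mutable = false`, `fit!` returns a new instance instead of updating in place, so the inner loop would have to store the result back (for example `rs[i] = fit!(rs[i], e)`).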