From 618bd02178acd9de5f8946837b0e33f3a6f35452 Mon Sep 17 00:00:00 2001
From: Adriano Meligrana <68152031+Tortar@users.noreply.github.com>
Date: Wed, 13 Aug 2025 02:28:19 +0200
Subject: [PATCH 1/6] Show results of code blocks

---
 docs/src/basics.md | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/docs/src/basics.md b/docs/src/basics.md
index ef6ab9b5..52f80e41 100644
--- a/docs/src/basics.md
+++ b/docs/src/basics.md
@@ -3,7 +3,7 @@
 The `itsample` function allows to consume all the stream at once and return the sample collected:

-```julia
+```@example 1
 using StreamSampling

 st = 1:100;
@@ -14,9 +14,7 @@ itsample(st, 5)
 In some cases, one needs to control the updates the `ReservoirSampler` will be subject to. In this case you can simply use the `fit!` function to update the reservoir:

-```julia
-using StreamSampling
-
+```@example 1
 st = 1:100;

 rs = ReservoirSampler{Int}(5);
@@ -31,9 +29,7 @@ value(rs)
 If the total number of elements in the stream is known beforehand and the sampling is unweighted, it is also possible to iterate over a `StreamSampler` like so

-```julia
-using StreamSampling
-
+```@example 1
 st = 1:100;

 ss = StreamSampler{Int}(st, 5, 100);
@@ -52,4 +48,4 @@ memory if not collected, while reservoir techniques require `O(k)` memory where
 of elements in the sample.

 Consult the [API page](https://juliadynamics.github.io/StreamSampling.jl/stable/api) for more information
-about the package interface.
\ No newline at end of file
+about the package interface.

From 02bfc4c3fb2d640adb765c8c8b5635bedf88a9df Mon Sep 17 00:00:00 2001
From: Adriano Meligrana <68152031+Tortar@users.noreply.github.com>
Date: Wed, 13 Aug 2025 02:29:19 +0200
Subject: [PATCH 2/6] Update example.md

---
 docs/src/example.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/src/example.md b/docs/src/example.md
index b21163b4..397ec2c4 100644
--- a/docs/src/example.md
+++ b/docs/src/example.md
@@ -9,7 +9,7 @@ assume that the monitored statistic in this case is the mean of the data, and
 you want that to be lower than a certain threshold otherwise some malfunctioning is expected.

-```julia
+```@example 1
 using StreamSampling, Statistics, Random

 function monitor(stream, thr)
@@ -29,21 +29,21 @@ end

 We use some toy data for illustration

-```julia
+```@example 1
 stream = 1:10^8; # the data stream
 thr = 2*10^7; # the threshold for the mean monitoring
 ```

 Then, we run the monitoring

-```julia
+```@example 1
 rs = monitor(stream, thr);
 ```

 The number of observations until the detection is triggered is given by

-```julia
+```@example 1
 nobs(rs)
 ```

From 80c491a5a38ebdce6965f1a60b47b252ff33e1ac Mon Sep 17 00:00:00 2001
From: Adriano Meligrana <68152031+Tortar@users.noreply.github.com>
Date: Wed, 13 Aug 2025 02:29:59 +0200
Subject: [PATCH 3/6] Update perf_tips.md

---
 docs/src/perf_tips.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/docs/src/perf_tips.md b/docs/src/perf_tips.md
index 8a97ed70..b604096b 100644
--- a/docs/src/perf_tips.md
+++ b/docs/src/perf_tips.md
@@ -9,8 +9,8 @@ hood to update the reservoir.

 Let's compare the performance of mutable and immutable samplers with a simple benchmark

-```julia
-using BenchmarkTools
+```@example 1
+using StreamSampling, BenchmarkTools

 function fit_iter!(rs, iter)
     for i in iter
@@ -24,11 +24,11 @@ iter = 1:10^7;

 Running with both version we get

-```julia
+```@example 1
 @btime fit_iter!(rs, $iter) setup=(rs = ReservoirSampler{Int}(10, AlgRSWRSKIP(); mutable = true))
 ```

-```julia
+```@example 1
 @btime fit_iter!(rs, $iter) setup=(rs = ReservoirSampler{Int}(10, AlgRSWRSKIP(); mutable = false))
 ```

@@ -47,19 +47,19 @@ merge them together at the end.

 Suppose for instance to have these 2 iterators

-```julia
+```@example 1
 iters = [1:100, 101:200]
 ```

 then you create two reservoirs of the same type

-```julia
+```@example 1
 rs = [ReservoirSampler{Int}(10, AlgRSWRSKIP()) for i in 1:length(iters)]
 ```

 and after that you can just update them in parallel like so

-```julia
+```@example 1
 Threads.@threads for i in 1:length(iters)
     for e in iters[i]
         fit!(rs[i], e)
@@ -70,6 +70,6 @@ end
 then you can obtain a unique reservoir containing a summary of the union of the streams with

-```julia
+```@example 1
 merge(rs...)
 ```

From f2fe2dcd2b5fec68696f7878d24078e295d77c09 Mon Sep 17 00:00:00 2001
From: Adriano Meligrana <68152031+Tortar@users.noreply.github.com>
Date: Wed, 13 Aug 2025 02:31:24 +0200
Subject: [PATCH 4/6] Update api.md

---
 docs/src/api.md | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/docs/src/api.md b/docs/src/api.md
index a9bcb288..f5962782 100644
--- a/docs/src/api.md
+++ b/docs/src/api.md
@@ -1,12 +1,15 @@
 # API

-This is the API page of the package. For a general overview of the functionalities
-consult the [ReadMe](https://github.com/JuliaDynamics/StreamSampling.jl).
-
-## General Functionalities
+## Types

 ```@docs
 ReservoirSampler
+StreamSampler
+```
+
+## Methods
+
+```@docs
 fit!
 merge!
 merge
@@ -14,11 +17,10 @@ empty!
 value
 ordvalue
 nobs
-StreamSampler
 itsample
 ```

-## Sampling Algorithms
+## Algorithms

 ```@docs
 AlgR

From 5e5daca5b7db95d4ee6cb4605fcb27101f23bdaa Mon Sep 17 00:00:00 2001
From: Adriano Meligrana <68152031+Tortar@users.noreply.github.com>
Date: Wed, 13 Aug 2025 02:32:02 +0200
Subject: [PATCH 5/6] Update basics.md

---
 docs/src/basics.md | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/docs/src/basics.md b/docs/src/basics.md
index 52f80e41..b33edc35 100644
--- a/docs/src/basics.md
+++ b/docs/src/basics.md
@@ -46,6 +46,3 @@ r
 The advantage of `StreamSampler` iterators in respect to `ReservoirSampler` is that they require `O(1)`
 memory if not collected, while reservoir techniques require `O(k)` memory where `k` is the number
 of elements in the sample.
-
-Consult the [API page](https://juliadynamics.github.io/StreamSampling.jl/stable/api) for more information
-about the package interface.

From 0d3d22ad24a477ab4c745ca8b19212fa0674fc1b Mon Sep 17 00:00:00 2001
From: Adriano Meligrana <68152031+Tortar@users.noreply.github.com>
Date: Wed, 13 Aug 2025 02:34:52 +0200
Subject: [PATCH 6/6] Update perf_tips.md

---
 docs/src/perf_tips.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/src/perf_tips.md b/docs/src/perf_tips.md
index b604096b..7fe632cf 100644
--- a/docs/src/perf_tips.md
+++ b/docs/src/perf_tips.md
@@ -1,5 +1,6 @@
+# Performance Tips

-# Use Immutable Reservoir Samplers
+## Use Immutable Reservoir Samplers

 By default, a `ReservoirSampler` is mutable, however, it is also possible to use an immutable version which supports
@@ -39,7 +40,7 @@ will be faster than the mutable one.
 Be careful though, because calling `fit!` on an immutable sampler won't modify it in-place, but only create a new updated instance.

-# Parallel Sampling from Multiple Streams
+## Parallel Sampling from Multiple Streams

 Let's say that you want to split the sampling of an iterator.
If you can split the iterator into different partitions then you can update in parallel a reservoir sample for each partition and then