14 changes: 8 additions & 6 deletions docs/src/api.md
@@ -1,24 +1,26 @@
# API

This is the API page of the package. For a general overview of the functionalities
consult the [ReadMe](https://github.com/JuliaDynamics/StreamSampling.jl).

## General Functionalities
## Types

```@docs
ReservoirSampler
StreamSampler
```

## Methods

```@docs
fit!
merge!
merge
empty!
value
ordvalue
nobs
StreamSampler
itsample
```

## Sampling Algorithms
## Algorithms

```@docs
AlgR
13 changes: 3 additions & 10 deletions docs/src/basics.md
@@ -3,7 +3,7 @@

The `itsample` function consumes the whole stream at once and returns the collected sample:

```julia
```@example 1
using StreamSampling

st = 1:100;
@@ -14,9 +14,7 @@ itsample(st, 5)
In some cases, one needs to control the updates the `ReservoirSampler` will be subject to. In this case
you can simply use the `fit!` function to update the reservoir:

```julia
using StreamSampling

```@example 1
st = 1:100;

rs = ReservoirSampler{Int}(5);
@@ -31,9 +29,7 @@ value(rs)
If the total number of elements in the stream is known beforehand and the sampling is unweighted, it is
also possible to iterate over a `StreamSampler` like so

```julia
using StreamSampling

```@example 1
st = 1:100;

ss = StreamSampler{Int}(st, 5, 100);
@@ -50,6 +46,3 @@ r
The advantage of `StreamSampler` iterators over `ReservoirSampler` is that they require `O(1)`
memory if not collected, while reservoir techniques require `O(k)` memory where `k` is the number
of elements in the sample.
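
To make the `O(k)` memory claim concrete, here is a minimal plain-Julia sketch of the classic Algorithm R. It is not the package's implementation, and `reservoir_sample` is a hypothetical helper name used only for illustration. The only state kept between iterations is the `k`-element vector and a counter.

```julia
using Random

# Plain-Julia sketch of Algorithm R (an assumption-level illustration,
# not StreamSampling's code). `reservoir_sample` is a hypothetical name.
function reservoir_sample(rng::AbstractRNG, stream, k::Integer)
    reservoir = Vector{eltype(stream)}()
    sizehint!(reservoir, k)
    seen = 0
    for x in stream
        seen += 1
        if seen <= k
            push!(reservoir, x)      # fill phase: keep the first k elements
        else
            j = rand(rng, 1:seen)    # uniform index over everything seen so far
            if j <= k
                reservoir[j] = x     # replace a slot with probability k / seen
            end
        end
    end
    return reservoir
end

sample = reservoir_sample(MersenneTwister(42), 1:100, 5)
```

The stream is traversed once and never stored, so memory grows with `k` rather than with the stream length.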

Consult the [API page](https://juliadynamics.github.io/StreamSampling.jl/stable/api) for more information
about the package interface.
8 changes: 4 additions & 4 deletions docs/src/example.md
@@ -9,7 +9,7 @@ assume that the monitored statistic in this case is the mean of the data, and
you want it to stay below a certain threshold; otherwise, some malfunction
is expected.

```julia
```@example 1
using StreamSampling, Statistics, Random

function monitor(stream, thr)
@@ -29,21 +29,21 @@ end

We use some toy data for illustration

```julia
```@example 1
stream = 1:10^8; # the data stream
thr = 2*10^7; # the threshold for the mean monitoring
```

Then, we run the monitoring

```julia
```@example 1
rs = monitor(stream, thr);
```

The number of observations until the detection is triggered is
given by

```julia
```@example 1
nobs(rs)
```

21 changes: 11 additions & 10 deletions docs/src/perf_tips.md
@@ -1,5 +1,6 @@
# Performance Tips

# Use Immutable Reservoir Samplers
## Use Immutable Reservoir Samplers

By default, a `ReservoirSampler` is mutable; however, it is
also possible to use an immutable version which supports
@@ -9,8 +10,8 @@ hood to update the reservoir.
Let's compare the performance of mutable and immutable samplers
with a simple benchmark

```julia
using BenchmarkTools
```@example 1
using StreamSampling, BenchmarkTools

function fit_iter!(rs, iter)
for i in iter
@@ -24,11 +25,11 @@ iter = 1:10^7;

Running both versions, we get

```julia
```@example 1
@btime fit_iter!(rs, $iter) setup=(rs = ReservoirSampler{Int}(10, AlgRSWRSKIP(); mutable = true))
```

```julia
```@example 1
@btime fit_iter!(rs, $iter) setup=(rs = ReservoirSampler{Int}(10, AlgRSWRSKIP(); mutable = false))
```

@@ -39,27 +40,27 @@ will be faster than the mutable one. Be careful though, because
calling `fit!` on an immutable sampler won't modify it in-place,
but only create a new updated instance.
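
The rebinding pattern this implies can be sketched without the package. `RunningCount`, `update`, and `count_all` below are hypothetical names standing in for an immutable sampler, `fit!`, and a fitting loop; the point is only that the returned value must be kept.

```julia
# Hypothetical immutable accumulator, used only to illustrate the pattern.
struct RunningCount
    n::Int
end

# Returns a new instance; the argument itself is never modified.
update(c::RunningCount, _) = RunningCount(c.n + 1)

function count_all(c, iter)
    for x in iter
        c = update(c, x)   # rebind the local name to the updated copy
    end
    return c               # the caller must use this return value
end

c0 = RunningCount(0)
c1 = count_all(c0, 1:10)
```

After the call, `c0` is unchanged and only `c1` reflects the updates, which is the behavior to expect when the result of an update on an immutable value is discarded.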

# Parallel Sampling from Multiple Streams
## Parallel Sampling from Multiple Streams

Let's say that you want to split the sampling of an iterator. If you can split the iterator into
different partitions then you can update in parallel a reservoir sample for each partition and then
merge them together at the end.

Suppose, for instance, that you have these two iterators

```julia
```@example 1
iters = [1:100, 101:200]
```

then you create two reservoirs of the same type

```julia
```@example 1
rs = [ReservoirSampler{Int}(10, AlgRSWRSKIP()) for i in 1:length(iters)]
```

and after that you can just update them in parallel like so

```julia
```@example 1
Threads.@threads for i in 1:length(iters)
for e in iters[i]
fit!(rs[i], e)
@@ -70,6 +71,6 @@ end
then you can obtain a single reservoir summarizing the union of the streams
with

```julia
```@example 1
merge(rs...)
```
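
As a rough intuition for why merging per-partition reservoirs can work, here is a plain-Julia sketch that combines two with-replacement samples. It assumes both samples have the same size and come from disjoint streams of known lengths; `merge_wr_samples` is a hypothetical helper, not the package's `merge`, whose internals may differ.

```julia
using Random

# Hypothetical helper (not StreamSampling's `merge`): combines two equally
# sized with-replacement samples drawn from disjoint streams of n1 and n2 elements.
function merge_wr_samples(rng::AbstractRNG, s1::Vector{T}, n1::Integer,
                          s2::Vector{T}, n2::Integer) where {T}
    length(s1) == length(s2) || throw(ArgumentError("samples must have the same size"))
    p1 = n1 / (n1 + n2)   # chance that a uniform draw from the union lands in stream 1
    return [rand(rng) < p1 ? s1[i] : s2[i] for i in eachindex(s1)]
end

rng = MersenneTwister(1)
s1 = rand(rng, 1:100, 10)     # stand-in for a with-replacement sample of 1:100
s2 = rand(rng, 101:200, 10)   # stand-in for a with-replacement sample of 101:200
merged = merge_wr_samples(rng, s1, 100, s2, 100)
```

Each slot of the merged sample picks a stream in proportion to its length and then reuses that stream's already uniform draw, so every slot is uniform over the union of the two streams.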