# 4.A. Generalized Entropy

* In the study of dynamical systems there are many quantities that are called "entropy".
* These quantities are not the more commonly known [thermodynamic ones](https://en.wikipedia.org/wiki/Entropy) used in statistical physics.
* Rather, they are more like the entropies of [information theory](https://en.wikipedia.org/wiki/Entropy_(information_theory)), which quantify the information contained within a dataset.
* In general, the more "uncertain" or "random" the dataset is, the larger its entropy will be. On the other hand, the lower the entropy, the more "predictable" the dataset becomes.


Let $p$ be an array of probabilities (such that it sums to 1). Then the generalized entropy is defined as 

$$
H_\alpha(p) = \frac{1}{1-\alpha}\log\left(\sum_i p[i]^\alpha\right)
$$

and is also called [Rényi entropy](https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy). It generalizes other entropies such as the [Shannon entropy](https://en.wikipedia.org/wiki/Entropy_(information_theory)): in the limit $\alpha \to 1$, the Rényi entropy becomes the Shannon entropy,

$$
H_1(p) = -\left(\sum_i p[i] \log (p[i]) \right)
$$
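
To make the two formulas concrete, here is a minimal sketch in plain Julia that evaluates them for a given probability vector. The function name `renyi_entropy` is made up for this illustration and is not part of DynamicalSystems.jl; the sketch assumes the natural logarithm and simply skips empty bins.

```julia
# Rényi entropy of order α for a probability vector p (natural logarithm).
# `renyi_entropy` is a hypothetical helper written for this illustration.
function renyi_entropy(p::AbstractVector{<:Real}, α::Real)
    q = filter(x -> x > 0, p)                 # empty bins do not contribute
    if α ≈ 1
        return -sum(x -> x * log(x), q)       # Shannon entropy, the α → 1 limit
    end
    return log(sum(x -> x^α, q)) / (1 - α)    # general definition of H_α
end

p_uniform = fill(0.25, 4)                     # maximally "uncertain" over 4 bins
p_peaked  = [0.97, 0.01, 0.01, 0.01]          # very "predictable"

renyi_entropy(p_uniform, 2)                   # equals log(4)
renyi_entropy(p_peaked, 1)                    # smaller than log(4)
```

For a uniform distribution every order gives the same value, `log(4)` here, while the sharply peaked distribution has a much lower entropy.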

The Rényi entropy can be computed for a specific dataset, given $p$. But how does one get $p$?
1. $p$ represents the probability that a point of a dataset falls into a specific "bin". 
2. It is nothing more than the (normalized) histogram of the dataset!
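
As a rough illustration of what "normalized histogram" means here, the sketch below bins the rows of a small matrix into boxes of side `ε` and returns the fraction of points in each occupied box. The helper `box_probabilities` and the four sample points are invented for this example; this is not how the library computes its histogram.

```julia
# Fraction of points in each occupied box of side ε.
# `data` holds one point per row; `box_probabilities` is a hypothetical helper.
function box_probabilities(data::AbstractMatrix{<:Real}, ε::Real)
    counts = Dict{Vector{Int}, Int}()
    for i in 1:size(data, 1)
        box = floor.(Int, data[i, :] ./ ε)    # integer box index along each dimension
        counts[box] = get(counts, box, 0) + 1
    end
    return collect(values(counts)) ./ size(data, 1)   # normalize counts to probabilities
end

data = [0.05 0.05;
        0.07 0.02;
        0.55 0.95;
        0.31 0.42]

p = box_probabilities(data, 0.1)   # the first two points share a box
sum(p)                             # probabilities sum to 1
```

Only occupied boxes are stored, so the memory used grows with the number of points rather than with the number of possible boxes.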

In [None]:
using DynamicalSystems, Plots

Let's generate a dataset so that we can start calculating entropies.

In [None]:
N = 100000
randomdata = Dataset(rand(N,3))

---
The call signature we need is

```julia
genentropy(α, ε, dataset::AbstractDataset; base = e)
```
* This function calculates the generalized entropy of order `α`.
* It first calculates the probability array $p$.
* The "histogram" is created by partitioning the `dataset` into boxes of size `ε`.


In [None]:
for i in [0.1, 0.01, 0.001, 0.0001]
    println(genentropy(2, i, randomdata))
end

Note that the output of `genentropy` keeps changing as $\varepsilon$ decreases, until we reach $\varepsilon = 0.001$.

At this point the entropy has saturated: essentially every occupied box contains a single point, so partitioning the dataset into even smaller boxes cannot change the result.
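
The saturation value can be worked out by hand. Assuming each of the $N$ points sits alone in its own box, we have $p[i] = 1/N$ for all $N$ occupied boxes, and the definition of $H_\alpha$ gives, for any $\alpha \neq 1$,

$$
H_\alpha(p) = \frac{1}{1-\alpha}\log\left(\sum_{i=1}^{N} N^{-\alpha}\right) = \frac{1}{1-\alpha}\log\left(N^{1-\alpha}\right) = \log(N)
$$

With the natural logarithm and $N = 100000$ this is $\log(N) \approx 11.51$, which is the value the loop above should approach for the smallest box sizes.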

---


`genentropy` is conveniently used with outputs of e.g. `trajectory` or `poincaresos`, because they return a `Dataset`.

Here we create a trajectory for the towel map, a three-dimensional chaotic discrete system.

In [None]:
towel = Systems.towel()

In [None]:
tr = trajectory(towel, N-1);
summary(tr)

In [None]:
points = Matrix(tr)
scatter(points[:,1], points[:,2], points[:,3], 
        markersize=0.3, alpha=0.2, markercolor=:black, 
        html_output_format=:png, size=(1000, 1000), leg=false, title="The towel attractor")

We then calculate its entropy:

In [None]:
for i in [0.1, 0.01, 0.001, 0.0001]
    println(genentropy(2, i, tr))
end

Let's also compare the entropy of the above dataset (a trajectory of the towel map) with that of a random dataset:

In [None]:
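# same order α and box size ε for both datasets, so the two values are comparable
genentropy(1, 0.01, tr), genentropy(1, 0.01, randomdata)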

* As expected, the entropy of the random dataset is higher.

---

How much time does the computation take?

In [None]:
using BenchmarkTools
@btime genentropy(1, 0.01, $tr);

# 4.B. Specialized histogram
* Partitioning the dataset (i.e. generating a "histogram") is in general a costly operation that depends exponentially on the number of dimensions.
* In this specific application however, we can tremendously reduce the memory allocation and time spent!

To get the array of probabilities `p` for box size `ε` from the trajectory of the towel map, we use the function `non0hist`:

In [None]:
ε = 0.01
p = non0hist(ε, tr)

As a sanity check, the probabilities should sum to `1`.

In [None]:
sum(p)

How long does computing the probabilities take?

In [None]:
@btime non0hist($ε, $tr);

How long does the same computation take for 9-dimensional data with the same number of points?

In [None]:
nine = Dataset(rand(N, 9))
@btime non0hist($ε, $nine);

`non0hist` uses a very specialized (to-be-published) algorithm whose running time depends only linearly, not exponentially, on the dimensionality of the dataset. Its complexity in the number of points `n` is linearithmic (`n log(n)`).
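
The algorithm behind `non0hist` is not spelled out here, so the following is only an illustration of how such scaling can arise, not a description of the library's implementation. The hypothetical helper `sorted_box_probabilities` maps every point to its integer box index (work proportional to `n` times the dimension), sorts the indices (`n log(n)` comparisons), and then counts runs of identical indices in a single pass.

```julia
# Sort-based histogram over occupied boxes only.
# `sorted_box_probabilities` is a made-up name; assumes at least one point.
function sorted_box_probabilities(data::AbstractMatrix{<:Real}, ε::Real)
    N = size(data, 1)
    boxes = [floor.(Int, data[i, :] ./ ε) for i in 1:N]   # one box index per point
    sort!(boxes)                        # lexicographic order puts equal boxes next to each other
    probs = Float64[]
    run = 1                             # length of the current run of equal boxes
    for i in 2:N
        if boxes[i] == boxes[i-1]
            run += 1
        else
            push!(probs, run / N)       # close the previous run
            run = 1
        end
    end
    push!(probs, run / N)               # close the final run
    return probs
end

pts = [0.05 0.05;
       0.07 0.02;
       0.55 0.95;
       0.31 0.42]

probs = sorted_box_probabilities(pts, 0.1)
```

No array over all possible boxes is ever allocated, which is why the cost does not grow exponentially with the number of dimensions.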

# 4.C. Generalized Dimension
1. There are numerous methods that one can use to calculate a so-called "dimension" of a dataset, for example the [fractal dimension](https://en.wikipedia.org/wiki/Fractal_dimension).

2. Most of the time these dimensions indicate some kind of scaling behavior. 

3. For example, the scaling of `genentropy` with decreasing `ε` gives the so-called "generalized dimension".


$E \approx -D\log(\varepsilon)$ with $E$ the entropy and $D$ the "dimension".
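
Stated a bit more precisely, and assuming the usual definition of the generalized (Rényi) dimension, $D_\alpha$ is the limiting ratio of the entropy of order $\alpha$ to $-\log(\varepsilon)$. Here $p_\varepsilon$ is a symbol introduced only for this illustration; it denotes the probability array obtained with box size $\varepsilon$.

$$
D_\alpha = \lim_{\varepsilon \to 0} \frac{H_\alpha(p_\varepsilon)}{-\log(\varepsilon)}
$$

For a finite dataset the limit cannot be taken literally, so $D_\alpha$ is estimated as the slope of $H_\alpha$ versus $-\log(\varepsilon)$ over a range of intermediate box sizes.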

---
Let's find out the dimension of the attractor of the Towel Map!



In [None]:
towel = Systems.towel()
towel_tr = trajectory(towel, 1000000, Ttr = 100);
scatter(towel_tr[:, 1], towel_tr[:, 2],
        markersize=0.1, alpha=0.2, markercolor=:black, 
        html_output_format=:png, size=(1000, 1000), leg=false, title="The towel attractor")

*(Note that more points = more precision = more computations = more time!)*

Now we have to compute `genentropy` for different ε.

Which ε should we use...?

Let's do a "random" guess...

In [None]:
ες =  10.0 .^ range(-4, stop=1, length=12)

In [None]:
Es = zero(ες)
for (i, ε) ∈ enumerate(ες)
    Es[i] = genentropy(1, ε, towel_tr)
end
Es

**Shorter version (thanks broadcasting!)**

In [None]:
Es = genentropy.(1, ες, Ref(towel_tr))

*Using `Ref` ensures broadcasting over `ες` but not over `towel_tr`, which is also iterable. In general, `Ref(x)` causes broadcasting to treat `x` as a scalar.*
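
The effect of `Ref` is easy to see on a toy example in plain Julia. The helper `count_below` and the two small vectors are invented for this illustration.

```julia
# Hypothetical helper: how many entries of `samples` lie below `threshold`?
count_below(threshold, samples) = count(s -> s < threshold, samples)

samples    = [0.1, 0.5, 0.9]
thresholds = [0.2, 0.6, 1.0]

# Broadcast over `thresholds` only; `Ref` shields `samples` from broadcasting,
# so every call sees the whole vector.
with_ref = count_below.(thresholds, Ref(samples))     # [1, 2, 3]

# Without `Ref`, both vectors are iterated in lockstep, so each call
# compares thresholds[i] with the single number samples[i].
without_ref = count_below.(thresholds, samples)       # [1, 1, 1]
```

The second call runs without error but answers a different question, which is why forgetting `Ref` can go unnoticed.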

Alright. Remember that we expect $E \approx -D\log(\varepsilon)$, with $E$ the entropy and $D$ the "dimension".

Let's plot and see:

In [None]:
x = -log.(ες)
plot(x, Es, xlabel="-log\\(\\epsilon\\)", ylabel="Entropy", leg=false)
plot!([x[4], x[4]], [0, 15], color=:orange)
plot!([x[end-3], x[end-3]], [0, 15], color=:orange, size=(500, 300))

In the limit of very large ε, all points are in the same bin, and the entropy reaches its minimum of $0$. In the limit of very small ε, every point gets its own bin, and the entropy reaches its maximum of $\log(N)$, where $N$ is the number of points. In between, there is a linear region where the scaling behavior holds.

Above, the expected scaling behavior holds between the orange vertical lines.

Let's choose the curve points that do fall in the linear regime of the above plot,

In [None]:
# indices 4 and end-3 are where the orange lines were drawn above
x, y = -log.(ες)[4:end-3], Es[4:end-3]

and find the slope of the curve there, to calculate the dimension, D.

In [None]:
using ChaosTools
offset, slope = ChaosTools.linreg(x, y)
D = slope

This is actually a correct result: the information dimension of the attractor of the towel map is around 2.
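
For readers who want to see what such a fit involves, here is a minimal sketch of an ordinary least-squares line fit in plain Julia. It assumes that this is the kind of fit a routine like `linreg` performs; the helper `fit_line` is hypothetical and only mirrors the `(offset, slope)` return order used above.

```julia
using Statistics   # standard library, provides `mean`

# Least-squares fit of y ≈ offset + slope * x.
# `fit_line` is a made-up helper, not part of ChaosTools.
function fit_line(x, y)
    mx, my = mean(x), mean(y)
    slope  = sum((x .- mx) .* (y .- my)) / sum((x .- mx) .^ 2)
    offset = my - slope * mx
    return offset, slope
end

xdemo = [1.0, 2.0, 3.0, 4.0]
ydemo = 2 .* xdemo .+ 1                  # exactly linear toy data
offset_demo, slope_demo = fit_line(xdemo, ydemo)
```

On exactly linear data the fit recovers the slope `2` and the offset `1`; on real entropy curves the result depends on which points are included, which is why restricting the fit to the scaling region matters.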

---

* Are the values of `ες` we used good? 
* For a general dataset, how can we determine them?

The function `estimate_boxsizes(dataset; kwargs...)` can help with that!

In [None]:
ες = estimate_boxsizes(towel_tr)

Let's plot $E$ vs. $-\log(\varepsilon)$ again:

In [None]:
Es = genentropy.(1, ες, Ref(towel_tr))
plot(-log.(ες), Es, xlabel="-log\\(\\epsilon\\)", ylabel="Entropy", leg=false)

---
# 4.D. Automated Dimension Estimation

Given some arbitrary plot like the one above, is there any algorithm to deduce a scaling region?

The function `linear_regions(x, y; kwargs...)` decomposes the function `y(x)` into regions where the function is linear.

It returns the indices of `x` that correspond to linear regions and the approximated tangents at each region!
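
To give a feel for the idea, the sketch below splits a curve wherever the local slope between neighbouring points changes by more than a relative tolerance. It is a deliberately simplified stand-in, not the algorithm `linear_regions` actually uses; the name `split_by_slope` and the keyword `tol` are assumptions made for this example.

```julia
# Split y(x) into regions of approximately constant slope.
# Returns the boundary indices of the regions and one slope per region.
# `split_by_slope` is a hypothetical, simplified helper.
function split_by_slope(x, y; tol = 0.2)
    slopes = diff(y) ./ diff(x)          # slope of each segment between neighbours
    breaks = [1]                         # index where each region starts
    region_slopes = [slopes[1]]
    for i in 2:length(slopes)
        if !isapprox(slopes[i], region_slopes[end]; rtol = tol)
            push!(breaks, i)             # slope changed: a new region starts here
            push!(region_slopes, slopes[i])
        end
    end
    push!(breaks, length(x))             # close the last region
    return breaks, region_slopes
end

xtoy = collect(0:6)
ytoy = [0, 0, 0, 2, 4, 6, 6]             # flat, then slope 2, then flat again
breaks, region_slopes = split_by_slope(xtoy, ytoy)
```

Region `i` then spans the indices `breaks[i]:breaks[i+1]`, and the longest region here is the middle one, with slope `2`.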

In [None]:
xs = -log.(ες)
lrs, slopes = linear_regions(xs, Es)

In [None]:
for i in 1:length(lrs)-1
    plot!(xs[lrs[i]:lrs[i+1]], Es[lrs[i]:lrs[i+1]])
end
plot!()

The largest linear region is "probably the correct one". The function `linear_region` estimates its slope:

In [None]:
linear_region(xs, Es)[2]

## `generalized_dim` function

Let's summarize what we just did to estimate the dimension of an attractor.

1. We decided on some partition sizes `ες` to use (the function `estimate_boxsizes` can give an estimate for that).
2. For each `ε` in `ες` we calculated the entropy via `genentropy`. We stored these entropies in an array `Es`.
3. We tried to find a "linear scaling region" of the curve `Es` vs. `-log.(ες)`.
4. The slope of this "linear scaling region" is the dimension we estimated.
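
The four steps can be sketched end to end in plain Julia on a toy dataset whose dimension is known. Everything in this block is an assumption made for illustration: the data is a regular grid filling the unit square (dimension 2), `box_entropy` is a hypothetical stand-in for `genentropy` with `α = 1`, the box sizes are picked by hand, and the slope is taken crudely from the two end points because this toy curve is exactly linear.

```julia
# Hypothetical test data: a regular n×n grid filling the unit square.
n = 256
coords = [(i - 0.5) / n for i in 1:n]
square = [(x, y) for x in coords for y in coords]

# Shannon entropy (α = 1) of the box histogram for box size ε.
function box_entropy(points, ε)
    counts = Dict{NTuple{2,Int},Int}()
    for pt in points
        box = floor.(Int, pt ./ ε)            # integer box index of this point
        counts[box] = get(counts, box, 0) + 1
    end
    N = length(points)
    return -sum(c -> (c / N) * log(c / N), values(counts))
end

box_sizes = [1/4, 1/8, 1/16, 1/32]                  # step 1: box sizes, chosen by hand
entropies = [box_entropy(square, ε) for ε in box_sizes]   # step 2: one entropy per box size
neglog = -log.(box_sizes)                           # step 3: the whole toy curve is linear
D_square = (entropies[end] - entropies[1]) / (neglog[end] - neglog[1])   # step 4: slope, close to 2
```

For real data the curve is linear only in part, so the slope should come from a fit restricted to the scaling region rather than from the end points.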

Wouldn't it be **cool** if all of this process could happen with one function call?

This is *exactly* what the following function does:
```julia
generalized_dim(α, dataset, ες = estimate_boxsizes(dataset))
```
which computes the `α`-order generalized dimension.

In [None]:
generalized_dim(2.0, tr)

Similarly, let's calculate the dimension of the Hénon map that we have seen in previous tutorials:

In [None]:
hen = Systems.henon()
tr = trajectory(hen, 200000)
generalized_dim(0, tr)

### `generalized_dim` is but a crude estimate!

**You must check and double-check and triple-check if you want more accuracy!**

## Confirming Takens' Theorem

Recall from notebook 3 that we discussed delay embeddings and how Takens' theorem states that quantities such as the attractor dimension remain the same between the reconstructed and the original system.

We can now show this numerically. We start with a trajectory from the system we used in notebook 3:

In [None]:
g = Systems.gissinger(ones(3))

In [None]:
dt = 0.05
data = trajectory(g, 20000.0, dt = dt, Ttr = 100.0)
summary(data)

We estimate a good delay time using the first minimum of the mutual information:

In [None]:
τ = estimate_delay(data[:, 1], "mi_min")

And embed the timeseries in three dimensions:

In [None]:
R = embed(data[:, 1], 3, τ)
summary(R)
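
As a reminder of what a delay embedding builds, here is a minimal sketch in plain Julia. It assumes forward delays, i.e. vectors of the form `(s[i], s[i+τ], ..., s[i+(D-1)τ])`; the helper `delay_vectors` is hypothetical and returns plain vectors rather than a `Dataset`.

```julia
# Delay vectors (s[i], s[i+τ], ..., s[i+(D-1)τ]) from a scalar timeseries s.
# `delay_vectors` is a made-up helper for illustration only.
function delay_vectors(s::AbstractVector, D::Integer, τ::Integer)
    L = length(s) - (D - 1) * τ          # number of complete delay vectors
    return [[s[i + k * τ] for k in 0:D-1] for i in 1:L]
end

series = collect(1:10)
dv = delay_vectors(series, 3, 2)
dv[1]      # [1, 3, 5]
dv[end]    # [6, 8, 10]
```

Each reconstructed point combines `D` samples of the same measured variable, spaced `τ` steps apart, so the reconstruction is slightly shorter than the original timeseries.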

Using `generalized_dim` we can now compare the dimension estimated for the original trajectory `data` and the reconstructed trajectory `R`:

In [None]:
generalized_dim(1, data)

In [None]:
generalized_dim(1, R)