# danhammer/info-theory


# info-theory

The purpose of this project is to build up a library to illustrate ideas from conversations with George Judge.

## Examining entropy from a de-meaned series

The first application is to compare the distribution of the dynamic sequences between a raw time series and the demeaned time series. We generate the time series using the following function:

```clojure
;; assumes incanter.stats is required under the alias s
(defn mean-dgp
  [T]
  (let [e (s/sample-normal T)]
    (map (partial + 5) e)))
```

This returns a series of length `T` with mean 5 and standard normal errors. We then apply the `permutation-count` function to return a hash-map of the permutation sequences and their frequencies, and do the same for the de-meaned version of the series. The following image displays the count histogram for each of the sequences for a series of length T = 400. For D = 4 there are 4! = 24 possible sequences, which are arbitrarily ordered.
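The source of `permutation-count` is not shown here. As a rough sketch of the idea, assuming the usual ordinal-pattern construction over overlapping windows of length `D` (the names `ordinal-pattern` and `permutation-count-sketch` are hypothetical and not part of this library):

```clojure
(defn ordinal-pattern
  "Returns the ordinal pattern of a window: the indices of its
  elements, listed in order of increasing value."
  [window]
  (mapv first (sort-by second (map-indexed vector window))))

(defn permutation-count-sketch
  "Hypothetical stand-in for permutation-count: maps each ordinal
  pattern of length D to the number of overlapping windows in the
  series that exhibit it."
  [D series]
  (frequencies (map ordinal-pattern (partition D 1 series))))

(permutation-count-sketch 3 [4 7 9 10 6 11 3])
;; => {[0 1 2] 2, [2 0 1] 2, [1 0 2] 1}
```

A series of length T yields T - D + 1 windows, so the counts in the example sum to 5.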

The function to generate the time series and test for differences in the empirical distribution functions of the permutation counts is below.

```clojure
(defn retrieve-diff
  "accepts the length of the permutation series, and the errors from
  the reference and new series; returns the K-S test statistic
  associated with the comparison of the permutation entropy
  distributions associated with a time series of length T and the
  supplied D length."
  [D e-ref e-new]
  ;; the check must be one form inside the :pre vector; a bare
  ;; [= a b] is three always-truthy conditions and never fails
  {:pre [(= (count e-ref) (count e-new))]}
  (let [m-ref (permutation-count D e-ref)
        m-new (permutation-count D e-new)]
    (apply ks-stat
           (map empirical-dist (key-counts m-new m-ref)))))

(defn demean-illustration
  "compares the residuals from a series and the demeaned series"
  [D T]
  (let [y (mean-dgp T)]
    (retrieve-diff D y (demean y))))
```
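The helpers `key-counts`, `empirical-dist`, and `ks-stat` are not shown above. A minimal sketch of what such a comparison could look like, assuming the two count maps are aligned on a common key order and compared through their cumulative relative frequencies (all three function names below are hypothetical, not the library's own):

```clojure
(defn aligned-counts
  "Hypothetical helper: returns two count vectors over the union of
  the keys of both maps, in a common sorted key order. Patterns
  missing from a map count as zero."
  [m1 m2]
  (let [ks (sort (distinct (concat (keys m1) (keys m2))))]
    [(mapv #(get m1 % 0) ks)
     (mapv #(get m2 % 0) ks)]))

(defn cumulative-shares
  "Turns counts into cumulative relative frequencies."
  [counts]
  (let [total (reduce + counts)]
    (mapv #(/ % total) (reductions + counts))))

(defn ks-stat-sketch
  "Largest absolute gap between two cumulative distributions."
  [cdf-a cdf-b]
  (apply max (map #(Math/abs (double (- %1 %2))) cdf-a cdf-b)))

(apply ks-stat-sketch
       (map cumulative-shares
            (aligned-counts {[0 1] 3, [1 0] 1}
                            {[0 1] 2, [1 0] 2})))
;; => 0.25
```

Two identical count maps give a statistic of 0, which is the case discussed for the demeaned series.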

For this application, the Kolmogorov-Smirnov test statistic is always 0, since demeaning only shifts the time series and does not change the ordering of relative values. This can be seen in the following line graphs.
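The invariance can also be checked directly on a small series. The sketch below is illustrative only: `demean-sketch`, `rank-patterns`, and `same-patterns?` are hypothetical names, and the ordinal patterns are assumed to come from overlapping windows of length `D`.

```clojure
(defn demean-sketch
  "Hypothetical stand-in for demean: subtracts the sample mean from
  every observation."
  [series]
  (let [mu (/ (reduce + series) (count series))]
    (map #(- % mu) series)))

(defn rank-patterns
  "Ordinal pattern (indices sorted by value) of every overlapping
  window of length D."
  [D series]
  (map (fn [window]
         (mapv first (sort-by second (map-indexed vector window))))
       (partition D 1 series)))

(defn same-patterns?
  "True when demeaning leaves every ordinal pattern unchanged."
  [D series]
  (= (rank-patterns D series)
     (rank-patterns D (demean-sketch series))))

(same-patterns? 3 [5.3 4.1 6.2 5.0 3.9 5.8])
;; => true
```

Subtracting a constant moves every observation by the same amount, so the relative order within each window is preserved and the pattern counts match.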

## Linear model

Consider, now, a random variable generated by the linear model, which amounts to the linear combination of a constant, a single covariate `x`, and a random variable distributed standard normal.

```clojure
(defn linear-dgp [T x]
  (let [e (s/sample-normal T)]
    (map (partial + 5) x e)))
```
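As a small check on how `linear-dgp` combines its inputs: `map` called with two collections passes one element of each to the function, so `(partial + 5)` receives `x_t` and `e_t` together and returns `5 + x_t + e_t`. The implied coefficient on `x` is therefore 1. The sketch below uses fixed integers in place of the random draws, purely for illustration; `x-demo`, `e-demo`, and `y-demo` are hypothetical names.

```clojure
;; Fixed stand-ins for the covariate and the error draws (illustrative values)
(def x-demo [1 2 3])
(def e-demo [10 20 30])

;; Same expression as the body of linear-dgp: element-wise 5 + x + e
(def y-demo (map (partial + 5) x-demo e-demo))
;; y-demo is (16 27 38)
```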

Similar to the previous example, we can collect the Kolmogorov-Smirnov test statistic, using the raw time series and the linear residuals as the base distributions.

```clojure
(defn linear-illustration
  "compares the linear DGP with the residuals from a linear model"
  [D T]
  (let [x (s/sample-normal T :mean 3)
        y (linear-dgp T x)]
    (retrieve-diff D y (linear-residuals y x))))
```
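The `linear-residuals` function is not shown. One plausible reading, sketched here under the assumption that it fits ordinary least squares of `y` on a constant and the single covariate `x`, is the following (`avg` and `linear-residuals-sketch` are hypothetical names, not the library's own):

```clojure
(defn avg
  "Arithmetic mean of a collection of numbers."
  [xs]
  (/ (reduce + xs) (count xs)))

(defn linear-residuals-sketch
  "Hypothetical stand-in for linear-residuals: residuals from an OLS
  fit of y on a constant and a single covariate x."
  [y x]
  (let [mx (avg x)
        my (avg y)
        dx (map #(- % mx) x)
        dy (map #(- % my) y)
        ;; slope = sample covariance of x and y over sample variance of x
        slope (/ (reduce + (map * dx dy))
                 (reduce + (map * dx dx)))
        intercept (- my (* slope mx))]
    (map (fn [yi xi] (- yi intercept (* slope xi))) y x)))

;; y = 5 + 2x exactly, so every residual is zero
(linear-residuals-sketch [7 9 11 13] [1 2 3 4])
```

With noisy data the residuals are no longer zero, but with an intercept in the fit they still sum to zero up to floating-point error.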

We can run the same Kolmogorov-Smirnov test repeatedly, collecting the test statistic from each run and plotting a histogram of the results. The test statistic is distributed Kolmogorov, and well within the bounds of standard variation. We cannot reject the hypothesis that the two distributions are the same. The residuals contain no dynamic pattern that the linear regression fails to explain, which makes sense, since we constructed it to be so.

```clojure
(defn hist-diffstat
  "returns a histogram of the ks-stat for a MC-simulation, iterated B
  times."
  [B f D T]
  (let [dgp-fn (fn [_] (f D T))]
    (i/view (c/histogram (pmap dgp-fn (range B))
                         :nbins 20))))
```

## Instrumental Variables

Now, we use the data generating process that is described in my section notes for the Berkeley applied econometrics sequence. We can get the same histograms, as above, for the IV model and the standard linear model, which has biased estimates. We see that the permutation entropy approach does not differentiate between the two models.

LINEAR MODEL