Commit 998f96a

fix code blocks
Autoplectic committed May 22, 2019
1 parent 88736a2 commit 998f96a
Showing 35 changed files with 137 additions and 137 deletions.
8 changes: 4 additions & 4 deletions docs/distributions/npdist.rst
@@ -7,7 +7,7 @@ Numpy-based Distribution
The primary method of constructing a distribution is by supplying both the
outcomes and the probability mass function:

-.. ipython:: python
+.. ipython::

In [1]: from dit import Distribution

@@ -35,7 +35,7 @@ outcomes and the probability mass function:
Another way to construct a distribution is by supplying a dictionary mapping
outcomes to probabilities:

-.. ipython:: python
+.. ipython::

In [6]: outcomes_probs = {'000': 1/4, '011': 1/4, '101': 1/4, '110': 1/4}

@@ -58,7 +58,7 @@ outcomes to probabilities:

Yet a third method is via an ndarray:

-.. ipython:: python
+.. ipython::

In [9]: pmf = [[0.5, 0.25], [0.25, 0]]

@@ -83,7 +83,7 @@ Yet a third method is via an ndarray:
To verify that these two distributions are the same, we can use the
`is_approx_equal` method:

-.. ipython:: python
+.. ipython::

@doctest
In [12]: xor.is_approx_equal(xor2)
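As a quick recap of this file's examples, here is a minimal sketch pulling the first two constructions together; the outcomes are the xor outcomes from the hunks above, and the variable names mirror the surrounding context:

    from dit import Distribution

    # outcomes plus a probability mass function
    xor = Distribution(['000', '011', '101', '110'], [1/4] * 4)

    # a dictionary mapping outcomes to probabilities
    outcomes_probs = {'000': 1/4, '011': 1/4, '101': 1/4, '110': 1/4}
    xor2 = Distribution(outcomes_probs)

    # the two constructions describe the same distribution
    print(xor.is_approx_equal(xor2))  # True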
10 changes: 5 additions & 5 deletions docs/distributions/npscalardist.rst
@@ -11,13 +11,13 @@ Playing with ScalarDistributions

First we will enable two optional features: printing fractions by default, and using :func:`__str__` as :func:`__repr__`. Be careful using either of these options, as they can incur significant performance hits on some distributions.

-.. ipython:: python
+.. ipython::

In [1]: dit.ditParams['print.exact'] = dit.ditParams['repr.print'] = True

We next construct a six-sided die:

-.. ipython:: python
+.. ipython::

In [2]: from dit.example_dists import uniform

@@ -39,7 +39,7 @@ We next construct a six-sided die:

We can perform standard mathematical operations with scalars, such as adding, subtracting from or by, multiplying, taking the modulo, or testing inequalities.

-.. ipython:: python
+.. ipython::

@doctest
In [5]: d6 + 3
@@ -113,7 +113,7 @@ We can perform standard mathematical operations with scalars, such as adding, su

Furthermore, we can perform such operations with two distributions:

-.. ipython:: python
+.. ipython::

@doctest
In [11]: d6 + d6
@@ -173,7 +173,7 @@ Furthermore, we can perform such operations with two distributions:

There are also statistical functions which can be applied to :class:`~dit.ScalarDistributions`:

-.. ipython:: python
+.. ipython::

In [15]: from dit.algorithms.stats import *

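A sketch of the die arithmetic and statistics this file walks through; it assumes `uniform(1, 7)` is the six-sided-die constructor used in the elided lines and that `mean` and `standard_deviation` are among the names exported by `dit.algorithms.stats`:

    from dit.example_dists import uniform
    from dit.algorithms.stats import mean, standard_deviation

    d6 = uniform(1, 7)   # assumed: uniform over the outcomes 1..6

    shifted = d6 + 3     # scalar arithmetic on a ScalarDistribution
    two_dice = d6 + d6   # arithmetic between two distributions

    print(mean(d6))                # 3.5 for a fair die
    print(standard_deviation(d6))  # ~1.71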
2 changes: 1 addition & 1 deletion docs/hypothesis.rst
@@ -7,7 +7,7 @@ Finding Examples

What if you'd like to find a distribution that has a particular property? For example, what if I'd like to find a distribution with a :ref:`coinformation` less than :math:`-0.5`? This is where Hypothesis comes in:

-.. ipython:: python
+.. ipython::

In [1]: from hypothesis import find

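The Hypothesis search itself is elided above; as a grounded stand-in, here is a check that a known distribution, the three-variable xor, already satisfies the property being searched for:

    from dit.example_dists import Xor
    from dit.multivariate import coinformation

    # the three-variable xor has co-information -1, well below the -0.5 target
    print(coinformation(Xor()) < -0.5)  # True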
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -31,7 +31,7 @@ For a quick tour, see the :ref:`Quickstart <quickstart>`. Otherwise, work
your way through the various sections. Note that all code snippets in this
documentation assume that the following lines of code have already been run:

-.. ipython:: python
+.. ipython::

In [1]: from __future__ import division # true division for Python 2.7

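Only the first line of the assumed preamble is visible in this hunk; a plausible minimal version, inferred from the fact that later snippets use the `dit` namespace directly, would be (the second import is an assumption, not the file's actual contents):

    from __future__ import division  # true division for Python 2.7

    import dit  # assumed: later snippets call dit.Distribution, dit.uniform, etc.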
2 changes: 1 addition & 1 deletion docs/measures/divergences/copy_mutual_information.rst
@@ -15,7 +15,7 @@ The copy mutual information :cite:`kolchinsky2019decomposing` is a measure captu
Consider the binary symmetric channel. When the flip probability is :math:`\leq \frac{1}{2}`, the input (:math:`X`) is largely copied to the output (:math:`Y`); when it is :math:`\geq \frac{1}{2}`, the output is largely the opposite of the input. We therefore expect the mutual information to be "copy-like" for :math:`0 \leq p \leq \frac{1}{2}`, and not "copy-like" for :math:`\frac{1}{2} \leq p \leq 1`:

-.. ipython:: python
+.. ipython::

In [1]: from dit.divergences import copy_mutual_information as Icopy

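A sketch of the setup described above: the joint distribution of a binary symmetric channel with a uniform input. The flip probability `p` is illustrative, the comparison uses the ordinary mutual information from `dit.shannon`, and the exact calling convention of `copy_mutual_information` is left to the documentation being patched:

    from dit import Distribution
    from dit.shannon import mutual_information

    p = 0.1  # illustrative flip probability
    bsc = Distribution(['00', '01', '10', '11'],
                       [(1 - p) / 2, p / 2, p / 2, (1 - p) / 2])

    # the total dependence between input and output, which Icopy decomposes
    print(mutual_information(bsc, [0], [1]))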
6 changes: 3 additions & 3 deletions docs/measures/divergences/cross_entropy.rst
@@ -18,7 +18,7 @@ the cross entropy of a distribution with itself is the entropy of that
distribution because the entropy quantifies the average cost of representing a
distribution:

-.. ipython:: python
+.. ipython::

In [1]: from dit.divergences import cross_entropy

@@ -31,7 +31,7 @@ distribution:
If, however, we attempted to model a fair coin with a biased one, we could
compute this mismatch with the cross entropy:

-.. ipython:: python
+.. ipython::

In [4]: q = dit.Distribution(['0', '1'], [3/4, 1/4])

@@ -43,7 +43,7 @@ Meaning, we will on average use about :math:`1.2` bits to represent the flips of
a fair coin. Turning things around, what if we had a biased coin that we
attempted to represent with a fair coin:

-.. ipython:: python
+.. ipython::

@doctest float
In [6]: cross_entropy(q, p)
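Both directions discussed above, in one runnable sketch; the fair coin `p` is reconstructed from context, while `q` matches the biased coin defined in the hunk:

    from dit import Distribution
    from dit.divergences import cross_entropy

    p = Distribution(['0', '1'], [1/2, 1/2])  # fair coin
    q = Distribution(['0', '1'], [3/4, 1/4])  # biased coin

    print(cross_entropy(p, q))  # ~1.208 bits: coding fair flips with the biased model
    print(cross_entropy(q, p))  # 1.0 bit: a fair code spends one bit per flip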
4 changes: 2 additions & 2 deletions docs/measures/divergences/earth_movers_distance.rst
@@ -8,7 +8,7 @@ The Earth mover's distance is a distance measure between probability distributio

For categorical data, the "distance" between unequal symbols is unitary. In this case, :math:`1/6` of the probability in symbol '0' needs to be moved to '1', and :math:`1/6` needs to be moved to '2', for a total of :math:`1/3`:

-.. ipython:: python
+.. ipython::

In [1]: from dit.divergences import earth_movers_distance

@@ -22,7 +22,7 @@ For categorical data, the "distance" between unequal symbols is unitary. In this

For numerical data, "distance" defaults to the difference between the symbols. In this case, :math:`1/6` of the probability in symbol '0' needs to be moved to '1' (a distance of 1), and :math:`1/6` needs to be moved to '2' (a distance of 2), for a total of :math:`1/2`:

-.. ipython:: python
+.. ipython::

In [1]: from dit.divergences import earth_movers_distance

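A sketch consistent with the numbers quoted above: a distribution with 2/3, 1/6, 1/6 of the mass on three symbols versus the uniform distribution requires moving 1/6 of the mass to each of the other two symbols. The specific pmfs are assumptions reconstructed from the prose:

    from dit import ScalarDistribution
    from dit.divergences import earth_movers_distance

    d1 = ScalarDistribution([0, 1, 2], [2/3, 1/6, 1/6])
    d2 = ScalarDistribution([0, 1, 2], [1/3, 1/3, 1/3])

    # with numerical symbols the default distance is the difference between them:
    # 1/6 moved a distance of 1 plus 1/6 moved a distance of 2 gives 1/2
    print(earth_movers_distance(d1, d2))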
2 changes: 1 addition & 1 deletion docs/measures/divergences/jensen_shannon_divergence.rst
@@ -17,7 +17,7 @@ That is, it is the entropy of the mixture minus the mixture of the entropy. This
\JSD{X_{0:n}} = \H{\sum w_i X_i} - \sum \left( w_i \H{X_i} \right)
-.. ipython:: python
+.. ipython::

In [1]: from dit.divergences import jensen_shannon_divergence

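A sketch of the formula above for two equally weighted coins; it assumes `jensen_shannon_divergence` accepts a list of distributions (with optional weights), and the two pmfs are illustrative:

    from dit import Distribution
    from dit.divergences import jensen_shannon_divergence

    p = Distribution(['0', '1'], [1/2, 1/2])
    q = Distribution(['0', '1'], [3/4, 1/4])

    # entropy of the equal-weight mixture minus the mixture of the entropies
    print(jensen_shannon_divergence([p, q]))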
2 changes: 1 addition & 1 deletion docs/measures/divergences/kullback_leibler_divergence.rst
@@ -25,7 +25,7 @@ using :math:`q`, and the :doc:`../multivariate/entropy` quantifies the true,
minimum cost of encoding :math:`p`. For example, let's consider the cost of
representing a biased coin by a fair one:

-.. ipython:: python
+.. ipython::

In [1]: from dit.divergences import kullback_leibler_divergence

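The biased-coin-versus-fair-coin cost described above, as a sketch with illustrative pmfs:

    from dit import Distribution
    from dit.divergences import kullback_leibler_divergence

    p = Distribution(['0', '1'], [3/4, 1/4])  # biased coin to be encoded
    q = Distribution(['0', '1'], [1/2, 1/2])  # fair coin doing the encoding

    # cross entropy (1.0) minus entropy (~0.811): about 0.189 extra bits per flip
    print(kullback_leibler_divergence(p, q))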
2 changes: 1 addition & 1 deletion docs/measures/multivariate/caekl_mutual_information.rst
@@ -15,7 +15,7 @@ The Chan-AlBashabsheh-Ebrahimi-Kaced-Liu mutual information :cite:`chan2015multi
for some non-trivial partition :math:`\mathcal{P}` of :math:`\left\{0:n\right\}`. For example, the CAEKL mutual information for the ``xor`` distribution is :math:`\frac{1}{2}`, because the joint entropy is 2 bits, each of the three marginals is 1 bit, and :math:`2 - \frac{1}{2} = 3 (1 - \frac{1}{2})`.

-.. ipython:: python
+.. ipython::

In [1]: from dit.multivariate import caekl_mutual_information as J

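The xor value worked out above, as a short sketch:

    from dit.example_dists import Xor
    from dit.multivariate import caekl_mutual_information as J

    # joint entropy 2 bits, each singleton marginal 1 bit, so J = 1/2
    print(J(Xor()))  # 0.5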
6 changes: 3 additions & 3 deletions docs/measures/multivariate/coinformation.rst
@@ -14,7 +14,7 @@ The co-information :cite:`Bell2003` is one generalization of the :ref:`mutual_in
It is clear that the co-information measures the "center-most" atom of the diagram only, which is the only atom to which every variable contributes. To exemplify this, consider "giant bit" distributions:

-.. ipython:: python
+.. ipython::

In [1]: from dit import Distribution as D

@@ -26,7 +26,7 @@ It is clear that the co-information measures the "center-most" atom of the diagr
This verifies intuition that the entire one bit of the distribution's entropy is condensed in a single atom. One notable property of the co-information is that for :math:`n \geq 3` it can be negative. For example:

-.. ipython:: python
+.. ipython::

In [4]: from dit.example_dists import Xor

@@ -38,7 +38,7 @@ This verifies intuition that the entire one bit of the distribution's entropy is

Based on these two examples one might get the impression that the co-information is positive for "redundant" distributions and negative for "synergistic" distributions. This however is not true --- consider the four-variable parity distribution:

-.. ipython:: python
+.. ipython::

In [7]: from dit.example_dists import n_mod_m

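The three cases walked through above, gathered into one sketch; `n_mod_m(4, 2)` is assumed to be the four-variable parity constructor referenced in the last hunk:

    from dit import Distribution as D
    from dit.example_dists import Xor, n_mod_m
    from dit.multivariate import coinformation as I

    giant_bit = D(['000', '111'], [1/2, 1/2])
    print(I(giant_bit))      # 1.0: the single bit sits in the central atom

    print(I(Xor()))          # -1.0: negative for the three-variable xor

    print(I(n_mod_m(4, 2)))  # 1.0: positive again for four-variable parity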
6 changes: 3 additions & 3 deletions docs/measures/multivariate/deweese.rst
@@ -7,7 +7,7 @@ DeWeese-like Measures

Mike DeWeese has introduced a family of multivariate information measures based on a multivariate extension of the data processing inequality. The general idea is the following: local modification of a single variable cannot increase the amount of correlation or dependence it has with the other variables. Consider, however, the triadic distribution:

-.. ipython:: python
+.. ipython::

In [1]: from dit.example_dists import dyadic, triadic

@@ -32,7 +32,7 @@ Mike DeWeese has introduced a family of multivariate information measures based

This particular distribution has zero :ref:`coinformation`:

-.. ipython:: python
+.. ipython::

In [3]: from dit.multivariate import coinformation

@@ -46,7 +46,7 @@ Yet the distribution is a product of a giant bit (coinformation :math:`1.0`) and
\ID{X_0 : \ldots : X_n} = \max_{p(x'_i | x_i)} \I{X'_0 : \ldots : X'_n}
-.. ipython:: python
+.. ipython::

In [5]: from dit.multivariate import deweese_coinformation

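A sketch of the comparison set up above; `triadic` is the example distribution imported in the first hunk, and the DeWeese variant is expected to exceed the plain co-information (its exact value is not shown in this diff):

    from dit.example_dists import triadic
    from dit.multivariate import coinformation, deweese_coinformation

    print(coinformation(triadic))          # 0.0, as noted above
    print(deweese_coinformation(triadic))  # positive: local processing exposes the shared structure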
4 changes: 2 additions & 2 deletions docs/measures/multivariate/dual_total_correlation.rst
@@ -14,7 +14,7 @@ The dual total correlation :cite:`Han1975linear`, or binding information :cite:`
In a sense the binding information captures the same information that the :doc:`total_correlation` does, in that both measures are zero or non-zero together. However, the two measures take on very different quantitative values for different distributions. By way of example, the type of distribution that maximizes the total correlation is a "giant bit":

-.. ipython:: python
+.. ipython::

In [1]: from dit.multivariate import binding_information, total_correlation

@@ -30,7 +30,7 @@ In a sense the binding information captures the same information that the :doc:`

For the same distribution, the dual total correlation takes on a relatively low value. On the other hand, the type of distribution that maximizes the dual total correlation is a "parity" distribution:

-.. ipython:: python
+.. ipython::

In [5]: from dit.example_dists import n_mod_m

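The two extremes discussed above, side by side; the giant-bit pmf is reconstructed from the earlier examples and `n_mod_m(3, 2)` is assumed to be the three-variable parity:

    from dit import Distribution
    from dit.example_dists import n_mod_m
    from dit.multivariate import binding_information, total_correlation

    giant_bit = Distribution(['000', '111'], [1/2, 1/2])
    print(total_correlation(giant_bit))    # 2.0: maximal for the giant bit
    print(binding_information(giant_bit))  # 1.0: comparatively low

    parity = n_mod_m(3, 2)
    print(total_correlation(parity))       # 1.0
    print(binding_information(parity))     # 2.0: maximal for the parity distribution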
8 changes: 4 additions & 4 deletions docs/measures/multivariate/entropy.rst
@@ -13,13 +13,13 @@ The entropy measures the total amount of information contained in a set of rando
Let's consider two coins that are interdependent: the first coin flips fairly, and if the first comes up heads, the other is fair, but if the first comes up tails the other is certainly tails:

-.. ipython:: python
+.. ipython::

In [1]: d = dit.Distribution(['HH', 'HT', 'TT'], [1/4, 1/4, 1/2])

We would expect the entropy of the second coin conditioned on the first coin to be :math:`0.5` bits, and sure enough that is what we find:

-.. ipython:: python
+.. ipython::

In [2]: from dit.multivariate import entropy

@@ -29,15 +29,15 @@ We would expect that entropy of the second coin conditioned on the first coin wo

And since the first coin is fair, we would expect it to have an entropy of :math:`1` bit:

-.. ipython:: python
+.. ipython::

@doctest float
In [3]: entropy(d, [0])
Out[3]: 1.0

Taken together, we would then expect the joint entropy to be :math:`1.5` bits:

-.. ipython:: python
+.. ipython::

@doctest float
In [4]: entropy(d)
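The three expectations spelled out above, as one sketch; the conditional form `entropy(d, [1], [0])` assumes the usual rvs/crvs calling convention:

    import dit
    from dit.multivariate import entropy

    d = dit.Distribution(['HH', 'HT', 'TT'], [1/4, 1/4, 1/2])

    print(entropy(d, [1], [0]))  # 0.5: the second coin given the first
    print(entropy(d, [0]))       # 1.0: the first coin alone
    print(entropy(d))            # 1.5: the joint entropy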
2 changes: 1 addition & 1 deletion docs/measures/multivariate/exact_common_information.rst
@@ -16,7 +16,7 @@ Subadditivity of Independent Variables

Kumar **et al.** :cite:`kumar2014exact` have shown that the exact common information of a pair of independent pairs of variables can be less than the sum of their individual exact common informations. Here we verify this claim:

-.. ipython:: python
+.. ipython::

In [1]: from dit.multivariate import exact_common_information as G

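The distributions used to verify the subadditivity claim are elided above; as a simpler grounded sketch, the exact common information of two perfectly correlated bits is the one shared bit (the underlying optimization can be slow):

    from dit import Distribution
    from dit.multivariate import exact_common_information as G

    rdn = Distribution(['00', '11'], [1/2, 1/2])
    print(G(rdn))  # 1.0: a single common bit generates both variables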
8 changes: 4 additions & 4 deletions docs/measures/multivariate/gk_common_information.rst
@@ -26,7 +26,7 @@ Consider a joint distribution over :math:`X_0` and :math:`X_1`. Given any partic
As a canonical example, consider the following:

-.. ipython:: python
+.. ipython::

In [1]: from dit import Distribution as D

@@ -47,7 +47,7 @@ As a canonical example, consider the following:

So, the Gács-Körner common information is 1.5 bits. But what is the common random variable?

-.. ipython:: python
+.. ipython::

In [7]: from dit.algorithms import insert_meet

@@ -108,7 +108,7 @@ Which can be visualized as this:

This quantity can be computed easily using dit:

-.. ipython:: python
+.. ipython::

In [10]: from dit.example_dists import RdnXor

@@ -149,7 +149,7 @@ The multivariate common information follows a similar inequality as the two vari
It is interesting to note that the Gács-Körner common information can be non-zero even when the :ref:`coinformation` is negative:

-.. ipython:: python
+.. ipython::

In [16]: from dit.example_dists.miscellaneous import gk_pos_i_neg

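A sketch of the RdnXor computation referenced above; the stated value assumes the Gács-Körner common information extracts exactly the redundant bit shared by every variable:

    from dit.example_dists import RdnXor
    from dit.multivariate import gk_common_information as K

    # the redundant bit is common to all three variables; the xor part is not
    print(K(RdnXor()))  # 1.0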
6 changes: 3 additions & 3 deletions docs/measures/multivariate/residual_entropy.rst
@@ -16,7 +16,7 @@ The residual entropy was originally proposed in :cite:`Verdu2008` to quantify th

If a joint distribution consists of independent random variables, the residual entropy is equal to the :doc:`entropy`:

-.. ipython:: python
+.. ipython::

In [1]: from dit.multivariate import entropy, residual_entropy

@@ -28,7 +28,7 @@ If a joint distribution consists of independent random variables, the residual e

Another simple example is a distribution where one random variable is independent of the others:

-.. ipython:: python
+.. ipython::

In [1]: d = dit.uniform(['000', '001', '110', '111'])

@@ -38,7 +38,7 @@ Another simple example is a distribution where one random variable is independen

If we ask for the residual entropy of only the latter two random variables, the middle one is now independent of the others and so the residual entropy grows:

-.. ipython:: python
+.. ipython::

@doctest float
In [4]: residual_entropy(d, [[1], [2]])
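The example above in one runnable sketch:

    import dit
    from dit.multivariate import entropy, residual_entropy

    d = dit.uniform(['000', '001', '110', '111'])

    print(entropy(d))                       # 2.0 bits in total
    print(residual_entropy(d))              # 1.0: only the last bit is unshared
    print(residual_entropy(d, [[1], [2]]))  # 2.0: within the last two, nothing is shared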
6 changes: 3 additions & 3 deletions docs/measures/multivariate/total_correlation.rst
@@ -14,7 +14,7 @@ The total correlation :cite:`watanabe1960information`, denoted :math:`\T{}`, als
Two nice features of the total correlation are that it is non-negative and that it is zero if and only if the random variables :math:`X_{0:n}` are all independent. Some baseline behavior is also good to note. First, consider its behavior when applied to "giant bit" distributions:

-.. ipython:: python
+.. ipython::

In [1]: from dit import Distribution as D

@@ -26,7 +26,7 @@ Two nice features of the total correlation are that it is non-negative and that
So we see that for giant bit distributions, the total correlation is equal to one less than the number of variables. The second type of distribution to consider is general parity distributions:

-.. ipython:: python
+.. ipython::

In [4]: from dit.example_dists import n_mod_m

@@ -46,7 +46,7 @@ The total correlation follows a nice decomposition rule. Given two sets of (not
\T{A \cup B} = \T{A} + \T{B} + \I{A : B}
-.. ipython:: python
+.. ipython::

In [18]: from dit.multivariate import coinformation as I

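A sketch of the parity baseline and the decomposition rule stated above, with A = {0, 1} and B = {2, 3}; it assumes `total_correlation` and `coinformation` accept rvs given as lists of variable groups:

    from dit.example_dists import n_mod_m
    from dit.multivariate import coinformation as I, total_correlation as T

    d = n_mod_m(4, 2)  # four-bit parity
    print(T(d))        # 1.0: a single bit of total correlation

    # T(A u B) = T(A) + T(B) + I(A : B)
    lhs = T(d)
    rhs = T(d, [[0], [1]]) + T(d, [[2], [3]]) + I(d, [[0, 1], [2, 3]])
    print(lhs, rhs)    # both 1.0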
2 changes: 1 addition & 1 deletion docs/measures/multivariate/tse_complexity.rst
@@ -13,7 +13,7 @@ The Tononi-Sporns-Edelman (TSE) complexity :cite:`Tononi1994` is a complexity m
Two distributions which might be considered tightly coupled are the "giant bit" and the "parity" distributions:

-.. ipython:: python
+.. ipython::

In [54]: from dit.multivariate import tse_complexity

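A sketch of the comparison set up above; the giant-bit pmf is reconstructed from earlier examples and `n_mod_m(3, 2)` is assumed to be the parity distribution (the exact TSE values are not shown in this diff):

    from dit import Distribution
    from dit.example_dists import n_mod_m
    from dit.multivariate import tse_complexity

    giant_bit = Distribution(['000', '111'], [1/2, 1/2])
    parity = n_mod_m(3, 2)

    # two very different kinds of tight coupling
    print(tse_complexity(giant_bit))
    print(tse_complexity(parity))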
