## Markov processes
A stochastic process $x_t$ (say, scalar): for each $t$, $x_t$ is a random variable.

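The process is **Markov** if
$$
F(x_t|x_{t-1}) = F(x_t|x_{t-1}, x_{t-2}, \ldots)
$$
More generally, it is **Markov of order $p$** if
$$
x_t \sim F(\cdot|x_{t-1}, x_{t-2}, \ldots, x_{t-p})
$$
Every order-$p$ Markov process can be written as an order-1 process by stacking $(x_t, x_{t-1}, \ldots, x_{t-p+1})$ into one state vector.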

"Nothing helps predict the future like the present"

Since $\mathbf x_t$ may be a vector, it is enough to use
$$
\tag{*}
F(\mathbf x_t|\mathbf x_{t-1})
$$

### Examples
#### First-order difference equation (DE)
$$
\mathbf x_t = \mathbf A \mathbf x_{t-1}
$$
$$
\Pr(x_t \le Z|x_{t-1}=x) = 
\begin{cases}
0 & Z<Ax\\
1 & Z \ge Ax
\end{cases}
$$
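In the scalar case the conditional CDF is a step at $Ax$: $x_t$ is pinned down by $x_{t-1}$ alone, so this deterministic process is Markov.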
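The stacking argument from the top of this section can be checked with a deterministic DE. The sketch below (function names and coefficients are hypothetical) simulates a second-order recursion $y_t = a_1 y_{t-1} + a_2 y_{t-2}$ directly, and again as the first-order vector DE $\mathbf x_t = \mathbf A \mathbf x_{t-1}$ with $\mathbf x_t = (y_t, y_{t-1})$, often called the companion form.

```julia
# Direct second-order simulation; assumes T >= 2
function simulate_second_order(y1, y2, a1, a2, T)
    y = zeros(T)
    y[1], y[2] = y1, y2
    for t in 3:T
        y[t] = a1 * y[t-1] + a2 * y[t-2]
    end
    return y
end

# Same recursion as a first-order DE on the stacked state; assumes T >= 2
function simulate_stacked(y1, y2, a1, a2, T)
    A = [a1 a2;
         1.0 0.0]
    x = [y2, y1]          # state at t = 2: [y_2, y_1]
    y = zeros(T)
    y[1], y[2] = y1, y2
    for t in 3:T
        x = A * x         # x_t depends only on x_{t-1}
        y[t] = x[1]       # first component is y_t
    end
    return y
end

# Hypothetical coefficients and initial values
y_direct = simulate_second_order(1.0, 2.0, 0.5, 0.3, 20)
y_stacked = simulate_stacked(1.0, 2.0, 0.5, 0.3, 20)
y_direct ≈ y_stacked   # true: same path up to floating-point rounding
```

The second row of `A` only shifts $y_{t-1}$ down one slot; all the dynamics sit in the first row.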

#### Markov chain
$x_t$ is **categorical**: $x_t \in \{1, 2, \ldots, K\}$

> The "distribution" of $x_t$ is a **probability mass function**
> $$
> \pi_k \equiv \Pr(x_t = k)
> $$

Markovness means that the probability of the next state depends only on the current state:
$$
\Pr(x_t = k|x_{t-1} = i) = \pi_{ik}
$$
so (*) holds.

Collect these values into a **transition matrix**, with rows indexed by the current state $i$ and columns by the next state $k$:
$$
\mathbf P = [\pi_{ik}]
$$
For example, with $K = 2$:
$$
\begin{bmatrix}
0.7 & 0.3 \\
0.1 & 0.9
\end{bmatrix}
$$
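To see how a transition matrix generates a trajectory, here is a minimal sketch that simulates the chain above by inverse-CDF sampling from row $i$ of $\mathbf P$. The function name `simulate_chain`, the starting state and the seed are assumptions for illustration; this is not necessarily how any package implements it.

```julia
using Random

# Transition matrix from the example: row i = current state, column k = next state
P = [0.7 0.3;
     0.1 0.9]

# Simulate T periods of a finite Markov chain starting from initial_state
function simulate_chain(rng::AbstractRNG, P::AbstractMatrix{<:Real}, initial_state::Int, T::Int)
    K = size(P, 1)
    row_cdfs = [cumsum(P[i, :]) for i in 1:K]   # cumulative probabilities for each current state
    states = Vector{Int}(undef, T)
    states[1] = initial_state
    for t in 2:T
        cdf = row_cdfs[states[t-1]]             # only the previous state matters (Markov)
        u = rand(rng)                           # uniform draw in [0, 1)
        # smallest k with cdf[k] >= u, guarded against rounding in the last entry
        states[t] = min(searchsortedfirst(cdf, u), K)
    end
    return states
end

trajectory = simulate_chain(MersenneTwister(2024), P, 1, 10)
```

Each step reads a single row of `P`, which is exactly the restriction that (*) imposes.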

> ### Properties
> $\mathbf P$ is a $K\times K$ square matrix whose entries are probabilities (elementwise),
> $$
> 0 \le \mathbf P \le 1
> $$
> and whose rows sum to 1:
> $$
> \mathbf P \mathbf 1 = \mathbf 1
> $$
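These properties translate directly into a check. A sketch with a hypothetical helper name `is_transition_matrix` and an assumed rounding tolerance `atol`:

```julia
using LinearAlgebra

# Checks the three properties of a transition matrix listed above
function is_transition_matrix(P::AbstractMatrix{<:Real}; atol::Real = 1e-10)
    K, L = size(P)
    K == L || return false                              # K x K square
    all(p -> 0 <= p <= 1, P) || return false            # 0 <= P <= 1 elementwise
    return isapprox(P * ones(K), ones(K); atol = atol)  # P*1 = 1: every row sums to 1
end

is_transition_matrix([0.7 0.3; 0.1 0.9])   # true
is_transition_matrix([0.5 0.6; 0.1 0.9])   # false: first row sums to 1.1
```

The tolerance matters because rows such as `[0.7, 0.2, 0.1]` do not sum to exactly `1.0` in floating point.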

```julia
using Distributions
```

```julia
?Distributions
```


A Julia package for probability distributions and associated functions.

API overview (major features):

  * `d = Dist(parameters...)` creates a distribution instance `d` for some distribution `Dist` (see choices below) with the specified `parameters`
  * `rand(d, sz)` samples from the distribution
  * `pdf(d, x)` and `logpdf(d, x)` compute the probability density or log-probability density of `d` at `x`
  * `cdf(d, x)` and `ccdf(d, x)` compute the (complementary) cumulative distribution function at `x`
  * `quantile(d, p)` is the inverse `cdf` (see also `cquantile`)
  * `mean(d)`, `var(d)`, `std(d)`, `skewness(d)`, `kurtosis(d)` compute moments of `d`
  * `fit(Dist, xs)` generates a distribution of type `Dist` that best fits the samples in `xs`

These represent just a few of the operations supported by this package; users are encouraged to refer to the full documentation at https://JuliaStats.github.io/Distributions.jl/stable/ for further information.

Supported distributions:

```
Arcsine, Bernoulli, Beta, BetaBinomial, BetaPrime, Binomial, Biweight,
Categorical, Cauchy, Chi, Chisq, Cosine, DiagNormal, DiagNormalCanon,
Dirichlet, DiscreteUniform, DoubleExponential, EdgeworthMean,
EdgeworthSum, EdgeworthZ, Erlang,
Epanechnikov, Exponential, FDist, FisherNoncentralHypergeometric,
Frechet, FullNormal, FullNormalCanon, Gamma, GeneralizedPareto,
GeneralizedExtremeValue, Geometric, Gumbel, Hypergeometric,
InverseWishart, InverseGamma, InverseGaussian, IsoNormal,
IsoNormalCanon, Kolmogorov, KSDist, KSOneSided, Laplace, Levy, LKJ,
Logistic, LogNormal, MatrixBeta, MatrixFDist, MatrixNormal,
MatrixReshaped, MatrixTDist, MixtureModel, Multinomial,
MultivariateNormal, MvLogNormal, MvNormal, MvNormalCanon,
MvNormalKnownCov, MvTDist, NegativeBinomial, NoncentralBeta, NoncentralChisq,
NoncentralF, NoncentralHypergeometric, NoncentralT, Normal, NormalCanon,
NormalInverseGaussian, Pareto, PGeneralizedGaussian, Poisson, PoissonBinomial,
QQPair, Rayleigh, Skellam, Soliton, StudentizedRange, SymTriangularDist, TDist, TriangularDist,
Triweight, Truncated, TruncatedNormal, Uniform, UnivariateGMM,
VonMises, VonMisesFisher, WalleniusNoncentralHypergeometric, Weibull,
Wishart, ZeroMeanIsoNormal, ZeroMeanIsoNormalCanon,
ZeroMeanDiagNormal, ZeroMeanDiagNormalCanon, ZeroMeanFullNormal,
ZeroMeanFullNormalCanon
```


```julia
?Categorical
```


```
Categorical(p)
```

A *Categorical distribution* is parameterized by a probability vector `p` (of length `K`).

$$
P(X = k) = p[k]  \quad \text{for } k = 1, 2, \ldots, K.
$$

```julia
Categorical(p)   # Categorical distribution with probability vector p
params(d)        # Get the parameters, i.e. (p,)
probs(d)         # Get the probability vector, i.e. p
ncategories(d)   # Get the number of categories, i.e. K
```

Here, `p` must be a real vector, of which all components are nonnegative and sum to one.

**Note:** The input vector `p` is directly used as a field of the constructed distribution, without being copied.

`Categorical` is simply a type alias describing a special case of a `DiscreteNonParametric` distribution, so non-specialized methods defined for `DiscreteNonParametric` apply to `Categorical` as well.

External links:

  * [Categorical distribution on Wikipedia](http://en.wikipedia.org/wiki/Categorical_distribution)


```julia
states_of_world = Categorical([0.7, 0.2, 0.1])
```
```
Categorical{Float64, Vector{Float64}}(support=Base.OneTo(3), p=[0.7, 0.2, 0.1])
```

```julia
typeof(states_of_world)
```
```
Categorical{Float64, Vector{Float64}} (alias for DiscreteNonParametric{Int64, Float64, Base.OneTo{Int64}, Array{Float64, 1}})
```

```julia
probs(states_of_world)
```
```
3-element Vector{Float64}:
 0.7
 0.2
 0.1
```

#### Most readable

```julia
mean(states_of_world)
```
```
1.4000000000000001
```

#### Least readable (vectorized Matlab style)

```julia
probs(states_of_world)' * [1, 2, 3]
```
```
1.4000000000000001
```

#### Readable

```julia
sum(probs(states_of_world) .* [1, 2, 3])
```
```
1.4000000000000001
```
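All three cells compute the expectation of the categorical variable. Written out with the probabilities of `states_of_world`:
$$
\mathbb E[x_t] = \sum_{k=1}^{3} k\,\pi_k = 1\cdot 0.7 + 2\cdot 0.2 + 3\cdot 0.1 = 1.4
$$
The trailing `0000000000001` in the printed results is floating-point rounding, not a different value.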

```julia
rand(states_of_world, 5)
```
```
5-element Vector{Int64}:
 1
 3
 1
 2
 1
```

## Types, composite types and "type inheritance"

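```julia
struct LabeledCategorical
    p::Vector{Float64}
    labels::Vector
end
```
```
LoadError: invalid redefinition of type LabeledCategorical
```
This error is left over from re-running the cell in a session where `LabeledCategorical` was already defined differently; in a fresh session the definition succeeds.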

```julia
?LabeledCategorical
```


No documentation found.

**Summary**

```
struct LabeledCategorical <: Any
```

**Fields**

```
p      :: Vector{Float64}
labels :: Vector{T} where T
```


**Note**: `LabeledCategorical` declares no supertype, so its parent type is `Any`. The field `labels::Vector` accepts a vector with any element type.

```julia
z = LabeledCategorical([0.7, 0.3], ["employed", "unemployed"])
```
```
LabeledCategorical([0.7, 0.3], ["employed", "unemployed"])
```

```julia
z
```
```
LabeledCategorical([0.7, 0.3], ["employed", "unemployed"])
```

```julia
typeof(z)
```
```
LabeledCategorical
```

```julia
z.p
```
```
2-element Vector{Float64}:
 0.7
 0.3
```

```julia
z.labels
```
```
2-element Vector{String}:
 "employed"
 "unemployed"
```

Let's extend the `rand` function with a method for labeled categorical variables. Inside the new method we can still call `rand` on a `Categorical` variable: Julia chooses which method to run based on the types of all the arguments. This is an example of **multiple dispatch**.

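```julia
import Base: rand
function rand(x::LabeledCategorical, sz::Int)::Vector
    categorical_pmf = Categorical(x.p)    # reuse the Categorical type from Distributions
    output = []                           # empty Vector{Any}, grown in the loop
    for i = 1:sz
        j = rand(categorical_pmf, 1)      # dispatches to rand for Categorical: 1-element index vector
        append!(output, x.labels[j])      # x.labels[j] is a 1-element vector of labels
    end
    return output
end
```
```
rand (generic function with 170 methods)
```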

**Notes**: Line 4 creates an empty vector, `output = []`, with element type `Any`. This is inefficient for large problems with known data types. In that case we want to preallocate memory, for example `output = zeros(sz)` for numeric output or `output = Vector{eltype(x.labels)}(undef, sz)` for labels. The function `append!` grows a vector by adding elements at its end. The (strongly held) convention is that if a function modifies its inputs, (1) the modified input comes first and (2) the function name ends in `!`. This makes for intuitively readable code like `sort!(example_vector)`.
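A sketch of the preallocated version suggested in the note. The function name `sample_labels` is hypothetical, and to keep the block self-contained it draws indices by inverse-CDF sampling from `cumsum(x.p)` instead of calling `Categorical` from Distributions.

```julia
using Random

# Same definition as above, repeated so this block runs on its own
# (skip it if LabeledCategorical is already defined in your session)
struct LabeledCategorical
    p::Vector{Float64}
    labels::Vector
end

# Draws sz labels into a vector allocated once, with its element type
# taken from the labels (String here) instead of a growing Vector{Any}
function sample_labels(rng::AbstractRNG, x::LabeledCategorical, sz::Int)
    cdf = cumsum(x.p)
    output = Vector{eltype(x.labels)}(undef, sz)   # memory reserved up front
    for i in 1:sz
        # smallest j with cdf[j] >= u, guarded against rounding in the last entry
        j = min(searchsortedfirst(cdf, rand(rng)), length(cdf))
        output[i] = x.labels[j]
    end
    return output
end

employment = LabeledCategorical([0.7, 0.3], ["employed", "unemployed"])
draws = sample_labels(MersenneTwister(1), employment, 5)   # a Vector{String}, not Vector{Any}
```

Because `output` has a concrete element type, later code working with `draws` knows it holds strings.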

```julia
rand(z, 5)
```
```
5-element Vector{Any}:
 "unemployed"
 "employed"
 "employed"
 "unemployed"
 "employed"
```

### Toxic code
```julia
for i = 1:m
    x = y[i]^k
end
```

1. It is unreadable, even to its author a couple of days later. What is `i`? What is `x`? What is `k`? Why are we doing all this?
2. Nobody is ever going to touch this again. It's toxic.

### Readable code
```julia
function simulate_trajectory(states_of_world::LabeledCategorical, number_periods::Int)
    return rand(states_of_world, number_periods)
end
```
Use good variable names and high-level language features like composite types. Reuse existing functions like `rand` to
1. ease the mental burden
2. make code explicitly reusable, for example, if we later decide to model states of the world with a continuous rather than a discrete distribution.
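The reuse argument can be made concrete with multiple dispatch. The sketch below adds a second, untyped method to `simulate_trajectory` (dropping the type annotation is our assumption about how one would generalize it), so that anything with a `rand(x, n)` method works. With Distributions loaded, a continuous distribution such as `Normal()` should also work, since `rand(d, sz)` is part of its API overview.

```julia
# Generic method: accepts any states_of_world that supports rand(states_of_world, n)
function simulate_trajectory(states_of_world, number_periods::Int)
    return rand(states_of_world, number_periods)
end

discrete_path = simulate_trajectory(["employed", "unemployed"], 5)  # uniform draws from the labels
continuous_path = simulate_trajectory(Float64, 5)                   # uniform draws on [0, 1)
```

When a more specific method exists, such as the one for `LabeledCategorical`, Julia still picks that one.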

```julia
struct StringCategorical <: LabeledCategorical
    p::Vector{Float64}
    labels::Vector{String}
end
```
```
LoadError: invalid subtyping in definition of StringCategorical
```

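Julia only allows subtyping of abstract types, and `LabeledCategorical` is a concrete type, so it cannot have subtypes. This has to be done differently, as we will discuss in Class 05.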
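One common pattern, sketched here with hypothetical names and not necessarily the approach taken in Class 05, is to declare an abstract parent type and let each concrete struct subtype it:

```julia
# Only abstract types can have subtypes, so the shared parent is abstract
abstract type AbstractLabeledCategorical end

struct GeneralLabeledCategorical <: AbstractLabeledCategorical
    p::Vector{Float64}
    labels::Vector
end

struct StringCategorical <: AbstractLabeledCategorical
    p::Vector{Float64}
    labels::Vector{String}
end

# One method written for the abstract parent applies to every subtype via dispatch
number_of_labels(x::AbstractLabeledCategorical) = length(x.labels)

s = StringCategorical([0.7, 0.3], ["employed", "unemployed"])
number_of_labels(s)   # 2
```

The abstract type carries no fields of its own; each concrete subtype declares its fields.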