Analyze categorical attributes 

## 3.1 Univariate Analysis 
* We assume that the categorical attribute $X$ takes values from a finite domain of $m$ symbols, $dom(X) = \{a_1, a_2, ..., a_m\}$, and that the data $D$ is a single column of $n$ observed values, each $x_i \in dom(X)$: <br/> 
$D = \begin{bmatrix}X \\ x_1\\x_2\\ \vdots \\x_n\end{bmatrix}$ <br/>

### 3.1.1 Bernoulli Variable 
* Consider the case where a categorical variable has a domain of only two values, $\{a_1, a_2\}$. <br/> 
* We can model it as a Bernoulli random variable that maps the two values to 1 and 0. The labels are arbitrary: 1 does not necessarily mean "success" and 0 does not necessarily mean "failure". <br/> 
$X(v) = \begin{cases} 1, & \text{if } v = a_1 \\ 0, & \text{if } v = a_2 \end{cases}$ <br/> 
* Thus the pmf of $X$ is given as $P(X=x) = f(x) = \begin{cases} p_1, & \text{if } x = 1 \\ p_0, & \text{if } x = 0 \end{cases}$ <br/> 
where $p_1 + p_0 = 1$. <br/> 
* Writing $p_1 = p$ and $p_0 = 1 - p$, the pmf can be expressed compactly as $P(X=x) = f(x) = p^x(1-p)^{1-x}$
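
As a minimal sketch, the compact form of the pmf can be evaluated directly. The function name `bernoulli_pmf` and the value `p = 0.4` are assumptions chosen for illustration only.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p**x * (1 - p) ** (1 - x)


p = 0.4  # assumed probability of the value mapped to 1
print(bernoulli_pmf(1, p))  # probability of a_1
print(bernoulli_pmf(0, p))  # probability of a_2
```

The exponent trick works because exactly one of `x` and `1 - x` is 1, so only one of the two factors contributes.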

#### Mean and Variance
$\mu = E[X] = 1 \cdot p + 0 \cdot (1-p) = p$ <br/>
$\sigma^2 = var(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$
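
The two identities can be checked numerically by summing over the two values of $X$. This is only a sketch; `p = 0.4` is an assumed value.

```python
p = 0.4  # assumed value, chosen only for illustration

# pmf of X over its two values {0, 1}
pmf = {1: p, 0: 1 - p}

mean = sum(x * prob for x, prob in pmf.items())              # E[X]
second_moment = sum(x**2 * prob for x, prob in pmf.items())  # E[X^2]
variance = second_moment - mean**2                           # E[X^2] - (E[X])^2

print(mean, variance)
```

Because $X$ only takes the values 0 and 1, $X^2 = X$, which is why $E[X^2]$ equals $p$.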

#### Sample Mean and Variance 
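The sample statistics mirror the population formulas. With $n_1$ ones among $n$ observations, the sample mean is $\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \frac{n_1}{n} = \hat{p}$, and the sample variance is $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2 = \hat{p}(1-\hat{p})$.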
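
For a 0/1-coded sample, the sample mean is the fraction of ones, and the sample variance with the $1/n$ normalization equals that fraction times one minus that fraction. A sketch on a made-up sample (the data are invented for illustration):

```python
import numpy as np

# Hypothetical 0/1-coded sample, invented for illustration
x = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])

p_hat = x.mean()   # sample mean = fraction of ones
var_hat = x.var()  # np.var defaults to ddof=0, i.e. the 1/n normalization

print(p_hat, var_hat, p_hat * (1 - p_hat))
```

Note that `ddof=1` (the unbiased $1/(n-1)$ estimator) would give a slightly different value.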

#### Binomial Distribution: Number of Occurrences
* Given a random sample of size $n$, let the random variable $N$ denote the number of observations that take the value 1 (that is, the number of occurrences of $a_1$). $N$ follows a binomial distribution: $P(N = n_1) = \binom{n}{n_1} p^{n_1} (1-p)^{n - n_1}$, with mean $np$ and variance $np(1-p)$. 
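
A minimal sketch of the binomial pmf for the number of ones in $n$ independent Bernoulli trials. The helper name `binomial_pmf` and the values `n = 10`, `p = 0.4` are assumptions for illustration.

```python
from math import comb


def binomial_pmf(n1: int, n: int, p: float) -> float:
    """P(N = n1): probability of n1 ones in n independent Bernoulli(p) trials."""
    if not 0 <= n1 <= n:
        return 0.0
    return comb(n, n1) * p**n1 * (1 - p) ** (n - n1)


n, p = 10, 0.4  # assumed sample size and probability
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(probs)                                    # should be close to 1
mean_count = sum(k * pk for k, pk in enumerate(probs))  # should be close to n * p

print(total, mean_count)
```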

### 3.1.2 Multivariate Bernoulli Variable 
* What if the random variable's domain has more than two values, $\{a_1, a_2, ..., a_m\}$? To represent this, we map each value $a_i$ to the standard basis vector $e_i = (0, ..., 0, 1, 0, ..., 0)^T$, which has a 1 in the $i^{th}$ position and 0 everywhere else. 
* Example domain with $m = 4$: $(a_1 = \text{VeryShort}, a_2 = \text{Short}, a_3 = \text{Long}, a_4 = \text{VeryLong})$, so that, for instance, $\text{Long}$ maps to $e_3 = (0, 0, 1, 0)^T$.
* Look at page 76 to see the basis vectors for the multivariate Bernoulli variable. 
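
The mapping from category values to standard basis vectors can be sketched as follows. The category labels come from the example above; the helper `encode` is a hypothetical name introduced here.

```python
import numpy as np

# Category labels from the example; list order fixes the basis-vector index
domain = ["VeryShort", "Short", "Long", "VeryLong"]
index = {a: i for i, a in enumerate(domain)}


def encode(value: str) -> np.ndarray:
    """Map a category value a_i to the standard basis vector e_i."""
    e = np.zeros(len(domain), dtype=int)
    e[index[value]] = 1
    return e


print(encode("Long"))  # [0 0 1 0]
```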
#### Mean
The mean of the multivariate Bernoulli variable is the vector of population probabilities, whose $i^{th}$ entry $p_i$ is the probability that $X$ takes the value $a_i$ (equivalently, $X = e_i$). <br/>
$\mu = E[X] = \begin{bmatrix} 1 \\0 \\ \vdots \\ 0 \end{bmatrix} p_1 + ... + \begin{bmatrix}0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} p_m = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{bmatrix} = p$
* Look at page 77 to see what the mean looks like.
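
The expectation can be sketched numerically as the probability-weighted sum of the basis vectors. The probability vector below is an assumed example, not taken from the notes.

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])  # assumed probabilities, summing to 1
E = np.eye(len(p))                  # row i is the basis vector e_i

# mu = e_1 * p_1 + ... + e_m * p_m
mu = np.zeros(len(p))
for i in range(len(p)):
    mu += E[i] * p[i]

print(mu)
```

Each basis vector contributes only to its own coordinate, so the weighted sum simply reassembles the probability vector.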

#### Covariance Matrix
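Let $A_i$ denote the $i^{th}$ component of $X$, which is itself a Bernoulli variable with parameter $p_i$. <br/>
$\sigma_i^2 = var(A_i) = p_i (1 - p_i)$ <br/>
$\sigma_{ij} = E[A_iA_j] - E[A_i] \cdot E[A_j] = 0 - p_ip_j = -p_ip_j$ for $i \neq j$, since $A_i$ and $A_j$ can never both be 1 in the same observation. <br/>
$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{bmatrix} = \begin{bmatrix} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_m \\ -p_1p_2 & p_2(1-p_2) & \cdots & -p_2p_m \\ \vdots & \vdots & \ddots & \vdots \\ -p_1p_m & -p_2p_m & \cdots & p_m(1-p_m) \end{bmatrix}$ <br/> 
* We can write the covariance matrix compactly as $\Sigma = P - p p^T$, where $P = diag(p)$ is the diagonal matrix with $p_1, ..., p_m$ on its diagonal.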


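In [12]:
import numpy as np
# P = diag(p): diagonal matrix holding the category probabilities
P = np.diag([0.4, 0.6])

In [13]:
# p as a column vector of shape (2, 1)
p = np.array([0.4, 0.6])[:, np.newaxis]

In [23]:
# Covariance matrix: Sigma = P - p p^T
P - p @ p.T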

array([[ 0.24, -0.24],
       [-0.24,  0.24]])
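
The same computation extends to any number of categories. The sketch below wraps it in a hypothetical helper, `multivariate_bernoulli_cov`, and uses an assumed probability vector with $m = 4$. It also prints the row sums, which should all be approximately zero because $\sum_j p_j = 1$.

```python
import numpy as np


def multivariate_bernoulli_cov(p) -> np.ndarray:
    """Return diag(p) - p p^T for a probability vector p."""
    p = np.asarray(p, dtype=float)
    return np.diag(p) - np.outer(p, p)


p = [0.1, 0.2, 0.3, 0.4]  # assumed probabilities, summing to 1
Sigma = multivariate_bernoulli_cov(p)

print(Sigma)
print(Sigma.sum(axis=1))  # each row sums to (approximately) zero
```

`np.outer` is used so that `p` can stay a plain 1-D sequence instead of being reshaped into a column vector.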