# Practice Problems 1 - Linear Algebra and probability
### Brayan José Calderón Amorocho

1. Do the tutorial “Kaggle Python Tutorial on Machine Learning” (https://www.datacamp.com/courses/kaggle-python-tutorial-on-machine-learning).

![](certificate.PNG)



2. Let $D = (d_1, \dots, d_n)$ be a set of documents and $T = (t_1, \dots, t_m)$ a set of terms (words). Let $TD = (TD_{i,j})_{i=1 \dots m,\, j=1 \dots n}$ be a matrix such that $TD_{i,j}$ corresponds to the number of times the term $t_i$ appears in the document $d_j$. Also, let $l_i$ be the length (number of characters) of term $t_i$, and let $L = (l_1, \dots, l_m)$ be a column vector. Finally, assume a process where a document $d_j$ is randomly chosen with uniform probability and then a term $t_i$, present in $d_j$, is randomly chosen with a probability proportional to the frequency of $t_i$ in $d_j$.


For all the following expressions you must provide:
- a mathematical expression to calculate it that includes $TD$, $L$, constants (scalars, vectors or matrices) and linear algebra operations
- an expression in NumPy (http://www.scipy.org) that, when evaluated, generates the requested matrix, vector or scalar (the expression must be a linear algebra expression that does not involve control structures such as for, while, etc.)
- the result of evaluating the expression, assuming:
$$ TD = \left( \begin{matrix} 2 & 3 & 0 & 3 & 7 \\ 0 & 5 & 5 & 0 & 3 \\ 5 & 0 & 7 & 3 & 3 \\ 3 & 1 & 0 & 9 & 9 \\ 0 & 0 & 7 & 1 & 3 \\ 6 & 9 & 4 & 6 & 0 \end{matrix} \right) \hspace{1cm} L = \left( \begin{matrix} 5 \\ 2 \\ 3 \\ 6 \\ 4 \\ 3 \end{matrix} \right)$$


In [1]:
import numpy as np
TD = np.array([[2,3,0,3,7],
               [0,5,5,0,3],
               [5,0,7,3,3],
               [3,1,0,9,9],
               [0,0,7,1,3],
               [6,9,4,6,0]])
L = np.array([5,2,3,6,4,3])  # 1-D array of term lengths; a transpose has no effect on 1-D arrays
print(TD)
print(L)
print(L.shape)

[[2 3 0 3 7]
 [0 5 5 0 3]
 [5 0 7 3 3]
 [3 1 0 9 9]
 [0 0 7 1 3]
 [6 9 4 6 0]]
[5 2 3 6 4 3]
(6,)


> (a) Matrix $P(T,D)$ (each position of the matrix $P(T,D)_{i,j}$ corresponds to the joint probability of term $t_i$ and document $d_j$, $P(t_i,d_j)$)

The joint probability is the probability of two events occurring together. By the product rule:
$$ P(T \cap D) = P(T|D)P(D)$$

The document is chosen with uniform probability, so $P(d_j) = \frac{1}{n}$ for every $j$. Given the chosen document $d_j$, a term is picked with probability proportional to its frequency in that document, so each column of $TD$ is divided by its sum:

$$P(t_i|d_j) = \frac{TD_{i,j}}{\sum_{k=1}^{m} TD_{k,j}}$$





In [2]:
n_documents = TD.shape[1]  # n, the number of documents
total_words_per_document = np.sum(TD, axis = 0)  # Column sums of TD
PTD = (TD/total_words_per_document)*(1/n_documents)  # P(T|D) * P(D)
PTD

array([[0.025     , 0.03333333, 0.        , 0.02727273, 0.056     ],
       [0.        , 0.05555556, 0.04347826, 0.        , 0.024     ],
       [0.0625    , 0.        , 0.06086957, 0.02727273, 0.024     ],
       [0.0375    , 0.01111111, 0.        , 0.08181818, 0.072     ],
       [0.        , 0.        , 0.06086957, 0.00909091, 0.024     ],
       [0.075     , 0.1       , 0.03478261, 0.05454545, 0.        ]])
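The broadcasting expression above can also be written with explicit linear algebra operations, as the exercise asks: $P(T,D) = \frac{1}{n}\, TD \,\mathrm{diag}(\mathbf{1}_m^T TD)^{-1}$. The following is a minimal sketch that checks both forms agree on the example data; the names `col_sums`, `PTD_matrix_form` and `PTD_broadcast` are illustrative and not part of the original notebook.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
m, n = TD.shape

# Column sums written as a vector-matrix product: 1_m^T TD
col_sums = np.ones(m) @ TD

# P(T,D) = (1/n) * TD * diag(col_sums)^{-1}
PTD_matrix_form = (TD @ np.linalg.inv(np.diag(col_sums))) / n

# Equivalent broadcasting form
PTD_broadcast = (TD / TD.sum(axis=0)) * (1 / n)

print(np.allclose(PTD_matrix_form, PTD_broadcast))  # both forms agree
print(np.isclose(PTD_matrix_form.sum(), 1.0))       # a joint distribution sums to 1
```

The diagonal-matrix form assumes every document contains at least one term, otherwise the inverse does not exist.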

> (b) Matrix $P(T|D)$

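As explained in the previous block, $P(T|D)$ is obtained by dividing each column of $TD$ by its sum. Note that $T$ and $D$ are not independent here (the term distribution changes from document to document), so $P(T|D)$ does not reduce to $P(T)$.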

In [5]:
PTonD = (TD/total_words_per_document)
PTonD

array([[0.125     , 0.16666667, 0.        , 0.13636364, 0.28      ],
       [0.        , 0.27777778, 0.2173913 , 0.        , 0.12      ],
       [0.3125    , 0.        , 0.30434783, 0.13636364, 0.12      ],
       [0.1875    , 0.05555556, 0.        , 0.40909091, 0.36      ],
       [0.        , 0.        , 0.30434783, 0.04545455, 0.12      ],
       [0.375     , 0.5       , 0.17391304, 0.27272727, 0.        ]])
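Two quick properties of this matrix can be checked numerically: each column is a probability distribution over terms, and the columns differ from one another, which means $T$ and $D$ are not independent for this data. The sketch below is self-contained and uses illustrative names (`PT_marginal`, `columns_sum_to_one`, `independent`).

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
n = TD.shape[1]

# P(T|D): normalize each column of TD
PTonD = TD / TD.sum(axis=0)

# Marginal P(T) as a column vector, from the joint P(T,D) = P(T|D) / n
PT_marginal = (PTonD / n).sum(axis=1, keepdims=True)

# Each column of P(T|D) should sum to 1
columns_sum_to_one = np.allclose(PTonD.sum(axis=0), 1.0)

# Under independence every column of P(T|D) would equal P(T)
independent = np.allclose(PTonD, PT_marginal)

print(columns_sum_to_one)
print(independent)
```

For the example matrix the second check is false, since for instance term $t_1$ never appears in document $d_3$ but does appear elsewhere.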

> (c) Matrix $P(D|T)$

We can obtain this probability matrix using Bayes' theorem, as follows:
$$ P(D|T) = \frac{P(T|D)P(D)}{P(T)}$$

In [4]:
# Probability of a document given a term (Bayes' theorem)
Pd = 1/n_documents  # Uniform prior P(D)
Pt = PTD  # Joint probabilities P(T,D)
PT = np.sum(Pt, axis = 1, keepdims = True)  # Marginal probabilities P(T)
pDonT = (PTonD*Pd)/PT
pDonT



array([[0.17654612, 0.23539482, 0.        , 0.19259576, 0.3954633 ],
       [0.        , 0.45154704, 0.35338464, 0.        , 0.19506832],
       [0.35787437, 0.        , 0.34853851, 0.15616336, 0.13742376],
       [0.18524987, 0.05488885, 0.        , 0.40418153, 0.35567975],
       [0.        , 0.        , 0.64782097, 0.09675248, 0.25542655],
       [0.28373832, 0.37831776, 0.13158879, 0.20635514, 0.        ]])
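Since the numerator $P(T|D)P(D)$ is just the joint matrix, Bayes' theorem here amounts to dividing each row of $P(T,D)$ by its sum. The sketch below, written under the assumption that every term occurs in at least one document, verifies that each row of the result is a distribution over documents; `PDonT_check` is an illustrative name.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
n = TD.shape[1]

# Joint P(T,D) and marginal P(T)
PTD = (TD / TD.sum(axis=0)) / n
PT = PTD.sum(axis=1, keepdims=True)

# P(D|T) = P(T,D) / P(T), i.e. row-normalize the joint matrix
PDonT_check = PTD / PT

# Each row is a distribution over documents
print(np.allclose(PDonT_check.sum(axis=1), 1.0))
```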

> (d) Vector $P(D)$

In [50]:
PD = np.ones((n_documents,1))*(1/n_documents)
PD

array([[0.2],
       [0.2],
       [0.2],
       [0.2],
       [0.2]])
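As a cross-check, $P(D)$ can also be recovered as a marginal of the joint matrix, $P(D) = P(T,D)^T \mathbf{1}_m$. The sketch below rebuilds the joint from the example `TD` so that it runs on its own; `PTD_check` and `PD_marginal` are illustrative names.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
m, n = TD.shape

# Joint P(T,D)
PTD_check = (TD / TD.sum(axis=0)) / n

# Sum over terms: P(D) = P(T,D)^T 1_m, as an (n, 1) column vector
PD_marginal = PTD_check.T @ np.ones((m, 1))

# Should match the uniform vector with entries 1/n
print(np.allclose(PD_marginal, np.full((n, 1), 1 / n)))
```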

> (e) Vector $P(T)$

This is the marginal probability of $T$, obtained by summing the joint probability over all documents:
$$
P(t_i) = \sum_j{P(t_i,d_j)}
$$


In [54]:
PT = PTD.sum(axis = 1, keepdims = True)  # Sum the joint P(T,D) over documents
PT

array([[0.14160606],
       [0.12303382],
       [0.17464229],
       [0.20242929],
       [0.09396047],
       [0.26432806]])
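By the law of total probability, the marginal can equivalently be written as the matrix-vector product $P(T) = P(T|D)\,P(D)$. The sketch below is a self-contained check on the example data, with illustrative names. Note that pooling all counts (`TD / TD.sum()`) gives a different vector in general, because that corresponds to picking a token uniformly from the whole corpus rather than picking a document first.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
n = TD.shape[1]

PTonD = TD / TD.sum(axis=0)        # P(T|D), columns sum to 1
PD = np.full((n, 1), 1 / n)        # Uniform P(D)

# Law of total probability: P(T) = P(T|D) P(D)
PT_total = PTonD @ PD

# Marginal of the joint, for comparison
PT_from_joint = (PTonD / n).sum(axis=1, keepdims=True)

# Pooled-token estimate, which ignores the two-step sampling process
PT_pooled = TD.sum(axis=1, keepdims=True) / TD.sum()

print(np.allclose(PT_total, PT_from_joint))
print(np.allclose(PT_total, PT_pooled))
```

The first comparison holds; the second does not for this data because the documents have different lengths.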

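> (f) $E[l]$ (the expected value of the random variable $l$ corresponding to the length of a randomly chosen term)

The expectation of $f(x)$ is the average value of a function $f(x)$ under the distribution $P(X)$:
$$
E[f(x)] = \sum_{x\in X}{f(x)P(X=x)}
$$

Here every term is assumed to be equally likely, $P(t_i) = \frac{1}{m}$, so the expected value reduces to the arithmetic mean of the lengths:

$$\mu = E(l)=\frac{1}{m}\sum^m_{i=1}l_i=\frac{\mathbf{1}^T L}{m}$$

If terms are instead drawn through the process described in the problem statement, each length is weighted by $P(t_i)$, giving $E(l) = L^T P(T)$.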



In [46]:
EL = np.mean(L)
EL

3.8333333333333335

In [7]:
EL = sum(L)/(len(L))
EL

3.8333333333333335
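The mean above treats all terms as equally likely. If the term is instead drawn through the two-step process from the problem statement, the general expectation formula gives $E[l] = L^T P(T)$. The following is a minimal sketch of that alternative reading, with illustrative names `EL_weighted` and `EL_uniform`; it is offered for comparison, not as the notebook's own answer.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
L = np.array([5, 2, 3, 6, 4, 3])
n = TD.shape[1]

# Marginal P(T) under the document-then-term sampling process
PT = ((TD / TD.sum(axis=0)) / n).sum(axis=1)

# Expected length weighted by P(T): L^T P(T)
EL_weighted = L @ PT

# Expected length with all terms equally likely
EL_uniform = L.mean()

print(EL_weighted)
print(EL_uniform)
```

On the example data the weighted value is about 3.86, slightly above the unweighted mean of about 3.83.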

> (g) Var$(l)$ (the variance of $l$)

The variance of $f(X)$ under $P(X)$ measures the variation of $f(X)$ around its mean $E[f(X)]$:
$$Var[f(X)] = E[(f(X)-E[f(X)])^2]$$
Assuming every term is equally likely, $P(t_i) = \frac{1}{m}$, the variance is:
$$Var(l)=\frac{1}{m}\sum^m_{i=1}(l_i-E(l))^2=\frac{(L-E(l)\mathbf{1})^T(L-E(l)\mathbf{1})}{m}$$


In [47]:
varL = np.var(L)
varL

1.8055555555555556

In [11]:
varL = (1/len(L))*np.sum((L-EL)**2)
varL

1.8055555555555556
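For comparison, the variance can also be computed with each term weighted by $P(T)$ from the sampling process in the problem statement, using $Var(l) = \sum_i (l_i - E[l])^2 P(t_i) = E[l^2] - E[l]^2$. The sketch below is an assumption-labeled alternative to the equally-likely-terms computation; `var_weighted` and `var_shortcut` are illustrative names.

```python
import numpy as np

TD = np.array([[2, 3, 0, 3, 7],
               [0, 5, 5, 0, 3],
               [5, 0, 7, 3, 3],
               [3, 1, 0, 9, 9],
               [0, 0, 7, 1, 3],
               [6, 9, 4, 6, 0]])
L = np.array([5, 2, 3, 6, 4, 3])
n = TD.shape[1]

# Marginal P(T) under the document-then-term sampling process
PT = ((TD / TD.sum(axis=0)) / n).sum(axis=1)

# Weighted mean E[l] = L^T P(T)
EL_weighted = L @ PT

# Definition: E[(l - E[l])^2] under P(T)
var_weighted = ((L - EL_weighted) ** 2) @ PT

# Shortcut: E[l^2] - E[l]^2
var_shortcut = (L ** 2) @ PT - EL_weighted ** 2

print(var_weighted)
print(np.isclose(var_weighted, var_shortcut))
```

Both expressions are pure vector operations, so they stay within the exercise's restriction against control structures.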