# Digital Filter Design using Deep Learning

We can treat a digital filter as a black box whose parameters we need to find. In this notebook, we will use deep learning to find the parameters of a digital filter.

## Linear Algebra Review
First, I will review the linear algebra concepts used in this notebook.
Linear algebra uses different notations for vectors and matrices.
### Notations
In this notebook, I will use the following notations:

* $x$: an element of a vector (a scalar)
* $\mathbf{x}$: a vector
* $\mathbf{X}$: a matrix
* $\mathcal{X}$: a set which contains the elements of a vector or matrix and is a subset of $\mathbb{R}^n$ or $\mathbb{C}^n$

### Product of a vector and a matrix
If the vector $\mathbf{x}$ has $m_1$ elements and the matrix $\mathbf{W}$ has $m_1$ rows and $m_2$ columns, we write $\mathbf{x} \in \mathbb{R}^{m_1}$ and $\mathbf{W} \in \mathbb{R}^{m_1 \times m_2}$. Treating $\mathbf{x}$ as a row vector, the product of this vector and this matrix is defined as:
$$ \mathbf{x_{1\times m_1}} \mathbf{W_{m_1 \times m_2}} = \mathbf{y_{1\times m_2}}$$
where $\mathbf{y} \in \mathbb{R}^{m_2}$ and each of its elements is $y_j = \sum_{i=1}^{m_1} x_i W_{ij}$.
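A minimal NumPy sketch of this product (NumPy and the small example values are assumptions for illustration). A 1D NumPy array multiplied by a matrix with `@` behaves like the $1 \times m_1$ row vector above:

```python
import numpy as np

m1, m2 = 3, 2
x = np.array([1.0, 2.0, 3.0])                         # x in R^{m1}, used as a 1 x m1 row vector
W = np.arange(m1 * m2, dtype=float).reshape(m1, m2)   # W in R^{m1 x m2}: [[0, 1], [2, 3], [4, 5]]

y = x @ W                                             # y in R^{m2}
print(y.shape)  # (2,)
print(y)        # [16. 22.]

# Same result from the element-wise definition y_j = sum_i x_i * W_ij
y_loop = np.array([sum(x[i] * W[i, j] for i in range(m1)) for j in range(m2)])
print(np.allclose(y, y_loop))  # True
```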

### Matrix as a set of vectors
If we have $n$ vectors $\mathbf{x}_i \in \mathbb{R}^{m_1}$, we can define a matrix $\mathbf{X} \in \mathbb{R}^{n \times m_1}$ as below:
$$ \mathbf{X} = \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_n \end{bmatrix}$$

The transpose of a matrix $\mathbf{X}$ is defined as below:
$$ \mathbf{X}^T = \begin{bmatrix} \mathbf{x}_1^T & \mathbf{x}_2^T & \cdots & \mathbf{x}_n^T \end{bmatrix}$$
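As a small sketch (assuming NumPy and three made-up vectors), stacking the vectors as rows gives $\mathbf{X}$, and transposing turns those rows into columns:

```python
import numpy as np

# n = 3 hypothetical input vectors, each with m1 = 4 elements
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([5.0, 6.0, 7.0, 8.0])
x3 = np.array([9.0, 10.0, 11.0, 12.0])

X = np.vstack([x1, x2, x3])  # each vector becomes one row: shape (n, m1)
print(X.shape)    # (3, 4)
print(X.T.shape)  # (4, 3)
print(X.T[:, 1])  # [5. 6. 7. 8.]: x2 is now the second column
```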

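If $\mathbf{X}$ is square ($n = m_1$) and invertible, its inverse $\mathbf{X}^{-1}$ is the matrix that satisfies:
$$ \mathbf{X}^{-1} \mathbf{X} = \mathbf{X} \mathbf{X}^{-1} = \mathbf{I}$$
where $\mathbf{I}$ is the identity matrix. A non-square matrix has no inverse.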
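A short NumPy check of this definition, using an arbitrary $2 \times 2$ matrix chosen for illustration. `np.linalg.inv` refuses a non-square input by raising `np.linalg.LinAlgError`:

```python
import numpy as np

# A square, invertible 2 x 2 matrix (hypothetical values, determinant = 10)
X = np.array([[4.0, 7.0],
              [2.0, 6.0]])

X_inv = np.linalg.inv(X)
print(X_inv)                              # approximately [[0.6, -0.7], [-0.2, 0.4]]
print(np.allclose(X_inv @ X, np.eye(2)))  # True
print(np.allclose(X @ X_inv, np.eye(2)))  # True

# A non-square matrix has no inverse
non_square_failed = False
try:
    np.linalg.inv(np.ones((2, 3)))
except np.linalg.LinAlgError:
    non_square_failed = True
print(non_square_failed)  # True
```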

We call $\mathbf{X}$ a 2D tensor, and $\mathbf{x}$ a 1D tensor.

### Series of input signals
We can consider each vector as an input signal to a digital filter. The output of the filter is the product of the input signal and the filter parameters. If we arrange the filter parameters as a matrix, the output of the filter is a vector.
The filter should behave correctly for all input signals, so we should tune the weights of the filter to produce the correct output for every input signal.

We assume that there are $n$ input signals and that each input signal has $m_1$ elements, so the input matrix is $\mathbf{X} \in \mathbb{R}^{n \times m_1}$. If we feed all input signals to the system at once, the output matrix is calculated as:
$$ \mathbf{Y_{n \times m_2}} = \mathbf{X_{n \times m_1}} . \mathbf{W_{m_1 \times m_2}}$$

In our case, the output matrix should have the same size as the input matrix, so $m_2 = m_1$ and the output matrix is $\mathbf{Y_{n \times m_1}}$:
$$ \mathbf{Y_{n \times m_1}} = \mathbf{X_{n \times m_1}} . \mathbf{W_{m_1 \times m_1}}$$
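The batched product can be sketched in NumPy as below (random data and sizes are assumptions). It shows that multiplying the whole input matrix by a square $\mathbf{W}$ gives the same result as filtering each signal (row) separately:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1 = 5, 4
X = rng.standard_normal((n, m1))   # n input signals, one per row
W = rng.standard_normal((m1, m1))  # square weight matrix, so m2 = m1

Y = X @ W                          # all signals processed at once
print(Y.shape)  # (5, 4): same size as X

# Same result as applying W to each signal individually
Y_rows = np.vstack([X[i] @ W for i in range(n)])
print(np.allclose(Y, Y_rows))  # True
```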

### Non-linearities
A single matrix multiplication can only represent a linear relation between the input and the output. To model a (possibly non-linear) digital filter, we add a non-linearity to the system:
$$ \mathbf{Y_{n \times m_1}} = \sigma(\mathbf{X_{n \times m_1}} . \mathbf{W_{m_1 \times m_1}})$$
where $\sigma$ is a non-linear function applied element-wise.
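A minimal sketch of this single-layer model, assuming NumPy and the logistic sigmoid as $\sigma$ (the text does not fix a particular non-linearity, so this choice is an assumption):

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid, assumed here as the non-linearity sigma
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, m1 = 5, 4
X = rng.standard_normal((n, m1))
W = rng.standard_normal((m1, m1))

Y = sigmoid(X @ W)  # sigma is applied to every element of X W
print(Y.shape)      # (5, 4)
print(bool(np.all((Y > 0) & (Y < 1))))  # True: sigmoid outputs lie in (0, 1)
```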

This equation is still not enough to model a non-linear relation between the input and the output of a digital filter.
To see why, assume $\sigma$ is invertible and define $\mathbf{Y_{new}} = \sigma^{-1}(\mathbf{Y})$. Then $\mathbf{Y_{new}} = \mathbf{X} . \mathbf{W}$.
The relation between $\mathbf{X}$ and $\mathbf{Y_{new}}$ is therefore linear: the non-linearity only transforms the output of a linear map, so we have not learned a genuinely non-linear relation between $\mathbf{X}$ and $\mathbf{Y}$.
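The argument can be checked numerically. This sketch assumes NumPy, the sigmoid as $\sigma$, and its inverse, the logit $\sigma^{-1}(y) = \log(y / (1 - y))$. Because $\mathbf{Y_{new}}$ is linear in $\mathbf{X}$, ordinary least squares recovers $\mathbf{W}$ from $\mathbf{X}$ and $\mathbf{Y_{new}}$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(y):
    # Inverse of the sigmoid: sigma^{-1}(y) = log(y / (1 - y))
    return np.log(y / (1.0 - y))

rng = np.random.default_rng(2)
n, m1 = 20, 4
X = rng.standard_normal((n, m1))
W = rng.standard_normal((m1, m1))

Y = sigmoid(X @ W)
Y_new = logit(Y)
print(np.allclose(Y_new, X @ W))  # True: Y_new is a linear function of X

# Since the relation is linear, least squares recovers W
W_hat, *_ = np.linalg.lstsq(X, Y_new, rcond=None)
print(np.allclose(W_hat, W))  # True
```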

In the equation above, the input signals are related to the system weights by a single linear matrix multiplication, and the only non-linearity is applied at the output. To learn a non-linear relation, we insert a non-linearity between two sets of system weights:
$$ \mathbf{Y_{n \times m_1}} = \sigma(\sigma(\mathbf{X_{n \times m_1}} . \mathbf{W_1}_{m_1 \times m_2}) . \mathbf{W_2}_{m_2 \times m_1})$$
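where $\mathbf{W_1} \in \mathbb{R}^{m_1 \times m_2}$ and $\mathbf{W_2} \in \mathbb{R}^{m_2 \times m_1}$ are the system weights, $m_2$ is now the number of hidden units, and $\sigma$ is a non-linear function.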

This is a simple neural network with one hidden layer. We can add more hidden layers to obtain a more expressive system. We then tune the system weights so that the system produces the correct output for all input signals.
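A forward pass of this one-hidden-layer network can be sketched as follows, assuming NumPy, the sigmoid as $\sigma$, and a hypothetical hidden size $m_2 = 8$. Unlike the single-layer case, undoing the outer sigmoid no longer leaves a linear function of $\mathbf{X}$, so a least-squares fit leaves a non-zero residual:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid, assumed here as the non-linearity sigma
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2):
    # One hidden layer: Y = sigma(sigma(X W1) W2)
    H = sigmoid(X @ W1)     # hidden activations, shape (n, m2)
    return sigmoid(H @ W2)  # outputs, shape (n, m1)

rng = np.random.default_rng(3)
n, m1, m2 = 50, 4, 8        # m2 = number of hidden units (hypothetical choice)
X = rng.standard_normal((n, m1))
W1 = rng.standard_normal((m1, m2))
W2 = rng.standard_normal((m2, m1))

Y = forward(X, W1, W2)
print(Y.shape)  # (50, 4): same size as the input matrix

# Invert the outer sigmoid and try to explain the result linearly from X
Y_new = np.log(Y / (1.0 - Y))
_, residuals, _, _ = np.linalg.lstsq(X, Y_new, rcond=None)
print(residuals.sum() > 0)  # True: no exact linear relation remains
```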


## Loss Function
In practice, we cannot tune the system weights so that the output is exactly correct for every input signal. Instead, we look for the system weights that minimize the error over all input signals. To do that, we define a cost function that measures the error of the system over all input signals. A first attempt is:
$$ J = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{Y_i} - \mathbf{\hat{Y}_i})$$
where $\mathbf{Y_i}$ is the output of the system for the $i$-th input signal and $\mathbf{\hat{Y}_i}$ is the correct output that we expect from the system for the $i$-th input signal.
This cost function cannot be used because negative errors can cancel positive errors. Taking the absolute value of the error avoids this:
$$ J = \frac{1}{n} \sum_{i=1}^{n} |\mathbf{Y_i} - \mathbf{\hat{Y}_i}|$$
This cost function is called the mean absolute error (MAE).
The MAE is not differentiable where $\mathbf{Y_i} = \mathbf{\hat{Y}_i}$, which is inconvenient for gradient-based training. Squaring the error gives a differentiable cost function:
$$ J = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{Y_i} - \mathbf{\hat{Y}_i})^2$$
This cost function is called the mean squared error (MSE). When the outputs are vectors, the absolute value and the square are applied element-wise and the results are summed (or, more commonly, averaged) over the elements, so that $J$ is a scalar.
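A small sketch of the three cost functions, assuming NumPy and averaging over all elements. The made-up outputs below are all wrong, yet the naive mean error is zero because the errors cancel; MAE and MSE both detect the errors, and MSE weights the larger errors more heavily:

```python
import numpy as np

def mean_error(Y, Y_hat):
    # Naive cost: positive and negative errors can cancel out
    return np.mean(Y - Y_hat)

def mae(Y, Y_hat):
    # Mean absolute error
    return np.mean(np.abs(Y - Y_hat))

def mse(Y, Y_hat):
    # Mean squared error
    return np.mean((Y - Y_hat) ** 2)

Y_hat = np.array([1.0, 2.0, 3.0, 4.0])  # expected outputs (hypothetical)
Y = np.array([3.0, 0.0, 4.0, 3.0])      # system outputs, errors [2, -2, 1, -1]

print(mean_error(Y, Y_hat))  # 0.0, although every output is wrong
print(mae(Y, Y_hat))         # 1.5
print(mse(Y, Y_hat))         # 2.5
```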

There are many other cost functions that can be used to measure the error of the system.