# Topology of Deep Neural Networks

## 1. Introduction

We analyze the topology of deep neural nets in binary classification tasks. This introduction covers:  
1) How is a simple neural net built?  
2) Binary classification  
3) Topology of neural nets  
4) Analysis of Topology

### 1.1. Neural Nets

A neural net is a composition of functions of the form $f(x) = K(Wx + b)$,  
where $K$ is a non-linear function, $W$ is a weight matrix, and $b$ is a bias vector.  
In the picture below, each hidden layer corresponds to one of these functions.

![alt text](images/intro_2.png "Simple neural net")
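
As a minimal sketch of this composition, the following NumPy snippet builds two layers of the form $K(Wx + b)$ and chains them. The weights, the layer sizes (2 → 3 → 1), and the helper names `layer` and `compose` are hypothetical and chosen only for illustration.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU non-linearity."""
    return np.maximum(z, 0.0)

def layer(W, b, K):
    """Return the map x -> K(W x + b)."""
    def f(x):
        return K(W @ x + b)
    return f

def compose(*fs):
    """Compose functions left to right: compose(f1, f2)(x) == f2(f1(x))."""
    def composed(x):
        for f in fs:
            x = f(x)
        return x
    return composed

# Hypothetical weights for a 2 -> 3 -> 1 network.
W1 = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]])
b1 = np.array([0.0, -1.0, 0.5])
W2 = np.array([[1.0, 1.0, -1.0]])
b2 = np.array([0.1])

net = compose(layer(W1, b1, relu), layer(W2, b2, np.tanh))
y = net(np.array([2.0, 1.0]))  # equals tanh(1.1) for these weights
```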

### 1.2. Binary Classification

In classification tasks, we try to determine which class from a given set an input (e.g. a picture) belongs to.  
In binary classification, we have only two classes.

### 1.3. Topology of neural nets

The figure below gives an intuition of what is meant by the topology of a neural net.

![alt text](images/intro_1.png "Title")

- We are given two disjoint manifolds (green and red).
- Each corresponds to a certain class (e.g. red are all cat images and green are all dog images in our training set).
- Each step corresponds to one layer in a well-trained neural net.
- The Betti numbers change in the following way:  
$\beta~(red): (1,2,0) \rightarrow (1,2,0) \rightarrow (2,1,0) \rightarrow (2,0,0) \rightarrow (1,0,0) \rightarrow (1,0,0)$  
$\beta~(green): (2,2,0) \rightarrow (2,2,0) \rightarrow (2,1,0) \rightarrow (2,0,0) \rightarrow (2,0,0) \rightarrow (1,0,0)$  
- In the end, we get two disjoint balls.

### 1.4. Analysis of Topology

## 2. Methodology

- We seek to classify two disjoint manifolds $M_a, M_b \subset \mathbb{R}^d$.  
- Sample a large but finite set of points $T\subset M_a \cup M_b$ uniformly and densely. Write $T_i = T \cap M_i$, $i \in \{a,b\}$.  
- A feedforward NN is given by the composition $\nu = s \circ f_l \circ f_{l-1} \circ \dots \circ f_2 \circ f_1$, where the $f_i$ are the layers of the NN and $s$ is the score function.  
- Write $\nu_j = f_j \circ \dots \circ f_2 \circ f_1$ to denote the first $j$ layers of the NN.  
- Train the network until it correctly classifies all training examples and almost all test examples. We call such a network "well-trained".  
- The experiments are intended to show the topologies of $\nu_j(M_a)$ and $\nu_j(M_b)$ as $j$ runs from 1 to $l$, for different manifolds and network architectures.  
- Perform experiments on both simulated datasets, where we know the topology in advance, and real-world data. Real-world datasets are more difficult to handle for various reasons; the most important one for us is that they generally have extremely complex topologies.  
- Thus, the experiments on simulated datasets are very extensive, and we then use some real-world datasets to validate our findings.
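
The notation above can be sketched in code as follows. This is only an illustration under loud assumptions: the two manifolds are hypothetical concentric circles in $\mathbb{R}^2$, the layers are random and untrained (the experiments themselves require a well-trained network), and `sample_circle` and `prefix_outputs` are made-up helper names. The snippet shows how the point clouds $\nu_j(T_a)$ and $\nu_j(T_b)$ could be collected for $j = 1, \dots, l$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_circle(n, radius):
    """Sample n points uniformly from a circle of the given radius in R^2."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return radius * np.column_stack([np.cos(theta), np.sin(theta)])

# Hypothetical toy manifolds M_a, M_b: two disjoint concentric circles.
T_a = sample_circle(200, radius=1.0)
T_b = sample_circle(200, radius=2.0)

# Hypothetical, UNTRAINED layers f_1..f_l with ReLU activation (l = 3).
widths = [2, 6, 6, 6]
layers = [(rng.normal(size=(m, n)), rng.normal(size=m))
          for n, m in zip(widths[:-1], widths[1:])]

def prefix_outputs(X, layers):
    """Return [nu_1(X), ..., nu_l(X)] for a batch X of shape (N, d)."""
    outputs = []
    for W, b in layers:
        X = np.maximum(X @ W.T + b, 0.0)  # f_j(x) = ReLU(W x + b), applied row-wise
        outputs.append(X)
    return outputs

nu_a = prefix_outputs(T_a, layers)  # nu_a[j - 1] holds nu_j(T_a)
nu_b = prefix_outputs(T_b, layers)  # nu_b[j - 1] holds nu_j(T_b)
```

Each entry of `nu_a` and `nu_b` is a point cloud whose homology would then be computed layer by layer.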

### (i) Generate the simulated datasets

Insert a few nice pictures with Betti numbers here. There is not much to say about the process.

### (ii) Training neural networks

We want to examine the topology-changing effects of:
- different activations (ReLU, leaky ReLU, tanh)
- different network depths (4 to 10 layers)
- different network widths (6 to 50 neurons per layer)
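
One way to enumerate such an experiment grid is sketched below. The specific width values are an assumption (the text only gives the range 6 to 50), and the dictionary keys are hypothetical names.

```python
from itertools import product

activations = ["relu", "leaky_relu", "tanh"]
depths = range(4, 11)        # 4 to 10 layers
widths = [6, 10, 25, 50]     # hypothetical sample from the 6-to-50 range

# One configuration per (activation, depth, width) combination.
configs = [
    {"activation": a, "depth": d, "width": w}
    for a, d, w in product(activations, depths, widths)
]
```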

### (iii) Computing Homology

a) Building the Vietoris-Rips complex.

We don't simply use the Euclidean distance to build the VR complex. Instead, we first build the k-nearest-neighbour graph and use the geodesic distance on it, denoted by $\delta_k$. For each pair of points $x_i, x_j$ in the point cloud $X$, the distance $\delta_k(x_i, x_j)$ is defined as the minimal number of edges between them in the k-nearest-neighbour graph. This normalizes distances across the layers of a neural network while preserving the connectivity of nearest neighbours. This is desirable because the layers change the geometry quite drastically; this metric is rather robust to geometric changes but still reveals the topological ones that we are looking for.
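
A minimal sketch of $\delta_k$ is given below, using NumPy for the Euclidean distances and a breadth-first search for the hop counts. Two assumptions are made that the text does not specify: the k-nearest-neighbour graph is symmetrized (an edge exists if either point lists the other as a neighbour), and unreachable pairs get distance infinity. The function names and the example points are hypothetical.

```python
from collections import deque
import numpy as np

def knn_graph(X, k):
    """Adjacency sets of the symmetrized k-nearest-neighbour graph of X."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbour
    nearest = np.argsort(D, axis=1)[:, :k]   # indices of the k closest points
    adj = [set() for _ in range(len(X))]
    for i, row in enumerate(nearest):
        for j in row:
            adj[i].add(int(j))
            adj[int(j)].add(i)               # assumption: undirected edges
    return adj

def graph_geodesic(adj):
    """Matrix of delta_k: minimal number of edges, inf if not connected."""
    n = len(adj)
    delta = np.full((n, n), np.inf)
    for s in range(n):
        delta[s, s] = 0
        queue = deque([s])
        while queue:                         # breadth-first search from s
            u = queue.popleft()
            for v in adj[u]:
                if np.isinf(delta[s, v]):
                    delta[s, v] = delta[s, u] + 1
                    queue.append(v)
    return delta

# Four points on a line with unequal spacing, so nearest neighbours are unique.
X = np.array([[0.0], [1.0], [2.1], [3.3]])
delta = graph_geodesic(knn_graph(X, k=1))
```

For these points the graph is a path, so the two end points are three edges apart even though their Euclidean distance is 3.3.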

Now, our Vietoris-Rips complex depends on two parameters that we need to set: $k$ for the metric and the usual scale $\epsilon$ of the VR complex.
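
To make the role of $\epsilon$ concrete, here is a brute-force sketch that lists the simplices of a Vietoris-Rips complex from a matrix of pairwise distances. The function name `vietoris_rips`, the cap at dimension 2, and the example distance matrix (hop distances on a path with four vertices) are assumptions for illustration; for real point clouds a dedicated persistent homology library would be needed, since this enumeration scales poorly.

```python
from itertools import combinations

def vietoris_rips(delta, eps, max_dim=2):
    """Simplices (as vertex tuples) of the VR complex up to max_dim.

    A set of vertices spans a simplex iff all pairwise distances are <= eps.
    """
    n = len(delta)
    simplices = [(i,) for i in range(n)]
    for dim in range(1, max_dim + 1):
        for verts in combinations(range(n), dim + 1):
            if all(delta[i][j] <= eps for i, j in combinations(verts, 2)):
                simplices.append(verts)
    return simplices

# Hypothetical hop distances on the path graph 0 - 1 - 2 - 3.
delta = [
    [0, 1, 2, 3],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [3, 2, 1, 0],
]

complex_eps1 = vietoris_rips(delta, eps=1)  # vertices and the 3 path edges
complex_eps2 = vietoris_rips(delta, eps=2)  # additionally fills in 2 triangles
```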

For simulated data, we set the parameters such that the corresponding VR complex has the same homology as the simulated dataset, and then keep these parameters fixed throughout the layers of the neural networks. This means that we do not have to compute persistent homology in every layer of every network, which saves a lot of time that we can invest in more experiments.

First, set $\epsilon = 1$ and find a $k$ such that the first Betti number is correct. Then fix the found value $k_\star$ and tweak $\epsilon$ until all Betti numbers are correct.
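
The first step of this search could look like the sketch below. It rests on two assumptions: "first Betti number" is read as $\beta_0$ (the number of connected components), and at $\epsilon = 1$ the 1-skeleton of the VR complex under the graph geodesic distance is just the k-nearest-neighbour graph, so $\beta_0$ can be counted with a union-find structure. The helper names and the toy data (two concentric circles) are hypothetical. Tuning $\epsilon$ for the higher Betti numbers requires a homology computation and is not shown.

```python
import numpy as np

def knn_edges(X, k):
    """Edges (i, j) where j is one of the k nearest neighbours of i."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)              # exclude self-neighbours
    nearest = np.argsort(D, axis=1)[:, :k]
    return [(i, int(j)) for i, row in enumerate(nearest) for j in row]

def count_components(n, edges):
    """Number of connected components (beta_0) via union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def find_k(X, target_beta0, k_max=50):
    """Smallest k whose k-NN graph has target_beta0 components, else None."""
    for k in range(1, min(k_max, len(X) - 1) + 1):
        if count_components(len(X), knn_edges(X, k)) == target_beta0:
            return k
    return None

# Hypothetical toy data: two concentric circles, so the target beta_0 is 2.
theta = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
X = np.vstack([circle, 3.0 * circle])
k_star = find_k(X, target_beta0=2)
```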