### 0-dimensional Persistence Induced Matchings on the MNIST Dataset

First, download `train.csv` and `test.csv` from https://www.kaggle.com/c/digit-recognizer/data and store them in the folder `MNIST_data`.

Then, we apply PCA to the samples to reduce their dimension to 10 coordinates. For this, we follow the notebook https://github.com/ranasingh-gkp/PCA-TSNE-on-MNIST-dataset/blob/master/14_15_16(PCA%2CT_SNE).ipynb

Finally, we compute the 0-dimensional induced matching from a random subset of the points to the full point cloud.
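
To make the dimension-0 part concrete: in a Vietoris-Rips filtration every point is born at radius 0, and a bar dies whenever an edge merges two connected components, so the finite death times are exactly the edge lengths of a minimum spanning tree. The sketch below computes this barcode with a union-find structure, as in Kruskal's algorithm. It is only an illustration, not the method used by `iblofunmatch`; the helper `rips_barcode_dim0` is hypothetical, and it assumes an edge enters the filtration at the Euclidean distance between its endpoints (some libraries use half that distance). The optional `max_rad` truncates the filtration, so components that have not merged by then are reported as infinite bars.

In [None]:
```python
import numpy as np


def rips_barcode_dim0(points, max_rad=None):
    """0-dimensional barcode of a Vietoris-Rips filtration (illustrative sketch).

    Assumption: an edge enters at the Euclidean distance between its endpoints.
    All points are born at 0, so each merge of two components kills one bar.
    Edges longer than max_rad (if given) never enter the filtration.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # All pairwise distances for i < j (O(n^2) memory: toy inputs only)
    i_idx, j_idx = np.triu_indices(n, k=1)
    dists = np.linalg.norm(points[i_idx] - points[j_idx], axis=1)
    order = np.argsort(dists, kind="stable")

    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    deaths = []
    for e in order:
        d = dists[e]
        if max_rad is not None and d > max_rad:
            break  # edges are sorted, so no later edge enters either
        ra, rb = find(i_idx[e]), find(j_idx[e])
        if ra != rb:
            parent[rb] = ra      # merge the two components
            deaths.append(float(d))  # one bar dies at radius d
    # Components that never merge within the filtration live forever
    n_inf = n - len(deaths)
    return [(0.0, d) for d in deaths] + [(0.0, np.inf)] * n_inf
```

For three points on a line at 0, 1 and 3, the finite bars die at 1 and 2, and one bar is infinite. Because it stores all pairwise differences, this sketch is meant for small toy inputs rather than a full MNIST class.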

Now, we load the necessary modules.

In [None]:
import numpy as np
from numpy.random import default_rng
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn import decomposition

import iblofunmatch.inter as ibfm
output_dir = "output"

Then, we load the MNIST dataset.

In [None]:
# Kaggle's train.csv has a "label" column followed by 784 pixel columns
X_raw_l = pd.read_csv('MNIST_data/train.csv')
y = X_raw_l["label"]
X_raw = X_raw_l.drop("label", axis=1)

There are 10 classes.

In [None]:
np.unique(y)

Each class contains roughly four thousand points.

In [None]:
class_idx = 8
np.sum(y==class_idx)

We select class 8 (set through `class_idx` above), on which we will compute the matching.

In [None]:
# Select a single class and standardize each pixel column
X_raw_class_idx = X_raw[y==class_idx]
X_scal = StandardScaler().fit_transform(X_raw_class_idx)
# Alternatively, select the whole dataset and standardize it:
# X_scal = StandardScaler().fit_transform(X_raw)
X_scal.shape
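
For reference, the standardization step computes per-column z-scores. The sketch below (hypothetical helper `standardize`) reproduces what `StandardScaler` does under its default settings, assuming `with_mean=True` and `with_std=True`: subtract each column's mean and divide by its population standard deviation. Constant columns, such as border pixels that are typically 0 in every image, get a scale of 1 and therefore become 0 rather than NaN, which matches how scikit-learn handles zero variance.

In [None]:
```python
import numpy as np


def standardize(X):
    """Column-wise z-scores, mimicking StandardScaler's default behaviour."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)    # population std (ddof=0), as StandardScaler uses
    std[std == 0.0] = 1.0  # constant columns: avoid division by zero
    return (X - mean) / std


# Toy example: column 0 and column 2 are constant, column 1 varies
Z = standardize([[0, 1, 5], [0, 3, 5], [0, 5, 5]])
```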

We use PCA to reduce the dimension from 784 pixel features down to 10 principal components.

In [None]:
pca = decomposition.PCA(n_components=10)
X = pca.fit_transform(X_scal)
X.shape
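
PCA can be sketched directly with a singular value decomposition: center the data, take the top right singular vectors as principal axes, and project onto them. The sketch below (hypothetical helper `pca_project`) is meant to show what `decomposition.PCA(n_components=10)` computes. scikit-learn may flip the sign of individual components and may use an approximate (randomized) solver for inputs of this size, so coordinates should agree only up to sign per column and numerical error. The helper also returns the variance explained by each kept component, one way to judge whether 10 components suffice.

In [None]:
```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA works on centered data
    # Rows of Vt are the principal axes, sorted by decreasing singular value
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Sample variance captured by each kept component
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    return Xc @ components.T, explained_variance


# Synthetic data whose columns have decreasing spread
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])
Y, var = pca_project(X_toy, 2)
```

Applied to this notebook, `pca_project(X_scal, 10)` would give a 10-column array comparable to `X`.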

Now, we randomly sample, without replacement, the indices of half of the points in `X`.

In [None]:
rng = np.random.default_rng(seed=20)

In [None]:
# Draw half of the point indices uniformly at random, without replacement
indices_subset = rng.choice(X.shape[0], X.shape[0] // 2, replace=False)

In [None]:
len(indices_subset)

Next, we compute the induced matching in dimension $0$. Note that we bound the maximum filtration radius `max_rad` to $10$.

In [None]:
output_data_ibfm = ibfm.get_IBloFunMatch_subset(None, X, indices_subset, output_dir, num_it=4, points=True, max_rad=10, max_dim=1)
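
The inclusion of the subset into `X` induces, at every radius $r$, a map from the connected components of the Rips graph on the subset to those on the whole point cloud; this is well defined because every edge between subset points is also an edge of `X`. The dimension-0 induced matching summarizes how these maps behave across all radii. The sketch below only illustrates the map at a single radius, assuming edges of length at most $r$ are present; the helpers `component_labels` and `induced_component_map` are hypothetical and are not part of `iblofunmatch`.

In [None]:
```python
import numpy as np


def component_labels(points, r):
    """Label connected components of the Rips graph at radius r (edges of length <= r)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    parent = list(range(n))

    def find(a):
        # Union-find root lookup with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in range(n):
        d = np.linalg.norm(points[i + 1:] - points[i], axis=1)
        for j in np.nonzero(d <= r)[0]:
            ra, rb = find(i), find(i + 1 + j)
            if ra != rb:
                parent[rb] = ra
    # Relabel roots as 0, 1, 2, ... in order of first appearance
    relabel = {}
    return np.array([relabel.setdefault(find(a), len(relabel)) for a in range(n)])


def induced_component_map(X, subset, r):
    """Send each component of the subset's Rips graph to the component of X containing it."""
    labels_S = component_labels(X[subset], r)
    labels_X = component_labels(X, r)
    # Points in one subset component share an X component, so this is consistent
    return {int(s): int(labels_X[x]) for s, x in zip(labels_S, subset)}


X_toy = np.array([[0.0], [1.0], [2.5], [10.0]])
mapping = induced_component_map(X_toy, np.array([0, 2, 3]), r=1.5)
```

In this toy case the subset points 0.0 and 2.5 lie in different components at $r = 1.5$, but both map to the same component of the full cloud, because the omitted point 1.0 bridges them.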

Now, we can plot the density matrix.

In [None]:
fig, ax = plt.subplots(figsize=(6,5))
ibfm.plot_density_matrix(0, output_data_ibfm, nbins=14)