# Basic Introduction to Topological Data Analysis

This notebook studies Topological Data Analysis using the GUDHI library.
http://gudhi.gforge.inria.fr/

The notebook is based on the tutorial at
http://bertrand.michel.perso.math.cnrs.fr/Enseignements/TDA/Tuto-Part1.html

The first section of that tutorial looks at data from the smart phones of three
different users. The data is the acceleration of the smart phone in the x, y and z directions.
(For example, you can use https://phyphox.org/ to record such data. However, this is not needed for this tutorial.)

I recommend reading the first part of the above tutorial before starting.

The aim of this tutorial is to look at persistence diagrams for different simulated data sets and
to compare them with those for the smart phone data.


In [None]:
import numpy as np
import pandas as pd
import pickle
import gudhi as gd
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from IPython.display import Image
from sklearn.neighbors import KernelDensity
%matplotlib inline

## Input the data from disk
Note that the data file is stored in pickle format, which is not particularly portable.
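
If portability matters, one option is to re-export the arrays to NumPy's `.npz` format once the pickle has been loaded. The sketch below uses small random stand-in arrays and a hypothetical file name `data_acc_portable.npz`, because the exact layout of `data_acc` is not documented here.

```python
import os
import tempfile

import numpy as np

# Stand-in arrays (hypothetical shapes): 5 recordings of 200 time steps, 3 axes.
rng = np.random.default_rng(0)
data_A = rng.normal(size=(5, 200, 3))
data_B = rng.normal(size=(5, 200, 3))

# Write both arrays into a single compressed archive.
path = os.path.join(tempfile.mkdtemp(), "data_acc_portable.npz")
np.savez_compressed(path, data_A=data_A, data_B=data_B)

# Read them back by name; plain numeric arrays need no pickle support.
with np.load(path) as archive:
    restored_A = archive["data_A"]
    restored_B = archive["data_B"]

print(restored_A.shape, np.array_equal(restored_A, data_A))
```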

In [None]:
# Open in binary mode; encoding='latin1' is needed when a pickle was written by Python 2.
with open("data_acc", "rb") as f:
    data = pickle.load(f, encoding='latin1')


We will look at person A.
The data is the acceleration (ignoring that no units are given). What do you notice about the values
of the acceleration?

In [None]:
data_A = data[0]
data_B = data[1] 
data_C = data[2]
label = data[3]
print(data_A)

Plot the first sample from person A in three dimensions.

In [None]:
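data_A_sample = data_A[0]
# plt.gca(projection='3d') no longer works in recent matplotlib; create the 3D axes explicitly.
ax = plt.figure().add_subplot(111, projection='3d')
ax.plot(data_A_sample[:, 0], data_A_sample[:, 1], data_A_sample[:, 2])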


## Start of the topological data analysis
The aim is to summarize the data using a persistence diagram.

* There is information about topological data analysis at https://en.wikipedia.org/wiki/Topological_data_analysis
* See also the readable blog post https://towardsdatascience.com/from-tda-to-dl-d06f234f51d

In [None]:
# Only pairs of points closer than max_edge_length are joined by an edge.
Rips_complex_sample = gd.RipsComplex(points=data_A_sample, max_edge_length=0.8)

In [None]:
# Keep simplices up to dimension 3 (tetrahedra).
Rips_simplex_tree_sample = Rips_complex_sample.create_simplex_tree(max_dimension=3)

In [None]:
diag_Rips = Rips_simplex_tree_sample.persistence()
diag_Rips

In [None]:
Rips_simplex_tree_sample.persistence_intervals_in_dimension(0)
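
The dimension-0 intervals track connected components: every point is born at scale 0, and a component dies when the growing scale first joins it to another one. As a rough illustration of that idea (a sketch, not GUDHI's actual algorithm), the finite death values can be reproduced with a union-find pass over the pairwise distances sorted in increasing order. The helper name `h0_deaths` and the toy points are hypothetical.

```python
import itertools
import math


def h0_deaths(points):
    """Return the scales at which connected components merge (dimension-0 deaths)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise distances, shortest first: the order edges enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge joins two components, so one of them dies
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Two tight pairs of points that are far from each other.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.2, 0.0)]
print(h0_deaths(toy))
```

A cloud of n points gives n - 1 finite deaths; the last surviving component never dies and appears as an interval with infinite death. With a `max_edge_length` cutoff, merges above the cutoff never happen, so those components also show up with an infinite death value.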


In [None]:
gd.plot_persistence_diagram(diag_Rips)

## A simpler test case

The sklearn module has some code to create simple objects, which can be used to test various clustering algorithms.
Here it just creates some simulated data. See https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html for documentation.

What to try
 * Compare the persistence diagram against the one for the data above.
 * Create just one cluster.
 * scikit-learn can create other simulated data sets (see https://scikit-learn.org/stable/datasets/index.html#generated-datasets). Can you use another data set?
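
For the last two suggestions, a minimal sketch: `make_blobs` with `centers=1` gives a single cluster, and `make_circles` gives two concentric rings. The sample sizes, noise level and `random_state` values below are arbitrary choices for illustration.

```python
from sklearn.datasets import make_blobs, make_circles

# A single Gaussian cluster in three dimensions.
X_one, y_one = make_blobs(n_samples=200, centers=1, n_features=3, random_state=0)

# Two concentric noisy circles in the plane.
X_rings, y_rings = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

print(X_one.shape, X_rings.shape)
```

Either array can be passed as `points` to `gd.RipsComplex` in place of `X` in the cells below.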

In [None]:
from sklearn.datasets import make_blobs

In [None]:
# generate a 3D data set of three clusters (blobs)
X, y = make_blobs(n_samples=1000, centers=3, n_features=3)

In [None]:
from matplotlib import pyplot
from mpl_toolkits import mplot3d

In [None]:
fig = pyplot.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:,0] ,X[:,1] ,X[:,2] , c="r", marker='o')

## Apply the TDA techniques to this data set

In [None]:
Rips_complex_sampleX = gd.RipsComplex(points = X,max_edge_length=0.8 )

In [None]:
Rips_simplex_tree_sampleX = Rips_complex_sampleX.create_simplex_tree(max_dimension=3) 
diag_RipsX = Rips_simplex_tree_sampleX.persistence()

In [None]:
Rips_simplex_tree_sampleX.persistence_intervals_in_dimension(0)

In [None]:
gd.plot_persistence_diagram(diag_RipsX)

## Torus
You can repeat the computation with points sampled from a torus; see
http://mathworld.wolfram.com/Torus.html
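
The sampling function in the next cell keeps points that satisfy the implicit torus equation (c - sqrt(x^2 + y^2))^2 + z^2 = a^2, with c = 4 and a^2 = 5. As an alternative sketch (not the method used in this notebook), the same surface can be sampled from its angular parametrization. The function name `make_torus_param` is hypothetical, and drawing both angles uniformly does not give a uniform density on the surface.

```python
import numpy as np


def make_torus_param(ndata, c=4.0, a=np.sqrt(5.0), seed=None):
    """Sample ndata points on a torus with centre-circle radius c and tube radius a."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 2.0 * np.pi, ndata)  # angle around the central axis
    v = rng.uniform(0.0, 2.0 * np.pi, ndata)  # angle around the tube
    x = (c + a * np.cos(v)) * np.cos(u)
    y = (c + a * np.cos(v)) * np.sin(u)
    z = a * np.sin(v)
    return np.column_stack([x, y, z])


points = make_torus_param(500, seed=0)

# Every point should satisfy the implicit equation up to rounding error.
r = np.sqrt(points[:, 0] ** 2 + points[:, 1] ** 2)
residual = (4.0 - r) ** 2 + points[:, 2] ** 2 - 5.0
print(np.abs(residual).max())
```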

In [None]:
import random
import math

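def make_torus(ndata):
    """Sample ndata points on the torus (4 - sqrt(x^2 + y^2))^2 + z^2 = 5 by rejection."""
    XX = np.zeros((ndata, 3))
    i = 0

    while i < ndata:
        # Draw (x, y) uniformly from the square [-10, 10] x [-10, 10].
        x = 20 * random.random() - 10
        y = 20 * random.random() - 10
        # z^2 from the torus equation; it is positive only where the torus lies above (x, y).
        # Alternative, larger torus: tmp = 25 - (10 - math.sqrt(x*x + y*y))**2
        tmp = 5 - (4 - math.sqrt(x*x + y*y))**2

        if tmp > 0:
            # Choose the upper or lower half of the surface with equal probability.
            ss = random.random()
            if ss < 0.5:
                z = math.sqrt(tmp)
            else:
                z = -1.0 * math.sqrt(tmp)

            XX[i, 0] = x
            XX[i, 1] = y
            XX[i, 2] = z
            i += 1

    return XX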


In [None]:
from matplotlib import pyplot
fig = pyplot.figure()
ax = fig.add_subplot(111, projection='3d')

TTT = make_torus(2000) 

ax.scatter(TTT[:,0] ,TTT[:,1] ,TTT[:,2] , c="r", marker='o')