Topological Data Analysis for Synthetic Tabular Data

This repository contains the code for the paper "Topology for preserving feature correlation in tabular synthetic data".

Summary: Our review of the literature found that GAN-based models for generating synthetic tabular data outperform other models such as the Variational Autoencoder. However, we identified that tabular synthetic data generated by GANs does not preserve the characteristics of the original data (feature correlation, manifold, temporal correlation). In this paper, we analyze why a GAN (CTGAN) cannot preserve feature correlations in synthetic data.

Abstract: Tabular synthetic data generating models based on the Generative Adversarial Network (GAN) contribute significantly to the performance of deep learning models by providing a sufficient amount of training data. However, the existing GAN-based models cannot preserve feature correlations during the data synthesis process. As a result, the synthetic data becomes unrealistic, which creates a problem for certain applications such as correlation-based feature weighting. In this short theoretical paper, we show a promising approach based on the topology of datasets to preserve correlation in synthetic data. We formulate our hypothesis for preserving correlation in synthetic data and use persistent homology to show that the topological spaces of the original and synthetic data differ in their topological features, especially in the $0^{th}$ and $1^{st}$ homology groups. Finally, we conclude that minimizing the difference in topological features can make the synthetic data space locally homeomorphic to the original data space, and that the synthetic data may preserve the feature correlation under homeomorphism conditions.
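To make the idea of comparing $0^{th}$ homology features concrete, the sketch below computes the finite $H_0$ bars of a Vietoris-Rips filtration for two small point clouds and measures how far apart they are. It is only an illustration and not the code from the notebook: the function names `h0_death_times` and `h0_gap`, the toy datasets, and the choice of a mean absolute difference as the dissimilarity are all assumptions made for this example. It relies on the standard fact that every connected component is born at scale 0 and dies at the length of a minimum-spanning-tree edge. $H_1$ features (loops) are not covered here and would need a dedicated persistent homology library.

```python
import math
import random


def h0_death_times(points):
    """Death times of the finite H0 bars of a Vietoris-Rips filtration.

    Each point starts as its own component (born at scale 0). A component
    dies when an edge merges it into another one, so the death times are
    the minimum-spanning-tree edge lengths (Kruskal with union-find).
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths  # n - 1 values in ascending order


def h0_gap(original, synthetic):
    """Hypothetical dissimilarity: mean absolute difference of sorted deaths."""
    if len(original) != len(synthetic):
        raise ValueError("this sketch assumes equally sized point clouds")
    a, b = h0_death_times(original), h0_death_times(synthetic)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Toy data (assumption): 'original' has strongly correlated features,
# 'synthetic' keeps similar ranges but draws the features independently.
rng = random.Random(0)
original = [(x, 2 * x + rng.gauss(0, 0.05)) for x in (rng.random() for _ in range(60))]
synthetic = [(rng.random(), 2 * rng.random()) for _ in range(60)]

print("H0 gap, original vs. itself:   ", h0_gap(original, original))
print("H0 gap, original vs. synthetic:", h0_gap(original, synthetic))
```

A gap of zero means the two clouds merge their components at exactly the same scales; a larger gap indicates that the synthetic cloud has a different connectivity structure than the original one.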

The paper is available at https://ieeexplore.ieee.org/document/9970505

cybersec-soc-rgu/TopologyGANSyntheticData

Folders and files

Latest commit

History

Persistent_Diagram_SRU.ipynb

Persistent_Diagram_SRU.ipynb

README.md

README.md

Repository files navigation

Topological Data Analysis for Synthetic Tabular Data

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages