cybersec-soc-rgu/TopologyGANSyntheticData

Topological Data Analysis for Synthetic Tabular Data

This repository contains the code for the paper "Topology for preserving feature correlation in tabular synthetic data".

Summary: A review of the literature shows that GAN-based models for generating synthetic tabular data outperform other models such as the Variational Autoencoder. However, we found that tabular synthetic data generated by a GAN does not preserve characteristics of the original data (feature correlation, manifold, temporal correlation). In this paper, we analyze why a GAN (CTGAN) cannot preserve feature correlations in synthetic data.
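To make "preserving feature correlation" concrete, the sketch below compares the Pearson correlation matrices of a real table and a synthetic table and reports the largest absolute difference. It is an illustration only, not code from the repository's notebooks: the function names (`pearson`, `correlation_matrix`, `max_correlation_gap`) and the toy data are assumptions, and the "synthetic" table is simply the real table with one column shuffled, which keeps each marginal distribution but destroys the dependence between columns.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a table given as a list of columns."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def max_correlation_gap(real_columns, synth_columns):
    """Largest absolute entry-wise difference between the two correlation matrices."""
    real_corr = correlation_matrix(real_columns)
    synth_corr = correlation_matrix(synth_columns)
    k = len(real_corr)
    return max(
        abs(real_corr[i][j] - synth_corr[i][j]) for i in range(k) for j in range(k)
    )


# Toy "real" table: the second feature is almost a linear function of the first.
rng = random.Random(0)
feature_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
feature_b = [2.0 * a + rng.gauss(0.0, 0.1) for a in feature_a]
real_table = [feature_a, feature_b]

# Toy "synthetic" table (an assumption, not CTGAN output): same marginals,
# but the second column is shuffled so the dependence is lost.
shuffled_b = feature_b[:]
rng.shuffle(shuffled_b)
synth_table = [feature_a, shuffled_b]

gap = max_correlation_gap(real_table, synth_table)
print(f"max correlation gap: {gap:.3f}")
```

A gap near zero would mean the synthetic table reproduces the pairwise correlations of the original. A large gap, as in this toy case, is the kind of failure the paper attributes to GAN-generated data.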

Abstract: Tabular synthetic data generating models based on Generative Adversarial Network (GAN) show significant contributions to enhancing the performance of deep learning models by providing a sufficient amount of training data. However, the existing GAN-based models cannot preserve the feature correlations in synthetic data during the data synthesis process. Therefore, the synthetic data becomes unrealistic, which creates problems for applications such as correlation-based feature weighting. In this short theoretical paper, we present a promising approach based on the topology of datasets to preserve correlation in synthetic data. We formulate our hypothesis for preserving correlation in synthetic data and use persistent homology to show that the topological spaces of the original and synthetic data differ in their topological features, especially in the $0^{th}$ and $1^{st}$ homology groups. Finally, we conclude that minimizing the difference in topological features can make the synthetic data space locally homeomorphic to the original data space, and that the synthetic data may preserve the feature correlation under homeomorphism conditions.
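As a rough illustration of the persistent homology comparison described in the abstract, the sketch below computes only the $0^{th}$ homology part for a small point cloud. It relies on a standard fact: in a Vietoris-Rips filtration every connected component is born at scale 0, and the finite death times equal the edge lengths of a minimum spanning tree. The code is a simplified stand-in, not the authors' pipeline. The names `h0_death_times` and `h0_gap` are hypothetical, the $1^{st}$ homology group is not computed (that would typically need a dedicated TDA library), and the gap measure is a crude comparison of sorted death times that assumes both clouds have the same number of points.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Finite H0 death times of a Vietoris-Rips filtration.

    Uses Kruskal's algorithm with union-find: each edge that merges two
    components kills one H0 class at that edge's length. Deaths are
    reported at the full pairwise distance (some libraries use half).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component root, compressing the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    # n - 1 finite deaths in ascending order; one component never dies.
    return deaths


def h0_gap(real_points, synth_points):
    """Largest difference between sorted H0 death times (equal sample sizes assumed)."""
    real_deaths = h0_death_times(real_points)
    synth_deaths = h0_death_times(synth_points)
    return max(abs(r - s) for r, s in zip(real_deaths, synth_deaths))


# Toy "real" cloud: corners of a unit square (one tight cluster).
real_cloud = [(0, 0), (1, 0), (0, 1), (1, 1)]
# Toy "synthetic" cloud: two pairs of points far apart (two clusters).
synth_cloud = [(0, 0), (1, 0), (10, 0), (11, 0)]

print(h0_death_times(real_cloud))
print(h0_death_times(synth_cloud))
print(h0_gap(real_cloud, synth_cloud))
```

In the synthetic cloud one component survives until scale 9, while in the real cloud every component has merged by scale 1. That late merge shows the two clouds have different cluster structure, which is the kind of dissimilarity in $0^{th}$ homology that the abstract refers to.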

The paper is available at: https://ieeexplore.ieee.org/document/9970505


Languages

  • Jupyter Notebook 100.0%