# Synthetic Data Generation with a GAN

This notebook implements a Generative Adversarial Network (GAN) to generate synthetic application usage data.

## Project Structure
- The main functions are in `data_gen_utils.py`  
- The original data is located in `../data/screentime_analysis.csv` (relative to this notebook)

In [4]:
# Import necessary libraries
import numpy as np
import pandas as pd

# Import our custom functions
from data_gen_utils import (
    build_generator, build_discriminator, build_gan,
    train_gan, preprocess_data, generate_synthetic_data
)


## 1. Data Loading and Preparation

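We load the application usage data and prepare it for GAN training. `preprocess_data` returns the normalized numeric features together with the fitted scaler, which is reused later to map generated samples back to the original units.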

In [5]:
# Load data
data = pd.read_csv('../data/screentime_analysis.csv')
print("Raw data preview:")
print(data.head())

# Data preprocessing
normalized_data, scaler = preprocess_data(data)
print("\nShape of normalized data:", normalized_data.shape)

Raw data preview:
         Date        App  Usage (minutes)  Notifications  Times Opened
0  2024-08-07  Instagram               81             24            57
1  2024-08-08  Instagram               90             30            53
2  2024-08-26  Instagram              112             33            17
3  2024-08-22  Instagram               82             11            38
4  2024-08-12  Instagram               59             47            16

Shape of normalized data: (200, 3)
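
The body of `preprocess_data` lives in `data_gen_utils.py` and is not shown in this notebook. As a rough sketch of what such a step could look like, the snippet below applies min-max scaling to the three numeric columns of the five preview rows and then inverts it. The names `minmax_fit_transform` and `minmax_inverse` are hypothetical, and the real helper may use a different scaler or target range.

```python
import numpy as np

# The five preview rows printed above:
# Usage (minutes), Notifications, Times Opened.
rows = np.array([
    [81, 24, 57],
    [90, 30, 53],
    [112, 33, 17],
    [82, 11, 38],
    [59, 47, 16],
], dtype=float)


def minmax_fit_transform(values):
    """Scale each column to [0, 1]; return the scaled array and column min/max."""
    col_min = values.min(axis=0)
    col_max = values.max(axis=0)
    # Guard against constant columns to avoid dividing by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (values - col_min) / span, col_min, col_max


def minmax_inverse(scaled, col_min, col_max):
    """Map scaled values back to the original units."""
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return scaled * span + col_min


scaled, col_min, col_max = minmax_fit_transform(rows)
restored = minmax_inverse(scaled, col_min, col_max)

print(scaled.shape)                 # (5, 3)
print(np.allclose(restored, rows))  # True
```

Keeping the column minima and maxima (or the fitted scaler object) is what makes the inverse step possible after generation.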


## 2. GAN Configuration

Setting up and building the Generator and Discriminator models.

![GAN Discriminator](../image/GAN_discriminator.png)

In [6]:
# Model parameters
latent_dim = 100  # dimension of the latent space

# Build models
generator = build_generator(latent_dim)
discriminator = build_discriminator()
gan = build_gan(generator, discriminator)

# Display architectures
print("Generator architecture:")
generator.summary()
print("\nDiscriminator architecture:")
discriminator.summary()




Generator architecture:



Discriminator architecture:
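
To illustrate the tensor shapes these two models work with, the NumPy sketch below pushes a batch of latent vectors through a toy generator and a toy discriminator with random, untrained weights. The hidden width of 16, the activations, and the names `toy_generator` and `toy_discriminator` are assumptions made for illustration; the real architectures are defined in `data_gen_utils.py`.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 100   # same value as in the configuration cell
n_features = 3     # Usage (minutes), Notifications, Times Opened
hidden = 16        # hypothetical hidden width


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Random, untrained weights: only the shapes matter in this sketch.
g_w1 = rng.normal(size=(latent_dim, hidden)) * 0.1
g_b1 = np.zeros(hidden)
g_w2 = rng.normal(size=(hidden, n_features)) * 0.1
g_b2 = np.zeros(n_features)
d_w1 = rng.normal(size=(n_features, hidden)) * 0.1
d_b1 = np.zeros(hidden)
d_w2 = rng.normal(size=(hidden, 1)) * 0.1
d_b2 = np.zeros(1)


def toy_generator(z):
    """Map latent vectors to one row of features per sample."""
    h = np.maximum(z @ g_w1 + g_b1, 0.0)   # ReLU
    return sigmoid(h @ g_w2 + g_b2)        # values between 0 and 1


def toy_discriminator(x):
    """Map feature rows to a single 'is real' probability per sample."""
    h = np.maximum(x @ d_w1 + d_b1, 0.0)   # ReLU
    return sigmoid(h @ d_w2 + d_b2)


z = rng.normal(size=(8, latent_dim))
fake = toy_generator(z)
score = toy_discriminator(fake)
print(z.shape, fake.shape, score.shape)  # (8, 100) (8, 3) (8, 1)
```

The generator's output width has to match the number of normalized feature columns, and the discriminator's input width has to match the generator's output.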


## 3. Model Training

Training the GAN on our normalized data.
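
A typical GAN training step alternates between two updates: the discriminator sees a batch of real rows labeled 1 and generated rows labeled 0, then the generator is updated through the combined model with its fakes labeled 1. The sketch below only assembles those batches with NumPy. The names `make_discriminator_batch` and `make_generator_batch` are hypothetical, the arrays are random stand-ins, and `train_gan` in `data_gen_utils.py` may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(42)


def make_discriminator_batch(real_data, fake_data, batch_size):
    """Pick random real rows (label 1) and pair them with fake rows (label 0)."""
    idx = rng.integers(0, real_data.shape[0], size=batch_size)
    x = np.concatenate([real_data[idx], fake_data[:batch_size]])
    y = np.concatenate([np.ones((batch_size, 1)), np.zeros((batch_size, 1))])
    return x, y


def make_generator_batch(batch_size, latent_dim):
    """Noise for the generator, labeled 1 because it tries to pass as real."""
    z = rng.normal(0.0, 1.0, size=(batch_size, latent_dim))
    y = np.ones((batch_size, 1))
    return z, y


# Random stand-ins with the shapes used in this notebook.
real_data = rng.random((200, 3))   # placeholder for normalized_data
fake_data = rng.random((128, 3))   # placeholder for generator output

x_d, y_d = make_discriminator_batch(real_data, fake_data, batch_size=128)
z_g, y_g = make_generator_batch(batch_size=128, latent_dim=100)
print(x_d.shape, y_d.shape, z_g.shape, y_g.shape)
# (256, 3) (256, 1) (128, 100) (128, 1)
```

In a Keras-based loop, batches like these would usually be passed to `train_on_batch` on the discriminator and on the combined model in turn, with the discriminator frozen inside the combined model.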

In [7]:
# Train the GAN
train_gan(
    gan=gan,
    generator=generator,
    discriminator=discriminator,
    data=normalized_data,
    nb_epochs=100,  # Adjust as needed
    batch_size=128,
    latent_dim=latent_dim
)


4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
Epoch 0: D Loss: [0.69074875 0.75      ], G Loss: 0.6626264452934265
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step
...
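
Two details of this log can be checked with simple arithmetic, under assumptions about `train_gan` that the output suggests but does not confirm. If the progress bars come from Keras `predict` calls with the default batch size of 32, a batch of 128 noise vectors takes 4 steps, which matches `4/4`. If the losses are binary cross-entropy and the discriminator pair is `[loss, accuracy]`, then a loss near ln 2 ≈ 0.693 is what a discriminator that outputs 0.5 for every sample would score.

```python
import math

KERAS_PREDICT_DEFAULT_BATCH = 32  # default batch_size of Keras Model.predict


def predict_steps(n_samples, batch_size=KERAS_PREDICT_DEFAULT_BATCH):
    """Number of progress-bar steps for one predict call."""
    return math.ceil(n_samples / batch_size)


def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy for lists of labels and probabilities."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


print(predict_steps(128))     # 4

# A discriminator that always answers 0.5, whatever the label.
chance_loss = binary_cross_entropy([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])
print(round(chance_loss, 4))  # 0.6931
```

The epoch 0 values reported above (D loss 0.6907, G loss 0.6626) sit close to that baseline, which is consistent with a pair of models that have barely started training.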

## 4. Synthetic Data Generation

Using the trained model to generate new data.

In [8]:
# Generate new data
feature_names = [col for col in data.columns if col not in ['Date', 'App']]
generated_df = generate_synthetic_data(
    generator=generator,
    scaler=scaler,
    n_samples=1000,
    latent_dim=latent_dim,
    feature_names=feature_names
)

print("Preview of generated data:")
print(generated_df.head())

32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step
Preview of generated data:
   Usage (minutes)  Notifications  Times Opened
0        56.830372     129.494659     26.542692
1         1.001729       0.000054      1.000018
2       114.038704     144.770432     98.779320
3       115.785591     144.567856     98.514954
4       118.233452     146.870361     98.987000
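
The original columns are integer counts, while the generated values are floats. If integer-valued synthetic rows are needed, one option is to round and clip them after generation. The sketch below applies that to the five preview rows above. The lower bound of 0 and the name `to_counts` are assumptions for illustration, not part of `generate_synthetic_data`.

```python
import numpy as np

# The five generated rows printed above:
# Usage (minutes), Notifications, Times Opened.
generated = np.array([
    [56.830372, 129.494659, 26.542692],
    [1.001729, 0.000054, 1.000018],
    [114.038704, 144.770432, 98.779320],
    [115.785591, 144.567856, 98.514954],
    [118.233452, 146.870361, 98.987000],
])


def to_counts(values, lower=0):
    """Round to the nearest integer and clip at a lower bound."""
    return np.clip(np.rint(values), lower, None).astype(int)


counts = to_counts(generated)
print(counts[0].tolist())  # [57, 129, 27]
print(counts[1].tolist())  # [1, 0, 1]
```

On the DataFrame itself, `generated_df.round().clip(lower=0).astype(int)` would be the equivalent pandas expression.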
