### GANDataBalancer Class


The **GANDataBalancer** class uses a Generative Adversarial Network (GAN) to generate synthetic samples for imbalanced datasets. It identifies the minority and majority classes, trains a GAN on the minority-class samples, and appends the generated samples to the training data to reduce the class imbalance.

---

#### **Functions**

##### 1. __init__
- **Description**:  
  Initializes the GANDataBalancer class with the dataset, target column, and latent dimension for the GAN.
- **Parameters**:  
  - dataset (DataFrame): Input dataset to be balanced.  
  - target_column (str): The name of the column representing the class labels.  
  - latent_dim (int): Size of the latent dimension for the generator's input. Default is 100.  
- **Usage**:  
  Splits the dataset into training and test sets (80/20, `random_state=42`) and separates the majority and minority classes of the training set for later processing.
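
As a rough illustration of how the minority and majority labels can be picked from class frequencies, here is a minimal sketch. The helper name `find_minority_majority` is hypothetical and not part of the class; it assumes the labels are hashable and that at least two classes are present.

```python
from collections import Counter

def find_minority_majority(labels):
    """Return (minority_label, majority_label) based on label frequency."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two distinct classes")
    ordered = counts.most_common()  # sorted from most to least frequent
    majority_label = ordered[0][0]
    minority_label = ordered[-1][0]
    return minority_label, majority_label

# Illustrative labels: six samples of class 0, two of class 1
labels = [0, 0, 0, 0, 1, 0, 1, 0]
minority, majority = find_minority_majority(labels)
print(minority, majority)
```

The same idea works on a pandas Series, since `Counter` accepts any iterable of labels.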

---

##### 2. build_generator
- **Description**:  
  Builds the generator model, which takes random noise from the latent space as input and generates synthetic data samples resembling the minority class.
- **Parameters**:  
  - output_dim (int): Dimensionality of the output data (same as the number of features in the dataset).  
- **Returns**:  
  - An uncompiled Keras Sequential model for the generator. It is trained only through the combined GAN model.
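
To make the shape flow from latent noise to a synthetic sample concrete, the following NumPy-only sketch mimics a three-layer generator with random, untrained weights. It is not the Keras model itself: the `dense` helper is hypothetical, the layer widths (16, 32, then `output_dim`) follow the class code in this notebook, and `output_dim = 30` is an arbitrary illustrative value.

```python
import numpy as np

def dense(x, out_dim, rng, activation=None):
    # One fully connected layer with random (untrained) weights
    w = rng.normal(0.0, 0.1, size=(x.shape[1], out_dim))
    b = np.zeros(out_dim)
    z = x @ w + b
    return np.maximum(z, 0.0) if activation == "relu" else z

rng = np.random.default_rng(0)
latent_dim, output_dim, n_samples = 100, 30, 5

# Each row of noise is one point in the latent space
noise = rng.normal(0.0, 1.0, size=(n_samples, latent_dim))
h = dense(noise, 16, rng, "relu")
h = dense(h, 32, rng, "relu")
fake = dense(h, output_dim, rng)  # linear output, one row per synthetic sample

print(noise.shape, fake.shape)
```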

---

##### 3. build_discriminator
- **Description**:  
  Builds the discriminator model, which takes data samples (real or fake) as input and predicts whether they are real (from the dataset) or fake (from the generator).
- **Parameters**:  
  - input_dim (int): Dimensionality of the input data (same as the number of features in the dataset).  
- **Returns**:  
  - A Keras Sequential model for the discriminator, compiled with the Adam optimizer and binary cross-entropy loss so it can be trained to classify real vs. fake data.
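
The discriminator outputs a probability that a sample is real and is scored with binary cross-entropy. The sketch below shows what that loss computes for a small batch in plain Python. The function name and the clipping constant `eps` are assumptions for illustration; this is not Keras's internal implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch of (label, probability) pairs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Label 1 = real, label 0 = fake
undecided = binary_crossentropy([1, 0], [0.5, 0.5])
confident = binary_crossentropy([1, 0], [0.9, 0.1])
print(undecided, confident)
```

A discriminator that cannot tell real from fake (always predicting 0.5) scores ln 2 ≈ 0.693, and the loss falls as its predictions move toward the correct labels.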

---

##### 4. build_gan
- **Description**:  
  Combines the generator and discriminator to build the GAN model. The discriminator is kept non-trainable to ensure only the generator is updated during GAN training.
- **Parameters**:  
  None.  
- **Returns**:  
  - A compiled Keras Model for the GAN.

---

##### 5. fit
- **Description**:  
  Trains the GAN by alternately updating the discriminator and generator. The discriminator is trained to distinguish between real and fake data, while the generator learns to produce more realistic synthetic data.
- **Parameters**:  
  - epochs (int): Number of training iterations. Each iteration performs one discriminator update and one generator update on a single batch. Default is 1000.  
  - batch_size (int): Size of each training batch. Default is 64.  
- **Usage**:  
  Prints the discriminator and generator losses every 100 epochs during training.
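
The discriminator step of each iteration trains on a batch that stacks real minority samples (labelled 1) on top of generated samples (labelled 0). A minimal NumPy sketch of that assembly follows. The helper name `make_discriminator_batch` and the toy array sizes are assumptions for illustration only.

```python
import numpy as np

def make_discriminator_batch(real_pool, fake_batch, rng):
    """Stack a random real batch and a fake batch with labels 1 and 0."""
    batch_size = fake_batch.shape[0]
    # Sample real rows with replacement, as many as there are fake rows
    idx = rng.integers(0, real_pool.shape[0], size=batch_size)
    real_batch = real_pool[idx]
    X = np.vstack((real_batch, fake_batch))
    y = np.hstack((np.ones(batch_size), np.zeros(batch_size)))
    return X, y

rng = np.random.default_rng(0)
real_pool = rng.normal(size=(20, 4))   # stand-in for minority-class rows
fake_batch = rng.normal(size=(8, 4))   # stand-in for generator output
X, y = make_discriminator_batch(real_pool, fake_batch, rng)
print(X.shape, y.sum())
```

The generator step then feeds fresh noise through the combined model with all labels set to 1, so the generator is rewarded when the discriminator takes its samples for real ones.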

---

##### 6. resample
- **Description**:  
  Uses the trained generator to produce synthetic minority-class samples and appends them to the training split of the dataset.
- **Parameters**:  
  - synthetic_size (int): Number of synthetic samples to generate. Default is 1000.  
- **Returns**:  
  - A DataFrame containing the training data plus the synthetic samples. The classes are balanced only if synthetic_size matches the gap between the majority and minority class counts.
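
Because `synthetic_size` is a fixed number, it may help to compute how many samples would actually close the gap between the two classes. The helper below is a hypothetical sketch, not part of the class, and assumes a binary label column.

```python
from collections import Counter

def synthetic_size_to_balance(labels):
    """Number of extra minority samples needed to match the majority count."""
    counts = Counter(labels)
    return max(counts.values()) - min(counts.values())

# Illustrative labels: 10 majority samples, 3 minority samples
labels = [0] * 10 + [1] * 3
needed = synthetic_size_to_balance(labels)
print(needed)
```

The result could then be passed as `synthetic_size` when calling `resample` or `fit_resample`.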

---

##### 7. fit_resample
- **Description**:  
  Combines the training (fit) and resampling (resample) steps into a single function. Trains the GAN and generates a balanced dataset in one call.
- **Parameters**:  
  - epochs (int): Number of epochs to train the GAN. Default is 1000.  
  - batch_size (int): Size of each training batch. Default is 64.  
  - synthetic_size (int): Number of synthetic samples to generate. Default is 1000.  
- **Returns**:  
  - A DataFrame containing the training data plus the synthetic samples, as returned by resample.


In [9]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential, Model
from keras.layers import Dense, Input

class GANDataBalancer:
    def __init__(self, dataset, target_column, latent_dim=100):
        self.dataset = dataset
        self.target_column = target_column
        self.latent_dim = latent_dim

        # Split dataset into train and test sets
        X = dataset.drop(columns=[target_column])
        y = dataset[target_column]
        self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=0.2, random_state=42
        )

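        # Identify minority and majority classes from the training label counts
        class_counts = self.y_train.value_counts()
        self.X_minority = self.X_train[self.y_train == class_counts.idxmin()]
        self.X_majority = self.X_train[self.y_train == class_counts.idxmax()]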

        self.generator = None
        self.discriminator = None
        self.gan = None

    def build_generator(self, output_dim):
        model = Sequential([
            Dense(16, activation='relu', input_dim=self.latent_dim),
            Dense(32, activation='relu'),
            Dense(output_dim, activation='linear')
        ])
        return model

    def build_discriminator(self, input_dim):
        model = Sequential([
            Dense(32, activation='relu', input_dim=input_dim),
            Dense(16, activation='relu'),
            Dense(1, activation='sigmoid')
        ])
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        return model

    def build_gan(self):
        self.discriminator.trainable = False
        gan_input = Input(shape=(self.latent_dim,))
        x = self.generator(gan_input)
        gan_output = self.discriminator(x)
        gan = Model(gan_input, gan_output)
        gan.compile(optimizer='adam', loss='binary_crossentropy')
        return gan

    def fit(self, epochs=1000, batch_size=64):
        minority_class_samples = self.X_minority.values
        self.generator = self.build_generator(self.X_train.shape[1])
        self.discriminator = self.build_discriminator(self.X_train.shape[1])
        self.gan = self.build_gan()

        for epoch in range(epochs):
            # Train Discriminator
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            fake_data = self.generator.predict(noise, verbose=0)  # verbose=0 hides the per-call progress bar
            real_data = minority_class_samples[
                np.random.randint(0, minority_class_samples.shape[0], batch_size)
            ]
            X_combined = np.vstack((real_data, fake_data))
            y_combined = np.hstack((np.ones(batch_size), np.zeros(batch_size)))
            self.discriminator.trainable = True
            d_loss = self.discriminator.train_on_batch(X_combined, y_combined)

            # Train Generator
            noise = np.random.normal(0, 1, (batch_size, self.latent_dim))
            y_gen = np.ones(batch_size)
            self.discriminator.trainable = False
            g_loss = self.gan.train_on_batch(noise, y_gen)

            if epoch % 100 == 0:
                print(f"Epoch {epoch}/{epochs} | Discriminator Loss: {d_loss} | Generator Loss: {g_loss}")

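    def resample(self, synthetic_size=1000):
        # Generate synthetic minority-class samples from latent noise
        synthetic_data = self.generator.predict(
            np.random.normal(0, 1, (synthetic_size, self.latent_dim)), verbose=0
        )
        synthetic_df = pd.DataFrame(synthetic_data, columns=self.X_train.columns)
        # Label synthetic rows with the minority class (the less frequent training label)
        minority_label = self.y_train.value_counts().idxmin()
        synthetic_y = pd.Series([minority_label] * synthetic_size, name=self.target_column)
        # Reset indices so features and labels align row by row and the target keeps its name
        balanced_X = pd.concat([self.X_train, synthetic_df], ignore_index=True)
        balanced_y = pd.concat([self.y_train, synthetic_y], ignore_index=True)
        balanced_dataset = pd.concat([balanced_X, balanced_y], axis=1)
        return balanced_dataset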

    def fit_resample(self, epochs=1000, batch_size=64, synthetic_size=1000):
        self.fit(epochs=epochs, batch_size=batch_size)
        return self.resample(synthetic_size=synthetic_size)


In [10]:
# Load dataset
data = pd.read_csv('creditcard.csv')

# Initialize GANDataBalancer
gan_balancer = GANDataBalancer(data, target_column='Class', latent_dim=100)

# Train the GAN and resample the dataset
balanced_dataset = gan_balancer.fit_resample(epochs=1000, batch_size=64, synthetic_size=1000)

# Save the balanced dataset (you can save it manually if needed)
balanced_dataset.to_csv('balanced_dataset.csv', index=False)
print("Balanced dataset saved to 'balanced_dataset.csv'")

Epoch 0/1000 | Discriminator Loss: [array(0.50165504, dtype=float32), array(0.5, dtype=float32)] | Generator Loss: [array(0.50165504, dtype=float32), array(0.50165504, dtype=float32), array(0.5, dtype=float32)]