# **Generating Synthetic Data with Quantum Randomness: A Comprehensive Guide**

*Unlock the power of Quantum Random Number Generators (QRNGs) to enhance your AI models and generate high-quality synthetic data.*

---

## **Introduction**

As artificial intelligence (AI) continues to evolve, so does the demand for high-quality training data. Synthetic data generation has emerged as a practical way to augment datasets, especially when real data is scarce or sensitive. Traditional methods rely on pseudo-random number generators (PRNGs), which are efficient but deterministic, and therefore predictable to anyone who knows their seed or internal state.

**Quantum Random Number Generators (QRNGs)** take a different approach: they draw on the inherent unpredictability of quantum mechanics to produce true randomness, which may benefit model robustness and security.

In this tutorial, we'll explore how to integrate QRNGs into a Variational Autoencoder (VAE) for synthetic data generation. We'll cover:

- Setting up a QRNG client and buffering its output.
- Building a QRNG-enhanced VAE.
- Training the model on real data.
- Generating and evaluating synthetic data.

By the end, you'll have a working pipeline that generates synthetic tabular data using quantum randomness.

---

## **Why Quantum Randomness in AI?**

### **True Randomness vs. Pseudo-Randomness**

- **Pseudo-Random Number Generators (PRNGs)**: Generate sequences that appear random but are deterministic if the initial seed is known.
- **Quantum Random Number Generators (QRNGs)**: Utilize quantum phenomena to produce numbers that are fundamentally unpredictable.
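The determinism of a PRNG is easy to demonstrate. In this sketch (NumPy's default generator, with an arbitrary seed chosen for illustration), anyone who knows the seed can replay the sequence and predict outputs they never observed, which is exactly the predictability a QRNG is meant to remove.

```python
import numpy as np

SEED = 1234  # arbitrary seed for illustration

# The "victim" draws some bytes it exposes, then some it keeps secret
victim = np.random.default_rng(SEED)
observed = victim.integers(0, 256, size=4, dtype=np.uint8)
secret = victim.integers(0, 256, size=4, dtype=np.uint8)

# An attacker who knows the seed simply replays the same generator
attacker = np.random.default_rng(SEED)
replayed = attacker.integers(0, 256, size=4, dtype=np.uint8)
predicted = attacker.integers(0, 256, size=4, dtype=np.uint8)

print(np.array_equal(replayed, observed))  # True
print(np.array_equal(predicted, secret))   # True
```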

### **Benefits of Integrating QRNGs**

- **Enhanced Security**: Truly unpredictable values cannot be reproduced by an attacker who learns a seed or generator state.
- **Improved Robustness**: Random noise injection can act as a regularizer that reduces overfitting; whether a quantum source improves on a high-quality PRNG here is an open question worth testing.
- **Innovation**: Opens new research avenues at the intersection of quantum technologies and AI.

---

## **Prerequisites**

- **Python 3.8+**
- **PyTorch**
- **NumPy**
- **Pandas**
- **Scikit-learn**
- **Requests library**
- **An API token for Quantum eMotion's Entropy-as-a-Service (EaaS) API** (or another QRNG service), exported as the `API_TOKEN` environment variable

---

## **1. Setting Up the Quantum Random Number Generator**

First, we'll create a `QuantumRandomGenerator` class to request random bytes from the QRNG API.

```python
import base64

import numpy as np
import requests


class QuantumRandomGenerator:
    """Thin client for a QRNG HTTP API that returns base64-encoded random bytes."""

    MAX_BYTES = 512  # largest request this client sends in a single call

    def __init__(self, api_token: str, timeout: float = 10.0):
        self.api_token = api_token
        self.base_url = 'https://api-qxeaas.quantumemotion.com/entropy'  # replace with your QRNG API endpoint
        self.headers = {'Authorization': f'Bearer {self.api_token}'}
        self.timeout = timeout
        # Reuse one HTTP connection across the many small requests made during training
        self.session = requests.Session()

    def get_quantum_random(self, num_bytes: int) -> np.ndarray:
        """Fetch `num_bytes` fresh random bytes as a uint8 array (never served from a cache)."""
        if not 0 < num_bytes <= self.MAX_BYTES:
            raise ValueError(f"num_bytes must be between 1 and {self.MAX_BYTES}")
        response = self.session.get(
            self.base_url,
            headers=self.headers,
            params={'size': num_bytes},
            timeout=self.timeout,
        )
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
        qrng_bytes = base64.b64decode(response.json()['random_number'])
        return np.frombuffer(qrng_bytes, dtype=np.uint8)
```

**Important Notes:**

- **API Endpoint**: Replace `'https://api-qxeaas.quantumemotion.com/entropy'` with your actual QRNG API endpoint.
- **Request and Response Format**: The `size` query parameter and the base64-encoded `random_number` field match the API used here; adjust both if your provider differs.
- **API Token**: Ensure you have a valid API token and set it appropriately.
- **No Caching of Random Bytes**: Responses are deliberately never cached. Returning the same bytes for repeated requests would make the "random" noise repeat, defeating the purpose of a QRNG. Every call therefore hits the API, so check your plan's rate limits before long training runs; the buffer in the next section keeps leftover bytes so none are wasted.

---

## **2. Managing Quantum Random Numbers Efficiently**

To handle large requests and minimize API calls, we'll implement a buffer system with the `QuantumRandomBuffer` class.

```python
class QuantumRandomBuffer:
    # Request sizes the client may ask for, capped at the generator's 512-byte limit
    VALID_SIZES = [4, 8, 16, 32, 64, 128, 256, 512]

    def __init__(self, qrng: QuantumRandomGenerator):
        self.qrng = qrng
        self.buffer = np.array([], dtype=np.uint8)

    def _get_optimal_chunk_size(self, required_size: int) -> int:
        """Smallest valid request size covering `required_size` (at most 512)."""
        for size in self.VALID_SIZES:
            if size >= required_size:
                return size
        return self.VALID_SIZES[-1]

    def get_numbers(self, size: int) -> np.ndarray:
        """Return `size` random bytes, fetching new chunks only when the buffer runs low."""
        while len(self.buffer) < size:
            remaining = size - len(self.buffer)
            chunk_size = self._get_optimal_chunk_size(remaining)
            new_numbers = self.qrng.get_quantum_random(chunk_size)
            self.buffer = np.concatenate([self.buffer, new_numbers])
        # Hand out bytes from the front; the surplus stays for the next call and is never reused
        result = self.buffer[:size]
        self.buffer = self.buffer[size:]
        return result
```

**Key Features:**

- **Buffering**: Keeps the surplus bytes from each API response for later calls, so requests can be rounded up to supported sizes without wasting quantum randomness.
- **Optimal Chunk Size**: Ensures requests are made in sizes supported by the API.
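As written, `get_quantum_random` raises an exception whenever the API call fails, which stops training mid-epoch. One option is a fallback wrapper. The sketch below is an assumption, not part of the original pipeline: the class name `FallbackQuantumRandom` is hypothetical, and it substitutes `os.urandom` (the operating system's cryptographically secure generator, which is *not* a quantum source) whenever the QRNG call fails with a network, HTTP, or parsing error. Because it exposes the same `get_quantum_random` method, it can be passed anywhere the buffer expects a `QuantumRandomGenerator`.

```python
import os
import warnings

import numpy as np


class FallbackQuantumRandom:
    """Hypothetical wrapper: try the QRNG first, fall back to os.urandom on failure.

    os.urandom is the OS's cryptographically secure PRNG, not a quantum source,
    so every fallback is counted and reported with a warning.
    """

    def __init__(self, primary):
        self.primary = primary  # any object with get_quantum_random(num_bytes)
        self.fallback_count = 0

    def get_quantum_random(self, num_bytes: int) -> np.ndarray:
        try:
            return self.primary.get_quantum_random(num_bytes)
        except (OSError, ValueError, KeyError) as exc:
            # requests' network/HTTP errors subclass OSError; bad JSON or base64 raise
            # ValueError; a missing response field raises KeyError
            self.fallback_count += 1
            warnings.warn(f"QRNG request failed ({exc!r}); using os.urandom instead")
            return np.frombuffer(os.urandom(num_bytes), dtype=np.uint8)
```

For example, with the `SyntheticDataGenerator` class defined later, setting `generator.qrng = FallbackQuantumRandom(generator.qrng)` before calling `train_model` routes all noise requests through the wrapper. Checking `fallback_count` after training tells you how much of the noise was not quantum.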

---

## **3. Building the QRNG-Enhanced Variational Autoencoder**

### **3.1 Importing All Libraries and Preparing the Dataset**

We'll create a custom `TabularDataset` class to handle our tabular data.

```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import StandardScaler


class TabularDataset(Dataset):
    """Wraps a 2-D array of already-scaled features as float32 tensors."""

    def __init__(self, data):
        self.data = torch.as_tensor(data, dtype=torch.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
```
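PyTorch also ships a built-in `TensorDataset` that does the same job for a single feature tensor. The sketch below uses random stand-in data with the breast cancer dataset's shape (569 rows, 30 features) to show how `DataLoader` splits it: with `batch_size=64` there are nine batches per epoch, the last one holding the 57 leftover rows. Note that `TensorDataset` yields tuples, so a training loop would unpack `(batch,)` instead of receiving a bare tensor.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data with the same shape as the breast cancer features
features = np.random.default_rng(0).normal(size=(569, 30)).astype(np.float32)

dataset = TensorDataset(torch.from_numpy(features))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# Each item is a 1-tuple, so unpack it to get the batch tensor
batch_sizes = [batch.shape[0] for (batch,) in loader]
print(len(batch_sizes), batch_sizes[-1])  # 9 57
```

Every one of those batches triggers a fresh quantum noise request during training, which is why the number of epochs directly drives API usage.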


### **3.2 Implementing the QRNG Variational Autoencoder**

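```python
class QRNGVariationalAutoencoder(nn.Module):
    def __init__(self, input_dim, latent_dim, hidden_dim, qrng):
        super().__init__()
        self.qrng = qrng
        self.quantum_buffer = QuantumRandomBuffer(qrng)

        # Encoder: input features -> hidden representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU()
        )

        # Heads for the mean and log-variance of q(z | x)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_var = nn.Linear(hidden_dim, latent_dim)

        # Decoder: latent vector -> reconstructed (scaled) features
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, input_dim)
        )

    def get_quantum_noise(self, shape):
        """Standard-normal noise built from quantum bytes with the Box-Muller transform."""
        num_elements = int(np.prod(shape))
        num_pairs = (num_elements + 1) // 2
        # Two bytes per 16-bit uniform, two uniforms per pair of normals
        raw = self.quantum_buffer.get_numbers(4 * num_pairs).astype(np.float64)
        words = raw[0::2] * 256.0 + raw[1::2]   # integers in [0, 65535]
        u = (words + 0.5) / 65536.0             # uniforms strictly inside (0, 1)
        u1, u2 = u[:num_pairs], u[num_pairs:]
        radius = np.sqrt(-2.0 * np.log(u1))
        angle = 2.0 * np.pi * u2
        normals = np.concatenate([radius * np.cos(angle), radius * np.sin(angle)])
        return torch.from_numpy(normals[:num_elements].astype(np.float32)).reshape(shape)

    def encode(self, x):
        hidden = self.encoder(x)
        mu = self.fc_mu(hidden)
        log_var = self.fc_var(hidden)
        return mu, log_var

    def reparameterize(self, mu, log_var):
        std = torch.exp(0.5 * log_var)
        # eps ~ N(0, I) from the QRNG, moved to the same device as mu
        eps = self.get_quantum_noise(mu.shape).to(mu.device)
        return mu + eps * std

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        mu, log_var = self.encode(x)
        z = self.reparameterize(mu, log_var)
        return self.decode(z), mu, log_var
```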

**Key Components:**

- **Quantum Noise Injection**: Uses quantum random numbers in the reparameterization trick.
- **Encoder and Decoder**: Standard VAE architecture with fully connected layers.
- **Reparameterization Trick**: Allows backpropagation through stochastic variables.
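For reference, the reparameterization step and the KL term used during training can be written as follows, where $\mu$ and $\log\sigma^2$ are the outputs of `fc_mu` and `fc_var` and $d$ is the latent dimension. The KL term is computed against an $\mathcal{N}(0, I)$ prior, so $\varepsilon$ should be standard normal for the two to be consistent. Uniform values $u_1, u_2 \in (0, 1)$, such as rescaled QRNG bytes, can be turned into two independent standard normals with the Box-Muller transform (last line).

$$
z = \mu + \sigma \odot \varepsilon, \qquad \sigma = \exp\left(\tfrac{1}{2}\log\sigma^2\right), \qquad \varepsilon \sim \mathcal{N}(0, I)
$$

$$
D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,\mathcal{N}(0, I)\big) = -\frac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
$$

$$
\varepsilon_1 = \sqrt{-2\ln u_1}\,\cos(2\pi u_2), \qquad \varepsilon_2 = \sqrt{-2\ln u_1}\,\sin(2\pi u_2)
$$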

---

## **4. Training the Model on Real Data**

### **4.1 Preparing the Real Data**

We'll use the Breast Cancer Wisconsin dataset from scikit-learn as our real dataset.

```python
from sklearn.datasets import load_breast_cancer
import os

# Save the breast cancer features (569 samples x 30 columns) as a CSV file
cancer = load_breast_cancer()
cancer_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
cancer_df.to_csv('breast_cancer.csv', index=False)
```

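### **4.2 Training Setup**

The `SyntheticDataGenerator` class in the next section bundles data preparation, training, and sampling. Its `train_model` method uses the following setup.

**Training Details:**

- **Loss Function**: Combines the reconstruction loss (MSE) with the KL divergence, weighted by 0.1.
- **Optimizer**: Adam with a learning rate of 0.001.
- **Quantum Noise**: Injected during the reparameterization step.

## **5. Generating Synthetic Data**

The class below wraps the full pipeline: `prepare_data` scales the input, `train_model` runs the training loop, and `generate_samples` decodes QRNG-sampled latent vectors into new rows.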

```python
class SyntheticDataGenerator:
    def __init__(self, qrng_token, real_data_path=None):
        self.qrng = QuantumRandomGenerator(api_token=qrng_token)
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.real_data_path = real_data_path
        self.scaler = StandardScaler()

    def prepare_data(self, data=None):
        if data is None:
            if self.real_data_path is None:
                raise ValueError("Either provide data or set real_data_path")
            data = pd.read_csv(self.real_data_path)

        # Remember the schema so generated rows can be mapped back to it
        self.original_columns = data.columns
        self.original_dtypes = data.dtypes

        # Standardize every column to zero mean and unit variance
        scaled_data = self.scaler.fit_transform(data)
        return scaled_data

    def train_model(self, data, latent_dim=10, hidden_dim=128, batch_size=64, epochs=100):
        input_dim = data.shape[1]

        # Create dataset and dataloader
        dataset = TabularDataset(data)
        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        # Initialize model
        self.model = QRNGVariationalAutoencoder(
            input_dim=input_dim,
            latent_dim=latent_dim,
            hidden_dim=hidden_dim,
            qrng=self.qrng
        ).to(self.device)

        optimizer = torch.optim.Adam(self.model.parameters(), lr=0.001)

        # Training loop
        self.model.train()
        for epoch in range(epochs):
            total_loss = 0.0
            for batch in dataloader:
                batch = batch.to(self.device)

                # Forward pass (quantum noise is drawn inside reparameterize)
                recon_batch, mu, log_var = self.model(batch)

                # Reconstruction: mean squared error over all elements
                recon_loss = nn.functional.mse_loss(recon_batch, batch)
                # KL(q(z|x) || N(0, I)): summed over latent dims, averaged over the batch
                # so its scale does not change with the batch size
                kl_loss = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp()) / batch.size(0)
                loss = recon_loss + 0.1 * kl_loss

                # Backward pass
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                total_loss += loss.item()

            if (epoch + 1) % 10 == 0:
                print(f'Epoch {epoch+1}/{epochs}, Loss: {total_loss/len(dataloader):.4f}')

    def generate_samples(self, n_samples):
        self.model.eval()
        with torch.no_grad():
            # Sample latent vectors from the QRNG noise source
            latent_dim = self.model.fc_mu.out_features
            z = self.model.get_quantum_noise((n_samples, latent_dim)).to(self.device)

            # Decode latent vectors into standardized feature space
            generated = self.model.decode(z).cpu().numpy()

        # Undo the standardization to return to the original units
        generated_data = self.scaler.inverse_transform(generated)

        # Rebuild a DataFrame with the original column names; round integer columns
        df_generated = pd.DataFrame(generated_data, columns=self.original_columns)
        for col, dtype in self.original_dtypes.items():
            if pd.api.types.is_integer_dtype(dtype):
                df_generated[col] = df_generated[col].round().astype(dtype)

        return df_generated
```

**Key Steps:**

- **Quantum Latent Vectors**: Uses quantum noise to generate latent vectors.
- **Decoding**: Transforms latent vectors back to data space.
- **Post-processing**: Inverse scaling and type casting to match original data.
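To see the post-processing step in isolation, here is a toy sketch with made-up columns (`age`, `score`) rather than the breast cancer data. Pretend decoder outputs in standardized space are mapped back to original units with the fitted `StandardScaler`, and the integer column is rounded back to its original dtype. A decoder output of 0 maps back to the column's mean.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up "real" data: one integer column, one float column
real = pd.DataFrame({"age": [34, 51, 29, 46], "score": [0.2, 1.5, 0.7, 1.1]})
scaler = StandardScaler().fit(real)

# Pretend decoder outputs in standardized space (0 = column mean, 1 = one std above)
decoded = np.array([[0.0, 0.0], [1.0, -1.0]])

restored = pd.DataFrame(scaler.inverse_transform(decoded), columns=real.columns)
for col, dtype in real.dtypes.items():
    if pd.api.types.is_integer_dtype(dtype):
        # Round before casting so 48.86 becomes 49 instead of being truncated to 48
        restored[col] = restored[col].round().astype(dtype)

print(restored)
```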

---

## **6. Evaluating the Synthetic Data**

We'll compare the statistical properties of the real and synthetic data.

```python
def demonstrate_synthetic_data_generation(real_data_path, qrng_token, n_samples=1000):
    """
    Demonstrates the synthetic data generation process using a real dataset
    """
    # Initialize generator
    generator = SyntheticDataGenerator(qrng_token=qrng_token, real_data_path=real_data_path)

    # Load and prepare data
    real_data = pd.read_csv(real_data_path)
    prepared_data = generator.prepare_data(real_data)

    # Train the model
    generator.train_model(
        data=prepared_data,
        latent_dim=min(10, prepared_data.shape[1]),
        hidden_dim=128,
        epochs=500
    )

    # Generate synthetic samples
    synthetic_data = generator.generate_samples(n_samples)

    # Print summary statistics comparison
    print("\nReal vs Synthetic Data Summary:")
    print("\nReal Data Summary:")
    print(real_data.describe())
    print("\nSynthetic Data Summary:")
    print(synthetic_data.describe())

    # Save without the DataFrame index, matching the format of the real CSV
    synthetic_data.to_csv('synthetic_data2.csv', index=False)

    return synthetic_data
```


**Usage Example:**

```python
# Generate synthetic data
synthetic_data = demonstrate_synthetic_data_generation(
    real_data_path='breast_cancer.csv',
    qrng_token=os.getenv('API_TOKEN'),
    n_samples=500
)
```

**Evaluation Metrics:**

- **Statistical Similarity**: Compare mean, standard deviation, and other statistics.
- **Data Distribution**: Check that each synthetic column follows a distribution similar to its real counterpart, and that relationships between columns are preserved.
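The `describe()` output only compares summary statistics. A minimal sketch of a per-column distribution check follows; the helper names `ks_statistic`, `compare_distributions`, and `correlation_gap` are hypothetical. It computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs, 0 for identical samples and close to 1 for non-overlapping ones) with NumPy alone; `scipy.stats.ks_2samp` reports the same statistic plus a p-value if SciPy is available. It also reports the mean shift in units of the real column's standard deviation and the ratio of standard deviations, where a ratio well below 1 suggests the synthetic data is too concentrated.

```python
import numpy as np
import pandas as pd


def ks_statistic(a: np.ndarray, b: np.ndarray) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def compare_distributions(real: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Per-column fidelity report for the numeric columns of `real`."""
    rows = {}
    for col in real.columns:
        r = real[col].to_numpy(dtype=float)
        s = synthetic[col].to_numpy(dtype=float)
        real_std = r.std()
        rows[col] = {
            "ks_stat": ks_statistic(r, s),
            # Mean shift measured in units of the real column's standard deviation
            "mean_shift_sd": (s.mean() - r.mean()) / real_std if real_std > 0 else np.nan,
            "std_ratio": s.std() / real_std if real_std > 0 else np.nan,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


def correlation_gap(real: pd.DataFrame, synthetic: pd.DataFrame) -> float:
    """Mean absolute difference between the two Pearson correlation matrices."""
    diff = real.corr().to_numpy() - synthetic[real.columns].corr().to_numpy()
    return float(np.nanmean(np.abs(diff)))
```

With the tutorial's outputs, `compare_distributions(pd.read_csv('breast_cancer.csv'), synthetic_data)` gives one row per feature; sorting by `ks_stat` highlights the columns the model reproduces worst, and `correlation_gap` summarizes how well cross-feature relationships survived.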

---

## **7. Conclusion**

In this tutorial, we've successfully integrated quantum randomness into a Variational Autoencoder for synthetic data generation. By leveraging QRNGs, we've introduced true randomness into the model, potentially enhancing the quality and security of the generated data.

**Key Takeaways:**

- **QRNG Integration**: Provides true randomness, enhancing unpredictability.
- **VAE Architecture**: Effective for generating high-dimensional synthetic data.
- **Potential Applications**: Data augmentation, privacy-preserving data sharing, and more.

**Next Steps:**

- **Experimentation**: Try different datasets and observe the effects.
- **Hyperparameter Tuning**: Adjust latent dimensions, hidden sizes, and training epochs.
- **Further Research**: Explore integrating QRNGs into other models like GANs.

---

## **References**

- **Quantum Random Number Generators**: [Wikipedia](https://en.wikipedia.org/wiki/Quantum_random_number_generator)
- **Auto-Encoding Variational Bayes**: [Kingma & Welling (2013)](https://arxiv.org/abs/1312.6114)
- **PyTorch Documentation**: [PyTorch Official Site](https://pytorch.org/docs/stable/index.html)
- **Breast Cancer Wisconsin Dataset**: [UCI Machine Learning Repository](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic)

---

*Embrace the future of AI by integrating quantum technologies into your models today! If you found this tutorial helpful, please share it with others interested in the exciting intersection of quantum computing and artificial intelligence.*