# Implementing Differential Privacy in Machine Learning

## Introduction
In this notebook, we will implement differential privacy using the **TensorFlow Privacy** library. We will apply it to a simple classification task using the **MNIST dataset** (handwritten digits). We will compare the model's performance with and without differential privacy.

### Objectives:
- Understand the concept of differential privacy.
- Implement a differentially private training algorithm.
- Compare the accuracy of models with and without differential privacy.
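
For reference, the standard definition behind the first objective can be stated as follows. The symbols $M$ (the randomized training mechanism), $D$ and $D'$ (datasets differing in one record), and $S$ (a set of possible outputs) are notation introduced here, not taken from the notebook. $M$ is $(\varepsilon, \delta)$-differentially private if, for all such $D$, $D'$, and $S$:

$$
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
$$

Smaller values of $\varepsilon$ and $\delta$ mean that a single training example has less influence on the trained model.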

## Prerequisites
Make sure you have the following libraries installed:

```bash
pip install tensorflow tensorflow-privacy matplotlib
```

```python
import tensorflow as tf
import tensorflow_privacy as tfp
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import datasets, layers, models
```

```python
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()

# Normalize the images to the range [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Add a channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1))
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1))
```

```python
def create_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    return model
```


```python
# Create and compile the baseline model (no differential privacy)
model = create_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
history = model.fit(x_train, y_train, epochs=5, validation_split=0.2)
```


```python
# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Test accuracy without differential privacy: {test_acc:.4f}')
```


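```python
# Set differential privacy parameters
noise_multiplier = 1.1   # Ratio of the noise standard deviation to the clipping norm
l2_norm_clip = 1.0       # Clip each microbatch gradient to this L2 norm
epochs = 5               # Number of training epochs
batch_size = 250         # Evenly divides the 48,000 examples left after the 20% validation split
num_microbatches = 250   # Must evenly divide batch_size; equal values give per-example clipping

# Create a fresh model for DP training
model_dp = create_model()

# Define the Keras-compatible DP-SGD optimizer
dp_optimizer = tfp.DPKerasSGDOptimizer(
    l2_norm_clip=l2_norm_clip,
    noise_multiplier=noise_multiplier,
    num_microbatches=num_microbatches,
    learning_rate=0.01
)

# The DP optimizer needs the unreduced per-example loss vector so it can clip each microbatch
loss_dp = tf.keras.losses.SparseCategoricalCrossentropy(reduction=tf.losses.Reduction.NONE)

model_dp.compile(optimizer=dp_optimizer, loss=loss_dp, metrics=['accuracy'])

# Train the model with differential privacy
history_dp = model_dp.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, validation_split=0.2)
```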
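
To make the roles of `l2_norm_clip` and `noise_multiplier` concrete, here is a minimal NumPy sketch of a single DP-SGD gradient step. It is an illustration of the general technique, not the library's implementation: the function name `dp_sgd_noisy_gradient` and the random "gradients" are hypothetical, and it assumes one example per microbatch.

```python
import numpy as np

def dp_sgd_noisy_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average."""
    # L2 norm of each example's gradient (one row per example)
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    # Scale down only the rows whose norm exceeds the clipping threshold
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # Gaussian noise with standard deviation noise_multiplier * l2_norm_clip
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip,
                       size=per_example_grads.shape[1])
    noisy_mean = (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]
    return noisy_mean, clipped

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4)) * 5.0  # hypothetical per-example gradients
noisy_mean, clipped = dp_sgd_noisy_gradient(
    grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_mean.shape)
print(np.linalg.norm(clipped, axis=1).max())
```

Clipping bounds how much any single example can move the update, and the noise scale is tied to that bound. This is why a larger `noise_multiplier` gives stronger privacy at the cost of noisier updates.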


```python
# Evaluate the model with differential privacy
test_loss_dp, test_acc_dp = model_dp.evaluate(x_test, y_test)
print(f'Test accuracy with differential privacy: {test_acc_dp:.4f}')
```
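
A DP test accuracy is usually reported together with the privacy budget it was obtained under. A privacy accountant derives that budget from the noise multiplier, the sampling rate, and the number of optimizer steps. The sketch below computes only the last two. The 48,000 training examples (60,000 minus a 20% validation split) and the batch size of 250 are illustrative assumptions, and `accounting_inputs` is a hypothetical helper, not part of TensorFlow Privacy.

```python
import math

def accounting_inputs(n_examples, batch_size, epochs):
    """Return the sampling rate and total step count used as accountant inputs."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return batch_size / n_examples, steps_per_epoch * epochs

# Assumed values, for illustration only
sampling_rate, steps = accounting_inputs(n_examples=48_000, batch_size=250, epochs=5)
print(f'sampling rate: {sampling_rate:.4f}, steps: {steps}')
```

These two numbers, together with `noise_multiplier` and a chosen $\delta$, are what a DP-SGD accountant would need to produce an $\varepsilon$ value.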


```python
# Plot training and validation accuracy
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'], label='Train Accuracy (No DP)')
plt.plot(history.history['val_accuracy'], label='Val Accuracy (No DP)')
plt.title('Training and Validation Accuracy (No DP)')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.subplot(1, 2, 2)
plt.plot(history_dp.history['accuracy'], label='Train Accuracy (DP)')
plt.plot(history_dp.history['val_accuracy'], label='Val Accuracy (DP)')
plt.title('Training and Validation Accuracy (With DP)')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()

plt.tight_layout()
plt.show()
```
