# Implement Differential Privacy with TensorFlow Privacy

## Learning Objectives

* Learn how to wrap existing optimizers (e.g., SGD, Adam) into their differentially private counterparts using TensorFlow Privacy
* Understand hyperparameters introduced by differentially private machine learning
* Measure the privacy guarantee provided using analysis tools included in TensorFlow Privacy

## Overview

[Differential privacy](https://en.wikipedia.org/wiki/Differential_privacy) (DP) is a framework for measuring the privacy guarantees provided by an algorithm. Through the lens of differential privacy, you can design machine learning algorithms that responsibly train models on private data. Learning with differential privacy provides measurable guarantees of privacy, helping to mitigate the risk of exposing sensitive training data in machine learning. Intuitively, a model trained with differential privacy should not be affected by any single training example, or small set of training examples, in its data set.

The basic idea of this approach, called differentially private stochastic gradient descent (DP-SGD), is to modify the gradients used in stochastic gradient descent (SGD), which lies at the core of almost all deep learning algorithms. Models trained with DP-SGD provide provable differential privacy guarantees for their input data. There are two modifications made to the vanilla SGD algorithm:

1. First, the sensitivity of each gradient needs to be bounded. In other words, you need to limit how much each individual training point sampled in a minibatch can influence gradient computations and the resulting updates applied to model parameters. This can be done by *clipping* each gradient computed on each training point.
2. Second, *random noise* is sampled and added to the clipped gradients. The noise makes it statistically hard to tell whether a particular data point was included in the training dataset by comparing the updates SGD applies when it operates with or without that data point.
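
The two modifications can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the TensorFlow Privacy implementation: the helper names `clip_gradient` and `dp_sgd_gradient` and the toy gradients are made up for this example, and gradients are plain lists of floats.

```python
import math
import random


def clip_gradient(grad, l2_norm_clip):
    """Scale grad down so that its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, l2_norm_clip) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is tied to the clipping norm.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [v / len(clipped) for v in noisy]


# Hypothetical gradients for a batch of three examples and two parameters.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
rng = random.Random(0)
update = dp_sgd_gradient(grads, l2_norm_clip=1.0, noise_multiplier=0.5, rng=rng)
```

The first and third toy gradients have norms above 1.0 and are scaled down, while the second is left untouched. Only after clipping does the noise get added, so its scale can be set relative to the largest contribution any single example can make.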


This tutorial uses [tf.keras](https://www.tensorflow.org/guide/keras) to train a convolutional neural network (CNN) to recognize handwritten digits with the DP-SGD optimizer provided by the TensorFlow Privacy library. TensorFlow Privacy provides code that wraps an existing TensorFlow optimizer to create a variant that implements DP-SGD.

## Setup

In [1]:
# In a Jupyter notebook, the exclamation mark (!) indicates 
# that this line is executed as a shell command rather than Python code.
# 
# Here we are installing two specific Python packages:
#   1) tensorflow-privacy (version 0.8.12)
#   2) dp_accounting (version 0.4.3)
#
# The --user flag installs the packages into the user's home directory 
# rather than at the system level, preventing potential permission issues.
#
# The --no-deps (no dependencies) flag tells pip not to install any 
# additional packages that these libraries might depend on. We might do this 
# if we want to strictly manage dependency versions ourselves. 
#
# Note: In a managed notebook environment like Vertex AI, ensure that your 
# environment is set up to allow user installations before running this command.

!pip install --user --no-deps tensorflow-privacy==0.8.12 dp_accounting==0.4.3


Collecting tensorflow-privacy==0.8.12
  Downloading tensorflow_privacy-0.8.12-py3-none-any.whl.metadata (962 bytes)
Collecting dp_accounting==0.4.3
  Downloading dp_accounting-0.4.3-py3-none-any.whl.metadata (1.8 kB)
Downloading tensorflow_privacy-0.8.12-py3-none-any.whl (405 kB)
Downloading dp_accounting-0.4.3-py3-none-any.whl (104 kB)
Installing collected packages: tensorflow-privacy, dp_accounting
Successfully installed dp_accounting-0.4.3 tensorflow-privacy-0.8.12
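
To confirm which versions ended up in the environment, a small check like the following may help. It is a sketch that relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is made up for illustration, and the package names are copied from the `pip` command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for package in ("tensorflow-privacy", "dp_accounting"):
    print(package, installed_version(package))
```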


Begin by importing the necessary libraries:

In [2]:
# We import the os and warnings modules. 
#   - The 'os' module provides functions for interacting with the operating system.
#   - The 'warnings' module helps manage and filter Python warning messages.
import os
import warnings

# We set an environment variable 'TF_CPP_MIN_LOG_LEVEL' to '2'.
# This environment variable is used by TensorFlow to control the verbosity 
# of log messages printed to the console.
# Possible values and their effects:
#   0: All logs are shown (includes debug messages).
#   1: Filter out INFO logs.
#   2: Filter out INFO and WARNING logs (only errors are shown).
#   3: Filter out all logs (errors, warnings, and info).
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# We use the 'warnings.filterwarnings' function to ignore all warning messages.
# This can help keep the notebook output clean, but be cautious because 
# ignoring warnings might mean missing important information about potential issues.
warnings.filterwarnings("ignore")


In [3]:
# Import the numpy library and alias it as np.
# This is a common Python convention so that when we use numpy, 
# we can easily refer to it as np instead of typing numpy every time.
import numpy as np

# Import TensorFlow and alias it as tf.
# TensorFlow is a popular machine learning framework developed by Google.
import tensorflow as tf

# Here we are setting the TensorFlow logger's log level to "ERROR". 
# This means that only error messages will be printed, 
# and warnings or informational messages will be suppressed.
tf.get_logger().setLevel("ERROR")


Import TensorFlow Privacy.

In [4]:
# -------------------------------------------------------------------------
# DIFFERENTIAL PRIVACY
# -------------------------------------------------------------------------
# Differential Privacy is a framework that helps ensure the privacy of
# individual data points in a dataset. The main idea is to limit how much
# any single data point (e.g., a person's record) can affect the outcome
# of a computation, such as the training of a machine learning model.
#
# In practice, one way to achieve differential privacy is through
# Differentially Private Stochastic Gradient Descent (DP-SGD). DP-SGD
# modifies the standard SGD procedure by:
#   1) Clipping the gradients for each individual sample or microbatch
#      (so that no single data point can excessively influence the update).
#   2) Adding random noise to those clipped gradients to mask the 
#      contribution of individual samples.
#
# These steps ensure that the final trained model does not overly 
# "memorize" any single data record, thereby protecting the privacy of 
# individuals in the dataset. 
#
# -------------------------------------------------------------------------
# EPSILON (ε) AND DELTA (δ)
# -------------------------------------------------------------------------
# When discussing differential privacy, you will often hear about ε (epsilon)
# and δ (delta). They represent the "privacy budget" of your training:
#
#   - ε (epsilon) is the core measure of how much your model's outputs
#     could potentially differ if a single individual’s data were removed.
#     A smaller ε means stronger privacy guarantees, but potentially 
#     lower model accuracy.
#
#   - δ (delta) is a small probability that allows the differential privacy
#     guarantee to be broken. In simpler terms, it captures rare "worst-case"
#     scenarios. Typically, δ is set to be smaller than 1/N, where N is 
#     the size of the dataset.
#
# -------------------------------------------------------------------------
# compute_dp_sgd_privacy
# -------------------------------------------------------------------------
# The 'compute_dp_sgd_privacy' function in 'tensorflow_privacy' helps 
# analyze the privacy budget (ε, δ) given:
#   - The number of training steps (how many times the model updates),
#   - The batch size and total dataset size (to understand the sampling rate),
#   - The noise multiplier (how much noise is added to the gradients),
#   - And sometimes the clipping norm (how much the gradients are clipped).
#
# By providing these hyperparameters, 'compute_dp_sgd_privacy' can 
# estimate the overall privacy budget (ε, δ) used by your model. This 
# helps you know how well your model is protecting individuals' data.
#
# -------------------------------------------------------------------------
# IMPORT STATEMENTS
# -------------------------------------------------------------------------

# The 'tensorflow_privacy' library extends TensorFlow with tools 
# that enable differentially private training of machine learning models.
import tensorflow_privacy

# Here, we specifically import the 'compute_dp_sgd_privacy' module
# from the 'privacy.analysis' package within 'tensorflow_privacy'. 
# Its functions will be used to estimate the (ε, δ) privacy guarantees
# after training a model with DP-SGD.
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy


## Load and pre-process the dataset

Load the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and prepare the data for training.

In [5]:
# -----------------------------------------------------------------------------
# MNIST DATASET LOADING AND PREPROCESSING
# -----------------------------------------------------------------------------
# The MNIST dataset consists of 70,000 handwritten digit images (28x28 pixels),
# each labeled with its corresponding digit (0 through 9).
# Typically, 60,000 images are used for training, and 10,000 for testing.

import numpy as np
import tensorflow as tf

# Load the MNIST dataset using TensorFlow Keras utilities.
# The returned objects are tuples (images, labels) for both training and testing.
# train -> (train_data, train_labels)
# test  -> (test_data, test_labels)
train, test = tf.keras.datasets.mnist.load_data()

# We then unpack the tuples into separate variables for clarity.
train_data, train_labels = train
test_data, test_labels = test

# -----------------------------------------------------------------------------
# DATA NORMALIZATION (SCALING PIXELS)
# -----------------------------------------------------------------------------
# The pixel values in the MNIST images are originally integers in the range [0, 255].
# We convert them to float32 (a common data type for deep learning) and scale 
# them to the range [0, 1]. This makes training more stable and efficient.
train_data = np.array(train_data, dtype=np.float32) / 255.0
test_data = np.array(test_data, dtype=np.float32) / 255.0

# -----------------------------------------------------------------------------
# RESHAPING DATA FOR CONVOLUTIONAL NETWORKS
# -----------------------------------------------------------------------------
# If we're planning to use Convolutional Neural Networks (CNNs), we often need
# the data to have a shape of (samples, height, width, channels). 
# Since MNIST is grayscale, channels = 1.
# Here, each image is 28x28 pixels, so we reshape the data accordingly.
train_data = train_data.reshape(train_data.shape[0], 28, 28, 1)
test_data = test_data.reshape(test_data.shape[0], 28, 28, 1)

# -----------------------------------------------------------------------------
# CONVERTING LABELS TO CATEGORICAL (ONE-HOT) ENCODING
# -----------------------------------------------------------------------------
# MNIST labels are integers from 0 to 9. Many deep learning frameworks
# expect labels in a one-hot encoded format, where each label is represented 
# by a vector of length 10 (for digits 0 to 9). 
# For example, the digit 2 would become [0, 0, 1, 0, 0, 0, 0, 0, 0, 0].
train_labels = np.array(train_labels, dtype=np.int32)
test_labels = np.array(test_labels, dtype=np.int32)
train_labels = tf.keras.utils.to_categorical(train_labels, num_classes=10)
test_labels = tf.keras.utils.to_categorical(test_labels, num_classes=10)

# -----------------------------------------------------------------------------
# ASSERTIONS FOR DATA VALIDATION
# -----------------------------------------------------------------------------
# We use Python assert statements to verify that our data has been 
# scaled correctly to the range [0, 1].
# These checks will raise an error if the condition is not satisfied.
assert train_data.min() == 0.0
assert train_data.max() == 1.0
assert test_data.min() == 0.0
assert test_data.max() == 1.0


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


## Define the hyperparameters
Set the hyperparameter values for the learning model.


DP-SGD has three general hyperparameters and three privacy-specific hyperparameters that you must tune:

**General hyperparameters**

1. `epochs` (int) - The number of complete passes of the training data through the algorithm. More epochs increase the privacy risk, since the model is trained on the same data point multiple times.
2. `batch_size` (int) - Batch size affects different aspects of DP-SGD training. For instance, increasing the batch size could reduce the amount of noise added during training under the same privacy guarantee, which reduces the training variance.
3. `learning_rate` (float) - This hyperparameter already exists in vanilla SGD. The higher the learning rate, the more each update matters. If the updates are noisy (such as when the additive noise is large compared to the clipping threshold), a low learning rate may help the training procedure converge. 

**Privacy-specific hyperparameters**
1. `l2_norm_clip` (float) - The maximum Euclidean (L2) norm of each gradient that is applied to update model parameters. This hyperparameter is used to bound the optimizer's sensitivity to individual training points.
2. `noise_multiplier` (float) - The ratio of the noise standard deviation to the clipping norm, which determines the amount of noise sampled and added to gradients during training. Generally, more noise results in better privacy (often, but not necessarily, at the expense of lower utility).
3. `microbatches` (int) - Each batch of data is split into smaller units called microbatches. By default, each microbatch should contain a single training example. This allows us to clip gradients on a per-example basis rather than after they have been averaged across the minibatch. This in turn decreases the (negative) effect of clipping on the signal found in the gradient and typically maximizes utility. However, computational overhead can be reduced by increasing the size of microbatches to include more than one training example. The average gradient across these multiple training examples is then clipped. The total number of examples consumed in a batch, i.e., one step of gradient descent, remains the same. The number of microbatches should evenly divide the batch size.
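
The relationships among these hyperparameters can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper names `microbatch_size` and `noise_stddev` are hypothetical, and the numbers passed in are example values.

```python
def microbatch_size(batch_size, num_microbatches):
    """Number of examples whose averaged gradient is clipped together."""
    if batch_size % num_microbatches != 0:
        raise ValueError("num_microbatches must evenly divide batch_size")
    return batch_size // num_microbatches


def noise_stddev(l2_norm_clip, noise_multiplier):
    """Standard deviation of the Gaussian noise added to the clipped gradients."""
    return noise_multiplier * l2_norm_clip


per_example = microbatch_size(batch_size=32, num_microbatches=32)  # 1 example each
grouped = microbatch_size(batch_size=32, num_microbatches=8)       # 4 examples each
sigma = noise_stddev(l2_norm_clip=1.0, noise_multiplier=0.5)       # 0.5
```

With as many microbatches as examples in the batch, each microbatch holds one example and clipping is per example. Fewer microbatches means several examples are averaged before clipping.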


Use the hyperparameter values below to obtain a reasonably accurate model (95% test accuracy):

In [6]:
# -----------------------------------------------------------------------------
# HYPERPARAMETERS FOR TRAINING
# -----------------------------------------------------------------------------
# 1) epochs: How many times the model will see the entire training dataset.
# 2) batch_size: Number of samples processed before the model updates its parameters.
# 3) learning_rate: Controls how big a step the training algorithm takes 
#    when updating model parameters.
#
# In differential privacy terms, we need three additional parameters:
# 4) l2_norm_clip: Maximum L2 norm to which gradients are clipped to limit 
#    the influence of any single example.
# 5) noise_multiplier: Amount of noise added to the clipped gradients 
#    to mask contributions of individual data points.
# 6) num_microbatches: The number of microbatches each batch is split into.
#    Each microbatch's gradient is clipped separately and then aggregated.
#    Setting it equal to the batch size puts one example in each microbatch,
#    so gradients are clipped per example.
# -----------------------------------------------------------------------------

epochs = 1
batch_size = 32
learning_rate = 0.25

l2_norm_clip = 1.0
noise_multiplier = 0.5
num_microbatches = 32  # Same as the batch size (i.e., one example per microbatch)

# -----------------------------------------------------------------------------
# VALIDATION: BATCH SIZE VERSUS MICROBATCHES
# -----------------------------------------------------------------------------
# For DP-SGD to function properly, the batch size should be an integer multiple 
# of the number of microbatches. If it's not, we raise a ValueError.
#
# In this example, num_microbatches = batch_size, so batch_size % num_microbatches 
# should be 0. If it isn't, there's likely a mismatch in how you've set up your 
# microbatching.
# -----------------------------------------------------------------------------
if batch_size % num_microbatches != 0:
    raise ValueError(
        "Batch size should be an integer multiple of the number of microbatches."
    )


## Build the model

Define a convolutional neural network as the learning model. 

In [7]:
# -----------------------------------------------------------------------------
# MODEL DEFINITION: A SIMPLE CONVOLUTIONAL NEURAL NETWORK (CNN)
# -----------------------------------------------------------------------------
# We use tf.keras.Sequential to build a feed-forward model layer by layer.
# Each layer is applied in sequence to the output of the previous layer.
# This CNN architecture is suitable for image classification on MNIST.
# -----------------------------------------------------------------------------

import tensorflow as tf

model = tf.keras.Sequential(
    [
        # ---------------------------------------------------------------------
        # Conv2D Layer 1:
        #   - 16 filters: The number of different kernel "patterns" the layer 
        #     will learn to detect.
        #   - Kernel size of 8 (i.e., 8x8 filters).
        #   - strides=2: The filter moves 2 pixels at a time along width and height, 
        #     reducing spatial dimensions faster.
        #   - padding="same": Pads the input so the output size remains 
        #     (input_size / stride) for each dimension (if perfectly divisible).
        #   - activation="relu": A common non-linear activation function 
        #     (Rectified Linear Unit).
        #   - input_shape=(28, 28, 1): The shape of each MNIST image 
        #     (28x28, single grayscale channel).
        # ---------------------------------------------------------------------
        tf.keras.layers.Conv2D(
            16,         # Number of filters
            8,          # Kernel size
            strides=2,  
            padding="same",
            activation="relu",
            input_shape=(28, 28, 1),
        ),

        # ---------------------------------------------------------------------
        # MaxPool2D Layer 1:
        #   - pool_size=2: Takes a 2x2 window and outputs the maximum value 
        #     within that window, reducing spatial dimensions by half (if stride=2).
        #   - strides=1 here means the pooling window moves 1 step at a time,
        #     which can help extract slightly overlapping features.
        # ---------------------------------------------------------------------
        tf.keras.layers.MaxPool2D(
            pool_size=2, 
            strides=1
        ),

        # ---------------------------------------------------------------------
        # Conv2D Layer 2:
        #   - 32 filters, each of size 4x4.
        #   - strides=2: Further reduces the spatial dimensions.
        #   - padding="valid": No padding is added, so the output size 
        #     shrinks based on the kernel size.
        #   - activation="relu": Same ReLU activation for non-linearity.
        # ---------------------------------------------------------------------
        tf.keras.layers.Conv2D(
            32,
            4, 
            strides=2,
            padding="valid",
            activation="relu"
        ),

        # ---------------------------------------------------------------------
        # MaxPool2D Layer 2:
        #   - Another max pooling layer to reduce spatial size 
        #     (height/width of the feature maps).
        #   - pool_size=2 with strides=1 again.
        # ---------------------------------------------------------------------
        tf.keras.layers.MaxPool2D(
            pool_size=2, 
            strides=1
        ),

        # ---------------------------------------------------------------------
        # Flatten Layer:
        #   - Converts the 2D feature maps (plus channel dimension) into 
        #     a 1D vector, preparing the data for the Dense (fully connected) layers.
        # ---------------------------------------------------------------------
        tf.keras.layers.Flatten(),

        # ---------------------------------------------------------------------
        # Dense Layer 1:
        #   - 32 neurons, each with 'relu' activation.
        #   - This layer learns non-linear combinations of features extracted 
        #     by the convolutional layers.
        # ---------------------------------------------------------------------
        tf.keras.layers.Dense(
            32, 
            activation="relu"
        ),

        # ---------------------------------------------------------------------
        # Dense Layer 2 (Output Layer):
        #   - 10 neurons, corresponding to the 10 classes of digits (0-9).
        #   - 'softmax' activation ensures each of the 10 outputs is between 0 and 1, 
        #     summing up to 1. This can be interpreted as the probability of each class.
        # ---------------------------------------------------------------------
        tf.keras.layers.Dense(
            10, 
            activation="softmax"
        ),
    ]
)
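
To follow how the feature maps shrink through the network, the spatial sizes can be worked out by hand. The sketch below assumes the usual output-size rules for `"same"` and `"valid"` padding and the default `"valid"` padding of `MaxPool2D`; the helper `output_size` and the layer labels are made up for this illustration.

```python
import math


def output_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer along one dimension."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


# (label, kernel size, stride, padding) for the four spatial layers above.
layers = [
    ("conv2d_1", 8, 2, "same"),
    ("max_pool_1", 2, 1, "valid"),
    ("conv2d_2", 4, 2, "valid"),
    ("max_pool_2", 2, 1, "valid"),
]

size = 28  # MNIST images are 28x28
sizes = {}
for name, kernel, stride, padding in layers:
    size = output_size(size, kernel, stride, padding)
    sizes[name] = size

# The second convolution has 32 filters, so Flatten sees size * size * 32 values.
flattened_units = size * size * 32
```

Under these rules the side length goes 28, 14, 13, 5, 4, so the first `Dense` layer receives 512 inputs.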


Define the optimizer and loss function for the learning model. Compute the loss as a vector of losses per-example rather than as the mean over a minibatch to support gradient manipulation over each training point. 

In [8]:
# -----------------------------------------------------------------------------
# DIFFERENTIALLY PRIVATE OPTIMIZER SETUP
# -----------------------------------------------------------------------------
# A key component of differentially private stochastic gradient descent (DP-SGD)
# is the use of an optimizer that can handle gradient clipping and noise addition
# to each gradient update. 
#
# Here, we use 'DPKerasSGDOptimizer' from the 'tensorflow_privacy' library. 
# It overrides how gradients are calculated and applies the differential 
# privacy steps under the hood.

optimizer = tensorflow_privacy.DPKerasSGDOptimizer(
    l2_norm_clip=l2_norm_clip,           # Gradient clipping threshold
    noise_multiplier=noise_multiplier,   # Noise level added to clipped gradients
    num_microbatches=num_microbatches,   # Number of microbatches per batch
    learning_rate=learning_rate,         # Standard SGD learning rate
)

# -----------------------------------------------------------------------------
# LOSS FUNCTION
# -----------------------------------------------------------------------------
# We use 'CategoricalCrossentropy' for multi-class classification problems 
# (digits 0 through 9). 
# 
# Notice that 'reduction=tf.losses.Reduction.NONE' is used because, in DP-SGD,
# we often compute the loss separately for each microbatch so that we can 
# clip each sample or microbatch's gradient individually. Then, we combine 
# the gradient updates and add noise. 
#
# If we used 'reduction="sum"' or 'reduction="mean"', we would lose the 
# fine-grained control needed for differential privacy, because those 
# modes aggregate the loss before we can apply clipping and noise 
# at the microbatch level.

loss = tf.keras.losses.CategoricalCrossentropy(
    reduction=tf.losses.Reduction.NONE
)
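
To see what an unreduced loss looks like, here is a small plain-Python sketch. It ignores the numerical safeguards a real Keras loss applies, and the helper name, labels, and predicted probabilities are hypothetical values chosen for illustration.

```python
import math


def categorical_crossentropy(one_hot_label, probabilities):
    """Cross-entropy of one example: minus the log-probability of the true class."""
    return -sum(
        y * math.log(p) for y, p in zip(one_hot_label, probabilities) if y > 0
    )


# Two hypothetical examples with three classes each.
labels = [[1, 0, 0], [0, 1, 0]]
predictions = [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]]

# No reduction: one loss value per example, which is what DP-SGD needs
# in order to compute and clip a gradient per example (or microbatch).
per_example_losses = [
    categorical_crossentropy(y, p) for y, p in zip(labels, predictions)
]

# Mean reduction: a single scalar, with the per-example structure lost.
mean_loss = sum(per_example_losses) / len(per_example_losses)
```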


## Train the model


In [9]:
# -----------------------------------------------------------------------------
# COMPILING THE MODEL
# -----------------------------------------------------------------------------
# 'model.compile' configures the learning process. Here we specify:
#   1) optimizer: The object that applies the gradient updates. In this case, 
#      the DP-SGD optimizer from 'tensorflow_privacy', which clips and adds 
#      noise to gradients for differential privacy.
#   2) loss: The objective function used to measure the difference between 
#      the model's predictions and the true labels. We use 
#      'CategoricalCrossentropy' with NO reduction for DP-SGD.
#   3) metrics: Additional statistics we want to track. Here, we track "accuracy",
#      which measures the percentage of correctly classified samples.
model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

# -----------------------------------------------------------------------------
# TRAINING THE MODEL
# -----------------------------------------------------------------------------
# 'model.fit' starts the actual training process. Key parameters include:
#   - train_data, train_labels: The inputs (images) and targets (digit labels).
#   - epochs: Number of times the entire training dataset passes through 
#     the network.
#   - validation_data: A separate dataset (test_data, test_labels) used to 
#     evaluate how well the model generalizes, without updating weights.
#   - batch_size: The number of samples processed before the DP-SGD optimizer 
#     updates the model parameters. This corresponds to the "batch_size" 
#     we set earlier, which is also the same as 'num_microbatches' in this example.
#
# The method returns a History object that contains information about 
# training and validation performance (e.g., loss and accuracy per epoch).
model.fit(
    train_data,
    train_labels,
    epochs=epochs,
    validation_data=(test_data, test_labels),
    batch_size=batch_size,
)




<keras.callbacks.History at 0x7fc4d4d33790>

## Measure the differential privacy guarantee

Perform a privacy analysis to measure the DP guarantee achieved by a training algorithm. Knowing the level of DP achieved enables the objective comparison of two training runs to determine which of the two is more privacy-preserving. At a high level, the privacy analysis measures how much a potential adversary can improve their guess about properties of any individual training point by observing the outcome of the training procedure (e.g., model updates and parameters). 


This guarantee is sometimes referred to as the **privacy budget**. A lower privacy budget bounds more tightly an adversary's ability to improve their guess. This ensures a stronger privacy guarantee. Intuitively, this is because it is harder for a single training point to affect the outcome of learning: for instance, the information contained in the training point cannot be memorized by the ML algorithm and the privacy of the individual who contributed this training point to the dataset is preserved.

In this tutorial, the privacy analysis is performed in the framework of Rényi Differential Privacy (RDP), which is a relaxation of pure DP based on [this paper](https://arxiv.org/abs/1702.07476) that is particularly well suited for DP-SGD.
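
As a rough illustration of how an RDP guarantee turns into an $(\epsilon, \delta)$ statement, the sketch below handles the simplest case: a Gaussian mechanism with sensitivity 1 that is applied to the full dataset, with no subsampling. It uses the Gaussian RDP bound and the basic RDP-to-DP conversion from the RDP paper linked above. It is not the accounting TensorFlow Privacy performs for DP-SGD, which also exploits minibatch sampling; the function names and the grid of orders are assumptions made for this example.

```python
import math


def gaussian_rdp(alpha, noise_multiplier, steps=1):
    """RDP of order alpha for `steps` composed Gaussian releases (sensitivity 1)."""
    # RDP guarantees add up under composition, hence the factor `steps`.
    return steps * alpha / (2.0 * noise_multiplier ** 2)


def epsilon_from_rdp(noise_multiplier, delta, steps=1, orders=None):
    """Convert the RDP curve to epsilon at a fixed delta, minimizing over orders."""
    if orders is None:
        orders = [1.0 + k / 10.0 for k in range(1, 1000)]
    return min(
        gaussian_rdp(alpha, noise_multiplier, steps)
        + math.log(1.0 / delta) / (alpha - 1.0)
        for alpha in orders
    )


single_release = epsilon_from_rdp(noise_multiplier=4.0, delta=1e-5)
ten_releases = epsilon_from_rdp(noise_multiplier=4.0, delta=1e-5, steps=10)
```

Composing more releases raises the resulting epsilon, which mirrors why training for more steps spends more privacy budget.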


Two metrics are used to express the DP guarantee of an ML algorithm:

1. Delta ($\delta$) - Bounds the probability of the privacy guarantee not holding. A rule of thumb is to set it to be less than the inverse of the size of the training dataset. In this tutorial, it is set to $10^{-5}$ as the MNIST dataset has 60,000 training points.
2. Epsilon ($\epsilon$) - This is the privacy budget. It measures the strength of the privacy guarantee (or maximum tolerance for revealing information on input data) by bounding how much the probability of a particular model output can vary by including (or excluding) a single training point. A smaller value for $\epsilon$ implies a better privacy guarantee. However, the $\epsilon$ value is only an upper bound and a large value could still mean good privacy in practice.

For more detail about the mathematical definition of $(\epsilon, \delta)$-differential privacy, see the original [DP-SGD paper](https://arxiv.org/pdf/1607.00133.pdf).
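
For reference, the standard form of that definition can be written in this tutorial's notation as follows. The symbols $M$, $D$, $D'$, and $S$ are introduced here for illustration. A randomized training algorithm $M$ is $(\epsilon, \delta)$-differentially private if, for any two datasets $D$ and $D'$ that differ in a single training example and for any set of possible outcomes $S$,

$$\Pr[M(D) \in S] \le e^{\epsilon} \, \Pr[M(D') \in S] + \delta.$$

Setting $\delta = 0$ recovers pure $\epsilon$-differential privacy, and a smaller $\epsilon$ forces the two output distributions closer together.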

TensorFlow Privacy provides a tool, `compute_dp_sgd_privacy`, to compute the value of $\epsilon$ given a fixed value of $\delta$ and the following hyperparameters from the training process:

1. The total number of points in the training data, `n`.
2. The `batch_size`.
3. The `noise_multiplier`.
4. The number of `epochs` of training.
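
These inputs matter because, together, they determine how often each example can be touched during training. The sketch below derives two such quantities for this tutorial's settings (60,000 examples, batch size 32, 1 epoch, $\delta = 10^{-5}$). The helper name `accounting_inputs` is hypothetical, and the code is only an approximation of the bookkeeping, not the library's own implementation.

```python
import math


def accounting_inputs(n, batch_size, epochs):
    """Return the per-step sampling rate and the total number of training steps."""
    sampling_rate = batch_size / n
    steps = math.ceil(epochs * n / batch_size)
    return sampling_rate, steps


n, batch_size, epochs, delta = 60000, 32, 1, 1e-5
sampling_rate, steps = accounting_inputs(n, batch_size, epochs)

# Rule of thumb from above: delta should be below the inverse of the dataset size.
delta_below_inverse_n = delta < 1.0 / n
```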

In [10]:
# -----------------------------------------------------------------------------
# COMPUTING THE PRIVACY STATEMENT
# -----------------------------------------------------------------------------
# The 'compute_dp_sgd_privacy_statement' function from 'tensorflow_privacy' 
# produces a human-readable message about the (ε, δ) guarantees of a 
# differentially private training configuration.
#
# PARAMETERS:
#   - number_of_examples:     The total number of training samples (e.g., 60,000 for MNIST).
#   - batch_size:            How many samples are processed in one training step.
#   - noise_multiplier:      The amount of noise added to the gradients in DP-SGD.
#   - used_microbatching:    Whether microbatches larger than a single example were used.
#                            'False' here because each microbatch holds one example
#                            (num_microbatches equals batch_size).
#   - num_epochs:            How many times the entire dataset is traversed during training.
#   - delta:                 The δ (delta) parameter in differential privacy, 
#                            typically set to a small number (like 1e-5 or 1 / (training set size)).
#
# The returned statement reports the effective ε (epsilon) for the given
# DP-SGD hyperparameters at the chosen δ, letting you know the approximate
# privacy budget spent after training completes. It lists ε under more than
# one assumption about how examples are sampled into batches.
# -----------------------------------------------------------------------------

dpsgd_statement = compute_dp_sgd_privacy.compute_dp_sgd_privacy_statement(
    number_of_examples=train_data.shape[0],  # total training examples (e.g., 60,000)
    batch_size=batch_size,                  # batch size used during training
    noise_multiplier=noise_multiplier,      # noise added to gradients
    used_microbatching=False,               # 'False': each microbatch holds one example
    num_epochs=epochs,                      # how many epochs were trained
    delta=1e-5                              # small delta for DP (1e-5 is commonly used)
)

# The returned string describes the resulting (ε, δ) privacy guarantee. 
# We print it out to see how strong our differential privacy is, 
# given the chosen hyperparameters.
print(dpsgd_statement)


DP-SGD performed over 60000 examples with 32 examples per iteration, noise
multiplier 0.5 for 1 epochs without microbatching, and no bound on number of
examples per user.

This privacy guarantee protects the release of all model checkpoints in addition
to the final model.

Example-level DP with add-or-remove-one adjacency at delta = 1e-05 computed with
RDP accounting:
    Epsilon with each example occurring once per epoch:        10.726
    Epsilon assuming Poisson sampling (*):                      3.800

No user-level privacy guarantee is possible without a bound on the number of
examples per user.

(*) Poisson sampling is not usually done in training pipelines, but assuming
that the data was randomly shuffled, it is believed the actual epsilon should be
closer to this value than the conservative assumption of an arbitrary data
order.



The tool reports the $\epsilon$ values for the hyperparameters chosen above at $\delta=10^{-5}$: 10.726 when each example occurs once per epoch, and 3.800 under the Poisson sampling assumption.

Copyright 2024 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License


---

**Explanation**:

1. **What is being computed?**  
   - The statement shows the privacy guarantees when using DP-SGD on 60,000 training examples (MNIST) with:
     - A batch size of 32.  
     - A noise multiplier of 0.5.  
     - 1 epoch of training (the model saw each image once).

2. **Example-Level DP**  
   - *Example-level* differential privacy means each **individual image (example)** is protected. When you see “no user-level privacy guarantee,” it indicates there’s no upper limit on how many images could belong to the same user (i.e., a single user could contribute multiple images).

3. **Epsilon ($\epsilon$) and delta ($\delta$)**  
   - The privacy level is measured by $\epsilon$ and $\delta$.  
   - **$\delta = 10^{-5}$** is a small probability that the worst-case privacy guarantee might fail.  
   - Two different epsilon values are shown:
     - **Epsilon = 10.726** under the conservative assumption of arbitrary data order.  
     - **Epsilon = 3.800** under the assumption of Poisson sampling (each example is included in each batch independently at random).  
   - A **lower** epsilon implies a **stronger** privacy guarantee.

4. **Why Two Epsilon Values?**  
   - DP-SGD accounting can be done in different ways:
     - **Conservative assumption**: Data might be arranged in the worst possible order for privacy.  
     - **Poisson sampling assumption**: The data was well-shuffled, and each example is included in a batch with some probability. This typically yields a **lower** (better) epsilon in practice.

5. **No Bound on the Number of Examples per User**  
   - Without a limit on how many samples each user can contribute, the statement notes there is no *user-level* differential privacy guarantee.  
   - *User-level DP* would require each *user* to have a maximum number of samples in the dataset (so that bounding the user’s contributions could further protect that user’s data).

6. **Interpretation**  
   - Even though $\epsilon$ might be somewhat large (like 10.726), the presence of noise still obscures individual contributions compared to training without DP.  
   - If you want **stronger privacy guarantees** (i.e., a smaller epsilon), you could:
     - Increase the noise multiplier,  
     - Reduce the number of epochs, or  
     - Use smaller batch sizes (lowering the sampling rate `batch_size / n`).

Overall, this statement helps you **quantify** how strong your model’s privacy protection is under various assumptions about how data is sampled and organized.
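
The difference between the two sampling assumptions in item 4 can be illustrated with a toy sketch. The dataset size, helper names, and seed below are made up for this example; real training pipelines shuffle and batch inside the framework.

```python
import random


def poisson_batches(n, sampling_rate, steps, rng):
    """Each step includes every example independently with probability sampling_rate."""
    return [
        [i for i in range(n) if rng.random() < sampling_rate]
        for _ in range(steps)
    ]


def shuffled_batches(n, batch_size, rng):
    """One epoch of fixed-size batches drawn from a single random permutation."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, n, batch_size)]


rng = random.Random(0)
n, batch_size = 640, 32  # small hypothetical dataset
poisson = poisson_batches(n, batch_size / n, steps=n // batch_size, rng=rng)
shuffled = shuffled_batches(n, batch_size, rng)

# Batch sizes vary under Poisson sampling and are fixed when shuffling.
poisson_sizes = [len(batch) for batch in poisson]
shuffled_sizes = [len(batch) for batch in shuffled]
```

With shuffling, every example appears exactly once per epoch and every batch has the same size. Under Poisson sampling an example may appear in several batches or in none, and batch sizes fluctuate around `batch_size`.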
