https://jonathan-hui.medium.com/tensorflow-eager-execution-v-s-graph-tf-function-6edaa870b1f1

By default, TensorFlow 2.x uses eager execution, which has some disadvantages compared with graph mode.

Graph Mode is a powerful feature in which your computations are first defined as a computation graph and only then executed. It was the default in TensorFlow 1.x and is still available in TensorFlow 2.x through @tf.function.

Key benefits:

- Performance Optimization: Since the graph is static, TensorFlow can apply various optimizations like constant folding, kernel fusion, etc., improving execution speed.

- Deployment Efficiency: Graphs can be serialized, exported, and run on different platforms (e.g., mobile devices or TPUs).

- Parallel Execution: Because the whole graph is known in advance, TensorFlow can run independent operations in parallel and distribute computation across devices efficiently.

- Error Checking: Graphs are analyzed before execution, allowing TensorFlow to catch potential issues early.
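To make the deployment point concrete, here is a minimal sketch that exports a traced function as a SavedModel and loads it back. It assumes TensorFlow 2.x; the Adder class, its add method, and the temporary export directory are illustrative choices, not part of the original notebook.

```python
import tempfile

import tensorflow as tf


class Adder(tf.Module):
    # A fixed input_signature means exactly one graph is traced and exported
    @tf.function(input_signature=[
        tf.TensorSpec(shape=[], dtype=tf.int32),
        tf.TensorSpec(shape=[], dtype=tf.int32),
    ])
    def add(self, x, y):
        return x + y


# Serialize the traced graph (not the Python source) to disk
export_dir = tempfile.mkdtemp()
tf.saved_model.save(Adder(), export_dir)

# The restored object runs the saved graph; the Adder class is not needed to load it
restored = tf.saved_model.load(export_dir)
result = restored.add(tf.constant(3), tf.constant(5))
print(result.numpy())  # 8
```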


For debugging or quick prototyping, Eager Execution is generally more intuitive, but Graph Mode excels in production scenarios requiring efficiency.

Graph Mode: When you use @tf.function, TensorFlow traces the function the first time it is called with a given input signature (the dtypes and shapes of tensor arguments, and the values of plain Python arguments), creating a static computation graph. If the function computes gradients (for example with tf.GradientTape), the graph contains the operations for both the forward and backward passes (matrix multiplications, activations, gradients, etc.). Once traced, the graph's structure is fixed and is reused on later calls with a matching signature; a call with a new signature triggers a new trace. The values computed during the forward and backward passes, however, still depend on the input data and the current model parameters.
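A quick way to observe when tracing happens is a Python side effect, which runs only while a graph is being traced and not when an existing graph is reused. This sketch assumes TensorFlow 2.x with default tf.function settings; the square function and the trace_count counter are hypothetical names.

```python
import tensorflow as tf

trace_count = 0


@tf.function
def square(x):
    global trace_count
    trace_count += 1  # Plain Python: executes only during tracing
    return x * x


square(tf.constant(2.0))         # first call: traces a graph for float32 scalars
square(tf.constant(3.0))         # same dtype and shape: reuses that graph
square(tf.constant([1.0, 2.0]))  # new shape: triggers a second trace
print(trace_count)  # 2
```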

Eager Execution: When you don't use @tf.function, TensorFlow runs in eager execution mode: each operation is evaluated immediately as Python reaches it and returns a concrete value, without first building a graph. This is more flexible and easier to debug, but it is usually slower than graph execution because every operation is dispatched from Python.
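The difference can be checked directly with tf.executing_eagerly(), which reports whether the current code runs eagerly. A minimal sketch, assuming TensorFlow 2.x; the function name inside_graph is illustrative.

```python
import tensorflow as tf

# Eager: the result is a concrete value as soon as the line runs
x = tf.constant([1.0, 2.0])
y = x * 2
print(y.numpy())               # [2. 4.]
print(tf.executing_eagerly())  # True


@tf.function
def inside_graph():
    # While tracing, operations are recorded into a graph instead of being run eagerly
    return tf.executing_eagerly()


print(bool(inside_graph()))    # False
```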

-> SIMPLE EXAMPLES

In [1]:
import tensorflow as tf



In [2]:


@tf.function  # Converts to Graph Mode
def add_tensors(x, y):
    return x + y

# Example usage
a = tf.constant(3)
b = tf.constant(5)
print(add_tensors(a, b))  # Output: 8


tf.Tensor(8, shape=(), dtype=int32)


In [4]:
# See what the generated code looks like
print(tf.autograph.to_code(add_tensors.python_function))

def tf__add_tensors(x, y):
    with ag__.FunctionScope('add_tensors', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()
        try:
            do_return = True
            retval_ = ag__.ld(x) + ag__.ld(y)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)



In [5]:
# View the graph in TensorBoard

In [7]:
import tensorflow as tf
import os

# Define your function with @tf.function to enable graph mode
@tf.function
def add_tensors(x, y):
    return x + y

# Create a directory for TensorBoard logs (no error if it already exists)
log_dir = "logs/graph"
os.makedirs(log_dir, exist_ok=True)

# Create a SummaryWriter to log the graph
writer = tf.summary.create_file_writer(log_dir)

# Log the graph to the writer
with writer.as_default():
    # Turn tracing on before the first call, so the trace of add_tensors is recorded
    tf.summary.trace_on(graph=True)
    a = tf.constant(3)
    b = tf.constant(5)
    add_tensors(a, b)
    # Export the recorded graph to the default writer
    tf.summary.trace_export(name="model_trace", step=0)

# Start TensorBoard (in the terminal or an IDE with TensorBoard support)
# Run the following in your terminal:
# tensorboard --logdir=logs/graph


Open http://localhost:6006 in your browser to view the graph under the Graphs tab in TensorBoard.
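If TensorBoard is not available, a rough alternative is to inspect the traced graph from Python through a concrete function. This sketch assumes TensorFlow 2.x; the exact operation names in the printed list can vary between versions.

```python
import tensorflow as tf


@tf.function
def add_tensors(x, y):
    return x + y


# Trace (or reuse) the graph for two int32 scalars without running it
concrete = add_tensors.get_concrete_function(
    tf.TensorSpec(shape=[], dtype=tf.int32),
    tf.TensorSpec(shape=[], dtype=tf.int32),
)

# List the operation types recorded in the graph: inputs, the addition, the output
op_types = [op.type for op in concrete.graph.get_operations()]
print(op_types)
```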


### Example 2

Control flow statements that are intuitive to write in eager mode can look much more complex in graph mode, because AutoGraph rewrites them into graph-compatible functions. The next examples show this:

In [9]:
# Simple function that returns the square if the input is greater than 12
@tf.function
def f(x):
    if x > 12:
        x = x * x
    return x

print(tf.autograph.to_code(f.python_function))

def tf__f(x):
    with ag__.FunctionScope('f', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()

        def get_state():
            return (x,)

        def set_state(vars_):
            nonlocal x
            (x,) = vars_

        def if_body():
            nonlocal x
            x = ag__.ld(x) * ag__.ld(x)

        def else_body():
            nonlocal x
            pass
        ag__.if_stmt(ag__.ld(x) > 12, if_body, else_body, get_state, set_state, ('x',), 1)
        try:
            do_return = True
            retval_ = ag__.ld(x)
        except:
            do_return = False
            raise
        return fscope.ret(retval_, do_return)
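For a tensor condition, the converted if statement behaves like an explicit tf.cond. The sketch below compares the AutoGraph version with a hand-written tf.cond; the function names are illustrative, and it assumes tensor (not Python int) inputs, since a Python int condition would simply be evaluated once during tracing.

```python
import tensorflow as tf


@tf.function
def f_autograph(x):
    if x > 12:  # AutoGraph turns this into a graph conditional for a tensor x
        x = x * x
    return x


@tf.function
def f_explicit(x):
    # Hand-written equivalent: both branches are built into the graph,
    # and the condition selects one of them at run time
    return tf.cond(x > 12, lambda: x * x, lambda: x)


for value in [5, 13]:
    t = tf.constant(value)
    print(f_autograph(t).numpy(), f_explicit(t).numpy())
# 5 5
# 169 169
```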



In [13]:
# this raises an error
import tensorflow as tf

@tf.function
def simple_model(x):
    weights = tf.Variable([[0.1, 0.2], [0.3, 0.4]])
    biases = tf.Variable([0.5, 0.6])
    output = tf.matmul(x, weights) + biases
    return tf.nn.relu(output)

# Example input data
x = tf.constant([[1.0, 2.0]])

# Forward pass
print(simple_model(x))


ValueError: in user code:

    File "/tmp/ipykernel_197006/4086060191.py", line 6, in simple_model  *
        weights = tf.Variable([[0.1, 0.2], [0.3, 0.4]])

    ValueError: tf.function only supports singleton tf.Variables created on the first call. Make sure the tf.Variable is only created once or created outside tf.function. See https://www.tensorflow.org/guide/function#creating_tfvariables for more information.


The error occurs because @tf.function requires tf.Variable objects to be created either outside the decorated function or only once, on the first call. The Python body of simple_model runs again every time the function is traced, and on the first call tf.function traces a variable-creating function a second time to check that no new variables appear. Since weights and biases are created inside the body, that second trace tries to create them again, which is not allowed.

To fix this, create the tf.Variable objects once, outside the function, and use them inside the decorated function.

In the fixed version below:

The structure of the graph is defined once, when you call simple_model(x, weights, biases) for the first time.

The values change between calls: x changes as you pass new data, and weights and biases change as training updates them.

The graph stays the same, but the values of activations and gradients depend on the data and the current model parameters.

In [12]:
import tensorflow as tf

# Define the model with tf.function
@tf.function
def simple_model(x, weights, biases):
    output = tf.matmul(x, weights) + biases
    return tf.nn.relu(output)

# Create variables outside the tf.function
weights = tf.Variable([[0.1, 0.2], [0.3, 0.4]])
biases = tf.Variable([0.5, 0.6])

# Example input data
x = tf.constant([[1.0, 2.0]])

# Forward pass
print(simple_model(x, weights, biases))


tf.Tensor([[1.2 1.6]], shape=(1, 2), dtype=float32)
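Another common way to satisfy the create-once rule is to let an object own the variables: create them in __init__ and only read them inside the tf.function. This is a sketch using tf.Module; the class and attribute names are hypothetical, and the values are the same as in the example above.

```python
import numpy as np
import tensorflow as tf


class SimpleDense(tf.Module):
    def __init__(self):
        super().__init__()
        # Variables are created exactly once, when the object is constructed
        self.w = tf.Variable([[0.1, 0.2], [0.3, 0.4]])
        self.b = tf.Variable([0.5, 0.6])

    @tf.function
    def __call__(self, x):
        # Only reads the existing variables, so tracing never creates new ones
        return tf.nn.relu(tf.matmul(x, self.w) + self.b)


layer = SimpleDense()
out = layer(tf.constant([[1.0, 2.0]]))
print(out)
```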


In [23]:
# divisible by 4, 6 or both

@tf.function
def check_divisibility(max_num):
    counter = 0
    for num in range(max_num):
        if num % 4 == 0 and num % 6 == 0:
            print('Divisible by 4 and 6')
        elif num % 4 == 0:
            print('Divisible by 4')
        elif num % 6 == 0:
            print('Divisible by 6')
        else:
            print(num)
        counter += 1
    return counter

# Example usage
check_divisibility(10)

# Print the Python code that AutoGraph generates from the function
print(tf.autograph.to_code(check_divisibility.python_function))


Divisible by 4 and 6
1
2
3
Divisible by 4
5
Divisible by 6
7
Divisible by 4
9
def tf__check_divisibility(max_num):
    with ag__.FunctionScope('check_divisibility', 'fscope', ag__.ConversionOptions(recursive=True, user_requested=True, optional_features=(), internal_convert_user_code=True)) as fscope:
        do_return = False
        retval_ = ag__.UndefinedReturnValue()
        counter = 0

        def get_state_3():
            return (counter,)

        def set_state_3(vars_):
            nonlocal counter
            (counter,) = vars_

        def loop_body(itr):
            nonlocal counter
            num = itr

            def get_state_2():
                return ()

            def set_state_2(block_vars):
                pass

            def if_body_2():
                ag__.ld(print)('Divisible by 4 and 6')

            def else_body_2():

                def get_state_1():
                    return ()

                def set_state_1(block_vars):
                    pass
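In check_divisibility, max_num is a Python int, so the loop is unrolled while the function is traced and the print calls run only at that point. Passing a tensor instead makes AutoGraph build a real graph loop (tf.while_loop) with a graph conditional inside it. A minimal sketch, assuming TensorFlow 2.x; count_divisible is a hypothetical variant that counts matches and uses tf.print so the message is emitted on every matching iteration.

```python
import tensorflow as tf


@tf.function
def count_divisible(max_num):
    counter = tf.constant(0)
    # tf.range over a tensor bound: AutoGraph converts this loop into tf.while_loop
    for num in tf.range(max_num):
        # Tensor condition: converted into a graph conditional (tf.cond)
        if tf.logical_or(num % 4 == 0, num % 6 == 0):
            counter += 1
            tf.print("divisible:", num)  # runs on every matching iteration
    return counter


result = count_divisible(tf.constant(10))
print(result.numpy())  # 4 (for 0, 4, 6 and 8)
```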


## PART 2

In [24]:
## print
def f(x):
    print("Traced with", x)

for i in range(5):
    f(2)
    
f(3)

Traced with 2
Traced with 2
Traced with 2
Traced with 2
Traced with 2
Traced with 3


In [25]:
@tf.function
def f(x):
    print("Traced with", x)

for i in range(5):
    f(2)
    
f(3)

Traced with 2
Traced with 3


In [26]:
@tf.function
def f(x):
    print("Traced with", x)
    # added tf.print
    tf.print("Executed with", x)

for i in range(5):
    f(2)
    
f(3)

Traced with 2
Executed with 2
Executed with 2
Executed with 2
Executed with 2
Executed with 2
Traced with 3
Executed with 3


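Python's print runs only while the function is being traced, so it appears once per trace (for 2, and again for 3), whereas tf.print is a graph operation that executes on every call, including inside loops.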
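The reason f(3) was traced again is that plain Python arguments are part of the trace signature: each new value produces a new graph, while tensors of the same dtype and shape share one graph. A sketch under TensorFlow 2.x defaults; traced_with is a hypothetical list used only to record traces.

```python
import tensorflow as tf

traced_with = []


@tf.function
def f(x):
    traced_with.append(x)  # Python side effect: recorded only while tracing
    tf.print("Executed with", x)


f(2)
f(3)               # new Python value: new trace
f(tf.constant(2))  # first int32 scalar tensor: new trace
f(tf.constant(3))  # same dtype and shape: the graph is reused
print(len(traced_with))  # 3
```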

Summary of Key Benefits:

- Performance Boost: Faster execution through graph optimizations, better memory management, and reduced Python overhead.

- Hardware Optimization: Efficient execution on GPUs/TPUs with hardware-specific optimizations.

- Portability: Easier deployment and execution across different platforms and hardware.

- Reduced Python Overhead: Operations are compiled into a static graph, so the Python interpreter no longer has to dispatch each operation individually on every call.

- Efficient Backpropagation: Gradient computation becomes part of the optimized graph, leading to faster training.

By using @tf.function, you're enabling TensorFlow to optimize execution for performance, which is especially important for complex models and large datasets.
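To illustrate the backpropagation point, gradients can be computed with tf.GradientTape inside a tf.function, so the forward and backward passes end up in the same graph. A minimal sketch with a single hypothetical weight w and a squared-error loss, assuming TensorFlow 2.x:

```python
import tensorflow as tf

w = tf.Variable(3.0)  # created outside the tf.function, as required


@tf.function
def loss_and_grad(x, target):
    with tf.GradientTape() as tape:
        loss = (w * x - target) ** 2
    # d(loss)/dw = 2 * (w * x - target) * x
    return loss, tape.gradient(loss, w)


loss, grad = loss_and_grad(tf.constant(2.0), tf.constant(1.0))
print(loss.numpy(), grad.numpy())  # 25.0 20.0
```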

In [30]:
import tensorflow as tf
import numpy as np
import time

# Define a simple model for binary classification
class SimpleModel(tf.keras.Model):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')
        self.dense2 = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        x = self.dense1(inputs)
        return self.dense2(x)

# Loss and optimization functions
loss_fn = tf.keras.losses.BinaryCrossentropy()
optimizer = tf.keras.optimizers.Adam()

# Generate random data for training (1000 samples, 20 features)
X_train = np.random.randn(1000, 20).astype(np.float32)
y_train = np.random.randint(0, 2, size=(1000, 1)).astype(np.float32)

# Create an instance of the model
model = SimpleModel()

# Training step without @tf.function (eager mode)
def train_step_no_graph(x, y):
    with tf.GradientTape() as tape:
        predictions = model(x)  # sigmoid outputs (probabilities), not logits
        loss = loss_fn(y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

# Training step with @tf.function (graph mode)
@tf.function
def train_step_with_graph(x, y):
    with tf.GradientTape() as tape:
        predictions = model(x)  # sigmoid outputs (probabilities), not logits
        loss = loss_fn(y, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss

# Helper function to train for one epoch.
# zip() yields one row at a time, so every training step uses a batch of size 1.
def train_epoch(train_data, train_labels, train_step_fn):
    total_loss = 0
    for x_batch, y_batch in zip(train_data, train_labels):
        # Ensure x_batch has the correct shape (batch_size, num_features)
        if x_batch.ndim == 1:
            x_batch = np.expand_dims(x_batch, axis=0)
        # Ensure y_batch has the correct shape (batch_size, 1)
        if y_batch.ndim == 1:
            y_batch = np.expand_dims(y_batch, axis=0)
        loss = train_step_fn(x_batch, y_batch)
        total_loss += loss
    return total_loss / len(train_data)

# Fix the input shape by ensuring X_train has a batch dimension
if X_train.ndim == 1:
    X_train = X_train.reshape(-1, 20)  # Ensures input is (batch_size, 20)

# Train in eager mode (plain Python function)
start_time = time.time()
train_epoch(X_train, y_train, train_step_no_graph)
end_time = time.time()
no_graph_duration = end_time - start_time
print(f"Training with no graph mode took {no_graph_duration:.4f} seconds.")

# Train in graph mode (tf.function); this timing also includes the one-time tracing cost
start_time = time.time()
train_epoch(X_train, y_train, train_step_with_graph)
end_time = time.time()
graph_duration = end_time - start_time
print(f"Training with graph mode took {graph_duration:.4f} seconds.")

# Compare the results
print(f"Speedup from graph mode: {no_graph_duration / graph_duration:.2f}x")

Training with no graph mode took 13.6618 seconds.
Training with graph mode took 1.1947 seconds.
Speedup from graph mode: 11.44x
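A fairer comparison warms up the tf.function before timing it and times both versions on identical work. This sketch uses timeit on a hypothetical function made of many small operations, where per-operation Python overhead matters most; the speedup you observe depends on hardware and workload, so no number is claimed here.

```python
import timeit

import numpy as np
import tensorflow as tf


def many_small_ops(x):
    # 50 tiny element-wise steps: in eager mode each one is a separate Python dispatch
    for _ in range(50):
        x = tf.nn.relu(x - 0.01)
    return x


graph_fn = tf.function(many_small_ops)

x = tf.random.uniform([100, 100])
graph_fn(x)  # warm-up: trace once so tracing is excluded from the timing

eager_time = timeit.timeit(lambda: many_small_ops(x), number=20)
graph_time = timeit.timeit(lambda: graph_fn(x), number=20)
print(f"eager: {eager_time:.4f}s  graph: {graph_time:.4f}s")

# Both versions should compute the same result
same = np.allclose(many_small_ops(x).numpy(), graph_fn(x).numpy())
```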
