Merged
Original file line number Diff line number Diff line change
@@ -7,7 +7,7 @@ cascade:

minutes_to_complete: 30

who_is_this_for: This learning path is intended for software developers deploying and optimizing TensorFlow workloads on Linux/Arm64 environments, specifically using Google Cloud C4A virtual machines powered by Axion processors.
who_is_this_for: This is an introductory topic for software developers deploying and optimizing TensorFlow workloads on Arm64 Linux environments, specifically using Google Cloud C4A virtual machines powered by Axion processors.

learning_objectives:
- Provision an Arm-based SUSE SLES virtual machine on Google Cloud (C4A with Axion processors)
@@ -32,7 +32,7 @@ armips:
tools_software_languages:
- TensorFlow
- Python
- tf.keras
- Keras

operatingsystems:
- Linux
Original file line number Diff line number Diff line change
@@ -1,63 +1,53 @@
---
title: TensorFlow Baseline Testing on Google Axion C4A Arm Virtual Machine
title: Test TensorFlow baseline performance on Google Axion C4A Arm virtual machines
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## TensorFlow Baseline Testing on GCP SUSE VMs
This section helps you check if TensorFlow is properly installed and working on your **Google Axion C4A Arm64 VM**. You will run small tests to confirm that your CPU can perform TensorFlow operations correctly.
## Perform baseline testing

This section helps you verify that TensorFlow is properly installed and working on your Google Axion C4A VM. You'll run tests to confirm that your CPU can perform TensorFlow operations correctly.

### Verify Installation
This command checks if TensorFlow is installed correctly and prints its version number.
### Check available devices

```console
python -c "import tensorflow as tf; print(tf.__version__)"
```

You should see an output similar to:
```output
2.20.0
```

### List Available Devices
This command shows which hardware devices TensorFlow can use — like CPU or GPU. On most VMs, you’ll see only CPU listed.
This command shows which hardware devices TensorFlow can use, such as CPU or GPU. On most VMs, you'll see only CPU listed:

```console
python -c "import tensorflow as tf; print(tf.config.list_physical_devices())"
```

You should see an output similar to:
The output is similar to:

```output
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
```
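
Because this learning path targets Arm64, it can also help to confirm the machine architecture from Python before running further tests. The following is a minimal sketch using only the standard library; the helper name `is_arm64` and the set `ARM64_NAMES` are illustrative and not part of the learning path.

```python
import platform

# Architecture strings commonly reported for 64-bit Arm systems
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine=None):
    """Return True if the given (or current) machine string names a 64-bit Arm CPU."""
    name = machine if machine is not None else platform.machine()
    return name.lower() in ARM64_NAMES


print("machine:", platform.machine(), "| Arm64:", is_arm64())
```

On an Arm64 Linux VM, `platform.machine()` reports `aarch64`.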

### Run a Simple Computation
This test multiplies two large matrices to check that TensorFlow computations work correctly on your CPU and measures how long it takes.
### Run a computation test

```python
This test multiplies two large matrices to verify that TensorFlow computations work correctly on your CPU and measures execution time:

```console
python -c "import tensorflow as tf; import time;
a = tf.random.uniform((1000,1000)); b = tf.random.uniform((1000,1000));
start = time.time(); c = tf.matmul(a,b); end = time.time();
print('Computation time:', end - start, 'seconds')"
```
- This checks **CPU speed** and the correctness of basic operations.
- Note the **computation time** as your baseline.

You should see an output similar to:
This checks CPU performance for basic operations and provides a baseline measurement.

The output is similar to:

```output
Computation time: 0.008263111114501953 seconds
```
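
A single timed call can include one-time setup costs, so repeating the measurement after a warm-up run often gives a steadier baseline. The sketch below shows one possible way to do that. The names `time_callable` and `workload` are hypothetical, and the pure-Python `workload` only stands in for a TensorFlow operation so that the example runs without extra packages.

```python
import statistics
import time


def time_callable(fn, warmup=1, repeats=5):
    """Return per-call durations in seconds for fn(), measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up runs are executed but not timed
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Hypothetical stand-in for an operation such as tf.matmul(a, b)
    return sum(i * i for i in range(100_000))


durations = time_callable(workload, warmup=1, repeats=5)
print(f"median: {statistics.median(durations):.6f} s over {len(durations)} runs")
```

To time the matrix multiplication from this section, you could pass `lambda: tf.matmul(a, b)` in place of `workload`.
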
### Test Neural Network Execution
Create a new file for testing a simple neural network using your text editor ("edit" is shown as an example):

```console
edit test_nn.py
```
This opens a new Python file where you’ll write a short TensorFlow test program.
Paste the code below into the `test_nn.py` file:
### Test neural network execution

Use a text editor to create a new file named `test_nn.py` for testing a simple neural network.

Add the following code to create and train a basic neural network using random data:

```python
import keras
@@ -80,21 +70,21 @@ model.compile(optimizer='adam', loss='mse')
# Train for 1 epoch
model.fit(x, y, epochs=1, batch_size=32)
```
This script creates and trains a simple neural network using random data — just to make sure TensorFlow’s deep learning functions work properly.

**Run the Script**
This script creates a simple neural network to verify that TensorFlow's deep learning functions work properly on the Arm platform.

### Run the neural network test

Execute the script with Python:
Execute the script:

```console
python test_nn.py
```

**Output**
TensorFlow displays training progress similar to:

TensorFlow will print training progress, like:
```output
32/32 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 0.1024
```

This confirms that TensorFlow is working properly on your Arm64 VM.
This confirms that TensorFlow is working correctly on your Arm VM and can perform both basic computations and neural network training.
Original file line number Diff line number Diff line change
@@ -1,39 +1,44 @@
---
title: TensorFlow Benchmarking
title: Benchmark TensorFlow model performance using tf.keras
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Benchmark TensorFlow models

## TensorFlow Benchmarking with tf.keras
This guide benchmarks multiple TensorFlow models (ResNet50, MobileNetV2, and InceptionV3) using dummy input data. It measures average inference time and throughput for each model running on the CPU.
This section benchmarks multiple TensorFlow models (ResNet50, MobileNetV2, and InceptionV3) using dummy input data. You'll measure average inference time and throughput for each model running on the CPU.

`tf.keras` is **TensorFlows high-level API** for building, training, and benchmarking deep learning models. It provides access to **predefined architectures** such as **ResNet**, **MobileNet**, and **Inception**, making it easy to evaluate model performance on different hardware setups like **CPU**, **GPU**, or **TPU**.
`tf.keras` is TensorFlow's high-level API for building, training, and benchmarking deep learning models. It provides access to predefined architectures such as ResNet, MobileNet, and Inception, making it easy to evaluate model performance on different hardware setups.

### Activate your TensorFlow virtual environment
This step enables your isolated Python environment (`tf-venv`) where TensorFlow is installed. It ensures that all TensorFlow-related packages and dependencies run in a clean, controlled setup without affecting system-wide Python installations:
### Activate your virtual environment

Enable your isolated Python environment where TensorFlow is installed:

```console
source ~/tf-venv/bin/activate
python -c "import tensorflow as tf; print(tf.__version__)"
```
### Install required packages for the benchmark
Here, you install TensorFlow 2.20.0 and NumPy, the core libraries needed for model creation, computation, and benchmarking. NumPy supports efficient numerical operations, while TensorFlow handles deep learning workloads (these packages are likely already installed FYI):

This ensures that all TensorFlow-related packages run in a clean, controlled setup without affecting system-wide Python installations.

### Install required packages

Install TensorFlow and NumPy for model creation and benchmarking:

```console
pip install tensorflow==2.20.0 numpy
```

### Create a Python file named tf_cpu_benchmark.py:
This step creates a Python script (`tf_cpu_benchmark.py`) using your text editor (showing "edit" as an example below) that will run TensorFlow model benchmarking tests:
These packages are likely already installed from the previous installation steps. NumPy supports efficient numerical operations, while TensorFlow handles deep learning workloads.

```console
edit tf_cpu_benchmark.py
```
### Create the benchmark script

Use an editor to create a Python script named `tf_cpu_benchmark.py` that will run TensorFlow model benchmarking tests.

Add the following code to benchmark three different model architectures:

Paste the following code:
```python
import tensorflow as tf
import time
@@ -66,24 +71,19 @@ for name, constructor in models.items():
print(f"{name} average inference time per batch: {avg_time:.4f} seconds")
print(f"{name} throughput: {throughput:.2f} images/sec")
```
- **Import libraries** – Loads TensorFlow and `time` for model creation and timing.
- **Define models** – Lists three TensorFlow Keras models: **ResNet50**, **MobileNetV2**, and **InceptionV3**.
- **Set parameters** – Configures `batch_size = 32` and runs each model **50 times** for stable benchmarking.
- **Create model instances** – Initializes each model **without pretrained weights** for fair CPU testing.
- **Generate dummy input** – Creates random data shaped like real images **(224×224×3)** for inference.
- **Warm-up phase** – Runs one inference to **stabilize model graph and memory usage**.
- **Benchmark loop** – Measures total time for 50 runs and calculates **average inference time per batch**.
- **Compute throughput** – Calculates how many **images per second** the model can process.
- **Print results** – Displays **average inference time and throughput** for each model.

This script creates model instances without pretrained weights for fair CPU testing, generates random image data for inference, includes a warm-up phase to stabilize model performance, and measures inference time over 50 runs to calculate average performance and throughput.

### Run the benchmark

Execute the benchmarking script:

```console
python tf_cpu_benchmark.py
```

You should see an output similar to:
The output is similar to:

```output
Benchmarking ResNet50...
ResNet50 average inference time per batch: 1.2051 seconds
@@ -98,23 +98,18 @@ InceptionV3 average inference time per batch: 0.8971 seconds
InceptionV3 throughput: 35.67 images/sec
```

### Benchmark Metrics Explanation
### Understand the results

The benchmark provides key performance metrics. Average inference time per batch measures how long it takes to process one batch of input data, with lower values indicating faster performance. Throughput shows how many images the model can process per second, with higher values indicating better efficiency.

- **Average Inference Time per Batch (seconds):** Measures how long it takes to process one batch of input data. Lower values indicate faster inference performance.
- **Throughput (images/sec):** Indicates how many images the model can process per second. Higher throughput means better overall efficiency.
- **Model Type:** Refers to the neural network architecture used for testing (e.g., ResNet50, MobileNetV2, InceptionV3). Each model has different computational complexity.
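
The relationship between the two metrics can be checked by hand. This sketch assumes throughput is computed as batch size divided by average time per batch, using the batch size of 32 described for the script. The function name `throughput` is illustrative, and the timing values are the ones reported in the sample output.

```python
BATCH_SIZE = 32  # batch size described for the benchmark script

# Average inference time per batch (seconds), taken from the sample output
avg_times = {
    "ResNet50": 1.2051,
    "MobileNetV2": 0.2909,
    "InceptionV3": 0.8971,
}


def throughput(batch_size, avg_time_per_batch):
    """Images processed per second for one batch size and average batch time."""
    return batch_size / avg_time_per_batch


for name, avg_time in avg_times.items():
    print(f"{name}: {throughput(BATCH_SIZE, avg_time):.2f} images/sec")
```

Small differences from the reported throughput, such as for MobileNetV2, are expected because the printed times are rounded to four decimal places.
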
### Performance summary

### Benchmark summary
Results from the earlier run on the `c4a-standard-4` (4 vCPU, 16 GB memory) Arm64 VM in GCP (SUSE):
The following table shows results from running the benchmark on a `c4a-standard-4` (4 vCPU, 16 GB memory) aarch64 VM on Google Cloud running SUSE:

| **Model** | **Average Inference Time per Batch (seconds)** | **Throughput (images/sec)** |
|------------------|-----------------------------------------------:|-----------------------------:|
| **ResNet50** | 1.2051 | 26.55 |
| **MobileNetV2** | 0.2909 | 110.02 |
| **InceptionV3** | 0.8971 | 35.67 |
| Model | Average Inference Time per Batch (seconds) | Throughput (images/sec) |
|-------------|-------------------------------------------:|------------------------:|
| ResNet50 | 1.2051 | 26.55 |
| MobileNetV2 | 0.2909 | 110.02 |
| InceptionV3 | 0.8971 | 35.67 |

- **Arm64 VMs show strong performance** for lightweight CNNs like **MobileNetV2**, achieving over **110 images/sec**, indicating excellent optimization for CPU-based inference.
- **Medium-depth models** like **InceptionV3** maintain a **balanced trade-off between accuracy and latency**, confirming consistent multi-core utilization on Arm.
- **Heavier architectures** such as **ResNet50** show expected longer inference times but still deliver **stable throughput**, reflecting good floating-point efficiency.
- **Arm64 provides energy-efficient yet competitive performance**, particularly for **mobile, quantized, or edge AI workloads**.
- **Overall**, Arm64 demonstrates that **TensorFlow workloads can run efficiently on cloud-native ARM processors**, making them a **cost-effective and power-efficient alternative** for AI inference and model prototyping.
The results show the highest throughput for the lightweight MobileNetV2, at just over 110 images/sec on the aarch64 platform. InceptionV3 falls in the middle at about 36 images/sec, and the heavier ResNet50 takes the longest per batch, at about 27 images/sec. Because the models run with random weights on dummy input, these figures describe inference speed only, not accuracy. Overall, the results indicate that TensorFlow workloads run efficiently on Arm processors, which supports their use as a cost-effective alternative for AI inference tasks.
Original file line number Diff line number Diff line change
@@ -6,45 +6,51 @@ weight: 4
layout: learningpathall
---

## TensorFlow Installation on GCP SUSE VM
TensorFlow is a widely used **open-source machine learning library** developed by Google, designed for building and deploying ML models efficiently. On Arm64 SUSE VMs, TensorFlow can run on CPU natively, or on GPU if available.
## Install TensorFlow on Google Axion C4A

### System Preparation
Update the system and install Python3 and pip3 to a compatible version with tensorflow (please enter "y" when prompted to confirm the install):
TensorFlow is an open-source machine learning library developed by Google for building and deploying ML models efficiently. On aarch64 SUSE VMs, TensorFlow runs natively on the CPU, or on a GPU if one is available.

### Update your system

Refresh the package repositories and install Python 3.11 with pip and virtual environment support:

```console
sudo zypper refresh
sudo zypper install python311 python311-pip python311-venv
```
This ensures your system is up-to-date and installs Python with the essential tools required for TensorFlow setup.

**Verify Python version:**
Enter "y" when prompted to confirm the installation. These commands install the essential tools required for TensorFlow setup.

### Verify Python installation

Confirm that Python and pip are correctly installed and identify their versions to ensure compatibility with TensorFlow requirements.
Confirm that Python and pip are correctly installed:

```console
python3.11 --version
pip3 --version
```

Your particular versions may vary a bit but typically your version output should resemble:
The output is similar to:

```output
Python 3.11.10
pip 22.3.1 from /usr/lib/python3.11/site-packages/pip (python 3.11)
```

### Create a Virtual Environment (Recommended)
Set up an isolated Python environment (`tf-venv`) so that TensorFlow and its dependencies don’t interfere with system-wide packages or other projects.
### Create a virtual environment

Set up an isolated Python environment to keep TensorFlow dependencies separate from system packages:

```console
python3.11 -m venv tf-venv
source tf-venv/bin/activate
```
Create and activate an isolated Python environment to keep TensorFlow dependencies separate from system packages.

This creates and activates a virtual environment named `tf-venv` that prevents package conflicts.
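
To confirm from within Python that the virtual environment is active, you can compare the interpreter's prefix with its base prefix. This is a minimal sketch using the standard library; the helper name `in_virtual_environment` is illustrative.

```python
import sys


def in_virtual_environment(prefix=sys.prefix, base_prefix=sys.base_prefix):
    """Return True when the interpreter prefix differs from the base installation."""
    return prefix != base_prefix


print("virtual environment active:", in_virtual_environment())
print("interpreter:", sys.executable)
```

When `tf-venv` is active, the interpreter path printed on the second line should point inside the `tf-venv` directory.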

### Upgrade pip
Upgrade pip to the latest version for smooth and reliable package installation.

Upgrade pip to the latest version for reliable package installation:

```console
pip3 install --upgrade pip
@@ -58,21 +64,23 @@ pip3 install tensorflow==2.20.0
```

{{% notice Note %}}
TensorFlow 2.18.0 introduced compatibility with NumPy 2.0, incorporating its updated type promotion rules and improved numerical precision.
You can view [this release note](https://blog.tensorflow.org/2024/10/whats-new-in-tensorflow-218.html)
TensorFlow 2.18.0 introduced compatibility with NumPy 2.0, incorporating its updated type promotion rules and improved numerical precision. You can review [What's new in TensorFlow 2.18](https://blog.tensorflow.org/2024/10/whats-new-in-tensorflow-218.html) for more information.

The [Arm Ecosystem Dashboard](https://developer.arm.com/ecosystem-dashboard/) recommends Tensorflow version 2.18.0, the minimum recommended on the Arm platforms.
The [Arm Ecosystem Dashboard](https://developer.arm.com/ecosystem-dashboard/) lists TensorFlow 2.18.0 as the minimum recommended version on Arm platforms.
{{% /notice %}}

### Verify installation:
Run a quick Python command to check that TensorFlow was installed successfully and print the installed version number for confirmation.
### Verify the installation

Check that TensorFlow installed successfully and display the version:

```console
python -c "import tensorflow as tf; print(tf.__version__)"
```

You should see an output similar to:
The output is similar to:

```output
2.20.0
```
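
If you want to compare the installed version against the 2.18.0 minimum mentioned in the note, a small script can do so without importing TensorFlow. This is a sketch under stated assumptions: the helper names `parse_version` and `meets_minimum` are illustrative, and the parser only reads the leading `major.minor.patch` numbers.

```python
import re
from importlib import metadata

MINIMUM_VERSION = (2, 18, 0)  # minimum recommended version cited in the note


def parse_version(text):
    """Extract the leading 'major.minor.patch' numbers as a tuple of integers."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(group) for group in match.groups())


def meets_minimum(version_text, minimum=MINIMUM_VERSION):
    """Compare versions numerically, so that 2.20.0 ranks above 2.9.0."""
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("tensorflow")
    print(f"tensorflow {installed}: meets minimum = {meets_minimum(installed)}")
except metadata.PackageNotFoundError:
    print("tensorflow is not installed in this environment")
```

Comparing tuples of integers avoids the pitfall of comparing version strings directly, where `"2.20.0"` would sort before `"2.9.0"`.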
TensorFlow installation is complete. You can now go ahead with the baseline testing of TensorFlow in the next section.

Your TensorFlow installation is now complete and ready for use.