ArmDeveloperEcosystem · pareenaverma · Aug 27, 2025 · Aug 27, 2025
diff --git a/...ths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md b/...ths/servers-and-cloud-computing/tune-network-workloads-on-bare-metal/1_setup.md
@@ -11,9 +11,9 @@ layout: learningpathall
 
 There are numerous client-server and network-based workloads. Tomcat is a typical example: it provides services via HTTP/HTTPS network requests.
 
-In this section, you'll set up a benchmark environment using Apache Tomcat and `wrk2` to simulate HTTP load and evaluate performance on an Arm-based bare-metal (**__`Nvidia-Grace`__**).
+In this section, you'll set up a benchmark environment using `Apache Tomcat` and `wrk2` to simulate HTTP load and evaluate performance on an Arm-based bare-metal instance, such as **__`AWS c8g.metal-48xl`__**.
 
-## Set up the Tomcat benchmark server on **Nvidia Grace**
+## Set up the Tomcat benchmark server on **AWS c8g.metal-48xl**
 [Apache Tomcat](https://tomcat.apache.org/) is an open-source Java Servlet container that runs Java web applications, handles HTTP requests, and serves dynamic content. It supports technologies such as Servlet, JSP, and WebSocket.
 
 ## Install the Java Development Kit (JDK)
@@ -30,8 +30,8 @@ sudo apt install -y openjdk-21-jdk
 Download and extract Tomcat:
 
 ```bash
-wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.9/bin/apache-tomcat-11.0.9.tar.gz
-tar xzf apache-tomcat-11.0.9.tar.gz
+wget -c https://dlcdn.apache.org/tomcat/tomcat-11/v11.0.10/bin/apache-tomcat-11.0.10.tar.gz
+tar xzf apache-tomcat-11.0.10.tar.gz
 ```
 Alternatively, you can build Tomcat [from source](https://github.com/apache/tomcat).
 
@@ -41,7 +41,7 @@ To access the built-in examples from your local network or external IP, use a te
 
 The file is at:
 ```bash
-apache-tomcat-11.0.9/webapps/examples/META-INF/context.xml
+~/apache-tomcat-11.0.10/webapps/examples/META-INF/context.xml
 ```
 
 ```xml
@@ -60,17 +60,17 @@ To achieve maximum performance of Tomcat, the maximum number of file descriptors
 Start the server:
 
 ```bash
-ulimit -n 65535 && ./apache-tomcat-11.0.9/bin/startup.sh
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 You should see output like:
 
 ```output
-Using CATALINA_BASE:   /home/ubuntu/apache-tomcat-11.0.9
-Using CATALINA_HOME:   /home/ubuntu/apache-tomcat-11.0.9
-Using CATALINA_TMPDIR: /home/ubuntu/apache-tomcat-11.0.9/temp
+Using CATALINA_BASE:   /home/ubuntu/apache-tomcat-11.0.10
+Using CATALINA_HOME:   /home/ubuntu/apache-tomcat-11.0.10
+Using CATALINA_TMPDIR: /home/ubuntu/apache-tomcat-11.0.10/temp
 Using JRE_HOME:        /usr
-Using CLASSPATH:       /home/ubuntu/apache-tomcat-11.0.9/bin/bootstrap.jar:/home/ubuntu/apache-tomcat-11.0.9/bin/tomcat-juli.jar
+Using CLASSPATH:       /home/ubuntu/apache-tomcat-11.0.10/bin/bootstrap.jar:/home/ubuntu/apache-tomcat-11.0.10/bin/tomcat-juli.jar
 Using CATALINA_OPTS:
 Tomcat started.
 ```
@@ -132,28 +132,28 @@ ulimit -n 65535 && wrk -c32 -t16 -R50000 -d60 http://${tomcat_ip}:8080/examples/
 You should see output similar to:
 
 ```console
-Running 1m test @ http://172.26.203.139:8080/examples/servlets/servlet/HelloWorldExample
+Running 1m test @ http://172.31.46.193:8080/examples/servlets/servlet/HelloWorldExample
   16 threads and 32 connections
-  Thread calibration: mean lat.: 0.986ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.999ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.994ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.983ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.989ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.993ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.985ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.987ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.990ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.984ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.991ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.978ms, rate sampling interval: 10ms
-  Thread calibration: mean lat.: 0.976ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.381ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.626ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.020ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.578ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.166ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.275ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.454ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.655ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.334ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.089ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.365ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.382ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.342ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.349ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.023ms, rate sampling interval: 10ms
+  Thread calibration: mean lat.: 3.275ms, rate sampling interval: 10ms
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency     1.00ms  454.90us   5.09ms   63.98%
-    Req/Sec     3.31k   241.68     4.89k    63.83%
-  2999817 requests in 1.00m, 1.56GB read
-Requests/sec:  49997.08
+    Latency     1.02ms  398.88us   4.24ms   66.77%
+    Req/Sec     3.30k   210.16     4.44k    70.04%
+  2999776 requests in 1.00m, 1.56GB read
+Requests/sec:  49996.87
 Transfer/sec:     26.57MB
 ```
diff --git a/.../servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md b/.../servers-and-cloud-computing/tune-network-workloads-on-bare-metal/2_baseline.md
@@ -11,45 +11,79 @@ To achieve maximum performance, ulimit -n 65535 must be executed on both server
 {{% /notice %}}
 
 ## Optimal baseline before tuning
-- Baseline on Grace bare-metal (default configuration)
-- Baseline on Grace bare-metal (access logging disabled)
-- Baseline on Grace bare-metal (optimal thread count)
+- Align the IOMMU settings with default Ubuntu
+- Baseline on Arm Neoverse bare-metal (default configuration)
+- Baseline on Arm Neoverse bare-metal (access logging disabled)
+- Baseline on Arm Neoverse bare-metal (optimal thread count)
+
+### Align the IOMMU settings with default Ubuntu
+
+{{% notice Note %}}
+Because AWS uses a customized Ubuntu distribution, first align the IOMMU settings with the Ubuntu defaults: `iommu.strict=1` and `iommu.passthrough=0`.
+{{% /notice %}}
+
+1. To set the IOMMU defaults, use a text editor to modify the `grub` file by adding or updating the `GRUB_CMDLINE_LINUX` setting.
+
+```bash
+sudo vi /etc/default/grub
+```
+Then add or update the following line:
+```bash
+GRUB_CMDLINE_LINUX="iommu.strict=1 iommu.passthrough=0"
+```
+
+2. Update GRUB and reboot to apply the default settings.
+```bash
+sudo update-grub && sudo reboot
+```
+
+3. Verify that the default settings have been applied.
+```bash
+sudo dmesg | grep iommu
+```
+The output shows that, with these settings applied, `iommu.strict` is enabled and `iommu.passthrough` is disabled.
+```bash
+[    0.877401] iommu: Default domain type: Translated (set via kernel command line)
+[    0.877404] iommu: DMA domain TLB invalidation policy: strict mode (set via kernel command line)
+...
+```
+
+### Baseline on Arm Neoverse bare-metal (default configuration)
 
-### Baseline on Grace bare-metal (default configuration)
 {{% notice Note %}}
 To align with the typical deployment scenario of Tomcat, reserve 8 cores online and set all other cores offline
 {{% /notice %}}
 
 1. You can offline the CPU cores using the below command.
 ```bash
-for no in {8..143}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
+for no in {8..191}; do sudo bash -c "echo 0 > /sys/devices/system/cpu/cpu${no}/online"; done
 ```
 2. Use the following commands to verify that cores 0-7 are online and the remaining cores are offline.
 ```bash
 lscpu
 ```
 You can check the following information:
 ```bash
-Architecture:             aarch64
-  CPU op-mode(s):         64-bit
-  Byte Order:             Little Endian
-CPU(s):                   144
-  On-line CPU(s) list:    0-7
-  Off-line CPU(s) list:   8-143
-Vendor ID:                ARM
-  Model name:             Neoverse-V2
+Architecture:                aarch64
+  CPU op-mode(s):            64-bit
+  Byte Order:                Little Endian
+CPU(s):                      192
+  On-line CPU(s) list:       0-7
+  Off-line CPU(s) list:      8-191
+Vendor ID:                   ARM
+  Model name:                Neoverse-V2
 ...
 ```
 
-3. Use the following command on the Grace bare-metal where `Tomcat` is on
+3. Run the following commands on the Arm Neoverse bare-metal system where `Tomcat` is running:
 ```bash
-~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 4. And use the following command on the `x86_64` bare-metal where `wrk2` is on
 ```bash
-tomcat_ip=10.169.226.181
+tomcat_ip=172.31.46.193
 ```
 ```bash
 ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examples/servlets/servlet/HelloWorldExample
@@ -58,20 +92,20 @@ ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examp
 The result of default configuration is:
 ```bash
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency    13.29s     3.25s   19.07s    57.79%
-    Req/Sec   347.59    430.94     0.97k    66.67%
-  3035300 requests in 1.00m, 1.58GB read
-  Socket errors: connect 1280, read 0, write 0, timeout 21760
-Requests/sec:  50517.09
-Transfer/sec:     26.84MB
+    Latency    16.76s     6.59s   27.56s    56.98%
+    Req/Sec     1.97k   165.05     2.33k    89.90%
+  14680146 requests in 1.00m, 7.62GB read
+  Socket errors: connect 1264, read 0, write 0, timeout 1748
+Requests/sec: 244449.62
+Transfer/sec:    129.90MB
 ```
 
-### Baseline on Grace bare-metal (access logging disabled)
+### Baseline on Arm Neoverse bare-metal (access logging disabled)
 To disable the access logging, use a text editor to modify the `server.xml` file by commenting out or removing the **`org.apache.catalina.valves.AccessLogValve`** configuration.
 
 The file is at:
 ```bash
-vi ~/apache-tomcat-11.0.9/conf/server.xml
+vi ~/apache-tomcat-11.0.10/conf/server.xml
 ```
 
 The configuration is at the end of the file; comment it out or remove it.
@@ -83,10 +117,10 @@ The configuratin is at the end of the file, and common out or remove it.
 -->
 ```
 
-1. Use the following command on the Grace bare-metal where `Tomcat` is on
+1. Run the following commands on the Arm Neoverse bare-metal system where `Tomcat` is running:
 ```bash
-~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 2. And use the following command on the `x86_64` bare-metal where `wrk2` is on
@@ -97,15 +131,15 @@ ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examp
 The result of access logging disabled is:
 ```bash
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency    12.66s     3.05s   17.87s    57.47%
-    Req/Sec   433.69    524.91     1.18k    66.67%
-  3572200 requests in 1.00m, 1.85GB read
-  Socket errors: connect 1280, read 0, write 0, timeout 21760
-Requests/sec:  59451.85
-Transfer/sec:     31.59MB
+    Latency    16.16s     6.45s   28.26s    57.85%
+    Req/Sec     2.16k     5.91     2.17k    77.50%
+  16291136 requests in 1.00m, 8.45GB read
+  Socket errors: connect 0, read 0, write 0, timeout 75
+Requests/sec: 271675.12
+Transfer/sec:    144.36MB
 ```
 
-### Baseline on Grace bare-metal (optimal thread count)
+### Baseline on Arm Neoverse bare-metal (optimal thread count)
 To minimize resource contention between threads and overhead from thread context switching, the number of CPU-intensive threads in Tomcat should be aligned with the number of CPU cores.
 
 1. When using `wrk` to perform pressure testing on `Tomcat`:
@@ -115,23 +149,39 @@ top -H -p$(pgrep java)
 
 You can see the below information
 ```bash
-top - 12:12:45 up 1 day,  7:04,  5 users,  load average: 7.22, 3.46, 1.75
-Threads:  79 total,   8 running,  71 sleeping,   0 stopped,   0 zombie
-%Cpu(s):  3.4 us,  1.9 sy,  0.0 ni, 94.1 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st
-MiB Mem : 964975.5 total, 602205.6 free,  12189.5 used, 356708.3 buff/cache
-MiB Swap:      0.0 total,      0.0 free,      0.0 used. 952786.0 avail Mem
+top - 08:57:29 up 20 min,  1 user,  load average: 4.17, 2.35, 1.22
+Threads: 231 total,   8 running, 223 sleeping,   0 stopped,   0 zombie
+%Cpu(s): 31.7 us, 20.2 sy,  0.0 ni, 31.0 id,  0.0 wa,  0.0 hi, 17.2 si,  0.0 st
+MiB Mem : 386127.8 total, 380676.0 free,   4040.7 used,   2801.1 buff/cache
+MiB Swap:      0.0 total,      0.0 free,      0.0 used. 382087.0 avail Mem
 
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
-  53254 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.70 http-nio-8080-e
-  53255 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.62 http-nio-8080-e
-  53256 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.64 http-nio-8080-e
-  53258 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.62 http-nio-8080-e
-  53260 yinyu01   20   0   38.0g   1.4g  28288 R  96.7   0.1   2:30.69 http-nio-8080-e
-  53257 yinyu01   20   0   38.0g   1.4g  28288 R  96.3   0.1   2:30.59 http-nio-8080-e
-  53259 yinyu01   20   0   38.0g   1.4g  28288 R  96.3   0.1   2:30.63 http-nio-8080-e
-  53309 yinyu01   20   0   38.0g   1.4g  28288 R  95.3   0.1   2:29.69 http-nio-8080-P
-  53231 yinyu01   20   0   38.0g   1.4g  28288 S   0.3   0.1   0:00.10 VM Thread
-  53262 yinyu01   20   0   38.0g   1.4g  28288 S   0.3   0.1   0:00.12 GC Thread#2
+   4677 ubuntu    20   0   36.0g   1.4g  24452 R  89.0   0.4   1:18.71 http-nio-8080-P
+   4685 ubuntu    20   0   36.0g   1.4g  24452 R   4.7   0.4   0:04.42 http-nio-8080-A
+   4893 ubuntu    20   0   36.0g   1.4g  24452 S   3.3   0.4   0:00.60 http-nio-8080-e
+   4963 ubuntu    20   0   36.0g   1.4g  24452 S   3.3   0.4   0:00.66 http-nio-8080-e
+   4924 ubuntu    20   0   36.0g   1.4g  24452 S   3.0   0.4   0:00.59 http-nio-8080-e
+   4955 ubuntu    20   0   36.0g   1.4g  24452 S   3.0   0.4   0:00.60 http-nio-8080-e
+   5061 ubuntu    20   0   36.0g   1.4g  24452 S   3.0   0.4   0:00.61 http-nio-8080-e
+   4895 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.58 http-nio-8080-e
+   4907 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+   4940 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.58 http-nio-8080-e
+   4946 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+   4956 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.65 http-nio-8080-e
+   4959 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+   4960 ubuntu    20   0   36.0g   1.4g  24452 R   2.7   0.4   0:00.60 http-nio-8080-e
+   4962 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.57 http-nio-8080-e
+   4982 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.63 http-nio-8080-e
+   4983 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.58 http-nio-8080-e
+   4996 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.60 http-nio-8080-e
+   5033 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.59 http-nio-8080-e
+   5036 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.66 http-nio-8080-e
+   5056 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.61 http-nio-8080-e
+   5065 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.56 http-nio-8080-e
+   5068 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.61 http-nio-8080-e
+   5070 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.60 http-nio-8080-e
+   5071 ubuntu    20   0   36.0g   1.4g  24452 S   2.7   0.4   0:00.61 http-nio-8080-e
+...
 ```
 
 It can be observed that **`http-nio-8080-e`** and **`http-nio-8080-P`** threads are CPU-intensive.
@@ -141,7 +191,7 @@ To configure the `http-nio-8080-e` thread count, use a text editor to modify the
 
 The file is at:
 ```bash
-vi ~/apache-tomcat-11.0.9/conf/server.xml
+vi ~/apache-tomcat-11.0.10/conf/server.xml
 ```
 
 
@@ -164,10 +214,10 @@ vi ~/apache-tomcat-11.0.9/conf/server.xml
     />
 ```
 
-2. Use the following command on the Grace bare-metal where `Tomcat` is on
+2. Run the following commands on the Arm Neoverse bare-metal system where `Tomcat` is running:
 ```bash
-~/apache-tomcat-11.0.9/bin/shutdown.sh 2>/dev/null
-ulimit -n 65535 && ~/apache-tomcat-11.0.9/bin/startup.sh
+~/apache-tomcat-11.0.10/bin/shutdown.sh 2>/dev/null
+ulimit -n 65535 && ~/apache-tomcat-11.0.10/bin/startup.sh
 ```
 
 3. And use the following command on the `x86_64` bare-metal where `wrk2` is on
@@ -178,9 +228,9 @@ ulimit -n 65535 && wrk -c1280 -t128 -R500000 -d60 http://${tomcat_ip}:8080/examp
 The result of optimal thread count is:
 ```bash
   Thread Stats   Avg      Stdev     Max   +/- Stdev
-    Latency    24.34s     9.91s   41.81s    57.77%
-    Req/Sec     1.22k     4.29     1.23k    71.09%
-  9255672 requests in 1.00m, 4.80GB read
-Requests/sec: 154479.07
-Transfer/sec:     82.06MB
+    Latency    10.26s     4.55s   19.81s    62.51%
+    Req/Sec     2.86k    89.49     3.51k    77.06%
+  21458421 requests in 1.00m, 11.13GB read
+Requests/sec: 357835.75
+Transfer/sec:    190.08MB
 ```
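
The three updated baselines in this diff are easier to compare as relative gains. The following is a minimal sketch, not part of the patch, that computes the percentage change between the `Requests/sec` figures on the added lines. The helper name `pct_gain` is hypothetical, and the script assumes `awk` is available on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of wrk2 or Tomcat): prints the relative
# change from an old Requests/sec value to a new one, as a percentage
# rounded to one decimal place.
pct_gain() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# Requests/sec values copied from the added ("+") lines of the wrk output
# in 2_baseline.md.
default_rps=244449.62   # default configuration
nolog_rps=271675.12     # access logging disabled
threads_rps=357835.75   # optimal thread count

echo "access logging disabled vs default:          $(pct_gain "$default_rps" "$nolog_rps")%"
echo "optimal thread count vs logging disabled:    $(pct_gain "$nolog_rps" "$threads_rps")%"
echo "optimal thread count vs default:             $(pct_gain "$default_rps" "$threads_rps")%"
```

The same helper can be pointed at the removed ("-") values to compare the earlier Grace measurements, although results from different machines and runs are not directly comparable.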