There were some regressions in rls2602; trending [0] shows the most prominent testcase to be geneve 1tnl. Bisect did not replicate the old performance, but the VPP version suffix gives a hint. I believe the cause is the CSIT change [1] (picked up by the new oper branch), which switched ARM jobs to test builds compiled using VPP_PLATFORM, as opposed to downloading generic multiarch builds from packagecloud. As the same change also created big progressions (ip4base shown in [0]), I guess the trade-off is worth it for ARM users. Comparing runtime telemetry before [2] and after [3], the notable increase is ip4-receive almost tripling in Cycles/Pkt.
I am not sure what is happening in the perfmon output, so I am copying it here in case it helps VPP developers tweak the build further.
Before:
2025-12-01 00:12:24,034 - telemetry.bundle_vppctl - L1D/L2D/L3D data cache accesses and refills per packet
                                      L1D: access  refill      %*  L2D: access  refill      %*  L3D: access  refill      %*      pkts
vpp_main (0)
vpp_wk_0 (1)
geneve4-input                               38.25     .66   1.73%         9.56    1.34  14.00%         0.00    2.32   0.00%   4171120
geneve4-encap                               62.16    1.08   1.74%         9.44    1.62  17.13%         0.00    2.00   0.00%   5205592
dpdk-input                                  42.72    5.57  13.03%        17.55    1.53   8.71%         0.00    5.46   0.00%   9376456
ip4-udp-lookup                              25.39     .36   1.42%         3.73    0.00    .08%         0.00     .07   0.00%   4171120
ip4-input-no-checksum                       18.02     .66   3.68%         6.27     .33   5.19%         0.00     .56   0.00%   9376456
ip4-input                                   20.35     .37   1.80%         5.58     .09   1.57%         0.00     .13   0.00%   4170864
ip4-rewrite                                 36.22     .56   1.54%         5.41     .01    .23%         0.00     .08   0.00%  14582304
ip4-receive                                 99.72    1.84   1.85%         6.99     .04    .64%         0.00     .66   0.00%   4170864
ip4-load-balance                            17.08     .41   2.42%         3.83    0.00    .07%         0.00     .02   0.00%   4170864
ip4-lookup                                  29.11     .71   2.43%         4.48     .01    .18%         0.00     .11   0.00%  13547320
ethernet-input                              28.17    1.72   6.09%         6.67     .87  13.05%         0.00    1.40   0.00%  13547576
TwoHundredGigabitEthernet1/0/0-tx           47.02    3.86   8.21%        12.42     .87   6.99%         0.00    2.32   0.00%   4170864
TwoHundredGigabitEthernet1/0/0-output        6.32     .38   5.94%         1.51    0.00    .23%         0.00     .03   0.00%   4170864
TwoHundredGigabitEthernet1/0/1-tx           36.65    3.88  10.57%        11.78     .88   7.43%         0.00    2.09   0.00%   5205848
TwoHundredGigabitEthernet1/0/1-output        6.09     .41   6.66%         1.92    0.00    .17%         0.00     .02   0.00%   5205848
After:
2025-12-02 00:27:11,261 - telemetry.bundle_vppctl - L1D/L2D/L3D data cache accesses and refills per packet
                                      L1D: access  refill      %*  L2D: access  refill      %*  L3D: access  refill      %*      pkts
vpp_main (0)
vpp_wk_0 (1)
geneve4-input                               37.76     .32    .84%         4.45     .01    .31%         0.00     .04   0.00%   3976400
geneve4-encap                               62.15    1.05   1.69%         4.77     .97  20.36%         0.00    1.19   0.00%   4921879
dpdk-input                                  42.09    4.64  11.02%        15.03     .97   6.42%         0.00    2.91   0.00%   8898023
ip4-udp-lookup                              25.36     .16    .64%         2.65    0.00    .05%         0.00     .02   0.00%   3976400
ip4-input-no-checksum                       17.96     .34   1.90%         3.12     .01    .31%         0.00     .03   0.00%   8898023
ip4-input                                   20.34     .38   1.85%         3.20     .01    .21%         0.00     .02   0.00%   3976400
ip4-rewrite                                 36.27     .35    .96%         2.72     .01    .32%         0.00     .03   0.00%  13820283
ip4-receive                                 94.96    1.40   1.48%         5.67     .98  17.33%         0.00    1.31   0.00%   3976144
ip4-load-balance                            16.98     .21   1.25%         2.63    0.00    .07%         0.00     .01   0.00%   3976144
ip4-lookup                                  29.01     .37   1.27%         2.75    0.00    .13%         0.00     .02   0.00%  12874423
ethernet-input                              28.13    1.17   4.17%         4.84     .67  13.77%         0.00     .78   0.00%  12874423
TwoHundredGigabitEthernet1/0/0-tx           47.05    3.32   7.06%        10.79     .79   7.29%         0.00    1.54   0.00%   3976400
TwoHundredGigabitEthernet1/0/0-output        6.31     .29   4.55%          .80    0.00    .13%         0.00     .01   0.00%   3976400
TwoHundredGigabitEthernet1/0/1-tx           36.69    3.47   9.45%        10.53     .73   6.95%         0.00    1.36   0.00%   4922004
TwoHundredGigabitEthernet1/0/1-output        6.10     .38   6.25%         1.35    0.00    .03%         0.00     .01   0.00%   4922004
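
To make the two dumps easier to compare, one option is to diff them per node. Below is a minimal Python sketch, not part of CSIT or VPP. The names CacheRow, parse_rows and l2d_refill_delta are made up for illustration, and the parser assumes every data row is a node name followed by nine per-packet values and a packet count, as in the output above. Only two rows from each dump are inlined to keep it short.

```python
from typing import Dict, NamedTuple


class CacheRow(NamedTuple):
    """One perfmon row; field names are illustrative, taken from the header."""
    l1d_access: float
    l1d_refill: float
    l1d_pct: float
    l2d_access: float
    l2d_refill: float
    l2d_pct: float
    l3d_access: float
    l3d_refill: float
    l3d_pct: float
    pkts: int


def parse_rows(text: str) -> Dict[str, CacheRow]:
    """Parse perfmon cache rows, skipping headers and thread-name lines."""
    rows: Dict[str, CacheRow] = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a data row has exactly 11 whitespace-separated fields.
        if len(fields) != 11:
            continue
        name, *nums, pkts = fields
        try:
            values = [float(n.rstrip("%")) for n in nums]
            rows[name] = CacheRow(*values, int(pkts))
        except ValueError:
            continue  # Not numeric, so not a data row.
    return rows


def l2d_refill_delta(before: Dict[str, CacheRow],
                     after: Dict[str, CacheRow]) -> Dict[str, float]:
    """Change in L2D refills per packet for nodes present in both dumps."""
    return {
        name: round(after[name].l2d_refill - before[name].l2d_refill, 2)
        for name in before
        if name in after
    }


# Two rows copied from each dump above; paste the full output to compare all nodes.
BEFORE = """\
geneve4-input 38.25 .66 1.73% 9.56 1.34 14.00% 0.00 2.32 0.00% 4171120
ip4-receive 99.72 1.84 1.85% 6.99 .04 .64% 0.00 .66 0.00% 4170864
"""
AFTER = """\
geneve4-input 37.76 .32 .84% 4.45 .01 .31% 0.00 .04 0.00% 3976400
ip4-receive 94.96 1.40 1.48% 5.67 .98 17.33% 0.00 1.31 0.00% 3976144
"""

deltas = l2d_refill_delta(parse_rows(BEFORE), parse_rows(AFTER))
# Largest increase first.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {delta:+.2f}")
```

For the two rows used here, ip4-receive gains 0.94 L2D refills per packet while geneve4-input drops by 1.33, so ip4-receive may be the node worth a closer look. The same comparison could be repeated for the L1D or L3D columns by swapping the field name.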
[0] https://csit.fd.io/trending/#eNrNU8luwyAQ_RpyqcYyOMSnHpr4PyKCJ7YlGyPAzvL1gSjRxD2kUi_tgW3emzeb8GF0uPfYfzK5ZeWWibKr48aKr494zNaCMNA4DSLPGxSW63M5owow9GcJnV3DZn0ArgFDm15xHZRHGJxLMmKXZOopLDQJse2FkDeRyEM5VOQSwxEU0L_EWSZErKNTA_ruikSNJRCuY0cI4nqpHy72BV1U2Y4UXqZO8jwr8kzyrNgku6xYWd25v-nzPkzGYO-_9ftxTrVt0OCMwCPvD6bwTO-HabxN99_MSFYrM7rh_ifi3bfjCYLrVO8fphv9WuMQ
[1] 43903: feat(core): Build VPP for ARM | https://gerrit.fd.io/r/c/csit/+/43903
[2] https://logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-grc/19805863470/log.html.gz#s1-s1-s1-s3-s2-t1-k2-k9-k10-k10-k1-k1-k1-k14
[3] https://logs.fd.io/vex-yul-rot-jenkins-1/csit-vpp-perf-mrr-daily-master-2n-grc/19839837995/log.html.gz#s1-s1-s1-s3-s2-t1-k2-k9-k10-k10-k1-k1-k1-k14