VPP Missing Prefetches
VPP graph nodes make extensive use of explicit prefetching to hide dependent-read latency. In the simplest dual-loop case, a node prefetches the buffer headers and (typically) one cache line of packet data for the packets it will process in the next iteration. The rest of this page shows what happens when that prefetch block is disabled.
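For readers unfamiliar with the pattern, the following is a minimal standalone sketch of a dual loop with next-iteration prefetching. It is not VPP code: toy_buffer_t and sum_current_bytes are hypothetical names made up for illustration, and __builtin_prefetch (a GCC/Clang builtin) stands in for the vlib_prefetch_buffer_header and CLIB_PREFETCH macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; NOT the real vlib_buffer_t. */
typedef struct
{
  int16_t current_data;		/* offset of the current header in data[] */
  uint8_t data[64];		/* one cache line worth of packet data */
} toy_buffer_t;

/* Sum the byte at the current offset of every buffer. */
static uint64_t
sum_current_bytes (toy_buffer_t **bufs, size_t n)
{
  uint64_t sum = 0;
  size_t i = 0;

  /* Dual loop: handle two buffers per iteration. At least four must
     remain so that bufs[i + 2] and bufs[i + 3] are valid to prefetch. */
  while (n - i >= 4)
    {
      toy_buffer_t *b0, *b1;

      /* Prefetch next iteration: buffer headers, then packet data.
         Arguments are (address, 0 = read, 3 = high temporal locality). */
      __builtin_prefetch (bufs[i + 2], 0, 3);
      __builtin_prefetch (bufs[i + 3], 0, 3);
      __builtin_prefetch (bufs[i + 2]->data, 0, 3);
      __builtin_prefetch (bufs[i + 3]->data, 0, 3);

      /* Work on the current pair, ideally already in cache. */
      b0 = bufs[i];
      b1 = bufs[i + 1];
      sum += b0->data[b0->current_data];
      sum += b1->data[b1->current_data];
      i += 2;
    }

  /* Single loop: drain the remaining buffers without prefetching. */
  while (i < n)
    {
      toy_buffer_t *b0 = bufs[i];
      sum += b0->data[b0->current_data];
      i += 1;
    }

  return sum;
}
```

The prefetches are issued one iteration ahead so that the memory reads can overlap with the work done on the current pair. Whether that distance is enough to cover the miss latency depends on how much work each iteration does.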
Single-core, 13 MPPS offered load, i40e NICs, ~13 MPPS in+out:
vpp# show run
Name                             Clocks   Vectors/Call
FortyGigabitEthernet84/0/1-out   9.08e0   50.09
FortyGigabitEthernet84/0/1-tx    3.84e1   50.09
dpdk-input                       7.45e1   50.09
interface-output                 1.08e1   50.09
ip4-input-no-checksum            3.92e1   50.09
ip4-lookup                       3.88e1   50.09
ip4-rewrite-transit              3.43e1   50.09
The key statistic to note here: ip4-input-no-checksum costs about 39 clocks per packet (3.92e1).
Baseline "perf top" function-level profile:
14.21% libvnet.so.0.0.0 [.] ip4_input_no_checksum_avx2
14.14% libvnet.so.0.0.0 [.] ip4_lookup_avx2
14.10% vpp [.] i40e_recv_scattered_pkts_vec
12.64% libvnet.so.0.0.0 [.] ip4_rewrite_transit_avx2
10.60% libvnet.so.0.0.0 [.] dpdk_input_avx2
9.70% vpp [.] i40e_xmit_pkts_vec
4.88% libvnet.so.0.0.0 [.] dpdk_interface_tx_avx2
3.67% libvlib.so.0.0.0 [.] dispatch_node
3.25% libvnet.so.0.0.0 [.] vnet_per_buffer_interface_output_avx2
2.96% libvnet.so.0.0.0 [.] vnet_interface_output_node_no_flatten
1.85% libvlib.so.0.0.0 [.] vlib_put_next_frame
1.80% libvlib.so.0.0.0 [.] vlib_get_next_frame_internal
1.12% vpp [.] rte_delay_us_block
Now disable the prefetch block in the ip4-input-no-checksum dual loop by guarding it with "if (0)":
  /* Prefetch next iteration. */
  if (0)
    {
      vlib_buffer_t *p2, *p3;

      p2 = vlib_get_buffer (vm, from[2]);
      p3 = vlib_get_buffer (vm, from[3]);

      vlib_prefetch_buffer_header (p2, LOAD);
      vlib_prefetch_buffer_header (p3, LOAD);

      CLIB_PREFETCH (p2->data, sizeof (ip0[0]), LOAD);
      CLIB_PREFETCH (p3->data, sizeof (ip1[0]), LOAD);
    }
This is a fairly harsh demonstration, but it clearly shows the "missing prefetch, fix me" signature:
Name                             Clocks   Vectors/Call
FortyGigabitEthernet84/0/1-out   7.91e0   76.97
FortyGigabitEthernet84/0/1-tx    3.76e1   76.97
dpdk-input                       6.62e1   76.97
interface-output                 9.91e0   76.97
ip4-input-no-checksum            5.53e1   76.97
ip4-lookup                       3.49e1   76.97
ip4-rewrite-transit              3.32e1   76.97
This single change causes ip4-input-no-checksum to increase to 55 clocks/pkt (from 39 clocks/pkt). ip4-input-no-checksum jumps to the top of the "perf top" summary:
21.47% libvnet.so.0.0.0 [.] ip4_input_no_checksum_avx2
13.73% vpp [.] i40e_recv_scattered_pkts_vec
13.42% libvnet.so.0.0.0 [.] ip4_lookup_avx2
12.53% libvnet.so.0.0.0 [.] ip4_rewrite_transit_avx2
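The per-node clock figures from the two "show run" tables can be compared mechanically. The sketch below copies those figures into a table and picks out the node whose clocks/packet grew the most. node_clocks_t, pct_change and worst_regression are hypothetical helper names, not part of VPP.

```c
#include <stddef.h>

/* Clocks per packet for one graph node, before and after the change. */
typedef struct
{
  const char *name;
  double before;
  double after;
} node_clocks_t;

/* Figures copied from the two "show run" tables on this page. */
static const node_clocks_t nodes[] = {
  {"FortyGigabitEthernet84/0/1-out", 9.08, 7.91},
  {"FortyGigabitEthernet84/0/1-tx", 38.4, 37.6},
  {"dpdk-input", 74.5, 66.2},
  {"interface-output", 10.8, 9.91},
  {"ip4-input-no-checksum", 39.2, 55.3},
  {"ip4-lookup", 38.8, 34.9},
  {"ip4-rewrite-transit", 34.3, 33.2},
};

#define N_NODES (sizeof (nodes) / sizeof (nodes[0]))

/* Relative change in percent; positive means the node got slower. */
static double
pct_change (double before, double after)
{
  return (after - before) / before * 100.0;
}

/* Index of the node with the largest relative increase in clocks. */
static size_t
worst_regression (const node_clocks_t *t, size_t n)
{
  size_t worst = 0;
  for (size_t i = 1; i < n; i++)
    if (pct_change (t[i].before, t[i].after) >
	pct_change (t[worst].before, t[worst].after))
      worst = i;
  return worst;
}
```

With these figures, ip4-input-no-checksum is the only node that slows down: 39.2 to 55.3 clocks/pkt is an increase of roughly 41%. Every other node gets slightly cheaper, which is plausibly an effect of the larger vector size (76.97 versus 50.09 vectors/call) amortizing per-call overhead, although the page does not measure that directly.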
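The "perf top" detailed function profile shows a gross stall (32.64% of the function's samples, roughly a third of its runtime) at the first use of packet data. The samples are charged to the lea that immediately follows the movzbl reading ip0->dst_address; perf commonly attributes the cost of a stalled load to a nearby later instruction rather than to the load itself: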
│ /* Check bounds. */
│ ASSERT ((signed) b->current_data >= (signed) -VLIB_BUFFER_PRE_DAT
│ return b->data + b->current_data;
0.77 │ movswq (%rbx),%rax
│ p1 = vlib_get_buffer (vm, pi1);
│
│ ip0 = vlib_buffer_get_current (p0);
│ ip1 = vlib_buffer_get_current (p1);
│
│ sw_if_index0 = vnet_buffer (p0)->sw_if_index[VLIB_RX];
0.06 │ mov 0x20(%rbx),%r11d
│ sw_if_index1 = vnet_buffer (p1)->sw_if_index[VLIB_RX];
0.20 │ mov 0x20(%rbp),%r10d
0.03 │ lea 0x100(%rbx,%rax,1),%rdx
0.80 │ movswq 0x0(%rbp),%rax
│
│ arc0 = ip4_address_is_multicast (&ip0->dst_address) ? lm-
0.23 │ movzbl 0x10(%rdx),%edi
32.64 │ lea 0x100(%rbp,%rax,1),%rax
│ and $0xfffffff0,%edi
0.84 │ cmp $0xe0,%dil
│ arc1 = ip4_address_is_multicast (&ip1->dst_address) ? lm-
0.81 │ movzbl 0x10(%rax),%edi
│
│ vnet_buffer (p0)->ip.adj_index[VLIB_RX] = ~0;
5.32 │ movl $0xffffffff,0x28(%rbx)
│ ip1 = vlib_buffer_get_current (p1);
│
│ sw_if_index0 = vnet_buffer (p0)->sw_if_index[VLIB_RX];
│ sw_if_index1 = vnet_buffer (p1)->sw_if_index[VLIB_RX];
│