Optimize FREIGHT multi-pass streaming evaluation (1.8x) #2
Merged
schulzchristian merged 1 commit into main on Apr 3, 2026
Replace expensive per-pass evaluation with efficient in-memory alternatives,
producing bit-identical results with the FREIGHT CLI.
Changes to bindings/freight_binding.cpp:
- Connectivity evaluation: replace vector-of-vectors reverse mapping +
std::set-per-net with per-net bit vectors (ceil(k/64) words per net),
set incrementally during the main partitioning loop, evaluated via popcount
- Cut-net evaluation: count CUT_NET entries in stream_edges_assign directly
instead of building a reverse mapping (O(num_nets) sequential scan)
- Eliminate valid_neighboring_nets vector; re-iterate CSR edges for per-net
tracking update (edge data is in L1 cache from prior accumulation scan)
- Pre-allocate best partition vectors to avoid dynamic reallocation
- Skip best-partition snapshot on the last pass (read from stream_nodes_assign
directly if last pass is best); use memcpy for intermediate snapshots
- Copy result directly from best/current assignment to numpy output via memcpy,
skipping the intermediate restore step
- Replace /dev/null file open with lightweight null_buf for output suppression
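
The per-net bit vector idea from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in bindings/freight_binding.cpp: the `NetBlockBits` type and its member names are hypothetical, the connectivity objective is assumed to be the usual sum of (lambda - 1) over nets, and `std::bitset<64>::count()` stands in for whatever popcount call the binding actually uses.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one row of ceil(k/64) 64-bit words per net.
// Bit b of a net's row is set once any pin of that net lands in block b.
struct NetBlockBits {
    std::size_t words_per_net;
    std::vector<std::uint64_t> bits;

    NetBlockBits(std::size_t num_nets, std::size_t k)
        : words_per_net((k + 63) / 64), bits(num_nets * words_per_net, 0) {}

    // Called while nodes are being assigned: record that `net` touches `block`.
    void mark(std::size_t net, std::size_t block) {
        bits[net * words_per_net + block / 64] |= std::uint64_t{1} << (block % 64);
    }

    // Number of distinct blocks touched by `net` (population count of its row).
    std::size_t lambda(std::size_t net) const {
        std::size_t count = 0;
        for (std::size_t w = 0; w < words_per_net; ++w) {
            count += std::bitset<64>(bits[net * words_per_net + w]).count();
        }
        return count;
    }

    // Assumed objective: sum of (lambda - 1) over nets with at least one pin placed.
    std::size_t connectivity() const {
        const std::size_t num_nets = words_per_net ? bits.size() / words_per_net : 0;
        std::size_t total = 0;
        for (std::size_t net = 0; net < num_nets; ++net) {
            const std::size_t l = lambda(net);
            if (l > 0) total += l - 1;
        }
        return total;
    }
};
```

Compared with a std::set per net, marking is a single OR into a flat array and evaluation is one linear pass over num_nets * ceil(k/64) words, with no per-net allocation.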
Verified bit-identical against FREIGHT CLI (freight_con_opt, freight_cut_opt)
on ISPD98 ibm01/ibm05/ibm18, k=8, passes 1-10, both objectives.
Binding vs CLI wall-clock (--ram_stream, includes evaluation):
  ibm18 connectivity k=8:
     1 pass:  CLI 101ms  Bind  36ms  (2.8x faster)
     5 pass:  CLI 366ms  Bind 262ms  (1.4x faster)
    10 pass:  CLI 653ms  Bind 515ms  (1.3x faster)
  ibm18 cut-net k=8:
     1 pass:  CLI 102ms  Bind  37ms  (2.8x faster)
     5 pass:  CLI 371ms  Bind 242ms  (1.5x faster)
    10 pass:  CLI 685ms  Bind 494ms  (1.4x faster)
Summary
- Optimize bindings/freight_binding.cpp for ~1.8x overall speedup
- Replace net_to_nodes reverse mapping + std::set-per-net evaluation with in-memory bit vectors (connectivity) and direct CUT_NET counting (cut-net)
- Eliminate valid_neighboring_nets vector, use memcpy for snapshots and output, skip redundant copies
- Verified bit-identical against FREIGHT CLI (freight_con_opt, freight_cut_opt)
Benchmark (ibm18, k=8, --ram_stream, wall-clock including evaluation)
Quality and balance match exactly across all tested configs (ibm01/ibm05/ibm18, k=8, passes 1-10, both objectives, seed=0).
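
For the cut-net objective, counting CUT_NET entries directly might look like the sketch below. It assumes a per-net state array in the spirit of stream_edges_assign, where each entry holds either a block id or a sentinel. The sentinel values `kUnassigned` and `kCutNet` and the helper names are placeholders chosen for illustration; the actual constants and update logic in FREIGHT may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinels for illustration only; FREIGHT's real constants may differ.
constexpr std::int32_t kUnassigned = -1;  // no pin of this net placed yet
constexpr std::int32_t kCutNet = -2;      // net has pins in more than one block

// Update the per-net state when a pin of `net` is placed in `block`.
inline void assign_pin(std::vector<std::int32_t>& net_assign,
                       std::size_t net, std::int32_t block) {
    std::int32_t& state = net_assign[net];
    if (state == kUnassigned) {
        state = block;    // first pin: net lives entirely in `block` so far
    } else if (state != kCutNet && state != block) {
        state = kCutNet;  // second distinct block: net is cut from now on
    }
}

// O(num_nets) sequential scan: the cut-net objective is the sentinel count.
inline std::size_t count_cut_nets(const std::vector<std::int32_t>& net_assign) {
    return static_cast<std::size_t>(
        std::count(net_assign.begin(), net_assign.end(), kCutNet));
}
```

Because the state array is already maintained during partitioning, evaluation needs no node-to-net reverse mapping, only the final scan.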
Test plan
tests/test_freight.py