Skip to content

optimize Abacus#2

Closed
SecretLittleBoy wants to merge 7 commits into
ApeachM:feature/mdm/legalizeReproducefrom
SecretLittleBoy:fixbug
Closed

optimize Abacus#2
SecretLittleBoy wants to merge 7 commits into
ApeachM:feature/mdm/legalizeReproducefrom
SecretLittleBoy:fixbug

Conversation

@SecretLittleBoy
Copy link
Copy Markdown

optimized result:
./openroad ../../src/mdm/test/ICCAD/2022/reproduce/ipl-case3-gp.tcl
HPWL is: 27,479,632
HPWL is: 31,273,952
cost real time 0m4.627s

@SecretLittleBoy
Copy link
Copy Markdown
Author

Is anyone still in charge of this repository?

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 6, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs. Remove the Stale label or comment to keep it open.

@github-actions github-actions Bot added the Stale label Apr 6, 2026
@github-actions github-actions Bot closed this Apr 27, 2026
ApeachM pushed a commit that referenced this pull request May 26, 2026
Bug fixes (correctness):
- CPU-only build was broken: unique_ptr<NesterovDeviceContext> and
  unique_ptr<DeviceState> member destructors need a complete type, but
  the PIMPL headers were only included under #ifdef ENABLE_GPU. Move
  both includes outside the gate (they are plain C++) and add the src/
  include path unconditionally so gpu/*.h can find sibling headers.
- revertToSnapshot now scatters inst coords to DeviceState, refreshes
  pin locations, and marks coords fresh after syncing the host vectors
  back to device. Previously the divergence-recovery iteration ran on
  stale pin coords.
- saveSnapshot pulls curSLPSumGrads_ from device before snapshotting.
  On the GPU path updateGradients writes sum-grads only to device, so
  the host vector stayed at zero and a subsequent revertToSnapshot
  pushed zeros back, wiping the gradient state.
- divideByWSquare in poissonSolver.cpp was called with (hID, wID) but
  the function signature is (wID, hID, binCntX, binCntY, ...). On
  square bin grids the bug was invisible; on non-square grids both the
  bin indexing and the frequency math were wrong. Swap the call args.
- NesterovDeviceContext clamp bounds now match the CPU formula
  exactly: bg.lx()+dDx/2 .. bg.ux()-dDx/2 (and Y mirror). Previously
  the bounds used a bin-width margin, producing different cell
  positions from CPU when the clamp fired.
- PoissonSolver constructor now aborts when bin grid aspect ratio
  exceeds 2:1; the IDCT expk index math in dct.cpp goes negative past
  that point. Aspect threshold is kMaxBinAspectRatio.
- dct.cpp replaced printf+assert(0) with Kokkos::abort. The previous
  pattern was a silent no-op in release (NDEBUG) builds and let
  garbage output continue.

Hardening (defense in depth):
- nb_device_ctx_ allocation guarded by !nb_device_ctx_; initDensity1
  can run more than once (init recursion, routability flows) and
  previously rebuilt every device View on each call.
- getHpwl now consults DeviceState::consumeCoordsFresh() before the
  host->device sync, matching updateWireLengthForceWA.
- coords_fresh_ is now std::atomic<bool> (defensive; consumers run on
  the master thread today but OMP parallel boundaries elsewhere make
  a future race plausible).

Refactor (industry-level cleanup):
- Removed four dead methods from NesterovDeviceContext: swapCurNext,
  swapSumGrads (also broken — structured-binding copied the Views,
  swap was a no-op), scatterDensityGradsToNB, syncCurSLPToHost.
- Collapsed five copy-pasted blocks in syncCoordsToDevice into a
  single pushVecPairToDevice helper. Three pull-to-host functions
  collapsed similarly into pullVecPairToHost. ~50 lines removed.
- Removed unused NesterovBaseCommon* nbc constructor parameter and
  the now-unused h_cur_slp_x/y / h_next_slp_x/y host mirror Views.
- Extracted PoissonSolver::launchDivideByWSquare to share the Step
  #2 lambda between solvePoisson and solvePoissonPotential.

Verification:
- ENABLE_GPU=ON build: gpl regression 63/63 pass (both GPU backend
  and ENABLE_GPU=0 env-pinned CPU backend).
- ENABLE_GPU=OFF build: gpl_lib + openroad compile clean.
- Wall-time benchmark unchanged: large01 (274k cells) CPU 2:16 -> GPU 1:34.

Signed-off-by: Minjae Kim <develop.minjae@gmail.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Signed-off-by: Minjae Kim <minjae.kim@baum-ds.com>
ApeachM added a commit that referenced this pull request May 26, 2026
Bug fixes (correctness):
- CPU-only build was broken: unique_ptr<NesterovDeviceContext> and
  unique_ptr<DeviceState> member destructors need a complete type, but
  the PIMPL headers were only included under #ifdef ENABLE_GPU. Move
  both includes outside the gate (they are plain C++) and add the src/
  include path unconditionally so gpu/*.h can find sibling headers.
- revertToSnapshot now scatters inst coords to DeviceState, refreshes
  pin locations, and marks coords fresh after syncing the host vectors
  back to device. Previously the divergence-recovery iteration ran on
  stale pin coords.
- saveSnapshot pulls curSLPSumGrads_ from device before snapshotting.
  On the GPU path updateGradients writes sum-grads only to device, so
  the host vector stayed at zero and a subsequent revertToSnapshot
  pushed zeros back, wiping the gradient state.
- divideByWSquare in poissonSolver.cpp was called with (hID, wID) but
  the function signature is (wID, hID, binCntX, binCntY, ...). On
  square bin grids the bug was invisible; on non-square grids both the
  bin indexing and the frequency math were wrong. Swap the call args.
- NesterovDeviceContext clamp bounds now match the CPU formula
  exactly: bg.lx()+dDx/2 .. bg.ux()-dDx/2 (and Y mirror). Previously
  the bounds used a bin-width margin, producing different cell
  positions from CPU when the clamp fired.
- PoissonSolver constructor now aborts when bin grid aspect ratio
  exceeds 2:1; the IDCT expk index math in dct.cpp goes negative past
  that point. Aspect threshold is kMaxBinAspectRatio.
- dct.cpp replaced printf+assert(0) with Kokkos::abort. The previous
  pattern was a silent no-op in release (NDEBUG) builds and let
  garbage output continue.

Hardening (defense in depth):
- nb_device_ctx_ allocation guarded by !nb_device_ctx_; initDensity1
  can run more than once (init recursion, routability flows) and
  previously rebuilt every device View on each call.
- getHpwl now consults DeviceState::consumeCoordsFresh() before the
  host->device sync, matching updateWireLengthForceWA.
- coords_fresh_ is now std::atomic<bool> (defensive; consumers run on
  the master thread today but OMP parallel boundaries elsewhere make
  a future race plausible).

Refactor (industry-level cleanup):
- Removed four dead methods from NesterovDeviceContext: swapCurNext,
  swapSumGrads (also broken — structured-binding copied the Views,
  swap was a no-op), scatterDensityGradsToNB, syncCurSLPToHost.
- Collapsed five copy-pasted blocks in syncCoordsToDevice into a
  single pushVecPairToDevice helper. Three pull-to-host functions
  collapsed similarly into pullVecPairToHost. ~50 lines removed.
- Removed unused NesterovBaseCommon* nbc constructor parameter and
  the now-unused h_cur_slp_x/y / h_next_slp_x/y host mirror Views.
- Extracted PoissonSolver::launchDivideByWSquare to share the Step
  #2 lambda between solvePoisson and solvePoissonPotential.

Verification:
- ENABLE_GPU=ON build: gpl regression 63/63 pass (both GPU backend
  and ENABLE_GPU=0 env-pinned CPU backend).
- ENABLE_GPU=OFF build: gpl_lib + openroad compile clean.
- Wall-time benchmark unchanged: large01 (274k cells) CPU 2:16 -> GPU 1:34.

Signed-off-by: Minjae Kim <develop.minjae@gmail.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant