Merge fastcpc #37

Merged
cr-marcstevens merged 38 commits into master from fastcpc
Sep 9, 2024

@cr-marcstevens
Owner

  • (for speed: forward): progressively increase output set size for the first few steps (t=1,2,3) of forward. This speeds up these first few steps considerably without loss of quality.
  • (fix forward, backward, connect): make dostep_index volatile, read it at start, write it at end of critical section.
  • (improvement for helper): add better support to fix a few bytes at the start of a near-collision block to support more format-restricted use cases for chosen-prefix collisions.
  • (for speed: helper): add support to negate a set of diffpaths, in order to negate each 'positive delta m_11'-based precomputed set of upper differential paths instead of also computing the 'negate delta m_11'-based set.
  • (for speed: helper): add support to combine diffpaths, in order to use a precomputed set of upper differential paths and overwrite the full differential paths with the correct ending differences for the near-collision block at hand.
  • (fix md5helper): add support to join files consisting of just a differentialpath
  • (for scaling & latency: forward): add splitsave parameter to save output over multiple files (for multiple connect processes) in parallel
  • (for speed: forward): when full, decrease maxcond to stop processing paths that will be pruned later on anyway.
  • (for speed: forward): use C++ move instead of copy, as that is faster
  • (for scaling: forward): use threadlocal buffers to global container to reduce contention
  • (for speed: connect): new mintunnel parameter to prune search from start to avoid bad full paths with too few tunnels, minor code improvements
  • (for latency: connect): do parallel read of both inputfiles & new option waitinputfile to wait for each inputfile to exist before trying to load it
  • (for scaling: forward, backward, connect): improve input distribution over threads, reducing contention
  • (fix avx256 allocation: birthday): older GCCs (like on servers) might not properly do large alignment as required for AVX256
  • (for latency: birthday): immediately save birthday collision when found, instead of waiting till all threads finish
  • (for scaling: birthday): add saveloadwait parameter that controls how frequently trails are distributed to controllers (save) and loaded by controllers (load)
  • (for scaling: birthday): add generatormode that only generates trails and distributes them to controllers
  • (for latency: lib): compressed diffpath archives: configure gzip for best_speed
  • (for speed: lib, forward, backward): diffpath fast check solvability: only check changed part
  • MacOS fix: differentialpath: clean up operators
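
To illustrate the two forward items above about using C++ move instead of copy and thread-local buffers in front of a global container, here is a minimal sketch of the general pattern. It is not the code from this PR: `Path`, `GlobalStore`, `worker`, `run_workers` and the batch size are hypothetical names chosen for the example. Each thread collects results in a private buffer and takes the shared lock only once per batch, moving the elements across rather than copying them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a differential path. The real type is assumed to
// own heap data, which is what makes a move cheaper than a copy.
struct Path {
    std::vector<unsigned> words;
};

// Shared output container guarded by a mutex (illustrative only).
class GlobalStore {
public:
    // Move a whole thread-local batch into the shared container under one lock.
    void flush(std::vector<Path>& local) {
        if (local.empty()) return;
        std::lock_guard<std::mutex> lock(mutex_);
        for (Path& p : local)
            paths_.push_back(std::move(p));  // transfers ownership, no deep copy
        local.clear();
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex_);
        return paths_.size();
    }

private:
    std::mutex mutex_;
    std::vector<Path> paths_;
};

// Each worker fills a private buffer and touches the lock once per batch.
inline void worker(GlobalStore& store, std::size_t count, std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<Path> local;
    local.reserve(batch_size);
    for (std::size_t i = 0; i < count; ++i) {
        Path p;
        p.words.assign(64, static_cast<unsigned>(i));
        local.push_back(std::move(p));
        if (local.size() >= batch_size)
            store.flush(local);
    }
    store.flush(local);  // hand over the final partial batch
}

// Run several workers and return how many paths reached the shared store.
inline std::size_t run_workers(std::size_t threads, std::size_t per_thread,
                               std::size_t batch_size) {
    GlobalStore store;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < threads; ++t)
        pool.emplace_back(worker, std::ref(store), per_thread, batch_size);
    for (std::thread& th : pool) th.join();
    return store.size();
}
```

Calling `run_workers(4, 1000, 64)` starts four threads that each produce 1000 paths. A larger batch size means fewer lock acquisitions, at the cost of more memory held per thread before a flush.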

…age by doing a minimum amount of work in between accesses, independent of average trail length.
… threads (incl GPU threads), where each GPU thread does not share a core.
@roycewilliams

No pressure, but just curious about your plans/timing for merging this.

@cr-marcstevens
Owner Author

I think the script needs to go through some more testing on smaller machines to ensure it also performs well in that case.
Would be good to hear feedback and comparison between fastcpc.sh and the old cpc.sh.

@fproulx-boostsecurity

Excited to give it a spin when I have a minute.

@cr-marcstevens cr-marcstevens merged commit e7d864d into master Sep 9, 2024
@cr-marcstevens cr-marcstevens deleted the fastcpc branch September 9, 2024 09:27
3 participants