Replies: 12 comments 23 replies
-
|
Thanks for taking the time to think about this @Stoatwblr — and for the upfront disclaimer about hallucination, that's the right framing. Before going down a patch-generation path, two things would help the team a lot more than a generic optimization sweep:
One concrete data point on Gemini's claims: claim 1 ("Implementing mtime + file-size verification checks ... would allow the client to safely skip re-hashing unchanged files") is already in the code, at

    time_t fdiskDate = CPath::GetModificationTime(full);
    ...
    if (fdiskDate == file->GetLastChangeDatetime() && fdiskSize == (sint64)file->GetFileSize()) {
        // skip re-hash
    }

And the cold-start path does the same compare via

So the trade is: rather than asking Gemini for general-modernization patches, point at one specific thing you've felt was slow, paste numbers if you have them ("seeded N TB, restart took M minutes hashing"), and we'll look at the actual code path that handled it. That kind of report has a much better chance of landing as a working PR.
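For readers who want the mtime + size comparison quoted above in isolation, here is a minimal standalone sketch. `KnownEntry`, `CanSkipRehash` and `ExampleSkipCheck` are hypothetical names invented for illustration; they are not aMule's types, and the real code compares against `file->GetLastChangeDatetime()` and `file->GetFileSize()` as shown in the snippet.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Hypothetical record of what was stored when a file was last hashed.
struct KnownEntry {
    std::time_t   lastChange;  // modification time recorded at hash time
    std::uint64_t size;        // file size recorded at hash time
};

// True when the on-disk metadata matches the stored record,
// i.e. the existing hash can be reused without reading the file.
bool CanSkipRehash(const KnownEntry& known, std::time_t diskDate, std::uint64_t diskSize)
{
    return diskDate == known.lastChange && diskSize == known.size;
}

// Usage example: only an exact match on both fields allows the skip.
inline void ExampleSkipCheck()
{
    const KnownEntry known{1700000000, 350000000};
    assert(CanSkipRehash(known, 1700000000, 350000000));
    assert(!CanSkipRehash(known, 1700000001, 350000000));  // touched since hashing
}
```

The check costs one metadata lookup per file rather than a full read, which is consistent with the short warm-restart times reported later in this thread.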
-
|
[edit: This was being composed during @got3nks's reply above, and that reply hadn't been seen]
When asking Gemini if these apply to the live code branch:
====
The amule-org/amule repository is a recent infrastructure migration (the core maintainers moved from amule-project to amule-org because the original owner became unreachable). The development team is actively working on it, but the underlying codebase remains historically rooted in older patterns. The structural bottlenecks inside the source code can be verified by reviewing specific files:
1. KnownFileList.cpp (File Hashing & Startup Lag)
2. Manual Memory Allocation & C-Style Strings
3. amuleDlg.cpp (The Thread Sync Layout Engine)
Summary for Your Dev Branch
Because you are compiling via CMake and Clang with debug info enabled, you can isolate these targets yourself:
If you are thinking about writing a patch to submit upstream to the new amule-org team, let me know:
[this last suggestion seems like a worthwhile addition. Comments?]
======
Me: "is there a portable way of detecting/using aes/ni or similar CPU extensions in a package made for generic x86_64 systems?"
Yes. The standard, portable way to deploy a generic x86_64 binary package that dynamically detects and runs AES-NI, AVX2, or other vector extensions at runtime is through a combination of Compiler Function Multi-Versioning and Built-in CPU Dispatching. This completely eliminates the need for separate Haswell and Broadwell packages, allowing a single generic binary to automatically utilize Broadwell instructions on your server while executing safely on your Haswell desktop without throwing illegal instruction (SIGILL) crashes.
Method 1: The Modern C++ Way (Function Multi-Versioning)
Both Clang and GCC support the
Method 2: Explicit Dynamic Dispatch (__builtin_cpu_supports)
If you are writing manual assembly wrappers or using explicit Intel hardware intrinsics (like
Note: This built-in is fully supported on modern Linux toolchains across Clang and GCC.
Method 3: The Microarchitecture Level Framework (x86-64-v3)
If you want to compile an entire binary asset tailored for modern processors without dealing with individual function attributes, target the standardized x86-64 microarchitecture levels. (LLVM Phabricator) Instead of passing
[Bearing in mind I've been discussing my specific hardware setup with Gemini so far]
Why x86-64-v3 Is Perfect for Your Setup:
If you would like to implement a multi-versioned patch for a specific file, let me know:
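As an illustration of the explicit dispatch approach described above (Method 2), here is a minimal sketch that picks an implementation at runtime with `__builtin_cpu_supports`, which GCC and Clang provide on x86. The XOR routine and all function names are hypothetical stand-ins, not aMule code; on non-x86 targets or other compilers the sketch simply falls back to the generic path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable fallback: plain byte-wise XOR, valid on any CPU.
static void XorBufferGeneric(std::uint8_t* data, const std::uint8_t* key, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) data[i] ^= key[i];
}

#if defined(__x86_64__) && defined(__GNUC__)
// Same loop, compiled with AVX2 enabled for this one function only,
// so the rest of the binary stays at the generic x86_64 baseline.
__attribute__((target("avx2")))
static void XorBufferAvx2(std::uint8_t* data, const std::uint8_t* key, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) data[i] ^= key[i];
}
#endif

using XorFn = void (*)(std::uint8_t*, const std::uint8_t*, std::size_t);

// Choose the best implementation the running CPU supports.
XorFn SelectXorBuffer()
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return XorBufferAvx2;
#endif
    return XorBufferGeneric;
}

// Public entry point: the choice is made once, on first call.
void XorBuffer(std::uint8_t* data, const std::uint8_t* key, std::size_t n)
{
    assert(n == 0 || (data != nullptr && key != nullptr));
    static const XorFn impl = SelectXorBuffer();
    impl(data, key, n);
}
```

The same pattern would apply to an AES-NI path by testing `__builtin_cpu_supports("aes")` instead; whether that is worth doing for aMule is a separate question that this sketch does not answer.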
Me: "which way is best for a distribution package rather than my specific setup?"
For a public distribution package (like an upstream Ubuntu PPA, an Arch User Repository package, or a Debian .deb), you cannot predict what hardware the user will run. A hard dependency on modern extensions will lock out users on older machines, while a baseline x86-64-v1 package will waste performance. The best strategy is not a single choice, but a layered architecture used by tier-1 Linux distributions (like Fedora, openSUSE, and Ubuntu).
Tier 1: The Binary Engine (GLIBC HWCAPS) - The Definitive Method
The absolute best way to handle this at a packaging level is Glibc Hardware Capabilities (hwcaps). This mechanism allows you to compile your critical, heavy performance libraries twice, stick them in a special subfolder inside your package, and let the system's dynamic linker automatically choose the best one at runtime. When building your distribution package, you arrange your library installations into this specific file tree:
Why this is the best distribution choice:
Tier 2: The Application Frontend - Function Multi-Versioning (FMV)
If you are packaging the main executable (amuled or amulegui) rather than a library file, implementing Function Multi-Versioning using
You compile the package using the lowest common denominator baseline (
Why this is the best distribution choice:
Tier 3: Upstream Repository Architecture - The Split Baseline
If your distribution mechanism supports separate repository branches, you can adopt the model utilized by major Linux distributions. Instead of embedding checks inside the package, maintain two distinct compiled versions of the package metadata:
During package installation, your distribution script can run a quick check:
Recommendation for aMule Packaging
For a widely deployed aMule package, a hybrid approach yields the best results:
This architecture allows the distribution package to run smoothly on legacy client platforms while automatically unlocking full AVX2 performance on your server and desktop machines without any user intervention. If you are writing the package installer script, let me know:
It's all food for thought. Hopefully useful rather than "cruft" |
-
|
I haven't tried a cold hash build - YET, as I didn't really want to trash the existing metfiles, but there's no real harm in doing so. :)
Startup time with already-existing known.met files is in the 3-10 second range (94k files, 26TB), so we're already good from that point of view.
Will give coldstart a try and report back.
I did notice that Gemini seems to give greater credence to old reports (long startup delays) than to the fixes already implemented, and that's why I warned to take it with more than a few "grains of salt" (then again, it may have identified a better path).
-
|
btw: after a reload shared and getting "Found 178 known shared files, 94206 unknown", it'd be really nice to have hashing completions logged without needing debug mode.
-
|
seven minutes in
Huh? 2 unknown?
Edit: It seems to be handling about 70 files/minute @ an average of 350MB/file. 16 minutes in, that cadence is being maintained, which translates to about a day to rehash 26TB/94k files in single-threaded mode from an absolute cold start. If that can be halved I'd gladly take it, although AES/NI detection seems more important in the short term as that has concrete effects on load.
Threadsafing Amulegui is a worthwhile thing. Most parts of it are several orders of magnitude faster than they were in 2.3.3 but it does get a tad sluggish updating when 20 uploads are happening and I'm hitting bandwidth limits.
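As a rough cross-check of the "about a day" projection above, the arithmetic can be written out two ways: by file count and by bytes. This assumes decimal units (1 TB = 10^6 MB) and that the sampled 70 files/minute and 350 MB/file hold for the whole collection, which the first 16 minutes may not represent.

```latex
\begin{align*}
t_{\text{files}} &\approx \frac{94\,000\ \text{files}}{70\ \text{files/min}}
                  \approx 1343\ \text{min} \approx 22.4\ \text{h} \\
r &\approx 70\ \text{files/min} \times 350\ \text{MB/file}
   = 24\,500\ \text{MB/min} \approx 408\ \text{MB/s} \\
t_{\text{bytes}} &\approx \frac{26 \times 10^{6}\ \text{MB}}{24\,500\ \text{MB/min}}
                  \approx 1061\ \text{min} \approx 17.7\ \text{h}
\end{align*}
```

Both estimates land between roughly 18 and 22 hours, so "about a day" is consistent with the sampled rate.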
-
|
Useful data points — particularly the warm-restart number (3–10 s for 94 k files) and the cold-rehash projection. Replies in line:
|
-
|
Heads up: The amule implementation of rc4 is slightly broken vs Crypto++. Trying to use Crypto++ results in broken connectivity.
(I've also got parallelised hashing with disk thrash protection working)
Gemini again:
Where the Logic Mismatched
In the standard amule-org/amule repository architecture, obfuscation initialization works via an interleaved swap. The class CEncryptedStreamSocket constructs an in-place dynamic loop stream transformation context. When the obfuscation layer initializes keys via SetKey(md5), the socket assumes the engine begins encoding right away without dropping historical byte sequences or altering positions. When we introduced Crypto++ to override RC4Encrypt.cpp, the engine was structured using m_cipher.SetKey(...). In Crypto++, calling SetKey doesn't just pass the 16-byte password; it completely resets the cipher's internal byte counters back to zero.
The Permanent, Portable Fix: Use aMule's Clean Native Crypto Layer
Since your goal is to build an unthrottled, portable client that works cleanly on desktop hosts, we should use a modular approach: We can restore perfect network connection stability by rolling back only the RC4Encrypt changes to their original, pristine repository state, while keeping your fast parallel hashing system and CMake rules untouched.
============
When asking about working around the issue and using Crypto++ anyway:
===========
Sticking with aMule's internal version for the RC4/obfuscation layer is absolutely the best choice here.
Why Sticking to the Internal RC4 is Better
The Perfect Hybrid Setup You Have Now
By rolling back just the RC4 files and keeping the rest of our verified changes, you have created a highly optimized, custom hybrid build:
======== |
-
|
Thanks for the heads-up. Triage on each thread.
RC4 / Crypto++ swap. Symptom is real, mechanism is trivial: eMule obfuscation is RC4-drop-1024 (
Parallelised hashing + disk-thrash protection. If the branch works, please open a scoped PR against
Crypto++ migration for SHA-1 / AICH / MD5. Already on our roadmap; would land as its own scoped PR, so please don't bundle with the hashing or RC4 work. Per-digest commits inside that PR also make a future bisect tractable if any one swap surfaces a wire-format mismatch.
General ask: one PR per scope, current
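To make the "RC4-drop-1024" point above concrete, here is a minimal, self-contained RC4 sketch with a configurable number of discarded keystream bytes. It is textbook RC4 written for illustration; the class name `Rc4` and its interface are hypothetical and are neither aMule's RC4Encrypt code nor the Crypto++ API. Two peers only interoperate if they key the cipher the same way and discard the same number of bytes, so a drop-in replacement that skips or repeats the discard would produce a different keystream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Textbook RC4 with an initial keystream discard ("RC4-drop-N").
class Rc4 {
public:
    // Runs the key schedule, then throws away the first `drop` keystream bytes.
    Rc4(const std::vector<std::uint8_t>& key, std::size_t drop)
    {
        assert(!key.empty());
        for (std::size_t n = 0; n < 256; ++n) s_[n] = static_cast<std::uint8_t>(n);
        std::uint8_t j = 0;
        for (std::size_t n = 0; n < 256; ++n) {
            j = static_cast<std::uint8_t>(j + s_[n] + key[n % key.size()]);
            std::swap(s_[n], s_[j]);
        }
        for (std::size_t n = 0; n < drop; ++n) NextByte();
    }

    // XORs the buffer in place; cipher state carries over between calls.
    void Process(std::vector<std::uint8_t>& buf)
    {
        for (auto& b : buf) b ^= NextByte();
    }

private:
    std::uint8_t NextByte()
    {
        i_ = static_cast<std::uint8_t>(i_ + 1);
        j_ = static_cast<std::uint8_t>(j_ + s_[i_]);
        std::swap(s_[i_], s_[j_]);
        return s_[static_cast<std::uint8_t>(s_[i_] + s_[j_])];
    }

    std::uint8_t s_[256];
    std::uint8_t i_ = 0;
    std::uint8_t j_ = 0;
};
```

Constructing `Rc4(key, 1024)` gives the same stream as `Rc4(key, 0)` after 1024 bytes have been processed and thrown away, which is the property any replacement implementation would need to preserve.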
-
|
https://github.com/Stoatwblr/amule/blob/master/amule-SHA-MD5-patches.TGZ These WILL need more work if used - for starters they only know about Crypto++ RC5 patches left in, but these are NOT used in the final product for the reasons discussed above |
-
|
Woohoo!
2026-06-18 20:25:28: Adding file /ZFSMEDIA/AMULE/Mule-1/Temp/Temp-1/203.part.met to shares
2026-06-18 20:25:35: Successfully updated ipfilter.dat
2026-06-18 20:25:33: IP filter is ready !
2026-06-18 20:25:46: Connected to Kad (ok)
2026-06-18 20:26:17: External connection closed.
Disk is extremely comfortable, and if anything else fires up, the thrash detector turns the thread count down. Right now Jellyfin is doing its thing in the same pool.
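For anyone curious what a "thrash detector turns the thread count down" policy could look like, here is a small sketch of one possible rule: halve the hashing thread count when the disk looks contended, otherwise add one thread back. This is purely an assumption for illustration; the branch's actual detector is not shown in this thread, and `ThrashPolicy`, `NextThreadCount` and the use of average read latency as the contention signal are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tuning knobs; none of these names come from the branch discussed here.
struct ThrashPolicy {
    unsigned minThreads;     // never go below this many hashing threads
    unsigned maxThreads;     // never exceed this many hashing threads
    double   busyLatencyMs;  // average read latency above which the disk counts as contended
};

// Additive increase, multiplicative decrease:
// back off quickly under contention, recover one thread at a time.
unsigned NextThreadCount(unsigned current, double avgReadLatencyMs, const ThrashPolicy& p)
{
    assert(p.minThreads >= 1 && p.minThreads <= p.maxThreads);
    const unsigned next = (avgReadLatencyMs > p.busyLatencyMs) ? current / 2 : current + 1;
    return std::max(p.minThreads, std::min(next, p.maxThreads));
}
```

A caller would sample latency periodically and resize its worker pool to the returned value; the asymmetric response is one common way to let other pool users (such as a media server sharing the disks) win quickly when they start up.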
-
|
Parallelisation_git_report.docx |
-
|
I'd say the gains from farming out to Crypto++ are modest, but the important part is that it's less ongoing overhead and more standardisation for parts that can simply be plugged in. Why have a dog if you're going to do all the barking yourself?
I've got 2 regressions that affect the patches in the doc:
1 (fixed) was that amulegui's "shared files" counter wasn't seeing the new shares.
The other (more serious, and I haven't looked at it yet) is that having rehashed everything, I renamed/moved some newly downloaded files, only to find that whilst the directories show in sharefiles.dat, the scanner says those directories are empty.
Zlib will have its biggest gains from threadsafe parallelisation. The existing inline method is a serious bottleneck and simply calling external libraries won't gain much, not helped by the issue that for best gains you really need zlib2 and that's simply not widely deployed enough to be viable. I'd rather see that hook into the back of the existing compression libraries like pzlib did 15+ years ago (pigz, pbzip, etc), as it means everything just gains without needing masses of rewrites. In any case we're not pushing large amounts of data through zlib even on wide-bandwidth connections. It will become a bottleneck per thread at about 30MB/s, but there are still gains to be had in the meantime simply by getting away from running synchronously.
I'm not touching zlib until this regression is nailed down, for obvious reasons. I've made a start on it but shelved things until hashing is fully sorted.
@mrjimenez What's your opinion of the code quality? It seems fairly reasonable to me and nothing like the utter garbage I was seeing AI produce 5 years ago. As I said, my only issue is that I have to keep nudging it to stay on the right path, and Gemini's interface rapidly bloats a browser until it OOMs, but you can get around that by dumping the conversation to PDF, starting a new session and then feeding the previous chat back in.
It seems to result in a more focussed AI too. What's really impressed me is the documentation.
I predicted 15 years ago that AI would take the white collar jobs first, based on what had already happened in offices since the 1970s (when was the last time you saw a room full of ledger clerks scratching away?) and still think that the 21st century Luddites tossing shoes into the punch cards will be newly unemployed paralegals and conveyancers, as the mundane stuff becomes AI-manageable and what's left needs better expertise to deal with. You can probably add web developers to that crew too.
-
I was using Gemini to see if there are compilation optimisations I can apply to my individual systems (Broadwell and Haswell based systems) and a couple of things came up.
Take all this with a large grain of salt: it's AI and potentially hallucinatory.
Compiling RelWithDebInfo and symbols enabled (the -g)
For broadwell systems:
For Haswell systems
Without those in place, AES/NI extensions won't be used. I don't know if this applies to later CPU generations
Using clang/llvm
This will tell cmake to use the llvm build system instead of gnu. If you're doing this you will need to delete any existing build/ tree and CMakeCache.txt. I've tried these and they work for me, but "your mileage may vary" (YMMV).
I then asked Gemini if it could see any issues in the live code. Again, these answers may be spurious. I'm pasting these for comment and consideration only.
======================
Developing, building, and running aMule inside a debugger uncovers distinct legacy architectural patterns in the codebase. Because the project originated in the early 2000s, it functions primarily as a C-style application written using C++98 syntax.
Several areas within the aMule source code would benefit from modernization, particularly concerning performance bottlenecks on your Broadwell server and stability issues on your Haswell desktop.
1. File Hashing & Core Startup Latency (The known2_64.met Bottleneck)
The Issue: When amuled boots or reloads shared folders, it parses large directory trees sequentially. Users with large collections report it taking up to an hour to complete hashing or validation. The codebase relies on a single thread to compute the MD4 chunk hashes sequentially, leaving other Broadwell cores unutilized.
The Fix: Modernize the file-scanning subsystem in KnownFileList.cpp. Introducing std::execution::par (parallel policies) allows the initialization loops to process files concurrently across all CPU threads.
Redundant Hashing: Implementing modification time (mtime) and file-size verification checks on startup (similar to modern BitTorrent clients) would allow the client to safely skip re-hashing unchanged files.
2. Monolithic Threading & Blocking Network I/O
The Issue: The networking model relies heavily on synchronous sockets wrapped in basic wxWidgets thread management structures. Under intense network strain—such as handling high connection queues or experiencing unexpected gateway dropouts—these threads can stall, occasionally causing the entire daemon process to crash.
The Fix: Migrate the legacy network infrastructure to a modern asynchronous framework. Refactoring code blocks to utilize modern Boost.Asio handlers with C++ Lambdas (moving away from legacy boost::bind patterns) prevents network fluctuations from blocking core application execution threads.
3. Outdated Memory Frameworks & Lack of RAII
The Issue: The aMule codebase contains extensive manual memory allocation chains (new and delete), along with legacy C-style pointer structures (char* buffers) for packet building. This approach increases the risk of memory leaks and buffer tracking errors, which can complicate the debugging process in RelWithDebInfo targets.
The Fix: Convert raw tracking loops to follow RAII (Resource Acquisition Is Initialization) patterns using modern C++ design principles. Swapping out traditional pointer configurations for modern memory management structures minimizes resource allocation errors:
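To show what the RAII suggestion above means in practice, here is a generic sketch of a packet buffer that owns its storage through `std::vector`, so there is no matching `delete[]` to forget. `PacketBuilder` is a hypothetical class made up for this example; it is not aMule's CPacket and says nothing about how much of the real codebase still uses raw buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet builder: the vector owns the bytes and releases them
// automatically when the builder goes out of scope, including on early
// returns and exceptions.
class PacketBuilder {
public:
    explicit PacketBuilder(std::uint8_t opcode) { buf_.push_back(opcode); }

    // Appends raw bytes; the vector grows as needed, so no manual size tracking.
    void Append(const void* data, std::size_t len)
    {
        assert(data != nullptr || len == 0);
        const auto* p = static_cast<const std::uint8_t*>(data);
        buf_.insert(buf_.end(), p, p + len);
    }

    const std::vector<std::uint8_t>& Bytes() const { return buf_; }

private:
    std::vector<std::uint8_t> buf_;
};
```

A caller would write `PacketBuilder pkt(0x01); pkt.Append(payload, payloadLen);` and hand `pkt.Bytes()` to the socket layer without ever calling `new` or `delete`.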
4. Excessive Deep-Copy Overheads on String Parsing
The Issue: The codebase passes large string parameters—such as paths, filenames, and peer data—as copies rather than references. It also depends on heavy allocations like wxString instances, which can introduce unnecessary overhead during large-scale network indexing.
The Fix: Update internal function definitions across structural classes like CFile and CPacket to leverage modern zero-copy typing:
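As one example of what "zero-copy typing" could mean here, the sketch below takes and returns `std::string_view` (C++17), so parsing a path allocates nothing. This is an assumption about what Gemini had in mind; aMule's code is built around wxString, and `FileExtension` is a hypothetical helper, not an existing function.

```cpp
#include <cassert>
#include <string_view>

// Returns the extension (without the dot) of the last path component,
// or an empty view if there is none. The result points into the caller's
// buffer: nothing is copied, so the caller must keep that buffer alive.
std::string_view FileExtension(std::string_view name)
{
    const auto slash = name.find_last_of('/');
    if (slash != std::string_view::npos) name.remove_prefix(slash + 1);
    const auto dot = name.find_last_of('.');
    if (dot == std::string_view::npos || dot == 0) return {};
    return name.substr(dot + 1);
}

// Usage example on a name of the kind seen in this thread.
inline void ExampleFileExtension()
{
    assert(FileExtension("/Temp/203.part.met") == "met");
    assert(FileExtension("README").empty());
}
```

The trade-off is lifetime management: a view is only valid while the string it refers to exists, which is why a conversion like this needs a per-call-site review rather than a blanket search and replace.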
5. GUI Rendering Sync Lag over Remote Connections
The Issue: When amulegui runs on an Ubuntu 26.04 Wayland instance under KDE Plasma, it handles high-frequency GUI updates via legacy cross-thread event dispatches. If the server pushes rapid data updates, the client can drop frames, flicker, or experience thread deadlocks.
The Fix: Update the remote core mapping inside amuleDlg.cpp to decouple rendering loops from data packet processing. Offloading state updates to a separate background thread allows the main GUI thread to remain responsive on the client desktop.
If you want to focus your development work on a specific optimization, let me know:
Would you like to patch the file-hashing loops to resolve high startup times?
Are you looking to clean up memory management to address specific connection tracking issues?
I can help generate concrete code patches or configure clang-tidy to flag these exact modern target opportunities automatically during compilation.
=============
Should I ask Gemini for its suggested patches?