
Add Multithreading for L2_LR Solver #25

Closed
wants to merge 9 commits

Conversation

jeffpasternack

  1. Adds a threadCount param and associated command-line argument. This argument is a no-op unless the L2_LR solver is used.
  2. Adds the LLThreadPool class, a wrapper around a ThreadPoolExecutor with accompanying helper classes. The general multithreading strategy is to: (a) break an operation over a list of examples into "chunks", small subsets of the examples, (b) process each chunk independently via the thread pool, accumulating results (e.g. the partial gradients) into a thread-local array where applicable, and (c) acquire a lock and add the thread-local array into the "real", global array of values (a minimal sketch of this pattern follows the description below).
  3. The actual multithreaded operations are in L2R_LrFunction. If multithreading is not used, the code paths are exactly the same as before, except that inner loops are factored into separate functions to avoid code duplication.
  4. Added unit tests.

I've tried to keep to the style of the original code. Wall-clock speedup on a problem with 100K examples in 5 dimensions on a MacBook with 4 threads was roughly 2.4x (not rigorously tested).
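
For context, here is a minimal, hypothetical sketch of the chunk-and-accumulate pattern described in point 2 above. The class and method names are illustrative only and do not match the PR's actual LLThreadPool / L2R_LrFunction code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of the chunk-and-accumulate strategy; not the PR's actual API.
class ChunkedAccumulationSketch {

    static void accumulate(double[][] examples, double[] globalResult,
                           int threadCount, int chunkSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        Object lock = new Object();
        List<Future<?>> pending = new ArrayList<>();
        // (a) split the examples into chunks
        for (int start = 0; start < examples.length; start += chunkSize) {
            final int from = start;
            final int to = Math.min(start + chunkSize, examples.length);
            pending.add(pool.submit(() -> {
                // (b) accumulate this chunk's contribution into a thread-local buffer
                double[] local = new double[globalResult.length];
                for (int i = from; i < to; i++) {
                    for (int j = 0; j < local.length; j++) {
                        local[j] += examples[i][j]; // stand-in for the real per-example term
                    }
                }
                // (c) take the lock and fold the buffer into the shared array
                synchronized (lock) {
                    for (int j = 0; j < local.length; j++) {
                        globalResult[j] += local[j];
                    }
                }
            }));
        }
        for (Future<?> f : pending) {
            f.get(); // wait for all chunks and propagate exceptions
        }
        pool.shutdown();
    }
}
```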

@apatzer

apatzer commented Jul 4, 2018

@bwaldvogel Any progress on the PR? Would love to see an official release now that the C version of Liblinear has multi-core support.

@jeffpasternack
Author

FYI, due to a lack of action on this pull request, I have "hard"-forked the project (https://github.com/jeffpasternack/liblinear-java) and deployed it to the Central Repository (https://search.maven.org/artifact/com.jeffreypasternack/liblinear/2.21/jar). Multithreaded training provides a major speedup on large problems, so I wanted to make sure it is widely available to those who want it.

@bwaldvogel
Owner

bwaldvogel commented Dec 29, 2018

I’m sorry that it took me so long to answer.

I would like to move forward with this topic and eventually integrate multicore support into liblinear-java.
Since you have already spent some time on this, you can probably answer a few questions:

  1. How does your implementation approach differ from the upstream multicore version of Liblinear?

  2. You mentioned measuring a speedup of factor 2.4 with 4 threads. Can you elaborate on your benchmark setup? On which CPU did you test it? I’m wondering why you only get a speedup of factor 2.4 with 4 threads while the upstream Liblinear claims to measure a speed-up of factor 3.5 on the RCV1 dataset.

  3. Did you test and compare the results of your multi-core implementation with the results of the single-core implementation on various datasets?

@bwaldvogel
Owner

Closing this PR for the following reasons:

  1. I didn’t get an answer to the questions I asked almost 2 years ago, so I assume that @jeffpasternack has lost interest since he now lives with his hard fork.
  2. This PR has multiple non-trivial conflicts now.
  3. From what I can tell, the approach is significantly different from what the authors of liblinear (C++) did.
  4. The multi-threading implementation has a negative performance impact on the single-threaded mode.
  5. I’m not sure if the multi-threading implementation was properly tested for correctness.

I see a couple of problems in the approach of this PR. For example, the RangeConsumerWithAccumulatorArray seems to do too much synchronization when the threads accumulate the results.

A speed-up of factor 2.4 with 4 cores suggests that too much potential is being given away.
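
For illustration only, one lower-contention alternative would be to give each worker its own accumulator array and merge the arrays once after all tasks have finished, removing the per-chunk lock entirely. This is a hypothetical sketch, not code from this PR and not necessarily the upstream (C++) approach:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical variant: one accumulator array per worker, merged once at the end, no locking.
class PerWorkerAccumulatorSketch {

    static double[] accumulate(double[][] examples, int dim, int threadCount) throws Exception {
        double[][] perWorker = new double[threadCount][dim];
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        List<Future<?>> futures = new ArrayList<>();
        for (int w = 0; w < threadCount; w++) {
            final int worker = w;
            futures.add(pool.submit(() -> {
                double[] local = perWorker[worker];
                // each worker processes a strided slice of the examples; no shared writes
                for (int i = worker; i < examples.length; i += threadCount) {
                    for (int j = 0; j < dim; j++) {
                        local[j] += examples[i][j]; // stand-in for the real contribution
                    }
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get();
        }
        pool.shutdown();
        // single-threaded merge after all workers have finished
        double[] result = new double[dim];
        for (double[] local : perWorker) {
            for (int j = 0; j < dim; j++) {
                result[j] += local[j];
            }
        }
        return result;
    }
}
```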

I would like to add multithreading, but this will only work for me if:

  • The single-threaded implementation suffers (almost) no negative performance impact,
  • The implementation approach is the same as (or very similar to) what the authors of liblinear (C++) did, and
  • The implementation doesn’t make my life harder when porting changes.

BTW: In most of my past use cases I trained multiple models in parallel with different threads (yes, Liblinear.train(…) is thread-safe). This approach yields a speed-up that is essentially linear in the number of cores. Of course, it doesn’t help if you need to train one big model. On the other hand, in my use cases the I/O to extract and prepare the feature nodes was almost always the clear bottleneck.
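
A minimal sketch of that many-models-in-parallel pattern, assuming the public liblinear-java API (Problem, Parameter, SolverType, and the static Linear.train method); the C and eps values are placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import de.bwaldvogel.liblinear.Linear;
import de.bwaldvogel.liblinear.Model;
import de.bwaldvogel.liblinear.Parameter;
import de.bwaldvogel.liblinear.Problem;
import de.bwaldvogel.liblinear.SolverType;

// Sketch: train several independent models concurrently, one train(…) call per task.
class ParallelModelsSketch {

    static List<Model> trainAll(List<Problem> problems, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Model>> futures = new ArrayList<>();
        for (Problem problem : problems) {
            // fresh Parameter per task; C=1.0 and eps=0.01 are placeholder values
            Callable<Model> task = () -> Linear.train(problem, new Parameter(SolverType.L2R_LR, 1.0, 0.01));
            futures.add(pool.submit(task));
        }
        List<Model> models = new ArrayList<>();
        for (Future<Model> f : futures) {
            models.add(f.get()); // blocks until the corresponding model is trained
        }
        pool.shutdown();
        return models;
    }
}
```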

@bwaldvogel closed this Nov 29, 2020
@jeffpasternack
Author

@bwaldvogel sorry that I didn't see your previous message. Yes, so far we've been fine with the hard fork, so for the moment there's no material incentive to invest in resolving the conflicts.

With regard to the speed-up factor, this will depend on the number of examples and weights, but I agree a better synchronization scheme may well be possible. E.g., at the extreme, a "hogwild" update scheme would use no synchronization at all, or (without potentially compromising correctness) a more sophisticated fork-join scheme could be used. Other approaches to parallelization are also possible, e.g. parallelizing updates across parameters rather than across examples (I don't know offhand which approach the original liblinear implementation opted for).
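
To make the fork-join idea concrete, here is a hypothetical sketch using java.util.concurrent.RecursiveTask: the example range is split recursively and the partial arrays are combined when subtasks join, so no explicit lock is needed. This is only an illustration of the idea, not code from either fork:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical fork-join sketch: split the example range, combine partial results on join.
class GradientTask extends RecursiveTask<double[]> {

    private static final int THRESHOLD = 1024; // below this, compute sequentially

    private final double[][] examples;
    private final int from, to, dim;

    GradientTask(double[][] examples, int from, int to, int dim) {
        this.examples = examples;
        this.from = from;
        this.to = to;
        this.dim = dim;
    }

    @Override
    protected double[] compute() {
        if (to - from <= THRESHOLD) {
            double[] partial = new double[dim];
            for (int i = from; i < to; i++) {
                for (int j = 0; j < dim; j++) {
                    partial[j] += examples[i][j]; // stand-in for the real per-example term
                }
            }
            return partial;
        }
        int mid = (from + to) >>> 1;
        GradientTask left = new GradientTask(examples, from, mid, dim);
        GradientTask right = new GradientTask(examples, mid, to, dim);
        left.fork();                       // run the left half asynchronously
        double[] rightResult = right.compute();
        double[] leftResult = left.join();
        for (int j = 0; j < dim; j++) {    // combine without any locking
            rightResult[j] += leftResult[j];
        }
        return rightResult;
    }

    static double[] gradient(double[][] examples, int dim) {
        return ForkJoinPool.commonPool().invoke(new GradientTask(examples, 0, examples.length, dim));
    }
}
```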
