
Parallel processing repairing playlist #203

Merged
merged 2 commits into main from parallel-processing-repairing-playlist
Sep 25, 2023

Conversation

Borewit
Owner

@Borewit Borewit commented Sep 18, 2023

Changed playlist repair to a parallel process.
On a CPU with at least 3 cores, and on lengthy playlists, this should reduce the total processing time.
The playlist entries are no longer repaired in strict sequence.
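Conceptually, the change treats each broken playlist entry as an independent repair task, so the work can be spread across the available cores. A minimal sketch of the idea using a Java parallel stream (illustrative only; the class and method names are made up and this is not the actual listFix() code):

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelRepairSketch {

  // Hypothetical stand-in for a playlist entry whose file reference is broken.
  record BrokenEntry(int index, String fileName) {}

  // Placeholder for the real lookup against the media library.
  static String findExactMatch(BrokenEntry entry) {
    return "/media/library/" + entry.fileName();
  }

  public static void main(String[] args) {
    List<BrokenEntry> brokenEntries = List.of(
        new BrokenEntry(0, "track-a.mp3"),
        new BrokenEntry(1, "track-b.mp3"),
        new BrokenEntry(2, "track-c.mp3"));

    // Each entry is repaired independently, so the work is spread over the
    // available CPU cores. Results arrive out of order, which is why the
    // playlist is no longer repaired in strict sequence.
    Map<Integer, String> repaired = new ConcurrentHashMap<>();
    brokenEntries.parallelStream()
        .forEach(entry -> repaired.put(entry.index(), findExactMatch(entry)));

    repaired.forEach((index, path) ->
        System.out.println("Entry " + index + " -> " + path));
  }
}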

@Borewit Borewit self-assigned this Sep 18, 2023
@Borewit Borewit marked this pull request as draft September 18, 2023 18:35
@Borewit Borewit force-pushed the parallel-processing-repairing-playlist branch from 8008b2f to d0a7d30 on September 18, 2023 18:43
@Borewit Borewit force-pushed the parallel-processing-repairing-playlist branch from d0a7d30 to 4b65078 on September 22, 2023 13:58
Repository owner deleted a comment from github-actions bot Sep 22, 2023
Repository owner deleted a comment from github-actions bot Sep 22, 2023
@Borewit
Owner Author

Borewit commented Sep 22, 2023

@touwys, can you review this PR?

Expected result:

  • The repair process should run faster, but the gain depends strongly on hardware (the number of CPU cores).
  • Results in the playlist are updated out of sequence.

Please ignore the version number.

@Borewit Borewit marked this pull request as ready for review September 23, 2023 12:01
@touwys

touwys commented Sep 24, 2023

Expected result:

* The repair process should run faster, but the gain depends strongly on hardware (the number of CPU cores).

* Results in the playlist are updated out of sequence.

Tested on a circa 13-year-old Windows 7 x64 PC with a 4-core Intel Core i7-3770K CPU. These observations are informal impressions rather than precise measurements.

Numerous playlists were submitted for test repair, containing anywhere from 19 to 631 tracks.

Result:

  1. There is a striking ⚡ gain in speed during the first round of playlist repair, i.e. when finding the exact matches.

  2. These exact matches were indeed found and updated out of sequence.

  3. The speed gain in the second round of repairs for each playlist, i.e. finding the closest matches, was not as clearly visible as in the first round. Superficially it appears not to be on par with the search for exact matches, but presumably that is due to differences inherent between the two processes? For instance, are the searches for closest matches also carried out out of sequence? Even if the speed gain here is not as clearly established, the processing happens fast enough to be wholly acceptable.

@Borewit: Great job — once again. 🏆

@Borewit
Owner Author

Borewit commented Sep 24, 2023

I was looking at the closest-matches algorithm; it looks like Jeremy optimized the algorithm for your PC, @touwys:

// Only keep the top X highest-rated matches (default is 20), anything more than that has a good chance of using too much memory
// on systems w/ huge media libraries, too little RAM, or when fixing excessively large playlists (the things you have to worry
// about when people run your software on ancient PCs in Africa =])

🤣
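For context, the optimization that comment describes boils down to keeping only a bounded number of top-rated matches per entry, so memory stays flat no matter how large the media library is. A rough sketch of the idea (illustrative only; hypothetical names, not Jeremy's actual code):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopMatchesSketch {

  // Hypothetical pair of a candidate file and its match score.
  record ScoredMatch(String path, double score) {}

  // Keep only the 'limit' highest-rated matches to bound memory use.
  static List<ScoredMatch> topMatches(List<ScoredMatch> candidates, int limit) {
    // Min-heap ordered by score: the weakest retained match sits on top.
    PriorityQueue<ScoredMatch> heap =
        new PriorityQueue<>(Comparator.comparingDouble(ScoredMatch::score));
    for (ScoredMatch candidate : candidates) {
      heap.offer(candidate);
      if (heap.size() > limit) {
        heap.poll(); // drop the lowest-scoring match
      }
    }
    List<ScoredMatch> result = new ArrayList<>(heap);
    result.sort(Comparator.comparingDouble(ScoredMatch::score).reversed());
    return result;
  }

  public static void main(String[] args) {
    List<ScoredMatch> candidates = List.of(
        new ScoredMatch("a.mp3", 0.91),
        new ScoredMatch("b.mp3", 0.42),
        new ScoredMatch("c.mp3", 0.77));
    System.out.println(topMatches(candidates, 2)); // keeps a.mp3 and c.mp3
  }
}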

@Borewit
Owner Author

Borewit commented Sep 24, 2023

Despite the funny comment, I removed that optimization and replaced it with parallel processing as well. Looking forward to hearing whether the find-closest-match algorithm runs faster as well, @touwys.

@touwys

touwys commented Sep 24, 2023

Build run (#6290622796, attempt #1) artifacts

Is it ready to try out? I can only attend to it tomorrow.


Despite the funny comment, I removed that optimization and replaced it with parallel processing as well. Looking forward to hearing whether the find-closest-match algorithm runs faster as well.

I hope that the difference is as noticeable as with finding the exact matches. What really counts, though, is the accuracy of the matches rather than the speed with which they are delivered.

What I have seen so far is that the current search algorithm is actually quite accurate at finding matching tracks where the broken ones form part of various-artists compilations. Obviously, it has more data available to process in the file name of the broken track. So one could probably retain that strength and focus on refining the algorithm where it tries to locate tracks with very little data available in the filename.

@Borewit Borewit force-pushed the parallel-processing-repairing-playlist branch 2 times, most recently from cd83680 to 5711af9 on September 24, 2023 17:28
Repository owner deleted a comment from github-actions bot Sep 24, 2023
@Borewit Borewit force-pushed the parallel-processing-repairing-playlist branch from 5711af9 to ddc51bb on September 24, 2023 17:32
@Borewit
Owner Author

Borewit commented Sep 24, 2023

It's a good thing you didn't test yet. It was not the smartest location to apply parallel processing; I have adjusted the processing, keeping both the parallel processing and Jeremy's memory optimization for now.

For me the closest-match search runs very fast, so it is hard to notice a big difference.

Nothing has changed in this PR from a functional point of view, so the accuracy has not changed. On my Intel(R) Core(TM) i7-7800X CPU @ 3.50GHz, 6 cores (12 logical cores):

Before this PR:
Repaired playlist in 9491 ms.
Resolved closest matches in 433 ms.

After this PR:
Repaired playlist in 3512 ms.
Resolved closest matches in 391 ms.

@touwys

touwys commented Sep 25, 2023

Before this PR: Repaired playlist in 9491 ms. Resolved closest matches in 433 ms.

After this PR: Repaired playlist in 3512 ms. Resolved closest matches in 391 ms.

@Borewit :

The improvement in your results is clearly very impressive.

Even if the huge difference in hardware makes a straightforward comparison of the results quite impossible, I easily recognised the improvement in my own results. You've certainly done something special here.

To what extent do other variables, e.g. the number of tracks in the media database or the parsing of the track information by the search algorithm, play a role in improving the search speed?


It's a good thing you didn't test yet. It was not the smartest location to apply parallel processing; I have adjusted the processing, keeping both the parallel processing and Jeremy's memory optimization for now.

Which build should I then download to use next, or should I wait for another? Please indicate.


@Borewit
Owner Author

Borewit commented Sep 25, 2023

To what extent do other variables, e.g. the number of tracks in the media database or the parsing of the track information by the search algorithm, play a role in improving the search speed?

I expect no significant difference. For a background job that takes at least a few seconds, the advantage of parallel processing will likely be higher than for a relatively short job, as a result of the work being spread more evenly and the overhead being relatively smaller.
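To illustrate the point: the cost of splitting and scheduling the work is roughly constant, so parallel processing only pays off once each task does enough work. A toy comparison (illustrative only, not listFix() code; the exact numbers will differ per machine):

import java.util.function.IntUnaryOperator;
import java.util.stream.IntStream;

public class OverheadSketch {

  // Run the same workload sequentially and in parallel and report both timings.
  static void compare(String label, int tasks, IntUnaryOperator work) {
    long t0 = System.nanoTime();
    IntStream.range(0, tasks).map(work).sum();
    long sequential = System.nanoTime() - t0;

    long t1 = System.nanoTime();
    IntStream.range(0, tasks).parallel().map(work).sum();
    long parallel = System.nanoTime() - t1;

    System.out.printf("%s: sequential %d ms, parallel %d ms%n",
        label, sequential / 1_000_000, parallel / 1_000_000);
  }

  public static void main(String[] args) {
    // Trivial per-element work: little useful work to spread over the cores.
    compare("short tasks", 1_000_000, i -> i * 2);

    // Heavier per-element work: the benefit of the extra cores becomes clearer.
    compare("long tasks", 200, i -> {
      long sum = 0;
      for (int j = 0; j < 5_000_000; j++) {
        sum += j;
      }
      return (int) sum;
    });
  }
}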

I am not much in favor of gaining small advantages at the cost of more complex code, in line with "Premature Optimization is the Root of All Evil". The optimization done for the "users with an ancient computer" is probably an example of that; it is almost asking for trouble.

If you can, please re-test the latest build.

@touwys

touwys commented Sep 25, 2023

@Borewit :

listFix()-2.8.1.2 delivered some very interesting results.

Setup

  1. Windows 7 x64 PC with a 4-core Intel Core i7-3770K CPU.
  2. Two different processing speed tests were run with the same playlist.
  3. The playlist contains 631 fully broken mp3 tracks calling for matches.
  4. The tests were timed.
  5. For the first test, the Media Directory (MD) panel incorporated both an mp3 and a FLAC library.
  6. For the second test, the FLAC directory was removed from the MD.

Results

Test 1

Find Exact    Find Closest    Total
1m 7s         3m 41s          4m 48s

Test 2

Find Exact    Find Closest    Total
29s           1m 58s          2m 27s

CPU

[screenshot of CPU usage during the tests]

@Borewit Borewit merged commit a9d770a into main Sep 25, 2023
4 checks passed
@Borewit Borewit deleted the parallel-processing-repairing-playlist branch September 25, 2023 17:01