Skip to content

Move Distance to a dedicated module and refactor related tests #5800

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
May 31, 2025

Conversation

snejus
Copy link
Member

@snejus snejus commented May 25, 2025

This PR:

  1. Reorganizes distance-related code by moving it from hooks.py and match.py to a new dedicated distance.py module:

    • The actual implementation logic and algorithms remain unchanged - code is moved, not rewritten
    • Distance class, string distance functions, and track/album distance calculators are relocated intact
    • Only imports and function references are updated to maintain compatibility
    • current_metadata function is replaced with equivalent get_most_common_tags function for clarity
  2. Refactors the distance testing code from unittest to pytest:

    • Tests now use fixtures and parametrization while verifying the same functionality
    • The tested behaviors remain identical, just with improved test structure
      • Actually, distance.py coverage slightly increased since I included an additional test
  3. Adds a test for the sanitize_pairs function to complete config utility test coverage

This is primarily a code organization improvement that follows better separation of concerns, grouping related distance functionality in a single module without changing how the distance calculations work. No algorithm changes or behavior modifications were made to the core distance calculation code - it was simply moved to a more appropriate location.

@snejus snejus requested review from semohr, wisp3rwind and Copilot May 25, 2025 13:52
Copy link

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reorganizes distance‐related logic into a dedicated module and updates tests accordingly. Key changes include:

  • Moving functions and classes (Distance, string distance functions, track/album distance calculators) to distance.py.
  • Refactoring tests from unittest to pytest with improved structure and additional test coverage.
  • Replacing current_metadata with get_most_common_tags for clarity and consistency.

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
test/util/test_config.py Adds tests for sanitize_choices and sanitize_pairs
test/test_util.py Refactors plurality tests using pytest; note potential tuple issue
test/test_plugins.py Removes outdated unittest-based tests
test/autotag/test_distance.py Covers various aspects of distance calculations and distance methods
beetsplug/*.py Updates imports to use the new distance module
beets/util/config.py Introduces new configuration sanitization functions
beets/util/init.py Adds get_most_common_tags implementation
beets/plugins.py Removes local configuration cleaning functions from plugins
beets/importer/tasks.py Updates metadata extraction to use get_most_common_tags
beets/autotag/match.py Removes current_metadata in favor of get_most_common_tags, cleans diff
beets/autotag/hooks.py Removes legacy distance implementation
beets/autotag/init.py Removes Distance and current_metadata exports from public API

Copy link
Contributor

@semohr semohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me.

You renamed current_metadata to get_most_common_tags which will break external programs using this function. We need to keep this in mind when incrementing the version number. This should be mentioned it in the changelog imo.

@snejus
Copy link
Member Author

snejus commented May 26, 2025

A note on test durations

master

Poe => pytest -p no:cov -k 'AlbumDistanceTest or DistanceTest or HelpersTest or PluralityTest or StringDistanceTest or TrackDistanceTest' test/test_autotag.py test/test_plugins.py --durations=50
========================================== test session starts ===========================================
platform linux -- Python 3.9.20, pytest-8.3.5, pluggy-1.5.0
cachedir: /tmp/pytest_cache
rootdir: /home/sarunas/repo/beets
configfile: setup.cfg
plugins: anyio-4.9.0, xdist-3.6.1, requests-mock-1.12.1, flask-1.3.0
collected 121 items / 67 deselected / 54 selected                                                        

test/test_autotag.py .....................................................                        [53/54]
test/test_plugins.py .                                                                            [54/54]

========================================== slowest 50 durations ==========================================
0.10s call     test/test_autotag.py::DistanceTest::test_distance
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_per_medium_track_numbers
0.08s call     test/test_autotag.py::PluralityTest::test_current_metadata_likelies
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_comp_track_artists_match
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_comp_track_artists_do_not_match
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_comp_no_track_artists
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_global_artists_differ
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_two_medium_release
0.08s call     test/test_autotag.py::AlbumDistanceTest::test_identical_albums
0.07s call     test/test_autotag.py::AlbumDistanceTest::test_tracks_out_of_order
0.07s call     test/test_autotag.py::AlbumDistanceTest::test_incomplete_album
0.07s call     test/test_autotag.py::PluralityTest::test_albumartist_consensus
0.07s call     test/test_autotag.py::DistanceTest::test_items
0.07s call     test/test_autotag.py::TrackDistanceTest::test_different_artist
0.07s call     test/test_autotag.py::DistanceTest::test_add_string
0.07s call     test/test_autotag.py::PluralityTest::test_current_metadata_finds_pluralities
0.07s call     test/test_autotag.py::TrackDistanceTest::test_different_title
0.07s call     test/test_autotag.py::TrackDistanceTest::test_identical_tracks
0.07s call     test/test_autotag.py::PluralityTest::test_current_metadata_artist_consensus
0.07s call     test/test_autotag.py::DistanceTest::test_raw_distance
0.07s call     test/test_autotag.py::DistanceTest::test_operators
0.07s call     test/test_autotag.py::DistanceTest::test_add_ratio
0.07s call     test/test_autotag.py::DistanceTest::test_max_distance
0.07s call     test/test_autotag.py::PluralityTest::test_plurality_consensus
0.07s call     test/test_autotag.py::DistanceTest::test_add_number
0.07s call     test/test_autotag.py::DistanceTest::test_add_priority
0.07s call     test/test_autotag.py::DistanceTest::test_add
0.07s call     test/test_autotag.py::DistanceTest::test_add_equality
0.07s call     test/test_autotag.py::PluralityTest::test_plurality_empty_sequence_raises_error
0.07s call     test/test_autotag.py::DistanceTest::test_add_string_none
0.07s call     test/test_autotag.py::DistanceTest::test_add_expr
0.07s call     test/test_autotag.py::DistanceTest::test_add_string_both_none
0.07s call     test/test_autotag.py::TrackDistanceTest::test_various_artists_tolerated
0.07s call     test/test_autotag.py::PluralityTest::test_plurality_conflict
0.07s call     test/test_autotag.py::PluralityTest::test_plurality_near_consensus
0.07s call     test/test_autotag.py::DistanceTest::test_update

(14 durations < 0.005s hidden.  Use -vv to show these durations.)
=================================== 54 passed, 67 deselected in 4.49s ====================================

this branch

Poe => pytest -p no:cov -k 'TestAlbumDistance or TestCase or TestDistance or TestPlurality or TestStringDistance or TestTrackDistance or (test_sanitize and not UtilTest)' test/test_util.py test/util test/autotag/test_distance.py --durations=50
========================================== test session starts ===========================================
platform linux -- Python 3.9.20, pytest-8.3.5, pluggy-1.5.0
cachedir: /tmp/pytest_cache
rootdir: /home/sarunas/repo/beets
configfile: setup.cfg
plugins: anyio-4.9.0, xdist-3.6.1, requests-mock-1.12.1, flask-1.3.0
collected 128 items / 78 deselected / 50 selected                                                        

test/test_util.py .....                                                                           [ 5/50]
test/util/test_config.py ....                                                                     [ 9/50]
test/autotag/test_distance.py .........................................                           [50/50]

========================================== slowest 50 durations ==========================================
0.07s setup    test/autotag/test_distance.py::TestDistance::test_add

(49 durations < 0.005s hidden.  Use -vv to show these durations.)
=================================== 50 passed, 78 deselected in 1.70s ====================================

@snejus
Copy link
Member Author

snejus commented May 26, 2025

You renamed current_metadata to get_most_common_tags which will break external programs using this function. We need to keep this in mind when incrementing the version number. This should be mentioned it in the changelog imo.

Very good point!!! I see it's indeed used outside of beets:

$ gh search code autotag.current_metadata                                                                                                   beets 2.3.1 │ 🐍 3.9 🐶

Showing 3 of 3 results

beetbox/beets beets/importer/tasks.py
        likelies, consensus = autotag.current_metadata(self.items)

rembo10/headphones lib/beets/importer.py
        likelies, consensus = autotag.current_metadata(self.items)

pSpitzner/beets-flask backend/beets_flask/importer/states.py
        info, _ = autotag.current_metadata(items)

So good to have another pair of eyes looking at this! <3

Added both current_metadata and Distance to autotag.__init__ for backwards compat.

@snejus
Copy link
Member Author

snejus commented May 26, 2025

I had to bring back our patchy Distance imports in plugins.py to deal with the circular import issue.

Safe to say, I'm not delighted about it. Well, this way at least we will continue being aware of the issue!

Essentially, we shouldn't be importing plugins in distance.py.

@semohr
Copy link
Contributor

semohr commented May 26, 2025

I had to bring back our patchy Distance imports in plugins.py to deal with the circular import issue.

Safe to say, I'm not delighted about it. Well, this way at least we will continue being aware of the issue!

Essentially, we shouldn't be importing plugins in distance.py.

Maybe it is reasonable to move the plugins related code out of the distances.py as it is only used in the _add_candidate func anyways.

I.e. move this from distance.py to match.py:

# Plugins.
dist.update(plugins.album_distance(items, album_info, mapping))

@snejus
Copy link
Member Author

snejus commented May 26, 2025

Maybe it is reasonable to move the plugins related code out of the distances.py as it is only used in the _add_candidate func anyways.

I.e. move this from distance.py to match.py:

# Plugins.
dist.update(plugins.album_distance(items, album_info, mapping))

Same issue since match still sits under autotag. I don't think there's a need to involve yet another module in distance calculations.

I'm instead questioning the entire design of

  • plugins calculating distance
  • distance calculation requesting track/album distances from all plugins

@semohr
Copy link
Contributor

semohr commented May 26, 2025

Indeed seems a bit convoluted. Optimally and if possible, autotag should be completely independent of plugins. Not sure what is the right approach here, maybe a flowcharts might help?

Keep in mind that I also moved the distance function from plugins into a metadataplugin file in #5787. Might help to disentangle this a bit too.

Let's also keep the scope of this PR in mind, maybe we split this into multiple PRs and refactor further once we are more confident on how to proceed here.

@snejus
Copy link
Member Author

snejus commented May 26, 2025

Oh yeah I forgot to mention in my comment that I'm not going to fix it here to keep it within the scope 😅

@snejus
Copy link
Member Author

snejus commented May 31, 2025

Is this good to go by any chance @semohr ?

@semohr
Copy link
Contributor

semohr commented May 31, 2025

Is this good to go by any chance @semohr ?

Sure, feel free to merge it.

snejus added 2 commits May 31, 2025 19:17
This integration test failed because `chartlyrics.com` website is no
longer available, so I'm removing it.
@snejus snejus merged commit 87701fd into master May 31, 2025
20 checks passed
@snejus snejus deleted the move-distance branch May 31, 2025 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants