Skip to content

Fix: Spotify plugin unable to recognize Chinese and Japanese albums. #5705

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

dhruvravii
Copy link
Contributor

@dhruvravii dhruvravii commented Apr 3, 2025

Fixes for Issue #5699 (close #5699)

Have made changes to spotify.py (in lines 399 - 407) to allow Chinese and Japanese characters to be read. Previously these were converted to their ASCII values which led to them not being matched. Made changes to test_spotify.py to check parsing of Chinese and Japanese characters. The changes were tested and passed.

A file called japanese_track_request.json was made to mimic the Spotify API response since I don't have the credentials. Entries in that will need to be modified with the actual entries.

Let me know if any changes need to be made.

  • Documentation. No need for change in documentation.
  • Changelog. No need for changelog entry until review is completed.
  • Tests. The changes were tested and passed.

Copy link

github-actions bot commented Apr 3, 2025

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

@dhruvravii dhruvravii changed the title added modifications to spotify.py to allow reading of Japanese and Ch… Fix: Spotify plugin unable to recognize Chinese and Japanese albums. Apr 3, 2025
@semohr

This comment was marked as outdated.

@wisp3rwind
Copy link
Member

You have seen that there is already a PR for this? Also is there a reason to ascii decode the query? As far as I can see this is not required by spotify and should be handled closer to the input anyways.

Seems like both PRs were opened almost at the same time. (The other one is #5703.)

As it is done here, the unidecode indeed appears to be without effect, since it's only called if the query is ASCII-only anyway.

However, I'm wondering about the original reasoning for adding unidecode here. Is it still valid (for some languages)? Should there be a fallback to the unidecoded if the direct query yields no or only low-confidence results? Or maybe always query for both, and the autotagger sort it out?

Maybe you could check the git blame and pull requests related to this plugin to see whether there's any discussion of the need for the ASCII conversion?

@semohr
Copy link
Contributor

semohr commented Apr 8, 2025

Was added in a commit in part of the initial spotify plugin implementation. The reason for adding is explained in a comment, I don't think it is up to date anymore. We would need to test if spotify still has issues with ascii.

refs:

Addition:
Interestingly, the example from the comment still is handled the same way. I.e. artist:deadmau5 album:4×4 doesn't return this album, but artist:deadmau5 album:4x4 does.
We need some decision here, I don't think there is an easy way to support both. I would argue, the user is responsible to normalize their input. E.g. do not use × if x was wanted... Just converting all input seems wrong to me, i.e. it is hard to understand for end users why things are converted. E.g. I'm not fluent in ascii and neither is spotify ;)

@semohr
Copy link
Contributor

semohr commented Apr 8, 2025

I think #5189 and maybe #4514 could be related.

@semohr semohr added the spotify label May 12, 2025
@semohr
Copy link
Contributor

semohr commented May 19, 2025

As there wasn't any commit from @dhruvravii for some time on this. I just added a config option to enable the legacy behavior and adjusted the tests to check for ascii encoding.

Hope we can get this going again.

@semohr
Copy link
Contributor

semohr commented Jun 1, 2025

@beetbox/maintainers Could we get another look and some feedback here? I think this is an easy win.

@semohr semohr force-pushed the chinese-japanese-chars-spotify-fix branch from d33337f to dde48a9 Compare June 18, 2025 12:29
@semohr semohr force-pushed the chinese-japanese-chars-spotify-fix branch from dde48a9 to aea30a1 Compare June 18, 2025 13:22
@JOJ0
Copy link
Member

JOJ0 commented Jun 20, 2025

@semohr just had a high level look on this and trying to think about what's best from a user's point of view. I often have diffculties finding things on spotify and yes the @mgoltzsche issue you found might also be related to all this.

Basically I think your idea of keeping the original "decode to asciii" strategy with a config option is good and should stay. But also my first thoughts went to something similar to what @wisp3rwind suggested initially: The idea of searching with the original unicode string and then falling back on bad (zero) results to the ascii strategy is worth considering! Would that be difficult to implement?

And last but not least: Does the original idea of the contributor make any sense? Only searching in ascii if all is ascii in the first place? Just throwing this idea in the mix again...

@semohr
Copy link
Contributor

semohr commented Jun 20, 2025

The idea of searching with the original unicode string and then falling back on bad (zero) results to the ascii strategy is worth considering! Would that be difficult to implement?

I think it would be rather difficult to reliable check. Getting zero result is quite unusual even for wrong queries. For instance 裘盛戎 - 盗御马 as ascii returns completely unrelated albums.
I guess it should be possible to re-fetch candidates if the previous fetch only return albums or tracks under a given threshold (maybe 60%?). While possible I think this could be quite difficult to implement as we would have to patch some core metadataplugin logic here.

And last but not least: Does the original idea of the contributor make any sense? Only searching in ascii if all is ascii in the first place? Just throwing this idea in the mix again...

I don't think this makes any difference, we send a string-query to spotify (spotify expects utf-8 afaik) and will treat any string identical. Whether we convert or query to a single byte representation (ascii) should therefore make no difference here. I'm not even sure if python treats ascii and utf-16 different...

Copy link
Member

@JOJ0 JOJ0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the essential code changes are only a few lines and depend on a config option this is a minimal change and the old behavior can be reverted / be experimented with if people want that. Tiny grammar correction within. Thanks for resurrecting and polishing this PR @semohr

Also I see the point now that it is difficult to decide if zero results is a bad result or if "random" results is worse. So let's keep it simple!

Co-authored-by: J0J0 Todos <2733783+JOJ0@users.noreply.github.com>
@semohr
Copy link
Contributor

semohr commented Jun 20, 2025

It seems like windows tests are broken at the moment. Will wait with the merge until they are fixed :)

@semohr semohr merged commit dd6cb53 into beetbox:master Jul 1, 2025
16 checks passed
5061726b6572 pushed a commit to 5061726b6572/beets that referenced this pull request Jul 9, 2025
…eetbox#5705)

Fixes an issue where each spotify query was converted to ascii before sending. Adds a 
new config option to enable legacy behaviour.

A file called japanese_track_request.json was made to mimic the Spotify
API response since I don't have the credentials. Entries in that will
need to be modified with the actual entries.

Co-authored-by: Sebastian Mohr <sebastian@mohrenclan.de>
Co-authored-by: Sebastian Mohr <39738318+semohr@users.noreply.github.com>
Co-authored-by: J0J0 Todos <2733783+JOJ0@users.noreply.github.com>
peterjdolan pushed a commit to peterjdolan/beets that referenced this pull request Jul 27, 2025
…eetbox#5705)

Fixes an issue where each spotify query was converted to ascii before sending. Adds a 
new config option to enable legacy behaviour.

A file called japanese_track_request.json was made to mimic the Spotify
API response since I don't have the credentials. Entries in that will
need to be modified with the actual entries.

Co-authored-by: Sebastian Mohr <sebastian@mohrenclan.de>
Co-authored-by: Sebastian Mohr <39738318+semohr@users.noreply.github.com>
Co-authored-by: J0J0 Todos <2733783+JOJ0@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spotify plugin unable to find Chinese/Japanese albums
4 participants