-
-
Notifications
You must be signed in to change notification settings - Fork 311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed .lower()
error thrown on certain videos
#836
Conversation
… a list. Or something. I don't actually understand anything that's going on here, but this change works on Py2+Py3 and seems to achieve my desired result - get subtitles even on those files.
fea480f
to
7043054
Compare
subliminal/subtitle.py
Outdated
@@ -232,7 +233,9 @@ def guess_matches(video, guess, partial=False): | |||
if video.resolution and 'screen_size' in guess and guess['screen_size'] == video.resolution: | |||
matches.add('resolution') | |||
# format | |||
if video.format and 'format' in guess and guess['format'].lower() == video.format.lower(): | |||
if video.format and guess.get('format') \ | |||
and isinstance(guess['format'], str if sys.version_info[0] >= 3 else basestring) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
________________________________ pyflakes-check ________________________________
/home/travis/build/Diaoul/subliminal/subliminal/subtitle.py:237: UndefinedName
undefined name 'basestring'
============== 1 failed, 403 passed, 17 skipped in 323.29 seconds ==============
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just check for list
instead or stick with str
.
@Diaoul like this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if it's a list instance is not a good approach in my opinion.
I strongly believe that format value doesn't need lowercase prior to comparison since its values are well known and predefined.
@ratoaq2 what would you suggest instead? I don't think lower is doing any harm here so best leave it for people that rename their videos to lower case or anything... Keep in mind that a |
@ratoaq2 ok so how to proceed? |
ab9267d
to
d7c8915
Compare
@Diaoul I picked this fix from another PR. The premiss is that if video has multiple formats, then it's a conflict in guessit guess, so it can lead to false positive |
In my opinion all different format sources should go through guessit somehow. The format needs to be parsed. A simple lowercase is not enough: Bluray, Blu-ray, bd5, bd9, bd25 Next major guessit version the format will become source and we'll have a more refined and predefined list of formats. And the RIP part will go away to another field: guessit-io/guessit#452 My point here is: if we keep format in the video object consistent with guessit then we can use simple equals and also profit from better matches since we rely on guessit parsing |
The format may not always come from guessit. I'm OK to use the same standards as guessit however this should be documented. Currently most of the information is extracted from the filename and that's usually a good source of information. Also, in the context of subtitles, we may not need the such detailed information. Maybe some families can be defined e.g. web, bluray, etc. as we can expect the subtitles from various web sources to match yet we may not care if this is bluray-this or bluray-that. |
Now if this is possible for guessit to have something like |
@Diaoul That is what I meant. Bluray or brrip or bdrip will all be bluray for guessit. GuessIt will normalize the source and subliminal will only profit from it. @fernandog if we go down the path to have a consistent predefined source list, then you don't need to manipulate the value and can just do a simple comparison |
@ratoaq2 but can we assume that iN case of multiple formats then is a conflict in guessit? |
I think the question here is: can a file have multiple sources for valid reasons? |
What about release packs? Or subtitles that are compatible with multiple formats? |
Yes both of those can have multiple formats |
ok. so one thing is filename have multiple formats, other thing is a conflict in guessit that results in multi-format guess. Most of time is the second one. So how to differentiate with high confidence? |
Talking with @duramato he came up with this example: guessit won't find a format in the filename and it will use the folder. The folder will result in multiple formats. IMHO its safer to not match a format because we don't know for sure which format is the correct one. Its safer to not match format instead of download a subtitle with high chances of being a wrong/out-of-sync one. |
If you use guessit to create a video object from a filename and also use guessit to create a subtitle object and both have the very same name and both get multiple formats, format should match. |
Except sometimes providers don't just give the subtitle filename, you have to assemble bits of the filename from the subtitle page on the website. |
@ratoaq2 not necessary. subtitle may have a different name, example Addi7ed |
I'm not saying that happens for all providers and all cases. It happens for certain providers for certain cases. For these cases you might have a perfect release name match, but because both got multiple formats, you're discarding a valid match format match, although they are equal. I'm just exploring the consequences of the proposed solution. In my opinion I still believe it will be better to have the format in the video object and the subtitle object to always come from guessit. If guessit should be enhanced to better allow that, that's fine. It's also fine to have a interim solution if you guys think so. |
And as of now will just detect the format properly.
|
I'd rather have a way to tell guessit that what I input is a format so it doesn't get confused. Is there a way to do that? |
@Diaoul I could implement something similar in guessit. One idea is to add two advanced options: In our scenario, we want guessit to only consider
@Toilal What do you think about having
That way we can give guessit users better control on what rules they want to be enabled or not. |
I don't get how that will fix having format as list or string. Also guessit still can have a conflict finding more than one source/format in a given filename (or folder + filename) I'm talking about a step earlier that if we should match format in subliminal when multiple formats are available (wrong/correct formats). |
@ratoaq2 as you merged the guessit enabling properties PR, what need to be done in subliminal now? |
Any chance of a PyPI release with this fix? |
@fernandog Any progress on this? Can this PR be merged to fix the |
@ratoaq2 Is going to do a PR soon using the new unreleased guessit 3.0 to fix this |
The 'title' field also throws an error with certain videos, e.g., 'Rick.and.Morty.S03E04.The.Return.of.Worldender.1080p.AMZN.WEB-DL.DD5.1.H.264-QOQ.mkv'. The provider (LegendasTV) returns a subtitle with the path 'Rick.and.Morty.S03.1080p.WEBRip-WEB-DL/Amazon.WEB-DL-QOQ/Rick.and.Morty.S03E04.Vindicators.3.The.Return.of.Worldender.1080p.Amazon.WEB-DL.DD+5.1.H.264-QOQ.srt', which gives the following guess:
When guess_matches calls sanitize on title, we get the same 'expected string or buffer', this time from the re module. The fix is easy (I found the if/elif pattern better for readability):
A similar code can be used to fix the format problem. |
By the way, the 'format' .lower() error also happens for 'Captain.America.Civil.War.2016.1080p.BluRay.x264.DTS-HD.MA.7.1-FGT.mkv', using Opensubtitles. This provider returns 'Captain.America.Civil.WAR.2016.1080p.HD.TC.AC3.x264-ETRG.br', which gets guessed as:
|
@hpsbranco, your code did the trick for the format issue:
|
@gfjardim @hpsbranco basestring does not exist in py3, use six.string_types instead: if video.format and 'format' in guess:
if isinstance(guess['format'], six.string_types) and guess['format'].lower() == video.format.lower():
matches.add('format')
elif isinstance(guess['format'], list) and video.format.lower() in (fmt.lower() for fmt in guess['format']):
matches.add('format') Im going to add this PR#836 to SR for now to alleviate the subtitle failures until you guys come up with a permanent solution. |
…for a temporary fix to format.lower issue Signed-off-by: miigotu <miigotu@gmail.com>
…for a temporary fix to format.lower issue
Fixes #4546
Signed-off-by: miigotu <miigotu@gmail.com>
…for a temporary fix to format.lower issue
Fixes #4546
Signed-off-by: miigotu <miigotu@gmail.com>
guessit 3 solves this problem. However, there would be some changes to subliminal API due to the new values it returns (https://github.com/guessit-io/guessit/blob/develop/docs/migration2to3.rst). So, we can use a simple mapping to return the old values, while benefiting from the improvements made to guessit, or simply use the new values. Either way, it's easy to do. Let me know so I can submit a PR. |
I'm planning to create a PR here for guessit 3.0 upgrade with the changes that will also fix this. But I've been busy lately, I can't promise this week |
Already fixed by 3121a75 |
From @Deathspike:
Create a feature branch and cherry-picked his commit:
#814
Fixed
.lower()
error thrown on certain videos due toformat
being a list. Or something. I don't actually understand anything that's going on here, but this change works on Py2+Py3 and seems to achieve my desired result - get subtitles even on those files.