Fixed .lower() error thrown on certain videos #836

Closed
wants to merge 4 commits into from


fernandog
Collaborator

From @Deathspike:

Created a feature branch and cherry-picked his commit:
#814

Fixed .lower() error thrown on certain videos due to format being a list. Or something. I don't actually understand anything that's going on here, but this change works on Py2+Py3 and seems to achieve my desired result - get subtitles even on those files.

@@ -232,7 +233,9 @@ def guess_matches(video, guess, partial=False):
     if video.resolution and 'screen_size' in guess and guess['screen_size'] == video.resolution:
         matches.add('resolution')
     # format
-    if video.format and 'format' in guess and guess['format'].lower() == video.format.lower():
+    if video.format and guess.get('format') \
+            and isinstance(guess['format'], str if sys.version_info[0] >= 3 else basestring) \
Collaborator Author

@Deathspike

________________________________ pyflakes-check ________________________________
/home/travis/build/Diaoul/subliminal/subliminal/subtitle.py:237: UndefinedName
undefined name 'basestring'
============== 1 failed, 403 passed, 17 skipped in 323.29 seconds ==============

Owner

Just check for list instead or stick with str.
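
For reference, one way to get a string type that works on both Python 2 and Python 3 without naming `basestring` directly is to look it up on the builtins module. This is only a sketch: `STRING_TYPES` and `is_text` are hypothetical names, not part of subliminal.

```python
try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

# basestring exists only on Python 2; fall back to str on Python 3.
STRING_TYPES = getattr(builtins, "basestring", str)


def is_text(value):
    """Return True when value is a string rather than, e.g., a list."""
    return isinstance(value, STRING_TYPES)
```

With this, the format check could test `is_text(guess['format'])` before calling `.lower()`.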

@fernandog
Collaborator Author

@Diaoul like this?

Collaborator

@ratoaq2 ratoaq2 left a comment

Checking whether it's a list instance is not a good approach, in my opinion.
I strongly believe the format value doesn't need to be lowercased before comparison, since its values are well known and predefined.

@Diaoul
Owner

Diaoul commented Nov 20, 2017

@ratoaq2 what would you suggest instead? I don't think lower is doing any harm here so best leave it for people that rename their videos to lower case or anything... Keep in mind that a video.format may come from other sources than guessit (refiners, media players, etc.)

@fernandog
Collaborator Author

@ratoaq2 ok so how to proceed?

@fernandog
Collaborator Author

fernandog commented Nov 25, 2017

@Diaoul I picked this fix from another PR. The premise is that if a video has multiple formats, it's a conflict in the guessit guess, which can lead to a false-positive format match. So with this PR, format will only match when it is not a list. What do you think about this premise?
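
To make the premise concrete, here is a minimal sketch of a "strict" comparison that refuses to match whenever the guessed format is not a single string. The function name `format_matches_strict` is hypothetical and the code targets Python 3 only; it is not the code in this PR.

```python
def format_matches_strict(video_format, guessed_format):
    """Match only when the guess is a single string equal to the video format."""
    # A list (or None) is treated as an unreliable guess and never matches.
    if not video_format or not isinstance(guessed_format, str):
        return False
    return guessed_format.lower() == video_format.lower()
```

Under this rule a guess such as `['Telecine', 'BluRay']` never yields a format match, even if one of the entries is correct.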

@ratoaq2
Collaborator

ratoaq2 commented Dec 1, 2017

In my opinion all different format sources should go through guessit somehow.

The format needs to be parsed. A simple lowercase is not enough:

Bluray, Blu-ray, bd5, bd9, bd25
Web, webdl, webcap
TS, telesync, tele-sync
Tc, telecam
Cam, camera
Videots, dvd9, dvdr, dvd
Uhdtv, UltraHDTV
Dsr, dth, sat, satellite

In the next major guessit version, format will become source and we'll have a more refined, predefined list of formats. The RIP part will move to another field: guessit-io/guessit#452

My point here is: if we keep format in the video object consistent with guessit then we can use simple equals and also profit from better matches since we rely on guessit parsing

@Diaoul
Owner

Diaoul commented Dec 1, 2017

The format may not always come from guessit. I'm OK to use the same standards as guessit however this should be documented. Currently most of the information is extracted from the filename and that's usually a good source of information.
In some cases, it can be useful to rely on other sources (media centers, downloaders, sonarr, radarr and friends) and for that we need a well defined list of possible values for those attributes.

Also, in the context of subtitles, we may not need such detailed information. Maybe some families can be defined, e.g. web, bluray, etc., as we can expect subtitles from various web sources to match, yet we may not care whether it is bluray-this or bluray-that.
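
A rough sketch of what such families could look like follows. The table `SOURCE_FAMILIES` and the function `same_family` are hypothetical, and the aliases listed are illustrative examples rather than an agreed list.

```python
# Hypothetical alias -> family table; the entries are examples only.
SOURCE_FAMILIES = {
    "bluray": "bluray",
    "brrip": "bluray",
    "bdrip": "bluray",
    "web": "web",
    "webdl": "web",
    "webrip": "web",
}


def same_family(format_a, format_b):
    """Return True when both formats are known and belong to the same family."""
    family_a = SOURCE_FAMILIES.get(format_a.lower())
    family_b = SOURCE_FAMILIES.get(format_b.lower())
    # Unknown formats never match, so a typo cannot produce a false positive.
    return family_a is not None and family_a == family_b
```

For example, `same_family('WEBRip', 'webdl')` is true under this table, while `same_family('webdl', 'bdrip')` is not.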

@Diaoul
Owner

Diaoul commented Dec 1, 2017

Now, if it is possible for guessit to have something like normalize_source('WeB--DL.C') that returns the normalized version "webdl", that would be awesome, because it means we can apply this to external sources (both providers and media information sources).
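
Purely as an illustration of the idea, and not as anything guessit provides, such a function could be sketched like this. `KNOWN_SOURCES` is a made-up table, and the "longest leading run of tokens" rule is an assumption chosen so that the example input above yields "webdl".

```python
import re

# Hypothetical set of normalized source names; not guessit's actual list.
KNOWN_SOURCES = {"web", "webdl", "webrip", "bluray", "hdtv", "dvd"}


def normalize_source(raw):
    """Return a normalized source name for raw, or None if it is unknown."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = [t for t in re.split(r"[^0-9a-z]+", raw.lower()) if t]
    # Try the longest run of leading tokens first, then shorter ones.
    for end in range(len(tokens), 0, -1):
        candidate = "".join(tokens[:end])
        if candidate in KNOWN_SOURCES:
            return candidate
    return None
```

The same function could then be applied to values coming from providers or media centers before comparing them.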

@fernandog
Collaborator Author

@Diaoul @ratoaq2 what about what I said? The premise is that if a video has multiple formats, it's a conflict in the guessit guess: a correct format will always be a single value. If it isn't, we shouldn't match it.

@ratoaq2
Collaborator

ratoaq2 commented Dec 1, 2017

@Diaoul That is what I meant.

Bluray or brrip or bdrip will all be bluray for guessit.
Web, webdl, webrip will all be Web for guessit.

GuessIt will normalize the source and subliminal will only profit from it.

@fernandog if we go down the path to have a consistent predefined source list, then you don't need to manipulate the value and can just do a simple comparison

@fernandog
Collaborator Author

fernandog commented Dec 1, 2017

@ratoaq2 but can we assume that multiple formats always mean a conflict in guessit?

@Diaoul
Owner

Diaoul commented Dec 1, 2017

I think the question here is: can a file have multiple sources for valid reasons?

@fernandog
Collaborator Author

fernandog commented Dec 1, 2017

@Diaoul to the best of my knowledge, no, it can't.

@duramato is our specialist on this and helped rato on guessit. He can confirm.

@Diaoul
Owner

Diaoul commented Dec 1, 2017

What about release packs? Or subtitles that are compatible with multiple formats?

@duramato

duramato commented Dec 1, 2017

What about release packs? Or subtitles that are compatible with multiple formats?

Yes both of those can have multiple formats

@fernandog
Collaborator Author

OK. So one case is a filename that really has multiple formats; the other is a conflict in guessit that results in a multi-format guess. Most of the time it is the second one. So how do we differentiate with high confidence?

@fernandog
Collaborator Author

fernandog commented Dec 1, 2017

Talking with @duramato he came up with this example: Show.s01.720p.hdtv.web/s01e01.mkv

guessit won't find a format in the filename and it will use the folder. The folder will result in multiple formats.

IMHO it's safer not to match a format, because we don't know for sure which format is the correct one. It's safer to skip the format match than to download a subtitle with a high chance of being wrong or out of sync.

@ratoaq2
Collaborator

ratoaq2 commented Dec 1, 2017

If you use guessit to create a video object from a filename and also use guessit to create a subtitle object and both have the very same name and both get multiple formats, format should match.
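
As a sketch of this alternative, the comparison could treat both guesses as sets, so that two identical multi-format guesses still match. The helper names are hypothetical, the guesses are plain dicts standing in for guessit output, and the code assumes Python 3.

```python
def as_format_set(value):
    """Normalize a guessed format (None, str or list) to a frozenset of lowercase names."""
    if value is None:
        return frozenset()
    if isinstance(value, str):
        value = [value]
    return frozenset(item.lower() for item in value)


def formats_equal(video_guess, subtitle_guess):
    """Match when both guesses carry exactly the same non-empty set of formats."""
    video_formats = as_format_set(video_guess.get("format"))
    subtitle_formats = as_format_set(subtitle_guess.get("format"))
    return bool(video_formats) and video_formats == subtitle_formats
```

With this rule, two guesses that both contain `['HDTV', 'WEB-DL']` match, whereas a multi-format guess against a single-format one does not.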

@Diaoul
Owner

Diaoul commented Dec 1, 2017

Except sometimes providers don't just give the subtitle filename, you have to assemble bits of the filename from the subtitle page on the website.

@fernandog
Collaborator Author

@ratoaq2 not necessarily. The subtitle may have a different name, for example on Addic7ed.

@ratoaq2
Collaborator

ratoaq2 commented Dec 1, 2017

I'm not saying that happens for all providers and all cases. It happens for certain providers in certain cases. In those cases you might have a perfect release name match, but because both got multiple formats, you're discarding a valid format match even though they are equal.

I'm just exploring the consequences of the proposed solution.

In my opinion, it would still be better for the format in the video object and the subtitle object to always come from guessit. If guessit should be enhanced to better allow that, that's fine. It's also fine to have an interim solution if you guys think so.

@ratoaq2
Collaborator

ratoaq2 commented Dec 1, 2017

And as of now
guessit('bluray')

will just detect the format properly.

normalize_source('WeB--DL.C')
would be
guessit('WeB--DL.C').get('format')

@Diaoul
Owner

Diaoul commented Dec 1, 2017

I'd rather have a way to tell guessit that what I input is a format so it doesn't get confused. Is there a way to do that?

@ratoaq2
Collaborator

ratoaq2 commented Dec 2, 2017

@Diaoul I could implement something similar in guessit. One idea is to add two advanced options: includes and excludes where you can control which properties guessit should evaluate/consider.
If you define includes, only properties defined in the includes list will have their rules executed. If you define excludes, all properties defined in the excludes list will have their rules skipped. That way you can decide which parser rules guessit will use.

In our scenario, we want guessit to only consider source (former format) rules:

guessit --includes source 'WeB--DL.C'

GuessIt found: {
    "source": "Web", 
}

@Toilal What do you think about having includes/excludes properties? We can configure disabled for each rebulk rule:

def source():
    rebulk = Rebulk(disabled=lambda context: not is_enabled(context, 'source'))

That way we can give guessit users better control on what rules they want to be enabled or not.
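
The `is_enabled` helper used above is not shown in the thread. A minimal sketch of the includes/excludes semantics described here could look like the following; the layout of `context` as a dict with `includes` and `excludes` keys is an assumption, and this is not guessit's actual implementation.

```python
def is_enabled(context, name):
    """Decide whether the rules for property `name` should run.

    Assumed semantics: a non-empty `includes` list acts as a whitelist,
    a non-empty `excludes` list acts as a blacklist, and with neither
    option set every property stays enabled.
    """
    includes = context.get("includes") or []
    excludes = context.get("excludes") or []
    if includes and name not in includes:
        return False
    if name in excludes:
        return False
    return True
```

Under these assumptions, `is_enabled({'includes': ['source']}, 'title')` is false, so only the source rules would run for an input such as 'WeB--DL.C'.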

@fernandog
Collaborator Author

I don't get how that will fix format being a list or a string. Also, guessit can still have a conflict, finding more than one source/format in a given filename (or folder + filename).

I'm talking about an earlier step: whether we should match format in subliminal at all when multiple formats are available (wrong or correct ones).

@fernandog
Collaborator Author

@ratoaq2 now that you merged the guessit PR for enabling properties, what needs to be done in subliminal?

@fernandog fernandog mentioned this pull request Jan 31, 2018
@dyve

dyve commented Feb 26, 2018

Any chance of a PyPI release with this fix?

@h3llrais3r
Contributor

@fernandog Any progress on this? Can this PR be merged to fix the list vs str issue of the guessed format?

@fernandog
Collaborator Author

@ratoaq2 is going to do a PR soon using the new, unreleased guessit 3.0 to fix this.

@hpsbranco
Contributor

hpsbranco commented Apr 6, 2018

The 'title' field also throws an error with certain videos, e.g., 'Rick.and.Morty.S03E04.The.Return.of.Worldender.1080p.AMZN.WEB-DL.DD5.1.H.264-QOQ.mkv'.

The provider (LegendasTV) returns a subtitle with the path

'Rick.and.Morty.S03.1080p.WEBRip-WEB-DL/Amazon.WEB-DL-QOQ/Rick.and.Morty.S03E04.Vindicators.3.The.Return.of.Worldender.1080p.Amazon.WEB-DL.DD+5.1.H.264-QOQ.srt',

which gives the following guess:

GuessIt found: {
    "season": 3,
    "screen_size": "1080p",
    "format": "WEB-DL",
    "title": [
        "Amazon",
        "Rick and Morty"
    ],
    "release_group": "QOQ",
... other stuff, guessed right
}

When guess_matches calls sanitize on title, we get the same 'expected string or buffer', this time from the re module.

The fix is easy (I found the if/elif pattern better for readability):

if video.series and 'title' in guess:
    series_guess = guess['title']
    if isinstance(series_guess, six.string_types) and sanitize(series_guess) == sanitize(video.series):
        matches.add('series')
    elif isinstance(series_guess, list) and any(sanitize(t) == sanitize(video.series) for t in series_guess):
        matches.add('series')

Similar code can be used to fix the format problem.
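
Since the same string-or-list check applies to more than one field, it could also be factored into a small helper. This is only a sketch for Python 3: `guess_contains` is a hypothetical name, and the default lowercase normalizer merely stands in for subliminal's `sanitize`.

```python
def guess_contains(guess_value, expected, normalize=lambda text: text.lower()):
    """Return True if `expected` equals the guess, or any entry of a list guess."""
    if isinstance(guess_value, str):
        candidates = [guess_value]
    elif isinstance(guess_value, (list, tuple)):
        candidates = guess_value
    else:
        # None or an unexpected type: nothing to compare against.
        return False
    target = normalize(expected)
    return any(normalize(candidate) == target for candidate in candidates)
```

A call such as `guess_contains(guess.get('title'), video.series, sanitize)` would then cover both the single-title and the multi-title case.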

@hpsbranco
Contributor

By the way, the 'format' .lower() error also happens for 'Captain.America.Civil.War.2016.1080p.BluRay.x264.DTS-HD.MA.7.1-FGT.mkv', using Opensubtitles. This provider returns 'Captain.America.Civil.WAR.2016.1080p.HD.TC.AC3.x264-ETRG.br', which gets guessed as:

GuessIt found: {
    "title": "Captain America Civil WAR",
    "year": 2016,
    "screen_size": "1080p",
    "other": "HD",
    "format": [
        "Telecine",
        "BluRay"
    ],
... correct stuff
}
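
The failure can be reproduced without guessit or subliminal by calling `.lower()` on a guess shaped like the one above. The dict below is a hand-written stand-in for the real guess.

```python
# Hand-written stand-in for the guess shown above.
guess = {"format": ["Telecine", "BluRay"]}

try:
    # This is what the original comparison effectively does.
    guess["format"].lower()
except AttributeError as exc:
    # Lists have no .lower() method, hence the reported error.
    error = str(exc)
```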

@gfjardim

gfjardim commented Apr 19, 2018

@hpsbranco, your code did the trick for the format issue:

if video.format and 'format' in guess:
    if isinstance(guess['format'], basestring) and guess['format'].lower() == video.format.lower():
        matches.add('format')
    elif isinstance(guess['format'], list) and any( fmt.lower() == video.format.lower() for fmt in guess['format']):
        matches.add('format')

@miigotu
Contributor

miigotu commented May 10, 2018

@gfjardim @hpsbranco basestring does not exist in py3, use six.string_types instead:

if video.format and 'format' in guess:
    if isinstance(guess['format'], six.string_types) and guess['format'].lower() == video.format.lower():
        matches.add('format')
    elif isinstance(guess['format'], list) and video.format.lower() in (fmt.lower() for fmt in guess['format']):
        matches.add('format')

I'm going to add this PR #836 to SR for now to alleviate the subtitle failures until you guys come up with a permanent solution.

miigotu added a commit to SickChill/sickchill that referenced this pull request May 10, 2018
…for a temporary fix to format.lower issue

Signed-off-by: miigotu <miigotu@gmail.com>
miigotu added a commit to SickChill/sickchill that referenced this pull request May 10, 2018
…for a temporary fix to format.lower issue

Fixes #4546
Signed-off-by: miigotu <miigotu@gmail.com>
@hpsbranco
Contributor

guessit 3 solves this problem. However, there would be some changes to subliminal API due to the new values it returns (https://github.com/guessit-io/guessit/blob/develop/docs/migration2to3.rst).

So, we can use a simple mapping to return the old values, while benefiting from the improvements made to guessit, or simply use the new values. Either way, it's easy to do.
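
As an illustration of the first option, a mapping layer could look roughly like this. The names `LEGACY_FORMATS` and `legacy_format` are hypothetical, and the table entries are assumptions for the sake of the example; the migration document linked above is the authority for the real guessit 3 values.

```python
# Assumed examples of guessit 3 "source" values and the older "format"
# values they would map back to. Verify against the migration document.
LEGACY_FORMATS = {
    "Web": "WEB-DL",
    "Blu-ray": "BluRay",
    "HDTV": "HDTV",
}


def legacy_format(guess):
    """Return an old-style format for a guessit 3 style guess dict, or None."""
    source = guess.get("source")
    if source is None or isinstance(source, list):
        # Missing or ambiguous source: do not pretend to know the format.
        return None
    # Values without a table entry are passed through unchanged.
    return LEGACY_FORMATS.get(source, source)
```

Keeping the table in one place would let subliminal switch to the new values later by dropping the mapping rather than touching every comparison.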

Let me know so I can submit a PR.

@ratoaq2
Collaborator

ratoaq2 commented May 26, 2018

I'm planning to create a PR here for guessit 3.0 upgrade with the changes that will also fix this. But I've been busy lately, I can't promise this week

@ratoaq2
Collaborator

ratoaq2 commented Feb 17, 2019

Already fixed by 3121a75

@ratoaq2 ratoaq2 closed this Feb 17, 2019
@ratoaq2 ratoaq2 deleted the feature/video_format branch February 17, 2019 16:14