Fixed `.lower()` error thrown on certain videos #836

fernandog · 2017-11-20T13:03:19Z

Created a feature branch and cherry-picked his commit:
#814

Fixed .lower() error thrown on certain videos due to format being a list. Or something. I don't actually understand anything that's going on here, but this change works on Py2+Py3 and seems to achieve my desired result - get subtitles even on those files.


fernandog · 2017-11-20T15:54:38Z

subliminal/subtitle.py

@@ -232,7 +233,9 @@ def guess_matches(video, guess, partial=False):
    if video.resolution and 'screen_size' in guess and guess['screen_size'] == video.resolution:
        matches.add('resolution')
    # format
-    if video.format and 'format' in guess and guess['format'].lower() == video.format.lower():
+    if video.format and guess.get('format') \
+            and isinstance(guess['format'], str if sys.version_info[0] >= 3 else basestring) \


@Deathspike

________________________________ pyflakes-check ________________________________
/home/travis/build/Diaoul/subliminal/subliminal/subtitle.py:237: UndefinedName
undefined name 'basestring'
============== 1 failed, 403 passed, 17 skipped in 323.29 seconds ==============

Just check for list instead or stick with str.

fernandog · 2017-11-20T16:16:42Z

@Diaoul like this?

ratoaq2

Checking whether it's a list instance is not a good approach, in my opinion.
I strongly believe that format value doesn't need lowercase prior to comparison since its values are well known and predefined.

Diaoul · 2017-11-20T16:23:58Z

@ratoaq2 what would you suggest instead? I don't think lower() is doing any harm here, so it's best to leave it for people who rename their videos to lower case or the like. Keep in mind that video.format may come from sources other than guessit (refiners, media players, etc.)

fernandog · 2017-11-20T16:24:10Z

@ratoaq2 OK, so how should we proceed?

fernandog · 2017-11-25T18:37:40Z

@Diaoul I picked this fix from another PR. The premise is that if a video has multiple formats, then it's a conflict in the guessit guess, which can lead to a false-positive format match. So with this PR, format only matches when it is not a list. What do you think about this premise?
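A minimal sketch of that premise, assuming a plain `dict` stands in for guessit's result and using a hypothetical helper name, `format_matches`: a list-valued format is treated as ambiguous and never produces a match, while a single string is compared case-insensitively (Python 3 only, so no `basestring`).

```python
def format_matches(video_format, guess):
    """Hypothetical helper: True only for an unambiguous, case-insensitive format match.

    `guess` is assumed to be a dict shaped like guessit's output, where
    'format' may be a single string or a list of conflicting candidates.
    """
    guessed = guess.get('format')
    if not video_format or not guessed:
        return False
    if isinstance(guessed, list):
        # Several formats were guessed: treat it as a conflict and skip the match
        return False
    return guessed.lower() == video_format.lower()


print(format_matches('HDTV', {'format': 'hdtv'}))                    # True
print(format_matches('BluRay', {'format': ['Telecine', 'BluRay']}))  # False
```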

ratoaq2 · 2017-12-01T09:48:12Z

In my opinion all different format sources should go through guessit somehow.

The format needs to be parsed. A simple lowercase is not enough:

Bluray, Blu-ray, bd5, bd9, bd25
Web, webdl, webcap
TS, telesync, tele-sync
Tc, telecam
Cam, camera
Videots, dvd9, dvdr, dvd
Uhdtv, UltraHDTV
Dsr, dth, sat, satellite
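To illustrate why lowercasing alone is not enough, here is a sketch of a hand-written alias table built from the variants listed above. The canonical keys (`bluray`, `web`, `ts` and so on) and the function name `normalize_format` are hypothetical and are not guessit's actual values.

```python
import re

# Hypothetical canonical names; the aliases are the spellings listed above.
FORMAT_ALIASES = {
    'bluray': ['bluray', 'blu-ray', 'bd5', 'bd9', 'bd25'],
    'web': ['web', 'webdl', 'webcap'],
    'ts': ['ts', 'telesync', 'tele-sync'],
    'tc': ['tc', 'telecam'],
    'cam': ['cam', 'camera'],
    'dvd': ['videots', 'dvd9', 'dvdr', 'dvd'],
    'uhdtv': ['uhdtv', 'ultrahdtv'],
    'satellite': ['dsr', 'dth', 'sat', 'satellite'],
}


def _squash(value):
    # Lowercase and drop spaces, dots, underscores and dashes,
    # so 'Blu-Ray', 'blu.ray' and 'BLURAY' all become 'bluray'
    return re.sub(r'[\s._-]+', '', value.lower())


# Reverse lookup: squashed alias -> canonical name
_LOOKUP = {_squash(alias): canonical
           for canonical, aliases in FORMAT_ALIASES.items()
           for alias in aliases}


def normalize_format(value):
    """Return the canonical format name for `value`, or None if it is unknown."""
    return _LOOKUP.get(_squash(value))


print(normalize_format('Blu-Ray'))    # bluray
print(normalize_format('Tele-Sync'))  # ts
print(normalize_format('VIDEO_TS'))   # dvd
```

Comparing `normalize_format(a) == normalize_format(b)` would replace the plain `.lower()` check; the drawback is that such a table has to be maintained by hand, which is the argument for delegating this parsing to guessit.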

In the next major guessit version, format will become source, and we'll have a more refined, predefined list of formats. The RIP part will move to another field: guessit-io/guessit#452

My point here is: if we keep the format in the video object consistent with guessit, then we can use a simple equality check and also profit from better matches, since we rely on guessit's parsing.

Diaoul · 2017-12-01T09:58:24Z

The format may not always come from guessit. I'm OK to use the same standards as guessit however this should be documented. Currently most of the information is extracted from the filename and that's usually a good source of information.
In some cases, it can be useful to rely on other sources (media centers, downloaders, sonarr, radarr and friends) and for that we need a well defined list of possible values for those attributes.

Also, in the context of subtitles, we may not need such detailed information. Maybe some families can be defined, e.g. web, bluray, etc., as we can expect subtitles from various web sources to match, yet we may not care whether it is bluray-this or bluray-that.
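A sketch of the "families" idea, assuming a hypothetical hand-written mapping from detailed format names (taken from examples elsewhere in this thread) to coarse families. Unknown formats fall back to their own lowercased name, so two identical unknown formats still match.

```python
# Hypothetical mapping from detailed format names to coarse families.
FORMAT_FAMILY = {
    'webdl': 'web',
    'webrip': 'web',
    'webcap': 'web',
    'bluray': 'bluray',
    'bdrip': 'bluray',
    'brrip': 'bluray',
}


def format_family(value):
    # Unknown formats form their own family (their lowercased name)
    key = value.lower()
    return FORMAT_FAMILY.get(key, key)


def same_family(video_format, subtitle_format):
    """True when both formats belong to the same family."""
    return format_family(video_format) == format_family(subtitle_format)


print(same_family('WEBRip', 'webdl'))  # True
print(same_family('BDRip', 'WEBRip'))  # False
```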

Diaoul · 2017-12-01T10:03:21Z

Now, if guessit could offer something like normalize_source('WeB--DL.C') that returns the normalized value "webdl", that would be awesome, because we could apply it to external sources (both providers and media information sources).

fernandog · 2017-12-01T11:30:30Z

@Diaoul @ratoaq2 what about what I said? The premise is that if a video has multiple formats, then it's a conflict in the guessit guess: a correct format will always be a single value. If not, we shouldn't match it.

ratoaq2 · 2017-12-01T12:04:19Z

@Diaoul That is what I meant.

Bluray or brrip or bdrip will all be bluray for guessit.
Web, webdl, webrip will all be Web for guessit.

GuessIt will normalize the source and subliminal will only profit from it.

@fernandog if we go down the path to have a consistent predefined source list, then you don't need to manipulate the value and can just do a simple comparison

fernandog · 2017-12-01T12:13:57Z

@ratoaq2 but can we assume that, in case of multiple formats, it is a conflict in guessit?

Diaoul · 2017-12-01T12:59:27Z

I think the question here is: can a file have multiple sources for valid reasons?

fernandog · 2017-12-01T13:15:15Z

@Diaoul to the best of my knowledge, no, it can't.

@duramato is our specialist on this (and helped rato with guessit). He can confirm.

Diaoul · 2017-12-01T13:17:26Z

What about release packs? Or subtitles that are compatible with multiple formats?

duramato · 2017-12-01T13:33:17Z

What about release packs? Or subtitles that are compatible with multiple formats?

Yes both of those can have multiple formats

fernandog · 2017-12-01T13:38:35Z

OK, so one thing is a filename having multiple formats; another is a conflict in guessit that results in a multi-format guess. Most of the time it's the second one. So how do we tell them apart with high confidence?

fernandog · 2017-12-01T13:54:19Z

Talking with @duramato he came up with this example: Show.s01.720p.hdtv.web/s01e01.mkv

guessit won't find a format in the filename and it will use the folder. The folder will result in multiple formats.

IMHO it's safer not to match a format, because we don't know for sure which format is the correct one. It's better to skip the format match than to download a subtitle with a high chance of being wrong or out of sync.

ratoaq2 · 2017-12-01T13:57:16Z

If you use guessit to create a video object from a filename and also use guessit to create a subtitle object and both have the very same name and both get multiple formats, format should match.
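One possible reading of this scenario, sketched by comparing the guessed formats as sets: two identical multi-format guesses still match, while a single format never matches a multi-format guess. The helper names `as_format_set` and `formats_equal` are hypothetical, and this is not what the PR implemented.

```python
def as_format_set(value):
    """Normalize a guessit-style format value (None, str or list) to a lowercase frozenset."""
    if not value:
        return frozenset()
    if isinstance(value, str):
        value = [value]
    return frozenset(v.lower() for v in value)


def formats_equal(video_format, subtitle_format):
    """Match only when both sides guessed exactly the same format(s)."""
    a, b = as_format_set(video_format), as_format_set(subtitle_format)
    return bool(a) and a == b


print(formats_equal(['HDTV', 'WEB-DL'], ['web-dl', 'hdtv']))  # True
print(formats_equal('BluRay', ['Telecine', 'BluRay']))        # False
```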

Diaoul · 2017-12-01T13:58:13Z

Except sometimes providers don't just give the subtitle filename, you have to assemble bits of the filename from the subtitle page on the website.

fernandog · 2017-12-01T13:58:20Z

@ratoaq2 not necessarily. The subtitle may have a different name, e.g. Addic7ed

ratoaq2 · 2017-12-01T16:44:43Z

I'm not saying that happens for all providers in all cases. It happens for certain providers in certain cases. In those cases you might have a perfect release-name match, but because both got multiple formats, you would discard a valid format match even though they are equal.

I'm just exploring the consequences of the proposed solution.

I still believe it would be better for the format in both the video object and the subtitle object to always come from guessit. If guessit should be enhanced to better allow that, that's fine. It's also fine to have an interim solution if you guys think so.

ratoaq2 · 2017-12-01T16:56:11Z

And as of now
guessit('bluray')

will just detect the format properly.

normalize_source('WeB--DL.C')
would be
guessit('WeB--DL.C').get('format')

Diaoul · 2017-12-01T17:14:08Z

I'd rather have a way to tell guessit that what I input is a format so it doesn't get confused. Is there a way to do that?

ratoaq2 · 2017-12-02T16:33:49Z

@Diaoul I could implement something similar in guessit. One idea is to add two advanced options: includes and excludes where you can control which properties guessit should evaluate/consider.
If you define includes, only properties defined in the includes list will have their rules executed. If you define excludes, all properties defined in the excludes list will have their rules skipped. That way you can decide which parser rules guessit will use.

In our scenario, we want guessit to only consider source (former format) rules:

guessit --includes source 'WeB--DL.C'

GuessIt found: {
    "source": "Web", 
}

@Toilal What do you think about having includes/excludes properties? We can configure disabled for each rebulk rule:

def source():
    rebulk = Rebulk(disabled=lambda context: not is_enabled(context, 'source'))

That way we can give guessit users better control on what rules they want to be enabled or not.

fernandog · 2017-12-02T17:36:31Z

I don't get how that will fix having format as a list or a string. Also, guessit can still hit a conflict and find more than one source/format in a given filename (or folder + filename).

I'm talking about an earlier step: whether subliminal should match format at all when multiple formats (wrong or correct) are available.

fernandog · 2017-12-22T10:54:02Z

@ratoaq2 now that you've merged the guessit PR for enabling properties, what needs to be done in subliminal?

dyve · 2018-02-26T09:34:06Z

Any chance of a PyPI release with this fix?

h3llrais3r · 2018-03-13T19:04:34Z

@fernandog Any progress on this? Can this PR be merged to fix the list vs str issue of the guessed format?

fernandog · 2018-03-13T19:56:00Z

@ratoaq2 is going to open a PR soon using the new, unreleased guessit 3.0 to fix this.

hpsbranco · 2018-04-06T15:41:05Z

The 'title' field also throws an error with certain videos, e.g., 'Rick.and.Morty.S03E04.The.Return.of.Worldender.1080p.AMZN.WEB-DL.DD5.1.H.264-QOQ.mkv'.

The provider (LegendasTV) returns a subtitle with the path

'Rick.and.Morty.S03.1080p.WEBRip-WEB-DL/Amazon.WEB-DL-QOQ/Rick.and.Morty.S03E04.Vindicators.3.The.Return.of.Worldender.1080p.Amazon.WEB-DL.DD+5.1.H.264-QOQ.srt',

which gives the following guess:

GuessIt found: {
    "season": 3,
    "screen_size": "1080p",
    "format": "WEB-DL",
    "title": [
        "Amazon",
        "Rick and Morty"
    ],
    "release_group": "QOQ",
... other stuff, guessed right
}

When guess_matches calls sanitize on the title, we get the same 'expected string or buffer' error, this time from the re module.

The fix is easy (I found the if/elif pattern better for readability):

if video.series and 'title' in guess:
    series_guess = guess['title']
    if isinstance(series_guess, six.string_types) and sanitize(series_guess) == sanitize(video.series):
        matches.add('series')
    elif isinstance(series_guess, list) and any(sanitize(t) == sanitize(video.series) for t in series_guess):
        matches.add('series')

Similar code can be used to fix the format problem.
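Generalizing the if/elif pattern above, here is a sketch of a single helper that accepts either a string or a list from guessit. `value_matches` is a hypothetical name; the normalizer is passed in (for example `sanitize` for titles or `str.lower` for formats), and only Python 3's `str` is checked (`six.string_types` would be needed to cover Python 2). Note that, unlike the "conflict means no match" premise, it accepts a match against any element of a list.

```python
def value_matches(guessed, expected, normalize):
    """True if `expected` matches the guessed value, or any element when guessit returned a list."""
    if not guessed or not expected:
        return False
    candidates = [guessed] if isinstance(guessed, str) else list(guessed)
    target = normalize(expected)
    return any(normalize(c) == target for c in candidates)


guess = {'title': ['Amazon', 'Rick and Morty'], 'format': 'WEB-DL'}
print(value_matches(guess['title'], 'Rick and Morty', str.lower))  # True
print(value_matches(guess['format'], 'web-dl', str.lower))         # True
```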

hpsbranco · 2018-04-06T15:54:52Z

By the way, the 'format' .lower() error also happens for 'Captain.America.Civil.War.2016.1080p.BluRay.x264.DTS-HD.MA.7.1-FGT.mkv', using Opensubtitles. This provider returns 'Captain.America.Civil.WAR.2016.1080p.HD.TC.AC3.x264-ETRG.br', which gets guessed as:

GuessIt found: {
    "title": "Captain America Civil WAR",
    "year": 2016,
    "screen_size": "1080p",
    "other": "HD",
    "format": [
        "Telecine",
        "BluRay"
    ],
... correct stuff
}

gfjardim · 2018-04-19T13:49:03Z

@hpsbranco, your code did the trick for the format issue:

if video.format and 'format' in guess:
    if isinstance(guess['format'], basestring) and guess['format'].lower() == video.format.lower():
        matches.add('format')
    elif isinstance(guess['format'], list) and any( fmt.lower() == video.format.lower() for fmt in guess['format']):
        matches.add('format')

miigotu · 2018-05-10T05:48:36Z

@gfjardim @hpsbranco basestring does not exist in py3, use six.string_types instead:

if video.format and 'format' in guess:
    if isinstance(guess['format'], six.string_types) and guess['format'].lower() == video.format.lower():
        matches.add('format')
    elif isinstance(guess['format'], list) and video.format.lower() in (fmt.lower() for fmt in guess['format']):
        matches.add('format')

I'm going to add this PR (#836) to SR for now to alleviate the subtitle failures until you guys come up with a permanent solution.


hpsbranco · 2018-05-24T14:05:36Z

guessit 3 solves this problem. However, there would be some changes to the subliminal API due to the new values it returns (https://github.com/guessit-io/guessit/blob/develop/docs/migration2to3.rst).

So, we can use a simple mapping to return the old values, while benefiting from the improvements made to guessit, or simply use the new values. Either way, it's easy to do.
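Whichever option is chosen, the code reading the guess has to cope with the property rename. A sketch, assuming guessit 3 reports the value under `source` (per the format-to-source rename mentioned earlier in the thread) while guessit 2 uses `format`; `guessed_sources` is a hypothetical name, and no mapping between old and new values is attempted here.

```python
def guessed_sources(guess):
    """Return the guessed source/format values as a list, whichever guessit version produced `guess`."""
    value = guess.get('source', guess.get('format'))
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


print(guessed_sources({'format': ['Telecine', 'BluRay']}))  # ['Telecine', 'BluRay']
print(guessed_sources({'source': 'Web'}))                   # ['Web']
```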

Let me know so I can submit a PR.

ratoaq2 · 2018-05-26T11:51:05Z

I'm planning to create a PR here for the guessit 3.0 upgrade, with changes that will also fix this. But I've been busy lately, so I can't promise it this week.

ratoaq2 · 2019-02-17T16:14:36Z

Already fixed by 3121a75

Fixed .lower() error thrown on certain videos due to format being a list

250fc9e

fernandog requested a review from ratoaq2 November 20, 2017 13:05

flake

7043054

fernandog force-pushed the feature/video_format branch from fea480f to 7043054 November 20, 2017 14:04


Review

509aa3e

ratoaq2 reviewed Nov 20, 2017


Add test to make sure multiple formats doesnt have a 'format' match

d7c8915

fernandog force-pushed the feature/video_format branch from ab9267d to d7c8915 November 20, 2017 17:38

ratoaq2 mentioned this pull request Dec 3, 2017

Proposal for enabling/disabling properties guessit-io/guessit#513

Merged


fernandog mentioned this pull request Jan 31, 2018

Quick fix #761 #769

Closed

h3llrais3r mentioned this pull request Mar 13, 2018

[Request] Telegram notifier? h3llrais3r/Auto-Subliminal#32

Closed

h3llrais3r mentioned this pull request May 2, 2018

Error while searching for subtitles for some episodes h3llrais3r/Auto-Subliminal#36

Closed

miigotu added a commit to SickChill/sickchill that referenced this pull request May 10, 2018

Update sublimina to latest develop, with added Diaoul/subliminal#836 for a temporary fix to format.lower issue Signed-off-by: miigotu <miigotu@gmail.com>

d6382c8

miigotu added a commit to SickChill/sickchill that referenced this pull request May 10, 2018

Update sublimina to latest develop, with added Diaoul/subliminal#836 for a temporary fix to format.lower issue Fixes #4546 Signed-off-by: miigotu <miigotu@gmail.com>

f7c9fb3

miigotu mentioned this pull request May 10, 2018

Update sublimina to latest develop, with added Diaoul/Subliminal#836 … SickChill/sickchill#4666

Merged


miigotu added a commit to SickChill/sickchill that referenced this pull request May 10, 2018

Update sublimina to latest develop, with added Diaoul/subliminal#836 for a temporary fix to format.lower issue Fixes #4546 Signed-off-by: miigotu <miigotu@gmail.com>

0d6cc76

Thilas mentioned this pull request May 26, 2018

Subliminal: Fixed .lower() error thrown on certain videos pymedusa/Medusa#4252

Closed

ratoaq2 closed this Feb 17, 2019

ratoaq2 deleted the feature/video_format branch February 17, 2019 16:14
