Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tvmaze metadata lookup failures #654

Open
SteveErl opened this issue Oct 24, 2022 · 14 comments · May be fixed by #701
Open

Tvmaze metadata lookup failures #654

SteveErl opened this issue Oct 24, 2022 · 14 comments · May be fixed by #701
Assignees

Comments

@SteveErl
Copy link
Contributor

  • Platform: All

  • MythTV version: fixes/32

  • Package version:

  • Component: metadatalookup

What steps will reproduce the bug?

Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, that field was empty.

For the same show, the early contents of the database entries had 'airdate' populated, but no 'airtime'. I'm guessing that at the time the data was populated, the broadcast time had not been decided yet. The missing 'airtime' caused the specific episode lookup to fail.

How often does it reproduce? Is there a required condition?

It's rare, but when it happens, it causes metadata lookups to fail.

What is the expected behaviour?

The lookup scripts should be more flexible to handle these sorts of issues.
If the 'runtime' field is empty, try using the 'averageRuntime' field. If the 'averageRuntime' field is also empty, set a default runtime length. If the 'airtime' field is empty, look for an episode which matches the 'airdate' field.

What do you see instead?

Additional information

Here's an example of the original data for "Everything's Trash":

{"id":60518,
"url":"https://www.tvmaze.com/shows/60518/everythings-trash",
"name":"Everything's Trash",
"type":"Scripted",
"language":"English",
"genres":[],
"status":"Running",
"runtime":null,
"averageRuntime":30,
"premiered":"2022-07-13",
"ended":null,
"officialSite":null,
"schedule": {"time":"","days":[]},
"rating": {"average":null},
"weight":94,

@SteveErl
Copy link
Contributor Author

I have a solution for this bug. Once I figure out how to push my solution to my GitHub branch, the fix will be available.

SteveErl added a commit to SteveErl/mythtv that referenced this issue Oct 25, 2022
@rcrdnalor
Copy link
Contributor

Looking at your provided patch, this issue actually addresses four items:

  • Cleanup python imports of not needed modules
    No functional change.
  • Fall back on average runtime if the the episode's runtime is not provided
    I am not sure what improvement is behind this change. I prefer to not
    provide inaccurate/guessed data if no data are available, e.g.: the
    hardcoded 60 mins for a duration which is not provided by tvmaze.
  • Add robustness when fetching a non existent timestamp for an episode:
    Do not interrupt iteration of all found episodes matching a certain timestamp
    if one episode does not provide accurate data.
  • Workaround when the grabber tvmaze.py is not providing correct data for
    the timestamp of an aired episode: Use aired date instead of aired timestamp.
    IMO, this workaround only fits to your actual use-case: 'record one episode a day'.
    What happens if a user records multiple episodes a day and is using the
    'timestamp' method to fetch actual episode data and tvmaze does not - yet -
    provide current timestamps?
    I assume the user will get confused because he does not know that this kind
    of workaround exists.

Once you considered this feedback, please open a pull request against mythtv/master
on git which addresses above items in separately commits and explain the need for change.
See https://www.mythtv.org/wiki/Development_guide
and https://www.mythtv.org/wiki/Submitting_Bug_Fixes
for details.
Writing detailed pull requests against current mythtv/master allows other
developers and users to efficiently review them and incorporate the changes.

@SteveErl
Copy link
Contributor Author

SteveErl commented Nov 1, 2022

@rcrdnalor : Thanks for your feedback.

Bullet 2 - I agree that assigning a default of 60 minutes was an ugly hack. I'll rework the code to eliminate it.

Bullet 4 - It's true that some TV shows air multiple original broadcast episodes in one day, but that's rare. There are cases where the TvMaze database provides an airdate but doesn't provide a timestamp, but that's very rare. So, in the extremely rare case of both of these happening simultaneously, there would be multiple episodes which would be possible matches for the recording. There's no demand that there be only one episode per day. The script will have more than one 'matchesFound', and will print them all out. It's already designed to do this. If all you have is a date, and there are 5 episodes matching that date, then all 5 should be printed out.

If I follow your suggestion to break down my solution into smaller chunks, do I have to create a new Issue for each one? Or can there be four commits corresponding to one Issue, with each one adding a little bit more? Does each commit require its own pull request, or can 4 commits go in one pull request?

The Submitting Bug Fixes document says to "run the current unstable/development version" but that seems dangerous to me. I need my MythTv to run in a stable fashion every day, so I'd rather have my system based on fixes/32. How can you test your code on an unstable release? Do you have to run it on a different machine?
The Submitting Bug Fixes document also suggests using Trac, but isn't that deprecated now? I'm under the impression that the current process is to use GitHub to document bugs and features, but it's frustrating that I cannot see how to assign myself an issue which I've created.

@rcrdnalor
Copy link
Contributor

Writing pull requests should be always against git/master.
If you use *buntu then this may help to install master for testing:
https://github.com/MythTV/packaging/tree/master/deb-light
For the separation of the bullets, I think I will do item 1 and item 3, anyway.
Please focus on items 2 & 4.

Just a quick question:
You said, that in 'rare cases' tvmaze does not provide early data.
If you fetch the metadata for that cases the next or the other day, does it succeed?

@SteveErl
Copy link
Contributor Author

SteveErl commented Nov 2, 2022

@rcrdnalor : Could you please clarify the process? When you said "open a pull request against mythtv/master ... which addresses above items in separate commits", does that mean I should have 4 separate commits against one issue (this one), or would I need to create 4 separate issues, with one commit against each one?
I've started porting my changes to git/master. So far, I've implemented bullet 1. Should I commit that change with the explanation for the need for the change in the commit comment, or does the need for the change have to be documented in its own issue?
Should there be one pull request for all 4 bullet items, or 4 pull requests for the 4 bullets?

@SteveErl
Copy link
Contributor Author

SteveErl commented Nov 3, 2022

@rcrdnalor : In regards to your quick question, I first noticed missing 'runtime' and 'airtime' fields in the TvMaze database when processing an episode of a show called "Everything's Trash". At the broadcast time of the show (7/27/22), and 2 days later (7/29/22), the database had:

"id":2356087, ... "airdate":"2022-07-27", "airtime":"","airstamp":"2022-07-27T16:00:00+00:00","runtime":null

I think about a week later, when I checked again, it showed

"id":2356087,... "airdate":"2022-07-27", "airtime":"22:00","airstamp":"2022-07-28T02:00:00+00:00","runtime":30

It appears that someone updated the database between these fetch times. Since the metadata is fetched right after the recording has finished, an update a week later doesn't really help.

@SteveErl
Copy link
Contributor Author

SteveErl commented Nov 4, 2022

Updating the solution to split it into smaller commit components and to base it off of mythtv/master.

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022
…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 4, 2022
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 8, 2022
…ime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 9, 2022
…rageRuntime field. If both are missing, avoid a crash by testing for a non-NULL episode duration. MythTV#654"

This reverts commit 82bad85.

I've learned that I cannot push more than one commit from one branch.
@rcrdnalor
Copy link
Contributor

rcrdnalor commented Nov 12, 2022

Steve,
I see good progress here. Please keep in mind that the commit messages of your commits in your pull requests are empty.
Future reader of this source code will not see the motivation of the change and need to go down the hard way to identify the issue behind the pull request.
The text you wrote in the pull request message will not be included in the commit message when the pull request will be applied.

See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue
BTW, the keyword "Refs #654" should work as well within your commit message of your local branch.

Thank you for improving MythTV,
regards,
Roland

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 13, 2022
In rare cases, the TvMaze database provides null timestamp
or runtime data fields. To make the metadata retrieval script
more fault tolerant, null checks have been added for these values.

When the timestamp field is null, don't use it to generate a
datetime to use in a search for a matching episode.

When the runtime duration field is null, the delta size is
now set to zero minutes. The code which looks for an episode
match has been updated to separate "on time" recordings from
"a little late" recordings to accommodate the possible delta
size of zero.

Refs: MythTV#654
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 13, 2022
The script tvmaze/show.py had a line which retrieved the episode
duration from the metadata provider, but then assigned it to
num_episodes, which makes no sense. There is no code looking for
num_episodes, but there is code looking for runtime, so the line
has been fixed to assign to runtime.

Using the TVmaze service for metadata lookup, there are rare
cases where the 'runtime' field is empty, but 'averageRuntime'
is not. This happened when looking up metadata for a TV show
called "Everything's Trash". Note that the database entry for
this show was subsequently updated to include the missing
'runtime' field. But on the night it was recorded, and for
several days later, that field was null.

Here's an excerpt of the original metadata for "Everything's Trash":

"id":60518,
"url":"https://www.tvmaze.com/shows/60518/everythings-trash",
"name":"Everything's Trash",
"type":"Scripted",
"language":"English",
"genres":[],
"status":"Running",
"runtime":null,
"averageRuntime":30,

To handle this type of scenario, when the 'runtime' field is null,
read the duration from the 'averageRuntime' field. If both are null,
then avoid trying to cast that value to an integer in tvmaze.py.

Refs: MythTV#654
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 13, 2022
The tool pylint reported several unnecessary includes in
tvmaze.py. These have been removed to make the script run
a little bit faster.

Refs: MythTV#654
SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 13, 2022
The tool pylint reported several unnecessary includes in
tvmaze.py. These have been removed to make the script run
a little bit faster.

Refs: MythTV#654
@SteveErl
Copy link
Contributor Author

@rcrdnalor : I've amended commit messages and have generated 3 pull requests: #658, #662, #663. The fourth commit will be slightly dependent on the 3rd one getting merged to master. I'll have to wait for that to happen before proceeding with generation of that commit and the corresponding pull request.

SteveErl added a commit to SteveErl/mythtv that referenced this issue Nov 30, 2022
A recording of a broadcast of the first episode in
season 16 of "Criminal Minds" on CBS lead to a metadata
lookup failure. The general syntax for this metadata
lookup was:
    tvmaze.py -N "Criminal Minds" "2022-11-24 11:00:00"
Investigation revealed that for recent years this show is
normally only available on a streaming service called
Paramount+.
  https://api.tvmaze.com/shows/81
    ...
    "webChannel":{"id":107,"name":"Paramount+",
    "country":null,"officialSite":
    "https://www.paramountplus.com/"}
    ...

The null value for Country caused the Timezone
retrieval code in tvmaze.py to fail. The code has
been updated to handle a null Country field.

While regression testing, another crash appeared.
    tvmaze.py -N "The Masked Singer" "2022-11-24 19:00:00"
This show has distinct versions in many countries and
timezones.

country='United States',  timezone='America/New_York'
country='United Kingdom', timezone='Europe/London'
country='Australia',      timezone='Australia/Sydney'
country='Belgium',        timezone='Europe/Brussels'
country=None, show_tz =None  (Japanese version)
country='Finland',        timezone='Europe/Helsinki'
country='Germany',        timezone='Europe/Busingen'
country='Israel',         timezone='Asia/Jerusalem'
country='Mexico',         timezone='America/Monterrey'

The null Country field for the Japanese version is
handled fine, but passing 'Asia/Jerusalem' to
astimezone() caused it to throw an exception.

  UnboundLocalError: local variable 'ttmfmt'
  referenced before assignment

This appears to be a bug in python3. To prevent
it from crashing our script, UnboundLocalError
has been added to the exception catcher.

Some of the debug print messages have also been
updated to improve their format.

Refs: MythTV#654
@SteveErl
Copy link
Contributor Author

The UnboundLocalError is not necessarily a bug in python3. It seems to be in the Mythtv version of dt.py. Here's the stack trace when show_tz = 'Asia/Jerusalem':

Traceback (most recent call last):
  File "programs/scripts/metadata/Television/tvmaze.py", line 767, in <module>
    main()
  File "programs/scripts/metadata/Television/tvmaze.py", line 744, in main
    buildNumbers(args, opts)
  File "programs/scripts/metadata/Television/tvmaze.py", line 235, in buildNumbers
    dtInTgtZone = dtInLocalZone.astimezone(posixtzinfo(show_tz))
  File "/usr/lib/python3/dist-packages/MythTV/utility/singleton.py", line 49, in __call__
    inst = type.__call__(cls, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/MythTV/utility/dt.py", line 234, in __init__
    self._process(fd, version)
  File "/usr/lib/python3/dist-packages/MythTV/utility/dt.py", line 167, in _process
    t = unpack(ttmfmt, fd.read(calcsize(ttmfmt)))[0]
UnboundLocalError: local variable 'ttmfmt' referenced before assignment

@rcrdnalor
Copy link
Contributor

Looking at the definition:

$ file /usr/share/zoneinfo/Asia/Jerusalem 
/usr/share/zoneinfo/Asia/Jerusalem: timezone data (fat), version 3, 9 gmt time flags, 9 std time flags, no leap seconds, 149 transition times, 9 local time types, 21 abbreviation chars

the version 3 is not supported by

class posixtzinfo(basetzinfo, metaclass=InputSingleton):

Please open an issue against the python bindings.

@SteveErl
Copy link
Contributor Author

@rcrdnalor : Thanks. I've opened #672 for the issue. Should I change the comment to say UnboundLocalError is needed until version 3 of the timezone protocol is supported?

SteveErl added a commit to SteveErl/mythtv that referenced this issue Dec 3, 2022
The previous commit added an exception catch for
UnboundLocalError because regression testing
revealed a failure in handling the Asia/Jerusalem
timezone. This timezone uses the version 3 data
format. A new issue MythTV#672 was opened to ask for
support for version 3. Since that issue has now
been fixed, we no longer need to catch
UnboundLocalError.

Refs: MythTV#654
SteveErl added a commit to SteveErl/mythtv that referenced this issue Jan 11, 2023
When using tvmaze.py to retrieve Collection information, the
runtime information was not being displayed. This update adds
it to output. For example, the Collection information for
the TV show "Hogan's Heroes" is now:

/usr/share/mythtv/metadata/Television/tvmaze.py -C 1475
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Hogan's Heroes</title>
...
    <inetref>1475</inetref>
    <imdb>tt0058812</imdb>
    <collectionref>1475</collectionref>
    <language>en</language>
    <releasedate>1965-09-17</releasedate>
    <userrating>8.000000</userrating>
    <popularity>85.0</popularity>
    <year>1965</year>
    <runtime>30</runtime>
...
  </item>
</metadata>

Refs: MythTV#654
linuxdude42 pushed a commit that referenced this issue Jan 22, 2023
The tool pylint reported several unnecessary includes in
tvmaze.py. These have been removed to make the script run
a little bit faster.

Refs: #654
linuxdude42 pushed a commit that referenced this issue Jan 22, 2023
The script tvmaze/show.py had a line which retrieved the episode
duration from the metadata provider, but then assigned it to
num_episodes, which makes no sense. There is no code looking for
num_episodes, but there is code looking for runtime, so the line
has been fixed to assign to runtime.

Using the TVmaze service for metadata lookup, there are rare
cases where the 'runtime' field is empty, but 'averageRuntime'
is not. This happened when looking up metadata for a TV show
called "Everything's Trash". Note that the database entry for
this show was subsequently updated to include the missing
'runtime' field. But on the night it was recorded, and for
several days later, that field was null.

Here's an excerpt of the original metadata for "Everything's Trash":

"id":60518,
"url":"https://www.tvmaze.com/shows/60518/everythings-trash",
"name":"Everything's Trash",
"type":"Scripted",
"language":"English",
"genres":[],
"status":"Running",
"runtime":null,
"averageRuntime":30,

To handle this type of scenario, when the 'runtime' field is null,
read the duration from the 'averageRuntime' field. If both are null,
then avoid trying to cast that value to an integer in tvmaze.py.

Refs: #654
linuxdude42 pushed a commit that referenced this issue Jan 22, 2023
When using tvmaze.py to retrieve Collection information, the
runtime information was not being displayed. This update adds
it to output. For example, the Collection information for
the TV show "Hogan's Heroes" is now:

/usr/share/mythtv/metadata/Television/tvmaze.py -C 1475
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Hogan's Heroes</title>
...
    <inetref>1475</inetref>
    <imdb>tt0058812</imdb>
    <collectionref>1475</collectionref>
    <language>en</language>
    <releasedate>1965-09-17</releasedate>
    <userrating>8.000000</userrating>
    <popularity>85.0</popularity>
    <year>1965</year>
    <runtime>30</runtime>
...
  </item>
</metadata>

Refs: #654
linuxdude42 pushed a commit that referenced this issue Jan 22, 2023
In rare cases, the TvMaze database provides null timestamp
or runtime data fields. To make the metadata retrieval script
more fault tolerant, null checks have been added for these values.

When the timestamp field is null, don't use it to generate a
datetime to use in a search for a matching episode.

When the runtime duration field is null, the delta size is
now set to zero minutes. The code which looks for an episode
match has been updated to separate "on time" recordings from
"a little late" recordings to accommodate the possible delta
size of zero.

Refs: #654
linuxdude42 pushed a commit that referenced this issue Jan 22, 2023
A recording of a broadcast of the first episode in
season 16 of "Criminal Minds" on CBS lead to a metadata
lookup failure. The general syntax for this metadata
lookup was:
    tvmaze.py -N "Criminal Minds" "2022-11-24 11:00:00"
Investigation revealed that for recent years this show is
normally only available on a streaming service called
Paramount+.
  https://api.tvmaze.com/shows/81
    ...
    "webChannel":{"id":107,"name":"Paramount+",
    "country":null,"officialSite":
    "https://www.paramountplus.com/"}
    ...

The null value for Country caused the Timezone
retrieval code in tvmaze.py to fail. The code has
been updated to handle a null Country field.

While regression testing, another crash appeared.
    tvmaze.py -N "The Masked Singer" "2022-11-24 19:00:00"
This show has distinct versions in many countries and
timezones.

country='United States',  timezone='America/New_York'
country='United Kingdom', timezone='Europe/London'
country='Australia',      timezone='Australia/Sydney'
country='Belgium',        timezone='Europe/Brussels'
country=None, show_tz =None  (Japanese version)
country='Finland',        timezone='Europe/Helsinki'
country='Germany',        timezone='Europe/Busingen'
country='Israel',         timezone='Asia/Jerusalem'
country='Mexico',         timezone='America/Monterrey'

The null Country field for the Japanese version is
handled fine, but passing 'Asia/Jerusalem' to
astimezone() caused it to throw an exception.

  UnboundLocalError: local variable 'ttmfmt'
  referenced before assignment

This appears to be a bug in python3. To prevent
it from crashing our script, UnboundLocalError
has been added to the exception catcher.

Some of the debug print messages have also been
updated to improve their format.

Refs: #654
linuxdude42 pushed a commit that referenced this issue Jan 22, 2023
The previous commit added an exception catch for
UnboundLocalError because regression testing
revealed a failure in handling the Asia/Jerusalem
timezone. This timezone uses the version 3 data
format. A new issue #672 was opened to ask for
support for version 3. Since that issue has now
been fixed, we no longer need to catch
UnboundLocalError.

Refs: #654
@linuxdude42
Copy link
Contributor

Sorry, didn't meant to jump in here. I was looking at the list of pull requests, and didn't realize they were all connected to this issue. Hope I haven't caused a problem.

@SteveErl
Copy link
Contributor Author

@linuxdude42 : No apology necessary. I've been waiting a long time for any progress on getting these changes merged. Thank you very much!

SteveErl added a commit to SteveErl/mythtv that referenced this issue Jan 24, 2023
Sometimes, the TvMaze database only specifies a date
and not a time for an episode or set of episodes.
This is most common in webChannel originated shows.
In such a case, the 'Find Episode by Timestamp'
syntax, fails to find an exact or close match.
For example:
   tvmaze.py -N "Criminal Minds" "2022-11-24 21:00:00"

For this episode, in the TvMaze database
        airtime =
        airdate = 2022-11-24

With this code update, we detect when no airtime is
specified in the database, and apply a match-by-date
behavior.

The match-by-date results are only used as a last
resort. First we select exact timestamp matches.
If there are none of those, we select close
timestamp matches. If there are none of those,
then we'll select date only matches.

For example:

tvmaze.py -N "Criminal Minds" "2022-11-24 21:00:00"
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Criminal Minds</title>
    <subtitle>Just Getting Started</subtitle>
    ...
    <season>16</season>
    <episode>1</episode>
    <inetref>81</inetref>
    ...
  </item>
  <item>
    <title>Criminal Minds</title>
    <subtitle>Sicarius</subtitle>
    ...
    <season>16</season>
    <episode>2</episode>
    <inetref>81</inetref>
    ...
  </item>
</metadata>

Note that there were 2 episodes released on 2022-11-24,
so both episodes appear in the output results.

In addition, a display of 'runtime' has been added
for the -M invocation option. For example

tvmaze.py -M "Fire Country"
<?xml version='1.0' encoding='UTF-8'?>
<metadata>
  <item>
    <title>Fire Country</title>
    ...
    <inetref>60339</inetref>
    <collectionref>60339</collectionref>
    <language>en</language>
    <releasedate>2022-10-07</releasedate>
    <userrating>6.300000</userrating>
    <popularity>99.0</popularity>
    <year>2022</year>
    <runtime>60</runtime>
    ...
  </item>
</metadata>

Resolves MythTV#654
@SteveErl SteveErl linked a pull request Jan 24, 2023 that will close this issue
5 tasks
@rcrdnalor rcrdnalor self-assigned this Feb 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants