Closed Captions/Subtitling issues #326

Closed
kmdewaal opened this issue Mar 7, 2021 · 14 comments
Labels
bug:general General Bugs component:mythtv:playback Issues relating to playback version:master Master Development Branch version:v31-fixes fixes/31

Comments

@kmdewaal
Contributor

kmdewaal commented Mar 7, 2021

MythTV presentation of Closed Captions (subtitling) has issues, as reported on the forum in https://forum.mythtv.org/viewtopic.php?f=2&t=4340

Summary, as reported on the forum:
When playing "Buscando a Frida", a Telemundo show, the resulting text is virtually unusable. It contained maybe 20% of the text and often just parts of a word.
A video clip with a few minutes of this show, "frida.ts", is available via a shared link in the forum, and this clip has been used to reproduce the problem.

There have been two different problems identified:

  • The Subtitles menu
  • Rendering of the subtitles

Testing the video clip with VLC
When playing the clip with VLC the Subtitles tab has choices "Closed captions 1" to "Closed captions 4".
Choice 1 selects Spanish and choice 3 selects English.
Both subtitles are rendered correctly.

Testing the video clip with mythfrontend
When playing the clip with mythfrontend the playback menu Subtitles shows:
Enable Subtitles
Select ATSC CC >
Select VBI CC >

The "Select ATSC CC" shows:
Select ATSC CC
ATSC CC 1: English
ATSC CC 2: Undetermined
See attachment "screen_atsc_menu.png"

When "ATSC CC 1: English" is selected then the Spanish subtitle is shown, but largely broken/unusable.
See attachment "screen_atsc_cc1.png"

When "ATSC CC 2: Undetermined" is selected, the English subtitle is shown, but largely broken/unusable.
See attachment "screen_atsc_cc2.png"

The "Select VBI CC" shows:
Select VBI CC
CC 1: English
CC 3: Unknown
See attachment "screen_vbi_menu.png"

When "CC 1: English" is selected, the Spanish subtitle is shown correctly.
See attachment "screen_vbi_cc1.png".

When "CC 3:Unknown" is selected then no subtitles are shown.

  • Platform:
      • As reported on the forum: MythTV 30.0 on Ubuntu 18.04.4 LTS
      • Reproduced on: Fedora 33
  • MythTV version:
      • As reported on the forum: v30
      • Reproduced with today's master
  • Package version:
      • Master built from source
  • Component:
      • mythfrontend / playback

What is the expected behaviour?

The expected behaviour is:

  • Subtitle language choice as presented in the menu is the same as the language of the subtitle
  • Subtitle rendering is correct

Additional information

If the file "frida.ts" is not available anymore via the link on the forum
https://drive.google.com/drive/folders/1-YrIdeUGyPRdW5RwHpyn46Cyu2eGCER6?usp=sharing
then I can make a copy available on request.

Running mythfrontend with the "-v vbi" log option prints, among other information, both the Spanish and the English subtitle texts.
See attachment "mf-20210304-1815-vbi-frida.log".

@kmdewaal kmdewaal added bug:general General Bugs component:mythtv:playback Issues relating to playback version:master Master Development Branch version:v31-fixes fixes/31 labels Mar 7, 2021
@kmdewaal kmdewaal self-assigned this Mar 11, 2021
@kmdewaal
Contributor Author

The video stream does have a Caption Service Descriptor, and as I understand it, it says that there is one 608 and one 708 subtitle stream, both in English. This means that the menus are correct in the sense that they show what is specified in the descriptor. Given that VLC only shows CC1 to CC4 without a language indication, I think that the current MythTV implementation, which shows what is specified in the descriptor, is correct.
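
For reference, the language that ends up in the menu is whatever the broadcaster wrote into this descriptor. The sketch below decodes the descriptor body using the field layout of the ATSC A/65 caption_service_descriptor (tag 0x86) as I understand it: a 5-bit service count followed by 6-byte entries holding a 3-letter language code, a digital_cc flag (708 vs. 608) and either a 708 service number or a 608 line21_field bit. The struct and function names are hypothetical; this is not MythTV's own parser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One entry of an ATSC caption_service_descriptor (hypothetical type).
struct CaptionServiceEntry {
    std::string language;  // ISO 639-2 code as broadcast, e.g. "eng"
    bool digitalCc;        // true: CEA-708 service, false: line-21 (608)
    int serviceNumber;     // 708 caption_service_number (if digitalCc)
    int line21Field;       // 608 line21_field bit (if !digitalCc)
};

// Parses the descriptor body, i.e. the bytes after descriptor_tag and
// descriptor_length. Returns an empty vector if the body is truncated.
std::vector<CaptionServiceEntry> parseCaptionServices(const std::uint8_t* body,
                                                      std::size_t len) {
    std::vector<CaptionServiceEntry> out;
    if (len < 1)
        return out;
    const int count = body[0] & 0x1F;      // number_of_services: low 5 bits
    constexpr std::size_t kEntrySize = 6;  // 3 language bytes + 3 flag bytes
    if (1 + count * kEntrySize > len)
        return out;
    for (int i = 0; i < count; ++i) {
        const std::uint8_t* e = body + 1 + i * kEntrySize;
        CaptionServiceEntry s{};
        s.language.assign(reinterpret_cast<const char*>(e), 3);
        s.digitalCc = (e[3] & 0x80) != 0;   // top bit of the 4th byte
        if (s.digitalCc)
            s.serviceNumber = e[3] & 0x3F;  // 6-bit caption_service_number
        else
            s.line21Field = e[3] & 0x01;    // 1-bit line21_field
        // e[4], e[5]: easy_reader, wide_aspect_ratio and reserved bits
        out.push_back(s);
    }
    return out;
}
```

A descriptor like the one described above (one 608 and one 708 service, both tagged "eng") decodes to two English entries, regardless of the language actually spoken in the captions.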

The remaining issues are:

  • The second VBI/608 stream (CC3), in English, is not shown at all. The text is present in the stream and is decoded correctly by MythTV, as shown by the debug output, but it just does not appear on the screen.
  • For ATSC/708 both streams, Spanish and English, are displayed wrong. It appears that parts of the text strings are interpreted as control codes which renders the remaining pieces of text in strange colors and in wrong screen locations.

@mathog

mathog commented Mar 11, 2021

Possibly related bug: in at least one show, "Mallorca Files", using the "as reported" system from the first post in this issue, multi-line subtitles in CC1 act as if they see "cr" but not "lf". This is primarily an English-language show, with Spanish and sometimes other European languages. Text which should look like this:

Fred ran to the sea
Sally jumped
Who?

shows up as

Who?umpede sea

If somebody provides instructions for using ffmpeg or some other program to cut out a small piece of a ".ts" file so that it retains the subtitles I will post an example video. At the moment all I have is an entire show, which is too big to post.

@kmdewaal
Copy link
Contributor Author

You can use dd to copy a part of the recording as follows:
dd if=your_recording.ts of=clip_for_debugging.ts bs=1024k count=100
This copies the first 100 Mbytes of the file. One Mbyte is about one second of recording.
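
The one-Mbyte-per-second rule of thumb can be turned into dd block numbers for any time window; dd's standard skip= operand (counted in bs-sized blocks) selects where copying starts. The helper below is a hypothetical sketch that assumes a constant average byte rate, which real recordings do not have, so the result is only a starting point to check in a player.

```cpp
#include <cassert>
#include <cstdint>

// dd skip=/count= values in 1 MiB blocks, matching bs=1024k.
struct DdRange {
    std::uint64_t skipBlocks;
    std::uint64_t countBlocks;
};

// Hypothetical helper: estimates the blocks covering a time window,
// assuming the recording has a constant average rate of bytesPerSecond.
DdRange ddRangeFor(std::uint64_t startSeconds, std::uint64_t durationSeconds,
                   std::uint64_t bytesPerSecond) {
    assert(bytesPerSecond > 0);
    constexpr std::uint64_t kBlock = 1024 * 1024;  // bs=1024k
    const std::uint64_t startByte = startSeconds * bytesPerSecond;
    const std::uint64_t endByte = (startSeconds + durationSeconds) * bytesPerSecond;
    // Round the start down and the end up so the whole window is covered.
    const std::uint64_t skip = startByte / kBlock;
    const std::uint64_t end = (endByte + kBlock - 1) / kBlock;
    return {skip, end - skip};
}
```

At exactly 1 MiB per second, a window starting at 15 minutes and lasting 2 minutes gives skip=900 count=120.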

@mathog

mathog commented Mar 11, 2021

Is there a command to take a snippet out of the middle retaining the CC, like start at 15 minutes, duration 2 minutes?
I'm assuming there is some sort of header on the ts file, so that starting the dd at an offset may produce a broken video.

@kmdewaal
Contributor Author

The ts format does not have headers. Keep in mind that it is used for broadcast and you can switch on your TV at any moment.
Still counting in Mbytes, you can skip the first part with dd's skip operand. For example, skipping the first 500 Mbytes and then copying 100 Mbytes is done like this:
dd if=your_recording.ts of=clip_for_debugging.ts bs=1024k skip=500 count=100
You can check with VLC whether clip_for_debugging.ts is the piece that you want; the rule of thumb that one Mbyte is one second is not exact. With ffmpeg it should be possible to edit based on time, but I never use that.
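
The reason a headerless cut still plays: a transport stream is a sequence of 188-byte packets, each starting with the sync byte 0x47. Since 1 MiB is not a multiple of 188, a dd cut usually starts mid-packet, and a demuxer can recover by looking for a position where 0x47 repeats every 188 bytes. The function below is a hypothetical sketch of that resynchronisation idea, not code from MythTV or FFmpeg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kTsPacketSize = 188;
constexpr std::uint8_t kTsSyncByte = 0x47;

// Returns the offset of the first packet boundary confirmed by `checks`
// consecutive sync bytes spaced 188 bytes apart, or -1 if none is found.
long findTsSync(const std::vector<std::uint8_t>& buf, int checks = 3) {
    for (std::size_t start = 0; start < kTsPacketSize; ++start) {
        bool ok = true;
        for (int k = 0; k < checks; ++k) {
            const std::size_t pos = start + k * kTsPacketSize;
            if (pos >= buf.size() || buf[pos] != kTsSyncByte) {
                ok = false;  // not a packet boundary (or ran out of data)
                break;
            }
        }
        if (ok)
            return static_cast<long>(start);
    }
    return -1;
}
```

Requiring several sync bytes in a row avoids locking onto a payload byte that merely happens to be 0x47.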

@kmdewaal
Copy link
Contributor Author

I've found a bug in the code for the display of the CC3 captions in the "Select VBI CC" menu (the 608 captions).
The fix for this bug does fix the CC3 captions for both the "frida.ts" and the "loli.ts" clips.
In both of these clips the CC3 choice is displayed as "CC3: Unknown" but this actually does select the English subtitles.
If my understanding of the code is correct, then the "Select VBI CC" / "CC3: xxx" choice never shows anything for anybody.
As I cannot receive ATSC channels myself I would like to hear if this is really the case before I commit the fix.

@mathog

mathog commented Mar 12, 2021

This example from "Mallorca Files" contains an example of a multi-line caption which collapses to one line. I didn't write the exact text down, but it was something about "dressing like a Mallorcan". The clip is not long so it is easy to find this glitch.

clip_for_debugging.zip

@mathog

mathog commented Mar 12, 2021

> If my understanding of the code is correct, then the "Select VBI CC" / "CC3: xxx" choice never shows anything for anybody.
> As I cannot receive ATSC channels myself I would like to hear if this is really the case before I commit the fix.

On our MythTV 30.0 Ubuntu 18.04.4 LTS system selecting "VBI CC" / "CC3: unknown" is accepted but no subtitles at all appear. There are not even blank boxes, it is as if no CC at all is active.

There was one oddity. The first time I tried to test this, the video was paused and then the VBI CC menu was invoked. It did not list "CC3: unknown", only "CC1: English". However, when the menu was invoked while the video was running, it did show this option, and it showed it afterwards when the video was paused as well. This behavior was repeatable as long as some other video was played between attempts. Note that this was a Spanish-language recording from 2018, and while it said "CC1: English" the captions were actually in Spanish.

@kmdewaal
Contributor Author

> On our MythTV 30.0 Ubuntu 18.04.4 LTS system selecting "VBI CC" / "CC3: unknown" is accepted but no subtitles at all appear. There are not even blank boxes, it is as if no CC at all is active.

Thanks for reporting back. This is consistent with my tests on master (pre-v32) and I have a fix for this that I will commit to master.

> There was one oddity. The first time I tried to test this the video was paused and then the VBI CC menu was invoked. It did not list "CC3: unknown", only "CC1: English". However, when the menu was invoked while the video was running it did show this option, and it showed it afterwards when the video was paused as well.

Subtitle menus are built when subtitles are encountered in the stream, so it can take a short time before the menus are complete.
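
The lazily built menu can be pictured as a set of services that only grows as caption data arrives, which would explain why a menu opened on a paused video before any CC3 data was decoded lists CC1 only. The class below is a hypothetical model of that behaviour, not MythTV's actual menu code, and it omits the language labels.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a CC service gets a menu entry only after data
// for it has actually been seen in the stream.
class CcMenuModel {
public:
    void onCaptionData(int ccNumber) { seen_.insert(ccNumber); }

    std::vector<std::string> entries() const {
        std::vector<std::string> out;
        for (int n : seen_)
            out.push_back("CC " + std::to_string(n));
        return out;
    }

private:
    std::set<int> seen_;  // ordered, so entries come out as CC 1, CC 3, ...
};
```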

> while it said "CC1: English" the captions were actually in Spanish.

This is consistent with the frida.ts and loli.ts clips. There is metadata in the stream that says the language is English, so that is what MythTV reports. As I understand it, this is not a bug in MythTV but a consequence of how the broadcaster generates the stream.

The issues visible in the ATSC (CC708) subtitles in clip_for_debugging.ts look to me similar to the issues visible in frida.ts when the ATSC CC1 or CC2 subtitle is selected. It looks like part of the subtitle text is interpreted as position and color codes, which causes wrong positions, wrong colors and missing text on the screen.

kmdewaal added a commit that referenced this issue Mar 12, 2021
The CEA-608 closed captions can show two different subtitle streams
for two different languages, called CC1 and CC3.
When present, these streams can be selected in the subtitle menu.
Due to a bug only the CC1 stream is actually shown.
This is now fixed and also the CC3 stream can now be shown.

Refs #326
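
Background on why CC1 and CC3 are separate streams: in ATSC A/53 caption data, each 3-byte cc_data construct carries a cc_type, where 0 means CEA-608 field 1 (CC1/CC2), 1 means field 2 (CC3/CC4) and 2/3 mean CEA-708 packet data. Selecting CC3 therefore means following field-2 byte pairs. The helper below is a hypothetical sketch of that routing; it does not reproduce the MythTV bug or its fix, and it ignores the CC1/CC2 channel distinction within a field.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// cc_type values of an ATSC A/53 cc_data construct.
enum CcType : std::uint8_t {
    kNtscField1 = 0,   // CEA-608 field 1: CC1/CC2
    kNtscField2 = 1,   // CEA-608 field 2: CC3/CC4
    kDtvccData = 2,    // CEA-708 packet data
    kDtvccStart = 3,   // CEA-708 packet start
};

// CC1/CC2 travel in field 1, CC3/CC4 in field 2.
CcType fieldForService(int ccNumber) {
    return (ccNumber <= 2) ? kNtscField1 : kNtscField2;
}

// Collects the 608 byte pairs of the selected service's field from
// ccCount 3-byte constructs (hypothetical helper).
std::vector<std::pair<std::uint8_t, std::uint8_t>>
pairsForService(const std::uint8_t* data, std::size_t ccCount, int ccNumber) {
    const CcType wanted = fieldForService(ccNumber);
    std::vector<std::pair<std::uint8_t, std::uint8_t>> out;
    for (std::size_t i = 0; i < ccCount; ++i) {
        const std::uint8_t* cc = data + 3 * i;
        const bool valid = (cc[0] & 0x04) != 0;  // cc_valid bit
        const auto type = static_cast<CcType>(cc[0] & 0x03);
        if (valid && type == wanted)
            out.emplace_back(cc[1] & 0x7F, cc[2] & 0x7F);  // drop parity bit
    }
    return out;
}
```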
@kmdewaal
Contributor Author

@mathog Can you please provide a recording that has correct ATSC CC1 and CC2 subtitling? This would be useful for debugging and regression testing. If possible, it should be at least 100 Mbytes in size. With Gmail it can be mailed directly to me at klaas.de.waal@gmail.com

@mathog

mathog commented Mar 17, 2021

Example sent by email.

@kmdewaal
Copy link
Contributor Author

As reported, the ATSC CC fails on version 30 and it still fails on today's master.
However, with MythTV version 0.28, as found on Mythbuntu 16.04, the ATSC CC is OK. Some bisecting to do.....

kmdewaal added a commit that referenced this issue Mar 21, 2021
In avformatdecoder.cpp the closed caption packets are extracted from
the video stream and sent to the decoders for processing.
In commit 4880fe2 of Nov 19, 2016, a check
was introduced to prevent potential out-of-bounds memory access.
This check is not correct: it causes the last closed caption packet
in each buffer always to be discarded. This causes all sorts of
issues in the rendering of the closed captions.
The check on out-of-bounds memory access is now corrected.

Refs #326
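
The commit text does not show the original condition from 4880fe2, so the sketch below only illustrates one common way an out-of-bounds check produces exactly this symptom. A 3-byte construct starting at offset off is in bounds when off + 3 <= len; writing off + 3 < len looks just as safe but rejects a construct that ends exactly at the end of the buffer. The function names are hypothetical and this is not the MythTV code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CcConstruct = std::array<std::uint8_t, 3>;  // flags, cc_data_1, cc_data_2

// Correct check: the construct occupies buf[off] .. buf[off + 2].
std::vector<CcConstruct> extractCc(const std::uint8_t* buf, std::size_t len) {
    std::vector<CcConstruct> out;
    for (std::size_t off = 0; off + 3 <= len; off += 3)
        out.push_back({buf[off], buf[off + 1], buf[off + 2]});
    return out;
}

// Off-by-one check: whenever the buffer holds only whole constructs,
// the last one is silently dropped.
std::vector<CcConstruct> extractCcOffByOne(const std::uint8_t* buf, std::size_t len) {
    std::vector<CcConstruct> out;
    for (std::size_t off = 0; off + 3 < len; off += 3)
        out.push_back({buf[off], buf[off + 1], buf[off + 2]});
    return out;
}
```

Because cc_data normally contains a whole number of 3-byte constructs, a check of the second form loses the final construct of every buffer, which matches the "last packet always discarded" description.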
kmdewaal added a commit that referenced this issue Mar 22, 2021
In avformatdecoder.cpp the closed caption packets are extracted from
the video stream and sent to the decoders for processing.
In commit 4880fe2 of Nov 19, 2016, a check
was introduced to prevent potential out-of-bounds memory access.
This check is not correct: it causes the last closed caption packet
in each buffer always to be discarded. This causes all sorts of
issues in the rendering of the closed captions.
The check on out-of-bounds memory access is now corrected.

Refs #326

(cherry picked from commit 4528c70)
Signed-off-by: Klaas de Waal <kdewaal@mythtv.org>
kmdewaal added a commit that referenced this issue Mar 22, 2021
In avformatdecoder.cpp the closed caption packets are extracted from
the video stream and sent to the decoders for processing.
In commit 4880fe2 a check was introduced
to prevent potential out-of-bounds memory access.
This check is not correct: it causes the last closed caption packet
in each buffer always to be discarded. This causes all sorts of
issues in the rendering of the closed captions.
The check on out-of-bounds memory access is now corrected.

Refs #326

(copied from commit 4528c70)
kmdewaal added a commit that referenced this issue Mar 22, 2021
The CEA-608 closed captions can show two different subtitle streams
for two different languages, called CC1 and CC3.
When present, these streams can be selected in the subtitle menu.
Due to a bug only the CC1 stream is actually shown.
This is now fixed and also the CC3 stream can now be shown.

Refs #326

(cherry picked from commit 78edc37)
Signed-off-by: Klaas de Waal <kdewaal@mythtv.org>
@kmdewaal
Contributor Author

The fix for the CEA-708 ATSC CC, as committed to master, is now backported to v31 and v30.
The fix for the EIA-608 VBI CC, as committed to master, is now backported to v31.

kmdewaal added a commit that referenced this issue Mar 23, 2021
Differentiate between CR (Carriage Return) and HCR (Horizontal Carriage Return)
control codes as specified in CEA-708-D, Section 7.1.4, page 30.

Refs #326
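
As I read CEA-708, CR (0x0D) moves the pen to the start of the next row, while HCR (0x0E) moves it back to the start of the current row and erases that row's text. The sketch below models only that difference on a list of rows; the constant and function names are hypothetical, and real caption windows also have row limits, scrolling, justification and pen attributes, all omitted here.

```cpp
#include <cassert>
#include <string>
#include <vector>

constexpr char kCR = 0x0D;   // Carriage Return: pen to start of next row
constexpr char kHCR = 0x0E;  // Horizontal Carriage Return: pen to start of
                             // the current row, erasing that row's text

// Applies printable characters, CR and HCR to a window's rows.
std::vector<std::string> renderRows(const std::string& codes) {
    std::vector<std::string> rows(1);
    for (char c : codes) {
        if (c == kCR)
            rows.emplace_back();      // begin a new row (no scrolling modelled)
        else if (c == kHCR)
            rows.back().clear();      // back to column 0, row erased
        else
            rows.back().push_back(c);
    }
    return rows;
}
```

Treating one code as the other would either merge text that belongs on separate rows or leave stale text that should have been erased.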
kmdewaal added a commit that referenced this issue Mar 23, 2021
Add Caption Service Descriptor debug output.
No functional changes.

Refs #326
kmdewaal added a commit that referenced this issue Mar 23, 2021
The CEA-608 closed captions can show two different subtitle streams
for two different languages, called CC1 and CC3.
When present, these streams can be selected in the subtitle menu.
Due to a bug only the CC1 stream is actually shown.
This is now fixed and also the CC3 stream can now be shown.

Refs #326

(cherry picked from commit 78edc37)
@kmdewaal
Contributor Author

The fix for the EIA-608 VBI CC, as committed to master, is now also backported to v30.
This means that master, v31 and v30 are now completely fixed.
There is an additional fix applied to master for HCR (horizontal carriage return) handling, which is specified in CEA-708-D but was not implemented. It is not related to a known issue and hence it is not backported to v31 or v30.
The problems reported in this ticket are now fixed, so this ticket is now closed.
