Buggy ttml support #122

Closed
MikaYuoadas opened this issue Nov 18, 2014 · 5 comments

@MikaYuoadas
Contributor

The generated TTML files are not valid: they start and end with valid TTML (XML) headers, but the rest of the file is just regular SubRip format.

Here's a short example of the kind of file I get with ccextractor -out=smptett video.ts:

<?xml version="1.0" encoding="UTF-8" ?>
<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
<body>
<div>
1
00:00:48,280 --> 00:00:49,880
Mon pauvre, je suis désolée !

2
00:00:50,080 --> 00:00:50,720
Ca va ?

3
00:00:50,960 --> 00:00:51,920
T'as rien ?

</div></body></tt>

The input files I have are all .ts with embedded dvb_teletext subtitles.

@anshul1912
Contributor

Which version are you using? I am getting correct output.

I am using the git version from here.

@MikaYuoadas
Contributor Author

Same here, I'm on the latest commit (b95e06c).

I've started looking a bit at the code and it looks like this should only affect dvb_teletext subtitles.
The problem seems to be around line 632 in src/lib_ccx/telxcc.c: the switch only has a CCX_OF_TRANSCRIPT case and a default that falls back to srt.

I'm getting a more correct output by adding another case for smptett like this:

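case CCX_OF_SMPTETT:
    /* Convert the show/hide timestamps to smptett time strings. */
    timestamp_to_smptetttime(page->show_timestamp, timecode_show);
    timestamp_to_smptetttime(page->hide_timestamp, timecode_hide);
    /* Only write when an output file is open. */
    if (ctx->wbout1.fh!=-1)
        fdprintf(ctx->wbout1.fh, "      <p region=\"speaker\" begin=\"%s\" end=\"%s\">%s</p>\n", timecode_show, timecode_hide, page_buffer_cur);
    break; /* avoid falling through to the default srt branch */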

But this quick & dirty fix duplicates the existing code that generates smptett and doesn't handle line endings correctly (the -lf param is completely ignored).
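
To illustrate the line-ending concern, here is a minimal sketch in standalone C. It is not CCExtractor code: `ms_to_ttml_time` and `format_ttml_p` are hypothetical helpers, and the sketch assumes the caption text is already XML-escaped. The point is only that the line ending is passed in by the caller rather than hard-coded as "\n" in the format string.

```c
#include <stdio.h>

/* Hypothetical helper (not CCExtractor's timestamp_to_smptetttime):
   formats a millisecond timestamp as TTML clock time "HH:MM:SS.mmm".
   Returns the number of characters written, as snprintf does. */
int ms_to_ttml_time(unsigned long long ms, char *buf, size_t len)
{
    unsigned long long hours = ms / 3600000ULL;
    unsigned minutes = (unsigned)((ms / 60000ULL) % 60ULL);
    unsigned seconds = (unsigned)((ms / 1000ULL) % 60ULL);
    unsigned millis  = (unsigned)(ms % 1000ULL);
    return snprintf(buf, len, "%02llu:%02u:%02u.%03u",
                    hours, minutes, seconds, millis);
}

/* Hypothetical helper: builds one <p> element and terminates it with the
   caller-supplied line ending ("\n" or "\r\n") instead of a fixed "\n".
   Assumes `text` is already XML-escaped. */
int format_ttml_p(char *out, size_t len,
                  unsigned long long show_ms, unsigned long long hide_ms,
                  const char *text, const char *eol)
{
    char begin[32], end[32];
    ms_to_ttml_time(show_ms, begin, sizeof begin);
    ms_to_ttml_time(hide_ms, end, sizeof end);
    return snprintf(out, len, "<p begin=\"%s\" end=\"%s\">%s</p>%s",
                    begin, end, text, eol);
}
```

Calling `format_ttml_p(buf, sizeof buf, 48280, 49880, text, "\r\n")` would produce a `<p>` element with begin="00:00:48.280" and end="00:00:49.880", followed by a CRLF line ending.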

@anshul1912
Contributor

Can you share your video file? I will look at it.

It seems the teletext code has been untouched for decades.
Actually, the correct solution would be for the teletext code not to write anything to the output file itself. It should fill in a decoded subtitle and pass it on, and the writing should happen in the encoder. The dvb_subtitle and 608 code paths already work like that.
That decoded subtitle context must be passed to the encoder.

Right now the encoder writes the header and footer correctly at initialization time, but it never receives the decoded packet, so it has nothing to handle.
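
The separation described above can be sketched roughly as follows. This is an illustration only, with every name (`sketch_sub`, `sketch_encode`, and so on) hypothetical and unrelated to CCExtractor's actual structures: the decoder produces timing and text, and a single encoder function decides how to format them for the selected output.

```c
#include <stdio.h>

/* All names below are hypothetical and only illustrate the idea. */
enum sketch_format { SKETCH_SRT, SKETCH_TTML };

/* What a decoder would hand over: timing and text, no formatting. */
struct sketch_sub {
    unsigned long long start_ms;
    unsigned long long end_ms;
    const char *text;
};

/* Formats ms as "HH:MM:SS<sep>mmm"; SRT uses ',' and TTML uses '.'. */
static int sketch_time(char *buf, size_t len, unsigned long long ms, char sep)
{
    return snprintf(buf, len, "%02llu:%02u:%02u%c%03u",
                    ms / 3600000ULL,
                    (unsigned)((ms / 60000ULL) % 60ULL),
                    (unsigned)((ms / 1000ULL) % 60ULL),
                    sep,
                    (unsigned)(ms % 1000ULL));
}

/* The encoder is the only place that knows about output formats. */
int sketch_encode(enum sketch_format fmt, const struct sketch_sub *sub,
                  unsigned index, char *out, size_t len)
{
    char begin[32], end[32];
    switch (fmt) {
    case SKETCH_TTML:
        sketch_time(begin, sizeof begin, sub->start_ms, '.');
        sketch_time(end, sizeof end, sub->end_ms, '.');
        return snprintf(out, len, "<p begin=\"%s\" end=\"%s\">%s</p>\n",
                        begin, end, sub->text);
    case SKETCH_SRT:
    default:
        sketch_time(begin, sizeof begin, sub->start_ms, ',');
        sketch_time(end, sizeof end, sub->end_ms, ',');
        return snprintf(out, len, "%u\n%s --> %s\n%s\n\n",
                        index, begin, end, sub->text);
    }
}
```

With this shape, a decoder never touches the output file, and supporting a new output format means adding one case to the encoder rather than one case per decoder.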

One last question: are you willing to contribute this to ccextractor?

@MikaYuoadas
Contributor Author

I'd like to, but unfortunately I don't have the time right now to do it correctly.

@MikaYuoadas
Contributor Author

Forgot to close this ticket when PR #123 was merged.
