[CEA-708] Is there a way to extract subtitles #677

Open
fsh240led opened this issue Feb 2, 2017 · 14 comments

@fsh240led fsh240led commented Feb 2, 2017

Subtitles are extracted from other videos without problems,
but not from the video below:
https://docs.google.com/uc?id=0B42jO8cBUe6baEtqLTNkRnExbVk&export=download

PotPlayer displays the subtitles:
http://i.imgur.com/SX9KApM.png

@fsh240led fsh240led changed the title [CEA-708] Does not extract subtitles [CEA-708] Is there a way to extract subtitles? Feb 7, 2017
@fsh240led fsh240led changed the title [CEA-708] Is there a way to extract subtitles? [CEA-708] Is there a way to extract subtitles Feb 7, 2017

@cfsmp3 cfsmp3 commented Feb 7, 2017

@fsh240led fsh240led commented Feb 8, 2017

This requires changing a PotPlayer setting:

Preferences (F5) -> Filter Control -> Video Decoder -> Built-in codec/DXVA settings -> Enable Closed Captioning (check)

@cfsmp3 cfsmp3 commented Feb 8, 2017

I'm probably missing some fonts; this is what I see after enabling CC.

[Image: sbsplus]

@fsh240led fsh240led commented Feb 8, 2017

@cfsmp3 cfsmp3 commented Feb 9, 2017

Assigning to @Izaron since he was the last one looking into Korean.

@cfsmp3 cfsmp3 commented Feb 9, 2017

GSoC qualification: 3 points.

@Izaron Izaron commented Feb 9, 2017

@fsh240led Is this output at 00:09 from another video player also incorrect compared with PotPlayer? Can you confirm?

[Image: default]

@cfsmp3 and @AlexBratosin2001 (since you're the PTS man):
The problem is here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/stream_functions.c#L307
Can you give me the specs for this binary stream when (nextheader[7]&0xC0) == 0xC0? Maybe the problem is just with the hardcoded hexadecimal constants.
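
For context on the constant in question: in an MPEG-2 PES packet header, the top two bits of byte 7 are the PTS_DTS_flags field. The sketch below decodes that field. It assumes `header` points at the start of the PES packet (the 00 00 01 start code prefix) and holds at least 8 bytes, which may not match how `nextheader` is indexed in stream_functions.c. The enum and function names are hypothetical and are not part of CCExtractor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; not CCExtractor identifiers. */
enum pes_timestamps {
    PES_TS_NONE = 0,        /* '00': neither PTS nor DTS present */
    PES_TS_FORBIDDEN = 1,   /* '01': forbidden value */
    PES_TS_PTS_ONLY = 2,    /* '10': PTS only */
    PES_TS_PTS_AND_DTS = 3  /* '11': PTS and DTS, i.e. (byte & 0xC0) == 0xC0 */
};

/* Assumes header[0..2] is the 00 00 01 start code prefix and that at
 * least 8 bytes are available. */
static enum pes_timestamps pes_timestamp_flags(const uint8_t *header)
{
    assert(header != NULL);
    /* Keep the two most significant bits of byte 7 and shift them down. */
    return (enum pes_timestamps)((header[7] & 0xC0) >> 6);
}
```

Under that assumption, the (nextheader[7]&0xC0) == 0xC0 case corresponds to a header that carries both a PTS and a DTS.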

@fsh240led fsh240led commented Feb 9, 2017

Those are unrecognizable words.
It should be output as

@mahalwal mahalwal commented Dec 21, 2017

@fsh240led Can you tell me which font from this link you are using? ( http://ko.cooltext.com/Fonts-Unicode-Korean ) I tried Batang(che) and Gungsuh(che), and they still give random characters.

@MatejMecka MatejMecka commented Jan 3, 2018

@fsh240led Can you tell me which command you used to get these subtitles?

@cfsmp3 cfsmp3 added GCI19 and removed CEA-708 labels Oct 15, 2018

@navimakarov navimakarov commented Dec 11, 2018

The problem is that processing this file fails with the error "Window has to be defined", because decoder->current_window == -1:
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708.c#L1375
In ccx_decoders_708.c there is a condition that the author's comment suggests should be impossible, but for this file it evaluates to true. That aborted caption extraction and left ret = 0, which is reported as the "No captions found in input" error.
Here is the problem:
https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708.c#L1728
The obvious fix is to change:
_dtvcc_decoders_reset(dtvcc);
return;
to
dtvcc->current_packet_length = len;

  • Note: I think the comment ("Is this possible") should also be deleted, because as we can see, it is possible.

After these changes ccextractor is able to extract captions from this file.

@cfsmp3 cfsmp3 commented Dec 11, 2018

OK so it's good research here... but let's not be happy with that, we need to find out what's actually going on.

if (dtvcc->current_packet_length != len) // Is this possible?

Here dtvcc->current_packet_length is how much data we have for that packet.
len is the length declared in the packet header.

So we have 3 possibilities:
a) We have more data than the declared packet length. If yes - what's that data, where did it come from, what is it for? Can we ignore it?
b) We have LESS data than the declared packet length. This is really not good, we can't process the packet at all (we would read out of bounds)
c) They match, which is the expected thing, but we know that's not the case here.

So first - let's check if it's a or b.
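
The three cases above can be told apart with a small helper like the one below. This is only a sketch for debugging: the enum and function names are hypothetical and do not exist in CCExtractor, and the two arguments stand in for dtvcc->current_packet_length and len.

```c
#include <assert.h>

/* Hypothetical names for illustration; not CCExtractor identifiers. */
enum packet_length_check {
    PACKET_LENGTH_MATCH,    /* case c: buffered data matches the header */
    PACKET_HAS_EXTRA_DATA,  /* case a: more data than the header declares */
    PACKET_IS_TRUNCATED     /* case b: less data than the header declares */
};

/* buffered_len: bytes collected for the packet so far.
 * declared_len: packet length according to the packet header. */
static enum packet_length_check check_packet_length(int buffered_len, int declared_len)
{
    assert(buffered_len >= 0 && declared_len >= 0);
    if (buffered_len > declared_len)
        return PACKET_HAS_EXTRA_DATA;  /* trailing bytes might be ignorable */
    if (buffered_len < declared_len)
        return PACKET_IS_TRUNCATED;    /* processing would read out of bounds */
    return PACKET_LENGTH_MATCH;
}
```

Logging the result for every packet of the problem file would show whether the mismatch is always case a, always case b, or a mix of both.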

@navimakarov navimakarov commented Dec 12, 2018

So our case is (a): we have more data than the declared packet length. Debugging this particular file, we consistently get dtvcc->current_packet_length = len + 2. But before that check there is the statement len = len * 2, which means the value read here: https://github.com/CCExtractor/ccextractor/blob/master/src/lib_ccx/ccx_decoders_708.c#L1712 is wrong and should be len - 1. Maybe this is a problem with a false PTS/DTS or a false PES header that we read earlier, or maybe the problem is with the hardcoded value in the line of code I linked above. I'm still working on it and have no idea what this extra data is for. But I compared my output to PotPlayer's output and they are absolutely identical, so I think we can skip this data.

@cfsmp3 cfsmp3 commented Dec 12, 2018

That line:

int len = dtvcc->current_packet[0] & 0x3F; // 6 least significants bits

is correct. The packet header has 6 bits for the packet length and that mask gives you those 6 bits.

You may need to go over the actual specs to understand that code.
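
As a rough sketch of how that header byte is interpreted: the low 6 bits are the size code and the high 2 bits are a sequence number. The doubling step matches the len = len * 2 statement mentioned earlier in the thread. The mapping of a zero size code to 128 bytes is my reading of CEA-708 and should be checked against the spec. The function names are hypothetical, not CCExtractor's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not CCExtractor identifiers. */

/* Declared packet length in bytes, header byte included. */
static int dtvcc_declared_packet_length(const uint8_t *packet)
{
    assert(packet != NULL);
    int size_code = packet[0] & 0x3F;  /* 6 least significant bits */
    /* Assumption: size code 0 means the maximum size of 128 bytes. */
    return size_code == 0 ? 128 : size_code * 2;
}

/* Rolling sequence number carried in the 2 most significant bits. */
static int dtvcc_sequence_number(const uint8_t *packet)
{
    assert(packet != NULL);
    return (packet[0] & 0xC0) >> 6;
}
```

Comparing this declared length with the number of bytes actually buffered for the packet is the check that fails for the file in this issue.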