WEBVTT : Issue when contains multiple blank lines. #55

Open
jmarchalonis opened this issue Sep 20, 2017 · 3 comments
@jmarchalonis

jmarchalonis commented Sep 20, 2017

When parsing a WebVTT file in which a cue timing line is followed by two blank lines, the parser throws an error. I read the standard, and it is acceptable for a cue to be followed by blank lines, representing silence. I get files generated this way from an outside vendor. Is there any way this can be resolved?

See the example file below. Also attached is a zip with one working version and one non-working version of the same VTT file.

WEBVTT

00:00:00.000 --> 00:00:04.100 align:middle line:90%


00:00:04.100 --> 00:00:14.690 align:middle line:84%
Foreign policy is a very important aspect of our
government and impacts greatly on our national security.

Please let me know... Thanks,
Jason

VTT.zip

@mwleinad
Contributor

mwleinad commented Apr 6, 2018

I'm having the exact same issue.
Is there a fix for this?

@Natkeeran

I encountered the issue trying to use this library.

@humbertocastelo

humbertocastelo commented Jan 11, 2022

You can try using the code below to correct the content at runtime. I used this code to fix the content of 50 VTT subtitle files. Other files may need some additional adjustment, but in general that's it.

<?php

// $file is assumed to hold the path of the VTT file to repair.
$contents = file_get_contents($file);
// Normalize CRLF and lone CR line endings to LF.
$contents = preg_replace('`\x0d\x0a|\x0d`', "\x0a", $contents);
// Collapse runs of three or more newlines into a single blank line, so that
// cue text is never glued to the timing line of the next cue.
$contents = preg_replace('`\x0a{3,}`', "\x0a\x0a", $contents);
// Separate two consecutive timing lines (an empty cue followed by the next cue)
// with exactly one blank line. The dot in each timestamp is escaped.
$timestamp = '(?:[0-9]{2,}:)?[0-9]{2}:[0-9]{2}\.[0-9]{3}';
$pattern = "`($timestamp) --> ($timestamp)( .*)?\x0a+($timestamp) --> ($timestamp)( .*)?`";
$contents = preg_replace_callback($pattern, function ($match) {
	return sprintf("%s --> %s%s\x0a\x0a%s --> %s%s", $match[1], $match[2], $match[3], $match[4], $match[5], $match[6] ?? '');
}, $contents);
// End the file with exactly one newline.
$contents = trim($contents)."\x0a";
$parser = new \Captioning\Format\WebvttFile();
try {
	$parser->loadFromString($contents);
	print('Parser Success'.'<br/>'."\x0a");
} catch (\Exception $e) {
	print($e->getMessage().'<br/>'."\x0a");
	exit();
}
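
To check the cleanup on its own, without the library, here is a minimal sketch. It assumes the only problem is extra blank lines: it normalizes line endings to LF and collapses any run of blank lines into one. The helper name `normalizeVtt()` is hypothetical and not part of the captioning library, and the sample is a shortened copy of the file from the issue.

```php
<?php

// Hypothetical helper (not part of the captioning library): returns the
// WebVTT text with LF line endings and at most one blank line in a row.
function normalizeVtt(string $contents): string
{
    // Normalize CRLF and lone CR line endings to LF.
    $contents = preg_replace('`\x0d\x0a|\x0d`', "\x0a", $contents);
    // Collapse runs of three or more newlines into a single blank line.
    $contents = preg_replace('`\x0a{3,}`', "\x0a\x0a", $contents);
    // End the text with exactly one newline.
    return trim($contents) . "\x0a";
}

// Shortened sample from the issue: an empty cue followed by two blank lines.
$sample = "WEBVTT\r\n\r\n"
    . "00:00:00.000 --> 00:00:04.100 align:middle line:90%\r\n\r\n\r\n"
    . "00:00:04.100 --> 00:00:14.690 align:middle line:84%\r\n"
    . "Foreign policy is a very important aspect of our\r\n";

echo normalizeVtt($sample);
```

The result can then be passed to `loadFromString()` as in the code above. Whether the parser accepts an empty cue once it is followed by only one blank line is something to confirm against your own files.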
