Added BOM-handling to SSA. #7

Des-Nerger · 2018-12-31T08:44:40Z

When I tried to give astisub an .ass file that starts with a BOM (byte order mark), I got the following error:
astisub: line '<U+FEFF>[Script Info]' should contain at least one ':'
This commit fixes the problem.

coveralls · 2018-12-31T08:50:48Z

Coverage increased (+0.1%) to 76.306% when pulling e211454 on Des-Nerger:master into 146a999 on asticode:master.

asticode · 2019-01-02T09:08:32Z

bom-aware-scanner.go

+	lookingForBom bool
+}
+
+func NewBomAwareScanner(r io.Reader) *BomAwareScanner {


Can you guide me through why you need to create a specific scanner instead of just trimming the BOM from the first line? Is there any benefit to this scanner?

I thought it could be reused for other formats, for instance WebVTT. But maybe I am indeed over-engineering it.

Also, do you plan to make go-astisub work for other encodings, like UTF-16? I think a dedicated scanner could abstract away their differences, including different BOMs.
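As an illustration of how such an abstraction might start, here is a sketch of BOM sniffing for UTF-8 and UTF-16. `detectBOM` is a hypothetical helper; decoding the UTF-16 text itself would still be needed afterwards, for example via a package such as `golang.org/x/text/encoding/unicode`:

```go
package main

import (
	"bytes"
	"fmt"
)

// detectBOM (hypothetical) reports which encoding a leading byte order mark
// indicates and how many bytes the BOM occupies. It returns ("", 0) when no
// known BOM is present. UTF-32 is deliberately ignored in this sketch, so a
// UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
func detectBOM(b []byte) (encoding string, size int) {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8", 3
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "UTF-16BE", 2
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "UTF-16LE", 2
	}
	return "", 0
}

func main() {
	enc, n := detectBOM([]byte{0xFF, 0xFE, '[', 0x00})
	fmt.Println(enc, n) // UTF-16LE 2
}
```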

Making go-astisub compatible with UTF-16 is a nice idea indeed. But in the context of this PR, I'd rather keep things simple and only trim BOM bytes.

Could you do it yourself? I can't come up with a nice implementation, and I don't want to duplicate code.

Sure, I can do it once the PR is merged.

You'll need to remove your new scanner though.

Ok, I have now removed it.

asticode · 2019-01-02T09:09:18Z

ssa.go

-			return
+		if len(split) < 2 || split[0] == "" {
+			switch sectionName {
+			case ssaSectionNameScriptInfo, ssaSectionNameStyles: // Do nothing


Can you provide an example for this case?

Here are the fragments I had problems with:

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,MS UI Gothic,90,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,5,0,1,2,2,1,10,10,10,0
//
Style: Rubi,MS UI Gothic,50,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,1,10,10,10,0
//
[Events]

Dialogue: 0,0:23:40.69,0:23:45.56,Default,,0000,0000,0000,,東福寺駅近くにある会社からの人生の訓辞なのかなぁ～
;任天堂京都本社
:次回標題
Dialogue: 0,0:23:45.79,0:23:47.20,Default,,0000,0000,0000,,次回　「願望」

The specification says: "SSA will discard any lines it doesn't understand."
Also, I wanted to make the behaviour closer to libass.
Libass only reports ill-formed lines in the [Fonts] and [Events] sections. Everywhere else, it discards them like any other line it doesn't understand.
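The section-dependent policy described here can be sketched as follows. The helper names and the plain-string section names are hypothetical (go-astisub uses its own constants such as `ssaSectionNameScriptInfo`), and SSA `;` comment lines are not treated specially in this sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// reportsMalformedLines (hypothetical) follows the libass behaviour described
// above: ill-formed lines are reported only in [Fonts] and [Events].
func reportsMalformedLines(section string) bool {
	switch section {
	case "[Fonts]", "[Events]":
		return true
	}
	return false
}

// classifyLine (hypothetical) returns "ok" for "Descriptor: value" lines.
// Lines without a ':' yield "error" or "discard" depending on the section.
func classifyLine(section, line string) string {
	if strings.Contains(line, ":") {
		return "ok"
	}
	if reportsMalformedLines(section) {
		return "error"
	}
	return "discard"
}

func main() {
	fmt.Println(classifyLine("[V4+ Styles]", "//")) // discard
	fmt.Println(classifyLine("[Events]", "//"))     // error
}
```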

OK, makes sense.

Could you add at least one line not understood by the SSA parser to testdata/example-in.ssa so that this case is covered by the tests?

Hmm... these event lines

Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Unknown descriptor: should be discarded

fail with this error:
astisub: building new ssa event failed: astisub: content has 1 items whereas style format has 10 items
So maybe instead of the specific || split[0] == "" check, there should be a more general check for an unknown descriptor.
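The error above arises because the event value is split into fewer fields than the 10 names on the Format line. A sketch of that split, using a hypothetical `splitEventFields` helper (not go-astisub's code): `strings.SplitN` caps the split at the number of format fields, so commas inside the final Text field are preserved.

```go
package main

import (
	"fmt"
	"strings"
)

// splitEventFields (hypothetical) splits the value of an [Events] line into
// exactly as many fields as the Format line declares. SplitN keeps any commas
// in the last field (Text) intact. Too few fields yields an error.
func splitEventFields(value string, format []string) ([]string, error) {
	fields := strings.SplitN(value, ",", len(format))
	if len(fields) != len(format) {
		return nil, fmt.Errorf("content has %d items whereas format has %d items", len(fields), len(format))
	}
	// Trim every field except Text, whose spacing may be meaningful.
	for i := 0; i < len(fields)-1; i++ {
		fields[i] = strings.TrimSpace(fields[i])
	}
	return fields, nil
}

func main() {
	format := []string{"Marked", "Start", "End", "Style", "Name", "MarginL", "MarginR", "MarginV", "Effect", "Text"}
	if _, err := splitEventFields(" should be discarded", format); err != nil {
		fmt.Println(err) // content has 1 items whereas format has 10 items
	}
	fields, _ := splitEventFields(" 0,0:23:45.79,0:23:47.20,Default,,0000,0000,0000,,Hello, world", format)
	fmt.Printf("%q\n", fields[9]) // "Hello, world"
}
```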

I think this check is fine to avoid lines without a :.

But, as you mention, there should be an additional check for the descriptor. I would place it here: check the value of header against the value of sectionName, and discard lines whose header doesn't match an expected value for that section.
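A sketch of such a per-section descriptor check. The `knownDescriptors` table and `acceptLine` helper are hypothetical, and the descriptor lists are an illustrative subset rather than the full SSA vocabulary; [Script Info] is left out because its keys are open-ended:

```go
package main

import (
	"fmt"
	"strings"
)

// knownDescriptors (hypothetical) lists, per section, descriptors a parser
// would accept. The lists are illustrative and deliberately incomplete.
var knownDescriptors = map[string]map[string]bool{
	"[V4+ Styles]": {"Format": true, "Style": true},
	"[Events]":     {"Format": true, "Dialogue": true, "Comment": true},
}

// acceptLine (hypothetical) reports whether line should reach the section
// parser: its descriptor (text before the first ':') must be known for section.
// Looking up an unknown section yields a nil map, which returns false.
func acceptLine(section, line string) bool {
	descriptor, _, found := strings.Cut(line, ":")
	if !found {
		return false
	}
	return knownDescriptors[section][strings.TrimSpace(descriptor)]
}

func main() {
	fmt.Println(acceptLine("[Events]", "Unknown descriptor: should be discarded")) // false
	fmt.Println(acceptLine("[Events]", "Dialogue: 0,0:23:45.79,0:23:47.20,Default,,0000,0000,0000,,x")) // true
}
```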

Does that make sense?

The || split[0] == "" check is not there to skip lines without a ':'; it skips lines with an empty descriptor (like the :次回標題 line above). Anyway, handling of unknown non-empty descriptors can be added after this PR is merged. go-astisub now works with the subtitles I needed, and that's enough for me for now.
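To make the two halves of that condition concrete, here is a sketch that applies `len(split) < 2 || split[0] == ""` to the lines from the fragments above, assuming `split` comes from `strings.SplitN(line, ":", 2)`. The helper name `isNondescript` is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// isNondescript (hypothetical) applies the condition from the diff above.
// len(split) < 2 catches lines with no ':' at all (e.g. "//");
// split[0] == "" catches lines with an empty descriptor (e.g. ":次回標題").
func isNondescript(line string) bool {
	split := strings.SplitN(line, ":", 2)
	return len(split) < 2 || split[0] == ""
}

func main() {
	for _, line := range []string{"//", ":次回標題", "Style: Default"} {
		fmt.Printf("%q nondescript=%v\n", line, isNondescript(line))
	}
}
```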

ssa.go

subtitles.go

webvtt.go

asticode · 2019-01-02T09:21:36Z

@Des-Nerger Nice changes! Some minor fixes needed though.

Let me know.

asticode · 2019-01-03T13:12:10Z

I've merged the PR and added the BOM header trim.

Let me know if go-astisub still doesn't work with the subtitles you needed.

Cheers

Added BOM-handling to SSA.

3f371af

Des-Nerger added 6 commits December 31, 2018 22:42

Relaxed requirements on SSA: allow nondescript lines in Info and Styles sections

0774b3f

SSA: changed Outline, Shadow, Spacing to float64

6366675

SSA: allow nondescript lines for other sections, but log message

2066c43

SSA: consider empty-descriptor lines to be nondescript

1101479

WebVTT: allow any first line that starts with "WEBVTT"

203d9b1

Fixes: https://github.com/asticode/go-astisub/issues/8

WebVTT: more strict following to WebVTT specs

ee48385

Fixes: https://github.com/asticode/go-astisub/issues/8
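The two WebVTT commit titles above don't show their code. As a sketch of what "more strict" can mean, the WebVTT specification requires the file to begin with an optional BOM, the string "WEBVTT", and then optionally a space or tab followed by any text on the same line. So "WEBVTTX" starts with "WEBVTT" but is not a valid header. `isValidWebVTTHeader` is a hypothetical helper, not the PR's implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// isValidWebVTTHeader (hypothetical) checks the first line of a WebVTT file
// (line terminator already removed): an optional BOM, the string "WEBVTT",
// then nothing or a space/tab followed by any text.
func isValidWebVTTHeader(line string) bool {
	line = strings.TrimPrefix(line, "\uFEFF")
	if !strings.HasPrefix(line, "WEBVTT") {
		return false
	}
	rest := line[len("WEBVTT"):]
	return rest == "" || rest[0] == ' ' || rest[0] == '\t'
}

func main() {
	for _, line := range []string{"WEBVTT", "WEBVTT - Episode 1", "WEBVTTX", "\uFEFFWEBVTT"} {
		fmt.Printf("%q valid=%v\n", line, isValidWebVTTHeader(line))
	}
}
```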

asticode requested changes Jan 2, 2019

asticode self-assigned this Jan 2, 2019

Des-Nerger added 3 commits January 2, 2019 23:18

asticode/pull/7 fixes

13b38ea

asticode/pull/7 fixes2

4da3c70

asticode/pull/7 fixes3

e211454

asticode closed this Jan 3, 2019
