Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added BOM-handling to SSA. #7

Closed
wants to merge 10 commits into from
Closed

Conversation

Des-Nerger
Copy link
Contributor

When I tried to give astisub an .ass file containing the BOM-header I got the following error:
astisub: line '<U+FEFF>[Script Info]' should contain at least one ':'
This commit fixes the problem.

@coveralls
Copy link

coveralls commented Dec 31, 2018

Coverage Status

Coverage increased (+0.1%) to 76.306% when pulling e211454 on Des-Nerger:master into 146a999 on asticode:master.

lookingForBom bool
}

func NewBomAwareScanner(r io.Reader) *BomAwareScanner {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you guide me through why you need to create a specific scanner instead of just trimming the BOM byte on the first line? Is there any gain with this scanner?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it could be reused for other formats. For instance, WebVTT. But maybe I'm indeed over-engineering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you plan to make go-astisub work for other encodings, like UTF-16? I think a dedicated scanner could abstract away their differences, including different BOMs.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making go-astisub compatible with UTF-16 is a nice idea indeed. But in the context of this PR, I'd rather keep things simple and only trim BOM bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you do it yourself? I can't come up with a nice implementation, don't want to duplicate stuff.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can do once the PR is merged.

You'll need to remove your new scanner though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have now removed it.

ssa.go Outdated
return
if len(split) < 2 || split[0] == "" {
switch sectionName {
case ssaSectionNameScriptInfo, ssaSectionNameStyles: // Do nothing
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the fragments I had problems with:

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,MS UI Gothic,90,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,5,0,1,2,2,1,10,10,10,0
//
Style: Rubi,MS UI Gothic,50,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,1,10,10,10,0
//

[Events]
Dialogue: 0,0:23:40.69,0:23:45.56,Default,,0000,0000,0000,,東福寺駅近くにある会社からの人生の訓辞なのかなぁ~
;任天堂京都本社

:次回標題
Dialogue: 0,0:23:45.79,0:23:47.20,Default,,0000,0000,0000,,次回 「願望」

The specification says: SSA will discard any lines it doesn't understand.
and also I wanted to make the behaviour closer to libass.
Libass only reports about ill-formed lines if they are in [Fonts] or [Events] sections. Otherwise, it discard them like the rest.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK makes sense.

Could you add at least one line not understood by the SSA parser in the testdata/example-in.ssa so that this case is covered by the tests?

Copy link
Contributor Author

@Des-Nerger Des-Nerger Jan 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... these event lines

Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Unknown descriptor: should be discarded

fail with this error:
astisub: building new ssa event failed: astisub: content has 1 items whereas style format has 10 items
So maybe instead of the specific || split[0] == "" check, there should be a more general check for an unknown descriptor.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check is fine to avoid lines without a :.

But as you mention there should be an additionnal check for the descriptor. I would place it here and would check the value of header based on the value of sectionName and discard lines that don't match the correct header value.

Does that make sense?

Copy link
Contributor Author

@Des-Nerger Des-Nerger Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The || split[0] == "" check is not to avoid lines without a :, it is to avoid lines with an empty descriptor (like the :次回標題 line above). Anyway, handling of unknown nonempty descriptors can be added later after the merge of this PR. go-astisub now works with the subtitles I needed, and that's enough for me for now.

ssa.go Outdated Show resolved Hide resolved
subtitles.go Show resolved Hide resolved
webvtt.go Show resolved Hide resolved
@asticode
Copy link
Owner

asticode commented Jan 2, 2019

@Des-Nerger Nice changes! Some minor fixes needed though.

Let me know.

@asticode asticode self-assigned this Jan 2, 2019
@asticode
Copy link
Owner

asticode commented Jan 3, 2019

I've merged the PR and added the BOM header trim.

Let me know if go-astisub still doesn't work with the subtitles you needed.

Cheers

@asticode asticode closed this Jan 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants