-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added BOM-handling to SSA. #7
Conversation
bom-aware-scanner.go
Outdated
lookingForBom bool | ||
} | ||
|
||
func NewBomAwareScanner(r io.Reader) *BomAwareScanner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you guide me through why you need to create a specific scanner instead of just trimming the BOM byte on the first line? Is there any gain with this scanner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it could be reused for other formats. For instance, WebVTT. But maybe I'm indeed over-engineering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you plan to make go-astisub work for other encodings, like UTF-16? I think a dedicated scanner could abstract away their differences, including different BOMs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making go-astisub compatible with UTF-16 is a nice idea indeed. But in the context of this PR, I'd rather keep things simple and only trim BOM bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you do it yourself? I can't come up with a nice implementation, don't want to duplicate stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can do once the PR is merged.
You'll need to remove your new scanner though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have now removed it.
ssa.go
Outdated
return | ||
if len(split) < 2 || split[0] == "" { | ||
switch sectionName { | ||
case ssaSectionNameScriptInfo, ssaSectionNameStyles: // Do nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide an example for this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the fragments I had problems with:
[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: Default,MS UI Gothic,90,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,5,0,1,2,2,1,10,10,10,0
//
Style: Rubi,MS UI Gothic,50,&H00FFFFFF,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,2,1,10,10,10,0
//
[Events]
Dialogue: 0,0:23:40.69,0:23:45.56,Default,,0000,0000,0000,,東福寺駅近くにある会社からの人生の訓辞なのかなぁ~
;任天堂京都本社
:次回標題
Dialogue: 0,0:23:45.79,0:23:47.20,Default,,0000,0000,0000,,次回 「願望」
The specification says: SSA will discard any lines it doesn't understand
.
and also I wanted to make the behaviour closer to libass.
Libass only reports about ill-formed lines if they are in [Fonts] or [Events] sections. Otherwise, it discard them like the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK makes sense.
Could you add at least one line not understood by the SSA parser in the testdata/example-in.ssa
so that this case is covered by the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... these event lines
Format: Marked, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Unknown descriptor: should be discarded
fail with this error:
astisub: building new ssa event failed: astisub: content has 1 items whereas style format has 10 items
So maybe instead of the specific || split[0] == ""
check, there should be a more general check for an unknown descriptor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check is fine to avoid lines without a :
.
But as you mention there should be an additionnal check for the descriptor. I would place it here and would check the value of header
based on the value of sectionName
and discard lines that don't match the correct header
value.
Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The || split[0] == ""
check is not to avoid lines without a :
, it is to avoid lines with an empty descriptor (like the :次回標題
line above). Anyway, handling of unknown nonempty descriptors can be added later after the merge of this PR. go-astisub now works with the subtitles I needed, and that's enough for me for now.
@Des-Nerger Nice changes! Some minor fixes needed though. Let me know. |
I've merged the PR and added the BOM header trim. Let me know if go-astisub still doesn't work with the subtitles you needed. Cheers |
When I tried to give astisub an .ass file containing the BOM-header I got the following error:
astisub: line '<U+FEFF>[Script Info]' should contain at least one ':'
This commit fixes the problem.