Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

packet: add IsSynced i/o helper and reduce false positive rate of Sync #97

Merged
merged 1 commit into from
Jan 18, 2019

Conversation

kortschak
Copy link
Contributor

Please take a look.

The change to Sync here increases the false positive rate for finding packet starts, but the approach to walking to a sync byte is statistically flawed (it's a multiple testing issue) and I'd recommend that the function actually just gets removed (I'm happy to do that here).

Updates #87.

@kortschak
Copy link
Contributor Author

ping

@guygrigsby
Copy link
Collaborator

Why is this necessary?

@guygrigsby
Copy link
Collaborator

Also, please provide a summary of what this change does

@kortschak
Copy link
Contributor Author

This is necessary because the function claims correctness that it cannot provide, with significant amounts of work.

The sync byte is a poor mark of packet boundaries, but it is what we have. Howevere, it's not intended to be used in a multiple testing scenario.

Under correct use, you have a p=1/256 probability of getting an incorrectly positive claim that a stream offset is the start of a packet given a uniform probability of each byte in a stream. This significantly degrades when an arbitrary scan forward for sync bytes is done; if a false positive rate of 10p (ten times worse than the correct usage) is allowed, you will expect a false positive within 11 bytes. This degradation of the false positive rate is presumably why the double byte check with a 188 byte offset is done in the current code, this however only increased the expected distance between false positives to just over 2.5kB, with a systematic false positive introduced when the next packet's sync byte cannot be read (e.g. at the last packet).

In summary...
Jelly beans

The way the work is currently done in Sync also is inflexible (requiring a *byte.Buffer rather than an io.ByteScanner) and allocated unnecessarily, though really, Sync should just go away; people who want to play with fire and random stream statistics should roll their own, using IsSynced (hopefully with some other check that the packet in meaningful between syncable than just checking the next packet's sync byte.

The change to Sync reduces the amount of work done and weakens the claim of statistical correctness by a power. This means that a false positive is expected within 11 bytes instead of 2.5kB, but then leaves the work of determining whether the synced packet position is actually correct to the user, who presumably has an awareness of what it is that they are looking for. It removes the false negative at the end of the stream.

The addition of IsSynced provides a tool for users to scan a stream for sync bytes, or checking whether they are synced. Scanning using IsSynced is under the control of the user, and so they can limit the amount of scanning or otherwise handle unexpected stream more correctly than with Sync.

@BlakeOrth
Copy link
Collaborator

While we can all agree the current approach is flawed, I'm not sure decreasing our confidence of being synced with a stream by an order of magnitude and then passing the onus of being synced onto the user's plate is really the correct course of action here.

I'm going to push back on the assertion that "the sync byte is all we have." What we really have is a 32 bit MPEG2-TS header, with 8 bits that must be a fixed value (0x47), 13 bits that cannot be 0x0004-0x000F (reserved range of PID values), and 2 bits that cannot be 00 (reserved adaptation_field_control). I unfortunately don't have time to do the math (though if you do I'd enjoy seeing the numbers), but increasing both the scope of the sync parse check as well as potentially looking forward more packets may push us towards some level of confidence that decrease the false positive rate to a point where it doesn't matter.

I do agree with the decision to include the IsSynced, after-all it's what the spec suggests in Table 2-1.

@kortschak
Copy link
Contributor Author

kortschak commented Nov 13, 2018

While we can all agree the current approach is flawed, I'm not sure decreasing our confidence of being synced with a stream by an order of magnitude and then passing the onus of being synced onto the user's plate is really the correct course of action here.

To be clear, my preferred option would be to delete Sync and leave scanning up to the user entirely. Doing unsafe statistical heuristic operations shouldn't be encouraged.

However, with 15 of 32 bits with the pattern you point out, constrained we can scan 341B without an expected false positive with single packet checks (as expected better than the current situation and without false negatives at the ends of streams). I can make that change, but I think a warning on the function is worthwhile as well.

@kortschak
Copy link
Contributor Author

kortschak commented Nov 13, 2018

Sorry, that estimate is optimistic. I'll fix it momentarily.

@kortschak
Copy link
Contributor Author

ping

@guygrigsby
Copy link
Collaborator

kortschak, It sounded to me like you were going to make the change to use other header info to help sync the stream with greater reliability. That's what I would like to see. I think that while the original change may be technically correct, we have to weigh the practicalities of such a change.

@kortschak
Copy link
Contributor Author

@BlakeOrth's comment indicated more discussion was necessary. I didn't get a reply to my assessment, so I didn't proceed with potentially futile work.

@kortschak
Copy link
Contributor Author

ping

@kortschak
Copy link
Contributor Author

ping

@BlakeOrth
Copy link
Collaborator

Hi Dan, sorry for the delay, I've been focused on other things.

The reality of the situation is that we need to strike some balance between practical and "technically correct". Unfortunately I believe the removing Sync in its entirety, resulting in an API change for users, is the wrong answer. Expanding Sync to utilize more of the MPEG header to provide greater confidence is probably a good move given the order of magnitude gain we get with minimal additional compute and time spent accumulating packets. I'd also be open to other creative ideas on how to increase the reliability of this function.

I would also like to reiterate that including the IsSynced function, which would allow users to scan themselves if they chose, is absolutely something that should be included.

@kortschak
Copy link
Contributor Author

Please take a look at the code.

@kortschak
Copy link
Contributor Author

kortschak commented Jan 12, 2019

Note also that the change here fixes #110.

@kortschak kortschak changed the title packet: add IsSynced i/o helper and reduce false negative rate of Sync packet: add IsSynced i/o helper and reduce false positive rate of Sync Jan 12, 2019
@kortschak
Copy link
Contributor Author

kortschak commented Jan 12, 2019

In implementing the additional parts of the IsSynced check, I can see that the 28kB estimate was optimistic since the pattern described actually has very little information. The only real way to improve the validity of the check is to parse the entire packet and confirm it is valid.

Despite all this, with #110, the issue is now not just about technical correctness versus practicality, but actually broken code. Under go1.12 the current code fails tests. go1.12 is planned to be released in three week. It would be very nice if we could get something merged that will stop this code being broken.

@kortschak
Copy link
Contributor Author

ping

@BlakeOrth
Copy link
Collaborator

This looks good to me. I'm going to give @guygrigsby a little time to respond if he feels the need, otherwise I'll merge this in tomorrow. Thanks for the contribution!

Also note this Fixes #110

@kortschak
Copy link
Contributor Author

For the sake of the bug tracker.

Fixes #110.

@BlakeOrth
Copy link
Collaborator

@kortschak Sorry, but could you quickly rebase this onto the most recent commit in master?

The change to Sync significantly decreases the chance of a false
positive but the approach of walking to a header is still statistically
flawed.
@kortschak
Copy link
Contributor Author

Done

Contributing here has been not the most frictionless work I've done.

@BlakeOrth
Copy link
Collaborator

Feedback noted, sorry about that.

@BlakeOrth BlakeOrth merged commit e96c643 into Comcast:master Jan 18, 2019
@kortschak
Copy link
Contributor Author

Thanks

@kortschak kortschak deleted the sync branch January 18, 2019 02:01
eric pushed a commit to fancybits/gots that referenced this pull request Jun 20, 2019
Comcast#97)

The change to Sync significantly decreases the chance of a false
positive but the approach of walking to a header is still statistically
flawed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants