New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Video bug: large video frames are not sent correctly #620

Closed
zoff99 opened this Issue Nov 24, 2017 · 28 comments

Comments

Projects
8 participants
@zoff99

zoff99 commented Nov 24, 2017

this bug causes all of these (i am pretty sure)

#235
#618

qTox/qTox#2529
irungentoo#1479
irungentoo#1474
irungentoo#1479

irungentoo#1525

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

solution:

zoff99/c-toxcore@8c9cec4...zoff99:10217ea1ace48bd8cc098f4a689701e7e5741d3b

it works now perfectly. but the solution breaks compatibility with video/audio on existing clients

@SkyzohKey SkyzohKey added this to the v0.1.11 milestone Nov 24, 2017

@SkyzohKey SkyzohKey added this to Triage in Security via automation Nov 24, 2017

@SkyzohKey SkyzohKey moved this from Triage to !!!! Critical !!!! in Security Nov 24, 2017

@SkyzohKey SkyzohKey moved this from !!!! Critical !!!! to In Progress in Security Nov 24, 2017

@Diadlo

This comment has been minimized.

Diadlo commented Nov 24, 2017

@zoff99 Are you "pretty sure" or you have tested it?)
If you want, in the evening I can take part in the test

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

@Diadlo i have tested it. but yes please contact me in the evening, that would be good

@nurupo

This comment has been minimized.

Member

nurupo commented Nov 24, 2017

You should keep the aligment checks and dummy variables, they are there because mannol did what should not be done -- sending those structs over the wire as they are, meaning that sizes of those structs must be the same for the sender and the receiver, for any CPU architecture, no matter the alignment. If the alignment is not the same on all machines, then the code is broken and you might get garbage for video and possibly memory overwriting vulnerabilities.

@nurupo

This comment has been minimized.

Member

nurupo commented Nov 24, 2017

Also, since those structs are sent over the wire, I don't think this change would be backwards-compatible with existing tox clients as it would change the size of those structs. This should work only between patched tox clients. So this is a protocol breaking change. We might want to change the packet ids for the video or something, so that older clients and patched clients couldn't do video with each other, because it's likely to result in garbage and memory issues for some CPU architectures.

@nurupo

This comment has been minimized.

Member

nurupo commented Nov 24, 2017

Other than that, it looks good. So, the issue was that 16-bit size was too small? @mannol you might want to look into this.

It seems like this is commit is made on top of your other commits, I see some // Zoff comments that are nit counted in the diff, would be nicer if it was on top of c-toxcore master, in case there is more of diff that helped fixing the issue that we are not seeing.

@nurupo

This comment has been minimized.

Member

nurupo commented Nov 24, 2017

Btw, in your commits you write

/*
vpx_codec_err_t vpx_codec_decode ( vpx_codec_ctx_t * ctx,
const uint8_t * data,
unsigned int data_sz,
void * user_priv,
long deadline
)

here you can cleary see it
"data_sz" is 32bit unsigned
*/

The size of int, unsigned or not, is platform-dependent. It might be 16-bits, or 32-bits, and I wouldn't be too surprised to see it being 64-bits. The C standard only specifies the minimum range for the primitive types, and int is said to be at least 16 bits. So

here you can cleary see it
"data_sz" is 32bit unsigned

Is not incorrect for some platforms. A more correct statement would be that size of data_sz is at leat 16 bits, or that data_sz is exactly sizeof(data_sz).

Perhaps mannol was lead to believeint to be 16-bits by the same mistake of not knowing that it's platform-dependent?

Since our code depends on it, it would be wise to add some compile-time assert checks for sizes, e.g. static_assert(sizeof(data_sz) <= size(uint32_t)), and all other variables size of which we assume. (Note that there is no static_assert in C, at least C99, but it's an easy to write macro, you can google it).

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

@nurupo i already wrote that

image

also a keyframe is easily 200kbytes in size, when you use 16bit for size it can never ever work.

i removed the alignment checks for testing. when there is a release they can be put in again, once a final size is chosen.

also i think mannol tried to adhere to:
https://en.wikipedia.org/wiki/RTP_Control_Protocol#Packet_header
as much as possible, thus making the length 16bits.
not thinking that video frames will be much larger than that.
since rtp (i think) is meant to be 1 single packet, and in toxav its used for multiple packets, this is nonsense anyway.

@nurupo

This comment has been minimized.

Member

nurupo commented Nov 24, 2017

also a keyframe is easily 200kbytes in size, when you use 16bit for size it can never ever work.

Right, I'm not arguing about that. I'm saying that int's size is platform-dependent, so saying that it's 32-bit in size is incorrect for some platforms. That's the whole reason why there are fixed-size integers like uint32_t, which are always 32-bit no matter the platform.

@nurupo

This comment has been minimized.

Member

nurupo commented Nov 24, 2017

@SkyzohKey why is this issue in Security project? It's not about security.

@zoff99 zoff99 modified the milestones: v0.1.11, v0.2.0 Nov 24, 2017

@zoff99 zoff99 removed this from In Progress in Security Nov 24, 2017

@zoff99 zoff99 added this to To do in Video Nov 24, 2017

@zoff99 zoff99 removed this from To do in Video Nov 24, 2017

@zoff99 zoff99 added this to To do in Video via automation Nov 24, 2017

@SkyzohKey SkyzohKey moved this from To do to Triage in Video Nov 24, 2017

@SkyzohKey SkyzohKey moved this from Triage to In Progress in Video Nov 24, 2017

@SkyzohKey

This comment has been minimized.

SkyzohKey commented Nov 24, 2017

@nurupo Cannot this bug be exploited and compromise security (buffer overflow, use after free, etc)?
I'm not really a security expert thus asking. Also note that as I'm not a native english speaker, triage of issues is sometimes hard for me and I have to choose between the labels/projects we currently have. ^^

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

@nurupo
well vpx source code defines it like that. so ask the WEBM project what size they actually mean :-)

vpx_codec_err_t vpx_codec_decode ( vpx_codec_ctx_t * ctx,
const uint8_t * data,
unsigned int data_sz,
void * user_priv,
long deadline

@Diadlo

This comment has been minimized.

Diadlo commented Nov 24, 2017

Mostly issue really fixed. Still have some problems, but just small

image
image
image

@mannol

This comment has been minimized.

mannol commented Nov 24, 2017

OK, after some analysis here are my findings:

  • This was definitely an issue (spot on @zoff99!)
  • Changing the RTPheader breaks the protocol (very undesirable)
  • Might be fixed without breaking the protocol (a possibility)

It's been a long time since I last wrote code for toxav (about 2 years or something?), but, IIRC, vpx decoder shouldn't care how encoded chunks are packed as long as they are ordered correctly. That being said, changing this loop in a way that splits the payload into multiple RTP messages might just be the solution we are looking for.

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

@mannol you still do not understand the what the issue is.

@mannol

This comment has been minimized.

mannol commented Nov 24, 2017

@zoff99 I don't?

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

@mannol
if your video frame from the encoder has 65537 bytes and you take uint16_t from that you get the number 1. then you send "1" in the RTP Message as data length.
on the other end the decoder will give corrupt frame always because it assumes there is only 1 byte data. video will never show.

my emergency hotfit will at least fix this critical issue

i can demonstrate were you never get any video frame on the other side ever!
you can leave it running for hours.
you can see it live here. with my ToxCam app using original c-toxcore

@mannol

This comment has been minimized.

mannol commented Nov 24, 2017

@zoff99 That's what I'm saying: spit the video frame from the encoder (larger than 64K) into multiple RTP messages (smaller than 64K). The decoder should be OK handling these chunks independently as long as they are in order (which they most likely will be).

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

wont work. also 0.2.0 will break things anyway. so lets do it there and also include VP9. which also works fine.
i just fear 0.2.0 will take forever to arrive with TokTok's commit policy

@mannol

This comment has been minimized.

mannol commented Nov 24, 2017

@zoff99 you tried it already?

@zoff99

This comment has been minimized.

zoff99 commented Nov 24, 2017

@mannol rather look at my emergency hotfix. to save at least what is there already

@mannol

This comment has been minimized.

mannol commented Nov 24, 2017

@zoff99 what?

@GrayHatter

This comment has been minimized.

GrayHatter commented Nov 25, 2017

We might want to change the packet ids for the video or something,

No, I have a better solution for this. ToxAV will get a single packet number, and then it can wrap it's traffic with an additional ID if ToxAV needs to. Toxcore shouldn't be abartrating ToxAV inner workings.

well vpx source code defines it like that. so ask the WEBM project what size they actually mean :-)

It dosen't matter what they mean/want. Toxcore is a security/crypto project, VPX isn't... We should do what we want to do. u32 seems to be that, no?

my emergency hotfit will at least fix this critical issue

It's not an emergency hotfix, mostly because it's not hot, not really a fix for this issue, nor emergency.

The real fix (from my view at least) would be to prepend all toxav packets with a "binary" blob with an arch independent layout, that ToxAV could then version, and pass into a handler that could then pass back a struct for the current version of ToxAV, making all future changes backwards compat. (Hopefully)

I'm also going to echo what @nurupo said. Open a pull to the toxcore repo with JUST these changes so I can review the changes, and maybe merge them into my toxcore fork.

@zoff99

This comment has been minimized.

zoff99 commented Nov 25, 2017

exactly my thinking, long term the need to be protocol versions. but not only for ToxAV, but for all.
there will be protocol breaking thing every now and then.
so nodes need to announce what protocol they are speaking.

@sudden6

This comment has been minimized.

sudden6 commented Nov 25, 2017

@mannol I think splitting this loop can cause additional problems:

  • IIRC toxav packets are lossy, so there may be packets dropped, which probably will cause problems.
  • packets may arrive out of order, will RTP handle this case?

If these can be solved, I think that's the best short term solution.

@mannol

This comment has been minimized.

mannol commented Nov 25, 2017

@sudden6

IIRC toxav packets are lossy, so there may be packets dropped, which probably will cause problems.

RTP message is already split in chunks of ~1.5KB in size, regardless of it's actual length so increasing the RTP message size will mean that you send more of those chunks. RTP works in lossy environment, there are mechanisms for packet reordering and loss handling.

packets may arrive out of order, will RTP handle this case?

They may but, the jitter buffer will likely order them correctly. Lossy packets are not as "lossy" as they seem to be so that is not a big problem.

@sudden6

This comment has been minimized.

sudden6 commented Nov 25, 2017

Ok, then that's very good.

mannol added a commit that referenced this issue Nov 25, 2017

Proposed fix for #620
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

Haven't tried it myself yet but, it's simple enough to work. NOTE: this fix does not break the protocol.
@nurupo

This comment has been minimized.

Member

nurupo commented Nov 25, 2017

@SkyzohKey no, it's not exploitable. The bug is that if the video frame is greater than 65535 bytes in size, only the first <packet-size> mod 65536 bytes of the frame will be sent, i.e. between 0 and 65535, dropping the rest of the frame. No uninitialized memory is sent and no way to exploit it, at least as far as I can see.

mannol added a commit that referenced this issue Nov 25, 2017

Proposed fix for #620
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

Haven't tried it myself yet but, it's simple enough to work. NOTE: this fix does not break the protocol.

mannol added a commit that referenced this issue Nov 25, 2017

Proposed fix for #620
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

Haven't tried it myself yet but, it's simple enough to work. NOTE: this fix does not break the protocol.

robinlinden added a commit that referenced this issue Nov 25, 2017

Proposed fix for #620
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

Haven't tried it myself yet but, it's simple enough to work. NOTE: this fix does not break the protocol.

robinlinden added a commit that referenced this issue Dec 14, 2017

Proposed fix for #620
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

Haven't tried it myself yet but, it's simple enough to work. NOTE: this fix does not break the protocol.

robinlinden added a commit that referenced this issue Dec 17, 2017

Split video payload into multiple packets when >65k
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

Haven't tried it myself yet but, it's simple enough to work. NOTE: this fix does not break the protocol.

robinlinden added a commit that referenced this issue Dec 17, 2017

Split video payload into multiple packets when >65k
This is the implementation of the [proposed fix](#620 (comment)) for [this issue](#620).

@iphydf iphydf modified the milestones: v0.2.0-RC1, v0.2.0 Jan 14, 2018

@SkyzohKey SkyzohKey moved this from In Progress to Invalid in Video Feb 13, 2018

@SkyzohKey SkyzohKey moved this from Invalid to Done in Video Feb 13, 2018

@iphydf iphydf changed the title from WIP: Video bug found! to Video bug: large video frames are not sent correctly Feb 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment