Fix for TCP packet data corruption when in non-blocking mode #402

Closed
wants to merge 3 commits into base: master

Conversation

Fixed packets not being sent atomically over a TCP socket in non-blocking mode. High network congestion could lead to the packet size being sent successfully but not the packet data. The user would be returned NotReady even though the size had been sent, resulting in data corruption on the receiving side.
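
For context, here is a minimal sketch of the approach the patch takes, as described above: copy the 32-bit size and the packet body into one contiguous block and hand that block to a single send() call. The function name and the use of a raw POSIX descriptor are illustrative, not the actual SFML code.

#include <arpa/inet.h>   // htonl
#include <sys/socket.h>  // send
#include <sys/types.h>   // ssize_t
#include <cstdint>
#include <cstring>
#include <vector>

// Build one contiguous block (size header + payload) so a single send() call
// either queues the whole packet or queues nothing at all.
bool sendPacketAtomically(int socketFd, const void* data, std::size_t size)
{
    std::uint32_t packetSize = htonl(static_cast<std::uint32_t>(size));

    std::vector<char> block(sizeof(packetSize) + size);
    std::memcpy(&block[0], &packetSize, sizeof(packetSize));
    if (size > 0) // an empty packet may come with a null data pointer
        std::memcpy(&block[0] + sizeof(packetSize), data, size);

    ssize_t sent = ::send(socketFd, &block[0], block.size(), 0);
    return sent == static_cast<ssize_t>(block.size());
}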

Bromeon (Member) commented Jun 6, 2013

Related to issue #119

LaurentGomila (Member) commented Jun 6, 2013

This is really expensive. I prefer the previous solution (#130) where 4 bytes are reserved at the beginning of the packet data, directly in sf::Packet.

The only thing I have to think about, is how well it mixes with the virtual functions onSend / onReceive.
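
For readers following along, a rough sketch of the alternative being referred to: keep 4 spare header bytes at the front of the packet's own buffer so that sending needs no extra allocation or copy. The class and member names are illustrative, not SFML's actual internals, and byte-order conversion is omitted for brevity.

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical packet buffer that reserves room for the size header up front.
class HeaderedBuffer
{
public:
    HeaderedBuffer() : m_data(sizeof(std::uint32_t)) {} // header placeholder

    void append(const void* data, std::size_t size)
    {
        const char* bytes = static_cast<const char*>(data);
        m_data.insert(m_data.end(), bytes, bytes + size);
    }

    // Called just before sending: patch the header and expose one contiguous block.
    const char* finalize(std::size_t& totalSize)
    {
        std::uint32_t payloadSize = static_cast<std::uint32_t>(m_data.size() - sizeof(std::uint32_t));
        std::memcpy(&m_data[0], &payloadSize, sizeof(payloadSize));
        totalSize = m_data.size();
        return &m_data[0];
    }

private:
    std::vector<char> m_data;
};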

binary1248 (Member) commented Jun 7, 2013

I highly doubt it will be more expensive in total to allocate an extra block a single time when you send the packet. When constructing the packet, the std::vector containing the data starts at some minimal size. Since there is no way for the packet to know how big it will become, it will have to reallocate many, many times over the course of its lifetime. Adding one more allocation that only allocates and frees exactly the number of bytes needed will probably not even be noticeable among the std::vector reallocations. If the size is prepended to the packet data, the constant overhead of the pointer arithmetic and size conversions required to transparently provide packet size and data access to the user will also weigh in at some point. And finally, from a software engineering standpoint, you are adding a lot of unneeded scattering of a single concern, which leads to less maintainable code ;).

LaurentGomila (Member) commented Jun 7, 2013

Hum... yeah, that's right.

Any reason why you're not using std::vector and a more elegant return send(...)?

mjbshaw commented Jun 7, 2013

I'm not sure if I should post here or on the forums, but I don't think this patch fixes the issue. I think the main problem is actually the design of sf::Packet. Trying to turn TCP into a datagram protocol requires a different design. Consider the following (with this patch): the sf::Packet contains a large buffer that the user is trying to send without blocking. This would require multiple send() calls, as the first one (or first few) might be successful, but then the OS's local socket buffer fills up and returns EWOULDBLOCK/EAGAIN. The sf::TcpSocket::send() function then returns sf::Socket::NotReady and doesn't send the rest of the data. But it's already sent the start of the packet! This problem is not fixed with this patch, or really any other patch that doesn't completely change how sf::Packet works.

There really are only two options: 1) getting rid of non-blocking mode, or 2) requiring the user to continue sending a half-sent packet. Option 1 is certainly not desirable, and that leaves option 2.

The problem is that the design of sf::Socket and sf::Packet try to make a non-blocking datagram protocol out of a stream protocol. If you want to do that, you have to be prepared to call send() multiple times (which makes it more of a stream-like usage instead of datagram usage... we can't escape that and still get non-blocking sockets). Yes, there is a loop in sf::TcpSocket::send(), but it bails on the first error and doesn't have any way of saving its current position in the buffer. The only way to truly solve this problem is to save the current position in the buffer, return sf::Socket::NotReady, and then require the user to resend the same packet (and the socket just picks back up where it left off, using the saved buffer position to resume sending the packet at the right place in the buffer).
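
A minimal sketch of the resumable approach described above, assuming a raw POSIX descriptor; the saved-offset member and the status names are hypothetical, not SFML's API.

#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>

enum Status { Done, NotReady, Error };

// Keeps the byte offset of a partially sent buffer so the caller can retry
// the same buffer until all of it has been queued.
class ResumableSender
{
public:
    ResumableSender() : m_offset(0) {}

    Status send(int socketFd, const char* data, std::size_t size)
    {
        while (m_offset < size)
        {
            ssize_t sent = ::send(socketFd, data + m_offset, size - m_offset, 0);
            if (sent < 0)
            {
                if (errno == EAGAIN || errno == EWOULDBLOCK)
                    return NotReady; // caller must retry later with the SAME buffer
                return Error;
            }
            m_offset += static_cast<std::size_t>(sent);
        }
        m_offset = 0; // whole buffer queued; ready for the next packet
        return Done;
    }

private:
    std::size_t m_offset;
};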

binary1248 (Member) commented Jun 7, 2013

Laurent, if you don't care as much about performance as I do, I can surely change it to use a std::vector instead. It might make it look more C++-elegantish but you will probably lose 0.00001% :D.

binary1248 (Member) commented Jun 7, 2013

@mjbshaw Yes, when investigating this issue, I also considered the possibility of partial send()s where only a portion of the buffer specified is sent. After much reading around the internet, I found out this can only be the case if syscalls are interruptible. To cause an interruption of a syscall however, you would have to explicitly tell the operating system that any signals your program receives should interrupt the syscall and not cause it to automatically restart (see SA_RESTART). In that case send() would return with EINTR and the user would have to check the number of bytes really sent in order to know where to resume with the next send() call.

So basically, for modern operating systems, one can assume that it either queues the whole buffer at once or not at all. I believe Unix is the only operating system that still uses EINTR, and I guess if Laurent cares he would have to change the signature of TcpSocket::send() to include a sent parameter.

We can discuss this more with the code I used to cause this behaviour on the forum if anyone is interested :).
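
As a sketch of the signature change hinted at above (a "sent" out-parameter so the caller knows where to resume), something along these lines could work on POSIX; the function name and the retry-on-EINTR policy are assumptions, not current SFML behaviour.

#include <sys/types.h>
#include <sys/socket.h>
#include <cerrno>
#include <cstddef>

// Reports how many bytes were really queued via the out-parameter, so the
// caller knows where to resume after an interrupted or partial send().
bool trySend(int socketFd, const char* data, std::size_t size, std::size_t& sent)
{
    sent = 0;
    while (sent < size)
    {
        ssize_t result = ::send(socketFd, data + sent, size - sent, 0);
        if (result > 0)
            sent += static_cast<std::size_t>(result);
        else if (errno == EINTR)
            continue; // syscall interrupted by a signal: simply retry
        else
            return false; // EWOULDBLOCK/EAGAIN or a real error; 'sent' tells the caller how far we got
    }
    return true;
}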

LaurentGomila (Member) commented Jun 7, 2013

There's one bug in this code: if the packet is empty, "data" is null. This should be handled explicitly to avoid problems with memcpy. Unless the function is safe with null when the size is zero, but the man page doesn't really help to know what happens in this case.
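
A small sketch of the guard being asked for here (the helper and its parameters are illustrative): since passing a null pointer to memcpy is undefined even when the size is zero, the payload copy is simply skipped for empty packets.

#include <cstdint>
#include <cstring>
#include <vector>

// Build the header + payload block defensively so an empty packet
// (data == NULL) never reaches memcpy.
std::vector<char> buildBlock(std::uint32_t packetSize, const void* data, std::size_t size)
{
    std::vector<char> block(sizeof(packetSize) + size);
    std::memcpy(&block[0], &packetSize, sizeof(packetSize));
    if (data != NULL && size > 0)
        std::memcpy(&block[0] + sizeof(packetSize), data, size);
    return block;
}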

ghost assigned LaurentGomila Jun 7, 2013

binary1248 (Member) commented Jun 7, 2013

Fixed.

Fanael commented Jun 11, 2013

There's one bug in this code: if the packet is empty, "data" is null. This should be handled explicitly to avoid problems with memcpy. Unless the function is safe with null when the size is zero, but the man page doesn't really help to know what happens in this case.

That's why I'd recommend using std::copy instead of std::memcpy, because range [it, it) is valid for all valid it.
Or perhaps no copy calls at all:

std::vector<char> dataBlock;
dataBlock.reserve(sizeof(packetSize) + size);
const char* packetSizeAddr = reinterpret_cast<const char*>(&packetSize);
dataBlock.insert(dataBlock.end(), packetSizeAddr, packetSizeAddr + sizeof(packetSize));
dataBlock.insert(dataBlock.end(), data, data + size);
return send(&dataBlock[0], dataBlock.size()); 

LaurentGomila (Member) commented Jun 11, 2013

Although the difference is probably not significant, inserting bytes one by one really feels heavy for a simple raw byte copy.

binary1248 (Member) commented Jun 11, 2013

std::memcpy is the fastest way to copy raw blocks of bytes from one place to another if you know they don't overlap. I tried generating the asm code for each of the variants, and std::copy and inserting are both synthesized using memmove (which is never faster than std::memcpy), although the source and destination should not overlap in either of those cases. Using ICC, std::memcpy is optimized even more by using the built-in SSSE3 memcpy. I wouldn't be surprised if many processors already have automatic optimizations for such "simple" byte copying.

You can check the GCC Explorer output here.
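
For anyone who wants to reproduce the comparison, the three variants being discussed boil down to roughly the following; the buffer handling is simplified, and the destination is assumed to be correctly pre-sized or pre-reserved as noted in each comment.

#include <algorithm>
#include <cstring>
#include <vector>

// Variant 1: raw memcpy into a buffer already sized to hold 'size' bytes.
void fillWithMemcpy(std::vector<char>& block, const char* data, std::size_t size)
{
    std::memcpy(&block[0], data, size);
}

// Variant 2: std::copy into a pre-sized buffer (often lowered to memmove).
void fillWithCopy(std::vector<char>& block, const char* data, std::size_t size)
{
    std::copy(data, data + size, block.begin());
}

// Variant 3: vector::insert appending into a pre-reserved buffer.
void fillWithInsert(std::vector<char>& block, const char* data, std::size_t size)
{
    block.insert(block.end(), data, data + size);
}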

Fanael commented Jun 12, 2013

inserting bytes one by one

Actually, on virtually all implementations, insert(vector.end(), …) just ensures there's enough space, possibly reallocating (but reserve already took care of that) and then proceeds to std::copy the bytes. So it's just as "one by one" as resize + memcpy. And GCC inlines it rather nicely, even if not perfectly, too.

I tried generating the asm code for each of the variants and std::copy and inserting are both synthesized using memmove

That's because arguments to std::copy may overlap.

which is never faster than std::memcpy

But in reality, it's rarely any slower, too, with memcpy just being a symbol alias to memmove on some platforms, for example Windows when using MSVC or GCC.

At any rate, are you sure that memmove vs memcpy is the bottleneck here, and not, uhm, the actual send call? And that gains are big enough to deal with the quirks of memcpy and memmove like nullptr handling when std::copy and vector.insert already do it for you, while also being more idiomatic?

LaurentGomila (Member) commented Jun 12, 2013

Yeah, I don't care much about what this code is compiled to, it really won't make a difference for the end user. I'm more concerned with how the code looks. At this point it's totally a matter of personal taste, and mine is that the original code looks better.

By the way, thanks. It's great to see users arguing over such details, taking the time to do in-depth tests and to provide detailed information ;)

binary1248 (Member) commented Jun 12, 2013

Actually, the surprising thing is that the C++ standard requires in 25.3.1 for std::copy that:

result shall not be in the range [first,last)

so the ranges indeed aren't allowed to overlap. If this were not the case, std::copy_backward would be redundant.
Given this requirement, the libraries still implement this with memmove instead of memcpy, which I find strange...

But I guess since Laurent really doesn't care, there is no need to discuss this matter further ;).

Fanael commented Jun 13, 2013

the C++ standard requires in 25.3.1 for std::copy that:

result shall not be in the range [first,last)

so the ranges indeed aren't allowed to overlap.

They are allowed to overlap. While result itself shall not be in the range [first, last), nothing prevents e.g. result + 1 from being in this range. So the following code that copies the elements [1, 5) to [0, 4) is valid:

int foo[5];
std::copy(foo + 1, foo + 5, foo);

binary1248 (Member) commented Jun 13, 2013

In that case std::memcpy would work too, because it (normally) does a forward-direction copy and wouldn't overwrite a region before copying it to the destination. By "overlaps" I mean that the source would be overwritten by the copy before the copy completes, which would fit with the standard's description if a forward-direction copy were used. The C99 standard is even more strange regarding this.

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1.
If copying takes place between objects that overlap, the behavior is undefined.

It somehow doesn't allow overlapping in any direction. I guess they wanted to leave room for different implementations, which could also be the reason why the description of memmove mentions a temporary buffer although I assume most libraries do things a bit more efficiently than that.

Fanael commented Jun 13, 2013

In that case std::memcpy would work too

No, as you yourself point out, the C standard says:

If copying takes place between objects that overlap, the behavior is undefined.

The implementation is free to copy backwards. From what I see, some implementations always do that for small arrays, by means of a big switch for all possible small sizes and falling through from case N: to case N-4: (where the exact value of 4 is platform-dependent).

binary1248 (Member) commented Jun 13, 2013

I don't see why implementations would want to copy backwards if they could just copy forwards... Processors are probably optimized more for memory accesses from lower to higher addresses when handling data on the free store i.e. indirect memory access. And I'm pretty sure that vectorization is only applied in this direction as well.

As for small arrays, because memcpy is commonly implemented using a simple loop, I highly doubt there are any switch blocks involved in trying to optimize the performance. Depending on the size of the block and the degree of vectorization, the optimizer will unroll the copy loop anyway and you will end up with a handful of single copy operations as opposed to a loop.

Intel article on memcpy: http://software.intel.com/en-us/articles/memcpy-performance
Optimization on FreeBSD: http://now.cs.berkeley.edu/Td/bcopy.html

luiscubal commented Jun 13, 2013

@binary1248 Undefined behavior is undefined behavior. Depending on it is dangerous, even if there is a "usual" way things are done. "Probably" is not good enough.
See http://lwn.net/Articles/416821/
Also https://bugzilla.redhat.com/show_bug.cgi?id=638477#c31

binary1248 (Member) commented Jun 13, 2013

@luiscubal And the religious debate begins :)

Maybe for some, undefined behaviour means undefined behaviour, but considering the standard was written before such subtleties were even considered, I wouldn't mind asking how undefined do they mean with undefined. Obviously different vendors can provide different implementations, but if 99% of them agree that under some assumption they could write code that performs noticeably faster then I welcome them to do so. They would just have to document this assumption somewhere (or at least make it widely known). When I write code, I don't necessarily always look in the standard. A problem only arises when the remaining 1% decide that they are going to implement something in a way that breaks all other code out there because they are "protected by the undefinedness" of the standard. The funny thing is, most of the time these creative people don't even contribute anything beneficial and use the standard as an excuse for their practical joke. What is even more pitiful is the argument that because this change broke so much code, it exposed all the supposedly "badly written" code of the programmers who followed convention as opposed to the standard in order to provide users with potentially more performance.

Libraries were written to make programmers' lives easier, whether that means that they would have to reinvent less wheels themselves or not worry about reading the standard every 5 minutes. If a library breaks previously working code for no good reason, it is an indication that its developers are getting old and forgot what its initial purpose was.

And "undefined behaviour" is not a restriction by the standard. It just means "they don't care". Writing something that narrows the undefined behaviour scenario down is still standard conforming. Using the code under the assumption that this undefined behaviour has been narrowed down is also still standard conforming because you don't need to conform to the standard, the library does. You just have to understand that you can't point your finger at the standard when something goes wrong.

Standards are never written with performance in mind, whereas libraries are. As such "undefined behaviour" according to the standard is something that you can take into account if you fully understand the implications. I'm sure the operating systems we are using right now to discuss still have many of these wonderful "hacks" in them that nobody really cares about but just "work".

End religious arguments

Does this really matter for this pull request? Not really. The 2 memory blocks are guaranteed to be non-overlapping anyway so it really doesn't matter how the copy is done. I just remarked that std::copy would be slower in this case because memmove would be used although the blocks don't overlap.

Maybe we should put an end to the more and more subjective arguments...

dghost commented Jun 14, 2013

I just remarked that std::copy would be slower in this case because memmove would be used although the blocks don't overlap.

IIRC, on most optimized platforms memmove will simply copy directly between the source and destination if they don't overlap. The overhead of bounds testing is generally trivial compared to the actual copy operation.

Maybe for some, undefined behaviour means undefined behaviour, but considering the standard was written before such subtleties were even considered, I wouldn't mind asking how undefined do they mean with undefined.

"undefined" in this context means "this is undefined because we don't know what architecture it is going to be written on, any given specification we write may be wrong on any given system, and you shouldn't be doing this anyways unless you actually know what's happening down there."

It's largely a moot point these days, but the language creators couldn't guarantee that the platform the code is running on wouldn't, say, be faster copying memory in decreasing order. Or that the platform implementor wouldn't implement it like that to make shifting a block of memory to a higher, overlapping address faster (if the source is [0:7] and the dest is [4:11], you can copy in reverse order and maintain integrity). There are all manner of weird implementation quirks out there that the specification was designed around - you just don't see them as often these days.

Anyways, that's how undefined they actually mean in these contexts. If it were only running on an x86 processor, or in a VM of some sort, they could make strong guarantees and it would be moot. But they can't really, so they left it up to everyone else to figure out what is best in any given situation.

Edited for clarity.

LaurentGomila (Member) commented Jun 17, 2013

Fixed. I didn't merge your pull request directly because of minor style modifications, but thank you very much for all the time and effort you have invested in this issue :)

mjbshaw commented Jun 18, 2013

@binary1248: Nope. One cannot assume that on a modern OS it will send the entire buffer in one call. Try sending a 2 GB buffer with nonblocking sockets. The first send call will succeed (in my tests, it sends about 500KB on the first send()), and then on the second send() call it results in EAGAIN/EWOULDBLOCK.

You cannot have a nonblocking stream protocol and treat it like a datagram protocol.

binary1248 (Member) commented Jun 18, 2013

I like to consider typical scenarios when discussing things like this. If people come up with such pathological counterexamples all the time, I might as well stop trying to fix things for typical usage and declare everything broken forever. Typical buffer sizes in modern operating systems are on the order of tens of KB. And considering that, for all scenarios that can be considered healthy and sane, we can assume that the data you specify in send() will be queued atomically. Like I said above, there are scenarios, which can be considered typical usage, where even small sends() might not queue everything.

In your case, the failure to send data consistently is very obviously the fault of the developer. In my case, it is the fault of the underlying system and not obvious to the programmer if things go wrong. That is the case that is important and requires attention. I really can't care about anything else.

I'd be more interested if you can find an example of a serious real-world application that send()s data in such big chunks using the POSIX socket API directly. There are probably very few if any.

mjbshaw commented Jun 18, 2013

Indeed it was pathological, but it's actually pretty easy (and realistic) to send things that can be 1MB or more, which again would likely result in corrupted streams.

I'd be more interested if you can find an example of a serious real-world application that send()s data in such big chunks using the POSIX socket API directly.

Situations that might result in packets >= 1MB include anything sending a file (as there are lots of files >= 1MB), sending image/webcam data, sending game state (for example, after a game ends, the server might send it the game state necessary to do a local replay), etc. There are lots of situations where this is problematic.

You cannot guarantee how large the OS's buffers are, which means you cannot guarantee to users of SFML how large their data packets may be before they start corrupting their data streams (when using non-blocking sockets).

binary1248 (Member) commented Jun 18, 2013

It is true that there are many situations where data has to be sent with a very large size, however in those situations, middleware takes care of the fragmentation into smaller manageable chunks if this is not done directly by the application itself. Like I said, very few if any call send() with a pointer to a 1MB chunk of data and expect it to work without a hitch, especially when using non-blocking sockets.

As for the OS buffer size, unless people are still using their 15-year-old systems, it is a valid assumption that OS buffer sizes are reasonably large (in relation to the size of the data to be sent). Assuming that very small buffers are still commonplace is borderline pathological if you ask me. SFML also doesn't (and shouldn't) handle the case where the OS send buffer is 0 bytes, does it?

Put simply, and I think Laurent can agree on this, if someone has a problem due to a false assumption that was made, we would be very interested to hear about this special situation. SFML can't and shouldn't try to cater to ALL possible configurations out there, 99.9% is enough. Whoever needs this kind of support can use the POSIX socket API themselves and be proud they wrote software for such an unforgiving system.

mjbshaw commented Jun 18, 2013

Well, I've stated what I think. I think we'll just agree to disagree when it comes to coding standards and what we think best practice is.

@LaurentGomila, I'd appreciate your input so I can know whether I should stop talking or not. I think that this issue is a potential security vulnerability and a crash waiting to happen, and as SFML prepares to go mobile there's an increasing chance that SFML will be used on a system, and by a user, where a packet does not get entirely sent with non-blocking sockets and entirely corrupts the data stream.

LaurentGomila (Member) commented Jun 18, 2013

I haven't got the time to think about a totally different solution that would be 100% safe, so please continue to discuss, and with some luck we'll end up with another patch ;)

For now, all I can do is to implement the proposed patch and get new feedback. Still a lot better than doing nothing.

retep998 commented Jun 19, 2013

I feel like a total solution would boil down to one of two decisions.

  1. Force the sending of packets to be always blocking
  2. Require that the user check if the packet was fully sent or not, and have them resend the rest of it later, sort of how receive currently works.

Going with the first option would simplify packet sending, but it would lose the whole async non-blocking functionality, while the second option would result in a consistent send and receive mechanism but would add extra complexity for the user.

mjbshaw commented Jun 19, 2013

Yeah, those are the only two solutions I've been able to come up with. Either non-blocking sockets can't be used (option 1), or (option 2) if non-blocking sockets are enabled, the user has to be prepared to resend a packet if only a part of it can be sent now without blocking (i.e. if sf::Socket::NotReady was returned). However, I think the name of sf::Socket::NotReady might not describe the error best, and a better name would be something like sf::Socket::Again or something like that, to signal that they must send again to complete sending this packet (or risk corrupting the stream if they send something else). (The name is fine for now; I'd keep it as NotReady for now and only consider changing it for SFML 3)

Blocking sockets would never return sf::Socket::NotReady/Again (or whatever the error is called), so from a usability point, they would not change. This would only affect non-blocking sockets.
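
To make the proposed contract concrete, here is a hedged sketch of what the call site could look like under option 2. It assumes that send() resumes a partially sent packet when called again with the same packet, which is the behaviour being proposed here, not what SFML did at the time; sendWholePacket is a hypothetical helper.

#include <SFML/Network.hpp>

// Keep offering the SAME packet until the socket reports that the whole thing
// has been queued; sending anything else in between would corrupt the stream.
bool sendWholePacket(sf::TcpSocket& socket, sf::Packet& packet)
{
    for (;;)
    {
        sf::Socket::Status status = socket.send(packet);
        if (status == sf::Socket::Done)
            return true;
        if (status == sf::Socket::Error || status == sf::Socket::Disconnected)
            return false;
        // NotReady (or a future Partial/Again status): wait until the socket is
        // writable, e.g. with sf::SocketSelector, then retry the same packet.
    }
}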

retep998 commented Jun 19, 2013

I'd be in favor of calling it sf::Socket::Partial

MarioLiebisch (Member) commented Jun 19, 2013

Why not sf::Socket::Incomplete? Unless I misunderstand the usage.

binary1248 (Member) commented Jun 19, 2013

If you want to support resuming the transmission of partially sent packets, you need accounting information. This is either saved inside sf::Packet, or sf::TcpSocket. The problem with saving it inside sf::Packet is the fact that we cannot assume the packet is non-transient i.e. it will not be modified or even exist anymore for the next call of send(). This only leaves the option to buffer the packet data inside the sf::TcpSocket object and resume transmission the next time send() is called (possibly) ignoring the newly passed packet or also transmitting it partially depending on how much buffer space is left after the previous send(). If this is the case, each sf::Packet would need a way of being uniquely identified so that the sf::TcpSocket knows whether to queue the new data and send or just resume transmission of previously queued data since the user has no way to tell the sf::TcpSocket that they want to transmit a completely new sf::Packet with exactly the same contents. It would also mean that the only option to use sf::Packet would be to always store it in some non-transient data structure until it is fully sent or even worse, to construct it on the free store and manage the pointer to it until it is fully sent. Both are annoying and will probably make using the networking API a nightmare for beginners.

A solution to this problem would probably require a complete rewrite of the Packet/Socket API and we all know how much Laurent loves complete rewrites.

lbguilherme commented Jun 19, 2013

A possible implementation would be storing one sf::Packet inside sf::TcpSocket and make send(packet) move (C++11-like) the packet from the argument to the socket:

sf::Packet packet;
// add some data to packet
tcpsocket.send(packet); // this could return sf::Socket::Incomplete
// packet is now empty, its data was moved to the socket

If it successfully sent the whole packet, the packet inside the socket is now empty. Otherwise, the user must call some other function, like continueSend(), without arguments and it would resume sending from the stored packet, maybe returning sf::Socket::Incomplete again.

I think this is the simplest API and does not involve too much overhead.

LaurentGomila (Member) commented Jun 19, 2013

Your solution prevents one from sending the same packet to multiple sockets.

lbguilherme commented Jun 19, 2013

True. Another possibility then would be using a shared pointer to store packet's data.

binary1248 (Member) commented Jun 19, 2013

I think it is too early to discuss possible solutions to this problem. Either SFML starts to make use of C++11 features, adds boost as a dependency (horrible), or considers a complete rewrite of how the packet API looks.

If you ask me, I say scrap the synchronous socket API and go for an asynchronous socket API like asio (there is a standalone non-boost version). It is proven to be more scalable than any synchronous variant which is probably why all serious web servers that can handle c10k use asynchronous IO. It has the added side-effect of hiding lower level intricacies of the socket API from beginners. They can just send data over a TCP socket and rest assured that it will arrive at the opposite end... without blocking. It would save us oh so many discussions/explanations/tutorials....

I was already aware of the shortcomings of the SFML networking API a while ago and decided to write my own library built on asio that interfaces nicely with SFML. Maybe I will release an alpha version of it soon ;).
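
For comparison, a minimal sketch of the asynchronous style being described, using Boost.Asio and C++11; it is purely illustrative and not part of this pull request.

#include <boost/asio.hpp>
#include <iostream>
#include <memory>
#include <vector>

// Queue a whole block for sending and get notified when it has been written.
// The shared_ptr keeps the buffer alive until the asynchronous operation completes.
void asyncSendBlock(boost::asio::ip::tcp::socket& socket, std::vector<char> block)
{
    std::shared_ptr<std::vector<char>> buffer = std::make_shared<std::vector<char>>(std::move(block));
    boost::asio::async_write(socket, boost::asio::buffer(*buffer),
        [buffer](const boost::system::error_code& error, std::size_t /*bytesTransferred*/)
        {
            if (error)
                std::cerr << "send failed: " << error.message() << std::endl;
        });
}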

LaurentGomila (Member) commented Jun 19, 2013

An asynchronous socket API would be awesome, I agree. Perhaps in SFML 3 :)

mjbshaw commented Jun 19, 2013

I'd say put the bookkeeping variables and code in the sf::TcpSocket and make a contract stating that once a packet is sent (at least partially), it cannot be modified until it is completely sent (and if it gets destroyed before it is completely sent, then the socket must be closed and reopened, otherwise you'd be corrupting the stream).

Asynchronous sockets could be a cool addition to SFML 3 :)

jungletoe commented Jul 28, 2014

This issue resurfaced. It's fine on Windows and VC++, but when I ported to Linux and G++ the exact same issue happens, with the buffer not sending the whole packet and the client (the receiver) being left in limbo waiting for it to arrive.
