Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments on 3.1. General Block Structure - endian and padding #41

Open
erik4711 opened this issue Nov 22, 2016 · 14 comments
Open

Comments on 3.1. General Block Structure - endian and padding #41

erik4711 opened this issue Nov 22, 2016 · 14 comments
Labels
pcapv3 aka pcapng

Comments

@erik4711
Copy link

erik4711 commented Nov 22, 2016

3.1. General Block Structure

  • The idea of having the "Block Total Length" both before the block body and after is a nice idea since it allows for fast forward and backward navigation in a pcapng file. However, backward navigation does not work very well if we allow switching between big and little endian in the same pcapng file (see comments for Comments on 4.1. Section Header Block - don't allow multiple SHB's #42 4.1 SHB). Code that navigates backwards and passes a section boundary where the endianness is changed will most likely loose traction of block boundaries.
  • Please refrain from padding stuff to 32 bit boundaries. Doing so actually introduces unnecessary complexity rather than solving a problem.

My recommendation is to not allow SHB's to change the endianness, or even better only allow ONE Section Header Block per pcapng file.

@erik4711 erik4711 changed the title Comments on 3.1. General Block Structure regarding endian and padding Comments on 3.1. General Block Structure - endian and padding Nov 22, 2016
@saleyn
Copy link

saleyn commented Nov 22, 2016

Is backward navigation really a necessity? It adds 4 bytes at the end of each recorded packet, the overhead which in cases of very chatty protocols quickly adds up and penalizes applications that don't need backward navigation.

@erik4711
Copy link
Author

Good point, backwards navigation can hardly be a necessity. It would be nice to see these redundant 4 bytes removed from the general block structure. However, I'm guessing the Wireshark community might wanna use the current structure for the final 1.0 spec, since they have been using this structure for quite some time already. Hopefully there will be a version 1.1 where we can discuss improvements like dropping the tailing block total length.

@packetfoo
Copy link
Contributor

packetfoo commented Nov 25, 2016

Backwards navigation is critical for me to have to be able to navigate backwards. There's two main reasons for this:

  1. being able to determine if a pcapng file is identical to one I have a meta data entry in my database for so I can look it up - one of the things I need to know for that is the time stamp of the last frame, so it's important I can find the start offset of the last frame without having to read the whole file
  2. being able to fix a damaged file by reading towards the messed up data from both ends of the file

For me the 4 additional bytes are easily worth the overhead.

@saleyn
Copy link

saleyn commented Nov 25, 2016

Though it seems that (1) is easily achievable by storing file's checksum (e.g. SHA/MD5) in the database. Regarding (2), usually the damage happens at the end of a file due to an abnormal writer's termination. In which case the backwards navigation would be useless anyway, since the end of the file might be partially written, so the reader would have to read from the beginning until the last consistent packet is found.

On the other hand, in very chatty protocols where payload of each packet is very small (e.g. financial FAST market data), containing 8-16 bytes per packet with 100's of Gigabytes of data per day, any extra bytes of overhead create an unnecessary storage burden.

So I would really like to see this "backward navigation" option either obviated or made optional.

@erik4711
Copy link
Author

@packetfoo This is why it's a bad idea to allow multiple SHB's with different endian in the same pcapng file. There is no way you can know whether to interpret the Block Total Length of the last frame as big endian or little endian. You can, of course, make an educated guess, but it would be nice if we didn't have to rely on guesswork to implement the pcapng spec.

@saleyn
Copy link

saleyn commented Nov 25, 2016

+1

@packetfoo
Copy link
Contributor

@saleyn no, MD5 or any kind of hash is not going to cut it. Not even the file size is, because you can add comments (or other things) to a pcapng file and the packets stay completely the same (which is all that matters to meta data regarding packets), but hash and size change. Also, calculating the hashes would require reading the full file each and every time, and I can't afford to do that.

@erik4711 I know what you mean - so far it worked fine since all I encounter are little endian files. Forcing a single endianess is something we should look at in the future, but first 1.0 needs to be done.

@packetfoo
Copy link
Contributor

By the way, I need to find out what the reason for the 32bit padding is - I think there is a very specific reason for this, probably extremely high capture performance. Meaning: not talking about lazy PC hardware, but 10G/40G/100G/200G FPGA based captures where writing to disk needs to be ultra fast - it's possible that there are I/O problems if not writing flat 32bit values.

@saleyn
Copy link

saleyn commented Nov 25, 2016

@packetfoo, I understand that there might be legitimate reasons for backward navigation. The question is if this requirement is general enough to make it a "hard-core" feature of the format.

@packetfoo
Copy link
Contributor

I think backwards movement is more useful to all capture files than avoiding 4 bytes for an extremely special capture application regarding tiny packets. Which wouldn't even be Ethernet compliant at that size, but there are other L2 protocols that do, of course, so I can see the point. But I still think 4 bytes overhead are completely insignificant compared to their usefulness of being able to move back and forth.

And making the trailing size optional kills the "read the last timestamp" feature completely, because how would I know if the last four bytes are the block size or part of the frame data?

@saleyn
Copy link

saleyn commented Nov 25, 2016

Regarding the "usefulness" - it's really all use-case driven. In your case the benefit of backward navigation seems to outweigh the overall file size. In my case, it's the other way around, and when frequently dealing with very large pcap files, I really never encountered the need to read a file from the end, and if that ever occurred I would likely come up with a method to index the files.

Regarding turning that to being an optional feature - one possibility could be to add a flag to the header block that would indicate that the file supports backward navigation, in which case the writer would append the size of last block at the end of each written block.

@packetfoo
Copy link
Contributor

You're right, it's a question of point of view. Problem with the flag is that the header block for the last frame could be anywhere since we can have multiple SHBs, so even if I read the first SHB I cannot be sure that the flag tells the truth about the last block in the file.

@saleyn
Copy link

saleyn commented Nov 25, 2016

You just named another reason to not allow multiple SHBs. :-)

@packetfoo
Copy link
Contributor

I know, they're complicating things. It has to be determined how useful it is to allow these, but that is again not something for the 1.0 spec. 1.0 will have to live with multiple SHBs.

@mcr mcr added the pcapv3 aka pcapng label Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pcapv3 aka pcapng
Projects
None yet
Development

No branches or pull requests

4 participants