MIDDLE extracts a range of bytes from a file, given a start and an end byte position.
#Status
Status: Unnecessary.
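In the process of benchmarking this, I realized (oops!) that if you run
cat $file | head -n 2, cat quits as soon as head has satisfied its count, so the rest of the file is not processed. The same holds for tail piped into head, which means standard tools already do the job: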
tail -c +$START $file | head -c $LENGTH > $new_file
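A minimal sketch of that pipeline wrapped in a shell function, assuming a `head` that accepts `-c` (GNU and BSD versions do). The name `extract_bytes` and the ten-byte sample file are made up for illustration. Note that `tail -c +N` counts from 1, so `+1` means the whole file.

```shell
# Hypothetical helper: print LENGTH bytes of FILE, starting at 1-based byte START.
extract_bytes() {
    file=$1 start=$2 length=$3
    # tail -c +N begins output at byte N (the first byte is 1);
    # head -c stops after LENGTH bytes, which also ends tail early.
    tail -c "+$start" "$file" | head -c "$length"
}

# Small demo file containing the 10 bytes "abcdefghij".
sample=$(mktemp)
printf 'abcdefghij' > "$sample"

extract_bytes "$sample" 3 4   # prints "cdef"
```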
npm install middle-js -g
##Sample use
To extract bytes 3000->4500 of file myfile.txt:
middle -f myfile.txt -s 3000 -e 4500 > newfile.txt
- -f, --file: file path
- -s, --start: start byte, defaults to start of file.
- -e, --end: end byte, defaults to end of file.
- -l, --length: number of bytes to extract, counted from start. If specified, overrides -e.
- -h, --help: this help.
* Only -f is required; on its own, middle acts like "CAT".
* If only -f and --end are specified, it acts like "HEAD".
* If only -f and --start are specified, it acts like "TAIL", but counting from the start.
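For comparison, these are the standard commands that the three cases roughly correspond to. This is a sketch on a made-up ten-byte file, assuming byte positions count from 1; it shows the stock tools, not output from middle itself.

```shell
# Demo file containing the 10 bytes "abcdefghij".
sample=$(mktemp)
printf 'abcdefghij' > "$sample"

whole=$(cat "$sample")           # like only -f: the whole file
first=$(head -c 4 "$sample")     # like -f and --end 4: bytes 1 through 4
rest=$(tail -c +7 "$sample")     # like -f and --start 7: byte 7 to the end

echo "$whole"   # abcdefghij
echo "$first"   # abcd
echo "$rest"    # ghij
```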
##Possible Issue:
The byte offsets might be off by one or two, I couldn't quite put my finger on how it was supposed to work.
- --start is the number of the byte it should start with, counting from 1 (the first byte is not 0)
- --end is the last byte it should include
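Under the convention just described (start counted from 1, end inclusive), the number of bytes in a range is `end - start + 1`. A short sketch of that arithmetic with a hypothetical `extract_range` helper built on tail and head; it illustrates the stated convention and is not necessarily what middle does internally.

```shell
# Hypothetical helper: print bytes START..END of FILE (1-based, END inclusive).
extract_range() {
    file=$1 start=$2 end=$3
    length=$(( end - start + 1 ))   # inclusive range, hence the +1
    tail -c "+$start" "$file" | head -c "$length"
}

echo $(( 4500 - 3000 + 1 ))   # bytes 3000->4500 cover 1501 bytes

# Demo file containing the 10 bytes "abcdefghij".
sample=$(mktemp)
printf 'abcdefghij' > "$sample"

extract_range "$sample" 3 5   # prints "cde"
```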
ls -l large_file
-rwxrwxrwx 1 avi avi 262008832 2009-06-01 03:05 large_file
Clear cache, MAY BE UNSAFE:
sync ; echo 3 | sudo tee /proc/sys/vm/drop_caches
time middle -f large_file -s 220000000 -l 10000000 > newfile
time dd bs=1 if=large_file skip=220000000 count=10000000 of=newfile2
(same results using
(This is very slow because with the block size bs set to 1, dd reads and writes one byte at a time with no buffering. With a larger block size, however, skip and count are measured in blocks, so you are limited in granularity to block sizes rather than a byte position.)
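The block-size trade-off can be seen on a tiny file. This is a sketch with a made-up ten-byte sample; dd's statistics go to stderr, so they are discarded here.

```shell
# Demo file containing the 10 bytes "abcdefghij".
sample=$(mktemp)
printf 'abcdefghij' > "$sample"

# bs=1: skip 6 bytes, copy 3 bytes. Byte-exact, but one read and write per byte.
exact=$(dd bs=1 if="$sample" skip=6 count=3 2>/dev/null)
echo "$exact"    # ghi

# bs=2: skip 3 blocks (6 bytes), copy 2 blocks (4 bytes).
# Offsets and lengths can now only be multiples of 2.
coarse=$(dd bs=2 if="$sample" skip=3 count=2 2>/dev/null)
echo "$coarse"   # ghij
```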
How to do this with tail? By default it counts from the end of the file, so I have to work out the offset from the end: 262008832 - 220000000 = 42008832
time tail -c 42008832 large_file | head -c 10000000 > newfile3
real 0m0.244s user 0m0.000s sys 0m0.032s
or better, using a + lets you tell tail where to start from:
time tail -c +220000001 large_file | head -c 10000000 > newfile4
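The two tail forms select the same bytes: counting back `size - skip` bytes from the end, or starting at byte `skip + 1` from the front. A sketch on a small stand-in file, assuming `wc -c` for the file size; the variable names are illustrative.

```shell
# Demo file containing the 10 bytes "abcdefghij".
sample=$(mktemp)
printf 'abcdefghij' > "$sample"

skip=6      # bytes to skip from the start (220000000 in the benchmark)
length=3    # bytes to keep (10000000 in the benchmark)

size=$(wc -c < "$sample")   # total size in bytes (262008832 in the benchmark)

# Counting from the end: the last (size - skip) bytes, then trim to length.
from_end=$(tail -c "$(( size - skip ))" "$sample" | head -c "$length")

# Counting from the start: +N is 1-based, so skipping 6 bytes means +7.
from_start=$(tail -c "+$(( skip + 1 ))" "$sample" | head -c "$length")

echo "$from_end" "$from_start"   # ghi ghi
```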
[There apparently doesn't exist a program that does this. HEAD lets you specify an end offset (in bytes or lines), and TAIL lets you specify a start offset (in bytes or lines), but both otherwise read the entire file. If we know the exact bytes we want, that can be quite wasteful for large files. DD lets you do this, but requires setting a blocksize of 1, which makes it run extremely slowly. This should probably be written in C...]