Truncating queue files #374
Comments
Hi Rupinder,
Thank you for the question.
However, please note that this is not a supported API; it relies on the internal format of the queue-file header. The code works with the current version of chronicle-queue, but may break in future versions. If you decide to use such a method, you should test extensively to make sure that you are not accidentally truncating data from the end of the queue files. A safer way to solve the problem is simply to acquire more disk space.
Best Regards,
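The code sample referred to above did not survive in this thread, so here is a minimal, stdlib-only sketch of the general shape of such a cleanup. Note the assumptions: the `StoreFileListener` interface is stubbed locally (the real one lives in chronicle-queue), and `lastUsedByte` is a hypothetical placeholder for the unsupported header parsing the maintainer warns about.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TruncateOnRelease {

    // Stand-in for Chronicle Queue's StoreFileListener callback, declared
    // locally so this sketch compiles without the chronicle-queue dependency.
    interface StoreFileListener {
        void onReleased(int cycle, File file);
    }

    // Truncate the pre-allocated tail once the last written byte is known.
    // Swallows I/O errors so a cleanup failure never breaks the release path.
    static void truncateIfOversized(File file, long usedBytes) {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            if (usedBytes < raf.length()) {
                raf.setLength(usedBytes);
            }
        } catch (IOException e) {
            System.err.println("truncate failed: " + e);
        }
    }

    // Demo helper: create a temp file pre-sized like a roll file.
    static File makeDemoFile(long length) {
        try {
            File f = File.createTempFile("queue-demo", ".cq4");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                raf.setLength(length);
            }
            return f;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // lastUsedByte would have to come from parsing the unsupported,
        // version-specific queue-file header; here we just pretend it is 4096.
        long lastUsedByte = 4096;
        File f = makeDemoFile(64 * 1024);
        StoreFileListener listener = (cycle, file) -> truncateIfOversized(file, lastUsedByte);
        listener.onReleased(0, f);
        System.out.println(f.length()); // 4096
    }
}
```

As the maintainer says above, anything that derives `lastUsedByte` from the queue-file header is version-specific and unsupported, which is exactly why this needs extensive testing before production use.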
The simplest way to reduce the waste at the end is to reduce the block size. With the default block size of 64 MB, 1 TB of disk, and daily rolling files, this waste alone would only exhaust the disk after roughly 34 years of retained data. If you reduce the block size to, say, 2 MB, you could run for about 1000 years. Note: Linux uses sparse files, so there you only ever waste about 4 KB per day.
The space wasted is worth about 8 cents per day retained on high-end redundant SSDs, so I suggest you not spend more than a few dollars of your time on it.
Peter.
Peter,
Your points are very valid. I was debating whether to do it or not, and this certainly gives me some arguments to justify not doing it.
A follow-up question, then, about the block size: what is the impact of selecting a small block size when the files have to grow? And what if the files are written using a certain block size but read with a different one? Does that cause issues? I ask because, in some cases, files are copied around systems and the reader may not use the same block size. One way would be to wrap it in another API that does not allow them to change the block size.
The block size indirectly determines the maximum safe message size. Some operations need to be written within a single block, including overlap. To be completely safe, make the block size at least 4x the maximum message size. Windows requires the mapping to be a multiple of at least 64 KB.
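The two rules just stated (at least 4x the largest message, rounded up to the 64 KB Windows mapping granularity) can be sketched as a small helper. This is an illustration of the arithmetic from this thread, not a Chronicle Queue API:

```java
public class BlockSizeHint {

    // Windows memory-maps in multiples of the 64 KB allocation granularity.
    static final long MAP_GRANULARITY = 64 * 1024;

    // Rule of thumb from the discussion: block size >= 4x the largest
    // message, rounded up to the next 64 KB multiple.
    static long suggestedBlockSize(long maxMessageBytes) {
        long atLeast = 4 * maxMessageBytes;
        return ((atLeast + MAP_GRANULARITY - 1) / MAP_GRANULARITY) * MAP_GRANULARITY;
    }

    public static void main(String[] args) {
        // Messages up to 100 KB -> 400 KB, rounded up to 448 KB (458752 bytes).
        System.out.println(suggestedBlockSize(100 * 1024)); // 458752
    }
}
```

The actual block size would then be passed to the queue builder; whether such a tight bound is wise given the jitter note below is a separate trade-off.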
As for differing block sizes, the files will end up sized according to the largest block size used. Mixing them is likely to defeat the whole purpose of doing this. By the way, if you use read-only mode, Windows won't let you memory-map a region larger than what is on disk.
Given disk space is cheap, I rarely find good cause to change the block size, and I haven't benchmarked its impact on Windows. On Linux, jitter increases if you go outside 16 to 256 MB, which is why we picked a mid-point that doesn't waste much space.
Regards, Peter.
Another thing to note: queue files compress fairly well, especially the empty sections at the end. So one could implement a cron job to zip the files from old roll cycles if space is an issue.
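The point above is easy to verify: a mostly-zero tail deflates to almost nothing. A small stdlib demonstration (the 1 MB buffer and its contents are made up for the demo, not taken from a real roll file):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class RollFileCompression {

    // Gzip a byte array in memory and return the compressed size in bytes.
    static int gzippedSize(byte[] data) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(data);
            }
            return bos.size();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Simulate a roll file: a little real data followed by a large
        // pre-allocated tail of zeros.
        byte[] file = new byte[1 << 20]; // 1 MB
        for (int i = 0; i < 4096; i++) {
            file[i] = (byte) (i * 31); // "real" content at the front
        }
        int compressed = gzippedSize(file);
        // The zero tail compresses away almost entirely.
        System.out.println(compressed < file.length / 100); // true
    }
}
```

In practice the cron job would just run something like `gzip` over files from roll cycles older than some cutoff; the demo only shows why that reclaims nearly all of the wasted tail.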
I have a situation where the rolling daily files see periods of high activity, so some files are large and some relatively very small. I would like to reclaim the wasted space in the files that don't use it. Is there a way to do this in the StoreListener so that when we release a file we can truncate its length to only the needed size?
This is on Windows, so rapidly growing files cause issues with the team that manages the file system.