No need to record usedSize in the segment file header #1

downgoon · 2017-09-13T06:17:28Z

Design review

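A segment has a fixed 12-byte header. Its most essential field is the 4th one, usedSize, which records how many bytes of the segment file are actually in use. You might ask why this value needs to be stored at all: isn't file size already managed by the operating system's file system? The main reason is that the implementation uses memory mapping underneath, and each mapping is BSFConf.segmentLimitBytes in size (128MB by default). Every segment is therefore allocated in 128MB units, whether or not that space is actually used.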
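
To see why the OS-reported file size is no help here, the sketch below preallocates a segment at a fixed limit, memory-maps it, and writes a small payload. It is a minimal Python illustration under stated assumptions, not the big-sequence-file code: `SEGMENT_LIMIT_BYTES` is a hypothetical, much smaller stand-in for BSFConf.segmentLimitBytes, and `write_preallocated_segment` is a hypothetical helper. It deliberately does not model the 12-byte header, since the layout of its other fields is not described here.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for BSFConf.segmentLimitBytes (128MB by default);
# kept tiny so the example runs instantly.
SEGMENT_LIMIT_BYTES = 4096


def write_preallocated_segment(path: str, payload: bytes) -> int:
    """Write payload into a segment that is preallocated and memory-mapped
    at its full limit. Returns the number of bytes actually used."""
    if len(payload) > SEGMENT_LIMIT_BYTES:
        raise ValueError("payload does not fit in one segment")
    with open(path, "w+b") as f:
        # Grow the file to the full limit first: a mapping cannot extend past EOF.
        f.truncate(SEGMENT_LIMIT_BYTES)
        with mmap.mmap(f.fileno(), SEGMENT_LIMIT_BYTES) as mm:
            mm[: len(payload)] = payload
            mm.flush()
    return len(payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "hello_0.seg")
        used = write_preallocated_segment(path, b"x" * 100)
        # The file system reports the preallocated size, not the used size.
        print(os.path.getsize(path), used)  # 4096 100
```

Because the file on disk is always the full mapped size, the number of bytes really written has to be kept somewhere else, which is exactly the role usedSize plays in the header.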

Borrowing from `kafka`

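In big-sequence-file, segments are named XXX_0.seg, XXX_1.seg, XXX_2.seg, and so on. In kafka, the suffix of a segment file name records the starting offset of that file within the whole logical file. That is equivalent to recording where the previous file ends, so the number of valid bytes in each file can be derived from the file names alone.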
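
A minimal Python sketch of this naming scheme, assuming the 20-digit zero padding seen in the example listing (the same width kafka uses). `segment_file_name` and `start_offset_of` are hypothetical helpers for illustration, not an existing big-sequence-file or kafka API.

```python
import re

# Padding width assumed from the example listing (20 zero-padded digits).
OFFSET_DIGITS = 20
# Greedy base name, then the last "_<digits>.seg" suffix.
_SEG_NAME = re.compile(r"^(?P<base>.+)_(?P<offset>\d+)\.seg$")


def segment_file_name(base: str, start_offset: int) -> str:
    """Name a segment after the logical offset of its first byte."""
    return f"{base}_{start_offset:0{OFFSET_DIGITS}d}.seg"


def start_offset_of(file_name: str) -> int:
    """Recover the starting offset encoded in a segment file name."""
    m = _SEG_NAME.match(file_name)
    if m is None:
        raise ValueError(f"not a segment file name: {file_name}")
    return int(m.group("offset"))
```

Zero padding also makes lexicographic order of the names match numeric order of the offsets, so a plain directory listing comes back in segment order.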

For example:

hello.bsf
hello_00000000000000000000.seg
hello_00000000000000001003.seg
hello_00000000000000001946.seg
hello_00000000000000003068.seg

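Suppose each segment is preallocated 1KB (1024B) and writing switches to the next file once that capacity would be exceeded. Then the valid size of hello_00000000000000000000.seg is 1003B, the offset in the next file's name, and 1024-1003=21B is wasted (this is called intra-segment fragmentation). Likewise, the valid size of hello_00000000000000001003.seg is 1946-1003=943B, wasting 1024-943=81B. We can also infer that the first message in the next file (hello_00000000000000001946.seg) must be longer than 81B in total; otherwise it would not have triggered the switch to a new file.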
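
The arithmetic above can be sketched in Python, assuming the same 1024B per-segment capacity. `segment_stats` and `SEGMENT_CAPACITY` are hypothetical names for illustration only. The last segment is skipped because it has no successor whose name would give its end offset.

```python
import re
from typing import List, Tuple

# Assumed per-segment capacity, matching the 1024 used in the arithmetic above.
SEGMENT_CAPACITY = 1024
_OFFSET = re.compile(r"_(\d+)\.seg$")


def segment_stats(names: List[str]) -> List[Tuple[str, int, int]]:
    """For every segment except the last, return (name, valid bytes, wasted bytes).

    A segment's valid size is the next segment's start offset minus its own;
    the last segment has no successor, so its size cannot be derived this way.
    """
    ordered = sorted(names, key=lambda n: int(_OFFSET.search(n).group(1)))
    starts = [int(_OFFSET.search(n).group(1)) for n in ordered]
    stats = []
    for name, start, next_start in zip(ordered, starts, starts[1:]):
        valid = next_start - start
        stats.append((name, valid, SEGMENT_CAPACITY - valid))
    return stats


if __name__ == "__main__":
    names = [
        "hello_00000000000000000000.seg",
        "hello_00000000000000001003.seg",
        "hello_00000000000000001946.seg",
    ]
    for name, valid, wasted in segment_stats(names):
        print(name, valid, wasted)
```

Run on the first three names of the listing, this yields 1003B valid / 21B wasted for the first segment and 943B valid / 81B wasted for the second, the same figures as in the text.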


downgoon changed the title ~~No need to record the file byte count in the .seg file header~~ No need to record the file byte count in the segment file header Sep 13, 2017

downgoon changed the title ~~No need to record the file byte count in the segment file header~~ No need to record usedSize in the segment file header Sep 13, 2017

downgoon mentioned this issue Sep 13, 2017

.bsf header can be simplified; .seg files do not need memory mapping #2

Open
