Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove page size truncate of data and crc file #32

Merged
merged 2 commits into from
Sep 27, 2018

Conversation

zhongwuzw
Copy link
Contributor

  1. Remove file page size alignment. Etc, CRC-32 file is only need 4B, but we allocate one page 4096B, waste 4092B. And actually, we don't need to align, because mmap don't need size parameter to be aligned, it's just need not be 0, only off parameter must be a multiple of pagesize. I add a global variable DEFAULT_FILE_SIZE, currently it's equal to page size, but we can adjust it.
  2. Change fd check of validation, fd is invalid if fd == -1 .

@tencent-adm
Copy link
Member

tencent-adm commented Sep 26, 2018

CLA assistant check
All committers have signed the CLA.

@lingol
Copy link
Collaborator

lingol commented Sep 26, 2018

It's true that mmap don't require size to be aligned. However, the memory address you get from mmap is aligned indeed, and the size of that segment is at lease one pagesize. And one more thing, each file stored in hard drive will consume at lease one block. So changing the CRC-32 file's size to 4 won't save you anything.

The fd checking is correct. Remove the CRC_FILE_SIZE and the pr will be accepted.

@zhongwuzw
Copy link
Contributor Author

zhongwuzw commented Sep 26, 2018

@lingol It's true the memory address returned by mmap is aligned, and at lease one page size, but it not needs file size to be page aligned.

Mappings are, therefore, integer multiples of pages. If the len parameter provided by the caller is not aligned on a page boundary—perhaps because the underlying file's size is not a multiple of the page size—the mapping is rounded up to the next full page. The bytes inside this added memory, between the last valid byte and the end of the mapping, are zero-filled. Any read from that region will return zeros. Any writes to that memory will not affect the backing file, even if it is mapped as MAP_SHARED. Only the original len bytes are ever written back to the file.

And one more thing, each file stored in hard drive will consume at lease one block. So changing the CRC-32 file's size to 4 won't save you anything.

Yes, each file occupy at least one block, but block size != page size, block size always equal to 4KiB, page size equal to 16 KiB on 64-bit ARM platforms. More than 4KiB would waste.
Besides this, we don't need these round up adjustment and truncate things in code actually, it useless and wasteful IMO.

@lingol
Copy link
Collaborator

lingol commented Sep 27, 2018

As for the round up adjustment, it is better this way if not necessary. Not because of mmap's requirement, but for the sake of future usage. MMKV uses a double-size strategy to extend file size if space is not enough. It's better and simple for the file to have a size of n pagesize, and to increase by double. Like 16K, 32K, 64K, etc.

You may ask the file is given one pagesize for initialization, what's the need to round up? The fact is shit happens. In production the file's size may end up not rounding up to pagesize, and that causes some itches. So here we are, checking file size and rounding up when opening.

@zhongwuzw
Copy link
Contributor Author

@lingol Emm, maybe it's for the sake of future usage.

You mean if file's size not end up to pagesize, it may cause some issues? 🤔 can you give me some examples? thanks.

So I can revert the adjustment for data file, but I still want to make CRC-32 file size to 4B, it's already meet the requirement, what's your opinion?

And just some recommends about double-size algorithm, double size when data length is not big is ok, but if data grow up to big size, exponential growth may have some problems, like 500M, double to 1000M, currently, we just mmap the entire file, it would occupy a long continuous memory address, may not easy for kernel to find this demand. Maybe we can use some optimize algorithm, like Slow start of TCP, or split mmap to segments and load in need. However, if the frameworks only used just like NSUserDefaults, it's ok because user cannot store many data in it.

@lingol
Copy link
Collaborator

lingol commented Sep 27, 2018

You mean if file's size not end up to pagesize, it may cause some issues? 🤔 can you give me some examples? thanks.

Well, in the early days of MMKV development & trials, we receive some crash report when writing back to file. By looking into logs we found the file size is not round up. So a highly suspicion is that iOS may not tell the true about file's actual size when truncating, and writing beyond that may crash. We tried rounding up file size for protection, and those crashes are gone.

So I can revert the adjustment for data file, but I still want to make CRC-32 file size to 4B, it's already meet the requirement, what's your opinion?

Currently Okey, 4B is enough for iOS implementation, thought in Android it needs more than that.

@lingol
Copy link
Collaborator

lingol commented Sep 27, 2018

As for the growth algorithm, MMKV currently use the simple double-size algorithm. Mainly to avoid frequently full-write-back, which is slow. In fact, MMKV even makes a guess about future usage, making room not just for current append/insertion, but also for some more operation in-head. Checkout - (BOOL)ensureMemorySize:(size_t)newSize for detail info.

Make CRC-32 file length to 4B

adjust spacing

adjust spacing
@zhongwuzw zhongwuzw force-pushed the remove_file_size_page_size_align branch from c5d7c60 to 3857a82 Compare September 27, 2018 08:22
@zhongwuzw
Copy link
Contributor Author

@lingol

Well, in the early days of MMKV development & trials, we receive some crash report when writing back to file. By looking into logs we found the file size is not round up. So a highly suspicion is that iOS may not tell the true about file's actual size when truncating, and writing beyond that may crash. We tried rounding up file size for protection, and those crashes are gone.

Interesting! 🤔

Mainly to avoid frequently full-write-back, which is slow. In fact, MMKV even makes a guess about future usage, making room not just for current append/insertion, but also for some more operation in-head. Checkout - (BOOL)ensureMemorySize:(size_t)newSize for detail info.

Emm, I know it's for future usage, in my comment, I want to express maybe we can adjust the algorithm, not just always double size, just like Slow start of TCP or something I mentioned.

Anyway, I updated the CRC-32 file length.

@ysbing
Copy link

ysbing commented Sep 27, 2018

Double length does consume a bit of space, can it be more optimized in the future?

@lingol lingol merged commit 4519736 into Tencent:dev Sep 27, 2018
@zhongwuzw zhongwuzw deleted the remove_file_size_page_size_align branch September 27, 2018 08:45
@lingol
Copy link
Collaborator

lingol commented Oct 18, 2018

It turns out that limiting CRC file's size to 4 byte makes mlock fail in some devices. I'm gonna revert these logic.

@Knight-ZXW Knight-ZXW mentioned this pull request Sep 12, 2019
@sven-brobonds sven-brobonds mentioned this pull request Jul 21, 2020
@Leiluojun Leiluojun mentioned this pull request Aug 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants