Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reflink(Copy On Write)支持 #39

Merged
merged 9 commits into from
May 10, 2023
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
42 changes: 40 additions & 2 deletions quick_backup_multi/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,6 +75,44 @@ def get_backup_file_name(backup_format: BackupFormat):
else:
raise ValueError('unknown backup mode {}'.format(backup_format))

try:
os.copy_file_range
Fallen-Breath marked this conversation as resolved.
Show resolved Hide resolved
except:
copy_file_range_supported=False
else:
copy_file_range_supported=True

#copy using "Copy On Write"
def _cpcow(src_path: str, dst_path: str):
if os.path.isdir(dst_path):
dst_path = os.path.join(dst_path, os.path.basename(src_path))

if copy_file_range_supported:
# f1 = open(src_path,'r').fileno() #It doesn't work,Why?
Fallen-Breath marked this conversation as resolved.
Show resolved Hide resolved
f11 = open(src_path,'rb')
f1 = f11.fileno()
f21 = open(dst_path,'wb+')
tksmly marked this conversation as resolved.
Show resolved Hide resolved
f2 = f21.fileno()
size = os.path.getsize(src_path)
try:
if size > 2**31 - 4096:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

仍然存在的魔法数字

如果是你自己写的,请给出理由;如果是参照得出的,请给出引用源

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对于反复出现的魔法数字,请定义常量统一维护

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

仍然存在的魔法数字

如果是你自己写的,请给出理由;如果是参照得出的,请给出引用源

copy_file_range无论多大的单个文件,最多都只能复制2**31 - 4096字节
https://stackoverflow.com/questions/70368651/why-cant-linux-write-more-than-2147479552-bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对于反复出现的魔法数字,请定义常量统一维护

已修改

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请在代码注释中注明出处链接,以便后续维护

对于这一个上限,根据上述链接中所述,其定义为(https://elixir.bootlin.com/linux/v4.8/source/include/linux/fs.h#L2130)

#define MAX_RW_COUNT (INT_MAX & PAGE_CACHE_MASK)

其中,PAGE_CACHE_MASKPAGE_SIZE 直接相关,而后者与可能与具体的系统有关。因此,假定 PAGE_SIZE 是 4096 也并不是一个稳妥的做法

考虑到 os.copy_file_range 可以返回实际操作的字节数,不妨使用一个较小的值作为每次拷贝的字节数,如 1GiB,并动态维护当前已拷贝的字节数,循环调用直到已拷贝的字节数达到期望值。这样的实现是无需考虑 PAGE_SIZE 的值的,能保证稳定的可用性。并且若实现正确,也不需要在执行 COW 前判一下文件大小是否超过 COW_COPY_LIMIT,均使用同一种实现即可

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy_file_range的返回值可能低于文件大小
所以还需要判断返回值时候为0

这是意料之内的表现,并且也是 #39 (comment) 中动态调整以拷贝字节数的用意

也许并不用记录已拷贝字节,只要判定返回是否为0或-1(出错)就行了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

对于每次拷贝的字节数,如果担心占用过多内存,可以直接定死个如 1MiB、10MiB 之类的值

翻了一下glibc的实现,这个数其实大了没事

https://github.com/bminor/glibc/blob/db9b47e9f996bbdb831580ff7343542a017c80ee/support/support_copy_file_range.c
71-76

  char buf[8192];
  while (length > 0)
    {
      size_t to_read = length;
      if (to_read > sizeof (buf))
	to_read = sizeof (buf);
...

无论多大缓冲区都是8192
读取完了直接返回

      if (pinoff == NULL)
	read_count = read (infd, buf, to_read);
      else
	read_count = pread64 (infd, buf, to_read, *pinoff);
      if (read_count == 0)
	/* End of file reached prematurely.  */
	return copied;
...

count设为1gb应该是比较好的

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unix 风格的系统调用并不一定使用了 glibc 的实现,并且 glibc 的实现也并不一定稳定。我不建议使用依赖系统调用实现的做法,用记录已拷贝字节数的方案总是最稳妥的方案。

除此之外,你也在 #39 (comment) 提到过,“对小文件可能会导致内存占用过高”。如果这是真实的例子,那么选择一个 MiB 量级的每次拷贝数也是更优的

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unix 风格的系统调用并不一定使用了 glibc 的实现,并且 glibc 的实现也并不一定稳定。我不建议使用依赖系统调用实现的做法,用记录已拷贝字节数的方案总是最稳妥的方案。

除此之外,你也在 #39 (comment) 提到过,“对小文件可能会导致内存占用过高”。如果这是真实的例子,那么选择一个 MiB 量级的每次拷贝数也是更优的

glibc只会增加功能,不会删除已有的功能
内存占用过高只是直觉,没有测试

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

用记录已拷贝字节数的方案总是最稳妥的方案

返回值似乎总是比文件大小小一个字节,记录意义不大

for i in range(0, size, 2**31 - 4096):
os.copy_file_range(f1, f2, 2**31 - 4096, i) # need int, may overflow, so cannot copy files larger than 2GB in a single pass
else:
os.copy_file_range(f1, f2, size)
except Exception as e:
server_inst.logger.warning(str(e) + '({} -> {})'.format(src_path, src_path, dst_path))
Fallen-Breath marked this conversation as resolved.
Show resolved Hide resolved
shutil.copy(src_path, dst_path)

f11.close()
f21.close()
else:
shutil.copy(src_path, dst_path)

shutil.copystat(src_path, dst_path) # copy2
Fallen-Breath marked this conversation as resolved.
Show resolved Hide resolved

return dst_path


def copy_worlds(src: str, dst: str, intent: CopyWorldIntent, *, backup_format: Optional[BackupFormat] = None):
if backup_format is None:
Expand All @@ -96,12 +134,12 @@ def copy_worlds(src: str, dst: str, intent: CopyWorldIntent, *, backup_format: O

server_inst.logger.info('copying {} -> {}'.format(src_path, dst_path))
if os.path.isdir(src_path):
shutil.copytree(src_path, dst_path, ignore=lambda path, files: set(filter(config.is_file_ignored, files)))
shutil.copytree(src_path, dst_path, ignore=lambda path, files: set(filter(config.is_file_ignored, files)), copy_function=_cpcow)
elif os.path.isfile(src_path):
dst_dir = os.path.dirname(dst_path)
if not os.path.isdir(dst_dir):
os.makedirs(dst_dir)
shutil.copy(src_path, dst_path)
_cpcow(src_path, dst_path)
else:
server_inst.logger.warning('{} does not exist while copying ({} -> {})'.format(src_path, src_path, dst_path))
elif backup_format == BackupFormat.tar or backup_format == BackupFormat.tar_gz:
Expand Down