Use multithreading for file copying on Windows #51

Closed
wants to merge 6 commits into from

tanhHeng

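Multithreading on Windows

Ordinary copy functions such as copytree schedule I/O poorly on Windows and cannot make full use of the disk.
In my test, multithreaded copying was roughly 3x to 6x faster (reported as 300%-600%).

Additions

Function

The file list is first obtained through shutil.copytree, threads are created with the threading module, and the files to copy are split into n parts, one per thread, before copying starts.
This greatly improves backup speed on Windows.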
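The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it gathers the file list with os.walk rather than shutil.copytree, and the names list_files, split_evenly and copy_chunked are hypothetical. Exceptions raised inside a worker thread are not propagated to the caller in this sketch.

```python
import os
import shutil
import threading
from typing import List, Tuple


def list_files(src_root: str, dst_root: str) -> List[Tuple[str, str]]:
    """Collect (source, destination) pairs and create the destination directories."""
    pairs = []
    for dir_path, _dir_names, file_names in os.walk(src_root):
        rel = os.path.relpath(dir_path, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in file_names:
            pairs.append((os.path.join(dir_path, name), os.path.join(dst_dir, name)))
    return pairs


def split_evenly(items: list, n: int) -> List[list]:
    """Split items into at most n chunks whose sizes differ by at most one."""
    n = max(1, min(n, len(items)))
    base, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def copy_chunked(src_root: str, dst_root: str, thread_count: int) -> int:
    """Copy a directory tree with one thread per chunk; returns the number of files."""
    pairs = list_files(src_root, dst_root)

    def worker(chunk):
        for src, dst in chunk:
            shutil.copy2(src, dst)

    threads = [threading.Thread(target=worker, args=(chunk,))
               for chunk in split_evenly(pairs, thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(pairs)
```

Static chunking like this is simple, but a thread that happens to receive the largest files finishes last while the others sit idle.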

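Config

copy_thread_active

Default: 4

Multiple threads issue disk requests at the same time to speed up copying; this option controls how many threads are created.

When this option is set to 0, multithreaded copying is turned off and the traditional copy method is used.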
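As a small illustration of how this option might be read and interpreted, here is a sketch that assumes the config is stored as JSON. The helper names read_copy_thread_active and is_threaded_copy_enabled are hypothetical and are not part of the plugin.

```python
import json

# Default stated in this PR description.
DEFAULT_COPY_THREAD_ACTIVE = 4


def read_copy_thread_active(config_text: str) -> int:
    """Read copy_thread_active from a JSON config string, falling back to the default."""
    data = json.loads(config_text)
    value = data.get('copy_thread_active', DEFAULT_COPY_THREAD_ACTIVE)
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError('copy_thread_active must be a non-negative integer')
    return value


def is_threaded_copy_enabled(thread_count: int) -> bool:
    """0 turns multithreaded copying off; any value >= 1 turns it on."""
    return thread_count >= 1


if __name__ == '__main__':
    count = read_copy_thread_active('{"copy_thread_active": 0}')
    print(count, is_threaded_copy_enabled(count))
```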

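Test

Environment:

  • Windows 11 Pro for Workstations
  • CPU: i9 13900k
  • Disk: tipro7000 in a RAID 10 array (Windows RAID 0 storage pool + mirror disk)
  • World size: 21.3 GB
  • Number of files: 20,321

Results:

  • Without multithreading: 72.9 s (60-80 s)
  • 8 threads: 14 s (13-20 s)

That is roughly a 3x to 6x speedup (reported as 300%-600%).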
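The speedup figures follow from the timings above. A short worked calculation, using only the numbers from the results list:

```python
def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the second timing is than the first."""
    return before_s / after_s


typical = speedup(72.9, 14.0)   # typical single-threaded vs typical 8-thread run
lowest = speedup(60.0, 20.0)    # fastest single-threaded vs slowest 8-thread run
highest = speedup(80.0, 13.0)   # slowest single-threaded vs fastest 8-thread run

print('typical: {:.1f}x'.format(typical))                  # typical: 5.2x
print('range: {:.1f}x to {:.1f}x'.format(lowest, highest))  # range: 3.0x to 6.2x

# Effective throughput for the 21.3 GB world
print('{:.2f} GB/s single-threaded'.format(21.3 / 72.9))    # 0.29 GB/s single-threaded
print('{:.2f} GB/s with 8 threads'.format(21.3 / 14.0))     # 1.52 GB/s with 8 threads
```

The quoted 300%-600% therefore compares the extremes of the two timing ranges; the typical runs differ by about 5.2x.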

@@ -119,6 +119,22 @@ mcd_root/
- Python >= 3.8
- Option `backup_format` is set to `plain`

### copy_thread_active
Collaborator

Please keep the English documentation in sync as well.


Notes:

- In general, multithreaded copying can speed up copying by 4-5x (test result)

Collaborator

If you want to show a performance difference, please include the specific test scenario.

Author

The test details are in the PR description.
Also, because of differences in world size, system environment and so on, the gain can vary widely between use cases; at the low end it may be only about 20%.

Collaborator

A PR is not documentation. In the readme, either leave it out or document it fully.


- In general, multithreaded copying can speed up copying by 4-5x (test result)

- More than 8 threads is not recommended

Collaborator

Please cite the source for this recommendation.

@@ -1,13 +1,15 @@
{
"id": "quick_backup_multi",
"version": "1.9.0",
"version": "1.10.0-beta",

Collaborator

Please do not change the version number; that is not something a feature PR like this should do.

"name": "Quick Backup Multi",
"description": {
"en_us": "A backup / restore plugin, with multiple backup slot",
"zh_cn": "多槽位备份/回档插件"
},
"author": [
"Fallen_Breath"
"Fallen_Breath",

Collaborator

You are only a contributor, not a maintainer/author.

@@ -30,6 +30,44 @@ class CopyWorldIntent(Enum):
backup = auto()
restore = auto()

# 多线程复制文件

Collaborator

Please write comments in English, and do not write pointless comments that merely translate a variable name.

step = int(len(files_from)//thread_counts)
f = lambda _list: [_list[i:i+step] for i in range(0,len(_list),step)] # 切分文件为thread_count份
files_from,files_to = f(files_from),f(files_to)
for thread in range(thread_counts): # 多线程复制

Collaborator

Consider using a thread pool, ThreadPoolExecutor. With a thread pool there is no need to split the copy tasks manually or to manage thread lifecycles.

Author

I am still trying out different ways of invoking threads. Thanks for the pointer; I will test a thread pool and measure how long it takes.
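To illustrate why manual splitting is fragile, the sketch below reproduces the slicing expression from the diff hunk above with made-up file names. The function name split_like_pr is hypothetical, and the loop body from the PR is not shown on this page, so what happens to any extra chunk is an open question here.

```python
def split_like_pr(items: list, thread_counts: int) -> list:
    """Same slicing as in the diff: fixed step of len(items) // thread_counts."""
    step = int(len(items) // thread_counts)
    return [items[i:i + step] for i in range(0, len(items), step)]


files = ['file{}'.format(i) for i in range(10)]

# 10 files and 4 threads give step = 2, which yields 5 chunks rather than 4
chunks = split_like_pr(files, 4)
print(len(chunks))  # 5

# Fewer files than threads gives step = 0, and range() rejects a zero step
try:
    split_like_pr(files[:3], 4)
except ValueError as error:
    print('ValueError:', error)
```

If the copy loop iterates only over range(thread_counts), a fifth chunk like the one above would never be assigned to a thread. A thread pool avoids the question entirely because each file is submitted as its own task.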

s_time = time.time()
for i in threads:
    i.join()
server_inst.logger.info(time.time()-s_time)

Collaborator

If the timing output is only for debugging, please remove it.

for file_from, file_to in zip(files_from,files_to):
    try:
        shutil.copy2(file_from,file_to)
    except PermissionError:

Collaborator

Please do not suppress exceptions without a reason.

Author

This is because session.lock can in some cases be locked by Minecraft and cannot be copied, which raises a permission exception. Would it be a better solution to check the file name again in the except block and re-raise the exception for any file other than session.lock?

Collaborator

Files such as session.lock can already be skipped with the ignored_files config option.
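A minimal sketch of how such a file can be skipped through the ignore callback of shutil.copytree, so that no PermissionError needs to be swallowed. The list contents and the body of is_file_ignored below are assumptions for illustration; the plugin's actual matching logic is not shown on this page.

```python
import os
import shutil
import tempfile

# Assumed value, standing in for the plugin's ignored_files option
IGNORED_FILES = ['session.lock']


def is_file_ignored(file_name: str) -> bool:
    """Hypothetical matcher: exact file name comparison."""
    return file_name in IGNORED_FILES


def ignore_callback(path: str, files: list) -> set:
    """shutil.copytree calls this per directory and skips the returned names."""
    return set(filter(is_file_ignored, files))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'world')
        os.makedirs(src)
        for name in ('level.dat', 'session.lock'):
            with open(os.path.join(src, name), 'w') as fh:
                fh.write(name)
        dst = os.path.join(tmp, 'backup')
        shutil.copytree(src, dst, ignore=ignore_callback)
        print(sorted(os.listdir(dst)))  # ['level.dat']
```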

        shutil.copy2(file_from,file_to)
    except PermissionError:
        pass
if id != None:

Collaborator

If the timing output is only for debugging, please remove it.

try:
    MultithreadedCopy(src_path, dst_path, config.copy_thread_active)
except Exception as e:
    server_inst.logger.warn(f"多线程复制出错,使用常规复制 {e}")

Collaborator

  1. Please use English.
  2. Please explain why a retry is performed: when would multithreaded copying fail while single-threaded copying succeeds?
  3. If something goes wrong, perform an appropriate recovery instead of simply retrying with the regular copy method.

Author

So far, debugging does not seem to have turned up any case where multithreaded copying fails?
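The review does not spell out what an appropriate recovery would be. One possible reading, sketched here under the assumption that the destination is a fresh directory that did not exist before the copy, is to remove the partially written destination and re-raise. The name copy_tree_or_clean_up is hypothetical. This would not be safe when the destination already holds data that must survive, for example when restoring over an existing world.

```python
import shutil
from typing import Callable


def copy_tree_or_clean_up(src_path: str, dst_path: str,
                          copy_tree: Callable[[str, str], object] = shutil.copytree) -> None:
    """Copy a tree; on any failure remove the partial destination and re-raise."""
    try:
        copy_tree(src_path, dst_path)
    except Exception:
        # Assumption: dst_path was created by this copy, so removing it loses nothing
        shutil.rmtree(dst_path, ignore_errors=True)
        raise
```

Re-raising keeps the failure visible to the caller, which is the opposite of silently falling back to another copy method.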

@@ -30,6 +30,44 @@ class CopyWorldIntent(Enum):
backup = auto()
restore = auto()

# 多线程复制文件
class MultithreadedCopy:

Collaborator

Given how this class is used:

try:
    MultithreadedCopy(src_path, dst_path, config.copy_thread_active)
except Exception as e:

Please do not use a class as if it were a function; just define a plain function. The member functions can all be defined as functions nested inside that function.
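A rough sketch of the shape being suggested: one plain function with its helpers nested inside it. This is not the PR's code; the name multithreaded_copy, the work-queue design and the error handling are assumptions chosen for illustration.

```python
import queue
import shutil
import threading
from typing import Iterable, List, Tuple


def multithreaded_copy(pairs: Iterable[Tuple[str, str]], thread_count: int) -> None:
    """Copy every (src, dst) pair using thread_count worker threads."""
    todo: 'queue.Queue[Tuple[str, str]]' = queue.Queue()
    for pair in pairs:
        todo.put(pair)
    errors: List[Exception] = []

    def copy_one(src: str, dst: str) -> None:
        shutil.copy2(src, dst)

    def worker() -> None:
        # Each worker pulls the next file until the queue is drained
        while True:
            try:
                src, dst = todo.get_nowait()
            except queue.Empty:
                return
            try:
                copy_one(src, dst)
            except Exception as error:
                errors.append(error)  # list.append is atomic in CPython

    threads = [threading.Thread(target=worker) for _ in range(max(1, thread_count))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        # Surface the first failure instead of hiding it
        raise errors[0]
```

Callers invoke it as multithreaded_copy(pairs, 4); the nested helpers stay private to the function, and there is no object whose only job is to run its constructor.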

shutil.copytree(src_path, dst_path, ignore=lambda path, files: set(filter(config.is_file_ignored, files)), copy_function=copy_file_fast)
if config.copy_thread_active >= 1:
    try:
        MultithreadedCopy(src_path, dst_path, config.copy_thread_active)

Collaborator

Please support config.is_file_ignored correctly.

Collaborator

@Fallen-Breath Fallen-Breath left a comment

See the review items on the Files changed page.

@Fallen-Breath
Collaborator

A reference implementation that uses a thread pool, supports the enable_copy_file_range option, and exposes problems that occur during copying:

import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# config and server_inst are module-level objects of the plugin
def copy_tree_fast(src_path: str, dst_path: str, ignore=None, copy_function: Callable[[str, str], object] = shutil.copy2):
	with ThreadPoolExecutor(max_workers=max(1, config.copy_thread_active), thread_name_prefix='QBMFileCopier') as pool:
		def threaded_copy(s: str, d: str):
			tasks.append((s, d, pool.submit(copy_function, s, d)))

		tasks = []
		shutil.copytree(src_path, dst_path, ignore=ignore, copy_function=threaded_copy)

	# expose the possible exceptions
	for src, dst, future in tasks:
		try:
			future.result()
		except Exception:
			server_inst.logger.error('Failed to copy file from {} to {}'.format(src, dst))
			raise

@@ -130,7 +148,7 @@
 
 			server_inst.logger.info('copying {} -> {}'.format(src_path, dst_path))
 			if os.path.isdir(src_path):
-				shutil.copytree(src_path, dst_path, ignore=lambda path, files: set(filter(config.is_file_ignored, files)), copy_function=copy_file_fast)
+				copy_tree_fast(src_path, dst_path, ignore=lambda path, files: set(filter(config.is_file_ignored, files)), copy_function=copy_file_fast)
 			elif os.path.isfile(src_path):
 				dst_dir = os.path.dirname(dst_path)
 				if not os.path.isdir(dst_dir):
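
To try the reference implementation outside the plugin, a standalone adaptation might look like the sketch below. The changes are assumptions made for illustration: the worker count is passed as a parameter instead of being read from config.copy_thread_active, the standard logging module stands in for server_inst.logger, and the function returns the number of files submitted.

```python
import logging
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

logger = logging.getLogger('copy_tree_fast_demo')  # stand-in for server_inst.logger


def copy_tree_fast(src_path: str, dst_path: str, workers: int, ignore=None,
                   copy_function: Callable[[str, str], object] = shutil.copy2) -> int:
    tasks = []

    def threaded_copy(s: str, d: str):
        # copytree walks the tree; each file copy is handed to the pool
        tasks.append((s, d, pool.submit(copy_function, s, d)))

    with ThreadPoolExecutor(max_workers=max(1, workers), thread_name_prefix='FileCopier') as pool:
        shutil.copytree(src_path, dst_path, ignore=ignore, copy_function=threaded_copy)

    # Leaving the with block waits for all copies; now expose any exception
    for src, dst, future in tasks:
        try:
            future.result()
        except Exception:
            logger.error('Failed to copy file from %s to %s', src, dst)
            raise
    return len(tasks)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'world')
        os.makedirs(os.path.join(src, 'region'))
        for name in ('level.dat', 'session.lock', os.path.join('region', 'r.0.0.mca')):
            with open(os.path.join(src, name), 'w') as fh:
                fh.write(name)
        copied = copy_tree_fast(src, os.path.join(tmp, 'backup'), workers=4,
                                ignore=shutil.ignore_patterns('session.lock'))
        print(copied)  # 2
```

Because shutil.copytree still drives the traversal, the ignore callback and directory creation behave exactly as in the single-threaded path; only the per-file copy is moved onto the pool.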

@@ -11,6 +11,7 @@ class Configuration(Serializable):
size_display: bool = True
turn_off_auto_save: bool = True
enable_copy_file_range: bool = False
copy_thread_active: int = 4

Collaborator

Please use 0 / 1 as the default value so that multithreaded copying is off by default. After all, this feature is not free of side effects.

@Fallen-Breath
Collaborator

Parallel copying has been implemented in f39a393.
