[IOTDB-2078] Split large TsFile tool #4736
Please check the following cases before executing the tool:
It would also be better to provide a shell script to run the tool :D
String[] filePathSplit = filename.split(IoTDBConstant.FILE_NAME_SEPARATOR);
int versionIndex = Integer.parseInt(filePathSplit[filePathSplit.length - 3]) + 1;
// to avoid compaction after restarting. NOTICE: This will take effect only in
filePathSplit[filePathSplit.length - 2] = "10";
It would be better to make the "10" an input parameter :D
Fixed. I also added a default value as a class-level constant: private static final String defaultLevelNum = "10"
IoTDBDescriptor.getInstance().getConfig().getTargetCompactionFileSize();
/** Maximum index of plans executed within this TsFile. */
protected long maxPlanIndex = Long.MIN_VALUE;

/** Minimum index of plans executed within this TsFile. */
protected long minPlanIndex = Long.MAX_VALUE;
for (int i = 0; i < filePathSplit.length; i++) {
  sb.append(filePathSplit[i]);
  if (i != filePathSplit.length - 1) {
    sb.append("-");
Fixed: it now uses IoTDBConstant.FILE_NAME_SEPARATOR.
SonarCloud Quality Gate failed.
Exception detection, a shell script, and User Guide documents in Chinese and English were added in the latest commit. I really appreciate your detailed code review and suggestions! :)








Background
IoTDB will compact some large TsFiles in rel/0.12, which causes many problems for memory control and task management. We need a tool to split large TsFiles.
Introduction
The split tool will:
1. Split the file into N new files of about 1 GB each (configured by target_compaction_file_size=1073741824). This ensures the files will not be compacted after restarting in 0.13.
2. Shrink the chunks to chunk_point_num_lower_bound_in_compaction points. (Note: these two configurations were introduced in PR [IOTDB-2176] Limit target chunk size when performing inner space compaction #4698.)
3. Change the file names: version (+1 ~ +N) and level (10). This ensures the files will not be compacted after restarting in 0.12.
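The renaming step can be illustrated with a minimal Java sketch. It is not the PR's actual code: the class and method names are hypothetical, the separator is assumed to be "-" (the value the review thread replaces with IoTDBConstant.FILE_NAME_SEPARATOR), and the sample file name assumes the layout implied by the diff, where the version is the third field from the end and the level is the second.

```java
// Hypothetical sketch of the file-name change; not the tool's real implementation.
class SplitFileNameSketch {
  // Assumed to match IoTDBConstant.FILE_NAME_SEPARATOR.
  static final String FILE_NAME_SEPARATOR = "-";
  // Mirrors the defaultLevelNum constant mentioned in the review thread.
  static final String DEFAULT_LEVEL_NUM = "10";

  /**
   * Builds the name of the i-th split file (i starts at 1).
   * The version field (third from the end) is increased by i and the
   * level field (second from the end) is overwritten with levelNum.
   */
  static String newFileName(String fileName, int i, String levelNum) {
    String[] parts = fileName.split(FILE_NAME_SEPARATOR);
    if (parts.length < 3) {
      throw new IllegalArgumentException("Unexpected file name: " + fileName);
    }
    long version = Long.parseLong(parts[parts.length - 3]) + i;
    parts[parts.length - 3] = Long.toString(version);
    parts[parts.length - 2] = levelNum;
    return String.join(FILE_NAME_SEPARATOR, parts);
  }

  public static void main(String[] args) {
    // The source file name below is made up for illustration.
    String source = "1639000000000-5-0-0.tsfile";
    for (int i = 1; i <= 3; i++) {
      System.out.println(newFileName(source, i, DEFAULT_LEVEL_NUM));
    }
  }
}
```

Passing the level as a parameter with a default constant follows the suggestion in the review above; how the real tool reads that parameter is not shown here.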
For example, consider a file with 5 devices and 10 points in each chunk:
With the split tool, the file is split into new files (here chunk_point_num_lower_bound_in_compaction=6). Some more detailed notes:
Data from different devices may be written into the same file, so that the files won't be too small. In an experiment, a 28.6 GB file was split into 24 files, of which the smallest was 1.04 GB and the largest was 1.74 GB.
In both 0.12 (where compaction is decided by the level in the file name) and 0.13 (where compaction is decided by file size), none of these files will be compacted after restarting.
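One way to picture how data from several devices can end up in the same output file is a simple greedy grouping: keep adding devices to the current file until it reaches the target size, then start a new one. The Java sketch below only illustrates that idea under stated assumptions. The class and method names are hypothetical, the greedy rule is an assumption rather than the tool's confirmed algorithm, and the sizes in main are made up.

```java
// Hypothetical sketch of grouping device data into output files by size.
class SplitPlanSketch {
  /**
   * Returns, for each device, the index of the output file it is written to.
   * A file is closed once the bytes written to it reach targetFileSize,
   * so every file except possibly the last is at least that large.
   */
  static int[] assignToFiles(long[] deviceSizes, long targetFileSize) {
    int[] fileIndex = new int[deviceSizes.length];
    int currentFile = 0;
    long written = 0;
    for (int d = 0; d < deviceSizes.length; d++) {
      fileIndex[d] = currentFile;
      written += deviceSizes[d];
      if (written >= targetFileSize) {
        // Target reached: the next device starts a new file.
        currentFile++;
        written = 0;
      }
    }
    return fileIndex;
  }

  public static void main(String[] args) {
    // Five devices with made-up sizes and a made-up target of 1000 bytes.
    long[] sizes = {400, 300, 500, 200, 700};
    int[] plan = assignToFiles(sizes, 1000);
    for (int d = 0; d < plan.length; d++) {
      System.out.println("device " + d + " -> file " + plan[d]);
    }
  }
}
```

With this rule a file can overshoot the target by up to the size of the last device added.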
Usage
./TsFileSplitTool fileName
Limitation
The split tool does not currently support the following scenarios: