[IOTDB-2807] Speed up the cross space compaction by multi-threads #5415
Merged
Changes from all commits (19 commits)
All 19 commits by choubenson:

3612543 first complete concurrent
ebbd04f update device start time and end time after compacting
91b8c1c Merge branch 'master' into speedUpCompaction
8787bb7 fix conflict
6254f86 fix chunkReaderTest and use AtomicInteger array instead of Concurrent…
f304031 add Thread.interrupted
4485587 fix hasStartChunk param in CrossSpaceCompactionWriter safe in concurr…
3c26be1 change AtomicInteger[] to ConcurrentHashMap
386479a fix concurrency bug: The first thread has not flushed ChunkGroupHeade…
144775c change concurrenctHashMap to int[] array
86a3c30 Merge branch 'master' into speedUpCompaction
0053bf0 merge master
3295ac7 rename map to array
fe78277 remove status in TsFileResource
9d77ab7 start chunkGroup in all target files and clean empty chunkGroup at last
3eeafb9 spotless
d16416f add flushChunkToWriter method
c7966bf define ChunkWriters as array instead of concurrenctHashMap
65bccce add tests to test whether cross compaction target file contain empty …
Diff of `CompactionUtils`:

```diff
@@ -20,6 +20,7 @@
 import org.apache.iotdb.commons.conf.IoTDBConstant;
 import org.apache.iotdb.db.conf.IoTDBDescriptor;
+import org.apache.iotdb.db.engine.compaction.cross.rewrite.task.SubCompactionTask;
 import org.apache.iotdb.db.engine.compaction.inner.utils.MultiTsFileDeviceIterator;
 import org.apache.iotdb.db.engine.compaction.writer.AbstractCompactionWriter;
 import org.apache.iotdb.db.engine.compaction.writer.CrossSpaceCompactionWriter;
```
```diff
@@ -44,35 +45,41 @@
 import org.apache.iotdb.db.utils.QueryUtils;
 import org.apache.iotdb.tsfile.common.constant.TsFileConstant;
 import org.apache.iotdb.tsfile.exception.write.WriteProcessException;
+import org.apache.iotdb.tsfile.file.metadata.TimeseriesMetadata;
 import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
 import org.apache.iotdb.tsfile.fileSystem.FSFactoryProducer;
 import org.apache.iotdb.tsfile.read.common.BatchData;
 import org.apache.iotdb.tsfile.read.reader.IBatchReader;
 import org.apache.iotdb.tsfile.utils.Pair;
 import org.apache.iotdb.tsfile.write.schema.IMeasurementSchema;
+import org.apache.iotdb.tsfile.write.writer.TsFileIOWriter;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import java.io.File;
 import java.io.IOException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.HashMap;
+import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Future;
 import java.util.stream.Collectors;
 
 /**
  * This tool can be used to perform inner space or cross space compaction of aligned and non aligned
  * timeseries . Currently, we use {@link
  * org.apache.iotdb.db.engine.compaction.inner.utils.InnerSpaceCompactionUtils} to speed up if it is
- * an inner space compaction.
+ * an seq inner space compaction.
  */
 public class CompactionUtils {
   private static final Logger logger =
       LoggerFactory.getLogger(IoTDBConstant.COMPACTION_LOGGER_NAME);
+  private static final int subTaskNum =
+      IoTDBDescriptor.getInstance().getConfig().getSubCompactionTaskNum();
 
   public static void compact(
       List<TsFileResource> seqFileResources,
```
|
|
```diff
@@ -108,6 +115,7 @@ public static void compact(
       }
 
       compactionWriter.endFile();
+      updateDeviceStartTimeAndEndTime(targetFileResources, compactionWriter);
       updatePlanIndexes(targetFileResources, seqFileResources, unseqFileResources);
     } finally {
       QueryResourceManager.getInstance().endQuery(queryId);
```
|
|
```diff
@@ -157,9 +165,9 @@ private static void compactAlignedSeries(
     if (dataBatchReader.hasNextBatch()) {
       // chunkgroup is serialized only when at least one timeseries under this device has data
       compactionWriter.startChunkGroup(device, true);
-      compactionWriter.startMeasurement(measurementSchemas);
-      writeWithReader(compactionWriter, dataBatchReader);
-      compactionWriter.endMeasurement();
+      compactionWriter.startMeasurement(measurementSchemas, 0);
+      writeWithReader(compactionWriter, dataBatchReader, 0);
+      compactionWriter.endMeasurement(0);
       compactionWriter.endChunkGroup();
     }
   }
```
|
|
```diff
@@ -170,59 +178,58 @@ private static void compactNonAlignedSeries(
       AbstractCompactionWriter compactionWriter,
       QueryContext queryContext,
       QueryDataSource queryDataSource)
-      throws MetadataException, IOException {
-    boolean hasStartChunkGroup = false;
+      throws IOException, InterruptedException {
     MultiTsFileDeviceIterator.MeasurementIterator measurementIterator =
         deviceIterator.iterateNotAlignedSeries(device, false);
     Set<String> allMeasurements = measurementIterator.getAllMeasurements();
+    int subTaskNums = Math.min(allMeasurements.size(), subTaskNum);
+
+    // assign all measurements to different sub tasks
+    Set<String>[] measurementsForEachSubTask = new HashSet[subTaskNums];
+    int idx = 0;
     for (String measurement : allMeasurements) {
-      List<IMeasurementSchema> measurementSchemas = new ArrayList<>();
-      try {
-        if (IoTDBDescriptor.getInstance().getConfig().isEnableIDTable()) {
-          measurementSchemas.add(IDTableManager.getInstance().getSeriesSchema(device, measurement));
-        } else {
-          measurementSchemas.add(
-              IoTDB.schemaProcessor.getSeriesSchema(new PartialPath(device, measurement)));
-        }
-      } catch (PathNotExistException e) {
-        logger.info("A deleted path is skipped: {}", e.getMessage());
-        continue;
+      if (measurementsForEachSubTask[idx % subTaskNums] == null) {
+        measurementsForEachSubTask[idx % subTaskNums] = new HashSet<String>();
       }
+      measurementsForEachSubTask[idx++ % subTaskNums].add(measurement);
+    }
 
-      IBatchReader dataBatchReader =
-          constructReader(
-              device,
-              Collections.singletonList(measurement),
-              measurementSchemas,
-              allMeasurements,
-              queryContext,
-              queryDataSource,
-              false);
+    // construct sub tasks and start compacting measurements in parallel
+    List<Future<Void>> futures = new ArrayList<>();
+    compactionWriter.startChunkGroup(device, false);
+    for (int i = 0; i < subTaskNums; i++) {
+      futures.add(
+          CompactionTaskManager.getInstance()
+              .submitSubTask(
+                  new SubCompactionTask(
+                      device,
+                      measurementsForEachSubTask[i],
+                      queryContext,
+                      queryDataSource,
+                      compactionWriter,
+                      i)));
+    }
 
-      if (dataBatchReader.hasNextBatch()) {
-        if (!hasStartChunkGroup) {
-          // chunkgroup is serialized only when at least one timeseries under this device has
-          // data
-          compactionWriter.startChunkGroup(device, false);
-          hasStartChunkGroup = true;
-        }
-        compactionWriter.startMeasurement(measurementSchemas);
-        writeWithReader(compactionWriter, dataBatchReader);
-        compactionWriter.endMeasurement();
-      }
-    }
+    // wait for all sub tasks finish
+    for (int i = 0; i < subTaskNums; i++) {
+      try {
+        futures.get(i).get();
+      } catch (InterruptedException | ExecutionException e) {
+        logger.error("SubCompactionTask meet errors ", e);
+        Thread.interrupted();
+        throw new InterruptedException();
+      }
+    }
 
-    if (hasStartChunkGroup) {
-      compactionWriter.endChunkGroup();
-    }
+    compactionWriter.endChunkGroup();
   }
```

Member (review comment on the `Thread.interrupted()` line): Thread.interrepted().
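The control flow of the new `compactNonAlignedSeries` above (round-robin bucketing, one future per sub-task, and error propagation through `Future.get()`) can be sketched outside IoTDB with plain `java.util.concurrent`. The class and method names below are hypothetical, and the sub-task body is a stand-in that merely counts measurements; only the partitioning and wait logic mirror the diff:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RoundRobinCompactionSketch {

  // Round-robin assignment of measurements to sub-task buckets,
  // mirroring measurementsForEachSubTask in the hunk above.
  static List<Set<String>> partition(List<String> measurements, int subTaskNum) {
    int buckets = Math.min(measurements.size(), subTaskNum);
    List<Set<String>> result = new ArrayList<>();
    for (int i = 0; i < buckets; i++) {
      result.add(new HashSet<>());
    }
    int idx = 0;
    for (String m : measurements) {
      result.get(idx++ % buckets).add(m);
    }
    return result;
  }

  // Submit one task per bucket, then wait on every Future so a failure in
  // any sub-task surfaces in the coordinating thread (as in the diff).
  static int compactInParallel(List<String> measurements, int subTaskNum) {
    List<Set<String>> buckets = partition(measurements, subTaskNum);
    ExecutorService pool = Executors.newFixedThreadPool(buckets.size());
    try {
      List<Future<Integer>> futures = new ArrayList<>();
      for (Set<String> bucket : buckets) {
        // Stand-in for SubCompactionTask: just report the bucket size.
        futures.add(pool.submit(bucket::size));
      }
      int total = 0;
      for (Future<Integer> f : futures) {
        total += f.get(); // rethrows errors raised inside a sub-task
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    } finally {
      pool.shutdown();
    }
  }
}
```

With 5 measurements and 2 sub-tasks, the first bucket receives indices 0, 2, 4 and the second receives 1, 3, so bucket sizes differ by at most one.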
```diff
-  private static void writeWithReader(AbstractCompactionWriter writer, IBatchReader reader)
-      throws IOException {
+  public static void writeWithReader(
+      AbstractCompactionWriter writer, IBatchReader reader, int subTaskId) throws IOException {
     while (reader.hasNextBatch()) {
       BatchData batchData = reader.nextBatch();
       while (batchData.hasCurrent()) {
-        writer.write(batchData.currentTime(), batchData.currentValue());
+        writer.write(batchData.currentTime(), batchData.currentValue(), subTaskId);
         batchData.next();
       }
     }
```
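The nested loops in `writeWithReader` are a plain drain-the-reader pattern: the outer loop walks batches, the inner loop walks the points of the current batch. A self-contained sketch with minimal stand-in types (the interface and class names below are illustrative, not IoTDB's):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class BatchDrainSketch {

  // Minimal stand-in for IoTDB's IBatchReader: batches of (time, value) points.
  interface BatchReader {
    boolean hasNextBatch();
    List<long[]> nextBatch();
  }

  // Drain every point of every batch, mirroring writeWithReader's
  // while/while structure in the hunk above.
  static List<long[]> drain(BatchReader reader) {
    List<long[]> written = new ArrayList<>();
    while (reader.hasNextBatch()) {
      for (long[] point : reader.nextBatch()) {
        written.add(point); // writer.write(time, value, subTaskId) in the real code
      }
    }
    return written;
  }

  // Helper that wraps fixed batches in a BatchReader, for demonstration.
  static BatchReader readerOf(List<List<long[]>> batches) {
    Iterator<List<long[]>> it = batches.iterator();
    return new BatchReader() {
      public boolean hasNextBatch() { return it.hasNext(); }
      public List<long[]> nextBatch() { return it.next(); }
    };
  }
}
```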
|
|
```diff
@@ -232,7 +239,7 @@ private static void writeWithReader(AbstractCompactionWriter writer, IBatchReade
    * @param measurementIds if device is aligned, then measurementIds contain all measurements. If
    *     device is not aligned, then measurementIds only contain one measurement.
    */
-  private static IBatchReader constructReader(
+  public static IBatchReader constructReader(
       String deviceId,
       List<String> measurementIds,
       List<IMeasurementSchema> measurementSchemas,
```
|
```diff
@@ -268,6 +275,29 @@ private static AbstractCompactionWriter getCompactionWriter(
     }
   }
 
+  private static void updateDeviceStartTimeAndEndTime(
+      List<TsFileResource> targetResources, AbstractCompactionWriter compactionWriter) {
+    List<TsFileIOWriter> targetFileWriters = compactionWriter.getFileIOWriter();
+    for (int i = 0; i < targetFileWriters.size(); i++) {
+      TsFileIOWriter fileIOWriter = targetFileWriters.get(i);
+      TsFileResource fileResource = targetResources.get(i);
+      // The tmp target file may does not have any data points written due to the existence of the
+      // mods file, and it will be deleted after compaction. So skip the target file that has been
+      // deleted.
+      if (!fileResource.getTsFile().exists()) {
+        continue;
+      }
+      for (Map.Entry<String, List<TimeseriesMetadata>> entry :
+          fileIOWriter.getDeviceTimeseriesMetadataMap().entrySet()) {
+        String device = entry.getKey();
+        for (TimeseriesMetadata timeseriesMetadata : entry.getValue()) {
+          fileResource.updateStartTime(device, timeseriesMetadata.getStatistics().getStartTime());
+          fileResource.updateEndTime(device, timeseriesMetadata.getStatistics().getEndTime());
+        }
+      }
+    }
+  }
+
   private static void updatePlanIndexes(
       List<TsFileResource> targetResources,
       List<TsFileResource> seqResources,
```
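The new `updateDeviceStartTimeAndEndTime` above folds each timeseries' statistics into a per-device time range, keeping the earliest start and latest end. That fold can be sketched independently of IoTDB (the class and method names below are hypothetical; only the min/max merge rule is taken from the diff):

```java
import java.util.HashMap;
import java.util.Map;

public class DeviceTimeRangeSketch {

  static final class Range {
    long start = Long.MAX_VALUE; // identity for min
    long end = Long.MIN_VALUE;   // identity for max
  }

  // Fold one timeseries' statistics into the device's range: keep the
  // earliest start and latest end seen so far, as the repeated
  // updateStartTime/updateEndTime calls do on TsFileResource.
  static void update(Map<String, Range> ranges, String device, long start, long end) {
    Range r = ranges.computeIfAbsent(device, d -> new Range());
    r.start = Math.min(r.start, start);
    r.end = Math.max(r.end, end);
  }
}
```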
|
|
```diff
@@ -280,7 +310,7 @@ private static void updatePlanIndexes(
     // in the new file
     for (int i = 0; i < targetResources.size(); i++) {
       TsFileResource targetResource = targetResources.get(i);
-      // remove the target file been deleted from list
+      // remove the target file that has been deleted from list
       if (!targetResource.getTsFile().exists()) {
         targetResources.remove(i--);
         continue;
```
|
|
Review comment: Is it necessary to hash here? If the hash is not uniform, the speedup from multithreading could be poor. Why not just use a concurrent stack or queue, so that each sub-thread takes a measurement to compact when one is available?

Reply: Each sub task has its own hashset, so this won't happen.
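The reviewer's alternative (a shared concurrent queue instead of up-front round-robin hashing) would let an idle worker keep pulling measurements, smoothing out skew when measurements have very different sizes. A minimal sketch, with all names hypothetical and the compaction work replaced by a counter:

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueDispatchSketch {

  // Workers poll a shared queue until it is empty, so a thread that draws
  // cheap measurements simply takes more of them instead of idling.
  static int process(List<String> measurements, int workers) {
    Queue<String> queue = new ConcurrentLinkedQueue<>(measurements);
    AtomicInteger compacted = new AtomicInteger();
    Thread[] threads = new Thread[workers];
    for (int i = 0; i < workers; i++) {
      threads[i] = new Thread(() -> {
        String measurement;
        while ((measurement = queue.poll()) != null) {
          compacted.incrementAndGet(); // stand-in for compacting one measurement
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException(e);
      }
    }
    return compacted.get();
  }
}
```

The trade-off against the merged PR's approach: the queue balances load dynamically, while fixed per-task hash sets (the reply above) avoid any shared mutable structure between sub-tasks.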