[IOTDB-418] add some impl of NewSeriesReaderWithoutValueFilter#692

Merged
qiaojialin merged 31 commits into apache:new_series_reader from liutaohua:new_series_reader
Jan 14, 2020

Conversation

@liutaohua
Contributor

@liutaohua liutaohua commented Dec 30, 2019

Reading data is a complicated process, and the previous read interfaces were obscure and confusing.
I have reimplemented a reader based on which all the data in the files can be queried in order.

I have mainly done the following work:

  1. Renamed IAggregateReader in the TsFile project to IPageSkipReader

  2. Added IChunkReader. Based on this interface, there are two implementations: ChunkReader and MemChunkReader

  3. Added ChunkMetadata for ReadonlyMemChunk and PageHeader for MemChunkReader

  4. Added the IAggregateReader and IRawReader interfaces and the AbstractDataReader abstract class to complete all functions of the new reader, respectively realizing RawDataReaderWithoutValueFilter, SeriesDataReaderWithoutValueFilter and SeriesDataReaderWithValueFilter

  5. Refactored group by, raw query, aggregate query, and so on

  6. In the previous group by implementation, the definition of the interval endpoints was wrong: I changed the closed interval [0, 100] to the half-open interval [0, 100)
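The half-open interval change in item 6 can be illustrated with a small sketch (a hypothetical helper for illustration only, not IoTDB's actual group-by code): with buckets of the form [start, start + unit), a point falling exactly on a bucket boundary belongs to the next bucket, so no point is counted twice.

```java
public class HalfOpenBuckets {
    // Start of the half-open bucket [bucketStart, bucketStart + unit) containing `time`.
    // Assumes time >= origin and unit > 0.
    static long bucketStart(long time, long origin, long unit) {
        return origin + ((time - origin) / unit) * unit;
    }

    public static void main(String[] args) {
        // With origin 0 and unit 100: time 99 falls in [0, 100),
        // while time 100 falls in [100, 200) rather than in both buckets.
        System.out.println(bucketStart(99, 0, 100));
        System.out.println(bucketStart(100, 0, 100));
    }
}
```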

Path seriesPath = queryDataSource.getSeriesPath();
TreeSet<TsFileResource> unseqTsFilesSet = new TreeSet<>((o1, o2) -> {
String queryMeasurement = seriesPath.getMeasurement();
List<Long> o1StartTimeList = o1.getChunkMetaDataList().stream()
Member

@qiaojialin qiaojialin Dec 30, 2019

This ChunkMetadataList should all belong to this measurement, so there is no need to filter again. Besides, only an unclosed TsFile has the chunkMetaDataList.

The sort could compare by startTimeMap.get(seriesPath.getDeviceId())

Contributor Author

done
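The sorting suggestion above could look roughly like this (a sketch with simplified stand-in types; `TsFileResource` and its `startTimeMap` are modeled as a plain map rather than IoTDB's real classes): order unsequence files by the start time recorded for the queried device, with files that do not contain the device sorting last.

```java
import java.util.*;

public class UnseqFileOrdering {
    // Minimal stand-in for TsFileResource: device id -> start time of that device's data.
    static final class FileStub {
        final String name;
        final Map<String, Long> startTimeMap = new HashMap<>();
        FileStub(String name, String device, long start) {
            this.name = name;
            startTimeMap.put(device, start);
        }
    }

    // Sort by the queried device's start time; files without the device sort last.
    static List<FileStub> sortForDevice(List<FileStub> files, String deviceId) {
        List<FileStub> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(
            f -> f.startTimeMap.getOrDefault(deviceId, Long.MAX_VALUE)));
        return sorted;
    }

    public static void main(String[] args) {
        List<FileStub> files = new ArrayList<>(List.of(
            new FileStub("later", "d1", 200),
            new FileStub("earlier", "d1", 100)));
        for (FileStub f : sortForDevice(files, "d1")) {
            System.out.println(f.name);
        }
    }
}
```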

@qiaojialin qiaojialin changed the title from "add some impl" to "add some impl of NewSeriesReaderWithoutValueFilter" Dec 31, 2019
liudw added 13 commits January 2, 2020 15:55
…or-iotdb into new_series_reader

Conflicts:
	server/src/main/java/org/apache/iotdb/db/query/executor/EngineQueryRouter.java
	server/src/main/java/org/apache/iotdb/db/query/reader/chunkRelated/DiskChunkReader.java
	tsfile/src/main/java/org/apache/iotdb/tsfile/read/reader/chunk/AbstractChunkReader.java
…or-iotdb into new_series_reader

Conflicts:
	server/src/main/java/org/apache/iotdb/db/query/dataset/NewEngineDataSetWithoutValueFilter.java
@jixuan1989
Member

please try to merge master to check whether the failed test can pass.

@liutaohua
Contributor Author

please try to merge master to check whether the failed test can pass.

Please help me merge master into IoTDB's new_series_reader branch.

Contributor

@samperson1997 samperson1997 left a comment

Hi, thanks a lot for your effort and code. You've really done an excellent job in this PR! I believe it will be a great contribution after being merged.
After an initial quick look at the changed files, I found some small problems. Most of them are just suggestions, and you can decide whether to fix them or not. Afterwards, I will review the logic of all the code in depth.
Looking forward to your further updates and contributions!

Statistics statsByType = Statistics.getStatsByType(dataType);
ChunkMetaData metaData = new ChunkMetaData(measurementUid, dataType, 0, statsByType);
if (!isEmpty()) {
List<TimeValuePair> sortedTimeValuePairList = getSortedTimeValuePairList();
Member

could TimeValuePairSorter be replaced by TVList?

public abstract AggreResultData getResult();


public abstract void calculateValueFromStatistics(Statistics chunkStatistics)
Member

I suggest extracting AggreResultData out of AggregateFunction: each call to calculateValueFromStatistics returns an AggreResultData, and the results are merged outside.

Contributor Author

done


public abstract void calculateValueFromPageData(BatchData dataInThisPage) throws IOException;

public abstract void calculateValueFromPageData(BatchData dataInThisPage, long bound) throws IOException;
Member

add javadoc for bound

remove unused methods in this class

Contributor Author

done

}

@Override
public void calculateValueFromStatistics(Statistics chunkStatistics)
Member

Suggested change
public void calculateValueFromStatistics(Statistics chunkStatistics)
public void calculateValueFromStatistics(Statistics statistics)

Contributor Author

done

groupByPlan.setUnit(unit);
groupByPlan.setDeduplicatedPaths(executePaths);
groupByPlan.setDeduplicatedDataTypes(dataTypes);
groupByPlan.setDeduplicatedDataTypes(tsDataTypes);
Member

is this a bug before?

Contributor Author

Yeah, looks like a bug from before

timeHeap = new TreeSet<>();
for (int i = 0; i < seriesReaderWithoutValueFilterList.size(); i++) {
ManagedSeriesReader reader = seriesReaderWithoutValueFilterList.get(i);
RawDataReaderWithoutValueFilter reader = seriesReaderWithoutValueFilterList.get(i);
Member

The ManagedSeriesReader is for distribution; we'd better retain it.

Contributor Author

done, but only batch was kept

// /**
// * constructor.
// */
// public AggregateEngineExecutor(AggregationPlan aggregationPlan) {
Member

could this class be removed?

Contributor Author

done, and I've removed all the unused classes and tests

if (!batchData.hasCurrent()) {
batchData = allDataReader.nextBatch();
}
afterPair = new TimeValuePair(batchData.currentTime(), batchData.currentTsPrimitiveType());
Member

could we ensure each batch from RawDataReader is not empty?

if (!batchData.hasCurrent()) {
batchData = allDataReader.nextBatch();
}
cachedPair = new TimeValuePair(batchData.currentTime(), batchData.currentTsPrimitiveType());
Member

The same question with LinearFill

@@ -36,10 +37,10 @@
*/
public class DiskChunkReader implements IPointReader, IBatchReader {
Member

DiskChunkReader could perhaps be replaced by having ChunkReader implement IPointReader.
Maybe ChunkReaderWrap could be replaced by IChunkReader

* Aggregate results cannot be calculated using Statistics directly, using the data in each page
*
* @param dataInThisPage the data in Page
* @param bound the time upper bound of data in the unsequence data reader
Member

calculate points whose time < bound

Contributor Author

done
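The `bound` semantics discussed above ("calculate points whose time < bound") can be sketched like this (a hypothetical simplified helper; the real method operates on IoTDB's BatchData, which is modeled here as a list of time/value pairs): only points strictly before the bound are aggregated, leaving the rest for a later call once overlapping unsequence data has been merged.

```java
import java.util.*;

public class BoundedAggregation {
    // Sum the values of points whose time is strictly less than `bound`.
    // Each point is a {time, value} pair; the list is assumed sorted by time.
    static long sumBelowBound(List<long[]> points, long bound) {
        long sum = 0;
        for (long[] p : points) {
            if (p[0] >= bound) {
                break; // stop at the first point at or beyond the bound
            }
            sum += p[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        List<long[]> pts = Arrays.asList(
            new long[]{1, 10}, new long[]{2, 20}, new long[]{3, 30});
        System.out.println(sumBelowBound(pts, 3)); // only times 1 and 2 qualify
    }
}
```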

* <p>
*/
public class DiskChunkReader implements IPointReader, IBatchReader {
public class ChunkReaderIterator implements IPointReader, IBatchReader {
Member

This name is confusing... it sounds as if it iterates over chunk readers

Contributor Author

rename to ChunkDataIterator

Comment on lines +123 to +153
while (newSeriesReader.hasNextChunk()) {
if (newSeriesReader.canUseChunkStatistics()) {
Statistics chunkStatistics = newSeriesReader.currentChunkStatistics();
function.calculateValueFromStatistics(chunkStatistics);
if (function.isCalculatedAggregationResult()) {
return function.getResult();
}
newSeriesReader.skipChunkData();
continue;
}
while (newSeriesReader.hasNextPage()) {
//cal by pageheader
if (newSeriesReader.canUsePageStatistics()) {
Statistics pageStatistic = newSeriesReader.currentPageStatistics();
function.calculateValueFromStatistics(pageStatistic);
if (function.isCalculatedAggregationResult()) {
return function.getResult();
}
newSeriesReader.skipPageData();
continue;
}
//cal by pagedata
while (newSeriesReader.hasNextBatch()) {
function.calculateValueFromPageData(newSeriesReader.nextBatch());
if (function.isCalculatedAggregationResult()) {
return function.getResult();
}
}
}
}
return function.getResult();
Contributor

extract this code to avoid repetition in this file

Contributor Author

done

this.context = context;
this.dataType = dataType;

if (filter != null) {
Member

what if filter == null and the series has a TTL?

Contributor Author

done

@samperson1997 samperson1997 self-requested a review January 13, 2020 08:50
} else if (seqChunkMetadatas.isEmpty() && !unseqChunkMetadatas.isEmpty()) {
chunkMetaData = unseqChunkMetadatas.pollFirst();
} else if (!seqChunkMetadatas.isEmpty()) {
// seq 和 unseq 的 chunk metadata 都不为空
Member

Suggested change
// seq 和 unseq 的 chunk metadata 都不为空
// neither seqChunkMetadatas nor unseqChunkMetadatas is empty

Contributor Author

done

Comment on lines +350 to +352
* unseq file is a very special file that intersects not only with an ordered file, but also with
* another unseq file. So we need a way to find all the files that might be used to intersect the
* current measurement point.
Member

@qiaojialin qiaojialin Jan 13, 2020

unseq files are very special files that intersect not only with sequence files but also with
other unseq files. So we need to find all TsFiles that overlap with the current chunk and
extract chunks from those resources

Contributor Author

done

private void fillOverlappedFiles() throws IOException {
while (!unseqFileResource.isEmpty()) {
Map<String, Long> startTimeMap = unseqFileResource.first().getStartTimeMap();
Long unSeqStartTime = startTimeMap.getOrDefault(seriesPath.getDevice(), Long.MIN_VALUE);
Member

Suggested change
Long unSeqStartTime = startTimeMap.getOrDefault(seriesPath.getDevice(), Long.MIN_VALUE);
Long unSeqStartTime = startTimeMap.getOrDefault(seriesPath.getDevice(), Long.MAX_VALUE);

Contributor Author

done
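The rationale for the Long.MAX_VALUE default suggested above can be sketched as follows (a hypothetical helper with simplified types, not IoTDB's actual code): when the queried device does not appear in a file's startTimeMap, defaulting to Long.MAX_VALUE means "this file starts after everything", so it is never mistakenly treated as overlapping, whereas Long.MIN_VALUE would make every such file look overlapped.

```java
import java.util.*;

public class OverlapDefault {
    // Does a chunk ending at chunkEndTime overlap a file whose per-device start
    // times are in startTimeMap? A device absent from the map never overlaps,
    // because its start time defaults to Long.MAX_VALUE.
    static boolean fileOverlapsChunk(Map<String, Long> startTimeMap,
                                     String deviceId, long chunkEndTime) {
        long fileStart = startTimeMap.getOrDefault(deviceId, Long.MAX_VALUE);
        return chunkEndTime >= fileStart;
    }

    public static void main(String[] args) {
        Map<String, Long> startTimeMap = new HashMap<>();
        startTimeMap.put("d1", 50L);
        System.out.println(fileOverlapsChunk(startTimeMap, "d1", 100));
        System.out.println(fileOverlapsChunk(startTimeMap, "d2", 100));
    }
}
```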

while (!unseqFileResource.isEmpty()) {
Map<String, Long> startTimeMap = unseqFileResource.first().getStartTimeMap();
Long unSeqStartTime = startTimeMap.getOrDefault(seriesPath.getDevice(), Long.MIN_VALUE);
if (chunkMetaData.getEndTime() > unSeqStartTime) {
Member

Suggested change
if (chunkMetaData.getEndTime() > unSeqStartTime) {
if (chunkMetaData.getEndTime() >= unSeqStartTime) {

Contributor Author

done
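The boundary case behind the suggested `>=` can be sketched as well (a hypothetical helper, not IoTDB code): with inclusive end times, a chunk ending exactly at the time another range starts still shares that timestamp, so equality must count as overlap.

```java
public class ClosedRangeOverlap {
    // Two closed time ranges [s1, e1] and [s2, e2] overlap iff each starts
    // no later than the other ends; touching endpoints count as overlap.
    static boolean overlaps(long s1, long e1, long s2, long e2) {
        return s1 <= e2 && s2 <= e1;
    }

    public static void main(String[] args) {
        // A chunk ending at 100 shares timestamp 100 with a range starting at 100.
        System.out.println(overlaps(0, 100, 100, 200));
        System.out.println(overlaps(0, 99, 100, 200));
    }
}
```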


//for test
public AbstractDataReader(Path seriesPath, TSDataType dataType,
Filter filter, QueryContext context, List<TsFileResource> resources) throws IOException {
Member

Suggested change
Filter filter, QueryContext context, List<TsFileResource> resources) throws IOException {
Filter filter, QueryContext context, List<TsFileResource> seqResources) throws IOException {

Contributor Author

done

* another unseq file. So we need a way to find all the files that might be used to intersect the
* current measurement point.
*/
private void fillOverlappedFiles() throws IOException {
Member

Suggested change
private void fillOverlappedFiles() throws IOException {
private void unpackOverlappedFiles() throws IOException {

Contributor Author

done

while (!unseqChunkMetadatas.isEmpty()) {
long startTime = unseqChunkMetadatas.first().getStartTime();

if (chunkMetaData.getEndTime() > startTime) {
Member

Suggested change
if (chunkMetaData.getEndTime() > startTime) {
if (chunkMetaData.getEndTime() >= startTime) {

Contributor Author

done

while (!seqChunkMetadatas.isEmpty()) {
long startTime = seqChunkMetadatas.get(0).getStartTime();

if (chunkMetaData.getEndTime() > startTime) {
Member

Suggested change
if (chunkMetaData.getEndTime() > startTime) {
if (chunkMetaData.getEndTime() >= startTime) {

Contributor Author

done

} else {
Chunk chunk = chunkLoader.getChunk(metaData);
chunkReader = new ChunkReader(chunk, filter);
chunkReader.hasNextSatisfiedPage();
Member

This line is strange, it's better to remove it.

@qiaojialin qiaojialin changed the title from "add some impl of NewSeriesReaderWithoutValueFilter" to "[IOTDB-418] add some impl of NewSeriesReaderWithoutValueFilter" Jan 14, 2020
@qiaojialin qiaojialin merged commit 3092715 into apache:new_series_reader Jan 14, 2020
@liutaohua liutaohua deleted the new_series_reader branch January 14, 2020 05:21
4 participants