[IOTDB-298] Last time-value query #821

Merged
qiaojialin merged 33 commits into apache:master from wshao08:new_series_reader on Mar 3, 2020
Conversation

@wshao08 (Contributor) commented Feb 17, 2020:

JIRA issue: https://issues.apache.org/jira/browse/IOTDB-298

This PR implements a new LAST query for IoTDB

A LAST query returns the most recent value of the given timeseries in time-value format.
The basic SQL syntax of a LAST query is:

select last <Path> [COMMA <Path>]* from <PrefixPath> [COMMA <PrefixPath>]*

To retrieve the latest timestamp and value as quickly as possible, this PR adds a cache in the MNodes that stores the latest time and value. The cache is updated whenever a value with a larger timestamp is written to the corresponding timeseries.
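The cached-last idea can be sketched as a tiny standalone class (a minimal illustration only; `CachedLastNode` is a hypothetical name, not IoTDB's actual MNode code):

```java
// Minimal sketch of a per-timeseries last-value cache. The cached pair is
// replaced only when an incoming write carries an equal or newer timestamp,
// so a LAST query can be answered from memory without scanning files.
class CachedLastNode {
  private long lastTime = Long.MIN_VALUE;
  private Object lastValue = null;

  // Called on every write to this series.
  synchronized void updateCachedLast(long time, Object value) {
    if (time >= lastTime) {
      lastTime = time;
      lastValue = value;
    }
  }

  synchronized long getLastTime() { return lastTime; }

  synchronized Object getLastValue() { return lastValue; }
}
```

A write with an older timestamp leaves the cache untouched, matching the "updated when values with larger timestamps are written" behavior described above.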

@JackieTien97 (Contributor) left a comment:

First, please pay attention to your code format:

  1. Our indentation is two spaces; yours is four.
  2. You have many unused import statements; please remove them.

Second, please add some UTs and ITs.

Last, there are some incorrect usages of IAggregateReader, which I have commented on.

Comment on lines +113 to +130
while (seriesReader.hasNextChunk()) {
  while (seriesReader.hasNextPage()) {
    // cal by page data
    while (seriesReader.hasNextOverlappedPage()) {
      BatchData nextOverlappedPageData = seriesReader.nextOverlappedPage();
      int maxIndex = nextOverlappedPageData.length() - 1;
      if (maxIndex < 0) {
        continue;
      }
      long time = nextOverlappedPageData.getTimeByIndex(maxIndex);
      if (time > maxTime) {
        maxTime = time;
        queryResult.setPairResult(maxTime, nextOverlappedPageData.getValueInTimestamp(time), tsDataType);
      }
      nextOverlappedPageData.resetBatchData();
    }
  }
}
Contributor:

You can use the chunk and page statistics when possible. The code would look like this:

while (seriesReader.hasNextChunk()) {
  if (seriesReader.canUseCurrentChunkStatistics()) {
    if (seriesReader.currentChunkStatistics().getEndTime() > maxTime) {
      maxTime = seriesReader.currentChunkStatistics().getEndTime();
      queryResult.setPairResult(maxTime, seriesReader.currentChunkStatistics().getLastValue(), tsDataType);
    }
    seriesReader.skipCurrentChunk();
    continue;
  }
  while (seriesReader.hasNextPage()) {
    if (seriesReader.canUseCurrentPageStatistics()) {
      if (seriesReader.currentPageStatistics().getEndTime() > maxTime) {
        maxTime = seriesReader.currentPageStatistics().getEndTime();
        queryResult.setPairResult(maxTime, seriesReader.currentPageStatistics().getLastValue(), tsDataType);
      }
      seriesReader.skipCurrentPage();
      continue;
    }
    // cal by page data
    while (seriesReader.hasNextOverlappedPage()) {
      BatchData nextOverlappedPageData = seriesReader.nextOverlappedPage();
      int maxIndex = nextOverlappedPageData.length() - 1;
      if (maxIndex < 0) {
        continue;
      }
      long time = nextOverlappedPageData.getTimeByIndex(maxIndex);
      if (time > maxTime) {
        maxTime = time;
        queryResult.setPairResult(maxTime, nextOverlappedPageData.getValueInTimestamp(time), tsDataType);
      }
      nextOverlappedPageData.resetBatchData();
    }
  }
}

Contributor Author:

Yes, adopted

}
insertPlan.setDataTypes(dataTypes);
storageEngine.insert(insertPlan);
insertPlan.updateMNodeLastValues(node);
Contributor:

It is not appropriate to put the updateMNodeLastValues method in the InsertPlan.
You can update the last value in the preceding loop, where you already have the measurementNode.

@wshao08 (Contributor Author) commented Feb 18, 2020:

The update function has been moved into the MNode structure.
I added a loop to update the last value after storageEngine.insert(). It is possible that insert() fails and throws an exception; in that case the last-value cache is not updated.
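The ordering described here (update the cache only after a successful insert) can be sketched as follows; `Storage` and `SeriesWithCache` are hypothetical stand-ins, not the actual IoTDB classes:

```java
// If the underlying insert throws, the cache is left untouched, so it never
// claims a last value that was not actually persisted.
interface Storage {
  void insert(long time, Object value) throws Exception;
}

class SeriesWithCache {
  private long lastTime = Long.MIN_VALUE;
  private Object lastValue = null;

  void insert(Storage storage, long time, Object value) {
    try {
      storage.insert(time, value);
    } catch (Exception e) {
      return; // insert failed: skip the cache update
    }
    if (time >= lastTime) { // only reached after a successful insert
      lastTime = time;
      lastValue = value;
    }
  }

  long getLastTime() { return lastTime; }
}
```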

return storageEngine.insertBatch(batchInsertPlan);

Integer[] results = storageEngine.insertBatch(batchInsertPlan);
batchInsertPlan.updateMNodeLastValues(node);
Contributor:

The same as before

SFW, JOIN, UNION, FILTER, GROUPBY, ORDERBY, LIMIT, SELECT, SEQTABLESCAN, HASHTABLESCAN,
MERGEJOIN, FILEREAD, NULL, TABLESCAN, UPDATE, INSERT, BATCHINSERT, DELETE, BASIC_FUNC, IN, QUERY, MERGEQUERY,
AGGREGATION, AUTHOR, FROM, FUNC, LOADDATA, METADATA, PROPERTY, INDEX, INDEXQUERY, FILL,
AGGREGATION, AUTHOR, FROM, FUNC, LOADDATA, METADATA, PROPERTY, INDEX, INDEXQUERY, FILL, LAST,
Contributor:

It's better to add it at the end, because you don't know whether programmers are relying on the enum's ordinal() function.
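The risk can be shown with a toy enum (hypothetical names, not the actual IoTDB operator enum):

```java
// Inserting a constant in the middle shifts the ordinal() of every constant
// after it, silently breaking code or serialized data that depends on those
// numbers. Appending at the end keeps all existing ordinals stable.
enum OpBefore { QUERY, INSERT, DELETE }
enum OpAppended { QUERY, INSERT, DELETE, LAST } // safe: old ordinals unchanged
enum OpInserted { QUERY, LAST, INSERT, DELETE } // unsafe: INSERT moved 1 -> 2
```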

Contributor Author:

Modified

Comment on lines +295 to +311
public void updateMNodeLastValues(MNode node) {
  long maxTime = Long.MIN_VALUE;
  int maxIndex = 0;
  for (int i = 0; i < times.length; i++) {
    if (times[i] > maxTime) {
      maxTime = times[i];
      maxIndex = i;
    }
  }
  for (int i = 0; i < measurements.length; i++) {
    if (node.hasChild(measurements[i])) {
      Object[] column = (Object[]) columns[i];
      node.getChild(measurements[i]).updateCachedLast(maxTime, column[maxIndex], dataTypes[i]);
    }
  }
}

Contributor:

This function is not the InsertPlan's responsibility

Contributor Author:

Agreed; moved it out of InsertPlan.

Comment on lines +194 to +201
public void updateMNodeLastValues(MNode node) throws QueryProcessException {
  for (int i = 0; i < measurements.length; i++) {
    if (node.hasChild(measurements[i])) {
      Object value = CommonUtils.parseValue(dataTypes[i], values[i]);
      node.getChild(measurements[i]).updateCachedLast(time, value, dataTypes[i]);
    }
  }
}
Contributor:

Same as before

Contributor Author:

Fixed

@wshao08 wshao08 requested a review from JackieTien97 February 18, 2020 17:29
@JackieTien97 (Contributor) left a comment:

There are still some places that need to be changed.

UTs and ITs are needed.

You should also add documentation in both the EN and CH docs, describing the LAST grammar and showing some examples.

Comment on lines +295 to +302
long maxTime = Long.MIN_VALUE;
int maxIndex = 0;
for (int i = 0; i < times.length; i++) {
  if (times[i] > maxTime) {
    maxTime = times[i];
    maxIndex = i;
  }
}
Contributor:

No matter what measurementIndex is, this calculation is the same, so it only needs to be done once.
You can extract that logic outside the calling loop.

Contributor Author:

Fixed

return;
if (cachedLastValuePair == null) {
cachedLastValuePair = new TimeValuePair(timeValuePair.getTimestamp(), timeValuePair.getValue());
} else if (timeValuePair.getTimestamp() > cachedLastValuePair.getTimestamp()) {
Member:

Suggested change
} else if (timeValuePair.getTimestamp() > cachedLastValuePair.getTimestamp()) {
} else if (timeValuePair.getTimestamp() >= cachedLastValuePair.getTimestamp()) {

insert time-value 1-1
insert time-value 1-2
The value at time 1 should be updated to 2.

Contributor Author:

Fixed

public TimeValuePair composeLastTimeValuePair(int measurementIndex) {
  long maxTime = Long.MIN_VALUE;
  int maxIndex = 0;
  for (int i = 0; i < times.length; i++) {
Member:

The times in a BatchInsertPlan are always in ascending order, so you can directly take the last one.
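Because the timestamps in a BatchInsertPlan are sorted ascending, the max-scan can be replaced by one array access (a sketch; `BatchTimes.lastTime` is a hypothetical helper):

```java
// With times[] guaranteed ascending, the maximum timestamp is simply the
// last element, so no O(n) scan is needed.
class BatchTimes {
  static long lastTime(long[] times) {
    return times[times.length - 1];
  }
}
```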

Contributor Author:

Done

LastFunctionCallContext functionCallContext = ctx.lastFunctionCall();
Path path = parseSuffixPath(functionCallContext.suffixPath());
selectOp.addLastPath(path, functionCallContext.LAST().getText());
queryOp.setSelectOperator(selectOp);
Member:

The SQL syntax needs to be updated.

Contributor Author:

SQL syntax updated.

lastQueryResultList.add(lastQueryResult);
}

RowRecord resultRecord = constructLastRowRecord(lastQueryResultList);
Member:

The result format should be updated to (path, time, value).

Contributor Author:

Result set updated

return resultRecord;
}

class LastQueryResult {
Member:

I think this class is not needed; you could use TimeValuePair directly, unless you store the path of the series in this class.

Contributor Author:

Removed in the new version

}

// construct series reader without value filter
Filter timeFilter = null;
Member:

Remove this field and use null directly.

Contributor Author:

Fixed

}
}
if (queryResult.hasResult())
node.setCachedLast(queryResult.getPairResult());
Member:

Notice: consider an insert happening while a query is in progress.

It's better to remove setCachedLast and make the updateCachedLast method synchronized. Then extend updateCachedLast with a parameter, boolean highPriority, which determines whether to update the value when the time is equal to the previously cached one.

Then, in the normal insert path, update the cache with high priority; here, update it with low priority.
This resolves the race where there is an insert of (10,11) while you are reading the last point (10,10) from disk.
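The priority rule the reviewer describes could look roughly like this (a sketch under the stated assumptions; the real method also takes `latestFlushTime` into account):

```java
// On a timestamp tie, only a high-priority update (a normal insert) may
// overwrite the cache; a low-priority update (query-time backfill from disk)
// may not. So an insert of (10,11) cannot be clobbered by a concurrent
// query that reads the stale point (10,10) from disk.
class LeafNodeSketch {
  private long cachedTime = Long.MIN_VALUE;
  private Object cachedValue = null;

  synchronized void updateCachedLast(long time, Object value, boolean highPriority) {
    if (time > cachedTime || (time == cachedTime && highPriority)) {
      cachedTime = time;
      cachedValue = value;
    }
  }

  synchronized Object getCachedValue() { return cachedValue; }
}
```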

Contributor Author:

Updated, thanks

@wshao08 wshao08 closed this Feb 24, 2020
@wshao08 wshao08 deleted the new_series_reader branch February 24, 2020 08:33
@wshao08 wshao08 reopened this Feb 24, 2020
@wshao08 wshao08 changed the base branch from new_series_reader to master February 24, 2020 09:10

The following example queries the most recent data point of the time series root.ln.wf01.wt01.status:
```
select last(status) from root.ln.wf01.wt01 disable align
```
Member:

check this

Contributor Author:

Fixed.

*/
protected String fullPath;

private TimeValuePair cachedLastValuePair = null;
Member:

move this to LeafMNode

Contributor Author:

Fixed

return cachedLastValuePair;
}

public synchronized void updateCachedLast(
Member:

Add Javadoc for highPriorityUpdate.


private List<Path> suffixList;
private List<String> aggregations;
private boolean hasLast;
Member:

Suggested change
private boolean hasLast;
private boolean lastQuery;

queryPlan = new AggregationPlan();
((AggregationPlan) queryPlan)
.setAggregations(queryOperator.getSelectOperator().getAggregations());
} else if (queryOperator.hasLast()) {
Member:

Suggested change
} else if (queryOperator.hasLast()) {
} else if (queryOperator.isLastQuery()) {

plan.setDataTypeConsistencyChecker(null);
}

private void getLastQueryHeaders(
Member:

I don't remember: doesn't respColumns contain the time column?

throw new QueryProcessException(e);
}
}

Member:

add a blank line...

Contributor Author:

Fixed

@@ -178,6 +180,12 @@ public class StorageGroupProcessor {
* file.
*/
private Map<Long, Map<String, Long>> latestFlushedTimeForEachDevice = new HashMap<>();
Member:

Rename this to latestDeviceFlushedTimeInEachPartition to distinguish it from globalLatestFlushedTimeForEachDevice.

latestFlushedTimeForEachDevice.computeIfAbsent(timePartitionId, id -> new HashMap<>())
.putIfAbsent(insertPlan.getDeviceId(), Long.MIN_VALUE);

long latestFlushedTime = globalLatestFlushedTimeForEachDevice.computeIfAbsent(
Member:

Suggested change
long latestFlushedTime = globalLatestFlushedTimeForEachDevice.computeIfAbsent(
long globalLatestFlushTime = globalLatestFlushedTimeForEachDevice.computeIfAbsent(

Comment on lines +522 to +523
if (latestFlushedTime < insertPlan.getTime())
globalLatestFlushedTimeForEachDevice.put(insertPlan.getDeviceId(), insertPlan.getTime());
Member:

Suggested change
if (latestFlushedTime < insertPlan.getTime())
globalLatestFlushedTimeForEachDevice.put(insertPlan.getDeviceId(), insertPlan.getTime());
if (latestFlushedTime < insertPlan.getTime()) {
globalLatestFlushedTimeForEachDevice.put(insertPlan.getDeviceId(), insertPlan.getTime());
}

Member:

Then move these three lines into insertToTsFileProcessor(), so it is updated at the same time as lastFlushTimeForEachDevice.

Contributor Author:

Yes, moved; it looks better now.

}
}

public void updateInsertPlanLast(InsertPlan plan, Long latestFlushedTime)
Member:

Suggested change
public void updateInsertPlanLast(InsertPlan plan, Long latestFlushedTime)
public void tryToUpdateCache(InsertPlan plan, Long latestFlushedTime)


TimeValuePair resultPair = new TimeValuePair(Long.MIN_VALUE, null);

for (int i = seqFileResources.size() - 1; i >= 0; i--) {
Member:

No need to loop; get the last one directly.

Contributor Author:

Fixed

}

// Update cached last value with low priority
((LeafMNode)node).updateCachedLast(resultPair, false, Long.MIN_VALUE);
Member:

Seeing this line, I think it might be better to move the lastFlushTime check out of the updateCachedLast method and perform it during insertion.

List<TsFileResource> seqFileResources = dataSource.getSeqResources();
List<TsFileResource> unseqFileResources = dataSource.getUnseqResources();

TimeValuePair resultPair = new TimeValuePair(Long.MIN_VALUE, null);
Member:

Suggested change
TimeValuePair resultPair = new TimeValuePair(Long.MIN_VALUE, null);
TimeValuePair resultPair;

@qiaojialin (Member) left a comment:

add design doc


## Last Cache Update Strategy

The Last cache update logic resides in the `updateCachedLast` method of `LeafNode`. Two extra parameters are introduced: `highPriorityUpdate` and `latestFlushTime`. `highPriorityUpdate` indicates whether the current update is high priority; cache updates triggered by new data writes are considered high priority, while cache updates performed during queries default to low priority. `latestFlushTime` records the maximum timestamp of the data that has been flushed to disk.
Member:

Suggested change: replace `LeafNode` with `LeafMNode` in the first sentence above.

@JackieTien97 (Contributor) left a comment:

There is no need to check whether the chunkLoader is null; otherwise it will cause the file-closed exception as before. Please fix it.

Comment on lines +95 to +99
if (data.getChunkLoader() == null) {
  TsFileSequenceReader tsFileSequenceReader =
      FileReaderManager.getInstance().get(resource, resource.isClosed());
  data.setChunkLoader(new DiskChunkLoader(tsFileSequenceReader));
}
Contributor:

Remove the if judgement; it will cause the file-closed null pointer exception. Like:

for (ChunkMetaData data : currentChunkMetaDataList) {
  TsFileSequenceReader tsFileSequenceReader =
      FileReaderManager.getInstance().get(resource, resource.isClosed());
  data.setChunkLoader(new DiskChunkLoader(tsFileSequenceReader));
}

It's OK to do so, because the tsFileSequenceReader has been cached.

Contributor Author:

removed

@samperson1997 (Contributor) left a comment:

Good design!

@JackieTien97 (Contributor) left a comment:

LGTM

@qiaojialin qiaojialin merged commit 6404d6d into apache:master Mar 3, 2020