
[IOTDB-386] Vectorize the raw data query process #652

Merged: 48 commits into master from f_batch_reader, Dec 26, 2019

Conversation

qiaojialin (Member) commented on Dec 13, 2019:

* Refactor the file reader and chunk reader organization in TsFile.
* Implement IBatchReader in SeriesReaderWithoutValueFilter.
* Add a NewUnseqResourceMergeReader that only loads chunks when needed, instead of loading all chunks up front. @Zesong Sun
* Return TSQueryDataSet in the first query request. @dawei Liu
* Avoid constructing RowRecord in NewEngineDataSetWithoutValueFilter. @yuan Tian
* Move the limit & slimit handling from the client to the server. @LEI Rui

Class names starting with Old are for Aggregation and Group By; I leave them for future work.

Performance evaluation:

On my Mac: one device, ten sensors, long data type, RLE encoding and SNAPPY compression. Each time series contains 1 billion data points, 562 MB in total on disk.

I query the raw data of all ten time series with a time filter that yields 10 million points in total.

master: [screenshot: query time on master]

f_batch_reader: [screenshot: query time on f_batch_reader]

qiaojialin and others added 30 commits December 9, 2019 15:01
* rename next to nextBatch in SeriesReaderWithoutValueFilter
* move fill data to executeQueryStatement
* add fill buffer in EngineDataSetWithoutValueFilter
* change UnseqResourceMergeReader to IBatchReader
* comment out some code to avoid compilation errors

// add value buffer of current series
ByteBuffer valueBuffer = ByteBuffer.allocate(valueBAOSList[seriesIndex].size());
valueBuffer.put(valueBAOSList[seriesIndex].toByteArray());

Contributor:

use getBuf() instead of toByteArray()

qiaojialin (author):

fixed
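
For context, a minimal sketch of the suggested change, assuming the buffers involved are IoTDB's PublicBAOS (a ByteArrayOutputStream subclass that exposes its internal array via getBuf()). toByteArray() allocates and copies a fresh array on every call; getBuf() avoids that copy.

// Before: toByteArray() copies the stream contents into a new array.
ByteBuffer valueBuffer = ByteBuffer.allocate(valueBAOSList[seriesIndex].size());
valueBuffer.put(valueBAOSList[seriesIndex].toByteArray());

// After: write straight from the internal buffer, skipping the extra copy.
valueBuffer.put(valueBAOSList[seriesIndex].getBuf(), 0, valueBAOSList[seriesIndex].size());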


// add bitmap buffer of current series
ByteBuffer bitmapBuffer = ByteBuffer.allocate(bitmapBAOSList[seriesIndex].size());
bitmapBuffer.put(bitmapBAOSList[seriesIndex].toByteArray());

Contributor:

use getBuf() instead of toByteArray()

qiaojialin (author):

fixed

Comment on lines 30 to 32
synchronized long genTaskId() {
  taskId++;
  return taskId;
}

Contributor:

If the only use of this class is to generate a globally unique taskId for external sort jobs, why not use an AtomicLong field? There is no need for synchronized; it is too heavyweight.

qiaojialin (author):

changed
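
A minimal sketch of the suggested replacement: an AtomicLong gives the same uniqueness guarantee through lock-free compare-and-swap, without the cost of a synchronized method.

import java.util.concurrent.atomic.AtomicLong;

// Lock-free generation of a globally unique task id for external sort jobs.
private final AtomicLong taskId = new AtomicLong(0);

long genTaskId() {
  return taskId.incrementAndGet();
}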

@@ -30,6 +30,7 @@
import org.apache.iotdb.tsfile.read.common.Path;
import org.apache.iotdb.tsfile.read.filter.TimeFilter;
import org.apache.iotdb.tsfile.read.filter.basic.Filter;
import org.apache.iotdb.tsfile.read.reader.IBatchReader;

Contributor:

Delete this import; it isn't used.

qiaojialin (author):

deleted


public class CachedDiskChunkReader implements IPointReader {

private ChunkReader chunkReader;
private AbstractChunkReader AbstractChunkReader;

Contributor:

The first letter of the variable should be lowercase.

qiaojialin (author):

fixed
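
The corrected declaration, keeping the field's type but following Java's lower-camel-case convention for variable names:

// Variable name now starts with a lowercase letter.
private AbstractChunkReader abstractChunkReader;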

Comment on lines 82 to 86
public IBatchReader getIBatchReader() throws IOException {
if (type.equals(ChunkReaderType.DISK_CHUNK)) {
Chunk chunk = chunkLoader.getChunk(chunkMetaData);
AbstractChunkReader AbstractChunkReader = new ChunkReader(chunk, filter);
return new DiskChunkReader(AbstractChunkReader);

Contributor:

It seems this method is not used.

qiaojialin (author):

removed


private ChunkReader chunkReader;
private AbstractChunkReader AbstractChunkReader;

Contributor:

The first letter of the variable should be lowercase.

qiaojialin (author):

fixed

Comment on lines +75 to +84
@Override
public boolean hasNextBatch() throws IOException {
return false;
}

@Override
public BatchData nextBatch() throws IOException {
return null;
}

Contributor:

Is the fake implementation for future use? Why should DiskChunkReader implement IBatchReader at all?

qiaojialin (author):

We could remove this fake implementation for now. I plan to remove TimeValuePair everywhere in the future.

@@ -35,8 +35,8 @@ public CachedUnseqResourceMergeReader(List<Chunk> chunks, TSDataType dataType)
super(dataType);
int priorityValue = 1;
for (Chunk chunk : chunks) {
ChunkReader chunkReader = new ChunkReaderWithoutFilter(chunk);
addReaderWithPriority(new CachedDiskChunkReader(chunkReader), priorityValue++);
AbstractChunkReader AbstractChunkReader = new ChunkReader(chunk, null);

Contributor:

The first letter of the variable should be lowercase.

qiaojialin (author):

fixed

JackieTien97 (Contributor) left a comment:

Excellent job! The structure of the query process is much clearer now, and without constructing useless RowRecord objects, the performance does improve.

Comment on lines 125 to 126
chunkMetaDataList = chunkMetaDataList.stream()
.sorted(Comparator.comparing(ChunkMetaData::getStartTime)).collect(Collectors.toList());

Contributor:

There is no need for stream().sorted(); you can directly use chunkMetaDataList.sort(Comparator<? super ChunkMetaData>).

qiaojialin (author):

changed
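
The simplified form, as suggested; List.sort sorts in place and avoids allocating an intermediate stream plus a new result list:

chunkMetaDataList.sort(Comparator.comparing(ChunkMetaData::getStartTime));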

Comment on lines 178 to 180
public BatchData nextBatch() throws IOException {
return batchData;
}

Contributor:

If I keep calling nextBatch() without calling hasNextBatch(), I always get the same batchData, which is obviously counterintuitive. At the least, you should add a Javadoc comment to clarify the relationship between nextBatch() and hasNextBatch().

qiaojialin (author):

Fixed. Lei Rui also plans to standardize the semantics of next() and hasNext() in another PR, so I only fixed this class and left the others.
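
A sketch of the Javadoc the reviewer asks for, written against the IBatchReader methods shown in this PR (the wording is illustrative, not the final text; BatchData is org.apache.iotdb.tsfile.read.common.BatchData):

import java.io.IOException;

public interface IBatchReader {

  /**
   * Returns true if another batch is available. This call may load and
   * cache the next batch, so it must precede each call to nextBatch().
   */
  boolean hasNextBatch() throws IOException;

  /**
   * Returns the batch cached by the last successful hasNextBatch().
   * Calling nextBatch() repeatedly without hasNextBatch() in between
   * returns the same cached batch.
   */
  BatchData nextBatch() throws IOException;
}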

Comment on lines 25 to 31
public int encodeInt(int value, long time);

public long encodeLong(long value, long time);

public float encodeFloat(float value, long time);

public double encodeDouble(double value, long time);

Contributor:

No need to add public; interface methods are implicitly public.

qiaojialin (author):

deleted
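
This works because interface members are implicitly public (and abstract) in Java, so the modifiers are redundant. A minimal illustration, with a hypothetical interface name:

// Hypothetical interface for illustration; all members are implicitly public.
interface ValueEncoder {
  int encodeInt(int value, long time);
  long encodeLong(long value, long time);
  float encodeFloat(float value, long time);
  double encodeDouble(double value, long time);
}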

@@ -101,10 +101,10 @@ public static long collectFileSizes(List<TsFileResource> seqFiles, List<TsFileRe
}

public static int writeChunkWithoutUnseq(Chunk chunk, IChunkWriter chunkWriter) throws IOException {
ChunkReader chunkReader = new ChunkReaderWithoutFilter(chunk);
AbstractChunkReader AbstractChunkReader = new ChunkReader(chunk, null);

Contributor:

same as above

qiaojialin (author):

fixed

Comment on lines 79 to 87
this(host, port, Config.DEFAULT_USER, Config.DEFAULT_PASSWORD, 10000);
}

public Session(String host, String port, String username, String password) {
this(host, Integer.parseInt(port), username, password);
this(host, Integer.parseInt(port), username, password, 10000);
}

public Session(String host, int port, String username, String password) {
this(host, port, username, password, 10000);

Contributor:

All of these magic number '10000' occurrences should be replaced with a named constant.

qiaojialin (author):

Moved to Config in the session module.
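
A sketch of that fix, with the magic number promoted to a named constant in the session module's Config class (the constant name below is illustrative):

public class Config {
  // Illustrative constant name; the real one lives in the session Config.
  public static final int DEFAULT_FETCH_SIZE = 10000;
}

public Session(String host, int port, String username, String password) {
  this(host, port, username, password, Config.DEFAULT_FETCH_SIZE);
}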


protected IChunkLoader chunkLoader;
protected List<ChunkMetaData> chunkMetaDataList;
protected AbstractChunkReader AbstractChunkReader;

Contributor:

lowercase it

qiaojialin (author):

fixed

@@ -40,7 +40,7 @@
protected List<ChunkMetaData> chunkMetaDataList;
private int currentChunkIndex = 0;

private ChunkReader chunkReader;
private AbstractChunkReader AbstractChunkReader;

Contributor:

lowercase it

qiaojialin (author):

fixed

@qiaojialin changed the title from "[386] Vectorize the raw data query process" to "[IOTDB-386] Vectorize the raw data query process" on Dec 25, 2019

JackieTien97 (Contributor) left a comment:

I think it's ok for me.

samperson1997 (Contributor) left a comment:

Hi, I think everything is OK, and I also tested the raw data query process, covering both sequence and unsequence data, with some scripts. I really look forward to the next steps of query optimization and code refactoring. 🎉

@@ -24,6 +24,7 @@
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import org.apache.iotdb.jdbc.IoTDBStatement;

Contributor:

This import is unused. Maybe you can remove it.

qiaojialin (author):

fixed

Comment on lines 766 to 769
* judge whether the specified column value is null in the current position
*
* @param index column index
* @return
*/

Contributor:

This Javadoc is not consistent with the method.

qiaojialin (author):

updated

* <p>
* Note that <code>ChunkReader</code> is an abstract class with three concrete classes, two of which
* are used here: <code>ChunkReaderWithoutFilter</code> and <code>ChunkReaderWithFilter</code>.
* <p>
* This class is used in {@link org.apache.iotdb.db.query.reader.resourceRelated.UnseqResourceMergeReader}.
* This class is used in {@link NewUnseqResourceMergeReader}.

Contributor:

This class is actually used in ChunkReaderWrap (which may be deleted, along with other files, too).

qiaojialin (author):

deleted

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

Contributor:

This unused import could be removed

qiaojialin (author):

removed

* Create a ChunkReader with priority for each ChunkMetadata and put the ChunkReader to
* mergeReader one by one
*/
@Override public boolean hasNextBatch() throws IOException {

Contributor:

This could be split into two lines:

Suggested change
@Override public boolean hasNextBatch() throws IOException {
@Override
public boolean hasNextBatch() throws IOException {

qiaojialin (author):

fixed


public void skipPageData() {
  chunkReader.skipPageData();
}

// in chunkSatisfied:
return filter.satisfy(chunkMetaData.getStatistics());

Contributor:

Method chunkSatisfied could be changed to:

Suggested change
return filter.satisfy(chunkMetaData.getStatistics());
return filter == null || filter.satisfy(chunkMetaData.getStatistics());

qiaojialin (author):

fixed

private Filter timeFilter;
private int index = 0; // used to index current metadata in metaDataList

private static final int DEFAULT_BATCH_DATA_SIZE = 10000;

Contributor:

Should this be moved into the config file so that users can customize it?

qiaojialin (author):

There is a parameter called aggregate_fetch_size, which can be used here.
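
A sketch of wiring the batch size to that parameter instead of a hard-coded constant. IoTDBDescriptor.getInstance().getConfig() is IoTDB's usual configuration entry point, but the getter name below is an assumption, not the exact API:

// Getter name is illustrative; it stands in for the aggregate_fetch_size parameter.
private final int batchDataSize =
    IoTDBDescriptor.getInstance().getConfig().getAggregateFetchSize();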

// cache batch data for unsequence reader
private BatchData unseqBatchData;

private static final int DEFAULT_BATCH_DATA_SIZE = 10000;

Contributor:

ditto

qiaojialin (author):

fixed as above

@qiaojialin merged commit f0f229d into master on Dec 26, 2019
@qiaojialin deleted the f_batch_reader branch on Dec 26, 2019 at 03:30