[IOTDB-144]meta data cache for query #262

little-emotion · 2019-07-18T02:27:15Z

To increase query speed, metadata needs to be cached, including TsFileMetaData and TsDeviceMetaData.
Files in IoTDB are organized by time, so the query frequency of different files varies greatly. Caching the metadata of frequently queried files reduces the time spent reading metadata from disk. Besides, the metadata of all sensors of a device is stored in TsDeviceMetaData, so reading the metadata of one sensor requires reading the entire TsDeviceMetaData. In the current version, querying multiple sensors of the same device at once reads TsDeviceMetaData multiple times, so TsDeviceMetaData should be cached as well.
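A minimal sketch of the read-through caching idea described above, assuming an access-ordered LinkedHashMap bounded by entry count (the PR bounds its caches by estimated memory instead). The class name, the loader function and the "file|device" key format are hypothetical; only the counter name cacheHitNum follows the PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

/**
 * Hypothetical read-through metadata cache (illustration only, not the PR's class).
 * Keys might look like "filePath|deviceId"; values stand in for TsDeviceMetaData.
 */
public class MetadataCacheSketch<K, V> {
  private final Map<K, V> lru;
  private final Function<K, V> loader; // reads metadata from disk on a miss; must not return null
  private final AtomicLong cacheHitNum = new AtomicLong();
  private final AtomicLong cacheRequestNum = new AtomicLong();

  public MetadataCacheSketch(int maxEntries, Function<K, V> loader) {
    this.loader = loader;
    // accessOrder = true: iteration goes from least to most recently used
    this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least recently used entry
      }
    };
  }

  public V get(K key) {
    cacheRequestNum.incrementAndGet();
    // LinkedHashMap is not thread safe; loading under the lock keeps the sketch
    // simple but serializes concurrent misses.
    synchronized (lru) {
      V value = lru.get(key);
      if (value != null) {
        cacheHitNum.incrementAndGet();
        return value;
      }
      value = loader.apply(key);
      lru.put(key, value);
      return value;
    }
  }

  public long getCacheHitNum() {
    return cacheHitNum.get();
  }

  public long getCacheRequestNum() {
    return cacheRequestNum.get();
  }
}
```

With this shape, querying s0 through s3 of device d0 in file f1 triggers a single disk read for the key f1|d0; the other three lookups are cache hits.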

iotdb/src/main/java/org/apache/iotdb/db/engine/StorageEngine.java

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/DeviceMetaDataCache.java

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/TsFileMetaDataCache.java

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/TsFileMetadataUtils.java

iotdb/src/main/java/org/apache/iotdb/db/engine/storagegroup/StorageGroupProcessor.java

LeiRui

Code polish suggestions:

The creation of TsFileSequenceReader in the four classes of the resourceRelated package can be moved later, to just above the creation of ChunkLoaderImpl, since metaDataList is now prepared with DeviceMetaDataCache.getInstance().get instead of MetadataQuerierByFileImpl.
The method TsFileMetadataUtils.getTsRowGroupBlockMetaData: since the concept of row group is outdated, why not name it getTsDeviceMetaData?

LeiRui · 2019-07-21T02:08:00Z

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/DeviceMetaDataCache.java

+        if (chunkMetaDataSize == 0 && !value.isEmpty()) {
+          chunkMetaDataSize = RamUsageEstimator.sizeOf(value.get(0));
+        }
+        return value.size() * chunkMetaDataSize + key.length() * 2;


Why + key.length() * 2 here?
It is ChunkMetaData, not TsFileMetaData.

My mistake. It is right.
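The estimate in the diff can be read as: the cached list contributes value.size() * chunkMetaDataSize bytes, and the String key contributes about 2 bytes per character, because a Java char is a 16-bit UTF-16 code unit. A sketch of that arithmetic as a standalone helper (the class and method names are hypothetical):

```java
/**
 * Hypothetical helper mirroring the estimate in the diff:
 * value.size() * chunkMetaDataSize + key.length() * 2.
 */
public final class EntrySizeEstimate {
  private EntrySizeEstimate() {}

  /**
   * @param key               cache key, e.g. a series path string
   * @param chunkCount        number of ChunkMetaData objects in the cached list
   * @param chunkMetaDataSize estimated bytes per ChunkMetaData (measured once, e.g. with sizeOf)
   */
  public static long estimate(String key, int chunkCount, long chunkMetaDataSize) {
    // A Java char is 2 bytes (UTF-16), so key.length() * 2 approximates the key's
    // character data. Object and array headers are not counted.
    return chunkCount * chunkMetaDataSize + key.length() * 2L;
  }
}
```

For the 18-character key root.vehicle.d0.s0 with 3 chunks of an assumed 100 bytes each, this gives 3 * 100 + 18 * 2 = 336 bytes. With compact strings (enabled by default since Java 9), an all-Latin-1 key actually stores 1 byte per character, so the key term overestimates the character data of ASCII paths.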

LeiRui · 2019-07-21T02:20:49Z

iotdb/src/main/java/org/apache/iotdb/db/engine/storagegroup/StorageGroupProcessor.java

+  /**
+   * This linked hash set records the access order of sensors used by query.
+   */
+  private LinkedHashSet<String> lruForSensorUsedInQuery = new LinkedHashSet<>();


Does LinkedHashSet have an LRU function? Maybe you meant to use LruLinkedHashMap, but that is a map, not a set.
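To illustrate the question: LinkedHashSet keeps insertion order, and re-adding an element that is already present does not move it, whereas a LinkedHashMap constructed with accessOrder = true moves a key to the tail on every get or put, which is the ordering an LRU cache relies on. A small demo (class and method names are illustrative only):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class RecencyOrderDemo {
  /** Insertion order: re-adding an existing element does NOT move it. */
  static List<String> linkedHashSetOrder(List<String> accesses) {
    LinkedHashSet<String> set = new LinkedHashSet<>();
    set.addAll(accesses);
    return new ArrayList<>(set);
  }

  /** Access order: every put/get moves the key to the tail (most recent). */
  static List<String> accessOrder(List<String> accesses) {
    Map<String, Boolean> map = new LinkedHashMap<>(16, 0.75f, true);
    for (String a : accesses) {
      map.put(a, Boolean.TRUE);
    }
    return new ArrayList<>(map.keySet());
  }

  public static void main(String[] args) {
    List<String> accesses = List.of("s0", "s1", "s0");
    System.out.println(linkedHashSetOrder(accesses)); // [s0, s1]
    System.out.println(accessOrder(accesses));        // [s1, s0]
  }
}
```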

LeiRui · 2019-07-21T05:05:24Z

iotdb/src/main/java/org/apache/iotdb/db/engine/storagegroup/StorageGroupProcessor.java

@@ -458,6 +470,24 @@ public QueryDataSource query(String deviceId, String measurementId, QueryContext
    }
  }

+  /**
+   * returns the top k% measurements which are most frequently used in queries.


I don't think the current method behaves as this Javadoc says.
The current lruForSensorUsedInQuery is a LinkedHashSet, which only records the order in which sensors are first used by queries.
Even if some kind of LruLinkedHashSet were used, it could only return the num oldest (least recently used) sensors kept in the LRU, not the most frequently used ones.

For example, I query d0.s0, d0.s1, d1.s2, d0.s3, d1.s1, d2.s1, d3.s1, d4.s2. The sensors are s0, s1, s2, s3, s1, s1, s1, s2. Suppose num = 3.

LinkedHashSet
returns s0, s1, s2.

Some kind of LruLinkedHashSet (suppose memory is big enough)
returns s0, s3, s1: the 3 entries at the head of the LRU order, i.e. the 3 least recently used measurements.

But the top 3 most frequently used measurements should be s1, s2, and s3 or s0 (s3 and s0 are tied).
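The example above can be reproduced with plain JDK collections. This sketch computes four candidate answers for the same query sequence with num = 3; the LRU variant is modeled with an access-ordered LinkedHashMap, which is an assumption about how "some kind of LruLinkedHashSet" would behave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HotMeasurementDemo {
  // Measurements of d0.s0, d0.s1, d1.s2, d0.s3, d1.s1, d2.s1, d3.s1, d4.s2
  static final List<String> QUERIED = List.of("s0", "s1", "s2", "s3", "s1", "s1", "s1", "s2");

  /** LinkedHashSet: order of first insertion, re-adds ignored. */
  static List<String> firstInserted(List<String> q, int num) {
    return new LinkedHashSet<>(q).stream().limit(num).collect(Collectors.toList());
  }

  /** Keys of an access-ordered map, least recently used first. */
  private static List<String> lruOrder(List<String> q) {
    Map<String, Boolean> lru = new LinkedHashMap<>(16, 0.75f, true);
    for (String m : q) {
      lru.put(m, Boolean.TRUE); // re-putting an existing key moves it to the tail
    }
    return new ArrayList<>(lru.keySet());
  }

  /** Head of the LRU order: the num LEAST recently used measurements. */
  static List<String> lruHead(List<String> q, int num) {
    return lruOrder(q).stream().limit(num).collect(Collectors.toList());
  }

  /** Tail of the LRU order, newest first: the num most recently used measurements. */
  static List<String> mostRecent(List<String> q, int num) {
    List<String> order = lruOrder(q);
    Collections.reverse(order);
    return order.stream().limit(num).collect(Collectors.toList());
  }

  /** Count occurrences; ties keep first-occurrence order because the sort is stable. */
  static List<String> mostFrequent(List<String> q, int num) {
    Map<String, Long> counts = q.stream()
        .collect(Collectors.groupingBy(m -> m, LinkedHashMap::new, Collectors.counting()));
    return counts.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(num)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(firstInserted(QUERIED, 3)); // [s0, s1, s2]
    System.out.println(lruHead(QUERIED, 3));       // [s0, s3, s1]
    System.out.println(mostRecent(QUERIED, 3));    // [s2, s1, s3]
    System.out.println(mostFrequent(QUERIED, 3));  // [s1, s2, s0]
  }
}
```

Only counting occurrences yields the most frequently used measurements; the head of an LRU order holds the least recently used ones, and the most recently used ones have to be read from the tail.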

Maybe you should reconsider the design: most frequently used measurements, or most recently used measurements? Or, as I comment in the next review, most recently used devices?

most frequently used / most recently used + measurements / devices / device.measurement
So we have 6 possible plans:

most frequently used measurements

most recently used measurements

most frequently used devices

most recently used devices

most frequently used device.measurement

most recently used device.measurement

It's a bug in the code. I wanted to record the most recently used measurements, so I have changed LinkedHashSet to LinkedList.

The database workload changes over time, and the measurements of the current query are always the ones most recently inserted into lruForSensorUsedInQuery, so I prefer to use the recently used measurements.
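A minimal sketch of a bounded recency list built on LinkedList, as the reply describes. Removing an earlier occurrence before appending (so each measurement appears once) and the fixed capacity are assumptions of this sketch, not details confirmed by the PR; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical bounded tracker of recently queried measurements (sketch only). */
public class RecentMeasurements {
  private final LinkedList<String> recent = new LinkedList<>();
  private final int capacity;

  public RecentMeasurements(int capacity) {
    this.capacity = capacity;
  }

  /** Called once per query; the newest measurement ends up at the tail. */
  public synchronized void record(String measurementId) {
    recent.remove(measurementId); // O(n): drop the older occurrence so each id appears once
    recent.addLast(measurementId);
    if (recent.size() > capacity) {
      recent.removeFirst(); // forget the least recently queried measurement
    }
  }

  /** Returns up to num measurements, most recently queried first. */
  public synchronized List<String> mostRecent(int num) {
    List<String> result = new ArrayList<>();
    Iterator<String> it = recent.descendingIterator();
    while (it.hasNext() && result.size() < num) {
      result.add(it.next());
    }
    return result;
  }
}
```

The linear remove is acceptable only while the capacity stays small; an access-ordered LinkedHashMap gives the same ordering with constant-time updates.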

When querying the metadata of a device, IoTDB reads only the metadata of the devices the query needs; the metadata of other devices is not read. So we can only cache the metadata of devices that have been queried, rather than caching hot devices.

Caching the Cartesian product of devices and measurements would take an unacceptable amount of memory.

LeiRui · 2019-07-21T05:33:10Z

iotdb/src/main/java/org/apache/iotdb/db/engine/storagegroup/StorageGroupProcessor.java

@@ -447,6 +456,9 @@ public void putAllWorkingTsFileProcessorIntoClosingList() {
  // TODO need a read lock, please consider the concurrency with flush manager threads.
  public QueryDataSource query(String deviceId, String measurementId, QueryContext context) {
    insertLock.readLock().lock();
+    synchronized (lruForSensorUsedInQuery) {
+      lruForSensorUsedInQuery.add(measurementId);


Same as above: the current lruForSensorUsedInQuery records measurementId only. My questions are:

Why not use deviceId.measurementId?
For example, root.vehicle.d0.s0 and root.vehicle.d1.s0 are different series, but the current code records only s0 for both. The hottest sensor is not necessarily the hottest series.

Taking it further, why not use deviceId and cache TsDeviceMetadata for every recently used device? (Or frequently used device, but I prefer recently used device here.)
For example, I query d0.s0, d0.s1, d0.s2, d0.s3, so d0 is the most recently used device.
One benefit is that if the TsDeviceMetadata of file1.d0 is cached and file1.d0.s4 is not found in it, file1 can be skipped safely, knowing that d0.s4 doesn't exist in this file. In the current PR, however, suppose the List<ChunkMetaData> of file1.d0.s1, file1.d0.s2 and file1.d0.s3 are cached separately and file1.d0.s4 is not found in the cache; then file1 needs to be read again for d0.s4.
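The trade-off between the two granularities can be made concrete with a toy model. In this hypothetical sketch, "disk" is an in-memory map from file to device to measurement to chunk offsets, and, as the comment assumes for the current PR, the series-level cache only stores series that were found in the file.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of two cache granularities (hypothetical, not IoTDB code).
 * Chunk metadata is modeled as a list of chunk offsets.
 */
public class CacheGranularityDemo {
  // file -> device -> measurement -> chunk offsets, standing in for TsFiles on disk
  private final Map<String, Map<String, Map<String, List<Long>>>> disk;
  int diskReads = 0;

  private final Map<String, Map<String, List<Long>>> deviceCache = new HashMap<>();
  private final Map<String, List<Long>> seriesCache = new HashMap<>();

  CacheGranularityDemo(Map<String, Map<String, Map<String, List<Long>>>> disk) {
    this.disk = disk;
  }

  /** Reading device metadata always reads the whole block for that device. */
  private Map<String, List<Long>> readDevice(String file, String device) {
    diskReads++;
    return disk.getOrDefault(file, Map.of()).getOrDefault(device, Map.of());
  }

  /** Device-level cache: a cached block also proves that a measurement is absent. */
  List<Long> queryByDevice(String file, String device, String measurement) {
    String key = file + "|" + device;
    Map<String, List<Long>> block = deviceCache.get(key);
    if (block == null) {
      block = readDevice(file, device);
      deviceCache.put(key, block);
    }
    return block.getOrDefault(measurement, List.of());
  }

  /** Series-level cache: only series that exist are cached, so a missing one is re-read. */
  List<Long> queryBySeries(String file, String device, String measurement) {
    String key = file + "|" + device + "." + measurement;
    List<Long> cached = seriesCache.get(key);
    if (cached != null) {
      return cached;
    }
    List<Long> chunks = readDevice(file, device).getOrDefault(measurement, List.of());
    if (!chunks.isEmpty()) {
      seriesCache.put(key, chunks);
    }
    return chunks;
  }
}
```

With file1.d0 holding s1, s2 and s3, querying s1, s2, s3, s4, s4 costs one read with the device-level cache and five with the series-level cache, because every lookup of the missing s4 falls through to disk. In exchange, the series-level cache can evict at a finer granularity.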

Caching the Cartesian product of devices and measurements would take an unacceptable amount of memory. If the Cartesian product is not cached, the correspondence between devices and measurements is inevitably lost.

One reason for caching measurement metadata is that its replacement granularity is smaller than that of TsDeviceMetaData. Correspondingly, it is friendlier to workloads where each query touches only a small subset of each device's measurements. In addition, TsFileMetaData can already filter out TsFiles that do not contain the measurement at all. But when some devices in a TsFile have the measurement and others don't, the problem you describe can arise. Both approaches have their own shortcomings.

LeiRui

Two small modification suggestions.

LeiRui · 2019-07-23T06:50:11Z

iotdb/src/main/java/org/apache/iotdb/db/engine/storagegroup/StorageGroupProcessor.java

@@ -147,6 +150,12 @@
   */
  private ModificationFile mergingModification;

+  /**
+   * This linked hash set records the access order of sensors used by query.


It is a linked list now, not a linked hash set.

LeiRui · 2019-07-23T06:50:42Z

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/DeviceMetaDataCache.java

   */
-  private LinkedHashMap<String, TsDeviceMetadata> lruCache;
+  private LruLinkedHashMap<String, List<ChunkMetaData>> lruCache;

  private AtomicLong cacheHintNum = new AtomicLong();


Hit or hint?

Hit. It's a typo.

LeiRui

Here is another cacheHintNum that should be renamed to cacheHitNum.

LeiRui · 2019-07-23T08:17:37Z

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/TsFileMetaDataCache.java

   */
-  private ConcurrentHashMap<String, TsFileMetaData> cache;
+  private LruLinkedHashMap<String, TsFileMetaData> cache;
  private AtomicLong cacheHintNum = new AtomicLong();


LeiRui · 2019-07-23T08:44:44Z

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/DeviceMetaDataCache.java

      TsDeviceMetadata blockMetaData = TsFileMetadataUtils
-          .getTsRowGroupBlockMetaData(filePath, deviceId,
+          .getTsDeviceMetaData(filePath, seriesPath.getDevice(),


seriesPath.getMeasurement() can be passed to getTsDeviceMetaData to help filtering, by adding the following logic:

if (!fileMetaData.getMeasurementSchema().containsKey(measurementId)) { return null; }
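A hypothetical sketch of where the suggested check would sit: the measurement set stands in for fileMetaData.getMeasurementSchema().keySet(), the disk read is passed in as a function, and the existing device lookup is omitted. Returning null before the read lets the caller skip the file without touching its device metadata.

```java
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of the suggested early exit; not the actual TsFileMetadataUtils code. */
public class DeviceMetaDataFilterSketch {
  /**
   * @param measurementIds stands in for fileMetaData.getMeasurementSchema().keySet()
   * @param readDeviceMeta reads the device metadata block from disk (hypothetical)
   * @return the device metadata, or null if no device in the file has the measurement
   */
  static <M> M getTsDeviceMetaData(Set<String> measurementIds, String deviceId,
      String measurementId, Function<String, M> readDeviceMeta) {
    if (!measurementIds.contains(measurementId)) {
      return null; // the file cannot hold this series, so skip the disk read
    }
    return readDeviceMeta.apply(deviceId);
  }
}
```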

LeiRui

Let me sum up the latest four reviews for your convenience:

annotation: linked list, not set (StorageGroupProcessor)
name: cacheHitNum, not cacheHintNum (TsFileMetaDataCache and DeviceMetaDataCache are covered.)
logic: If metaDataList size equals 0, then the creation of TsFileSequenceReader can be skipped. (UnseqResourceMergeReader and UnseqResourceReaderByTimestamp are covered.)
logic: include the filter of measurement in the TsFileMetadataUtils.getTsDeviceMetaData

LeiRui · 2019-07-23T08:59:20Z

...src/main/java/org/apache/iotdb/db/query/reader/resourceRelated/UnseqResourceMergeReader.java

        metaDataList = tsFileResource.getChunkMetaDatas();
      }

      // create and add ChunkReader with priority
+      TsFileSequenceReader tsFileReader = FileReaderManager.getInstance()
+          .get(tsFileResource.getFile().getPath(), tsFileResource.isClosed());
      ChunkLoaderImpl chunkLoader = new ChunkLoaderImpl(tsFileReader);
      for (ChunkMetaData chunkMetaData : metaDataList) {


If metaDataList size equals 0, then the creation of TsFileSequenceReader can be skipped.

qiaojialin · 2019-07-23T11:30:20Z

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/LruLinkedHashMap.java

+/**
+ * This class is an LRU cache. <b>Note: It's not thread safe.</b>
+ */
+public abstract class LruLinkedHashMap<K, V> extends LinkedHashMap<K, V> {


Suggested change

public abstract class LruLinkedHashMap<K, V> extends LinkedHashMap<K, V> {

public abstract class LRULinkedHashMap<K, V> extends LinkedHashMap<K, V> {

@qiaojialin Actually, I don't like the "suggested change" feature. Look here: you changed the class name but forgot to change the file name, and Suyue didn't notice either.
I personally prefer that you just point out where the problem is instead of changing it for her.
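The class under review is documented as a non-thread-safe LRU cache built on LinkedHashMap, and its entry sizes are estimated as in the DeviceMetaDataCache diff. A sketch of how such a memory-bounded map can be written, assuming a byte budget and a per-entry size estimate supplied by subclasses (the class and method names are hypothetical, not the PR's):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical memory-bounded LRU map (sketch, not the PR's LRULinkedHashMap).
 * Not thread safe.
 */
public abstract class MemoryBoundedLruMap<K, V> extends LinkedHashMap<K, V> {
  private final long maxMemory;
  private long usedMemory;

  protected MemoryBoundedLruMap(long maxMemory) {
    super(16, 0.75f, true); // accessOrder = true: least recently used entry first
    this.maxMemory = maxMemory;
  }

  /** Estimated bytes for one entry, e.g. value.size() * chunkMetaDataSize + key.length() * 2. */
  protected abstract long estimateEntrySize(K key, V value);

  @Override
  public V put(K key, V value) {
    V old = super.put(key, value);
    if (old != null) {
      usedMemory -= estimateEntrySize(key, old);
    }
    usedMemory += estimateEntrySize(key, value);
    // Evict from the head (least recently used) until the budget is met,
    // always keeping the entry that was just inserted (it sits at the tail).
    Iterator<Map.Entry<K, V>> it = entrySet().iterator();
    while (usedMemory > maxMemory && size() > 1) {
      Map.Entry<K, V> eldest = it.next();
      usedMemory -= estimateEntrySize(eldest.getKey(), eldest.getValue());
      it.remove();
    }
    return old;
  }

  public long getUsedMemory() {
    return usedMemory;
  }
}
```

Only put() updates the memory accounting here; remove(), putAll() and the compute methods would also need overrides in a real implementation. Because LinkedHashMap is not thread safe, callers such as a metadata cache still have to synchronize around it.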

qiaojialin · 2019-07-23T11:32:28Z

iotdb/src/main/java/org/apache/iotdb/db/engine/cache/RamUsageEstimator.java

+
+/**
+ * This class is copied from apache lucene, version 4.6.1. Estimates the size(memory representation)
+ * of Java objects. https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.6.1/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java


This should be listed in the LICENSE file.

meta data cache for query

201d782

jt2594838 requested changes Jul 18, 2019


little-emotion added 2 commits July 18, 2019 22:37

modify pr review

3101686

merge

5c90550

jt2594838 approved these changes Jul 19, 2019


LeiRui requested changes Jul 20, 2019


LeiRui requested changes Jul 21, 2019


little-emotion added 2 commits July 23, 2019 10:00

modify pr review

dbdfc04

merge master

3c565fc

LeiRui requested changes Jul 23, 2019


LeiRui reviewed Jul 23, 2019


qiaojialin reviewed Jul 23, 2019


little-emotion added 2 commits July 23, 2019 21:23

modify pr review

c7db801

Merge remote-tracking branch 'origin/master' into matadata_cache

20f41ae

LeiRui approved these changes Jul 23, 2019


rename LruLinkedHashMap.java to LRULinkedHashMap.java

6d4fca1

qiaojialin merged commit 126eac7 into master Jul 25, 2019

qiaojialin deleted the matadata_cache branch July 25, 2019 01:04
