
Lift the storage limit for tag and attribute management#12447

Merged
JackieTien97 merged 11 commits into apache:master from linxt20:Lift_the_storage_limit_for_tag_and_attribute_management
May 6, 2024

Conversation

Contributor

@linxt20 linxt20 commented Apr 29, 2024

The goal of this implementation is that a tag map whose serialized size exceeds tagAttributeTotalSize no longer fails: the program allocates a new block of length tagAttributeTotalSize and continues storing into it, repeating as needed.

  • Storage design:

    • How the newly allocated blocks of length tagAttributeTotalSize are recorded: during tag-map serialization, tag maps that exceed tagAttributeTotalSize are distinguished from those that do not by an identifier, which is the first int value read.
      • If the tag map does not exceed tagAttributeTotalSize, the original storage format is kept (mapSize key-value key-value). The first value is mapSize, recorded as a positive number, 0, or -1 (-1 when the map is null; 0 when the map is non-null but empty).
      • If the tag map exceeds tagAttributeTotalSize, the number of storage blocks it occupies, offsetListNum, is recorded as -offsetListNum. The format is -offsetListNum offset1 ... offsetN. Note that offset1 records the offset of the second storage block; the offset of the starting block is already recorded in the measurement node, so the number of offsets is one less than offsetListNum. The mapSize and key-value pairs then follow in the normal format.
    • In this way, the storage format of old data is unchanged, new data beyond the old limit can be stored, and the original deserialization interface can be reused.
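The discriminator scheme above can be sketched as follows. This is a minimal illustration; the class and method names, and the use of a plain ByteBuffer, are invented for this sketch and are not the actual IoTDB code:

```java
import java.nio.ByteBuffer;
import java.util.List;

public final class TagMapHeaderSketch {

    // Single block: the first int is mapSize itself
    // (-1 when the map is null, 0 when it is empty, positive otherwise).
    public static void writeSingleBlockHeader(ByteBuffer buf, int mapSize) {
        buf.putInt(mapSize);
    }

    // Multi block: the first int is -offsetListNum, followed by the offsets
    // of blocks 2..n (the first block's offset already lives in the
    // measurement node, so one fewer offset than blocks is stored).
    public static void writeMultiBlockHeader(ByteBuffer buf, List<Long> tailOffsets) {
        buf.putInt(-(tailOffsets.size() + 1));
        for (long offset : tailOffsets) {
            buf.putLong(offset);
        }
    }

    // Reader side: any first int >= -1 means the old single-block format.
    public static boolean isSingleBlock(int firstInt) {
        return firstInt >= -1;
    }
}
```

Because -offsetListNum is always at most -2 in the multi-block case, old readers' values (mapSize of -1, 0, or positive) never collide with it, which is what lets the old format stay untouched.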
  • Implementation details:

    • The read and readTag interfaces of TagLogFile were simplified to take only an offset parameter. On the one hand, the read length is no longer fixed, so the caller cannot know in advance how much space is needed; on the other hand, TagLogFile is an externally facing interface and should be fully encapsulated, so the concrete length is not the caller's concern.
    • The number of required offsets is derived from the inequality Num * MAX_LENGTH < TotalMapSize + 4 + Long.BYTES * Num <= MAX_LENGTH * (Num + 1). Within range there are at most two solutions, and the smaller one is taken to save space; a second solution exists exactly when merely adding one more offset would push the payload past the current block's capacity.
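The calculation above can be sketched as a loop that finds the smallest Num satisfying the inequality. MAX_LENGTH stands in for tagAttributeTotalSize and its value here is an assumption for illustration:

```java
public final class BlockCountSketch {
    // Stand-in for tagAttributeTotalSize; the real value is a config parameter.
    static final int MAX_LENGTH = 700;

    // Smallest num such that the serialized map, the 4-byte header, and
    // 8 bytes per extra offset fit into (num + 1) blocks. Because each step
    // adds only 8 bytes of offset while a block adds MAX_LENGTH bytes of
    // capacity, the first num that fits is the smaller of the two solutions.
    public static int extraOffsetNum(int totalMapSize) {
        int num = 0;
        while (totalMapSize + 4 + Long.BYTES * num > MAX_LENGTH * (num + 1)) {
            num++;
        }
        return num;
    }
}
```

For example, a 2100-byte map with MAX_LENGTH = 700 needs 3 extra offsets rather than 2, because 2100 + 4 + 16 = 2120 just exceeds the 3-block capacity of 2100 — the "adding the offset overflows the block" case mentioned above.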
  • Current reading process:

    • The reading flows of read and readTag are very similar, except that read additionally reads the attributes, so the shared logic is factored into a function parseByteBuffer.
    • This function first reads one blockSize of data. If the first int is greater than or equal to -1, the record occupies a single block; otherwise it spans multiple blocks. For a single block, the buffer position is simply restored to before the first int and the buffer is returned. For multiple blocks, a buffer large enough for all blocks is allocated based on the offset-list size, and the first block's data is copied in; then the 8-byte offsets following the first int are read one after another, and the block at each offset is read and appended to the large buffer. After the loop, the buffer's position is set just before tagMapSize and the buffer is returned.
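A simplified sketch of this reading flow, assuming a readBlock function in place of the real FileChannel read and an illustrative BLOCK_SIZE (neither is the actual IoTDB code):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

public final class ParseByteBufferSketch {
    // Stand-in for tagAttributeTotalSize.
    static final int BLOCK_SIZE = 64;

    public static ByteBuffer parse(LongFunction<ByteBuffer> readBlock, long startOffset) {
        ByteBuffer first = readBlock.apply(startOffset);
        int firstInt = first.getInt();
        if (firstInt >= -1) {
            // Single block: step back so the caller reads mapSize itself.
            first.position(first.position() - Integer.BYTES);
            return first;
        }
        int blockNum = -firstInt;
        List<Long> tailOffsets = new ArrayList<>();
        for (int i = 1; i < blockNum; i++) {
            tailOffsets.add(first.getLong()); // offsets of blocks 2..n
        }
        // One large buffer that can hold every block, then fill it.
        ByteBuffer all = ByteBuffer.allocate(BLOCK_SIZE * blockNum);
        all.put(first); // rest of the first block, after the header
        for (long offset : tailOffsets) {
            all.put(readBlock.apply(offset));
        }
        all.flip(); // position now sits just before mapSize
        return all;
    }
}
```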
  • Current writing process:

    • Serializing the content: first compute the actual serialized length. If it does not exceed one blockSize, the content is stored in the single-block format (which is also the old storage format). If it does, space is reserved for the offset list and the actual content is serialized after it.
    • Writing the content to the file:
      • During writing, a negative write offset means appending at the end of the file.
      • First, the original data at the target offset is read to obtain the block positions it occupies. The function parseOffsetList implements this; compared with parseByteBuffer it is pruned to read only the offset list at the head of the record, reducing processing complexity.
      • Three cases are then handled:
        • Case 1: the record previously occupied one block and the new data also fits in one block, so it is written directly.
        • Case 2: the record previously occupied more blocks than the new data needs, so the original blocks are reused. A ByteBuffer of the original total block size is created, the original offset-list space is preserved, and the new data is placed after it; if the new data itself spans multiple blocks, the position must be adjusted to skip the space reserved during serialization. Each block is then written back, in order, to the offsets in the original offset list.
        • Case 3: the new data needs at least as much space as the original. The space reserved during serialization is sufficient, but some offsets are only known after writing, so the serialized content (without the offsets) is first written to the blocks recorded in the original offset list; when those run out, writing continues at the end of the FileChannel, and each newly written block's offset is appended to the offset list. Finally the offset list is assembled into a buffer and written sequentially starting from the record's first position, completing the write.
  • The difficulty in this part is that file offsets change as reads and writes proceed, so the position and limit of each ByteBuffer must be watched and adjusted throughout.
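The three-way split in the write path can be summarized with a small selector. This is purely illustrative (the names are invented, and the real code operates on buffers rather than returning labels):

```java
public final class WriteCaseSketch {
    public static String chooseCase(int oldBlockNum, int newBlockNum) {
        if (oldBlockNum == 1 && newBlockNum == 1) {
            return "case1: single block before and after, overwrite in place";
        }
        if (oldBlockNum > newBlockNum) {
            return "case2: fewer blocks needed, reuse old blocks in order";
        }
        // oldBlockNum <= newBlockNum
        return "case3: fill old blocks first, append new blocks at the end, "
            + "then patch the offset list";
    }
}
```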

  • Extra work

    • Optimization of SRStatementGenerator: a non-standard re-implementation of tag-map reading in SRStatementGenerator was found and fixed. Making parseByteBuffer static lets SRStatementGenerator call it directly, reducing changes to the source code and avoiding inconsistency.
    • Limits on the number and size of tags and attributes: parameters restricting the number of tags and attributes and the size of each entry are introduced. Tags and attributes currently get the same upper limit each, rather than one combined limit, which is more reasonable. The checks are enforced during serialization in the write path: when the serialized sizes of the tag map and attribute map are computed, checks on mapSize and entrySize are added, so validation completes before anything is written and lengths are not processed twice.
    • Adding the tag index to metadata memory control: detailed memory accounting is implemented for the addIndex and removeIndex functions in TagManager, together with allocation and release of the storage structures, completing the basic memory-accounting interface for index growth and shrinkage. When memory would overflow, new tags cannot be created. The corresponding accounting is added in renameTagOrAttributeKey, setTagsOrAttributesValue, addTags, and upsertAliasAndTagsAndAttributes in SchemaRegionMemoryImpl and SchemaRegionPBTreeImpl when the tag map is not empty; on overflow, execution is refused and an error is thrown.
    • Fixing a memory-control omission: the alias was already counted in memory control, but additions and modifications of an alias were not checked for overflow, so an alias could still be added after memory had overflowed, worsening the overflow. The fix is in upsertAlias in SchemaRegionMemoryImpl and SchemaRegionPBTreeImpl, which now checks for memory overflow when the alias is not null.
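The overflow guard described in the last two points can be sketched as follows. The class name, method names, and exception type are assumptions for illustration, not the actual SchemaRegion code:

```java
import java.util.concurrent.atomic.AtomicLong;

public final class SchemaMemoryGuardSketch {
    private final AtomicLong used = new AtomicLong();
    private final long limit;

    public SchemaMemoryGuardSketch(long limit) {
        this.limit = limit;
    }

    // Refuse the update and throw when the new total would exceed the limit,
    // mirroring "execution is refused and an error is thrown" above.
    public void allocate(long delta) {
        long newUsed = used.addAndGet(delta);
        if (newUsed > limit) {
            used.addAndGet(-delta); // roll back before refusing
            throw new IllegalStateException("schema memory limit exceeded");
        }
    }

    // Counterpart for removeIndex / alias removal paths.
    public void release(long delta) {
        used.addAndGet(-delta);
    }

    public long used() {
        return used.get();
    }
}
```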

Contributor

@JackieTien97 JackieTien97 left a comment


We also need to add some ITs (integration tests) for this.

Comment on lines +254 to +255
private int tagAttributeEachMaxNum = 20;
private int tagAttributeEachMaxSize = 100;

Add both of these configs to iotdb-common.properties in iotdb-core/node-commons/src/assembly/resources/conf/iotdb-common.properties and load them in CommonDescriptor.loadCommonProps.


Also remember to update the comments about tag_attribute_total_size in iotdb-core/node-commons/src/assembly/resources/conf/iotdb-common.properties.

Comment on lines +228 to +229
if (blockOffset.size() > blockNumReal) { // if the original space is larger than the new space, the original

>= or >?

blockOffset.add(position);
for (int i = 1; i < blockNum; i++) {
blockOffset.add(ReadWriteIOUtils.readLong(byteBuffers));
Long nextPosition = ReadWriteIOUtils.readLong(byteBuffers);

Suggested change:
- Long nextPosition = ReadWriteIOUtils.readLong(byteBuffers);
+ long nextPosition = ReadWriteIOUtils.readLong(byteBuffers);

@@ -181,18 +180,16 @@ public void addIndex(String tagKey, String tagValue, IMeasurementMNode<?> measur

int memorySize = 0;

Make it a long.

@@ -212,20 +209,18 @@ public void removeIndex(String tagKey, String tagValue, IMeasurementMNode<?> mea
// init memory size
int memorySize = 0;

Make it a long.

@JackieTien97 JackieTien97 merged commit 7df7e5c into apache:master May 6, 2024
SzyWilliam pushed a commit to SzyWilliam/iotdb that referenced this pull request Nov 26, 2024