IOTDB-1140 optimize regular data encoding #2621
Merged
@jixuan1989 @qiaojialin Hi, could you please review this PR?
SonarCloud Quality Gate failed.
qiaojialin
approved these changes
Feb 5, 2021
Great!
yuqi1129
reviewed
Feb 6, 2021
import org.apache.iotdb.tsfile.common.conf.TSFileDescriptor;
import org.junit.*;

import java.io.*;
Wildcard (star) imports are not recommended in Java.
Current regular data encoding algorithm:
Calculate the difference between each pair of adjacent values. The smallest difference is taken as the regular interval (the "equal frequency").
Determine the data range of the batch from the difference between the last value and the first value.
Traverse the batch with a BitSet, comparing the difference between each pair of adjacent values with the regular interval. Every position is set to true by default.
If a difference is not equal to the regular interval, calculate how many intervals it spans and set the corresponding positions to false, indicating that those points are missing.
This algorithm can only identify missing points. If the batch contains an error point, it throws an exception,
because a BitSet can only indicate whether a point exists at each regular interval within a segment of data.
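The steps described above can be sketched roughly as follows. This is a minimal illustration and not the TsFile encoder itself: the class name `RegularEncodingSketch`, the method names, and the choice of `IllegalArgumentException` are all assumptions made for the example, and the input is assumed to be a batch of `long` values such as timestamps.

```java
import java.util.BitSet;

// Hypothetical sketch of the described approach; not the actual IoTDB code.
public class RegularEncodingSketch {

    /** Smallest difference between adjacent values, taken as the regular interval. */
    static long minDelta(long[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        long min = Long.MAX_VALUE;
        for (int i = 1; i < values.length; i++) {
            min = Math.min(min, values[i] - values[i - 1]);
        }
        return min;
    }

    /**
     * Builds a bitmap over the grid first, first + interval, ..., last.
     * A set bit means a point exists at that slot; a clear bit means it is missing.
     */
    static BitSet presenceBitmap(long[] values) {
        long interval = minDelta(values);
        if (interval <= 0) {
            // An out-of-order (error) point makes the minimum difference unusable.
            throw new IllegalArgumentException("unusable interval: " + interval);
        }
        long range = values[values.length - 1] - values[0];
        int slots = (int) (range / interval) + 1;

        BitSet bits = new BitSet(slots);
        bits.set(0, slots); // every slot is true by default

        int slot = 0;
        for (int i = 1; i < values.length; i++) {
            long diff = values[i] - values[i - 1];
            if (diff % interval != 0) {
                throw new IllegalArgumentException("value off the regular grid: " + values[i]);
            }
            int steps = (int) (diff / interval);
            // Slots strictly between two consecutive values are missing points.
            bits.clear(slot + 1, slot + steps);
            slot += steps;
        }
        return bits;
    }
}
```

For the batch 1000, 1100, 1300, 1400 this yields an interval of 100 and a five-slot bitmap with only slot 2 (the missing 1200) cleared. A batch containing an out-of-order value produces a non-positive minimum difference, so the sketch rejects it, which mirrors the limitation noted above.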
However, there is room for optimization.
If a column of values contains an abnormal value, taking the minimum difference directly as the regular interval gives a skewed result.
Sample: 1000, 1100, 1800, 1400, 1500...
The current algorithm cannot be used on this data.
1800 is an error point. We should identify error points and revise the data.
The revised data should be: 1000, 1100, 1300, 1400, 1500
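To make the sample concrete, here is one possible way to detect and revise such a point. It is only a sketch under stated assumptions and not necessarily the approach adopted in this PR: it assumes the regular interval is the most frequent positive difference, that the first and last values are correct, and that an error point is replaced by the next value minus the interval. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the heuristic is an assumption, not the PR's method.
public class ErrorPointRevisionSketch {

    /** Most frequent positive difference between adjacent values (ties: smaller wins). */
    static long mostFrequentDelta(long[] values) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int i = 1; i < values.length; i++) {
            long d = values[i] - values[i - 1];
            if (d > 0) {
                counts.merge(d, 1, Integer::sum);
            }
        }
        long best = -1;
        int bestCount = 0;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            long d = e.getKey();
            int c = e.getValue();
            if (c > bestCount || (c == bestCount && d < best)) {
                best = d;
                bestCount = c;
            }
        }
        if (bestCount == 0) {
            throw new IllegalArgumentException("no positive difference found");
        }
        return best;
    }

    /** Returns a copy in which interior error points are replaced by (next - interval). */
    static long[] revise(long[] values) {
        long interval = mostFrequentDelta(values);
        long[] out = values.clone();
        for (int i = 1; i < out.length - 1; i++) {
            boolean outOfOrder = out[i] <= out[i - 1] || out[i] >= out[i + 1];
            boolean offGrid = (out[i] - out[0]) % interval != 0;
            if (outOfOrder || offGrid) {
                out[i] = out[i + 1] - interval;
            }
        }
        return out;
    }
}
```

With the sample 1000, 1100, 1800, 1400, 1500, the differences are 100, 700, -400, 100, so the assumed interval is 100. The value 1800 is not smaller than its successor 1400, so it is revised to 1400 - 100 = 1300, giving 1000, 1100, 1300, 1400, 1500. The heuristic does not handle consecutive error points or an erroneous first or last value.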
After discussion, the solution: