[IOTDB-2740] Equal size bucket sampling UDFs: EQUAL_SIZE_BUCKET_RANDOM_SAMPLE, EQUAL_SIZE_BUCKET_AGG_SAMPLE, EQUAL_SIZE_BUCKET_M4_SAMPLE #5518

Merged
merged 18 commits into from
Apr 22, 2022

Contributor

@ZhanGHanG9991 ZhanGHanG9991 commented Apr 13, 2022

Description

Add EqualBucketSample in UDF


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever would not be obvious
    for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

@ZhanGHanG9991 ZhanGHanG9991 changed the title from "Equalbucketsamplezh" to "Add EqualBucketSample in UDF" Apr 13, 2022
@ZhanGHanG9991 ZhanGHanG9991 marked this pull request as ready for review April 16, 2022 02:47
Comment on lines 58 to 62
double sum = 0;
for (int i = 0; i < windowSize; i++) {
  sum += rowWindow.getRow(i).getInt(0) * 1.0 / windowSize;
}
collector.putDouble(time, sum);
Member

Suggested change
- double sum = 0;
- for (int i = 0; i < windowSize; i++) {
-   sum += rowWindow.getRow(i).getInt(0) * 1.0 / windowSize;
- }
- collector.putDouble(time, sum);
+ double sum = 0;
+ for (int i = 0; i < windowSize; i++) {
+   sum += rowWindow.getRow(i).getInt(0);
+ }
+ collector.putDouble(time, sum / windowSize);

Contributor Author

I have revised this part of the implementation.
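
For readers following the thread, here is a minimal standalone sketch of the averaging pattern the reviewer suggested: accumulate the raw values, then divide once at the end. The class name `BucketMeanSketch` and the plain `int[]` standing in for the UDF's row window are assumptions made for illustration; this is not the code that was merged.

```java
// Sketch only: a plain int[] stands in for the UDF row window (assumption).
final class BucketMeanSketch {

  /** Mean of a non-empty window: accumulate first, divide once at the end. */
  static double mean(int[] window) {
    if (window.length == 0) {
      throw new IllegalArgumentException("window must not be empty");
    }
    double sum = 0;
    for (int value : window) {
      sum += value; // no per-element division
    }
    return sum / window.length;
  }

  public static void main(String[] args) {
    System.out.println(mean(new int[] {1, 2, 3}));
  }
}
```

Dividing once replaces one floating-point division per element with a single division per window, so the per-element rounding steps disappear as well.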


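#### Equal Size Bucket Aggregation Sampling

Samples the input series using the aggregation sampling method. The user must additionally provide an aggregation function parameter, namely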
Member

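Please describe the output timestamp behavior for each bucket (the timestamp of each bucket's sampled output is the timestamp of the first point in that bucket).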

Contributor Author

OK, the documentation has been updated.
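
The timestamp rule discussed above (each bucket's output carries the timestamp of the bucket's first point) can be illustrated with a small standalone sketch. Everything here is hypothetical: `EqualSizeBucketAvgSketch`, `Sample`, the parallel `times`/`values` arrays, the choice of average as the aggregation, and the handling of a shorter trailing bucket are assumptions, not the merged UDF.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: parallel arrays stand in for the input series (assumption).
final class EqualSizeBucketAvgSketch {

  /** One output point: bucket timestamp plus aggregated value. */
  static final class Sample {
    final long time;
    final double value;

    Sample(long time, double value) {
      this.time = time;
      this.value = value;
    }
  }

  /** Splits the series into buckets of bucketSize points and averages each bucket. */
  static List<Sample> sample(long[] times, int[] values, int bucketSize) {
    if (bucketSize <= 0 || times.length != values.length) {
      throw new IllegalArgumentException("invalid input");
    }
    List<Sample> out = new ArrayList<>();
    for (int start = 0; start < values.length; start += bucketSize) {
      // Assumption: a trailing bucket may hold fewer than bucketSize points.
      int end = Math.min(start + bucketSize, values.length);
      double sum = 0;
      for (int i = start; i < end; i++) {
        sum += values[i];
      }
      // Output timestamp is the timestamp of the bucket's first point.
      out.add(new Sample(times[start], sum / (end - start)));
    }
    return out;
  }

  public static void main(String[] args) {
    long[] times = {10, 20, 30, 40, 50};
    int[] values = {1, 3, 5, 7, 9};
    for (Sample s : sample(times, values, 2)) {
      System.out.println(s.time + " -> " + s.value);
    }
  }
}
```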

Comment on lines 348 to 355
double avg = 0, sum = 0;
for (int i = 0; i < windowSize; i++) {
  avg += rowWindow.getRow(i).getInt(0) * 1.0 / windowSize;
}
for (int i = 0; i < windowSize; i++) {
  sum += (rowWindow.getRow(i).getInt(0) - avg) * (rowWindow.getRow(i).getInt(0) - avg);
}
collector.putDouble(time, sum / windowSize);
Member

Suggested change
- double avg = 0, sum = 0;
- for (int i = 0; i < windowSize; i++) {
-   avg += rowWindow.getRow(i).getInt(0) * 1.0 / windowSize;
- }
- for (int i = 0; i < windowSize; i++) {
-   sum += (rowWindow.getRow(i).getInt(0) - avg) * (rowWindow.getRow(i).getInt(0) - avg);
- }
- collector.putDouble(time, sum / windowSize);
+ double avg = 0, sum = 0;
+ for (int i = 0; i < windowSize; i++) {
+   avg += rowWindow.getRow(i).getInt(0);
+ }
+ avg /= windowSize;
+ for (int i = 0; i < windowSize; i++) {
+   // delta must be a double: avg is a double, so an int delta would not compile
+   double delta = rowWindow.getRow(i).getInt(0) - avg;
+   sum += delta * delta;
+ }
+ collector.putDouble(time, sum / windowSize);

Contributor Author

I have revised this part of the implementation.
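
As a reference for the variance discussion, here is a minimal standalone sketch of the two-pass population variance: the first pass computes the mean, the second sums squared deviations, and the result is divided by the window size. The class name `BucketVarianceSketch` and the `int[]` input are assumptions for illustration. The deviation is held in a `double`, because the mean is generally fractional and Java does not allow assigning an `int - double` expression to an `int` without a cast.

```java
// Sketch only: a plain int[] stands in for the UDF row window (assumption).
final class BucketVarianceSketch {

  /** Population variance of a non-empty window, computed in two passes. */
  static double variance(int[] window) {
    if (window.length == 0) {
      throw new IllegalArgumentException("window must not be empty");
    }
    // Pass 1: mean, dividing once after accumulation.
    double avg = 0;
    for (int value : window) {
      avg += value;
    }
    avg /= window.length;

    // Pass 2: sum of squared deviations from the mean.
    double sum = 0;
    for (int value : window) {
      double delta = value - avg; // double, since avg may be fractional
      sum += delta * delta;
    }
    return sum / window.length;
  }

  public static void main(String[] args) {
    System.out.println(variance(new int[] {2, 4, 4, 4, 5, 5, 7, 9}));
  }
}
```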

@SteveYurongSu SteveYurongSu changed the title from "Add EqualBucketSample in UDF" to "[IOTDB-2740] Equal size bucket sampling UDFs: EQUAL_SIZE_BUCKET_RANDOM_SAMPLE, EQUAL_SIZE_BUCKET_AGG_SAMPLE, EQUAL_SIZE_BUCKET_M4_SAMPLE" Apr 21, 2022
Member

@SteveYurongSu SteveYurongSu left a comment

LGTM

@SteveYurongSu SteveYurongSu merged commit dc344e5 into apache:master Apr 22, 2022
xinzhongtianxia pushed a commit to xinzhongtianxia/iotdb that referenced this pull request Apr 24, 2022
* remotes/upstream/master:
  Serialize measurement schema of insert node to wal entry (apache#5638)
  filter non schemaRegionDir (apache#5640)
  [IOTDB-2976] Add English and Chinese docs for count devices and count storage groups (apache#5635)
  change jenkins timeout from 2 hours to 3 hours
  [IOTDB-2740] Equal size bucket sampling UDFs: EQUAL_SIZE_BUCKET_RANDOM_SAMPLE, EQUAL_SIZE_BUCKET_AGG_SAMPLE, EQUAL_SIZE_BUCKET_M4_SAMPLE (apache#5518)
  Fix the issue that EndTime in FragmentInstanceContext is not set (apache#5636)
  fix concurrent bug of CachedMNodeContainer.putIfAbsent (apache#5632)
  [IOTDB-2880] Fix NPE occured in ci test (apache#5634)
  Fix CI (apache#5639)
  Add ColumnMerger to merge multipul input columns of same sensor into one column (apache#5630)
  Add block cancel when GetBlockTask throws exception (apache#5628)
  fix the bug when matching multi-wildcard in pattern tree (apache#5631)
  [IOTDB-2835]Fix empty page in selfcheck method of TsFileSequenceReader (apache#5552)
  Add FragmentInstanceStateMachine for FragmentInstance State change (apache#5615)
  [IOTDB-2880] Fix import check style (apache#5629)
  [IOTDB-2971] Fix sink handle memory leak (apache#5626)
  [rocksdb] updated the interface support (apache#5625)
  [IOTDB-2970] Code style: Avoid wildcard imports (apache#5622)
  [IOTDB-2880]Add procedure framework (apache#5477)
  [rocksdb] add rocksdb properties (apache#5588)

# Conflicts:
#	server/src/main/java/org/apache/iotdb/db/mpp/sql/planner/LocalExecutionPlanner.java
@ZhanGHanG9991 ZhanGHanG9991 deleted the equalbucketsamplezh branch April 26, 2022 07:18