Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft RFC for support nullable_number_type. #4158

Closed
wants to merge 39 commits into from
Closed

Draft RFC for support nullable_number_type. #4158

wants to merge 39 commits into from

Conversation

wangzhen11aaa
Copy link
Contributor

Signed-off-by: wang.zhen wangzhenaaa7@gmail.com

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Draft RFC for support nullable_number_type.

Changelog

  • Bug Fix

Related Issues

#4079

Fixes #issue

Test Plan

Unit Tests

Stateless Tests

Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
@databend-bot databend-bot added the pr-bugfix this PR patches a bug in codebase label Feb 15, 2022
@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@vercel
Copy link

vercel bot commented Feb 15, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/databend/databend/FvAHo2NUBKXEvqtaEnFAZjpFfEcR
✅ Preview: https://databend-git-fork-wangzhen11aaa-fsupportnullabl-9f5602-databend.vercel.app

[Deployment for 581e339 canceled]

@wangzhen11aaa
Copy link
Contributor Author

@sundy-li I have wrote part of the process codes, is this the idea right?

Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
/// Type promote to the nearest higher type.
/// UInt8 -> UInt16 -> UInt32 -> UInt64 -> bitmap<128>
/// Int8 -> Int16 -> Int32 -> Int64 -> bitmap<128>
pub(crate) fn _type_promotion(column: &ColumnRef, _datatype: TypeID) -> Option<TypeID> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have construct_numeric_type && numeric_byte_size.

Copy link
Contributor Author

@wangzhen11aaa wangzhen11aaa Feb 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have construct_numeric_type && numeric_byte_size.
oops, duplicated ..
But, I have a question. IMHO,
numeric_byte_size is some method bound with concrete type, But this method returns a dyn trait. The number_byte_size method will be used via a type downcast, which I think is not a good solution. what?do you think?
@sundy-li

@mergify
Copy link
Contributor

mergify bot commented Feb 15, 2022

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
Signed-off-by: wang.zhen <wangzhenaaa7@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Feb 20, 2022

Codecov Report

Merging #4158 (581e339) into main (8d7e856) will decrease coverage by 0%.
The diff coverage is 16%.

Impacted file tree graph

@@          Coverage Diff           @@
##            main   #4158    +/-   ##
======================================
- Coverage     60%     60%    -1%     
======================================
  Files        804     804            
  Lines      40918   41062   +144     
======================================
+ Hits       24887   24908    +21     
- Misses     16031   16154   +123     
Impacted Files Coverage Δ
common/datavalues/src/columns/group_hash.rs 36% <0%> (-24%) ⬇️
common/datavalues/src/columns/series.rs 67% <0%> (-8%) ⬇️
common/datavalues/src/macros.rs 68% <ø> (ø)
...ommon/datavalues/src/types/deserializations/mod.rs 0% <0%> (ø)
.../datavalues/src/types/deserializations/nullable.rs 0% <0%> (ø)
...ry/src/pipelines/transforms/group_by/aggregator.rs 83% <ø> (ø)
...datablocks/src/kernels/data_block_group_by_hash.rs 58% <25%> (-24%) ⬇️
...mmon/datablocks/src/kernels/data_block_group_by.rs 46% <50%> (-1%) ⬇️
common/management/src/cluster/cluster_mgr.rs 77% <0%> (-2%) ⬇️
metasrv/src/network.rs 97% <0%> (-2%) ⬇️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d7e856...581e339. Read the comment docs.

} else {
column.data_type()
};
if _typ.data_type_id().is_integer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to rename _typ


/// Init the nullable part's offset in bytes. It follows the values part.
#[inline]
fn init_nullable_offset_via_fields(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this function return Result<usize>

Copy link
Contributor Author

@wangzhen11aaa wangzhen11aaa Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not make this function return Result<usize>

I see some codes use this kind of initialization as c style. Using Result makes codes more healthy, it is a better way.

@@ -36,6 +37,17 @@ pub trait TypeDeserializer: Send + Sync {
fn de(&mut self, reader: &mut &[u8]) -> Result<()>;
fn de_default(&mut self);
fn de_batch(&mut self, reader: &[u8], step: usize, rows: usize) -> Result<()>;
fn de_batch_with_nullable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only for nullable, so it is not must to put this into trait.

Copy link
Contributor Author

@wangzhen11aaa wangzhen11aaa Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only for nullable, so it should not to put this into trait.
Yes, your idea is valuable.

// todo, We should get the length of missed bytes which caused
// by self.inner.de method.
if reader[null_offset - 1] & 1 == 1 {
self.bitmap.push(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.bitmap.push(reader[null_offset - 1] & 1 != 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.bitmap.push(reader[null_offset - 1] & 1 != 1)

That's cool.

@@ -31,9 +32,17 @@ impl DataBlock {
let mut group_key_len = 0;
for col in column_names {
let column = block.try_column_by_name(col)?;
let typ = column.data_type();
let typ = if column.is_nullable() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use remove_nullable directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use remove_nullable directly.

remove_nullable is c~.

remove_nullable(&column.data_type())
} else {
column.data_type()
};
if typ.data_type_id().is_integer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can support is_numeric.

}

#[inline]
fn check_group_columns_has_nullable(group_columns: &[&ColumnRef]) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group_columns.any(|c| c.is_nullable())


/// Init the nullable part's offset in bytes. It follows the values part.
#[inline]
fn init_nullable_offset(group_keys: &[&ColumnRef]) -> Result<usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group_keys.iter().map(|c| c.data_type().numeric_byte_size().unwrap()).sum()

@@ -305,3 +427,43 @@ fn build(
}
Ok(())
}

#[inline]
fn build_keys_with_nullable_column(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_keys_with_nullable_column build merge into one method.

}
Ok(group_keys)
}

fn group_by_get_indices(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove group_by && group_by_get_indices

if size == mem_size {
if col.is_nullable() {
let writer = unsafe { writer.add(*offsize) };
Series::fixed_hash_with_nullable(col, writer, step, *null_offset)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dowcast to Nullablecolumn and apply this method.

@@ -36,6 +59,27 @@ impl Series {
})
}

pub fn fixed_hash_with_nullable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@@ -138,3 +193,30 @@ impl GroupHash for StringColumn {
Ok(())
}
}

impl GroupHash for NullableColumn {
fn fixed_hash_with_nullable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

column: &Arc<dyn Column>,
bitmap: &Bitmap,
step: usize,
null_offset: usize,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge into fixed_hash.

Option<(&Bitmap, usize)>

}
Ok(())
}

pub fn fixed_hash(column: &ColumnRef, ptr: *mut u8, step: usize) -> Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add Option<&Bitmap, usize>

@@ -46,6 +47,23 @@ impl TypeDeserializer for NullableDeserializer {
Ok(())
}

fn de_batch_with_nullable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Donot put it into trait.

for row in 0..rows {
let mut reader = &reader[step * row..];
self.inner.de(&mut reader)?;
self.bitmap.push(reader[*null_offset - type_size] & 1 != 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!=1 ---> = 0

@sundy-li
Copy link
Member

sundy-li commented Mar 5, 2022

Continue in #4340

@sundy-li sundy-li closed this Mar 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-review pr-bugfix this PR patches a bug in codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants