Skip to content

Conversation

@waynexia
Copy link
Member

@waynexia waynexia commented Dec 7, 2020

This PR add supports to take() on FixedSizeListArray by implementing following two methods:

  • take_value_indices_from_fixed_size_list(), which calculates indeces of child array for take(). And
  • take_fixed_size_list(), which constructs result list array.

And removed a dynamic cast inside take_value_indices_from_list() by the way.

Signed-off-by: wayne <i@waynest.com>
Signed-off-by: wayne <i@waynest.com>
@github-actions
Copy link

github-actions bot commented Dec 7, 2020

@waynexia
Copy link
Member Author

Hi @alamb @jorgecarleitao, could you please take a look at this:wink:. Thanks!

The purpose of this pr is to support list_sort in #8856.

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a small comment to improve coverage, but the implementation looks really good!

Some(vec![Some(3), Some(4), Some(5)]),
Some(vec![Some(6), Some(7), Some(8)]),
],
vec![2, 1, 0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible / useful to add a test that uses some null indices?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate your review!

The third case used null indices (2 in [3, 2, 1, 2, 0]) if you mean to null value inside a list.

BTW I also changed the first case to include some nulls in sub-array in the following commit.

Signed-off-by: wayne <i@waynest.com>
@waynexia
Copy link
Member Author

It looks like CI jammed and the coverage report doesn't show up so I ran one on my machine. The report is posted below.

coverage report

  149|       |/// Takes/filters a fixed size list array's inner data using the offsets of the list array.
  150|       |pub(super) fn take_value_indices_from_fixed_size_list(
  151|       |    list: &FixedSizeListArray,
  152|       |    indices: &PrimitiveArray<Int32Type>,
  153|       |    length: <Int32Type as ArrowPrimitiveType>::Native,
  154|       |) -> PrimitiveArray<Int32Type> {
  155|      3|    let mut values = vec![];
  156|       |
  157|     11|    for i in 0..indices.len() {
  158|     11|        if indices.is_valid(i) {
  159|     11|            let index = indices.value(i) as usize;
  160|     11|            let start = list.value_offset(index);
  161|       |
  162|     11|            values.extend(start..start + length);
  163|     11|        }
  164|     11|    }
  165|       |
  166|      3|    PrimitiveArray::<Int32Type>::from(values)
  167|      3|}
  536|       |/// `take` implementation for `FixedSizeListArray`
  537|       |///
  538|       |/// Calculates the index and indexed offset for the inner array,
  539|       |/// applying `take` on the inner array, then reconstructing a list array
  540|       |/// with the indexed offsets
  541|       |fn take_fixed_size_list<IndexType>(
  542|       |    values: &ArrayRef,
  543|       |    indices: &PrimitiveArray<IndexType>,
  544|       |    length: <Int32Type as ArrowPrimitiveType>::Native,
  545|       |) -> Result<ArrayRef>
  546|       |where
  547|       |    IndexType: ArrowNumericType,
  548|       |    IndexType::Native: ToPrimitive,
  549|       |{
  550|      3|    let indices = indices
  551|      3|        .as_any()
  552|      3|        .downcast_ref::<PrimitiveArray<Int32Type>>()
  553|      3|        .expect("FixedSizeListArray's indices type should be 32-bit signed integer");
  554|      3|    let list = values
  555|      3|        .as_any()
  556|      3|        .downcast_ref::<FixedSizeListArray>()
  557|      3|        .unwrap();
  558|      3|
  559|      3|    let list_indices = take_value_indices_from_fixed_size_list(list, indices, length);
  560|      3|    let taken = take_impl::<Int32Type>(&list.values(), &list_indices, None)?;
  561|       |
  562|       |    // determine null count and null buffer, which are a function of `values` and `indices`
  563|      3|    let mut null_count = 0;
  564|      3|    let num_bytes = bit_util::ceil(indices.len(), 8);
  565|      3|    let mut null_buf = MutableBuffer::new(num_bytes).with_bitset(num_bytes, true);
  566|      3|    let null_slice = null_buf.data_mut();
  567|       |
  568|     11|    for i in 0..indices.len() {
  569|     11|        if !indices.is_valid(i) || list.is_null(indices.value(i) as usize) {
  570|      2|            bit_util::unset_bit(null_slice, i);
  571|      2|            null_count += 1;
  572|      9|        }
  573|     11|    }
  574|       |
  575|      3|    let list_data = ArrayDataBuilder::new(list.data_type().clone())
  576|      3|        .len(indices.len())
  577|      3|        .null_count(null_count)
  578|      3|        .null_bit_buffer(null_buf.freeze())
  579|      3|        .offset(0)
  580|      3|        .add_child_data(taken.data())
  581|      3|        .build();
  582|      3|
  583|      3|    Ok(Arc::new(FixedSizeListArray::from(list_data)))
  584|      3|}

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- I reviewed the test cases carefully and they looked good to me. Thanks @waynexia

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants