Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add rpad and lpad expressions #2157

Merged
merged 11 commits into from
May 7, 2024

Conversation

murex971
Copy link
Contributor

@murex971 murex971 commented Apr 19, 2024

Resolves #1931
Resolves #1930

@github-actions github-actions bot added enhancement New feature or request documentation Improvements or additions to documentation labels Apr 19, 2024
@jaychia jaychia requested a review from kevinzwang April 20, 2024 20:37
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work so far! Biggest items of feedback are just in properly working with string characters instead of bytes.

src/daft-core/src/series/ops/utf8.rs Outdated Show resolved Hide resolved
src/daft-dsl/src/functions/utf8/rpad.rs Outdated Show resolved Hide resolved
src/daft-dsl/src/functions/utf8/rpad.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
tests/series/test_utf8_ops.py Show resolved Hide resolved
daft/expressions/expressions.py Show resolved Hide resolved
@murex971
Copy link
Contributor Author

@kevinzwang addressed the feedback.
Not sure why the tests are filing though, builds and tests running fine on my local.

@kevinzwang kevinzwang self-requested a review April 23, 2024 17:32
@kevinzwang
Copy link
Member

Not sure why the tests are filing though, builds and tests running fine on my local.

@murex971 the tests are failing because the function is_valid_input_lengths was removed from utf8.rs in the main branch. Could you try merging main into your branch and then resolving the errors that arise from the merged code?

Also pinging @jaychia because he made that change

@kevinzwang
Copy link
Member

I believe the functionality of is_valid_input_lengths was inlined into parse_inputs, you could look into using that function in the kernel instead!

@jaychia
Copy link
Contributor

jaychia commented Apr 24, 2024

I believe the functionality of is_valid_input_lengths was inlined into parse_inputs, you could look into using that function in the kernel instead!

Yes! We made it cleaner and easier to have parse_inputs check both the validity of the input lengths, and also return information about the expected output as a tuple: (is_full_null: bool, expected_length: usize)

Feel free to check out the other kernels for usage!

@murex971
Copy link
Contributor Author

murex971 commented Apr 25, 2024

Yes! We made it cleaner and easier to have parse_inputs check both the validity of the input lengths, and also return information about the expected output as a tuple: (is_full_null: bool, expected_length: usize)

@jaychia
The implemented parse_inputs function takes in an array of input arrays (other_arrs), assuming they all would have same data types.

fn parse_inputs<T>(
    self_arr: &Utf8Array,
    other_arrs: &[&DataArray<T>],
) 

In the case of rpad one input is string and another is of numeric type, so the function cannot be directly use here, however I put the same logic inside the rpad function.
I am thinking of refactoring parse_inputs functionality by allowing it to take 2 different inputs of any type, this would incorporate all the case (would handle it in a different PR). Thoughts ?

cc: @kevinzwang

@murex971 murex971 changed the title [FEAT] Add rpad expression for string [FEAT] Add rpad and lpad expressions May 2, 2024
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the delay. Happy to see these improvements to the code, and its awesome that you were also able to add lpad!

Overall, the code looks good to go, just a few small comments and suggestions. Make sure to also take a look at some of the unresolved comments from the previous review. Great work!

src/daft-dsl/src/functions/utf8/rpad.rs Outdated Show resolved Hide resolved
src/daft-dsl/src/functions/utf8/lpad.rs Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go ahead and approve it, should be ready to merge once these final things are resolved

Copy link
Member

@kevinzwang kevinzwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to go! Just one comment about the refactored pad function

Comment on lines 864 to 867
Ok(match placement {
PadPlacement::Left => fillchar.chain(val.chars()).collect(),
PadPlacement::Right => val.chars().chain(fillchar).collect(),
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit: I would recommend trying to do this match outside of the iteration so that we don't have an extra branching statement in the hot path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@kevinzwang
Copy link
Member

Amazing, I'll merge it in!

@kevinzwang kevinzwang merged commit d035454 into Eventual-Inc:main May 7, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[EXPRESSIONS] .str.rpad [EXPRESSIONS] .str.lpad
3 participants