Skip to content

Bug fix, First accumulator multiple batch aware#6503

Merged
mustafasrepo merged 2 commits intoapache:mainfrom
synnada-ai:feature/first_bug_fix
Jun 1, 2023
Merged

Bug fix, First accumulator multiple batch aware#6503
mustafasrepo merged 2 commits intoapache:mainfrom
synnada-ai:feature/first_bug_fix

Conversation

@mustafasrepo
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #6502.

Rationale for this change

Previously first_value accumulator was giving the first value of the last batch received as result. It wasn't aware of whether first_value was set at previous batches or not. Hence when working with multiple batch data the results produced were wrong. This PR fixes this problem.

What changes are included in this PR?

Are these changes tested?

Yes, new unit test is added for this new behavior.

Are there any user-facing changes?

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label May 31, 2023
first: ScalarValue,
// At the beginning, `is_set` is `false`, this means `first` is not seen yet.
// Once we see (`is_set=true`) first value, we do not update `first`.
is_set: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential way that might let the compiler check that is_set was updated correctly would be something like:

struct FirstValueAccumulator {
  /// None until the data has been seen
  first: Option<ScalarValue>
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have considered this option, however, during evaluate call if first is None. We need to return NULL for appropriate type, such as ScalarValue::Int64(None) if type is Int64. However, datatype is only available during initialization. If we want to use Option for first, we may need to store data type also. Hence I opted for this approach

@alamb
Copy link
Copy Markdown
Contributor

alamb commented May 31, 2023

Thanks @mustafasrepo

@mustafasrepo mustafasrepo merged commit e8bce3f into apache:main Jun 1, 2023
@mustafasrepo mustafasrepo deleted the feature/first_bug_fix branch June 2, 2023 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FIRST accumulator gives wrong result when fed with multiple batches

3 participants