Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] Simplify IPC writer for dense unions #32613

Closed
asfimport opened this issue Aug 8, 2022 · 2 comments · Fixed by #33822
Closed

[C++] Simplify IPC writer for dense unions #32613

asfimport opened this issue Aug 8, 2022 · 2 comments · Fixed by #33822

Comments

@asfimport
Copy link
Collaborator

asfimport commented Aug 8, 2022

In ARROW-10580 we fixed the Arrow C++ implementation so that dense union offsets are always (non-strictly) monotonic for a given child, as mandated by the spec.

The IPC writer implementation, however, still assumes that dense union offsets may be in any order:

// Offsets may not be ascending, so we need to find out the start offset
// for each child
for (int64_t i = 0; i < length; ++i) {
const uint8_t code = type_codes[i];
if (child_offsets[code] == -1) {
child_offsets[code] = unshifted_offsets[i];
} else {
child_offsets[code] = std::min(child_offsets[code], unshifted_offsets[i]);
}
}

This can probably be simplified, making it slightly less costly to emit a sliced union array.

Reporter: Antoine Pitrou / @pitrou

Related issues:

Note: This issue was originally created as ARROW-17339. Please see the migration documentation for further details.

@asfimport
Copy link
Collaborator Author

Antoine Pitrou / @pitrou:
Also for the record the Go code for this is as follows:

	for i, c := range codes {
		if offsets[c] == -1 {
			// offsets are guaranteed to be increasing according to the spec
			// so the first offset we find for a child is the initial offset
			// and will become the "0" for this child.
			offsets[c] = unshiftedOffsets[i]
			shiftedOffsets[i] = 0
		} else {
			shiftedOffsets[i] = unshiftedOffsets[i] - offsets[c]
		}
		lengths[c] = maxI32(lengths[c], shiftedOffsets[i]+1)
	}

@rtadepalli
Copy link
Contributor

Hi, I'd like to work on this if it is still available. Thanks.

@kou kou closed this as completed in #33822 Feb 2, 2023
kou pushed a commit that referenced this issue Feb 2, 2023
JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: #32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: #32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@kou kou added this to the 12.0.0 milestone Feb 2, 2023
sjperkins pushed a commit to sjperkins/arrow that referenced this issue Feb 10, 2023
)

JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: apache#32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
gringasalpastor pushed a commit to gringasalpastor/arrow that referenced this issue Feb 17, 2023
)

JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: apache#32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
fatemehp pushed a commit to fatemehp/arrow that referenced this issue Feb 24, 2023
)

JIRA: https://issues.apache.org/jira/browse/ARROW-17339 
Closes: apache#32613

### Rationale for this change
Dense union offsets are always non-strictly monotonic for any given child as mandated by the spec, The C++ implementation still assumes that the offsets may be in any order. This can be improved.

### What changes are included in this PR?

Just a change to eliminate looping over the size of a `DenseUnionArray` twice.

### Are these changes tested?

I am not functionally changing anything. All changes respect the spec, and behavior should be 1:1 with the existing implementation. I believe existing tests should suffice.

### Are there any user-facing changes?

There are no user facing changes for this.

* Closes: apache#32613

Lead-authored-by: Ramasai Tadepalli <ramasai.tadepalli+3108@gmail.com>
Co-authored-by: Ramasai <ramasai.tadepalli+3108@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants