Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java][Vector]UnionListWriter is unable write some types #35352

Closed
henrymai opened this issue Apr 27, 2023 · 0 comments · Fixed by #35353
Closed

[Java][Vector]UnionListWriter is unable write some types #35352

henrymai opened this issue Apr 27, 2023 · 0 comments · Fixed by #35353

Comments

@henrymai
Copy link
Contributor

Describe the bug, including details regarding any error messages, version, and platform.

UnionListWriter ends up throwing an exception when trying to use the UnionListWriter as follows:

[...]

org.apache.arrow.vector.complex.writer.FieldWriter writer = vector.getWriter();
writer.allocate();
writer.startList();
writer.timeStampMilliTZ().writeTimeStampMilliTZ(1000L);
// ^ exception thrown here:
// IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

[...]

This also occurs for types FixedSizeBinary and Duration.

Component(s)

Java

henrymai added a commit to henrymai/arrow that referenced this issue Apr 27, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 28, 2023
…me types

Fix the UnionListWriter template to correctly generate the methods to
allow writing to the types that were failing before.

Without the fix, the new test added would have the following kinds of failures:
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.

Since this fix is a generalized change, it fixes the same problem with writing
for a bunch of other types like fixedSizeBinary and duration.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 29, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 29, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 29, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
henrymai added a commit to henrymai/arrow that referenced this issue Apr 29, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
henrymai added a commit to henrymai/arrow that referenced this issue May 11, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
henrymai added a commit to henrymai/arrow that referenced this issue May 11, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
henrymai added a commit to henrymai/arrow that referenced this issue May 11, 2023
"semi complex" types like TimeStamp*TZ, Duration, and FixedSizeBinary
were missing implementations in UnionListWriter, UnionVector,
UnionReader and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like ListVectors, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
*Vectors classes for these types:
  - The get methods did not set fields like TimeStampMilliTZHolder.timezone,
    DurationHolder.unit, FixedSizeBinaryHolder.byteWidth.

  - The set methods did not all validate that those fields matched what
    the vector's ArrowType was set to. For example TimeStampMilliTZHolder.timezone should
    match ArrowType.Timestamp.timezone on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized ArrowTypes. Instead, users should
use the write methods that take *Holders that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for Decimal have already also been marked as deprecated, for the same
reasoning.
lidavidm pushed a commit that referenced this issue May 11, 2023
### Rationale for this change

"semi complex" types like `TimeStamp*TZ`, `Duration`, and `FixedSizeBinary`
were missing implementations in `UnionListWriter`, `UnionVector`,
`UnionReader` and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like `ListVector`s, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
`*Vector`s classes for these types:
  - The get methods did not set fields like `TimeStampMilliTZHolder.timezone`,
    `DurationHolder.unit`, `FixedSizeBinaryHolder.byteWidth`.

  - The set methods did not all validate that those fields matched what
    the vector's `ArrowType` was set to. For example `TimeStampMilliTZHolder.timezone` should
    match `ArrowType.Timestamp.timezone` on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized `ArrowTypes`. Instead, users should
use the write methods that take `*Holders` that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for `Decimal` have already also been marked as deprecated, for the same
reasoning.

### What changes are included in this PR?

- Various template changes. See this gist for a diff of the generated files: https://gist.github.com/henrymai/03b0f5a4165cd9822aa797c89bbd74e9
   - Note that the diff for the ComplexWriter.java generated ones is not included since that change is easy to see from just the template.

- Additional tests to verify that this is fixed.

### Are these changes tested?

Yes, added unit tests and they pass.

### Are there any user-facing changes?

Yes, users will now be able to write to types like TimeStamp*TZ, Duration, and FixedSizeBinary on ListVector, MapVector, StructVector due to added implementations for these types on UnionListVector, UnionWriter.

This also marks the non-holder write interfaces for these types as `@ Deprecated`.
The interfaces themselves have not been removed, so this part is not a breaking change.
Also as mentioned above, this follows in the footsteps of what `Decimal*` has already done, where they marked the non holder write interfaces as `@ Deprecated`.

Additionally another user visible change is that when reading these types into a holder all of the fields are now populated. For example `TimeStamp*TZHolder.timezone` is now populated when it was not before.

**This PR includes breaking changes to public APIs.**

We now throw an exception when some of the holder fields that correspond to the Vector's ArrowType parameters do not match.
This is a breaking change because previously this would silently accept the writes but was actually incorrect, since the element written does not correspond to the right ArrowType variant due to mismatching parameters (like ArrowType.Timestamp.unit, ArrowType.Duration.unit, and ArrowType.FixedSizeBinary.byteWidth).

* Closes: #35352

Authored-by: Henry Mai <henrymai@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
@lidavidm lidavidm added this to the 13.0.0 milestone May 11, 2023
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…35353)

### Rationale for this change

"semi complex" types like `TimeStamp*TZ`, `Duration`, and `FixedSizeBinary`
were missing implementations in `UnionListWriter`, `UnionVector`,
`UnionReader` and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like `ListVector`s, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
`*Vector`s classes for these types:
  - The get methods did not set fields like `TimeStampMilliTZHolder.timezone`,
    `DurationHolder.unit`, `FixedSizeBinaryHolder.byteWidth`.

  - The set methods did not all validate that those fields matched what
    the vector's `ArrowType` was set to. For example `TimeStampMilliTZHolder.timezone` should
    match `ArrowType.Timestamp.timezone` on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized `ArrowTypes`. Instead, users should
use the write methods that take `*Holders` that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for `Decimal` have already also been marked as deprecated, for the same
reasoning.

### What changes are included in this PR?

- Various template changes. See this gist for a diff of the generated files: https://gist.github.com/henrymai/03b0f5a4165cd9822aa797c89bbd74e9
   - Note that the diff for the ComplexWriter.java generated ones is not included since that change is easy to see from just the template.

- Additional tests to verify that this is fixed.

### Are these changes tested?

Yes, added unit tests and they pass.

### Are there any user-facing changes?

Yes, users will now be able to write to types like TimeStamp*TZ, Duration, and FixedSizeBinary on ListVector, MapVector, StructVector due to added implementations for these types on UnionListVector, UnionWriter.

This also marks the non-holder write interfaces for these types as `@ Deprecated`.
The interfaces themselves have not been removed, so this part is not a breaking change.
Also as mentioned above, this follows in the footsteps of what `Decimal*` has already done, where they marked the non holder write interfaces as `@ Deprecated`.

Additionally another user visible change is that when reading these types into a holder all of the fields are now populated. For example `TimeStamp*TZHolder.timezone` is now populated when it was not before.

**This PR includes breaking changes to public APIs.**

We now throw an exception when some of the holder fields that correspond to the Vector's ArrowType parameters do not match.
This is a breaking change because previously this would silently accept the writes but was actually incorrect, since the element written does not correspond to the right ArrowType variant due to mismatching parameters (like ArrowType.Timestamp.unit, ArrowType.Duration.unit, and ArrowType.FixedSizeBinary.byteWidth).

* Closes: apache#35352

Authored-by: Henry Mai <henrymai@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…35353)

### Rationale for this change

"semi complex" types like `TimeStamp*TZ`, `Duration`, and `FixedSizeBinary`
were missing implementations in `UnionListWriter`, `UnionVector`,
`UnionReader` and other associated classes.

This patch adds these missing methods so that these types can now be
written to things like `ListVector`s, whereas before it would throw an
exception because the methods were just not implemented.

For example, without this patch, one of the new tests added would fail:
```
    TestListVector.testWriterGetTimestampMilliTZField:913 ? IllegalArgument You tried to write a TimeStampMilliTZ type when you are using a ValueWriter of type UnionListWriter.
```

There are also fixes for get and set methods for holders for the respective
`*Vector`s classes for these types:
  - The get methods did not set fields like `TimeStampMilliTZHolder.timezone`,
    `DurationHolder.unit`, `FixedSizeBinaryHolder.byteWidth`.

  - The set methods did not all validate that those fields matched what
    the vector's `ArrowType` was set to. For example `TimeStampMilliTZHolder.timezone` should
    match `ArrowType.Timestamp.timezone` on the vector and should throw if it doesn't.
    Otherwise users would never get a signal that there is anything wrong with their code writing
    these holders with mismatching values.

This patch additionally marks some of the existing interfaces for writing these
"semi complex" types as deprecated, because they do not actually provide enough
context to properly write these parameterized `ArrowTypes`. Instead, users should
use the write methods that take `*Holders` that do provide enough context for the
writers to properly write these types. Also note that the equivalent write
methods for `Decimal` have already also been marked as deprecated, for the same
reasoning.

### What changes are included in this PR?

- Various template changes. See this gist for a diff of the generated files: https://gist.github.com/henrymai/03b0f5a4165cd9822aa797c89bbd74e9
   - Note that the diff for the ComplexWriter.java generated ones is not included since that change is easy to see from just the template.

- Additional tests to verify that this is fixed.

### Are these changes tested?

Yes, added unit tests and they pass.

### Are there any user-facing changes?

Yes, users will now be able to write to types like TimeStamp*TZ, Duration, and FixedSizeBinary on ListVector, MapVector, StructVector due to added implementations for these types on UnionListVector, UnionWriter.

This also marks the non-holder write interfaces for these types as `@ Deprecated`.
The interfaces themselves have not been removed, so this part is not a breaking change.
Also as mentioned above, this follows in the footsteps of what `Decimal*` has already done, where they marked the non holder write interfaces as `@ Deprecated`.

Additionally another user visible change is that when reading these types into a holder all of the fields are now populated. For example `TimeStamp*TZHolder.timezone` is now populated when it was not before.

**This PR includes breaking changes to public APIs.**

We now throw an exception when some of the holder fields that correspond to the Vector's ArrowType parameters do not match.
This is a breaking change because previously this would silently accept the writes but was actually incorrect, since the element written does not correspond to the right ArrowType variant due to mismatching parameters (like ArrowType.Timestamp.unit, ArrowType.Duration.unit, and ArrowType.FixedSizeBinary.byteWidth).

* Closes: apache#35352

Authored-by: Henry Mai <henrymai@users.noreply.github.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants