[C#] Exception thrown when creating schema with multiple columns with the same name #34076

Closed
DanTm99 opened this issue Feb 8, 2023 · 2 comments · Fixed by #34125

DanTm99 commented Feb 8, 2023

Describe the bug, including details regarding any error messages, version, and platform.

When creating a Schema (Schema.cs), if the list of fields contains fields that share the same name, an exception is thrown when constructing the fields dictionary.

The dictionary is keyed by field name, so it rejects multiple fields with the same name, even though duplicate field names should be allowed in a schema.

Component(s)

C#

DanTm99 commented Feb 9, 2023

This exception is thrown when constructing the _fieldsDictionary, because the keys for it are the field names and a dictionary cannot have multiple entries with the same key.
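To make the failure mode concrete, here is a minimal sketch of the same kind of name-keyed dictionary construction. It is not the actual code in Schema.cs: `DuplicateKeyDemo` and `ThrowsOnDuplicate` are hypothetical names, and building the dictionary with `ToDictionary` is an assumption made for illustration. It only relies on the standard behavior that `Enumerable.ToDictionary` throws an `ArgumentException` when two elements produce the same key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Apache Arrow.
public static class DuplicateKeyDemo
{
    // Returns true if building a dictionary keyed by field name throws
    // because at least two of the given names are equal.
    public static bool ThrowsOnDuplicate(IEnumerable<string> fieldNames)
    {
        try
        {
            // Key = field name, value = position of the field in the list.
            _ = fieldNames
                .Select((name, index) => new KeyValuePair<string, int>(name, index))
                .ToDictionary(pair => pair.Key, pair => pair.Value);
            return false;
        }
        catch (ArgumentException)
        {
            // ToDictionary rejects a key that is already present.
            return true;
        }
    }
}
```

Calling `DuplicateKeyDemo.ThrowsOnDuplicate(new[] { "id", "value", "id" })` hits the `catch` branch, while a list of unique names does not.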

The dictionary is publicly exposed as IReadOnlyDictionary<string, Field> Fields. Avoiding this exception would require either skipping fields with duplicate names when constructing it (making the dictionary an inaccurate representation of the fields) or replacing the IReadOnlyDictionary<string, Field> with another type (e.g. ILookup<string, Field> or IReadOnlyList<Field>), which would be a breaking change for anyone using Fields directly.
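For comparison, this is roughly how an ILookup<string, Field> would behave with duplicate names. The sketch uses a stand-in `LookupField` class rather than the real Arrow `Field` type, and `LookupDemo.ByName` is a hypothetical helper; only `Enumerable.ToLookup` and `ILookup<TKey, TElement>` are real .NET APIs here.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for a schema field; hypothetical, not the Arrow Field class.
public sealed class LookupField
{
    public LookupField(string name, string typeName)
    {
        Name = name;
        TypeName = typeName;
    }

    public string Name { get; }
    public string TypeName { get; }
}

public static class LookupDemo
{
    // Groups fields by name. Unlike a dictionary, a lookup keeps every field
    // that shares a name, in the order the fields were supplied.
    public static ILookup<string, LookupField> ByName(IEnumerable<LookupField> fields)
        => fields.ToLookup(field => field.Name);
}
```

Indexing the lookup with a name returns all fields with that name, and indexing with a name that is not present returns an empty sequence instead of throwing, so callers no longer get a single `Field` back per name. That difference in shape is why swapping the type would be a breaking change.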

@westonpace @eerhardt thoughts?

westonpace (Member) commented

Yes, that is my understanding. I think maintaining backwards compatibility (by skipping fields) is ok, maybe even preferred, if we can mark the API as deprecated. However, we would need to ensure we don't break anything internally. For example, if we read from an IPC file into a table, slice the table, and then write a new IPC file, it should still have all the fields. This would mean we'd have to update internal APIs to avoid using this deprecated field (though marking it as deprecated would probably lead us to do that anyways).
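A rough sketch of what the skip-and-deprecate approach could look like is below. `SketchField` and `SketchSchema` are hypothetical stand-ins, not the Arrow classes; the property names follow the ones mentioned in this issue, but the bodies are assumptions. In particular, keeping the first field for each duplicated name is an assumption, since the thread only says duplicates are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a schema field; hypothetical, not the Arrow Field class.
public sealed class SketchField
{
    public SketchField(string name) { Name = name; }
    public string Name { get; }
}

// Hypothetical schema type illustrating the idea, not the Arrow Schema class.
public sealed class SketchSchema
{
    private readonly Dictionary<string, SketchField> _fieldsDictionary;

    public SketchSchema(IEnumerable<SketchField> fields)
    {
        // The list and the lookup keep every field, including duplicates.
        FieldsList = fields.ToList();
        FieldsLookup = FieldsList.ToLookup(field => field.Name);

        // The dictionary keeps only the first field per name (assumption),
        // so construction no longer throws on duplicate names.
        _fieldsDictionary = new Dictionary<string, SketchField>();
        foreach (SketchField field in FieldsList)
        {
            if (!_fieldsDictionary.ContainsKey(field.Name))
            {
                _fieldsDictionary.Add(field.Name, field);
            }
        }
    }

    public IReadOnlyList<SketchField> FieldsList { get; }

    public ILookup<string, SketchField> FieldsLookup { get; }

    // Kept for backwards compatibility; may omit fields with duplicate names.
    [Obsolete("Omits fields with duplicate names. Use FieldsList or FieldsLookup.")]
    public IReadOnlyDictionary<string, SketchField> Fields => _fieldsDictionary;
}
```

With this shape, internal code that round-trips a schema (for example read, slice, write) would iterate `FieldsList`, so no field is lost even when the deprecated dictionary is incomplete.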

DanTm99 added a commit to DanTm99/arrow that referenced this issue Feb 17, 2023
westonpace pushed a commit that referenced this issue Mar 3, 2023
Skip fields with duplicate names when building the `_fieldsDictionary` in the Schema constructor.
This allows schemas to have multiple fields with the same name, as the field name is used as the key for this dictionary.

Deprecate the existing `IReadOnlyDictionary<string, Field> Fields` and expose a new `IReadOnlyList<Field> FieldsList` and a new `ILookup<string, Field> FieldsLookup`. Replace usages of `Fields` with `FieldsList` and `FieldsLookup`.
This allows for access to fields that were omitted from the `_fieldsDictionary`.
* Closes: #34076

Lead-authored-by: DanTm99 <danyaal99@hotmail.co.uk>
Co-authored-by: Danyaal Khan <danyaal99@hotmail.co.uk>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
@westonpace westonpace added this to the 12.0.0 milestone Mar 3, 2023