You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We are currently rewriting some of our components to use Arrow and when appending data to the record, we want to basically deduplicate/aggregate the data before adding new rows. For that, we either need to keep track of the previously appended data outside of Arrow, or (and that's what we want to enhance) we read back the data from the underlying builders.
Next, to potentially adding the *Builder Value(i int) * functions to some types we would mostly be interested in reading back values from a *BinaryDictionaryBuilder.
I've managed to read the indices back from the builders, but then failed and stopped trying to read the values from the underlying BinaryMemoTable.
It would be super helpful to read back those strings from the Dictionary's BinaryMemoTable. Is this something we can somehow add? Is this something within the scope of this library?
diff --git a/go/arrow/array/dictionary.go b/go/arrow/array/dictionary.go
index 2409e296c..622dece04 100644
--- a/go/arrow/array/dictionary.go+++ b/go/arrow/array/dictionary.go@@ -1208,6 +1209,54 @@ func (b *BinaryDictionaryBuilder) InsertStringDictValues(arr *String) (err error
return
}
+func (b *BinaryDictionaryBuilder) GetValueIndex(i int) int {+ switch b := b.idxBuilder.Builder.(type) {+ case *Int64Builder:+ return int(b.Value(i))+ case *Uint64Builder:+ return int(b.Value(i))+ case *Float64Builder:+ return int(b.Value(i))+ case *Int32Builder:+ return int(b.Value(i))+ case *Uint32Builder:+ return int(b.Value(i))+ case *Float32Builder:+ return int(b.Value(i))+ case *Int16Builder:+ return int(b.Value(i))+ case *Uint16Builder:+ return int(b.Value(i))+ case *Int8Builder:+ return int(b.Value(i))+ case *Uint8Builder:+ return int(b.Value(i))+ case *TimestampBuilder:+ return int(b.Value(i))+ case *Time32Builder:+ return int(b.Value(i))+ case *Time64Builder:+ return int(b.Value(i))+ case *Date32Builder:+ return int(b.Value(i))+ case *Date64Builder:+ return int(b.Value(i))+ case *DurationBuilder:+ return int(b.Value(i))+ default:+ return -1+ }+}++func (b *BinaryDictionaryBuilder) Value(i int) []byte {+ return []byte{}+}++func (b *BinaryDictionaryBuilder) ValueStr(i int) string {+ //b.memoTable.(*hashing.BinaryMemoTable).+ return ""+}+
type FixedSizeBinaryDictionaryBuilder struct {
dictionaryBuilder
byteWidth int
Component(s)
Go
The text was updated successfully, but these errors were encountered:
@metalmatze this seems like a good idea (though for the dictionary builder getting an index you only have to handle the integral types, int8/16/32/64 and uint8/16/32/64, as all the other types are not valid dictionary index types).
For a binary memo table, since it maintains a binary builder internally and only stores the dictionary indices in the hash table as the values, it should be pretty easy to add a Value method which takes in an index and returns the value by calling b.builder.Value(idx). The other types would be a bit harder and would likely require maintaining some sort of reverse mapping.
Would you be willing to make a PR for this? At least for the binary dictionary types since that would be pretty easy to implement as suggested above.
…rs (#35744)
### Rationale for this change
See #35711
### What changes are included in this PR?
Adding `*BinaryDictionaryBuilder.GetValueIndex(i int) int` to get the value's index in the dictionary. Then two methods to get a `[]byte` and `string` results. `Value(i int) []byte` and `ValueStr(i int) string`.
### Are these changes tested?
So far I have added very little testing. Mostly some sort of integration test to read back values from the BinaryDictionaryBuilder. We should definitely add some test for the `*BinaryMemoTable) Value(i int) []byte`.
### Are there any user-facing changes?
Just additions, no breaking API changes.
cc @ zeroshade
* Closes: #35711
Authored-by: Matthias Loibl <mail@matthiasloibl.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Describe the enhancement requested
We are currently rewriting some of our components to use Arrow and when appending data to the record, we want to basically deduplicate/aggregate the data before adding new rows. For that, we either need to keep track of the previously appended data outside of Arrow, or (and that's what we want to enhance) we read back the data from the underlying builders.
I was able to find several occurrences of reading values from builders.
Next, to potentially adding the
*Builder Value(i int) *
functions to some types we would mostly be interested in reading back values from a*BinaryDictionaryBuilder
.I've managed to read the indices back from the builders, but then failed and stopped trying to read the values from the underlying
BinaryMemoTable
.It would be super helpful to read back those strings from the Dictionary's
BinaryMemoTable
. Is this something we can somehow add? Is this something within the scope of this library?Component(s)
Go
The text was updated successfully, but these errors were encountered: