Prevent infinite loop while writing on CategoricalArrays (#44)
Supporting CategoricalArrays is a bit tricky. Ideally, we could use the
same interface as PooledArrays and just rely on `DataAPI.refarray` and
`DataAPI.refpool`, but alas, a `CategoricalArray` returns a
`CategoricalRefPool` from `DataAPI.refpool`, with `CategoricalValue`
elements. The core issue comes when we try to serialize the pool, which
is a recursive process: recursive until we reach a known "leaf" type, at
which point the recursion stops. Unfortunately, a `CategoricalValue`
isn't a known leaf type, so it's treated as a `StructType`, where each
of its fields is serialized. One of the fields is the
`CategoricalRefPool`, so we get stuck in a never-ending recursive loop
serializing `CategoricalValue`s and `CategoricalRefPool`s.
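
To make the cycle concrete, here is a minimal sketch using toy stand-ins. `ToyPool`, `ToyValue`, `isleaf`, and `walk` are hypothetical names invented for illustration; they are not the real CategoricalArrays types or the actual serialization code. The `maxdepth` cutoff exists only so the demonstration terminates:

```julia
# Hypothetical stand-ins for CategoricalRefPool / CategoricalValue (not the real types).
struct ToyPool
    levels::Vector{String}
end

struct ToyValue
    ref::Int
    pool::ToyPool   # back-reference to the pool that owns this value
end

# Indexing a pool yields wrapped values, not plain strings.
Base.length(p::ToyPool) = length(p.levels)
Base.getindex(p::ToyPool, i::Int) = ToyValue(i, p)

# Known "leaf" types stop the recursion.
isleaf(x) = x isa Union{Number, AbstractString}

# Naive recursive walk: a pool visits its elements, any other struct visits its fields.
# `maxdepth` is an artificial guard so this sketch stops instead of overflowing the stack.
function walk(x, depth=0; maxdepth=20)
    depth > maxdepth && return :gave_up
    isleaf(x) && return :ok
    if x isa ToyPool
        for i in 1:length(x)
            walk(x[i], depth + 1; maxdepth=maxdepth) === :gave_up && return :gave_up
        end
        return :ok
    end
    for name in fieldnames(typeof(x))
        walk(getfield(x, name), depth + 1; maxdepth=maxdepth) === :gave_up && return :gave_up
    end
    return :ok
end

# pool -> value -> pool -> value -> ... never reaches a leaf through the `pool` field.
result = walk(ToyPool(["a", "b"]))
```

Without the depth guard, the walk would keep alternating between the pool and its values until the stack overflows.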

This PR proposes a quick hack where we check the `DataType` name for
`:CategoricalRefPool` or `:CategoricalArray` and if so, just unwrap the
values so the recursion will be broken. It's obviously a little hacky,
but also avoids taking on the always-problematic CategoricalArrays
dependency.
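
The name check can be sketched in isolation as follows. The `ToyCategorical` module and its types are hypothetical stand-ins for a package that is not a dependency, and `iscatpool` and `unwrap_pool` are illustrative helper names that do not appear in the commit; only the `typeof(pool).name.name` comparison and the unwrapping comprehension mirror the diff:

```julia
# Hypothetical stand-in for a package we do not want to depend on.
module ToyCategorical
struct CategoricalRefPool
    levels::Vector{String}
end
struct CategoricalValue
    ref::Int
    pool::CategoricalRefPool
end
Base.length(p::CategoricalRefPool) = length(p.levels)
Base.getindex(p::CategoricalRefPool, i::Int) = CategoricalValue(i, p)
# Toy equivalent of unwrapping a value to its plain level.
Base.get(v::CategoricalValue) = v.pool.levels[v.ref]
end

# Name-based check: compares the type's name as a Symbol, so no type from
# ToyCategorical has to be referenced here.
iscatpool(pool) = typeof(pool).name.name == :CategoricalRefPool

# Replace a categorical-style pool with a plain vector of unwrapped values;
# any other pool passes through untouched.
unwrap_pool(pool) = iscatpool(pool) ? [get(pool[i]) for i = 1:length(pool)] : pool

pool = ToyCategorical.CategoricalRefPool(["low", "high"])
plain = unwrap_pool(pool)   # a Vector{String}, which recursion treats as leaves
```

The trade-off is that the check matches any type that happens to carry that name, which is the price of avoiding the dependency.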
quinnj committed Oct 23, 2020
1 parent 5c207ea commit b109c49
Showing 1 changed file with 10 additions and 1 deletion.
11 changes: 10 additions & 1 deletion src/arraytypes/dictencoding.jl
@@ -96,7 +96,12 @@ function arrowvector(::DictEncodedType, x, i, nl, fi, de, ded, meta; dictencode:
         for i = 1:length(inds)
             @inbounds inds[i] -= 1
         end
-        data = arrowvector(DataAPI.refpool(x), i, nl, fi, de, ded, nothing; dictencode=dictencodenested, dictencodenested=dictencodenested, dictencoding=true, kw...)
+        pool = DataAPI.refpool(x)
+        # horrible hack? yes. better than taking CategoricalArrays dependency? also yes.
+        if typeof(pool).name.name == :CategoricalRefPool
+            pool = [get(pool[i]) for i = 1:length(pool)]
+        end
+        data = arrowvector(pool, i, nl, fi, de, ded, nothing; dictencode=dictencodenested, dictencodenested=dictencodenested, dictencoding=true, kw...)
         encoding = DictEncoding{eltype(data), typeof(data)}(id, data, false)
         de[id] = Lockable(encoding)
     else
@@ -111,7 +116,11 @@ function arrowvector(::DictEncodedType, x, i, nl, fi, de, ded, meta; dictencode:
         deltas = eltype(x)[]
         len = length(x)
         inds = Vector{encodingtype(len)}(undef, len)
+        categorical = typeof(x).name.name == :CategoricalArray
         for (j, val) in enumerate(x)
+            if categorical
+                val = get(val)
+            end
             @inbounds inds[j] = get!(pool, val) do
                 push!(deltas, val)
                 length(pool)
