Handling Heterogeneous types #135

Closed
Sov-trotter opened this issue Jun 29, 2020 · 6 comments

@Sov-trotter
Contributor

Hello folks. As described in this Discourse post, we are adding GeometryBasics support to Shapefile and GeoJSONTables. We seem to be stuck while working on this PR. I also posted a question on the Julia Slack but couldn't describe the problem well there.
GeometryBasics uses StructArrays to store individual geometries along with their metadata.
When we add GeometryBasics support to the parsers, the plan is to store the resulting GeometryBasics geometries in a StructArray.
The aim is to stay consistent with GeometryBasics so that no conversions are needed later on (for plotting etc., since Makie supports GeometryBasics).

Problem 1 :

Formats like GeoJSON allow multiple geometry types in a single file. Since a StructArray does not support multiple element types, it is currently impossible to store those geometries in one StructArray.

Problem 2 :

The type inconsistency problem isn't limited to geometries. It appears for the metadata too.

julia> point1 = meta(Point(3, 1), city="Abuja", rainfall=1221.2)
2-element PointMeta{2,Int64,Point{2,Int64},(:city, :rainfall),Tuple{String,Float64}} with indices SOneTo(2):
 3
 1

julia> point2 = meta(Point(2, 1), city="Delhi", rainfall=121)
2-element PointMeta{2,Int64,Point{2,Int64},(:city, :rainfall),Tuple{String,Int64}} with indices SOneTo(2):
 2
 1

julia> StructArray([point1, point2])
ERROR: Field meta not part of Element

The error is clearly caused by the inconsistent type of rainfall here (Float64 and Int).

One idea we had was promotion/conversion rules for types, but I guess that would be hard to generalize to most practical cases.
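
To make the promotion idea concrete, here is a minimal sketch using only Base functions. It is an illustration of the trade-off rather than anything GeometryBasics or StructArrays does internally: promotion works well when a common concrete type exists, but falls back to a supertype when it does not.

```julia
# Promotion converts both values to a common concrete type when one exists.
a, b = promote(1221.2, 121)            # the Int is converted to Float64
rainfall = [a, b]                      # a Vector{Float64}

# For numeric types a promotion rule exists.
numeric = promote_type(Float64, Int)

# Without a promotion rule, promote_type falls back to a common supertype,
# so mixed String/number metadata cannot be fixed by promotion alone.
mixed = promote_type(String, Int)
```

Here `numeric` is `Float64` while `mixed` is `Any`, which is why promotion alone is hard to generalize to arbitrary metadata.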

@Sov-trotter
Contributor Author

Sov-trotter commented Jun 29, 2020

As @visr suggested, another simple reproducible example is:

julia> StructArray(Union{Float64, Int}[1.0, 2])
ERROR: MethodError: no method matching fieldnames(::Type{Union{Float64, Int64}})

@piever
Collaborator

piever commented Jun 29, 2020

Problem 1

I think what you want to do is possible, but there may be a bit of a misconception about StructArray. The StructArray constructor aims to create a struct-of-arrays layout, which is clearly not possible if the data are floats or integers. What I suspect you want is collect_structarray, which uses a "progressive widening" strategy.

In particular, you may want to use ArrayInitializer, which does not assume that the data is made of structs. For example:

julia> using StructArrays

julia> init = StructArrays.ArrayInitializer(t -> t <: NamedTuple)
StructArrays.ArrayInitializer{var"#3#4",typeof(StructArrays.arrayof)}(var"#3#4"(), StructArrays.arrayof)

julia> collect_structarray(((a=1, b=12), 2.3), initializer=init) # if fields are not compatible, give up and use `Vector{Any}`
2-element Array{Any,1}:
  (a = 1, b = 12)
 2.3

julia> collect_structarray(((a=1, b=12), (a=11, b=11)), initializer=init) # do struct of array thing if fields are compatible
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
 (a = 1, b = 12)
 (a = 11, b = 11)

julia> collect_structarray(((a=1, b=12), (a=11, b="string")), initializer=init) # expand only specific fields as needed
2-element StructArray(::Array{Int64,1}, ::Array{Any,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Any}}:
 NamedTuple{(:a, :b),Tuple{Int64,Any}}((1, 12))
 NamedTuple{(:a, :b),Tuple{Int64,Any}}((11, "string"))

A lot of thought and testing went into this, so it should have the features you need (unfortunately, not nearly the same effort went into writing docs!). The expansion of the various columns uses Base.promote_typejoin (same as collect, so ints and floats will give you Real).
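
As a small illustration of that widening rule, the following sketch uses only Base. Unlike promotion, `Base.promote_typejoin` picks a common supertype and leaves the stored values unconverted.

```julia
# promote_typejoin picks a common supertype; no values are converted.
joined = Base.promote_typejoin(Int, Float64)

# Missing (and Nothing) are kept as a small Union instead of widening to Any.
with_missing = Base.promote_typejoin(Int, Missing)

# collect on a generator widens its element type the same way.
xs = collect(x for x in (1, 2.5))
```

Here `joined` and `eltype(xs)` are both `Real`, and the first element of `xs` is still the integer `1`.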

@Sov-trotter
Contributor Author

Sov-trotter commented Jun 29, 2020

Thanks for looking into this. I'll definitely try this out. :)
Also, really sorry for not replying on the Slack thread, I thought filing an issue would be better.

@piever
Collaborator

piever commented Jun 29, 2020

No problem, issue is definitely better!

To be more concrete, something like this line should instead be:

iter = (basicgeometry(f) for f in t) # no need to collect here
collect_structarray(iter, initializer=custom_initializer) # the keyword is `initializer`

where the only challenge is determining what is a good initializer for the Geo use case. It needs to take a type and axes and return an uninitialized AbstractArray. Defining one can be somewhat convoluted. A possible shortcut is to use ArrayInitializer(unwrap), where unwrap is a function that receives an element type and tells you whether to apply the "struct of arrays" transformation to it. So ArrayInitializer(t -> t <: Geometry) (not sure what the geometry types are called) would be a decent attempt.
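
A minimal sketch of that pattern follows. The `basicgeometry` function here is a hypothetical stand-in that returns a NamedTuple, and `features` is made-up input, so that the example does not depend on any GeometryBasics type names. In the real parser the predicate would test for the geometry types instead.

```julia
using StructArrays

# Hypothetical stand-in for the parser's `basicgeometry`: it returns a
# NamedTuple so the sketch does not depend on GeometryBasics type names.
basicgeometry(f) = (x = f[1], y = f[2])

# Made-up input features (coordinate pairs).
features = [(3.0, 1.0), (2.0, 1.0)]

# Apply the struct-of-arrays layout only to element types matching the predicate.
init = StructArrays.ArrayInitializer(t -> t <: NamedTuple)

# Pass the lazy generator directly; there is no need to collect it first.
iter = (basicgeometry(f) for f in features)
geoms = collect_structarray(iter, initializer=init)
```

The result stores one array per field, so `geoms.x` and `geoms.y` are plain vectors of coordinates.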

@Sov-trotter
Contributor Author

Sov-trotter commented Jun 29, 2020

Ah! Cool.
I'll see how it goes with the initializer and ping you with the outcome?
Also, if you want, later on I can try putting some details based on my experience into the docs?

@piever
Collaborator

piever commented Jun 29, 2020

Problem 2

This is slightly trickier. I seem to remember GeometryBasics uses some tricks to have special types that give some advantages. One simple option is to collect the metadata and the points separately. For example:

julia> geoms = collect_structarray([Point(3, 1), Point(2, 1)])
2-element StructArray(::Array{Tuple{Int64,Int64},1}) with eltype Point{2,Int64}:
 [3, 1]
 [2, 1]

julia> geoms_meta = meta(geoms, city=["Delhi", "New York"], rainfall=[10, 100])
2-element StructArray(StructArray(::Array{Tuple{Int64,Int64},1}), ::Array{String,1}, ::Array{Int64,1}) with eltype PointMeta{2,Int64,Point{2,Int64},(:city, :rainfall),Tuple{String,Int64}}:
 [3, 1]
 [2, 1]

If you are given an iterator of both geometries and metadata and want to do this operation in one go, that is possible but a bit more involved.
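
Short of the one-pass version, a simpler route is to split each feature and collect the metadata on its own, letting collect_structarray widen the inconsistent columns. The sketch below assumes made-up `features` given as (geometry, metadata) pairs. It uses only StructArrays and leaves the final `meta` call as a comment.

```julia
using StructArrays

# Made-up features: (coordinates, metadata) pairs with inconsistent rainfall types.
features = [
    ((3, 1), (city = "Abuja", rainfall = 1221.2)),
    ((2, 1), (city = "Delhi", rainfall = 121)),
]

init = StructArrays.ArrayInitializer(t -> t <: NamedTuple)

# Collect coordinates and metadata separately.
coords = [first(f) for f in features]
metadata = collect_structarray((last(f) for f in features), initializer=init)

# Each metadata field is now its own column; the rainfall column is widened
# to hold both the Float64 and the Int value without converting them.
city = metadata.city
rainfall = metadata.rainfall

# These columns could then be attached as in the example above, e.g.
# meta(geoms, city=city, rainfall=rainfall)
```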

> I'll see how it goes with the initializer and ping you with the outcome?

Sure, happy to help!

> Also, if you want, later on I can try putting some details based on my experience into the docs?

Definitely! That'd be very helpful
