just for my learning... (please!) #89

Open
lewisl opened this issue Jan 10, 2022 · 5 comments

lewisl commented Jan 10, 2022

struct Table{T <: NamedTuple, N, Data <: NamedTuple{<:Any, <:Tuple{Vararg{AbstractArray{<:Any,N}}}}} <: AbstractArray{T, N}

What is the type qualifier for the struct saying?

adigitoleo commented Jan 27, 2022

This is a (nested) parametric type definition. Parametric types are IMHO the most complicated part of Julia's type system. If you're not familiar with them, I have opened a PR to try and clarify parametric types in the manual; it might also be worth checking the linked issue.

Let's go through from left to right (maintainers, please correct me if I'm wrong).

The Table type declares three things: the row types, a dummy "dimension", and the column types. It might first seem redundant to declare both row and column types, but this is necessary because the row type doesn't cover the types of the column names, nor the column container type.

T <: NamedTuple, N, Data (...)

T reifies to a NamedTuple that "maps" column names to a type, thus defining the type of any single row. Let's take an example table:

julia> t = Table(a = [1, 2, 3], b = [2.0, 4.0, 6.0])
Table with 2 columns and 3 rows:
     a  b
   ┌───────
 1 │ 1  2.0
 2 │ 2  4.0
 3 │ 3  6.0

julia> typeof(t[1])
NamedTuple{(:a, :b), Tuple{Int64, Float64}}

In this case T became NamedTuple{(:a, :b), Tuple{Int64, Float64}}. The <: bound is necessary in the definition because T has to be able to match any concrete NamedTuple type, and type parameters are invariant.
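To make the bound and the invariance remark concrete, here is a minimal sketch that uses only Base. MiniTable is a made-up stand-in with the same bound on T. It is not how TypedTables defines Table.

```julia
# Row type from the example above, written out by hand.
Row = NamedTuple{(:a, :b), Tuple{Int64, Float64}}

# The concrete row type satisfies the bound `T <: NamedTuple`.
row_ok = Row <: NamedTuple                          # true

# Invariance: a parameter Int64 does not match a parameter Real ...
invariant = !(Vector{Int64} <: Vector{Real})        # true

# ... unless the parameter is written as a bound.
bounded = Vector{Int64} <: Vector{<:Real}           # true

# Hypothetical stand-in: T is bounded exactly like Table's first parameter.
struct MiniTable{T <: NamedTuple}
    rows::Vector{T}
end

# T is inferred from the element type of the vector of rows.
mt = MiniTable([(a = 1, b = 2.0), (a = 2, b = 4.0)])
row_type = typeof(mt).parameters[1]                 # the same type as Row
```

Without the bound, T could be any type at all; with it, only types from the NamedTuple family are accepted.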

The N always resolves to 1 (see next snippet), and is necessary only so that we can have Table <: AbstractArray, which means that tables inherit a bunch of nice methods. Basically, the Table is like a Vector of rows (recall that Vector{T} is an alias for Array{T,1}).
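As a rough illustration of what subtyping AbstractArray buys you, here is a sketch with a hypothetical RowList wrapper (my own toy type, not the TypedTables implementation). It defines only size and getindex and gets the rest from Base.

```julia
# Hypothetical wrapper: a one-dimensional array whose elements are rows.
struct RowList{T <: NamedTuple} <: AbstractArray{T, 1}
    rows::Vector{T}
end

# The minimal read-only AbstractArray interface.
Base.size(r::RowList) = size(r.rows)
Base.getindex(r::RowList, i::Int) = r.rows[i]

rl = RowList([(a = 1, b = 2.0), (a = 2, b = 4.0), (a = 3, b = 6.0)])

# None of the following are defined for RowList; they come from AbstractArray.
n = length(rl)                       # 3
last_row = last(rl)                  # (a = 3, b = 6.0)
col_a = [row.a for row in rl]        # [1, 2, 3]

# Vector is just the N = 1 case of Array.
alias_ok = Vector{Int64} === Array{Int64, 1}   # true
```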

Now the fun part, the data itself:

julia> typeof(t)
Table{NamedTuple{(:a, :b), Tuple{Int64, Float64}}, 1, NamedTuple{(:a, :b), Tuple{Vector{Int64}, Vector{Float64}}}}

julia> typeof(t.a)
Vector{Int64} (alias for Array{Int64, 1})

julia> typeof(t.b)
Vector{Float64} (alias for Array{Float64, 1})

The column-based data are stored in one big NamedTuple. The column names themselves (the first NamedTuple parameter) are not constrained (<:Any). Next, we have the type of the data columns, which is again parametric. In this case, Tuple{Vararg{AbstractArray{<:Any,N}}} resolved to Tuple{Vector{Int64}, Vector{Float64}}. We must use Vararg because the number of columns (i.e. Vectors) is not known until the table is constructed. The same dummy "dimension" parameter can be re-used, because it will also always be 1 (no such thing as a 2D column).
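The column constraint can be checked directly against plain NamedTuples, without constructing a Table. A small sketch, with N fixed to 1 by hand (Cols1 and the other names are mine, for illustration only):

```julia
# The bound on Data from the struct definition, with N replaced by 1.
Cols1 = NamedTuple{<:Any, <:Tuple{Vararg{AbstractArray{<:Any, 1}}}}

# Any number of Vector columns is accepted, thanks to Vararg.
two_cols   = typeof((a = [1, 2, 3], b = [2.0, 4.0, 6.0]))
three_cols = typeof((a = [1], b = [2.0], c = ["x"]))

# A Matrix column does not match N = 1, and scalars are not arrays at all.
mixed_dims = typeof((a = [1, 2], b = [1 2; 3 4]))
not_arrays = typeof((a = 1, b = 2.0))

two_ok     = two_cols <: Cols1       # true
three_ok   = three_cols <: Cols1     # true
mixed_ok   = mixed_dims <: Cols1     # false
scalars_ok = not_arrays <: Cols1     # false
```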

I hope this clarifies things. If you have any suggestions on how to improve the documentation for parametric types, let me know and I can maybe include it in my PR. In fact, this type definition could serve nicely as a showcase example...

@adigitoleo

I've just read in #55 that it's actually a bit more complicated in practice: you can end up with N = 2 tables in some cases. The discussion over in that issue should be consulted for more details, since I have only provided an incomplete overview.

@adigitoleo

May I also suggest changing the title to something more descriptive (like "Understanding the Table type qualifier").

lewisl commented Jan 28, 2022 via email

adigitoleo commented Jan 29, 2022

> What does the syntax of T <: NamedTuple, N, Data (…) say?
> And what does the (…) signify?

This is not Julia syntax, I just wrote it like that for brevity. The component T <: NamedTuple says that the type parameter T must be a NamedTuple*.

> NamedTuple has to include an N type and a Data type, which are themselves defined above the appearance of the T <: ...?

No, the Table type itself contains three constituent types, which are declared as type parameters: T, N and Data. Each of these reifies to some concrete type when an instance of Table is created.
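A toy type with the same shape of parameter list may help here. Toy is hypothetical and much simpler than Table (its Data bound is just Tuple), but it shows three comma-separated, independent parameters, each fixed when an instance is created.

```julia
# Hypothetical type: three independent parameters, two of them bounded.
struct Toy{T <: NamedTuple, N, Data <: Tuple}
    data::Data
end

# All three parameters are spelled out explicitly at construction time.
toy = Toy{NamedTuple{(:a,), Tuple{Int64}}, 1, Tuple{Vector{Int64}}}(([1, 2, 3],))

# The concrete values that T, N and Data took for this instance.
params = typeof(toy).parameters

# A first parameter outside the NamedTuple family violates the bound on T.
rejected = try
    Toy{Int64, 1, Tuple{}}
    false
catch err
    err isa TypeError
end
```

Here params[1] is the NamedTuple type, params[2] is the plain value 1, and params[3] is the tuple type of the stored data; the commas only separate them.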

> But, the syntax of the T <: assertion remains a bit baffling.

I agree that this is the most confusing part. Hopefully it makes more sense now? T <: NamedTuple just declares that T can be any type, so long as it is from the NamedTuple parametric family*. The commas are not a type union syntax, so the next parts, i.e. N and Data, are independent, constituent types.

> Maybe your parametric types PR could address this.

My PR is only about changing documentation, and I doubt that changes to fundamental Julia syntax would be accepted at v1.7 of the language.

*It could seem confusing that T <: NamedTuple seems to assert a subtype relation between T and NamedTuple, despite the latter being a parametric composite type (which cannot be declared as a supertype in Julia):

julia> isconcretetype(NamedTuple)
false

julia> isabstracttype(NamedTuple)
false

What's going on here? It's neither abstract nor concrete? I have highlighted this in my changes: if both of these return false, then we are dealing with a parametric composite type. Parametric types aren't concrete, because they represent a family of types, but they need not represent a family of abstract types. In this case, the <: syntax is asserting that T is one of the types that is defined by the NamedTuple parametric type. I think this "overloading" of the <: syntax is what you find confusing, and I would tend to agree, but I'm not sure it's bad enough to justify changing the language.
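For anyone who wants to poke at this in the REPL, here is a short sketch using only Base. The name Row and the other variables are mine. The unparameterized NamedTuple is a UnionAll type, and every fully specified member of the family is a concrete subtype of it.

```julia
# The bare name stands for the whole family of NamedTuple types.
nt_is_concrete = isconcretetype(NamedTuple)    # false
nt_is_abstract = isabstracttype(NamedTuple)    # false
nt_is_unionall = NamedTuple isa UnionAll       # true

# One fully specified member of that family.
Row = NamedTuple{(:a, :b), Tuple{Int64, Float64}}
row_is_concrete = isconcretetype(Row)          # true
row_in_family   = Row <: NamedTuple            # true

# The same pattern holds for other parametric types, e.g. Vector.
vec_is_unionall = Vector isa UnionAll          # true
vec_in_family   = Vector{Int64} <: Vector      # true
```

So T <: NamedTuple reads as "T is some member of the NamedTuple family", which is how I would interpret it in the Table definition.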
