# Named Array Interface

So far we have only created a way to load the data as an array together with the names of its dimensions; we do not yet have any way to access the data by dimension name, as xarray allows.

A major drawback of using Julia is that its ecosystem is still young, so it is not possible to simply pick the most popular xarray-like package and build on top of it. Several packages currently provide most of the functionality we require:

- https://github.com/ITensor/ITensors.jl
- https://github.com/JuliaArrays/AxisArrays.jl
- https://github.com/davidavdav/NamedArrays.jl
- https://github.com/SciML/LabelledArrays.jl
- https://github.com/rafaqz/DimensionalData.jl
- https://github.com/invenia/NamedDims.jl

And more...

The first part of this project involved a [feasibility study](https://github.com/RobertRosca/cfgrib-notes/blob/master/191215-02-proposal.ipynb) where I picked what seemed to be the most suitable option for the initial implementation, and in the end I settled on `AxisArrays.jl`. However, the design needs to stay very flexible in case another implementation 'wins out' in the long run, or in case different implementations turn out to work better for different tasks.

## Activating a Backend

### Plots.jl Approach

This is a common problem in Julia, and a few packages provide a unified interface to multiple backends. The best example of this is Plots.jl, which is not really a plotting library itself; it is more of an abstraction over existing plotting backends.

For example, Plots.jl is used like this:

```
using Plots
gr()

x = 1:10; y = rand(10, 2) # 2 columns means two lines
plot(x, y, title = "Two Lines", label = ["Line 1" "Line 2"], lw = 3)

plot(x, y, seriestype = :scatter, title = "My Scatter Plot")

p1 = plot(x, y) # Make a line plot
p2 = scatter(x, y) # Make a scatter plot
p3 = plot(x, y, xlabel = "This one is labelled", lw = 3, title = "Subtitle")
p4 = histogram(x, y) # Four histograms each with 10 points? Why not!
plot(p1, p2, p3, p4, layout = (2, 2), legend = false)

# etc...
```

The above uses the GR backend to create the plots, because `gr()` was called after the plotting module was imported. If you change `gr()` to `plotly()`, the same functions run and produce the equivalent plots with Plotly; the same applies if you use `pyplot()`.

This is a very neat approach, as a single line lets you completely change the backend for the rest of the code. For plotting this is very useful: you may want an interactive backend, like Plotly, when first exploring the data, and then switch to a more performance-oriented backend, like GR, for the final product or when running headless.

It's not clear to me whether this much flexibility is needed, and building it might add a lot of unnecessary abstraction overhead. Such an implementation is not [that complex if you are familiar with Julia](https://github.com/JuliaPlots/Plots.jl/blob/master/src/backends.jl), but it does rely on some advanced Julia metaprogramming concepts, which could be quite off-putting to newcomers.
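To get a feel for what a switchable backend involves, here is a minimal sketch. It is not how Plots.jl does it: it deliberately avoids metaprogramming and uses a global `Ref` plus ordinary dispatch. All names (`AbstractBackend`, `TupleBackend`, `DictBackend`, `wrap`) are hypothetical, and the two "backends" are simple stand-ins built from Base types rather than real named-array packages.

```julia
# Hypothetical sketch: none of these names come from Plots.jl or cfgrib.
abstract type AbstractBackend end
struct TupleBackend <: AbstractBackend end  # stand-in for e.g. an AxisArrays backend
struct DictBackend <: AbstractBackend end   # stand-in for e.g. a NamedArrays backend

# The currently active backend, stored in a global reference.
const CURRENT_BACKEND = Ref{AbstractBackend}(TupleBackend())

# One activation function per backend, mirroring `gr()` and `plotly()`.
tuplebackend() = (CURRENT_BACKEND[] = TupleBackend(); nothing)
dictbackend() = (CURRENT_BACKEND[] = DictBackend(); nothing)

# User-facing function: forwards to the method for the active backend.
wrap(names, data) = wrap(CURRENT_BACKEND[], names, data)

# Backend-specific methods, selected by dispatch on the backend type.
wrap(::TupleBackend, names, data) = (names = Tuple(names), data = data)
wrap(::DictBackend, names, data) = Dict(:names => collect(names), :data => data)

tuplebackend()
a = wrap((:lat, :lon), zeros(2, 3))  # built by the TupleBackend method

dictbackend()
b = wrap((:lat, :lon), zeros(2, 3))  # same call, now built by the DictBackend method
```

The user-facing call `wrap(names, data)` stays the same; only the activation line changes which method runs. The cost is hidden global state, which is part of the abstraction overhead mentioned above.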

### `Convert` Approach

Julia has a standardised approach to conversion and type promotion, [from the docs](https://docs.julialang.org/en/v1/manual/conversion-and-promotion/):

```
julia> x = 12
12

julia> typeof(x)
Int64

julia> convert(UInt8, x)
0x0c

julia> typeof(ans)
UInt8

julia> convert(AbstractFloat, x)
12.0

julia> typeof(ans)
Float64

julia> a = Any[1 2 3; 4 5 6]
2×3 Array{Any,2}:
 1  2  3
 4  5  6

julia> convert(Array{Float64}, a)
2×3 Array{Float64,2}:
 1.0  2.0  3.0
 4.0  5.0  6.0
```

The following language constructs call `convert` automatically:

- Assigning to an array converts to the array's element type.
- Assigning to a field of an object converts to the declared type of the field.
- Constructing an object with `new` converts to the object's declared field types.
- Assigning to a variable with a declared type (e.g. `local x::T`) converts to that type.
- A function with a declared return type converts its return value to that type.
- Passing a value to `ccall` converts it to the corresponding argument type.
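A few of the cases listed above can be checked directly with Base types. This is a small illustrative sketch; `Point`, `half`, and `typed_local` are made-up names used only for the demonstration.

```julia
# Assigning to an array converts to the array's element type.
v = Float64[0.0, 0.0]
v[1] = 1                 # the Int 1 is stored as the Float64 1.0

# Assigning to a field converts to the declared type of the field.
mutable struct Point     # hypothetical type, for illustration only
    x::Float64
end
p = Point(1)             # the default constructor converts 1 to 1.0
p.x = 2                  # field assignment converts 2 to 2.0

# A function with a declared return type converts its return value.
half(n)::Float64 = div(n, 2)

# A variable with a declared type converts on assignment.
function typed_local()
    local y::UInt8 = 12
    return y
end
```

Here `v[1]` and `p.x` end up as `Float64` values, `half(4)` returns a `Float64`, and `typed_local()` returns a `UInt8`, even though only integer literals were written.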

So, we can define a method `Base.convert(::Type{AxisArray}, x::cfgrib.DataSet)`, which would then automatically convert our type to the desired type where required.

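This means that, at least in theory, code which expects an `AxisArray` would automatically accept our `DataSet`, as long as the value passes through one of the conversion points listed above. Note that ordinary function calls do not convert their arguments, so a method declared as `f(x::AxisArray)` would still reject a `DataSet`. For example: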

```
struct DS # DataSet equivalent, dimension names and arrays not linked
    a::Int
    b::Int
end
```

```
struct AA # AxisArray, like xarray
    c::Int
end

Base.convert(::Type{AA}, x::DS) = AA(x.a + x.b)
```

```
struct NA # NamedArray, like xarray
    d::Int
end

Base.convert(::Type{NA}, x::DS) = NA(x.a + x.b)
```

```
ds = DS(1, 2) # DataSet
```

```
struct OtherPackage # Somebody else's package
    e::AA # They expect an AxisArray type
end
```

```
OtherPackage(ds) # They pass OUR DataSet into THEIR type, and it gets converted automatically
```

This automatic type conversion is really nice and extremely useful for interoperability between packages; however, it is mainly a feature for developers.

On the user side, this amounts to calling `convert` directly:

```
na_with_convert = convert(NA, ds)
```

Or, a bit more user-friendly, using the type you'd like to convert to as a function:

```
NA(x::DS) = convert(NA, x)
```

```
na_direct_call = NA(ds)
```

So in the end, users would do something like:

```
dataset = DataSet(path, read_keys, filter, etc...)

# Either convert explicitly...
data = convert(AxisArray, dataset)

# ...or call the target type as a constructor
data = AxisArray(dataset)
```
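The workflow above can be sketched end to end with toy types. This is only an illustration under stated assumptions: `ToyDataSet`, `ToyNamedArray`, and `select` are hypothetical stand-ins, not the real `DataSet` or `AxisArray` APIs, and the real types carry far more metadata.

```julia
# Hypothetical stand-ins; the real DataSet and AxisArray types are richer.
struct ToyDataSet
    dims::Vector{Symbol}      # dimension names, stored separately from the data
    values::Array{Float64}
end

struct ToyNamedArray{N}
    dims::NTuple{N,Symbol}    # one name per array dimension
    values::Array{Float64,N}
end

# Conversion links the dimension names to the array they describe.
function Base.convert(::Type{ToyNamedArray}, ds::ToyDataSet)
    N = ndims(ds.values)
    length(ds.dims) == N || throw(ArgumentError("expected $N dimension names"))
    return ToyNamedArray{N}(Tuple(ds.dims), ds.values)
end

# Constructor form, analogous to `AxisArray(dataset)`.
ToyNamedArray(ds::ToyDataSet) = convert(ToyNamedArray, ds)

# Select index `i` along the dimension called `name`.
function select(a::ToyNamedArray, name::Symbol, i::Integer)
    d = findfirst(==(name), a.dims)
    d === nothing && throw(ArgumentError("no dimension named $name"))
    return selectdim(a.values, d, i)
end

ds = ToyDataSet([:lat, :lon], [1.0 2.0 3.0; 4.0 5.0 6.0])
data = ToyNamedArray(ds)
second_lon = select(data, :lon, 2)  # view of the second column
```

Once the conversion exists, name-based access becomes a thin layer on top of the wrapped array, which is the behaviour we want from whichever named-array package ends up as the backend.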