# Proposal

Other Notebooks:

 - [Notes](./191201-00-notes.ipynb)
 - [cfgrib playground](./191201-01-cfgrib-playground.ipynb)
 - [Looking at some existing climate packages in Julia](./191215-00-julia-climate-tools.ipynb)
 - [Looking at additional tools in Julia](./191215-01-julia-additional-tools.ipynb)

## Named Array Backend - [Exploring Julia xarray equivalents](./191208-00-julia-xarrays-like.ipynb)

There are currently around five options in Julia for creating n-dimensional arrays with named axes:

 - [AxisArrays](https://github.com/JuliaArrays/AxisArrays.jl)
 - [NamedArrays](https://github.com/davidavdav/NamedArrays.jl)
 - [NamedDims.jl](https://github.com/invenia/NamedDims.jl)
 - [DimensionalData.jl](https://github.com/rafaqz/DimensionalData.jl)
 - [ITensors.jl](https://github.com/ITensor/ITensors.jl)

Of these, DimensionalData and ITensors both state explicitly that they are preview releases or very early in development, and that their interfaces will continue to change. This is hard to avoid in Julia: the language itself is very young, and the ecosystem has yet to settle on standard packages in the way that Python's has.

The remaining options were AxisArrays, NamedArrays, and NamedDims. The main benefits Julia has over Python are performance and the focus on multiple dispatch. Both require type-stable data that the compiler can easily reason about. NamedArrays does not allow for this when doing axis lookups, so I decided to rule it out, even though it is among the most popular options.

NamedDims does what it says on the tin and lets you name the dimensions, but without associated coordinate values. An xarray-like interface needs those coordinates, so it was ruled out.
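To make the limitation concrete, here is a minimal sketch of NamedDims usage. The array contents and the dimension names `:time` and `:lat` are made up for illustration, and the calls reflect my understanding of the NamedDims interface, which may differ between versions.

```julia
using NamedDims

# Hypothetical 3x4 field; the dimension names are illustrative only.
data = reshape(collect(1.0:12.0), 3, 4)
nda = NamedDimsArray{(:time, :lat)}(data)

# Dimensions can be addressed by name...
first_step = nda[time=1]
lat_totals = sum(nda; dims=:lat)

# ...but there is nowhere to store coordinate values such as the actual
# latitudes, so selecting "the row at 15 degrees north" is not possible.
```

Positions along each dimension are still plain integers, which is the gap that a coordinate-aware package has to fill.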

That leaves AxisArrays. However, as mentioned in the accompanying notebook, there are extensive discussions about making large changes to AxisArrays in the future because of limitations caused by the package's current architecture. These changes should end up being mostly internal, meaning that packages depending on AxisArrays should not have to change much (if at all) to be compatible with future versions. The discussions are spread across these issues on GitHub: [AxisArrays Roadmap](https://github.com/JuliaArrays/AxisArrays.jl/issues/7), [AxisArrays Issue: Use value indexing by default](https://github.com/JuliaArrays/AxisArrays.jl/issues/84), [AxisArraysFuture Plan](https://github.com/JuliaCollections/AxisArraysFuture/issues/1).

Given those discussions, I had another look at DimensionalData (which explicitly mentions that it is under active development and unstable) and it is also a very nice option, as it is effectively a much newer 'cleaner' version of AxisArrays. Its implementation allows for much easier extensibility and abstraction, the syntax is more user-friendly and slightly less verbose, and finally it appears to solve the problems AxisArrays has encountered.

In the end, stability will be a problem with any choice in an ecosystem that has not settled on standard packages. As an indication of how much movement is planned, here is a [comment from a discussion](https://github.com/JuliaCollections/AxisArraysFuture/issues/1#issuecomment-484702271) about the future of many of these packages:

> > I just started looking through the NamedDims repo, so I apologize if this is already documented but I missed it. Is NamedDims ultimately intended to be integrated into AxisArrays or is it suppose to be a dependency or something else entirely?
>
> NamedDims.jl is intended to:
>   A) Be used on its own,
>   B) Be integrated into a future package along with IndexedDims.jl (name pending, Indexes.jl?), and that future package will replace AxisArrays.jl. (Like how StaticArrays.jl, replaced FixedSizedArrays.jl)

My honest view is that this space will change massively over the next few years, so no current choice will survive for long. There are a lot of very new, very early-development packages, such as [AbstractIndices](https://github.com/Tokazama/AbstractIndices.jl), [IndexedDims](https://github.com/invenia/IndexedDims.jl), and [AcceleratedArrays](https://github.com/andyferris/AcceleratedArrays.jl), which have the potential to completely change the ecosystem depending on their success.

A caveat is that even if things do change, it should not be too big a problem. Unlike Python, Julia has an excellent built-in package manager which handles dependency versions and their resolution beautifully. Large deprecations should therefore not be a problem: you will always know which versions your package depends on, and you can install all of them in a separate environment very easily.
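As a rough sketch of that workflow, the snippet below creates a throwaway environment, installs AxisArrays into it, and pins whatever version was resolved. The directory name is hypothetical and the snippet needs network access to reach the package registry.

```julia
using Pkg

# Hypothetical throwaway environment directory, used only for this sketch.
env_dir = joinpath(mktempdir(), "named-array-env")
Pkg.activate(env_dir)

# Install the package; the exact resolved version of it and of every
# dependency is recorded in Manifest.toml inside env_dir.
Pkg.add("AxisArrays")

# Pin the package so later updates in this environment leave it untouched.
Pkg.pin("AxisArrays")

Pkg.status()
```

Sharing the generated `Project.toml` and `Manifest.toml` lets someone else recreate the same set of versions with `Pkg.instantiate()`.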

The other bonus is that all of these packages are relatively simple wrappers around the base Julia implementation of arrays, meaning that they are all on the order of 1-2 thousand lines of code, whereas xarray is on the order of tens of thousands of lines (core being ~28k). This isn't the best measure of complexity, but such a huge difference does indicate a big difference in the amount of effort and knowledge required to maintain these packages, which is a good sign for the long-term health of the ecosystem (at least once it has had a chance to settle down).

In summary, there's no easy way to pick the best option or to say what will be used in a year or two. But, at least for now, AxisArrays has the most going for it:

 - It has a large set of contributors, and more use in the community
 - The interface is quite similar to `xarray`
 - It allows for "type-stable selection of dimensions and compile-time axis lookup", which lets the compiler keep the code performant
 - Already has integration with other packages, e.g. `SimpleTraits.jl`
 - Implementation with a metadata layer exists in `ImageMetadata.jl` - required for xarray-like attributes
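To give a feel for the xarray-like interface, here is a minimal sketch of creating an `AxisArray` and selecting from it by axis name. The data, the axis names `:time` and `:lat`, and the coordinate values are all made up for illustration, and the behaviour shown is my understanding of the current AxisArrays interface rather than a guarantee.

```julia
using AxisArrays

# Hypothetical field: 3 time steps by 4 latitudes (values are illustrative).
data = reshape(collect(1.0:12.0), 3, 4)
temps = AxisArray(data,
                  Axis{:time}([0, 6, 12]),
                  Axis{:lat}([-45.0, -15.0, 15.0, 45.0]))

# Select along a named axis by position (integers are always positional).
first_step = temps[Axis{:time}(1)]

# Select along a named axis by coordinate value.
at_15 = temps[Axis{:lat}(atvalue(15.0))]

# Select a range of coordinate values with an interval.
tropics = temps[Axis{:lat}(-20.0 .. 20.0)]

axisnames(tropics)  # the axis names are carried through the selection
```

Because the axis name is part of the `Axis{:lat}` type, the lookup of which dimension to index can be resolved at compile time, which is what the quoted "compile-time axis lookup" refers to.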