# The `dkist.Dataset`

In DKIST data parlance, a "dataset" is the smallest unit of data that is searchable from the data centre, and represents a single self-contained observation.
The user tools represent this unit of data with the `Dataset` class.
Within this class the data are stored as many FITS files, each containing a single frame of the observation, and an ASDF file describing how the frames relate to each other.
For VTF data, for example, one FITS file would contain a single narrowband image in one Stokes profile at a single time.
Since there will be very many of these files, each with its own FITS header, tracking and inspecting them by hand would be unmanageable.
The `Dataset` class combines these many files into one object, allowing you to inspect the properties and combined headers of the whole dataset.
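
To make the idea of "combined headers" concrete, here is a minimal pure-Python sketch of gathering per-frame headers into one column per keyword. It does not use the `dkist` API: the `combine_headers` helper is hypothetical, each header is assumed to be a plain dictionary, and the header values are made up for illustration.

```python
def combine_headers(headers):
    """Turn a list of per-frame header dicts into one list of values per keyword."""
    # Collect keywords in the order they are first seen
    keywords = []
    for header in headers:
        for key in header:
            if key not in keywords:
                keywords.append(key)
    # A keyword missing from a frame is recorded as None for that frame
    return {key: [header.get(key) for header in headers] for key in keywords}


# Hypothetical headers for three frames (illustrative values only)
frame_headers = [
    {"DATE-BEG": "2022-01-01T00:00:00", "NAXIS1": 100},
    {"DATE-BEG": "2022-01-01T00:00:10", "NAXIS1": 100},
    {"DATE-BEG": "2022-01-01T00:00:20", "NAXIS1": 100},
]

combined = combine_headers(frame_headers)
print(combined["DATE-BEG"])
```

With the headers arranged this way, a question such as "what time range does this observation cover?" becomes a lookup in one column rather than a loop over files.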

There are a few ways to construct a `Dataset` object.
For the first we will need the ASDF file for the dataset, which we can get using `Fido` as we saw yesterday.

In [None]:
# Imports
import dkist
import dkist.net
from sunpy.net import Fido, attrs as a

In [None]:
# Search for the dataset by its ID
# (importing dkist.net above registers the DKIST client with Fido)
res = Fido.search(a.dkist.Dataset('BEOGN'))

res

In [None]:
files = Fido.fetch(res)
files

Notice that the file we have downloaded is a single ASDF file, **not** the whole dataset.
We can use this file to construct the `Dataset`:

In [None]:
ds = dkist.Dataset.from_asdf(files[0])

Now we have a `Dataset` object which describes the shape, size and physical dimensions of the array, but doesn't yet contain any of the actual data.
This may sound unhelpful but we'll see how it can be very powerful.

First let's have a look at the basic representation of the `Dataset`.

In [None]:
ds

This tells us that we have a 4-dimensional data cube and what values the axes correspond to.
Importantly, it not only gives us information about the *pixel* axes (the actual dimensions of the array itself), but also the *world* axes (the physical quantities related to the observation).
It also gives us a correlation matrix showing how the pixel axes relate to the world axes.

## NDCube and array vs pixel vs world

**Pixel axes are in Fortran order and array axes are in C order, so `pixel_to_world` takes coordinates in Fortran order while `array_index_to_world` takes them in C order.**

An important and useful aspect of the `Dataset` is that it is coordinate aware.
That is, it is able to map between array indices and physical dimensions.
This means that you can easily convert from a position in the array to a location defined by physical coordinates.

For this mapping we have three related concepts: _array_ axes, _pixel_ axes and _world_ axes.
Array axes are simply the indices of the array.
Pixel axes are the same axes listed in Fortran order (the reverse of the array order), as this is how axes are specified in FITS headers.
World axes are the physical dimensions of the data.
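
The difference between pixel order and array order can be seen from the FITS `NAXISn` keywords alone: `NAXIS1` is the fastest-varying axis, which ends up as the *last* axis of a C-ordered array. The sketch below assumes a header represented as a plain dictionary with made-up sizes, and `array_shape_from_fits` is a hypothetical helper, not part of any library.

```python
def array_shape_from_fits(header):
    """Return the C-ordered array shape implied by the FITS NAXISn keywords."""
    naxis = header["NAXIS"]
    # NAXIS1, NAXIS2, ... list the axis lengths in pixel (Fortran) order
    pixel_dims = [header[f"NAXIS{i}"] for i in range(1, naxis + 1)]
    # The array (C) order is simply the reverse
    return tuple(reversed(pixel_dims))


# Hypothetical header for a 3-dimensional array
header = {"NAXIS": 3, "NAXIS1": 100, "NAXIS2": 50, "NAXIS3": 4}

shape = array_shape_from_fits(header)
print(shape)  # (4, 50, 100)
```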

To track these axes and convert between them, the `Dataset` has a `wcs` attribute (World Coordinate System).

In [None]:
# Convert array indices to world (physical) coordinates
ds.wcs.array_index_to_world(0, 10, 20, 30)

In [None]:
# Convert pixel coords to world coords
# Notice reversed order compared to array indices
ds.wcs.pixel_to_world(30, 20, 10, 0)
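
To see why the two calls above describe the same point, here is a toy stand-in for a WCS written in plain Python. It is only a sketch: `ToyLinearWCS` is hypothetical, it assumes each world value depends linearly on a single pixel axis, and the reference values and step sizes are made up. The two method names mirror the value-based methods of astropy's WCS API, but nothing here calls astropy or `dkist`.

```python
class ToyLinearWCS:
    """Toy WCS where world = reference value + step * pixel, one world axis per pixel axis."""

    def __init__(self, reference_values, steps):
        # Both sequences are given in pixel (Fortran) order
        self.reference_values = reference_values
        self.steps = steps

    def pixel_to_world_values(self, *pixel):
        # Inputs are in pixel (Fortran) order
        return tuple(
            ref + step * p
            for ref, step, p in zip(self.reference_values, self.steps, pixel)
        )

    def array_index_to_world_values(self, *index):
        # Inputs are in array (C) order, so reverse them to get pixel order
        return self.pixel_to_world_values(*reversed(index))


wcs = ToyLinearWCS(reference_values=(500.0, -10.0, 0.0), steps=(0.5, 2.0, 10.0))

from_pixel = wcs.pixel_to_world_values(30, 20, 10)
from_array = wcs.array_index_to_world_values(10, 20, 30)

print(from_pixel)                # (515.0, 30.0, 100.0)
print(from_array == from_pixel)  # True
```

Only the order of the *inputs* changes between the two calls. The world values come back in the same order either way.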

### The correlation matrix

Finally, the correlation matrix tells us which pixel axes correspond to which world axes.
In this case the first three pixel axes align exactly with three of the world axes.
However, the slit axis maps to both longitude *and* latitude, since the slit is unlikely to be aligned to either one.
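
Reading a correlation matrix can be sketched in plain Python. The matrix below is hypothetical: the axis names and layout are chosen to mirror the description above rather than read from the dataset. It follows the (world axes, pixel axes) layout used by the `axis_correlation_matrix` property of astropy's WCS API, where a `True` entry means that world axis depends on that pixel axis.

```python
# Hypothetical axis names, for illustration only
pixel_axes = ["dispersion", "scan step", "stokes index", "slit"]
world_axes = ["wavelength", "time", "stokes", "latitude", "longitude"]

# Rows are world axes, columns are pixel axes
correlation = [
    [True, False, False, False],   # wavelength
    [False, True, False, False],   # time
    [False, False, True, False],   # stokes
    [False, False, False, True],   # latitude
    [False, False, False, True],   # longitude
]


def world_axes_for_pixel_axis(pixel_name):
    """List the world axes that depend on the named pixel axis."""
    column = pixel_axes.index(pixel_name)
    return [world for world, row in zip(world_axes, correlation) if row[column]]


print(world_axes_for_pixel_axis("dispersion"))  # ['wavelength']
print(world_axes_for_pixel_axis("slit"))        # ['latitude', 'longitude']
```

A pixel axis whose column holds a single `True` aligns with exactly one world axis, while a column with several `True` entries, like the slit here, is coupled to more than one.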