# More Dataset

First, we need to re-create the `Dataset` object from the previous notebook:

In [None]:
import dkist
import dkist.net
from sunpy.net import Fido, attrs as a

res = Fido.search(a.dkist.Dataset('BEOGN'))
files = Fido.fetch(res)
ds = dkist.Dataset.from_asdf(files[0])

The `Dataset` object lets us inspect the dataset as a whole without downloading all of the data, using the FITS header metadata available through `ds.headers`.
This saves a good amount of time and also eases the load on the DKIST servers.
For example, we can check the seeing conditions during the observation.

Notice that the file we downloaded is a single ASDF file, **not** the whole dataset.
The `Dataset` is constructed from this file alone:

In [None]:
# matplotlib is needed for the plots below
import matplotlib.pyplot as plt

ds = dkist.Dataset.from_asdf(files[0])

# Link to the header documentation, useful for looking up header keywords
ds.meta['inventory']['headerDocumentationUrl']

In [None]:
# Select only the Stokes I headers so we don't plot the same values four times
I_headers = ds.headers[ds.headers['DINDEX4'] == 1]
plt.plot(I_headers['ATMOS_R0'])
plt.xlabel('Frame index')
plt.ylabel('ATMOS_R0')
plt.show()

This information allows us to select the parts of the data where the seeing is good, and only download those files.
We will see a more detailed demonstration of how to do this later.
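
As a rough illustration of the selection step, the sketch below picks out the frames whose Fried parameter exceeds a threshold (a larger value means better seeing). It uses plain Python and made-up numbers so that it runs without any data. In the notebook the values would come from `I_headers['ATMOS_R0']`. The list `r0` and the threshold `R0_THRESHOLD` are assumptions for illustration only, not values from this dataset.

```python
# Hypothetical Fried parameter values, one per frame.
# In the notebook these would be read from I_headers['ATMOS_R0'].
r0 = [0.04, 0.06, 0.11, 0.12, 0.09, 0.13, 0.05]

# Assumed cut-off for "good" seeing; pick a value that suits your science case.
R0_THRESHOLD = 0.10

# Indices of the frames where the seeing is better than the threshold.
good_frames = [i for i, value in enumerate(r0) if value > R0_THRESHOLD]
print(good_frames)
```

The resulting indices could then be used to decide which parts of the dataset are worth downloading.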

One important point when slicing the array to reduce the number of files is that you need to keep in mind how the data are stored across those files.
We can see a little more information about the files with the `files` attribute of the `Dataset`:

In [None]:
ds.files

So in this case we can see that each FITS file effectively contains a 2D image (a single raster step at one polarisation state) and that 4000 of these files make up the full 4D dataset.
This means that if we look at a subset of the scan steps or polarisation states, we reduce the number of files across which the array is stored.

In [None]:
ds[0]

Notice that when we slice a `Dataset` like this, the output shows not just the updated array shape but also the updated dimensions.
Because we're looking at a single polarisation state, that axis and the corresponding physical axis have been removed.

In [None]:
ds[0].files

However, if we decide we want to look at a single wavelength, we are taking a row of pixels from every single file.
So although we reduce the dimensions of the array, we are not reducing the number of files we need to reference - and therefore download.

In [None]:
ds[:, :, 500, :].data.shape

In [None]:
ds[:, :, 500, :].files
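
The relationship between slicing and file count can be sketched with a toy model that needs no download. It is not how the `dkist` package tracks files internally; it only mimics the layout described above, with one file per (polarisation state, raster step) pair. The filenames and the split of 4000 files into 4 states by 1000 steps are assumptions for illustration.

```python
# Assumed layout: 4 polarisation states x 1000 raster steps = 4000 files.
n_stokes, n_steps = 4, 1000

# One hypothetical filename per (Stokes, raster step) pair.
# Each file holds a 2D (wavelength, slit position) image.
file_grid = [[f"stokes{s}_step{t:04d}.fits" for t in range(n_steps)]
             for s in range(n_stokes)]

def files_needed(stokes_idx, step_idx):
    """Return the filenames touched by a selection over the two file axes."""
    return [file_grid[s][t] for s in stokes_idx for t in step_idx]

all_stokes, all_steps = range(n_stokes), range(n_steps)

# Like ds[0]: one polarisation state, every raster step.
single_stokes = files_needed([0], all_steps)

# Like ds[:, :, 500, :]: the wavelength axis lives inside each file,
# so the selection over the file axes is unchanged.
single_wavelength = files_needed(all_stokes, all_steps)

# Selecting a range of raster steps does reduce the file count.
step_range = files_needed(all_stokes, range(100, 200))

print(len(single_stokes), len(single_wavelength), len(step_range))
```

Only selections along the axes that are spread across files (here, polarisation state and raster step) change how many files are needed.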

## Downloading the quality report and preview movie

For each dataset, a quality report is produced during calibration that gives useful information about the quality of the data.
This is accessible through the `quality_report()` method of the `Dataset`'s `files` attribute, which downloads a PDF of the quality report to the base path of the dataset.
This uses parfive underneath, which is the same library `Fido` uses, so it returns the same kind of `Results` object.
If the download is successful, this can be treated as a list of filenames.

In [None]:
qr = ds.files.quality_report()
qr

This method takes the optional arguments `path` and `overwrite`.
`path` allows you to specify a different location for the download, and `overwrite` is a boolean which tells the method whether or not to download a new copy if the file already exists.

Similarly, each dataset also has a short preview movie showing the data.
This can be downloaded in exactly the same way as the quality report but using the `preview_movie()` method:

In [None]:
pm = ds.files.preview_movie()
pm