# Dask Loading and Saving to Zarr
This section shows one way to open a folder of tif files as a single lazy dask stack. Afterwards we save the stack to a zarr file, which makes the data much easier to handle. To load the folder we use a custom function, `make_dask_stack_from_folder`:
```python
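# imports as used in the napari dask tutorial this function is based on
import os
from glob import glob

import dask.array as da
from dask import delayed
from skimage.io import imread
from skimage.io.collection import alphanumeric_key


def make_dask_stack_from_folder(folder: str, prefix: str):
    # collect all matching tif files in natural (alphanumeric) order
    pattern = os.path.join(folder, prefix + "*.tif")
    filenames = sorted(glob(pattern), key=alphanumeric_key)
    # read the first file to get the shape and dtype
    # ASSUMES THAT ALL FILES SHARE THE SAME SHAPE/TYPE
    sample = imread(filenames[0])

    lazy_imread = delayed(imread)  # lazy reader
    lazy_arrays = [lazy_imread(fn) for fn in filenames]
    dask_arrays = [
        da.from_delayed(delayed_reader, shape=sample.shape, dtype=sample.dtype)
        for delayed_reader in lazy_arrays
    ]
    # Stack into one large dask.array
    return da.stack(dask_arrays, axis=0)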
```

If you want to see how it works, visit the [napari site](https://napari.org/tutorials/processing/dask.html); the function is just an implementation of the one shown there.
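
To see the lazy-loading idea in isolation, here is a minimal sketch that does not need any tif files. The reader `fake_imread` and the list `load_calls` are hypothetical stand-ins invented for this illustration (they are not part of the tutorial module); the dask calls are the same ones used in `make_dask_stack_from_folder`.

```python
import dask.array as da
import numpy as np
from dask import delayed

load_calls = []  # records which frames were actually "read"


def fake_imread(index: int) -> np.ndarray:
    """Hypothetical stand-in for imread: returns a small constant image."""
    load_calls.append(index)
    return np.full((4, 5), index, dtype=np.uint16)


# One eager read to learn shape and dtype, as in the real function
sample = fake_imread(0)
load_calls.clear()

# Wrap every read in `delayed`, then describe each result as a dask array
lazy_frames = [delayed(fake_imread)(i) for i in range(3)]
stack = da.stack(
    [da.from_delayed(f, shape=sample.shape, dtype=sample.dtype) for f in lazy_frames],
    axis=0,
)

print(stack.shape)   # the shape is known although nothing has been read yet
print(load_calls)    # still empty at this point

# Computing on a single frame triggers only the read for that frame
frame_mean = stack[2].mean().compute()
print(frame_mean, load_calls)
```

Each file becomes one chunk of the stack, so dask can read and process the frames independently and only when they are needed.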

```python
from dask_image_procesing_tips_n_tricks import make_dask_stack_from_folder
import dask.array as da

# Make sure to select the folder location on your machine!
folder_name = r"C:\Users\ryans\Documents\output data (big)\dask tutorial\Lund Zenodo t180 - 260 stepsize 2\\"
prefix_name = "lund_i"

lund_stack = make_dask_stack_from_folder(folder=folder_name, prefix=prefix_name)
lund_stack
```

|       | Array               | Chunk              |
|-------|---------------------|--------------------|
| Bytes | 2.77 GiB            | 71.00 MiB          |
| Shape | (40, 71, 1024, 512) | (1, 71, 1024, 512) |
| Count | 120 Tasks           | 40 Chunks          |
| Type  | uint16              | numpy.ndarray      |
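
The reported sizes follow directly from the shape and the data type. As a quick check, assuming the shapes and the `uint16` type shown in the output above:

```python
import math

import numpy as np

shape = (40, 71, 1024, 512)        # full stack, as reported above
chunk_shape = (1, 71, 1024, 512)   # one chunk per tif file
itemsize = np.dtype("uint16").itemsize  # 2 bytes per pixel

chunk_bytes = math.prod(chunk_shape) * itemsize
total_bytes = math.prod(shape) * itemsize
n_chunks = total_bytes // chunk_bytes

print(f"chunk: {chunk_bytes / 2**20:.2f} MiB")  # chunk: 71.00 MiB
print(f"total: {total_bytes / 2**30:.2f} GiB")  # total: 2.77 GiB
print(f"chunks: {n_chunks}")                    # chunks: 40
```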


Although a stack of this size would still fit into our RAM, that changes as soon as we rescale the images. And even when the data just fits into memory, the parallelization that dask offers can still make processing much faster than looping through a folder. To reopen the stack easily as a dask array, it is convenient to store it as a zarr file. This file format gives us an organized dataset that we can access through dask:

```python
%%time
da.to_zarr(
    arr=lund_stack,
    # Make sure to select a file location on your machine!
    url=r"C:\Users\ryans\Documents\output data (big)\dask tutorial\lund_zenodo.zarr",
    component="original_data",
    overwrite=True,
    compute=True,
    compressor=None,
)
```

CPU times: total: 6.17 s
Wall time: 2.32 s


As you can see, the elapsed CPU time is larger than the wall time it took to run the cell. This is because even saving is parallelized: the work is spread over several cores at once. Now for some explanation about zarr. In general, the structure is much like the way you would organize a folder of tif files, with different folders for the different versions of the dataset. For general information on zarr you can read the [documentation](https://zarr.readthedocs.io/en/stable/), but we will now go through a few of the parameters of the [`to_zarr`](https://docs.dask.org/en/stable/generated/dask.array.to_zarr.html) function.

- `url`: The path to the zarr file. This can be an actual URL if you are working with a file server, but we will not go through that use case here.
- `component`: This is basically a subfolder in the zarr file in which you would like to store the array. It allows you to keep multiple versions of your dataset, e.g. rescaled, denoised, labelled, or downsampled.
- `compressor`: The zarr file format offers many options for compressing your data; choose a method that suits your needs (above we pass `None`, so the data is stored uncompressed). The [documentation](https://zarr.readthedocs.io/en/stable/tutorial.html#compressors) has more information.
- `compute`: This determines whether the images are actually computed and written when the function is called. With `compute=True` the data is written immediately and can be accessed right away. With `compute=False`, `to_zarr` only returns a delayed dask object, and nothing is written until you compute it.