# notebook for writing edr browse products

This is designed to be run on a fully-created version of the
EDR data collection. It simply opens every FITS file, converts
it to JPEG, fills out a simple label, and writes the JPEG and
XML files out to a duplicate of the EDR directory structure.

## performance tips
Parallelizing this probably won't help much unless you're in an
unusual operating environment, because your most likely bottleneck
is IOPS. Encoding these teensy arrays as JPEGs and inserting a few
lines of text requires very little working memory, processing power,
or throughput. Use a very fast disk if you want to speed it up. If
you do have a good reason to parallelize it, I recommend using
`pathos` or simply running multiple instances of this notebook;
vanilla Python `multiprocessing` will fail when attempting to pickle
parts of this pipeline.
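
If you go the multiple-instances route, each instance needs to handle a disjoint subset of the files. Here is a minimal sketch of one way to do that, assuming you assign each instance a shard index by hand. The names `in_shard`, `n_shards`, and `shard_index` are made up for illustration and are not part of `clem_conversion`; the example paths are invented too.

```python
import zlib


def in_shard(path: str, n_shards: int, shard_index: int) -> bool:
    """Return True if `path` belongs to the shard numbered `shard_index`."""
    # crc32 gives the same value in every process, unlike the built-in
    # hash() on strings, which is randomized per interpreter session
    return zlib.crc32(path.encode("utf-8")) % n_shards == shard_index


# invented paths, just to show that the shards partition the file list
paths = [f"/example/{i:04d}.fits" for i in range(10)]
shards = [[p for p in paths if in_shard(p, 3, k)] for k in range(3)]
```

In the main loop, this would amount to something like `if not in_shard(file, 3, 0): continue` in the first instance, with `1` and `2` in the others. Each instance still walks the whole tree, so this only pays off if writing, not walking, is the slow part.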

In [None]:
import datetime as dt

import fs.copy
import fs.path
from fs.osfs import OSFS

from clem_conversion import ClemBrowseWriter


In [None]:
# root of the to-be-created EDR browse directory tree
# create=True makes the root directory if it does not exist yet
browse_fs = OSFS('~/buckets/clem_output/browse/edr/', create=True)

# root of the already-created EDR data directory tree
data_fs = OSFS('~/buckets/clem_output/data/edr/')

In [None]:
# make the whole directory tree, avoiding tedious directory-
# making later. will take a minute; there are a million or
# so directories.
fs.copy.copy_structure(data_fs, browse_fs)

In [None]:
browse_start_time = dt.datetime.now()
for ix, file in enumerate(data_fs.walk.files(filter=['*.fits'])):
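    # every 1000 files, print the current file and the seconds
    # taken by the last batch, then restart the timer
    if ix % 1000 == 0:
        print(file)
        print((dt.datetime.now() - browse_start_time).total_seconds())
        browse_start_time = dt.datetime.now()
    path, fn = fs.path.split(file)
    # product root is the filename without its '.fits' extension
    pds4_root = fn[:-5]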
    output_path = browse_fs.getsyspath(path)
    ClemBrowseWriter(
        pds4_root,
        "edr"
    ).write_pds4(output_path + '/', verbose=False)
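
The path handling in the loop above can be sketched on its own with the standard library. This is only an illustration: `posixpath` stands in for `fs.path` here (both split on forward slashes), `split_product_path` is a hypothetical helper, and the example path is invented rather than taken from the actual EDR collection.

```python
import posixpath


def split_product_path(file_path):
    """Split a walked path into (directory, product root, extension)."""
    directory, filename = posixpath.split(file_path)
    # splitext avoids hard-coding the extension length, unlike fn[:-5]
    root, extension = posixpath.splitext(filename)
    return directory, root, extension


# invented path for illustration only
directory, root, extension = split_product_path("/example/subdir/product.fits")
```

The directory part is what gets passed to `browse_fs.getsyspath`, and the root part is what the loop calls `pds4_root`.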