# Data Management

*J. Runnoe* <br>
*October, 2023*

During your data reduction, you will often need to repeat tasks and work with many files in the imaging data set. This notebook introduces some coding skills that help with both.

---
## Contents
* [Exercises](#exercises)
* [Summary](#summary)

---
## Exercises <a class="anchor" id="exercises"></a>


In [1]:
# import block
import numpy as np
from astropy.io import fits
from matplotlib import pyplot as plt
from matplotlib import rc
import matplotlib as mpl
%matplotlib inline
from astropy.visualization import hist
from ccdproc import ImageFileCollection
import ccdproc as ccdp
from astropy.modeling import fitting
from astropy.modeling.models import Polynomial1D,Chebyshev1D,Legendre1D,Hermite1D
from astropy.nddata import CCDData
import glob
from datetime import datetime
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator)

In [2]:
# import convenience plotting functions downloaded from 
# here: https://github.com/mwcraig/ccd-reduction-and-photometry-guide
phot_tutorial_dir = '/Users/runnojc1/Software/py/ccd-reduction-and-photometry-guide/notebooks/'
import sys
sys.path.insert(0,phot_tutorial_dir)
from convenience_functions import show_image

In [3]:
# for imaging, I like the photometry notebook plot defaults
# so use their custom style for larger fonts and figures
plt.style.use('/Users/runnojc1/Software/py/ccd-reduction-and-photometry-guide/notebooks/guide.mplstyle')

# set a couple more default parameters for the plots below
rc('font', size=20)
rc('axes', grid=True)

In [4]:
data_dir = '/Users/runnojc1/Dropbox/Research/teaching/F2021/ASTR8060/Imaging/'            # raw data directory

#### 1. Collecting filenames

Create a list of your bias filenames in three different ways:

##### Using list comprehension

A list comprehension is a "Pythonic" way of writing a for loop that builds a list. It is often a little faster than the equivalent loop, and it lets you build the list in one line instead of declaring an empty list and then filling it inside the loop. The syntax might look like this:

```python
telescope_names = ['Gemini','Keck','Magellan','WIYN']
instrument_names = ['GMOS','KCWI','MAGE','ODI-N']

# normal loop
names = []
for scope,instr in zip(telescope_names,instrument_names):
    names.append(scope+'/'+instr)

# list comprehension
names = [scope+'/'+instr for scope,instr in zip(telescope_names,instrument_names)]
```
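
The same pattern can build filenames directly from frame numbers. This is only a minimal sketch: the directory and the frame range below are hypothetical placeholders, and the `a001.fits`-style naming is assumed from the `a???.fits` pattern used elsewhere in this notebook.

```python
import os

# Hypothetical values for illustration: substitute your own directory
# and the frame numbers of your bias exposures.
example_dir = '/path/to/raw/data'
first_frame, last_frame = 1, 5

# zero-pad each frame number to three digits, e.g. 1 -> 'a001.fits'
bias_files = [os.path.join(example_dir, f'a{n:03d}.fits')
              for n in range(first_frame, last_frame + 1)]
print(bias_files)
```

Note that `range` excludes its upper bound, hence the `last_frame + 1`.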

In [5]:
# 1. list comprehension


##### Using [`glob`](https://docs.python.org/3/library/glob.html)

`glob` is a module in the Python standard library that is good for listing the contents of a directory when you know the general pattern of the filenames you'd like to find (`os.listdir` is a good option if you'd just like to list everything in a directory). You might use it like this:

```python
import glob
img_files = glob.glob(data_dir+'a[0-1]*.fits')
```

This returns all the raw data files whose numbers start with 0 or 1 (i.e., every frame numbered below 200). `glob` does not guarantee any particular order, so you may find that you need to sort the result.
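
One way to handle the sorting is sketched below. To keep the example runnable anywhere, it creates a throwaway directory containing a few empty files with made-up names; with real data you would point `glob` at your own `data_dir` instead.

```python
import glob
import os
import tempfile

# Make a temporary directory with a few empty, made-up files for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ['a150.fits', 'a002.fits', 'a093.fits', 'a200.fits']:
        open(os.path.join(tmp_dir, name), 'w').close()

    # sort explicitly, since glob makes no promise about ordering
    img_files = sorted(glob.glob(os.path.join(tmp_dir, 'a[0-1]*.fits')))
    img_names = [os.path.basename(f) for f in img_files]

print(img_names)  # ['a002.fits', 'a093.fits', 'a150.fits']
```

The file `a200.fits` is left out because its number does not start with 0 or 1. Plain `sorted` orders strings alphabetically, which matches numerical order here only because the frame numbers are zero-padded to the same width.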



In [6]:
# 2. glob


##### Using an [`ImageFileCollection`](https://ccdproc.readthedocs.io/en/latest/api/ccdproc.ImageFileCollection.html)

An `ImageFileCollection` is a class in the `ccdproc` package that provides a useful way to represent many image files and their parameters. [Notebook [01-11]](https://github.com/astropy/ccd-reduction-and-photometry-guide/blob/main/notebooks/01-11-reading-images.ipynb) has some useful tutorials for reading images, including examples with `ImageFileCollections`.

```python
from ccdproc import ImageFileCollection
imgs = ImageFileCollection(data_dir,glob_include='a???.fits',glob_exclude='*ot*.fits') # exclude otz files in case of rerun
mask = imgs.summary['exptime'] == 30. # find all 30s exposures
files_t30 = np.array(imgs.files_filtered(include_path=True))[mask]
```

This code collects all the files matching the pattern "a???.fits" (except those with "ot" in their names) into an `ImageFileCollection`. It then builds a mask that selects the 30 s exposures and uses it to return their filenames.

In [7]:
# 3. ImageFileCollection


#### 2. Extracting information from FITS headers

A useful skill is extracting information, e.g., the time an exposure was taken, from a FITS header. Try to extract the time stamp for each flat frame and plot the median pixel value in the image versus time.

It may be useful to express the times using the [`datetime`](https://docs.python.org/3/library/datetime.html) module. It is also helpful to know that, when plotting `datetime` objects, you can label the axis ticks with just the time of day as follows:

```python
fig,ax = plt.subplots(figsize=(8,6))
ax.plot(x,y)
ax.set_ylabel('Value')
ax.set_xlabel('Time')
ax.xaxis.set_major_formatter(mpl.dates.DateFormatter('%H:%M'))
plt.show()
```
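
Header time stamps are stored as strings, so they need to be parsed before they can be plotted this way. The sketch below assumes ISO-style strings with fractional seconds; the values are made up for illustration, and the keyword name and format in your own headers may differ, so inspect a header first.

```python
from datetime import datetime

# Hypothetical time stamps for illustration only; check your own headers
# for the actual keyword name and string format.
date_obs = ['2021-10-05T01:12:45.0',
            '2021-10-05T01:14:02.5',
            '2021-10-05T01:15:20.0']

# convert each string to a datetime object
times = [datetime.strptime(s, '%Y-%m-%dT%H:%M:%S.%f') for s in date_obs]

# seconds elapsed since the first exposure
elapsed = [(t - times[0]).total_seconds() for t in times]
print(elapsed)  # [0.0, 77.5, 155.0]
```

A list of `datetime` objects like `times` can be passed as `x` in the plotting snippet above.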

Finally, note that reading in many header data units and headers from FITS files can use up a lot of your computer's available memory. Eventually, it may complain. In that case, close the files and delete the variables you no longer need:

```python
hdul = fits.open('filename')
hdul.close()  # close the file to release its resources
del hdul      # remove the variable so its memory can be reclaimed
```

---
## Summary <a class="anchor" id="summary"></a>

At this point, you should have:
* Practiced multiple ways of collecting filenames.
* Learned to extract and plot times from FITS headers.