# Notebook to assist in development of create_calibration_df

## Function: `create_calibration_df`

### Description
The `create_calibration_df` function creates a DataFrame that summarises the calibration frames found in FITS header data. It filters the headers to the calibration image types (`IMAGETYP`) and groups them by image type and gain value (`GAIN`).

### Parameters
- `df` (`pandas.DataFrame`): The input DataFrame containing FITS header data. It is expected to include columns for `IMAGETYP` and `GAIN`.
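
Because the function indexes `IMAGETYP` and `GAIN` directly, a header table without either column would raise a `KeyError`. A minimal sketch of a pre-check follows; `missing_columns` is a hypothetical helper that is not part of `utils`, and the toy values are made up rather than read from real FITS files.

```python
import pandas as pd

REQUIRED_COLUMNS = ("IMAGETYP", "GAIN")  # the two columns create_calibration_df reads


def missing_columns(df, required=REQUIRED_COLUMNS):
    """Hypothetical helper: return the required column names absent from df."""
    return [col for col in required if col not in df.columns]


# Toy header table that lacks GAIN (made-up values, not real FITS data)
toy_headers = pd.DataFrame({"IMAGETYP": ["DARK", "BIAS"], "EXPOSURE": [300.0, 0.001]})
print(missing_columns(toy_headers))
```

An empty list means the DataFrame can be passed to `create_calibration_df` as it stands.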

### Returns
- `pandas.DataFrame`: A DataFrame with the columns 'TYPE', 'GAIN', and 'NUMBER'. Here, 'TYPE' corresponds to the types of images (like DARK, BIAS, FLAT, FLATDARKS), 'GAIN' reflects the gain value, and 'NUMBER' indicates the count of each group combination.

### Functionality
1. **Filtering**: The function starts by selecting rows from the input DataFrame where `IMAGETYP` matches one of the relevant types for calibration (DARK, BIAS, FLAT, FLATDARKS).
2. **Grouping and Counting**: It then groups the filtered data by `IMAGETYP` and `GAIN`, and counts the number of occurrences for each group.
3. **Formatting Output**: Finally, the function renames the 'IMAGETYP' column to 'TYPE' in the output DataFrame for clarity.
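
The three steps above can be sketched on a small hand-made DataFrame, which makes each intermediate result easy to inspect. The rows below are invented for illustration (they do not come from the test directories used later), and `LIGHT` is assumed to be a non-calibration type that the filter should drop.

```python
import pandas as pd

# Made-up header rows; the LIGHT frame is not a calibration frame
toy_headers = pd.DataFrame({
    "IMAGETYP": ["DARK", "DARK", "BIAS", "FLAT", "LIGHT", "DARK"],
    "GAIN": [100, 100, 100, 0, 100, 0],
})

relevant_types = ["DARK", "BIAS", "FLAT", "FLATDARKS"]

# 1. Filtering: keep only the calibration frame types
filtered = toy_headers[toy_headers["IMAGETYP"].isin(relevant_types)]

# 2. Grouping and counting: one row per (IMAGETYP, GAIN) combination
counts = filtered.groupby(["IMAGETYP", "GAIN"]).size().reset_index(name="NUMBER")

# 3. Formatting output: rename IMAGETYP to TYPE
summary = counts.rename(columns={"IMAGETYP": "TYPE"})
print(summary)
```

The summary contains one row per type and gain combination that is actually present, so a type with no frames (here `FLATDARKS`) does not appear at all.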

### Usage
This function is used to create a data frame that summarises the number of each type of calibration frame for later use in the project.


### Import libraries and developed code functions

In [2]:
# Import only the helper this notebook uses, rather than a wildcard import
from utils import extract_fits_headers

### Define create_calibration_df function

In [5]:
def create_calibration_df(df):
    """
    Creates a DataFrame for calibration data based on specific IMAGETYPs and GAIN.
    
    Parameters:
    df (pandas.DataFrame): The DataFrame containing FITS header data.
    
    Returns:
    pandas.DataFrame: A DataFrame with columns 'TYPE', 'GAIN', and 'NUMBER'.
    """
    relevant_types = ['DARK', 'BIAS', 'FLAT', 'FLATDARKS']
    filtered_df = df[df['IMAGETYP'].isin(relevant_types)]
    group_counts = filtered_df.groupby(['IMAGETYP', 'GAIN']).size().reset_index(name='NUMBER')
    return group_counts.rename(columns={'IMAGETYP': 'TYPE'})
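
One behaviour of the grouping step worth knowing: `groupby` excludes rows whose key is missing by default, so a frame with no `GAIN` value would not be counted. Whether real headers ever lack `GAIN` is not established by this notebook; the sketch below uses made-up rows and only illustrates the pandas behaviour, with `dropna=False` (available in pandas 1.1 and later) as one possible way to keep such rows.

```python
import numpy as np
import pandas as pd

# Made-up rows: the third DARK frame has no GAIN value
toy = pd.DataFrame({
    "IMAGETYP": ["DARK", "DARK", "DARK"],
    "GAIN": [100.0, 100.0, np.nan],
})

# Default behaviour: the row with a missing GAIN is left out of the counts
default_counts = (
    toy.groupby(["IMAGETYP", "GAIN"]).size().reset_index(name="NUMBER")
)

# dropna=False keeps missing keys as their own group
kept_counts = (
    toy.groupby(["IMAGETYP", "GAIN"], dropna=False).size().reset_index(name="NUMBER")
)

print(default_counts["NUMBER"].sum(), kept_counts["NUMBER"].sum())
```

Comparing the two totals against the number of filtered rows is a quick way to spot frames that were silently excluded.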


## Testing

### Supply the path to a directory containing multiple FITS files and run the header extraction function. The test directory has no hierarchy but contains Light, Dark, Flat and Bias frames.

In [14]:
path = '/home/steve/Desktop/AstroData/NGC 7822'
# Call the extraction function to create the header DataFrame.
# extract_fits_headers() was imported from utils.
headers_df = extract_fits_headers(path)

### Create the data frame that holds calibration frame information

In [7]:
calibration_df = create_calibration_df(headers_df)

### Display the data frame containing the calibration frame information, calibration_df

In [11]:
calibration_df

|   | TYPE | GAIN | NUMBER |
|---|------|------|--------|
| 0 | BIAS | 0    | 100    |
| 1 | BIAS | 100  | 100    |
| 2 | DARK | 0    | 50     |
| 3 | DARK | 100  | 50     |
| 4 | FLAT | 0    | 200    |
| 5 | FLAT | 100  | 150    |


### The results show that the function has successfully collected the calibration frame information from a directory with a flat hierarchy.

### The result should be the same for a directory containing subdirectories, because the collection function already handles them and this function only operates on the data frame that contains the collected header information. We will still test it on a hierarchical directory.

In [15]:
path = '/mnt/HDD_8TB/Preselected/Flaming Star Nebula Mosaic started 30th January 2023'
# Call the header extraction function
headers_df1 = extract_fits_headers(path)

calibration_df = create_calibration_df(headers_df1)

calibration_df

|   | TYPE | GAIN | NUMBER |
|---|------|------|--------|
| 0 | BIAS | 0    | 100    |
| 1 | BIAS | 100  | 100    |
| 2 | DARK | 0    | 50     |
| 3 | DARK | 100  | 50     |
| 4 | FLAT | 0    | 200    |
| 5 | FLAT | 100  | 150    |


### All looks fine. The function works with both hierarchical and flat folders.
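
The reason folder layout does not matter can be illustrated with a recursive file search: once every FITS file is found regardless of depth, everything downstream sees the same flat table of headers. The sketch below is not the `extract_fits_headers` implementation from `utils` (which this notebook does not show); `find_fits_files` is a hypothetical helper, and the list of file extensions is an assumption.

```python
import tempfile
from pathlib import Path

FITS_SUFFIXES = {".fits", ".fit", ".fts"}  # assumed set of FITS file extensions


def find_fits_files(root):
    """Hypothetical helper: list FITS files under root at any directory depth."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in FITS_SUFFIXES
    )


# Build a throwaway directory with one top-level file and one nested file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "bias_001.fits").touch()
    (root / "Darks" / "gain100").mkdir(parents=True)
    (root / "Darks" / "gain100" / "dark_001.fits").touch()
    (root / "notes.txt").touch()  # non-FITS file, should be ignored
    found = [p.name for p in find_fits_files(root)]

print(found)
```

Both the top-level and the nested file are returned, while the text file is skipped, so a summary built from the resulting headers would not depend on how the frames were organised on disk.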