## Function: `create_lights_df`

### Description
This function creates a DataFrame specifically for 'LIGHT' type data extracted from FITS header data. It replaces filter names with corresponding codes and aggregates specific columns for analysis. The function is designed to work with multiple DataFrames, integrating them to form a comprehensive dataset for 'LIGHT' type astronomical observations.

### Parameters
- `df` (`pandas.DataFrame`): DataFrame containing FITS header data.
- `calibration_df` (`pandas.DataFrame`): DataFrame containing calibration data.
- `filters_df` (`pandas.DataFrame`): DataFrame containing filter codes.
- `bortle` (`int`): Bortle scale value indicating the darkness of the sky.
- `mean_sqm` (`float`): Mean SQM (Sky Quality Meter) reading, representing sky brightness.
- `mean_fwhm` (`float`): Mean FWHM (Full Width at Half Maximum) value, indicating how sharp the observed stars are.

### Returns
- `pandas.DataFrame`: A DataFrame with organized 'LIGHT' type data. It includes columns such as date, filter code, gain, binning, calibration counts (darks, flats, bias, flatDarks), sensor cooling, mean sqm, mean fwhm, and temperature.

### Functionality
1. **Filter Data**: Selects only 'LIGHT' type data from the input FITS headers DataFrame.
2. **Data Transformation**: Converts dates, aligns gain values, and merges with the filters DataFrame to replace filter names with codes.
3. **Aggregation**: Groups data by date, filter code, gain, and binning, calculating mean values and counts as necessary.
4. **Calibration Data Integration**: Adds the frame counts for darks, flats, bias, and flatDarks by matching each row's gain against the calibration DataFrame.
5. **Additional Data Integration**: Adds the provided mean sqm and mean fwhm values. The bortle value is also assigned, but it is not part of the final column order, so it does not appear in the returned DataFrame.
6. **Output Formatting**: Rounds sensor cooling to a whole number and temperature to one decimal place, then returns the DataFrame in the specified column order.
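
The first three steps can be sketched on a small hand-made DataFrame. This is only an illustration: the header values, the filter names, and the codes `1001` and `1002` are invented, and the real function works on the output of the header extraction step rather than on data typed in by hand.

```python
import pandas as pd

# Toy stand-in for the FITS header DataFrame (all values invented for illustration)
headers = pd.DataFrame({
    "IMAGETYP": ["LIGHT", "LIGHT", "LIGHT", "DARK"],
    "DATE-LOC": ["2023-08-15T22:10:00", "2023-08-15T22:21:00",
                 "2023-08-16T23:05:00", "2023-08-15T21:00:00"],
    "FILTER": ["Ha", "Ha", "OIII", "Ha"],
    "GAIN": [100, 100, 100, 100],
    "XBINNING": [1, 1, 1, 1],
    "EXPOSURE": [600.0, 600.0, 600.0, 600.0],
    "CCD-TEMP": [-10.1, -9.9, -10.0, -10.0],
    "FOCTEMP": [10.5, 10.9, 12.0, 11.0],
})

# Hypothetical filter table: the names and codes are placeholders
filter_codes = pd.DataFrame({"filter": ["Ha", "OIII"], "code": [1001, 1002]})

# Step 1: keep only the light frames
light = headers[headers["IMAGETYP"] == "LIGHT"].copy()

# Step 2: derive the date and attach the filter code
light["date"] = pd.to_datetime(light["DATE-LOC"]).dt.date
light = light.merge(filter_codes, left_on="FILTER", right_on="filter")

# Step 3: one row per (date, code, gain, binning) with a frame count and means
summary = (
    light.groupby(["date", "code", "GAIN", "XBINNING"])
    .agg(
        number=("DATE-LOC", "count"),
        duration=("EXPOSURE", "mean"),
        sensorCooling=("CCD-TEMP", "mean"),
        temperature=("FOCTEMP", "mean"),
    )
    .reset_index()
)
print(summary)
```

The named-aggregation form of `agg` (`new_name=(column, function)`) is what gives the output columns their final names in a single step.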

### Usage
The `create_lights_df` function is essential for astronomers and astrophotographers looking to analyze their observational data effectively. It simplifies the process of organizing and preparing data for detailed analysis or for uploading to platforms like AstroBin.
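
As a minimal sketch of the export step, the returned DataFrame can be written to CSV with pandas. The one-row DataFrame below is a made-up placeholder, and the exact file layout an upload platform expects should be checked against that platform's documentation.

```python
import io
import pandas as pd

# Placeholder for the DataFrame returned by create_lights_df (values invented)
lights = pd.DataFrame({
    "date": ["2023-08-15"],
    "filter": [1001],
    "number": [9],
    "duration": [600.0],
})

# Write to an in-memory buffer here; pass a file path instead to write to disk
buffer = io.StringIO()
lights.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `index=False` keeps the row index out of the file, which avoids an extra unnamed column when the CSV is read back.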


### Import libraries and developed code functions

In [11]:
import pandas as pd  # used directly below; imported explicitly rather than relying on utils
from utils import *

### Define create_lights_df function

In [190]:
def create_lights_df(df, calibration_df, filters_df, bortle, mean_sqm, mean_fwhm):
    """
    Creates a DataFrame for 'LIGHT' type data with specified columns, replacing filter names with codes.
    
    Parameters:
    df (pandas.DataFrame): The DataFrame containing FITS header data.
    calibration_df (pandas.DataFrame): The DataFrame containing calibration data.
    filters_df (pandas.DataFrame): The DataFrame containing filter codes.
    bortle (int): Bortle scale value.
    mean_sqm (float): Mean sqm value.
    mean_fwhm (float): Mean FWHM value.
    
    Returns:
    pandas.DataFrame: A DataFrame with specified columns for 'LIGHT' type data.
    """


    # Keep only the light-frame headers from the main DataFrame
    light_df = df[df['IMAGETYP'] == 'LIGHT'].copy()
    # Extract the calendar date from the local date-time stamp
    light_df['date'] = pd.to_datetime(light_df['DATE-LOC']).dt.date
    # Cast gain to the dtype used in calibration_df so the values compare equal later
    light_df['gain'] = light_df['GAIN'].astype(calibration_df['GAIN'].dtype)
    # Rename the XBINNING column to 'binning'
    light_df.rename(columns={'XBINNING': 'binning'}, inplace=True)
    

    # Replace the filter names with AstroBin filter codes.
    # Merge light_df with filters_df by matching the 'FILTER' column in light_df
    # to the 'filter' column in filters_df. This is an inner merge, so frames whose
    # filter name is missing from filters_df are dropped.
    light_df = light_df.merge(filters_df, left_on='FILTER', right_on='filter')

    # Group light_df by 'date', 'code' (the filter code), 'gain', and 'binning',
    # then count the frames in each group (via 'DATE-LOC') and take
    # the mean of 'EXPOSURE', 'CCD-TEMP', and 'FOCTEMP'.

    aggregated = light_df.groupby(['date', 'code', 'gain', 'binning']).agg(
        number=('DATE-LOC', 'count'),
        duration=('EXPOSURE', 'mean'),
        sensorCooling=('CCD-TEMP', 'mean'),
        temperature=('FOCTEMP', 'mean')
    ).reset_index()
    # Round sensor cooling to a whole number and add the constant columns
    aggregated['sensorCooling'] = aggregated['sensorCooling'].round().astype(int)
    aggregated['flatDarks'] = 0  # placeholder; overwritten by the calibration lookup below
    aggregated['bortle'] = bortle  # not in column_order, so dropped by the final reindex
    aggregated['meanSqm'] = mean_sqm
    aggregated['meanFWHm'] = mean_fwhm
 

    # Sum the frame counts in calibration_df for one calibration type at the row's gain
    def get_calibration_data(row, cal_type):
        match = calibration_df[(calibration_df['TYPE'] == cal_type) & (calibration_df['GAIN'] == row['gain'])]
        return match['NUMBER'].sum() if not match.empty else 0

    # Add the calibration counts to the aggregated DataFrame
    aggregated['darks'] = aggregated.apply(get_calibration_data, cal_type='DARK', axis=1)
    aggregated['flats'] = aggregated.apply(get_calibration_data, cal_type='FLAT', axis=1)
    aggregated['bias'] = aggregated.apply(get_calibration_data, cal_type='BIAS', axis=1)
    aggregated['flatDarks'] = aggregated.apply(get_calibration_data, cal_type='FLATDARKS', axis=1)
    # sensorCooling is already a whole number; only temperature needs rounding here
    aggregated['temperature'] = aggregated['temperature'].round(1)
    # Rename the 'code' column to 'filter'
    aggregated.rename(columns={'code': 'filter'}, inplace=True)


    # Adjust the column order to match that required by AstroBin
    column_order = ['date', 'filter', 'number', 'duration', 'binning', 'gain', 
                    'sensorCooling', 'darks', 'flats', 'flatDarks', 'bias', 
                    'meanSqm', 'meanFWHm', 'temperature']

    return aggregated.reindex(columns=column_order)
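
The calibration lookup inside the function runs one filtered query per row and per calibration type. As a hedged alternative, not the method used above, the same counts could be looked up through a pivot table keyed on gain. The sketch assumes the `TYPE`, `GAIN`, and `NUMBER` columns that `create_lights_df` reads from `calibration_df`; the data values are invented.

```python
import pandas as pd

# Toy calibration summary (values invented); columns follow those used in create_lights_df
calibration = pd.DataFrame({
    "TYPE": ["DARK", "FLAT", "BIAS", "DARK"],
    "GAIN": [100, 100, 100, 0],
    "NUMBER": [50, 150, 100, 20],
})

# Toy aggregated light-frame table; only the gain column matters for the lookup
aggregated = pd.DataFrame({"gain": [100, 100, 0]})

# One row per gain, one column per calibration type, missing combinations set to 0
counts = calibration.pivot_table(
    index="GAIN", columns="TYPE", values="NUMBER", aggfunc="sum", fill_value=0
)

wanted = [("darks", "DARK"), ("flats", "FLAT"), ("bias", "BIAS"), ("flatDarks", "FLATDARKS")]
for column, cal_type in wanted:
    if cal_type in counts.columns:
        # Map each row's gain to the count for this type; unknown gains become 0
        aggregated[column] = aggregated["gain"].map(counts[cal_type]).fillna(0).astype(int)
    else:
        # No frames of this type were found at any gain
        aggregated[column] = 0

print(aggregated)
```

For a handful of rows the per-row `apply` is perfectly adequate; the pivot form mainly pays off when the aggregated table grows large.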

## Testing

Supply the path to a directory containing multiple FITS files and execute the function. The test directory has no hierarchy but contains Light, Dark, Flat and Bias frames. Run the header extraction function and the calibration collection function, then read the filters.csv file.

In [191]:
path = '/home/steve/Desktop/AstroData/NGC 7822'
filter_path ='/mnt/HDD_4TB/Code/Astronomy/AstroBinUploader/AstroBinUploader/filters.csv'
bortle = 4
mean_sqm = 20.4
mean_fwhm = 2.6
#Call extraction function to create header dataframe.
#extract_fits_headers() was imported from utils
headers_df = extract_fits_headers(path)
calibration_df = create_calibration_df(headers_df)
filters = pd.read_csv(filter_path)
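
Because `create_lights_df` joins on the filter name with pandas' default inner merge, light frames whose `FILTER` value is absent from the filters table are dropped without warning. A quick check along the following lines can flag such names before the function is called. The filter names and codes are invented, and only the `filter` and `code` column names are taken from the function above.

```python
import pandas as pd

# Toy light-frame headers and filter table (names and codes invented)
light = pd.DataFrame({"FILTER": ["Ha", "Ha", "SII", "L-Pro"]})
filter_codes = pd.DataFrame({"filter": ["Ha", "SII"], "code": [1001, 1003]})

# A left merge with indicator=True marks rows that found no partner in filter_codes
checked = light.merge(
    filter_codes, left_on="FILTER", right_on="filter", how="left", indicator=True
)
unmatched = sorted(checked.loc[checked["_merge"] == "left_only", "FILTER"].unique())

if unmatched:
    print(f"No code found for filters: {unmatched}")
```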


In [192]:
lights = create_lights_df(headers_df, calibration_df, filters, bortle, mean_sqm, mean_fwhm)

In [194]:
# Set the option to display all rows
pd.set_option('display.max_rows', None)
lights

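| | date | filter | number | duration | binning | gain | sensorCooling | darks | flats | flatDarks | bias | meanSqm | meanFWHm | temperature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-08-15 | 4663 | 9 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 10.7 |
| 1 | 2023-08-17 | 4663 | 19 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 13.3 |
| 2 | 2023-08-28 | 4663 | 2 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 11.8 |
| 3 | 2023-08-28 | 4844 | 7 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 10.4 |
| 4 | 2023-09-02 | 4663 | 16 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 13.6 |
| 5 | 2023-09-03 | 4663 | 14 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 11.3 |
| 6 | 2023-09-03 | 4844 | 16 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 15.9 |
| 7 | 2023-09-04 | 4844 | 27 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 13.7 |
| 8 | 2023-09-15 | 4752 | 33 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 12.7 |
| 9 | 2023-09-16 | 4752 | 25 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 11.8 |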


The results show that the function has successfully collected the calibration frame information from a directory with a flat hierarchy.

The result should be the same for a directory containing subdirectories, because the collection function handles the directory traversal and this function only operates on the DataFrame that contains the collected header information. Nevertheless, we will test it on a hierarchical directory.

In [197]:
path = '/mnt/HDD_8TB/Preselected/Flaming Star Nebula Mosaic started 30th January 2023'
filter_path ='/mnt/HDD_4TB/Code/Astronomy/AstroBinUploader/AstroBinUploader/filters.csv'
bortle = 4
mean_sqm = 20.4
mean_fwhm = 2.6
#Call extraction function to create header dataframe.
#extract_fits_headers() was imported from utils
headers_df = extract_fits_headers(path)
calibration_df = create_calibration_df(headers_df)
filters = pd.read_csv(filter_path)
lights = create_lights_df(headers_df, calibration_df, filters, bortle, mean_sqm, mean_fwhm)
lights

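| | date | filter | number | duration | binning | gain | sensorCooling | darks | flats | flatDarks | bias | meanSqm | meanFWHm | temperature |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-29 | 4663 | 6 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 3.2 |
| 1 | 2023-01-30 | 4663 | 1 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 4.4 |
| 2 | 2023-02-05 | 4663 | 23 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -2.4 |
| 3 | 2023-02-06 | 4663 | 6 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -3.0 |
| 4 | 2023-02-07 | 4663 | 42 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -3.3 |
| 5 | 2023-02-08 | 4663 | 13 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -4.8 |
| 6 | 2023-02-08 | 4844 | 14 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -0.5 |
| 7 | 2023-02-09 | 4844 | 28 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -1.6 |
| 8 | 2023-02-10 | 4844 | 13 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | -3.1 |
| 9 | 2023-02-13 | 4844 | 31 | 600.0 | 1 | 100 | -10 | 50 | 150 | 0 | 100 | 20.4 | 2.6 | 0.7 |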


All looks fine. The function works with both hierarchical and flat folders.