# JWST Image Gallery Organizer
**Astronomy 1221 - Project 2**  
**Authors:** Jacob Parzych & Niko Jamison  

---

## Project Overview
This project creates a browsable catalog of James Webb Space Telescope (JWST) observations with metadata organization and image visualization. We extract metadata from astronomical FITS files and organize them into Pandas DataFrames for analysis.

### Astronomy Context
The James Webb Space Telescope (JWST), launched in 2021, is the most powerful space telescope ever built, observing in infrared wavelengths. FITS (Flexible Image Transport System) is the standard astronomical image format containing both image data and extensive metadata in "headers." The `_i2d.fits` files we'll work with are processed, calibrated 2D images ready for scientific analysis.
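To make the idea of a FITS "header" concrete, here is a minimal sketch of how a single header card is laid out: an 8-character keyword, an optional `= ` value indicator, the value, and an optional comment after a `/`. The helper name `parse_card` is hypothetical and the parser is deliberately simplified (it ignores escaped quotes and continued strings). In the notebook itself, `astropy.io.fits` does this work.

```python
def parse_card(card):
    """Split one FITS header card into (keyword, value, comment).

    Simplified sketch: no support for '' escapes or CONTINUE cards.
    """
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY, blank) carry free text, not a value.
        return keyword, None, card[8:].strip()

    body = card[10:].strip()
    if body.startswith("'"):
        # Quoted string value: take everything up to the closing quote.
        end = body.index("'", 1)
        value = body[1:end].rstrip()
        comment = body[end + 1:].partition("/")[2].strip()
        return keyword, value, comment

    raw, _, comment = body.partition("/")
    raw, comment = raw.strip(), comment.strip()
    if raw == "T":
        value = True
    elif raw == "F":
        value = False
    elif raw == "":
        value = None
    else:
        try:
            value = int(raw)
        except ValueError:
            value = float(raw)
    return keyword, value, comment


print(parse_card("SIMPLE  =                    T / conforms to FITS standard"))
print(parse_card("TELESCOP= 'JWST    '           / Telescope used"))
```

The example cards are illustrative. The comments after `/` are not copied from a real JWST file.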

## Step 1: Install Required Packages
First, we need to install the necessary packages for downloading and working with JWST FITS files.

In [1]:
# Install required packages
!pip install astroquery astropy pandas matplotlib numpy tqdm



## Step 2: Import Libraries
Import all necessary Python libraries for data manipulation, FITS file handling, and visualization.

In [2]:
# Core data manipulation and analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
import os
import warnings
warnings.filterwarnings('ignore')

# Astronomical data handling
from astropy.io import fits

print("All libraries imported successfully!")
print(f"Pandas version: {pd.__version__}")
print("Astropy installed and ready for FITS file handling")

All libraries imported successfully!
Pandas version: 2.3.3
Astropy installed and ready for FITS file handling


## Step 3: Data Acquisition - Downloading JWST FITS Files

We'll use two approaches to download JWST data:
1. **Programmatic Download** from the MAST Archive (demonstrates automation and reproducibility)
2. **Manual Download Instructions** for understanding the MAST Archive interface

### Approach 1: Programmatic Download

We'll download calibrated `_i2d.fits` files for several famous targets. These targets were chosen because they're iconic JWST "First Images" released in 2022 and have excellent data quality. Although `astroquery` is installed in Step 1, the cells below fetch the files by direct URL with `urllib.request`.

In [3]:
# Create directory for storing FITS files
data_dir = './jwst_data'
os.makedirs(data_dir, exist_ok=True)

print(f"Data directory created: {data_dir}")
print(f"Current working directory: {os.getcwd()}")

Data directory created: ./jwst_data
Current working directory: /Users/jacobparzych/astron1221/Astron_1221_JWST_Image_Gallery


### Download Strategy: Direct URLs from MAST Archive

Since MAST API queries can be unreliable, we'll use direct URLs to download specific JWST Early Release Observation (ERO) images. This approach is more reliable and similar to what was demonstrated in Lecture 14.
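Every direct URL used in this notebook has the same shape, so it can be built from the filename alone. A minimal sketch follows. The helper name `mast_product_url` is hypothetical, and the pattern is inferred only from the URLs listed in the download cell, not from MAST documentation.

```python
# Base endpoint as it appears in the URLs used in this notebook
MAST_DOWNLOAD = "https://mast.stsci.edu/api/v0.1/Download/file"


def mast_product_url(filename):
    """Build a direct-download URL for a JWST product filename."""
    return f"{MAST_DOWNLOAD}?uri=mast:JWST/product/{filename}"


print(mast_product_url("jw02731001001_02101_00001_nrcb1_i2d.fits"))
```

Generating the URL this way would avoid repeating the long string for every file and remove one place where a filename and its URL could drift apart.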

We'll download `_i2d.fits` files, which are:
- **Level 3 calibrated data products**
- **2D images** that have been fully processed and calibrated
- **Science-ready** for analysis
- The standard format used by professional astronomers

In [4]:
import urllib.request
import gzip
import shutil
from tqdm import tqdm

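# Define JWST images to download from MAST archive
# These are specific JWST Early Release Observations (ERO) from famous targets
# NOTE: "target" and "filter" are hand-entered labels, not read from the files.
# The FITS headers extracted in Phase 2 are authoritative and disagree in places
# (e.g. the jw02732 files report NGC 7320, and the jw02731 files report F187N).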
jwst_files = {
    # Southern Ring Nebula (NGC 3132) - NIRCam images
    "jw02732001001_02101_00001_nrcb1_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02732001001_02101_00001_nrcb1_i2d.fits",
        "target": "NGC 3132 (Southern Ring Nebula)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02732001001_02101_00002_nrcb2_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02732001001_02101_00002_nrcb2_i2d.fits",
        "target": "NGC 3132 (Southern Ring Nebula)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02732001001_02101_00001_nrcb3_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02732001001_02101_00001_nrcb3_i2d.fits",
        "target": "NGC 3132 (Southern Ring Nebula)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    # Carina Nebula (NGC 3324) - NIRCam images  
    "jw02731001001_02101_00001_nrcb1_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02731001001_02101_00001_nrcb1_i2d.fits",
        "target": "NGC 3324 (Carina Nebula)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02731001001_02101_00001_nrcb2_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02731001001_02101_00001_nrcb2_i2d.fits",
        "target": "NGC 3324 (Carina Nebula)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    # Stephan's Quintet - NIRCam images
    "jw02733001001_02101_00001_nrcb1_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733001001_02101_00001_nrcb1_i2d.fits",
        "target": "Stephan's Quintet",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02733001001_02101_00001_nrcb2_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733001001_02101_00001_nrcb2_i2d.fits",
        "target": "Stephan's Quintet",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02733001001_02101_00001_nrcb3_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02733001001_02101_00001_nrcb3_i2d.fits",
        "target": "Stephan's Quintet",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    # SMACS 0723 (Deep Field) - NIRCam images
    "jw02736001001_02101_00001_nrcb1_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02736001001_02101_00001_nrcb1_i2d.fits",
        "target": "SMACS 0723 (Deep Field)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02736001001_02101_00001_nrcb2_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02736001001_02101_00001_nrcb2_i2d.fits",
        "target": "SMACS 0723 (Deep Field)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02736001001_02101_00001_nrcb3_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02736001001_02101_00001_nrcb3_i2d.fits",
        "target": "SMACS 0723 (Deep Field)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
    "jw02736001001_02101_00001_nrcb4_i2d.fits": {
        "url": "https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:JWST/product/jw02736001001_02101_00001_nrcb4_i2d.fits",
        "target": "SMACS 0723 (Deep Field)",
        "instrument": "NIRCam",
        "filter": "F090W"
    },
}

print(f"Prepared {len(jwst_files)} JWST images for download from {len(set(info['target'] for info in jwst_files.values()))} different targets")
print("\nTarget breakdown:")
targets_count = {}
for info in jwst_files.values():
    target = info['target']
    targets_count[target] = targets_count.get(target, 0) + 1

for target, count in targets_count.items():
    print(f"  - {target}: {count} images")

Prepared 12 JWST images for download from 4 different targets

Target breakdown:
  - NGC 3132 (Southern Ring Nebula): 3 images
  - NGC 3324 (Carina Nebula): 2 images
  - Stephan's Quintet: 3 images
  - SMACS 0723 (Deep Field): 4 images
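The filenames themselves carry information that can be cross-checked against the hand-entered labels. The sketch below assumes the exposure-level JWST naming pattern `jw<ppppp><ooo><vvv>_<gg><s><aa>_<eeeee>_<detector>_<suffix>.fits` (program, observation, visit, visit group, parallel sequence, activity, exposure). The function name `parse_jwst_filename` is hypothetical.

```python
import re

# Assumed exposure-level naming pattern (see lead-in); not taken from this notebook
JWST_NAME = re.compile(
    r"^jw(?P<program>\d{5})(?P<observation>\d{3})(?P<visit>\d{3})"
    r"_(?P<visit_group>\d{2})(?P<parallel_seq>\d)(?P<activity>[0-9a-z]{2})"
    r"_(?P<exposure>\d{5})_(?P<detector>[a-z0-9]+)_(?P<suffix>[a-z0-9]+)\.fits$"
)


def parse_jwst_filename(filename):
    """Return the filename's fields as a dict, or None if it does not match."""
    match = JWST_NAME.match(filename)
    return match.groupdict() if match else None


info = parse_jwst_filename("jw02731001001_02101_00001_nrcb1_i2d.fits")
print(info["program"], info["detector"], info["suffix"])
```

Grouping files by the parsed `program` field is one way to confirm that each entry in `jwst_files` is filed under the intended target.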


### Download FITS Files

Now we'll download each FITS file using `urllib.request`. This is the same approach used in Lecture 14 for downloading JWST data. The function checks if files already exist to avoid re-downloading.
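One caveat with a skip-if-exists check is that an interrupted download can leave a partial file that looks finished on the next run. A minimal sketch of one way around this, not the notebook's own function, is to download to a temporary name and rename only on success. The name `fetch_once` is hypothetical.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def fetch_once(url, dest):
    """Download `url` to `dest` unless `dest` exists; return True if a download happened."""
    dest = Path(dest)
    if dest.exists():
        return False

    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory so the final rename is atomic
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    os.close(fd)
    try:
        urllib.request.urlretrieve(url, tmp_name)
        os.replace(tmp_name, dest)
    finally:
        # If the download failed, remove the leftover partial file
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
    return True
```

With this pattern, a file at `dest` always represents a download that ran to completion, so the existence check can be trusted.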

In [5]:
def download_file(url, filename, data_dir):
    """
    Download a file from a URL to a local directory with progress bar.
    
    Parameters:
    -----------
    url : str
        URL to download from
    filename : str
        Local filename to save as
    data_dir : str
        Directory to save file in
    
    Returns:
    --------
    bool
        True if download succeeded, False otherwise
    """
    filepath = os.path.join(data_dir, filename)
    
    # Check if file already exists
    if os.path.exists(filepath):
        filesize_mb = os.path.getsize(filepath) / (1024 * 1024)
        print(f"  ✓ {filename} already exists ({filesize_mb:.1f} MB)")
        return True
    
    try:
        # Download with progress bar (similar to Lecture 14)
        print(f"  Downloading {filename}...")
        
        class DownloadProgressBar(tqdm):
            def update_to(self, b=1, bsize=1, tsize=None):
                if tsize is not None:
                    self.total = tsize
                self.update(b * bsize - self.n)
        
        with DownloadProgressBar(unit='B', unit_scale=True, miniters=1, desc=filename) as t:
            urllib.request.urlretrieve(url, filepath, reporthook=t.update_to)
        
        filesize_mb = os.path.getsize(filepath) / (1024 * 1024)
        print(f"  ✓ Downloaded {filename} ({filesize_mb:.1f} MB)")
        return True
        
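    except Exception as e:
        # Remove any partial file so the next run doesn't treat it as already downloaded
        if os.path.exists(filepath):
            os.remove(filepath)
        print(f"  ✗ Error downloading {filename}: {e}")
        return False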

# Download all FITS files
print("Starting JWST FITS file downloads...")
print("="*60)

downloaded_files = []
failed_files = []

for filename, info in jwst_files.items():
    print(f"\n{info['target']} - {info['instrument']} - {info['filter']}")
    success = download_file(info['url'], filename, data_dir)
    
    if success:
        downloaded_files.append(filename)
    else:
        failed_files.append(filename)

print("\n" + "="*60)
print("Download Summary:")
print(f"  ✓ Successfully downloaded/found: {len(downloaded_files)} files")
if failed_files:
    print(f"  ✗ Failed downloads: {len(failed_files)} files")
print("="*60)

Starting JWST FITS file downloads...

NGC 3132 (Southern Ring Nebula) - NIRCam - F090W
  ✓ jw02732001001_02101_00001_nrcb1_i2d.fits already exists (113.0 MB)

NGC 3132 (Southern Ring Nebula) - NIRCam - F090W
  ✓ jw02732001001_02101_00002_nrcb2_i2d.fits already exists (113.2 MB)

NGC 3132 (Southern Ring Nebula) - NIRCam - F090W
  ✓ jw02732001001_02101_00001_nrcb3_i2d.fits already exists (113.2 MB)

NGC 3324 (Carina Nebula) - NIRCam - F090W
  ✓ jw02731001001_02101_00001_nrcb1_i2d.fits already exists (113.0 MB)

NGC 3324 (Carina Nebula) - NIRCam - F090W
  ✓ jw02731001001_02101_00001_nrcb2_i2d.fits already exists (113.2 MB)

Stephan's Quintet - NIRCam - F090W
  ✓ jw02733001001_02101_00001_nrcb1_i2d.fits already exists (113.0 MB)

Stephan's Quintet - NIRCam - F090W
  ✓ jw02733001001_02101_00001_nrcb2_i2d.fits already exists (113.2 MB)

Stephan's Quintet - NIRCam - F090W
  ✓ jw02733001001_02101_00001_nrcb3_i2d.fits already exists (113.2 MB)

SMACS 0723 (Deep Field) - NIRCam - F090W
  ✓ jw027

### Verify Downloaded Files

Let's verify that our FITS files were downloaded successfully by scanning the data directory and displaying file information.
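Beyond listing names and sizes, a cheap structural check can catch truncated downloads. FITS files are written in 2880-byte blocks and begin with a `SIMPLE` card, so a sketch like the following flags files that cannot be valid. The helper `looks_like_fits` is a hypothetical name, and passing this check does not prove the file is intact.

```python
import os

FITS_BLOCK = 2880  # FITS files are organized in 2880-byte blocks


def looks_like_fits(path):
    """Quick sanity check: non-empty, whole number of blocks, starts with SIMPLE."""
    size = os.path.getsize(path)
    if size == 0 or size % FITS_BLOCK != 0:
        return False
    with open(path, "rb") as fh:
        # The primary header's first card is always the SIMPLE keyword
        return fh.read(9) == b"SIMPLE  ="
```

It could be applied to each path in the inventory, for example `[p for p in all_fits_files if not looks_like_fits(p)]`, to list suspicious files before any headers are read.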

In [6]:
# Scan for all FITS files in the data directory
all_fits_files = []
for root, dirs, files in os.walk(data_dir):
    for file in files:
        if file.endswith('.fits'):
            all_fits_files.append(os.path.join(root, file))

print(f"Found {len(all_fits_files)} FITS files in {data_dir}/")
print("\n" + "="*60)
print("File Inventory:")
print("="*60)

total_size_mb = 0
for i, filepath in enumerate(sorted(all_fits_files), 1):
    filename = os.path.basename(filepath)
    filesize_mb = os.path.getsize(filepath) / (1024 * 1024)
    total_size_mb += filesize_mb
    
    # Get target info if available
    if filename in jwst_files:
        target = jwst_files[filename]['target']
        print(f"{i:2d}. {filename:50s} | {filesize_mb:6.1f} MB | {target}")
    else:
        print(f"{i:2d}. {filename:50s} | {filesize_mb:6.1f} MB")

print("="*60)
print(f"Total data: {total_size_mb:.1f} MB ({total_size_mb/1024:.2f} GB)")
print("="*60)

Found 12 FITS files in ./jwst_data/

File Inventory:
 1. jw02731001001_02101_00001_nrcb1_i2d.fits           |  113.0 MB | NGC 3324 (Carina Nebula)
 2. jw02731001001_02101_00001_nrcb2_i2d.fits           |  113.2 MB | NGC 3324 (Carina Nebula)
 3. jw02732001001_02101_00001_nrcb1_i2d.fits           |  113.0 MB | NGC 3132 (Southern Ring Nebula)
 4. jw02732001001_02101_00001_nrcb3_i2d.fits           |  113.2 MB | NGC 3132 (Southern Ring Nebula)
 5. jw02732001001_02101_00002_nrcb2_i2d.fits           |  113.2 MB | NGC 3132 (Southern Ring Nebula)
 6. jw02733001001_02101_00001_nrcb1_i2d.fits           |  113.0 MB | Stephan's Quintet
 7. jw02733001001_02101_00001_nrcb2_i2d.fits           |  113.2 MB | Stephan's Quintet
 8. jw02733001001_02101_00001_nrcb3_i2d.fits           |  113.2 MB | Stephan's Quintet
 9. jw02736001001_02101_00001_nrcb1_i2d.fits           |  113.0 MB | SMACS 0723 (Deep Field)
10. jw02736001001_02101_00001_nrcb2_i2d.fits           |  113.2 MB | SMACS 0723 (Deep Field)
11. jw027

---

## ✅ Data Acquisition Complete!

We have successfully downloaded JWST `_i2d.fits` files from the MAST Archive. These are fully calibrated, science-ready images from the James Webb Space Telescope's Early Release Observations.

### What We Have:
- **Southern Ring Nebula (NGC 3132)**: Multiple NIRCam pointings showing this planetary nebula
- **Carina Nebula (NGC 3324)**: Star-forming region with iconic "Cosmic Cliffs"
- **Stephan's Quintet**: Interacting galaxy group
- **SMACS 0723 (Deep Field)**: Galaxy cluster deep field

### Next Steps:
Now that we have the data, the next phases of the project will be:
1. **Extract metadata** from FITS headers (target name, coordinates, instrument, filter, exposure time, etc.)
2. **Organize into Pandas DataFrame** for easy manipulation and analysis
3. **Create catalog features** (filtering, sorting, summary statistics)
4. **Visualize the data** (plots, analysis dashboard)
5. **Display FITS images** (optional but recommended)

The data acquisition phase is complete. Time to start analyzing!

---

## 📊 Project Status

**Phase 1: Data Acquisition** ✅ **COMPLETE**

We have successfully:
- Set up the project environment and imported necessary libraries
- Created a data directory structure
- Downloaded 12 JWST `_i2d.fits` files using direct URLs from MAST
- Verified all files and confirmed successful downloads

**Total Data**: ~1,358 MB (1.33 GB) of JWST observations from 4 famous targets

### Ready for Next Phase!
The FITS files are now ready for:
1. Metadata extraction
2. DataFrame organization  
3. Catalog creation
4. Analysis and visualization

To continue the project, add new cells below to extract and analyze the FITS metadata.

---

## Phase 2: Metadata Extraction and DataFrame Organization

Now that we have the FITS files downloaded, let's extract important metadata from the FITS headers and organize it into a Pandas DataFrame for easy analysis.
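Because the same quantity can live under different keywords, header lookups usually need fallbacks. A small helper can express "try these keywords in order" more readably than nested `.get()` calls. This is a sketch with a hypothetical name, `first_present`, and it works on any mapping that supports `in` and `[]`, which includes plain dicts and `astropy` headers.

```python
def first_present(header, keys, default=None):
    """Return the value of the first keyword in `keys` found in `header`."""
    for key in keys:
        if key in header:
            return header[key]
    return default


# Stand-in for a FITS header; the value is taken from the catalog output in this notebook
demo_header = {"EXPTIME": 289.893}
print(first_present(demo_header, ["EFFEXPTM", "EXPTIME", "XPOSURE"], 0.0))
```

Only the keyword order is a design decision here. Listing the most specific keyword first means it wins whenever it is present.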

In [15]:
def extract_fits_metadata(filepath):
    """
    Extract important metadata from a FITS file header.
    
    Parameters:
    -----------
    filepath : str
        Path to the FITS file
        
    Returns:
    --------
    dict
        Dictionary containing extracted metadata
    """
    try:
        with fits.open(filepath) as hdul:
            header = hdul[0].header  # Primary header for _i2d.fits files
            
            # For _i2d.fits files, check if there's a SCI extension with additional info
            sci_header = None
            if len(hdul) > 1:
                for ext in hdul[1:]:
                    if hasattr(ext, 'name') and ext.name == 'SCI':
                        sci_header = ext.header
                        break
            
            # Extract key metadata fields with fallback options
            # Use sci_header for certain fields if available, otherwise use primary header
            working_header = sci_header if sci_header is not None else header
            
            # Get image dimensions - try SCI extension first, then primary
            naxis1 = working_header.get('NAXIS1', header.get('NAXIS1', 0))
            naxis2 = working_header.get('NAXIS2', header.get('NAXIS2', 0))
            
            # If still 0, try to get from the actual data shape
            if naxis1 == 0 or naxis2 == 0:
                if len(hdul) > 1 and hasattr(hdul[1], 'data') and hdul[1].data is not None:
                    if len(hdul[1].data.shape) >= 2:
                        naxis2, naxis1 = hdul[1].data.shape[-2:]  # Last two dimensions
            
            metadata = {
                'filename': os.path.basename(filepath),
                'filepath': filepath,
                # Target information - try multiple possible keywords
                'target': header.get('TARGNAME', header.get('TARGPROP', header.get('OBJECT', 'Unknown'))),
                # Instrument information
                'instrument': header.get('INSTRUME', 'Unknown'),
                'detector': header.get('DETECTOR', 'Unknown'),
                'filter': header.get('FILTER', header.get('FILTNAM1', 'Unknown')),
                'pupil': header.get('PUPIL', header.get('PUPILNAM', 'N/A')),
                # Exposure information
                'exposure_time': header.get('EFFEXPTM', header.get('EXPTIME', header.get('XPOSURE', 0.0))),
                # Coordinates
                'ra': working_header.get('CRVAL1', header.get('RA_TARG', header.get('TARG_RA', None))),
                'dec': working_header.get('CRVAL2', header.get('DEC_TARG', header.get('TARG_DEC', None))),
                # Observation metadata
                'date_obs': header.get('DATE-OBS', header.get('DATE_OBS', 'Unknown')),
                'program_id': header.get('PROGRAM', header.get('PROPOSID', 'Unknown')),
                'observation_id': header.get('OBSERVTN', header.get('OBS_ID', 'Unknown')),
                'visit_id': header.get('VISIT', header.get('VISIT_ID', 'Unknown')),
                # Image properties
                'image_shape': f"{naxis1} x {naxis2}",
                'file_size_mb': os.path.getsize(filepath) / (1024 * 1024),
            }
            
            return metadata
            
    except Exception as e:
        print(f"Error reading {filepath}: {e}")
        return None

# Extract metadata from all FITS files
print("Extracting metadata from FITS files...")
print("="*60)

metadata_list = []
for filepath in sorted(all_fits_files):
    print(f"Processing: {os.path.basename(filepath)}")
    metadata = extract_fits_metadata(filepath)
    if metadata:
        metadata_list.append(metadata)

print("="*60)
print(f"Successfully extracted metadata from {len(metadata_list)} files")
print("="*60)

Extracting metadata from FITS files...
Processing: jw02731001001_02101_00001_nrcb1_i2d.fits
Processing: jw02731001001_02101_00001_nrcb2_i2d.fits
Processing: jw02732001001_02101_00001_nrcb1_i2d.fits
Processing: jw02732001001_02101_00001_nrcb3_i2d.fits
Processing: jw02732001001_02101_00002_nrcb2_i2d.fits
Processing: jw02733001001_02101_00001_nrcb1_i2d.fits
Processing: jw02733001001_02101_00001_nrcb2_i2d.fits
Processing: jw02733001001_02101_00001_nrcb3_i2d.fits
Processing: jw02736001001_02101_00001_nrcb1_i2d.fits
Processing: jw02736001001_02101_00001_nrcb2_i2d.fits
Processing: jw02736001001_02101_00001_nrcb3_i2d.fits
Processing: jw02736001001_02101_00001_nrcb4_i2d.fits
Successfully extracted metadata from 12 files


### Create Pandas DataFrame

Now let's organize the extracted metadata into a Pandas DataFrame for easy manipulation and analysis.

In [16]:
# Create DataFrame from metadata list
df = pd.DataFrame(metadata_list)

# Display basic info
print(f"Created DataFrame with {len(df)} rows and {len(df.columns)} columns")
print("\n" + "="*60)
print("DataFrame Columns:")
print("="*60)
for col in df.columns:
    print(f"  - {col}")

print("\n" + "="*60)
print("First few rows:")
print("="*60)
df.head()

Created DataFrame with 12 rows and 16 columns

DataFrame Columns:
  - filename
  - filepath
  - target
  - instrument
  - detector
  - filter
  - pupil
  - exposure_time
  - ra
  - dec
  - date_obs
  - program_id
  - observation_id
  - visit_id
  - image_shape
  - file_size_mb

First few rows:


| | filename | filepath | target | instrument | detector | filter | pupil | exposure_time | ra | dec | date_obs | program_id | observation_id | visit_id | image_shape | file_size_mb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | jw02731001001_02101_00001_nrcb1_i2d.fits | ./jwst_data/jw02731001001_02101_00001_nrcb1_i2... | NGC 3324 | NIRCAM | NRCB1 | F187N | CLEAR | 289.893 | 159.281828 | -58.574776 | 2022-06-03 | 2731 | 1 | 1 | 2058 x 2055 | 113.008118 |
| 1 | jw02731001001_02101_00001_nrcb2_i2d.fits | ./jwst_data/jw02731001001_02101_00001_nrcb2_i2... | NGC 3324 | NIRCAM | NRCB2 | F187N | CLEAR | 289.893 | 159.246643 | -58.570494 | 2022-06-03 | 2731 | 1 | 1 | 2058 x 2058 | 113.181152 |
| 2 | jw02732001001_02101_00001_nrcb1_i2d.fits | ./jwst_data/jw02732001001_02101_00001_nrcb1_i2... | NGC 7320 | NIRCAM | NRCB1 | F090W | CLEAR | 236.209 | 338.983893 | 33.892724 | 2022-06-11 | 2732 | 1 | 1 | 2057 x 2055 | 112.969666 |
| 3 | jw02732001001_02101_00001_nrcb3_i2d.fits | ./jwst_data/jw02732001001_02101_00001_nrcb3_i2... | NGC 7320 | NIRCAM | NRCB3 | F090W | CLEAR | 236.209 | 338.973199 | 33.909377 | 2022-06-11 | 2732 | 1 | 1 | 2058 x 2058 | 113.181152 |
| 4 | jw02732001001_02101_00002_nrcb2_i2d.fits | ./jwst_data/jw02732001001_02101_00002_nrcb2_i2... | NGC 7320 | NIRCAM | NRCB2 | F090W | CLEAR | 236.209 | 339.000753 | 33.900149 | 2022-06-11 | 2732 | 1 | 1 | 2058 x 2058 | 113.181152 |


### Display Full DataFrame

Let's view the complete DataFrame with all metadata organized and readable.

In [17]:
# Display the full DataFrame with better formatting
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print("="*80)
print("JWST Image Gallery - Complete Metadata Catalog")
print("="*80)
print(f"\nTotal Images: {len(df)}")
print(f"Total Data Size: {df['file_size_mb'].sum():.1f} MB")
print(f"\nTargets: {df['target'].nunique()}")
print(f"Instruments: {df['instrument'].nunique()}")
print(f"Filters: {df['filter'].nunique()}")
print("\n" + "="*80)

# Display selected columns for readability
display_cols = ['filename', 'target', 'instrument', 'detector', 'filter', 'exposure_time', 'image_shape', 'file_size_mb']
print("\nKey Metadata:")
print(df[display_cols].to_string(index=True))
print("="*80)

JWST Image Gallery - Complete Metadata Catalog

Total Images: 12
Total Data Size: 1357.7 MB

Targets: 4
Instruments: 1
Filters: 3


Key Metadata:
                                    filename              target instrument detector filter  exposure_time  image_shape  file_size_mb
0   jw02731001001_02101_00001_nrcb1_i2d.fits            NGC 3324     NIRCAM    NRCB1  F187N        289.893  2058 x 2055    113.008118
1   jw02731001001_02101_00001_nrcb2_i2d.fits            NGC 3324     NIRCAM    NRCB2  F187N        289.893  2058 x 2058    113.181152
2   jw02732001001_02101_00001_nrcb1_i2d.fits            NGC 7320     NIRCAM    NRCB1  F090W        236.209  2057 x 2055    112.969666
3   jw02732001001_02101_00001_nrcb3_i2d.fits            NGC 7320     NIRCAM    NRCB3  F090W        236.209  2058 x 2058    113.181152
4   jw02732001001_02101_00002_nrcb2_i2d.fits            NGC 7320     NIRCAM    NRCB2  F090W        236.209  2058 x 2058    113.181152
5   jw02733001001_02101_00001_nrcb1_i2d.fits      

### Summary Statistics

Let's analyze the catalog with summary statistics grouped by target and by detector, followed by overall totals.
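A per-target summary of this kind can also be written with pandas named aggregation, which sets the output column names in the same call instead of renaming afterwards. The sketch below uses a small stand-in frame, `df_demo`, built from five rows shown in the catalog above. It is an alternative phrasing, not the notebook's own cell.

```python
import pandas as pd

# Stand-in for the catalog: five rows copied from the metadata table above
df_demo = pd.DataFrame([
    {"filename": "jw02731001001_02101_00001_nrcb1_i2d.fits", "target": "NGC 3324",
     "exposure_time": 289.893, "file_size_mb": 113.008118},
    {"filename": "jw02731001001_02101_00001_nrcb2_i2d.fits", "target": "NGC 3324",
     "exposure_time": 289.893, "file_size_mb": 113.181152},
    {"filename": "jw02732001001_02101_00001_nrcb1_i2d.fits", "target": "NGC 7320",
     "exposure_time": 236.209, "file_size_mb": 112.969666},
    {"filename": "jw02732001001_02101_00001_nrcb3_i2d.fits", "target": "NGC 7320",
     "exposure_time": 236.209, "file_size_mb": 113.181152},
    {"filename": "jw02732001001_02101_00002_nrcb2_i2d.fits", "target": "NGC 7320",
     "exposure_time": 236.209, "file_size_mb": 113.181152},
])

# Named aggregation: output_column=(input_column, aggregation_function)
summary = df_demo.groupby("target").agg(
    num_images=("filename", "count"),
    avg_exposure_time=("exposure_time", "mean"),
    total_size_mb=("file_size_mb", "sum"),
)
print(summary)
```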

In [18]:
# Summary statistics by target
print("="*60)
print("Images by Target:")
print("="*60)
target_summary = df.groupby('target').agg({
    'filename': 'count',
    'exposure_time': 'mean',
    'file_size_mb': 'sum'
}).rename(columns={
    'filename': 'num_images',
    'exposure_time': 'avg_exposure_time',
    'file_size_mb': 'total_size_mb'
})
print(target_summary)

print("\n" + "="*60)
print("Images by Detector:")
print("="*60)
detector_summary = df['detector'].value_counts()
print(detector_summary)

print("\n" + "="*60)
print("Overall Statistics:")
print("="*60)
print(f"Total images: {len(df)}")
print(f"Total data size: {df['file_size_mb'].sum():.2f} MB ({df['file_size_mb'].sum()/1024:.2f} GB)")
print(f"Average file size: {df['file_size_mb'].mean():.2f} MB")
print(f"Average exposure time: {df['exposure_time'].mean():.2f} seconds")
print(f"Date range: {df['date_obs'].min()} to {df['date_obs'].max()}")
print("="*60)

Images by Target:
                    num_images  avg_exposure_time  total_size_mb
target                                                          
NGC 3132                     3            289.893     339.370422
NGC 3324                     2            289.893     226.189270
NGC 7320                     3            236.209     339.331970
SMACS J0723.3-7327           4            837.468     452.828979

Images by Detector:
detector
NRCB1    4
NRCB2    4
NRCB3    3
NRCB4    1
Name: count, dtype: int64

Overall Statistics:
Total images: 12
Total data size: 1357.72 MB (1.33 GB)
Average file size: 113.14 MB
Average exposure time: 459.00 seconds
Date range: 2022-06-03 to 2022-06-11


### Debug: Inspect FITS File Structure

Let's inspect one FITS file to understand its structure and find the correct header keywords.

In [11]:
# Inspect the first FITS file to understand its structure
if all_fits_files:
    # os.walk order is not guaranteed, so sort to inspect the same file on every run
    test_file = sorted(all_fits_files)[0]
    print(f"Inspecting: {os.path.basename(test_file)}")
    print("="*60)
    
    with fits.open(test_file) as hdul:
        print(f"\nNumber of extensions: {len(hdul)}")
        print("\nExtension summary:")
        hdul.info()
        
        print("\n" + "="*60)
        print("PRIMARY Header (first 50 keywords):")
        print("="*60)
        for i, (key, value) in enumerate(hdul[0].header.items()):
            if i < 50:
                print(f"{key:10s} = {value}")
        
        # Check if there's a SCI extension
        if len(hdul) > 1:
            print("\n" + "="*60)
            print(f"Extension 1 ({hdul[1].name}) Header (first 50 keywords):")
            print("="*60)
            for i, (key, value) in enumerate(hdul[1].header.items()):
                if i < 50:
                    print(f"{key:10s} = {value}")

Inspecting: jw02731001001_02101_00001_nrcb1_i2d.fits

Number of extensions: 9

Extension summary:
Filename: ./jwst_data/jw02731001001_02101_00001_nrcb1_i2d.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU     360   ()      
  1  SCI           1 ImageHDU        75   (2058, 2055)   float32   
  2  ERR           1 ImageHDU        10   (2058, 2055)   float32   
  3  CON           1 ImageHDU        10   (2058, 2055, 1)   int32   
  4  WHT           1 ImageHDU         9   (2058, 2055)   float32   
  5  VAR_POISSON    1 ImageHDU         9   (2058, 2055)   float32   
  6  VAR_RNOISE    1 ImageHDU         9   (2058, 2055)   float32   
  7  VAR_FLAT      1 ImageHDU         9   (2058, 2055)   float32   
  8  ASDF          1 BinTableHDU     11   1R x 1C   [16915B]   

PRIMARY Header (first 50 keywords):
SIMPLE     = True
BITPIX     = 8
NAXIS      = 0
EXTEND     = True
DATE       = 2025-09-17T10:23:15.636
ORIGIN     = STSCI
TIMESYS    = UTC
TIMEUNIT