# Baker+2025d: Data Overview and Exploration

This notebook demonstrates how to load and explore the quiescent galaxy catalog from Baker+2025d.

## Contents
1. Load the catalog
2. Catalog summary
3. Visualizations
4. Selecting subsamples
5. Simple analysis example
6. Export a subsample

In [None]:
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from astropy.table import Table
import sys
sys.path.append('../scripts')
from load_catalog import load_catalog, catalog_summary, select_subsample
from plotting_utils import setup_plot_style, plot_mass_size_relation, plot_UVJ_diagram

# Set up plotting style
setup_plot_style()
%matplotlib inline

## 1. Load the Catalog

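The main catalog contains over 700 massive quiescent galaxies. The measurements used in this notebook include redshift, stellar mass, and effective radius (columns `redshift`, `log_stellar_mass`, and `effective_radius`).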

In [None]:
# Load the main catalog
# Note: Update the path to point to your actual data location
catalog_path = '../data/catalogs/main_catalog_v1.0.fits'

try:
    catalog = load_catalog(catalog_path)
except FileNotFoundError:
    print("Catalog file not found. Please ensure data files are in the correct location.")
    print("Expected location:", catalog_path)
    # Re-raise so the notebook stops here: every later cell needs `catalog`
    raise

## 2. Catalog Summary

Let's look at the basic properties of our sample.

In [None]:
# Print summary statistics
catalog_summary(catalog)
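The output of `catalog_summary` depends on its implementation in `../scripts`, which is not shown here. A minimal sketch of the kind of NaN-aware statistics such a summary could report is given below; `summarize_column` is a hypothetical helper and the input values are made up.

```python
import numpy as np


def summarize_column(values):
    """Hypothetical helper: summary of one numeric column, ignoring NaN/inf."""
    values = np.asarray(values, dtype=float)
    valid = values[np.isfinite(values)]
    if valid.size == 0:
        return {"n_total": int(values.size), "n_valid": 0}
    return {
        "n_total": int(values.size),
        "n_valid": int(valid.size),
        "min": float(valid.min()),
        "median": float(np.median(valid)),
        "max": float(valid.max()),
    }


# Made-up values standing in for catalog['log_stellar_mass'].
demo_mass = [10.5, 11.0, np.nan, 11.5]
summary = summarize_column(demo_mass)
for key, value in summary.items():
    print(f"{key:>8}: {value}")
```

Reporting both the total and the valid count makes it easy to see how many entries lack a measurement before plotting or fitting.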

In [None]:
# Display first few rows
print("\nFirst 5 galaxies in the catalog:")
catalog[:5].show_in_notebook()

## 3. Visualizations

### 3.1 Mass-Size Relation

In [None]:
# Plot mass-size relation
# Color-code by redshift bins
z_bins = [(0.5, 1.0), (1.0, 1.5), (1.5, 2.0), (2.0, 2.5)]

fig, ax = plot_mass_size_relation(catalog, z_bins=z_bins)
plt.tight_layout()
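How `plot_mass_size_relation` uses `z_bins` internally is not shown in this notebook. One plausible approach, sketched below under that assumption, is to assign each galaxy the index of the redshift bin it falls in and then color points by that index. The function `assign_z_bin` is hypothetical, and the redshifts are made up.

```python
import numpy as np


def assign_z_bin(z, z_bins):
    """Return the index of the bin containing each redshift, or -1 if none.

    Bins are treated as half-open intervals [z_lo, z_hi), an assumption made
    here so that a galaxy on a shared edge is counted only once.
    """
    z = np.asarray(z, dtype=float)
    index = np.full(z.shape, -1, dtype=int)
    for i, (z_lo, z_hi) in enumerate(z_bins):
        in_bin = (z >= z_lo) & (z < z_hi)  # NaN compares False, so it stays -1
        index[in_bin] = i
    return index


z_bins = [(0.5, 1.0), (1.0, 1.5), (1.5, 2.0), (2.0, 2.5)]
demo_z = np.array([0.7, 1.0, 1.9, 2.6, np.nan])  # made-up redshifts
bin_index = assign_z_bin(demo_z, z_bins)
print(bin_index)
```

Each index can then be mapped to a color, for example by looping over the bins and calling `ax.scatter` on the galaxies with `bin_index == i`.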

### 3.2 Redshift Distribution

In [None]:
# Plot redshift distribution
z = catalog['redshift']
z = z[~np.isnan(z)]

plt.figure(figsize=(10, 5))
plt.hist(z, bins=30, color='darkblue', alpha=0.7, edgecolor='black')
plt.xlabel('Redshift', fontsize=14)
plt.ylabel('Number of Galaxies', fontsize=14)
plt.title('Redshift Distribution of Quiescent Galaxies', fontsize=16)
plt.grid(True, alpha=0.3)
plt.tight_layout()

### 3.3 Stellar Mass Distribution

In [None]:
# Plot mass distribution
mass = catalog['log_stellar_mass']
mass = mass[~np.isnan(mass)]

plt.figure(figsize=(10, 5))
plt.hist(mass, bins=25, color='darkred', alpha=0.7, edgecolor='black')
plt.xlabel(r'log(M$_*$/M$_\odot$)', fontsize=14)
plt.ylabel('Number of Galaxies', fontsize=14)
plt.title('Stellar Mass Distribution', fontsize=16)
plt.grid(True, alpha=0.3)
plt.tight_layout()

### 3.4 UVJ Diagram

The UVJ color-color diagram is a classic way to separate quiescent and star-forming galaxies.

In [None]:
# Plot UVJ diagram
fig, ax = plot_UVJ_diagram(catalog, highlight_quiescent=True)
plt.tight_layout()
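The quiescent region of the UVJ diagram is usually defined by a wedge: a diagonal cut plus limits on each color. The sketch below expresses that wedge as a boolean mask. The function `uvj_quiescent` is hypothetical, and the default thresholds follow the form of the boundaries in Williams et al. (2009) only as an illustration; the selection actually applied by `plot_UVJ_diagram` or by the catalog may differ.

```python
import numpy as np


def uvj_quiescent(u_minus_v, v_minus_j, slope=0.88, intercept=0.59,
                  uv_min=1.3, vj_max=1.6):
    """Boolean mask for the quiescent wedge of the UVJ diagram.

    The default thresholds are illustrative assumptions, not catalog values.
    """
    u_minus_v = np.asarray(u_minus_v, dtype=float)
    v_minus_j = np.asarray(v_minus_j, dtype=float)
    return (
        (u_minus_v > slope * v_minus_j + intercept)  # above the diagonal cut
        & (u_minus_v > uv_min)                       # red enough in U-V
        & (v_minus_j < vj_max)                       # not too red in V-J
    )


# Made-up rest-frame colors for three galaxies.
demo_uv = np.array([1.8, 1.0, 2.5])
demo_vj = np.array([1.0, 1.0, 1.8])
is_quiescent = uvj_quiescent(demo_uv, demo_vj)
print(is_quiescent)
```

Keeping the thresholds as keyword arguments makes it straightforward to swap in redshift-dependent boundaries if the analysis calls for them.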

## 4. Selecting Subsamples

`select_subsample` filters the catalog by redshift and log stellar mass through keyword arguments such as `z_min`, `z_max`, `mass_min`, and `mass_max`.

In [None]:
# Example 1: Low-redshift, massive galaxies
low_z_massive = select_subsample(catalog, z_max=1.0, mass_min=11.0)
print(f"Selected {len(low_z_massive)} low-z massive galaxies")

In [None]:
# Example 2: High-redshift sample
high_z = select_subsample(catalog, z_min=1.5)
print(f"Selected {len(high_z)} high-z galaxies")

In [None]:
# Example 3: Compact galaxies (small for their mass)
# Select galaxies in a specific mass range
mass_selected = select_subsample(catalog, mass_min=10.8, mass_max=11.2)

# Find compact ones (e.g., below median size)
sizes = mass_selected['effective_radius']
sizes_clean = sizes[~np.isnan(sizes)]
median_size = np.median(sizes_clean)

compact = mass_selected[mass_selected['effective_radius'] < median_size]
print(f"Found {len(compact)} compact galaxies (Re < {median_size:.2f} kpc)")
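The calls in this section can be thought of as boolean masks combined with logical AND. The sketch below is a hypothetical stand-in for `select_subsample`, not its actual implementation: the name `select_mask`, the treatment of `None` as "no limit", and the inclusive lower and exclusive upper bounds are all assumptions. The input values are made up.

```python
import numpy as np


def select_mask(z, log_mass, z_min=None, z_max=None,
                mass_min=None, mass_max=None):
    """Hypothetical stand-in for select_subsample: returns a boolean mask.

    Limits left as None are ignored. Lower bounds are inclusive and upper
    bounds exclusive here; the real function may behave differently.
    """
    z = np.asarray(z, dtype=float)
    log_mass = np.asarray(log_mass, dtype=float)
    mask = np.ones(z.shape, dtype=bool)
    if z_min is not None:
        mask &= z >= z_min
    if z_max is not None:
        mask &= z < z_max
    if mass_min is not None:
        mask &= log_mass >= mass_min
    if mass_max is not None:
        mask &= log_mass < mass_max
    return mask


# Made-up redshifts and log stellar masses for five galaxies.
demo_z = np.array([0.6, 0.9, 1.2, 1.8, 2.2])
demo_mass = np.array([11.2, 10.7, 11.0, 11.4, 10.9])

low_z_massive_mask = select_mask(demo_z, demo_mass, z_max=1.0, mass_min=11.0)
high_z_mask = select_mask(demo_z, demo_mass, z_min=1.5)
print(low_z_massive_mask.sum(), high_z_mask.sum())
```

A boolean mask like this can index an astropy `Table` directly, which is the same pattern used for the compact selection in Example 3.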

## 5. Simple Analysis Example

Let's fit a straight line to log effective radius versus log stellar mass in each redshift bin and compare the slopes.

In [None]:
from scipy.stats import linregress

# Define redshift bins
z_bins = [(0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]

print("Mass-Size Relation Slopes:\n")
for z_min, z_max in z_bins:
    # Select redshift bin
    subsample = select_subsample(catalog, z_min=z_min, z_max=z_max)
    
    # Get mass and size
    mass = subsample['log_stellar_mass']
    size = np.log10(subsample['effective_radius'])
    
    # Remove non-finite values: NaNs, plus -inf from log10 of a zero radius
    good = np.isfinite(mass) & np.isfinite(size)
    mass = mass[good]
    size = size[good]
    
    # Fit linear relation
    if len(mass) > 10:
        slope, intercept, r_value, p_value, std_err = linregress(mass, size)
        print(f"{z_min:.1f} < z < {z_max:.1f}: slope = {slope:.3f} ± {std_err:.3f}")
    else:
        print(f"{z_min:.1f} < z < {z_max:.1f}: insufficient data")
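For reference, the slope and standard error of a straight-line fit follow from the ordinary least-squares formulas, which the sketch below spells out with NumPy. The helper `ols_slope` is hypothetical, and the data are synthetic with a slope of 0.6 chosen arbitrarily so the recovered value can be checked; none of these numbers come from the catalog.

```python
import numpy as np


def ols_slope(x, y):
    """Ordinary least-squares slope, intercept, and slope standard error."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()
    s_xx = np.sum(dx ** 2)
    slope = np.sum(dx * (y - y.mean())) / s_xx
    intercept = y.mean() - slope * x.mean()
    residuals = y - (slope * x + intercept)
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    std_err = np.sqrt(np.sum(residuals ** 2) / (n - 2) / s_xx)
    return slope, intercept, std_err


# Synthetic data with a known slope of 0.6 and 0.1 dex Gaussian scatter.
rng = np.random.default_rng(42)
log_mass = rng.uniform(10.5, 11.5, size=200)
log_re = 0.6 * (log_mass - 11.0) + 0.4 + rng.normal(0.0, 0.1, size=200)

slope, intercept, std_err = ols_slope(log_mass, log_re)
print(f"slope = {slope:.3f} ± {std_err:.3f}")
```

The quoted uncertainty reflects only the scatter about the fitted line; it does not account for measurement errors on mass or size.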

## 6. Export Subsample

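Save a subsample for further analysis. The lines below are commented out so that the notebook does not write files by default; uncomment them to export.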

In [None]:
# Select and save a subsample
# subsample = select_subsample(catalog, z_min=1.0, z_max=1.5, mass_min=10.8)
# subsample.write('my_subsample.fits', overwrite=True)
# print(f"Saved subsample with {len(subsample)} galaxies")

## Summary

In this notebook, we:
1. Loaded the Baker+2025d quiescent galaxy catalog
2. Explored the basic properties of the sample
3. Created visualizations of key relationships
4. Demonstrated how to select subsamples
5. Performed a simple analysis of the mass-size relation

For more detailed analyses, see the other notebooks in this directory!