# Cluster Galaxy Classifier - Student Notebook

Use this notebook to identify cluster member galaxies in your own cluster images using a pre-trained model.

## Instructions
1. Run each cell in order (click the play button or press Shift+Enter)
2. Upload your cluster image when prompted
3. View and download your results!

## Step 1: Install Dependencies

In [None]:
# Install required packages
!pip install -q numpy scipy pandas scikit-learn scikit-image opencv-python matplotlib astropy joblib
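
To confirm the installation worked before moving on, a quick check along these lines can help. This is a minimal sketch: the helper name `find_missing` is hypothetical, and the module list assumes the usual import names for these packages (`scikit-learn` imports as `sklearn`, `scikit-image` as `skimage`, `opencv-python` as `cv2`).

```python
from importlib.util import find_spec

# Import names for the packages installed above (some differ from the pip name).
REQUIRED_MODULES = [
    "numpy", "scipy", "pandas", "sklearn", "skimage",
    "cv2", "matplotlib", "astropy", "joblib",
]

def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if find_spec(name) is None]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {', '.join(missing)} (re-run the install cell)")
else:
    print("[OK] All required modules are available")
```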

## Step 2: Download the Code and Model

In [None]:
# Clone the repository
!git clone https://github.com/ASPIRONOMY/cluster-galaxy-classifier.git
%cd cluster-galaxy-classifier

## Step 3: Upload Your Cluster Image

In [None]:
from google.colab import files
import os

# Upload your cluster image
print("Upload your cluster image (PNG or JPG):")
uploaded = files.upload()

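# Stop early if nothing was uploaded (for example, the dialog was cancelled)
if not uploaded:
    raise RuntimeError("No file was uploaded. Re-run this cell and choose an image.")

# Get the uploaded file name
image_filename = next(iter(uploaded))
print(f"\n[OK] Uploaded: {image_filename}")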
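
Since the notebook expects a PNG or JPG, it can be worth checking that the upload really is one of those formats, whatever its file extension says. The sketch below looks at the leading "magic" bytes of the file; `sniff_image_format` is a hypothetical helper and not part of the repository.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file
JPEG_SIGNATURE = b"\xff\xd8\xff"      # first 3 bytes of every JPEG file

def sniff_image_format(data):
    """Return 'png', 'jpeg', or None based on the leading bytes of the file."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None

# In Colab, `uploaded` maps each file name to its raw bytes, so a check
# right after the upload cell could look like this:
# fmt = sniff_image_format(uploaded[image_filename])
# if fmt is None:
#     print("This file does not look like a PNG or JPG image")
```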

## Step 4: Classify Galaxies in Your Image

In [None]:
# Import the pipeline
from complete_pipeline import run_complete_pipeline
from pathlib import Path

# Create results directory
Path('results').mkdir(exist_ok=True)

# Run classification on your image
print("="*60)
print("CLASSIFYING GALAXIES IN YOUR CLUSTER")
print("="*60)

results = run_complete_pipeline(
    image_file=image_filename,
    model_dir='models',  # Uses pre-trained model
    output_dir='results',
    detection_method='comprehensive',  # Best for detecting all objects including large ones
    confidence_threshold=0.7  # Higher = fewer false positives (try 0.6-0.9)
)

print("\n[OK] Classification complete!")
print(f"Detected {len(results['detected_galaxies'])} total galaxies")
print(f"Classified {len(results['cluster_members'])} as cluster members")
print(f"Classified {len(results['non_members'])} as non-members")
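
The three counts printed above can be condensed into a single member fraction, which is handy when comparing runs. This is a minimal sketch: `summarize_counts` is a hypothetical helper, it assumes only that `results` supports the three keys used above, and the toy dictionary stands in for real pipeline output.

```python
def summarize_counts(results):
    """Return detected/member/non-member counts and the member fraction."""
    n_detected = len(results["detected_galaxies"])
    n_members = len(results["cluster_members"])
    n_non_members = len(results["non_members"])
    # Guard against division by zero when nothing was detected.
    fraction = n_members / n_detected if n_detected else 0.0
    return {
        "detected": n_detected,
        "members": n_members,
        "non_members": n_non_members,
        "member_fraction": fraction,
    }

# Toy stand-in for the pipeline output (not real data).
toy_results = {
    "detected_galaxies": ["g1", "g2", "g3", "g4"],
    "cluster_members": ["g1", "g2", "g3"],
    "non_members": ["g4"],
}
summary = summarize_counts(toy_results)
print(f"Member fraction: {summary['member_fraction']:.1%}")
```

In the notebook, `summarize_counts(results)` could be called directly after the classification cell.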

## Step 5: View Results

In [None]:
from IPython.display import Image, display
from pathlib import Path
import pandas as pd

# Build the output file paths from the uploaded image's name
image_name = Path(image_filename).stem
viz_file = f'results/{image_name}_classified.png'
csv_file = f'results/{image_name}_members.csv'

# Display the visualization
if Path(viz_file).exists():
    print("Visualization (Red circles = Cluster Members, Blue circles = Non-Members):")
    display(Image(viz_file))
else:
    print(f"Visualization file not found: {viz_file}")

In [None]:
# Show coordinates of cluster members
if Path(csv_file).exists():
    members_df = pd.read_csv(csv_file)
    print(f"\nCluster Member Coordinates ({len(members_df)} members):")
    display(members_df.head(20))  # Show first 20
    if len(members_df) > 20:
        print(f"\n... and {len(members_df) - 20} more members")
else:
    print(f"CSV file not found: {csv_file}")

## Step 6: Download Your Results

In [None]:
# Download the visualization image
if Path(viz_file).exists():
    print("Downloading visualization image...")
    files.download(viz_file)
    print("[OK] Visualization downloaded!")
else:
    print("Visualization file not found")

# Download the CSV file with coordinates
if Path(csv_file).exists():
    print("\nDownloading coordinates CSV...")
    files.download(csv_file)
    print("[OK] Coordinates downloaded!")
else:
    print("CSV file not found")
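
If you would rather download everything in one go, the results folder can be bundled into a single zip file first. This is a sketch under stated assumptions: `zip_results` and the archive name `cluster_results` are hypothetical, and it uses only the standard library's `shutil.make_archive`.

```python
import shutil
from pathlib import Path

def zip_results(results_dir="results", archive_name="cluster_results"):
    """Zip results_dir into <archive_name>.zip and return the archive path."""
    if not Path(results_dir).is_dir():
        raise FileNotFoundError(f"Results folder not found: {results_dir}")
    # make_archive appends the .zip extension and returns the full path.
    return shutil.make_archive(archive_name, "zip", root_dir=results_dir)

# In Colab, after the pipeline has written its output:
# archive_path = zip_results()
# files.download(archive_path)
```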

## Tips

- **Confidence threshold**: Adjust `confidence_threshold` (0.6-0.9) to trade sensitivity against false positives
  - `0.6` = More sensitive (may include some false positives)
  - `0.7` = Balanced (the value used in Step 4)
  - `0.8` = Conservative (fewer false positives)
- **Detection method**: `'comprehensive'` is recommended (detects all sizes including large objects)
- **Image format**: PNG or JPG RGB images work best
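
To see why the threshold matters, the sketch below counts how many objects would pass at several thresholds. It assumes the classifier assigns each galaxy a membership probability and keeps those at or above the threshold; the probabilities are invented for illustration and `count_members` is a hypothetical helper, not part of the pipeline.

```python
def count_members(probabilities, threshold):
    """Count how many probabilities are at or above the threshold."""
    return sum(p >= threshold for p in probabilities)

# Invented membership probabilities for seven detected objects.
example_probabilities = [0.55, 0.62, 0.71, 0.78, 0.83, 0.91, 0.95]

for threshold in (0.6, 0.7, 0.8, 0.9):
    n = count_members(example_probabilities, threshold)
    print(f"threshold={threshold:.1f}: {n} of {len(example_probabilities)} kept")
```

Raising the threshold can only keep the same number of objects or fewer, which is why higher values reduce false positives at the cost of missing some real members.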

## Troubleshooting

**"Model file not found"**:
- Make sure the repository was cloned correctly
- Check that the `models/` folder exists with model files
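
A quick way to run both checks is to list what is inside the `models/` folder. This is a minimal sketch: `list_model_files` is a hypothetical helper, and it assumes the notebook's working directory is the cloned repository (as set by the `%cd` in Step 2).

```python
from pathlib import Path

def list_model_files(model_dir="models"):
    """Return sorted file names in model_dir, or None if the folder is missing."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return None
    return sorted(p.name for p in folder.iterdir() if p.is_file())

found = list_model_files("models")
if found is None:
    print("models/ folder not found: re-run the clone cell in Step 2")
elif not found:
    print("models/ folder exists but contains no files")
else:
    print("Model files:", found)
```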

**"No galaxies detected"**:
- Try adjusting the detection settings in Step 4 (for example, `detection_method`)
- Make sure your image is clear and not corrupted

**"Too many false positives"**:
- Increase `confidence_threshold` to 0.8 or 0.9