# Galaxy Zoo CNN

This notebook outlines a basic Convolutional Neural Network (CNN) for classifying galaxy images from the Galaxy Zoo project.

## 1. Import Necessary Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Add any other libraries you might need, e.g., for data loading or specific layers
# import pandas as pd
# from sklearn.model_selection import train_test_split

## 2. Load and Preprocess Data

This section is a placeholder. You'll need to replace it with actual code to load and preprocess your Galaxy Zoo dataset.

Key steps typically include:
- Loading images and their corresponding labels.
- Resizing images to a consistent input shape for the CNN.
- Normalizing pixel values (e.g., scaling to [0, 1]).
- Splitting data into training, validation, and test sets.
- One-hot encoding labels if you have multiple classes.

In [None]:
# Placeholder for data loading and preprocessing
# Ensure your data is in the format: (num_samples, height, width, channels)
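# Example of the expected format, using CIFAR-10 as a stand-in (its images are 32x32 RGB,
# so img_height and img_width below would need to be set to 32):
# (x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()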

img_height = 128  # Example: Adjust based on your dataset and experimentation
img_width = 128   # Example: Adjust based on your dataset and experimentation
num_classes = 10  # Example: Adjust to the number of galaxy types you're classifying

# --- Replace with your actual data loading below --- 
# x_train = ... # Training images (NumPy array)
# y_train = ... # Training labels (NumPy array)
# x_val = ...   # Validation images (NumPy array)
# y_val = ...   # Validation labels (NumPy array)
# x_test = ...  # Test images (NumPy array)
# y_test = ...  # Test labels (NumPy array)

# print(f"x_train shape: {x_train.shape}")
# print(f"y_train shape: {y_train.shape}")

# # Normalize pixel values to be between 0 and 1
# x_train = x_train.astype('float32') / 255.0
# x_val = x_val.astype('float32') / 255.0
# x_test = x_test.astype('float32') / 255.0

# # Convert class vectors to binary class matrices (one-hot encoding)
# y_train = keras.utils.to_categorical(y_train, num_classes)
# y_val = keras.utils.to_categorical(y_val, num_classes)
# y_test = keras.utils.to_categorical(y_test, num_classes)
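
The cell above assumes `x_val` and `y_val` already exist. Below is a minimal NumPy sketch of the split, normalization, and one-hot steps listed earlier. It assumes the images are already resized to a common shape and stored as a `uint8` array with integer class labels. The function name `split_normalize_encode` and the 70/15/15 split are illustrative choices, not part of the original notebook. `sklearn.model_selection.train_test_split` with its `stratify` argument is an alternative that also keeps class proportions equal across the splits.

```python
import numpy as np

def split_normalize_encode(images, labels, num_classes,
                           val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle, split into train/val/test, scale pixels to [0, 1], one-hot encode labels.

    images: uint8 array, shape (num_samples, height, width, channels)
    labels: integer class ids in [0, num_classes), shape (num_samples,)
    Returns a dict mapping "train" / "val" / "test" to (x, y) tuples.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))          # shuffle before splitting
    images, labels = images[order], labels[order]

    n = len(images)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    bounds = {
        "test": slice(0, n_test),
        "val": slice(n_test, n_test + n_val),
        "train": slice(n_test + n_val, n),
    }

    splits = {}
    for name, sl in bounds.items():
        x = images[sl].astype("float32") / 255.0              # normalize to [0, 1]
        y = np.eye(num_classes, dtype="float32")[labels[sl]]  # 1.0 at each sample's class index
        splits[name] = (x, y)
    return splits

# Hypothetical usage with random placeholder data (not real Galaxy Zoo images):
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.uint8)
fake_labels = rng.integers(0, 10, size=100)
data = split_normalize_encode(fake_images, fake_labels, num_classes=10)
print(data["train"][0].shape, data["train"][1].shape)  # (70, 32, 32, 3) (70, 10)
```

Shuffling before slicing matters because image files are often stored grouped by class; slicing an unshuffled array could leave whole classes out of the validation or test set.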

## 3. Define the CNN Model Architecture

This is a basic example. You might need to adjust the number of layers, filters, kernel sizes, pooling, and dropout rates based on your dataset and on validation performance.

In [None]:
model = keras.Sequential(
    [
        # Input layer: declares the expected image shape (height, width, channels)
        layers.Input(shape=(img_height, img_width, 3)), # Assuming RGB images (3 channels)
        
        # Convolutional Block 1
        layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        # Convolutional Block 2
        layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        # Convolutional Block 3
        layers.Conv2D(128, (3, 3), activation='relu', padding='same'),
        layers.MaxPooling2D((2, 2)),
        
        # Flattening the feature maps
        layers.Flatten(),
        
        # Dense (fully connected) layers
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5), # Dropout for regularization
        layers.Dense(num_classes, activation='softmax') # Output layer - softmax for multi-class classification
    ]
)

# Print model summary
model.summary()
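
To see where the parameter counts reported by `model.summary()` come from, the pure-Python helper below (a hypothetical name, `conv_stack_summary`) walks the same layer stack by hand. It assumes exactly the settings used above: 3x3 `Conv2D` with `padding='same'` (spatial size unchanged), `MaxPooling2D((2, 2))` (spatial size halved, rounding down), `img_height = img_width = 128`, three input channels, and `num_classes = 10`. A `Conv2D` layer has `kernel_h * kernel_w * in_channels * filters` weights plus one bias per filter; a `Dense` layer has `inputs * units` weights plus one bias per unit.

```python
def conv_stack_summary(input_shape, conv_filters, dense_units, num_classes,
                       kernel_size=3, pool_size=2):
    """Return (layer, output_shape, param_count) rows for the Sequential model above.

    Assumes Conv2D(padding='same') keeps height/width, and MaxPooling2D(pool_size)
    floor-divides them.
    """
    h, w, c = input_shape
    rows = []
    for f in conv_filters:
        params = kernel_size * kernel_size * c * f + f   # weights + one bias per filter
        rows.append((f"conv2d_{f}", (h, w, f), params))
        h, w, c = h // pool_size, w // pool_size, f      # pooling halves height and width
        rows.append(("max_pooling2d", (h, w, c), 0))
    flat = h * w * c
    rows.append(("flatten", (flat,), 0))
    rows.append(("dense", (dense_units,), flat * dense_units + dense_units))
    rows.append(("dropout", (dense_units,), 0))
    rows.append(("dense_output", (num_classes,), dense_units * num_classes + num_classes))
    return rows

rows = conv_stack_summary((128, 128, 3), [32, 64, 128], dense_units=128, num_classes=10)
for name, shape, params in rows:
    print(f"{name:<16}{str(shape):<18}{params:>10,}")
print("Total params:", sum(p for _, _, p in rows))
```

For these settings the total is 4,288,970 parameters, and the `Dense(128)` layer after `Flatten` accounts for 4,194,432 of them (roughly 98%). Replacing `Flatten()` with `layers.GlobalAveragePooling2D()` is one common way to shrink that layer; whether it helps on Galaxy Zoo data would need to be checked on the validation set.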

## 4. Compile the Model

Configure the learning process. This involves choosing:
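- **Optimizer:** The algorithm that updates the weights using gradients of the loss (e.g., Adam, SGD).
- **Loss Function:** The quantity the optimizer minimizes; it measures how far the predictions are from the true labels (e.g., `categorical_crossentropy` for one-hot multi-class labels).
- **Metrics:** Values reported during training and evaluation for monitoring only; they do not drive the weight updates (e.g., `accuracy`).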

In [None]:
model.compile(
    optimizer='adam',
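    # 'categorical_crossentropy' expects one-hot labels, as produced in Section 2.
    # Use 'sparse_categorical_crossentropy' for integer labels, or 'binary_crossentropy'
    # with a single sigmoid output unit for binary classification.
    loss='categorical_crossentropy',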
    metrics=['accuracy']
)
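
The loss and metric configured above can be written out in a few lines of NumPy, which makes clear what they compute and why `categorical_crossentropy` needs one-hot labels while `sparse_categorical_crossentropy` takes integer class ids. This is an illustrative re-implementation, not Keras's own code; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, as applied by the final Dense(..., activation='softmax') layer."""
    z = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    """Mean over samples of -sum_k y_k * log(p_k); expects one-hot labels."""
    probs = np.clip(probs, eps, 1.0 - eps)  # eps is an assumed constant that avoids log(0)
    return float(np.mean(-np.sum(y_onehot * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(y_int, probs, eps=1e-7):
    """Same loss, but labels are integer class ids instead of one-hot rows."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(np.mean(-np.log(probs[np.arange(len(y_int)), y_int])))

def accuracy(y_int, probs):
    """Fraction of samples whose highest-probability class equals the true class."""
    return float(np.mean(np.argmax(probs, axis=1) == y_int))

# Toy predictions for two samples and three classes (illustrative values only)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y_int = np.array([0, 2])
y_onehot = np.eye(3)[y_int]
print(categorical_crossentropy(y_onehot, probs))  # (-ln 0.7 - ln 0.1) / 2, about 1.33
print(accuracy(y_int, probs))                     # 0.5
```

Only the probability assigned to the true class enters the loss, so the second sample is penalized heavily (it gives the true class just 0.1) even though accuracy simply counts it as one miss.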

## 5. Train the Model

This is where the model learns from the training data. You'll need to provide your training and validation data.

Key parameters:
- `epochs`: Number of times the model will iterate over the entire training dataset.
- `batch_size`: Number of samples processed before the model's internal parameters are updated.

In [None]:
# Placeholder for model training - uncomment and adapt when data is loaded
# Make sure x_train, y_train, x_val, y_val are defined in Section 2.

# epochs = 25 # Example: Adjust as needed
# batch_size = 32 # Example: Adjust as needed

# history = model.fit(
#     x_train, y_train,
#     epochs=epochs,
#     batch_size=batch_size,
#     validation_data=(x_val, y_val) # Provide validation data to monitor performance
# )
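
`model.fit` handles batching internally and, by default, reshuffles the training arrays every epoch. The sketch below reproduces that bookkeeping in NumPy to show how `epochs` and `batch_size` translate into weight updates. The helper names and the sample count are hypothetical, chosen only for illustration.

```python
import math
import numpy as np

def steps_per_epoch(num_samples, batch_size):
    # The final batch may be smaller than batch_size, so round up.
    return math.ceil(num_samples / batch_size)

def iterate_minibatches(num_samples, batch_size, rng):
    # Reshuffle once per epoch, then yield consecutive slices of sample indices.
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

# Hypothetical sizes for illustration only
num_samples, batch_size, epochs = 1000, 32, 25
rng = np.random.default_rng(0)
batch_sizes = [len(idx) for idx in iterate_minibatches(num_samples, batch_size, rng)]
print(steps_per_epoch(num_samples, batch_size))           # 32
print(batch_sizes[-1])                                    # 8
print(epochs * steps_per_epoch(num_samples, batch_size))  # 800 weight updates in total
```

Halving `batch_size` doubles the number of updates per epoch, which is one reason learning rate and batch size are usually tuned together.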

### Plot Training History (Optional but Recommended)

Visualizing training and validation loss/accuracy can help diagnose overfitting or underfitting.

In [None]:
# def plot_history(history_object):
#     acc = history_object.history['accuracy']
#     val_acc = history_object.history['val_accuracy']
#     loss = history_object.history['loss']
#     val_loss = history_object.history['val_loss']
#     epochs_range = range(len(acc))

#     plt.figure(figsize=(12, 4))
#     plt.subplot(1, 2, 1)
#     plt.plot(epochs_range, acc, label='Training Accuracy')
#     plt.plot(epochs_range, val_acc, label='Validation Accuracy')
#     plt.legend(loc='lower right')
#     plt.title('Training and Validation Accuracy')

#     plt.subplot(1, 2, 2)
#     plt.plot(epochs_range, loss, label='Training Loss')
#     plt.plot(epochs_range, val_loss, label='Validation Loss')
#     plt.legend(loc='upper right')
#     plt.title('Training and Validation Loss')
#     plt.show()

# # Call this function after training (if history object is available)
# # if 'history' in locals():
# #    plot_history(history)
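
Beyond reading the plots, the same `history.history` dictionary can be inspected in code. The helper below (a hypothetical name, `summarize_history`) finds the epoch with the lowest validation loss and the final gap between validation and training loss. A validation loss that bottoms out and then rises while training loss keeps falling is the classic overfitting signature. The numbers in the toy dictionary are made up for illustration and are not results from this model.

```python
def summarize_history(history_dict, monitor="val_loss"):
    """Return (best_epoch, best_value, final_gap) from a Keras-style history dict.

    history_dict has the layout of history.history: metric name -> list of per-epoch values.
    best_epoch is zero-based; final_gap is val_loss - loss at the last epoch.
    """
    values = history_dict[monitor]
    best_epoch = min(range(len(values)), key=values.__getitem__)  # first epoch with the lowest value
    final_gap = history_dict["val_loss"][-1] - history_dict["loss"][-1]
    return best_epoch, values[best_epoch], final_gap

# Toy values for illustration only: training loss keeps falling, validation loss turns upward.
toy = {"loss": [1.2, 0.9, 0.7, 0.5, 0.35],
       "val_loss": [1.25, 1.0, 0.9, 0.95, 1.05]}
best, best_val, gap = summarize_history(toy)
print(best, best_val, round(gap, 2))  # 2 0.9 0.7
```

Keras can act on this automatically: `keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)` (the `patience` value here is only an example) stops training once validation loss stops improving and restores the weights from the best epoch.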

## 6. Evaluate the Model

Assess the model's performance on unseen test data.

In [None]:
# Placeholder for model evaluation - uncomment and adapt when data is loaded
# Make sure x_test and y_test are defined in Section 2.

# print("\nEvaluating model on test data...")
# results = model.evaluate(x_test, y_test, batch_size=128)
# print(f"Test loss: {results[0]:.4f}")
# print(f"Test accuracy: {results[1]:.4f}")
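
A single accuracy number can hide poor performance on classes with few examples. A confusion matrix, sketched below in NumPy, shows which classes are mistaken for which; per-class recall is its diagonal divided by the row sums. The function names and the toy label arrays are illustrative. With this notebook's one-hot labels you would first recover integer class ids with `np.argmax(..., axis=1)`, as noted in the comments.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # unbuffered add, so repeated (true, pred) pairs all count
    return cm

def per_class_recall(cm):
    """Correct predictions per class divided by the number of true samples of that class."""
    support = cm.sum(axis=1)
    return np.divide(np.diag(cm), support, out=np.zeros(len(cm)), where=support > 0)

# With the notebook's one-hot labels, convert back to integer ids first:
# y_true = np.argmax(y_test, axis=1)
# y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.array([0, 0, 1, 2, 2, 2])   # toy labels for illustration
y_pred = np.array([0, 1, 1, 2, 2, 0])
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(per_class_recall(cm))
```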

## 7. Make Predictions

Use the trained model to make predictions on new, unseen images.

In [None]:
# Placeholder for making predictions
# You would typically load new images, preprocess them similarly to the training data,
# and then use model.predict()

# Example:
# new_images = ... # Load and preprocess new images
# predictions = model.predict(new_images)
# predicted_classes = np.argmax(predictions, axis=1)
# print(f"Predictions: {predictions}")
# print(f"Predicted classes: {predicted_classes}")
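
Two details often trip up inference: new images must go through exactly the same preprocessing as the training data, and `model.predict` expects a batch, so a single image needs a leading batch axis. The sketch below covers both and turns a probability vector into ranked class names. `CLASS_NAMES` and both helper functions are hypothetical placeholders; substitute the label order used when you one-hot encoded the training labels. It also assumes the image is already resized to `(img_height, img_width)`, and the fixed probability row stands in for one row of `model.predict` output.

```python
import numpy as np

# Hypothetical label order: replace with the class order used when the labels were encoded.
CLASS_NAMES = ["elliptical", "spiral", "edge_on", "merger"]

def preprocess_for_inference(image_uint8):
    """Scale like the training data and add a batch axis if given a single image."""
    x = image_uint8.astype("float32") / 255.0
    if x.ndim == 3:                       # (height, width, channels) -> (1, height, width, channels)
        x = np.expand_dims(x, axis=0)
    return x

def top_k(prob_row, class_names, k=2):
    """Return the k most probable (class_name, probability) pairs for one image."""
    best = np.argsort(prob_row)[::-1][:k]   # indices sorted by descending probability
    return [(class_names[i], float(prob_row[i])) for i in best]

# Usage: model.predict would supply the probabilities; a fixed row stands in here.
image = np.zeros((128, 128, 3), dtype=np.uint8)
batch = preprocess_for_inference(image)
print(batch.shape)                        # (1, 128, 128, 3)
# prob_row = model.predict(batch)[0]
prob_row = np.array([0.10, 0.60, 0.05, 0.25])
print(top_k(prob_row, CLASS_NAMES))       # [('spiral', 0.6), ('merger', 0.25)]
```

Reporting the top two classes with their probabilities is often more informative than a bare `argmax` for galaxies whose morphology is genuinely ambiguous.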

## Further Steps & Considerations

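- **Data Augmentation:** Increase the diversity of your training data by applying random transformations (rotations, flips, zooms) to your images. Keras preprocessing layers such as `layers.RandomFlip`, `layers.RandomRotation`, and `layers.RandomZoom`, as well as `tf.image` functions, can do this; the older `ImageDataGenerator` is deprecated in recent TensorFlow releases.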
- **Transfer Learning:** Use a pre-trained model (e.g., VGG16, ResNet, MobileNet) as a feature extractor or fine-tune it on your Galaxy Zoo data. This can be very effective, especially with smaller datasets.
- **Hyperparameter Tuning:** Experiment with different learning rates, batch sizes, numbers of layers, filters, etc., and compare configurations on the validation set, keeping the test set for the final evaluation only.
- **Regularization:** Techniques like L1/L2 regularization or more dropout can help prevent overfitting.
- **Callbacks:** Use Keras callbacks for tasks like saving the best model during training (`ModelCheckpoint`), early stopping (`EarlyStopping`), or adjusting the learning rate (`ReduceLROnPlateau`).
- **More Complex Architectures:** Explore more advanced CNN architectures if needed.
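
As a concrete illustration of the data augmentation point, the NumPy sketch below applies a random flip and a random multiple of a 90-degree rotation to each image in a batch of square images. Galaxy images generally have no preferred orientation, so these transforms usually leave the morphology label unchanged; the exception is any label that encodes handedness (such as spiral winding direction), which a flip reverses. In a Keras model the same idea is usually expressed with preprocessing layers like `layers.RandomFlip` and `layers.RandomRotation` at the start of the network; the function here is a hypothetical stand-in to show the mechanics.

```python
import numpy as np

def random_flip_rotate(batch, rng):
    """Return a copy of `batch` with each image randomly flipped and rotated.

    batch: array of shape (num_samples, height, width, channels) with height == width,
    so 90-degree rotations keep the shape. Flips and quarter turns only rearrange
    pixels, so no interpolation is needed.
    """
    out = np.empty_like(batch)
    for i, img in enumerate(batch):
        if rng.random() < 0.5:
            img = img[:, ::-1]                        # left-right flip
        k = int(rng.integers(4))                      # 0, 1, 2 or 3 quarter turns
        out[i] = np.rot90(img, k=k, axes=(0, 1))      # rotate in the image plane
    return out

# Usage: draw fresh random transforms for every training batch.
rng = np.random.default_rng(0)
images = rng.random((8, 128, 128, 3)).astype("float32")
augmented = random_flip_rotate(images, rng)
print(augmented.shape)  # (8, 128, 128, 3)
```

One flip combined with the four quarter turns reaches all eight symmetries of a square image, which is why this simple scheme is a reasonable starting point before adding arbitrary-angle rotations or zooms.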