# Galaxy Classification Neural Network

This notebook trains a convolutional neural network (CNN) to classify galaxies from the Galaxy10 DECals dataset. It runs a small grid search over three optimizers (SGD, Adagrad, Adam) and three learning rates, and reports the combination with the highest test accuracy.

In [1]:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from importData import load_galaxy_data
from PIL import Image
from datetime import datetime



## Data Preprocessing

The following function resizes each image to 64x64 pixels and scales pixel values to the range [0, 1]. It then splits the data 80/20 into training and test sets, using a fixed seed (`random_state=42`) so the split is reproducible.

In [2]:
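def preprocess_data(dataset):
    """Resize images to 64x64, scale pixels to [0, 1], and split 80/20."""
    new_size = (64, 64)
    # Convert each record's image to a PIL image, resize it, and stack the results
    images = np.array([
        np.array(Image.fromarray(np.array(image['image'])).resize(new_size))
        for image in dataset['train']
    ])
    labels = np.array(dataset['train']['label'])

    # Scale 8-bit pixel values to the range [0, 1]
    images = images / 255.0

    # Hold out 20% of the samples for testing; the fixed seed makes the split reproducible
    return train_test_split(images, labels, test_size=0.2, random_state=42)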
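
One detail of the normalization step that may matter for memory: in NumPy, dividing a `uint8` array by `255.0` produces a `float64` array. The sketch below, which uses a small synthetic batch rather than the real dataset, compares that with casting to `float32` first. Whether the smaller dtype is worth adopting here is a judgment call, not something the notebook itself does.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a small batch of resized 8-bit RGB images
raw = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)

scaled64 = raw / 255.0                     # uint8 array / Python float -> float64
scaled32 = raw.astype(np.float32) / 255.0  # cast first, result stays float32

print(scaled64.dtype, scaled64.nbytes, "bytes")
print(scaled32.dtype, scaled32.nbytes, "bytes")
```

Both arrays hold values in [0, 1]; the `float32` version uses half the memory.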

## Model Building

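The following function builds a CNN with three convolution and max-pooling stages, followed by a dense layer with dropout and a 10-class softmax output. Layer widths, kernel size, activation function, dropout rate, learning rate, and the optimizer class are all passed as arguments.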

In [3]:
def build_model(layer1_neuron, layer2_neuron, layer3_neuron, layer4_neuron, lr, kernel_size, activation_function, dropout_value, opt_class):
    # opt_class is an optimizer class (e.g. tf.keras.optimizers.Adam); it is instantiated below
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 3)),
        # Three convolution + max-pooling stages
        tf.keras.layers.Conv2D(layer1_neuron, kernel_size, activation=activation_function),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(layer2_neuron, kernel_size, activation=activation_function),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Conv2D(layer3_neuron, kernel_size, activation=activation_function),
        tf.keras.layers.MaxPooling2D((2, 2)),
        # Classifier head: dense layer, dropout, 10-class softmax
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(layer4_neuron, activation=activation_function),
        tf.keras.layers.Dropout(dropout_value),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    # Labels are integer class indices, hence the sparse variant of the loss
    model.compile(optimizer=opt_class(learning_rate=lr),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
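
To see what this architecture implies for tensor shapes, the following sketch traces the spatial size through the three stages and counts trainable parameters. It assumes the Keras defaults used above (`Conv2D` with `'valid'` padding and stride 1, 2x2 max pooling that floors odd sizes) and the layer widths 32, 64, 128, 64 used later in this notebook. The helper names are illustrative and not part of the notebook.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (odd sizes are floored)."""
    return size // pool


def trace_shapes(input_size=64, in_channels=3, filters=(32, 64, 128),
                 dense_units=64, n_classes=10, kernel=3):
    size, channels, params = input_size, in_channels, 0
    for f in filters:
        params += kernel * kernel * channels * f + f  # conv weights + biases
        size = pool_out(conv_out(size, kernel))
        channels = f
        print(f"after conv+pool with {f} filters: {size}x{size}x{channels}")
    flat = size * size * channels
    params += flat * dense_units + dense_units        # hidden dense layer
    params += dense_units * n_classes + n_classes     # softmax output layer
    return size, flat, params


size, flat, params = trace_shapes()
print(f"flattened features: {flat}, trainable parameters: {params}")
```

With these settings the spatial size goes 64 → 31 → 14 → 6, so the flattened vector has 6 × 6 × 128 = 4608 features, and most of the parameters sit in the first dense layer.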

## Parameters

Here we define the parameters for the neural network, optimizers, and learning rates.

In [2]:
# Parameters
nb_epochs = 15  # Number of epochs
layer1_neuron = 32
layer2_neuron = 64
layer3_neuron = 128
layer4_neuron = 64

kernel_size = (3, 3)
activation_function = 'relu'
dropout_value = 0.5

optimizers = {
    'SGD': tf.keras.optimizers.SGD,
    'Adagrad': tf.keras.optimizers.Adagrad,
    'Adam': tf.keras.optimizers.Adam
    # Add other optimizers to test
}

learning_rates = [1e-3, 1e-2, 1e-1]  # Add other learning rates to test
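
The optimizer and learning-rate lists define a grid of independent training runs. As a rough sketch of its size, the snippet below enumerates the combinations with `itertools.product`. It uses optimizer names as plain strings instead of the `tf.keras` classes so that it runs without TensorFlow; that substitution is an assumption made for illustration only.

```python
from itertools import product

optimizer_names = ['SGD', 'Adagrad', 'Adam']  # string stand-ins for the optimizer classes
learning_rates = [1e-3, 1e-2, 1e-1]
nb_epochs = 15

# Every (optimizer, learning rate) pair is one full training run
grid = list(product(optimizer_names, learning_rates))
total_epochs = len(grid) * nb_epochs

for opt_name, lr in grid:
    print(f"{opt_name:8s} lr={lr}")
print(f"{len(grid)} runs, {total_epochs} epochs in total")
```

Each optimizer or learning rate added to the lists multiplies the number of runs, so the total training time grows accordingly.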


## Load Dataset

Load the Galaxy10 DECals dataset and preprocess it.

In [5]:
print("\nLoading Dataset\n")
dataset = load_galaxy_data()
train_images, test_images, train_labels, test_labels = preprocess_data(dataset)


Loading Dataset
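
After the split, it can be useful to check how the ten classes are distributed across the training and test sets. The sketch below does this with `np.bincount` on synthetic labels; the label arrays and the `class_counts` helper are placeholders, not the notebook's data or code. If the real split turned out to be uneven, `train_test_split` accepts a `stratify` argument that keeps class proportions the same in both sets.

```python
import numpy as np


def class_counts(labels, n_classes=10):
    """Number of samples per class, including classes with zero samples."""
    return np.bincount(labels, minlength=n_classes)


rng = np.random.default_rng(0)
# Synthetic integer labels standing in for train_labels / test_labels
train_labels_demo = rng.integers(0, 10, size=800)
test_labels_demo = rng.integers(0, 10, size=200)

for name, labels in [("train", train_labels_demo), ("test", test_labels_demo)]:
    counts = class_counts(labels)
    print(name, counts, np.round(counts / counts.sum(), 3))
```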



## Training and Evaluation

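For each optimizer and learning rate combination, build a fresh model, train it for `nb_epochs` epochs, and evaluate it on the test set. Each run logs to its own TensorBoard directory under `runs/`. The test set is also passed as `validation_data`, so the best combination is selected on the same data used to report its accuracy.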

In [7]:
best_opt = ''
best_lr = 0
best_acc = 0

print("\nIterate on each optimizer and each learning_rate\n")
for opt_name, opt_class in optimizers.items():
    for lr in learning_rates:
        # Create a logs directory
        log_dir = f"runs/{opt_name}_lr_{lr}_" + datetime.now().strftime("%Y%m%d-%H%M%S")
        tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
        print(f"\nCreated {log_dir} file\n")

        # Training and Testing neural network
        print("\nTraining...\n")
        model = build_model(layer1_neuron, layer2_neuron, layer3_neuron, layer4_neuron, lr, kernel_size, activation_function, dropout_value, opt_class)
        model.fit(train_images, train_labels, epochs=nb_epochs, validation_data=(test_images, test_labels), callbacks=[tensorboard_callback])
        test_loss, test_acc = model.evaluate(test_images, test_labels)
        if best_acc < test_acc:
            best_acc = test_acc
            best_lr = lr
            best_opt = opt_name
        print(f"Optimizer: {opt_name}, lr: {lr} - Test Accuracy: {test_acc*100:.2f}%\n")

print(f"\nBest combination: (Optimizer: {best_opt} - Learning Rate: {best_lr} - Accuracy: {best_acc*100:.4f}%)\n")


Iterate on each optimizer and each learning_rate


Created runs/Adam_lr_0.001_20240612-214021 file


Training...





Epoch 1/15
400/400 ━━━━━━━━━━━━━━━━━━━━ 17s 39ms/step - accuracy: 0.1773 - loss: 2.1669 - val_accuracy: 0.2753 - val_loss: 1.8373
Epoch 2/15
400/400 ━━━━━━━━━━━━━━━━━━━━ 15s 38ms/step - accuracy: 0.2764 - loss: 1.8948 - val_accuracy: 0.4015 - val_loss: 1.6370
Epoch 3/15
400/400 ━━━━━━━━━━━━━━━━━━━━ 15s 37ms/step - accuracy: 0.3509 - loss: 1.7040 - val_accuracy: 0.4444 - val_loss: 1.4754
Epoch 4/15
400/400 ━━━━━━━━━━━━━━━━━━━━ 15s 37ms/step - accuracy: 0.4012 - loss: 1.5561 - val_accuracy: 0.4795 - val_loss: 1.3557
Epoch 5/15
400/400 ━━━━━━━━━━━━━━━━━━━━ 15s 37ms/step - accuracy: 0.4423 - loss: 1.4577 - val_accuracy: 0.5315 - val_loss: 1.2636
Epoch 6/15
400/400 ━━━━━━━━━━━━━━━━━━━━ 14s 35ms/step - accuracy: 0.4905 - loss: 1.3602 - val_accuracy: 0.5427 - val_loss: 1.2183
Epoch 7/15

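The training loop keeps only the single best combination. A possible variation is to collect every run's result and rank them afterwards, which makes it easier to compare runs. The sketch below assumes each run is stored as an `(optimizer, learning_rate, accuracy)` tuple. The accuracy values are placeholders chosen for illustration, not measured results, and `best_run` is a hypothetical helper.

```python
def best_run(results):
    """Return the (optimizer, learning_rate, accuracy) tuple with the highest accuracy."""
    # max() keeps the first entry on ties, like the strict `<` comparison in the loop
    return max(results, key=lambda run: run[2])


# Placeholder values for illustration only; these are NOT measured accuracies
results = [
    ('SGD', 1e-3, 0.10),
    ('Adam', 1e-3, 0.30),
    ('Adam', 1e-1, 0.20),
]

# Print all runs from best to worst, then the winner
for opt_name, lr, acc in sorted(results, key=lambda run: run[2], reverse=True):
    print(f"{opt_name:8s} lr={lr:<6} accuracy={acc * 100:.2f}%")

best_opt, best_lr, best_acc = best_run(results)
print(f"Best combination: {best_opt}, lr={best_lr}")
```

In the notebook's loop, this would amount to appending `(opt_name, lr, test_acc)` to a list after each call to `model.evaluate`.
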
## Running TensorBoard

To monitor the training process, run the following command in your terminal:
```bash
tensorboard --logdir=runs
```
Then open a web browser and go to the URL provided by TensorBoard (typically `http://localhost:6006`).