# Gliese_22

A neural network project for classifying Deep Space Objects (DSOs). The project is still in the development stage: the datasets need to be expanded and new functionality added for computationally intensive tasks.

We start by importing the required modules and libraries:

    1. TensorFlow - the library used to build, train and test the neural network. Through TensorFlow we have access to Keras, the high-level API that exposes layers, optimizers, loss functions and image preprocessing (via the ImageDataGenerator class).
    2. Matplotlib - Python's MATLAB-like plotting library for graphical representation of data. Its image module can also read an image into a multidimensional array of pixel values and display that array as a plot.
    3. Pandas - a library for data manipulation and analysis. Here we use it for exploratory data analysis: loading data files into our code as DataFrame objects, then filtering, sorting and combining them, or defining our own DataFrames.
    4. NumPy - a Python library for working with n-dimensional arrays.

In [3]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random
import pandas as pd
import os, wget
 
# from sklearn.model_selection import train_test_split
from keras import layers, Model
from keras.preprocessing.image import ImageDataGenerator
# from keras.utils import img_to_array, load_img
from keras.optimizers import RMSprop
# from keras.losses import categorical_crossentropy

## Importing pretrained models: Transfer Learning

Inception V3 is a deep convolutional neural network from Google, commonly described as 48 layers deep. The Inception architecture was introduced in 2014 and the V3 revision followed in 2015. Pretrained on the ImageNet dataset, it can serve as a base model from which similar models are trained on different types of images, which makes it a reasonable starting point for image classification tasks, including space objects. In this case, the pretrained Inception V3 model lets me build a CNN that is trained to classify images of DSOs.

In [None]:
# import and define the InceptionV3 model

!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5

from keras.applications.inception_v3 import InceptionV3

local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'

pre_trained_model = InceptionV3(input_shape=(150, 150, 3), include_top=False, weights=None)
pre_trained_model.load_weights(local_weights_file)

for layer in pre_trained_model.layers:
    layer.trainable = False

last_layer = pre_trained_model.get_layer('mixed7')
# `output.shape` is available on both older and newer Keras versions
print('last layer output shape: ', last_layer.output.shape)
last_output = last_layer.output

### Importing the data

We then load the datasets used in this project.

In [4]:
stars = pd.read_csv('./data/Datasets/HYG_Catalogue.csv')

data_ngcic = pd.read_excel('./data/Datasets/DSO-NGCIC Classification.xlsx', sheet_name='NGCIC Classification')
data_dso = pd.read_excel('./data/Datasets/DSO-NGCIC Classification.xlsx', sheet_name='DSO Classification')

We can view details of our datasets, such as their columns, their size and how they are structured.

In [None]:
print('Messier Objects:', data_dso.shape)
print('NGCIC Objects:', data_ngcic.shape)
print('Data shape: ', data_ngcic.shape, data_dso.shape)
print('Details: ', data_ngcic.columns, data_dso.columns)

### Exploratory data analysis

There are multiple datasets involved in this project, which means there is a lot of data to deal with. To better understand it, we derive as much information from the data as possible. Exploratory data analysis involves building relations between pairs of variables, extracting subsets of the data and detecting anomalies such as null entries or inconsistent datatypes, which can otherwise degrade a model later on, for example through overfitting or underfitting.

In the first analysis, we build a night sky map, using Right Ascension and Declination (the two coordinates of the equatorial coordinate system) to locate objects in the sky.
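As a minimal sketch of the anomaly checks described above, the helper below reports each column's dtype and how many null entries it holds. The name `summarise_columns` and the toy frame are illustrative assumptions, not part of the project's data; in the notebook one would pass `data_ngcic`, `data_dso` or `stars` instead.

```python
import pandas as pd


def summarise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, null count and null share."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "nulls": df.isna().sum(),
        "null_fraction": df.isna().mean(),
    })


# Toy frame for illustration only (hypothetical values, not catalogue data).
toy = pd.DataFrame({"name": ["A", "B", None], "mag": [4.5, None, None]})
summary = summarise_columns(toy)
print(summary)
```

Columns with a high null fraction, or a dtype of `object` where numbers are expected, are the ones worth inspecting before any modelling.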

In [None]:
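# explore the dataset

# convert RA (hours, minutes, seconds) to decimal hours and
# Dec (degrees, arcminutes, arcseconds) to decimal degrees
data_ngcic['float ra'] = data_ngcic['ra hr'] + data_ngcic['ra min'] / 60 + data_ngcic['ra sec'] / 3600
data_ngcic['float dec'] = data_ngcic['dec deg '] + data_ngcic['dec min'] / 60 + data_ngcic['dec sec'] / 3600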

# plot the data from various variables

plt.figure(figsize=(30, 10))
plt.ylim(-90, 90)
plt.xlabel('Right Ascension')
plt.ylabel('Declination')
plt.xticks(np.arange(0, 24, 1), labels=['0h', '1h', '2h', '3h', '4h', '5h', '6h', '7h', '8h', '9h',
                                        '10h', '11h', '12h', '13h', '14h', '15h', '16h', '17h', '18h',
                                        '19h', '20h', '21h', '22h', '23h'])
plt.yticks(np.arange(-90, 90, 10), labels=['-90deg', '-80deg', '-70deg', '-60deg', '-50deg', '-40deg', '-30deg',
                                           '-20deg', '-10deg', '0deg', '10deg', '20deg', '30deg', '40deg', '50deg',
                                           '60deg', '70deg', '80deg'])
plt.title('Plot for NGCIC Objects in the Sky', fontsize=20, fontweight='bold')
plt.scatter(data_ngcic['float ra'], data_ngcic['float dec'], s=0.09, c='red')
plt.scatter(stars['ra'], stars['dec'], s=0.01, c='blue')
plt.legend(['NGCIC Objects', 'Stars'], loc='upper right', fontsize=16, markerscale=10)
plt.show()
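One caveat about the coordinate conversion in the cell above: adding minutes and seconds to a signed degree value is only correct if the minutes and seconds carry the same sign as the degrees. If the spreadsheet stores them unsigned (not shown here, so this is an assumption), a declination of -5° 30′ would come out as -4.5 instead of -5.5. A sign-aware sketch, with hypothetical helper names, could look like this:

```python
def hms_to_hours(hours: float, minutes: float, seconds: float) -> float:
    """Right ascension given as h/m/s, returned in decimal hours."""
    return hours + minutes / 60 + seconds / 3600


def dms_to_degrees(sign: int, degrees: float, minutes: float, seconds: float) -> float:
    """Declination given as a sign plus d/m/s, returned in decimal degrees."""
    # Work on magnitudes, then apply the sign once to the whole angle.
    magnitude = abs(degrees) + abs(minutes) / 60 + abs(seconds) / 3600
    return -magnitude if sign < 0 else magnitude


print(hms_to_hours(6, 30, 0))        # RA in decimal hours
print(dms_to_degrees(-1, 5, 30, 0))  # Dec in decimal degrees, south of the equator
```

Passing the sign separately also keeps declinations between 0° and -1° correct, since a numeric degree column cannot distinguish -0 from 0.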

Next, we relate the visual magnitude (apparent brightness) of an object to its distance from Earth, in metric units. From this, we can see that objects of the same type, such as galaxies, nebulae and star clusters, tend to follow a common trend. Such trends could be used to develop clustering techniques in machine learning.

For the image data, we first set up the directory paths and data generators, then use matplotlib to define an 8x8 grid of subplots and fill it with the first eight training images from each of four classes in our './data/Images' dataset.
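One reason a trend between magnitude and distance is expected: for a fixed intrinsic brightness, apparent magnitude grows with the logarithm of distance, following the distance modulus m - M = 5 log10(d / 10 pc). The sketch below assumes distances in parsecs and ignores interstellar extinction; both are simplifying assumptions, and the function name is illustrative.

```python
import math


def apparent_magnitude(absolute_mag: float, distance_pc: float) -> float:
    """Apparent magnitude from absolute magnitude and distance in parsecs."""
    # Distance modulus: m = M + 5 * log10(d / 10 pc)
    return absolute_mag + 5 * math.log10(distance_pc / 10)


# The same object looks 5 magnitudes fainter when it is 10 times farther away.
print(apparent_magnitude(0.0, 10))
print(apparent_magnitude(0.0, 100))
```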

In [None]:
base_dir = './data/Images'

train_dir = os.path.join(base_dir, 'training_set')
validation_dir = os.path.join(base_dir, 'validation_set')

train_dark_nebula_dir = os.path.join(train_dir, 'dark_nebula') 
train_diffuse_nebula_dir = os.path.join(train_dir, 'diffuse_nebula') 
train_dwarf_elliptical_dir = os.path.join(train_dir, 'dwarf_elliptical') 
train_elliptical_galaxy_dir = os.path.join(train_dir, 'elliptical_galaxy') 
train_globular_cluster_dir = os.path.join(train_dir, 'globular_cluster') 
train_interacting_galaxy_dir = os.path.join(train_dir, 'interacting_galaxy') 
train_irregular_galaxy_dir = os.path.join(train_dir, 'irregular_galaxy') 
train_open_cluster_dir = os.path.join(train_dir, 'open_cluster') 
train_planetary_nebula_dir = os.path.join(train_dir, 'planetary_nebula') 
train_spiral_galaxy_dir = os.path.join(train_dir, 'spiral_galaxy') 
train_supernova_rem_dir = os.path.join(train_dir, 'supernova_rem') 


validation_dark_nebula_dir = os.path.join(validation_dir, 'dark_nebula') 
validation_diffuse_nebula_dir = os.path.join(validation_dir, 'diffuse_nebula') 
validation_dwarf_elliptical_dir = os.path.join(validation_dir, 'dwarf_elliptical') 
validation_elliptical_galaxy_dir = os.path.join(validation_dir, 'elliptical_galaxy') 
validation_globular_cluster_dir = os.path.join(validation_dir, 'globular_cluster') 
validation_interacting_galaxy_dir = os.path.join(validation_dir, 'interacting_galaxy') 
validation_irregular_galaxy_dir = os.path.join(validation_dir, 'irregular_galaxy') 
validation_open_cluster_dir = os.path.join(validation_dir, 'open_cluster') 
validation_planetary_nebula_dir = os.path.join(validation_dir, 'planetary_nebula') 
validation_spiral_galaxy_dir = os.path.join(validation_dir, 'spiral_galaxy') 
validation_supernova_rem_dir = os.path.join(validation_dir, 'supernova_rem') 


# Add our data-augmentation parameters to ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1.0 / 255.0,
                                   rotation_range=40,
                                   width_shift_range=0.2,
                                   height_shift_range=0.2,
                                   zoom_range=0.2,
                                   horizontal_flip=True)

# Note that the validation data should not be augmented!
test_datagen = ImageDataGenerator(rescale=1.0 / 255.0)

# Flow training images in batches of 20 using train_datagen generator
train_generator = train_datagen.flow_from_directory(train_dir,
                                                    batch_size=20,
                                                    class_mode='categorical',
                                                    target_size=(150, 150))

# Flow validation images in batches of 20 using test_datagen generator
validation_generator = test_datagen.flow_from_directory(validation_dir,
                                                        batch_size=20,
                                                        class_mode='categorical',
                                                        target_size=(150, 150))

In [None]:
train_dark_nebula_names = os.listdir(train_dark_nebula_dir)
train_diffuse_nebula_names = os.listdir(train_diffuse_nebula_dir)
train_planetary_nebula_names = os.listdir(train_planetary_nebula_dir)
train_spiral_galaxy_names = os.listdir(train_spiral_galaxy_dir)

# Parameters for our graph; we'll output images in an 8x8 configuration
nrows = 8
ncols = 8

# Index for iterating over images
pic_index = 0

fig = plt.gcf()
fig.set_size_inches(ncols * 4, nrows * 4)

pic_index += 8
next_dark_nebula_pix = [os.path.join(train_dark_nebula_dir, fname)
                        for fname in train_dark_nebula_names[pic_index-8:pic_index]]
next_diffuse_nebula_pix = [os.path.join(train_diffuse_nebula_dir, fname)
                           for fname in train_diffuse_nebula_names[pic_index-8:pic_index]]
next_planetary_nebula_pix = [os.path.join(train_planetary_nebula_dir, fname)
                             for fname in train_planetary_nebula_names[pic_index-8:pic_index]]
next_spiral_galaxy_pix = [os.path.join(train_spiral_galaxy_dir, fname)
                          for fname in train_spiral_galaxy_names[pic_index-8:pic_index]]

for i, img_path in enumerate(next_dark_nebula_pix + next_planetary_nebula_pix
                             + next_diffuse_nebula_pix + next_spiral_galaxy_pix):
    # Set up subplot; subplot indices start at 1
    sp = plt.subplot(nrows, ncols, i + 1)
    sp.axis('off')  # Don't show axes (or gridlines)

    img = mpimg.imread(img_path)
    # Show only the first channel; grayscale images have no channel axis
    lum_img = img[:, :, 0] if img.ndim == 3 else img
    plt.imshow(lum_img)
    plt.colorbar()

plt.show()
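The grid cell above always takes the first eight files of each class folder. If a random selection is wanted instead, a small helper along these lines could build the list of paths. This is a sketch only: `sample_image_paths` is a hypothetical name, not something defined elsewhere in the project.

```python
import os
import random


def sample_image_paths(class_dirs, per_class, seed=None):
    """Pick up to `per_class` random file paths from each directory."""
    rng = random.Random(seed)
    paths = []
    for directory in class_dirs:
        # Sort first so that a fixed seed gives repeatable picks.
        names = sorted(os.listdir(directory))
        chosen = rng.sample(names, min(per_class, len(names)))
        paths.extend(os.path.join(directory, name) for name in chosen)
    return paths
```

For example, `sample_image_paths([train_dark_nebula_dir, train_spiral_galaxy_dir], per_class=8, seed=0)` would return sixteen paths that can be passed to the plotting loop in place of the `next_*_pix` lists.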


In [None]:
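# Build a new classification head on top of the frozen 'mixed7' features.
# `layers` and `Model` come from the keras import at the top of the notebook.
x = layers.Flatten()(last_output)
x = layers.Dense(1024, activation='relu')(x)
x = layers.Dropout(0.2)(x)  # drop 20% of units to reduce overfitting
x = layers.Dense(11, activation='softmax')(x)  # one output per DSO class folder
model = Model(pre_trained_model.input, x)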
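To get a feel for how much of the model is actually trained, the size of the new head can be estimated by hand. The sketch below assumes the 'mixed7' output is 7 x 7 x 768 for 150x150 inputs; that shape is an assumption here and should be checked against the shape printed earlier or against `model.summary()`. The helper name is illustrative.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# Assumed shape of the 'mixed7' output for 150x150 inputs: 7 x 7 x 768.
flat_features = 7 * 7 * 768
hidden = dense_params(flat_features, 1024)  # Dense(1024) after Flatten
output = dense_params(1024, 11)             # Dense(11) softmax layer
print(flat_features, hidden, output, hidden + output)
```

Under that assumption almost all trainable parameters sit in the first dense layer, which is why the dropout layer is placed right after it.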


In [None]:
model.compile(optimizer=RMSprop(learning_rate=0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])


In [None]:
history = model.fit(
    train_generator,
    validation_data=validation_generator,
    # steps_per_epoch = 100,
    epochs=20,
    # validation_steps = 50
)
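With `steps_per_epoch` and `validation_steps` commented out, Keras works out the number of batches from the generators themselves. If they are set explicitly, a value that covers each image once per epoch can be computed as below. This is a sketch; the function name is hypothetical.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


print(steps_per_epoch(2000, 20))
print(steps_per_epoch(2001, 20))  # a partial final batch still counts as a step
```

In the notebook this would be called as `steps_per_epoch(train_generator.samples, 20)`, using the batch size passed to `flow_from_directory`.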


In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

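plt.figure(figsize=(20, 10))
plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend(loc='lower right')

# Plot the loss curves in a second figure
plt.figure(figsize=(20, 10))
plt.plot(epochs, loss, 'r', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend(loc='upper right')

plt.show()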