# Gliese_22

A neural network project for classifying Deep Space Objects (DSOs). The project is currently in the development stage: there is still much work to be done in expanding the datasets and adding new functionality for computationally intensive tasks.

We start by importing the required modules and libraries:

    1. TensorFlow - the library used to build, train and test the neural network. Through TensorFlow we have access to Keras, the high-level API that exposes layers, optimizers, loss functions and image preprocessing (via the ImageDataGenerator class).
    2. Matplotlib - Python's MATLAB-like plotting library, used for graphical representation of data. Its matplotlib.image module can also load an image as a multidimensional array of pixel values, which can then be displayed as a plot.
    3. Pandas - a library for loading and manipulating tabular data. Here, we use it in exploratory data analysis by reading data files into DataFrame objects. With Pandas we can also combine, filter and sort the data, and define our own DataFrames.
    4. NumPy - a Python library for working with n-dimensional arrays.
    5.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import random
import pandas as pd
import os, wget
 
# from sklearn.model_selection import train_test_split
from keras import layers, Model
from keras.preprocessing.image import ImageDataGenerator
# from keras.utils import img_to_array, load_img
from keras.optimizers import RMSprop
# from keras.losses import categorical_crossentropy

### Importing the data

We then load the datasets used in this project: the HYG star catalogue and the two sheets of the DSO-NGCIC classification workbook.

In [None]:
stars = pd.read_csv('./data/Datasets/HYG_Catalogue.csv')

data_ngcic = pd.read_excel('./data/Datasets/DSO-NGCIC Classification.xlsx', sheet_name='NGCIC Classification')
data_dso = pd.read_excel('./data/Datasets/DSO-NGCIC Classification.xlsx', sheet_name='DSO Classification')

We can view details of our datasets, such as their columns, their size and how they are structured.

In [None]:
print('Messier Objects:', data_dso.shape)
print('NGCIC Objects:', data_ngcic.shape)
print('NGCIC columns:', list(data_ngcic.columns))
print('DSO columns:', list(data_dso.columns))

### Exploratory data analysis

This project involves several datasets, so there is a lot of data to deal with. To understand it better, we extract as much information from it as possible. Exploratory data analysis involves relating pairs of variables, extracting subsets of the data, and detecting anomalies that could later cause problems during training, such as overfitting or underfitting.
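
As a rough illustration of the anomaly-detection step, the sketch below counts missing values, distinct values and duplicate rows per column. The helper name `summarize_quality` and the toy DataFrame are assumptions made for this example; they are not part of the project's data or code.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Return the dtype, missing-value count and distinct-value count per column."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'missing': df.isna().sum(),
        'unique': df.nunique(),  # NaN is not counted as a distinct value
    })


# Toy frame standing in for one of the catalogues (the values are made up).
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'C'],
    'mag': [8.4, None, 9.1, 9.1],
})

report = summarize_quality(toy)
n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row

print(report)
print('duplicate rows:', n_duplicates)
```

The same helper could be applied to `stars`, `data_ngcic` or `data_dso` to spot columns with many gaps before plotting or training.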

In the first analysis, we build a map of the night sky using Right Ascension and Declination, the two coordinates of the equatorial coordinate system used to locate objects in space.

In [None]:
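# Convert sexagesimal coordinates to decimal values:
# RA in hours (h + m/60 + s/3600), Dec in degrees (deg + arcmin/60 + arcsec/3600).
# Note: adding the minutes and seconds is only correct for negative declinations
# if those columns carry the sign as well.
data_ngcic['float ra'] = data_ngcic['ra hr'] + data_ngcic['ra min'] / 60 + data_ngcic['ra sec'] / 3600
data_ngcic['float dec'] = data_ngcic['dec deg '] + data_ngcic['dec min'] / 60 + data_ngcic['dec sec'] / 3600

# Plot the NGC/IC objects (red) over the HYG stars (blue) on an RA/Dec grid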

plt.figure(figsize=(30, 10))
plt.ylim(-90, 90)
plt.xlabel('Right Ascension')
plt.ylabel('Declination')
plt.xticks(np.arange(0, 24, 1), labels=[f'{h}h' for h in range(24)])
plt.yticks(np.arange(-90, 90, 10), labels=[f'{d}deg' for d in range(-90, 90, 10)])
plt.title('Plot for NGCIC Objects in the Sky', fontsize=20, fontweight='bold')
plt.scatter(data_ngcic['float ra'], data_ngcic['float dec'], s=0.09, c='red')
plt.scatter(stars['ra'], stars['dec'], s=0.01, c='blue')
plt.legend(['NGCIC Objects', 'Stars'], loc='upper right', fontsize=16, markerscale=10)
plt.show()
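
The declination conversion in the cell above adds the arcminutes and arcseconds to the degrees. That is correct for northern objects, but if the workbook stores the sign only on the degrees column, southern declinations end up shifted toward the equator. The sketch below shows a sign-aware alternative. It assumes the sign is available as a separate value, which may not match how the spreadsheet is laid out; `sexagesimal_to_degrees` is a hypothetical helper, not a function from this project.

```python
def sexagesimal_to_degrees(sign: int, degrees: float, minutes: float, seconds: float) -> float:
    """Convert a sign plus unsigned deg/arcmin/arcsec to signed decimal degrees."""
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    return -magnitude if sign < 0 else magnitude


# Declination -30 deg 15 arcmin 0 arcsec
naive = -30 + 15 / 60 + 0 / 3600                    # -29.75: off by half a degree
correct = sexagesimal_to_degrees(-1, 30, 15, 0)     # -30.25

# Passing the sign separately also handles declinations between 0 and -1 degree,
# where a signed degrees column cannot distinguish -0 from +0.
near_equator = sexagesimal_to_degrees(-1, 0, 30, 0)  # -0.5

print(naive, correct, near_equator)
```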

Next, we relate the visual magnitude (brightness) of an object to its distance from Earth, in metric units. From this, we can tell that some classes of objects, such as galaxies, nebulae and star clusters, tend to follow common trends. These trends can be exploited by clustering techniques in Machine Learning.
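
One standard relation between brightness and distance is the distance modulus, m - M = 5 log10(d / 10 pc), which links apparent magnitude m, absolute magnitude M and distance d in parsecs. The sketch below is only an illustration of that formula; whether the project's analysis uses it is an assumption, and the function name is hypothetical.

```python
import math


def apparent_magnitude(absolute_mag: float, distance_pc: float) -> float:
    """Apparent magnitude from the distance modulus m = M + 5 * log10(d / 10 pc)."""
    if distance_pc <= 0:
        raise ValueError('distance must be positive')
    return absolute_mag + 5 * math.log10(distance_pc / 10)


# At 10 parsecs the apparent and absolute magnitudes coincide by definition.
at_ten_pc = apparent_magnitude(4.83, 10)

# Each factor of 10 in distance makes an object 5 magnitudes fainter.
at_hundred_pc = apparent_magnitude(4.83, 100)

print(at_ten_pc, at_hundred_pc)
```

Objects of similar intrinsic brightness therefore fall along a common curve in a magnitude-versus-distance plot, which is one reason such a plot can separate classes of objects.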

For the image data, we use matplotlib to define a 10x10 array of subplots, each with a random image from our './data/Images' dataset.
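
A minimal sketch of such a grid is shown below. The helper names, the list of file extensions and the recursive directory walk are assumptions made for this example; only the `./data/Images` path and the 10x10 layout come from the description above.

```python
import os
import random

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed file types


def sample_image_paths(image_dir, count, seed=None):
    """Pick up to `count` distinct image files at random from image_dir (recursively)."""
    paths = sorted(
        os.path.join(root, name)
        for root, _dirs, files in os.walk(image_dir)
        for name in files
        if name.lower().endswith(IMAGE_EXTENSIONS)
    )
    rng = random.Random(seed)
    return rng.sample(paths, min(count, len(paths)))


def show_image_grid(image_dir='./data/Images', rows=10, cols=10, seed=None):
    """Display a rows x cols grid of randomly chosen images."""
    # Imported here so the sampling helper above has no plotting dependency.
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg

    paths = sample_image_paths(image_dir, rows * cols, seed)
    # squeeze=False keeps `axes` two-dimensional even for a 1x1 grid.
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows), squeeze=False)
    for ax in axes.ravel():
        ax.axis('off')  # hide ticks, including on cells that stay empty
    for ax, path in zip(axes.ravel(), paths):
        ax.imshow(mpimg.imread(path))
    plt.show()
```

Calling `show_image_grid()` would then draw the grid; passing a `seed` makes the selection repeatable between runs.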

## Importing pretrained models: Transfer Learning

The Inception V3 model is a sophisticated convolutional neural network, more than 40 layers deep, which makes it a strong starting point for many image classification tasks, including space objects.
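
The sketch below shows one common way to reuse Inception V3 for transfer learning with Keras: load the network without its classification head, freeze it, and train a small new head on top. The function name, the 150x150 input size, the dropout rate, the learning rate and the number of classes are all assumptions for illustration, not values taken from this project.

```python
def build_transfer_model(num_classes, input_shape=(150, 150, 3), weights='imagenet'):
    """Build a classifier with a frozen Inception V3 base and a new softmax head."""
    # TensorFlow is imported inside the function so that defining it stays cheap.
    import tensorflow as tf
    from tensorflow.keras import layers

    # Pretrained convolutional base without the original ImageNet classifier.
    base = tf.keras.applications.InceptionV3(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # keep the pretrained features fixed

    inputs = tf.keras.Input(shape=input_shape)
    # Inception V3 expects pixel values in [-1, 1]; this maps [0, 255] to that range.
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)  # assumed regularisation strength
    outputs = layers.Dense(num_classes, activation='softmax')(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-4),
        loss='categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model
```

For example, `model = build_transfer_model(num_classes=3)` would download the ImageNet weights on first use and return a model ready for `model.fit`. Only the new head is trained at first; unfreezing some of the upper Inception blocks for fine-tuning is a possible later step.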