# Deep learning for Star-Galaxy separation
https://arxiv.org/pdf/1608.04369.pdf


Most existing star-galaxy classifiers require careful feature extraction and selection. Recent advances in machine learning based on deep convolutional neural networks (CNNs) allow a machine to learn the relevant features directly from the data, minimizing the need for input from human experts. In this lab we present a star-galaxy classification framework that uses deep CNNs to solve this problem.

<img src="https://old.ipac.caltech.edu/2mass/releases/spr99/doc/test/jarrett2/old/star_gal/jhk_lowdensity.gif" style="width: 400px;"/>

# Section 1: Setup

### Import libraries

In [None]:
import matplotlib as mpl
mpl.use('Agg')
%matplotlib inline 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score

from keras.models import Sequential
from keras.layers.core import Dense, Dropout,  Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.optimizers import SGD, Adam
from keras.layers.noise import GaussianNoise
from keras import backend as K
K.set_image_dim_ordering('th')
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
from keras.constraints import maxnorm
from keras.preprocessing.image import ImageDataGenerator
import pandas as pd

### SET PATH
Set the path to the folder containing the input datasets:

In [None]:
pathinData = 'input_starvsgalaxy/'

# Section 2: Load Data

Let's now load the data and define the classes we want to separate: stars and quasars (QSO), both point sources, are labelled 1, and galaxies are labelled 0.

In [None]:
X_tot = np.load(pathinData+'StarGalaxy_Images1.npy')
Y_cat =  pd.read_pickle(pathinData+'StarGalaxy_pandas1')

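# Binary labels: point sources (STAR and QSO) -> 1, GALAXY -> 0
Y_cat.loc[Y_cat['class'] == 'STAR', 'class'] = 1.
Y_cat.loc[Y_cat['class'] == 'QSO', 'class'] = 1.
Y_cat.loc[Y_cat['class'] == 'GALAXY', 'class'] = 0.

# DataFrame.to_numpy() replaces the as_matrix() method removed in pandas 1.0
x = Y_cat['class'].to_numpy()

# Any label other than 0 (galaxy) is treated as class 1
Y_tot = np.where(x == 0, 0., 1.)

# Move the channel axis from last to second: (N, H, W, C) -> (N, C, H, W),
# as required by the 'th' (channels-first) image ordering set above
X_tot = np.moveaxis(X_tot, 3, 1)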
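The label encoding above can also be written as a single dictionary lookup with pandas `Series.map`. Below is a minimal sketch on a made-up four-row catalogue; the toy `cat` DataFrame and the 64x64x3 image size are assumptions, not the lab's data. It also shows the shape change produced by `np.moveaxis`. Unlike the element-wise approach, `map` returns `NaN` for any class name missing from the dictionary, which makes unexpected labels easy to spot.

```python
import numpy as np
import pandas as pd

# Hypothetical toy catalogue standing in for the real pandas file
cat = pd.DataFrame({'class': ['STAR', 'GALAXY', 'QSO', 'GALAXY']})

# Point sources (STAR, QSO) -> 1, GALAXY -> 0, in one vectorized step
label_map = {'STAR': 1., 'QSO': 1., 'GALAXY': 0.}
y = cat['class'].map(label_map).to_numpy()
print(y)  # [1. 0. 1. 0.]

# Hypothetical channels-last images (N, H, W, C) -> channels-first (N, C, H, W)
imgs = np.zeros((4, 64, 64, 3))
print(np.moveaxis(imgs, 3, 1).shape)  # (4, 3, 64, 64)
```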

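Split the data into training and test sets: the first 4/5 of the samples are used for training and the remaining 1/5 for testing. The split is not shuffled, so it assumes the samples are not ordered by class.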

In [None]:
# Splitting into training (first 4/5) and test (last 1/5) datasets
n_train = len(X_tot) // 5 * 4
X_train = X_tot[:n_train]
Y_train = Y_tot[:n_train]
X_test = X_tot[n_train:]
Y_test = Y_tot[n_train:]
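The split above keeps the original ordering of the samples. If the catalogue happens to be sorted (for example by class), a shuffled, stratified split is safer. The sketch below uses scikit-learn's `train_test_split` on random stand-in arrays; `X_demo` and `Y_demo` are hypothetical, and with the real data you would pass `X_tot` and `Y_tot` instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for X_tot (channels-first images) and Y_tot (0/1 labels)
rng_split = np.random.default_rng(0)
X_demo = rng_split.normal(size=(100, 3, 16, 16))
Y_demo = np.array([0.] * 70 + [1.] * 30)  # deliberately sorted by class

# Shuffle, and stratify so both sets keep the 70/30 class ratio
X_tr, X_te, Y_tr, Y_te = train_test_split(
    X_demo, Y_demo, test_size=0.2, stratify=Y_demo, random_state=0)

print(X_tr.shape, X_te.shape)  # (80, 3, 16, 16) (20, 3, 16, 16)
print(Y_tr.mean(), Y_te.mean())  # both class-1 fractions stay at 0.3
```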

# Section 3: Build the CNN models
Define the model yourself based on the summary. 

In [None]:
def Model():
    
    #Enter your code here
     
    return model
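One possible architecture for `Model()`, given as a sketch rather than the intended solution, uses two convolution/max-pooling blocks, a dense layer with dropout, and a single sigmoid output. That output is the probability of a point source (class 1), which is the single score the ROC evaluation in Section 6 expects. The sketch assumes `tensorflow.keras` (TensorFlow 2.x) rather than the standalone Keras imports above. The helper name `build_cnn`, the layer sizes, and the 3x64x64 input size are hypothetical. Because the arrays are channels-first, a `Permute` layer reorders them to channels-last, which TensorFlow's CPU convolution and pooling kernels have historically required.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_cnn(input_shape=(3, 64, 64)):
    """Hypothetical helper: small binary CNN. input_shape is channels-first (C, H, W)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Reorder (C, H, W) -> (H, W, C) so the layers below run channels-last
        layers.Permute((2, 3, 1)),
        layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(0.5),
        # One sigmoid unit: probability that the source is a point source (class 1)
        layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model


# Usage on dummy channels-first input
demo_model = build_cnn((3, 64, 64))
demo_probs = demo_model.predict(np.zeros((2, 3, 64, 64), dtype='float32'), verbose=0)
print(demo_probs.shape)  # (2, 1)
```

In the notebook, `Model()` could then simply `return build_cnn(X_train.shape[1:])`.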

# Section 4: Build a training module

Fit the model. You have to decide which loss function and optimizer to use, and whether or not to use data augmentation, early stopping (`EarlyStopping`), etc.
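For data augmentation, one option relies on the assumption that the class of a source does not depend on its orientation on the sky. Under that assumption, 90-degree rotations and flips of a cutout preserve its label. Below is a minimal NumPy sketch for channels-first batches of square images; `augment_batch` is a hypothetical helper, not part of Keras.

```python
import numpy as np


def augment_batch(x, rng):
    """Hypothetical helper: random 90-degree rotation and horizontal flip per image.

    x: channels-first batch of shape (N, C, H, W) with H == W.
    """
    out = np.empty_like(x)
    for i in range(len(x)):
        # Rotate in the spatial (H, W) plane by 0, 90, 180 or 270 degrees
        img = np.rot90(x[i], k=rng.integers(4), axes=(1, 2))
        if rng.random() < 0.5:
            img = img[:, :, ::-1]  # flip along the width axis
        out[i] = img
    return out


rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 3, 8, 8))
aug = augment_batch(batch, rng)
print(aug.shape)  # (4, 3, 8, 8)
```

Calling it on `X_train` once per epoch, for example inside a generator or a manual training loop, shows the network a different orientation of each image every time.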

In [None]:
Model()

In [None]:
#==================
# FIT MODEL
#==================
def Fit_Model(X_train, X_test, Y_train, Y_test, model):

    # Insert your code here

    return model
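A possible `Fit_Model` can be sketched with early stopping: training stops once the validation loss has not improved for 5 epochs, and the best weights are restored. The sketch assumes `tensorflow.keras` and a model that is already compiled, for instance by a `Model()` that calls `compile`. The name `fit_model_sketch`, the patience, the batch size, and the 10% `validation_split` are illustrative choices. Using `validation_split` rather than the test set for early stopping keeps the test score unbiased. Keras takes those validation samples from the end of the arrays, before any shuffling.

```python
import numpy as np
from tensorflow import keras


def fit_model_sketch(X_train, X_test, Y_train, Y_test, model,
                     epochs=50, batch_size=64):
    """Hypothetical Fit_Model: early stopping on a 10% validation split."""
    # Stop after 5 epochs without improvement in validation loss; keep the best weights
    early_stop = keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=5, restore_best_weights=True)
    # Keras uses the last 10% of X_train/Y_train as validation data
    model.fit(X_train, Y_train, validation_split=0.1,
              epochs=epochs, batch_size=batch_size,
              callbacks=[early_stop], verbose=0)
    # The test set is used only once, after training
    results = model.evaluate(X_test, Y_test, verbose=0, return_dict=True)
    print('Test metrics:', results)
    return model


# Usage with a tiny stand-in model on random data
rng_fit = np.random.default_rng(0)
X_fit = rng_fit.normal(size=(200, 8)).astype('float32')
Y_fit = (X_fit[:, 0] > 0).astype('float32')
tiny = keras.Sequential([keras.Input(shape=(8,)),
                         keras.layers.Dense(1, activation='sigmoid')])
tiny.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
tiny = fit_model_sketch(X_fit[:160], X_fit[160:], Y_fit[:160], Y_fit[160:], tiny, epochs=3)
```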


# Section 5: Train and Validate

In [None]:
model = Model()
model.summary()
print('-'*30)
print('Fitting model...')
print('-'*30)
model = Fit_Model(X_train, X_test, Y_train, Y_test, model)
# Then save the trained model
model.save(pathinData+'Model.h5')


# Section 6: Evaluate the model

Let's now evaluate the model on the test set: print a classification report and the accuracy score, then plot the ROC curve.

In [None]:
# Assumes the model ends in a single sigmoid unit, so predict() returns
# the probability of class 1 (point source) for each test image
pred2 = model.predict(X_test).ravel()
pred = (pred2 > 0.5).astype(int)

# classification_report and accuracy_score take (y_true, y_pred)
print(classification_report(Y_test, pred))
print("Accuracy score ", accuracy_score(Y_test, pred))

from sklearn.metrics import roc_curve, auc
fpr, tpr, thresholds = roc_curve(Y_test, pred2)
roc_auc = auc(fpr, tpr)  # named roc_auc so the auc() function is not overwritten

plt.figure(figsize=(12,12))
plt.plot(fpr, tpr, color = 'r', label = "ROC curve")
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label = "Random guess")
plt.legend(loc = "lower right")
plt.xlabel('False Positive rate', fontsize = 12)
plt.ylabel('True Positive rate', fontsize = 12)
plt.text(0.68, 0.1, 'AUC: %.3f' % roc_auc)
plt.savefig('ROC.png', bbox_inches='tight')
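As a cross-check, scikit-learn's `roc_auc_score` computes the same area directly from labels and scores, without building the curve first. Here is a toy example with made-up labels and scores; the names carry a `_roc` suffix so they do not overwrite `fpr` and `tpr`, which Section 7 uses.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up labels (1 = point source) and classifier scores
y_roc = np.array([0, 0, 1, 1, 0, 1])
s_roc = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

fpr_roc, tpr_roc, _ = roc_curve(y_roc, s_roc)
auc_trapz = auc(fpr_roc, tpr_roc)        # trapezoidal area under the ROC curve
auc_direct = roc_auc_score(y_roc, s_roc)  # same quantity in one call
print(auc_trapz, auc_direct)
```

Both values are 8/9 (about 0.889): in eight of the nine (positive, negative) pairs the positive sample has the higher score, which is the probabilistic reading of the AUC.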


# Section 7: Calculate the Youden index

In [None]:
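# Youden's J = TPR - FPR, evaluated at every ROC threshold
Youden_index = tpr - fpr
i_max = np.argmax(Youden_index)
# Threshold with the largest J: scores >= cut_value are classified as class 1
cut_value = thresholds[i_max]

print("The optimal cut value is: " + str(cut_value))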
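Youden's index is J = TPR - FPR = sensitivity + specificity - 1. This is the vertical distance between the ROC curve and the random-guess diagonal, so maximizing it picks the threshold where the classifier does best relative to chance. Below is a toy sketch with made-up labels and scores showing the cut and how to apply it. `roc_curve` counts a sample as positive when its score is greater than or equal to the threshold, so the cut should be applied with `>=`.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels (4 galaxies, 5 point sources) and classifier scores
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
s_demo = np.array([0.1, 0.2, 0.3, 0.65, 0.5, 0.7, 0.8, 0.9, 0.95])

fpr_d, tpr_d, thr_d = roc_curve(y_demo, s_demo)
j_demo = tpr_d - fpr_d               # Youden's J at every threshold
cut_demo = thr_d[np.argmax(j_demo)]  # threshold with the largest J

# Apply the cut the same way roc_curve does (score >= threshold -> class 1)
pred_demo = (s_demo >= cut_demo).astype(int)
print(cut_demo, j_demo.max())  # 0.7 0.8
print(pred_demo)               # [0 0 0 0 0 1 1 1 1]
```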