<a href="https://colab.research.google.com/github/PawseySC/cosmic-machines/blob/master/cosmic-machines.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

<img src="cosmic-machines-cropped.png" width="" align="left" vspace="20">
<h1 style="text-align: center;">Cosmic Machines</h1> 


<h3 style="text-align: center;">An Introduction to Deep Learning for Observational Astronomy</h3> 


Prepared by Lachlan Campbell from the [Pawsey Supercomputing Centre](https://pawsey.org.au) in Perth, Australia, on 21 August 2019.

## Table of Contents


1. [Introduction](#1.-Introduction)
2. [Some of the basics](#2.-Some-of-the-basics)
3. [Data cleaning](#3.-Data-cleaning)
4. [The plot thickens](#4.-The-plot-thickens)
6. [Training the model](#6.-Training-the-model)

## 1. Introduction

[[ go back to the top ]](#Table-of-Contents)

Deep learning is recognised for its ability to solve complex tasks such as image and language understanding. Its recent success has been driven by advances in hardware such as GPUs and TPUs (Tensor Processing Units), by the general growth in computing power and available data, and by easy-to-use frameworks such as Keras and TensorFlow. Deep learning now appears throughout everyday life, for example in voice recognition, computer vision and recommender systems, and in fields such as reinforcement learning. But what is it?

Deep learning is a specific subfield of machine learning: a new take on learning representations from data that emphasises learning successive “layers” of increasingly meaningful representations. How we represent the world can make the complex appear simple, both to us humans and to the machine learning models we build. For example, the Copernican heliocentric model, which put the Sun at the center of the “Universe”, describes the planets' trajectories far more simply than the earlier geocentric model, which put the Earth at the center. At its best, deep learning automates this step of finding a good representation, removing Copernicus (i.e., expert humans) from the process:

![Representation_models](Images/representation_models.gif)
<div style="text-align:center;font-size:80%">Heliocentrism (1543) vs Geocentrism (6th century BC). <a href= 'https://www.youtube.com/watch?v=waexG16WZrE'>Trajectory source.</a></div>

## **Before running the code in this notebook, it’s important to tell Colab that we want to use a GPU.**

Click on the ‘Runtime’ tab and select ‘Change runtime type’. A pop-up window will open up with a drop-down menu. Select ‘GPU’ from the menu and click ‘Save’.

<img src="change_runtime.png" width="250" align="left"><img src="gpu.png" width="350" align="center">

To run the notebook, you first need to install the necessary packages by running the code cell below. When you run this first cell, a pop-up will appear saying ‘Warning: This notebook was not authored by Google’; leave the default tick in the ‘Reset all runtimes before running’ check box and click ‘Run Anyway’.
<img src="google_warning.png" width="500" align="center">

In [None]:
!curl -s https://course.fast.ai/setup/colab | bash
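
With the runtime set to GPU and the packages installed, you can confirm from Python that a GPU is actually visible. The sketch below is an illustrative check (the helper name `gpu_available` is hypothetical): it asks PyTorch, which fastai is built on, and falls back to looking for the `nvidia-smi` tool if PyTorch cannot be imported.

```python
import shutil


def gpu_available() -> bool:
    """Return True if a CUDA GPU appears to be usable (hypothetical helper)."""
    try:
        import torch  # fastai runs on PyTorch
    except ImportError:
        # Without PyTorch, fall back to checking that the NVIDIA driver tool is on PATH
        return shutil.which("nvidia-smi") is not None
    return torch.cuda.is_available()


print("GPU available:", gpu_available())
```

If this prints `False` on Colab, revisit the ‘Change runtime type’ step above.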

Download the images from Google Drive:

In [None]:
!gdown https://drive.google.com/uc?id=1kXTFPIOp6ctIn4RJ8RoxMhklulpC6SV9

If the download succeeds, you should see output similar to the screenshot below. If there is a connection error during the download, simply run the cell again until the download completes.
![download_images](download_images.png)

Download the labels:

In [None]:
!gdown https://drive.google.com/uc?id=1cBCgj-9bnsW91X4vEiuNm8Ms4XJ3MfYo

Once both downloads have completed, listing the working directory with the cell below should show something like this:
![list_files](list_files.png)

In [None]:
!ls -l

Next, you need to unzip the files you’ve just downloaded:

In [None]:
!unzip training_solutions_rev1.zip

In [None]:
!unzip images_training_rev1.zip
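
Before training, it is worth confirming that every labelled galaxy has a matching image. The sketch below assumes the archive extracts to a folder named `images_training_rev1` containing files named `<GalaxyID>.jpg`, which is the layout the training code later relies on; the helper `missing_images` is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def missing_images(galaxy_ids: Iterable[int], folder: str = "images_training_rev1",
                   suffix: str = ".jpg") -> List[int]:
    """Return the GalaxyIDs that have no '<GalaxyID><suffix>' file in `folder`."""
    root = Path(folder)
    # Names of the image files that actually exist (empty if the folder is missing)
    present = {p.name for p in root.glob(f"*{suffix}")}
    return [gid for gid in galaxy_ids if f"{gid}{suffix}" not in present]


# Example usage once the labels are loaded with pandas:
# missing = missing_images(pd.read_csv('training_solutions_rev1.csv')['GalaxyID'])
# print(len(missing), "labelled galaxies have no image")
```

An empty result means every label row can be paired with an image file.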

## Data exploration

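The labelled galaxy images come from Galaxy Zoo, a citizen-science project in which volunteers classify galaxies by answering a series of questions arranged as a decision tree. Each answer is stored in a CSV column named after its class number (for example `Class8.1` for 'Ring'), holding a score between 0 and 1 derived from the volunteers' votes.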

![classification_decision_tree](classification_decision_tree.png)

![classification_flowchart](classification_flowchart.png)

### Classes derived from the Galaxy Zoo decision tree

| Class | Description |
|---|---|
| 1.1 | Round |
| 1.2 | Features or disk |
| 1.3 | Star or artifact |
| 2.1 | Edge-on disk |
| 2.2 | Not edge-on disk |
| 3.1 | Central bar |
| 3.2 | No central bar |
| 4.1 | Spiral arm |
| 4.2 | No spiral arm |
| 5.1 | No bulge |
| 5.2 | Just noticeable bulge |
| 5.3 | Obvious bulge |
| 5.4 | Dominant bulge |
| 6.1 | Something odd |
| 6.2 | Nothing odd |
| 7.1 | Round |
| 7.2 | In-between |
| 7.3 | Cigar shaped |
| 8.1 | Ring |
| 8.2 | Lens or arc |
| 8.3 | Disturbed |
| 8.4 | Irregular |
| 8.5 | Other |
| 8.6 | Merger |
| 8.7 | Dust lane |
| 9.1 | Rounded central bulge |
| 9.2 | Boxy central bulge |
| 9.3 | No central bulge |
| 10.1 | Tightly wound spiral arms |
| 10.2 | Medium wound spiral arms |
| 10.3 | Loosely wound spiral arms |
| 11.1 | One spiral arm |
| 11.2 | Two spiral arms |
| 11.3 | Three spiral arms |
| 11.4 | Four spiral arms |
| 11.5 | More than four spiral arms |
| 11.6 | Can't tell |
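
Because the columns follow a `Class<question>.<answer>` naming pattern, all answers to one question can be picked out by prefix. As an illustrative sketch, the hypothetical helper `top_answer` reports which answer to a given question scored highest for each galaxy; it assumes that naming pattern and a `GalaxyID` column.

```python
import pandas as pd


def top_answer(labels: pd.DataFrame, question: int) -> pd.Series:
    """Return, per galaxy, the column name of the highest-scoring answer to `question`."""
    prefix = f"Class{question}."  # the trailing dot keeps Class1.* separate from Class10.*
    cols = [c for c in labels.columns if c.startswith(prefix)]
    # idxmax(axis=1) gives the column label holding each row's maximum value
    return labels.set_index("GalaxyID")[cols].idxmax(axis=1)


# Toy example with made-up values (not real Galaxy Zoo data)
toy = pd.DataFrame({
    "GalaxyID": [1, 2],
    "Class1.1": [0.8, 0.1],
    "Class1.2": [0.2, 0.7],
    "Class1.3": [0.0, 0.2],
})
print(top_answer(toy, 1))
```

Once the CSV is loaded below, `top_answer(data, 1).value_counts()` would give a rough picture of how the first question splits the sample.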

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
# Import pandas
import pandas as pd

# Read the data file
data = pd.read_csv('training_solutions_rev1.csv')

# Display the first few rows
data.head()

In [None]:
# Get records containing any missing values
data[data.isnull().any(axis=1)]

In [None]:
# Display dataset summary statistics
data.describe()
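
The class-selection cells that follow keep only galaxies whose score for an answer exceeds a chosen threshold, so each threshold directly controls how many examples that class gets. A quick way to explore this trade-off is to count how many rows pass each candidate threshold; the helper `count_above` and the toy values below are hypothetical.

```python
import pandas as pd


def count_above(labels: pd.DataFrame, column: str, thresholds) -> pd.Series:
    """Count rows whose `column` value is strictly greater than each threshold."""
    return pd.Series({t: int((labels[column] > t).sum()) for t in thresholds},
                     name=f"{column} > t")


toy = pd.DataFrame({"Class8.1": [0.1, 0.6, 0.75, 0.9]})
print(count_above(toy, "Class8.1", [0.5, 0.7, 0.85]))
# On the real data you might try: count_above(data, 'Class8.1', [0.5, 0.6, 0.7, 0.8])
```

Higher thresholds give cleaner but smaller classes, which is the balance the cells below strike by hand.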

In [None]:
classdf = data.copy()

In [None]:
# Ring galaxies: 'Ring' (8.1) score above 0.7
# .copy() gives an independent DataFrame, avoiding pandas' SettingWithCopyWarning
ring_df = classdf.loc[classdf['Class8.1'] > 0.7, ['GalaxyID']].copy()
ring_df['Label'] = 'ring'
ring_df.shape
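
The class cells in this section all follow the same pattern: keep `GalaxyID`, filter on one or more answer columns, and attach a text label. A sketch of a small reusable helper that captures this pattern (the name `select_class` is hypothetical); it takes a dictionary of minimum scores so that multi-condition classes such as the elliptical selection fit too.

```python
import pandas as pd


def select_class(labels: pd.DataFrame, thresholds: dict, label: str) -> pd.DataFrame:
    """Return GalaxyID plus a constant Label for rows exceeding every threshold."""
    mask = pd.Series(True, index=labels.index)
    for column, minimum in thresholds.items():
        mask &= labels[column] > minimum  # all conditions must hold
    selected = labels.loc[mask, ["GalaxyID"]].copy()  # copy avoids SettingWithCopyWarning
    selected["Label"] = label
    return selected


# Same selections as the ring and elliptical cells in this section:
# ring_df = select_class(classdf, {'Class8.1': 0.7}, 'ring')
# elliptical_df = select_class(classdf, {'Class1.1': 0.9, 'Class7.2': 0.8}, 'elliptical')
```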

In [None]:
# Mergers: 'Merger' (8.6) score above 0.65
merger_df = classdf.loc[classdf['Class8.6'] > 0.65, ['GalaxyID']].copy()
merger_df['Label'] = 'merger'
merger_df.shape

In [None]:
# One-armed spirals: 'One spiral arm' (11.1) score above 0.5
# (computed here, but not included in the combined dataset below)
spiral_one_arm_df = classdf.loc[classdf['Class11.1'] > 0.5, ['GalaxyID']].copy()
spiral_one_arm_df['Label'] = 'spiral - one arm'
spiral_one_arm_df.shape

In [None]:
# Barred galaxies: 'Central bar' (3.1) score above 0.85
barred_spirals_df = classdf.loc[classdf['Class3.1'] > 0.85, ['GalaxyID']].copy()
barred_spirals_df['Label'] = 'barred'
barred_spirals_df.shape

In [None]:
# Ellipticals: 'Round' (1.1) score above 0.9 and 'In-between' roundness (7.2) above 0.8
# Both conditions are now taken from the same DataFrame (the original mixed two)
mask = (classdf['Class1.1'] > 0.9) & (classdf['Class7.2'] > 0.8)
elliptical_df = classdf.loc[mask, ['GalaxyID']].copy()
elliptical_df['Label'] = 'elliptical'
elliptical_df.shape

In [None]:
# Edge-on disks: 'Edge-on disk' (2.1) score above 0.95
edge_on_disk_df = classdf.loc[classdf['Class2.1'] > 0.95, ['GalaxyID']].copy()
edge_on_disk_df['Label'] = 'edge-on disk'
edge_on_disk_df.shape

In [None]:
df = pd.concat([ring_df, merger_df, barred_spirals_df, elliptical_df, edge_on_disk_df], ignore_index=True)
df.shape
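
Because each class is selected with an independent threshold, nothing guarantees that a galaxy lands in only one class; a galaxy with both a strong bar score and a strong ring score, for instance, would appear twice with different labels. A sketch of a check that reports such overlaps, drops them, and lets you inspect the class balance (the helper name `drop_conflicting_labels` is hypothetical, and dropping rather than resolving conflicts is one choice among several):

```python
import pandas as pd


def drop_conflicting_labels(labelled: pd.DataFrame) -> pd.DataFrame:
    """Remove every GalaxyID that was assigned more than one distinct label."""
    n_labels = labelled.groupby("GalaxyID")["Label"].transform("nunique")
    conflicts = labelled.loc[n_labels > 1, "GalaxyID"].unique()
    print(f"{len(conflicts)} galaxies had conflicting labels")
    # Keep unambiguous rows and drop any exact duplicate rows
    return labelled[n_labels == 1].drop_duplicates().reset_index(drop=True)


toy = pd.DataFrame({"GalaxyID": [1, 2, 2, 3], "Label": ["ring", "ring", "barred", "merger"]})
clean = drop_conflicting_labels(toy)
print(clean["Label"].value_counts())
# On the notebook's data: df = drop_conflicting_labels(df); df['Label'].value_counts()
```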

## 6. Training the model

We use the fastai library, which is built on top of PyTorch, to train the model.

In [None]:
from fastai.vision import *
from fastai.metrics import error_rate

In [None]:
bs = 64
# bs = 16   # uncomment this line if you run out of GPU memory, even after restarting the runtime (Runtime -> Restart runtime)

In [None]:
# Build training/validation sets from the labelled DataFrame: images are read from
# images_training_rev1/<GalaxyID>.jpg and 20% are held out for validation
data = ImageDataBunch.from_df(path='.', df=df, folder='images_training_rev1', suffix='.jpg',
                              valid_pct=0.2, ds_tfms=get_transforms(), size=224,
                              bs=bs).normalize(imagenet_stats)
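
With `valid_pct=0.2`, a fifth of the labelled galaxies are held out for validation and the rest are served to the model in batches of `bs` images. The arithmetic can be sketched as below; it assumes the validation size is rounded down and that the training loader drops a final incomplete batch, which is how we understand fastai v1 to behave, so treat the exact counts as approximate. The helper `split_sizes` is hypothetical.

```python
def split_sizes(n_images: int, valid_pct: float = 0.2, bs: int = 64):
    """Return (n_train, n_valid, full_training_batches_per_epoch)."""
    n_valid = int(valid_pct * n_images)  # assumed rounding: truncate toward zero
    n_train = n_images - n_valid
    return n_train, n_valid, n_train // bs  # assumes the last partial batch is dropped


print(split_sizes(1000))  # prints (800, 200, 12)
```

Passing `len(df)` shows roughly how many gradient steps one epoch takes for this dataset.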

In [None]:
print(data.classes)
len(data.classes),data.c

In [None]:
data.show_batch(rows=3, figsize=(11,9))

In [None]:
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
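
`cnn_learner` starts from a ResNet-34 whose weights were pretrained on ImageNet, which is why the DataBunch above is normalised with `imagenet_stats`: the galaxy images are rescaled to the same per-channel statistics the network saw during pretraining. A minimal NumPy sketch of that per-channel normalisation, using the widely quoted ImageNet channel means and standard deviations (we assume these match fastai's `imagenet_stats`):

```python
import numpy as np

# Per-channel (R, G, B) mean and standard deviation commonly used for ImageNet models
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])


def normalize(image: np.ndarray) -> np.ndarray:
    """Normalise a (3, H, W) float image with values in [0, 1], channel by channel."""
    # Reshape the statistics to (3, 1, 1) so they broadcast over height and width
    return (image - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


img = np.full((3, 2, 2), 0.5)  # a flat grey 2x2 test image
print(normalize(img)[:, 0, 0])
```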

In [None]:
learn.model

In [None]:
learn.fit_one_cycle(4)
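
`fit_one_cycle` trains with the one-cycle policy: the learning rate warms up to a maximum and then anneals back down over the run. The sketch below reproduces that shape with cosine annealing. The defaults (`max_lr=3e-3`, `div_factor=25`, `pct_start=0.3`, final rate `max_lr / 25e4`) are our assumptions about fastai v1's defaults, the function names are hypothetical, and only the learning rate is modelled (not momentum).

```python
import math


def cosine_anneal(start: float, end: float, pct: float) -> float:
    """Interpolate from start (pct=0) to end (pct=1) along half a cosine wave."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)


def one_cycle_lrs(n_steps: int, max_lr: float = 3e-3, div_factor: float = 25.0,
                  pct_start: float = 0.3, final_div: float = 25e4) -> list:
    """Learning rate for each training step under a one-cycle schedule (sketch)."""
    warmup = max(1, int(n_steps * pct_start))
    low, final = max_lr / div_factor, max_lr / final_div
    lrs = []
    for step in range(n_steps):
        if step < warmup:  # phase 1: rise from low to max_lr
            lrs.append(cosine_anneal(low, max_lr, step / warmup))
        else:  # phase 2: decay from max_lr towards final
            lrs.append(cosine_anneal(max_lr, final, (step - warmup) / (n_steps - warmup)))
    return lrs


lrs = one_cycle_lrs(100)
print(f"start {lrs[0]:.2e}, peak {max(lrs):.2e}, end {lrs[-1]:.2e}")
```

The warm-up lets the new head settle before large updates, and the long decay lets training settle into a good minimum.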

In [None]:
learn.unfreeze()

In [None]:
learn.lr_find()

In [None]:
learn.recorder.plot()
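
`lr_find` runs a learning-rate range test: it trains briefly while increasing the learning rate exponentially from a very small to a very large value and records the loss at each step, and `recorder.plot()` draws loss against learning rate. A common heuristic is to pick a rate somewhat below the point where the loss starts to climb steeply. The schedule can be sketched as follows; the start and end values (1e-7 and 10) and the 100 iterations are assumptions meant to resemble fastai v1's defaults, and `lr_range` is a hypothetical name.

```python
def lr_range(start_lr: float = 1e-7, end_lr: float = 10.0, num_it: int = 100) -> list:
    """Exponentially spaced learning rates from start_lr to end_lr (num_it >= 2)."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]


lrs = lr_range()
print(f"{lrs[0]:.1e} ... {lrs[49]:.1e} ... {lrs[-1]:.1e}")
```

Exponential spacing gives each order of magnitude the same number of steps, which is why the plot uses a logarithmic x-axis.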

In [None]:
learn.unfreeze()
learn.fit_one_cycle(10, max_lr=slice(1e-3,1e-2))
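
Passing `max_lr=slice(1e-3,1e-2)` asks fastai for discriminative learning rates: the earliest layers of the pretrained network get the smallest rate and the newly added head the largest, since the early layers already encode generic features worth preserving. As we understand fastai v1, the rates are spaced geometrically across the model's layer groups (three for a ResNet built with `cnn_learner`); the sketch below, with the hypothetical helper `spread_lrs`, reproduces that spacing under those assumptions.

```python
def spread_lrs(start: float, stop: float, n_groups: int = 3) -> list:
    """Geometrically spaced learning rates from start (first group) to stop (last group)."""
    if n_groups == 1:
        return [stop]  # single group: just use the upper rate (an assumption)
    step = (stop / start) ** (1 / (n_groups - 1))
    return [start * step ** i for i in range(n_groups)]


print([f"{lr:.2e}" for lr in spread_lrs(1e-3, 1e-2)])
```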

In [None]:
interp = ClassificationInterpretation.from_learner(learn)

losses,idxs = interp.top_losses()

len(data.valid_ds)==len(losses)==len(idxs)
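
`interp.top_losses()` returns the validation losses sorted from largest to smallest, together with the index of each item in the validation set, which is why its lengths match `data.valid_ds`. The idea can be sketched in plain Python; the helper `top_losses` here is our own illustration, not fastai's implementation.

```python
def top_losses(losses, k=None):
    """Return (sorted_losses, indices) in descending order of loss, optionally the top k."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    if k is not None:
        order = order[:k]
    return [losses[i] for i in order], order


vals, idxs = top_losses([0.2, 1.5, 0.05, 0.9], k=2)
print(vals, idxs)
```

The indices point back into the validation set, so the worst-classified galaxies can be looked up and inspected by eye.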

In [None]:
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
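
Each cell (i, j) of the confusion matrix counts the validation images whose true class is i and whose predicted class is j, so the diagonal holds correct predictions. A plain-Python sketch of how such a matrix and the per-class recall (diagonal divided by row total) can be computed from label lists; the helper `confusion_matrix` and the toy labels are illustrative only.

```python
from typing import Dict, List, Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str],
                     classes: Sequence[str]) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    index: Dict[str, int] = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


classes = ["barred", "merger", "ring"]
actual = ["ring", "ring", "merger", "barred", "barred"]
predicted = ["ring", "barred", "merger", "barred", "ring"]
cm = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, cm):
    recall = row[classes.index(name)] / sum(row) if sum(row) else float("nan")
    print(f"{name:>7}: {row}  recall={recall:.2f}")
```

Large off-diagonal cells show which pairs of galaxy types the model confuses most often.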