<font size=6>**Deep Learning**</font>

**_Deep Learning_** is a type of Machine Learning characterized by being **deep**.

That is, it uses **multiple layers** to process the input information (Figure 0).

<table><tr>
    <td width=640>
        <img src="images/Simple_vs_Deep.png">
        <center>
            <br>
            Figure 0.  A simple <i>feedforward</i> Neural Network compared with a Deep <i>feedforward</i> Neural Network.<br>
            (From <a href="https://thedatascientist.com/what-deep-learning-is-and-isnt/">here</a>)
        </center>
    </td>
</tr></table>

The depth itself can be designed in very different ways. It can be achieved e.g. by **stacking** sequential layers (_feedforward neural networks_), via **recurrent** layers (_recurrent neural networks_), by adding **skip connections** between distant layers of a stacked network (_U-nets_), and in many other ways.

Don't worry: we will explain how to _computationally_ create neurons/layers [later](#Generic_Architecture_and_Neurons).
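
For the impatient, here is a rough preview of the difference between depth by stacking and depth by recurrence. It is only a sketch in plain Python: the helper names (`dense`, `feedforward`, `recurrent`), the ReLU non-linearity and the toy weights are all assumptions chosen for illustration, not part of any library.

```python
def dense(weights, biases, inputs):
    """One layer: each output neuron is a weighted sum of the inputs plus a bias,
    clipped at zero (ReLU is an arbitrary choice made for this sketch)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feedforward(layers, inputs):
    """Depth by stacking: every layer has its own weights and is applied once, in order."""
    values = inputs
    for weights, biases in layers:
        values = dense(weights, biases, values)
    return values


def recurrent(weights, biases, sequence, state):
    """Depth by recurrence: the same weights are reused at every step of the sequence."""
    for item in sequence:
        # The layer sees the previous state together with the new sequence element
        state = dense(weights, biases, state + [item])
    return state


# A toy 2-layer feedforward network (2 inputs -> 2 hidden neurons -> 1 output)
toy_layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
              ([[1.0, 1.0]], [0.0])]
print(feedforward(toy_layers, [2.0, 1.0]))

# A toy recurrent layer whose weights make it accumulate the sequence
print(recurrent([[1.0, 1.0]], [0.0], [1.0, 2.0, 3.0], [0.0]))
```

In the feedforward case the depth is the number of stacked layers. In the recurrent case it grows with the length of the sequence, because the same layer is applied over and over.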

# Why Deep Learning is cool

It is not, we are geeks, and that's the truth.

However ... we do live in the era of "Big Data":

In [1]:
import pandas as pd

df_surveys = pd.DataFrame([
    ['2MASS',                                  1997,    20, 25.4],
    ['Sloan Digital Sky Survey (SDSS)',        2000,   200, 50],
    ['Large Synoptic Survey Telescope (LSST)', 2023,  30e3, 200e3],
    ['Square Kilometer Array (SKA)',           2027, 150e3, 4.6e6]
], columns=['Sky Survey Project', 'First Light', 'Velocity (GB/day)', 'Volume (TB)']).reset_index(drop=True)

# Cast the numeric columns to integers for display (this truncates 25.4 TB to 25)
df_surveys[df_surveys.columns[1:]] = df_surveys[df_surveys.columns[1:]].astype(int)

display(df_surveys)

| | Sky Survey Project | First Light | Velocity (GB/day) | Volume (TB) |
|---|---|---|---|---|
| 0 | 2MASS | 1997 | 20 | 25 |
| 1 | Sloan Digital Sky Survey (SDSS) | 2000 | 200 | 50 |
| 2 | Large Synoptic Survey Telescope (LSST) | 2023 | 30000 | 200000 |
| 3 | Square Kilometer Array (SKA) | 2027 | 150000 | 4600000 |


In [2]:
import cutecharts.charts as ctc

chart = ctc.Line("Survey size evolution", width='500px')
chart.set_options(labels=list(df_surveys['First Light']), x_label='Year', y_label='Volume (TB)')
# Name the series after the quantity that is actually plotted on the y axis
chart.add_series('Volume (TB)', list(df_surveys['Volume (TB)']))
chart.render_notebook()

We cannot expect to inspect this amount of data by eye and work out the rules that categorize them.

$\rightarrow$ We have to leverage:

- the large number of examples

- algorithms that can abstract arbitrarily complex rules

## So how does Deep Learning address big data issues?

The basic idea is that each layer constructs **new features**.

In practice, Deep Learning systems include implicit **feature engineering** _on top_ of the learning task (e.g., classification or regression).<br>

In this way, they are a step forward with respect to "classic" ML approaches (Figure 1).

<table><tr>
    <td width=480>
        <img src="images/Deep_Feature_Engeneering.png">
        <center>
            <br>
            Figure 1.  A Deep Neural Network seen as a combination of feature extractor + learner (e.g. classifier or regressor).<br>
            (From <a href="https://stats.stackexchange.com/questions/562466/neural-networks-automatically-do-feature-engineering-how/">here</a>)
        </center>
    </td>
</tr></table>

From this perspective, the connections between the network neurons represent **potential correlations** between features.
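
To make the "new features" idea concrete, here is a minimal sketch based on the classic XOR problem. XOR cannot be separated by a single straight line in the original inputs, but it becomes trivially separable once a hidden layer has built two new features from them. The weights below are picked by hand for illustration (a real network would learn them), and the function names are hypothetical.

```python
def step(z):
    """Hard-threshold activation: the neuron fires (1) only if its input is positive."""
    return 1 if z > 0 else 0


def hidden_features(x1, x2):
    """Hidden layer: two neurons that build new features from the raw inputs."""
    h_or = step(x1 + x2 - 0.5)   # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)  # fires only if both inputs are on
    return h_or, h_and


def xor_network(x1, x2):
    """Output neuron: a simple linear rule on the *new* features."""
    h_or, h_and = hidden_features(x1, x2)
    return step(h_or - h_and - 0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(f"inputs=({x1}, {x2})  new features={hidden_features(x1, x2)}  output={xor_network(x1, x2)}")
```

The output neuron never sees the raw inputs: it only combines the features constructed by the layer before it, which is the "feature extractor + learner" picture of Figure 1 in miniature.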

<u>What are the implications?</u>

The scientist **does _not_ need detailed insight into the problem** to build the proper features or select the proper classifier<br>
$\rightarrow$ the DL system does it all for us!

This comes in particularly handy when we deal with databases with **millions of objects** and **hundreds of features**!

<u>References</u>

In case you are curious, it has been proven that Deep Neural Networks are indeed "**_universal approximators_**"
(e.g. [Kurt Hornik (1991), Neural Networks, 4, 2](https://www.sciencedirect.com/science/article/abs/pii/089360809190009T?via%3Dihub)), meaning that, given enough neurons, they can in principle approximate any (continuous) linear or non-linear relation between the features and the target.
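
The flavour of this result can be illustrated with a small sketch. The theorem only states that suitable weights *exist*; the code below does not train anything. Instead, it builds by hand a network with a single hidden layer of ReLU neurons that reproduces a piecewise-linear interpolation of $\sin(x)$, and shows that the error shrinks as hidden neurons are added. The construction, the function names and the choice of $\sin(x)$ are assumptions made for this example.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_network(func, x_min, x_max, n_hidden):
    """Hand-build a one-hidden-layer ReLU network that matches func at n_hidden + 1 equally spaced knots."""
    step = (x_max - x_min) / n_hidden
    knots = [x_min + k * step for k in range(n_hidden + 1)]
    slopes = [(func(knots[k + 1]) - func(knots[k])) / step for k in range(n_hidden)]
    # Output weight of hidden neuron k = change of slope at knot k
    out_weights = [slopes[0]] + [slopes[k] - slopes[k - 1] for k in range(1, n_hidden)]
    out_bias = func(x_min)

    def network(x):
        # Hidden neuron k has input weight 1 and bias -knots[k]
        hidden = [relu(x - knots[k]) for k in range(n_hidden)]
        return out_bias + sum(w * h for w, h in zip(out_weights, hidden))

    return network


def max_error(func, network, x_min, x_max, n_samples=1000):
    """Largest absolute difference between func and the network on a regular grid."""
    xs = [x_min + i * (x_max - x_min) / n_samples for i in range(n_samples + 1)]
    return max(abs(func(x) - network(x)) for x in xs)


for n_hidden in (4, 16, 64):
    net = build_relu_network(math.sin, 0.0, 2 * math.pi, n_hidden)
    print(n_hidden, "hidden neurons -> max error", max_error(math.sin, net, 0.0, 2 * math.pi))
```

Note that a single hidden layer is enough here. In practice, depth tends to reach a given accuracy with far fewer neurons than one very wide layer.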

## Some example applications

Indeed, Deep Learning (hereafter, **DL**) is being used to solve very _different_ problems, e.g.:

- **Self-Driving cars**

<table><tr>
    <td width=640>
        <img src="images/DL_Self_Driving.png">
        <center>
            <br>
            Figure 2a.  NVIDIA's driverless car simulator.<br>
            (From <a href="https://images.nvidia.com/content/tegra/automotive/images/2016/solutions/pdf/end-to-end-dl-using-px.pdf/">"End to End Learning for Self-Driving Cars" (2016)</a>)
        </center>
    </td>
</tr></table>

- **Protein Structure Prediction**

<table><tr>
    <td width=640>
        <img src="images/DL_AlphaFold.jpg">
        <center>
            <br>
            Figure 2b.  DeepMind's AlphaFold network for the prediction of molecular structures of proteins.
            Original paper: <a href="https://www.nature.com/articles/s41586-021-03819-2">Jumper, J., Evans, R., Pritzel, A. et al. 2021,  Nature, 596, 583</a>.<br>
            (From <a href="https://www.deepmind.com/blog/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology">DeepMind's blog</a>)
        </center>
    </td>
</tr></table>


- **Natural Language Processing, translation, and text generation**


<table><tr>
    <td width=640>
        <img src="images/DL_NLP.png">
        <center>
            <br>
            Figure 2c.  Google's unified text-to-text transformer.<br>
            (From <a href="https://arxiv.org/abs/1910.10683">Raffel et al. 2019, arXiv/1910.10683</a>)
        </center>
    </td>
</tr></table>


- **Computer Vision (lots and lots of it!)**

<table><tr>
    <td width=640>
    <img src="https://scontent.fath3-4.fna.fbcdn.net/v/t39.2365-6/10000000_3947476245325303_7673388906041049088_n.png?_nc_cat=107&amp;ccb=1-7&amp;_nc_sid=ad8a9d&amp;_nc_ohc=_le0vi99JGoAX_AwIJA&amp;_nc_ht=scontent.fath3-4.fna&amp;oh=00_AT-sEoik6hHpnpDMf7YSQw0iuzofrF1QJy9bvNZZ6OigoA&amp;oe=62C8BF4D" alt="Detectron example">
        <center>
        Figure 2d.  Facebook's Detectron2 for multiple computer vision tasks.<br>
        (From <a href="https://ai.facebook.com/tools/detectron2/">Meta AI blog</a>)
        </center>
    </td>
</tr></table>


... and many, many other _scary_ applications like:

- **Deep Fakes**

<table><tr>
    <td width=640>
        <img src="images/DL_DeepFake.jpg">
        <center>
        Figure 2e.  Deep Fakes can be used to bring back actors from when the cinema was actually good (i.e., before 1999!), but also to produce false evidence.  Luckily, there are already ML efforts to uncover Deep Fakes, e.g. <a href="https://arxiv.org/abs/2101.01456/">Zi et al. 2021, arXiv/2101.01456</a>.
        </center>
    </td>
</tr></table>

- **Video Games**

<table><tr>
    <td width=640>
        <img src="https://assets-global.website-files.com/621e749a546b7592125f38ed/62271e2f604e640534eeca99_AlphaStar%2003.gif">
        <center>
        Figure 2f.  DeepMind's AlphaStar absolutely demolishing a human player who later claimed ... erhm ... that the internet connection was bad that day because ... ehrm, mmmh ... someone in the house was watching Netflix.<br>
        (From <a href="https://www.deepmind.com/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii">DeepMind's blog</a>)
        </center>
    </td>
</tr></table>

- - -

Catching up with all the new DL developments is becoming physically impossible, but you can follow great channels like [Two Minute Papers](https://www.youtube.com/c/K%C3%A1rolyZsolnai/featured) to try and stay updated.

## Deep Learning in Astronomy


The application of DL in Astronomy is still at an **amateur level** compared with what happens in industry (prejudice against the "black box"?).<br>  However ... 

Astronomy is the perfect ML lab because it offers:
- tough problems to solve
- large data

In fact, Deep Learning publications are **exploding** in Astronomy (Figure 3)!

<table><tr>
    <td width=480>
        <img src="images/Deep_Learning_astro_papers.png">
        <center>
            <br>
            Figure 3. Number of astronomy papers containing the text "Deep Learning" in their abstracts.<br>
            (From <a href="https://ui.adsabs.harvard.edu">NASA ADS</a>)
        </center>
    </td>
</tr></table>

<font size=3><u>**Some notable examples**</u></font>

<u>Galaxy Classification</u>
    
- [Dieleman et al. (2015), MNRAS, 450, 1441](https://ui.adsabs.harvard.edu/abs/2015MNRAS.450.1441D/abstract) $-$ calculate probabilities for the 37 possible Galaxy Zoo answers

    - **training**: classification of 61,578 JPEG images from SDSS with GZ labels
    - **architecture**: standard CNN

<table><tr>
    <td width=420>
        <img src="images/Galaxy_Zoo_flowchart.png">
        <center>
            <br>
            Figure 4a. Galaxy Zoo classification tree.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2013yCat..74352835W/abstract">Willett et al. (2013)</a>)
        </center>
    </td>
    <td width=480>
        <img src="images/Dieleman_Fig11.png">
        <center>
            <br>
            Figure 4b. Activation of the CNN layers.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2015MNRAS.450.1441D/abstract">Dieleman et al. (2015)</a>)
        </center>
    </td>
</tr></table>
    
- [Ackermann et al. 2018, MNRAS, 479, 415](https://ui.adsabs.harvard.edu/abs/2018MNRAS.479..415A/abstract) $-$ identify mergers

    - **training**: classification of ~4000 JPEG images from SDSS with GZ labels
    - **architecture**: CNN with transfer learning
    
<table><tr>
    <td width=480>
        <img src="images/Ackerman_Fig8.png">
        <center>
            <br>
            Figure 5. Some galaxy pairs confidently identified as mergers.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2018MNRAS.479..415A/abstract">Ackermann et al. (2018)</a>)
        </center>
    </td>
</tr></table>
    
<u>Galaxy Morphology</u>
    
- [Aragon-Calvo et al. 2020, MNRAS, 498, 3713](https://ui.adsabs.harvard.edu/abs/2020MNRAS.498.3713A/abstract) $-$ obtain structural parameters via self-supervised learning

    - **training**: re-produce parameters used to generate artificial galaxies
    - **architecture**: semantic autoencoder
    
<table><tr>
    <td width=640>
        <img src="images/Aragon_Semantic_Autoencoder.png">
        <center>
            <br>
            Figure 6. The semantic autoencoder architecture.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2020MNRAS.498.3713A/abstract">Aragon-Calvo et al. (2020)</a>)
        </center>
    </td>
</tr></table>    
    
<u>Serendipitous Detection</u>
       
- [Lanusse et al. 2018, MNRAS, 473, 3895](https://ui.adsabs.harvard.edu/abs/2018MNRAS.473.3895L/abstract) $-$ spot gravitational lenses

    - **training**: 20,000 LSST-like observations
    - **architecture**: CNN + ResNet
    
<table><tr>
    <td width=480>
        <img src="images/DeepLens_Fig8.png">
        <center>
            <br>
            Figure 7. Some images correctly identified as hosting lenses.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2018MNRAS.473.3895L/abstract">Lanusse et al. (2018)</a>)
        </center>
    </td>
</tr></table>    

- [Dekany \& Grebel 2020, ApJ, 898, 46](https://ui.adsabs.harvard.edu/abs/2020ApJ...898...46D/abstract) $-$ spot fundamental-mode RR Lyrae stars 

    - **training**: $10^7 - 10^8$ near-IR photometric time-series
    - **architecture**: RNN
    
<table><tr>
    <td width=480>
        <img src="images/Dekani_Fig4.png">
        <center>
            <br>
            Figure 8. Spatial distribution of the objects used as training set.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2020ApJ...898...46D/abstract">Dekany &amp; Grebel (2020)</a>)
        </center>
    </td>
</tr></table>    

    
<u>Image reconstruction</u>
       
- [Schawinski et al. 2017, MNRAS, 467, L110](https://ui.adsabs.harvard.edu/abs/2017MNRAS.467L.110S/abstract) $-$ image denoising

    - **training**: 4550 nearby SDSS galaxies
    - **architecture**: GAN
    
<table><tr>
    <td width=800>
        <img src="images/Schawinski_Fig2.png">
        <center>
            <br>
            Figure 9. Degraded image details reconstructed by a GAN.<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2017MNRAS.467L.110S/abstract">Schawinski et al. (2017)</a>)
        </center>
    </td>
</tr></table>    

<u>Cosmological simulations</u>
    
- [Rodríguez et al. 2018, ComAC, 5, 4](https://ui.adsabs.harvard.edu/abs/2018ComAC...5....4R/abstract) $-$ create fast cosmological simulations

    - **training**: 10 independent L-PICOLA simulation boxes
    - **architecture**: GAN
    
<table><tr>
    <td width=800>
        <img src="images/Rodriguez_Fig1.png">
        <center>
            <br>
            Figure 10. Comparison between the results of an N-body simulation (<i>top</i>) and of a GAN (<i>bottom</i>).<br>
            (From <a href="https://ui.adsabs.harvard.edu/abs/2018ComAC...5....4R/abstract">Rodríguez et al. (2018)</a>)
        </center>
    </td>
</tr></table>    


# Neural Networks (NN) Components

## Generic Architecture and Neurons
<a id='Generic_Architecture_and_Neurons'></a>

<table><tr>
    <td width=640>
        <img src="images/Generic_Architecture.png">
        <center>
            <br>
            Figure 11.  A simple, generic <i>feedforward</i> deep neural architecture.  The neurons of a layer might be connected to all the neurons of the neighboring layers, like in this example (<i>fully-connected</i> layers), or not.<br>
            (Adapted from <a href="https://ui.adsabs.harvard.edu">here</a>)
        </center>
    </td>
</tr></table>

<font size=3><u>**Nomenclature**</u></font>
    
**Neuron**: A simple element in a network, carrying 1 value.
    
**Layers**: A collection of neurons activated simultaneously.<br>
    
    Layers are represented differently depending on the architecture.
    E.g., fully-connected layers (as in the Figure above) appear as vertical stripes of neurons.
  
- **input layer**: the data
- **hidden layers**: the internal layers ("_hidden_" from the point of view of the NN user)
- **output layer**: the variable(s) of interest (e.g., class(es) or $y$)
    
    
    E.g., if we provide an image as input, each pixel is 1 neuron of the input layer.


Contemporary NNs can contain tens to hundreds (occasionally thousands) of layers, with millions to billions of trainable parameters.
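
As a rough illustration of this nomenclature, the sketch below turns a tiny "image" into an input layer (one neuron per pixel) and counts the trainable parameters of a small fully-connected network: one weight per connection and one bias per non-input neuron, as described in the next section. The image values and the layer sizes are made up for the example.

```python
def flatten(image):
    """Turn a 2D image (list of rows) into the flat list of values fed to the input layer."""
    return [pixel for row in image for pixel in row]


def count_parameters(layer_sizes):
    """Trainable parameters of a fully-connected network with the given neurons per layer."""
    # One weight for every connection between consecutive layers
    n_weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # One bias for every neuron that is not in the input layer
    n_biases = sum(layer_sizes[1:])
    return n_weights + n_biases


# A hypothetical 2 x 3 pixel grayscale image
image = [[0.0, 0.5, 1.0],
         [0.2, 0.4, 0.6]]

input_layer = flatten(image)
print("input neurons:", len(input_layer))

# Input layer, two hidden layers, and an output layer with 2 classes
layer_sizes = [len(input_layer), 4, 3, 2]
print("trainable parameters:", count_parameters(layer_sizes))
```

Even this toy network has 51 parameters. With realistic image sizes (say, hundreds of pixels per side) the count of a fully-connected network grows very quickly.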

## Weights and Biases

The core of the functioning of any NN is how the **information flows** through a neuron.

<table><tr>
    <td width=640>
        <img src="images/Weights_and_Biases.png">
        <center>
            <br>
            Figure 12.  How the information is propagated through a neuron. Don't get confused by $\hat{y}$: in this image, it only represents the neuron's output, not the target variable (e.g. the <i>class</i>)<br>
            (From <a href="https://ui.adsabs.harvard.edu">here</a>)
        </center>
    </td>
</tr></table>


1. **The first stage is <u>linear</u>:**<br>
    A neuron takes all the inputs (values) $x_i$ directed into it, multiplies each of them by a different _weight_ ($w_i$), and takes the sum.<br>
    Then, it adds a _bias_ ($b$).
<br>

2. **The second stage is (usually) <u>non-linear</u>:**<br>
    The summation is passed to an **activation function**.<br>
    The activation function acts as a filter, basically deciding when and how the information shall flow. 
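
In formula form, the two stages above amount to $\hat{y} = f\left(\sum_i w_i x_i + b\right)$, where $f$ is the activation function. A minimal sketch in plain Python follows; the sigmoid is just one possible choice of $f$, and the input values, weights and bias are arbitrary numbers picked for the example.

```python
import math


def sigmoid(z):
    """A common activation function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Output of a single neuron."""
    # Stage 1 (linear): weighted sum of the inputs, plus the bias
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Stage 2 (non-linear): the sum is passed through the activation function
    return activation(z)


x = [0.5, -1.0, 2.0]   # values arriving from the previous layer
w = [0.4, 0.3, -0.2]   # one weight per incoming connection
b = 0.5                # one bias per neuron

print(neuron(x, w, b))
```

Here the weighted sum plus the bias is (up to rounding) zero, so the sigmoid returns a value very close to 0.5, i.e. the neuron is "half open".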
    
<u>**Important**</u>

The _weights_ and _biases_ are <u>the</u> elements that are fit during the training of the model! 

Fitting a model means optimizing **all** the _weights_ and _biases_ within the NN, in order to **approximate** the desired output $y$ given a corresponding example $x$.
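
To give a feeling of what "fitting" means, here is a minimal sketch that fits the single weight and bias of one neuron (with no activation function) to examples generated from $y = 2x + 1$. It assumes plain gradient descent on the mean squared error, which is only one of many possible training recipes; the learning rate and the number of iterations are arbitrary choices that happen to work for this toy data set.

```python
def fit_neuron(xs, ys, learning_rate=0.05, n_epochs=2000):
    """Fit y ~ w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(n_epochs):
        # Difference between the current predictions and the desired outputs
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move both parameters a small step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # the "desired outputs"

w, b = fit_neuron(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

The fitted values approach $w = 2$ and $b = 1$. A real NN does conceptually the same thing, but for all its weights and biases at once.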

## Activation Functions

Activation functions are what make NNs so **effective** as universal tools.

They introduce <u>non-linearities</u> $\rightarrow$ an NN can create an arbitrarily complex model.

They can be basically **any** _filter_-like function, but they should ideally possess some properties:

- **computationally inexpensive** $\leftarrow$ hence simple, since they get executed at each neuron

- **zero-centered** $\leftarrow$ not to shift values towards a preferential direction

- **differentiable** $\leftarrow$ because NNs work with [Backpropagation](#Backpropagation)

- **avoid vanishing when chained** $\leftarrow$ more correctly, we need to avoid vanishing gradients (see [Gradient Descent and Loss](#Gradient-Descent-and-Loss))
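
The properties listed above can be checked on three popular activation functions with a short sketch. The "chained gradient" below is a deliberately crude stand-in for what happens during training: it multiplies the same local derivative once per layer and ignores the weights, so it only illustrates the trend. The function names are chosen for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never larger than 0.25


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # ReLU is not differentiable at 0; by convention we use 0 there
    return 1.0 if z > 0 else 0.0


def chained_gradient(grad, z, n_layers):
    """Product of n_layers identical local derivatives (weights ignored)."""
    return grad(z) ** n_layers


# Zero-centered? The sigmoid maps 0 to 0.5, tanh maps 0 to 0
print("sigmoid(0) =", sigmoid(0.0), " tanh(0) =", math.tanh(0.0))

# Vanishing when chained? Compare 10 layers of each activation
print("sigmoid:", chained_gradient(sigmoid_grad, 0.0, 10))
print("tanh   :", chained_gradient(tanh_grad, 1.0, 10))
print("ReLU   :", chained_gradient(relu_grad, 1.0, 10))
```

Even in its best case (at $z = 0$) the sigmoid loses a factor of 4 per layer, so after 10 layers the signal is about a million times weaker, while for positive inputs the ReLU passes it through unchanged.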


<table><tr>
    <td width=480>
        <img src="images/Activation_Functions.png">
        <center>
            <br>
            Figure 13.  A collection of commonly used activation functions<br>
            (Adapted from <a href="https://wandb.ai/lavanyashukla/vega-plots/reports/Natural-Language-Processing--Vmlldzo2Nzk2Ng">here</a>)
        </center>
    </td>
</tr></table>


## NN Architecture Variants

# Training NNs

## Gradient Descent and Loss

## Backpropagation

- and optimization algos

## Validation curves

- overfitting/underfitting

https://machinelearningmastery.com/learning-curves-for-diagnosing-machine-learning-model-performance/

# Supervised Learning

## CNNs

- architecture

- conv layers

- activation 

- pooling

- fully connected layers

- Dropout

- BatchNorm

## Transfer Learning

# Domain Adaptation

## Autoencoders

## GANs

- For deconvolution: Schawinski, Kevin, et al. 2017, "Generative Adversarial Networks recover features in astrophysical images of galaxies beyond the deconvolution limit", arXiv:1702.00403.

# Libraries


## TensorFlow

## Keras

- Sequential API

- Functional API