In [2]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

# Introduction to Machine Learning


##### Version 0.1 

***

By AA Miller (Northwestern/CIERA)  
24 February 2022

What is machine learning? 

<img style="display: block; margin-left: auto; margin-right: auto" src="images/AI_ML_DL.jpeg" width="450" align="middle">
<div align="right"> <font size="-3">(credit: <a href="https://www.manning.com/books/deep-learning-with-python">Deep Learning with Python, Chollet (2017)</a>) </font></div>

Machine learning is a sub-field of artificial intelligence (AI).

Quick aside – for many, many years I had this distinction backwards. I thought machine learning, in the form of simple learning models such as the decision tree, was a small stride in the direction of AI – fully autonomous and *thinking* computers. This is **not correct**: AI is a very broad field, as we are about to see.

As a field, the goal of AI is to develop computers/machines that are capable of demonstrating intelligence.

During the first major wave of AI development, most researchers focused on "symbolic AI".

Symbolic AI involves the "hard coding" of rules to replicate intelligence. In other words, rules are codified such that the machine knows how to respond to all possible input. 

For example, early versions of chess video games were built on symbolic AI.

For every possible move made by the user, the computer had explicit instructions for what move it should make next.

<img style="display: block; margin-left: auto; margin-right: auto" src="images/chess.jpg" width="450" align="middle">
<div align="right"> <font size="-3">(credit: Microsoft) </font></div>

(Proof that I am old, I actually remember playing the chess game shown above even though most of you have never seen anything like this before)

Imagine now trying to build something that replicated human intelligence with symbolic AI. 

A human says:  
$~~~~$What's up? >> respond with: "Nothing much."  
$~~~~$What's up, dog? >> respond with: "How's it hanging, daddio?"  
$~~~~$Whhaaaaaaaaaaz up? >> respond with: "Hey buuuuuuuuuuudy."  

(I could keep going, but you're probably sick of these...)

It is possible to imagine both (1) crafting millions of lines of code to create a fairly intelligent machine, and (2) that this machine would still never replicate "general" intelligence in any meaningful way. 

Symbolic AI follows the standard paradigm of computer programming: a machine is (1) given a set of rules and (2) provided inputs/data, and then (3) processes the inputs according to the rules in order to provide answers.

There is a deterministic output for every input. 

Machine learning represents a significant shift from this design. In machine learning problems, the computer is presented with the inputs and the outputs, and the algorithm determines the rules. 
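
To make the contrast concrete, here is a minimal sketch using made-up one-dimensional data (the data, the `hand_coded_rule` threshold of 0, and the `learn_threshold` helper are all illustrative assumptions, not a real algorithm from any library). In the first function a human writes the rule; in the second, the rule is chosen to best reproduce the known outputs.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data, invented for illustration: one input per source and a binary output.
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])


def hand_coded_rule(x):
    """Symbolic approach: the rule (x > 0) is written down in advance."""
    return (x > 0.0).astype(int)


def learn_threshold(x, y):
    """Learning approach: pick the threshold that best matches the known outputs."""
    candidates = np.sort(x)
    accuracy = [np.mean((x > t).astype(int) == y) for t in candidates]
    return candidates[int(np.argmax(accuracy))]


threshold = learn_threshold(x, y)
hand_accuracy = np.mean(hand_coded_rule(x) == y)
learned_accuracy = np.mean((x > threshold).astype(int) == y)

print(f"hand-coded rule accuracy: {hand_accuracy:.3f}")
print(f"learned rule (x > {threshold:.2f}) accuracy: {learned_accuracy:.3f}")
```

The learned rule is still just a threshold, so this is about the simplest "model" imaginable, but the workflow is the point: inputs and outputs go in, and the rule comes out.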

Machine learning can solve problems where the input-output relationship is not well understood. 

Machine learning goes beyond symbolic AI – we are explicitly asking the computer to learn relationships and rules that we cannot explicitly write down a priori.

As a result, machine learning is an exceptionally powerful tool for classification. 

Deep learning is a new(-ish), and now totally dominant, subfield of machine learning that emphasizes the construction of several layers of increasingly meaningful representations of the data. 

(there are several lectures devoted to this later this week so forgive the incredibly short intro)

Using the [ten hundred most common words](https://xkcd.com/simplewriter/) in the English language$^\dagger$, I would describe machine learning as

> making computers recognize information

$^\dagger$ See [Up-goer five](https://xkcd.com/1133/).

In slightly more technical language, machine learning can be thought of as the development of extremely complex pattern recognition algorithms: 

<img style="display: block; margin-left: auto; margin-right: auto" src="images/BishopBook.jpg" width="350" align="middle">
<div align="right"> <font size="-3">(credit: <a href="https://link.springer.com/book/9780387310732">Pattern Recognition and Machine Learning, Bishop (2006)</a>) </font></div>

I think it is also important to think of machine learning as an engineering task. As we said earlier, the computer learns rules to connect inputs to outputs, but we have virtually no means of understanding the rules that provide these connections.

Nevertheless, you should be excited about machine/deep learning as summarized in this image:

<img style="display: block; margin-left: auto; margin-right: auto" src="images/ImageNet.png" width="550" align="middle">
<div align="right"> <font size="-3">(credit: <a href="https://www.manning.com/books/probabilistic-deep-learning">Probabilistic Deep Learning, Dürr, Sick, Murina (2020)</a>) </font></div>

As of 2015, machine learning models have surpassed humans at 2D image recognition tasks!

(if this doesn't sound impressive, imagine creating symbolic AI to identify what it sees in an image. We will actually do this later this week...) 

There are tons of amazing accomplishments over the past several years: 

$~~~$Self-driving cars

$~~~$Near-human-level speech recognition

$~~~$Machine translation

$~~~$Digital assistants

$~~~$DeepMind's [AlphaGo](https://deepmind.com/research/case-studies/alphago-the-story-so-far)

So, did we all make a mistake in pursuing astro PhDs? Is it time to quit and devote all our energy/effort to machine learning? 

About those self-driving cars...

<img style="display: block; margin-left: auto; margin-right: auto" src="images/Headline.png" width="650" align="middle">
<div align="right"> <font size="-3">(credit: CNN) </font></div>

AI has undergone periods of rapid development and extreme hype before...

Followed by periods of disillusionment and significantly reduced funding ("AI winter")

That being said, it is clear that Deep Learning is here to stay. 

Furthermore, I suspect these techniques are going to become increasingly important in astronomy, so it's important to know how they work. 

This session is nearly entirely devoted to understanding the how of machine learning. 

Before we delve into that, let's define some terminology and discuss some challenges for astronomical applications.

### Terminology

*Features*$^\dagger$ – measured properties of a source (numerical or categorical)  
$~~~~$Often stored as vectors; an entire data set is then a vector of vectors

*Labels*$^\ast$ – outcomes, i.e., the thing the algorithm predicts

$^\dagger$Until now, we have been referring to features as "inputs"

$^\ast$Similarly, these were previously called "outputs"
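
As a small illustration of the "vector of vectors" idea, the sketch below stores a feature matrix and a label array with NumPy. The three sources, their magnitudes, and the `is_extended` flag are invented for this example; the only convention assumed is the common one of one row per source and one column per feature.

```python
import numpy as np

# Hypothetical features for three sources:
# [g-band magnitude, r-band magnitude, is_extended (categorical, encoded as 0/1)]
X = np.array([
    [18.2, 17.5, 1.0],
    [20.1, 20.0, 0.0],
    [19.4, 18.3, 1.0],
])

# One label per source, i.e., the thing a model would be asked to predict.
y = np.array(["galaxy", "star", "galaxy"])

n_sources, n_features = X.shape
print(f"{n_sources} sources, {n_features} features each")
print("feature vector of the first source:", X[0])
```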

Most machine learning is *supervised*. 

Features and labels are provided to the algorithm, and a mapping between the two is devised.

In *unsupervised* learning, the labels are not known. 

(I sincerely hope you aren't bored, I'll be giving full talks on unsupervised and supervised learning later today...)

*Training set* – subset of the data with known features and labels  
$~~~~$(for supervised applications)

#### Metrics

True positive (TP) = + classified as +

False positive (FP) = - classified as + (type I error)

True negative (TN) = - classified as -

False negative (FN) = + classified as - (type II error)
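
These four counts can be tallied directly from the true and predicted labels. Here is a minimal sketch with made-up labels, where 1 stands for "+" and 0 for "-" (the `confusion_counts` helper is a name I am introducing for illustration, not a library function).

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels, where 1 is + and 0 is -."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # + classified as +
    fp = int(np.sum(~y_true & y_pred))   # - classified as + (type I error)
    tn = int(np.sum(~y_true & ~y_pred))  # - classified as -
    fn = int(np.sum(y_true & ~y_pred))   # + classified as - (type II error)
    return tp, fp, tn, fn


# Invented labels for eight sources.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")
```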

## Problem 1

What is more detrimental when building a model: false positives or false negatives?

*Take a few minutes to discuss with your partner*

Ultimately, this depends on the problem at hand. If you are building a model to detect cancer, false negatives are really really bad. If you are building a model to find extremely metal poor stars, and then you obtain a 10 hr spectrum on a 10 m telescope to confirm your candidates, false positives are really really bad.

### Machine Learning for Astronomy

The history of astronomy is a long story of classification. Essentially, point a telescope at some new location in the sky, and there is a decent chance you might find something that has never been seen before. Now, figure out how this new thing relates to all the things that you already know.

Taxonomy of the cosmos ;)

Aside - astronomy is (and has been) an observationally/experimentally led field. The typical pattern is that observers find some weird thing, and then the theorists try to explain what is going on (there are obviously exceptions; predictions for kilonovae prior to the LIGO detection of a neutron star-neutron star merger are a recent example).

This makes us very different from physics, where theory generates predictions that then lead the observations (e.g., Higgs boson, general relativity, etc.).

Machine learning is amazing, completely utterly amazing, at classification.  
Astronomy is built on classification.  
So...

High-fives all around.

$~$

Adam, why are you pausing?

...

No seriously. 

**Why** aren't you talking?



IS SOMETHING WRONG?

While I just argued that astronomy is an observationally-led, classification-concerned discipline, ultimately, like physicists, we care about the development of a physical understanding of how the Universe works.

For example, we don't care that this is classified as a blue galaxy.

<img style="display: block; margin-left: auto; margin-right: auto" src="images/m106_colombari_3568.jpg" width="450" align="middle">
<div align="right"> <font size="-3">(credit: R. Colombari & R. Gendler, data: NASA/ESO/NAOJ/Giovanni Paglioli) </font></div>

We care that blue galaxies are actively forming stars. 

The classification comes before the inference, but the inference is what truly teaches us about the universe.

Fundamentally, machine learning is not built to tell us that blue means active star formation.

In other words, 

machine learning $\longleftrightarrow$ prediction

astronomy $\longleftrightarrow$ inference

As astronomers, we are data producers. We design experiments to try and understand the universe.

Machine learners$^\dagger$ are data consumers. They take as much, and whatever, they can get.

I contend that this distinction matters.

(People who are much smarter than I am are working on incorporating more elements of inference into machine learning. At first I was skeptical, but I am slowly starting to be convinced. We will hear more about this later this week.)

$^\dagger$Totally a phrase I just made up.

## Other Considerations for Astronomy

*pssst* 

Our data sets are not that large. 

Yes, we always talk about the Vera C. Rubin Observatory and big data, but...

Instagram is a thing that exists

Speaking of small data sets – 

Training sets are hard. 

Why is this?  
Labels are expensive.

Bias.  

Pattern matching cannot inherently know about bias, and so biases in the training set will be propagated to the final model predictions. 

(this has tons of VERY IMPORTANT implications for machine learning in society that I do not have time to address today)

Every survey is guaranteed to have a biased training set. Any new survey is likely to be observing the sky in some unique fashion, and thus probing parameter space in some new way. This new survey mode will therefore find sources that were not present in the previous survey. 

For example, a new survey that goes 1.5 mag deeper than the previous survey will find a lot of galaxies at higher redshifts. Furthermore, at fixed redshift, the new survey will find more intrinsically faint galaxies. These high redshift and intrinsically faint galaxies will not have counterparts in a training set based on the previous survey, and therefore they will be classified incorrectly.


This is known as *sample selection bias* and it is nasty. 
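
A toy simulation gives a feel for how large this effect can be. Everything in the sketch below is an assumption made for illustration: the power-law magnitude distribution, the old survey limit of 22 mag, and the sample size. Only the 1.5 mag difference in depth comes from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy population: magnitudes between 16 and 26, with source counts
# rising steeply toward the faint end.
mags = 16.0 + 10.0 * rng.power(5.0, size=100_000)

old_limit = 22.0              # assumed depth of the previous survey
new_limit = old_limit + 1.5   # new survey goes 1.5 mag deeper

training_set = mags[mags < old_limit]  # sources the old survey could label
new_survey = mags[mags < new_limit]    # sources the new survey detects

# Fraction of new-survey sources fainter than anything in the training set.
frac_unrepresented = np.mean(new_survey >= old_limit)
print(f"training set size: {training_set.size}")
print(f"new survey size:   {new_survey.size}")
print(f"fraction of new sources with no training counterpart: {frac_unrepresented:.2f}")
```

Under these assumed number counts, the majority of sources in the deeper survey are fainter than every source in the training set. The exact fraction depends entirely on the assumed distribution, so treat the number as qualitative.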

## Problem 2

Based on very sound theoretical reasoning, you expect to find the following in a sample of 100 stars: 60 orange stars, 30 purple stars, and 10 grey stars.

Data from another survey includes 1000 orange stars, 200 purple stars, and 14 grey stars. 

Furthermore, 14 of the orange stars, 7 of the purple stars, and 5 of the grey stars have features that are missing. 

You need to build a model to classify stars as either orange, purple, or grey. What do you do?

*Take a few minutes to discuss with your partner*

There is no correct answer here. Given what we know, I'd do the following: 

  1.  throw away the sources with the missing data
  2.  fit an unsupervised clustering model to the data to identify whether classes are easily separable
  3.  sample from the training set to create a distribution that matches the theoretical prediction
  4.  fit the machine learning model and make final predictions
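
Steps 1 and 3 above can be sketched with the numbers from the problem. This is one possible reading of "sample from the training set": I assume sampling without replacement, which is my choice and not the only option. Steps 2 and 4 (clustering and model fitting) are omitted, and the index arrays stand in for real sources.

```python
import numpy as np

rng = np.random.default_rng(1)

# Numbers from the problem statement.
counts = {"orange": 1000, "purple": 200, "grey": 14}
n_missing = {"orange": 14, "purple": 7, "grey": 5}
target_pct = {"orange": 60, "purple": 30, "grey": 10}  # theoretical expectation

# Step 1: throw away the sources with missing features.
available = {c: counts[c] - n_missing[c] for c in counts}

# Step 3: the largest sample (without replacement) that matches the target
# proportions is set by whichever class runs out first.
n_total = min(available[c] * 100 // target_pct[c] for c in counts)
n_per_class = {c: n_total * target_pct[c] // 100 for c in counts}

# Draw indices of the selected sources within each class.
selected = {
    c: rng.choice(available[c], size=n_per_class[c], replace=False)
    for c in counts
}

print("available:", available)
print("training sample:", n_per_class, "total:", n_total)
```

With sampling without replacement, the rare grey stars limit the matched training set to only 90 sources, which throws away most of the orange and purple stars. Sampling with replacement, or weighting the classes, would keep more data at the cost of repeating or up-weighting the few grey stars.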

## Conclusions

Machine learning algorithms are extremely powerful  
$~~~~~~$hard to imagine your work will not be impacted by ML

Machine learning provides predictions; we want inference

All astronomical training sets are biased – very difficult to properly interpret (some) predictions as a result