# Introduction 

Signal detection theory provides a way to quantify how well valid information (signal) can be distinguished from noise. This is done by defining a boundary (or threshold) that separates signal from noise. When the threshold is set very low, noise may be misclassified as signal, producing many false positives (FP). Conversely, when the threshold is set very high, signal may be misclassified as noise, producing many false negatives (FN). Systematically varying the threshold gives a global view of how discriminable signal and noise are.

![title](fig/roc.png)

https://en.wikipedia.org/wiki/Receiver_operating_characteristic
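The trade-off between false positives and false negatives can be made concrete with a small sketch. The means, the standard deviation, the sample size, and the three thresholds below are hypothetical values chosen for illustration only, and `error_rates` is a helper name introduced here. The seeded `np.random.default_rng` generator is used only to make the sketch reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical example values: noise centred on 0, signal centred on 1.5
noise = rng.normal(loc=0.0, scale=1.0, size=10_000)
signal = rng.normal(loc=1.5, scale=1.0, size=10_000)


def error_rates(threshold):
    """Return (false positive rate, false negative rate) for one threshold."""
    fpr = np.mean(noise > threshold)    # noise classified as signal
    fnr = np.mean(signal <= threshold)  # signal classified as noise
    return fpr, fnr


# A low, an intermediate, and a high threshold
for threshold in (-1.0, 0.75, 2.5):
    fpr, fnr = error_rates(threshold)
    print(f"threshold={threshold:5.2f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
```

Raising the threshold should lower the false positive rate and raise the false negative rate.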

## Prerequisites

- $d' = (\mu_S - \mu_N) / \sigma$ quantifies how far apart two normal distributions with means $\mu_S$ and $\mu_N$ and common standard deviation $\sigma$ are
- ROC analysis
- basic Python skills
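As a quick check of the $d'$ formula above, here is a minimal sketch. The function name `d_prime` is introduced here for illustration, and the example uses a noise mean of 10, a signal mean of 11, and a common standard deviation of 2.5.

```python
def d_prime(mu_signal, mu_noise, sigma):
    """Distance between two normal distributions in units of their common std."""
    return (mu_signal - mu_noise) / sigma


# (11 - 10) / 2.5
print(d_prime(11.0, 10.0, 2.5))  # 0.4
```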

## Objectives

- practice ROC analysis on mock data
- use random distributions in Python
- visualize data in Python
- beautify figures in Python

# Notebook setup

**Instructions**
- Import numpy, scipy, and matplotlib
- Configure inline plots

As you go along, return here and add any additional modules that are needed.

# Problem 1: Normal distributions

**Instructions**
- Draw 10000 samples each from 2 normal distributions (hint: np.random.normal) with different means and standard deviations
- Plot histograms of the two distributions in one plot (hint: plt.hist; experiment with the parameters bins, density, histtype, color, alpha, linewidth, label; density was called normed in older Matplotlib versions)
- Also plot the probability density functions of the two distributions in the same plot
- Add a legend to the plot (hint: plt.legend; if necessary, consider the parameter 'loc')
- Add a vertical line representing the decision criterion (hint: plt.axvline)
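The plotting calls named in the hints fit together roughly as in the following sketch for a single distribution. The mean, standard deviation, bin count, and criterion position are hypothetical values, and `scipy.stats.norm.pdf` is one possible way to evaluate the density. Extending this to two distributions is left as the exercise.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

mu, sigma = 0.0, 1.0  # hypothetical parameters
samples = rng.normal(mu, sigma, size=10_000)

fig, ax = plt.subplots()

# density=True rescales the bars so that the histogram integrates to 1,
# which makes it directly comparable to the probability density function
counts, edges, _ = ax.hist(samples, bins=50, density=True,
                           histtype="stepfilled", alpha=0.4, label="samples")

# Evaluate the normal pdf on a grid spanning the sampled range
x = np.linspace(samples.min(), samples.max(), 200)
ax.plot(x, stats.norm.pdf(x, loc=mu, scale=sigma), linewidth=2, label="pdf")

# Hypothetical decision criterion at x = 1
ax.axvline(1.0, color="k", linestyle="--", label="criterion")
ax.legend(loc="upper left")
```

Without `density=True` the histogram shows raw counts, and the pdf would look flat next to it.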

# Problem 2: ROC analysis 

**Instructions**
- Draw samples from a normal distribution with mean 10 and std 2.5
- Draw samples from another normal distribution with std 2.5
- Vary the mean of the second distribution from 11 to 19 in steps of 2
- For each value of the second mean, calculate an ROC curve (i.e. true positive rate vs. false positive rate for a varying decision criterion) and plot all ROC curves in the same plot as dots
- For each value of the second mean, also calculate the ROC curve directly from the normal distribution (rather than from samples drawn from it) and plot it as a line (hint: scipy.stats.norm.cdf, scipy.stats.norm.sf)
- For each value of the second mean, calculate d'
- Add axes labels, a title, and a legend to the plot
- How does the plot change depending on the number of samples drawn from the distributions? Also consider different numbers of noise and signal samples.

**Possibly helpful numpy functions**
- np.min, np.max, np.linspace, np.arange, np.zeros, np.zeros_like, np.mean
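One way the sample-based and distribution-based ROC calculations could be set up is sketched below for a single pair of distributions. The means, standard deviation, sample sizes, and number of criteria are hypothetical and deliberately differ from the values in the instructions. The sketch assumes that values above the criterion are classified as signal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # fixed seed so the sketch is reproducible

mu_noise, mu_signal, sigma = 0.0, 1.0, 1.0  # hypothetical parameters
noise = rng.normal(mu_noise, sigma, size=5_000)
signal = rng.normal(mu_signal, sigma, size=5_000)

# Sweep the decision criterion across the full range of the samples
lo = min(np.min(noise), np.min(signal))
hi = max(np.max(noise), np.max(signal))
criteria = np.linspace(lo, hi, 101)

# Sample-based rates: fraction of each sample set above the criterion
fpr_emp = np.array([np.mean(noise > c) for c in criteria])
tpr_emp = np.array([np.mean(signal > c) for c in criteria])

# Distribution-based rates: the survival function sf(c) = 1 - cdf(c)
# is the probability mass above the criterion
fpr_theo = stats.norm.sf(criteria, loc=mu_noise, scale=sigma)
tpr_theo = stats.norm.sf(criteria, loc=mu_signal, scale=sigma)

# Largest gap between sample-based and distribution-based true positive rate
print(np.max(np.abs(tpr_emp - tpr_theo)))
```

Plotting `tpr_emp` against `fpr_emp` as dots and `tpr_theo` against `fpr_theo` as a line then gives the two versions of the ROC curve. The printed gap should shrink as the sample size grows.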