# Extracting a signal from a noisy dataset

In this worksheet, we'll look at how a weak signal can be extracted from a noisy background. To extract the signal, we use a property of Poisson fluctuations: for a measurement with expected count $\lambda$, the standard deviation is $\sigma = \sqrt{\lambda}$. For background reading, consult section 6.5 of Ryden & Peterson.
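
As a quick numerical check of this property, here is a minimal, self-contained sketch (not part of the original worksheet) that draws many Poisson counts for a few values of $\lambda$ and compares the sample standard deviation with $\sqrt{\lambda}$. The values of `lam`, the seed, and the number of draws are arbitrary illustrative choices.

```python
import numpy as np

# Assumption: the seed, the number of draws, and the values of lam
# are arbitrary choices made for illustration only.
rng = np.random.default_rng(seed=0)
n_draws = 200000

results = {}
for lam in (1.0, 4.0, 25.0):
    counts = rng.poisson(lam=lam, size=n_draws)
    results[lam] = (counts.mean(), counts.std())
    print('lambda = {0:5.1f}: mean = {1:6.3f}, std = {2:6.3f}, sqrt(lambda) = {3:6.3f}'
          .format(lam, counts.mean(), counts.std(), np.sqrt(lam)))
```

For each $\lambda$, the sample mean should land close to $\lambda$ and the sample standard deviation close to $\sqrt{\lambda}$.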

We begin by setting up our work environment.

In [None]:
from __future__ import print_function, division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

We'll also define a custom class of tools `noisySignal`. `Shift-click` in the following cell to initialize this class.  Don't worry if you aren't familiar with all of the code in this cell; we will just be using it.

In [None]:
# %load noisy_signal.py
################################################################################
# Edward Brown
# Michigan State University
#
# Generates a signal with Poisson noise
#
################################################################################

class noisySignal:
    
    """
    Constructs a dataset consisting of a gaussian signal and a constant 
    noise level.  The dataset then consists of a net intensity (signal+noise) 
    drawn from a poisson distribution with that net intensity.
    """
    
    from numpy.random import poisson
    from numpy import exp
        
    def __init__(self,background=3.0,signal=1.0,center=5.0,width=3.0):
        """
        Sets the background amplitude, signal amplitude, signal center, 
        and signal width.
        """
        self._background_amplitude = background
        self._signal_amplitude = signal
        self._signal_width = width
        self._signal_center = center

    def make_dataset(self,t):
        """
        Generates the dataset
        
        Parameters
        ----------
            t   := [array-like] points in time where data is collected
            
        Returns
        -------
            Array of Poisson-limited signal + noise
        """
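        # draw the signal and background counts separately and add them;
        # the sum of two independent Poisson variates is itself Poisson
        # distributed, with mean signal(t) + background
        s = self.poisson(lam=self.signal(t),size=t.size)
        s += self.poisson(lam=self.background(),size=t.size)
        return s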
        
    def signal(self,t):
        """
        Generates the signal.
        
        Parameters
        ----------
            t   := [array-like] points in time where data is collected
            
        Returns
        -------
            Array of signal values, before application of Poisson statistics
        """
        a = self._signal_amplitude
        tau = self._signal_width
        mu = self._signal_center
        return a*self.exp(-(t-mu)**2/2.0/tau**2)
    
    def make_background(self,t):
        """
        Generates background with Poisson fluctuations.
        
        Parameters
        ----------
            t   := [array-like] points in time where data is collected
            
        Returns
        -------
            Array of Poisson noise
        """
        s = self._background_amplitude
        return self.poisson(lam=s,size=t.size)

    def amplitude(self):
        """
        Returns value of signal amplitude
        """
        return self._signal_amplitude
    
    def width(self):
        """
        Returns value of signal width
        """
        return self._signal_width
        
    def center(self):
        """
        Returns value of signal center
        """
        return self._signal_center
    
    def background(self):
        """
        Returns amplitude of noise
        """
        return self._background_amplitude


If you look in the cell above, you will see a function
```python
def __init__(self,background=3.0,signal=1.0,center=5.0,width=3.0):
```
This tells us how we create a new object of type `noisySignal`.  If we write
```python
sig = noisySignal()
```
we will generate a dataset `sig` with a background of 3.0 and a signal with amplitude 1.0, centered at $x=5$ with a width of 3.0.

To adjust the parameters of the dataset, we'll write
```python
sig = noisySignal(background=2.0,signal=0.5,center=0.0,width=1.0)
```
and create a dataset `sig` that has a signal amplitude of 0.5 (`signal=0.5`) and a background amplitude of 2.0 (`background=2.0`).  That is, the "background" has an amplitude 4 times greater than our signal.  We'll center our "image" at $x=0$ (`center=0.0`) and give it a width of 1.0 (`width=1.0`).

In [None]:
sig = noisySignal(background=2.0,signal=0.5,center=0.0,width=1.0)

We'll then set `x` to contain an array of points at which we sample the signal. Our total field of view will go from -5.0 to 5.0, and we'll sample with 101 points.

In [None]:
x = np.linspace(-5.0,5.0,101)

Here is a plot of the signal and the background, but *without* any statistical fluctuations in photon counts.

In [None]:
plt.plot(x,sig.signal(x)+sig.background(),color='0.5',label='signal and background',drawstyle='steps-mid')
plt.plot(x,sig.signal(x),color='red',label='signal',drawstyle='steps-mid')
plt.hlines(sig.background(),x.min(),x.max(),linestyle='dotted')
plt.xlim(-5,5)
plt.ylim(-0.1,3.0)
plt.xlabel('x')
plt.ylabel('signal, background')
plt.legend(frameon=False,loc='center left')

Once we add in the (Poisson) fluctuations to the photon counts, however, the plots will look much different.
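
To get a feel for why, the following sketch (an illustration, not part of the original worksheet) compares the expected size of the Poisson fluctuations with the peak of the signal, using the parameters chosen above (`background=2.0`, `signal=0.5`). The variable names are hypothetical.

```python
import numpy as np

# Values taken from the noisySignal(...) call above
background = 2.0   # mean background counts per bin
signal = 0.5       # peak signal counts per bin

# Poisson statistics: the standard deviation is the square root of the mean
sigma_background = np.sqrt(background)
# at the peak, the mean count is background + signal
sigma_peak = np.sqrt(background + signal)

print('fluctuation in background:  {0:.2f}'.format(sigma_background))
print('fluctuation at signal peak: {0:.2f}'.format(sigma_peak))
print('peak signal amplitude:      {0:.2f}'.format(signal))
```

In a single exposure, the bin-to-bin scatter is nearly three times larger than the signal itself.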

Before proceeding, since we are going to be making a lot of plots of various datasets, it would be nice to avoid typing in the same code over and over. Let's define a function `nice_plot` to do this.  We'll also make a function to figure out the plot limits for us.

In [None]:
def plot_boundaries(x,y,margin=0.05):
    x0, x1, y0, y1 = x.min(), x.max(), y.min(), y.max()
    dx = x1-x0
    dy = y1-y0
    padx = dx*margin
    pady = dy*margin
    return x0-padx,x1+padx,y0-pady,y1+pady

def nice_plot(x,y,xlabel='',ylabel='',title=''):
    """
    Makes a plot of our image
    
    Arguments
    ---------
        x, y := points to be plotted
        xlabel := label for the x-axis (default is nothing)
        ylabel := label for the y-axis (default is nothing)
        title  := title for the entire plot (default is nothing)
    """
    
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    x0,x1,y0,y1 = plot_boundaries(x,y)
    plt.xlim(x0,x1)
    plt.ylim(y0,y1)
    plt.title(title)
    plt.plot(x,y,drawstyle='steps-mid')    
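
As a small usage sketch of `plot_boundaries`, the returned limits extend the data range by 5% of its width on each side. The function is repeated here only so that the snippet runs on its own, and the sample arrays `xs` and `ys` are made up for illustration.

```python
import numpy as np

def plot_boundaries(x,y,margin=0.05):
    # same logic as the function defined above
    x0, x1, y0, y1 = x.min(), x.max(), y.min(), y.max()
    padx = (x1-x0)*margin
    pady = (y1-y0)*margin
    return x0-padx,x1+padx,y0-pady,y1+pady

# hypothetical sample data: x runs from 0 to 10, y from 2 to 4
xs = np.linspace(0.0,10.0,11)
ys = np.array([2.0,4.0,3.0,2.5,3.5,2.0,4.0,3.0,2.5,3.5,3.0])

limits = plot_boundaries(xs,ys)
print(limits)
```

With these arrays the limits come out at roughly $-0.5$ to $10.5$ in `x` and $1.9$ to $4.1$ in `y`.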

By the way, you might be wondering what
```python
    """
    Makes a plot of our image
    
    Arguments
    ---------
        x, y := points to be plotted
        xlabel := label for the x-axis (default is nothing)
        ylabel := label for the y-axis (default is nothing)
        title  := title for the entire plot (default is nothing)
    """
```
does. The lines between the `"""` are a *docstring*: a block of text describing our function.  It is very good practice to write these descriptions. `shift-click` in the next cell to see how these docstrings are used.

In [None]:
help(nice_plot)

By writing these docstrings, you make it easier for collaborators to use your code!
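
The docstring is also stored on the function itself, in its `__doc__` attribute, which is where `help` finds it. Here is a minimal sketch using a hypothetical function `double` (not part of the worksheet):

```python
def double(x):
    """
    Returns twice the value of x.
    """
    return 2*x

# the docstring is stored on the function object
print(double.__doc__)
```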

Now let's look at our data (signal and background, with Poisson fluctuations).  Can you see the signal?

In [None]:
y = sig.make_dataset(x)
nice_plot(x,y,xlabel='x',ylabel='intensity',title='data and background')

Here is a plot of just the background. Compare with the plot `data and background` above.  Can you tell which is which?

In [None]:
y = sig.make_background(x)
nice_plot(x,y,xlabel='x',ylabel='intensity',title='pure background')

## Repeated "exposures" to increase the signal-to-noise
Now we'll simulate the effect of repeated exposures.  The random fluctuations will increase in size as $\sqrt{N}$ whereas the signal will increase as $N$, where $N$ is the number of exposures.  As a result, given enough exposures, we ought to see the signal appear.
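
Here is a minimal sketch of this scaling, separate from the worksheet's own code. It sums $N$ exposures of a pure Poisson background and measures the bin-to-bin scatter of the sum. The mean count `lam`, the number of bins, and the seed are arbitrary illustrative choices, not worksheet values.

```python
import numpy as np

# Assumptions: lam, n_bins, and the seed are illustrative choices
rng = np.random.default_rng(seed=1)
lam = 4.0
n_bins = 50000

scatter = {}
for n in (1, 4, 16, 64):
    # sum n independent exposures, bin by bin
    total = rng.poisson(lam=lam, size=(n, n_bins)).sum(axis=0)
    scatter[n] = total.std()
    print('N = {0:3d}: mean = {1:7.2f}, scatter = {2:6.2f}, sqrt(N*lam) = {3:6.2f}'
          .format(n, total.mean(), total.std(), np.sqrt(n*lam)))
```

Each factor of 4 in $N$ quadruples the summed mean but only doubles the scatter, so the ratio of mean to scatter doubles.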

The following function does the summing; the variable `n` stores the number of exposures $N$.

In [None]:
def take_exposures(x,n,ds):
    """
    Repeatedly sums datasets and plots the result.
    
    Arguments
    ---------
    x := points at which the dataset is sampled
    n := number of exposures
    ds := <noisySignal> object to generate datasets
    """
    y = np.zeros(x.size)
    for ex in range(n):
        y += ds.make_dataset(x)
    nice_plot(x,y,xlabel='x',ylabel='intensity',title='N exposures = {0:<3d}'.format(n))

Let's try it with 2 exposures.

In [None]:
take_exposures(x,2,sig)

Not much improvement.  Let's try 3 exposures.

In [None]:
take_exposures(x,3,sig)

Not much improvement yet: let's try 10 exposures.  Before doing this, **predict** the expected amplitude of the fluctuations and signal.

In [None]:
take_exposures(x,10,sig)

Describe the amplitude of the fluctuations and signal amplitude for this case $N=10$. Show that it agrees with Poissonian statistics.

**Enter your response in this cell**

Hmm...let's keep increasing the number of exposures, with $N = 30, 100, 300$.

In [None]:
take_exposures(x,30,sig)

In [None]:
take_exposures(x,100,sig)

In [None]:
take_exposures(x,300,sig)

Now the signal is apparent.

From section 6.5 of Ryden & Peterson, what is the expected signal-to-noise $S/N$ for a background-limited signal for $N=30,100,300$? How does it compare with the three plots above?

**Enter your response in this cell.**