### Signal Processing with numpy and scipy

Now that we've had an introduction to numpy, we'll see it in action with another member of the standard scientific computing stack: scipy.

Scipy is a Python library for scientific and technical computing, and its history goes back to the early days of Python's popularity. In fact, numpy and scipy share some of the original developers, and the two projects have long been developed alongside each other.

Scipy builds on numpy's *ndarray* object and applies it to scientific problems that span everything from high-performance computing to signal analysis to mathematical methods. Understanding the fundamentals of numpy therefore lets us explore these more complicated concepts in scipy.
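
To make the ndarray connection concrete, here is a minimal sketch (the values are made up for illustration): two scipy functions, `scipy.stats.zscore` and `scipy.signal.detrend`, each take a numpy array in and hand a numpy array back, so results flow straight into further numpy code.

```python
import numpy as np
from scipy import signal, stats

# made-up values, purely for illustration
values = np.array([1.0, 2.0, 4.0, 7.0, 11.0])

# standardized copy of the data: mean 0, standard deviation 1
z = stats.zscore(values)

# what remains after subtracting the least-squares straight line
residual = signal.detrend(values)

print(type(z).__name__, z.shape)                # prints: ndarray (5,)
print(type(residual).__name__, residual.shape)  # prints: ndarray (5,)
```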

Scipy is organized into several subpackages. Some of these include:

- constants - common physics and mathematical constants
- fft - utilities for computing the Discrete Fourier Transform
- ndimage - for image processing
- optimize - optimization algorithms
- stats - common statistics functions
- signal - for signal processing functions
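
Each subpackage is imported from `scipy` by name. The sketch below touches three of the subpackages listed above; the function being minimized and the impulse signal are made-up examples, not anything from the dataset we use later.

```python
import numpy as np
from scipy import constants, fft, optimize

# scipy.constants: physical and mathematical constants stored as plain floats
speed_of_light = constants.c  # metres per second

# scipy.optimize: find the x that minimizes a simple made-up function, (x - 2)**2
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

# scipy.fft: Discrete Fourier Transform of a numpy array
impulse = np.array([1.0, 0.0, 0.0, 0.0])
spectrum = fft.fft(impulse)  # a unit impulse has a flat spectrum of ones

print(speed_of_light, result.x, spectrum)
```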

### Scipy and Signal Analysis

We are going to explore the use of scipy in signal analysis applications in this notebook, to give you a flavor of how numpy and scipy work together.

Note that we'll also do some rudimentary plotting with matplotlib, as a brief introduction to that library.

Our goal is to go over some basics of signal processing, paying attention to how we pass data around and how that ties back to numpy.

I'll give the math where appropriate, but no math is strictly needed! Feel free to skip; there will be no math on the test.

```python
# remember, you can install libraries from inside Jupyter with '!pip install <library name>'
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import signal
import pywt  # PyWavelets; install with '!pip install PyWavelets'
```

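```python
# note how easily we can use pandas to read the data in
data_fft = pd.read_csv('PJME_hourly.csv')
# in case the CSV is not already in chronological order, sort it by timestamp
# (the detrending and FFT below assume the samples are in time order)
data_fft = data_fft.sort_values('Datetime', key=pd.to_datetime).reset_index(drop=True)
# then we create a numpy array for our data; scipy and matplotlib work naturally with ndarrays. more on this next week
y = np.array(data_fft.PJME_MW)
x = data_fft.index
date_array = pd.to_datetime(data_fft.Datetime)
plt.plot(date_array, y)
plt.xlabel('Date', fontsize=20)
plt.ylabel('MW Energy Consumption', fontsize=20)
```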
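
If you don't have `PJME_hourly.csv` at hand, the following is a hypothetical stand-in you could run instead of the `pd.read_csv` line. It is **synthetic data, not real PJM measurements**: the baseline, trend, daily and weekly cycle amplitudes, and noise level are all made-up values, chosen only so the same column names (`Datetime`, `PJME_MW`) exist and the rest of the notebook has something with known periodicities to find.

```python
import numpy as np
import pandas as pd

# HYPOTHETICAL synthetic stand-in for PJME_hourly.csv: NOT real PJM data
rng = np.random.default_rng(seed=0)
n_hours = 24 * 7 * 52  # 52 whole weeks of hourly samples
timestamps = pd.date_range('2015-01-01', periods=n_hours, freq=pd.Timedelta(hours=1))
t = np.arange(n_hours)  # hours since the first sample

load = (
    30000.0                                 # made-up baseline load
    + 0.05 * t                              # slow upward trend
    + 4000.0 * np.cos(2 * np.pi * t / 24)   # daily cycle
    + 1500.0 * np.cos(2 * np.pi * t / 168)  # weekly cycle
    + rng.normal(0.0, 500.0, size=n_hours)  # random noise
)

# same column names and timestamp-string format as the notebook expects
data_fft = pd.DataFrame({
    'Datetime': timestamps.strftime('%Y-%m-%d %H:%M:%S'),
    'PJME_MW': load,
})
print(data_fft.head())
```

Because the series spans whole days and whole weeks, its spectrum should show clean peaks at 24 and 168 hours, which makes it a handy sanity check for the FFT step.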

```python
# Next up is pre-processing
# here we will de-trend our data (what does this mean? look at the documentation!)
# this is a useful preprocessing step: a slow trend would otherwise leak into the low-frequency end of the spectrum we compute next
y_detrend = signal.detrend(y)
plt.plot(date_array, y_detrend, color='black', label='Detrended Signal')
plt.plot(date_array, y, color='green', label='Raw Signal')
plt.legend()
plt.xlabel('Date', fontsize=20)
plt.ylabel('MW Energy Consumption', fontsize=20)
```
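
One way to check your reading of the documentation: with its default `type='linear'`, `signal.detrend` subtracts the least-squares straight line from the data. The sketch below, on a made-up series, does that by hand with `np.polyfit` and compares the two results.

```python
import numpy as np
from scipy import signal

# made-up data: a straight line plus a daily wiggle, for illustration only
t = np.arange(200, dtype=float)
y_example = 5.0 + 0.3 * t + np.sin(2 * np.pi * t / 24)

# fit a degree-1 polynomial (a straight line) by least squares and subtract it
slope, intercept = np.polyfit(t, y_example, deg=1)
manual = y_example - (slope * t + intercept)

# scipy's default linear detrend should leave the same residual
auto = signal.detrend(y_example)
print(np.allclose(manual, auto))  # prints: True
```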

```python
data_fft.head()
```
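
For those who want the math behind the next cell (skip it freely), these are the standard definitions. `np.fft.fft` computes the Discrete Fourier Transform of $N$ samples $x_n$ with numpy's default unnormalized forward convention. Bin $k$ corresponds to a frequency $f_k$, where $f_s$ is the sampling frequency (the code calls it `f_nat`, one sample per hour), and to a period $T_k$. For a real signal, the single-sided amplitude of a sinusoid in bin $k$ is $A_k$:

$$
X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \dots, N-1
$$

$$
f_k = \frac{k \, f_s}{N}, \qquad T_k = \frac{1}{f_k}, \qquad A_k = \frac{2\,|X_k|}{N} \quad \left(0 < k < \tfrac{N}{2}\right)
$$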

```python
# Next, we can compute a Fast Fourier Transform
FFT = np.fft.fft(y_detrend)
new_N = len(FFT) // 2  # number of non-negative-frequency bins we keep
f_nat = 1  # sampling frequency: one sample per hour
new_X = np.linspace(10**-12, f_nat/2, new_N, endpoint=True)
new_Xph = 1.0 / new_X  # convert frequency (cycles per hour) to period (hours)
FFT_abs = np.abs(FFT)
# single-sided amplitude is 2*|X_k|/N, where N = len(FFT) is the total number of samples
plt.plot(new_Xph, 2*FFT_abs[:new_N]/len(FFT), color='black')
plt.xlabel('Period ($h$)', fontsize=20)
plt.ylabel('Amplitude', fontsize=20)
plt.title('Fast Fourier Transform (FFT) Spectrum', fontsize=20)
plt.grid(True)
plt.xlim(0, 200)
```
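
As a cross-check on the frequency axis and the amplitude scaling, numpy also offers `np.fft.rfft`, which returns only the non-negative-frequency bins of a real signal, and `np.fft.rfftfreq`, which returns their exact frequencies $k f_s / N$. The helper `amplitude_spectrum` below is a hypothetical name for this sketch, not part of numpy or scipy; it is tested on a made-up 24-hour cosine whose period and amplitude we know in advance.

```python
import numpy as np

def amplitude_spectrum(samples, fs=1.0):
    """Single-sided amplitude spectrum of a real signal sampled at fs samples per hour."""
    n = len(samples)
    spectrum = np.fft.rfft(samples)          # non-negative frequency bins only
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)   # exact bin frequencies k*fs/n
    amps = np.abs(spectrum) / n
    # double every bin except DC (and Nyquist when n is even), which have no mirror image
    amps[1:] *= 2
    if n % 2 == 0:
        amps[-1] /= 2
    return freqs, amps

# made-up check signal: a 24-hour cosine of amplitude 3, sampled hourly for 30 days
t = np.arange(24 * 30)
freqs, amps = amplitude_spectrum(3.0 * np.cos(2 * np.pi * t / 24))
peak = np.argmax(amps[1:]) + 1   # skip the zero-frequency bin
print(1.0 / freqs[peak], amps[peak])  # period in hours, amplitude
```

On the notebook's data, something like `freqs, amps = amplitude_spectrum(y_detrend)` followed by `plt.plot(1.0 / freqs[1:], amps[1:])` should give a plot comparable to the one above; slicing from index 1 drops the zero-frequency bin, so no small offset is needed to avoid dividing by zero.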

### Task

Now that you've seen that code in action, let's break it apart a little! Here is a series of steps you can take to understand what's going on in the code.

1. data_fft contains our data, read in through pandas. What can we learn about our data? Apply what we did in the previous weeks to that data frame. What are you able to discover?
2. Why is it important to preprocess our data? Is this true for just this data set, or for all data sets? Think back to any stats class you might have taken: what effect do preprocessing steps such as detrending or normalization have on our analyses?
3. Ignore the plotting. Just look at the line: `y = np.array(data_fft.PJME_MW)`. What is that doing? What is the data type of `data_fft.PJME_MW`? What about of `y`?
4. What changes when we run this line: `date_array = pd.to_datetime(data_fft.Datetime)`? Why do you think we needed to do that?
5. Break down what this line does: `new_X = np.linspace(10**-12, f_nat/2, new_N, endpoint=True)`. How does the numpy function `linspace` help us here?