<h1>This notebook demonstrates the spectral processing module</h1>

In [None]:
# First we import the spectral processing module.
from SpectralProcessing import RamanProcessing as rp

# Along with some other modules.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import warnings
from pprint import pprint

In [None]:
# Next we need to create a list of the files we want to process.
# Raw strings (r'...') stop Python from reading '\F' and '\S' as escape sequences.
file_paths  =  [r'Data\Fed_Sample_1.txt',
                r'Data\Fed_Sample_2.txt',
                r'Data\Fed_Sample_3.txt',
                r'Data\Starved_Sample_1.txt',
                r'Data\Starved_Sample_2.txt',
                r'Data\Starved_Sample_3.txt']

# We will also need a list of IDs, one for each file.
sample_type =  ['Fed',
                'Fed',
                'Fed',
                'Starved',
                'Starved',
                'Starved']
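The two lists only work if each label sits at the same position as its file. One way to keep them aligned is to derive the label from the file name. The sketch below assumes the `<Type>_Sample_<n>.txt` naming used here. `label_from_path` is a hypothetical helper and is not part of SpectralProcessing.

```python
from pathlib import PureWindowsPath

def label_from_path(path):
    """Hypothetical helper: 'Data\\Fed_Sample_1.txt' -> 'Fed'.

    Assumes file names follow the '<Type>_Sample_<n>.txt' pattern.
    """
    stem = PureWindowsPath(path).stem        # e.g. 'Fed_Sample_1' (accepts \ or /)
    return stem.split('_Sample_')[0]         # e.g. 'Fed'

example_paths = [r'Data\Fed_Sample_1.txt', r'Data\Starved_Sample_2.txt']
example_labels = [label_from_path(p) for p in example_paths]
print(example_labels)  # ['Fed', 'Starved']
```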

In [None]:
# Using the list of files, we call 'readArrayFromFile()' to split the spectra in each
# file into an array of spectra against wavenumbers. The function also returns a 1-D vector
# of wavenumbers and a 1-D vector of sample IDs corresponding to the list we assigned earlier.
# Note: this function can only read .txt exports of map (grid) spectral collections from
# the WiRE 2 software package.
WN, array, sample_ID = rp.readArrayFromFile(file_paths, sample_type)

# Next we read the array of spectra into a data frame. The function needs a string to
# name the first column of the data frame. Each spectrum is stored as a 1-D vector
# in a single cell of the data frame.
df = rp.readArrayToDataFrame(array, 'Raw_array')

# Here we use the sample_ID vector to add a column with the corresponding label for each spectrum.
df['Sample_type'] = sample_ID

**Now we have a data frame with the raw spectra in one column and their corresponding sample IDs in another.**
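This notebook does not describe the WiRE text format. A common layout for map exports is four whitespace-separated columns (X, Y, wavenumber, intensity), with one row per data point. The sketch below assumes that layout and is not the module's implementation. `read_map_txt` and `read_map_files` are hypothetical names.

The sketch groups rows into one spectrum per stage position. It also repeats each file's label once per spectrum, which is presumably why `sample_ID` has one entry per spectrum rather than one per file.

```python
import io
import numpy as np

def read_map_txt(fh):
    """Hypothetical reader for a long-format map export.

    Assumes whitespace-separated columns X, Y, Wave, Intensity, one row per
    data point, with all rows for one stage position stored consecutively.
    """
    data = np.loadtxt(fh, comments='#')          # header lines start with '#'
    x, y, wave, intensity = data.T
    # A new spectrum starts whenever the (X, Y) stage position changes.
    new_spectrum = np.concatenate(([True], (np.diff(x) != 0) | (np.diff(y) != 0)))
    n_spectra = int(new_spectrum.sum())
    n_points = len(wave) // n_spectra             # assumes equal-length spectra
    spectra = intensity.reshape(n_spectra, n_points)
    return wave[:n_points], spectra

def read_map_files(handles, labels):
    """Stack spectra from several files and repeat each file's label per spectrum."""
    blocks, ids = [], []
    for fh, label in zip(handles, labels):
        wn, spectra = read_map_txt(fh)
        blocks.append(spectra)
        ids.extend([label] * len(spectra))
    return wn, np.vstack(blocks), np.array(ids)

# Two tiny synthetic "files", each a 2-position map with 3 wavenumbers.
text = ("#X\t#Y\t#Wave\t#Intensity\n"
        "0\t0\t1000\t5\n0\t0\t1001\t6\n0\t0\t1002\t7\n"
        "0\t1\t1000\t8\n0\t1\t1001\t9\n0\t1\t1002\t10\n")
demo_WN, demo_array, demo_ID = read_map_files(
    [io.StringIO(text), io.StringIO(text)], ['Fed', 'Starved'])
print(demo_array.shape, demo_ID.tolist())  # (4, 3) ['Fed', 'Fed', 'Starved', 'Starved']
```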

In [None]:
# Print the data frame to see our newly organised data.
df
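Each cell in `Raw_array` holds a whole spectrum as a 1-D NumPy array. Calling `np.stack()` on the column therefore rebuilds a 2-D array with one row per spectrum. Below is a minimal pandas sketch of that layout, assuming `readArrayToDataFrame()` builds something equivalent. The `demo_*` names are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the module's array: 4 spectra x 5 wavenumbers.
demo_array = np.arange(20, dtype=float).reshape(4, 5)

# One row per spectrum; each cell holds a 1-D array (an object-dtype column).
demo_df = pd.DataFrame({'Raw_array': list(demo_array)})
demo_df['Sample_type'] = ['Fed', 'Fed', 'Starved', 'Starved']

# np.stack turns the column of 1-D arrays back into a 2-D array.
restored = np.stack(demo_df['Raw_array'])
print(restored.shape)                        # (4, 5)
print(np.array_equal(restored, demo_array))  # True
```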

In [None]:
# To access a column of spectra, we can stack them into a 2-D array.
spectras = np.stack(df['Raw_array'])

# This new array can be plotted.
plt.rcParams['figure.figsize'] = [18,10]
font = {'family' : 'DejaVu Sans',
        'weight' : 'normal',
        'size'   : 24}
plt.rc('font', **font)

plt.plot(WN, np.transpose(spectras))
plt.autoscale(enable=True, axis='x', tight=True)
plt.title('Raman Spectra')
plt.xlabel('Wavenumber (cm$^{-1}$)')
plt.ylabel('Intensity (AU)')
plt.show()
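The same plotting calls are repeated for every stage in this notebook. A small helper can remove that repetition. `plot_spectra` below is a hypothetical convenience function, not part of SpectralProcessing.

Note the transpose: `plot(x, Y)` draws one line per *column* of `Y`. The `(n_spectra, n_wavenumbers)` array is therefore transposed so that each spectrum becomes a column.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_spectra(wavenumbers, column, title):
    """Hypothetical helper: plot every spectrum stored in a DataFrame column."""
    spectra = np.stack(column)                # (n_spectra, n_wavenumbers)
    fig, ax = plt.subplots()
    ax.plot(wavenumbers, spectra.T)           # one line per spectrum
    ax.autoscale(enable=True, axis='x', tight=True)
    ax.set_title(title)
    ax.set_xlabel('Wavenumber (cm$^{-1}$)')
    ax.set_ylabel('Intensity (AU)')
    return fig, ax

# Usage in this notebook, e.g.:
# fig, ax = plot_spectra(WN, df['Raw_array'], 'Raman Spectra'); plt.show()
```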

**This module uses the data frame as a history of the spectral processing. The function 'addColumnToDataFrame()' adds an array to the existing df as a new column (note: 'readArrayToDataFrame()' creates a new data frame each time, so you can't use it to add columns).**
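Below is a minimal sketch of the "history" idea. It assumes `addColumnToDataFrame()` stores one processed spectrum per row under a new column name. `add_column` is a hypothetical name, and returning a copy is a design choice of this sketch that matches the `df = ...` reassignment pattern used in this notebook.

```python
import numpy as np
import pandas as pd

def add_column(df, array, name):
    """Hypothetical equivalent of addColumnToDataFrame(): return a copy of df with a new column.

    Each row of `array` (one processed spectrum) is stored as a 1-D array in one cell.
    """
    array = np.asarray(array)
    if len(array) != len(df):
        raise ValueError(f'{len(array)} spectra for {len(df)} rows')
    out = df.copy()                           # earlier columns are kept as the history
    out[name] = pd.Series(list(array), index=out.index, dtype=object)
    return out

demo_df = pd.DataFrame({'Raw_array': list(np.ones((3, 4)))})
demo_df = add_column(demo_df, 2 * np.stack(demo_df['Raw_array']), 'Doubled_array')
print(list(demo_df.columns))  # ['Raw_array', 'Doubled_array']
```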

In [None]:
# Next let's smooth the data. The 'smooth()' function takes a column from our data frame and
# applies a smoothing algorithm to the data. In this case we use a fast Fourier
# transform (FFT) to smooth the data.
smoothed_array = rp.smooth(df['Raw_array'], method = 'FFT', fourior_values = 250)

# Once the data is processed, we can add it to the data frame. The first argument is the data
# frame we are using, the second is the array we want to add and the third is the name of
# the new column.
df = rp.addColumnToDataFrame(df, smoothed_array, 'Smoothed_array')
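One common form of FFT smoothing is a low-pass filter. It transforms each spectrum, keeps only the lowest-frequency coefficients, and transforms back. The sketch below assumes that `fourior_values` is the number of retained coefficients; the module's exact definition may differ. `fft_smooth` is a hypothetical name.

```python
import numpy as np

def fft_smooth(spectrum, n_keep=250):
    """Low-pass filter: zero all but the first `n_keep` real-FFT coefficients."""
    spectrum = np.asarray(spectrum, dtype=float)
    coeffs = np.fft.rfft(spectrum)
    coeffs[n_keep:] = 0                       # discard high-frequency content (mostly noise)
    return np.fft.irfft(coeffs, n=len(spectrum))

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 1000, endpoint=False)
clean = np.sin(2 * np.pi * 3 * x)
noisy = clean + rng.normal(scale=0.2, size=x.size)
smoothed = fft_smooth(noisy, n_keep=20)
print(np.std(noisy - clean) > np.std(smoothed - clean))  # True
```

Real spectra are not periodic, so a hard cutoff like this can ring near the ends of the spectrum. Keeping more coefficients reduces the ringing at the cost of less smoothing.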

In [None]:
# Now we have two columns of spectra.
df

In [None]:
# We can now stack and plot this new column.
spectras_smoothed = np.stack(df['Smoothed_array'])

plt.plot(WN, np.transpose(spectras_smoothed))
plt.autoscale(enable=True, axis='x', tight=True)
plt.title('Smoothed Raman Spectra')
plt.xlabel('Wavenumber (cm$^{-1}$)')
plt.ylabel('Intensity (AU)')
plt.show()

**Using this method we can create a whole processing pipeline with whatever steps we want, in whatever order we want.**
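Every step reads one column and writes another, so a pipeline can be described as data: a list of (input column, function, keyword arguments, output column) entries. The sketch below shows a hypothetical pattern, not a SpectralProcessing feature. `scale` and `offset` are stand-ins for processing functions such as `rp.smooth` that take a column of spectra and return a 2-D array.

```python
import numpy as np
import pandas as pd

def run_pipeline(df, steps):
    """Apply (input_col, func, kwargs, output_col) steps in order, keeping every stage."""
    out = df.copy()
    for input_col, func, kwargs, output_col in steps:
        result = np.asarray(func(out[input_col], **kwargs))
        out[output_col] = pd.Series(list(result), index=out.index, dtype=object)
    return out

# Stand-in processing functions: take a column of spectra, return a 2-D array.
def scale(column, factor):
    return np.stack(column) * factor

def offset(column, amount):
    return np.stack(column) + amount

demo_df = pd.DataFrame({'Raw_array': list(np.ones((2, 3)))})
demo_df = run_pipeline(demo_df, [
    ('Raw_array', scale, {'factor': 2.0}, 'Scaled_array'),
    ('Scaled_array', offset, {'amount': 1.0}, 'Offset_array'),
])
print(np.stack(demo_df['Offset_array']))  # every value is 3.0
```

Reordering or removing a step is then a change to the list rather than to the code.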

**Let's take a look at a more complicated processing pipeline.**
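The first and last steps of the pipeline call `rp.normalise(method='interp_area', normalisation_indexs=(895, 901))`. This notebook does not document its exact behaviour. One plausible reading is that each spectrum is divided by the area under a reference band between those two positions, so that the band has unit area.

The sketch below works under that assumption and treats the pair as inclusive array indices; they could instead be wavenumbers. `area_normalise` is a hypothetical name.

```python
import numpy as np

def area_normalise(spectra, start, stop, x=None):
    """Hypothetical sketch: scale each spectrum so its area over start..stop is 1.

    `start` and `stop` are treated as inclusive array indices; `x` gives the
    sample positions (defaults to 0, 1, 2, ...).
    """
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    x = np.arange(spectra.shape[1], dtype=float) if x is None else np.asarray(x, dtype=float)
    window = slice(start, stop + 1)
    y, xs = spectra[:, window], x[window]
    # Trapezoid rule over the reference window, one area per spectrum.
    areas = np.sum((y[:, 1:] + y[:, :-1]) / 2 * np.diff(xs), axis=1)
    return spectra / areas[:, None]

demo = np.vstack([np.full(1000, 2.0), np.full(1000, 5.0)])
normed = area_normalise(demo, 895, 901)
print(normed[:, 0])  # [0.16666667 0.16666667]
```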

In [None]:
# Let's take our smoothed data and normalise, baseline correct, despike and normalise (again).
df = rp.addColumnToDataFrame(df,
                             rp.normalise(df['Smoothed_array'],
                                          method = 'interp_area',
                                          normalisation_indexs = (895,901)),
                             'Normalized_array')

df = rp.addColumnToDataFrame(df,
                             rp.baselineCorrection(df['Normalized_array'],
                                                   method = 'ALS',
                                                   lam=10**5),
                             'Baseline_corrected_array')

df = rp.addColumnToDataFrame(df,
                             rp.removeCosmicRaySpikes(df['Baseline_corrected_array'],
                                                      threshold = 5),
                             'Despiked_array')

df = rp.addColumnToDataFrame(df,
                             rp.normalise(df['Despiked_array'],
                                          method = 'interp_area',
                                          normalisation_indexs = (895,901)),
                             'Baseline_corrected_normalized_array')

# Now we can plot the results of each stage in this pipeline.
plt.plot(WN, np.transpose(np.stack(df['Normalized_array'])))
plt.autoscale(enable=True, axis='x', tight=True)
plt.title('Normalized Raman Spectra')
plt.xlabel('Wavenumber (cm$^{-1}$)')
plt.ylabel('Intensity (AU)')
plt.show()
plt.plot(WN, np.transpose(np.stack(df['Baseline_corrected_array'])))
plt.autoscale(enable=True, axis='x', tight=True)
plt.title('Baseline Corrected Raman Spectra')
plt.xlabel('Wavenumber (cm$^{-1}$)')
plt.ylabel('Intensity (AU)')
plt.show()
plt.plot(WN, np.transpose(np.stack(df['Despiked_array'])))
plt.autoscale(enable=True, axis='x', tight=True)
plt.title('Despiked Raman Spectra')
plt.xlabel('Wavenumber (cm$^{-1}$)')
plt.ylabel('Intensity (AU)')
plt.show()
plt.plot(WN, np.transpose(np.stack(df['Baseline_corrected_normalized_array'])))
plt.autoscale(enable=True, axis='x', tight=True)
plt.title('Normalized Baseline Corrected Raman Spectra')
plt.xlabel('Wavenumber (cm$^{-1}$)')
plt.ylabel('Intensity (AU)')
plt.show()
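For reference, here are minimal NumPy sketches of two of the steps above.

- **`als_baseline`** follows the asymmetric least squares method of Eilers and Boelens. It fits a smooth curve by minimising a weighted squared error plus `lam` times the squared second differences. Points above the curve get a small weight `p`, so peaks are largely ignored.
- **`despike`** flags points whose first difference has a modified z-score above `threshold`, in the spirit of the Whitaker and Hayes method. It replaces each flagged point with the mean of nearby unflagged points.

Both are assumptions about what `rp.baselineCorrection(method='ALS', lam=...)` and `rp.removeCosmicRaySpikes(threshold=...)` do. The parameters `p`, `n_iter` and `window` are hypothetical. The dense solve costs O(n³), which is acceptable for a few thousand points; larger problems usually use sparse matrices.

```python
import numpy as np

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (dense NumPy version)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)        # (n-2, n) second-difference operator
    penalty = lam * D.T @ D
    w = np.ones(n)
    for _ in range(n_iter):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        w = np.where(y > z, p, 1 - p)          # points above the fit (peaks) get weight p
    return z

def despike(y, threshold=5.0, window=2):
    """Replace points whose jump has a modified z-score above `threshold`."""
    y = np.asarray(y, dtype=float).copy()
    d = np.diff(y)
    mad = np.median(np.abs(d - np.median(d)))
    if mad == 0:
        return y
    z = 0.6745 * (d - np.median(d)) / mad      # modified z-score of each jump
    spikes = np.concatenate(([False], np.abs(z) > threshold))
    for i in np.flatnonzero(spikes):
        lo, hi = max(0, i - window), min(len(y), i + window + 1)
        good = [j for j in range(lo, hi) if not spikes[j]]
        if good:
            y[i] = y[good].mean()              # local mean of unflagged neighbours
    return y

# Usage on one spectrum: corrected = spectrum - als_baseline(spectrum, lam=10**5)
```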

**This whole pipeline can be run quickly by passing the list of file names and the list of sample IDs to the 'quickProcess()' function.**

In [None]:
# We can also assess the effectiveness of each step by calculating the signal-to-noise ratio.
rp.signalToNoiseOfDataframe(df)

# Note that the signal-to-noise calculation here uses the standard deviation as the
# noise and the square root of the mean as the signal. Because of this square root, it is
# biased towards smaller values, so data that are not normalised to 1.0 cannot be
# compared accurately.
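The scale dependence follows directly from the definition in the comment. If SNR = sqrt(mean) / std, multiplying a spectrum by k scales the mean by k and the standard deviation by k. The ratio therefore changes by a factor of 1/sqrt(k).

Below is a minimal sketch. `snr` is a hypothetical function reconstructed from that comment, not the module's code.

```python
import numpy as np

def snr(spectrum):
    """SNR as described above: sqrt(mean) as signal, standard deviation as noise."""
    spectrum = np.asarray(spectrum, dtype=float)
    return np.sqrt(spectrum.mean()) / spectrum.std()

demo = np.array([1.0, 2.0, 3.0, 4.0])
print(snr(demo), snr(4 * demo))  # the second value is half the first
```

Under this definition the ratio is also undefined (NaN) for a spectrum whose mean is negative, which can happen after baseline correction.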

In [None]:
# We can also plot the spectra separated by class using the 'plotSpectraByClass()' function.
# This function takes five arguments: the data frame to use, the x axis, the column to plot,
# the sample IDs of the classes you want to plot, and the name of the column in which the
# sample IDs are stored.
rp.plotSpectraByClass(df,
                      WN,
                      'Baseline_corrected_normalized_array',
                      set(df['Sample_type']),  # The set gives us all unique entries in this column.
                      'Sample_type')

# A similar function can also plot a principal component analysis (PCA) of the given spectra.
rp.plotPCAByClass(df,
                  'Baseline_corrected_normalized_array',
                  set(df['Sample_type']),  # The set gives us all unique entries in this column.
                  'Sample_type',
                  principal_components=10,
                  PCs_plot=(0,1))
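Both plots rest on two simple operations. The first selects the rows whose `Sample_type` matches each class. The second projects the spectra onto their principal components.

Below is a minimal NumPy sketch. It assumes PCA is computed on mean-centred spectra via singular value decomposition, and that `PCs_plot=(0, 1)` selects the first two (zero-indexed) components. `class_means` and `pca_scores` are hypothetical names.

```python
import numpy as np

def class_means(spectra, labels):
    """Mean spectrum for each unique label."""
    spectra, labels = np.asarray(spectra, dtype=float), np.asarray(labels)
    return {label: spectra[labels == label].mean(axis=0) for label in np.unique(labels)}

def pca_scores(spectra, n_components=10):
    """Project mean-centred spectra onto their first principal components."""
    X = np.asarray(spectra, dtype=float)
    X = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, len(S))
    explained = S**2 / np.sum(S**2)            # fraction of variance per component
    return X @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(1)
demo_labels = np.array(['Fed'] * 5 + ['Starved'] * 5)
demo_spectra = rng.normal(size=(10, 50))
demo_spectra[demo_labels == 'Starved', 20:25] += 3.0   # a synthetic class-specific band
scores, explained = pca_scores(demo_spectra, n_components=10)
means = class_means(demo_spectra, demo_labels)
# scores[:, 0] against scores[:, 1] is what a PCs_plot=(0, 1) scatter would show.
```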

**The module also comes with functions that use machine learning (ML) for spectral analysis.**

In [None]:
# Warnings are suppressed because scikit-learn can clog up the output when running multiple
# repeats of ML models.
warnings.filterwarnings("ignore")

# Let's apply some ML models to our data. The function 'applyMachineLearingPredictors()'
# takes two positional arguments: the first is the column we want to test and the second is
# the sample IDs. A full list of the parameters for each function can be found in the
# documentation.
CV_Train, CV_Test = rp.applyMachineLearingPredictors(df['Baseline_corrected_normalized_array'],
                                                     df['Sample_type'],
                                                     principal_components=10,
                                                     CV=10,
                                                     test_size=0.33)

# The output is two dictionaries of cross-validation values, one for a training data set
# and one for a test data set.
print('Cross-validation results for the training data set')
pprint(CV_Train)
print('')
print('Cross-validation results for the test data set')
pprint(CV_Test)
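Here is what `test_size=0.33` and `CV=10` mean in practice. A third of the spectra are held out as a test set. 10-fold cross-validation then splits a data set into 10 parts, trains on 9 and scores on the 10th, rotating through all 10.

The sketch below shows only those mechanics, on the training split. It uses a nearest-centroid classifier as a stand-in for whatever models the module fits, and it is not how `applyMachineLearingPredictors()` is implemented. All names are hypothetical.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    labels = np.unique(y)
    return labels, np.stack([X[y == c].mean(axis=0) for c in labels])

def nearest_centroid_predict(model, X):
    labels, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[np.argmin(dists, axis=1)]

def train_test_split_idx(n, test_size=0.33, seed=0):
    """Shuffle indices and hold out ceil(test_size * n) of them for testing."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(np.ceil(test_size * n))
    return idx[n_test:], idx[:n_test]

def k_fold_accuracy(X, y, k=10, seed=0):
    """Accuracy on each of k held-out folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = nearest_centroid_fit(X[train_idx], y[train_idx])
        scores.append(np.mean(nearest_centroid_predict(model, X[test_idx]) == y[test_idx]))
    return np.array(scores)

rng = np.random.default_rng(2)
y_demo = np.array(['Fed'] * 30 + ['Starved'] * 30)
X_demo = rng.normal(size=(60, 20))
X_demo[y_demo == 'Starved'] += 2.0             # well-separated synthetic classes
train, test = train_test_split_idx(len(y_demo), test_size=0.33)
cv_scores = k_fold_accuracy(X_demo[train], y_demo[train], k=10)
print(len(train), len(test), cv_scores.mean())
```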

In [None]:
# The results can be visualised more clearly with the 'dispayCVResults()' function.
rp.dispayCVResults(CV_Train)
rp.dispayCVResults(CV_Test)