# Analytical Vacuum Gauge Classifier
## Given a pressure reading as input, returns whether the gauge is experiencing coupling
Attempts to classify vacuum gauges automatically through pre-processing, curve fitting, and
a comparison of the fits via their mean square error.

* [Setup](#setup)
    * [Environment](#env)
    * [Plotting Functions](#plotters)
    * [Data Retrieval](#retrieval)
* [Preprocessing](#preprocess)
    * [Normalization](#normalization)
    * [Interpolation](#interpolaton)
    * [Masking](#masking)
    * [Fourier Transform](#fft)
    * [Constrained Inverse Fourier Transform](#ifft)
* [Classification](#class)
    * [Curve fitting](#fit)
    * [Plotting](#plot)
    * [All-In-One Classifier](#allin1)
    * [Multi-Gauge Plotting](#multi)

#  <a id='setup'> Setup </a>
### <a id='env'> Environment </a>
Run this cell at the start of every session to load the required libraries and backend notebooks.

In [None]:
%run BackEnd_Plotters.ipynb
%run BackEnd_DataProcessing.ipynb
%run BackEnd_K-Neighbour.ipynb
%run BackEnd_Analytical_Classifiers.ipynb
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import os 
import pytimber
import tkinter as tk
from scipy.optimize import curve_fit
from scipy.signal import find_peaks
from scipy.signal import argrelextrema
from scipy.signal import exponential
from scipy.signal import medfilt
import ipywidgets as widgets
import asyncio

import math
from scipy import signal
db = pytimber.LoggingDB()
plt.style.use('fivethirtyeight')





***

# Non-Linear Fitting Approach
Since the shape for our simple case is non-linear, we can try to find the optimal parameters to fit the shape of our pressure readings. For this we select a decay function and a 2nd order polynomial. To reduce sensitivity to noise, we can pre-process the data to eliminate holes and ensure even spacing between successive data points.

Here we achieve this by eliminating high-frequency noise in the Fourier domain and then returning to the time domain to obtain a smooth function. We can then fit our decay function (corresponding to normal behaviour) and our polynomial (corresponding to coupling). If the polynomial has a lower mean square error, we classify the signal as exhibiting some form of coupling.

# <a id='preprocess'> Pre-Processing </a>
### <a id='retrieval'> Data retrieval </a>
Retrieve sample vacuum gauge data through PyTimber. The call returns the time (x-axis) and the pressure readings (y-axis), and overlays the beam intensity on the plot.

In [None]:
gauge_id = "VGPB.623.4R8.B.PR"
fillNo = 2216

pressure_readings,\
time_readings,\
beam_time,\
beam_energy = retrieve_gauge_data(gauge_id, fillNo,show_plot=True)



### <a id='normalization'> Normalization </a>
As we are concerned with the shape and form of the pressure rather than the value of the pressure, normalization between 0 and 1 allows for more reasonable comparisons between different gauges. 

In [None]:
pressure_readings,\
beam_energy = normalize_y(time_readings,
                          beam_time,
                          pressure_readings,
                          beam_energy,
                          show_plot=True)
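
As a rough illustration of the min-max scaling described above, here is a minimal sketch. The helper `min_max_normalize` and the sample values are hypothetical; this is not the backend's `normalize_y`, whose implementation is not shown in this notebook.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant signal has no shape to preserve; return zeros to avoid 0/0.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

# Hypothetical pressure readings (illustrative values only)
example_pressure = np.array([2.0e-9, 5.0e-9, 3.5e-9, 8.0e-9])
normalized = min_max_normalize(example_pressure)
print(normalized)
```

After this step two gauges with very different absolute pressures can be compared purely on the shape of their response.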

### <a id='interpolaton'> Interpolation </a>

The pressure values are interpolated onto an equally spaced x-axis, so they can be plotted between 0 and 1 without losing information. Normalizing the x-axis between 0 and 1 is less meaningful than normalizing the y-axis, since the same unit interval may correspond to different overall durations (and longer durations will contain more information).

In [None]:
time_readings,pressure_readings,beam_time,beam_energy = interpolate_readings(pressure_readings, time_readings, beam_time, beam_energy,
                                   show_plot=True)
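
One way to obtain an evenly spaced x-axis is linear interpolation with `np.interp`. The sketch below assumes that approach; `resample_uniform` and the sample data are hypothetical and may differ from what `interpolate_readings` does internally.

```python
import numpy as np

def resample_uniform(t, y, n_points=None):
    """Linearly interpolate (t, y) onto an evenly spaced grid rescaled to [0, 1]."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(t)  # np.interp requires increasing x values
    t, y = t[order], y[order]
    if n_points is None:
        n_points = len(t)
    t_uniform = np.linspace(t[0], t[-1], n_points)
    y_uniform = np.interp(t_uniform, t, y)
    # Rescale the time axis so the first sample is at 0 and the last at 1
    t_unit = (t_uniform - t[0]) / (t[-1] - t[0])
    return t_unit, y_uniform

# Irregular timestamps with a gap between 2 and 6 (illustrative values only)
t_raw = np.array([0.0, 1.0, 2.0, 6.0, 8.0])
y_raw = np.array([0.0, 1.0, 2.0, 6.0, 8.0])
t_unit, y_uniform = resample_uniform(t_raw, y_raw, n_points=9)
print(np.diff(t_unit))
```

An even spacing matters later on, because the discrete Fourier transform assumes a constant sampling interval.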


### <a id='masking'> Masking </a>
We are only concerned with the pressure response after the intensity has ramped up, since this corresponds to there being bunches in the LHC. Hence we use the maximum of the intensity, which occurs after the LHC is filled to the desired level, as our threshold.

In [None]:
mask,\
threshold = double_threshold_energy_masking(time_readings, 
                             beam_time,
                             beam_energy,
                             show_plot=True)
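
The idea of masking everything before the beam reaches its maximum can be sketched as follows. This is a simplified, single-threshold illustration under that assumption; `mask_after_peak` is a hypothetical helper, and the backend's `double_threshold_energy_masking` may apply additional logic that is not shown here.

```python
import numpy as np

def mask_after_peak(time, beam_time, beam_signal):
    """Boolean mask selecting samples at or after the beam signal first reaches its maximum."""
    beam_time = np.asarray(beam_time, dtype=float)
    beam_signal = np.asarray(beam_signal, dtype=float)
    # np.argmax returns the index of the first occurrence of the maximum
    threshold_time = beam_time[np.argmax(beam_signal)]
    mask = np.asarray(time, dtype=float) >= threshold_time
    return mask, threshold_time

# Illustrative beam signal: a ramp that reaches its plateau at t = 4
time = np.arange(11, dtype=float)
beam_time = np.arange(11, dtype=float)
beam_signal = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
mask, threshold_time = mask_after_peak(time, beam_time, beam_signal)
print(threshold_time, mask.sum())
```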

### <a id='fft'> Forward Fourier Transform </a>
With a normalized, interpolated dataset we can now safely transform to the Fourier Domain to obtain the constituent sine-wave components of our signal. 

In [None]:
pressure_transform, spectrum, deltaT = forward_fourier_transform(time_readings,
                                                                 pressure_readings,
                                                                 show_plot=True)
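
To make the decomposition into sine-wave components concrete, the sketch below transforms a synthetic signal with NumPy's real-input FFT. The signal, variable names, and amplitude scaling are illustrative assumptions, not the internals of `forward_fourier_transform`.

```python
import numpy as np

n_samples = 256
t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
delta_t = t[1] - t[0]

# Synthetic signal: a strong 5-cycle component plus a weaker 40-cycle component
signal_1d = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

transform = np.fft.rfft(signal_1d)               # complex coefficients
freqs = np.fft.rfftfreq(n_samples, d=delta_t)    # matching frequency axis
amplitude = 2.0 * np.abs(transform) / n_samples  # amplitude of each sine component

dominant = freqs[np.argmax(amplitude)]
print(dominant)
```

Each entry of `amplitude` tells how strongly the corresponding frequency contributes, which is what makes it possible to separate slow trends from fast noise.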

### <a id='ifft'> Constrained Inverse Fourier Transform </a>
Using a manually tuned threshold, we can remove high-frequency components of the original signal. When selected correctly, these should correspond to noise, though some information will be lost. Once these components are removed, we can apply an inverse transform to return to the time domain with a new (smooth) function.

Note that this approach may suffer from the Gibbs phenomenon at sharp changes in pressure.

In [None]:
time_constrained, signal_constrained = filtered_inverse_fourier_transform(pressure_transform,
                                                                          deltaT,
                                                                          spectrum[0],
                                                                          40,
                                                                          show_plot = True)
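
A minimal sketch of this kind of low-pass filtering is given below, assuming the threshold acts as a cutoff frequency above which all components are zeroed. `low_pass_fft` and `cutoff` are hypothetical names; how `filtered_inverse_fourier_transform` interprets its threshold argument is not shown in this notebook.

```python
import numpy as np

def low_pass_fft(y, delta_t, cutoff):
    """Zero every Fourier component above `cutoff` and transform back to the time domain."""
    y = np.asarray(y, dtype=float)
    transform = np.fft.rfft(y)
    freqs = np.fft.rfftfreq(len(y), d=delta_t)
    transform[freqs > cutoff] = 0.0
    # Passing n keeps the output the same length as the input
    return np.fft.irfft(transform, n=len(y))

n_samples = 512
t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
delta_t = t[1] - t[0]
clean = np.sin(2 * np.pi * 2 * t)          # slow component we want to keep
noise = 0.3 * np.sin(2 * np.pi * 120 * t)  # fast component treated as noise
smoothed = low_pass_fft(clean + noise, delta_t, cutoff=40)
print(np.max(np.abs(smoothed - clean)))
```

The example signal is periodic over the window, so the filter recovers it almost exactly. A signal with a sharp step or mismatched end points would instead show ringing near the discontinuity.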

#  <a id='class'> Classification  </a>

###  <a id='fit'>Curve fitting: </a>
Using the noise-constrained curve from our inverse Fourier transform, we can now try to fit a decay function as well as a 3rd order polynomial to the signal in order to attempt to classify it.

The mean square error is the average squared distance between the fit and the original signal; a smaller value means a better fit overall. Therefore, if the signal is decaying (pressure is dropping), as we might expect under normal (non-coupling) circumstances, the decay function should fit better. This can therefore be used for classification.

In [None]:
print("\n\033[1mFit for Constrained Signal\033[0m: ")
poly_fit, decay_fit, coupled = fit_curves(time_constrained,
                                signal_constrained,
                                mask,
                                verbose=True)

print("\n\033[1mFit for Unconstrained Signal\033[0m: ")
poly_fit, decay_fit, coupled = fit_curves(time_readings,
                                pressure_readings,
                                mask)
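
The comparison of the two fits can be sketched as below, using `scipy.optimize.curve_fit` for the decay and `np.polyfit` for the polynomial. The functional form of `decay`, the helper names, and the `poly_order` parameter are assumptions made for illustration; the backend's `fit_curves` may use different models and initial guesses.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(t, a, k, c):
    """Assumed decay model: exponential with amplitude a, rate k and offset c."""
    return a * np.exp(-k * t) + c

def mse(y_true, y_pred):
    """Mean of the squared differences between the signal and a fit."""
    return float(np.mean((y_true - y_pred) ** 2))

def compare_fits(t, y, poly_order=2):
    """Fit both models and flag coupling when the polynomial has the lower MSE."""
    decay_params, _ = curve_fit(decay, t, y, p0=(1.0, 1.0, 0.0), maxfev=10000)
    decay_mse = mse(y, decay(t, *decay_params))
    poly_coeffs = np.polyfit(t, y, poly_order)
    poly_mse = mse(y, np.polyval(poly_coeffs, t))
    return decay_mse, poly_mse, poly_mse < decay_mse

# Synthetic, purely decaying signal standing in for a normal (non-coupled) gauge
t = np.linspace(0.0, 1.0, 200)
decaying = np.exp(-5.0 * t)
decay_mse, poly_mse, coupled = compare_fits(t, decaying)
print(decay_mse, poly_mse, coupled)
```

For this synthetic decay the exponential model fits almost perfectly, so the polynomial loses the comparison and the signal is not flagged as coupled.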

### <a id='plot'> Plotting </a>
Display the entire fitting scheme on one plot. The curve fits act on the constrained signal.

In [None]:
plot_analytical_classifier(time_readings,
                           pressure_readings,
                           time_constrained,
                           signal_constrained,
                           poly_fit,
                           decay_fit,
                           mask)

### <a id='allin1'> All-In-One Curve Fitting Classifier Function </a>
Call the entire classification pipeline from one function, which allows many gauges to be classified.

In [None]:
analytical_classifier(gauge_id,
                      fillNumber=2216)

###  <a id='multi'>Multi Gauge </a>
Plot many gauges listed in a CSV file. The file requires `Probe ID`, `Fill` and `Steepness` columns.

In [None]:
df = pd.read_csv(r'CompleteProbeCatalogue.csv')
limit_slider = widgets.IntSlider(
    value=0,
    min=0, max=df.shape[0],
    step=1,
    description='Limit:',
    orientation='horizontal',
    readout=True,
    readout_format='d'
)
display(limit_slider)


In [None]:


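limit = limit_slider.value
for index, row in df.iterrows():
    gauge_id = row['Probe ID']
    try:
        analytical_classifier(gauge_id,row['Fill'])
        print("Probe %s Fill: %d, label:%s"%(gauge_id,int(row['Fill']),row['Steepness']))
    except Exception as error:
        # Report the failure and carry on with the next gauge
        print("Problem with %s: %s"%(gauge_id,error))
    # Stop once the row index reaches the slider limit, even if this gauge failed
    if index >= limit:
        break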

Plot for unlabelled gauges using PyTimber

***

# <a id='naive'>Naive Approach: Derivatives </a>
Here we take a naive approach to the problem using simple calculus. A second peak after the beam energy ramps up suggests some form of coupling, so we check whether a 2nd maximum exists. If there is a 2nd maximum that is larger than the 1st, we classify the gauge as coupled.

After applying a smoothing function (a mean convolution), we take the 1st derivative to find all turning points (gradient = 0) and then use the 2nd derivative to determine whether a particular turning point is a maximum. If the 2nd maximum is greater than the 1st, we have found coupling.

This has several limitations: <br />
<pre> 1) A loss of information in smoothing </pre>
<pre> 2) Failure to find minima/maxima due to 'holes' in data  </pre>
<pre> 3) A sensitivity to our choice of smoothing and neighbour parameters  </pre>
The resulting algorithm has a ~60% accuracy for gauges we previously labeled as following this pattern.
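
The core test (find the maxima, then compare their heights) can be illustrated on a clean synthetic signal. This is a stripped-down sketch without smoothing or a neighbour window; `local_maxima` and the two-bump signal are hypothetical and only meant to show the idea.

```python
import numpy as np

def local_maxima(y):
    """Indices where the first difference changes from positive to non-positive."""
    dy = np.diff(y)
    # dy[j] > 0 and dy[j + 1] <= 0 means y rises into index j + 1 and then falls
    return np.where((dy[:-1] > 0) & (dy[1:] <= 0))[0] + 1

# Synthetic signal: a small bump at t = 0.25 followed by a larger one at t = 0.7
t = np.linspace(0.0, 1.0, 501)
y = 0.6 * np.exp(-((t - 0.25) / 0.05) ** 2) + 1.0 * np.exp(-((t - 0.7) / 0.08) ** 2)

peaks = local_maxima(y)
# Compare the heights of the first two maxima, not their positions
second_is_larger = bool(len(peaks) >= 2 and y[peaks[1]] > y[peaks[0]])
print(t[peaks], second_is_larger)
```

On real gauge data the derivative is noisy, which is why the full function smooths the signal and requires a consistent sign over several neighbouring points.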

In [None]:
def VG_analyzer_naive(time_data, pressure_data,smoothing=50,neighbours=10,plot=True):
    if type(pressure_data) is not np.ndarray: #is not the right type
        raise AnalyzerError("Analyzer FAILED - Expected numpy.ndarray but got %s"%type(pressure_data))
        
    if np.min(pressure_data) != 0 or np.max(pressure_data) != 1: #is not normalized already
        pressure_data = ( pressure_data - np.min(pressure_data) ) /  ( np.max(pressure_data) - np.min(pressure_data))
    pressure_raw = np.copy(pressure_data)
    pressure_data = smooth(pressure_data,smoothing) 

    pressure_1st_deriv = np.diff(pressure_data, n=1) #1st order
    pressure_1st_deriv = smooth(pressure_1st_deriv,smoothing) #w. smoothing
    
    pressure_2nd_deriv = np.diff(pressure_data, n=2) #2nd order
    pressure_2nd_deriv = smooth(pressure_2nd_deriv,smoothing) #w. smoothing
    
    turning_points = {"Maximum":[],"Minimum":[],"Saddle_Point":[]}
    try:
        discovery = -1
        # Check neighbouring points to enforce a minimum width
        for i in range(neighbours,len(pressure_1st_deriv)-neighbours):
            if i <= (discovery + neighbours): # Only find a turning point once!
                continue      
            if ( (pressure_1st_deriv[i-neighbours:i] > 0).all() and (pressure_1st_deriv[i+1:i+neighbours] < 0).all() ) or \
               ( (pressure_1st_deriv[i-neighbours:i] < 0).all() and (pressure_1st_deriv[i+1:i+neighbours] > 0).all() ):   
                if pressure_2nd_deriv[i] == 0:
                    turning_points["Saddle_Point"].append(i)
                elif pressure_2nd_deriv[i] > 0:
                    turning_points["Minimum"].append(i)
                else:
                    turning_points["Maximum"].append(i)  
                discovery = i
    except IndexError:
        raise AnalyzerError("Index Out of Bounds, dataset size %s too small for the neighbour value %s"%(len(pressure_data),neighbours))
    display(turning_points)
    
    if plot:
        f, ax1 = plt.subplots(1)
        ax1.plot(time_data,pressure_raw,color='black',label="Raw")
        ax1.plot(time_data,pressure_data,alpha=0.7,label="Smoothed")
        pytimber.set_xaxis_date()
        ax1_1 = ax1.twinx()
        ax1_1.scatter(time_data[:-1],pressure_1st_deriv,color='orange',label="1st Deriv")
        ax1_1.scatter(time_data[:-2],pressure_2nd_deriv,color='r',label="2nd Deriv")
        for key in turning_points:
            for loc in turning_points[key]:
                ax1_1.axvline(x=time_data[loc],color='black')
        f.legend()
        f.show()
    maxima = turning_points["Maximum"]
    # Compare the smoothed pressure at successive maxima, not their indices
    for index in range(len(maxima)-1):
        if pressure_data[maxima[index]] < pressure_data[maxima[index+1]]:
            print("COUPLING DETECTED: Successive maximum [%d/%d,%d/%d] is larger than previous"%(maxima[index],len(pressure_data),maxima[index+1],len(pressure_data)))
            return True
    return False

def smooth(y, box_pts):
    box = np.ones(box_pts)/box_pts
    y_smooth = np.convolve(y, box, mode='same')
    return y_smooth

class AnalyzerError(Exception):
    pass