This notebook focuses on reducing the data size while retaining the important information.

1) Filtering out just the peaks
2) Retaining extrema data

In [1]:
import pandas as pd
import numpy as np
from scipy import signal

dataframe = pd.read_csv("/home/sweedy/Desktop/MFA GUI/Datasets/dg4 _half laser_final data.tsv", sep='\t', names=['none', 'Fl', 'Ssc']).iloc[:20000]
# Drop everything except the Ssc column
dataframe.drop('none', inplace=True, axis=1)
dataframe.drop('Fl', inplace=True, axis=1)
dataframe = dataframe.dropna()
# Renumber the rows and store the row position in an 'index' column for plotting
dataframe.reset_index(inplace=True)
dataframe['index'] = dataframe.index

In [2]:
def get_peaks(data):
    if len(data) == 0:
        print("Feedback : Peak function received empty dataset")
        return []
    peaks, _ = signal.find_peaks(data)
    return peaks

In [3]:
import plotly
import plotly.graph_objects as go
import plotly.express as px

First, I'm going to work with just the Ssc signal, since it helps find:
1) Leading and lagging peaks
2) Droplet size
3) Encapsulated SSC

In [4]:
#plotting SSC
fig1 = go.Figure(
    data = px.line(dataframe, x='index', y = 'Ssc')
)
fig1.update_traces(line_color = 'rgb(218, 165, 32)')
fig1.update_layout({
    "showlegend" : True
})

In [6]:
# Find the peaks in the Ssc signal and build a reduced series from them
peak_indexes = get_peaks(dataframe['Ssc'])
ssc_peaks = dataframe['Ssc'].iloc[peak_indexes]

#plotting peak points
fig2 = go.Figure(
    data = px.line(ssc_peaks),
)
fig2.update_traces(line_color = 'rgb(72,61,139)')
fig2.update_layout({
    "showlegend" : True
})
#fig2.show()

In [7]:
print("Before SSC size  = ", len(dataframe['Ssc']))
print("After SSC size  = ", len(ssc_peaks))

#Combined View
combined_peaks = go.Figure(fig1.data + fig2.data)
combined_peaks.update_layout({
    "showlegend" : True
})

combined_peaks.show()

Before SSC size  =  19999
After SSC size  =  5693


Close enough, but I feel like the lines aren't matching up as closely as they should.
Moreover, the number of data points has dropped sharply (from 19999 to 5693), which is not a good thing if detail is being lost along with it.
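
One way to put a number on "not matching up" is to rebuild the full signal from the kept points by linear interpolation and measure how far it lands from the original. This is only a sketch: `reconstruction_error` is a hypothetical helper, and `demo` is a synthetic noisy sine standing in for the Ssc column, not the real data.

```python
import numpy as np
from scipy import signal

def reconstruction_error(values, kept_idx):
    """Mean absolute error after linearly interpolating between the kept samples."""
    values = np.asarray(values, dtype=float)
    kept_idx = np.asarray(kept_idx)
    x = np.arange(len(values))
    # np.interp holds the first/last kept value constant outside the kept range
    rebuilt = np.interp(x, kept_idx, values[kept_idx])
    return float(np.mean(np.abs(values - rebuilt)))

# Synthetic stand-in for the Ssc column (assumption, not the real dataset)
rng = np.random.default_rng(0)
demo = np.sin(np.linspace(0, 20, 500)) + 0.1 * rng.standard_normal(500)

demo_peaks, _ = signal.find_peaks(demo)
peak_error = reconstruction_error(demo, demo_peaks)
full_error = reconstruction_error(demo, np.arange(len(demo)))  # keeping everything

print("kept", len(demo_peaks), "of", len(demo), "points")
print("mean abs error (peaks only) =", peak_error)
print("mean abs error (all points) =", full_error)
```

The same helper could be called with `dataframe['Ssc']` and `peak_indexes` to compare different reduction strategies on equal terms.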

In [8]:
fig3 = px.line(dataframe, x='index', y = 'Ssc')
fig3.add_trace(go.Scatter(x = peak_indexes, y = ssc_peaks, mode = 'markers'))
fig3.show()


So, I'm trying a new approach here.

I figured that if the peaks function is smoothing the data out too much, it must be because it's ignoring all the lower points. So if we include both the lowest and the highest points in every window, that should be enough information to find droplet sizes and other things.
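
To make the difference concrete, here is a small sketch on a made-up array (`toy` is an illustration, not taken from the dataset). `signal.find_peaks` keeps only the local maxima, while combining `signal.argrelextrema` with `np.greater` and `np.less` keeps the local minima as well.

```python
import numpy as np
from scipy import signal

# Made-up zig-zag signal for illustration
toy = np.array([0, 3, 1, 4, 0, 2, 1, 5, 2])

# Maxima only
peaks_only, _ = signal.find_peaks(toy)

# Maxima and minima, merged and sorted back into signal order
maxima = signal.argrelextrema(toy, np.greater)[0]
minima = signal.argrelextrema(toy, np.less)[0]
both = np.sort(np.concatenate([maxima, minima]))

print(peaks_only.tolist())  # [1, 3, 5, 7]
print(both.tolist())        # [1, 2, 3, 4, 5, 6, 7]
```

Note that both comparisons are strict, so flat stretches where neighbouring samples are equal are not reported as extrema, and the first and last samples are never returned.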

In [9]:
def find_local_extreema(col_data):
    data = np.array(col_data)
    maxima = signal.argrelextrema(data, np.greater)
    minima = signal.argrelextrema(data, np.less)
    return maxima, minima

In [10]:
max_idx, min_idx = find_local_extreema(dataframe['Ssc'])
#x[max_idx[0]]
reduced_idx = np.append(max_idx[0], min_idx[0])
reduced_idx.sort()
print("Old SSC length : ", len(dataframe))
print("New SSC length : ", len(reduced_idx))
fig4 = px.line(dataframe, x='index', y = 'Ssc')
fig4.add_trace(go.Scatter(x = reduced_idx, y = dataframe['Ssc'].iloc[reduced_idx], mode = 'markers'))
fig4.show()

Old SSC length :  19999
New SSC length :  10959


In [11]:
#plotting min max points 
fig5 = go.Figure(
    data = px.line(dataframe['Ssc'].iloc[reduced_idx]),
)
fig5.update_traces(line_color = 'rgb(72,61,139)')
fig5.update_layout({
    "showlegend" : True
})


In [12]:
#Combined View
combined_extreema = go.Figure(fig1.data + fig5.data)
combined_extreema.update_layout({
    "showlegend" : True
})

combined_extreema.show()

Okay, I'm satisfied with how closely the reduced line follows the original, and a lot of the grooves are preserved, which is a good thing.

I think the next step is to test what happens if I try finding the peaks of the peaks.
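
A rough sketch of what "peaks of the peaks" could look like, assuming it means running `signal.find_peaks` a second time on the values picked out by the first pass. The signal `demo` is synthetic and the variable names are hypothetical; the main thing to get right is mapping the second-pass positions back to indices in the original signal.

```python
import numpy as np
from scipy import signal

# Synthetic noisy sine standing in for the Ssc column (assumption)
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 2000)
demo = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(t.size)

# First pass: peaks of the raw signal
first_idx, _ = signal.find_peaks(demo)

# Second pass: peaks among the first-pass peak values.
# second_rel indexes into first_idx, not into demo.
second_rel, _ = signal.find_peaks(demo[first_idx])
second_idx = first_idx[second_rel]  # map back to positions in demo

print("raw points      :", len(demo))
print("peaks           :", len(first_idx))
print("peaks of peaks  :", len(second_idx))
```

Each pass can keep at most about half of its input, so the point count shrinks quickly; whether enough detail survives for droplet sizing would need to be checked on the real data.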