# Introduction

The goal is to analyze the behavior of the conductivity sensor data. This sensor determines whether the condensate going to the boiler is contaminated: values above 50 µS/cm indicate that the condensate is contaminated. However, the control room operator reported that the reading rises whenever the condensate pump is stopped. The objective of this notebook is therefore to define an algorithm, to be implemented in the control system, that prevents these erroneous values from being reported.

A flowsheet of this process is shown in the portfolio PDF document.

# Libraries

In [1]:
import pandas as pd
import plotly.graph_objects as go

# Data loading

In [2]:
df = pd.read_excel('PLC_data.xlsx')

In [3]:
# Preview the loaded data: conductivity sensor readings and condensate pump current (amperage)
# from December 2024 and the first week of January 2025
df.head()

| | TimeStamp | .[YODURO:RSLinx Enterprise:YODURO_4.A3352CIT010.Val] | .[YODURO.SD1:RSLinx Enterprise:YODURO_4.A3381BBA046_Current.Val] |
|---|---|---|---|
| 0 | 2024-12-02 10:00:00 | 4.246216 | 4.095981 |
| 1 | 2024-12-02 10:05:00 | 3.886816 | 4.164521 |
| 2 | 2024-12-02 10:10:00 | 4.003168 | 4.23306 |
| 3 | 2024-12-02 10:15:00 | 4.377427 | 4.3016 |
| 4 | 2024-12-02 10:20:00 | 4.146902 | 4.37014 |


# Data Cleaning and Processing

In [4]:
# Rename the columns, which come with the full PLC tag names, to make the analysis easier
rename_dict = {'TimeStamp': 'Timestamp', '.[YODURO:RSLinx Enterprise:YODURO_4.A3352CIT010.Val]': 'Conductivity', 
               '.[YODURO.SD1:RSLinx Enterprise:YODURO_4.A3381BBA046_Current.Val]': 'Pump amperage'}

df = df.rename(columns=rename_dict)
df.head()

| | Timestamp | Conductivity | Pump amperage |
|---|---|---|---|
| 0 | 2024-12-02 10:00:00 | 4.246216 | 4.095981 |
| 1 | 2024-12-02 10:05:00 | 3.886816 | 4.164521 |
| 2 | 2024-12-02 10:10:00 | 4.003168 | 4.23306 |
| 3 | 2024-12-02 10:15:00 | 4.377427 | 4.3016 |
| 4 | 2024-12-02 10:20:00 | 4.146902 | 4.37014 |


# Analysis and results

In [8]:
# Plot both trends on the same time axis to make the analysis easier
fig = go.Figure()

fig.add_trace(go.Scatter(
    x=df['Timestamp'], 
    y=df['Conductivity'], 
    mode='lines+markers', 
    name='Conductivity',
    line=dict(color='blue')))

fig.add_trace(go.Scatter(
    x=df['Timestamp'], 
    y=df['Pump amperage'], 
    mode='lines+markers', 
    name='Pump amperage',
    line=dict(color='red')))

# Graph configuration
fig.update_layout(
    title='Time series of conductivity and pump amperage',
    xaxis_title='Time',
    yaxis_title='Value',
    template='plotly',
    hovermode='x unified')
# Show the graph
fig.show()
fig.write_html("graph_interactive.html")

Interactive Graph: 
You can view the interactive version of the graph [here](https://diegobarriosp.github.io/Conductivity-sensor-analysis/graph_interactive.html).

Zooming in on the graph shows that the conductivity value rises every time the pump stops.
The next step is to define an amperage threshold below which this behavior occurs.
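Zooming in is a visual check; the same observation can be quantified by splitting the samples into "pump stopped" and "pump running" groups for a candidate amperage threshold and comparing the conductivity statistics of each group. The following is a minimal sketch, not part of the original analysis: the function name `conductivity_by_pump_state` is hypothetical, and it assumes that a current below the candidate threshold means the pump is stopped. Because the amperage column is stored as `object` in this data, the sketch converts it to numeric first and drops samples whose current cannot be parsed.

```python
import pandas as pd


def conductivity_by_pump_state(df: pd.DataFrame, amp_threshold: float) -> pd.DataFrame:
    """Hypothetical helper: describe conductivity separately for samples where the
    pump current is below amp_threshold (assumed stopped) or not (assumed running)."""
    # The amperage column may be stored as text; unparseable entries become NaN
    amps = pd.to_numeric(df["Pump amperage"], errors="coerce")
    valid = amps.notna()

    # Label every sample by its assumed pump state
    state = pd.Series("running", index=df.index)
    state[amps < amp_threshold] = "stopped"

    # Summary statistics of conductivity per pump state, ignoring unparseable currents
    return df[valid].groupby(state[valid])["Conductivity"].describe()
```

Called as `conductivity_by_pump_state(df, amp_threshold)` with whatever candidate threshold is being evaluated, a much higher mean conductivity in the `stopped` row than in the `running` row would confirm numerically what the graph suggests.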

In [6]:
# Filter the data frame to all readings where the conductivity is above 50 µS/cm, to analyze the data in this situation
df_filtered = df[df['Conductivity'] > 50]
df_filtered.head()

| | Timestamp | Conductivity | Pump amperage |
|---|---|---|---|
| 25 | 2024-12-02 12:05:00 | 224.837387 | 0.042639 |
| 26 | 2024-12-02 12:10:00 | 224.721008 | 0.08928 |
| 186 | 2024-12-03 01:30:00 | 224.927582 | 0.196728 |
| 219 | 2024-12-03 04:15:00 | 224.97052 | 0.040062 |
| 220 | 2024-12-03 04:20:00 | 224.770859 | 0.102534 |


In [7]:
# Check the data types: both process variables should be numeric
df_filtered.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 274 entries, 25 to 10088
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Timestamp      274 non-null    datetime64[ns]
 1   Conductivity   274 non-null    float64       
 2   Pump amperage  274 non-null    object        
dtypes: datetime64[ns](1), float64(1), object(1)
memory usage: 8.6+ KB


The pump amperage column is stored as `object`, so it is converted to numeric values before obtaining the statistical summary of each column.

In [8]:
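# Work on an explicit copy so the assignment below does not raise a SettingWithCopyWarning
df_filtered = df_filtered.copy()
# Convert pump amperage to float; entries that cannot be parsed become NaN
df_filtered['Pump amperage'] = pd.to_numeric(df_filtered['Pump amperage'], errors='coerce')
df_filtered.describe()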

| | Conductivity | Pump amperage |
|---|---|---|
| count | 274.0 | 274.0 |
| mean | 222.48264 | 0.102164 |
| std | 14.531916 | 0.078375 |
| min | 62.54298 | 0.0 |
| 25% | 224.765408 | 0.047903 |
| 50% | 224.855301 | 0.082721 |
| 75% | 224.90398 | 0.141875 |
| max | 224.974136 | 0.494574 |
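Because `errors='coerce'` silently turns unparseable entries into NaN, it can be worth checking which raw values failed to convert, especially on the full data frame, whose amperage column is still `object`. A minimal sketch; the helper name `non_numeric_entries` is hypothetical, and the actual non-numeric content of `PLC_data.xlsx` (if any) is not shown in this notebook.

```python
import pandas as pd


def non_numeric_entries(raw: pd.Series) -> pd.Series:
    """Hypothetical helper: return the raw entries that pd.to_numeric(..., errors='coerce')
    would turn into NaN even though they were not missing to begin with."""
    converted = pd.to_numeric(raw, errors="coerce")
    # Keep only values that were present but could not be parsed as numbers
    return raw[converted.isna() & raw.notna()]
```

For example, `non_numeric_entries(df['Pump amperage']).value_counts()` would list any text values in the full data set (such as historian status strings) before deciding whether NaN is an acceptable replacement for them.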


This summary shows that, in the available data, every reading above 50 µS/cm coincides with a drop in pump amperage: the highest pump current observed in this subset is 0.494574 A. Therefore, 0.5 A is defined as the threshold.
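Before fixing the threshold, it can help to check two things numerically: that every high reading falls below it, and how often the hold would be active overall. A minimal sketch; the function `threshold_coverage` and its returned keys are hypothetical, and the 50 µS/cm limit is taken from the introduction. With the summary above, the margin between the observed maximum and 0.5 A is only about 0.005 A.

```python
import pandas as pd


def threshold_coverage(df: pd.DataFrame, amp_threshold: float, cond_limit: float = 50.0) -> dict:
    """Hypothetical check of a candidate amperage threshold against the data."""
    amps = pd.to_numeric(df["Pump amperage"], errors="coerce")
    high = df["Conductivity"] > cond_limit  # readings that would trigger an alarm
    below = amps < amp_threshold            # samples where the hold would engage (NaN counts as False)
    return {
        "high_readings": int(high.sum()),
        # High readings that occur with the pump current below the threshold
        "high_explained": int((high & below).sum()),
        # Fraction of all samples during which the conductivity would be held
        "hold_active_share": float(below.mean()),
        # Distance between the threshold and the largest current seen during a high reading
        "margin_A": float(amp_threshold - amps[high].max()),
    }
```

If `high_explained` equals `high_readings`, the threshold covers every high reading in the data set; a small `margin_A` suggests checking new data before relying on it.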

A rule is implemented in the control system: when the pump amperage is below 0.5 A, the conductivity value measured before the pump stopped is held, so that the operator does not receive false alarms.
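The rule itself runs in the control system; the following is only a Python sketch that replays the same logic, assuming a 0.5 A threshold and the 50 µS/cm alarm limit. `ConductivityHold` mimics a scan-by-scan PLC evaluation, and `hold_series` is a vectorized equivalent for replaying the historical data. Both names are hypothetical. Two further assumptions are labeled in the code: a missing (NaN) pump current is treated like a stopped pump, and a missing conductivity reading also keeps the last good value. Until the first sample with the pump running, there is no value to hold, so the output is NaN.

```python
import math

import pandas as pd


class ConductivityHold:
    """Hypothetical scan-by-scan hold: while the pump current is below amp_threshold,
    output the last conductivity value measured with the pump running."""

    def __init__(self, amp_threshold: float = 0.5, initial: float = math.nan):
        self.amp_threshold = amp_threshold
        self.last_valid = initial  # NaN until the first reading with the pump running

    def update(self, conductivity: float, pump_amps: float) -> float:
        # Assumption: a NaN current (bad signal) is treated like a stopped pump
        pump_stopped = math.isnan(pump_amps) or pump_amps < self.amp_threshold
        # Assumption: a NaN conductivity reading keeps the last good value
        if not pump_stopped and not math.isnan(conductivity):
            self.last_valid = conductivity
        return self.last_valid


def hold_series(df: pd.DataFrame, amp_threshold: float = 0.5) -> pd.Series:
    """Vectorized equivalent of ConductivityHold for replaying historical data."""
    amps = pd.to_numeric(df["Pump amperage"], errors="coerce")
    pump_stopped = amps.lt(amp_threshold) | amps.isna()
    # Blank out readings taken with the pump stopped, then carry the last value forward
    return df["Conductivity"].where(~pump_stopped).ffill()
```

For example, `df['Conductivity (held)'] = hold_series(df)` followed by `(df['Conductivity (held)'] > 50).sum()` would count the alarms that remain after the rule, which can be compared with `(df['Conductivity'] > 50).sum()` to see how many false alarms the hold suppresses in the December/January data.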