# 01 - Process Analysis
- How well have the operators adopted APC?
- Did the process improve?

*Analysis period: 2019-10-01 to 2020-12-09*

In [None]:
# import libraries
import os
import numpy as np
import pandas as pd
import datetime
import plotly.io as pio
from plotly.subplots import make_subplots
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt

# configs
pd.options.display.float_format = "{:,.1f}".format
%matplotlib inline
plt.rcParams["figure.figsize"] = 10, 7
pio.templates.default = "plotly_white"

In [None]:
# import custom functions
from myLib import data_analysis

## Prepare data

In [None]:
data = pd.read_csv(
    "./data/raw/process-data-raw-1min.csv.gz", compression="gzip", index_col=0
)
data.head()

In [None]:
data.tail()

In [None]:
data.index = data.index.astype("datetime64[ns]")
data.info()

In [None]:
data.columns

In [None]:
# change column names to reference columns with dot notation
data.columns = data.columns.str.replace(".", "_", regex=False)
data.columns = data.columns.str.replace("-", "_", regex=False)

In [None]:
data.columns

## Measure the adoption of APC

In [None]:
data["Date"] = data.index.date
data.head()

In [None]:
# each row is one minute: dividing by the 1,440 minutes in a day means the
# daily sum gives the fraction of the day the process / APC was on
data["process_util"] = data.PROCESS_RUN_SIGNAL / (60 * 24)
data["apc_util"] = data.APC_MODE / (60 * 24)

In [None]:
data_util = data[["Date", "process_util", "apc_util"]].groupby(by=["Date"]).sum()
data_util.index = data_util.index.astype("datetime64[ns]")
data_util["Year Month"] = data_util.index.strftime("%Y-%m")
data_util.head()
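The division by `60 * 24` works because each row is one minute. A full day has 1,440 rows, so summing `signal / 1440` over a day gives the fraction of that day the signal was on. Here is a small check on two synthetic days of made-up data, reusing the notebook's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute data for two days (1,440 rows per day).
# Day 1: process runs all day, APC on for the first 12 hours.
# Day 2: process runs for the first 6 hours only, APC never on.
idx = pd.date_range("2020-01-01", periods=2 * 1440, freq="min")
df = pd.DataFrame(index=idx)
df["PROCESS_RUN_SIGNAL"] = np.r_[np.ones(1440), np.ones(360), np.zeros(1080)]
df["APC_MODE"] = np.r_[np.ones(720), np.zeros(720), np.zeros(1440)]

# Same roll-up as the notebook: per-minute share of a day, summed per date
df["Date"] = df.index.date
df["process_util"] = df.PROCESS_RUN_SIGNAL / (60 * 24)
df["apc_util"] = df.APC_MODE / (60 * 24)
daily = df[["Date", "process_util", "apc_util"]].groupby("Date").sum()
print(daily)
```

If a day has missing minutes, the sum treats them as "off". `groupby(...).mean()` on the raw signals would instead divide by the rows actually present. Which behaviour is preferable depends on whether gaps mean "not running" or "no data".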

In [None]:
data_util.columns = ["Process Utilisation", "APC Utilisation", "Year Month"]

In [None]:
data_analysis.plot_timeseries(
    df=data_util,
    y_traces=["Process Utilisation", "APC Utilisation"],
    title="Process and APC utilisation trends",
)

In [None]:
fig = px.box(
    data_util, x="Year Month", y="APC Utilisation", title="Plant adoption of APC"
)
fig.show()

- APC was commissioned in October 2019, so low utilisation in that month is expected.
- Utilisation was also low in April 2020 and June 2020: April due to the COVID lockdown and June due to a plant shutdown.

In [None]:
# exclude Oct 2019 (commissioning), Apr 2020 (COVID lockdown) and Jun 2020 (plant shutdown)
filter_util = ~data_util["Year Month"].isin(["2019-10", "2020-04", "2020-06"])
average_utilisation = data_util["APC Utilisation"][filter_util].mean()
print(
    f"Average APC utilisation, excluding Oct 2019, Apr 2020 and Jun 2020: {average_utilisation*100:0.2f}%"
)
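To exclude several months, every `!=` condition must hold at the same time. If the conditions are chained with `|`, the mask is true for every row, because no month can equal all three excluded values at once. Combining them with `&`, or using `~isin(...)`, excludes those months as intended. A quick check on made-up month labels:

```python
import pandas as pd

months = pd.Series(["2019-10", "2019-11", "2020-04", "2020-05", "2020-06"])
excluded = ["2019-10", "2020-04", "2020-06"]

# OR of "not equal" tests: every row differs from at least one value -> all True
mask_or = (months != "2019-10") | (months != "2020-04") | (months != "2020-06")
# AND of "not equal" tests: True only if the row differs from all three values
mask_and = (months != "2019-10") & (months != "2020-04") & (months != "2020-06")
# Equivalent and easier to extend
mask_isin = ~months.isin(excluded)

print(mask_or.tolist())            # [True, True, True, True, True]
print(mask_and.tolist())           # [False, True, False, True, False]
print(mask_and.equals(mask_isin))  # True
```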

In [None]:
fig = px.box(data_util[filter_util], y="APC Utilisation", title="Plant adoption of APC")
fig.show()

In [None]:
data_util.to_csv("./data/processed/apc_utilisation.csv.gz", compression="gzip")

## Are there process improvements?
- Did stability improve?
- Did throughput increase?

In [None]:
# filter data to include only when the main process was running
data_run = data[data.PROCESS_RUN_SIGNAL > 0].copy()

In [None]:
data_run.head()

In [None]:
# Define APC ON and APC OFF periods based on APC Controller mode
ctrl_threshold = 0.5
data_run["period"] = "APC OFF"
data_run.loc[data_run.APC_MODE >= ctrl_threshold, "period"] = "APC ON"

In [None]:
data_run.columns

- In some cases the tags went stale, but the data carried no quality information to flag this.
- Use `shift()` to remove stale data, i.e. rows where a weightometer reading was "stuck" at exactly the same value as the previous row.
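`shift()` moves every value down one row. `eq()` against the shifted series therefore flags each reading that exactly repeats the one before it. Here is a minimal illustration on made-up weightometer readings:

```python
import pandas as pd

# Hypothetical weightometer readings (tph); 250.0 is "stuck" for three rows
readings = pd.Series([240.2, 250.0, 250.0, 250.0, 262.7, 258.1])

# Compare each reading with the previous one; the first row is compared
# with NaN and is therefore never flagged
is_stale = readings.eq(readings.shift())
print(is_stale.tolist())  # [False, False, True, True, False, False]

clean = readings[~is_stale]
print(clean.tolist())  # [240.2, 250.0, 262.7, 258.1]
```

This approach keeps the first reading of each stuck run. It would also drop a genuine exact repeat, which is assumed to be rare for a noisy, continuous weightometer signal.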

In [None]:
data_run["check_cv_a"] = data_run.CNVYR_WT_A_READING.eq(
    data_run.CNVYR_WT_A_READING.shift()
)
data_run = data_run[~data_run["check_cv_a"]]

In [None]:
data_run["check_cv_b"] = data_run.CNVYR_WT_B_READING.eq(
    data_run.CNVYR_WT_B_READING.shift()
)
data_run = data_run[~data_run["check_cv_b"]]

In [None]:
# calculate the average of the two weightometers
data_run["CV_AVG"] = (data_run.CNVYR_WT_A_READING + data_run.CNVYR_WT_B_READING) / 2
feature = "CV_AVG"
x1 = data_run[feature][(data_run.period == "APC OFF") & (data_run[feature] >= 0)]
x2 = data_run[feature][(data_run.period == "APC ON") & (data_run[feature] >= 0)]
data_analysis.plot_graphs(
    x1,
    x2,
    data_run,
    feature,
    "Feed conveyor throughput split between 'APC off' and 'APC on'",
)
display(data_analysis.generate_stats(x1, x2))

- There are many times when the process was running but no material was fed to the plant.
- Based on the lower whisker of the APC OFF box plot, the data is filtered to feed rates of at least 230 tph.
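The internals of `data_analysis.plot_graphs` are not shown. Assuming its box plot uses the common Tukey rule, the lower whisker ends at the smallest observation that is not below `Q1 - 1.5 * IQR`. Below is a sketch of that calculation, using a hypothetical helper `lower_whisker` and made-up data:

```python
import numpy as np


def lower_whisker(values, k=1.5):
    """Smallest observation >= Q1 - k * IQR (Tukey-style lower whisker)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    fence = q1 - k * (q3 - q1)
    return float(values[values >= fence].min())


# Hypothetical throughput samples (tph) with one zero-feed reading
throughput = np.array([0.0, 240.0, 245.0, 250.0, 255.0, 260.0, 265.0, 270.0, 275.0, 280.0])
print(lower_whisker(throughput))  # 240.0 -> the zero reading lies beyond the whisker
```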

In [None]:
min_filter_tons = 230
x1 = data_run[feature][
    (data_run.period == "APC OFF") & (data_run[feature] >= min_filter_tons)
]
x2 = data_run[feature][
    (data_run.period == "APC ON") & (data_run[feature] >= min_filter_tons)
]
data_analysis.plot_graphs(
    x1,
    x2,
    data_run,
    feature,
    f"Feed conveyor throughput split between 'APC off' and 'APC on' where feed was at least {min_filter_tons} tph",
)
display(data_analysis.generate_stats(x1, x2))

- This gives a more realistic result: a 7.5% increase in mean throughput and a 24% reduction in standard deviation.
- The results may be biased, as only 12.2% of the data points are from APC ON periods.
- The APC ON histogram shows the distribution pushed towards maximum throughput.
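The percentages quoted above are relative changes from APC OFF to APC ON. `data_analysis.generate_stats` is a custom helper whose internals are not shown. The sketch below is therefore a hypothetical stand-in (`summarise_change`). It assumes sample standard deviations (`ddof=1`, the pandas default) and computes the APC ON share as a fraction of all filtered points:

```python
import numpy as np


def summarise_change(off, on):
    """Hypothetical helper: relative change OFF -> ON in mean and std, plus ON share (all in %)."""
    off = np.asarray(off, dtype=float)
    on = np.asarray(on, dtype=float)
    mean_change = (on.mean() - off.mean()) / off.mean() * 100
    std_change = (on.std(ddof=1) - off.std(ddof=1)) / off.std(ddof=1) * 100
    on_share = len(on) / (len(off) + len(on)) * 100
    return {
        "mean_change_pct": mean_change,
        "std_change_pct": std_change,
        "on_share_pct": on_share,
    }


# Made-up throughput samples (tph)
result = summarise_change([250, 260, 270, 280], [280, 285, 290])
print({k: round(v, 1) for k, v in result.items()})
```

A positive `mean_change_pct` means higher mean throughput with APC on. A negative `std_change_pct` means less variability, which is the direction reported above.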

## Check if the distributions are statistically different
- https://www.marsja.se/how-to-perform-a-two-sample-t-test-with-python-3-different-methods/
- https://www.marsja.se/how-to-perform-mann-whitney-u-test-in-python-with-scipy-and-pingouin/

In [None]:
from scipy import stats

In [None]:
# Shapiro-Wilk normality test (null hypothesis: the sample is normally distributed), APC OFF
stats.shapiro(x1)

In [None]:
stats.shapiro(x2)

- The null hypothesis of the Shapiro-Wilk test is that the data follows a normal distribution.
- Both p-values are below 0.05, so the null hypothesis is rejected: the data from both groups is NOT normally distributed. Use the Mann-Whitney U test instead of a t-test.
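The decision rule above can be illustrated on synthetic data. The sketch below draws one normal sample and one skewed sample that is bounded above. The skewed sample is a made-up stand-in for throughput pushed against a ceiling. The sketch then reads `statistic` and `pvalue` from the result of `scipy.stats.shapiro`:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=260, scale=10, size=500)
# Skewed sample bounded above at 300 tph (made-up stand-in for capped throughput)
skewed_sample = 300 - rng.exponential(scale=15, size=500)

alpha = 0.05
for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    result = stats.shapiro(sample)
    verdict = "reject normality" if result.pvalue < alpha else "no evidence against normality"
    print(f"{name}: W={result.statistic:.3f}, p={result.pvalue:.3g} -> {verdict}")
```

With 1-minute data, each group is likely far larger than 5,000 points. The SciPy documentation notes that above N = 5000 the p-value may not be accurate. At such sizes, even small departures from normality also become significant, so a histogram or Q-Q plot is a useful companion check.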


## When do you use the Mann-Whitney U test?
You can use the Mann-Whitney U test to compare two independent samples when the outcome/dependent variable is either ordinal or continuous but not normally distributed.
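The choice between a t-test and the Mann-Whitney U test can be written as a small helper. This is a sketch, not the notebook's method. `compare_groups` is hypothetical, and Welch's t-test (`equal_var=False`) is an assumed choice for the normal case:

```python
import numpy as np
from scipy import stats


def compare_groups(x1, x2, alpha=0.05):
    """Hypothetical helper: choose a two-sample test from a Shapiro-Wilk check.

    Returns (test_name, scipy_result).
    """
    both_normal = stats.shapiro(x1).pvalue >= alpha and stats.shapiro(x2).pvalue >= alpha
    if both_normal:
        # Welch's t-test compares means without assuming equal variances
        return "welch_t", stats.ttest_ind(x1, x2, equal_var=False)
    # Mann-Whitney U is rank-based and makes no normality assumption
    return "mann_whitney_u", stats.mannwhitneyu(x1, x2, alternative="two-sided")


# Made-up skewed samples, bounded above at 300 tph
rng = np.random.default_rng(1)
off = 300 - rng.exponential(scale=20, size=300)
on = 300 - rng.exponential(scale=10, size=300)
name, result = compare_groups(off, on)
print(name, result.pvalue)
```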

In [None]:
stats.mannwhitneyu(x1, x2)

- The p-value is less than 0.05, so the two distributions are statistically different at the 5% significance level.
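A p-value shows that a difference is unlikely to be due to chance, but not how large the difference is. With large samples, even a small shift becomes significant. One common companion measure is the common-language effect size, `U1 / (n1 * n2)`. It estimates the probability that a random value from the first sample exceeds a random value from the second, with ties counted as half. The sketch assumes SciPy >= 1.7, where `mannwhitneyu` returns the U statistic of the first sample. The helper name is hypothetical:

```python
import numpy as np
from scipy import stats


def common_language_effect_size(x1, x2):
    """Hypothetical helper: estimate P(X1 > X2) + 0.5 * P(X1 == X2) from U1."""
    result = stats.mannwhitneyu(x1, x2, alternative="two-sided")
    return result.statistic / (len(x1) * len(x2)), result.pvalue


# Made-up samples: every 'on' value exceeds every 'off' value
off = np.array([240.0, 250.0, 255.0, 260.0])
on = np.array([270.0, 275.0, 280.0])
cles, p = common_language_effect_size(off, on)
print(cles)  # 0.0 -> an 'off' value is never larger than an 'on' value
```

With `x1` as APC OFF and `x2` as APC ON, as in the notebook, a value below 0.5 would indicate that APC ON throughput tends to be higher.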