In [None]:
###############################################################################
# The Institute for the Design of Advanced Energy Systems Integrated Platform
# Framework (IDAES IP) was produced under the DOE Institute for the
# Design of Advanced Energy Systems (IDAES), and is copyright (c) 2018-2022
# by the software owners: The Regents of the University of California, through
# Lawrence Berkeley National Laboratory,  National Technology & Engineering
# Solutions of Sandia, LLC, Carnegie Mellon University, West Virginia University
# Research Corporation, et al.  All rights reserved.
#
# Please see the files COPYRIGHT.md and LICENSE.md for full copyright and
# license information.
###############################################################################

# Simple Data Quality Control Example

Before plant data is used in process models, quality control and fault detection analysis is recommended to identify potential data issues (e.g., missing or corrupt data) and data points that are not suitable for the intended analysis (e.g., those recorded during abnormal plant behavior). This notebook demonstrates basic quality control analysis using a small data set. The analysis is run using [Pecos](https://pecos.readthedocs.io).

Pecos is an open-source Python package designed to monitor performance of time series data, subject to a series of quality control tests. The software includes methods to run quality control tests defined by the user and to generate reports that include test results and graphics. The analysis also produces "clean data", from which data points that failed quality control inspection are removed.

## 1.  Read Data

For this example, the data is loaded from a CSV file into a pandas DataFrame using the function `read_csv`. To use this data within Pecos, the data must have a timestamp index. The first column is used as the index (`index_col=0`) and is parsed as timestamps using the `parse_dates` argument.

The pandas `display.width` option is increased to 100 (the default value is 80) to make the quality control results easier to read in the Jupyter notebook.
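
As a minimal sketch of what `index_col=0` and `parse_dates=True` do, the snippet below reads a small in-memory CSV and confirms that the resulting index is a `DatetimeIndex`. The column names and values are made up for illustration and are not the contents of `simple_data.csv`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a plant data file
csv_text = """timestamp,A,B
2024-01-01 00:00:00,0.1,1.0
2024-01-01 00:10:00,0.2,1.1
2024-01-01 00:20:00,0.3,1.2
"""

# index_col=0 uses the first column as the index;
# parse_dates=True asks pandas to parse that index as datetimes
example = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(type(example.index).__name__)
# Spacing between the first two samples, as a Timedelta
print(example.index[1] - example.index[0])
```

Checking the index type right after loading is a cheap way to catch a file whose first column did not parse as dates.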

In [None]:
import pandas as pd

pd.set_option("display.width", 100)

# Read plant data
data = pd.read_csv("simple_data.csv", index_col=0, parse_dates=True)
print(data)

## 2.  Run Quality Control Analysis

The Pecos `PerformanceMonitoring` class is used to run the analysis. The first step is to create a `PerformanceMonitoring` instance and populate it with the time series data.

In [None]:
import pecos

pm = pecos.monitoring.PerformanceMonitoring()
pm.add_dataframe(data)

The following basic quality control tests can then be run:

* ``check_timestamp`` to see if the timestamps are monotonically increasing and sampled every 10 minutes (600 seconds)
* ``check_missing`` to identify missing data
* ``check_corrupt`` to identify corrupt data values of -999
* ``check_range`` to see if data is between -5 and 5
* ``check_outlier`` to see if data is within 3 standard deviations of the mean
* ``check_delta`` to identify data that changes by less than 1 within a 30-minute moving window (1800 seconds)

Pecos includes additional quality control tests. Users can also define custom quality control functions to be used in Pecos. 
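
The idea behind three of the tests listed above (timestamp, corrupt, and range) can be sketched with plain pandas. This is an illustration under assumed data, not Pecos's implementation. The series values are made up, and the choice to exclude the corrupt sentinel from the range check is an assumption made here to avoid counting the same point twice.

```python
import pandas as pd

# Hypothetical 10-minute series with one corrupt value and one out-of-range value
index = pd.date_range("2024-01-01", periods=6, freq="10min")
series = pd.Series([0.5, 1.2, -999.0, 2.0, 7.5, 1.0], index=index)

# Timestamp check: increasing and evenly spaced at 600 seconds
steps = index.to_series().diff().dropna().dt.total_seconds()
regular = index.is_monotonic_increasing and bool((steps == 600).all())

# Corrupt check: values equal to a sentinel such as -999
corrupt = series.isin([-999])

# Range check: values outside [-5, 5], skipping points already marked corrupt
out_of_range = ~corrupt & ((series < -5) | (series > 5))

# A point fails if any individual check flags it
failed = corrupt | out_of_range
print(regular)
print(int(failed.sum()))
```

Each check yields a boolean Series aligned with the data, so the individual results combine with simple element-wise logic.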

In [None]:
pm.check_timestamp(frequency=600)
pm.check_missing()
pm.check_corrupt(corrupt_values=[-999])
pm.check_range(bound=[-5, 5])
pm.check_outlier(bound=[-3, 3])
pm.check_delta(bound=[1, None], window=1800)

One advantage of using Pecos is that the results from individual quality control tests are collected in a table that can be exported to a text file and included in reports (as shown in Step 3).  The results are stored in `pm.test_results`.  This DataFrame is updated each time a quality control test is run.  

In [None]:
print(pm.test_results)

In [None]:
import pytest

assert pm.test_results.shape[0] == 8
assert pm.test_results["Timesteps"].sum() == 14

The quality control test results are also used to produce "clean data", in which data points that failed quality control inspection are replaced by NaN. The cleaned data is stored in `pm.cleaned_data` and is generated from the current test results. Data points that do not pass quality control inspection can be replaced by various means (interpolation, data from a duplicate sensor, values from a model) before the data is used for further analysis. Data replacement strategies are generally defined on a case-by-case basis. If large sections of the data fail quality control tests, the data might not be suitable for use.
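
One of the replacement strategies mentioned above, interpolation, could look like the following sketch. The DataFrame is a made-up stand-in for `pm.cleaned_data`, and the `limit=1` setting is an arbitrary choice for illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data: NaN marks points that failed quality control
index = pd.date_range("2024-01-01", periods=5, freq="10min")
cleaned = pd.DataFrame({"A": [1.0, np.nan, 3.0, np.nan, 5.0]}, index=index)

# Linear interpolation weighted by the time between samples.
# limit=1 fills at most one consecutive NaN, so long gaps are not fully filled.
filled = cleaned.interpolate(method="time", limit=1)

print(filled["A"].tolist())
```

Limiting how many consecutive values are filled keeps interpolation from inventing data across long outages, which ties back to the caution about large failed sections.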

In [None]:
print(pm.cleaned_data)

In [None]:
assert pm.cleaned_data["A"].isna().sum() == 7
assert pm.cleaned_data["B"].isna().sum() == 3
assert pm.cleaned_data["C"].isna().sum() == 3
assert pm.cleaned_data["D"].isna().sum() == 0

A boolean mask, stored in `pm.mask`, indicates which data points passed quality control inspection. The mask can be used to compute a quality control index (QCI), which indicates the fraction of data points (between 0 and 1) that passed the quality control tests.
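
The idea behind such an index can be sketched in plain pandas, assuming it is computed per column as the share of `True` entries in the mask. The mask below is made up, and the sketch is not a copy of the Pecos implementation.

```python
import pandas as pd

# Hypothetical mask: True means the data point passed every test
mask = pd.DataFrame(
    {"A": [True, True, False, True], "B": [True, True, True, True]}
)

# The mean of a boolean column is the share of True entries in that column
qci_sketch = mask.mean()
print(qci_sketch)
```

A value of 1 for a column means that every point in it passed, and lower values show which columns need attention.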

In [None]:
QCI = pecos.metrics.qci(mask=pm.mask)
print(QCI)

## 3.  Generate a Report

Results can be included in HTML or LaTeX formatted reports. The ``plot_test_results`` function creates a graphic for each variable that includes a quality control test failure, highlighting data points that failed a test.
The ``write_monitoring_report`` function generates a report (HTML format by default) that includes the test results summary, quality control index, and graphics.
The following example creates an HTML report file named `simple_report.html` in the current working directory.
The images are encoded into the HTML file using the `encode=True` option. Alternatively, the report can link to separate image files.
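
For readers unfamiliar with inline images, the general technique of encoding an image into HTML is sketched below using a base64 data URI. This shows the common approach only. Whether Pecos builds its reports exactly this way is an assumption, and the image bytes are a placeholder rather than a real plot.

```python
import base64

# Placeholder bytes standing in for the contents of a PNG file
image_bytes = b"\x89PNG\r\n\x1a\n example-bytes"

# Encode the bytes as base64 text and embed them in a data URI
encoded = base64.b64encode(image_bytes).decode("ascii")
img_tag = f'<img src="data:image/png;base64,{encoded}" alt="test results">'

# Show the start of the tag; the full string grows with the image size
print(img_tag[:40])
```

Embedding makes the report a single portable file, at the cost of a larger HTML document than one that links to image files.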

In [None]:
test_results_graphics = pecos.graphics.plot_test_results(
    data=pm.data, test_results=pm.test_results
)

filename = pecos.io.write_monitoring_report(
    data=pm.data,
    test_results=pm.test_results,
    metrics=QCI.to_frame("QCI"),
    test_results_graphics=test_results_graphics,
    encode=True,
    filename="simple_report.html",
)

In [None]:
assert len(test_results_graphics) == 3
assert "simple_report.html" in filename

The HTML report file can be opened locally in a web browser. The following lines of code display the report within the Jupyter notebook.

In [None]:
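# HTML is part of the public IPython.display API; importing it from
# IPython.core.display is deprecated in recent IPython releases
from IPython.display import HTML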

HTML(filename=filename)