# Annotation (Labeling) of Multivariate Time Series Data
<strong>Abraham C. Montes</strong> <br>
<a href="https://www.linkedin.com/in/abraham-c-montes-6661a841/">LinkedIn</a>|<a href="https://www.abraham-montes.com/">Personal Site</a><br>
The University of Texas at Austin | <a href="https://drilling.utexas.edu/">RAPID research consortium</a>

This notebook shows how to annotate (label) multivariate time-series data using the interactive widgets available in Matplotlib. The tool relies on two classes: TimeSeries and PreAnnotator. The former encapsulates the time series data and the methods to manipulate it. The latter is an auxiliary class that pre-annotates the data with custom logic. In the case of drilling data, for example, this pre-annotation logic consists of a first-order logic inference tool. <br>


<strong>Library requirements:</strong> Ensure you have installed PyQt5 through your preferred package manager (e.g., pip or conda).

______________________________________________________

## Case 1: Annotation of time series data contained in one CSV file

#### Step 1: Implement the correct pre-annotation logic.
Open the PreAnnotator class and make sure the $\texttt{annotate( )}$ function has the correct logic for pre-annotation. This logic is usually a set of if-then rules, but you may implement more sophisticated logic. 
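
As an illustration of what a set of if-then rules can look like, here is a minimal sketch in pandas. It is not the code of the PreAnnotator class: the function name, the column names (`rpm`, `flow`), the thresholds, and the activity codes are all hypothetical and would need to be replaced with your own.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric activity codes; use the codes of your own labeling scheme.
IDLE, ROTATING, CIRCULATING = 0, 1, 2

def rule_based_labels(df: pd.DataFrame) -> pd.Series:
    """Assign one numeric label per row using simple if-then rules."""
    conditions = [
        (df["rpm"] > 5) & (df["flow"] > 50),   # rule 1: rotation and flow
        (df["rpm"] <= 5) & (df["flow"] > 50),  # rule 2: flow without rotation
    ]
    choices = [ROTATING, CIRCULATING]
    # np.select applies the first matching rule; rows matching none get IDLE.
    values = np.select(conditions, choices, default=IDLE)
    return pd.Series(values, index=df.index, name="label")

# Toy data with made-up values.
df = pd.DataFrame({"rpm": [0, 60, 0], "flow": [0, 200, 180]})
labels = rule_based_labels(df)
print(labels.tolist())
```

Rules are evaluated in order, so more specific rules should come first in the list.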

#### Step 2: Create a PreAnnotator object.

In [1]:
from PreAnnotator import PreAnnotator
pa          = PreAnnotator(  )

#### Step 3: Create a TimeSeries object

In [2]:
from TimeSeries import TimeSeries
ts          = TimeSeries(   pathCSV="dfz_status_08.csv", #path to the CSV file containing the time series data
                            dataframe=None, #if the time series is in a DataFrame in memory, use this input instead and set pathCSV to None.
                            preAnnotator=None, #PreAnnotator object. If you want to annotate from scratch, without any pre-annotation logic, set this to None.
                            renameDimensionsJSON="renameDict_prod.json", #path to JSON file containing new names for the columns in the CSV. If you wish to work with the same CSV columns, set to None.
                            trimDimensionsOfInterest=True, #If true, only the columns in the JSON file will be kept. The rest will be deleted from the TimeSeries object.
                            timeColumn="date", #Name of the CSV column containing the time index. 
                            timeAxisFormat="ISO8601", #Format of the time index in the CSV file. See documentation for options.
                            unitsRow=False, #If true, the second row of the CSV will be ignored.
                            deleteNans=False, #Whether you wish to delete NaNs.
                            nanPlaceHolder=-999.25, #A placeholder for NaNs. If there is no placeholder, set to None.
                            labelColumn="well_activity2" ) #Name of the CSV column containing the labels. If the CSV does not contain labels, set to None.




In [2]:
from TimeSeries import TimeSeries
ts          = TimeSeries(   pathCSV="example.csv", #path to the CSV file containing the time series data
                            dataframe=None, #if the time series is in a DataFrame in memory, use this input instead and set pathCSV to None.
                            preAnnotator=pa, #PreAnnotator object. If you want to annotate from scratch, without any pre-annotation logic, set this to None.
                            renameDimensionsJSON="renameDict.json", #path to JSON file containing new names for the columns in the CSV. If you wish to work with the same CSV columns, set to None.
                            trimDimensionsOfInterest=True, #If true, only the columns in the JSON file will be kept. The rest will be deleted from the TimeSeries object.
                            timeColumn="TIME", #Name of the CSV column containing the time index. 
                            timeAxisFormat="ISO8601", #Format of the time index in the CSV file. See documentation for options.
                            unitsRow=True, #If true, the second row of the CSV will be ignored.
                            deleteNans=True, #Whether you wish to delete NaNs.
                            nanPlaceHolder=-999.25, #A placeholder for NaNs. If there is no placeholder, set to None.
                            labelColumn=None ) #Name of the CSV column containing the labels. If the CSV does not contain labels, set to None.



Finished rig state update. A total of 51093 states were added to the DF, which in turn, has 51093 rows
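
To make the constructor options more concrete, the following sketch shows roughly what they could correspond to in plain pandas. It is an assumption about the kind of preprocessing involved, not the actual TimeSeries implementation; the CSV content, the column names, and the rename mapping are made up.

```python
import io
import numpy as np
import pandas as pd

# Tiny in-memory CSV standing in for a real file; its second row holds units.
raw = io.StringIO(
    "TIME,HKLD,RPM_RAW\n"
    "s,klbf,rpm\n"
    "2024-01-01T00:00:00,120.5,60\n"
    "2024-01-01T00:00:01,-999.25,61\n"
    "2024-01-01T00:00:02,121.0,-999.25\n"
)
# Hypothetical content of a rename JSON file: old column name -> new name.
rename = {"HKLD": "hookload", "RPM_RAW": "rpm"}

df = pd.read_csv(raw, skiprows=[1])        # like unitsRow=True: skip the units row
df["TIME"] = pd.to_datetime(df["TIME"])    # parse ISO 8601 timestamps
df = df.set_index("TIME")                  # like timeColumn="TIME"
df = df.rename(columns=rename)
df = df[list(rename.values())]             # like trimDimensionsOfInterest=True
df = df.replace(-999.25, np.nan)           # like nanPlaceHolder=-999.25
df = df.dropna()                           # like deleteNans=True
print(df)
```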


#### Step 4: Annotate!
Simply call the $\texttt{annotate( )}$ method. You may set the $\texttt{hoursPerPlot}$ parameter, which controls the amount of time, in hours, plotted at a time. For instance, if you set it to 2, the first plot will display the first 2 hours of data. A message box will appear asking whether the annotation of this plot has ended. When you click the button, the plot closes and the next one pops up. The process continues until the entire time series has been displayed in chunks of 2 h. <br><br>
<strong>Annotating:</strong> Annotating is straightforward. If there are pre-existing labels, you will see rectangles on top of the plot. The color of each rectangle corresponds to one label. The labels and their colors are shown on a panel of buttons on the left. <br><br>
<strong>Modifying Annotations:</strong> You can click on an edge of any rectangle and drag it to expand or shrink the rectangle. The final time interval covered by the rectangle will be annotated with its associated label. Empty spaces will simply carry no label in the dataset.<br><br>
<strong>Deleting Annotations:</strong> You may select a rectangle by clicking on it. If you press the delete key, it will disappear, and the time interval it covered will be assigned no label in the dataset.<br><br>
<strong>Creating Annotations:</strong> You may select an activity "brush" by clicking on a label in the left panel. A message at the bottom shows which brush is currently selected. The selected activity will be used when creating a new rectangle.<br>
You may create a new rectangle by pressing the 'n' key. The new rectangle will extend from the time index under the cursor at the moment the key is pressed to the left edge of the next rectangle on the right. Its color will be the one associated with the selected activity brush.<br><br>
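
The relationship between rectangles and labels can be sketched as follows. This is only an illustration of the idea in pandas, not the tool's internal code: the rectangle tuples, the half-open interval convention, and the use of NaN for unlabeled samples are assumptions.

```python
import numpy as np
import pandas as pd

# Six samples, ten minutes apart (made-up time axis).
index = pd.date_range("2024-01-01 00:00", periods=6, freq="10min")

# Each rectangle is (start, end, numeric label); values are made up.
rectangles = [
    (pd.Timestamp("2024-01-01 00:00"), pd.Timestamp("2024-01-01 00:20"), 1),
    (pd.Timestamp("2024-01-01 00:40"), pd.Timestamp("2024-01-01 00:50"), 3),
]

# Start with no label anywhere, then paint each rectangle's interval.
labels = pd.Series(np.nan, index=index, name="label")
for start, end, code in rectangles:
    mask = (index >= start) & (index < end)  # half-open interval [start, end)
    labels.loc[mask] = code

print(labels)
```

Samples that fall outside every rectangle keep the NaN value, which mirrors the empty spaces described above.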

<strong>New Labels:</strong> If you want to use more labels than those in the pre-annotated data, or if you have no pre-annotations, you can use the $\texttt{activityCodes}$ parameter when calling the $\texttt{annotate( )}$ method. Simply pass a list with the activity codes you want to use. These will appear in the left panel so you can select them to create new rectangles. <strong style="color: red"><br>WARNING:</strong> The activity codes must be numbers.
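
Since the codes must be numeric, one convenient pattern is to keep a mapping from codes to readable names on your side and pass only the codes. The sketch below assumes a hypothetical mapping; the names and numbers are placeholders.

```python
from numbers import Number

# Hypothetical mapping from numeric activity codes to human-readable names.
activity_names = {0: "other", 1: "rotating", 2: "circulating", 3: "tripping"}

# The numeric keys are what would be passed as activityCodes.
activity_codes = sorted(activity_names)

# Guard against accidentally passing strings such as "rotating".
if not all(isinstance(code, Number) for code in activity_codes):
    raise TypeError("activity codes must be numeric")

# With the TimeSeries object from Step 3 in scope, the call would then be:
# ts.annotate( hoursPerPlot=4, activityCodes=activity_codes )
print(activity_codes)
```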

In [None]:
%matplotlib qt
#This magic command will enable the qt backend. Plots will appear in a new window.
ts.annotate( hoursPerPlot=4 )

#### Step 5: Save
You may save the annotated TimeSeries as a CSV file by simply calling the $\texttt{save( outPath )}$ method.

In [None]:
ts.save( "annotated.csv" )

#### Step 6: Visualize annotations
You may visualize your annotations by calling the $\texttt{plotSummaryLabels( )}$ method.

In [None]:
%matplotlib inline
#This deactivates the QT backend. Plots will be displayed on the notebook instead of separate windows.
ts.plotSummaryLabels( )
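
If you also want a numeric check alongside the plot, a summary of the label distribution can be computed directly with pandas. The sketch below works on a made-up label series; it does not use the TimeSeries class, and the column layout of your saved CSV may differ.

```python
import pandas as pd

# Made-up per-sample labels; None marks an unlabeled sample.
index = pd.date_range("2024-01-01", periods=6, freq="10min")
labels = pd.Series([1, 1, 2, None, 2, 2], index=index, name="label")

# Share of each label among the labeled samples (NaN is excluded by default).
share = labels.value_counts(normalize=True).sort_index()

# Number of samples that carry no label.
n_unlabeled = int(labels.isna().sum())

print(share)
print("unlabeled samples:", n_unlabeled)
```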

______________________________________________________

## Case 2: Splitting of a CSV file into chunks.
You may also process a CSV file and split it into chunks of a certain length. This is particularly useful when the TimeSeries data is excessively long or you are only interested in a fraction of the CSV file.  

#### Step 1: Create the PreAnnotator and TimeSeries objects as in the previous case.

In [None]:
from PreAnnotator import PreAnnotator
from TimeSeries import TimeSeries
pa          = PreAnnotator(  )
ts          = TimeSeries(   pathCSV="example.csv", #path to the CSV file containing the time series data
                            dataframe=None, #if the time series is in a DataFrame in memory, use this input instead and set pathCSV to None.
                            preAnnotator=pa, #PreAnnotator object. If you want to annotate from scratch, without any pre-annotation logic, set this to None.
                            renameDimensionsJSON="renameDict.json", #path to JSON file containing new names for the columns in the CSV. If you wish to work with the same CSV columns, set to None.
                            trimDimensionsOfInterest=True, #If true, only the columns in the JSON file will be kept. The rest will be deleted from the TimeSeries object.
                            timeColumn="TIME", #Name of the CSV column containing the time index. 
                            timeAxisFormat="ISO8601", #Format of the time index in the CSV file. See documentation for options.
                            unitsRow=True, #If true, the second row of the CSV will be ignored.
                            deleteNans=True, #Whether you wish to delete NaNs.
                            nanPlaceHolder=-999.25, #A placeholder for NaNs. If there is no placeholder, set to None.
                            labelColumn=None ) #Name of the CSV column containing the labels. If the CSV does not contain labels, set to None.

#### Step 2: Split the TimeSeries object.
You may split it into chunks by calling the $\texttt{split( segmentLength )}$ method. The parameter it receives is the length of each chunk. For instance, you may want each chunk to be 24 h long. This method will return a list of TimeSeries objects.

In [None]:
chunks      = ts.split( segmentLength=24 )
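
For intuition, splitting a time-indexed table into fixed-length chunks can be sketched in plain pandas as shown below. This is not the implementation of the $\texttt{split( )}$ method; the data, the variable names, and the choice to measure chunk boundaries from the first timestamp are assumptions.

```python
import pandas as pd

# 60 samples, one hour apart (made-up data).
index = pd.date_range("2024-01-01", periods=60, freq="60min")
df = pd.DataFrame({"value": range(60)}, index=index)

segment_hours = 24

# Integer chunk number of each row, counted from the first timestamp.
elapsed = df.index - df.index[0]
chunk_id = (elapsed // pd.Timedelta(hours=segment_hours)).to_numpy()

# One DataFrame per chunk, in chronological order.
pieces = [piece for _, piece in df.groupby(chunk_id)]

print([len(piece) for piece in pieces])
```

The last chunk is shorter whenever the total duration is not a multiple of the segment length.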

#### Step 3: Annotate the chunk of interest.
You may now annotate the chunk of interest, save the dataset once it has been annotated, and visualize the labels, exactly as in the previous case. In this example, we select the third chunk.

In [None]:
%matplotlib qt
#This magic command will enable the qt backend. Plots will appear in a new window.
chunks[ 2 ].annotate( hoursPerPlot=2 )

In [None]:
chunks[ 2 ].save( "annotated.csv" )
chunks[ 2 ].plotSummaryLabels( )