<a href="https://colab.research.google.com/github/GDO-Galileo/do-voice-interaction/blob/error_correction/gdo_voicebot/grammar_correction_service/grammar_checker_model/Plot_Metrics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Plot Metrics**

This notebook will produce graphs of metrics based on the validation output of our [Grammar Checker Model](https://colab.research.google.com/drive/1_7RQQPkUHyF3ip5vCI0b2aOxSejOZcTv?usp=sharing).

The input file is in the format:

```
Processing Epoch Number: e
 Train loss: l
 [[tn fp]
  [fn tp]]

  Validation Accuracy: a

  Validation Correct Recall: rc
  Validation Incorrect Recall: ri
  Validation Total Recall: r

  Validation Correct Precision: pc
  Validation Incorrect Precision: pi
  Validation Total Precision: p

  Validation Correct F1: fc
  Validation Incorrect F1: fi
  Validation Total F1: f
```
This block is repeated once per epoch. Only the numbers are parsed, so the labels may differ; however, if the order of the values changes, the `VAL_TYPES` constant below must be updated to match.
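To illustrate the "only numbers are parsed" point, here is a minimal sketch that runs the same pattern as the parsing cell over a single epoch block. The `SAMPLE` text and its numbers are made up for illustration, and `extract_numbers` is a hypothetical helper name, not part of the notebook.

```python
import re

# Hypothetical sample of one epoch's output; the numbers are invented.
SAMPLE = """Processing Epoch Number: 1
 Train loss: 0.52
 [[40 10]
  [ 5 45]]

  Validation Accuracy: 0.85

  Validation Correct Recall: 0.9
  Validation Incorrect Recall: 0.8
  Validation Total Recall: 0.85

  Validation Correct Precision: 0.82
  Validation Incorrect Precision: 0.89
  Validation Total Precision: 0.855

  Validation Correct F1: 0.86
  Validation Incorrect F1: 0.84
  Validation Total F1: 0.85
"""

# Numbers not preceded by 'F' (so the '1' in 'F1' is skipped), or 'nan'.
# With one capture group, findall returns '' for every 'nan' match.
NUMBER_PATTERN = re.compile(r"((?<!F)\d\.?\d*)|nan")


def extract_numbers(text):
    """Return the matched number strings in order of appearance."""
    return NUMBER_PATTERN.findall(text)


values = extract_numbers(SAMPLE)
print(len(values))   # one value per entry in VAL_TYPES
print(values[:6])    # epoch, loss, then the four confusion-matrix counts
```

Because the labels are ignored, the position of each number is the only thing that ties it to a metric name, which is why the order of `VAL_TYPES` matters.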

### **Example Metrics Files**

Some of our output metrics which can be used with this notebook can be found [here](https://imperialcollegelondon.box.com/s/phuc4dac1j7z7d8gg57zwsofmgszqi93).

### **Label Guide:**
* e = Epochs
* l = Loss
* tn = True Negatives
* fp = False Positives
* fn = False Negatives
* tp = True Positives
* a = Accuracy
* rc = Positive Recall
* ri = Negative Recall
* r = Macro-Average Recall
* pc = Positive Precision
* pi = Negative Precision
* p = Macro-Average Precision
* fc = Positive F1 Measure
* fi = Negative F1 Measure
* f = Macro-Average F1 Measure
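
The labels above can be related to the confusion matrix with the standard definitions. The sketch below is an assumption about how such values are typically derived, not a copy of the Grammar Checker Model's own code: it treats "Positive" as the class counted by `tp`, and each macro-average as the unweighted mean of the two per-class values. `metrics_from_confusion` and `safe_div` are hypothetical names.

```python
def safe_div(num, den):
    """Divide, returning nan when the denominator is zero."""
    return num / den if den else float("nan")


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return safe_div(2 * precision * recall, precision + recall)


def metrics_from_confusion(tn, fp, fn, tp):
    """Derive per-class and macro-averaged metrics from the four counts."""
    pos_recall = safe_div(tp, tp + fn)
    neg_recall = safe_div(tn, tn + fp)
    pos_precision = safe_div(tp, tp + fp)
    neg_precision = safe_div(tn, tn + fn)
    pos_f1 = f1(pos_precision, pos_recall)
    neg_f1 = f1(neg_precision, neg_recall)
    return {
        "Accuracy": safe_div(tp + tn, tn + fp + fn + tp),
        "Positive Recall": pos_recall,
        "Negative Recall": neg_recall,
        "Macro-Average Recall": (pos_recall + neg_recall) / 2,
        "Positive Precision": pos_precision,
        "Negative Precision": neg_precision,
        "Macro-Average Precision": (pos_precision + neg_precision) / 2,
        "Positive F1 Measure": pos_f1,
        "Negative F1 Measure": neg_f1,
        # Assumption: macro F1 is the mean of the per-class F1 scores.
        "Macro-Average F1 Measure": (pos_f1 + neg_f1) / 2,
    }


# Made-up counts, for illustration only.
example = metrics_from_confusion(tn=40, fp=10, fn=5, tp=45)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

A class that is never predicted gives a zero denominator for its precision, which is one way a `nan` can end up in a metrics file.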

In [None]:
##################################
#      Upload Metrics Files      #
##################################

# After uploading, set the file names in the 'Constants' section below
from google.colab import files
uploaded = files.upload()

In [None]:
###################################
#             Imports             #
###################################

import sys
import re
import matplotlib.pyplot as plt

## **Parsing Files and Plotting Graphs**

Each file in the `FILE_NAMES` list is parsed for numbers. The resulting list is split into chunks of `NUM_VALS` elements, where `NUM_VALS` is the number of values generated per epoch (including the epoch number). The chunks are then regrouped into one list per metric, for example a list `[1, 2, 3, 4, 5...]` of epoch numbers. Each metric is then plotted, with the **number of epochs** on the **x axis** and the **metric** on the **y axis**.

In [None]:
###################################
#            Constants            #
###################################

# Name of each file to produce graphs for
FILE_NAMES = ["metrics-output-file.txt"]

# Names of each set of values
VAL_TYPES = ["Epochs", "Training Loss", "True Negatives", "False Positives",
             "False Negatives", "True Positives", "Accuracy", "Positive Recall",
             "Negative Recall", "Macro-Average Recall", "Positive Precision",
             "Negative Precision", "Macro-Average Precision", "Positive F1 Measure",
             "Negative F1 Measure", "Macro-Average F1 Measure"]

# Number of values generated per epoch
NUM_VALS = len(VAL_TYPES)

In [None]:
##################################
#     Parse files for values     #
##################################

# Go through each uploaded input file to produce graphs
for inputfile in FILE_NAMES:

  # Read the whole file; the context manager closes it even on error
  with open(inputfile, "r") as file:
    data = file.read()

  # Matches all numbers not preceded by 'F' (to exclude the 'F1'
  #   substring) or nans. Because the pattern has one capture group,
  #   findall returns '' for each 'nan' match.
  numbers = re.findall(r"((?<!F)\d\.?\d*)|nan", data)

  # Divide numbers into chunks with NUM_VALS elements (since this is the
  #   amount of numbers per epoch)
  epoch_values = []
  for i in range(0, len(numbers), NUM_VALS):
    epoch_values.append(numbers[i:i + NUM_VALS])

  # Convert to lists by type of value rather than by epoch
  value_types = [[] for _ in range(NUM_VALS)]
  for epoch_nums in epoch_values:
    for i in range(len(epoch_nums)):
      # An empty string marks a 'nan' entry in the file; store it as None
      if epoch_nums[i] == '':
        value = None
      else:
        value = float(epoch_nums[i])

      value_types[i].append(value)

  # Show labeled graphs of each type of data with x being the first type of
  #   data (Epochs)
  for i in range(1, NUM_VALS):
    fig = plt.figure()
    plt.plot(value_types[0], value_types[i], '-')
    plt.plot(value_types[0], value_types[i], 'rs')
    fig.suptitle(inputfile)
    plt.xlabel(VAL_TYPES[0])
    plt.ylabel(VAL_TYPES[i])
    plt.show()