This example runs the FrESCO library on the P3B3 benchmark data from the ECP-CANDLE [repository](https://github.com/ECP-CANDLE/Benchmarks). The FrESCO repository includes a preformatted version of the P3B3 dataset for model training. If you have not already done so, go to the data directory and extract the dataset with the command `$ tar -xf P3B3.tar.gz`.

The `configs/` directory contains sample `model_args.yml` files for the three sample datasets. With the default settings in these files, we are ready to train a model.

In [1]:
import fresco
import argparse

The FrESCO library is typically run from the command line with arguments specifying the model type and model args, so we'll have to set them up manually for this notebook.

In [2]:
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
# Model type: 'ie' or 'clc'
_ = parser.add_argument("--model", "-m", type=str, default='ie',
                        help="""which type of model to create. Must be either
                                ie (information extraction) or clc (case-level context).""")
# Location of a saved model, used when making predictions
_ = parser.add_argument('--model_path', '-mp', type=str, default='',
                        help="""this is the location of the model
                                that will be used to make predictions""")
# Location of the input data
_ = parser.add_argument('--data_path', '-dp', type=str, default='',
                        help="""where the data will load from. The default is
                                the path saved in the model""")
# YAML file with the model (or clc) arguments
_ = parser.add_argument('--model_args', '-args', type=str, default='',
                        help="""file specifying the model or clc args; default is in
                                the fresco directory""")

We are going to train a multi-task classification model on the P3B3 dataset, so we'll specify an information extraction (`ie`) model and point to the P3B3 model args file.

In [3]:
args = parser.parse_args(args=['-m', 'ie', '-args', '../configs/P3B3_args.yml'])
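
Passing an explicit list to `parse_args` is what makes this work inside a notebook: the parser reads that list instead of `sys.argv`, so the kernel's own command-line arguments never reach it. The following is a minimal standalone sketch of the same idea, using a reduced parser with only the two flags passed above. The `demo_parser` and `demo_args` names are illustrative and not part of FrESCO.

```python
import argparse

# Reduced stand-in for the parser defined above: only the two flags we pass.
demo_parser = argparse.ArgumentParser()
demo_parser.add_argument("--model", "-m", type=str, default="ie")
demo_parser.add_argument("--model_args", "-args", type=str, default="")

# The explicit list replaces sys.argv for this call.
demo_args = demo_parser.parse_args(
    args=["-m", "ie", "-args", "../configs/P3B3_args.yml"]
)

# The result is a Namespace whose attribute names come from the long options.
print(demo_args.model)
print(demo_args.model_args)
print(vars(demo_args))
```

Inspecting the namespace with `vars(...)` is a quick way to confirm which values will be handed to the training routine before starting a long run.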

With these arguments specified, we just need a few imports before we're ready to train our model.

In [4]:
from fresco import run_ie

from fresco.validate import exceptions

In [5]:
run_ie.run_ie(args)

Validating kwargs in model_args.yml file
Word embeddings file does not exist; will default to random embeddings.
Loading data and creating DataLoaders
Loading data from ../data/P3B3/
Num workers: 4, reproducible: True
Training on 7500 validate on 1500

Defining a model
Creating model trainer
Training a mthisan model with 2 cuda device


epoch: 1

training time 10.13
Training loss: 0.861854
        task:      micro        macro
      task_1:     0.5047,     0.0645
      task_2:     0.5552,     0.4172
      task_3:     0.8787,     0.4795
      task_4:     0.4424,     0.2964

epoch 1 validation

epoch 1 val loss: 1.17360581, best val loss: inf
patience counter is at 0 of 5
        task:      micro        macro
      task_1:     0.5453,     0.0602
      task_2:     0.5733,     0.4001
      task_3:     0.8787,     0.4784
      task_4:     0.4640,     0.3259
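
The large gap between the two columns for `task_1` (0.5453 micro against 0.0602 macro) is typical of an imbalanced label set. The log does not name the metric, so the sketch below assumes the columns are micro- and macro-averaged F1 scores. It is a plain-Python illustration on toy data, not FrESCO's scoring code, and `f1_scores` is a hypothetical helper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy data: 8 of 10 samples are class "a", and the model always predicts "a".
y_true = ["a"] * 8 + ["b", "c"]
y_pred = ["a"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.8000 macro=0.2963
```

A model that only gets the majority class right still earns a high micro score, while the macro score is dragged down by every class it never predicts. Under the stated assumption, that is one plausible reading of the early-epoch `task_1` numbers.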

epoch: 2

training time 8.88
Training loss: 0.871779
        task:      micro        macro
      task_1:     0.5504,     0.1189
      


training time 9.16
Training loss: 0.494371
        task:      micro        macro
      task_1:     0.7556,     0.3816
      task_2:     0.8983,     0.8958
      task_3:     0.8931,     0.5128
      task_4:     0.5675,     0.5025

epoch 16 validation

epoch 16 val loss: 0.52033003, best val loss: 0.54817979
patience counter is at 0 of 5
        task:      micro        macro
      task_1:     0.7400,     0.2499
      task_2:     0.9100,     0.9091
      task_3:     0.8867,     0.5360
      task_4:     0.5500,     0.5287

epoch: 17

training time 9.15
Training loss: 0.436824
        task:      micro        macro
      task_1:     0.7676,     0.4251
      task_2:     0.9008,     0.8984
      task_3:     0.8981,     0.5690
      task_4:     0.5903,     0.5422

epoch 17 validation

epoch 17 val loss: 0.49516671, best val loss: 0.52033003
patience counter is at 0 of 5
        task:      micro        macro
      task_1:     0.7467,     0.2715
      task_2:     0.9240,     0.9226
      task_3:

epoch 31 val loss: 0.30449256, best val loss: 0.31153715
patience counter is at 0 of 5
        task:      micro        macro
      task_1:     0.9107,     0.7133
      task_2:     0.9293,     0.9274
      task_3:     0.9647,     0.9207
      task_4:     0.7913,     0.7717

epoch: 32

training time 9.15
Training loss: 0.277118
        task:      micro        macro
      task_1:     0.8923,     0.7991
      task_2:     0.9212,     0.9188
      task_3:     0.9749,     0.9323
      task_4:     0.7969,     0.7785

epoch 32 validation

epoch 32 val loss: 0.29556886, best val loss: 0.30449256
patience counter is at 0 of 5
        task:      micro        macro
      task_1:     0.9147,     0.7189
      task_2:     0.9333,     0.9318
      task_3:     0.9687,     0.9284
      task_4:     0.8060,     0.7910

epoch: 33

training time 9.15
Training loss: 0.293950
        task:      micro        macro
      task_1:     0.8956,     0.8082
      task_2:     0.9201,     0.9177
      task_3:     0.9760


training time 9.18
Training loss: 0.175752
        task:      micro        macro
      task_1:     0.9297,     0.8785
      task_2:     0.9297,     0.9278
      task_3:     0.9843,     0.9580
      task_4:     0.8865,     0.8737

epoch 47 validation

epoch 47 val loss: 0.25636195, best val loss: 0.25057713
patience counter is at 1 of 5
        task:      micro        macro
      task_1:     0.9467,     0.9064
      task_2:     0.9313,     0.9299
      task_3:     0.9740,     0.9408
      task_4:     0.8780,     0.8670
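
Epoch 47 is the first point in this excerpt where the patience counter leaves 0: its validation loss (0.25636195) is worse than the best so far (0.25057713). The sketch below shows one common way such a counter is maintained. It assumes a strict "lower is better" comparison and a reset to 0 on improvement. `PatienceTracker` is a hypothetical class for illustration, not FrESCO's implementation.

```python
class PatienceTracker:
    """Track the best validation loss and count epochs without improvement."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.counter = 0

    def update(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.counter = 0  # improvement resets the counter
        else:
            self.counter += 1  # no improvement this epoch
        return self.counter >= self.patience


tracker = PatienceTracker(patience=5)
# First value: the best loss reported going into epoch 47.
# Next two values: the validation losses logged for epochs 47 and 48.
for val_loss in [0.25057713, 0.25636195, 0.25074327]:
    stop = tracker.update(val_loss)
    print(f"val loss {val_loss:.8f}: counter {tracker.counter} of {tracker.patience}, stop={stop}")
```

With these three values the counter reads 0, then 1, then 2, matching the "1 of 5" and "2 of 5" lines logged for epochs 47 and 48.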

epoch: 48

training time 9.17
Training loss: 0.184777
        task:      micro        macro
      task_1:     0.9368,     0.8913
      task_2:     0.9261,     0.9243
      task_3:     0.9841,     0.9576
      task_4:     0.8881,     0.8767

epoch 48 validation

epoch 48 val loss: 0.25074327, best val loss: 0.25057713
patience counter is at 2 of 5
        task:      micro        macro
      task_1:     0.9453,     0.9040
      task_2:     0.9333,     0.9316
      task_3: