# Getting started with DeepMatcher

Note: you can run **[this notebook live in Google Colab](https://colab.research.google.com/github/sidharthms/deepmatcher/blob/master/examples/getting_started.ipynb)** and use free GPUs provided by Google. 

To enable GPU support in your Colab Notebook, go to `Edit → Notebook Settings` and set `Hardware accelerator` to `GPU`.


In [None]:
# Run this code if you're inside Colab.
!pip install -q http://download.pytorch.org/whl/cu80/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
!pip install -q --process-dependency-links git+https://github.com/sidharthms/deepmatcher

In [0]:
# Get sample data.
!mkdir -p sample_data
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_train.csv
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_validation.csv
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_test.csv

In [1]:
# Check if GPU is available
import torch
torch.cuda.is_available()

True

In [2]:
import deepmatcher as dm

In [3]:
# Process data. Downloads word vectors if necessary. Note that this can take several minutes.
train, validation, test = dm.process(
    path='sample_data',
    train='amz_goog_train.csv',
    validation='amz_goog_validation.csv',
    test='amz_goog_test.csv',
    ignore_columns=('left_id', 'right_id'))


Load time: 3.7080047791823745
Vocab time: 13.488945204764605
Metadata time: 1.5047426847741008
Cache time: 2.229101464152336
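`dm.process` reads each CSV as a table of labeled record pairs, so a quick check of the column layout can catch mistakes before a long preprocessing run. The sketch below assumes DeepMatcher's default naming conventions: an `id` column, a `label` column, and attribute columns prefixed with `left_` and `right_`. This is consistent with the `left_id`/`right_id` columns ignored above. `check_layout` is a hypothetical helper, not part of DeepMatcher, and the two sample rows are made up for illustration. Requiring every `left_` attribute to have a `right_` counterpart is also our own assumption, based on the model comparing attributes pairwise.

```python
import csv
import io

# Assumed default column conventions (not read from the DeepMatcher source).
ID_COL, LABEL_COL = "id", "label"
LEFT_PREFIX, RIGHT_PREFIX = "left_", "right_"


def check_layout(header, ignore_columns=()):
    """Return the attribute names shared by both sides, or raise ValueError."""
    missing = [c for c in (ID_COL, LABEL_COL) if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    cols = [c for c in header if c not in ignore_columns]
    left = {c[len(LEFT_PREFIX):] for c in cols if c.startswith(LEFT_PREFIX)}
    right = {c[len(RIGHT_PREFIX):] for c in cols if c.startswith(RIGHT_PREFIX)}
    if left != right:
        raise ValueError(f"left/right attributes differ: {sorted(left ^ right)}")
    return sorted(left)


# Hypothetical two-row sample in the assumed layout (not the real Amazon-Google data).
sample = io.StringIO(
    "id,label,left_id,left_title,left_price,right_id,right_title,right_price\n"
    "0,1,a1,acme photo editor,49.99,g7,acme photo editor 2.0,49.99\n"
    "1,0,a2,office suite,199.00,g9,antivirus 2010,39.95\n"
)
header = next(csv.reader(sample))
print(check_layout(header, ignore_columns=("left_id", "right_id")))  # → ['price', 'title']
```

To check a real file, pass the first row of `csv.reader(open('sample_data/amz_goog_train.csv'))` instead of the in-memory sample.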


## Train simple DL model (SIF model)

In [4]:
# Construct SIF DL model and train.
sif_model = dm.MatchingModel(attr_summarizer='sif')
sif_model.run_train(
    train,  # Training dataset.
    validation,  # Validation dataset.
    epochs=15,  # Number of times to go over the entire training data.
    batch_size=16,  # Number of labeled examples to use for each training step.
    best_save_path='sif_model.pth',  # Path to save the best model.
    pos_weight=1.3)  # The ratio of the weight of positive examples (matches) to the weight of negative examples (non-matches).
                     # Increase this value if your data has fewer matches than non-matches.
                     # You will need to experiment with several values to see what works best for your data.

* Number of trainable parameters: 542402
===>  TRAIN Epoch 1 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 1 || Run Time:    3.4 | Load Time:    2.3 || F1:  11.68 | Prec:  44.34 | Rec:   6.72 || Ex/s: 1205.42

===>  EVAL Epoch 1 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 1 || Run Time:    0.6 | Load Time:    0.8 || F1:  17.99 | Prec:  56.82 | Rec:  10.68 || Ex/s: 1683.14

* Best F1: 17.985611510791365
Saving best model...
===>  TRAIN Epoch 2 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 2 || Run Time:    3.4 | Load Time:    2.3 || F1:  48.95 | Prec:  63.27 | Rec:  39.91 || Ex/s: 1207.92

===>  EVAL Epoch 2 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 2 || Run Time:    0.6 | Load Time:    0.8 || F1:  42.05 | Prec:  62.71 | Rec:  31.62 || Ex/s: 1688.12

* Best F1: 42.04545454545454
Saving best model...
===>  TRAIN Epoch 3 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 3 || Run Time:    3.3 | Load Time:    2.3 || F1:  66.62 | Prec:  69.69 | Rec:  63.81 || Ex/s: 1215.05

===>  EVAL Epoch 3 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 3 || Run Time:    0.6 | Load Time:    0.8 || F1:  50.99 | Prec:  60.59 | Rec:  44.02 || Ex/s: 1686.30

* Best F1: 50.99009900990099
Saving best model...
===>  TRAIN Epoch 4 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 4 || Run Time:    3.4 | Load Time:    2.3 || F1:  75.09 | Prec:  74.93 | Rec:  75.25 || Ex/s: 1208.70

===>  EVAL Epoch 4 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 4 || Run Time:    0.6 | Load Time:    0.8 || F1:  51.49 | Prec:  61.18 | Rec:  44.44 || Ex/s: 1686.38

* Best F1: 51.48514851485149
Saving best model...
===>  TRAIN Epoch 5 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 5 || Run Time:    3.3 | Load Time:    2.3 || F1:  80.42 | Prec:  79.47 | Rec:  81.40 || Ex/s: 1213.35

===>  EVAL Epoch 5 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 5 || Run Time:    0.6 | Load Time:    0.8 || F1:  54.80 | Prec:  60.62 | Rec:  50.00 || Ex/s: 1623.65

* Best F1: 54.80093676814988
Saving best model...
===>  TRAIN Epoch 6 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 6 || Run Time:    3.4 | Load Time:    2.3 || F1:  84.31 | Prec:  82.96 | Rec:  85.69 || Ex/s: 1208.94

===>  EVAL Epoch 6 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 6 || Run Time:    0.6 | Load Time:    0.8 || F1:  56.00 | Prec:  58.33 | Rec:  53.85 || Ex/s: 1689.44

* Best F1: 56.0
Saving best model...
===>  TRAIN Epoch 7 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 7 || Run Time:    3.4 | Load Time:    2.3 || F1:  86.57 | Prec:  86.02 | Rec:  87.12 || Ex/s: 1208.50

===>  EVAL Epoch 7 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 7 || Run Time:    0.6 | Load Time:    0.8 || F1:  54.95 | Prec:  58.10 | Rec:  52.14 || Ex/s: 1685.56

===>  TRAIN Epoch 8 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 8 || Run Time:    3.4 | Load Time:    2.3 || F1:  88.14 | Prec:  87.03 | Rec:  89.27 || Ex/s: 1208.23

===>  EVAL Epoch 8 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 8 || Run Time:    0.6 | Load Time:    0.8 || F1:  54.46 | Prec:  58.62 | Rec:  50.85 || Ex/s: 1683.39

===>  TRAIN Epoch 9 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 9 || Run Time:    3.3 | Load Time:    2.3 || F1:  89.65 | Prec:  88.89 | Rec:  90.41 || Ex/s: 1211.82

===>  EVAL Epoch 9 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 9 || Run Time:    0.6 | Load Time:    0.8 || F1:  53.46 | Prec:  58.00 | Rec:  49.57 || Ex/s: 1639.96

===>  TRAIN Epoch 10 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 10 || Run Time:    3.3 | Load Time:    2.3 || F1:  90.91 | Prec:  90.27 | Rec:  91.56 || Ex/s: 1215.95

===>  EVAL Epoch 10 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 10 || Run Time:    0.6 | Load Time:    0.8 || F1:  53.43 | Prec:  59.79 | Rec:  48.29 || Ex/s: 1675.82

===>  TRAIN Epoch 11 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 11 || Run Time:    3.4 | Load Time:    2.3 || F1:  91.98 | Prec:  91.27 | Rec:  92.70 || Ex/s: 1208.12

===>  EVAL Epoch 11 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 11 || Run Time:    0.6 | Load Time:    0.8 || F1:  53.72 | Prec:  61.20 | Rec:  47.86 || Ex/s: 1684.09

===>  TRAIN Epoch 12 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 12 || Run Time:    3.4 | Load Time:    2.3 || F1:  92.54 | Prec:  91.95 | Rec:  93.13 || Ex/s: 1208.01

===>  EVAL Epoch 12 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 12 || Run Time:    0.6 | Load Time:    0.8 || F1:  53.40 | Prec:  61.80 | Rec:  47.01 || Ex/s: 1689.04

===>  TRAIN Epoch 13 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 13 || Run Time:    3.3 | Load Time:    2.3 || F1:  93.01 | Prec:  92.75 | Rec:  93.28 || Ex/s: 1213.61

===>  EVAL Epoch 13 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 13 || Run Time:    0.6 | Load Time:    0.8 || F1:  52.45 | Prec:  61.49 | Rec:  45.73 || Ex/s: 1688.25

===>  TRAIN Epoch 14 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 14 || Run Time:    3.3 | Load Time:    2.3 || F1:  93.23 | Prec:  92.90 | Rec:  93.56 || Ex/s: 1217.77

===>  EVAL Epoch 14 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 14 || Run Time:    0.6 | Load Time:    0.8 || F1:  52.48 | Prec:  62.35 | Rec:  45.30 || Ex/s: 1649.33

===>  TRAIN Epoch 15 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 15 || Run Time:    3.3 | Load Time:    2.3 || F1:  93.45 | Prec:  93.05 | Rec:  93.85 || Ex/s: 1213.73

===>  EVAL Epoch 15 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 15 || Run Time:    0.6 | Load Time:    0.8 || F1:  52.61 | Prec:  62.72 | Rec:  45.30 || Ex/s: 1657.23
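The `pos_weight=1.3` argument in the training call above makes mistakes on matches cost more than mistakes on non-matches. The sketch below illustrates that general idea with a class-weighted binary cross-entropy in plain Python. It is a generic illustration of class weighting, not DeepMatcher's actual loss: how the library normalizes the positive and negative weights internally is not shown in this notebook. `weighted_bce` is a hypothetical helper.

```python
import math


def weighted_bce(prob_match, is_match, pos_weight=1.0, neg_weight=1.0):
    """Class-weighted binary cross-entropy for a single record pair.

    prob_match: predicted probability that the pair is a match (0 < p < 1).
    is_match: the gold label (True for a match).
    """
    if is_match:
        return -pos_weight * math.log(prob_match)
    return -neg_weight * math.log(1.0 - prob_match)


# The same prediction error on a true match costs more when pos_weight > 1.
p = 0.3  # the model leans towards "not a match"
print(weighted_bce(p, True))                  # unweighted loss on a missed match
print(weighted_bce(p, True, pos_weight=1.3))  # 1.3 times larger
```

When matches are rare, they contribute little to the total loss, so raising the positive weight pushes the model to stop ignoring them, usually trading some precision for recall.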



In [5]:
# Load the checkpoint with the best validation F1, saved before the model started overfitting. Keeping this checkpoint is a form of early stopping.
sif_model.load_state('sif_model.pth')

# Compute F1 on the test set
sif_model.run_eval(test)

===>  EVAL Epoch 6 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:00


Finished Epoch 6 || Run Time:    0.3 | Load Time:    0.7 || F1:  58.17 | Prec:  61.03 | Rec:  55.56 || Ex/s: 2228.61



58.165548098434
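The F1 score reported in these logs is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the precision and recall printed for the SIF model on the test set reproduces the reported value; the tiny difference from `58.165548...` comes from P and R being rounded to two decimals in the log. `f1_score` is a small illustrative helper, not a DeepMatcher function.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the SIF model on the test set above.
print(round(f1_score(61.03, 55.56), 2))  # → 58.17
```

Because it is a harmonic mean, F1 is pulled towards the lower of the two values, which is why the models in this notebook with high precision but low recall score poorly.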

## Train medium complexity DL model (RNN model)

In [6]:
# Construct RNN DL model and train.
rnn_model = dm.MatchingModel(attr_summarizer='rnn')
rnn_model.run_train(
    train,
    validation,
    epochs=15,
    batch_size=16,
    best_save_path='rnn_model.pth',
    pos_weight=1.3)

* Number of trainable parameters: 1762802
===>  TRAIN Epoch 1 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 1 || Run Time:    9.1 | Load Time:    2.4 || F1:   9.94 | Prec:  45.35 | Rec:   5.58 || Ex/s: 598.09

===>  EVAL Epoch 1 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 1 || Run Time:    1.3 | Load Time:    0.8 || F1:  28.87 | Prec:  73.68 | Rec:  17.95 || Ex/s: 1128.76

* Best F1: 28.865979381443303
Saving best model...
===>  TRAIN Epoch 2 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 2 || Run Time:    9.2 | Load Time:    2.4 || F1:  59.28 | Prec:  65.79 | Rec:  53.93 || Ex/s: 594.20

===>  EVAL Epoch 2 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 2 || Run Time:    1.3 | Load Time:    0.8 || F1:  51.72 | Prec:  67.59 | Rec:  41.88 || Ex/s: 1132.26

* Best F1: 51.715039577836414
Saving best model...
===>  TRAIN Epoch 3 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 3 || Run Time:    9.2 | Load Time:    2.4 || F1:  78.66 | Prec:  76.73 | Rec:  80.69 || Ex/s: 595.42

===>  EVAL Epoch 3 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 3 || Run Time:    1.3 | Load Time:    0.8 || F1:  53.44 | Prec:  66.04 | Rec:  44.87 || Ex/s: 1131.91

* Best F1: 53.43511450381679
Saving best model...
===>  TRAIN Epoch 4 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 4 || Run Time:    9.1 | Load Time:    2.4 || F1:  87.73 | Prec:  84.21 | Rec:  91.56 || Ex/s: 598.47

===>  EVAL Epoch 4 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 4 || Run Time:    1.3 | Load Time:    0.8 || F1:  57.08 | Prec:  60.19 | Rec:  54.27 || Ex/s: 1120.39

* Best F1: 57.07865168539326
Saving best model...
===>  TRAIN Epoch 5 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 5 || Run Time:    9.1 | Load Time:    2.4 || F1:  92.05 | Prec:  89.80 | Rec:  94.42 || Ex/s: 600.36

===>  EVAL Epoch 5 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 5 || Run Time:    1.3 | Load Time:    0.8 || F1:  59.20 | Prec:  58.58 | Rec:  59.83 || Ex/s: 1133.64

* Best F1: 59.19661733615222
Saving best model...
===>  TRAIN Epoch 6 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 6 || Run Time:    9.1 | Load Time:    2.4 || F1:  95.12 | Prec:  93.99 | Rec:  96.28 || Ex/s: 598.57

===>  EVAL Epoch 6 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 6 || Run Time:    1.3 | Load Time:    0.8 || F1:  58.13 | Prec:  59.03 | Rec:  57.26 || Ex/s: 1111.44

===>  TRAIN Epoch 7 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 7 || Run Time:    9.1 | Load Time:    2.4 || F1:  96.29 | Prec:  94.37 | Rec:  98.28 || Ex/s: 600.43

===>  EVAL Epoch 7 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 7 || Run Time:    1.3 | Load Time:    0.8 || F1:  57.68 | Prec:  64.55 | Rec:  52.14 || Ex/s: 1122.62

===>  TRAIN Epoch 8 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 8 || Run Time:    9.2 | Load Time:    2.4 || F1:  97.45 | Prec:  96.49 | Rec:  98.43 || Ex/s: 593.24

===>  EVAL Epoch 8 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 8 || Run Time:    1.3 | Load Time:    0.8 || F1:  57.63 | Prec:  66.48 | Rec:  50.85 || Ex/s: 1134.94

===>  TRAIN Epoch 9 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 9 || Run Time:    9.2 | Load Time:    2.4 || F1:  97.88 | Prec:  96.52 | Rec:  99.28 || Ex/s: 594.59

===>  EVAL Epoch 9 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 9 || Run Time:    1.3 | Load Time:    0.8 || F1:  56.30 | Prec:  66.67 | Rec:  48.72 || Ex/s: 1108.13

===>  TRAIN Epoch 10 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 10 || Run Time:    9.1 | Load Time:    2.4 || F1:  98.16 | Prec:  97.33 | Rec:  99.00 || Ex/s: 599.41

===>  EVAL Epoch 10 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 10 || Run Time:    1.3 | Load Time:    0.8 || F1:  56.00 | Prec:  67.47 | Rec:  47.86 || Ex/s: 1110.82

===>  TRAIN Epoch 11 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 11 || Run Time:    9.1 | Load Time:    2.4 || F1:  98.65 | Prec:  97.89 | Rec:  99.43 || Ex/s: 598.16

===>  EVAL Epoch 11 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 11 || Run Time:    1.3 | Load Time:    0.8 || F1:  54.08 | Prec:  67.09 | Rec:  45.30 || Ex/s: 1131.86

===>  TRAIN Epoch 12 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 12 || Run Time:    9.1 | Load Time:    2.4 || F1:  98.93 | Prec:  98.31 | Rec:  99.57 || Ex/s: 598.55

===>  EVAL Epoch 12 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 12 || Run Time:    1.3 | Load Time:    0.8 || F1:  53.71 | Prec:  66.88 | Rec:  44.87 || Ex/s: 1134.42

===>  TRAIN Epoch 13 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 13 || Run Time:    9.2 | Load Time:    2.4 || F1:  99.07 | Prec:  98.58 | Rec:  99.57 || Ex/s: 594.58

===>  EVAL Epoch 13 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 13 || Run Time:    1.3 | Load Time:    0.8 || F1:  53.09 | Prec:  66.88 | Rec:  44.02 || Ex/s: 1104.89

===>  TRAIN Epoch 14 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 14 || Run Time:    9.1 | Load Time:    2.4 || F1:  99.29 | Prec:  99.00 | Rec:  99.57 || Ex/s: 597.76

===>  EVAL Epoch 14 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 14 || Run Time:    1.3 | Load Time:    0.8 || F1:  53.85 | Prec:  67.31 | Rec:  44.87 || Ex/s: 1134.86

===>  TRAIN Epoch 15 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:11


Finished Epoch 15 || Run Time:    9.2 | Load Time:    2.4 || F1:  99.36 | Prec:  99.01 | Rec:  99.71 || Ex/s: 594.80

===>  EVAL Epoch 15 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 15 || Run Time:    1.3 | Load Time:    0.8 || F1:  53.37 | Prec:  67.76 | Rec:  44.02 || Ex/s: 1132.94



In [7]:
# Load the checkpoint with the best validation F1, saved before the model started overfitting. Keeping this checkpoint is a form of early stopping.
rnn_model.load_state('rnn_model.pth')

# Compute F1 on the test set
rnn_model.run_eval(test)

===>  EVAL Epoch 5 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 5 || Run Time:    0.6 | Load Time:    0.7 || F1:  62.34 | Prec:  61.07 | Rec:  63.68 || Ex/s: 1673.20



62.34309623430963
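The test run above reports `EVAL Epoch 5` because the reloaded checkpoint is the one from epoch 5. The selection logic can be sketched as a running maximum over validation F1, assuming the library saves a checkpoint whenever validation F1 strictly improves; that assumption is consistent with where `Saving best model...` appears in the RNN log. The values below are copied from that log.

```python
# Validation F1 per epoch for the RNN model, copied from the training log above.
val_f1 = [28.87, 51.72, 53.44, 57.08, 59.20, 58.13, 57.68, 57.63,
          56.30, 56.00, 54.08, 53.71, 53.09, 53.85, 53.37]

best_f1 = float("-inf")
best_epoch = None
for epoch, f1 in enumerate(val_f1, start=1):
    if f1 > best_f1:  # strictly better: mirrors the "Saving best model..." lines
        best_f1, best_epoch = f1, epoch

print(best_epoch, best_f1)  # → 5 59.2
```

Training F1 keeps rising to about 99 after epoch 5 while validation F1 falls, so the epoch-5 checkpoint is the one least affected by overfitting.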

## Train medium complexity DL model (Attention model)

In [8]:
# Construct Attention DL model and train.
att_model = dm.MatchingModel(attr_summarizer='attention')
att_model.run_train(
    train,
    validation,
    epochs=15,
    batch_size=16,
    best_save_path='att_model.pth',
    pos_weight=1.3)

* Number of trainable parameters: 3429602
===>  TRAIN Epoch 1 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 1 || Run Time:   13.4 | Load Time:    2.3 || F1:  13.81 | Prec:  41.13 | Rec:   8.30 || Ex/s: 438.23

===>  EVAL Epoch 1 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 1 || Run Time:    2.1 | Load Time:    0.8 || F1:  36.89 | Prec:  56.64 | Rec:  27.35 || Ex/s: 808.24

* Best F1: 36.887608069164266
Saving best model...
===>  TRAIN Epoch 2 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 2 || Run Time:   13.3 | Load Time:    2.3 || F1:  53.57 | Prec:  57.78 | Rec:  49.93 || Ex/s: 438.83

===>  EVAL Epoch 2 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 2 || Run Time:    2.1 | Load Time:    0.8 || F1:  53.00 | Prec:  63.86 | Rec:  45.30 || Ex/s: 813.67

* Best F1: 53.00000000000001
Saving best model...
===>  TRAIN Epoch 3 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 3 || Run Time:   13.4 | Load Time:    2.4 || F1:  67.37 | Prec:  66.12 | Rec:  68.67 || Ex/s: 435.34

===>  EVAL Epoch 3 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 3 || Run Time:    2.0 | Load Time:    0.8 || F1:  55.56 | Prec:  60.61 | Rec:  51.28 || Ex/s: 817.27

* Best F1: 55.555555555555564
Saving best model...
===>  TRAIN Epoch 4 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 4 || Run Time:   13.5 | Load Time:    2.4 || F1:  75.29 | Prec:  72.18 | Rec:  78.68 || Ex/s: 434.20

===>  EVAL Epoch 4 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 4 || Run Time:    2.0 | Load Time:    0.8 || F1:  58.62 | Prec:  69.19 | Rec:  50.85 || Ex/s: 815.99

* Best F1: 58.62068965517241
Saving best model...
===>  TRAIN Epoch 5 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 5 || Run Time:   13.3 | Load Time:    2.4 || F1:  81.34 | Prec:  78.13 | Rec:  84.84 || Ex/s: 437.83

===>  EVAL Epoch 5 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 5 || Run Time:    2.0 | Load Time:    0.8 || F1:  59.17 | Prec:  69.14 | Rec:  51.71 || Ex/s: 817.54

* Best F1: 59.168704156479215
Saving best model...
===>  TRAIN Epoch 6 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 6 || Run Time:   13.3 | Load Time:    2.4 || F1:  85.77 | Prec:  82.91 | Rec:  88.84 || Ex/s: 437.92

===>  EVAL Epoch 6 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 6 || Run Time:    2.0 | Load Time:    0.8 || F1:  61.65 | Prec:  71.35 | Rec:  54.27 || Ex/s: 817.68

* Best F1: 61.6504854368932
Saving best model...
===>  TRAIN Epoch 7 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 7 || Run Time:   13.3 | Load Time:    2.4 || F1:  87.45 | Prec:  84.42 | Rec:  90.70 || Ex/s: 437.17

===>  EVAL Epoch 7 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 7 || Run Time:    2.1 | Load Time:    0.8 || F1:  61.29 | Prec:  66.50 | Rec:  56.84 || Ex/s: 810.61

===>  TRAIN Epoch 8 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 8 || Run Time:   13.4 | Load Time:    2.4 || F1:  89.80 | Prec:  87.20 | Rec:  92.56 || Ex/s: 436.51

===>  EVAL Epoch 8 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 8 || Run Time:    2.0 | Load Time:    0.8 || F1:  61.92 | Prec:  64.65 | Rec:  59.40 || Ex/s: 816.29

* Best F1: 61.91536748329621
Saving best model...
===>  TRAIN Epoch 9 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 9 || Run Time:   13.4 | Load Time:    2.4 || F1:  91.14 | Prec:  88.96 | Rec:  93.42 || Ex/s: 435.32

===>  EVAL Epoch 9 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 9 || Run Time:    2.0 | Load Time:    0.8 || F1:  62.13 | Prec:  66.18 | Rec:  58.55 || Ex/s: 818.79

* Best F1: 62.131519274376416
Saving best model...
===>  TRAIN Epoch 10 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 10 || Run Time:   13.3 | Load Time:    2.4 || F1:  92.31 | Prec:  90.29 | Rec:  94.42 || Ex/s: 437.86

===>  EVAL Epoch 10 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 10 || Run Time:    2.0 | Load Time:    0.8 || F1:  61.86 | Prec:  67.86 | Rec:  56.84 || Ex/s: 815.96

===>  TRAIN Epoch 11 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 11 || Run Time:   13.3 | Load Time:    2.3 || F1:  93.79 | Prec:  91.55 | Rec:  96.14 || Ex/s: 439.31

===>  EVAL Epoch 11 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 11 || Run Time:    2.1 | Load Time:    0.8 || F1:  60.52 | Prec:  67.72 | Rec:  54.70 || Ex/s: 803.30

===>  TRAIN Epoch 12 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 12 || Run Time:   13.3 | Load Time:    2.4 || F1:  94.25 | Prec:  92.43 | Rec:  96.14 || Ex/s: 437.15

===>  EVAL Epoch 12 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 12 || Run Time:    2.1 | Load Time:    0.8 || F1:  59.52 | Prec:  67.20 | Rec:  53.42 || Ex/s: 807.28

===>  TRAIN Epoch 13 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 13 || Run Time:   13.3 | Load Time:    2.4 || F1:  94.83 | Prec:  92.75 | Rec:  97.00 || Ex/s: 438.92

===>  EVAL Epoch 13 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 13 || Run Time:    2.0 | Load Time:    0.8 || F1:  59.38 | Prec:  66.84 | Rec:  53.42 || Ex/s: 817.59

===>  TRAIN Epoch 14 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 14 || Run Time:   13.4 | Load Time:    2.4 || F1:  95.23 | Prec:  93.40 | Rec:  97.14 || Ex/s: 436.77

===>  EVAL Epoch 14 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 14 || Run Time:    2.0 | Load Time:    0.8 || F1:  59.72 | Prec:  67.02 | Rec:  53.85 || Ex/s: 818.32

===>  TRAIN Epoch 15 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:15


Finished Epoch 15 || Run Time:   13.3 | Load Time:    2.4 || F1:  95.58 | Prec:  93.80 | Rec:  97.42 || Ex/s: 438.62

===>  EVAL Epoch 15 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 15 || Run Time:    2.1 | Load Time:    0.8 || F1:  60.75 | Prec:  67.01 | Rec:  55.56 || Ex/s: 812.63



In [9]:
# Load the checkpoint with the best validation F1, saved before the model started overfitting. Keeping this checkpoint is a form of early stopping.
att_model.load_state('att_model.pth')

# Compute F1 on the test set
att_model.run_eval(test)

===>  EVAL Epoch 9 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:01


Finished Epoch 9 || Run Time:    1.0 | Load Time:    0.7 || F1:  61.02 | Prec:  63.72 | Rec:  58.55 || Ex/s: 1286.03



61.02449888641425

## Train sophisticated DL model (Hybrid model)

In [12]:
# Construct Hybrid DL model and train.
hybrid_model = dm.MatchingModel(attr_summarizer='hybrid')
hybrid_model.run_train(
    train,
    validation,
    epochs=15,
    batch_size=16,
    best_save_path='hybrid_model.pth',
    pos_weight=1.3)

* Number of trainable parameters: 7133105
===>  TRAIN Epoch 1 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 1 || Run Time:   29.0 | Load Time:    2.3 || F1:   9.79 | Prec:  33.90 | Rec:   5.72 || Ex/s: 219.53

===>  EVAL Epoch 1 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 1 || Run Time:    4.3 | Load Time:    0.8 || F1:  13.69 | Prec:  62.07 | Rec:   7.69 || Ex/s: 454.97

* Best F1: 13.688212927756656
Saving best model...
===>  TRAIN Epoch 2 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 2 || Run Time:   29.1 | Load Time:    2.3 || F1:  51.67 | Prec:  54.91 | Rec:  48.78 || Ex/s: 218.56

===>  EVAL Epoch 2 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 2 || Run Time:    4.3 | Load Time:    0.8 || F1:  48.85 | Prec:  60.38 | Rec:  41.03 || Ex/s: 454.44

* Best F1: 48.854961832061065
Saving best model...
===>  TRAIN Epoch 3 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 3 || Run Time:   29.1 | Load Time:    2.3 || F1:  69.59 | Prec:  65.94 | Rec:  73.68 || Ex/s: 218.99

===>  EVAL Epoch 3 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 3 || Run Time:    4.3 | Load Time:    0.8 || F1:  54.03 | Prec:  68.87 | Rec:  44.44 || Ex/s: 456.48

* Best F1: 54.02597402597402
Saving best model...
===>  TRAIN Epoch 4 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 4 || Run Time:   29.0 | Load Time:    2.3 || F1:  78.25 | Prec:  73.92 | Rec:  83.12 || Ex/s: 219.56

===>  EVAL Epoch 4 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 4 || Run Time:    4.3 | Load Time:    0.8 || F1:  41.21 | Prec:  70.83 | Rec:  29.06 || Ex/s: 455.49

===>  TRAIN Epoch 5 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 5 || Run Time:   29.0 | Load Time:    2.3 || F1:  85.50 | Prec:  81.91 | Rec:  89.41 || Ex/s: 219.52

===>  EVAL Epoch 5 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 5 || Run Time:    4.3 | Load Time:    0.8 || F1:  62.07 | Prec:  59.07 | Rec:  65.38 || Ex/s: 455.17

* Best F1: 62.06896551724138
Saving best model...
===>  TRAIN Epoch 6 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 6 || Run Time:   29.0 | Load Time:    2.3 || F1:  90.67 | Prec:  87.09 | Rec:  94.56 || Ex/s: 219.31

===>  EVAL Epoch 6 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 6 || Run Time:    4.2 | Load Time:    0.8 || F1:  63.27 | Prec:  60.55 | Rec:  66.24 || Ex/s: 456.41

* Best F1: 63.26530612244898
Saving best model...
===>  TRAIN Epoch 7 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 7 || Run Time:   29.2 | Load Time:    2.3 || F1:  92.94 | Prec:  90.07 | Rec:  95.99 || Ex/s: 218.08

===>  EVAL Epoch 7 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 7 || Run Time:    4.2 | Load Time:    0.8 || F1:  64.40 | Prec:  60.53 | Rec:  68.80 || Ex/s: 457.48

* Best F1: 64.4
Saving best model...
===>  TRAIN Epoch 8 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 8 || Run Time:   29.0 | Load Time:    2.3 || F1:  94.48 | Prec:  92.35 | Rec:  96.71 || Ex/s: 219.15

===>  EVAL Epoch 8 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 8 || Run Time:    4.3 | Load Time:    0.8 || F1:  63.36 | Prec:  63.91 | Rec:  62.82 || Ex/s: 454.86

===>  TRAIN Epoch 9 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 9 || Run Time:   29.0 | Load Time:    2.3 || F1:  95.68 | Prec:  93.33 | Rec:  98.14 || Ex/s: 219.25

===>  EVAL Epoch 9 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 9 || Run Time:    4.3 | Load Time:    0.8 || F1:  59.50 | Prec:  64.04 | Rec:  55.56 || Ex/s: 454.61

===>  TRAIN Epoch 10 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 10 || Run Time:   29.2 | Load Time:    2.3 || F1:  96.69 | Prec:  95.28 | Rec:  98.14 || Ex/s: 217.78

===>  EVAL Epoch 10 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 10 || Run Time:    4.2 | Load Time:    0.8 || F1:  55.29 | Prec:  63.19 | Rec:  49.15 || Ex/s: 456.97

===>  TRAIN Epoch 11 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 11 || Run Time:   29.0 | Load Time:    2.3 || F1:  97.61 | Prec:  95.99 | Rec:  99.28 || Ex/s: 219.26

===>  EVAL Epoch 11 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 11 || Run Time:    4.3 | Load Time:    0.8 || F1:  54.27 | Prec:  65.85 | Rec:  46.15 || Ex/s: 455.49

===>  TRAIN Epoch 12 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 12 || Run Time:   29.1 | Load Time:    2.3 || F1:  97.82 | Prec:  96.39 | Rec:  99.28 || Ex/s: 218.49

===>  EVAL Epoch 12 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 12 || Run Time:    4.3 | Load Time:    0.8 || F1:  52.53 | Prec:  64.20 | Rec:  44.44 || Ex/s: 452.87

===>  TRAIN Epoch 13 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 13 || Run Time:   29.2 | Load Time:    2.3 || F1:  98.37 | Prec:  97.34 | Rec:  99.43 || Ex/s: 218.23

===>  EVAL Epoch 13 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 13 || Run Time:    4.3 | Load Time:    0.8 || F1:  53.13 | Prec:  64.24 | Rec:  45.30 || Ex/s: 454.83

===>  TRAIN Epoch 14 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 14 || Run Time:   29.1 | Load Time:    2.3 || F1:  98.51 | Prec:  97.75 | Rec:  99.28 || Ex/s: 218.29

===>  EVAL Epoch 14 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 14 || Run Time:    4.3 | Load Time:    0.8 || F1:  54.15 | Prec:  63.07 | Rec:  47.44 || Ex/s: 455.19

===>  TRAIN Epoch 15 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 15 || Run Time:   29.1 | Load Time:    2.3 || F1:  98.94 | Prec:  98.17 | Rec:  99.71 || Ex/s: 218.83

===>  EVAL Epoch 15 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 15 || Run Time:    4.3 | Load Time:    0.8 || F1:  54.37 | Prec:  62.92 | Rec:  47.86 || Ex/s: 456.30
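This notebook trains for all 15 epochs and then reloads the best checkpoint. A common alternative, not used here and not claimed to be a DeepMatcher option, is patience-based early stopping: halt once validation F1 has failed to improve for `patience` consecutive epochs. The hybrid run above shows why a short patience is risky: validation F1 dropped at epoch 4 before peaking at epoch 7. `stop_epoch` is a hypothetical helper, and the scores are copied from the hybrid log.

```python
def stop_epoch(val_scores, patience):
    """Return (epoch at which training would halt, best epoch seen so far).

    Epochs are 1-based. If the patience limit is never hit, training runs
    to the last epoch.
    """
    best, best_epoch, waited = float("-inf"), 0, 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, best_epoch, waited = score, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_scores), best_epoch


# Validation F1 per epoch for the hybrid model, copied from the log above.
hybrid_val_f1 = [13.69, 48.85, 54.03, 41.21, 62.07, 63.27, 64.40, 63.36,
                 59.50, 55.29, 54.27, 52.53, 53.13, 54.15, 54.37]

print(stop_epoch(hybrid_val_f1, patience=1))  # → (4, 3)
print(stop_epoch(hybrid_val_f1, patience=3))  # → (10, 7)
```

With a patience of 1, training would stop at epoch 4 and keep the weaker epoch-3 model. With a patience of 3, it still finds the epoch-7 model while skipping the last five epochs.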



In [13]:
# Load the checkpoint with the best validation F1, saved before the model started overfitting. Keeping this checkpoint is a form of early stopping.
hybrid_model.load_state('hybrid_model.pth')

# Compute F1 on the test set
hybrid_model.run_eval(test)

===>  EVAL Epoch 7 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 7 || Run Time:    2.2 | Load Time:    0.7 || F1:  63.35 | Prec:  59.33 | Rec:  67.95 || Ex/s: 786.21



63.34661354581673
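For reference, the test-set F1 scores and trainable-parameter counts reported above for the four models can be collected and ranked in a few lines. These numbers come from a single run of this notebook on one dataset. With a different random seed, models whose scores are about a point apart (RNN, attention, hybrid) may well come out in a different order.

```python
# (test F1, trainable parameters) per model, copied from the outputs above.
results = {
    "sif": (58.17, 542402),
    "rnn": (62.34, 1762802),
    "attention": (61.02, 3429602),
    "hybrid": (63.35, 7133105),
}

# Rank models by test F1, highest first.
ranking = sorted(results, key=lambda name: results[name][0], reverse=True)
for name in ranking:
    f1, params = results[name]
    print(f"{name:<10} F1={f1:6.2f}  params={params:>9,}")
```

The hybrid model scores highest but has roughly 13 times as many parameters as the SIF model and takes about 9 times longer per training epoch, per the run times logged above.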