## Code for training a BERT classifier

This notebook contains the code used to train the BERT classifiers reported in Table 5 of the Living Machines paper (COLING 2020).

The user can select one of three training sets (in the variable `corpus`, under `Load data`):
* `stories/` to train the classifier used for the _Stories_ dataset,
* `combined_animacy/` to train the classifier used for the _19thC Machines_ animacy dataset, or
* `combined_humanness/` to train the classifier used for the _19thC Machines_ humanness dataset.

Three classifiers will be trained:
* **targetExpression:** using the target expression alone as input (_targetExp_ in Table 5).
* **context3w:** using the target expression plus 3 tokens to the left and right (_targetExp + ctxt_ in Table 5).
* **context3wmasked:** using the masked target expression plus 3 tokens to the left and right (_targetExp + ctxt_ in Table 5).
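
To make the three input types concrete, here is a minimal sketch of how they could be derived from a tokenised sentence. It is an illustration only: the function `build_inputs`, whitespace tokenisation, and the use of a single `[MASK]` token for the whole target expression are assumptions, and the preprocessing that produced the training files may differ.

```python
def build_inputs(tokens, start, end, window=3, mask_token="[MASK]"):
    """Return the three input variants for the target span tokens[start:end].

    Hypothetical helper: not part of this repository.
    """
    left = tokens[max(0, start - window):start]   # up to `window` tokens to the left
    target = tokens[start:end]                    # the target expression itself
    right = tokens[end:end + window]              # up to `window` tokens to the right
    return {
        "targetExpression": " ".join(target),
        "context3w": " ".join(left + target + right),
        "context3wmasked": " ".join(left + [mask_token] + right),
    }


tokens = "the engine was breathing heavily as it climbed the hill".split()
example = build_inputs(tokens, start=1, end=2)
for name, text in example.items():
    print(name, "->", text)
```

When the target sits near the start or end of a sentence, the window is simply truncated, so fewer than three context tokens may appear on that side.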

In [None]:
import pickle
from bert_sklearn import BertClassifier
from bert_sklearn import load_model
import pandas as pd
import pathlib
from tools import processing

#### Load data

In [None]:
corpus = "stories/" # Options: "stories/", "combined_animacy/" or "combined_humanness/"
dataset_df = pd.read_pickle("../data/" + corpus + "train.pkl")
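
Because `corpus` is concatenated directly into the file path, a typo or a missing trailing slash only surfaces as a file-not-found error. The sketch below shows one way to validate the choice first. The helper `train_path` and the constant `VALID_CORPORA` are hypothetical names introduced for illustration; the directory layout is taken from the cell above.

```python
from pathlib import Path

# The three training sets described at the top of this notebook.
VALID_CORPORA = ("stories/", "combined_animacy/", "combined_humanness/")


def train_path(corpus, data_dir="../data/"):
    """Return the path to train.pkl for a known corpus, or raise ValueError."""
    if corpus not in VALID_CORPORA:
        raise ValueError(
            f"Unknown corpus {corpus!r}; expected one of {VALID_CORPORA}"
        )
    # Path joins the parts and drops the redundant trailing slashes.
    return Path(data_dir) / corpus / "train.pkl"


print(train_path("stories/").as_posix())
```

The returned path can be passed to `pd.read_pickle` in place of the concatenated string.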

#### Train BERT classifiers

In [None]:
dFolders = {"targetExpression": "../models/classifiers/" + corpus + "targetExpression/",
            "context3wmasked": "../models/classifiers/" + corpus + "context3wmasked/",
            "context3w": "../models/classifiers/" + corpus + "context3w/"}

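# Train one classifier per input type; each key is also a column in dataset_df.
for col in dFolders:

    # Fresh classifier with the bert_sklearn default settings
    model = BertClassifier()

    # Create the output folder if it does not exist yet
    pathlib.Path(dFolders[col]).mkdir(parents=True, exist_ok=True)

    X = dataset_df[col].tolist()         # input texts for this input type
    y = dataset_df["animated"].tolist()  # gold labels

    model.fit(X, y)

    # Save model to disk (the folder path already ends with "/")
    savefile = dFolders[col] + "bert.bin"
    model.save(savefile)
    print("DONE")
    print()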