# Conditional Random Fields

Conditional Random Fields (CRFs) are undirected graphical models used for sequence labeling and structured prediction. Unlike Hidden Markov Models (HMMs), which are generative, CRFs are discriminative: they model the label sequence conditioned directly on the input, so their features can draw on arbitrary, overlapping, and non-local properties of the input sequence. In NLP, CRFs are commonly used for tasks such as named entity recognition, part-of-speech tagging, and syntactic analysis.

The core idea of a CRF is to model the conditional probability distribution of the output label sequence given the input sequence. The model is trained by maximizing the conditional log-likelihood of the training data, typically with a gradient-based optimizer such as gradient descent or L-BFGS.
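For the linear-chain case, this conditional distribution and the training objective are commonly written as below. The notation is an assumption introduced here for illustration: $f_k$ are feature functions, $\lambda_k$ their weights, $Z(x)$ the partition function, and $(x^{(n)}, y^{(n)})$ the $N$ training pairs.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)

\mathcal{L}(\lambda) = \sum_{n=1}^{N} \log p\left( y^{(n)} \mid x^{(n)}; \lambda \right)
```

Training searches for the weights $\lambda$ that maximize $\mathcal{L}(\lambda)$, usually with a regularization penalty added.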

CRFs have similarities with Hidden Markov Models (HMMs), but they differ in several key aspects:

**Label dependency**:

- HMMs assume that each hidden state (label) depends only on the previous state (the Markov property) and that each observation depends only on the current state.

- Linear-chain CRFs keep the Markov assumption between adjacent labels, but the score of each label and each transition may depend on the entire input sequence rather than on the current observation alone. This lets CRFs use evidence from distant parts of the input.

**Probability calculation**:

- HMMs are generative: the joint probability of an observation sequence and its label sequence is the product of locally normalized transition and emission probabilities.

- CRFs use a global normalization factor (also known as the partition function), a sum over all possible label sequences, to turn the score of a label sequence into its conditional probability given the input sequence.
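The global normalization can be made concrete with a small sketch. It assumes a hypothetical 3-token input with 2 labels and hand-picked scores (`emit` and `trans` are illustrative names, not part of any library). It computes the log partition function twice, once by enumerating every label sequence and once with the forward recursion, and then checks that the resulting probabilities sum to 1.

```python
import itertools
import math

# Hypothetical toy scores for a 3-token input and 2 labels (0 and 1).
# emit[t][y]: score of label y at position t (may depend on the whole input).
# trans[a][b]: score of moving from label a to label b.
emit = [[1.0, 0.2], [0.3, 1.5], [0.8, 0.4]]
trans = [[0.5, -0.2], [-0.1, 0.7]]
labels = range(2)


def sequence_score(ys):
    """Unnormalized log-score of one complete label sequence."""
    score = emit[0][ys[0]]
    for t in range(1, len(ys)):
        score += trans[ys[t - 1]][ys[t]] + emit[t][ys[t]]
    return score


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition_brute_force():
    """Sum exp(score) over every possible label sequence."""
    all_seqs = itertools.product(labels, repeat=len(emit))
    return log_sum_exp([sequence_score(ys) for ys in all_seqs])


def log_partition_forward():
    """Same quantity via the forward recursion, in O(T * L^2) time."""
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][b] + log_sum_exp([alpha[a] + trans[a][b] for a in labels])
            for b in labels
        ]
    return log_sum_exp(alpha)


log_z = log_partition_forward()
probs = {
    ys: math.exp(sequence_score(ys) - log_z)
    for ys in itertools.product(labels, repeat=len(emit))
}
print("log Z (forward):    ", log_z)
print("log Z (brute force):", log_partition_brute_force())
print("sum of probabilities:", round(sum(probs.values()), 6))
```

Enumeration grows exponentially with sequence length, which is why practical implementations rely on the forward recursion.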

**Training and inference**:

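- In HMMs, decoding the most likely label sequence is done efficiently with the Viterbi algorithm, and supervised training reduces to counting transitions and emissions (unsupervised training uses the Baum-Welch algorithm).

- CRF inference uses the same dynamic programming (Viterbi for decoding, forward-backward for marginals and the partition function), but training is more expensive. There is no closed-form solution, so the weights are fit with iterative optimizers such as gradient descent or quasi-Newton methods (for example L-BFGS), and each iteration has to run forward-backward over the training sequences.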
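Viterbi decoding for a linear-chain model can be sketched in a few lines. This is a minimal illustration under assumed inputs, not the implementation used by any particular library: `emit[t][y]` is a hypothetical score for label `y` at position `t`, and `trans[a][b]` a hypothetical score for moving from label `a` to label `b`.

```python
def viterbi(emit, trans):
    """Return the highest-scoring label sequence and its score."""
    n_labels = len(emit[0])
    best = list(emit[0])  # best[y]: best score of any path ending in label y
    backpointers = []
    for t in range(1, len(emit)):
        new_best, pointers = [], []
        for b in range(n_labels):
            # Choose the best previous label for current label b.
            a_star = max(range(n_labels), key=lambda a: best[a] + trans[a][b])
            pointers.append(a_star)
            new_best.append(best[a_star] + trans[a_star][b] + emit[t][b])
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full path.
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical toy scores: 3 positions, 2 labels.
emit = [[1.0, 0.2], [0.3, 1.5], [0.8, 0.4]]
trans = [[0.5, -0.2], [-0.1, 0.7]]
path, score = viterbi(emit, trans)
print(path, score)
```

Note that the first position on its own prefers label 0, yet the best full sequence starts with label 1, because the transition scores reward staying in label 1. Decoding optimizes the whole sequence rather than each position separately.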

The main components of a CRF are:

`Feature function`: defines a relationship between the input sequence and the label sequence, for example an indicator that the current word is capitalized and the current label is `NNP`. CRFs capture the relationship between input and output through these functions.

`Weight`: the parameter associated with each feature function, learned from the training data.

`Partition function`: a normalization factor that ensures the probabilities of all possible label sequences sum to 1.
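The three components can be seen working together in a toy sketch. Everything here is assumed for illustration: the two-tag label set, the three hand-written feature functions, and the hand-picked weights (in a real CRF the weights are learned). The partition function is computed by plain enumeration, which is only feasible for very short inputs.

```python
import itertools
import math

LABELS = ["NNP", "VBZ"]  # hypothetical two-tag label set


# Feature functions: each inspects (previous label, current label, input, position)
# and returns 1.0 when it fires, otherwise 0.0.
def f_title_is_nnp(prev, cur, words, t):
    return 1.0 if words[t].istitle() and cur == "NNP" else 0.0


def f_ends_s_is_vbz(prev, cur, words, t):
    return 1.0 if words[t].endswith("s") and cur == "VBZ" else 0.0


def f_nnp_then_vbz(prev, cur, words, t):
    return 1.0 if prev == "NNP" and cur == "VBZ" else 0.0


# Weights: one per feature function (hand-picked here, normally learned).
WEIGHTED_FEATURES = [
    (2.0, f_title_is_nnp),
    (1.5, f_ends_s_is_vbz),
    (1.0, f_nnp_then_vbz),
]


def score(words, tags):
    """Weighted sum of all feature functions over all positions."""
    total = 0.0
    for t in range(len(words)):
        prev = tags[t - 1] if t > 0 else None
        for weight, feature in WEIGHTED_FEATURES:
            total += weight * feature(prev, tags[t], words, t)
    return total


def probability(words, tags):
    """exp(score) divided by the partition function Z(words)."""
    candidates = itertools.product(LABELS, repeat=len(words))
    z = sum(math.exp(score(words, list(c))) for c in candidates)
    return math.exp(score(words, tags)) / z


words = ["Jane", "works"]
for tags in itertools.product(LABELS, repeat=len(words)):
    print(tags, round(probability(words, list(tags)), 3))
```

With these weights, the tagging `('NNP', 'VBZ')` receives the highest probability because all three features fire for it.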

In [17]:
train_sents = [
    [('John', 'NNP'), ('is', 'VBZ'), ('from', 'IN'), ('New', 'NNP'), ('York', 'NNP')],
    [('Alice', 'NNP'), ('loves', 'VBZ'), ('to', 'TO'), ('read', 'VB'), ('books', 'NNS')],
    [('The', 'DT'), ('Eiffel', 'NNP'), ('Tower', 'NNP'), ('is', 'VBZ'), ('in', 'IN'), ('Paris', 'NNP')],
    [('Microsoft', 'NNP'), ('released', 'VBD'), ('a', 'DT'), ('new', 'JJ'), ('product', 'NN')],
    [('He', 'PRP'), ('visited', 'VBD'), ('the', 'DT'), ('Great', 'NNP'), ('Wall', 'NNP'), ('of', 'IN'), ('China', 'NNP')],
    [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')],
    [('Artificial', 'JJ'), ('Intelligence', 'NNP'), ('is', 'VBZ'), ('the', 'DT'), ('future', 'NN')],
    [('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('great', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
]

test_sents = [
    [('Jane', 'NNP'), ('works', 'VBZ'), ('at', 'IN'), ('Google', 'NNP')],
    [('They', 'PRP'), ('traveled', 'VBD'), ('to', 'TO'), ('San', 'NNP'), ('Francisco', 'NNP')],
    [('He', 'PRP'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('person', 'NN')],
    [('The', 'DT'), ('new', 'JJ'), ('phone', 'NN'), ('was', 'VBD'), ('released', 'VBN'), ('yesterday', 'NN')]
]

print(train_sents)
print(test_sents)


In [22]:
import pycrfsuite

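# Feature function: builds the list of string features for token i of a sentence.
# Note: the 'postag' features are taken from the gold tags stored in `sent`,
# which are the same labels the model is trained to predict.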
def word2features(sent, i):
    word = sent[i][0]
    postag = sent[i][1]
    features = [
        'bias',
        'word.lower=' + word.lower(),
        'word[-3:]=' + word[-3:],
        'word[-2:]=' + word[-2:],
        'word.isupper=%s' % word.isupper(),
        'word.istitle=%s' % word.istitle(),
        'word.isdigit=%s' % word.isdigit(),
        'postag=' + postag,
    ]
    if i > 0:
        word1 = sent[i-1][0]
        postag1 = sent[i-1][1]
        features.extend([
            '-1:word.lower=' + word1.lower(),
            '-1:postag=' + postag1,
            '-1:word.istitle=%s' % word1.istitle(),
            '-1:word.isupper=%s' % word1.isupper(),
        ])
    else:
        features.append('BOS')
        
    if i < len(sent)-1:
        word1 = sent[i+1][0]
        postag1 = sent[i+1][1]
        features.extend([
            '+1:word.lower=' + word1.lower(),
            '+1:postag=' + postag1,
            '+1:word.istitle=%s' % word1.istitle(),
            '+1:word.isupper=%s' % word1.isupper(),
        ])
    else:
        features.append('EOS')
    return features

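# Convert a sentence into a list of per-token feature lists
def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]

# Extract the gold label (POS tag) of each token
def sent2labels(sent):
    return [postag for token, postag in sent]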

# train data
X_train = [sent2features(s) for s in train_sents]
y_train = [sent2labels(s) for s in train_sents]

# test data
X_test = [sent2features(s) for s in test_sents]
y_test = [sent2labels(s) for s in test_sents]

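# Train the CRF
trainer = pycrfsuite.Trainer(verbose=False)
for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)
trainer.set_params({
    'c1': 1.0,    # coefficient for L1 regularization
    'c2': 1e-3,   # coefficient for L2 regularization
    'max_iterations': 50,  # upper bound on training iterations
    'feature.possible_transitions': True  # also generate transition features for label pairs not seen in training
})
trainer.train('example.crfsuite')  # writes the trained model to this file

# Predict tags for the test sentences with the trained model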
tagger = pycrfsuite.Tagger()
tagger.open('example.crfsuite')
y_pred = [tagger.tag(xseq) for xseq in X_test]

for sent, true_labels, pred_labels in zip(test_sents, y_test, y_pred):
    print("Sentence:", [token for token, postag in sent])
    print("True POS tags:", true_labels)
    print("Predicted POS tags:", pred_labels)
    print()

Sentence: ['Jane', 'works', 'at', 'Google']
True POS tags: ['NNP', 'VBZ', 'IN', 'NNP']
Predicted POS tags: ['NNP', 'VBZ', 'IN', 'NNP']

Sentence: ['They', 'traveled', 'to', 'San', 'Francisco']
True POS tags: ['PRP', 'VBD', 'TO', 'NNP', 'NNP']
Predicted POS tags: ['NNP', 'VBD', 'DT', 'NNP', 'NNP']

Sentence: ['He', 'is', 'a', 'good', 'person']
True POS tags: ['PRP', 'VBZ', 'DT', 'JJ', 'NN']
Predicted POS tags: ['NNP', 'VBZ', 'DT', 'JJ', 'NN']

Sentence: ['The', 'new', 'phone', 'was', 'released', 'yesterday']
True POS tags: ['DT', 'JJ', 'NN', 'VBD', 'VBN', 'NN']
Predicted POS tags: ['DT', 'JJ', 'NN', 'VBD', 'VBD', 'NN']
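
To summarize the predictions above, the following sketch computes token-level accuracy and lists the confusions. The tag lists are copied by hand from the printed output so the block runs on its own; `token_accuracy` and `confusions` are helper names introduced here for illustration.

```python
from collections import Counter

# True and predicted tags copied from the output above.
y_test = [
    ['NNP', 'VBZ', 'IN', 'NNP'],
    ['PRP', 'VBD', 'TO', 'NNP', 'NNP'],
    ['PRP', 'VBZ', 'DT', 'JJ', 'NN'],
    ['DT', 'JJ', 'NN', 'VBD', 'VBN', 'NN'],
]
y_pred = [
    ['NNP', 'VBZ', 'IN', 'NNP'],
    ['NNP', 'VBD', 'DT', 'NNP', 'NNP'],
    ['NNP', 'VBZ', 'DT', 'JJ', 'NN'],
    ['DT', 'JJ', 'NN', 'VBD', 'VBD', 'NN'],
]


def token_pairs(true_seqs, pred_seqs):
    """Flatten the sequences into (true_tag, predicted_tag) pairs."""
    return [
        (t, p)
        for ts, ps in zip(true_seqs, pred_seqs)
        for t, p in zip(ts, ps)
    ]


def token_accuracy(true_seqs, pred_seqs):
    """Fraction of tokens whose predicted tag equals the true tag."""
    pairs = token_pairs(true_seqs, pred_seqs)
    return sum(t == p for t, p in pairs) / len(pairs)


def confusions(true_seqs, pred_seqs):
    """Count each (true_tag, predicted_tag) pair that disagrees."""
    return Counter((t, p) for t, p in token_pairs(true_seqs, pred_seqs) if t != p)


print("Token accuracy:", token_accuracy(y_test, y_pred))
for (true_tag, pred_tag), count in confusions(y_test, y_pred).most_common():
    print(f"{true_tag} predicted as {pred_tag}: {count}")
```

The model gets 16 of 20 tokens right. The errors involve tags that are rare or missing in `train_sents`: `PRP` and `TO` each occur only once, and `VBN` never occurs, so the model has no way to predict it.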

