# Example for easy reproduction

**Important notes before running the script**
- To run the script, make sure you have set up all the requirements listed in the prerequisites section of README.md.
- Fill in the directory of Stanford CoreNLP in `crest.py` (marked with TODO).
- Launch the Stanford CoreNLP server on port 9001, and make sure the port is not already occupied by another process.
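The port requirement can be checked from Python before running the notebook. The following is a minimal sketch using only the standard library; the helper name `is_port_open` and the 1-second timeout are illustrative choices, not part of this repository. It only reports whether something accepts TCP connections on the port, not whether that process is actually Stanford CoreNLP.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection raises OSError (e.g. connection refused, timeout)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical usage: 9001 is the port named in the notes above.
    if is_port_open("localhost", 9001):
        print("Something is listening on port 9001.")
    else:
        print("Nothing is listening on port 9001.")
```

Before launching the server, the check should report that nothing is listening (the port is free); after launching it, the check should report that the port is in use.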

## 1. Read data from CoNLL12

In [1]:
import os
import sys
import random

# WARNING! Before running the script, please make sure you have filled in the 'TODO' in crest.py
from main import *
from utils import readCoNLL12

In [2]:
oriSentencePairs = readCoNLL12()

sample_num = 2  # change this number to use more source inputs
oriSentencePairs = random.sample(oriSentencePairs, k=sample_num)

print("Sampled {} sentences as source inputs.".format(sample_num))

Read 2894 lines from conll12/dev.english.v4_gold_conll.sen.json.
Sampled 2 sentences as source inputs.
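The `random.sample` call above is not seeded, so a rerun will generally pick different source sentences than the ones shown in this walkthrough. If repeatable sampling is wanted, one option is a dedicated seeded generator. This is a sketch under that assumption; the helper name `sample_reproducibly` and the default seed are hypothetical and not part of the repository.

```python
import random


def sample_reproducibly(items, k, seed=0):
    """Sample k items using a private, seeded random generator."""
    # A separate Random instance avoids changing the global random state
    # that other parts of the pipeline may rely on.
    rng = random.Random(seed)
    return rng.sample(items, k)


if __name__ == "__main__":
    sentences = ["sentence {}".format(i) for i in range(10)]
    # The same seed yields the same sample on every run.
    print(sample_reproducibly(sentences, k=2, seed=0))
```

In the notebook, `oriSentencePairs = sample_reproducibly(oriSentencePairs, k=sample_num)` would then replace the unseeded call.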


## 2. Test generation

### Preparation - Make temporary directory for output

In [3]:
OutputDir = '../TempOutput'
os.makedirs(OutputDir, exist_ok=True)


### Preparation - Set up NLP pipeline

In [4]:
# load nlp
nlp = setup_nlp_pipeline()

Some weights of the model checkpoint at nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large were not used when initializing XLMRobertaModel: ['lm_head.dense.bias', 'lm_head.layer_norm.bias', 'lm_head.bias', 'lm_head.dense.weight', 'lm_head.layer_norm.weight']
- This IS expected if you are initializing XLMRobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of XLMRobertaModel were not initialized from the model checkpoint at nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large and are newly initialized: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
You should probably TRAIN this model on a down-st

### Generation - Crest

In [5]:
genMethodName = 'Crest'
output_tsv_fn = os.path.join(OutputDir, '{}.tsv'.format(genMethodName))
Coref_testing(nlp, oriSentencePairs, genMethodName=genMethodName, output_fn=output_tsv_fn)


<Pair 0>
Origin  :  Students, who have enough money to keep several streets of shops going, carelessly wheel their motorcycles through the narrow streets, their minds elsewhere, thinking of youthful fun or romance.
Generate:  Students, who have enough money to keep several streets of shops going, heedlessly wheel their motorcycles through the narrow streets, their minds elsewhere, thinking of youthful fun or romance.
Replace: carelessly -> heedlessly
oriConsistent False
> Origin sentence's Coref:
[[[0, 12], [16, 16], [23, 23]]]
- Cluster 1: Students, who have enough money to keep several streets of shops going, their, their

> Generated sentence's Coref:
[[[0, 12], [16, 16], [23, 23]]]
- Cluster 1: Students, who have enough money to keep several streets of shops going, their, their
[Pass]
Origin  : Precision: 33.33 | Recall: 33.33 | F1: 33.33
Generate  : Precision: 100.00 | Recall: 100.00 | F1: 100.00

<Pair 1>
Origin  :  Students, who have enough money to keep several streets of shop

### Generation - SIT

In [6]:
genMethodName = 'SIT'
output_tsv_fn = os.path.join(OutputDir, '{}.tsv'.format(genMethodName))
Coref_testing(nlp, oriSentencePairs, genMethodName=genMethodName, output_fn=output_tsv_fn)

skip a sentence. unknown token is 'carelessly'

<Pair 0>
Origin  :  CIA officials say they never received it.
Generate:  CIA others say they never received it.
Replace: officials -> others
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[[[0, 0], [3, 3]]]
- Cluster 1: CIA, they
[Bug found!] Inconsistent!
Origin  : Precision: 0.00 | Recall: 0.00 | F1: 0.00
Generate  : Precision: 0.00 | Recall: 0.00 | F1: 0.00

<Pair 1>
Origin  :  CIA officials say they never received it.
Generate:  CIA people say they never received it.
Replace: officials -> people
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[[[0, 0], [3, 3]]]
- Cluster 1: CIA, they
[Bug found!] Inconsistent!
Origin  : Precision: 0.00 | Recall: 0.00 | F1: 0.00
Generate  : Precision: 0.00 | Recall: 0.00 | F1: 0.00



Summary: 
Number of origin sentence: 2 | Failed: 0
Number of generated pairs: 2 | Failed: 2
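The coreference output printed above lists each cluster as `[start, end]` pairs. Judging from the printed cluster text, these appear to be inclusive token indices into the tokenized sentence. The sketch below shows how such spans map back to mention strings; the helper name `cluster_mentions` and the hand-written token list are assumptions for illustration, not the repository's own code.

```python
def cluster_mentions(tokens, clusters):
    """Turn clusters of inclusive [start, end] token spans into mention strings."""
    return [
        # end is inclusive, so the slice runs to end + 1.
        [" ".join(tokens[start:end + 1]) for start, end in cluster]
        for cluster in clusters
    ]


if __name__ == "__main__":
    # Tokens of the generated sentence from <Pair 1>, split by hand.
    tokens = ["CIA", "people", "say", "they", "never", "received", "it", "."]
    clusters = [[[0, 0], [3, 3]]]
    print(cluster_mentions(tokens, clusters))  # [['CIA', 'they']]
```

This matches the `- Cluster 1: CIA, they` line in the output above.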


### Generation - PatInv

In [10]:
genMethodName = 'PatInv'
output_tsv_fn = os.path.join(OutputDir, '{}.tsv'.format(genMethodName))
Coref_testing(nlp, oriSentencePairs, genMethodName=genMethodName, output_fn=output_tsv_fn)

skip a sentence. unknown token is 'carelessly'
skip a sentence. unknown token is 'pirate'
skip a sentence. unknown token is 'germans'
skip a sentence. unknown token is 'italians'
skip a sentence. unknown token is 'their'

<Pair 0>
Origin  :  CIA officials say they never received it.
Generate:  CIA people say they never received it.
Replace: officials -> people
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[[[0, 0], [3, 3]]]
- Cluster 1: CIA, they
[Bug found!] Inconsistent!
Origin  : Precision: 0.00 | Recall: 0.00 | F1: 0.00
Generate  : Precision: 0.00 | Recall: 0.00 | F1: 0.00

<Pair 1>
Origin  :  CIA officials say they never received it.
Generate:  CIA things say they never received it.
Replace: officials -> things
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[[[0, 0], [3, 3]]]
- Cluster 1: CIA, they
[Bug found!] Inconsistent!
Origin  : Precision: 0.00 | Recall: 0.00 | F1: 0.00
Generate  : Precision: 0.00 | Recall:

### Generation - CAT

In [11]:
genMethodName = 'CAT'
output_tsv_fn = os.path.join(OutputDir, '{}.tsv'.format(genMethodName))
Coref_testing(nlp, oriSentencePairs, genMethodName=genMethodName, output_fn=output_tsv_fn)

Some weights of the model checkpoint at bert-large-cased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at bert-large-cased were not used when initializing BertModel: ['cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.


<Pair 5>
Origin  :  CIA officials say they never received it.
Generate:  CIA sources say they never received it .
Replace: officials -> sources
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[[[0, 0], [3, 3]]]
- Cluster 1: CIA, they
[Bug found!] Inconsistent!
Origin  : Precision: 0.00 | Recall: 0.00 | F1: 0.00
Generate  : Precision: 0.00 | Recall: 0.00 | F1: 0.00

<Pair 6>
Origin  :  CIA officials say they never received it.
Generate:  CIA officials claim they never received it .
Replace: say -> claim
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[]
[Bug found!] Inconsistent!
Origin  : Precision: 0.00 | Recall: 0.00 | F1: 0.00
Generate  : Precision: 0.00 | Recall: 0.00 | F1: 0.00

<Pair 7>
Origin  :  CIA officials say they never received it.
Generate:  CIA officials said they never received it .
Replace: say -> said
oriConsistent False
> Origin sentence's Coref:
[]

> Generated sentence's Coref:
[]
[Bug found!] Incon


<Pair 0>
Origin  :  Students, who have enough money to keep several streets of shops going, carelessly wheel their motorcycles through the narrow streets, their minds elsewhere, thinking of youthful fun or romance.
Generate:  Students , who have enough money to keep several streets of shops going , carelessly wheel their motorcycles along the narrow streets , their minds elsewhere , thinking of youthful fun or romance .
Replace: through -> along
oriConsistent False
> Origin sentence's Coref:
[[[0, 12], [16, 16], [23, 23]]]
- Cluster 1: Students, who have enough money to keep several streets of shops going, their, their

> Generated sentence's Coref:
[[[0, 12], [16, 16], [23, 23]]]
- Cluster 1: Students , who have enough money to keep several streets of shops going, their, their
[Pass]
Origin  : Precision: 33.33 | Recall: 33.33 | F1: 33.33
Generate  : Precision: 100.00 | Recall: 100.00 | F1: 100.00

<Pair 1>
Origin  :  Students, who have enough money to keep several streets of shops go

## 3. Calculate semantic similarity and naturalness

In [12]:
from codebook.Eval_Sim_Nat import AeonScorer

In [16]:

scorer = AeonScorer(batch_size=8, masked_lm='bert-base-uncased', embed_lm='princeton-nlp/sup-simcse-bert-base-uncased', verbose=False)

source = 'Students, who have enough money to keep several streets of shops going, carelessly wheel their motorcycles through the narrow streets, their minds elsewhere, thinking of youthful fun or romance.'
follow = 'Students, who have enough money to keep several streets of shops going, heedlessly wheel their motorcycles through the narrow streets, their minds elsewhere, thinking of youthful fun or romance.'

result = scorer.compute(source, follow)

for sim in ['semSim', 'natDiff']:
    print('{}: {}'.format(sim, result[sim][0]))


2022-09-05 15:25:58 [INFO] Using cpu
2022-09-05 15:25:58 [INFO] Using batch size: 8


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


2022-09-05 15:26:17 [INFO] Masked LM model: bert-base-uncased
2022-09-05 15:26:31 [INFO] Embedding model: princeton-nlp/sup-simcse-bert-base-uncased
2022-09-05 15:26:31 [INFO] Calculating scores


2022-09-05 15:26:39 [INFO] Finish text number 0
semSim: 0.9418442249298096
natDiff: 0.021671870750692268
