# Coreference resolution

This notebook is part of the lecture series at the Faculty Development Programme organised by the Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Visakhapatnam, in association with ShodhGuru Innovation and Research Labs, India. Specifically, it accompanies Tek Raj Chhetri's lecture, "Applications of Deep Neural Networks in Knowledge Graph Construction".

The demo uses F-COREF [2].
 
[2] __Otmazgin, S., Cattan, A. and Goldberg, Y., 2022. F-COREF: Fast, Accurate and Easy to Use Coreference Resolution. arXiv preprint arXiv:2209.04280.__

### Installation 

`pip install fastcoref` 

`pip install -U spacy`

We will use the small model, `en_core_web_sm`, in this notebook. If you want higher accuracy, use `en_core_web_trf` instead, as suggested by [spaCy](https://spacy.io/usage).
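
If you are not sure which of the two pipelines is available in your environment, a small check can pick one before calling `spacy.load`. The sketch below relies on spaCy models being installed as regular Python packages; the helper name `first_installed_model` is hypothetical and not part of spaCy.

```python
import importlib.util
from typing import Optional, Sequence


def first_installed_model(candidates: Sequence[str]) -> Optional[str]:
    """Return the first name in `candidates` that is an importable package, else None.

    Hypothetical helper: it only checks that the package can be found,
    it does not load the model.
    """
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Prefer the transformer pipeline and fall back to the small one.
model_name = first_installed_model(["en_core_web_trf", "en_core_web_sm"])
print(model_name)
```

The returned name can be passed to `spacy.load`. If it is `None`, download a model first, for example with `python -m spacy download en_core_web_sm`.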


Note: You also require Java.  


In [1]:
# !pip install -U spacy --quiet
# !python -m spacy download en_core_web_sm --quiet

In [1]:
import spacy
from fastcoref import spacy_component



***
### Loading models

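We load the following items:
- `en_core_web_sm`: the small English spaCy pipeline
- `biu-nlp/lingmess-coref`: the LingMess coreference model, added to the spaCy pipeline as the `fastcoref` component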

In [4]:
# load spacy
spacym = spacy.load('en_core_web_sm')
# add fastcoref to spacy pipeline
spacym.add_pipe(
   "fastcoref", 
   config={'model_architecture': 'LingMessCoref', 
           'model_path': 'biu-nlp/lingmess-coref', 'device': 'cpu'}
)

Some weights of the model checkpoint at biu-nlp/lingmess-coref were not used when initializing LingMessModel: ['longformer.embeddings.position_ids']
- This IS expected if you are initializing LingMessModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LingMessModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
04/05/2023 15:25:49 - INFO - 	 missing_keys: []
04/05/2023 15:25:49 - INFO - 	 unexpected_keys: []
04/05/2023 15:25:49 - INFO - 	 mismatched_keys: []
04/05/2023 15:25:49 - INFO - 	 error_msgs: []
04/05/2023 15:25:49 - INFO - 	 Model Parameters: 590.0M, Transformer: 434.6M, Coref head: 155.4M


<fastcoref.spacy_component.spacy_component.FastCorefResolver at 0x13ce7cf90>

In [7]:
sentence_to_fix = 'Sanju Tiwari is a researcher. She works at the Universidad Autonoma de Tamaulipas.'

In [8]:
from fastcoref import FCoref

In [9]:
model = FCoref()

04/05/2023 15:26:04 - INFO - 	 missing_keys: []
04/05/2023 15:26:04 - INFO - 	 unexpected_keys: []
04/05/2023 15:26:04 - INFO - 	 mismatched_keys: []
04/05/2023 15:26:04 - INFO - 	 error_msgs: []
04/05/2023 15:26:04 - INFO - 	 Model Parameters: 90.5M, Transformer: 82.1M, Coref head: 8.4M


In [10]:
prediction = model.predict(
    texts=[sentence_to_fix]
)

04/05/2023 15:26:04 - INFO - 	 Tokenize 1 inputs...
04/05/2023 15:26:04 - INFO - 	 ***** Running Inference on 1 texts *****         
Inference: 100%|██████████████████████████████████| 1/1 [00:00<00:00, 10.86it/s]


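## Related text
Each cluster groups the text spans (mentions) that refer to the same entity. With `as_strings=False`, the mentions are returned as `(start, end)` character offsets into the input text; by default, they are returned as strings.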

In [11]:
prediction[0].get_clusters(as_strings=False)

[[(0, 12), (30, 33)]]

In [12]:
sentence_to_fix[0:12]

'Sanju Tiwari'

In [13]:
sentence_to_fix[30:33]

'She'

In [14]:
prediction[0].get_clusters()

[['Sanju Tiwari', 'She']]
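
To make the character offsets concrete, here is a minimal sketch that rebuilds a resolved sentence from them by replacing each later mention with the first mention of its cluster. It is an illustration only, not fastcoref's own resolution logic: the helper `resolve_with_first_mention` is hypothetical, it assumes mentions do not overlap, and it does not handle possessives or clusters whose first mention is a pronoun.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def resolve_with_first_mention(text: str, clusters: List[List[Span]]) -> str:
    """Replace every later mention in a cluster with the cluster's first mention.

    Hypothetical helper for illustration; assumes non-overlapping spans.
    """
    replacements = []
    for cluster in clusters:
        ordered = sorted(cluster)  # order mentions by position in the text
        head_start, head_end = ordered[0]
        head_text = text[head_start:head_end]
        for start, end in ordered[1:]:
            replacements.append((start, end, head_text))

    # Apply the edits from right to left so earlier offsets stay valid.
    resolved = text
    for start, end, head_text in sorted(replacements, reverse=True):
        resolved = resolved[:start] + head_text + resolved[end:]
    return resolved


sentence = "Sanju Tiwari is a researcher. She works at the Universidad Autonoma de Tamaulipas."
clusters = [[(0, 12), (30, 33)]]  # offsets copied from the get_clusters(as_strings=False) output above
print(resolve_with_first_mention(sentence, clusters))
# prints: Sanju Tiwari is a researcher. Sanju Tiwari works at the Universidad Autonoma de Tamaulipas.
```

Working from right to left matters because each replacement can change the length of the string, which would shift the offsets of any mention that comes after it.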

## Using fastcoref with spaCy

In [15]:
spacym(sentence_to_fix, component_cfg={"fastcoref": {'resolve_text': True}})._.resolved_text

04/05/2023 15:26:12 - INFO - 	 Tokenize 1 inputs...
04/05/2023 15:26:12 - INFO - 	 ***** Running Inference on 1 texts *****         
Inference: 100%|██████████████████████████████████| 1/1 [00:04<00:00,  4.58s/it]


'Sanju Tiwari is a researcher. Sanju Tiwari works at the Universidad Autonoma de Tamaulipas.'

## Another Example

In [16]:
t = "Jane voted for Obama because he is aligned with her democratic values, she said."

In [17]:
spacym(t, component_cfg={"fastcoref": {'resolve_text': True}})._.resolved_text

04/05/2023 15:26:17 - INFO - 	 Tokenize 1 inputs...
04/05/2023 15:26:17 - INFO - 	 ***** Running Inference on 1 texts *****         
Inference: 100%|██████████████████████████████████| 1/1 [00:04<00:00,  4.37s/it]


"Jane voted for Obama because Obama is aligned with Jane's democratic values, Jane said."

In [18]:
t1 = "Jane stated that she voted for Obama because he shares democratic values that align with her."

In [19]:
spacym(t1, component_cfg={"fastcoref": {'resolve_text': True}})._.resolved_text

04/05/2023 15:26:22 - INFO - 	 Tokenize 1 inputs...
04/05/2023 15:26:22 - INFO - 	 ***** Running Inference on 1 texts *****         
Inference: 100%|██████████████████████████████████| 1/1 [00:04<00:00,  4.38s/it]


'Jane stated that Jane voted for Obama because Obama shares democratic values that align with Jane.'