# File Description

##### This file performs neural coreference resolution on any given text using NeuralCoref, a state-of-the-art coreference model from Hugging Face. Note that NeuralCoref is not compatible with the latest version of spaCy. This caused problems for us because the rest of our project relies on the latest version of spaCy. Our solution was to run coreference resolution on an older, compatible version of spaCy and then upgrade spaCy to the latest available version afterwards.

## Installation of Files

In [None]:
!pip uninstall -y spacy
!pip install --upgrade spacy==2.1.0
!pip install neuralcoref
!pip install urllib3==1.25.4
!python -m spacy download en_core_web_sm

## Implementation of Code

##### In the following cell, we define a function that performs neural coreference resolution on a given text. It takes a string as input and returns a coreference-resolved string as output. The results of this function have been manually checked using the text-compare website.

In [None]:
import spacy
import neuralcoref

# Load SpaCy
nlp = spacy.load('en_core_web_sm')
# Add neural coref to SpaCy's pipe
neuralcoref.add_to_pipe(nlp)

def coref_resolution(text):
    """Function that executes coreference resolution on a given text"""
    doc = nlp(text)
    # collect each token's text together with its trailing whitespace
    tok_list = [token.text_with_ws for token in doc]
    for cluster in doc._.coref_clusters:
        # get tokens from representative cluster name
        cluster_main_words = set(cluster.main.text.split(' '))
        for coref in cluster:
            if coref != cluster.main:  # skip the representative mention of the cluster
                # Replace a mention only if its text differs from the representative text
                # and the two share no words. This handles nested coreference scenarios.
                if coref.text != cluster.main.text and not set(coref.text.split(' ')) & cluster_main_words:
                    tok_list[coref.start] = cluster.main.text + \
                        doc[coref.end-1].whitespace_
                    for i in range(coref.start+1, coref.end):
                        tok_list[i] = ""

    return "".join(tok_list)
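
The replacement step inside `coref_resolution` can be illustrated without spaCy or NeuralCoref. The sketch below is not the library's API: `replace_mentions`, the token list, and the `(start, end)` span tuples are hypothetical stand-ins for the spaCy document and the NeuralCoref clusters, chosen only to show how a mention is overwritten by the representative text while the trailing whitespace is preserved.

```python
def replace_mentions(tokens, clusters):
    """Toy version of the replacement loop (hypothetical helper, not NeuralCoref).

    tokens:   list of token strings, each including its trailing whitespace
    clusters: list of (main, mentions) pairs; main and each mention are
              (start, end) token index spans with an exclusive end
    """
    out = list(tokens)
    for main, mentions in clusters:
        main_text = "".join(tokens[main[0]:main[1]]).strip()
        main_words = set(main_text.split(" "))
        for start, end in mentions:
            if (start, end) == main:
                continue  # the representative mention stays as it is
            mention_text = "".join(tokens[start:end]).strip()
            if set(mention_text.split(" ")) & main_words:
                continue  # mention shares a word with the representative: skip
            # keep the whitespace that followed the last token of the mention
            last = tokens[end - 1]
            trailing = last[len(last.rstrip()):]
            out[start] = main_text + trailing
            # blank out the remaining tokens of a multi-token mention
            for i in range(start + 1, end):
                out[i] = ""
    return "".join(out)


tokens = ["PTCL ", "is ", "a ", "company", ". ", "It ", "provides ", "internet", "."]
clusters = [((0, 1), [(0, 1), (5, 6)])]
print(replace_mentions(tokens, clusters))
```

Here the mention "It" at token index 5 is replaced by "PTCL", and the space that followed "It" is kept so the sentence spacing is unchanged.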

##### Execute the function and generate a clean, preprocessed file. For this demonstration, the output file is named "Preprocessed_PTCL.txt"; in the future, the file names could instead be passed as arguments.

In [None]:
# Read the raw input text; the with-block closes the file automatically
with open("/content/PTCL.txt", "r") as f:
    sentence = f.read()
# sentence="PTCL is the national telecommunication company in Pakistan. It provides telephone and internet services nationwide and is the backbone for the country's telecommunication infrastructure despite the arrival of a dozen other telecommunication corporations, including Telenor GSM and China Mobile. "
store_text = coref_resolution(sentence)
# Write the resolved text; "w" overwrites earlier output so reruns do not append duplicates
with open("Preprocessed_PTCL.txt", "w") as f:
    f.write(store_text)
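
Passing the file names as arguments, as mentioned above, might look like the following minimal sketch. The name `preprocess_file` and its `resolver` parameter are assumptions introduced for illustration; `resolver` is any function that maps a string to a string, such as `coref_resolution`. The UTF-8 encoding is also an assumption about the input file.

```python
from pathlib import Path


def preprocess_file(input_path, output_path, resolver):
    """Read input_path, apply resolver to its text, and write the result.

    Hypothetical helper: resolver is any callable taking and returning a string.
    The output file is overwritten if it already exists.
    """
    text = Path(input_path).read_text(encoding="utf-8")
    resolved = resolver(text)
    Path(output_path).write_text(resolved, encoding="utf-8")
    return resolved


# Example call with the function defined earlier in this file:
# preprocess_file("/content/PTCL.txt", "Preprocessed_PTCL.txt", coref_resolution)
```

Taking the resolver as a parameter keeps the file handling independent of spaCy, so it can be checked with a simple stand-in function before the models are installed.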