![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/public/COREFERENCE_RESOLUTION.ipynb)

# **COREFERENCE_RESOLUTION**

### **Colab Setup and Start Spark Session**

In [None]:
# Install PySpark and Spark NLP
! pip install -q pyspark==3.3.0 spark-nlp==4.2.8

In [2]:
import pandas as pd
import numpy as np
import random
import json

import sparknlp
import pyspark.sql.functions as F

from sparknlp.pretrained import PretrainedPipeline
from pyspark.sql.types import StringType, IntegerType
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from sparknlp.annotator import *
from sparknlp.base import *


spark = sparknlp.start()
print("Spark NLP Version :", sparknlp.version())
spark

Spark NLP Version : 4.2.8


# **`spanbert_base_coref`**

## 🔎Define Spark NLP pipeline

In [3]:
# Wrap the raw "text" column into Spark NLP document annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split each document into sentences
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentences")

# Split each sentence into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentences"])\
    .setOutputCol("tokens")

# Pretrained SpanBERT model that links each mention to its antecedent
corefResolution = SpanBertCorefModel.pretrained("spanbert_base_coref")\
    .setInputCols(["sentences", "tokens"])\
    .setOutputCol("corefs")

pipeline = Pipeline(
    stages=[
        document_assembler,
        sentence_detector,
        tokenizer,
        corefResolution])


spanbert_base_coref download started this may take some time.
Approximate size to download 540.1 MB
[OK!]
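
As the output further down shows, each `corefs` annotation stores its antecedent in `metadata` (`head`, `head.begin`, `head.end`), and `ROOT` marks the first mention of a chain. The block below is a minimal sketch of how those mentions could be grouped by antecedent once the rows are collected into plain Python dicts. The `group_mentions` helper and the dict layout are hypothetical and not part of Spark NLP. The sketch assumes every non-root mention points directly at the first mention of its chain, as in the first sample.

```python
from collections import defaultdict


def group_mentions(rows):
    """Group mention texts by the character span of their antecedent.

    Hypothetical helper: `rows` is a list of dicts with the keys
    "begin", "end", "token" and "metadata", mirroring the columns
    selected from the `corefs` annotations in this notebook.
    """
    chains = defaultdict(list)
    for row in rows:
        meta = row["metadata"]
        if meta["head"] == "ROOT":
            # First mention of a chain: its own span identifies the chain.
            key = (row["begin"], row["end"])
        else:
            # Later mention: file it under the span of its antecedent.
            key = (int(meta["head.begin"]), int(meta["head.end"]))
        chains[key].append(row["token"])
    return dict(chains)


# Two rows copied from the first sample's output (metadata values as strings).
rows = [
    {"begin": 0, "end": 28, "token": "All the flowers in the garden",
     "metadata": {"head.sentence": "-1", "head": "ROOT",
                  "head.begin": "-1", "head.end": "-1", "sentence": "0"}},
    {"begin": 50, "end": 54, "token": "their",
     "metadata": {"head.sentence": "0", "head": "All the flowers in the garden",
                  "head.begin": "0", "head.end": "28", "sentence": "0"}},
]

chains = group_mentions(rows)
print(chains)
```

Keying on the character span rather than on the mention text keeps two identical strings at different positions in separate chains.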


## 🔎Sample text

In [4]:
sample_texts = ["""All the flowers in the garden withered in a week, their leaves were withered, Luan collected all the leaves that were woven by his hands.""",
                """John told Mary he would like to borrow a book from her.""",
                """In Kevin's presentation, many participants always raised their hands to ask questions about the topic he was talking about.""",
                """Alex told Mary that he would always support her as she prepared for her projects.""",
                """John and Peter are brothers, but they do not support each other much."""]

## 🔎Run the pipeline

In [5]:
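# No stage needs training data, so fit once on an empty DataFrame and reuse the model.
empty_df = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_df)

for text in sample_texts:
    print(f"SAMPLE TEXT : {text}")
    df = spark.createDataFrame([[text]]).toDF("text")
    result = model.transform(df)

    # One row per mention: character span, mention text, and antecedent metadata.
    result.selectExpr("explode(corefs) AS coref").selectExpr(
        "coref.begin as begin",
        "coref.end as end",
        "coref.result as token",
        "coref.metadata").show(truncate=False)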


SAMPLE TEXT : All the flowers in the garden withered in a week, their leaves were withered, Luan collected all the leaves that were woven by his hands.
+-----+---+-----------------------------+-----------------------------------------------------------------------------------------------------------+
|begin|end|token                        |metadata                                                                                                   |
+-----+---+-----------------------------+-----------------------------------------------------------------------------------------------------------+
|0    |28 |All the flowers in the garden|{head.sentence -> -1, head -> ROOT, head.begin -> -1, head.end -> -1, sentence -> 0}                       |
|50   |54 |their                        |{head.sentence -> 0, head -> All the flowers in the garden, head.begin -> 0, head.end -> 28, sentence -> 0}|
|78   |81 |Luan                         |{head.sentence -> -1, head -> ROOT, head.begin -> -1, hea