![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/Certification_Trainings/Legal/12.Assertion_Status.ipynb)

# Legal Assertion Status Model 

## Colab Setup

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install johnsnowlabs 

In [None]:
from google.colab import files
print('Please upload your John Snow Labs license using the button below')
license_keys = files.upload()

In [3]:
from johnsnowlabs import * 
# After uploading your license, run this to install all licensed Python wheels and pre-download the Jars for the Spark Session JVM.
# Make sure to restart your notebook afterwards for the changes to take effect.
jsl.install()

👌 Detected license file /content/4.2.0.spark_nlp_for_healthcare-2.json
📋 Stored John Snow Labs License in /root/.johnsnowlabs/licenses/license_number_0_for_Spark-Healthcare_Spark-OCR.json
👷 Setting up if John Snow Labs home exists in /root/.johnsnowlabs this might take a few minutes.
Downloading 🐍+🚀 Python Library spark_nlp-4.2.0-py2.py3-none-any.whl
Downloading 🐍+💊 Python Library spark_nlp_jsl-4.2.0-py3-none-any.whl
Downloading 🐍+🕶 Python Library spark_ocr-4.1.0-py3-none-any.whl
Downloading 🫘+🚀 Java Library spark-nlp-assembly-4.2.0.jar
Downloading 🫘+💊 Java Library spark-nlp-jsl-4.2.0.jar
Downloading 🫘+🕶 Java Library spark-ocr-assembly-4.1.0.jar
🙆 JSL Home setup in /root/.johnsnowlabs
👌 Detected license file /content/4.2.0.spark_nlp_for_healthcare-2.json
Installing /root/.johnsnowlabs/py_installs/spark_ocr-4.1.0-py3-none-any.whl to /usr/bin/python3
Running: /usr/bin/python3 -m pip install /root/.johnsnowlabs/py_installs/spark_ocr-4.1.0-py3-none-any.whl
👌 Detected license file /content/

## Start Spark Session

In [1]:
from johnsnowlabs import * 
# Automatically load the license data and start a session with all the jars the user has access to
spark = jsl.start()

DEBUG START!
👌 Detected license file /content/4.2.0.spark_nlp_for_healthcare-2.json
👌 Launched cpu-Optimized JVM SparkSession with Jars for: 🚀Spark-NLP==4.2.0, 💊Spark-Healthcare==4.2.0, 🕶Spark-OCR==4.1.0, running on ⚡ PySpark==3.1.2


In [2]:
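import pandas as pd
import warnings
# Explicit imports for the names used in the cells below (F, Pipeline, LightPipeline, SparkSession)
import pyspark.sql.functions as F
from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
from sparknlp.base import LightPipeline

warnings.filterwarnings('ignore')

# Optional: start the session with custom parameters instead of jsl.start() above.
# PUBLIC_VERSION and JSL_VERSION must be defined before calling this function.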
def start(SECRET):
    builder = SparkSession.builder \
        .appName("Spark NLP Licensed") \
        .master("local[*]") \
        .config("spark.driver.memory", "16G") \
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
        .config("spark.kryoserializer.buffer.max", "2000M") \
        .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:"+PUBLIC_VERSION) \
        .config("spark.jars", "https://pypi.johnsnowlabs.com/"+SECRET+"/spark-nlp-jsl-"+JSL_VERSION+".jar")
      
    return builder.getOrCreate()

#spark = start(SECRET)


## Legal Assertion Status Model 

The model is implemented in Spark NLP as an annotator called **AssertionDLModel**. It is an assertion status model designed to detect **temporality (PRESENT, PAST, FUTURE)** or **certainty (POSSIBLE)** for the entities found in your legal texts.
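As a small illustration of the label set described above, the sketch below groups the four labels by the dimension they express. The `ASSERTION_DIMENSION` mapping and the `describe_label` helper are hypothetical names introduced here for illustration; they are not part of Spark NLP.

```python
# Hypothetical helper (not a Spark NLP API): maps each assertion label
# named in the text above to the dimension it expresses.
ASSERTION_DIMENSION = {
    "PRESENT": "temporality",
    "PAST": "temporality",
    "FUTURE": "temporality",
    "POSSIBLE": "certainty",
}

def describe_label(label):
    """Return the dimension for a label, or 'unknown' if it is not listed."""
    return ASSERTION_DIMENSION.get(label, "unknown")

for label in ("PRESENT", "PAST", "FUTURE", "POSSIBLE"):
    print(f"{label:<9} -> {describe_label(label)}")
```

A lookup like this can be handy when post-processing the `assertion` column, for example to treat every temporality label the same way.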

## Prediction Pipeline

In [3]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl","xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings_ner = nlp.RoBertaEmbeddings.pretrained("roberta_embeddings_legal_roberta_base", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings_ner")

ner_model = legal.NerModel.pretrained('legner_contract_doc_parties', 'en', 'legal/models')\
    .setInputCols(["sentence", "token", "embeddings_ner"])\
    .setOutputCol("ner")

ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")\
    .setWhiteList(["DOC", "EFFDATE", "PARTY"])

embeddings_ass = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings_ass")

assertion = legal.AssertionDLModel.pretrained("legassertion_time", "en", "legal/models")\
    .setInputCols(["sentence", "ner_chunk", "embeddings_ass"]) \
    .setOutputCol("assertion")


nlpPipeline = Pipeline(stages=[
            document_assembler, 
            sentence_detector,
            tokenizer,
            embeddings_ner,
            ner_model,
            ner_converter,
            embeddings_ass,
            assertion
            ])

empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

light_model = LightPipeline(model)

sentence_detector_dl download started this may take some time.
Approximate size to download 514.9 KB
[OK!]
roberta_embeddings_legal_roberta_base download started this may take some time.
Approximate size to download 447.2 MB
[OK!]
legner_contract_doc_parties download started this may take some time.
[OK!]
bert_embeddings_sec_bert_base download started this may take some time.
Approximate size to download 390.4 MB
[OK!]
legassertion_time download started this may take some time.
[OK!]


### Getting Result 

In [4]:
sample_text = "This is an Intellectual Property Agreement between Amazon Inc. and Atlantic Inc."


In [5]:
data = spark.createDataFrame([[sample_text]]).toDF("text")

data.show(truncate = 80)

+--------------------------------------------------------------------------------+
|                                                                            text|
+--------------------------------------------------------------------------------+
|This is an Intellectual Property Agreement between Amazon Inc. and Atlantic Inc.|
+--------------------------------------------------------------------------------+



In [6]:
result = model.transform(data)

In [7]:
result.select(F.explode(F.arrays_zip(result.ner_chunk.result,  
                                     result.ner_chunk.begin, 
                                     result.ner_chunk.end, 
                                     result.ner_chunk.metadata, 
                                     result.assertion.result)).alias("cols"))\
      .select(F.expr("cols['0']").alias("chunk"),
              F.expr("cols['1']").alias("begin"),
              F.expr("cols['2']").alias("end"),
              F.expr("cols['3']['entity']").alias("ner_label"),
              F.expr("cols['4']").alias("assertion")).show(truncate=False)

+-------------------------------+-----+---+---------+---------+
|chunk                          |begin|end|ner_label|assertion|
+-------------------------------+-----+---+---------+---------+
|Intellectual Property Agreement|11   |41 |DOC      |PRESENT  |
|Amazon Inc                     |51   |60 |PARTY    |PRESENT  |
|Atlantic Inc                   |67   |78 |PARTY    |PRESENT  |
+-------------------------------+-----+---+---------+---------+
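Judging from the table above, `begin` and `end` appear to be zero-based character offsets into the input text, with `end` inclusive. The following plain-Python sketch checks that reading against the reported values; `slice_chunk` is a hypothetical helper written for this check, not a Spark NLP function.

```python
sample_text = "This is an Intellectual Property Agreement between Amazon Inc. and Atlantic Inc."

# (chunk, begin, end) triples copied from the output table above
reported = [
    ("Intellectual Property Agreement", 11, 41),
    ("Amazon Inc", 51, 60),
    ("Atlantic Inc", 67, 78),
]

def slice_chunk(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    return text[begin:end + 1]

recovered = [slice_chunk(sample_text, b, e) for _, b, e in reported]

for (chunk, b, e), got in zip(reported, recovered):
    print(f"{b:>3}-{e:<3} {got!r} matches={got == chunk}")
```

Because `end` is inclusive, a Python slice needs `end + 1` as its upper bound.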



In [8]:
# from sparknlp_display import AssertionVisualizer

vis = viz.AssertionVisualizer()

vis.display(result.collect()[0], 'ner_chunk', 'assertion')

### Getting Result with LightPipeline

In [9]:
sample_text = ["""This TRADEMARK LICENSE AGREEMENT (this "Agreement") is made and effective as of 31 Aug, 2020 ("Effective Date"), by and between Palmer Square Capital Management LLC, a Delaware limited liability company (the "Licensor"), and Palmer Square Capital BDC Inc., a corporation organized under the laws of the State of Maryland (the "Licensee")""",
               """The Intellectual Property Agreement would potentially be in short signed by the two Parties""",
               """The Sponsorship Agreement ("Agreement") will be signed on October 10, 2015""",
               """This is an Intellectual Property Agreement between Amazon.com Inc and Atlantic Inc.""",
               """This Sponsorship Agreement ("Agreement") was entered into as of December 18, 1998, by and between Ford Motor Media, a division of J. Walter Thompson with offices at 300 Renaissance Center, Detroit, Michigan 48243 and iVillage Inc., with offices at 170 Fifth Avenue, New York, New York 10010."""]

In [10]:
chunks=[]
entities=[]
status=[]
begin = []
end = []

for text in sample_text:

    light_result = light_model.fullAnnotate(text)[0]

    for n,m in zip(light_result['ner_chunk'],light_result['assertion']):
        begin.append(n.begin)
        end.append(n.end)
        chunks.append(n.result)
        entities.append(n.metadata['entity']) 
        status.append(m.result)
        
df = pd.DataFrame({'chunks':chunks, 'begin':begin, 'end':end, 'ner_label':entities, 'assertion':status})

In [11]:
df

|   | chunks | begin | end | ner_label | assertion |
|---|--------|-------|-----|-----------|-----------|
| 0 | TRADEMARK LICENSE AGREEMENT | 5 | 31 | DOC | PRESENT |
| 1 | 31 Aug, 2020 | 80 | 91 | EFFDATE | PRESENT |
| 2 | Palmer Square Capital Management LLC | 128 | 163 | PARTY | PRESENT |
| 3 | Palmer Square Capital BDC Inc | 225 | 253 | PARTY | PRESENT |
| 4 | Intellectual Property Agreement | 4 | 34 | DOC | POSSIBLE |
| 5 | Sponsorship Agreement | 4 | 24 | DOC | FUTURE |
| 6 | October 10, 2015 | 58 | 73 | EFFDATE | FUTURE |
| 7 | Intellectual Property Agreement | 11 | 41 | DOC | PRESENT |
| 8 | Amazon.com Inc | 51 | 64 | PARTY | PRESENT |
| 9 | Atlantic Inc | 70 | 81 | PARTY | PRESENT |
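One possible way to use these results downstream is to group chunks by assertion status and flag documents that are not asserted as `PRESENT`. The sketch below uses only the standard library and the rows shown above, copied by hand; the variable names (`rows`, `by_status`, `not_present_docs`) are illustrative assumptions, not part of the notebook.

```python
from collections import defaultdict

# (chunk, ner_label, assertion) rows copied from the result table above
rows = [
    ("TRADEMARK LICENSE AGREEMENT", "DOC", "PRESENT"),
    ("31 Aug, 2020", "EFFDATE", "PRESENT"),
    ("Palmer Square Capital Management LLC", "PARTY", "PRESENT"),
    ("Palmer Square Capital BDC Inc", "PARTY", "PRESENT"),
    ("Intellectual Property Agreement", "DOC", "POSSIBLE"),
    ("Sponsorship Agreement", "DOC", "FUTURE"),
    ("October 10, 2015", "EFFDATE", "FUTURE"),
    ("Intellectual Property Agreement", "DOC", "PRESENT"),
    ("Amazon.com Inc", "PARTY", "PRESENT"),
    ("Atlantic Inc", "PARTY", "PRESENT"),
]

# Group chunks by their assertion status
by_status = defaultdict(list)
for chunk, label, status in rows:
    by_status[status].append((chunk, label))

# Documents whose status is anything other than PRESENT
not_present_docs = [chunk for chunk, label, status in rows
                    if label == "DOC" and status != "PRESENT"]

for status, items in by_status.items():
    print(status, len(items))
print("DOC chunks not asserted as PRESENT:", not_present_docs)
```

With the pandas `df` built earlier, `df.groupby('assertion').size()` would give the same per-status counts.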


### Visualization of Assertion Status

In [12]:
# from sparknlp_display import AssertionVisualizer

vis = viz.AssertionVisualizer()

for text in sample_text:

    light_result = light_model.fullAnnotate(text)[0]

    vis.display(light_result, 'ner_chunk', 'assertion')    
    print("\n\n")
