![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/legal-nlp/15.0.Legal_Text_Generation.ipynb)

# **Legal Text Generation**

Legal Text Generator uses the basic Flan-T5 model to perform various tasks related to legal text abstraction. With these models, a user can provide a prompt and context and instruct the system to perform a legal-specific task. Flan-T5 is an enhanced version of the original T5 model, designed to produce more coherent, higher-quality text. It is trained on a large dataset of diverse texts and can generate high-quality summaries of articles, documents, and other text-based inputs.


Available models can be found at the [Models Hub](https://nlp.johnsnowlabs.com/models?task=Text+Generation&edition=Legal+NLP).


# Colab Setup

To run this notebook yourself, you need to upload your license keys. Run the cells below to do that. Alternatively, open the file explorer on the left side of the screen and upload `license_keys.json` to the folder that opens.
Otherwise, you can look at the example outputs at the bottom of the notebook.
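
Before starting the session, it can help to confirm that the uploaded file is readable JSON. The following is a minimal sketch and not part of the johnsnowlabs API: `check_license_file` is a hypothetical helper, and the default file name `license_keys.json` is taken from the instructions above. It only checks that the file parses to a non-empty JSON object and makes no assumption about which keys a license contains.

```python
import json
from pathlib import Path


def check_license_file(path="license_keys.json"):
    """Return the sorted top-level key names of the license file (hypothetical helper).

    Raises FileNotFoundError if the file is missing and ValueError if it is
    not a non-empty JSON object. The key names themselves are not validated.
    """
    license_path = Path(path)
    if not license_path.is_file():
        raise FileNotFoundError(f"No license file found at {license_path}")
    with license_path.open(encoding="utf-8") as handle:
        try:
            content = json.load(handle)
        except json.JSONDecodeError as err:
            raise ValueError(f"{license_path} is not valid JSON") from err
    if not isinstance(content, dict) or not content:
        raise ValueError(f"{license_path} does not contain a JSON object with keys")
    return sorted(content)
```

Calling `check_license_file()` after the upload step would raise early with a readable message if the file is missing or malformed, rather than failing later during installation.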

In [None]:
# Install the johnsnowlabs library to access Spark-OCR and Spark-NLP for Healthcare, Finance, and Legal.
! pip install -q johnsnowlabs

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

In [None]:
from johnsnowlabs import nlp, legal
# After uploading your license, run this to install all licensed Python wheels and pre-download the jars for the Spark session JVM
nlp.install()

In [None]:
from johnsnowlabs import nlp, legal
import pandas as pd

# Automatically load license data and start a session with all jars the user has access to
spark = nlp.start()

In [5]:
from pyspark.sql import DataFrame
import pyspark.sql.functions as F
import pyspark.sql.types as T
import pyspark.sql as SQL
from pyspark import keyword_only

# Text Generation Models

<div align="center">

| **Index** | **Text Generator Models**        |
|---------------|----------------------|
| 1        |  [leggen_flant5_finetuned](https://nlp.johnsnowlabs.com/2023/04/29/leggen_flant5_finetuned_en.html)     |
| 2      | [leggen_flant5_base](https://nlp.johnsnowlabs.com/2023/04/21/leggen_flant5_base_en.html)    |



</div>

## **leggen_flant5_base**

The `leggen_flant5_base` model is a FLAN-T5 model fine-tuned on legal texts. FLAN-T5 is a state-of-the-art language model developed by Google AI that uses the T5 architecture for text generation tasks.

In [6]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("prompt")

flant5 = legal.TextGenerator.pretrained("leggen_flant5_base","en","legal/models")\
    .setInputCols(["prompt"])\
    .setOutputCol("generated_text")\
    .setMaxNewTokens(150)\
    .setStopAtEos(True)
  
pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))


leggen_flant5_base download started this may take some time.
[OK!]


In [7]:
data = spark.createDataFrame([[1, "Explain loan Clauses"]]).toDF('id', 'text')

result = model.transform(data)

result.select("generated_text.result").show(truncate=False)

+--------------------------------------------------------------------------------------------+
|result                                                                                      |
+--------------------------------------------------------------------------------------------+
|[Loan clauses are clauses in the U.S. Constitution that provide for the repayment of loans.]|
+--------------------------------------------------------------------------------------------+
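
To run several prompts in one pass, the rows handed to `spark.createDataFrame` can be built from a plain Python list. The helper below is a minimal sketch: `build_prompt_rows` is a hypothetical name, and the second prompt is an illustrative example rather than one taken from this notebook. The Spark calls are left as comments because they need the session and the fitted `model` from the cells above.

```python
def build_prompt_rows(prompts, start_id=1):
    """Pair each non-empty prompt with a sequential id, as [id, text] rows."""
    # Strip surrounding whitespace and skip prompts that are empty afterwards.
    cleaned = [p.strip() for p in prompts if p and p.strip()]
    return [[i, text] for i, text in enumerate(cleaned, start=start_id)]


rows = build_prompt_rows(
    ["Explain loan Clauses", "  ", "Explain indemnification clauses"]
)
print(rows)

# With the Spark session and fitted `model` from the cells above, the rows
# could then be passed through the pipeline in the same way as before:
# data = spark.createDataFrame(rows).toDF('id', 'text')
# model.transform(data).select("id", "generated_text.result").show(truncate=False)
```

Keeping the `id` column in the final `select` makes it easier to match each generated text back to its prompt.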



## **leggen_flant5_finetuned**


In [6]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("prompt")

flant5 = legal.TextGenerator.pretrained("leggen_flant5_finetuned","en","legal/models")\
    .setInputCols(["prompt"])\
    .setOutputCol("generated_text")\
    .setMaxNewTokens(256)\
    .setTopK(1)\
    .setRandomSeed(42)\
    .setStopAtEos(True)
  
pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

model = pipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

leggen_flant5_finetuned download started this may take some time.
[OK!]


In [10]:
data = spark.createDataFrame(
[[1,
 """This amendment shall be governed by and construed in accordance with the laws of Japan."""
]]
).toDF('id', 'text')

result = model.transform(data)

result.select("generated_text.result").show(truncate=False)

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
+---------------------------------------

### **Using LightPipeline**

In [14]:
text = ["""This amendment shall be governed by and construed in accordance with the laws of Japan."""]

light_model = nlp.LightPipeline(model)

light_result = light_model.annotate(text)

light_result

[{'prompt': ['This amendment shall be governed by and construed in accordance with the laws of Japan.'],
  'generated_text': ['The parties agree that this amendment shall be governed by the laws of Japan and the parties hereby agree to submit to the exclusive jurisdiction of the courts of Japan. The parties further agree that any dispute arising out of or related to this amendment shall be resolved through binding arbitration. The parties agree to submit to the exclusive jurisdiction of the courts of Japan. The parties further agree to submit to the exclusive jurisdiction of the courts of Japan.']}]

In [15]:
import textwrap

document_text = textwrap.fill(light_result[0]['prompt'][0], width=120)

summary_text = textwrap.fill(light_result[0]['generated_text'][0], width=120)

print("➤ Input: \n{}".format(document_text))
print("\n")
print("➤ Output: \n{}".format(summary_text))
print("\n")

➤ Input: 
This amendment shall be governed by and construed in accordance with the laws of Japan.


➤ Output: 
The parties agree that this amendment shall be governed by the laws of Japan and the parties hereby agree to submit to
the exclusive jurisdiction of the courts of Japan. The parties further agree that any dispute arising out of or related
to this amendment shall be resolved through binding arbitration. The parties agree to submit to the exclusive
jurisdiction of the courts of Japan. The parties further agree to submit to the exclusive jurisdiction of the courts of
Japan.
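
The generated text above ends with two nearly identical sentences about the exclusive jurisdiction of the courts of Japan. A minimal post-processing sketch, assuming that sentences end with `.`, `!` or `?` followed by whitespace, keeps the first occurrence of each sentence and drops later near-duplicates. `drop_near_duplicate_sentences` is a hypothetical helper, not a Spark NLP function, and the similarity threshold of 0.9 is an assumed value that would need tuning on real outputs. The naive split would also mishandle abbreviations such as "U.S.".

```python
import re
from difflib import SequenceMatcher


def drop_near_duplicate_sentences(text, threshold=0.9):
    """Keep the first occurrence of each sentence and drop later near-duplicates.

    Hypothetical helper: two sentences count as duplicates when their
    difflib similarity ratio is at least `threshold` (an assumed value).
    """
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    for sentence in sentences:
        if not sentence:
            continue
        is_duplicate = any(
            SequenceMatcher(
                None, sentence.lower(), earlier.lower(), autojunk=False
            ).ratio() >= threshold
            for earlier in kept
        )
        if not is_duplicate:
            kept.append(sentence)
    return " ".join(kept)


# The generated text from the LightPipeline result above, copied verbatim.
generated = (
    "The parties agree that this amendment shall be governed by the laws of "
    "Japan and the parties hereby agree to submit to the exclusive "
    "jurisdiction of the courts of Japan. The parties further agree that any "
    "dispute arising out of or related to this amendment shall be resolved "
    "through binding arbitration. The parties agree to submit to the "
    "exclusive jurisdiction of the courts of Japan. The parties further agree "
    "to submit to the exclusive jurisdiction of the courts of Japan."
)

cleaned = drop_near_duplicate_sentences(generated)
print(cleaned)
```

On this output the sketch keeps the first three sentences and drops the fourth, which differs from the third only by the word "further". In the notebook it could be applied to `light_result[0]['generated_text'][0]` before wrapping the text for display.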


