![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

# Text Classification

## Setup

In [None]:
from johnsnowlabs import *

import pandas as pd
import json
import os

spark = start_spark()

## Get Binary Prediction from Legal Clauses

The classification models were trained on legal texts in which each paragraph was mapped to one or more legal clause types, so a single paragraph can belong to several topics at the same time.

Because the number of classes is very high (over 250) and a text can belong to several topics at once (a multilabel problem), the models are binary (yes / no) and are used independently. You can select the topics you are interested in (for example, **loans** and **fiscal-year** clauses) and run the corresponding models to detect those clause types in your paragraphs. Because the models are independent and the task is multilabel, you may sometimes get positive results for more than one class (e.g., a paragraph that talks about **loans** and **fiscal year** at the same time).
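To illustrate how independent binary results can be merged into a multilabel answer, here is a minimal sketch in plain Python. It assumes each model returns either its topic name or `other` (as in the output tables of this notebook); the function name `combine_binary_predictions` and the example predictions are hypothetical and not part of the library.

```python
from typing import Dict, List


def combine_binary_predictions(predictions: Dict[str, str],
                               negative_label: str = "other") -> List[str]:
    """Return every topic whose binary model did not answer with the negative label.

    `predictions` maps a model name to the class that model returned
    for one paragraph.
    """
    return sorted(label for label in predictions.values() if label != negative_label)


# Hypothetical predictions for a single paragraph (illustrative values only).
paragraph_predictions = {
    "legclf_loans_clause": "loans",
    "legclf_fiscal_year_clause": "fiscal-year",
    "legclf_currency_clause": "other",
}

topics = combine_binary_predictions(paragraph_predictions)
print(topics)  # ['fiscal-year', 'loans']
```

An empty list means none of the selected models recognised its clause type in the paragraph.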

### Sample Texts for Binary Classification

In [3]:
models = ["legclf_amendments_clause", "legclf_loans_clause", "legclf_currency_clause", "legclf_fiscal_year_clause", "legclf_guarantee_clause"]

In [4]:
sample_texts = [("""This agreement, or any term thereof, may be changed or waived only by written amendment, signed by the party against whom enforcement of such change or waiver is sought.""", "amendments"),
                ("""The sponsor has made loans or advances to the company in the aggregate amount of approximately $140,000 (the “Insider Advances”). The Insider Advances do not bear any interest, are unsecured and are repayable by the company on the earlier of June 30, 2017 or the consummation of the offering.""", "loans"),
                ("""Unless otherwise specified in this agreement, all references to currency, monetary values and dollars set forth herein shall mean United States (U.S.) dollars and all payments hereunder shall be made in United States dollars.""", "currency"),
                ("""The fiscal year for the School shall begin on July 1 and end on June 30 of the subsequent calendar year.""", "fiscal-year"),
                ("""The Engineer warrants that engineering design work performed by the Engineer hereunder shall be in accordance with sound engineering design practices and in conformance with applicable code and standards established for such work.""","guarantee")]

### Prediction Pipeline

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_cased", "en")\
    .setInputCols("document")\
    .setOutputCol("sentence_embeddings")

In [8]:
import pyspark.sql.functions as F

for model_name in models:   
    document_classifier = legal.ClassifierDLModel.pretrained(model_name, "en", "legal/models")\
        .setInputCols(['sentence_embeddings'])\
        .setOutputCol("class")

    clf_pipeline = nlp.Pipeline(stages=[
         document_assembler,
         embeddings,
         document_classifier
         ])

    # All stages are pretrained, so fitting on an empty DataFrame only wires them together.
    empty_df = spark.createDataFrame([['']]).toDF("text")

    model = clf_pipeline.fit(empty_df)

    df = spark.createDataFrame(sample_texts, ["text", "label"])

    result = model.transform(df)
    
    print(f"<---{model_name} result--->")
    
    result.select("label", F.explode(F.arrays_zip('document.result', 'class.result')).alias("cols"))\
          .select(F.expr("cols['0']").alias("document"),
                  "label",
                  F.expr("cols['1']").alias("class")).show(truncate=80)     
    print("\n")

legclf_amendments_clause download started this may take some time.
[OK!]
<---legclf_amendments_clause result--->

+--------------------------------------------------------------------------------+-----------+----------+
|                                                                        document|      label|     class|
+--------------------------------------------------------------------------------+-----------+----------+
|This agreement, or any term thereof, may be changed or waived only by written...| amendments|amendments|
|The sponsor has made loans or advances to the company in the aggregate amount...|      loans|     other|
|Unless otherwise specified in this agreement, all references to currency, mon...|   currency|     other|
|The fiscal year for the School shall begin on July 1 and end on June 30 of th...|fiscal-year|     other|
|The Engineer warrants that engineering design work performed by the Engineer ...|  guarantee|     other|
+--------------------------------------------------------------------------------+-----------+----------+



legclf_loans_clause download started this m

                                                                                

+--------------------------------------------------------------------------------+-----------+-----------+
|                                                                        document|      label|      class|
+--------------------------------------------------------------------------------+-----------+-----------+
|This agreement, or any term thereof, may be changed or waived only by written...| amendments|      other|
|The sponsor has made loans or advances to the company in the aggregate amount...|      loans|      other|
|Unless otherwise specified in this agreement, all references to currency, mon...|   currency|      other|
|The fiscal year for the School shall begin on July 1 and end on June 30 of th...|fiscal-year|fiscal-year|
|The Engineer warrants that engineering design work performed by the Engineer ...|  guarantee|      other|
+--------------------------------------------------------------------------------+-----------+-----------+



legclf_guarantee_clause download s
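One way to read these tables: a binary model is behaving as intended when it returns its own topic for the matching paragraph and `other` for every other paragraph. The sketch below checks that rule in plain Python against the rows of the `legclf_amendments_clause` table shown earlier; the helper name `is_expected` is hypothetical.

```python
from typing import List, Tuple


def is_expected(model_topic: str, label: str, predicted: str,
                negative_label: str = "other") -> bool:
    """True if a binary model's prediction matches what the gold label implies."""
    expected = model_topic if label == model_topic else negative_label
    return predicted == expected


# (label, class) pairs copied from the legclf_amendments_clause output table.
amendments_rows: List[Tuple[str, str]] = [
    ("amendments", "amendments"),
    ("loans", "other"),
    ("currency", "other"),
    ("fiscal-year", "other"),
    ("guarantee", "other"),
]

all_ok = all(is_expected("amendments", label, predicted)
             for label, predicted in amendments_rows)
print(all_ok)  # True
```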

                                                                                

## Get Multilabel Prediction from Legal Clauses

The `legmulticlf_edgar` model returns the most likely class or classes for an input text. It can be used to detect relevant clauses in a legal text.
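As a rough intuition for "class or classes": multilabel classifiers commonly score every class independently and keep each class whose score clears a threshold. The sketch below shows that idea in plain Python. It is not the internals of `legmulticlf_edgar`; the function `select_labels`, the 0.5 threshold, and the scores are all assumptions made for illustration.

```python
from typing import Dict, List


def select_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Keep every label whose score reaches the threshold, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]


# Made-up per-class scores for one paragraph (illustrative values only).
hypothetical_scores = {"waivers": 0.91, "amendments": 0.64, "notices": 0.08}

selected = select_labels(hypothetical_scores)
print(selected)  # ['waivers', 'amendments']
```

Unlike the binary models above, a single multilabel model can therefore return zero, one, or several classes per text.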

### Prediction Pipeline

In [10]:
# Turn the raw "text" column into a document annotation.
document_assembler = nlp.DocumentAssembler()\
  .setInputCol("text")\
  .setOutputCol("document")

# Uncased legal sentence embeddings used by the multilabel classifier.
embeddings = nlp.BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_legal", "en")\
  .setInputCols("document")\
  .setOutputCol("sentence_embeddings")

multiClassifier = nlp.MultiClassifierDLModel.pretrained("legmulticlf_edgar", "en", "legal/models")\
  .setInputCols(["sentence_embeddings"])\
  .setOutputCol("class")

clf_pipeline = nlp.Pipeline(stages=[
            document_assembler,
            embeddings,
            multiClassifier
            ])


light_pipeline = nlp.LightPipeline(clf_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

sent_bert_base_uncased_legal download started this may take some time.
Approximate size to download 390.8 MB
[OK!]
legmulticlf_edgar download started this may take some time.
Approximate size to download 13.3 MB
[OK!]


### Get Prediction with LightPipeline

The multilabel pipeline uses uncased embeddings (`sent_bert_base_uncased_legal`), so some of the examples below convert the input text to lowercase with `.lower()` before annotating it.
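If you want to apply the same preprocessing to every input, a small helper keeps it consistent. This is a minimal sketch in plain Python; the name `normalize_for_uncased` is hypothetical, and the whitespace collapsing is an extra assumption on top of the lowercasing used in this notebook.

```python
import re


def normalize_for_uncased(text: str) -> str:
    """Collapse runs of whitespace and lowercase the text for an uncased model."""
    return re.sub(r"\s+", " ", text).strip().lower()


cleaned = normalize_for_uncased("All  Notices and other\nCommunications ")
print(cleaned)  # all notices and other communications
```

The cleaned string can then be passed to `light_pipeline.annotate(...)` in place of the raw text.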

In [11]:
result = light_pipeline.annotate("""No failure or delay by the Administrative Agent or any Lender in exercising any right or power hereunder shall operate as a waiver thereof, nor shall any single or partial exercise of any such right or power, or any abandonment or discontinuance of steps to enforce such a right or power, preclude any other or further exercise thereof or the exercise of any other right or power. The rights and remedies of the Administrative Agent and the Lenders hereunder are cumulative and are not exclusive of any rights or remedies that they would otherwise have. No waiver of any provision of this Agreement or consent to any departure by the Borrower therefrom shall in any event be effective unless the same shall be permitted by paragraph (b) of this Section, and then such waiver or consent shall be effective only in the specific instance and for the purpose for which given. Without limiting the generality of the foregoing, the making of a Loan shall not be construed as a waiver of any Default, regardless of whether the Administrative Agent or any Lender may have had notice or knowledge of such Default at the time.""")

result["class"]

['waivers', 'amendments']

In [12]:
result = light_pipeline.annotate("""The provisions of this Agreement shall be binding upon and inure to the benefit of the parties hereto and their respective successors and assigns permitted hereby (including any Affiliate of the Issuing Bank that issues any Letter of Credit), except that (i) the Borrower may not assign or otherwise transfer any of its rights or obligations hereunder without the prior written consent of each Lender (and any attempted assignment or transfer by the Borrower without such consent shall be null and void) and (ii) no Lender may assign or otherwise transfer its rights or obligations hereunder except in accordance with this Section. Nothing in this Agreement, expressed or implied, shall be construed to confer upon any Person (other than the parties hereto, their respective successors and assigns permitted hereby (including any Affiliate of the Issuing Bank that issues any Letter of Credit), Participants (to the extent provided in paragraph (c) of this Section) and, to the extent expressly contemplated hereby, the Related Parties of each of the Administrative Agent, the Issuing Bank and the Lenders) any legal or equitable right, remedy or claim under or by reason of this Agreement.""")

result["class"]

['successors', 'assigns']

In [13]:
result = light_pipeline.annotate("""After the effectiveness of this Amendment, the representations and warranties of the Borrower set forth in the Credit Agreement and in the other Loan Documents are true and correct in all material respects on and as of the date hereof, with the same force and effect as if made on and as of such date, except to the extent that such representations and warranties (i) specifically refer to an earlier date, in which case they shall be true and correct in all material respects as of such earlier date (except to the extent of changes in facts or circumstances that have been disclosed to the Lenders and do not constitute an Event of Default or a Potential Default under the Credit Agreement or any other Loan Document), and (ii) are already qualified by materiality, in which case they shall be true and correct in all respects, and except that for purposes of this Section 4.1 , the representations and warranties contained in Section 7.6 of the Credit Agreement shall be deemed to refer to the most recent financial statements furnished pursuant to Section 8.1(a) of the Credit Agreement.""".lower())

result["class"]

['warranties', 'representations']

In [14]:
result = light_pipeline.annotate("""All notices and other communications provided for in this Agreement and the other Loan Documents shall be in writing and may (subject to paragraph (b) below) be telecopied (faxed), mailed by certified mail return receipt requested, or delivered by hand or overnight courier service to the intended recipient at the addresses specified below or at such other address as shall be designated by any party listed below in a notice to the other parties listed below given in accordance with this Section.""".lower())

result["class"]

['notices']
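The cells above annotate one clause at a time. A small wrapper can do the same for a list of clauses. The sketch below takes any `annotate`-style callable that returns a dictionary with a `class` key, so it can be tried without Spark; `classify_clauses` and `fake_annotate` are hypothetical names, and `fake_annotate` is only a stand-in for `light_pipeline.annotate`, not a real model.

```python
from typing import Callable, Dict, List, Tuple


def classify_clauses(annotate: Callable[[str], Dict[str, List[str]]],
                     texts: List[str]) -> List[Tuple[str, List[str]]]:
    """Lowercase each clause, annotate it, and pair the original text with its classes."""
    results = []
    for text in texts:
        annotation = annotate(text.lower())
        results.append((text, annotation.get("class", [])))
    return results


def fake_annotate(text: str) -> Dict[str, List[str]]:
    """Stand-in for light_pipeline.annotate so the sketch runs without Spark."""
    return {"class": ["notices"] if "notice" in text else []}


batch = classify_clauses(fake_annotate, [
    "All Notices shall be in writing.",
    "The fiscal year begins on July 1.",
])
print([classes for _, classes in batch])  # [['notices'], []]
```

With the real pipeline, you would pass `light_pipeline.annotate` as the first argument instead of `fake_annotate`.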