![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/legal-nlp/13.0.Legal_Summarization.ipynb)

#🎬 Installation

In [None]:
! pip install -q johnsnowlabs

##🔗 Automatic Installation
Using my.johnsnowlabs.com SSO

In [None]:
from johnsnowlabs import nlp, finance, legal

nlp.install(refresh_install=True, force_browser = True)

##🔗 Manual downloading
If you are not registered on my.johnsnowlabs.com, received a license via email, or are using Safari, you may need to update the license manually.

- Go to my.johnsnowlabs.com
- Download your license
- Upload it using the following command

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

- Install it

#📌 Starting

In [None]:
spark = nlp.start()

#🔎 Legal Text Summarization

📜Explanation:

Native Legal Text Summarization is a valuable tool for legal professionals who need to quickly understand the key points of a legal document. It can save time and improve accuracy, allowing lawyers to focus on more complex tasks.

The Native Legal Text Summarization feature is built on FLAN-T5, an encoder-decoder (sequence-to-sequence) transformer, trained on legal documents to automatically generate summaries of legal text. The encoder reads the input text and the decoder generates the summary token by token, so the summaries are abstractive: the model writes new, shorter text instead of only copying sentences from the source.

The resulting summary aims to capture the key points of the document in a concise and understandable form. It can be used to quickly identify relevant information, such as the outcome of a legal case or the main provisions of a regulation.

With our new `legal.Summarizer()` module, you can get state-of-the-art, short versions of your legal documents that keep their key information.

We included 2 models for Legal Summarization:

  - **Legal FLAN-T5 Summarization (Base):** The base model, with generic capacities for summarizing legal documents.
  - **Legal Finetuned FLAN-T5 Summarization:** A model finetuned specifically to summarize legal agreements. For this task, we finetuned our base model on more than 8K sections from different commercial legal agreements.



### Let's see how to get summaries of different legal documents using the `Summarizer()` module.


## 🧮 Notice of Default

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

flant5 = legal.Summarizer().pretrained('legsum_flant5_legal_augmented','en','legal/models')\
    .setInputCols(["documents"])\
    .setOutputCol("summary")\
    .setMaxNewTokens(1000)

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

data = spark.createDataFrame([
  [1, """ 
NOTICE OF DEFAULT AND INTENT TO FORECLOSE

PLEASE TAKE NOTICE that you are in default of your mortgage agreement with XYZ Bank, which is secured by the property located at 1234 Elm Street, Anytown, USA 12345. As of the date of this notice, the outstanding balance on your mortgage is $200,000, which includes principal, interest, late fees, and other charges.

Under the terms of your mortgage agreement, you were required to make monthly payments of $1,200 on the first day of each month. However, you have failed to make your payments for the months of January, February, and March 2023. As a result, you are in default of your mortgage agreement, and the entire amount of the outstanding balance has become due and payable.

UNLESS YOU TAKE ACTION TO CURE THIS DEFAULT, XYZ Bank INTENDS TO FORECLOSE ON YOUR PROPERTY. XYZ Bank will file a notice of default with the county recorder's office, which will initiate the foreclosure process. If the foreclosure proceeds, XYZ Bank will sell your property at a public auction to satisfy the outstanding balance on your mortgage.

YOU HAVE THE RIGHT TO CURE THIS DEFAULT BY PAYING THE ENTIRE OUTSTANDING BALANCE OF $200,000, INCLUDING ALL FEES AND CHARGES, ON OR BEFORE APRIL 30th, 2023.

IF YOU ARE UNABLE TO CURE THIS DEFAULT, YOU MAY BE ELIGIBLE FOR ALTERNATIVE FORECLOSURE PREVENTION OPTIONS, SUCH AS LOAN MODIFICATION, SHORT SALE, OR DEED IN LIEU OF FORECLOSURE. YOU MAY CONTACT XYZ BANK TO DISCUSS THESE OPTIONS OR TO SEEK ASSISTANCE FROM A HOUSING COUNSELOR.

IF YOU HAVE ANY QUESTIONS ABOUT THIS NOTICE, PLEASE CONTACT XYZ BANK AS SOON AS POSSIBLE.
"""]]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)

legsum_flant5_legal_augmented download started this may take some time.
[OK!]

## 🧮 Mutual Non-Disclosure Agreement (MNDA)

In [None]:
data = spark.createDataFrame([
  [2, """NOW, THEREFORE, in consideration of the Company’s disclosure of information to the Recipient
and the promises set forth below, the parties agree as follows:

     1. Confidential Information. “Confidential Information” as used in this
Agreement means all information relating to the Company disclosed to the Recipient by the Company,
including without limitation any business, technical, marketing, financial or other information,
whether in written, electronic or oral form. Any and all reproductions, copies, notes, summaries,
reports, analyses or other material derived by the Recipient or its Representatives (as defined
below) in whole or in part from the Confidential Information in whatever form maintained shall be
considered part of the Confidential Information itself and shall be treated as such. Confidential
Information does not include information that (a) is or becomes part of the public domain other
than as a result of disclosure by the Recipient or its Representatives; (b) becomes available to
the Recipient on a nonconfidential basis from a source other than the Company, provided that source
is not bound with respect to that information by a confidentiality agreement with the Company or is
otherwise prohibited from transmitting that information by a contractual, legal or other
obligation; (c) can be proven by the Recipient to have been in the Recipient’s possession prior to
disclosure of the same by the Company; or (d) is independently developed by the Recipient without
reference to or reliance on any of the Company’s Confidential Information."""]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)


## 🧮 Commercial Agreements

In [None]:
data = spark.createDataFrame([
  [3, """EXHIBIT 99.2 Page 1 of 3 DISTRIBUTOR AGREEMENT Agreement made this 19t h day of March, 2020 Between: Co-Diagnostics, Inc. (herein referred to as "Principal") And PreCheck Health Services, Inc. (herein referred to as "Distributor"). In consideration of the mutual terms, conditions and covenants hereinafter set forth, Principal and Distributor acknowledge and agree to the following descriptions and conditions: DESCRIPTION OF PRINCIPAL The Principal is a company located in Utah, United States and is in the business of research and development of reagents. The Principal markets and sells it products globally through direct sales and distributors. DESCRIPTION OF DISTRIBUTOR The Distributor is a company operating or planning to operate in the United States of America, Latin America, Europe and Russia. The Distributor represents that the Distributor or a subsidiary of the Distributor is or will be fully licensed and registered in the Territory and will provide professional distribution services for the products of the Principal. CONDITIONS: 1. The Principal appoints the Distributor as a non-exclusive distributor, to sell Principal's qPCR infectious disease kits, Logix Smart COVID-19 PCR diagnostic test and Co-Dx Box™ instrument (the "Products"). The Products are described on Exhibit A to this Agreement. 2. The Principal grants Distributor non- exclusive rights to sell these products within the countries of Romania (the "Territory"), which may be amended by mutual written agreement.
  
Source: PRECHECK HEALTH SERVICES, INC., 8-K, 3/20/2020

3. The Distributor accepts the appointment and shall use its commercially reasonable efforts to promote, market and sell the Products within the Territory, devote such time and attention as may be reasonably necessary and abide by the Principal's policies. 4. The Principal shall maintain the right to contact and market its products to potential customers in the Territory; but agrees to pass on all sales leads and orders to the Distributor. 5. The parties agree that the list of Products and/or prices may be amended from time to time. The Principal may unilaterally remove Products from the catalog or change prices. Additions to the Products shall be by mutual agreement. However, in the event the Distributor rejects a new product addition to the product list, the Principal shall then retain the right to market and distribute the new product that is rejected by the Distributor. 6. Unless accepted by the Principal, the Distributor agrees that during the term of this Agreement, the Distributor, either directly or indirectly, shall handle no products that are competitive with the Products within the Territory. 7. The Distributor shall obtain at its own expense, all necessary licenses and permits to allow the Distributor to conduct business as contemplated herein. The Distributor represents and warrants that the Distributor shall conduct business in strict conformity with all local, state and federal laws, rules and regulations. 8. The Principal agrees that the Distributor may employ or engage representatives or sub-distributors in furtherance of this Agreement and the Distributor agrees that the Distributor shall be solely responsible for the payment of wages or commissions to those representatives and sub-distributors, and that under no circumstances shall Distributor's representatives be deemed employees of Principal for any purpose whatsoever. 9. Principal will grant Distributor a discount based on the Products and Prices. The proposed discount is expected to be ¨%. 
Discount may vary depending on product volume ordered or promotions. 10. This Agreement shall be in effect until March 18. 2021, unless sooner terminated by either party upon (30) days written notice, without cause. 11. In the event of termination, the Distributor shall be entitled to receive all orders accepted by the Principal prior to the date of termination and may sell the ordered Products in the Territory. Payment to be made upon shipment"""]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)


## 🏓**Long Document Summarization**

 For long document summarization, the following steps can be taken:

1.   **Splitting the Document**: The initial step is to split the long document into smaller chunks. This is done to ensure that each chunk stays within the token limit imposed by the model and tokenizer. The document can be divided into paragraphs, sections, or any logical divisions based on the content.
2.   **Token Limit**: The text is split into sentences, and the sentences are added one by one, in their original order, to an initially empty chunk while a running size count is kept. Before each sentence is added, the count is checked against a fixed limit (the code below uses a budget of 300 tokens per chunk). If adding the sentence would exceed the limit, the current chunk is closed and a new chunk is started with that sentence. This continues until every sentence has been assigned to a chunk, so each chunk stays within the limit unless a single sentence is longer than the limit on its own.
3.   **Applying Summarization**: After the document is divided into manageable chunks, summarization is applied to each chunk independently. This can be done using the chosen summarization model or technique. The aim is to generate concise summaries for each chunk, capturing the essential information within the given constraints.
4.   **Merging and Refining**: Once summaries are generated for each individual chunk, they can be merged and refined to form a coherent and comprehensive summary of the entire long document. This can involve removing redundant information, resolving inconsistencies, and ensuring smooth transitions between different chunks.
5.   **Iterative Process:** If the summary generated from the merged chunks exceeds the desired length or does not meet the desired quality, the process can be iterated. This might involve adjusting the splitting strategy, revising the summarization model parameters, or applying additional techniques to improve the summary.
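The steps above can be sketched in plain Python as a split, chunk, summarize and merge loop. This is a minimal illustration under stated assumptions, not the Spark NLP implementation: the regex sentence splitter, the whitespace-based token count, the `max_tokens` budget and the `summarize` callable are all hypothetical stand-ins (in this notebook the summarizer would be the `legal.Summarizer()` pipeline).

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences: List[str], max_tokens: int) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace-separated words. A single sentence
    longer than max_tokens becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would push it over the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long_document(
    text: str, summarize: Callable[[str], str], max_tokens: int = 300
) -> str:
    """Summarize each chunk independently, then merge the chunk summaries.

    Merging here only drops verbatim-duplicate summaries; real refinement
    (resolving inconsistencies, smoothing transitions) is out of scope.
    """
    chunks = chunk_sentences(split_sentences(text), max_tokens)
    merged: List[str] = []
    seen = set()
    for chunk in chunks:
        summary = summarize(chunk)
        if summary not in seen:
            seen.add(summary)
            merged.append(summary)
    return " ".join(merged)


if __name__ == "__main__":
    doc = ("The seller delivers the goods. The buyer pays the price. "
           "Either party may terminate.")

    def first_sentence(chunk: str) -> str:
        # Stand-in "summarizer" that keeps only the first sentence of a chunk.
        return split_sentences(chunk)[0]

    print(summarize_long_document(doc, first_sentence, max_tokens=10))
    # prints: The seller delivers the goods. Either party may terminate.
```

Greedy packing keeps sentences intact and in order, which matters for legal text where splitting a clause mid-sentence can change its meaning; the trade-off is that chunks are not perfectly balanced in size.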








⏳ Load the sample text

In [None]:
text = """
This Share Exchange Agreement (the ""Agreement"") is made and entered by and between 1536692 Ontario Inc. an Ontario Corporation, duly registered by the laws of Ontario, Canada and in good standing (“Ontario”), the shareholders of Ontario (the “Ontario Shareholders”) and Fox Petroleum Inc. (“Fox”), a Nevada corporation also duly registered and in good standing, effective as of July 15, 2010.  
A. ONTARIO is the owner of a Scrap Plastic processing plant with certain equipment, fixtures, and improvements, the assets, located in Xxxxxxxx, Xxxxxxx Xxxxxx, 
B. The ONTARIO Shareholders desire to exchange their shares of common stock held of record in ONTARIO representing 100% of the total issued and outstanding shares of ONTARIO and FOX desires to issue an aggregate of ________ shares of its common stock to the ONTARIO Shareholders;
X. XXX further desires to assume an aggregate debt in the amount of $225,000.00 incurred by ONTARIO to Davfam Investments (1998) Ltd. during fiscal year 1995, as reflected in the financial books and records of ONTARIO (the “Ontario Debt”), which Ontario Debt has verbally established conversion terms;  
Now, therefore, in consideration of the parties' covenants and promises contained in this Agreement, and for other good and valuable consideration, the receipt and sufficiency of which the parties acknowledge the parties agree:  
1. Definitions. The following defined terms, wherever used in this Agreement, shall have the meanings described below: 
1.1 “Assets” means and includes: (a) the consulting business and operations
1.2 “Shares” means all of the shares currently held by the shareholders of ONTARIO Polymers Inc.
1.3 ""Closing"" means the delivery of documents to be executed and delivered by the parties, the deposit and delivery of the Purchase Price, as defined in this Agreement, and the consummation of the transactions contemplated under this Agreement.
1.4 ""Closing Date"" means the date on which the Closing occurs as provided in Section 6.1.
1.5 ""Data"" means environmental, title and other information, data and reports in ONTARIO’s possession or control relating to the Land and the Permits. 
1.6 ""Effective Date"" means July 13, 2010 and as fully executed by ONTARIO and FOX.
1.7 “Land” means the site, where the fixtures and improvements are located described in Exhibit A, Part 1, 
2.1 ONTARIO Shareholders. Subject to all of the terms and conditions of this Agreement, the ONTARIO Shareholders agree to tender their respective shares of ONTARIO held of record to FOX.
2.2 Issuance of Shares. FOX agrees to issue to the ONTARIO Shareholders an aggregate of One Million Seven Hundred and Fifty Thousand (1,750,000) shares of its common stock to the ONTARIO Shareholders and to assume the ONTARIO Debt. 
2.3 Assumption and Performance of Permits. ONTARIO shall continue to maintain and keep current all operating permits and licenses required to operate Plastics recycling Facility 
3. Representations and Warranties of ONTARIO. ONTARIO represents and warrants to FOX the following:
3.1 Organization and Authorization. ONTARIO is a corporation duly organized and validly existing and in good standing under the laws of the Ontario. ONTARIO has the full power and authority to enter into this Agreement and to consummate the transactions contemplated under this Agreement. The making and performance of this Agreement and the agreements and other instruments required to be executed by ONTARIO have been, or at the Closing will have been, duly authorized by all necessary corporate actions and will be duly executed by a person authorized by ONTARIO to do so. ONTARIO shall deliver to FOX duly approved and executed resolutions of the directors and shareholders approving ONTARIO’s execution and delivery of this Agreement and the performance of its obligations under this Agreement.
3.2 No Breach of Laws or Contracts. The consummation by ONTARIO of the transactions contemplated by this Agreement will not result in the breach of any term or provision of, or constitute a default under any applicable law or regulation, its articles of organization or operating agreement, or under any other agreement or instrument to which ONTARIO is a party, by which it is bound, or which affects the Assets. 
3.3 Binding Obligations. When executed and delivered, this Agreement and all instruments executed and delivered by ONTARIO pursuant to this Agreement will constitute legal and binding obligations of ONTARIO and will be valid and enforceable in accordance with their respective terms. 
3.4 Compliance with Laws. ONTARIO has not received notice from any governmental agency, of any physical or environmental condition existing on the Land or any access to the Land or created by ONTARIO or of any action or failure to act by ONTARIO which is a material violation of any applicable law, regulation or ordinance. To ONTARIO’s knowledge, there are currently no off-site improvement requirements that any governmental authority has imposed or threatened to impose on the Land. 
3.5 No Litigation. There is no suit, action, arbitration or legal, administrative or other proceeding or governmental investigation pending or, to the knowledge of ONTARIO without inquiry, threatened against, or affecting the Assets or the ability of ONTARIO to perform its covenants and obligations under this Agreement. 
3.6.1 Title to the Land. ONTARIO represents and warrants that ONTARIO’s title to the Assets is good and marketable and on the Closing shall be free and clear of any lien, claim or encumbrance, except the following (the “Permitted Exceptions”): 
(a) Liens for taxes and mortgages acknowledged by FOX on the Assets not yet due and payable or which are being contested in good faith; 
(b) Any items listed in the Title Commitment or any amendment or update to the Title Commitment to which FOX does not timely deliver to ONTARIO a Notice of Objection pursuant to Section 3.9.5. 
3.6.2 Encroachments. To ONTARIOS’s knowledge, the improvements on the Land lie entirely within the boundaries of the Land and no structure of any kind encroaches on or over the Land.
3.6.3 Condemnation. To ONTARIO’s knowledge, no portion of any of the Land or improvements on the Land is the subject of, or affected by, any condemnation or eminent domain proceeding.
3.6.5 Taxes. ONTARIO represents that all taxes, including without limitation, advalorem, property (both real and personal), production, severance, reclamation, and similar taxes and assessments based upon or measured by ownership of property or production of minerals or the receipt of proceeds there from which have become due and payable have been properly paid. FOX will not be liable for any taxes which accrue or are assessed before the Closing. To ONTARIOs’s knowledge, there are no pending or threatened special assessments affecting the Assets. 
4. Representations and Warranties of FOX. FOX agrees, represents and warrants to ONTARIO the following: 
4.1 No Breach of Law or Contracts. The consummation by FOX of the transactions contemplated by this Agreement will not result in a breach of any term or provision of, or constitute a default under any applicable law, regulation or ordinance or any other agreement or instrument to which FOX is a party or by which it is bound. 
4.2 Binding Obligations. When executed and delivered this Agreement and all instruments executed by FOX pursuant to this Agreement, will constitute legal and binding obligations of FOX and will be valid and enforceable in accordance with their respective terms. 
4.3 No Litigation. There is no suit, action, arbitration or legal, administrative or other proceeding or governmental investigation pending or, to the knowledge of FOX without inquiry, threatened against, or affecting the Assets or the ability of FOX to perform its covenants and obligations under this Agreement. 
4.4 Brokers. FOX has incurred no liability, contingent or otherwise, for broker's or finder's fees relating to the transactions contemplated by this Agreement.
4.5 Assumption of Ontario Debt. FOX agrees to assume the ONTARIO Debt and to further do and perform all acts and execute and deliver all documents and take all such other steps as may be necessary or desirable to give full effect to the repayment terms of the ONTARIO Debt.
4.6 Organization and Authorization. FOX is a corporation duly organized and validly existing and in good standing under the laws of Nevada. FOX has the full power and authority to enter into this Agreement and to consummate the transactions contemplated under this Agreement. The making and performance of this Agreement and the agreements and other instruments required to be executed by FOX have been, or at the Closing will have been, duly authorized by all necessary corporate actions and will be duly executed by a person authorized by FOX to do so. 
5.1.1 Maintenance of Property. Until the Closing, ONTARIO shall cause the Assets to be maintained and operated in a good and workmanlike manner, shall not partition the Assets,shall maintain insurance now in force with respect to the Assets, shall pay or cause to be paid all costs and expenses incurred in connection with this Agreement, shall keep the Underlying Agreements in full force and effect, and shall perform and comply with all of the conditions and covenants contained in same and all other agreements relating to the Assets. 
5.1.3 Copies of Agreements. ONTARIO has disclosed to FOX the existence of and has furnished FOX with copies of all agreements and contracts relating to the Assets, to the extent that MDP is aware of the existence of such agreements and contracts. 
5.1.4 Notification of FOX of Suits, Litigation, Material Adverse Change, Etc. Until the Closing, ONTARIO promptly shall notify FOX of any suit, action, or other proceeding, actual or threatened, before any court, governmental agency or arbitrator and any cause of action or any other adverse change which relates to the Assets or which might result in impairment or loss of ONTARIO's title to any portion of the Assets or the value of the Assets or which might hinder or impede the operation of the Assets or which seeks to restrain or prohibit or to obtain substantial damages from ONTARIO in respect of, or which is related to or arises out of, this Agreement or the consummation of all or any part of the transactions contemplated under this Agreement of which ONTARIO becomes aware.
5.1.5 Agreement Not to Market the Assets. Until the Closing and thereafter if the Closing occurs, ONTARIO shall not assign, transfer, encumber or in any way dispose of any interest in or to the Shares to any other person or entity, or negotiate with any other person or entity with respect to the transfer or grant of any interest or option whatsoever in the Assets, except that ONTARIO may continue to sell aggregate, sand and gravel from the Assets in the ordinary course of ONTARIO’s business. These obligations of ONTARIO shall terminate before the Closing if and at such time as this Agreement is terminated as provided in Section 8.  
5.1.7 Permits and Underlying Agreements. ONTARIO shall maintain the Permits and Underlying Agreements in full force and effect. 
5.2.1 Maintenance and Confidentiality of Data. Before the Closing, FOX shall exercise due diligence in safeguarding and maintaining all Data and keeping the Data confidential, except for such disclosure as reasonably deemed necessary by FOX for purposes of obtaining financing and such disclosures as counsel for either party may advise is legally required or an announcement which is required to be made to all governmental or regulatory agency, in which cases ONTARIO shall be given reasonable advance notice and the right to review and comment on same. If the Closing does not occur, FOX’s obligation to maintain the confidentiality Data shall survive termination of this Agreement. 
5.2.2 Maintenance of Representations and Warranties. FOX shall use its reasonable best efforts to cause all of the representations and warranties of FOX contained in this Agreement to be true and correct as of the Closing; provided, however, that nothing contained in this Section shall create an obligation of FOX to ONTARIO to pay money or undertake any additional legal obligation. 
6.1 Date and Place of Closing. The parties will execute and deliver to each other a signed counterpart or copy of this Agreement as escrow instructions and such general conditions of escrow as requires. In the event of any conflict between the terms of this Agreement and the general conditions of the closing, the terms of this Agreement shall control. The Closing shall be held at a time mutually agreed upon by ONTARIO and FOX on the Closing Date, unless extended by the parties' agreement. The Closing will be held at the offices of the ONTARIO. The Closing shall occur on or before June 25th, 2010.
"""

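## Pipeline to split the text into sentences and tokens

Tokenization is the NLP task of splitting sentences into smaller pieces (tokens), usually words and punctuation marks. Although it is a splitting task, it is called tokenization rather than splitting.

In the pipeline below, `legal.TextSplitter` first splits the document into sentences (with `"\n\n"` added as an extra boundary through `setCustomBounds`), and the `Tokenizer` then splits each sentence into tokens. Each token's metadata records the index of the sentence it belongs to, which the token counting further down relies on.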
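To make the idea concrete, here is a rough regular-expression tokenizer in plain Python. It is only an approximation for illustration: Spark NLP's `Tokenizer` applies its own configurable rules (for quotes, abbreviations and so on), so its output on real text can differ.

```python
import re
from typing import List

# Illustrative pattern (assumption): a run of word characters, or any single
# character that is neither a word character nor whitespace (punctuation).
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(sentence: str) -> List[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(sentence)


if __name__ == "__main__":
    print(simple_tokenize("Fox Petroleum Inc. is a Nevada corporation."))
    # ['Fox', 'Petroleum', 'Inc', '.', 'is', 'a', 'Nevada', 'corporation', '.']
```

One visible difference: this sketch splits curly quotes off as separate tokens (`“`, `Ontario`, `”`), whereas the Spark NLP output shown below keeps `“Ontario”` together.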

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

text_splitter = legal.TextSplitter() \
    .setInputCols(["document"])\
    .setOutputCol("sentences")\
    .setCustomBounds(["\n\n"])\
    .setExplodeSentences(True)

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentences"]) \
    .setOutputCol("tokens")

nlp_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    text_splitter,
    tokenizer])

df = spark.createDataFrame([[text]]).toDF("text")

fit = nlp_pipeline.fit(df)
lp = nlp.LightPipeline(fit)

res = lp.fullAnnotate(text)

In [None]:
sentences = res[0]['sentences']
tokens = res[0]['tokens']

for i in range(len(sentences)):
  sen_tokens = [x.result for x in tokens if x.metadata['sentence'] == str(i)]
  print(f"Sentence: {sentences[i].result}")
  print(f"Tokens {len(sen_tokens)}: {sen_tokens}")
  print("*"*250)

Sentence: This Share Exchange Agreement (the ""Agreement"") is made and entered by and between 1536692 Ontario Inc. an Ontario Corporation, duly registered by the laws of Ontario, Canada and in good standing (“Ontario”), the shareholders of Ontario (the “Ontario Shareholders”) and Fox Petroleum Inc.
Tokens 54: ['This', 'Share', 'Exchange', 'Agreement', '(', 'the', '""', 'Agreement', '"")', 'is', 'made', 'and', 'entered', 'by', 'and', 'between', '1536692', 'Ontario', 'Inc', '.', 'an', 'Ontario', 'Corporation', ',', 'duly', 'registered', 'by', 'the', 'laws', 'of', 'Ontario', ',', 'Canada', 'and', 'in', 'good', 'standing', '(', '“Ontario”', '),', 'the', 'shareholders', 'of', 'Ontario', '(', 'the', '“Ontario', 'Shareholders”', ')', 'and', 'Fox', 'Petroleum', 'Inc', '.']
*******************************************************************************************************************************************************************************************************************************

In [None]:
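sentences = res[0]['sentences']
tokens = res[0]['tokens']

MAX_TOKENS = 300  # token budget per chunk

total_token = 0       # number of tokens in the chunk being built
total_sentences = ""  # text of the chunk being built
final_sentences = []  # finished chunks

for i in range(len(sentences)):
  # tokens that belong to the i-th sentence
  sen_tokens = [x.result for x in tokens if x.metadata['sentence'] == str(i)]
  # if adding this sentence would exceed the budget, close the current chunk first
  if total_sentences and total_token + len(sen_tokens) > MAX_TOKENS:
    print(f"Tokens: {total_token}")
    print(f"Sentences: {total_sentences}")
    print("*"*250)
    final_sentences.append(total_sentences)
    total_token = 0
    total_sentences = ""
  # append the sentence, separated from the previous one by a space
  total_sentences = f"{total_sentences} {sentences[i].result}".strip()
  total_token += len(sen_tokens)

# keep the last, partially filled chunk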
final_sentences.append(total_sentences)


Tokens: 277
Sentences: This Share Exchange Agreement (the ""Agreement"") is made and entered by and between 1536692 Ontario Inc. an Ontario Corporation, duly registered by the laws of Ontario, Canada and in good standing (“Ontario”), the shareholders of Ontario (the “Ontario Shareholders”) and Fox Petroleum Inc.(“Fox”), a Nevada corporation also duly registered and in good standing, effective as of July 15, 2010.A. ONTARIO is the owner of a Scrap Plastic processing plant with certain equipment, fixtures, and improvements, the assets, located in Xxxxxxxx, Xxxxxxx Xxxxxx, 
B. The ONTARIO Shareholders desire to exchange their shares of common stock held of record in ONTARIO representing 100% of the total issued and outstanding shares of ONTARIO and FOX desires to issue an aggregate of ________ shares of its common stock to the ONTARIO Shareholders;X. XXX further desires to assume an aggregate debt in the amount of $225,000.00 incurred by ONTARIO to Davfam Investments (1998) Ltd. during fi
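
The chunking cell above greedily packs consecutive sentences until adding the next one would exceed a ~300-token budget. Below is a minimal, Spark-free sketch of the same idea. It assumes each sentence arrives paired with a precomputed token count, and the name `pack_sentences` is hypothetical (it is not part of the John Snow Labs API). Unlike the notebook cell, it joins sentences with a space and never emits an empty chunk.

```python
from typing import List, Tuple


def pack_sentences(sentences: List[Tuple[str, int]], max_tokens: int = 300) -> List[str]:
    """Greedily pack consecutive (text, token_count) pairs into chunks.

    Each chunk stays within max_tokens, except that a single sentence longer
    than max_tokens becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for text, n_tokens in sentences:
        # flush the current chunk if this sentence would push it over budget
        if current and current_tokens + n_tokens > max_tokens:
            chunks.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += n_tokens
    # keep the last, partially filled chunk
    if current:
        chunks.append(" ".join(current))
    return chunks
```

In this notebook the `(text, token_count)` pairs could be built from the `sentences` and `tokens` annotations used above.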

In [None]:
final_sentences

['This Share Exchange Agreement (the ""Agreement"") is made and entered by and between 1536692 Ontario Inc. an Ontario Corporation, duly registered by the laws of Ontario, Canada and in good standing (“Ontario”), the shareholders of Ontario (the “Ontario Shareholders”) and Fox Petroleum Inc.(“Fox”), a Nevada corporation also duly registered and in good standing, effective as of July 15, 2010.A. ONTARIO is the owner of a Scrap Plastic processing plant with certain equipment, fixtures, and improvements, the assets, located in Xxxxxxxx, Xxxxxxx Xxxxxx, \nB. The ONTARIO Shareholders desire to exchange their shares of common stock held of record in ONTARIO representing 100% of the total issued and outstanding shares of ONTARIO and FOX desires to issue an aggregate of ________ shares of its common stock to the ONTARIO Shareholders;X. XXX further desires to assume an aggregate debt in the amount of $225,000.00 incurred by ONTARIO to Davfam Investments (1998) Ltd. during fiscal year 1995, as r

In [None]:
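document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

# Pretrained legal summarization model (legsum_flant5_legal_augmented)
flant5 = legal.Summarizer().pretrained('legsum_flant5_legal_augmented','en','legal/models')\
    .setInputCols(["documents"])\
    .setOutputCol("summary")\
    .setMaxTextLength(512)\
    .setMaxNewTokens(512)

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

# Every stage is pretrained, so fitting on a one-row placeholder DataFrame only builds the PipelineModel
data = spark.createDataFrame([[1, " "]]).toDF('id', 'text')

model = pipeline.fit(data)
# LightPipeline runs the fitted model directly on Python strings, without building DataFrames
light_model = nlp.LightPipeline(model)

# Summarize each ~300-token chunk
light_result = light_model.annotate(final_sentences)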

In [None]:
light_result

[{'documents': ['This Share Exchange Agreement (the ""Agreement"") is made and entered by and between 1536692 Ontario Inc. an Ontario Corporation, duly registered by the laws of Ontario, Canada and in good standing (“Ontario”), the shareholders of Ontario (the “Ontario Shareholders”) and Fox Petroleum Inc.(“Fox”), a Nevada corporation also duly registered and in good standing, effective as of July 15, 2010.A. ONTARIO is the owner of a Scrap Plastic processing plant with certain equipment, fixtures, and improvements, the assets, located in Xxxxxxxx, Xxxxxxx Xxxxxx, \nB. The ONTARIO Shareholders desire to exchange their shares of common stock held of record in ONTARIO representing 100% of the total issued and outstanding shares of ONTARIO and FOX desires to issue an aggregate of ________ shares of its common stock to the ONTARIO Shareholders;X. XXX further desires to assume an aggregate debt in the amount of $225,000.00 incurred by ONTARIO to Davfam Investments (1998) Ltd. during fiscal 

In [None]:
# Each result stores the generated summary as a one-element list under "summary"
final_sentences = [item['summary'][0] for item in light_result]
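
`LightPipeline.annotate` returns a single dict when given one string and a list of dicts when given a list of strings, and each output column holds a list of strings. The hypothetical helper below (not a library function) normalizes both shapes into a flat list of summaries. It assumes the output column is named `summary`, as in the pipeline above, and maps an empty result to an empty string.

```python
from typing import Dict, List, Union

Annotation = Dict[str, List[str]]


def summaries_from_annotation(result: Union[Annotation, List[Annotation]],
                              key: str = "summary") -> List[str]:
    """Return the first string of `key` for each annotated input."""
    # a single-string input to annotate() yields one dict instead of a list
    results = [result] if isinstance(result, dict) else result
    return [r[key][0] if r[key] else "" for r in results]
```

For example, `summaries_from_annotation(light_result)` would produce the same list as the cell above.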

In [None]:
final_sentences

["This agreement is between Ontario Inc., an Ontario Corporation, and Fox Petroleum Inc., a Nevada Corporation. Ontario is the owner of a Scrap Plastic processing plant with certain equipment, fixtures, and improvements, the assets, located in Xxxxxxxx, Xxxxxxx Xxxxxx, B. The Ontario Shareholders desire to exchange their shares of common stock held of record in Ontario representing 100% of the total issued and outstanding shares of Ontario and Fox desires to issue an aggregate of ________ shares of Fox's common stock to the Ontario Shareholders. Fox also desires to assume an aggregate debt in the amount of $225,000.00 incurred by Ontario to Davfam Investments (1998) Ltd. during fiscal year 1995.",
 'This legal agreement outlines the consulting business and operations of Ontario Polymers Inc. The agreement outlines the delivery of documents, deposit and delivery of the Purchase Price, and the consummation of the transactions contemplated under the agreement. It also outlines the date on

In [None]:
def summa(final_sentences, groupcount=2):
    """Join every `groupcount` consecutive summaries into one text."""
    groups = [" ".join(final_sentences[i:i+groupcount]) for i in range(0, len(final_sentences), groupcount)]
    for group in groups:
        print(group)
    return groups

# Summarize pairs of summaries repeatedly until a single summary remains
while len(final_sentences) > 1:
    print("The Number of Sentences:", len(final_sentences))
    print()
    final_sentences = summa(final_sentences, groupcount=2)
    print("Number of Sentences After Grouping:", len(final_sentences))

    final_summary = light_model.annotate(final_sentences)
    # "summary" is a one-element list; keep the string so the next round joins text, not list reprs
    final_sentences = [result["summary"][0] for result in final_summary]
    print()
    print("Number of Sentences After Annotate:", len(final_sentences))
    print()
    print("<<< SUMMARY of the SENTENCES >>>\n", final_sentences)
    print()
    print("<>" * 50)

if len(final_sentences) == 1:
    # a single string input makes annotate() return one dict rather than a list
    final_summary = light_model.annotate(final_sentences[0])
    print("The final summary of the long document inserted into the Pipeline:")
    print("<>" * 50)
    print(final_summary["summary"][0])


The Number of Sentences: 8

This agreement is between Ontario Inc., an Ontario Corporation, and Fox Petroleum Inc., a Nevada Corporation. Ontario is the owner of a Scrap Plastic processing plant with certain equipment, fixtures, and improvements, the assets, located in Xxxxxxxx, Xxxxxxx Xxxxxx, B. The Ontario Shareholders desire to exchange their shares of common stock held of record in Ontario representing 100% of the total issued and outstanding shares of Ontario and Fox desires to issue an aggregate of ________ shares of Fox's common stock to the Ontario Shareholders. Fox also desires to assume an aggregate debt in the amount of $225,000.00 incurred by Ontario to Davfam Investments (1998) Ltd. during fiscal year 1995.This legal agreement outlines the consulting business and operations of Ontario Polymers Inc. The agreement outlines the delivery of documents, deposit and delivery of the Purchase Price, and the consummation of the transactions contemplated under the agreement. It also
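
The loop above is a hierarchical (map-reduce style) summarization: neighbouring summaries are joined in pairs and summarized again until one text remains, so 8 chunks take 3 rounds (8 → 4 → 2 → 1). The sketch below isolates that loop from Spark NLP. `reduce_summaries` is a hypothetical name, and `summarize` is assumed to be any callable that maps a list of texts to a list of summaries of the same length.

```python
from typing import Callable, List


def reduce_summaries(texts: List[str],
                     summarize: Callable[[List[str]], List[str]],
                     group_size: int = 2) -> str:
    """Join texts in groups of `group_size` and summarize each group,
    repeating until a single text remains."""
    if not texts:
        raise ValueError("need at least one text")
    if group_size < 2:
        raise ValueError("group_size must be >= 2 so the list shrinks each round")
    while len(texts) > 1:
        groups = [" ".join(texts[i:i + group_size]) for i in range(0, len(texts), group_size)]
        texts = summarize(groups)
    return texts[0]
```

With the notebook's model, `summarize` could be `lambda batch: [r["summary"][0] for r in light_model.annotate(batch)]`. A larger `group_size` means fewer rounds but longer inputs per call, which may then be cut by the model's maximum input length.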