![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/spark-nlp-workshop/blob/master/finance-nlp/12.1.Financial_Summarization.ipynb)

#🎬 Installation

In [None]:
! pip install -q johnsnowlabs

##🔗 Automatic Installation
Using my.johnsnowlabs.com SSO

In [None]:
from johnsnowlabs import nlp, finance, legal

nlp.install(refresh_install=True, force_browser = True)

##🔗 Manual downloading
If you are not registered in my.johnsnowlabs.com, received your license via email, or are using Safari, you may need to upload the license manually.

- Go to my.johnsnowlabs.com
- Download your license
- Upload it using the following command

In [None]:
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

- Install it

#📌 Starting

In [None]:
spark = nlp.start()

#🔎 Financial Summarization

📜Explanation:

Financial Summarization is the process of generating a concise and informative summary of financial documents, such as annual reports, financial statements, earnings transcripts, and news articles related to finance. John Snow Labs, a leading provider of natural language processing tools and technologies, offers a Financial Summarization solution that utilizes state-of-the-art deep learning algorithms to automatically extract and summarize key information from financial texts.

With the new `finance.Summarizer()` module, you can generate short versions of your financial documents that aim to preserve the key information.

We included 2 models for Financial Summarization:

  - **Financial FLAN-T5 Summarization (Base):** The base model, with generic capacities for summarizing financial documents.
  - **Financial Finetuned FLAN-T5 Summarization (SEC 10-K Filings):** A model fine-tuned specifically to summarize sections of financial reports. For this task, we fine-tuned our base model on more than 8K sections from different SEC financial reports.



### Let's see how to get summaries in different Finance documents using the `Summarizer()` module.


## 🧮 Suspicious Activity Report

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

flant5 = finance.Summarizer().pretrained('finsum_flant5_base','en','finance/models')\
    .setInputCols(["documents"])\
    .setOutputCol("summary")\
    .setMaxNewTokens(1000)

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

data = spark.createDataFrame([
  [1, """Description of Activity:
  
On [Date], [Name of Business] submitted a loan application for a large sum of money. The loan officer noted that the application contained several red flags that raised suspicions of possible fraudulent activity.

Firstly, the business provided minimal documentation to support their financial statements, such as tax returns or bank statements. Secondly, the business listed a residential address as their place of business, which appeared to be a private residence. Additionally, the business provided inconsistent information regarding their ownership structure and the intended use of the loan proceeds.

Further investigation revealed that the business had no visible online presence, including a lack of a website, social media accounts, or business reviews. The loan officer also discovered that the business had only been in operation for a short period, despite their claims of significant revenue and growth.

Based on these findings, it is suspected that [Name of Business] may be engaging in fraudulent activity and using the loan to perpetrate such activity. Therefore, we recommend that this loan application be denied, and further investigation be conducted to determine if any additional suspicious activity has occurred."""]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)

finsum_flant5_base download started this may take some time.
[OK!]
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                                                                                        

## 🧮 Responsibility Reports

In [None]:
data = spark.createDataFrame([
  [2, """Lost Time Incident Rate: 

  The lost time incident rate per 200,000 hours worked in 2021 was 0.14, which decreased by 17.6% compared to 2020 (0.17) and decreased by 70.8% compared to 2019 (0.48). The decrease in the lost time incident rate can be attributed to the company's efforts to improve workplace safety and implement effective risk management strategies. 
  
  The total Scope 2 GHG emissions in 2021 were 688,228 tonnes, which remained relatively stable compared to 2020. The company's efforts to transition to renewable energy sources have helped to minimize Scope 2 GHG emissions."""]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

## 🧮 Broker Reports

In [None]:
data = spark.createDataFrame([
  [3, """Broker Report: Company XYZ

Introduction:
Company XYZ is a leading player in the technology industry that has released its financial results for the fiscal year 2022. The company has reported significant improvements in its cash flow operations, free cash flow, and loss reduction. This report aims to analyze these improvements and provide insights into the future prospects of the company.

Cash Flow Operations:
Company XYZ's cash flow operations have shown significant improvement over the fiscal year 2022. The net cash flow from operating activities has increased by 15% compared to the previous year. This improvement is primarily due to the increase in sales and effective management of accounts receivable and accounts payable. The company has also reduced its inventory levels, resulting in a reduction of cash outflows from operating activities.

Free Cash Flow:
Company XYZ's free cash flow has also increased by 20% over the fiscal year 2022. This increase is primarily due to the improvement in cash flow operations and the reduction in capital expenditures. The company has been able to generate positive free cash flow for the third consecutive year. This is a significant achievement for the company and shows its commitment to improving its financial position."""]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)

+--------------------------------------------------------------------------------------------------------------------+
|result                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------+
|[Company XYZ has reported significant improvements in its cash flow operations, free cash flow, and loss reduction.]|
+--------------------------------------------------------------------------------------------------------------------+



## 🧮 SEC10K

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

flant5 = finance.Summarizer().pretrained('finsum_flant5_finetuned_sec10k','en','finance/models')\
    .setInputCols(["documents"])\
    .setOutputCol("summary")\
    .setMaxNewTokens(1000)

pipeline = nlp.Pipeline(stages=[document_assembler, flant5])

data = spark.createDataFrame([
  [4, """Report on Form 10-K.
Moreover, we operate in a very competitive and rapidly changing environment. New risks and uncertainties emerge from time to time, and it is not possible for us to predict all risks and uncertainties that could have an impact on the forward-looking statements contained in this Annual Report on Form 10-K. We cannot assure you that the results, events, and circumstances reflected in the forward-looking statements will be achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking statements.
The forward-looking statements made in this Annual Report on Form 10-K relate only to events as of the date on which the statements are made. We undertake no obligation to update any forward-looking statements made in this Annual Report on Form 10-K to reflect events or circumstances after the date of this Annual Report on Form 10-K or to reflect new information or the occurrence of unanticipated events, except as required by law. We may not actually achieve the plans, intentions, or expectations disclosed in our forward-looking statements, and you should not place undue reliance on our forward-looking statements. Our forward-looking statements do not reflect the potential impact of any future acquisitions, mergers, dispositions, joint ventures, or investments we may make.
SUMMARY OF RISK FACTORS
Below is a summary of the principal factors that
could materially harm our business, operating results and/or financial condition, impair our future prospects and/or cause the price of our Class A common stock to decline.
This summary does not address all of the risks that we face. Additional discussion of the risks summarized in this risk factor summary, and other risks that we face, can be found below under the heading “Risk Factors” and should be carefully considered, together with other information in this Form 10-K and our other filings with the Securities and Exchange Commission ("SEC") before making an investment decision regarding our Class A common stock."""]
]).toDF('id', 'text')

results = pipeline.fit(data).transform(data)

results.select("summary.result").show(truncate=False)

finsum_flant5_finetuned_sec10k download started this may take some time.
[OK!]
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                              

## 🏓**Long Document Summarization**

 For long document summarization, the following steps can be taken:

1.   **Splitting the Document**: The initial step is to split the long document into smaller chunks. This is done to ensure that each chunk stays within the token limit imposed by the model and tokenizer. The document can be divided into paragraphs, sections, or any logical divisions based on the content.
2.   **Token Limit**: The text is first split into sentences. Starting with an empty container, the sentences are added one by one, in the order they appear in the original text. After each sentence is added, the size of the accumulated text is compared with a specified limit (the code in this notebook counts tokens and uses a limit of 300). Once the next sentence would push the total past the limit, the accumulated text is closed as one chunk and a new chunk is started from that sentence.
3.   **Applying Summarization**: After the document is divided into manageable chunks, summarization is applied to each chunk independently. This can be done using the chosen summarization model or technique. The aim is to generate concise summaries for each chunk, capturing the essential information within the given constraints.
4.   **Merging and Refining**: Once summaries are generated for each individual chunk, they can be merged and refined to form a coherent and comprehensive summary of the entire long document. This can involve removing redundant information, resolving inconsistencies, and ensuring smooth transitions between different chunks.
5.   **Iterative Process:** If the summary generated from the merged chunks exceeds the desired length or does not meet the desired quality, the process can be iterated. This might involve adjusting the splitting strategy, revising the summarization model parameters, or applying additional techniques to improve the summary.
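
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's pipeline: `pack_sentences` and `summarize_long` are names introduced here, and `summarize` and `count_tokens` are hypothetical callables you would supply (for example, a wrapper around a Spark NLP `LightPipeline` and a token counter). The default limit of 300 mirrors the value used later in this notebook.

```python
from typing import Callable, List


def pack_sentences(sentences: List[str],
                   count_tokens: Callable[[str], int],
                   limit: int) -> List[str]:
    """Greedily pack consecutive sentences into chunks of at most `limit` tokens.

    A single sentence longer than `limit` still becomes its own (oversized) chunk.
    """
    chunks, current, used = [], [], 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Close the current chunk when the next sentence would overflow it.
        if current and used + n > limit:
            chunks.append(" ".join(current))
            current, used = [], 0
        current.append(sentence)
        used += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(sentences: List[str],
                   summarize: Callable[[str], str],
                   count_tokens: Callable[[str], int],
                   limit: int = 300,
                   max_rounds: int = 5) -> str:
    """Summarize chunk by chunk, then re-summarize the merged summaries
    until they fit in a single chunk (or `max_rounds` is reached)."""
    pieces = list(sentences)
    for _ in range(max_rounds):
        chunks = pack_sentences(pieces, count_tokens, limit)
        pieces = [summarize(chunk) for chunk in chunks]
        if len(pieces) == 1:
            break
    return " ".join(pieces)
```

The `max_rounds` cap is a design choice for this sketch: it prevents an endless loop if the summaries never shrink enough to fit in one chunk.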








⏳ Load sample txt file

In [None]:
text = """
Our Community Intelligence capability, which extends across our BSM platform, provides information to Coupa customers by applying artificial intelligence-powered analysis to the structured, normalized data collected from the comprehensive set of business spend transactions that have occurred on the Coupa platform. This innovative analysis provides Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk based on best practices from the Coupa community. Participating customers are able to contribute to and benefit from Community Intelligence, with use cases spanning various areas of spend management, including: Supplier Insights and Supplier Risk Management which help companies evaluate and reduce the risk levels of suppliers; operational insights which help businesses measure their own performance on key operational metrics against other Coupa customers and follow best practices to drive efficiency and savings, Commodity and Procurable Insights which help companies identify spend consolidation and savings opportunities, and Spend Guard, which leverages artificial intelligence on behavior patterns to automatically surface potential errors and fraud across all business spend. Rapid time to value through fast deployment cycles and low cost of ownership of a cloud-based model. Opportunity to achieve significant and sustainable savings that can translate into improved profitability. High employee adoption of our easy-to-use BSM platform, which enables better visibility into spend, allowing both procurement and sourcing professionals to better manage their time. Strong supplier adoption as suppliers are motivated to join our network due to ease of enablement, flexibility, and lack of supplier fees. Access to extensive spending data in real-time, which leads to superior decision-making that can result in significant cost savings. 
Ability to stay agile and adapt to changes in operating and regulatory environments with our easily configurable platform. Process efficiency improvements that allow businesses to free up valuable resources and staff who can be deployed effectively elsewhere in the organization. Enhanced compliance with governmental regulations through greater auditability, documentation and control of spending activity. Intuitive and simple user experience that shields users from complexity and enables adoption of our platform with minimal training. Efficiency improvements as employees are more rapidly able to procure the goods and services they need to fulfill their job responsibilities. Convenience to employees, as our platform gathers data on historical activity and leverages the insights to help populate requests and minimize data entry. Participating in our Coupa Open Business Network, which allows suppliers to display their information and catalog of products and services on our platform for existing and prospective customers. Fast registration process and flexibility to interact with customers through the Coupa Supplier Portal, direct integration or simply by use of direct email. Elimination of manual processes and efficiency improvements through electronic invoicing and streamlined procurement and payment processes. Real-time visibility into invoice status, often through direct push notifications without having to log in to a portal. Seamless audit, documentation and archiving of electronic purchase orders and invoices that helps suppliers comply with changing government regulations, as well as avoid risks. We sell our software applications through our direct sales organization and our partner program, Coupa Partner Connect. Our direct sales team is global and comprised of inside sales and field sales personnel who are organized by geography, account size, and application type. 
We generate customer leads, accelerate sales opportunities, and build brand awareness through our marketing programs, including such programs with our strategic relationships. For example, we have joint marketing programs and sponsorship agreements with KPMG, Deloitte, and Accenture. our annual Coupa Inspire conferences which are held in multiple jurisdictions and over multiple days to connect customers, disseminate best practices, and reinforce our brand among existing and new customers. As a result of the COVID-19 pandemic, we replaced our 2020 in-person Inspire conference with web-based events for our customers, prospects, and partners; development of our ideal customer profile (ICP), which helps identify the accounts with the highest propensity to buy, for each of our sales segments; programmatic account-based marketing and field efforts in close partnership with sales to target the ICP accounts in our respective sales segments; territory development representatives who respond to incoming leads to convert them into new sales opportunities; participation in, and sponsorship of, user conferences, executive events, trade shows, and industry events; integrated marketing campaigns, including direct e-mail, online web advertising, blogs, and webinars; cooperative marketing efforts with partners, including joint press announcements, joint trade show activities, channel marketing campaigns, and joint seminars; use of our website to provide application and company information, as well as learning opportunities for potential customers. In May 2020, we acquired all of the equity interest in ConnXus, Inc. (“ConnXus”), a cloud-based supplier relationship management platform that enables enterprises, health systems and government agencies to monitor all aspects of their supplier diversity compliance programs. 
The purchase consideration was approximately $10.0 million in cash of which approximately $1.4 million In June 2020, we acquired all of the equity interest in Bellin Treasury International GmbH (“Bellin”), a cloud-based treasury management software platform that improves visibility and control over cash and optimizes treasury processes. The purchase consideration was approximately $121.0 million, comprised of $79.1 million in cash (of which $8.0 million is being held in escrow for eighteen months after the transaction closing date) and 186,300 shares of our common stock with a fair value of approximately $41.8 million as of the transaction close date. In September 2020, we acquired all of the equity interest in Much-Net GmbH ("Much-Net"), a financial instrument software and service provider that specializes in risk management. The purchase consideration was approximately $4.3 million in cash, which is net of $1.8 million in cash acquired. In November 2020, we completed the acquisition of Laurel Parent Holdings, Inc. and its subsidiaries ("LLamasoft"), a supply chain design and analysis software and solutions company. The acquisition strengthens Coupa’s supply chain capabilities, enabling businesses to drive greater value through Business Spend Management. In connection with the acquisition, we issued approximately 2.4 million shares of our common stock and paid aggregate cash of approximately $791.5 million. Approximately $15.0 million of the cash paid is being held in escrow for fifteen months after the transaction closing date as security for the former LLamasoft stockholders' indemnification obligations, and approximately $7.5 million of the cash paid is being held in escrow until the completion of final adjustment on the purchase consideration. In February 2021, we completed the acquisition of Pana Industries, Inc. ("Pana"), a corporate travel booking solution company that puts an emphasis on the traveler experience. 
In connection with the completion of the acquisition, we paid aggregate cash of approximately $48.5 million, and issued 23,822 shares of our common stock. As a core part of our strategy, we have developed an ecosystem of partners to extend our sales capabilities and coverage, to broaden and complement our application offerings, and to provide a broad array of services that lie outside of our primary areas of focus. Our partnerships increase our ability to grow and scale quickly and efficiently and allow us to maintain greater focus on executing against our strategy. Our referral partners provide global, national and regional expertise in business spend management, procurement and expense management. They help organizations through operational transformation by leveraging process, best practices and new technology. These partners may refer customer prospects to us and assist us in selling to them. In return, we typically pay these partners a percentage of the first-year subscription revenue generated by the customers they refer. In order to offer the full breadth of implementation services, change management, and strategic consulting services to our customers, we work with leading global systems integrators such as Accenture, Deloitte and KPMG, as well as boutique and regional consulting firms. Our strategy is to enable the majority of our projects to be led by implementation partners with additional specialized support from us. Our implementation partners are highly skilled and trained by our team. When working with implementation partners, we are typically in a “co-sell” arrangement where we will sell our subscription directly to the customer and our partner will sell its implementation services directly to the customer.
"""

## Pipeline specified to separate text into tokens

Tokenization is the NLP task of splitting sentences into smaller pieces, usually words. Although it is a splitting task, it is called tokenization rather than splitting.

The main component to do tokenization is the Tokenizer. It will get the tokens from the piece of text you pass to it.
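
To make the idea concrete, here is a rough, standalone approximation of word-level tokenization using a regular expression. It is only an illustration: the pattern below is an assumption chosen for this example and is not the rule set used by the Spark NLP `Tokenizer`.

```python
import re

# Assumed pattern for illustration only:
# a word (optionally hyphenated) or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def simple_tokenize(text: str) -> list:
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("Coupa applies artificial intelligence-powered analysis."))
# ['Coupa', 'applies', 'artificial', 'intelligence-powered', 'analysis', '.']
```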

In [None]:
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

text_splitter = finance.TextSplitter() \
    .setInputCols(["document"])\
    .setOutputCol("sentences")\
    .setCustomBounds(["\n\n"])\
    .setExplodeSentences(True)

tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentences"]) \
    .setOutputCol("tokens")

nlp_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    text_splitter,
    tokenizer])

df = spark.createDataFrame([[text]]).toDF("text")

fit = nlp_pipeline.fit(df)
lp = nlp.LightPipeline(fit)

res = lp.fullAnnotate(text)

In [None]:
sentences = res[0]['sentences']
tokens = res[0]['tokens']

for i in range(len(sentences)):
  sen_tokens = [x.result for x in tokens if x.metadata['sentence'] == str(i)]
  print(f"Sentence: {sentences[i].result}")
  print(f"Tokens {len(sen_tokens)}: {sen_tokens}")
  print("*"*250)

Sentence: Our Community Intelligence capability, which extends across our BSM platform, provides information to Coupa customers by applying artificial intelligence-powered analysis to the structured, normalized data collected from the comprehensive set of business spend transactions that have occurred on the Coupa platform.
Tokens 45: ['Our', 'Community', 'Intelligence', 'capability', ',', 'which', 'extends', 'across', 'our', 'BSM', 'platform', ',', 'provides', 'information', 'to', 'Coupa', 'customers', 'by', 'applying', 'artificial', 'intelligence-powered', 'analysis', 'to', 'the', 'structured', ',', 'normalized', 'data', 'collected', 'from', 'the', 'comprehensive', 'set', 'of', 'business', 'spend', 'transactions', 'that', 'have', 'occurred', 'on', 'the', 'Coupa', 'platform', '.']
***************************************************************************************************************************************************************************************************************

In [None]:
sentences = res[0]['sentences']
tokens = res[0]['tokens']

total_token = 0        # tokens accumulated in the current chunk
total_sentences = ""   # text of the current chunk
final_sentences = []   # completed chunks

for i in range(len(sentences)):
  # Tokens that belong to sentence i
  sen_tokens = [x.result for x in tokens if x.metadata['sentence'] == str(i)]
  total_token += len(sen_tokens)
  if total_token <= 300:
    total_sentences += sentences[i].result
  else:
    # Limit exceeded: close the current chunk and start a new one with sentence i
    print(f"Tokens: {total_token-len(sen_tokens)}")
    total_token = len(sen_tokens)
    final_sentences.append(total_sentences)
    print(f"Sentences: {total_sentences}")
    print("*"*250)
    total_sentences = sentences[i].result

# Keep the last, partially filled chunk
final_sentences.append(total_sentences)


Tokens: 289
Sentences: Our Community Intelligence capability, which extends across our BSM platform, provides information to Coupa customers by applying artificial intelligence-powered analysis to the structured, normalized data collected from the comprehensive set of business spend transactions that have occurred on the Coupa platform.This innovative analysis provides Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk based on best practices from the Coupa community.Participating customers are able to contribute to and benefit from Community Intelligence, with use cases spanning various areas of spend management, including: Supplier Insights and Supplier Risk Management which help companies evaluate and reduce the risk levels of suppliers;operational insights which help businesses measure their own performance on key operational metrics against other Coupa customers and follow best practices to drive eff

- In this notebook, we are working with the T5 model, which uses a different tokenizer called SentencePiece. SentencePiece splits text into subword units, so for the same input text it can produce a different number of tokens than the Tokenizer annotator. This means that we cannot directly compare the token counts between the two tokenizers.

- To ensure that the generated tokens from the SentencePiece tokenizer do not exceed the T5 model's token limit, we set a conservative limit of 300 tokens. This value provides a cushion to account for the fact that SentencePiece tokenization can result in a higher token count compared to simple whitespace tokenization.

- The reason for this precaution is that if the total token count in a given input, including both the existing tokens and the additional tokens from the next sentence, exceeds the T5 model's maximum limit (which is typically 512 tokens but can vary), it will result in an error.

- By limiting the token count to 300, we ensure that even if the subsequent sentence adds a few more tokens, the total number of tokens remains within the T5 model's limit. This allows us to avoid tokenization errors and successfully generate summaries for the long document.

- It's important to note that the specific token limit for the T5 model being used should be checked in the model's documentation or implementation, as it might deviate from the standard 512 tokens. Adjustments to the limit may be necessary based on the model's specific requirements and constraints.
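
The headroom implied by these numbers can be checked with a few lines of arithmetic. The sketch below assumes the 512-token model limit and the 300-token chunk limit mentioned above. `max_expansion` and `fits` are names introduced here for illustration, and the actual ratio between SentencePiece tokens and `Tokenizer` tokens depends on the text.

```python
MODEL_LIMIT = 512   # typical T5 input limit quoted above; check your model
CHUNK_LIMIT = 300   # word-level token limit used when building chunks


def max_expansion(model_limit: int, chunk_limit: int) -> float:
    """Largest average number of subword tokens per word-level token
    that still keeps a full chunk within the model limit."""
    return model_limit / chunk_limit


def fits(word_tokens: int, expansion: float, model_limit: int = MODEL_LIMIT) -> bool:
    """True if a chunk with `word_tokens` tokens is expected to fit,
    given an assumed subword expansion ratio."""
    return word_tokens * expansion <= model_limit


print(round(max_expansion(MODEL_LIMIT, CHUNK_LIMIT), 2))  # 1.71
print(fits(289, 1.5))  # True: 289 * 1.5 = 433.5
print(fits(289, 2.0))  # False: 289 * 2.0 = 578.0
```

In other words, a 300-token chunk stays within a 512-token limit as long as SentencePiece produces, on average, no more than about 1.7 subword tokens per `Tokenizer` token.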

In [None]:
final_sentences

['Our Community Intelligence capability, which extends across our BSM platform, provides information to Coupa customers by applying artificial intelligence-powered analysis to the structured, normalized data collected from the comprehensive set of business spend transactions that have occurred on the Coupa platform.This innovative analysis provides Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk based on best practices from the Coupa community.Participating customers are able to contribute to and benefit from Community Intelligence, with use cases spanning various areas of spend management, including: Supplier Insights and Supplier Risk Management which help companies evaluate and reduce the risk levels of suppliers;operational insights which help businesses measure their own performance on key operational metrics against other Coupa customers and follow best practices to drive efficiency and savings, 

In [None]:
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("documents")

flant5 = finance.Summarizer().pretrained('finsum_flant5_finetuned_sec10k','en','finance/models')\
    .setInputCols(["documents"])\
    .setOutputCol("summary")\
    .setMaxTextLength(512)\
    .setMaxNewTokens(512)

pipeline = nlp.Pipeline(stages=[document_assembler, 
                                flant5])

data = spark.createDataFrame([[1, " "]]).toDF('id', 'text')

model = pipeline.fit(data)
light_model = nlp.LightPipeline(model)

light_result = light_model.annotate(final_sentences)

In [None]:
light_result

[{'documents': ['Our Community Intelligence capability, which extends across our BSM platform, provides information to Coupa customers by applying artificial intelligence-powered analysis to the structured, normalized data collected from the comprehensive set of business spend transactions that have occurred on the Coupa platform.This innovative analysis provides Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk based on best practices from the Coupa community.Participating customers are able to contribute to and benefit from Community Intelligence, with use cases spanning various areas of spend management, including: Supplier Insights and Supplier Risk Management which help companies evaluate and reduce the risk levels of suppliers;operational insights which help businesses measure their own performance on key operational metrics against other Coupa customers and follow best practices to drive efficienc

In [None]:
final_sentences = []
for item in light_result:
    final_sentences.append(item['summary'][0])

In [None]:
final_sentences

['This community intelligence capability extends across the Coupa platform, providing Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk. It also provides Coupa customers with access to a cloud-based model, enabling better visibility into spend, and a strong supplier adoption.',
 'This company is a software company that is able to stay agile and adapt to changes in operating and regulatory environments with its easily configurable platform. It also has the ability to increase compliance with governmental regulations through greater auditability, documentation and control of spending activity. It also has the ability to collect data on historical activity and leverage the insights to help populate requests and minimize data entry. It also participates in the Coupa Open Business Network, which allows suppliers to display their information and catalog of products and services on our platform for existing and

- As you can see, if we feed the 6 resulting summaries back into the summarization pipeline all at once, we will again hit the 512-token limit. So we will group these summaries in pairs and run each group through the summarization pipeline, repeating the process until a single final summary remains.
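
The grouping arithmetic behind this step can be sketched without Spark. The snippet below is only a minimal illustration: `fake_summarize` is a hypothetical stand-in for the real pipeline (it simply keeps the first sentence of each text), and `reduce_summaries` and `group_size` are names introduced here for the example, not part of the library. With 6 inputs and groups of 2, the list should shrink from 6 to 3 to 2 to 1.

```python
def fake_summarize(text):
    # Hypothetical stand-in for the summarization pipeline:
    # keeps only the first sentence of the input text.
    return text.split(". ")[0].rstrip(".") + "."


def reduce_summaries(texts, summarize, group_size=2):
    """Repeatedly merge neighbouring texts and summarize each merged group
    until a single text remains. Returns that text and the list sizes seen.
    Assumes `texts` is non-empty."""
    sizes = [len(texts)]
    while len(texts) > 1:
        # Join each run of `group_size` neighbours into one longer text.
        groups = [
            " ".join(texts[i:i + group_size])
            for i in range(0, len(texts), group_size)
        ]
        # Summarize every group, so the list shrinks by roughly `group_size`.
        texts = [summarize(group) for group in groups]
        sizes.append(len(texts))
    return texts[0], sizes


summaries = [f"Summary {n} first point. Summary {n} second point." for n in range(1, 7)]
final_text, sizes = reduce_summaries(summaries, fake_summarize)
print(sizes)       # [6, 3, 2, 1]
print(final_text)  # Summary 1 first point.
```

Each round trades some detail for length, so a larger `group_size` means fewer rounds but longer inputs per call, which have to stay within the model's input limit.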

In [None]:
len(final_sentences)

6

In [None]:
for i in final_sentences:
  print(i)

This community intelligence capability extends across the Coupa platform, providing Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk. It also provides Coupa customers with access to a cloud-based model, enabling better visibility into spend, and a strong supplier adoption.
This company is a software company that is able to stay agile and adapt to changes in operating and regulatory environments with its easily configurable platform. It also has the ability to increase compliance with governmental regulations through greater auditability, documentation and control of spending activity. It also has the ability to collect data on historical activity and leverage the insights to help populate requests and minimize data entry. It also participates in the Coupa Open Business Network, which allows suppliers to display their information and catalog of products and services on our platform for existing and prosp

In [None]:
def summa(final_sentences, groupcount=2):
    # Concatenate each run of `groupcount` consecutive summaries into one text.
    groups = ["".join(map(str, final_sentences[i:i+groupcount])) for i in range(0, len(final_sentences), groupcount)]
    for group in groups:
        print(group)
    return groups

# Keep grouping and re-summarizing until a single summary remains.
while len(final_sentences) > 1:
    print("The Number of Sentences:", len(final_sentences))
    print()
    final_sentences = summa(final_sentences, groupcount=2)
    print("Number of Sentences After Grouping:", len(final_sentences))

    final_summary = light_model.annotate(final_sentences)
    # "summary" holds a list of strings; keep the text itself so the next round joins plain strings.
    final_sentences = [result["summary"][0] for result in final_summary]
    print()
    print("Number of Sentences After Annotate:", len(final_sentences))
    print()
    print("<<< SUMMARY of the SENTENCES >>>\n", final_sentences)
    print()
    print("<>" * 50)

if len(final_sentences) == 1:
    # One last pass over the single remaining summary; a list input returns a list of results.
    final_summary = light_model.annotate(final_sentences)
    print("The final summary of the long text inserted into the pipeline.")
    print("<>" * 50)
    print(final_summary[0]["summary"][0])


The Number of Sentences: 6

This community intelligence capability extends across the Coupa platform, providing Coupa customers with prescriptive recommendations to optimize their spend decisions, improve operational efficiency, and reduce risk. It also provides Coupa customers with access to a cloud-based model, enabling better visibility into spend, and a strong supplier adoption.This company is a software company that is able to stay agile and adapt to changes in operating and regulatory environments with its easily configurable platform. It also has the ability to increase compliance with governmental regulations through greater auditability, documentation and control of spending activity. It also has the ability to collect data on historical activity and leverage the insights to help populate requests and minimize data entry. It also participates in the Coupa Open Business Network, which allows suppliers to display their information and catalog of products and services on our plat