## This extractive text summarisation is based on the link below:
* https://www.mygreatlearning.com/blog/text-summarization-in-python/#Approaches%20used%20for%20Text%20Summarization

### Import Relevant Libraries

In [1]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

import pandas as pd
import numpy as np
import json



### Load Dataset

In [21]:
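# CUAD (Contract Understanding Atticus Dataset): one contract per row
contracts_cuad = pd.read_excel('../data/contract_new.xlsx')
# Lower-case the text so word counts are case-insensitive
contracts_cuad['content'] = contracts_cuad['content'].str.lower()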
contracts_cuad.head()

Unnamed: 0,contract,content
0,2ThemartComInc_19990826_10-12G_EX-10.10_670028...,co-branding and advertising agreement this co-...
1,ABILITYINC_06_15_2020-EX-4.25-SERVICESAGREEMEN...,exhibit 4.25 information in this exhibit ident...
2,ACCELERATEDTECHNOLOGIESHOLDINGCORP_04_24_2003-...,exhibit 10.13 joint venture agreement collecti...
3,ACCURAYINC_09_01_2010-EX-10.31-DISTRIBUTORAGRE...,exhibit 10.31 pursuant to 17 c.f.r. § 240.24b-...
4,ADAMSGOLFINC_03_21_2005-EX-10.17-ENDORSEMENTAG...,redacted copy confidential treatment requested...


### Summarize using Avg Sentence Score

#### CUAD dataset

In [23]:
# Pick a single contract as input text for a rough test
input_text = contracts_cuad['content'][3]
print(input_text)

exhibit 10.31 pursuant to 17 c.f.r. § 240.24b-2, confidential information (indicated by {*****}) has been omitted from this document and has been filed separately with the securities and exchange commission pursuant to a confidential treatment application filed with the commission accuray incorporated multiple linac and multi-modality distributor agreement this multiple linac and multi-modality distributor agreement ("agreement") is entered into by and between accuray incorporated, a delaware corporation with its executive offices located at 1310 chesapeake terrace, sunnyvale, california 94089, usa ("accuray"), and siemens aktiengesellschaft, a corporation formed under the laws of the federal republic of germany, with its registered offices located at berlin and munich ("siemens"), as of june 8, 2010 ("effective date"). recitals accuray manufactures and sells full-body radiosurgery systems using image-guided robotics, including the cyberknife® robotic radiosurgery system, which is fda 

In [24]:
# Load the English stop words and tokenise the text into words
stopWords = set(stopwords.words("english"))
words = word_tokenize(input_text)

In [25]:
# Build a frequency table: occurrence count of each token that is not a stop word
# (punctuation tokens produced by word_tokenize are counted as well)
freqTable = dict()
for word in words:
    if word in stopWords:
        continue
    if word in freqTable:
        freqTable[word] += 1
    else:
        freqTable[word] = 1
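The counting loop above can also be written with `collections.Counter`. The sketch below is one possible variant, not the notebook's own code: the helper name `build_freq_table` is hypothetical, and it additionally drops tokens that contain no letter or digit (an assumption, since `word_tokenize` emits punctuation such as `,` as separate tokens).

```python
from collections import Counter


def build_freq_table(tokens, stop_words):
    """Count tokens, skipping stop words and punctuation-only tokens."""
    return Counter(
        tok
        for tok in tokens
        # Keep a token only if it is not a stop word and has a letter or digit.
        if tok not in stop_words and any(ch.isalnum() for ch in tok)
    )


# Tiny hand-made token list standing in for word_tokenize(input_text).
demo_tokens = ["this", "agreement", ",", "the", "agreement", "party", "."]
demo_freq = build_freq_table(demo_tokens, stop_words={"this", "the"})
print(demo_freq)
```

A `Counter` behaves like a normal dictionary, so it could be passed to the sentence-scoring step without further changes.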

In [26]:
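# Score each sentence: add the frequency of every table word found in it
# (note: `word in sentence` is a substring test, so short words also match inside longer ones)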
sentences = sent_tokenize(input_text)
sentenceValue = dict()
for sentence in sentences:
    for word, freq in freqTable.items():
        if word in sentence:
            if sentence in sentenceValue:
                sentenceValue[sentence] += freq
            else:
                sentenceValue[sentence] = freq
                
sumValue = 0
for sentence in sentenceValue:
    sumValue += sentenceValue[sentence]
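Because `word in sentence` tests for a substring, a table entry such as `art` also matches inside `party`. A minimal sketch of a token-based alternative follows. `score_sentences` and its `tokenize` parameter are hypothetical names, and whitespace splitting is used only to keep the example self-contained (in the notebook, `word_tokenize` would be the natural choice).

```python
def score_sentences(sentences, freq_table, tokenize=str.split):
    """Score each sentence by summing the frequencies of its distinct tokens."""
    scores = {}
    for sentence in sentences:
        # Distinct tokens, so each table word contributes once per sentence,
        # as in the loop above.
        tokens = set(tokenize(sentence))
        score = sum(freq_table[tok] for tok in tokens if tok in freq_table)
        if score:  # like the loop above, skip sentences with no table words
            scores[sentence] = score
    return scores


demo_freq = {"art": 5, "party": 2}  # made-up frequencies
demo_sentence = "each party signed"

# Substring matching counts "art" inside "party"; token matching does not.
substring_score = sum(f for w, f in demo_freq.items() if w in demo_sentence)
token_score = score_sentences([demo_sentence], demo_freq)[demo_sentence]
print(substring_score, token_score)
```

Switching to token matching would change the scores, and therefore the selected sentences, relative to the outputs shown in this notebook.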

In [27]:
# Average score over the scored sentences (truncated to an integer)
avg = int(sumValue/len(sentenceValue))

In [28]:
# Keep the sentences whose score exceeds 1.2 times the average
summary = ''
summary_list = []
for sentence in sentences:
    if (sentence in sentenceValue) and (sentenceValue[sentence]>(1.2*avg)):
        summary_list.append(sentence)
        summary += ' '+ sentence
        
print(summary)

 § 240.24b-2, confidential information (indicated by {*****}) has been omitted from this document and has been filed separately with the securities and exchange commission pursuant to a confidential treatment application filed with the commission accuray incorporated multiple linac and multi-modality distributor agreement this multiple linac and multi-modality distributor agreement ("agreement") is entered into by and between accuray incorporated, a delaware corporation with its executive offices located at 1310 chesapeake terrace, sunnyvale, california 94089, usa ("accuray"), and siemens aktiengesellschaft, a corporation formed under the laws of the federal republic of germany, with its registered offices located at berlin and munich ("siemens"), as of june 8, 2010 ("effective date"). in order to achieve its business objectives, accuray relies on qualified distributors to market and distribute its products and services. accuray and siemens have entered into that certain strategic alli

In [29]:
print(len(sentences))
print(len(summary_list))

183
72
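
For this contract the 1.2 x average threshold keeps 72 of 183 sentences, roughly 39% of the original. The selection step can be wrapped in a small helper so the threshold factor is easy to vary. The sketch below uses hypothetical names (`select_sentences`, `factor`) and assumes the untruncated mean rather than `int(...)`, so its results can differ slightly from the cells above.

```python
def select_sentences(sentences, scores, factor=1.2):
    """Return sentences scoring above factor * mean score, in original order."""
    if not scores:  # avoid dividing by zero on empty input
        return []
    mean_score = sum(scores.values()) / len(scores)
    return [s for s in sentences if scores.get(s, 0) > factor * mean_score]


demo_sentences = ["a b", "c", "d e f"]
demo_scores = {"a b": 4, "c": 1, "d e f": 10}  # made-up scores
kept = select_sentences(demo_sentences, demo_scores)
print(kept, f"{len(kept) / len(demo_sentences):.0%} of sentences kept")
```

Raising `factor` shortens the summary and lowering it lengthens it, which may be worth tuning for long contracts.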


#### BillSum dataset

In [None]:
# read 