# Text Summary & Scoring Project
##### Michael Creegan, Yungfeng Dai, Hong Gyu Ji, Ziling Zeng
##### Python for Data Analysis
##### Columbia University

# Abstract

Summarization is a common problem in the 21st century as the world has become increasingly driven by data. A summary makes it possible to decide quickly whether a piece of text is relevant or worth reading in full. Summaries of articles can also be stored in a backend so that downstream tasks can be run on them. Finally, comparing a summary with its source can show whether the summary preserves the source's meaning, which serves as an indicator of its quality.

To explore this topic, we will use the extreme summarization dataset (XSUM), which consists of BBC articles, each accompanied by a single-sentence summary. Each article is prefaced by a professionally written introductory sentence, typically by the article's author, and that sentence serves as the reference summary.

To summarize articles, we will use an encoder-decoder (sequence-to-sequence) transformer, which combines an encoder and a decoder because the task involves both reading input text and generating output text (the summary). The encoder takes the input text and computes a high-level representation of it, which is then passed to the decoder to generate the output (the summary). This has advantages over a standalone encoder such as BERT, ALBERT, ELECTRA, RoBERTa, or DistilBERT: encoders are pre-trained by filling in randomly masked words in sentences, so they see context on both sides of every word and are better suited to tasks that require understanding the input (such as classification) than to generating new text. A standalone decoder such as GPT-2 would not be optimal either, because decoders are trained to predict the next word in a sequence using only the words that precede it (they have no context on the right-hand side). They are therefore good at generating text but less suited to building a representation of an entire input document.
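The difference in available context can be made concrete with attention masks. The sketch below is a toy illustration, not BART's or any library's actual implementation, and the function names `bidirectional_mask` and `causal_mask` are hypothetical. Entry `[i][j]` is 1 when the token at position `i` may attend to the token at position `j`: an encoder lets every token see every other token, while a decoder's causal mask hides everything to the right.

```python
from typing import List

Mask = List[List[int]]


def bidirectional_mask(n: int) -> Mask:
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> Mask:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tokens = ["the", "cat", "sat", "down"]
    for name, mask in [("encoder", bidirectional_mask(len(tokens))),
                       ("decoder", causal_mask(len(tokens)))]:
        print(name)
        for tok, row in zip(tokens, mask):
            # each row lists which tokens `tok` is allowed to look at
            print(f"  {tok:>5}: {row}")
```

In an encoder-decoder model such as BART, the decoder keeps its causal mask over the summary being generated but additionally attends to the encoder's output through cross-attention, which is what gives it access to the whole input document.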

Our scoring compares the output of the BART encoder-decoder model with the professionally written summaries in the XSUM dataset, to measure how semantically similar a machine-generated summary is to a professional one, and also compares the summaries with their source articles. The scoring methodology focuses on semantic textual similarity, computed as the cosine similarity between sentence embeddings of the professionally written summary and the machine-generated one.
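Cosine similarity measures the angle between two embedding vectors, cos(a, b) = (a · b) / (‖a‖ ‖b‖): it is 1 for vectors pointing in the same direction and 0 for orthogonal ones. A minimal pure-Python sketch follows. The toy vectors are made up for illustration and stand in for the sentence embeddings a `SentenceTransformer` model would produce; in practice, recent versions of `sentence_transformers.util.cos_sim` compute the same quantity on tensors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| * |b|) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, for illustration only (not real model output).
reference = [0.2, 0.8, 0.1]    # e.g. the professional summary
generated = [0.25, 0.7, 0.05]  # e.g. a machine-generated summary
unrelated = [0.9, -0.1, 0.4]   # e.g. an off-topic sentence

print(round(cosine_similarity(reference, generated), 3))
print(round(cosine_similarity(reference, unrelated), 3))
```

Because cosine similarity ignores vector length and compares only direction, a one-sentence summary and a long article can still score highly if their embeddings encode similar meaning.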

# Importing Transformers & Dependencies

```python
import pandas as pd
import numpy as np
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
from datasets import load_dataset, load_metric  # load_metric was removed in datasets 3.0; newer code uses evaluate.load
from sentence_transformers import SentenceTransformer, util
import random
from IPython.display import display, HTML
```

# Load XSUM Dataset

```python
xsum = load_dataset('xsum')
```

```
Using custom data configuration default
Reusing dataset xsum (C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934)
```


### The dataset is a "DatasetDict" whose keys are strings naming each split and whose values are "Dataset" objects. In XSUM the keys are "train", "validation", and "test", and every split has the same three columns: "document", "summary", and "id"

```python
xsum
```

```
DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})
```
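The row counts above imply roughly a 90/5/5 train/validation/test split. A quick check of that arithmetic, using only the numbers printed in the output (the `split_sizes` dictionary is typed in by hand rather than read from the dataset):

```python
# Row counts copied from the DatasetDict output above.
split_sizes = {"train": 204045, "validation": 11332, "test": 11334}
total = sum(split_sizes.values())

for split, n in split_sizes.items():
    # share of all records that falls in each split
    print(f"{split:<10} {n:>7} rows  {n / total:6.2%}")
print(f"{'total':<10} {total:>7} rows")
```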

# View Underlying Data

```python
xsum['test'][0]
```

{'document': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the

## A helper function lets us view a random sample of articles and their summaries, which gives a better sense of what the data looks like than a single record

```python
def display_function(xsum, num_examples=3):
    """Display `num_examples` randomly chosen records from a dataset split as an HTML table."""
    assert num_examples <= len(xsum), "cannot sample more records than the split contains"

    # random.sample draws distinct indices, so no record is shown twice
    selections = random.sample(range(len(xsum)), num_examples)

    # indexing a Dataset with a list of indices returns a dict of column -> values
    xsumPd = pd.DataFrame(xsum[selections])

    # render the sampled records once as an HTML table
    display(HTML(xsumPd.to_html()))
```

# Cleaning
Our end goal is to create accurate summaries with this model, so we need to remove text characters that provide no contextual value. Some characters appear in the documents but not in the summaries, which could cause discrepancies between the machine-generated summaries and the professionally written ones. In particular, we need to remove the newline characters (`\n`) and backslashes that are present in the document column but not in the summary column.
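A minimal sketch of such a cleaning step, assuming that newlines and stray backslashes should simply be replaced by spaces; the function name `clean_text` is hypothetical and not taken from the project's code.

```python
import re


def clean_text(text: str) -> str:
    """Replace newlines and backslashes with single spaces and trim the result."""
    text = text.replace("\\n", " ")   # literal backslash-n sequences, if any survived export
    text = text.replace("\\", " ")    # any remaining backslashes
    text = re.sub(r"\s+", " ", text)  # real newlines and runs of whitespace become one space
    return text.strip()


sample = ("Prison Link Cymru had 1,099 referrals in 2015-16.\n"
          "Workers at the charity claim investment in housing would be cheaper "
          "than jailing homeless repeat offenders.")
print(clean_text(sample))
```

With Hugging Face Datasets, a function like this could be applied to every split with something like `xsum.map(lambda ex: {"document": clean_text(ex["document"])})`, which returns a new `DatasetDict` rather than modifying `xsum` in place.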

```python
display_function(xsum["test"])
```

index,document,summary,id
0,"At the time Mr Tilli was working as an estate agent, but with the Portuguese housing market in free fall, he was earning next to nothing.\nAnd with Portugal's government needing a 78bn euro ($103bn; £62bn) international bailout, in exchange for putting in place a number of austerity measures, the wider Portuguese economy was mired in its worst recession for more than 40 years.\nWith the jobs market having ground to a halt, and four children to look after, Magda and Miguel Tilli decided they had no option but to take a chance on starting their own company.\nMs Tilli, 37, recalls: ""We were at the beach, brainstorming about what we could do.""\nTapping into Mr Tilli's knowledge of the property market, they recognised that they could turn a problem - no one was buying homes anymore - into a business opportunity.\nAnd so, they decided to launch an estate agent business specialising in renting houses in Lisbon city centre.\nWhile such a focus on rental properties may be common in other countries, the great majority of Portuguese estate agents only deal with selling homes.\nThis is because in Portugal owning your own house or apartment has always been such a matter of pride that it is the first choice of most people, even young adults.\nYet suddenly the great majority of young people couldn't hope to get a mortgage. 
Instead, a growing number living in Lisbon started to turn to the Tillis' new property rental business - Home Lovers.\nTo keep start-up costs down, the couple initially listed their available properties only on Facebook.\nYet to build up a decent reputation, they hired professionals to take all the photos, and only accepted properties of a high standard.\nMs Tilli says they picked the kind of places that appeal to young, urban professionals, such as trendy apartments.\nSoon they had a steady stream of customers, both people wishing to rent a property, and landlords wanting to list with them.\n""It became a cool thing to rent a house through us,"" says Ms Tilli, who previously worked as a flight attendant for TAP, the main Portuguese airline.\nHome Lovers has now expanded to Porto and Cascais, two other Portuguese cities, and has a team of 20 workers.\nIt is now considering going to Madrid.\nMs Tilli says: ""I'm a bit scared with that, but I don't see us being able to do this in any more places here in Portugal.""\nTo understand why entrepreneurship, or setting up a business, is now so popular in Portugal, you only need a quick reminder of how bad unemployment remains in the country, even though it came out of recession in 2013.\nThe Portuguese jobless rate rose from 7.6% in 2008 to 14.1% in June of this year.\nThe situation is even worse for young adults, with one out of every three people aged 15 to 24 years old out of work, according to Eurostat, the statistical office of the European Union.\nProfessor Paulo Soares de Pinho, who teaches at Nova School of Business and Economics in Lisbon, and runs his own investment fund, says that one of the biggest changes brought by Portugal's economic crisis was ""to transform many unemployed people into wannabe entrepreneurs"".\nYet he cautions that while many technology-minded young people are coming up with products, not all of them are able to turn them into a viable business.\n""We're going through an app entrepreneurship 
wave. Any kid coming out of an engineering school develops an app and thinks he has a company,"" he says.\n""But there are many tech projects with no market orientation whatsoever.""\nCarlos Silva, co-founder of the crowd funding website Seedrs, agrees that in Portugal ""may start-ups are going ahead just because entrepreneurship is now a trend"".\nYet, he adds that there are ""more and more start-ups of excellent quality.""\nTo help boost entrepreneurship, the Portuguese government has created an investment body called Portugal Ventures to invest 20m euros of public funds a year into start-up firms.\nStart-up incubators have also sprung up, to give new businesses an office or desk to help them get on their feet during their first months.\nMagda and Miguel Tilli, used one such incubator - Start-up Lisboa - during the launch of their firm.\nAnthony Douglas is another entrepreneur who has used Start-up Lisboa to get his business off the ground.\nThe 33-year-old is the founder of Hole19, a golfing app, which has mapped out thousands of golf courses around the world, and allows golfers to track and store statistics about their own performances.\nInitially it was a paid app, and the business struggled.\nMr Douglas, who has a Portuguese mother and American father, says: ""We've been almost dead a few times, with zero euros in the bank.\n""In some months I stopped paying my own salary and had to ask relatives for money.""\nYet Mr Douglas has since been able to transform the business's fortunes by giving the app away for free.\nThe aim is now to make money by enabling Hole19's users to book golf courses via the app, in exchange for paying a fee each time.\nMr Douglas says Hole19 was downloaded 220,000 times in the first 90 days after going free. 
And recently he raised 900,000 euros from foreign investors.\nJoao Romao, 25, is another young Portuguese entrepreneur who has managed to turn around his business fortunes.\nHis first start-up venture, based around the idea of a shareable gift list connected to online shops, quickly failed.\nUndeterred, he is now developing a business called GetSocial, which aims to help companies promote their content on social networks, and measure its impact.\nRecently he secured 630,000 euros of investment.\nMr Romao says: ""The first try was a good lesson learned. It taught me how to build a start-up. Everybody's learning.""","When Portugal was hit by an economic crisis in 2011, Magda Tilli and her husband Miguel realised that if they wanted to make a decent living they would have to set up their own business.",29027462
1,"The animal had been shot twice in the shoulder and once in its left back leg, which vets had to amputate.\nThe charity said the one-year-old cat was ""incredibly lucky"" to survive.\nLast year the Scottish government held a consultation on licensing air weapons, but a majority of responders opposed the plan.\nOne-year-old Teenie was found injured by her owner Sarah Nisbett in NiddryView, Winchburgh, at about 16:30 on Friday 14 March and taken to the Scottish SPCA.\nMrs Nisbett said the cat was now having to learn how to walk again.\n""The gun that was used must have some power because the pellet actually went through her back leg, that's why it was so badly damaged,"" she said.\n""She's now learning how to hop around the house, it's terrible.\n""The fact that it was three shots is crazy. We live in a housing estate and there are lots of kids. That just makes it worse because any of them could have been hit in the crossfire.""\nShe added: ""There's some sick people out there, hopefully somebody will know who's done this and let the police or the Scottish SPCA know.""\nScottish SPCA Ch Supt Mike Flynn said: ""Teenie's owners are understandably very upset and keen for us to find the callous person responsible to ensure no more cats come to harm.\n""This is an alarming incident which only highlights why the Scottish government should implement the licensing of airguns as a matter of urgency.""\nHe added: ""The new licensing regime should ensure that only those with a lawful reason are allowed to possess such a dangerous weapon. 
It will also help the police trace anyone using an air gun irresponsibly.""\nLast year the Scottish government launched a consultation on licensing air weapons, with a large majority of those who responded opposing the plan.\nUnder the proposed scheme, anyone wanting to own an air gun would need to demonstrate they had a legitimate reason for doing so.\nA total of 87% of respondents rejected the idea - with some describing it as ""draconian"" and ""heavy-handed"". A small number of people felt ministers were not going far enough.\nThe Scottish SPCA urged anyone with information about the incident to contact them.",An animal charity is calling for the licensing of air guns after a cat in West Lothian was left injured after being shot three times.,26668081
2,"The 27-year-old woman was attacked at about 20:00 on 20 February while walking along a path between Byres Road and Glenmalloch Place in Elderslie.\nPolice said three men seen in the area at the time may have seen the suspect or ""unwittingly"" witnessed something.\nThe suspect was white, aged between 35 and 50, and with dark receding hair.\nHe hit the woman, causing her to fall to the ground, and then raped her.\nDet Insp Louise Harvie said: ""Extensive inquiries are continuing to trace whoever is responsible for this serious sexual assault.\n""There are three men that officers wish to trace as they were seen in the area near to where the incident took place, and may have seen the suspect or unwittingly witnesses something vital to this investigation.\n""I would urge them to come forward and speak to police.""\nThe first man was seen in Stoddard Square, Elderslie, at about 20:00 on Sunday 19 or Monday 20 February. He is described as between 30-50 years of age and wearing dark trousers and a light top.\nThe second man was seen near the Wallace Monument in Main Road, Elderslie, at about 20:15 on 20 February.\nHe is described as being in his 30s, 5ft 10in, of a stocky build with short, dark hair and clean shaven. He was wearing dark trousers and a dark parka-style jacket.\nThe third man was also seen near the Wallace Monument at about 01:10 on 21 February. He is described as between 30-40 years old, with a broad build and wearing dark jeans, a black jacket and white trainers.","Officers investigating the rape of a woman in Renfrewshire have appealed for three men to come forward, saying they could be vital witnesses.",39372760


## We can address the problem mentioned above by defining a cleaning function that replaces newline characters with spaces and strips single and double quote characters.

In [7]:
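def clean(row):
    # Replace newline characters with spaces, then drop single and double quotes
    # (this also removes apostrophes, so "don't" becomes "dont")
    row['document'] = (row['document'].replace('\n', ' ')
                                      .replace("'", '')
                                      .replace('"', ''))
    return row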
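To see what clean() does to a single record, here is a minimal, self-contained check on a hypothetical example row (the text is made up, not taken from XSUM). It also shows a side effect that is visible in the cleaned output below: because every `'` is removed, possessives and contractions such as "May's" and "don't" turn into "Mays" and "dont".

```python
def clean(row):
    # Same transformation as the notebook's clean(): newlines -> spaces, quotes removed
    row['document'] = (row['document'].replace('\n', ' ')
                                      .replace("'", '')
                                      .replace('"', ''))
    return row

# Hypothetical example row in the same shape as an XSUM record
example = {'document': 'Mrs May\'s speech.\n"Life should mean life," she said.',
           'summary': 'A short summary.', 'id': '0'}
print(clean(example)['document'])
# Mrs Mays speech. Life should mean life, she said.
```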

## We can now apply the cleaning function to our data with map, which runs it over every split (train, validation, and test)

In [8]:
xsum = xsum.map(clean)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-fd36b556705cbe4d.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-edb3a2dc2f06b92c.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-a4042da98a2992a2.arrow


### Voila!

In [9]:
display_function(xsum["test"])

Unnamed: 0,document,summary,id
0,"She stressed that to kill a police officer was to attack the fundamental basis of our society. But Mrs May also said police officers must end frivolous accident claims and focus on raising public trust. Earlier the Police Federation urged her not to base legislation changes on the behaviour of a handful of officers. During her address, Mrs May said suing someone after slipping on their property was not the sort of attitude officers should exhibit. Her comments come after it emerged recently that one police officer, PC Kelly Jones, had taken legal action after tripping on a kerb at a Norfolk petrol station in August. Mrs May also revealed plans to allow police to take over shoplifting prosecutions where goods taken were worth less than £200. Unveiling plans for a change in legislation at the Police Federation conference in Bournemouth, Mrs May announced the government proposal that the minimum term for killing an officer should be increased to life without parole. The current minimum sentence for a police murder is 30 years. By Danny ShawBBC home affairs correspondent Theresa Mays whole life tariff for police murderers is being welcomed by rank-and-file officers - but its unlikely to quell the anger felt by Police Federation members about the governments programme of cuts and reforms to the service. High on their list of concerns is an idea, currently the subject of negotiation, which would allow chief constables to make police compulsorily redundant. Officers say chiefs could get rid of officers they dont like or those approaching pension age - and with no industrial rights thered be nothing police could do about it. A final decision on whether the home secretary will go ahead is expected in the summer. The federation would no doubt toast Mrs May if she abandoned the whole idea. The home secretary told rank-and-file officers the murder of a police officer was a particularly appalling crime. 
We ask police officers to keep us safe by confronting and stopping violent criminals for us, she said. And sometimes you are targeted by criminals because of what you represent. She added: We are clear - life should mean life for anyone convicted of killing a police officer. The Criminal Justice Act 2003 permits Justice Secretary Chris Grayling - following consultation with the Sentencing Council - to make an order to change starting points for sentences. In this instance, it enables him to change the starting point from 30 years to a whole life order, meaning offenders could not be released other than at the discretion of the secretary of state on compassionate grounds - for example, if they are terminally ill or seriously incapacitated. The Sentencing Council, the official body that oversees sentencing in England and Wales, issues guidelines for judges and magistrates to work to for all offences other than murder. A spokesman said: Introducing whole life tariffs for those who murder police officers would involve changes to the law, which is a matter for Parliament, rather than the Sentencing Council. But he confirmed that the government had a duty to consult with the council before new legislation could be brought in. The Sentencing Council says that, as things stand, whole life orders can be imposed in murder cases if the court decides that the offence is so serious that the offender should spend the rest of their life in prison. There are currently 47 prisoners in England and Wales who have been given whole life tariffs, including Rosemary West and Yorkshire Ripper Peter Sutcliffe. The home secretary, who faced a question and answer session after her speech, was heckled at last years conference after she told officers to stop pretending they were being singled out and would have to make their share of public spending cuts. 
Police Federation chairman Steve Williams, who had earlier welcomed Mrs Mays sentencing plan, told her morale was low as a result of the governments programme of cuts and reforms. Speaking at the conference, he urged the home secretary not to hang your reforms on the reprehensible behaviour of a handful of officers. The biggest applause came when he called for the government to abandon plans for compulsory severance, which are currently subject to negotiation. Chief Inspector of Constabulary Tom Winsor, who is behind hotly debated changes such as fast-track recruitment and lower annual pay for new constables, was also due to address officers. On Tuesday, shadow home secretary Yvette Cooper told the three-day conference that government plans to withdraw from the European Arrest Warrant agreement would make it harder to catch criminals who went on the run abroad.","Criminals who kill police officers in England and Wales will face compulsory whole life sentences, Home Secretary Theresa May has announced.",22534665
1,"Steven Rodriguez, who was better known as A$AP Yams or Yamborghini, died aged 26 on 18 January at Brooklyns Woodhull Medical Centre. He founded the US rap collective A$AP Mob along with fellow New Yorkers A$AP Bari and A$AP Illz. Now the New York Times reports that his death was caused by acute mixed drug intoxication. Opiates and benzodiazepines were found in his system and it was ruled an accident. After his death artists paid tribute to him on social media. Drake tweeted: Rest in peace Yams. A$AP is family. Azealia Banks wrote: ASAP YAMS should be remembered as a leader, an innovator and most importantly as an important part of NYC youth culture. Follow @BBCNewsbeat on Twitter, BBCNewsbeat on Instagram and Radio1Newsbeat on YouTube","American rapper A$AP Yams died of an accidental drug overdose, according to New York City's chief medical examiner.",31983012
2,"Capt Ranong Chumpinit told the BBC that Daniel Clarke was found at 01:05 GMT on Saturday lying by the train track in Thung-Kha, Chumphon province. He said Mr Clarke, from Aldershot, told police that he stepped out to smoke between two carriages when he fell. The Foreign Office said a Briton had been hospitalised in Thailand. We are supporting the family of a British national who has been hospitalised in Thailand, a spokeswoman said. Capt Ranong said a friend of the backpacker told police it was an accident. He said: We dont believe theres a foul play going on because his belongings remained intact.","A 21-year-old British man is in hospital in Thailand with head and leg injuries after he fell out of a moving train, Thai police have said.",39468445


## We can view the column names and data types of our dataset using .features

In [10]:
xsum['test'].features

{'document': Value(dtype='string', id=None),
 'summary': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None)}

In [11]:
print(xsum['test'].info)

DatasetInfo(description='\nExtreme Summarization (XSum) Dataset.\n\nThere are three features:\n  - document: Input news article.\n  - summary: One sentence summary of the article.\n  - id: BBC ID of the article.\n\n', citation="\n@article{Narayan2018DontGM,\n  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},\n  author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},\n  journal={ArXiv},\n  year={2018},\n  volume={abs/1808.08745}\n}\n", homepage='https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset', license='', features={'document': Value(dtype='string', id=None), 'summary': Value(dtype='string', id=None), 'id': Value(dtype='string', id=None)}, post_processed=None, supervised_keys=SupervisedKeysData(input='document', output='summary'), task_templates=None, builder_name='xsum', config_name='default', version=1.2.0, splits={'train': SplitInfo(name='train', num_bytes=479206615, num_examples=204045, data

# Preparing XSUM Data
Before we can feed the text into a model, we need to convert it into a format the transformer can understand. Encoders and decoders operate only on numbers, so we first tokenize the text and then convert the tokens into numerical IDs. The tokenizer splits text into tokens (for BART, byte-level BPE pieces that can be whole words or parts of words) and adds any special tokens the model expects from pretraining; BART wraps each sequence in <s> (ID 0) and </s> (ID 2). It then maps each token to its unique ID in the tokenizer's vocabulary, and each ID indexes a learned embedding vector inside the model.

On its own, that embedding is the same wherever a token appears. The encoder's self-attention layers turn it into a contextualized vector: the representation of the word "to" is not just "to"; it also takes into account the surrounding words, called its context (left and right). For example, in the sentence "Welcome to NYC", the left context of "to" is "Welcome" and the right context is "NYC", and the output vector for "to" depends on both.

We could do all of this with the AutoTokenizer.from_pretrained method, which returns the tokenizer that corresponds to the model architecture we want to use (facebook/bart-large-cnn). Instead, we load BartTokenizer and BartForConditionalGeneration explicitly from the same checkpoint so that the tokenizer and the model share the vocabulary and preprocessing used during pretraining, which helps us avoid unexpected summaries.
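To make the difference between a token ID, its static embedding, and a contextualized vector concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: a made-up vocabulary (only the <s> = 0 and </s> = 2 IDs match BART), hand-picked 2-dimensional embeddings, whitespace tokenization, and a single attention head with no learned projections. BART itself uses byte-level BPE, 1024-dimensional embeddings (d_model) and 16 attention heads in each of its 12 encoder layers, so this only illustrates the mechanism.

```python
import math

# Hypothetical toy vocabulary; only the special-token IDs (<s> = 0, </s> = 2) match BART's.
VOCAB = {"<s>": 0, "</s>": 2, "welcome": 3, "to": 4, "nyc": 5, "go": 6, "paris": 7}

# Hypothetical 2-d embedding table: one static vector per token ID.
EMBEDDINGS = {
    0: [0.0, 0.0], 2: [0.0, 0.0], 3: [1.0, 0.0], 4: [0.5, 0.5],
    5: [0.0, 1.0], 6: [1.0, 1.0], 7: [-1.0, 0.5],
}

def encode(text):
    """Whitespace-tokenize, map tokens to IDs, and wrap the sequence in <s> ... </s>."""
    return [VOCAB["<s>"]] + [VOCAB[word] for word in text.lower().split()] + [VOCAB["</s>"]]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in vectors]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Each output is a weighted average of every vector in the sequence
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)])
    return outputs

def contextual_vector(text, word):
    """Return the post-attention vector for the first occurrence of `word` in `text`."""
    ids = encode(text)
    vectors = [EMBEDDINGS[i] for i in ids]
    return self_attention(vectors)[ids.index(VOCAB[word])]

print(encode("Welcome to NYC"))  # [0, 3, 4, 5, 2]
print(contextual_vector("Welcome to NYC", "to"))
print(contextual_vector("Go to Paris", "to"))
```

The static embedding for "to" (ID 4) is identical in both sentences; the two printed vectors differ only because self-attention mixes in the neighbouring words.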

In [12]:
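from transformers import BartTokenizer, BartForConditionalGeneration

checkpoint = "facebook/bart-large-cnn"  # BART-large fine-tuned for summarization on CNN/DailyMail
tokenizer = BartTokenizer.from_pretrained(checkpoint)  # tokenizer saved with the same checkpoint
model = BartForConditionalGeneration.from_pretrained(checkpoint)  # encoder-decoder with a generation head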

## We now write a function that preprocesses the data by passing it to the tokenizer. We use the argument truncation=True so that any input longer than the model can handle is truncated to the maximum length allowed. We can look this limit up in the model config: BART can take in at most 1024 tokens per sequence, which is the value of max_position_embeddings
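A minimal sketch of what truncation=True does to a single long article, assuming the tokenizer keeps the beginning of the text and reserves two of the 1024 positions for BART's <s> (ID 0) and </s> (ID 2). truncate_with_specials is a hypothetical helper written for illustration, not a transformers API; the real tokenizer handles this internally.

```python
BOS_ID, EOS_ID = 0, 2  # BART's <s> and </s> token IDs

def truncate_with_specials(content_ids, max_length=1024):
    """Keep at most max_length IDs in total, reserving room for <s> and </s>."""
    budget = max_length - 2  # two slots for the special tokens
    return [BOS_ID] + content_ids[:budget] + [EOS_ID]

long_article = list(range(10, 1510))  # hypothetical 1,500 content token IDs
ids = truncate_with_specials(long_article)
print(len(ids), ids[0], ids[-1])  # 1024 0 2
```

Everything after the first 1022 content tokens is dropped, which is why very long XSUM articles lose their endings before they reach the encoder.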

In [13]:
model.config

BartConfig {
  "_name_or_path": "facebook/bart-large-cnn",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_final_layer_norm": false,
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "force_bos_token_to_be_generated": true,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "L

## We can now create the function, using the maximum input length allowed by the config and a maximum target (summary) length of 60 tokens, a choice explained in the section where we compare human summaries and machine summaries to each other and to the original articles

In [14]:
max_input_length = 1024
max_target_length = 60


def preperation_function(examples):
    # Tokenize the articles, truncating anything longer than BART's 1024-token limit
    # and padding each batch to its longest sequence
    inputs = [doc for doc in examples["document"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, padding=True)

    # Tokenize the reference summaries as labels; as_target_tokenizer() switches the
    # tokenizer into target mode (BART uses the same vocabulary for inputs and targets)
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            examples["summary"], max_length=max_target_length, truncation=True
        )

    # The label IDs are what the model learns to generate
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

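## We can apply this function to every split using map; batched=True passes the examples to the function in batches (1,000 at a time by default) rather than one at a time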

In [15]:
tokenized_xsum = xsum.map(preperation_function, batched=True)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-d006ce488ae4d44a.arrow
100%|██████████| 12/12 [00:18<00:00,  1.56s/ba]
100%|██████████| 12/12 [00:17<00:00,  1.47s/ba]
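A minimal pure-Python sketch of what batched=True does, assuming the datasets library's default batch size of 1,000 rows: the function receives a dict of column lists for each slice, and the columns it returns are added to the dataset. batched_map and count_chars are hypothetical helpers, not the datasets implementation. With 11,334 test rows (and 11,332 validation rows) this gives 12 batches per split, which matches the two 12/12 progress bars above; the single cached-dataset line suggests the train split was reused from an earlier run.

```python
import math

def batched_map(columns, fn, batch_size=1000):
    """Call fn on consecutive slices of a column-oriented dataset and merge its output columns."""
    num_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, num_rows, batch_size):
        # Each batch is a dict of lists, like the `examples` argument of preperation_function
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            new_columns.setdefault(name, []).extend(values)
    # Returned columns are added to (or overwrite) the existing ones
    return {**columns, **new_columns}

# Hypothetical stand-in for the test split: 11,334 placeholder articles
test_split = {"document": [f"article {i}" for i in range(11334)]}
batch_sizes = []

def count_chars(batch):
    batch_sizes.append(len(batch["document"]))
    return {"n_chars": [len(doc) for doc in batch["document"]]}

result = batched_map(test_split, count_chars)
print(len(batch_sizes), math.ceil(11334 / 1000))  # 12 12
print(batch_sizes[-1])  # 334
print(sorted(result))  # ['document', 'n_chars']
```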


In [16]:
tokenized_xsum

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11334
    })
})

In [17]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

## The attention mask tells the model which tokens to attend to: a value of 1 marks a real token and a value of 0 marks padding to ignore. The input IDs are the numerical mapping of tokens to BART's vocabulary, in which every token (a word or a word piece) has its own ID. The labels are the token IDs of the reference summary, which is what the model learns to generate.
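A minimal sketch of how padding=True and the attention mask fit together, assuming right padding to the longest sequence in the batch and BART's <pad> ID of 1. pad_batch is a hypothetical helper, not the tokenizer's implementation; the example IDs are the first few input IDs from the two rows shown below. In the displayed output the masks look like all 1s because only the first 100 values are printed; any padding zeros would sit at the end of a row.

```python
PAD_ID = 1  # BART's <pad> token ID

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad ID lists to the longest in the batch and build matching attention masks."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[0, 133, 953, 2], [0, 1620, 2]])
print(batch["input_ids"])  # [[0, 133, 953, 2], [0, 1620, 2, 1]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```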

In [18]:
display_function(tokenized_xsum['test'])

Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The East Antrim MP had been considering putting his name forward after Nigel Dodds ruled himself out. Finance Minister Arlene Foster is the only person so far to declare an interest and has the backing of a majority of the partys most senior elected representatives. Nominations to become the next leader of the DUP close later on Wednesday. Peter Robinson announced in November that he was standing down as party leader. On Tuesday, Mrs Foster said she was very humbled by the support she has received from party colleagues. Mr Wilson thought long and hard about his decision and spoke to among others, party colleagues Mrs Foster and Mr Dodds. In the end he felt it would be in the best interests of the party that he did not put his name forward. He felt he wanted to make sure there was a smooth transition and that means almost certainly now that Arlene Foster will be the new DUP leader and the new first minister at Stormont. She said she looked forward to leading the DUP, if that was the partys wish. Mrs Foster said she had hoped to work with Mr Dodds as a team. We will still hopefully work together as a team and that is certainly my wish for the future, she said. In a tweet on Monday night, Mr Robinson said he had received a valid nomination from Mrs Foster for the post of DUP leader. 
Arlenes nomination was submitted with the support of over 75% of those entitled to vote in the electoral college, he added.",35048629,"[0, 133, 953, 3702, 6103, 3957, 56, 57, 2811, 2057, 39, 766, 556, 71, 16734, 18753, 29, 3447, 1003, 66, 4, 4090, 692, 1586, 24398, 8436, 16, 5, 129, 621, 98, 444, 7, 10152, 41, 773, 8, 34, 5, 6027, 9, 10, 1647, 9, 5, 233, 2459, 144, 949, 2736, 4844, 4, 234, 18121, 1635, 7, 555, 5, 220, 884, 9, 5, 22598, 593, 423, 15, 307, 4, 2155, 5380, 585, 11, 759, 14, 37, 21, 2934, 159, 25, 537, 884, 4, 374, 294, 6, 3801, 8436, 26, 79, 21, 182, 10080, 10288, 30, 5, 323, 79, 34, 829, 31, ...]","[0, 21169, 4783, 3095, 34, 26, 37, 40, 45, 28, 878, 13, 5, 22598, 1673, 4, 2]",Sammy Wilson has said he will not be running for the DUP leadership.
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","As first reported by The Express, the settled community in the Hovefields area of Wickford reported hardcore-laden lorries arrive at the weekend. The lorries were followed by large mobile homes. Basildon Council said it is aware of an alleged breach of planning laws. Essex Police has also been informed. The Hovefield site - which is subject to a High Court injunction preventing further development - is less than three miles (5km) by road - from the Dale Farm traveller site in Wickford. Dale Farm was Europes largest traveller site before about 80 families were evicted from unlawful plots in 2011. Jill Walsh, of the Hovefields Residents Association, said an English Traveller family in five caravans left the site on Friday. After they departed, she said, a number of large lorries carrying concrete, hardcore and three mobile homes drove down Hovefields Avenue and onto the five acre field at the end of the road. Mrs Walsh said because of the narrowness of the road one of their neighbours - an elderly couple - had their fence ripped out and shrubbery damaged so that the lorries could get through. If the council does not prosecute over this and deal with the situation urgently they will have a Dale Farm II, but bigger. Phil Turner, leader of Basildon Council, said: Basildon Council is aware of an alleged breach of planning laws in the Hovefields area. We share the frustrations of residents, but the council does not have powers of arrest and must follow the proper legal process. As a public body, we must act within the existing legal framework, as set out by Parliament, and this adds considerable time and cost in dealing with such situations. 
However, residents can be assured that the council is taking all appropriate steps to deal with unauthorised development. Essex Police said it was investigating a criminal damage report involving the fence and has urged any witnesses to contact them. A police spokesman said: Essex Police is aware of an unauthorised traveller development on land near Hovefields Avenue. We are liaising with the local authority and will continue to monitor the situation.",39195455,"[0, 1620, 78, 431, 30, 20, 3619, 6, 5, 5668, 435, 11, 5, 289, 7067, 21346, 443, 9, 18063, 1891, 431, 27482, 12, 20724, 784, 368, 4458, 5240, 23, 5, 983, 4, 20, 784, 368, 4458, 58, 1432, 30, 739, 1830, 1611, 4, 7093, 9683, 261, 1080, 26, 24, 16, 2542, 9, 41, 1697, 6999, 9, 1884, 2074, 4, 15252, 522, 34, 67, 57, 3978, 4, 20, 289, 7067, 1399, 1082, 111, 61, 16, 2087, 7, 10, 755, 837, 17096, 9107, 617, 709, 111, 16, 540, 87, 130, 1788, 36, 245, 7203, 43, 30, 921, 111, 31, 5, 11302, 6584, ...]","[0, 35129, 33, 373, 13, 9047, 814, 2876, 1449, 14, 10, 92, 2862, 33659, 12853, 1082, 16, 145, 1412, 23, 5, 253, 9, 49, 921, 11, 15252, 4, 2]",Residents have called for urgent action amid claims that a new mass Traveller site is being created at the end of their road in Essex.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The 23-year-old, who scored once in 27 games last term, netted in Sundays 2-0 friendly win at Sheffield Wednesday. I always said if I play where I did for my last club I could get goals but I havent really had that chance since I came here, he told Rangers website. Hopefully I can play in a bit more of an attacking position this season and get some more goals in the league. Midfielder Windass, playing in an advanced role, scored 17 goals for Accrington in season 2015/16 before moving to Rangers in the summer of 2016. He has scored in closed door friendly matches this summer and was delighted to hit the target in front of a health crowd as the Ibrox side rounded off their pre-season with a win against Championship outfit Wednesday. I was pleased to get a goal - it has been a long time since I scored my last one, he added. Obviously it is only a friendly so it doesnt mean that much but its nice to get off the mark. I dont think I had a point to prove this pre-season. I have no idea how the manager is thinking but I can only keep playing how I have been playing. I have scored a few goals in pre-season so hopefully that is enough to get in the team. Following their shock Europa League exit at the hands of Luxembourg side Progres Niederkorn, Rangers have drawn 1-1 with Marseille, beaten Watford 2-1 and saw off Sheffield Wednesday in friendly matches. With their season kicking off away to Motherwell on 6 August, Rangers manager Pedro Caixinha believes his new-look side are clicking into gear. 
The last three games, Marseille, Watford and today Sheffield Wednesday, were fantastic for us to get our cohesion, to get our ideas, to add everything in and get our confidence and our belief, the Portuguese told the Rangers website. The boys have been making a fantastic effort in order to keep focused and look forward, and today they had their bonus. We knew since the very beginning we are not the worst team in the world and we are not the best one, but we need to keep this focus and this approach to the game.",40773606,"[0, 133, 883, 12, 180, 12, 279, 6, 54, 1008, 683, 11, 974, 426, 94, 1385, 6, 15825, 11, 17429, 132, 12, 288, 5192, 339, 23, 9667, 307, 4, 38, 460, 26, 114, 38, 310, 147, 38, 222, 13, 127, 94, 950, 38, 115, 120, 1175, 53, 38, 2489, 9399, 269, 56, 14, 778, 187, 38, 376, 259, 6, 37, 174, 5706, 998, 4, 13088, 38, 64, 310, 11, 10, 828, 55, 9, 41, 6666, 737, 42, 191, 8, 120, 103, 55, 1175, 11, 5, 1267, 4, 4079, 1399, 254, 7247, 2401, 6, 816, 11, 41, 3319, 774, 6, 1008, ...]","[0, 32879, 7247, 2401, 9838, 37, 40, 1606, 1175, 7, 39, 177, 114, 37, 16, 4507, 5, 6666, 774, 37, 19656, 3677, 10, 5706, 4, 2]",Josh Windass insists he will add goals to his game if he is handed the attacking role he craves a Rangers.
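
The columns above are the tokenized form of each XSUM row. `input_ids` encodes `document`, `labels` encodes `summary`, and `attention_mask` flags which positions hold real tokens. This assumes the standard BART tokenizer vocabulary, where `<s>` = 0, `<pad>` = 1 and `</s>` = 2. Under that assumption, every label sequence shown starts with 0 and ends with 2. The first 100 mask entries (all that pandas displays) are 1 because no padding falls in that range.

The sketch below is plain Python with hypothetical helper names (`pad_batch`, `attention_mask`, `is_well_formed`). It shows how the mask relates to padded IDs, using two label rows copied from the table. It does not call the `transformers` tokenizer itself.

```python
# Special-token IDs assumed from the BART (RoBERTa byte-level BPE) vocabulary.
BOS_ID = 0  # <s>
PAD_ID = 1  # <pad>
EOS_ID = 2  # </s>


def pad_batch(batch: list[list[int]], pad_id: int = PAD_ID) -> list[list[int]]:
    """Right-pad every sequence with pad_id to the length of the longest one."""
    width = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (width - len(seq)) for seq in batch]


def attention_mask(input_ids: list[int], pad_id: int = PAD_ID) -> list[int]:
    """Return 1 for real tokens and 0 for padding, like the attention_mask column."""
    return [0 if tok == pad_id else 1 for tok in input_ids]


def is_well_formed(ids: list[int]) -> bool:
    """Check that an unpadded sequence is wrapped in <s> ... </s>."""
    return len(ids) >= 2 and ids[0] == BOS_ID and ids[-1] == EOS_ID


# `labels` for rows 0 and 2 (ids 35048629 and 40773606), copied from the table.
label_row0 = [0, 21169, 4783, 3095, 34, 26, 37, 40, 45, 28, 878, 13, 5, 22598, 1673, 4, 2]
label_row2 = [0, 32879, 7247, 2401, 9838, 37, 40, 1606, 1175, 7, 39, 177, 114,
              37, 16, 4507, 5, 6666, 774, 37, 19656, 3677, 10, 5706, 4, 2]

batch = pad_batch([label_row0, label_row2])
masks = [attention_mask(seq) for seq in batch]

print([len(seq) for seq in batch])                             # [26, 26]
print([sum(m) for m in masks])                                 # [17, 26]
print(is_well_formed(label_row0), is_well_formed(label_row2))  # True True
```

In a typical Hugging Face training loop, padding is applied per batch by a data collator rather than stored with the data. Padded label positions are also usually replaced with -100 so the loss ignores them. Both are general library conventions, not details confirmed by this notebook.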



In [19]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

# Compare Machine Summaries to Professional Human Written Summaries
To score our machine-generated summaries against the professional human-written ones, we compute the cosine similarity between sentence embeddings. This measures the semantic similarity between two texts. We make three comparisons for each article: human summary to machine summary, human summary to original document, and machine summary to original document.

Initially, we wanted to cap each machine summary at the same length as the XSUM summaries. However, the XSUM summaries are so short (hence the name "extreme summarization") that the model only returned the opening words of each article. This makes sense: BART's pretraining likely taught it that the start of a text often contains the most valuable summary information. We therefore set `max_length=60`, which keeps the output brief but lets the model produce enough context to be meaningful. Note that `max_length` in `model.generate` counts tokens, not words, so each output has somewhat fewer than 60 words. For reference, the code below computes the average length of the human summaries for our ten articles (~19 words per summary).
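For reference, the cosine similarity between two embedding vectors a and b is (a · b) / (‖a‖ ‖b‖). A value of 1 means the vectors point in the same direction, and 0 means they are orthogonal. The notebook computes this with `util.pytorch_cos_sim` from `sentence-transformers`. The pure-Python sketch below only illustrates the arithmetic and is not that library's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Cosine similarity ignores vector length, so a one-sentence summary and a full article can still score highly if their embeddings point in a similar direction.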

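We focus on 10 articles from the test split. We run the same model on each one separately (labeled Model 1 through Model 10 below) so that we can inspect each human/machine pair individually.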

In [20]:
def listToString(s):
    # Concatenate a list of strings (e.g. the decoded summaries) into one string.
    # Passing a plain string returns it unchanged.
    return "".join(s)

In [21]:
article1 = tokenized_xsum['test']['document'][0]
article2 = tokenized_xsum['test']['document'][123]
article3 = tokenized_xsum['test']['document'][99]
article4 = tokenized_xsum['test']['document'][1100]
article5 = tokenized_xsum['test']['document'][1118]
article6 = tokenized_xsum['test']['document'][45]
article7 = tokenized_xsum['test']['document'][13]
article8 = tokenized_xsum['test']['document'][69]
article9 = tokenized_xsum['test']['document'][27]
article10 = tokenized_xsum['test']['document'][9]

summary1 = tokenized_xsum['test']['summary'][0]
summary2 = tokenized_xsum['test']['summary'][123]
summary3 = tokenized_xsum['test']['summary'][99]
summary4 = tokenized_xsum['test']['summary'][1100]
summary5 = tokenized_xsum['test']['summary'][1118]
summary6 = tokenized_xsum['test']['summary'][45]
summary7 = tokenized_xsum['test']['summary'][13]
summary8 = tokenized_xsum['test']['summary'][69]
summary9 = tokenized_xsum['test']['summary'][27]
summary10 = tokenized_xsum['test']['summary'][9]
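The twenty assignments above select the same ten test rows twice: once for documents and once for summaries. Below is a minimal sketch of the same selection driven by a single index list. It assumes the split can be indexed by column name to get a sequence; a plain dict works, and so does a Hugging Face split such as `tokenized_xsum['test']`. `ARTICLE_INDICES` and `select_pairs` are hypothetical names. Reading each column once may also be cheaper: depending on the `datasets` version, `split['document']` can materialize the whole column on every call.

```python
from typing import List, Mapping, Sequence, Tuple

# Test-set rows inspected in this notebook, in Model 1..10 order.
ARTICLE_INDICES = [0, 123, 99, 1100, 1118, 45, 13, 69, 27, 9]


def select_pairs(split: Mapping[str, Sequence[str]],
                 indices: Sequence[int]) -> List[Tuple[str, str]]:
    """Return (document, summary) pairs for the given row indices."""
    documents = split["document"]  # read each column once
    summaries = split["summary"]
    return [(documents[i], summaries[i]) for i in indices]


# Toy stand-in for tokenized_xsum['test'] with three rows.
toy_split = {
    "document": ["doc A", "doc B", "doc C"],
    "summary": ["sum A", "sum B", "sum C"],
}
print(select_pairs(toy_split, [2, 0]))
```

In the notebook, `pairs = select_pairs(tokenized_xsum['test'], ARTICLE_INDICES)` would give `pairs[0]` the same values as `(article1, summary1)`.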


In [22]:
summaryList = [summary1.split(),
summary2.split(), 
summary3.split(), 
summary4.split(),
summary5.split(),
summary6.split(),
summary7.split(), 
summary8.split(),
summary9.split(), 
summary10.split()]

count = sum( [ len(listElem) for listElem in summaryList])

print('The total number of words in these summaries is: ', count)
print('The average words per summary is: ', count / len(summaryList))

The total number of words in these summaries is:  186
The average words per summary is:  18.6


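We ran half of our models (Models 1-5) with `early_stopping=True` and the other half (Models 6-10) with `early_stopping=False` to see whether this setting makes any meaningful difference. Models 1-5 do not pass the argument explicitly, so they rely on the default in the model's generation config. In Hugging Face `transformers`, `early_stopping` only affects beam search, so it has no effect when the model decodes greedily.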
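Once all ten runs are scored, you can check whether `early_stopping` made a difference by averaging each similarity separately for the two halves. Below is a minimal sketch. It assumes the scores have been collected into (setting label, score) pairs. `mean_by_setting` is a hypothetical helper, and the numbers are placeholders, not our results. With only five articles per group, any gap should be read as anecdotal rather than statistically meaningful.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def mean_by_setting(results: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the scores for each early_stopping setting label."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for setting, score in results:
        grouped[setting].append(score)
    return {setting: mean(scores) for setting, scores in grouped.items()}


# Placeholder values only, to show the shape of the input.
toy_results = [("default", 0.70), ("default", 0.60), ("False", 0.50), ("False", 0.70)]
print(mean_by_setting(toy_results))
```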

## Model 1

In [23]:
input1 = tokenizer(article1, return_tensors='pt', truncation=True)
summary_ids1 = model.generate(input1['input_ids'], max_length=60)
machineSummary1 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids1])
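Because `max_length=60` is a token budget, generation can stop before the model finishes a sentence. Several machine summaries below end mid-sentence; Model 1's, for example, ends with "Welsh Government said". Below is a minimal sketch of a check that flags such outputs by testing for sentence-final punctuation. `looks_truncated` is a hypothetical helper, and the punctuation heuristic is an assumption. It can misfire, for example, on a cut-off summary whose last token happens to be an abbreviation ending in a period.

```python
def looks_truncated(summary: str) -> bool:
    """Heuristic: True if the text does not end with sentence-final punctuation.

    Trailing whitespace and closing quotes/brackets are ignored, so
    'He said "yes."' counts as complete.
    """
    text = summary.rstrip().rstrip("\"')]")
    return not text.endswith((".", "!", "?"))


print(looks_truncated("Workers at the charity claim investment in housing "
                      "would be cheaper. Welsh Government said"))
print(looks_truncated("The lightsabers used in the sport are all hand-made."))
```

Applied after the `listToString` step, this would flag `machineSummary1` but not Model 4's summary, which ends on a full stop.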

In [24]:
machineSummary1 = listToString(machineSummary1)
original1 = listToString(article1)

comparison1 = [summary1, machineSummary1, original1]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings1 = token_model.encode(comparison1)
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings1[1], comparison_embeddings1[2])) # machine summary to original article

tensor([[0.7313]])
tensor([[0.7645]])
tensor([[0.9574]])
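Each Model cell repeats the same three comparisons and reloads the `SentenceTransformer` every time. Below is a sketch of one reusable helper. It assumes an `encode` callable that maps a list of strings to a list of equal-length vectors; in the notebook, that would be `token_model.encode` with `token_model` loaded once. `score_summary` and `toy_encode` are hypothetical. The cosine similarity is written out by hand so the block runs on its own.

```python
import math
from typing import Callable, Dict, List, Sequence

Encoder = Callable[[List[str]], Sequence[Sequence[float]]]


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norms


def score_summary(human: str, machine: str, original: str,
                  encode: Encoder) -> Dict[str, float]:
    """Return the three pairwise similarities printed in each Model cell."""
    h, m, o = encode([human, machine, original])  # one embedding per text
    return {
        "human_vs_machine": _cosine(h, m),
        "human_vs_original": _cosine(h, o),
        "machine_vs_original": _cosine(m, o),
    }


def toy_encode(texts: List[str]) -> List[List[float]]:
    """Toy stand-in for a sentence encoder: counts of the letters a-z."""
    return [[float(t.lower().count(c)) for c in "abcdefghijklmnopqrstuvwxyz"]
            for t in texts]


print(score_summary("a b", "a b", "b c", toy_encode))
```

In the notebook, `score_summary(summary1, machineSummary1, original1, token_model.encode)` should reproduce the three values printed above, returned as plain floats rather than 1x1 tensors.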


In [25]:
comparison1

['There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. Welsh Government said',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. The Welsh Government said more people than ever were getting help to address housing problems. Changes to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation. Prison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because

## Model 2

In [26]:
input2 = tokenizer(article2, return_tensors='pt', truncation=True)
summary_ids2 = model.generate(input2['input_ids'], max_length=60)
machineSummary2 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids2])

In [27]:
machineSummary2 = listToString(machineSummary2)
original2 = listToString(article2)

comparison2 = [summary2, machineSummary2, original2]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings2 = token_model.encode(comparison2)
print(util.pytorch_cos_sim(comparison_embeddings2[0], comparison_embeddings2[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings2[0], comparison_embeddings2[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings2[1], comparison_embeddings2[2])) # machine summary to original article

tensor([[0.7189]])
tensor([[0.5850]])
tensor([[0.6048]])


In [28]:
comparison2

["For a man often described as capricious, Tyson Fury's chaotic reign as world heavyweight champion was strangely predictable.",
 'Fury has been speaking about his mental health struggles for years. The repeated claims from Furys camp that his victory was downplayed by the British media, and that they had an agenda against him from the outset, are delusional. Fury is not the first boxer to lose motivation having reached',

## Model 3

In [29]:
input3 = tokenizer(article3, return_tensors='pt', truncation=True)
summary_ids3 = model.generate(input3['input_ids'], max_length=60)
machineSummary3 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids3])

In [30]:
machineSummary3 = listToString(machineSummary3)
original3 = listToString(article3)

comparison3 = [summary3, machineSummary3, original3]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings3 = token_model.encode(comparison3)
print(util.pytorch_cos_sim(comparison_embeddings3[0], comparison_embeddings3[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings3[0], comparison_embeddings3[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings3[1], comparison_embeddings3[2])) # machine summary to original article

tensor([[0.5551]])
tensor([[0.7642]])
tensor([[0.8500]])


In [31]:
comparison3

['A barrister who was due to move into his own chambers in Huddersfield has pleaded guilty to supplying cocaine.',
 'Omar Khan, 31, had worked at The Johnson Partnership in Nottingham for five years. Partner Digby Johnson said he did not represent Khan, who had set up his own office and was set to leave the company. Erlin Manahasa, Albert Dibra and Naza',
 'Omar Khan, 31, had worked at The Johnson Partnership in Nottingham for five years before he was arrested. Erlin Manahasa, Albert Dibra and Nazaquat Ali joined Khan in admitting the same charge, between 1 October  and 4 December last year, at Nottingham Crown Court. They are due to be sentenced on 15 April. Updates on this story and more from Nottinghamshire The court heard the case involved the recovery of 1kg (2.2lb) of cocaine. Digby Johnson, a partner at the Johnson firm, confirmed they did not represent Khan - who had set up his own office and was set to leave the company. I still find it hard to believe he could do something as

## Model 4

In [32]:
input4 = tokenizer(article4, return_tensors='pt', truncation=True)
summary_ids4 = model.generate(input4['input_ids'], max_length=60)
machineSummary4 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids4])

In [33]:
machineSummary4 = listToString(machineSummary4)
original4 = listToString(article4)

comparison4 = [summary4, machineSummary4, original4]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings4 = token_model.encode(comparison4)
print(util.pytorch_cos_sim(comparison_embeddings4[0], comparison_embeddings4[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings4[0], comparison_embeddings4[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings4[1], comparison_embeddings4[2])) # machine summary to original article

tensor([[0.5436]])
tensor([[0.6342]])
tensor([[0.8264]])


In [34]:
comparison4

['Star Wars fans are being given the opportunity to become Jedi Knights and learn how to wield lightsabers in combat.',
 'The sport began eight years ago in Italy but has only just come to England with the first classes in Cheltenham. Instructor Jordan Court said people were already hooked. The lightsabers used in the sport are all hand-made and are provided for use during the classes.',
 'LudoSport has opened its first academy teaching seven forms of combat from the Star Wars world using flexible blades mounted on weighted hilts. The sport began eight years ago in Italy but has only just come to England with the first classes in Cheltenham. Instructor Jordan Court said people were already hooked. The classes in Cheltenham began last month. So far there are six pupils, but this number is expected to increase. Mr Court attended an international boot camp to learn the different stages of the sport which range in characteristics from defensive in stage one to aggressive and flamboyant in 

## Model 5

In [35]:
input5 = tokenizer(article5, return_tensors='pt', truncation=True)
summary_ids5 = model.generate(input5['input_ids'], max_length=60)
machineSummary5 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids5])

In [36]:
machineSummary5 = listToString(machineSummary5)
original5 = listToString(article5)

comparison5 = [summary5, machineSummary5, original5]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings5 = token_model.encode(comparison5)
print(util.pytorch_cos_sim(comparison_embeddings5[0], comparison_embeddings5[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings5[0], comparison_embeddings5[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings5[1], comparison_embeddings5[2])) # machine summary to original article

tensor([[0.5847]])
tensor([[0.6152]])
tensor([[0.9742]])


In [37]:
comparison5

['Awareness rides are taking place to try and cut the number of people on horseback injured or killed on roads.',
 'The Pass Wide and Slow Wales campaign has collected 1,300 signatures on the assemblys e-petition website. It wants an annual road safety awareness campaign explaining to motorists how to react around horses. The British Horse Society found that since 2010 there have been 2,000 road accidents in',
 'The Pass Wide and Slow Wales campaign has collected 1,300 signatures on the assemblys e-petition website. It wants an annual road safety awareness campaign explaining to motorists how to react around horses. The British Horse Society found that since 2010 there have been 2,000 road accidents in the UK, with 1,500 because of cars passing too closely. As a result of these, 180 horses and 36 riders have died. Awareness rides were planned for Penarth, Vale of Glamorgan, Swansea, Neyland in Pembrokeshire, Machynlleth, Powys, Flintshire and Porthmadog in Gwynedd. Any petition with ov

## Model 6

In [38]:
input6 = tokenizer(article6, return_tensors='pt', truncation=True)
summary_ids6 = model.generate(input6['input_ids'], max_length=60, early_stopping=False)
machineSummary6 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids6])

In [39]:
machineSummary6 = listToString(machineSummary6)
original6 = listToString(article6)

comparison6 = [summary6, machineSummary6, original6]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings6 = token_model.encode(comparison6)
print(util.pytorch_cos_sim(comparison_embeddings6[0], comparison_embeddings6[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings6[0], comparison_embeddings6[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings6[1], comparison_embeddings6[2])) # machine summary to original article

tensor([[0.7071]])
tensor([[0.7340]])
tensor([[0.9464]])


In [40]:
comparison6

['Two new councillors have been elected in a by-election in the City of Edinburgh.',
 'SNP topped the vote in the Leith Walk by-election. Scottish Labour won the second seat from the Greens. Deidre Brock of the SNP and Maggie Chapman of the Scottish Greens stood down. It was the first time the Single Transferable Vote (STV) system had',
 'It was the first time the Single Transferable Vote (STV) system had been used to select two members in the same ward in a by-election. The SNP topped the vote in the Leith Walk by-election, while Scottish Labour won the second seat from the Greens. The by-election was called after Deidre Brock of the SNP and Maggie Chapman of the Scottish Greens stood down. The SNPs John Lewis Ritchie topped the Leith Walk poll with 2,290 votes. He was elected at stage one in the STV process with a swing in first-preference votes of 7.6% from Labour. Labours Marion Donaldson received 1,623 votes, ahead of Susan Jane Rae of the Scottish Greens on 1,381. Ms Donaldson wa

# Model 7

In [41]:
input7 = tokenizer(article7, return_tensors='pt', truncation=True)
summary_ids7 = model.generate(input7['input_ids'], max_length=60, early_stopping=False)
machineSummary7 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids7]

In [42]:
machineSummary7 = listToString(machineSummary7)
original7 = listToString(article7)

comparison7 = [summary7, machineSummary7, original7]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings7 = token_model.encode(comparison7)
print(util.pytorch_cos_sim(comparison_embeddings7[0], comparison_embeddings7[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings7[0], comparison_embeddings7[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings7[1], comparison_embeddings7[2])) # machine summary to original article

tensor([[0.7054]])
tensor([[0.6673]])
tensor([[0.9054]])
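
The cells for Models 6 to 10 repeat the same generate-then-score steps and reload the SentenceTransformer model each time. One way to fold them into a single reusable function is sketched below, under the assumption that summarization and embedding are passed in as callables. `score_article`, `cosine`, `toy_summarize`, `toy_encode`, and `VOCAB` are hypothetical names; the toy bag-of-words encoder only stands in for the sentence-embedding model so the sketch runs without downloading anything.

```python
import math
from typing import Callable, Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def score_article(human: str, article: str,
                  summarize: Callable[[str], str],
                  encode: Callable[[List[str]], Sequence[Sequence[float]]]) -> Dict[str, float]:
    """Generate a machine summary and return the three pairwise similarities used in the notebook."""
    machine = summarize(article)
    h, m, a = encode([human, machine, article])  # one batched call, same order as comparisonN
    return {
        "human_vs_machine": cosine(h, m),
        "human_vs_article": cosine(h, a),
        "machine_vs_article": cosine(m, a),
    }

# Hypothetical stand-ins so the sketch runs offline.
VOCAB = ["snp", "labour", "greens", "vote", "election"]

def toy_summarize(text: str) -> str:
    return text.split(". ")[0]  # toy "summary": the first sentence

def toy_encode(texts: List[str]) -> List[List[float]]:
    # bag-of-words counts over VOCAB, ignoring case and full stops
    return [[float(t.lower().replace(".", " ").split().count(w)) for w in VOCAB] for t in texts]

scores = score_article("SNP won the election",
                       "SNP won the vote. Labour and Greens lost the election",
                       toy_summarize, toy_encode)
print(scores)

# In the notebook the real components could be wired in roughly like this (not run here):
# st = SentenceTransformer('distilbert-base-nli-mean-tokens')  # load once, outside any loop
# def bart_summarize(text):
#     ids = model.generate(tokenizer(text, return_tensors='pt', truncation=True)['input_ids'],
#                          max_length=60, early_stopping=False)
#     return tokenizer.decode(ids[0], skip_special_tokens=True)
# score_article(summary6, original6, bart_summarize, st.encode)
```

Passing the components in keeps the scoring logic identical for every article and makes it easy to swap in a different summarizer or embedding model.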


In [43]:
comparison7

["Torquay United boss Kevin Nicholson says none of the money from Eunan O'Kane's move to Leeds from Bournemouth will go to the playing squad.",
 ' OKane moved for an undisclosed fee, but Nicholson says any money will go to help the cash-strapped club. The Gulls are still looking for new owners having been taken over by a consortium of local business people last summer. They were forced to close down the clubs academy',
 'The National League sold the Republic of Ireland midfielder to the Cherries for £175,000 in 2012 and had a 15% sell-on clause included in the deal. OKane moved for an undisclosed fee, but Nicholson says any money will go to help the cash-strapped club. I dont think Ill be getting anything, Nicholson told BBC Devon. Theres more important things. The Gulls are still looking for new owners having been taken over by a consortium of local business people last summer. They were forced to close down the clubs academy and drastically reduce the playing budget after millionaire

# Model 8

In [44]:
input8 = tokenizer(article8, return_tensors='pt', truncation=True)
summary_ids8 = model.generate(input8['input_ids'], max_length=60, early_stopping=False)
machineSummary8 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids8]

In [45]:
machineSummary8 = listToString(machineSummary8)
original8 = listToString(article8)

comparison8 = [summary8, machineSummary8, original8]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings8 = token_model.encode(comparison8)
print(util.pytorch_cos_sim(comparison_embeddings8[0], comparison_embeddings8[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings8[0], comparison_embeddings8[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings8[1], comparison_embeddings8[2])) # machine summary to original article

tensor([[0.5923]])
tensor([[0.6410]])
tensor([[0.9681]])


In [46]:
comparison8

['Manufacturers have reported positive business trends, in the latest survey from the Scottish Chambers of Commerce.',
 'Manufacturers reported their highest growth in new orders for nearly three years. In retail, there was also a return to optimism - though only just. In tourism, firms reported improving visitor numbers in the final quarter of the year, but falling sales revenues. Construction is expecting an investment dip.',

# Model 9

In [47]:
input9 = tokenizer(article9, return_tensors='pt', truncation=True)
summary_ids9 = model.generate(input9['input_ids'], max_length=60, early_stopping=False)
machineSummary9 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids9]

In [48]:
machineSummary9 = listToString(machineSummary9)
original9 = listToString(article9)

comparison9 = [summary9, machineSummary9, original9]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings9 = token_model.encode(comparison9)
print(util.pytorch_cos_sim(comparison_embeddings9[0], comparison_embeddings9[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings9[0], comparison_embeddings9[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings9[1], comparison_embeddings9[2])) # machine summary to original article

tensor([[0.8161]])
tensor([[0.8348]])
tensor([[0.8977]])


In [49]:
comparison9

['Of his last 30 matches in 2016, Andy Murray won 28 and lost just two.',
 'The world number one has won 21 of his first 30 matches in 2017. Murray has had shingles and an elbow problem, and now his left hip is proving cause for concern. Opting out of two scheduled exhibition matches at the Hurlingham Club in London may not be too',
 'Media playback is not supported on this device Of his first 30 matches in 2017, the world number one has won 21 and lost nine. Winning his last five tournaments of 2016 to pip Novak Djokovic to the year-end number one position in the final match of the season at Londons O2 Arena was astonishing, dramatic and unforgettable. And yet it appears that relentless run of success, and the 87 matches he played over a season, has come at a price. Murrays straight-set defeat by world number 90 Jordan Thompson in the first round at Queens Club was the sixth time he has lost to a player outside the top 20 this year. He has had shingles and an elbow problem, and now hi

# Model 10

In [50]:
input10 = tokenizer(article10, return_tensors='pt', truncation=True)
summary_ids10 = model.generate(input10['input_ids'], max_length=60, early_stopping=False)
machineSummary10 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids10]

In [51]:
machineSummary10 = listToString(machineSummary10)
summary10 = listToString(summary10)
original10 = listToString(article10)

comparison10 = [summary10, machineSummary10, original10]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings10 = token_model.encode(comparison10)
print(util.pytorch_cos_sim(comparison_embeddings10[0], comparison_embeddings10[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings10[0], comparison_embeddings10[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings10[1], comparison_embeddings10[2])) # machine summary to original article

tensor([[0.7916]])
tensor([[0.7987]])
tensor([[0.7452]])
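
The conclusion summarizes the article-similarity scores as percentages across all ten models. The comparison behind those figures can be sketched with the scores printed in this section for Models 6 to 10 only (Models 1 to 5 are not shown here, so this does not reproduce the 70% and 20% figures). `TIE_MARGIN` is an arbitrary, hypothetical threshold; the notebook does not state how "relatively equivalent" was decided.

```python
# (human summary vs article, machine summary vs article), copied from the
# second and third printed tensors of the Model 6-10 scoring cells.
article_scores = {
    6: (0.7340, 0.9464),
    7: (0.6673, 0.9054),
    8: (0.6410, 0.9681),
    9: (0.8348, 0.8977),
    10: (0.7987, 0.7452),
}

TIE_MARGIN = 0.1  # hypothetical cut-off for calling two scores "roughly equivalent"

machine_closer = [k for k, (human, machine) in article_scores.items() if machine > human]
near_ties = [k for k, (human, machine) in article_scores.items() if abs(machine - human) < TIE_MARGIN]

for k, (human, machine) in article_scores.items():
    print(f"Model {k}: human {human:.4f}, machine {machine:.4f}, gap {machine - human:+.4f}")
print(f"Machine summary closer to the article in {len(machine_closer)} of {len(article_scores)}: {machine_closer}")
print(f"Within {TIE_MARGIN} of each other: {near_ties}")
```

On these five models the machine summary scores higher against the article in Models 6 to 9, while Model 10 is the exception.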


In [52]:
comparison10

["Manager Brendan Rodgers is sure Celtic can exploit the wide open spaces of Hampden when they meet Rangers in Sunday's League Cup semi-final.",
 "Celtic face Rangers in the Scottish Cup semi-final at Hampden Park. Brendan Rodgers' side beat Rangers 5-1 at Celtic Park last month. Rodgers lost two semi-finals in his time at Liverpool and is aiming to make it third time lucky at the club he joined",
 'Im really looking forward to it - the home of Scottish football, said Rodgers ahead of his maiden visit. I hear the pitch is good, a nice big pitch suits the speed in our team and our intensity. The technical area goes right out to the end of the pitch, but you might need a taxi to get back to your staff. This will be Rodgers second taste of the Old Firm derby and his experience of the fixture got off to a great start with a 5-1 league victory at Celtic Park last month. It was a brilliant performance by the players in every aspect, he recalled. Obviously this one is on a neutral ground, but

# Conclusion

We can see that the machine summary had a higher cosine similarity to the original article than the human summary in 70% of the models. However, this may be influenced by the fact that the machine summaries were about 3x as long as the average human summary. The early_stopping=True/False argument did not appear to have any real effect on cosine similarity at a max_length of 60 (we ran all 10 models with and without it and obtained similar results); note that early_stopping only affects the stopping condition of beam search. In 20% of the models, the human and machine summaries had roughly equivalent cosine similarities to the article.

The pretrained transformers do produce relevant summaries for these articles, so there appears to be a clear use case for generating news article snippets in products like Bloomberg First Word or for other content editors. For articles about sports and athletes, however, human summaries appear to be shorter and more semantically similar to the article than machine summaries. This may be an area where Hugging Face could focus future pretraining of pipelines, transformers, and models to broaden their use cases.
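
The conclusion attributes part of the machine summaries' advantage to their length. A rough way to check the "about 3x" figure is a word-count ratio. The sketch below counts whitespace-separated words rather than model tokens (an assumption), uses hypothetical helpers `word_count` and `length_ratio`, and applies them to the Model 6 human and machine summaries printed above.

```python
def word_count(text: str) -> int:
    """Number of whitespace-separated words (not model tokens)."""
    return len(text.split())

def length_ratio(human: str, machine: str) -> float:
    """How many times longer the machine summary is than the human one, in words."""
    return word_count(machine) / word_count(human)

human6 = "Two new councillors have been elected in a by-election in the City of Edinburgh."
machine6 = ("SNP topped the vote in the Leith Walk by-election. Scottish Labour won the second seat "
            "from the Greens. Deidre Brock of the SNP and Maggie Chapman of the Scottish Greens stood "
            "down. It was the first time the Single Transferable Vote (STV) system had")

print(round(length_ratio(human6, machine6), 2))  # → 3.14 (44 words vs 14)
```

Averaging this ratio over all ten articles would give a direct check of the length claim; a length-controlled comparison (for example, capping the machine summary at the human summary's word count) would help separate the effect of length from that of content.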