# Text Summary & Scoring Project
##### Michael Creegan, Yungfeng Dai, Hong Gyu Ji, Ziling Zeng
##### Python for Data Analysis
##### Columbia University

# Abstract

Summarization is a common problem in the 21st century as the world has become increasingly driven by data. A summary makes it possible to quickly determine whether something is relevant and worth reading in full. Another use case is to store summaries of articles in the backend and run downstream tasks on them. Summaries could also be used to assess the semantic integrity of a text as an indicator of its quality.

To explore this topic, we will leverage the Extreme Summarization dataset (XSUM), which consists of BBC articles accompanied by single-sentence summaries. Each article is prefaced with an introductory sentence (which serves as a summary) that is professionally written, typically by the author of the article.

To summarize articles, we will use an encoder-decoder (sequence-to-sequence) transformer, which combines an encoder and a decoder because we need to handle both input and output: taking in text and then generating a summary. We selected this type of transformer because the encoder accepts the input text and computes a high-level representation of it, which is then passed to the decoder to generate the output (the summary). This has advantages over a standalone encoder such as BERT, ALBERT, ELECTRA, RoBERTa, or DistilBERT, to name a few: encoders are pre-trained by filling in randomly masked words in sentences and are therefore better suited to understanding input text than to generating output. A standalone decoder such as GPT-2 would also not be optimal: decoders are trained to predict the next word in a sequence using only the context on one side (the words to the left), so they are well suited to generating text but less suited to encoding a full input, because the context on the other side is hidden from them.
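
The difference in visible context can be sketched with attention masks. The following is a minimal illustration in plain Python, not code from any transformer library; the helper name `attention_mask` is hypothetical. A `1` at row `i`, column `j` means that position `i` may attend to position `j`.

```python
def attention_mask(seq_len, causal):
    """Build a seq_len x seq_len mask; 1 means position i may attend to position j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Encoder-style mask: every token sees the whole input (context on both sides).
encoder_mask = attention_mask(4, causal=False)

# Decoder-style mask: each token sees only itself and the tokens to its left.
decoder_mask = attention_mask(4, causal=True)

for name, mask in (("encoder (bidirectional)", encoder_mask),
                   ("decoder (causal)", decoder_mask)):
    print(name)
    for row in mask:
        print(" ", row)
```

An encoder-decoder model uses both patterns: the encoder reads the full article with the first kind of mask, and the decoder writes the summary token by token with the second.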

Our scoring will compare the output of the BART encoder-decoder model with the professionally written summaries in the XSUM dataset to see how similar a machine-generated summary is to a professional one. Our scoring methodology focuses on semantic textual similarity, computed as the cosine similarity between sentence embeddings of the professional human-written summary and the machine-generated one.
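
As a rough sketch of the scoring idea, cosine similarity can be computed by hand on two vectors. The three-dimensional vectors below are hypothetical stand-ins; in the project itself the vectors would be sentence embeddings of the two summaries, which have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: similarity with a zero vector is 0
    return dot / (norm_u * norm_v)

# Hypothetical embeddings standing in for the two summaries.
reference_vec = [0.2, 0.7, 0.1]    # professional summary
generated_vec = [0.25, 0.6, 0.2]   # machine-generated summary

score = cosine_similarity(reference_vec, generated_vec)
print(round(score, 3))
```

A score near 1 means the two vectors point in almost the same direction, which is read as the summaries being semantically close; a score near 0 means they are unrelated.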

# Importing Transformers & Dependencies

In [1]:
import pandas as pd
import numpy as np
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
from datasets import load_dataset, load_metric
from sentence_transformers import SentenceTransformer, util
import random
from IPython.display import display, HTML

# Load XSUM Dataset

In [2]:
xsum = load_dataset('xsum')

Using custom data configuration default
Reusing dataset xsum (C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934)
100%|██████████| 3/3 [00:00<00:00, 55.30it/s]


### We can see that the dataset is a "DatasetDict" whose keys are strings naming each split and whose values are the corresponding dataset objects. In the XSUM dataset, the keys are "train", "validation", and "test", and each dataset has the columns "document", "summary", and "id"

In [3]:
xsum

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})
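
To get a feel for how the records are divided, the row counts printed above can be turned into split proportions. This sketch uses a plain dictionary as a stand-in for the `DatasetDict` (the name `split_rows` is hypothetical), so it runs without downloading the dataset.

```python
# Row counts copied from the DatasetDict output above.
split_rows = {"train": 204045, "validation": 11332, "test": 11334}

total = sum(split_rows.values())

# Fraction of all records that falls into each split.
shares = {split: rows / total for split, rows in split_rows.items()}

for split, share in shares.items():
    print(f"{split:<10} {split_rows[split]:>7} rows  {share:.1%}")
```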

### View a record of the underlying data

In [4]:
xsum['test'][0]

{'document': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the

### We can use a function to view a random selection of articles and summaries from a split, to get a more accurate picture of what the data looks like in tabular form (the calls below use the test split)

In [5]:
def display_function(xsum, num_examples=5):
    assert num_examples <= len(xsum)                # cannot request more records than the split contains
    
    selections = []                                 # indices of the records chosen so far
    
    for _ in range(num_examples):                   # _ is used because the loop counter itself is never needed
        selection = random.randint(0, len(xsum) - 1)
        while selection in selections:              # redraw until the index is new, so no record is shown twice
            selection = random.randint(0, len(xsum) - 1)
        selections.append(selection)

    xsumPd = pd.DataFrame(xsum[selections])         # indexing with a list of indices returns a dict of columns
    display(HTML(xsumPd.to_html()))                 # render the table once
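
The index-picking loop in the function above redraws until it has enough distinct indices. As an alternative sketch, the standard library's `random.sample` returns distinct indices in a single call. The helper name `pick_unique_indices` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random

def pick_unique_indices(n_rows, num_examples=5, seed=None):
    """Return num_examples distinct row indices in the range [0, n_rows)."""
    if num_examples > n_rows:
        raise ValueError("cannot sample more rows than the split contains")
    rng = random.Random(seed)  # a local generator keeps the global random state untouched
    return rng.sample(range(n_rows), num_examples)

# 11334 is the size of the test split shown earlier.
indices = pick_unique_indices(11334, num_examples=5, seed=0)
print(indices)
```

Passing a fixed `seed` makes the selection repeatable between runs, which helps when comparing the same records before and after cleaning.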

### Our end goal is to create accurate summaries with this model, so we need to remove characters that add no contextual value. The output below shows characters in the document column that do not appear in the summary column, which could cause discrepancies between our machine-generated summaries and the professional human-written ones. Specifically, we need to remove the newline characters that are present in the document column but not in the summary column

In [6]:
display_function(xsum["test"])

Unnamed: 0,document,summary,id
0,"Media playback is not supported on this device\nThe British pair entered through separate corridors at a Liverpool hotel on Monday and were divided by security for the traditional pre-fight face-off.\nBellew, 34, called Haye ""a broken man"", as they repeatedly exchanged insults.\nHaye, 36, threw a punch at Bellew at a November media gathering and had warned they would need a barrier between them.\nWBC cruiserweight champion Bellew will fight at heavyweight for the first time, completing a two-division jump after competing at light-heavyweight as recently as 2013.\nFormer WBA heavyweight champion Haye has had two routine wins since returning from over three years out of the sport.\nThe London fighter seemed frustrated as fans in attendance drowned out his comments with songs on Monday - and he responded by insulting those in the crowd and said Liverpudlian Bellew would ""need all the support he can get"".\nAn agitated Haye told the crowd: ""Deep in all of your tiny minds you know this guy is getting drilled to the canvas pretty fast.""\nBellew said: ""I am going in with a man who was absolutely fantastic. When he was in his prime, an immense athlete - but the tank is very, very low and it does not last very long.\n""When the gas runs out, the big fat Scouser is going to steam through him.""\nHowever, Haye's trainer, Shane McGuigan, predicted WBC cruiserweight champion Bellew would be ""cannon fodder"".\nHaye's wins since returning - both inside two rounds - prompted Dave Coldwell, Bellew's trainer, to question if the shoulder surgery he had in 2013 could hamper him in a longer contest.\n""When you've had major surgery as an athlete, you are never the same man, you have doubts in your mind,"" said Coldwell, who once worked for Hayemaker promotions.\n""Your surgeon advised you to retire, you come back but you don't know how you will perform on the night.""\nAddressing his opponent, Bellew added: ""I've seen people have the operations you have had. 
Reconstructive shoulder surgery is a big thing, your right hand becomes a looping right hand.""\nBellew holds a record of 28 wins and a draw from 31 fights, with Haye boasting the same number of wins from 30 contests.",David Haye and Tony Bellew were physically kept apart at a heated news conference for Saturday's heavyweight bout at London's O2 Arena.,39110172
1,"Abdul Hafidah, 18, died in hospital from a stab wound to the neck after the attack in Moss Side.\nPolice believe he had been chased by a group of men near Greenheys Lane before he was hit by the car and then attacked.\nHis family said they were experiencing ""the most difficult time in our lives"".\nThey added: ""Abdul was a composed and caring son, who bought us all so much joy. You felt his presence when he was there and you missed it whenever he wasn't.\n""His strength was in his loyalty to his family and friends, and honesty whenever he spoke.""\nMr Hafidah's family also urged young people to spend time with their parents and think about the community they wanted to grow up in.\nTwo men have been arrested on suspicion of murder.\nA 17-year-old boy was also arrested on suspicion of attempted murder and later bailed.","A teenager who was stabbed after he was hit by a car in Manchester was ""loyal and caring"" and ""brought so much joy"" to others, his family has said.",36336099
2,"Watkins has been banned for four years after he tested positive for the anabolic steroid nandrolone and the stimulant methylhexaneamine.\nHe joins team-mate Shaun Cleary, who has been banned for two years as benzoylecgonine, a cocaine metabolite, was found in his system.\nBoth players tested positive before a friendly against Bridgend Ravens RFC.\nUKAD Director of Legal, Graham Arthur said: ""Ryan Watkins deliberately ingested nandrolone and methylhexaneamine without any consideration for his responsibilities as an athlete.\n""By making this conscious choice to dope, Watkins has chosen to cheat his team-mates, the opposition and his sport.\n""I hope this case will act as strong deterrent to other young amateur players - the risks to your playing career, your reputation and more importantly to your health, just aren't worth it.""\nSpeaking about Cleary, Mr Arthur said: ""Although Mr Cleary used cocaine three days before playing, cocaine was still in his system when he played.\n""Cocaine is banned from sport and athletes are solely responsible for what is in their system, regardless of whether there is an intention to cheat or not.""\nWelsh Rugby Union chief executive Martyn Phillips said about Cleary: ""This case serves as a strong warning to everyone in the game that non-compliance with anti-doping rules carries grave consequences.\n""Whether intentional, or inadvertent, players have a responsibility to themselves, to each other, to their clubs and to the sport to act within the rules and spirit of the game.\n""We work closely with UK Anti-Doping and fully adhere to the World Anti-Doping Code. There is no room in the code for carelessness or not knowing.\n""We will be relentless in working with UKAD to follow up leads that out players who dope in Welsh rugby.""\nMaesteg Harlequins are mid-table in Welsh National League Division One West Central and Cleary has been banned until 10 October, 2017. 
Watkins is banned until 11 September 2019.",Maesteg Harlequins lock Ryan Watkins has become the 12th Welsh rugby player to be suspended by UK Anti-Doping.,35418355
3,"Walter Bartram was prospecting in dusty terrain in Coober Pedy, about 750km (466 miles) north of Adelaide, in 1946 when he staked a claim to what became called the Fire of Australia.\nAlthough his family achieved success in opal trading, their greatest discovery has been seen rarely by the public.\nThat has just changed.\nThe 998g (35.2oz) opal, valued at nearly A$900,000 (Â£550,000; $680,000), is now on display in Adelaide's South Australian Museum.\nStill largely in its original condition, the opal's two polished faces reveal a kaleidoscope of colours from green to yellow to red.\n""When my father was alive, it was originally kept separately from all trading because it was such a significant piece,"" Alan Bartram told the BBC.\n""We decided we would retain that intention, and keep it as a significant and obviously excellent example of light opal from South Australia.""\nThe family has decided to pass it on for future generations to enjoy.\nThe museum's director, Brian Oldman, said the opal's rarity should not be underestimated.\n""Opal of this quality can only be created under certain climate conditions,"" Mr Oldman said.\n""When our state's inland sea evaporated millions of years ago, it provided a unique silica-rich environment for the creation of precious opal. It is these exceptional conditions that created the Fire of Australia.""\nA mining town for more than 100 years, Coober Pedy still draws people lured by the hope of striking it rich.\n""They're becoming more scarce because the overheads of mining now are getting to be so expensive - in fuel, explosives, machinery and living costs on the field,"" Mr Bartram said.\n""But South Australia supplies about 90% of the world's quality opals. There may be more major finds.""\nReporting by the BBC's Greg Dunlop",The world's finest uncut opal has mostly been kept in a safe deposit box since it was unearthed from the South Australian outback with a pick and shovel 70 years ago.,38713957
4,"Wycombe face a 430-mile round trip to two-time winners Blackpool, while League One strugglers Coventry play Brighton Under-21s at the Ricoh Arena.\nLuton, winners of the competition in 2009, take on 2012 champions Chesterfield while Mansfield will host the winner of Walsall v Oldham.\nThere is also one all under-21 side match, with Swansea hosting Wolves.\nA total of six development sides are left in the competition.\nThe matches will be played in the week starting 9 January, apart from Cheltenham or Leicester U21 v Bradford.\nLeicester's development team travels to Cheltenham on 10 January, in a second-round match postponed because of the Foxes' Champions League schedule.\nMansfield Town v Walsall or Oldham\nLuton v Chesterfield\nOxford v Scunthorpe\nBlackpool v Wycombe\nCheltenham or Leicester U21 v Bradford\nYeovil v Southampton U21 or Reading U21\nCoventry v Brighton U21\nSwansea U21 v Wolves U21\nTake part in our new Premier League Predictor game, which allows you to create leagues with friends.",League One leaders Scunthorpe United will travel to Oxford United in the last 16 of the EFL Trophy.,38251696


### We can address the problem mentioned above by defining a cleaning function that replaces newline characters with spaces.

In [7]:
def clean(row):
    row['document'] = row['document'].replace('\n', ' ')
    return row
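
To check the behaviour of the cleaning function before applying it to the whole dataset, it can be run on a single hand-made record. The row below is a hypothetical example, not taken from XSUM; the function is repeated so the snippet runs on its own.

```python
def clean(row):
    # Replace every newline in the article text with a single space.
    row['document'] = row['document'].replace('\n', ' ')
    return row

# Hypothetical record with the same three columns as XSUM.
sample_row = {
    'document': 'First sentence.\nSecond sentence.',
    'summary': 'A one-line summary.',
    'id': '0',
}

# clean() modifies the dictionary it receives, so pass a copy to keep the original intact.
cleaned = clean(dict(sample_row))
print(cleaned['document'])
```

Only the `document` column is touched; `summary` and `id` pass through unchanged.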

### We can now map the cleaning function over our data (it runs once each for the train, test, and validation splits)

In [8]:
xsum = xsum.map(clean)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-ec5b3ab440c9df82.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-a176a692461cda61.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-bc530be4c3ab51ba.arrow
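
One way to confirm the mapping did what we expect is to count the rows that still contain a newline in each split. The sketch below is not part of the original notebook: `rows_with_newlines` and `toy_splits` are hypothetical names, and the toy data stands in for the real splits. It assumes only that each split can be iterated as dictionary rows.

```python
def rows_with_newlines(rows, column='document'):
    """Count rows whose `column` still contains a newline character."""
    return sum(1 for row in rows if '\n' in row[column])


# Hypothetical stand-in for the three splits (not real XSUM data).
toy_splits = {
    'train': [{'document': 'a\nb'}, {'document': 'c d'}],
    'validation': [{'document': 'e\nf\ng'}],
    'test': [{'document': 'h i'}],
}

before = {name: rows_with_newlines(rows) for name, rows in toy_splits.items()}

# Apply the same newline-to-space replacement to every row of every split.
cleaned_splits = {
    name: [{**row, 'document': row['document'].replace('\n', ' ')} for row in rows]
    for name, rows in toy_splits.items()
}

after = {name: rows_with_newlines(rows) for name, rows in cleaned_splits.items()}

print(before)
print(after)
```

A count of zero for every split after cleaning indicates that no `\n` characters remain in the `document` column.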


### Voila!

In [9]:
display_function(xsum["test"])

Unnamed: 0,document,summary,id
0,"The London borough is the UK local authority with the highest rate of HIV. Local councils and charities warn HIV test guidelines may not be implemented in England because of a lack of funds. New guidance from the National Institute for Heath and Care Excellence aims to increase testing in people with undiagnosed HIV in England. The guidance is published to coincide with World Aids Day. It is estimated that 103,700 people are living with HIV in the UK and 17% of people with the virus are unaware of their infection, so risk unintentionally passing it on to their sexual partners. Part of the new NICE guidance focuses on testing for HIV, which is the responsibility of local authorities, where there are high or extremely high rates of HIV. Two-thirds of late HIV diagnoses occur in these areas. Over one-third of the 152 council areas have high rates. The updated guidance recommends all patients in areas with high and extremely high rates of HIV be offered a test on admission to hospital, if they have not previously been diagnosed with HIV and are undergoing a blood test for another reason. In extremely high rate areas, hospitals should offer the tests even if they are not having blood tests as part of their care. GP surgeries in high and extremely high-rate areas should also offer patients an HIV test on registration. NICE also recommends testing community settings in these high rates areas, such as pharmacies, the voluntary sector and venues where there may be high-risk sexual behaviour. HIV experts have strongly welcomed the new guidance but told BBC's Victoria Derbyshire programme they are concerned the NICE guidance may not be implemented because of a lack of funds. Dr Chloe Orkin, from the British HIV Association, said prevention was simply not ""high"" enough on the government's agenda given pubic health budgets are being cut by nearly 4% a year. 
Councillor Izzi Seccombe, of the Local Government Association, said achieving what NICE was asking was going to be difficult. ""The strain placed on councils by the cuts by central government to public health budgets would make commissioning HIV testing in all surgeries and hospitals in high and extremely high-risk areas an unaffordable burden. ""Despite these limited resources, testing those in high-risk areas must always be a priority. Councils are commissioning HIV testing in a variety of settings."" But the Department of Health maintained councils had been provided with sufficient funding. Nonetheless, the Elton John Aids Foundation feels councils need help and has offered to fund HIV testing in Lambeth for two years. David Furnish, chairman of the Elton John Aids Foundation, said: ""I believe everyone should have an HIV test. We know we can make a difference in Lambeth, but there is no reason why we can't do this in future in other high-rate areas."" Jennifer Reiter, of Lambeth Council, added: ""We value Elton John Aids Foundation's support and are exploring ways with them to increase access to HIV testing."" The BBC's Victoria Derbyshire programme is broadcast on weekdays from 09:00 GMT on BBC Two and the BBC News Channel.","The Elton John Aids Foundation has offered to finance HIV testing in Lambeth, the Victoria Derbyshire programme has learned.",38164600
1,"Writing in PLOS ONE they say the gene fault may encourage the formation of blood clots - the ultimate cause of most heart attacks and strokes. Scientists hope gene tests may help doctors one day to pinpoint individuals more likely to suffer these conditions. But experts say lifestyle factors such as smoking and exercise have the greatest influence on risk. Around one in 10 people in the Caucasian population carries this variation of the gene, named PIA2. And researchers from King's College London reviewed more than 80 studies involving about 50,000 people - the largest analysis of this genetic fault to date. They found individuals with PIA2 were more likely to have a stroke - caused by a blood clot blocking blood supply to the brain - than those without the gene. Scientists calculate the gene increases a person's risk of having a stroke by 10-15%. But how significant this increase is depends on an individual's baseline risk - influenced by factors such as smoking, diet, weight and exercise, the scientists say. And for people with two copies of the gene the risk rises by up to 70% from this baseline. In a second study published in the same journal, the scientists show PIA2 is also linked to an increased risk of heart attacks in people under 45. More research is needed to see whether this holds true for the whole population, they say. About 150,000 people have a stroke in the UK each year and more than 100,000 heart attacks are recorded annually. Both thrombotic strokes (the most common kind) and heart attacks are caused by blockage of blood vessels in the heart and brain - ultimately through the formation of clots. The faulty gene appears to affect a protein called glycoprotein IIIa - present on platelets, natural clotting cells in the blood. Platelets help trigger the formation of clots to stop bleeding after injury. But scientists say carrying the gene may render them overactive. 
They caution that overall the genes play a smaller role in risk than more established factors, such as high blood pressure and obesity. But developing a genetic test could help predict people at highest risk, allowing doctors to suggest more potent medication or lifestyle changes, they say. Prof Albert Ferro, of King's College London, who led the research, told the BBC: ""We would now need to validate this test and see how useful it is in the clinical world."" Dr Shamim Quadir, of the Stroke Association, said: ""These latest results are an important step forward in stroke research. ""We hope the findings from this study could lead to many more people who are most at risk of this devastating condition being identified. ""However, if you have a family history of stroke or have any other risk factors, this does not mean the condition is inevitable. Regular exercise, eating a balanced diet and stopping smoking can be important steps to significantly reduce your stroke risk."" Prof Jeremy Pearson, of the British Heart Foundation, said: ""It is as yet uncertain whether a genetic test to detect a variation in this protein would be beneficial for patients in everyday practice. ""All patients who are at risk should be monitored to see whether or not lifestyle changes or medication have a positive impact on the more standard major risk factors such as high blood pressure and high cholesterol.""",Researchers have identified a gene that may put people at greater risk of strokes and heart attacks.,28117422
2,"The flight from Manchester Airport to Agadir in Morocco, was diverted to London Gatwick less than an hour after take-off on Thursday. The Thomson Airways Boeing 737-800 took off at 18:42 BST before being struck. A spokeswoman for the airline said it was an ""extremely rare"" event and the diversion was ""precautionary"". The flight later landed safely in Agadir. Liam Bolton, 27, from Chester in Cheshire, was travelling to Morocco for a holiday with his girlfriend when he heard a ""sudden crack"" on the aircraft. He said the plane ""lit up like someone had taken a photo"". ""It was about 10-15 minutes after take-off and there was a large flash... everyone turned round to each other and knew it was lightning. ""About half an hour later, the pilot announced we'd been hit by lightning and we'd be landing at Gatwick,"" he said. After around three hours on the runway, the same plane took off, he added. Thomson Airways has apologised for any inconvenience caused by the adverse weather conditions.",A plane has been forced to carry out an unexpected landing after being struck by lightning.,36560599
3,"Pedro Sanchez has been trying to secure support for a coalition government with the centre-right Ciudadanos party following inconclusive December polls. In an often acrimonious debate, acting PM Mariano Rajoy said a Socialist-led coalition would be a threat to Spain's national interests. Another vote will be held on Friday. If that vote is also unsuccessful, parliament will have a further two months to choose a government. If it is unable to do, fresh elections will be held on 26 June. Blame game begins in earnest Strain of Sanchez's bid to rule Kiss that showed real political passion Mr Sanchez needed an absolute majority in Wednesday's confidence vote but lost, with 219 votes against, 130 in favour and one abstention in the 350-seat lower house. Mr Rajoy - leader of the incumbent conservative Popular Party (PP) - called Mr Sanchez a ""fictitious, unreal candidate"". He told Mr Sanchez the PP's 122 deputies would vote against him ""because you plan to eliminate what was achieved in Spain throughout these past four years which prevented this country from needing a bailout, created jobs, improved its competitiveness and caused it to grow economically"". Conversely, Mr Sanchez was also under attack from the left. ""You want to consolidate the main policies of the PP,"" said Pablo Iglesias, the pony-tailed leader of the far-left Podemos party, which represents 69 seats. However, he did not rule out a united front with the Socialists entirely, urging Mr Sanchez to ""write the future of Spain together with us"" - but leaving aside Podemos's ideological foes, Ciudadanos. On Friday, Mr Sanchez will have another chance in a vote that requires only a simple majority. However, correspondents say that now looks doomed too - leaving the country in limbo at a time when the economy is growing but still suffers serious weaknesses, primarily an unemployment rate of nearly 21%. 
In an address to parliament on Tuesday, Mr Sanchez called for the formation of a coalition based on common interests. He said a Socialist-led government would enact a series of progressive measures such as a minimum wage increase and a gender wage-gap law. Between them, the Socialist PSOE and partner party Ciudadanos command only 130 seats in the lower chamber. The Popular Party gained most votes in the 20 December election but Mr Rajoy was unable to secure enough backing to form a government. The PSOE performed badly, hit by the emergence of Podemos and Ciudadanos, and the fragmented political landscape has eluded efforts to agree a governing coalition.",Spain's Socialist leader has lost a bid to form a government after both main rival parties voted down his attempts to form a coalition.,35703462
4,"Fifty lodges from around Ireland, along with visiting Orangemen and women, attended the event. The parade, accompanied by 30 bands, made its way along the rural coastal setting into the centre of the seaside village before a religious service. Many families with young children gathered at vantage points along the route to take in the demonstration. County grandmaster David Mahon said Donegal prides itself on being a family-friendly day with a relaxed atmosphere. The parade is the traditional opener to the annual 12 July celebrations marking King William III's victory at the Battle of the Boyne in 1690. Assistant Grand Master Stuart Brooker told the crowds the institution was ""being challenged on many fronts"". ""One of the biggest issues we currently face is organised opposition to our traditional parades, and in this respect, we are branded intransigent, and insensitive,"" he said. ""We have responded by challenging ourselves. And I believe that we have responded in a responsible manner, by reaching out to the wider community as never before."" He added that they have done so ""in a spirit of openness and goodwill, evidenced in the success of the Twelfth here in Rossnowlagh.""",Thousands of Orangemen and women have taken part in their annual parade in Rossnowlagh in County Donegal.,40542263




4,"Fifty lodges from around Ireland, along with visiting Orangemen and women, attended the event. The parade, accompanied by 30 bands, made its way along the rural coastal setting into the centre of the seaside village before a religious service. Many families with young children gathered at vantage points along the route to take in the demonstration. County grandmaster David Mahon said Donegal prides itself on being a family-friendly day with a relaxed atmosphere. The parade is the traditional opener to the annual 12 July celebrations marking King William III's victory at the Battle of the Boyne in 1690. Assistant Grand Master Stuart Brooker told the crowds the institution was ""being challenged on many fronts"". ""One of the biggest issues we currently face is organised opposition to our traditional parades, and in this respect, we are branded intransigent, and insensitive,"" he said. ""We have responded by challenging ourselves. And I believe that we have responded in a responsible manner, by reaching out to the wider community as never before."" He added that they have done so ""in a spirit of openness and goodwill, evidenced in the success of the Twelfth here in Rossnowlagh.""",Thousands of Orangemen and women have taken part in their annual parade in Rossnowlagh in County Donegal.,40542263


### We can view the column names and data types of our dataset using .features

In [10]:
xsum['test'].features

{'document': Value(dtype='string', id=None),
 'summary': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None)}

In [11]:
print(xsum['test'].info)

DatasetInfo(description='\nExtreme Summarization (XSum) Dataset.\n\nThere are three features:\n  - document: Input news article.\n  - summary: One sentence summary of the article.\n  - id: BBC ID of the article.\n\n', citation="\n@article{Narayan2018DontGM,\n  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},\n  author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},\n  journal={ArXiv},\n  year={2018},\n  volume={abs/1808.08745}\n}\n", homepage='https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset', license='', features={'document': Value(dtype='string', id=None), 'summary': Value(dtype='string', id=None), 'id': Value(dtype='string', id=None)}, post_processed=None, supervised_keys=SupervisedKeysData(input='document', output='summary'), task_templates=None, builder_name='xsum', config_name='default', version=1.2.0, splits={'train': SplitInfo(name='train', num_bytes=479206615, num_examples=204045, data

# Preparing XSUM Data
Before we can pass the text to a model, we need to convert it into a format the transformer can understand. Encoders and decoders only operate on numerical values, so we need to split the text into tokens and then convert those tokens into numbers. The tokenizer splits the text into tokens (words or subword pieces) and adds any special tokens the model expects from pretraining. It then maps each token to a unique id in its vocabulary, and the model looks up a vector of numerical values for each id. Inside the model these vectors become contextualized: the representation of the word "to" isn't just "to", it also takes into account the words around it, which are called the left and right context. For example, in the sentence "Welcome to NYC", the left context of "to" is "Welcome" and the right context is "NYC". The self-attention mechanism is what makes each output vector depend on this context.

We could do all of this with the AutoTokenizer.from_pretrained method, which ensures that we get a tokenizer corresponding to the model architecture we want to use (facebook/bart-large-cnn). However, we will reference BartTokenizer explicitly and load the tokenizer and the model from the same checkpoint, so that every part of our pipeline was trained using the same methodology and we avoid unexpected summaries.
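As a rough illustration of the text → tokens → ids step, the sketch below uses a tiny hand-made vocabulary. It is a toy, not BART's tokenizer: the real one splits text into subword units from a learned vocabulary, and `TOY_VOCAB` and `toy_encode` are hypothetical names. The ids 0 (start), 1 (padding) and 2 (end) follow the values that appear in the BART config and `input_ids` output in this notebook; the id 3 for unknown tokens and the word ids are assumptions made for the example.

```python
# Toy vocabulary: special tokens first, then a few whole words (assumed ids).
TOY_VOCAB = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3,
             "Welcome": 4, "to": 5, "NYC": 6}


def toy_encode(text):
    """Split on whitespace, wrap in start/end tokens, and look up ids."""
    tokens = ["<s>"] + text.split() + ["</s>"]
    # Words missing from the toy vocabulary fall back to the <unk> id.
    ids = [TOY_VOCAB.get(tok, TOY_VOCAB["<unk>"]) for tok in tokens]
    return tokens, ids


tokens, ids = toy_encode("Welcome to NYC")
print(tokens)  # ['<s>', 'Welcome', 'to', 'NYC', '</s>']
print(ids)     # [0, 4, 5, 6, 2]
```

Each id then selects a row of the model's embedding matrix; the context-dependent vectors are produced afterwards, inside the model's self-attention layers, not by the tokenizer itself.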

In [12]:
checkpoint = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

### We now write a function that preprocesses the test data by passing it to the tokenizer. We need to use the argument truncation=True to ensure that any input longer than the model can handle is truncated to the maximum length allowed. We can find this limit in the model config: BART has a maximum length of 1024, which we can see in max_position_embeddings.

In [13]:
model.config

BartConfig {
  "_name_or_path": "facebook/bart-large-cnn",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_final_layer_norm": false,
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "force_bos_token_to_be_generated": true,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "L

### We can now create the function with the maximum input length allowed as per the config and an arbitrary maximum target length for the summaries.

In [14]:
max_input_length = 1024
max_target_length = 100


def preperation_function(examples):
    inputs = [doc for doc in examples["document"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, padding=True)

    
    with tokenizer.as_target_tokenizer():  # Tokenize the summaries as targets (labels) rather than as model inputs
        labels = tokenizer(
            examples["summary"], max_length=max_target_length, truncation=True
        )

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
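
To make the effect of `max_length` together with `truncation=True` concrete, here is a minimal sketch on plain lists of ids. It assumes the length limit counts the start and end tokens, so the content is cut to `max_length - 2` ids before they are added; `toy_truncate` is a hypothetical helper and the ids are made up, so this illustrates the idea and is not the library's implementation.

```python
BOS_ID, EOS_ID = 0, 2  # start/end ids, matching bos_token_id and eos_token_id in the config


def toy_truncate(content_ids, max_length):
    """Keep at most max_length ids in total, counting the two special tokens.

    Assumes max_length >= 2 so there is room for the start and end ids.
    """
    kept = content_ids[: max_length - 2]
    return [BOS_ID] + kept + [EOS_ID]


long_doc = list(range(10, 30))  # 20 made-up content ids
short = toy_truncate(long_doc, max_length=8)
print(short)       # [0, 10, 11, 12, 13, 14, 15, 2]
print(len(short))  # 8
```

Sequences that already fit are left unchanged apart from the added special tokens, which is why only the longest articles lose text at the 1024-token limit.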

### We can apply this function to our dataset using map

In [15]:
tokenized_xsum = xsum.map(preperation_function, batched=True)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-2b651f21d6ec073a.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-35f38c35a797b587.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-c6fb5876cc0b65d3.arrow
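
With `batched=True`, the mapped function receives several rows at once as a dictionary of lists, which is why the function above reads `examples["document"]` as a list of articles. The sketch below mimics that calling convention in plain Python; `toy_batched_map` and `count_words` are hypothetical names, and the batch size of 2 is chosen only to keep the example small.

```python
def toy_batched_map(columns, fn, batch_size=2):
    """Call fn on slices of a dict-of-lists and merge the returned columns."""
    n_rows = len(next(iter(columns.values())))
    new_columns = {}
    for start in range(0, n_rows, batch_size):
        # Each batch has the same keys as the dataset, with sliced lists as values.
        batch = {name: values[start:start + batch_size]
                 for name, values in columns.items()}
        for name, values in fn(batch).items():
            new_columns.setdefault(name, []).extend(values)
    # Returned columns are added to (or replace) the original ones.
    return {**columns, **new_columns}


def count_words(batch):
    # Receives lists with one entry per row in the batch.
    return {"n_words": [len(doc.split()) for doc in batch["document"]]}


data = {"document": ["a b c", "d e", "f"], "summary": ["x", "y", "z"]}
result = toy_batched_map(data, count_words)
print(result["n_words"])  # [3, 2, 1]
```

The original columns are kept alongside the new ones, which mirrors how `document`, `id` and `summary` remain next to `input_ids`, `attention_mask` and `labels` in the tokenized dataset.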


In [16]:
tokenized_xsum

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11334
    })
})

In [17]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

### The attention mask tells the model what to pay attention to: a value of 1 marks a token to consider and a value of 0 marks a padding token to ignore. The input ids are the numerical mapping of tokens to BART's vocabulary; each token in BART's vocabulary is assigned a numerical id.
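
The relationship between padding and the attention mask can be sketched on plain lists. This is an illustration under assumptions, not the tokenizer's code: `toy_pad` is a hypothetical helper, the short sequences are made up, and the padding id of 1 is taken from the padded `input_ids` that appear in this notebook's output.

```python
PAD_ID = 1  # padding id, as seen at the end of the shorter input_ids rows


def toy_pad(batch_ids):
    """Pad every sequence to the longest one and build matching masks."""
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)
        # 1 for real tokens, 0 for the padding positions just added.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = toy_pad([[0, 4, 5, 6, 2], [0, 4, 2]])
print(batch["input_ids"])       # [[0, 4, 5, 6, 2], [0, 4, 2, 1, 1]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The same pattern is visible in the first row of the output: its `input_ids` end with the end token 2 followed by a run of 1s, and its `attention_mask` switches from 1 to 0 at the same position.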

In [18]:
display_function(tokenized_xsum['test'])

Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","27 September 2016 Last updated at 06:53 BST Well, someone who knows a thing or two about baking is Amari, winner of Junior Bake Off in 2015. So we thought who better than Amari to rate the remaining Bake Off contestants on their skills! Watch her give Ricky her verdict on the bakers - and who she tips as the winner of this series.",37473317,"[0, 2518, 772, 336, 1426, 4752, 23, 15007, 35, 4540, 28964, 2647, 6, 951, 54, 2215, 10, 631, 50, 80, 59, 14814, 16, 1918, 1512, 6, 1924, 9, 6843, 24138, 4995, 11, 570, 4, 407, 52, 802, 54, 357, 87, 1918, 1512, 7, 731, 5, 2405, 24138, 4995, 15051, 15, 49, 2417, 328, 3075, 69, 492, 15260, 69, 7035, 15, 5, 741, 6650, 111, 8, 54, 79, 4965, 25, 5, 1924, 9, 42, 651, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","[0, 12375, 18, 110, 5548, 28983, 314, 11, 5, 2860, 1089, 24138, 4995, 10178, 116, 2]",Who's your favourite baker left in the Great British Bake Off tent?
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Ebac in Newton Aycliffe expects to make up to 300,000 washing machines a year once it is fully operational. The family-run company, which also makes dehumidifiers and water coolers, was awarded from the government's Regional Growth Fund for the project. The production line is being officially opened by the Duke of Kent later. Currently, the three million washing machines purchased annually in the UK come from overseas. John Elliott, chairman of Ebac, said: ""It is so important that UK manufacturing receives support and recognition for the vital role it plays in the economy.""",34731902,"[0, 717, 428, 1043, 11, 10793, 5847, 20152, 3352, 7, 146, 62, 7, 2993, 6, 151, 14784, 6271, 10, 76, 683, 24, 16, 1950, 5903, 4, 20, 284, 12, 2962, 138, 6, 61, 67, 817, 263, 18257, 808, 27368, 8, 514, 3035, 268, 6, 21, 4241, 31, 5, 168, 18, 4722, 7498, 2896, 13, 5, 695, 4, 20, 931, 516, 16, 145, 4142, 1357, 30, 5, 5893, 9, 7890, 423, 4, 7519, 6, 5, 130, 153, 14784, 6271, 3584, 6333, 11, 5, 987, 283, 31, 4886, 4, 610, 7624, 6, 2243, 9, 12608, 1043, 6, 26, 35, 22, 243, 16, ...]","[0, 771, 8141, 3563, 3021, 16, 278, 7, 671, 7, 5, 987, 71, 10, 4044, 9, 818, 158, 107, 19, 5, 1273, 9, 10, 92, 5566, 11, 413, 13558, 4, 2]",Washing machine manufacturing is set to return to the UK after a gap of almost 10 years with the opening of a new factory in County Durham.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Frimley Health NHS Foundation Trust has 285 full-time vacancies across three sites, which cover Surrey and Berkshire. The high cost of living in the south of England is said to be a factor affecting recruitment. Each successful applicant will receive a maximum of £1,340 in subsidies. Nurses will be offered accommodation at Wexham Park Hospital in Slough, Frimley Park near Camberley and Heatherwood Hospital in Ascot. Marko Novosil moved from Croatia to become a nurse at Wexham Park after hearing about the incentive. ""The crucial thing for coming here was the support. I realised that when I started I would get the free accommodation which helped me settle in"", he said. Wexham Park Hospital matron Helen Noakes said: ""Rental prices are higher in this area, which means people do struggle and the one thing that we can offer people is the free accommodation when they start. ""Longer term we would look to help them find somewhere in the local area to live."" Currently the average monthly rent for a one-bedroom property in Slough is £897, whereas the average for the same sort of property in Camberley is £930. 
The average cost for a room in both areas ranges from £500 to £550.",37248976,"[0, 597, 6103, 607, 1309, 8681, 2475, 3101, 34, 31023, 455, 12, 958, 23466, 420, 130, 3091, 6, 61, 1719, 15693, 8, 16563, 4, 20, 239, 701, 9, 1207, 11, 5, 2077, 9, 1156, 16, 26, 7, 28, 10, 3724, 7920, 11049, 4, 4028, 1800, 20321, 40, 1325, 10, 4532, 9, 984, 134, 6, 24334, 11, 10256, 4, 24583, 293, 40, 28, 1661, 11607, 23, 166, 1178, 1908, 861, 2392, 11, 4424, 4894, 6, 4967, 757, 607, 861, 583, 4536, 1943, 607, 8, 10588, 1845, 2392, 11, 287, 22921, 4, 1190, 139, 1442, 366, 718, 1410, 31, 11437, 7, 555, ...]","[0, 487, 4668, 154, 633, 10858, 32, 145, 1661, 80, 377, 481, 11607, 11, 10, 2311, 7, 2677, 1641, 813, 12737, 23, 10, 1098, 2416, 4, 2]",Nursing job applicants are being offered two months free accommodation in a bid to quell staff shortages at a hospital trust.
3,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Ofsted visited six Schools Partnership Trust (SPTA) academies as part of routine inspections and found five were not offering good quality education. One school retained its ""inadequate"" rating by Ofsted. SPTA accepted improvement was needed but said there was ""ample evidence"" it was ""an effective"" academy sponsor. The judgements are significant because putting poor performing schools under the leadership of non-profit-making academy trusts or sponsors is the government's key engine of school improvement. Sir Paul Edwards, chief executive of SPTA - which runs 44 schools in the Yorkshire and Humber area, has also worked as a government adviser in the Department for Education's academy and free school programme. 'Ill-prepared' England's education inspectorate is not allowed to inspect academy chains in the same way as it inspects local education authorities. Ofsted inspected the six schools over a 10-day period in June and sought further information on how SPTA has been performing on school improvement. It said concerns had been raised about how well it was performing. Four of the academies still required improvement, Ofsted said, although two of these had begun to improve. One academy remained inadequate, but a sixth had improved to ""good"" from its previous rating of ""satisfactory"". The inspections also highlighted key weaknesses in the schools, such as inconsistent teaching that does not challenge pupils enough and low standards at the end of primary school. This meant too many pupils had been ill-prepared for secondary schools, Ofsted said. It also said governors lacked expertise to challenge senior leaders on teaching quality. 
But inspectors added that most of the principals it contacted felt they were well supported by trust officers and that SPTA human resource departments had assisted in managing under-performing staff. In a letter to Sir Paul, Ofsted said: ""In summary, there is some evidence of effective school improvement, particularly in the initial start-up period after conversion to academy status. However, the quality and impact of governance arrangements are variable. ""There are further concerns regarding the depth and accuracy of SPTA analysis of data showing pupils' progress and the contribution this makes to rapid school improvement. ""Above all, there are too many underperforming academies which have remained in this position for too long."" An SPTA spokesman said the trust recognised the important role Ofsted had played in monitoring standards in the school system. ""The trust also recognises that Ofsted comments around areas for improvement are suggested on the basis of constructive dialogue to ensure all children receive a first class education, regardless of the school setting,"" he said. ""Equally however, the trust also recognises that the evidence to support these comments was largely drawn from a small sample of six schools, in a multi-academy trust that supports in excess of 42 schools and which contains two Teaching Schools, accredited through the National College of School Leadership. 
""Whilst the trust looks forward to discussions with Ofsted about how to improve our performance, it is important to consider the facts in relation to the whole group, not just the six schools that were inspected."" SPTA is the third chain to be criticised by Ofsted, with critical letters recently sent to both the Kemnal Academies Trust (TKAT) and the E-ACT Trust - one of England's biggest academy organisations.",28544252,"[0, 10643, 16460, 3790, 411, 7101, 11697, 3101, 36, 4186, 3847, 43, 36271, 26804, 25, 233, 9, 6108, 15569, 8, 303, 292, 58, 45, 1839, 205, 1318, 1265, 4, 509, 334, 12544, 63, 22, 179, 37462, 877, 113, 691, 30, 1525, 16460, 4, 6178, 3847, 3903, 3855, 21, 956, 53, 26, 89, 21, 22, 29069, 1283, 113, 24, 21, 22, 260, 2375, 113, 11756, 9242, 4, 20, 21392, 41330, 32, 1233, 142, 2057, 2129, 4655, 1304, 223, 5, 1673, 9, 786, 12, 7699, 12, 5349, 11756, 21513, 50, 8919, 16, 5, 168, 18, 762, 3819, 9, 334, 3855, 4, 5348, ...]","[0, 4688, 36271, 26804, 2416, 669, 30, 10, 320, 168, 4988, 34, 57, 174, 350, 171, 9, 63, 1304, 32, 223, 12955, 8, 45, 3927, 1769, 615, 4, 2]",An academies trust led by a former government adviser has been told too many of its schools are underperforming and not improving fast enough.
4,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Stevens made his home debut in their 19-13 loss to Ealing, having joined the club after leaving financially-stricken London Welsh last month. ""We're pushing for top four, we want top four,"" the ex-Nottingham, Worcester and Plymouth Albion centre said. ""We want to do really well in the British and Irish Cup and we want to do as well as we can."" He told BBC Radio Jersey: ""This club needs to be pushed forward and I think they definitely are on the right road for doing that."" The island side are currently eighth in the Championship, but are just seven points behind fourth-placed Ealing and point further back from Doncaster in third. For the latest rugby union news follow @bbcrugbyunion on Twitter.",38495526,"[0, 33184, 29, 156, 39, 184, 2453, 11, 49, 753, 12, 1558, 872, 7, 381, 8279, 6, 519, 1770, 5, 950, 71, 1618, 10625, 12, 6031, 13552, 928, 12093, 94, 353, 4, 22, 170, 214, 3784, 13, 299, 237, 6, 52, 236, 299, 237, 60, 5, 1931, 12, 7199, 2577, 1908, 6, 22963, 8, 22524, 19032, 2100, 26, 4, 22, 170, 236, 7, 109, 269, 157, 11, 5, 1089, 8, 3445, 968, 8, 52, 236, 7, 109, 25, 157, 25, 52, 64, 72, 91, 174, 3295, 4611, 3123, 35, 22, 713, 950, 782, 7, 28, 3148, 556, 8, 38, 206, ...]","[0, 4030, 3123, 3442, 12424, 10283, 161, 5, 950, 32, 4453, 9, 3970, 5, 3261, 310, 12, 10816, 4, 2]",New Jersey signing Heath Stevens says the club are capable of reaching the Championship play-offs.


Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","27 September 2016 Last updated at 06:53 BST Well, someone who knows a thing or two about baking is Amari, winner of Junior Bake Off in 2015. So we thought who better than Amari to rate the remaining Bake Off contestants on their skills! Watch her give Ricky her verdict on the bakers - and who she tips as the winner of this series.",37473317,"[0, 2518, 772, 336, 1426, 4752, 23, 15007, 35, 4540, 28964, 2647, 6, 951, 54, 2215, 10, 631, 50, 80, 59, 14814, 16, 1918, 1512, 6, 1924, 9, 6843, 24138, 4995, 11, 570, 4, 407, 52, 802, 54, 357, 87, 1918, 1512, 7, 731, 5, 2405, 24138, 4995, 15051, 15, 49, 2417, 328, 3075, 69, 492, 15260, 69, 7035, 15, 5, 741, 6650, 111, 8, 54, 79, 4965, 25, 5, 1924, 9, 42, 651, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","[0, 12375, 18, 110, 5548, 28983, 314, 11, 5, 2860, 1089, 24138, 4995, 10178, 116, 2]",Who's your favourite baker left in the Great British Bake Off tent?
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Ebac in Newton Aycliffe expects to make up to 300,000 washing machines a year once it is fully operational. The family-run company, which also makes dehumidifiers and water coolers, was awarded from the government's Regional Growth Fund for the project. The production line is being officially opened by the Duke of Kent later. Currently, the three million washing machines purchased annually in the UK come from overseas. John Elliott, chairman of Ebac, said: ""It is so important that UK manufacturing receives support and recognition for the vital role it plays in the economy.""",34731902,"[0, 717, 428, 1043, 11, 10793, 5847, 20152, 3352, 7, 146, 62, 7, 2993, 6, 151, 14784, 6271, 10, 76, 683, 24, 16, 1950, 5903, 4, 20, 284, 12, 2962, 138, 6, 61, 67, 817, 263, 18257, 808, 27368, 8, 514, 3035, 268, 6, 21, 4241, 31, 5, 168, 18, 4722, 7498, 2896, 13, 5, 695, 4, 20, 931, 516, 16, 145, 4142, 1357, 30, 5, 5893, 9, 7890, 423, 4, 7519, 6, 5, 130, 153, 14784, 6271, 3584, 6333, 11, 5, 987, 283, 31, 4886, 4, 610, 7624, 6, 2243, 9, 12608, 1043, 6, 26, 35, 22, 243, 16, ...]","[0, 771, 8141, 3563, 3021, 16, 278, 7, 671, 7, 5, 987, 71, 10, 4044, 9, 818, 158, 107, 19, 5, 1273, 9, 10, 92, 5566, 11, 413, 13558, 4, 2]",Washing machine manufacturing is set to return to the UK after a gap of almost 10 years with the opening of a new factory in County Durham.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Frimley Health NHS Foundation Trust has 285 full-time vacancies across three sites, which cover Surrey and Berkshire. The high cost of living in the south of England is said to be a factor affecting recruitment. Each successful applicant will receive a maximum of £1,340 in subsidies. Nurses will be offered accommodation at Wexham Park Hospital in Slough, Frimley Park near Camberley and Heatherwood Hospital in Ascot. Marko Novosil moved from Croatia to become a nurse at Wexham Park after hearing about the incentive. ""The crucial thing for coming here was the support. I realised that when I started I would get the free accommodation which helped me settle in"", he said. Wexham Park Hospital matron Helen Noakes said: ""Rental prices are higher in this area, which means people do struggle and the one thing that we can offer people is the free accommodation when they start. ""Longer term we would look to help them find somewhere in the local area to live."" Currently the average monthly rent for a one-bedroom property in Slough is £897, whereas the average for the same sort of property in Camberley is £930. 
The average cost for a room in both areas ranges from £500 to £550.",37248976,"[0, 597, 6103, 607, 1309, 8681, 2475, 3101, 34, 31023, 455, 12, 958, 23466, 420, 130, 3091, 6, 61, 1719, 15693, 8, 16563, 4, 20, 239, 701, 9, 1207, 11, 5, 2077, 9, 1156, 16, 26, 7, 28, 10, 3724, 7920, 11049, 4, 4028, 1800, 20321, 40, 1325, 10, 4532, 9, 984, 134, 6, 24334, 11, 10256, 4, 24583, 293, 40, 28, 1661, 11607, 23, 166, 1178, 1908, 861, 2392, 11, 4424, 4894, 6, 4967, 757, 607, 861, 583, 4536, 1943, 607, 8, 10588, 1845, 2392, 11, 287, 22921, 4, 1190, 139, 1442, 366, 718, 1410, 31, 11437, 7, 555, ...]","[0, 487, 4668, 154, 633, 10858, 32, 145, 1661, 80, 377, 481, 11607, 11, 10, 2311, 7, 2677, 1641, 813, 12737, 23, 10, 1098, 2416, 4, 2]",Nursing job applicants are being offered two months free accommodation in a bid to quell staff shortages at a hospital trust.
3,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Ofsted visited six Schools Partnership Trust (SPTA) academies as part of routine inspections and found five were not offering good quality education. One school retained its ""inadequate"" rating by Ofsted. SPTA accepted improvement was needed but said there was ""ample evidence"" it was ""an effective"" academy sponsor. The judgements are significant because putting poor performing schools under the leadership of non-profit-making academy trusts or sponsors is the government's key engine of school improvement. Sir Paul Edwards, chief executive of SPTA - which runs 44 schools in the Yorkshire and Humber area, has also worked as a government adviser in the Department for Education's academy and free school programme. 'Ill-prepared' England's education inspectorate is not allowed to inspect academy chains in the same way as it inspects local education authorities. Ofsted inspected the six schools over a 10-day period in June and sought further information on how SPTA has been performing on school improvement. It said concerns had been raised about how well it was performing. Four of the academies still required improvement, Ofsted said, although two of these had begun to improve. One academy remained inadequate, but a sixth had improved to ""good"" from its previous rating of ""satisfactory"". The inspections also highlighted key weaknesses in the schools, such as inconsistent teaching that does not challenge pupils enough and low standards at the end of primary school. This meant too many pupils had been ill-prepared for secondary schools, Ofsted said. It also said governors lacked expertise to challenge senior leaders on teaching quality. 
But inspectors added that most of the principals it contacted felt they were well supported by trust officers and that SPTA human resource departments had assisted in managing under-performing staff. In a letter to Sir Paul, Ofsted said: ""In summary, there is some evidence of effective school improvement, particularly in the initial start-up period after conversion to academy status. However, the quality and impact of governance arrangements are variable. ""There are further concerns regarding the depth and accuracy of SPTA analysis of data showing pupils' progress and the contribution this makes to rapid school improvement. ""Above all, there are too many underperforming academies which have remained in this position for too long."" An SPTA spokesman said the trust recognised the important role Ofsted had played in monitoring standards in the school system. ""The trust also recognises that Ofsted comments around areas for improvement are suggested on the basis of constructive dialogue to ensure all children receive a first class education, regardless of the school setting,"" he said. ""Equally however, the trust also recognises that the evidence to support these comments was largely drawn from a small sample of six schools, in a multi-academy trust that supports in excess of 42 schools and which contains two Teaching Schools, accredited through the National College of School Leadership. 
""Whilst the trust looks forward to discussions with Ofsted about how to improve our performance, it is important to consider the facts in relation to the whole group, not just the six schools that were inspected."" SPTA is the third chain to be criticised by Ofsted, with critical letters recently sent to both the Kemnal Academies Trust (TKAT) and the E-ACT Trust - one of England's biggest academy organisations.",28544252,"[0, 10643, 16460, 3790, 411, 7101, 11697, 3101, 36, 4186, 3847, 43, 36271, 26804, 25, 233, 9, 6108, 15569, 8, 303, 292, 58, 45, 1839, 205, 1318, 1265, 4, 509, 334, 12544, 63, 22, 179, 37462, 877, 113, 691, 30, 1525, 16460, 4, 6178, 3847, 3903, 3855, 21, 956, 53, 26, 89, 21, 22, 29069, 1283, 113, 24, 21, 22, 260, 2375, 113, 11756, 9242, 4, 20, 21392, 41330, 32, 1233, 142, 2057, 2129, 4655, 1304, 223, 5, 1673, 9, 786, 12, 7699, 12, 5349, 11756, 21513, 50, 8919, 16, 5, 168, 18, 762, 3819, 9, 334, 3855, 4, 5348, ...]","[0, 4688, 36271, 26804, 2416, 669, 30, 10, 320, 168, 4988, 34, 57, 174, 350, 171, 9, 63, 1304, 32, 223, 12955, 8, 45, 3927, 1769, 615, 4, 2]",An academies trust led by a former government adviser has been told too many of its schools are underperforming and not improving fast enough.
4,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Stevens made his home debut in their 19-13 loss to Ealing, having joined the club after leaving financially-stricken London Welsh last month. ""We're pushing for top four, we want top four,"" the ex-Nottingham, Worcester and Plymouth Albion centre said. ""We want to do really well in the British and Irish Cup and we want to do as well as we can."" He told BBC Radio Jersey: ""This club needs to be pushed forward and I think they definitely are on the right road for doing that."" The island side are currently eighth in the Championship, but are just seven points behind fourth-placed Ealing and point further back from Doncaster in third. For the latest rugby union news follow @bbcrugbyunion on Twitter.",38495526,"[0, 33184, 29, 156, 39, 184, 2453, 11, 49, 753, 12, 1558, 872, 7, 381, 8279, 6, 519, 1770, 5, 950, 71, 1618, 10625, 12, 6031, 13552, 928, 12093, 94, 353, 4, 22, 170, 214, 3784, 13, 299, 237, 6, 52, 236, 299, 237, 60, 5, 1931, 12, 7199, 2577, 1908, 6, 22963, 8, 22524, 19032, 2100, 26, 4, 22, 170, 236, 7, 109, 269, 157, 11, 5, 1089, 8, 3445, 968, 8, 52, 236, 7, 109, 25, 157, 25, 52, 64, 72, 91, 174, 3295, 4611, 3123, 35, 22, 713, 950, 782, 7, 28, 3148, 556, 8, 38, 206, ...]","[0, 4030, 3123, 3442, 12424, 10283, 161, 5, 950, 32, 4453, 9, 3970, 5, 3261, 310, 12, 10816, 4, 2]",New Jersey signing Heath Stevens says the club are capable of reaching the Championship play-offs.





In [19]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

## Compare Machine Summaries to Professional Human Written Summaries
To score our machine-generated summaries against the professional human-written ones, we compute the cosine similarity between sentence embeddings, which measures the semantic similarity between two texts. We make three comparisons for each article: human summary to machine summary, human summary to original document, and machine summary to original document. The maximum length of each machine summary is tied to the human summary: we pass the number of characters in the human summary (not the number of words) as `max_length`. When we used the word count instead, the model performed poorly and returned only the first few words of each article. This makes sense because BART's pretraining likely taught it that the start of a text often contains valuable summarization information. Using the character count of the human summary gives the model a length limit that depends on the reference summary rather than an arbitrary number. Note that `max_length` is counted in tokens, so the character count acts as a generous upper bound rather than an exact target. It is also important to note that the model config for BART sets the minimum length of a generated sequence to 56 tokens.
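
As a rough illustration of the scoring step, the cosine similarity between two embedding vectors can be sketched in plain Python. The notebook itself relies on `util.pytorch_cos_sim`; the function name `cosine_similarity` and the toy three-dimensional vectors below are assumptions made only for this example and are not model output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy "embeddings" (hypothetical values, not produced by any model)
human = [1.0, 2.0, 0.0]
machine = [2.0, 4.0, 0.0]    # points in the same direction as `human`
unrelated = [0.0, 0.0, 3.0]  # orthogonal to `human`

print(cosine_similarity(human, machine))    # close to 1: same direction
print(cosine_similarity(human, unrelated))  # 0: no shared direction
```

Because the measure depends only on direction, a long article and a short summary can still score close to 1 when their embeddings point the same way.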

### We are going to focus on 10 articles and build 10 models to inspect each pair individually

In [20]:
def listToString(s):
    # Join the elements of s (a list of strings, or a single string) into one string
    return "".join(s)

In [21]:
article1 = tokenized_xsum['test']['document'][0]
article2 = tokenized_xsum['test']['document'][123]
article3 = tokenized_xsum['test']['document'][99]
article4 = tokenized_xsum['test']['document'][1100]
article5 = tokenized_xsum['test']['document'][1118]
article6 = tokenized_xsum['test']['document'][45]
article7 = tokenized_xsum['test']['document'][13]
article8 = tokenized_xsum['test']['document'][69]
article9 = tokenized_xsum['test']['document'][27]
article10 = tokenized_xsum['test']['document'][9]

summary1 = tokenized_xsum['test']['summary'][0]
summary2 = tokenized_xsum['test']['summary'][123]
summary3 = tokenized_xsum['test']['summary'][99]
summary4 = tokenized_xsum['test']['summary'][1100]
summary5 = tokenized_xsum['test']['summary'][1118]
summary6 = tokenized_xsum['test']['summary'][45]
summary7 = tokenized_xsum['test']['summary'][13]
summary8 = tokenized_xsum['test']['summary'][69]
summary9 = tokenized_xsum['test']['summary'][27]
summary10 = tokenized_xsum['test']['summary'][9]
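
The twenty assignments above all follow one pattern, so the same selection could be sketched as a single helper. This is only an illustrative alternative: `select_pairs` and `toy_split` are hypothetical names, and the toy dictionary stands in for `tokenized_xsum['test']` so the example runs without the dataset.

```python
def select_pairs(split, indices):
    """Return (document, summary) tuples for the given row indices."""
    documents = split["document"]
    summaries = split["summary"]
    return [(documents[i], summaries[i]) for i in indices]

# Row indices used in the cell above
indices = [0, 123, 99, 1100, 1118, 45, 13, 69, 27, 9]

# Stand-in for tokenized_xsum['test'] (assumption: any mapping with
# indexable "document" and "summary" columns works the same way)
toy_split = {
    "document": [f"article {i}" for i in range(1200)],
    "summary": [f"summary {i}" for i in range(1200)],
}

pairs = select_pairs(toy_split, indices)
article1, summary1 = pairs[0]
print(len(pairs), article1, summary1)
```

With the real data, the call would presumably be `select_pairs(tokenized_xsum['test'], indices)`, which keeps each article next to its reference summary.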


## Model 1

In [22]:
input1 = tokenizer(article1, return_tensors='pt', truncation=True)
summary_ids1 = model.generate(input1['input_ids'], max_length=len(summary1), early_stopping=False)
machineSummary1 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids1])

In [23]:
machineSummary1 = listToString(machineSummary1)
original1 = listToString(article1)

comparison1 = [summary1, machineSummary1, original1]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings1 = token_model.encode(comparison1)
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings1[1], comparison_embeddings1[2])) # machine summary to original article

tensor([[0.7408]])
tensor([[0.7645]])
tensor([[0.9808]])


In [24]:
comparison1

['There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. Welsh Government said more people than ever were getting help to address housing problems. Changes to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. The Welsh Government said more people than ever were getting help to address housing problems. Changes to the Housing Act in Wales, introduced in 2015, removed the right fo
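
The Model 1 cells follow three steps that repeat for every article: generate a summary capped at the character length of the human summary, embed the three texts, and compare them pairwise. A minimal sketch of that flow is shown here, assuming the summarizer and the embedder are passed in as plain functions. The names `score_summary`, `toy_summarize` and `toy_embed` are hypothetical, and the toy stand-ins exist only so the sketch runs without BART or a sentence-embedding model.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score_summary(document, human_summary, summarize, embed):
    """Run the generate / embed / compare steps for one article."""
    # Same length rule as the notebook: cap at len(human_summary)
    machine_summary = summarize(document, max_length=len(human_summary))
    h, m, d = (embed(t) for t in (human_summary, machine_summary, document))
    return {
        "machine_summary": machine_summary,
        "human_vs_machine": cosine(h, m),
        "human_vs_document": cosine(h, d),
        "machine_vs_document": cosine(m, d),
    }

# Toy stand-ins (assumptions for illustration only)
def toy_summarize(text, max_length):
    # Truncates by characters; the real model limit is counted in tokens
    return text[:max_length]

def toy_embed(text):
    # Letter-frequency vector in place of a sentence embedding
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts

scores = score_summary("the cat sat on the mat", "cat on mat",
                       toy_summarize, toy_embed)
print(scores)
```

In the notebook, `summarize` would wrap the `tokenizer`, `model.generate` and `tokenizer.decode` calls, and `embed` would wrap `token_model.encode`. Passing them in as arguments also means the embedding model only needs to be loaded once rather than in every cell.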

## Model 2

In [25]:
input2 = tokenizer(article2, return_tensors='pt', truncation=True)
summary_ids2 = model.generate(input2['input_ids'], max_length=len(summary2), early_stopping=False)
machineSummary2 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids2])

In [26]:
machineSummary2 = listToString(machineSummary2)
original2 = listToString(article2)

comparison2 = [summary2, machineSummary2, original2]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings2 = token_model.encode(comparison2)
print(util.pytorch_cos_sim(comparison_embeddings2[0], comparison_embeddings2[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings2[0], comparison_embeddings2[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings2[1], comparison_embeddings2[2])) # machine summary to original article

tensor([[0.7160]])
tensor([[0.5882]])
tensor([[0.7505]])


In [27]:
comparison2

["For a man often described as capricious, Tyson Fury's chaotic reign as world heavyweight champion was strangely predictable.",
 "Wladimir Klitschko beat Wladimir Fury in Dusseldorf last November. Fury has been drinking like a fish and hoovering up cocaine instead. The Briton has withdrawn from the scheduled rematch because of depression. Fury is not the first boxer to lose motivation having reached the pinnacle of the sport, and he certainly won't be the last. The repeated claims from Fury's camp that his victory was downplayed by the British media, and that they had an agenda against him from the outset, are delusional. Almost every boxing writer proclaimed Fury as one of the finest ever by",

## Model 3

In [63]:
input3 = tokenizer(article3, return_tensors='pt', truncation=True)
summary_ids3 = model.generate(input3['input_ids'], max_length=len(summary3), early_stopping=False)
machineSummary3 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids3])

In [64]:
machineSummary3 = listToString(machineSummary3)
original3 = listToString(article3)

comparison3 = [summary3, machineSummary3, original3]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings3 = token_model.encode(comparison3)
print(util.pytorch_cos_sim(comparison_embeddings3[0], comparison_embeddings3[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings3[0], comparison_embeddings3[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings3[1], comparison_embeddings3[2])) # machine summary to original article

tensor([[0.6624]])
tensor([[0.7642]])
tensor([[0.9342]])


In [65]:
comparison3

['A barrister who was due to move into his own chambers in Huddersfield has pleaded guilty to supplying cocaine.',
 'Omar Khan, 31, had worked at The Johnson Partnership in Nottingham for five years. Partner Digby Johnson said he did not represent Khan, who had set up his own office and was set to leave the company. Erlin Manahasa, Albert Dibra and Nazaquat Ali also admitted the same charge. They are due to be sentenced on 15 April at Nottingham Crown Court.',
 'Omar Khan, 31, had worked at The Johnson Partnership in Nottingham for five years before he was arrested. Erlin Manahasa, Albert Dibra and Nazaquat Ali joined Khan in admitting the same charge, between 1 October  and 4 December last year, at Nottingham Crown Court. They are due to be sentenced on 15 April. Updates on this story and more from Nottinghamshire The court heard the case involved the recovery of 1kg (2.2lb) of cocaine. Digby Johnson, a partner at the Johnson firm, confirmed they did not represent Khan - who had set u

## Model 4

In [31]:
input4 = tokenizer(article4, return_tensors='pt', truncation=True)
summary_ids4 = model.generate(input4['input_ids'], max_length=len(summary4), early_stopping=False)
machineSummary4 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids4])

In [32]:
machineSummary4 = listToString(machineSummary4)
original4 = listToString(article4)

comparison4 = [summary4, machineSummary4, original4]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings4 = token_model.encode(comparison4)
print(util.pytorch_cos_sim(comparison_embeddings4[0], comparison_embeddings4[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings4[0], comparison_embeddings4[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings4[1], comparison_embeddings4[2])) # machine summary to original article

tensor([[0.6147]])
tensor([[0.6402]])
tensor([[0.8899]])


In [33]:
comparison4

['Star Wars fans are being given the opportunity to become Jedi Knights and learn how to wield lightsabers in combat.',
 'LudoSport has opened its first academy in Cheltenham. Instructor Jordan Court said people were already "hooked" The sport began eight years ago in Italy but has only just come to England. So far there are six pupils, but this number is expected to increase. There are several ranks for those wishing to become a fully-fledged Jedi Knight.',
 'LudoSport has opened its first academy teaching seven forms of combat from the Star Wars world using flexible blades mounted on weighted hilts. The sport began eight years ago in Italy but has only just come to England with the first classes in Cheltenham. Instructor Jordan Court said people were already "hooked". The classes in Cheltenham began last month. So far there are six pupils, but this number is expected to increase. Mr Court attended an international boot camp to learn the different stages of the sport which range in ch

## Model 5

In [34]:
input5 = tokenizer(article5, return_tensors='pt', truncation=True)
summary_ids5 = model.generate(input5['input_ids'], max_length=len(summary5), early_stopping=False)
machineSummary5 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids5])

In [35]:
machineSummary5 = listToString(machineSummary5)
original5 = listToString(article5)

comparison5 = [summary5, machineSummary5, original5]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings5 = token_model.encode(comparison5)
print(util.pytorch_cos_sim(comparison_embeddings5[0], comparison_embeddings5[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings5[0], comparison_embeddings5[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings5[1], comparison_embeddings5[2])) # machine summary to original article

tensor([[0.6016]])
tensor([[0.6121]])
tensor([[0.9886]])


In [36]:
comparison5

['Awareness rides are taking place to try and cut the number of people on horseback injured or killed on roads.',
 "The Pass Wide and Slow Wales campaign has collected 1,300 signatures on the assembly's e-petition website. It wants an annual road safety awareness campaign explaining to motorists how to react around horses. The British Horse Society found that since 2010 there have been 2,000 road accidents in the UK, with 1,500 because of cars passing too closely. 180 horses and 36 riders have died as a result of these accidents.",
 "The Pass Wide and Slow Wales campaign has collected 1,300 signatures on the assembly's e-petition website. It wants an annual road safety awareness campaign explaining to motorists how to react around horses. The British Horse Society found that since 2010 there have been 2,000 road accidents in the UK, with 1,500 because of cars passing too closely. As a result of these, 180 horses and 36 riders have died. Awareness rides were planned for Penarth, Vale of

## Model 6

In [37]:
input6 = tokenizer(article6, return_tensors='pt', truncation=True)
summary_ids6 = model.generate(input6['input_ids'], max_length=len(summary6), early_stopping=False)
machineSummary6 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids6])

In [38]:
machineSummary6 = listToString(machineSummary6)
original6 = listToString(article6)

comparison6 = [summary6, machineSummary6, original6]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings6 = token_model.encode(comparison6)
print(util.pytorch_cos_sim(comparison_embeddings6[0], comparison_embeddings6[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings6[0], comparison_embeddings6[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings6[1], comparison_embeddings6[2])) # machine summary to original article

tensor([[0.7346]])
tensor([[0.7330]])
tensor([[0.9746]])


In [39]:
comparison6

['Two new councillors have been elected in a by-election in the City of Edinburgh.',
 " SNP's John Lewis Ritchie topped the Leith Walk poll with 2,290 votes. Labour's Marion Donaldson received 1,623 votes, ahead of Scottish Greens. The by-election was called after Deidre Brock and Maggie Chapman stood down. It was the first time the Single Transferable Vote (STV) system had been used to select two members in the same ward",
 "It was the first time the Single Transferable Vote (STV) system had been used to select two members in the same ward in a by-election. The SNP topped the vote in the Leith Walk by-election, while Scottish Labour won the second seat from the Greens. The by-election was called after Deidre Brock of the SNP and Maggie Chapman of the Scottish Greens stood down. The SNP's John Lewis Ritchie topped the Leith Walk poll with 2,290 votes. He was elected at stage one in the STV process with a swing in first-preference votes of 7.6% from Labour. Labour's Marion Donaldson rec

## Model 7

In [40]:
input7 = tokenizer(article7, return_tensors='pt', truncation=True)
summary_ids7 = model.generate(input7['input_ids'], max_length=len(summary7), early_stopping=False)
machineSummary7 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids7])

In [41]:
machineSummary7 = listToString(machineSummary7)
original7 = listToString(article7)

comparison7 = [summary7, machineSummary7, original7]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings7 = token_model.encode(comparison7)
print(util.pytorch_cos_sim(comparison_embeddings7[0], comparison_embeddings7[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings7[0], comparison_embeddings7[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings7[1], comparison_embeddings7[2])) # machine summary to original article

tensor([[0.6711]])
tensor([[0.6875]])
tensor([[0.9709]])


In [42]:
comparison7

["Torquay United boss Kevin Nicholson says none of the money from Eunan O'Kane's move to Leeds from Bournemouth will go to the playing squad.",
 "The Gulls sold the Republic of Ireland midfielder to the Cherries for £175,000 in 2012. Nicholson says any money will go to help the cash-strapped club. The club are still looking for new owners having been taken over by a consortium of local business people last summer. They were forced to close down the club's academy and drastically reduce the playing budget after millionaire former owner Thea Bristow left the club.",
 'The National League sold the Republic of Ireland midfielder to the Cherries for £175,000 in 2012 and had a 15% sell-on clause included in the deal. O\'Kane moved for an undisclosed fee, but Nicholson says any money will go to help the cash-strapped club. "I don\'t think I\'ll be getting anything," Nicholson told BBC Devon. "There\'s more important things." The Gulls are still looking for new owners having been taken over by

# Model 8

In [43]:
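# Tokenize the article; truncation=True clips it to the model's maximum input length
input8 = tokenizer(article8, return_tensors='pt', truncation=True)
# max_length is counted in tokens; here it is set from len(summary8)
summary_ids8 = model.generate(input8['input_ids'], max_length=len(summary8), early_stopping=False)
# Decode the generated token ids back into text
machineSummary8 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids8]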

In [44]:
machineSummary8 = listToString(machineSummary8)
original8 = listToString(article8)

comparison8 = [summary8, machineSummary8, original8]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings8 = token_model.encode(comparison8)
print(util.pytorch_cos_sim(comparison_embeddings8[0], comparison_embeddings8[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings8[0], comparison_embeddings8[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings8[1], comparison_embeddings8[2])) # machine summary to original article

tensor([[0.6096]])
tensor([[0.6410]])
tensor([[0.9767]])
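
The scoring cell is repeated for every article with only the variable suffix changing. One way to factor it out is sketched below. `score_triplet` is a hypothetical helper, not part of the original notebook. It takes the encoder and the similarity function as arguments, so the example runs with toy stand-ins; in the notebook these would presumably be `token_model.encode` and `util.pytorch_cos_sim`.

```python
def score_triplet(encode, similarity, human, machine, article):
    """Embed three texts and return their pairwise similarities."""
    emb_human, emb_machine, emb_article = encode([human, machine, article])
    return {
        "human_vs_machine": similarity(emb_human, emb_machine),
        "human_vs_article": similarity(emb_human, emb_article),
        "machine_vs_article": similarity(emb_machine, emb_article),
    }


# Stand-ins so the sketch runs without downloading a model.
def toy_encode(texts):
    # Represent each text by (word count, character count).
    return [(len(t.split()), len(t)) for t in texts]


def toy_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm


result = score_triplet(
    toy_encode,
    toy_similarity,
    "a short summary",
    "another short summary",
    "a much longer article text",
)
print(result)
```

With a helper of this shape, each per-article cell reduces to one call, and the encoder only needs to be loaded once per session.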


In [45]:
comparison8

['Manufacturers have reported positive business trends, in the latest survey from the Scottish Chambers of Commerce.',

# Model 9

In [52]:
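# Tokenize the article; truncation=True clips it to the model's maximum input length
input9 = tokenizer(article9, return_tensors='pt', truncation=True)
# max_length is counted in tokens; here it is set from len(summary9)
summary_ids9 = model.generate(input9['input_ids'], max_length=len(summary9), early_stopping=False)
# Decode the generated token ids back into text
machineSummary9 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids9]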

In [53]:
machineSummary9 = listToString(machineSummary9)
original9 = listToString(article9)

comparison9 = [summary9, machineSummary9, original9]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings9 = token_model.encode(comparison9)
print(util.pytorch_cos_sim(comparison_embeddings9[0], comparison_embeddings9[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings9[0], comparison_embeddings9[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings9[1], comparison_embeddings9[2])) # machine summary to original article

tensor([[0.8472]])
tensor([[0.8311]])
tensor([[0.9409]])


In [54]:
comparison9

['Of his last 30 matches in 2016, Andy Murray won 28 and lost just two.',
 'World number one has won 21 and lost nine of his first 30 matches in 2017. He has had shingles and an elbow problem, and now his left hip is proving cause for concern. Murray is virtually 5,000 points behind Rafael Nadal in the season-long race. He could be overtaken after Wimbledon by',
 "Media playback is not supported on this device Of his first 30 matches in 2017, the world number one has won 21 and lost nine. Winning his last five tournaments of 2016 to pip Novak Djokovic to the year-end number one position in the final match of the season at London's O2 Arena was astonishing, dramatic and unforgettable. And yet it appears that relentless run of success, and the 87 matches he played over a season, has come at a price. Murray's straight-set defeat by world number 90 Jordan Thompson in the first round at Queen's Club was the sixth time he has lost to a player outside the top 20 this year. He has had shingles

# Model 10

In [49]:
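# Tokenize the article; truncation=True clips it to the model's maximum input length
input10 = tokenizer(article10, return_tensors='pt', truncation=True)
# max_length is counted in tokens; here it is set from len(summary10)
summary_ids10 = model.generate(input10['input_ids'], max_length=len(summary10), early_stopping=False)
# Decode the generated token ids back into text
machineSummary10 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids10]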

In [50]:
machineSummary10 = listToString(machineSummary10)
summary10 = listToString(summary10)
original10 = listToString(article10)

comparison10 = [summary10, machineSummary10, original10]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings10 = token_model.encode(comparison10)
print(util.pytorch_cos_sim(comparison_embeddings10[0], comparison_embeddings10[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings10[0], comparison_embeddings10[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings10[1], comparison_embeddings10[2])) # machine summary to original article

tensor([[0.8072]])
tensor([[0.8070]])
tensor([[0.7622]])
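
To compare the articles in this part of the notebook side by side, the scores printed for Models 7 to 10 can be collected and averaged. The values below are copied from the outputs above. The dictionary layout and label names are illustrative choices, not part of the original notebook.

```python
from statistics import mean

# (human vs machine, human vs article, machine vs article),
# copied from the tensors printed for Models 7 to 10.
scores = {
    7: (0.6711, 0.6875, 0.9709),
    8: (0.6096, 0.6410, 0.9767),
    9: (0.8472, 0.8311, 0.9409),
    10: (0.8072, 0.8070, 0.7622),
}

labels = ("human vs machine", "human vs article", "machine vs article")
averages = {
    label: mean(row[i] for row in scores.values())
    for i, label in enumerate(labels)
}

for label, value in averages.items():
    print(f"{label}: {value:.4f}")
```

For three of these four articles the machine summary scores higher against the article than against the human summary; Model 10 is the exception.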


In [51]:
comparison10

["Manager Brendan Rodgers is sure Celtic can exploit the wide open spaces of Hampden when they meet Rangers in Sunday's League Cup semi-final.",
 "Celtic face Rangers in the Scottish Cup semi-final at Hampden Park. Brendan Rodgers' side won 5-1 at Celtic Park in the league last month. Rodgers lost two semi-finals in his time at Liverpool and is aiming to make it third time lucky at the club he joined in the summer. The Northern Irishman would not be drawn on whether this was a step on the way to a potential domestic treble.",
 '"I\'m really looking forward to it - the home of Scottish football," said Rodgers ahead of his maiden visit. "I hear the pitch is good, a nice big pitch suits the speed in our team and our intensity. "The technical area goes right out to the end of the pitch, but you might need a taxi to get back to your staff." This will be Rodgers\' second taste of the Old Firm derby and his experience of the fixture got off to a great start with a 5-1 league victory at Celtic