# Text Summary & Scoring Project
##### Michael Creegan, Yungfeng Dai, Hong Gyu Ji, Ziling Zeng
##### Python for Data Analysis
##### Columbia University

# Abstract

Summarization is a common problem in the 21st century as the world has become increasingly driven by data. A summary makes it possible to quickly determine whether a document is relevant or worth reading in full. Another use case is to store summaries of articles in the backend and run downstream tasks on them. Summaries can also help assess the semantic integrity of a text as an indicator of its quality.

To explore this topic, we will leverage the extreme summarization dataset (XSUM), which consists of BBC articles accompanied by single-sentence summaries. Each article is prefaced with an introductory sentence (which serves as the summary) that is professionally written, typically by the author of the article.

To summarize articles, we will use an encoder-decoder transformer (sequence-to-sequence), which combines an encoder and a decoder because we need to perform both input and output tasks: taking in text and then generating a summary. We selected this type of transformer because the encoder accepts inputs (text) and computes a high-level representation of those inputs, which is then passed to the decoder to generate a prediction output (the summary). This has advantages over using a standalone encoder such as BERT, ALBERT, ELECTRA, RoBERTa, or DistilBERT, because encoders are pre-trained by filling in randomly masked words in sentences and are therefore better suited to understanding input text than to generating output text. Using a standalone decoder such as GPT-2 would also not be optimal, because decoders are trained to predict the next word in a sequence using only the left context (they cannot see the tokens to the right of the current position). They are therefore well suited to generating text but less suited to building a full representation of an input document because of this hidden-context limitation.
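The difference in visible context between an encoder and a decoder can be illustrated with attention masks. The following is a minimal sketch in plain Python, not the code used inside BART or any library; the helper name `attention_mask` and the four example tokens are assumptions made up for this illustration. A value of 1 at row `i`, column `j` means the token at position `i` is allowed to look at the token at position `j`.

```python
def attention_mask(seq_len, causal):
    """Build a seq_len x seq_len visibility mask (hypothetical helper).

    causal=False: every position sees every other position (encoder-style).
    causal=True:  position i sees only positions 0..i (decoder-style, left context only).
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]

# Made-up example sequence, used only for labelling the rows.
tokens = ["the", "cat", "sat", "down"]

encoder_mask = attention_mask(len(tokens), causal=False)
decoder_mask = attention_mask(len(tokens), causal=True)

for name, mask in [("encoder (bidirectional)", encoder_mask),
                   ("decoder (causal)", decoder_mask)]:
    print(name)
    for token, row in zip(tokens, mask):
        print(f"  {token:>5}: {row}")
```

In an encoder-decoder model the encoder reads the whole article with the bidirectional pattern, while the decoder writes the summary one token at a time with the causal pattern and additionally attends to the encoder's output.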

Our scoring will compare the output of the BART encoder-decoder model to the professionally written summaries in the XSUM dataset to see how similar a machine generated summary is to a professional one. Our scoring methodology will be focused on semantic textual similarity and computed using the cosine similarity between the professional human written summary and the machine generated one. 
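As a rough illustration of the scoring idea, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses plain Python and three tiny made-up vectors standing in for sentence embeddings; in practice the embeddings would come from a sentence-embedding model, and the function name `cosine_similarity` and all numbers here are assumptions for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
reference_vec = [0.20, 0.70, 0.10]   # stands in for the professional summary
generated_vec = [0.25, 0.60, 0.05]   # stands in for a similar machine summary
unrelated_vec = [0.90, 0.10, 0.40]   # stands in for an off-topic summary

score_close = cosine_similarity(reference_vec, generated_vec)
score_far = cosine_similarity(reference_vec, unrelated_vec)

print(f"similar summary:   {score_close:.3f}")
print(f"unrelated summary: {score_far:.3f}")
```

A score near 1 means the two vectors point in almost the same direction, which is read here as the two summaries being semantically close; lower scores indicate less overlap in meaning.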

# Importing Transformers & Dependencies

In [1]:
import pandas as pd
import numpy as np
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
from datasets import load_dataset, load_metric
from sentence_transformers import SentenceTransformer, util
import random
from IPython.display import display, HTML

# Load XSUM Dataset

In [2]:
xsum = load_dataset('xsum')

Using custom data configuration default
Reusing dataset xsum (C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934)
100%|██████████| 3/3 [00:00<00:00, 52.69it/s]


### We can see that the dataset is a "DatasetDict" where the keys are strings that correspond to the splits and the values are the dataset objects. In the XSUM dataset, the keys are "train", "validation", and "test", and each split has the columns "document", "summary", and "id"

In [3]:
xsum

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})

### View a record of the underlying data

In [4]:
xsum['test'][0]

{'document': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the

### We can use a function to view a random selection of articles and summaries from a split, which gives a more representative picture of the data than a single record, displayed in a tabular format

In [5]:
def display_function(dataset, num_examples=5):
    assert num_examples <= len(dataset)             # cannot show more records than the split contains
    
    selections = []                                 # indices of the records to display
    
    for _ in range(num_examples):                   # we can use _ here in place of a variable name because the loop counter itself is never used
        selection = random.randint(0, len(dataset) - 1)
        while selection in selections:              # redraw until we get an index we have not picked yet
            selection = random.randint(0, len(dataset) - 1)
        selections.append(selection)

    dataset_df = pd.DataFrame(dataset[selections])  # indexing with a list of indices returns a dict of columns
    display(HTML(dataset_df.to_html()))             # render the table once (the original looped over the columns and rendered it three times)
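The index selection above draws random numbers and retries on duplicates. A shorter way to get the same kind of result is `random.sample`, which returns distinct items in one call. The sketch below is an alternative under stated assumptions, not the notebook's own method: the helper name `pick_unique_indices` and its `seed` parameter are hypothetical, and the row count 11334 is the size of the test split shown earlier.

```python
import random

def pick_unique_indices(n_rows, num_examples=5, seed=None):
    """Return num_examples distinct row indices in the range [0, n_rows).

    Hypothetical helper: `seed` makes the selection reproducible when set.
    """
    if num_examples > n_rows:
        raise ValueError("num_examples cannot exceed the number of rows")
    rng = random.Random(seed)                       # private generator, leaves global state alone
    return rng.sample(range(n_rows), num_examples)  # sampling without replacement

# 11334 is the number of rows in the XSUM test split.
indices = pick_unique_indices(11334, num_examples=5, seed=0)
print(indices)
```

The resulting list can be used in place of `selections` when indexing a split. Passing a fixed seed is convenient when the same examples should appear every time the notebook is rerun.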

### Our end goal is to create accurate summaries using this model, so we need to remove characters that do not provide any contextual value. We can also see that there are characters in the document column that are not present in the summary column, which could cause discrepancies between the machine-generated summary and the professionally written one. Specifically, we need to remove the newline characters that are present in the document column but not in the summary column

In [6]:
display_function(xsum["test"])

Unnamed: 0,document,summary,id
0,"The 35-year-old batsman will link up again with his ex-Black Caps team-mate Daniel Vettori, who was appointed Middlesex's Twenty20 coach on a three-year deal in December.\nMcCullum worked under Vettori at Big Bash side Brisbane Heat this winter.\nMeanwhile, James Franklin, 36, has been made full-time captain in the County Championship and the One-Day Cup.\nFranklin took over the captaincy on an interim basis last season following an injury to Adam Voges and helped steer the club to their first Championship title since 1993.\n""It's a very exciting time for the club, getting Brendon back again after his stint with us last year,"" Franklin told BBC Radio London.\n""He's one of the best captains in world cricket of the modern age so it's going to be exciting for players to work with him.\n""And Dan being on board as the Twenty20 coach, I think it shows how progressive the club are looking at things.\n""To be able to get in a coach of his calibre is hugely exciting.""\nLast season's T20 Blast captain Dawid Malan, 29, has been named vice-captain in that format and will lead the side when McCullum, who has re-signed for nine group games, is unavailable.",Former New Zealand skipper Brendan McCullum has been named Middlesex captain for this season's T20 Blast.,39507079
1,"Chris Packham, who is in Malta, said rare species were being targeted, and hunters were even shooting Montagu's harrier birds on the ground at night.\n""It's a desperate situation,"" he told BBC Radio 4's Today programme.\nA Maltese wildlife official insisted that patrols to stop illegal hunting had been stepped up.\nMalta has an exemption from the EU Birds Directive, allowing its hunters to shoot turtle doves and quail during the spring migration, a crucial stage in the birds' life cycle. But according to Mr Packham, turtle doves were vulnerable, with their numbers down by 95% in the UK.\nMalta is the only EU country to have a recreational spring hunting season allowing birds to be shot.\nMr Packham, a presenter of TV documentaries on wildlife, said Maltese hunters were ignoring restrictions under the exemption, or ""derogation"" in EU jargon. He said they were killing many other birds which are supposed to be protected.\nHe is in Malta with the conservation group Birdlife Malta to draw attention to the annual spring shoot, which has been criticised by environmentalists for years.\n""Yesterday I'm afraid to say I had a dead swift in my hand that had been illegally shot and also a dead little bittern,"" Mr Packham told Today.\nSergei Golovkin, head of Malta's Wild Birds Regulation Unit, insisted that the authorities were controlling the hunters.\nHe said enforcement of the restrictions had ""improved dramatically in the last few years"". Malta has ""the highest ratio in Europe"" of enforcement staff deployed against illegal hunting, he told Today.\nThirty-three MEPs have jointly lobbied the European Commission to put pressure on Malta over the hunting exemption. 
A British Liberal Democrat MEP, Catherine Bearder, says the EU must ""stop Malta from breaking EU rules, by systematically failing to apply the derogation correctly"".",A leading British naturalist has accused the Maltese authorities of failing to prevent large-scale illegal shooting of migratory birds by hunters.,27108910
2,"Putnam County Sheriff deputies believe the toddlers probably teamed up to work the pedals and steer the wheel before crashing it in a ditch.\nThe pair made it three miles (4.8km) down the road and successfully navigated multiple turns.\nThey were not hurt but officials are weighing charges against the mother.\nThey had taken their mother's 2005 Ford Focus after finding the keys in the floor mat while playing in the front yard.\nOfficials believe they were trying to reach their grandfather's farm but crashed five miles short in the town of Red House.\n""Luckily, they didn't pass anybody because they would've probably had a wreck before then,"" said Putnam County Sheriff Steve Deweese.\nMr Deweese told WSAZ-TV that the sheriff's office is working with the county prosecutor and Child Protective Services to determine if the mother should be charged with any crime.","Two brothers aged five and two stole their mother's car and wrecked it on a drive to their grandfather's house, say authorities in West Virginia.",40673822
3,"The blaze in the Aberdeenshire town's High Street in May that year claimed the life of 43-year-old Gordon Graham.\nBarry Henderson, 41, from Fraserburgh, was charged with murder and attempted murder when he appeared at Peterhead Sheriff Court.\nHe made no plea and was released on bail.",A man has been charged with murder following a fire in Fraserburgh in 1998.,35493517
4,"The government is currently negotiating with the Lib Dems and Greens to strike a deal to get its budget plans passed.\nGreen co-convener Patrick Harvie has asked for concessions over tax, while Lib Dem Willie Rennie has targeted up to £400m of additional spending.\nFinance Secretary Derek Mackay has said he is ""positive"" about winning support.\nWith the SNP a minority government, they will need at least one opposition party to help the budget pass, either by voting for it or by abstaining.\nMr Mackay has indicated a budget deal with the Conservatives or Labour is unlikely, but said there was ""room for manoeuvre"" in talks with the Lib Dems and Greens.\nAnd while he has said no matters are ""absolutely closed"", he wants to ""adhere as close to the [SNP] manifesto as possible"" on tax, making a deal with the Lib Dems the more likely.\nThe Lib Dems said they wanted ""substantial changes"" made, which would ""set Scotland on a stronger, more liberal path"".\nThese focus on the party's manifesto pledges around education and mental health. Specific measures include:\nMr Rennie said: ""Liberal Democrats will not agree to the draft budget as it stands and will need these substantial changes. If we don't get what the country needs then we will walk away.\n""Our plan invests for a step change in mental health and a transformation in education that will help in the road to a liberal Scotland. A properly funded pupil premium and more money for colleges will create that opportunity and boost jobs and the economy.\n""New investment in mental health services will boost this Cinderella service and make the whole NHS more sustainable in the future. 
We have also included support for alcohol and drug services, a higher budget for the police and lower cost transport for the Northern Isles.\n""I have had a number of meetings and discussions with the finance secretary so far and I am looking forward to receiving his response to our plan.""\nMr Mackay has defended his tax and spending plans in two meetings of the finance committee, where he also took questions from the public over social media.\nThe committee will submit its report on the budget on Friday 27 January, with the first chamber debate on the budget the following week.\nThe final vote on the budget will follow a separate vote on Mr Mackay's tax proposals, in the week beginning 20 February.\nThis is a decent snapshot of what the final budget deal is likely to look like.\nWillie Rennie probably won't get absolutely everything he is asking for - one would suspect Derek Mackay is too good a negotiator for that - but this is a far more palatable list of demands for the finance secretary than that put forward by the Greens, who want to see some movement over tax.\nWhile he says nothing is off the table, Mr Mackay isn't going to budge on tax. He sees his current proposals as well-balanced, and endorsed by the electorate last May.\nAnd with a deal with Labour or the Tories more or less dismissed in advance on political grounds, that leaves Mr Rennie as the clear favourite.\nHe may well have beefed up his demands accordingly - minus any red lines, of course, over tax.\nThere are plenty of talks still to come, but there is also plenty common ground here. Expect a SNP/Lib Dem coalition to usher an amended budget through come the end of February.","The Liberal Democrats have set out funding for education, mental health and transport links as their demands for backing the Scottish budget.",38663835


### We can address the problem mentioned above by defining a cleaning function that replaces each newline character with a space.

In [7]:
def clean(row):
    row['document'] = row['document'].replace('\n', ' ')
    return row
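To see what the cleaning step does, here is a small sketch that applies it to a made-up record shaped like an XSUM row. The sample text is invented for illustration, and `clean_collapse` is a hypothetical variant (not part of the notebook) that also collapses runs of whitespace, which may be worth considering because a plain replace turns two consecutive newlines into two spaces.

```python
import re

def clean(row):
    # Same logic as the notebook's function: each newline becomes one space.
    row['document'] = row['document'].replace('\n', ' ')
    return row

def clean_collapse(row):
    # Hypothetical variant: collapse any run of whitespace into one space and trim the ends.
    row['document'] = re.sub(r'\s+', ' ', row['document']).strip()
    return row

# Made-up record with the same keys as an XSUM row.
sample = {
    'document': 'First sentence.\nSecond sentence.\n\nThird.',
    'summary': 'A summary.',
    'id': '0',
}

cleaned = clean(dict(sample))             # dict(...) copies the row so `sample` stays unchanged
collapsed = clean_collapse(dict(sample))

print(repr(cleaned['document']))
print(repr(collapsed['document']))
```

Both functions leave the summary and id untouched and only rewrite the document text.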

### We can now map the cleaning function we created onto our data (map applies it to the train, validation, and test splits)

In [8]:
xsum = xsum.map(clean)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-ec5b3ab440c9df82.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-a176a692461cda61.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-bc530be4c3ab51ba.arrow


### Voila!

In [9]:
display_function(xsum["test"])

Unnamed: 0,document,summary,id
0,"On one occasion there was only one commode available for more than 100 patients at North Middlesex University Hospital, a report by the watchdog said. Emergency services at the London hospital have been rated ""inadequate"". The hospital said it was ""extremely sorry"" for the problems in the unit. Inspectors from the Care Quality Commission said there were too few competent doctors who were able to assess and treat patients at night when they inspected the department in April and May. The unit - which sees 500 hundred patients a day - logged 22 serious incidents in the past year, including the dead patient not being found for hours. Others included a patient being left sitting on a bedpan for more than an hour. And nurse to patient ratios were rarely achieved because they frequently had 20 patients being treated in the corridor. The report also said staff were afraid to speak up for fear of retribution. The inspection of the emergency department and two of the hospital's medical wards was in response to concerns about the standards of care. The hospital has apologised to patients and says the A&E department now has five additional doctors and consultants on loan from other London trusts, a new nursing lead and new clinical director. Chief Inspector of Hospitals Sir Mike Richards said the hospital has already ""turned a corner"" since the inspection. He said: ""A new leadership team is in place in the emergency department, there are moves to appoint more senior doctors - and I note that the trust is calling on consultants from other departments within the hospital to provide the routine daily support to A and E which is so badly needed. ""There is still much more that needs to be done. We will be watching their progress very closely."" David Burrowes, MP for Enfield Southgate, said he was left to wait for 12 hours on a trolley with a ruptured appendix in the emergency department in 2014. He said ""urgent action"" was needed. 
""The important question is why the warning signals from at least two years ago were not heeded,"" he added Tottenham MP David Lammy said the ""damning"" report is ""even worse than I feared"" and demanded answers from Health Secretary Jeremy Hunt. He said: ""It shocks and appals me that this situation has been left for so long without an intervention from the Health Secretary, and the way that this has been covered up is nothing short of a scandal."" During the same inspection, medical care services were rated as requiring improvement. The trust is now required to improve the care of patients in the emergency department by 26 August 2016 following a warning from the CQC. A full inspection of the trust will take place in September.","A patient lay dead for up to four-and-a-half hours before being spotted at one of the busiest's A&E departments in the country, inspectors have revealed.",36716248
1,"""Golden Rice"" has been developed by scientists to combat vitamin A deficiency, which affects millions of children in the developing world. The crop was just weeks away from being submitted to the authorities for a safety evaluation. But a group of around 400 protestors attacked the field trial in the Bicol region and uprooted all the GM plants. The project to develop Golden Rice was started 20 years ago in 1993 by German researchers with funding from the Rockefeller Foundation. The rice has been modified by adding extra genes that turn on the plant's ability to produce beta-carotene, which humans can convert into vitamin A. A lack of this vitamin increases the chances of blindness and susceptibility to disease. Vitamin A deficiency is a significant problem among children in developing countries. According to Helen Keller International, around 670,000 children will die each year from the problem, while 350,000 will go blind. It is estimated that one cup of Golden Rice could provide half an adult's recommended daily intake. Golden Rice field trials are currently being carried out in the Philippines under the auspices of the International Rice Research Institute (IRRI), together with PhilRice, the local research body. Five small test plots have been planted with the idea that there would shortly be a submission to the regulatory authority of the Philippines. It was hoped that initial releases to farmers could happen in 2014. The plot in Bicol was guarded and fenced but the protestors broke through the security and uprooted and trampled the rice plants. The attackers who were members of a group called Sikwal-Gmo say they attacked the crop because they believe that GM technology is not the solution to malnutrition in the Philippines. The protestors argue that international agrochemical corporations and the US are behind the drive for Golden Rice. In a statement, they said they were concerned that the rice trial was both a danger to human health and biodiversity. 
The scientists involved with the trial say they were relieved that no-one was hurt during the incident. ""It was not completely unexpected as we had heard threats,"" the IRRI's Dr Bruce Tolentino told BBC News. ""It was certainly disappointing to have our field trial vandalised because our Golden Rice research aims to avoid the horrible plight of women and children suffering vitamin A deficiency."" The researchers say that the development of the modified rice remains critical for the Philippines as 1.7 million children in the country aged under five are affected by vitamin A deficiency. They say they are determined to go ahead with the project. ""This is not a major setback, because it is just one trial of a series and just one of several sites. We remain completely committed to continuing our Golden Rice research to help improve people's nutrition,"" said Dr Tolentino. The development of GM technology is highly contentious in the Philippines. Earlier this year, the Court of Appeals rejected another crop, an eggplant that had been modified to produce toxins to a pest. The court ruled that the crop violated the constitutional rights of Filipinos to health and a balanced ecology. Follow Matt on Twitter.",A trial plot of genetically modified rice has been destroyed by local farmers in the Philippines.,23632042
2,"At least a dozen Republicans have said they will not be voting for him, since the comments emerged on Friday. Mr Trump says he will never drop out of the race to be president and will never let his supporters down. He has been under pressure after a tape from 2005 of him bragging about groping and kissing women was broadcast. The latest to withdraw their support are former Republican presidential candidate John McCain and former Secretary of State Condoleezza Rice. Mr McCain said Mr Trump's comments ""make it impossible to continue to offer even conditional support for his candidacy"", while Ms Rice said: ""Enough! Donald Trump should not be President. He should withdraw."" New Hampshire Senator Kelly Ayotte said in a statement: ""I cannot and will not support a candidate for president who brags about degrading and assaulting women,"" she said Ms Ayotte - who faces a competitive race for re-election - said she would not vote for Mrs Clinton but instead would ""write in"" Mike Pence, Mr Trump's vice-presidential running mate, on her ballot paper. Several other Republicans also said they would vote for Mr Pence. Mr Trump himself stressed that there was ""zero chance I'll quit"", adding that he was getting ""unbelievable"" support. And in a tweet, the Republican candidate said ""the media and establishment want me out of the race so badly"". Mr Trump's wife Melania issued a statement on Saturday saying: ""The words my husband used are unacceptable and offensive to me."" She said her husband had ""the heart and mind of a leader"". Mr Pence said he was ""offended"" by Mr Trump's video, but grateful he had expressed remorse and apologised to the American people. ""We pray for his family,"" he said in a statement. House Speaker Paul Ryan had originally invited Mr Trump to attend a campaign event in Wisconsin this weekend but rescinded his invitation, saying he was ""sickened"" by what he had heard. 
Mr Pence was due to go in his running mate's place, but declined to attend. Meanwhile, Hillary Clinton, Mr Trump's Democratic election rival, called his comments in the tape ""horrific"". In the recorded comments, which date back to 2005 when Mr Trump was appearing as a guest on a soap, he says ""you can do anything"" to women ""when you're a star"" and is heard saying ""grab them by the pussy"". The candidate released a video statement apologising for the comments. Mr Trump's 2005 comments, posted by the Washington Post, overshadowed the release of transcripts of Mrs Clinton's speeches to private events, by the whistle-blowing site Wikileaks. The candidate had married his third wife Melania a few months before the recording. She said on Saturday: ""I hope people will accept his apology, as I have, and focus on the important issues facing our nation and the world."" Who is ahead in the polls? 49% Hillary Clinton 45% Donald Trump Last updated October 3, 2016 The second TV debate between Mr Trump and Mrs Clinton will take place on Sunday evening in St Louis. Mr Trump recently said he would not bring up stories about Bill Clinton's infidelities in the debate, after previously threatening to do so. But in his video apology, he attacked the former president directly: ""Bill Clinton has actually abused women, and Hillary has bullied, attacked and shamed his victims. ""We'll discuss this in the coming days,"" he said. ""See you at the debate on Sunday."" The day after a video emerged in which he suggested he could have any woman he wants because he's a star and so could just grab them by the pussy, Mr Trump is in a whole ocean of hot political water. Enough, quite possibly, to sink any chance he had of winning the White House. There is a violence in the phrases ""grab 'em by the pussy"" and ""you can do anything"" that any victim of abuse would recognise and that most women would find sickening. 
But this tape doesn't just offend women, judging from the reaction in the Republican party. It has offended a lot of men too. Whether those men will now withdraw their endorsements of him is yet to be seen. Read more from Katty",More senior Republicans have withdrawn support for US presidential candidate Donald Trump after his obscene remarks about women became public.,37599111
3,"The song topped the UK singles charts in February 1969 and remained number one for four weeks. It was also number one in many other countries and won the Ivor Novello award for best song composition. He died peacefully after a six-year battle with Progressive Supranuclear Palsy, a family statement said. The statement said his closest family were ""with him to the last"" and that many people would miss his songs and his music. Where Do You Go To (My Lovely), a song about a girl born in poverty who becomes a member of the European jet-set, was replaced as number one by Marvin Gaye's I Heard it Through the Grapevine. It was included in the compilation programme One-Hit Wonders at the BBC, which was broadcast on BBC Four last year, although Sarstedt also reached number 10 in the charts with Frozen Orange Juice in June 1969. He wrote more than a dozen albums in a career that spanned more than 50 years, releasing his last, Restless Heart, in 2013 Born into a musical family in India, Sarstedt was one of three brothers who all enjoyed success in the UK singles chart. His older sibling, Richard Sarstedt, who performed under the stage name Eden Kane, also topped the charts with Well I ask You in 1961, while younger brother Clive, performing under the name Robin Sarstedt, reached number three in 1976 with My Resistance is Low. Sarstedt's music reached new audiences when Where Do You Go To (My Lovely) was included in the Wes Anderson films Hotel Chevalier and The Darjeeling Limited, which were both released in 2007. According to his website, he retired in 2010 because of his illness - a rare, progressive neurological condition.","Singer-songwriter Peter Sarstedt, best known for the song Where Do You Go To (My Lovely), has died at the age of 75, his family has said.",38548507
4,"The former England player opened the batting and made 101, with nine fours, before he was stumped off Ashar Zaidi. Laurie Evans provided late impetus with an unbeaten 70 off 53 balls as they posted a total of 283-7 at Edgbaston. Tom Westley made 61 for Essex and Ryan ten Doeschate was last to go for 50 as they were all out for 213. It was a disappointing batting effort which left 7.5 overs unused, and Warwickshire will now be at home to Somerset - who beat Worcestershire by nine wickets - on 28 or 29 August, with a place in the Lord's final at stake. With England seamer Chris Woakes conceding 47 from seven wicketless overs, it was Warwickshire's spinners who undermined the Essex run chase after openers Westley and Nick Browne put on 75 in 12 overs, claiming eight wickets between them. Browne was stumped off Ateeq Javid and Jesse Ryder, Jaik Mickleburgh and Zaidi were all guilty of poor shots as numbers three to six in the order all failed to reach double figures. Essex slumped to 134-6 in the 28th over, with leg-spinner Josh Poysden claiming 3-46, and it was Jeetan Patel (3-32) who ended the game by having ten Doeschate lbw after he reached a run-a-ball half-century. Earlier Trott, who now averages 77.80 in this season's competition, anchored the Warwickshire innings after skipper Ian Bell was caught behind for a fourth-ball duck. He shared a stand of 136 with Tim Ambrose (60) and although his dismissal sparked a mini-slump from 227-3 to 257-7, Evans hit three sixes and three fours to boost the total in the closing overs.",Jonathan Trott made his third One-Day Cup century in four innings as Warwickshire reached the semi-finals with a 70-run home win over Essex.,37106568


But this tape doesn't just offend women, judging from the reaction in the Republican party. It has offended a lot of men too. Whether those men will now withdraw their endorsements of him is yet to be seen. Read more from Katty",More senior Republicans have withdrawn support for US presidential candidate Donald Trump after his obscene remarks about women became public.,37599111
3,"The song topped the UK singles charts in February 1969 and remained number one for four weeks. It was also number one in many other countries and won the Ivor Novello award for best song composition. He died peacefully after a six-year battle with Progressive Supranuclear Palsy, a family statement said. The statement said his closest family were ""with him to the last"" and that many people would miss his songs and his music. Where Do You Go To (My Lovely), a song about a girl born in poverty who becomes a member of the European jet-set, was replaced as number one by Marvin Gaye's I Heard it Through the Grapevine. It was included in the compilation programme One-Hit Wonders at the BBC, which was broadcast on BBC Four last year, although Sarstedt also reached number 10 in the charts with Frozen Orange Juice in June 1969. He wrote more than a dozen albums in a career that spanned more than 50 years, releasing his last, Restless Heart, in 2013 Born into a musical family in India, Sarstedt was one of three brothers who all enjoyed success in the UK singles chart. His older sibling, Richard Sarstedt, who performed under the stage name Eden Kane, also topped the charts with Well I ask You in 1961, while younger brother Clive, performing under the name Robin Sarstedt, reached number three in 1976 with My Resistance is Low. Sarstedt's music reached new audiences when Where Do You Go To (My Lovely) was included in the Wes Anderson films Hotel Chevalier and The Darjeeling Limited, which were both released in 2007. According to his website, he retired in 2010 because of his illness - a rare, progressive neurological condition.","Singer-songwriter Peter Sarstedt, best known for the song Where Do You Go To (My Lovely), has died at the age of 75, his family has said.",38548507
4,"The former England player opened the batting and made 101, with nine fours, before he was stumped off Ashar Zaidi. Laurie Evans provided late impetus with an unbeaten 70 off 53 balls as they posted a total of 283-7 at Edgbaston. Tom Westley made 61 for Essex and Ryan ten Doeschate was last to go for 50 as they were all out for 213. It was a disappointing batting effort which left 7.5 overs unused, and Warwickshire will now be at home to Somerset - who beat Worcestershire by nine wickets - on 28 or 29 August, with a place in the Lord's final at stake. With England seamer Chris Woakes conceding 47 from seven wicketless overs, it was Warwickshire's spinners who undermined the Essex run chase after openers Westley and Nick Browne put on 75 in 12 overs, claiming eight wickets between them. Browne was stumped off Ateeq Javid and Jesse Ryder, Jaik Mickleburgh and Zaidi were all guilty of poor shots as numbers three to six in the order all failed to reach double figures. Essex slumped to 134-6 in the 28th over, with leg-spinner Josh Poysden claiming 3-46, and it was Jeetan Patel (3-32) who ended the game by having ten Doeschate lbw after he reached a run-a-ball half-century. Earlier Trott, who now averages 77.80 in this season's competition, anchored the Warwickshire innings after skipper Ian Bell was caught behind for a fourth-ball duck. He shared a stand of 136 with Tim Ambrose (60) and although his dismissal sparked a mini-slump from 227-3 to 257-7, Evans hit three sixes and three fours to boost the total in the closing overs.",Jonathan Trott made his third One-Day Cup century in four innings as Warwickshire reached the semi-finals with a 70-run home win over Essex.,37106568


### We can view the column names and data types of our dataset using .features

In [10]:
xsum['test'].features

{'document': Value(dtype='string', id=None),
 'summary': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None)}

In [11]:
print(xsum['test'].info)

DatasetInfo(description='\nExtreme Summarization (XSum) Dataset.\n\nThere are three features:\n  - document: Input news article.\n  - summary: One sentence summary of the article.\n  - id: BBC ID of the article.\n\n', citation="\n@article{Narayan2018DontGM,\n  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},\n  author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},\n  journal={ArXiv},\n  year={2018},\n  volume={abs/1808.08745}\n}\n", homepage='https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset', license='', features={'document': Value(dtype='string', id=None), 'summary': Value(dtype='string', id=None), 'id': Value(dtype='string', id=None)}, post_processed=None, supervised_keys=SupervisedKeysData(input='document', output='summary'), task_templates=None, builder_name='xsum', config_name='default', version=1.2.0, splits={'train': SplitInfo(name='train', num_bytes=479206615, num_examples=204045, data

# Preparing XSUM Data
Before we can pass text to a model, we need to convert it into a format the transformer can understand. Encoders and decoders only operate on numerical values, so we tokenize the text and convert the tokens into numbers. The tokenizer splits the text into tokens (words or subword pieces), adds any special tokens the model expects from pretraining, and then maps each token to its unique id in the tokenizer's vocabulary. Inside the model, each id corresponds to a vector of numerical values, and the self-attention mechanism turns these into contextualized vectors. For example, the vector representation of the word "to" isn't just "to"; it also takes into account the words around it, which are called the context (left and right context). In the sentence "Welcome to NYC", the left context of "to" is "Welcome" and the right context is "NYC". The output for "to" depends on these contexts, which is what makes it a contextualized vector. We could do all of this with the AutoTokenizer.from_pretrained method, which returns a tokenizer that corresponds to the model architecture we want to use (facebook/bart-large-cnn); however, we explicitly use BartTokenizer and BartForConditionalGeneration with the same checkpoint so that the tokenizer and the model are guaranteed to match and we avoid unexpected summaries.
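The tokenize-then-look-up step described above can be sketched with a toy example. This is only an illustration under stated assumptions, not BART's actual tokenizer: the whitespace splitting, the `toy_vocab` dictionary, and the `encode` helper are hypothetical, and the ids for ordinary words are made up. Only the ids 0 and 2 for the start and end tokens mirror the `bos_token_id` and `eos_token_id` values in the model config; the real BartTokenizer works on subword units rather than whole words.

```python
# Toy vocabulary: hypothetical ids, for illustration only.
BOS, EOS, UNK = "<s>", "</s>", "<unk>"
toy_vocab = {BOS: 0, "<pad>": 1, EOS: 2, UNK: 3, "welcome": 4, "to": 5, "nyc": 6}


def encode(text, vocab=toy_vocab):
    """Split text into tokens, wrap them in special tokens, and map tokens to ids."""
    tokens = [BOS] + text.lower().split() + [EOS]
    # Tokens missing from the vocabulary fall back to the unknown-token id.
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]


ids = encode("Welcome to NYC")
print(ids)
```

Note that the ids themselves carry no context: "to" gets the same id in every sentence. The contextualized vectors only appear later, inside the model, once self-attention has mixed in information from the neighbouring tokens.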

In [12]:
checkpoint = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)

### We now write a function that preprocesses the test data by passing it to the tokenizer. We need to use the argument truncation=True to ensure that any input longer than the model can handle is truncated to the maximum length allowed. We can view this information in the model config: BART has a maximum length of 1024, which we can see in max_position_embeddings.

In [13]:
model.config

BartConfig {
  "_name_or_path": "facebook/bart-large-cnn",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_final_layer_norm": false,
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "force_bos_token_to_be_generated": true,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "L

### We can now create the function with the maximum input length allowed as per the config and an arbitrary maximum length for the target summaries.

In [14]:
max_input_length = 1024
max_target_length = 100


def preperation_function(examples):
    inputs = [doc for doc in examples["document"]]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, padding=True)

    
    # Tokenize the reference summaries (the targets). as_target_tokenizer switches the
    # tokenizer into target mode, which matters for models that tokenize targets differently.
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            examples["summary"], max_length=max_target_length, truncation=True
        )

    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
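To make the effect of `truncation=True` and `padding=True` concrete, here is a minimal pure-Python sketch. The helper `pad_and_truncate` is hypothetical and only mimics the shape of the tokenizer output: it cuts each sequence to `max_length`, pads to the longest sequence in the batch, and builds the matching attention mask. The `pad_id=1` default is an assumption taken from the trailing 1s in the padded `input_ids` in this notebook's output. Unlike the real tokenizer, this toy version does not preserve the end-of-sequence token when it truncates, and it assumes a non-empty batch.

```python
def pad_and_truncate(batch_ids, max_length, pad_id=1):
    """Truncate each sequence to max_length, then pad to the longest one left."""
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [
        [1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [[0, 11, 12, 13, 14, 15, 2], [0, 21, 2]]  # made-up token ids
out = pad_and_truncate(batch, max_length=5)
print(out["input_ids"])
print(out["attention_mask"])
```

The first sequence loses its last two ids to truncation, while the second is padded up to the same length so both fit in one rectangular batch.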

### We can apply this function to our dataset using map

In [15]:
tokenized_xsum = xsum.map(preperation_function, batched=True)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-2b651f21d6ec073a.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-35f38c35a797b587.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-c6fb5876cc0b65d3.arrow
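With `batched=True`, the mapped function receives a dictionary of columns, where each value is a list covering many rows, rather than one row at a time. The sketch below imitates that behaviour in plain Python to show why the function indexes `examples["document"]` as a list and why the returned columns sit alongside the original ones. `toy_batched_map` and `add_lengths` are hypothetical helpers, not part of the datasets library, and the tiny `batch_size=2` is chosen only to make the chunking visible.

```python
def toy_batched_map(rows, fn, batch_size=2):
    """Apply fn to column-wise batches and merge the returned columns into each row."""
    out = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # Turn a list of row dicts into a dict of column lists.
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_cols = fn(batch)
        for i, row in enumerate(chunk):
            out.append({**row, **{k: v[i] for k, v in new_cols.items()}})
    return out


def add_lengths(examples):
    # Receives lists, returns lists of the same length.
    return {"n_words": [len(doc.split()) for doc in examples["document"]]}


rows = [
    {"document": "a b c", "summary": "a"},
    {"document": "d e", "summary": "d"},
    {"document": "f", "summary": "f"},
]
result = toy_batched_map(rows, add_lengths)
print(result[0])
```

Each output row keeps its original columns and gains the new ones, which mirrors how the tokenized dataset ends up with the original text columns plus `input_ids`, `attention_mask`, and `labels`.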


In [16]:
tokenized_xsum

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11334
    })
})

In [17]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

### The attention mask tells the model what to pay attention to by passing values of 1 for tokens to consider and values of 0 for tokens to ignore (padding). The input ids are the numerical mapping of tokens to BART's vocabulary; each token in BART's vocabulary is assigned a numerical id.

In [18]:
display_function(tokenized_xsum['test'])

Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The company reported profits of $98m (£65m) for the quarter, after posting a big loss for the same period last year. The Z10 handset is seen as crucial to the future of Blackberry, which has struggled to keep up with new Apple and Android phones. It has been on sale for a month in the UK, Canada and other markets. It went on sale with little fanfare a week ago in the United States, Blackberry's most important market. The latest figures do not include US sales. Blackberry was previously called Research In Motion, but changed its name last year. Analysts greeted the results cautiously, saying that it was too early to judge the success of the Z10 and its sister device the Q10. Earlier in the week, Blackberry shares were hit when two major US brokerages expressed disappointment with the US launch of the Z10. In a note to its clients, Citigroup described the launch as ""a big disappointment"". The Blackberry results also showed the company lost three million users over the year. Its handsets are now used by 76 million people, down from 79 million 12 months ago. 
In total, Blackberry said it had shipped a total of about six million handsets in the three months to early March.",21966363,"[0, 133, 138, 431, 4632, 9, 68, 5208, 119, 11888, 3506, 119, 43, 13, 5, 297, 6, 71, 6016, 10, 380, 872, 13, 5, 276, 675, 94, 76, 4, 20, 525, 698, 17621, 16, 450, 25, 4096, 7, 5, 499, 9, 1378, 8132, 6, 61, 34, 3956, 7, 489, 62, 19, 92, 1257, 8, 3208, 4247, 4, 85, 34, 57, 15, 1392, 13, 10, 353, 11, 5, 987, 6, 896, 8, 97, 1048, 4, 85, 439, 15, 1392, 19, 410, 2378, 17825, 10, 186, 536, 11, 5, 315, 532, 6, 1378, 8132, 18, 144, 505, 210, 4, 20, 665, 2415, ...]","[0, 12271, 1028, 4403, 1378, 8132, 161, 24, 12502, 65, 153, 9, 63, 92, 525, 698, 7466, 11, 5, 78, 130, 377, 9, 1014, 4, 2]",Mobile phone maker Blackberry says it shipped one million of its new Z10 smartphones in the first three months of 2013.
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The General Court of the European Union said there were ""internal inconsistencies"" in the Commission's 2010 decision. Of the firms, Air France was fined the largest amount - €182.9m - while KLM was fined €127.2m. The two carriers merged to form Air France-KLM in 2004. Other carriers involved were Air Canada, Martinair, British Airways, Cargolux, Cathay Pacific Airways, Japan Airlines, LAN Chile, Qantas, SAS and Singapore Airlines. Lufthansa escaped a sanction after providing information to the Commission. The court said that the European Commission had not been clear enough in demonstrating an unambiguous ""single and continuous infringement"" by the carriers. Instead, the Commission had found four infringements which it had attributed directly to the carriers on particular routes, the court said. ""Internal inconsistencies"" in the decision could infringe the airline's rights of defence, the court added. 
Some of the carriers had said that the decision ""did not allow them to determine the nature and scope of the infringement or infringements that they were alleged to have committed"".",35111824,"[0, 133, 1292, 837, 9, 5, 796, 1332, 26, 89, 58, 22, 37559, 35604, 113, 11, 5, 1463, 18, 1824, 568, 4, 1525, 5, 2566, 6, 1754, 1470, 21, 10110, 5, 1154, 1280, 111, 4480, 27127, 4, 466, 119, 111, 150, 229, 21672, 21, 10110, 4480, 24174, 4, 176, 119, 4, 20, 80, 9816, 21379, 7, 1026, 1754, 1470, 12, 530, 21672, 11, 4482, 4, 1944, 9816, 963, 58, 1754, 896, 6, 1896, 2456, 6, 1089, 13337, 6, 230, 5384, 1168, 7073, 6, 20349, 857, 3073, 13337, 6, 1429, 6503, 6, 32425, 9614, 6, 1209, 927, 281, 6, 32143, 8, ...]","[0, 27000, 18, 200, 12, 21810, 461, 34, 4094, 10, 1539, 30, 365, 8537, 136, 41, 4480, 3913, 119, 11888, 39123, 119, 43, 796, 1463, 14592, 22001, 2051, 4, 2]",Europe's second-highest court has backed a challenge by 11 airlines against an €800m (£583m) European Commission freight cartel fine.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The University of London's Institute of Education compared vocabulary test scores and reading habits of 9,400 British people born in 1970. The researchers analysed data collected at the ages of 10, 16 and 42. As well as the tabloids finding, they said childhood reading for fun boosted vocabulary throughout life, while highbrow fiction helped adults further. The research team drew on the 1970 British Cohort Study, which collects information on a group of people from England, Scotland and Wales who were born in the same week. At the age of 10, the group took a pictorial language comprehension test and at 16 they did a multiple-choice vocabulary test. The test they did aged 42 was a shortened version of the one used at 16. The researchers also analysed information on the group's reading habits as adults and their educational achievements. The group were asked how often they read books for pleasure and what sort of books they read. The vocabulary tests showed all respondents had greater word power by the age of 42 than they had had at 16, with the average vocabulary score rising from 55% to 63%. But those who had read regularly for pleasure as children beat the rest, scoring an average 67% in the age 42 test, compared with infrequent childhood readers who scored an average of 51%. The study found those who read regularly as children tended to come from better-off families and had higher vocabulary scores as children. However, even after the data was reanalysed to take these differences into account, there was still a nine percentage point gap in the vocabulary scores at age 42 between the two groups. 
This may be because the frequent childhood readers continued to read for pleasure as adults, wrote the researchers. ""In other words, they developed 'good' reading habits in childhood and adolescence that they have subsequently benefited from."" But they also found ""what people read mattered as how often they read"". In terms of newspapers, they found readers of broadsheets made more progress in vocabulary than people who did not read newspapers. But ""tabloid readers actually made less progress than non-readers of newspapers"". Co-author Prof Alice Sullivan said the finding was in line with the team's previous work, which showed ""the presence of tabloid newspapers in the home during childhood was linked to poor cognitive attainment at age 16"". The report also said: ""Those who read 'highbrow' fiction made greater vocabulary gains than those who read middlebrow fiction; and lowbrow fiction readers made no more progress than non-readers."" The study found the adults with the biggest vocabularies were graduates of Russell Group of sought-after universities, scoring an average of 81% in the age 42 vocabulary test. 
Of this group, two-thirds (66%) preferred ""highbrow"" fiction and more than half (56%) said they read only broadsheet newspapers.",29885222,"[0, 133, 589, 9, 928, 18, 2534, 9, 3061, 1118, 32644, 1296, 4391, 8, 2600, 10095, 9, 361, 6, 4017, 1089, 82, 2421, 11, 6200, 4, 20, 2634, 24305, 414, 4786, 23, 5, 4864, 9, 158, 6, 545, 8, 3330, 4, 287, 157, 25, 5, 41048, 22926, 2609, 6, 51, 26, 6585, 2600, 13, 1531, 5934, 32644, 1328, 301, 6, 150, 239, 39457, 11845, 1147, 3362, 617, 4, 20, 557, 165, 4855, 15, 5, 6200, 1089, 32321, 2723, 13019, 6, 61, 22671, 335, 15, 10, 333, 9, 82, 31, 1156, 6, 3430, 8, 5295, 54, 58, 2421, 11, 5, 276, ...]","[0, 25439, 268, 9, 25135, 6665, 33, 2735, 28312, 873, 8244, 918, 87, 82, 54, 109, 45, 1166, 9911, 6, 3649, 10, 892, 4, 2]","Readers of tabloid papers have smaller vocabularies than people who do not read newspapers, suggests a study."
3,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Dr Ian Paterson denies 20 counts of wounding with intent against nine women and one man at Nottingham Crown Court. He said he had never told alleged victims they had ""a ticking bomb"" of cancer inside them. He said the phrase appears in three witness statements which was ""clear evidence"" statements have been coached. ""It's a scary thing, why would I intentionally scare a patient, that you've got a time bomb?"" he said. More updates on this and other stories in Birmingham and the Black Country The 59-year-old also said one patient, John Ingram, who had a double mastectomy after tests showed only potentially abnormal cells, was a ""quivering mass of anxiety"", convinced he would get cancer. Nothing he told him would have changed his mind, Mr Paterson said. Mr Ingram gave evidence saying Mr Paterson, who worked at hospitals run by the Heart of England NHS Trust and Spire Healthcare, told him in 2006 he was ""on the road to developing breast cancer"". But Mr Paterson, of Ashley, Altrincham, Greater Manchester, said on Wednesday that Mr Ingram's memory had become ""confused"" over time. He described his patient as a ""troubled gentleman with multiple phobias - one of them breast cancer, because his mother had died of breast cancer, aged 42"". ""So the minute he had an abnormality in his chest wall, in his head he was on the way to getting breast cancer,"" he said. ""Very little I told him thereafter would disavow him of that view."" Prosecutor Julian Christopher QC asked whether it was ""quite wrong"" to say he would ""travel in time towards cancer"". 
Mr Paterson said: ""I doubt I said that, simply because nobody has a crystal ball."" The trial continues.",39504958,"[0, 14043, 5965, 3769, 4277, 9118, 291, 3948, 9, 21354, 19, 5927, 136, 1117, 390, 8, 65, 313, 23, 17142, 5748, 837, 4, 91, 26, 37, 56, 393, 174, 1697, 1680, 51, 56, 22, 102, 25535, 4840, 113, 9, 1668, 1025, 106, 4, 91, 26, 5, 11054, 2092, 11, 130, 4562, 1997, 61, 21, 22, 18763, 1283, 113, 1997, 33, 57, 12531, 4, 22, 243, 18, 10, 10222, 631, 6, 596, 74, 38, 14149, 13207, 10, 3186, 6, 14, 47, 348, 300, 10, 86, 4840, 1917, 37, 26, 4, 901, 3496, 15, 42, 8, 97, 1652, 11, 8353, 8, 5, ...]","[0, 250, 6181, 16308, 1238, 9, 3406, 66, 10495, 1414, 34, 174, 10, 461, 14, 4562, 1997, 136, 123, 33, 57, 22, 876, 12552, 845, 2]","A breast surgeon accused of carrying out unnecessary operations has told a court that witness statements against him have been ""coached""."
4,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","The A40 between Nantgaredig and Whitemill is closed and diversions are in place following the incident at about 18:15 GMT on Friday. A grey Smart Fortwo Prime, a black Volkswagon Golf and a grey Volvo were involved in the crash. Dyfed Powys Police are appealing for witnesses.",39008393,"[0, 133, 83, 1749, 227, 234, 927, 571, 6537, 1023, 8, 3990, 36907, 1873, 16, 1367, 8, 6302, 2485, 32, 11, 317, 511, 5, 1160, 23, 59, 504, 35, 996, 5050, 15, 273, 4, 83, 10521, 5900, 3339, 14869, 1489, 6, 10, 909, 42056, 45199, 6111, 8, 10, 10521, 18674, 58, 963, 11, 5, 2058, 4, 10179, 21967, 25347, 2459, 522, 32, 9364, 13, 6057, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","[0, 22113, 82, 33, 57, 551, 7, 1098, 71, 10, 130, 512, 7329, 11, 10636, 2013, 2457, 9959, 4, 2]",Four people have been taken to hospital after a three car collision in Carmarthenshire.
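Row 4 above is the only short article in the sample, so it is the one where padding is visible: its `input_ids` end in a run of 1s and its `attention_mask` switches to 0 at the same point. A small sanity check along these lines can confirm that the two columns agree. The helpers `real_length` and `mask_matches_padding` are hypothetical, `PAD_ID = 1` is an assumption inferred from that output, and the two lists are a short excerpt modeled on the tail of row 4 rather than the full sequence.

```python
PAD_ID = 1  # assumption: padding id inferred from the trailing 1s in input_ids


def real_length(attention_mask):
    """Number of non-padding tokens in a sequence."""
    return sum(attention_mask)


def mask_matches_padding(input_ids, attention_mask, pad_id=PAD_ID):
    """True if the mask is 0 exactly where input_ids holds the padding id."""
    return all(
        (tok != pad_id) == bool(m) for tok, m in zip(input_ids, attention_mask)
    )


# Excerpt modeled on the end of row 4: last real tokens, then padding.
ids_tail = [13, 6057, 4, 2, 1, 1, 1]
mask_tail = [1, 1, 1, 1, 0, 0, 0]

print(real_length(mask_tail))
print(mask_matches_padding(ids_tail, mask_tail))
```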


Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The company reported profits of $98m (£65m) for the quarter, after posting a big loss for the same period last year. The Z10 handset is seen as crucial to the future of Blackberry, which has struggled to keep up with new Apple and Android phones. It has been on sale for a month in the UK, Canada and other markets. It went on sale with little fanfare a week ago in the United States, Blackberry's most important market. The latest figures do not include US sales. Blackberry was previously called Research In Motion, but changed its name last year. Analysts greeted the results cautiously, saying that it was too early to judge the success of the Z10 and its sister device the Q10. Earlier in the week, Blackberry shares were hit when two major US brokerages expressed disappointment with the US launch of the Z10. In a note to its clients, Citigroup described the launch as ""a big disappointment"". The Blackberry results also showed the company lost three million users over the year. Its handsets are now used by 76 million people, down from 79 million 12 months ago. 
In total, Blackberry said it had shipped a total of about six million handsets in the three months to early March.",21966363,"[0, 133, 138, 431, 4632, 9, 68, 5208, 119, 11888, 3506, 119, 43, 13, 5, 297, 6, 71, 6016, 10, 380, 872, 13, 5, 276, 675, 94, 76, 4, 20, 525, 698, 17621, 16, 450, 25, 4096, 7, 5, 499, 9, 1378, 8132, 6, 61, 34, 3956, 7, 489, 62, 19, 92, 1257, 8, 3208, 4247, 4, 85, 34, 57, 15, 1392, 13, 10, 353, 11, 5, 987, 6, 896, 8, 97, 1048, 4, 85, 439, 15, 1392, 19, 410, 2378, 17825, 10, 186, 536, 11, 5, 315, 532, 6, 1378, 8132, 18, 144, 505, 210, 4, 20, 665, 2415, ...]","[0, 12271, 1028, 4403, 1378, 8132, 161, 24, 12502, 65, 153, 9, 63, 92, 525, 698, 7466, 11, 5, 78, 130, 377, 9, 1014, 4, 2]",Mobile phone maker Blackberry says it shipped one million of its new Z10 smartphones in the first three months of 2013.
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The General Court of the European Union said there were ""internal inconsistencies"" in the Commission's 2010 decision. Of the firms, Air France was fined the largest amount - €182.9m - while KLM was fined €127.2m. The two carriers merged to form Air France-KLM in 2004. Other carriers involved were Air Canada, Martinair, British Airways, Cargolux, Cathay Pacific Airways, Japan Airlines, LAN Chile, Qantas, SAS and Singapore Airlines. Lufthansa escaped a sanction after providing information to the Commission. The court said that the European Commission had not been clear enough in demonstrating an unambiguous ""single and continuous infringement"" by the carriers. Instead, the Commission had found four infringements which it had attributed directly to the carriers on particular routes, the court said. ""Internal inconsistencies"" in the decision could infringe the airline's rights of defence, the court added. 
Some of the carriers had said that the decision ""did not allow them to determine the nature and scope of the infringement or infringements that they were alleged to have committed"".",35111824,"[0, 133, 1292, 837, 9, 5, 796, 1332, 26, 89, 58, 22, 37559, 35604, 113, 11, 5, 1463, 18, 1824, 568, 4, 1525, 5, 2566, 6, 1754, 1470, 21, 10110, 5, 1154, 1280, 111, 4480, 27127, 4, 466, 119, 111, 150, 229, 21672, 21, 10110, 4480, 24174, 4, 176, 119, 4, 20, 80, 9816, 21379, 7, 1026, 1754, 1470, 12, 530, 21672, 11, 4482, 4, 1944, 9816, 963, 58, 1754, 896, 6, 1896, 2456, 6, 1089, 13337, 6, 230, 5384, 1168, 7073, 6, 20349, 857, 3073, 13337, 6, 1429, 6503, 6, 32425, 9614, 6, 1209, 927, 281, 6, 32143, 8, ...]","[0, 27000, 18, 200, 12, 21810, 461, 34, 4094, 10, 1539, 30, 365, 8537, 136, 41, 4480, 3913, 119, 11888, 39123, 119, 43, 796, 1463, 14592, 22001, 2051, 4, 2]",Europe's second-highest court has backed a challenge by 11 airlines against an €800m (£583m) European Commission freight cartel fine.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The University of London's Institute of Education compared vocabulary test scores and reading habits of 9,400 British people born in 1970. The researchers analysed data collected at the ages of 10, 16 and 42. As well as the tabloids finding, they said childhood reading for fun boosted vocabulary throughout life, while highbrow fiction helped adults further. The research team drew on the 1970 British Cohort Study, which collects information on a group of people from England, Scotland and Wales who were born in the same week. At the age of 10, the group took a pictorial language comprehension test and at 16 they did a multiple-choice vocabulary test. The test they did aged 42 was a shortened version of the one used at 16. The researchers also analysed information on the group's reading habits as adults and their educational achievements. The group were asked how often they read books for pleasure and what sort of books they read. The vocabulary tests showed all respondents had greater word power by the age of 42 than they had had at 16, with the average vocabulary score rising from 55% to 63%. But those who had read regularly for pleasure as children beat the rest, scoring an average 67% in the age 42 test, compared with infrequent childhood readers who scored an average of 51%. The study found those who read regularly as children tended to come from better-off families and had higher vocabulary scores as children. However, even after the data was reanalysed to take these differences into account, there was still a nine percentage point gap in the vocabulary scores at age 42 between the two groups. 
This may be because the frequent childhood readers continued to read for pleasure as adults, wrote the researchers. ""In other words, they developed 'good' reading habits in childhood and adolescence that they have subsequently benefited from."" But they also found ""what people read mattered as how often they read"". In terms of newspapers, they found readers of broadsheets made more progress in vocabulary than people who did not read newspapers. But ""tabloid readers actually made less progress than non-readers of newspapers"". Co-author Prof Alice Sullivan said the finding was in line with the team's previous work, which showed ""the presence of tabloid newspapers in the home during childhood was linked to poor cognitive attainment at age 16"". The report also said: ""Those who read 'highbrow' fiction made greater vocabulary gains than those who read middlebrow fiction; and lowbrow fiction readers made no more progress than non-readers."" The study found the adults with the biggest vocabularies were graduates of Russell Group of sought-after universities, scoring an average of 81% in the age 42 vocabulary test. 
Of this group, two-thirds (66%) preferred ""highbrow"" fiction and more than half (56%) said they read only broadsheet newspapers.",29885222,"[0, 133, 589, 9, 928, 18, 2534, 9, 3061, 1118, 32644, 1296, 4391, 8, 2600, 10095, 9, 361, 6, 4017, 1089, 82, 2421, 11, 6200, 4, 20, 2634, 24305, 414, 4786, 23, 5, 4864, 9, 158, 6, 545, 8, 3330, 4, 287, 157, 25, 5, 41048, 22926, 2609, 6, 51, 26, 6585, 2600, 13, 1531, 5934, 32644, 1328, 301, 6, 150, 239, 39457, 11845, 1147, 3362, 617, 4, 20, 557, 165, 4855, 15, 5, 6200, 1089, 32321, 2723, 13019, 6, 61, 22671, 335, 15, 10, 333, 9, 82, 31, 1156, 6, 3430, 8, 5295, 54, 58, 2421, 11, 5, 276, ...]","[0, 25439, 268, 9, 25135, 6665, 33, 2735, 28312, 873, 8244, 918, 87, 82, 54, 109, 45, 1166, 9911, 6, 3649, 10, 892, 4, 2]","Readers of tabloid papers have smaller vocabularies than people who do not read newspapers, suggests a study."
3,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Dr Ian Paterson denies 20 counts of wounding with intent against nine women and one man at Nottingham Crown Court. He said he had never told alleged victims they had ""a ticking bomb"" of cancer inside them. He said the phrase appears in three witness statements which was ""clear evidence"" statements have been coached. ""It's a scary thing, why would I intentionally scare a patient, that you've got a time bomb?"" he said. More updates on this and other stories in Birmingham and the Black Country The 59-year-old also said one patient, John Ingram, who had a double mastectomy after tests showed only potentially abnormal cells, was a ""quivering mass of anxiety"", convinced he would get cancer. Nothing he told him would have changed his mind, Mr Paterson said. Mr Ingram gave evidence saying Mr Paterson, who worked at hospitals run by the Heart of England NHS Trust and Spire Healthcare, told him in 2006 he was ""on the road to developing breast cancer"". But Mr Paterson, of Ashley, Altrincham, Greater Manchester, said on Wednesday that Mr Ingram's memory had become ""confused"" over time. He described his patient as a ""troubled gentleman with multiple phobias - one of them breast cancer, because his mother had died of breast cancer, aged 42"". ""So the minute he had an abnormality in his chest wall, in his head he was on the way to getting breast cancer,"" he said. ""Very little I told him thereafter would disavow him of that view."" Prosecutor Julian Christopher QC asked whether it was ""quite wrong"" to say he would ""travel in time towards cancer"". 
Mr Paterson said: ""I doubt I said that, simply because nobody has a crystal ball."" The trial continues.",39504958,"[0, 14043, 5965, 3769, 4277, 9118, 291, 3948, 9, 21354, 19, 5927, 136, 1117, 390, 8, 65, 313, 23, 17142, 5748, 837, 4, 91, 26, 37, 56, 393, 174, 1697, 1680, 51, 56, 22, 102, 25535, 4840, 113, 9, 1668, 1025, 106, 4, 91, 26, 5, 11054, 2092, 11, 130, 4562, 1997, 61, 21, 22, 18763, 1283, 113, 1997, 33, 57, 12531, 4, 22, 243, 18, 10, 10222, 631, 6, 596, 74, 38, 14149, 13207, 10, 3186, 6, 14, 47, 348, 300, 10, 86, 4840, 1917, 37, 26, 4, 901, 3496, 15, 42, 8, 97, 1652, 11, 8353, 8, 5, ...]","[0, 250, 6181, 16308, 1238, 9, 3406, 66, 10495, 1414, 34, 174, 10, 461, 14, 4562, 1997, 136, 123, 33, 57, 22, 876, 12552, 845, 2]","A breast surgeon accused of carrying out unnecessary operations has told a court that witness statements against him have been ""coached""."
4,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","The A40 between Nantgaredig and Whitemill is closed and diversions are in place following the incident at about 18:15 GMT on Friday. A grey Smart Fortwo Prime, a black Volkswagon Golf and a grey Volvo were involved in the crash. Dyfed Powys Police are appealing for witnesses.",39008393,"[0, 133, 83, 1749, 227, 234, 927, 571, 6537, 1023, 8, 3990, 36907, 1873, 16, 1367, 8, 6302, 2485, 32, 11, 317, 511, 5, 1160, 23, 59, 504, 35, 996, 5050, 15, 273, 4, 83, 10521, 5900, 3339, 14869, 1489, 6, 10, 909, 42056, 45199, 6111, 8, 10, 10521, 18674, 58, 963, 11, 5, 2058, 4, 10179, 21967, 25347, 2459, 522, 32, 9364, 13, 6057, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","[0, 22113, 82, 33, 57, 551, 7, 1098, 71, 10, 130, 512, 7329, 11, 10636, 2013, 2457, 9959, 4, 2]",Four people have been taken to hospital after a three car collision in Carmarthenshire.


Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The company reported profits of $98m (£65m) for the quarter, after posting a big loss for the same period last year. The Z10 handset is seen as crucial to the future of Blackberry, which has struggled to keep up with new Apple and Android phones. It has been on sale for a month in the UK, Canada and other markets. It went on sale with little fanfare a week ago in the United States, Blackberry's most important market. The latest figures do not include US sales. Blackberry was previously called Research In Motion, but changed its name last year. Analysts greeted the results cautiously, saying that it was too early to judge the success of the Z10 and its sister device the Q10. Earlier in the week, Blackberry shares were hit when two major US brokerages expressed disappointment with the US launch of the Z10. In a note to its clients, Citigroup described the launch as ""a big disappointment"". The Blackberry results also showed the company lost three million users over the year. Its handsets are now used by 76 million people, down from 79 million 12 months ago. 
In total, Blackberry said it had shipped a total of about six million handsets in the three months to early March.",21966363,"[0, 133, 138, 431, 4632, 9, 68, 5208, 119, 11888, 3506, 119, 43, 13, 5, 297, 6, 71, 6016, 10, 380, 872, 13, 5, 276, 675, 94, 76, 4, 20, 525, 698, 17621, 16, 450, 25, 4096, 7, 5, 499, 9, 1378, 8132, 6, 61, 34, 3956, 7, 489, 62, 19, 92, 1257, 8, 3208, 4247, 4, 85, 34, 57, 15, 1392, 13, 10, 353, 11, 5, 987, 6, 896, 8, 97, 1048, 4, 85, 439, 15, 1392, 19, 410, 2378, 17825, 10, 186, 536, 11, 5, 315, 532, 6, 1378, 8132, 18, 144, 505, 210, 4, 20, 665, 2415, ...]","[0, 12271, 1028, 4403, 1378, 8132, 161, 24, 12502, 65, 153, 9, 63, 92, 525, 698, 7466, 11, 5, 78, 130, 377, 9, 1014, 4, 2]",Mobile phone maker Blackberry says it shipped one million of its new Z10 smartphones in the first three months of 2013.
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The General Court of the European Union said there were ""internal inconsistencies"" in the Commission's 2010 decision. Of the firms, Air France was fined the largest amount - €182.9m - while KLM was fined €127.2m. The two carriers merged to form Air France-KLM in 2004. Other carriers involved were Air Canada, Martinair, British Airways, Cargolux, Cathay Pacific Airways, Japan Airlines, LAN Chile, Qantas, SAS and Singapore Airlines. Lufthansa escaped a sanction after providing information to the Commission. The court said that the European Commission had not been clear enough in demonstrating an unambiguous ""single and continuous infringement"" by the carriers. Instead, the Commission had found four infringements which it had attributed directly to the carriers on particular routes, the court said. ""Internal inconsistencies"" in the decision could infringe the airline's rights of defence, the court added. 
Some of the carriers had said that the decision ""did not allow them to determine the nature and scope of the infringement or infringements that they were alleged to have committed"".",35111824,"[0, 133, 1292, 837, 9, 5, 796, 1332, 26, 89, 58, 22, 37559, 35604, 113, 11, 5, 1463, 18, 1824, 568, 4, 1525, 5, 2566, 6, 1754, 1470, 21, 10110, 5, 1154, 1280, 111, 4480, 27127, 4, 466, 119, 111, 150, 229, 21672, 21, 10110, 4480, 24174, 4, 176, 119, 4, 20, 80, 9816, 21379, 7, 1026, 1754, 1470, 12, 530, 21672, 11, 4482, 4, 1944, 9816, 963, 58, 1754, 896, 6, 1896, 2456, 6, 1089, 13337, 6, 230, 5384, 1168, 7073, 6, 20349, 857, 3073, 13337, 6, 1429, 6503, 6, 32425, 9614, 6, 1209, 927, 281, 6, 32143, 8, ...]","[0, 27000, 18, 200, 12, 21810, 461, 34, 4094, 10, 1539, 30, 365, 8537, 136, 41, 4480, 3913, 119, 11888, 39123, 119, 43, 796, 1463, 14592, 22001, 2051, 4, 2]",Europe's second-highest court has backed a challenge by 11 airlines against an €800m (£583m) European Commission freight cartel fine.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The University of London's Institute of Education compared vocabulary test scores and reading habits of 9,400 British people born in 1970. The researchers analysed data collected at the ages of 10, 16 and 42. As well as the tabloids finding, they said childhood reading for fun boosted vocabulary throughout life, while highbrow fiction helped adults further. The research team drew on the 1970 British Cohort Study, which collects information on a group of people from England, Scotland and Wales who were born in the same week. At the age of 10, the group took a pictorial language comprehension test and at 16 they did a multiple-choice vocabulary test. The test they did aged 42 was a shortened version of the one used at 16. The researchers also analysed information on the group's reading habits as adults and their educational achievements. The group were asked how often they read books for pleasure and what sort of books they read. The vocabulary tests showed all respondents had greater word power by the age of 42 than they had had at 16, with the average vocabulary score rising from 55% to 63%. But those who had read regularly for pleasure as children beat the rest, scoring an average 67% in the age 42 test, compared with infrequent childhood readers who scored an average of 51%. The study found those who read regularly as children tended to come from better-off families and had higher vocabulary scores as children. However, even after the data was reanalysed to take these differences into account, there was still a nine percentage point gap in the vocabulary scores at age 42 between the two groups. 
This may be because the frequent childhood readers continued to read for pleasure as adults, wrote the researchers. ""In other words, they developed 'good' reading habits in childhood and adolescence that they have subsequently benefited from."" But they also found ""what people read mattered as how often they read"". In terms of newspapers, they found readers of broadsheets made more progress in vocabulary than people who did not read newspapers. But ""tabloid readers actually made less progress than non-readers of newspapers"". Co-author Prof Alice Sullivan said the finding was in line with the team's previous work, which showed ""the presence of tabloid newspapers in the home during childhood was linked to poor cognitive attainment at age 16"". The report also said: ""Those who read 'highbrow' fiction made greater vocabulary gains than those who read middlebrow fiction; and lowbrow fiction readers made no more progress than non-readers."" The study found the adults with the biggest vocabularies were graduates of Russell Group of sought-after universities, scoring an average of 81% in the age 42 vocabulary test. 
Of this group, two-thirds (66%) preferred ""highbrow"" fiction and more than half (56%) said they read only broadsheet newspapers.",29885222,"[0, 133, 589, 9, 928, 18, 2534, 9, 3061, 1118, 32644, 1296, 4391, 8, 2600, 10095, 9, 361, 6, 4017, 1089, 82, 2421, 11, 6200, 4, 20, 2634, 24305, 414, 4786, 23, 5, 4864, 9, 158, 6, 545, 8, 3330, 4, 287, 157, 25, 5, 41048, 22926, 2609, 6, 51, 26, 6585, 2600, 13, 1531, 5934, 32644, 1328, 301, 6, 150, 239, 39457, 11845, 1147, 3362, 617, 4, 20, 557, 165, 4855, 15, 5, 6200, 1089, 32321, 2723, 13019, 6, 61, 22671, 335, 15, 10, 333, 9, 82, 31, 1156, 6, 3430, 8, 5295, 54, 58, 2421, 11, 5, 276, ...]","[0, 25439, 268, 9, 25135, 6665, 33, 2735, 28312, 873, 8244, 918, 87, 82, 54, 109, 45, 1166, 9911, 6, 3649, 10, 892, 4, 2]","Readers of tabloid papers have smaller vocabularies than people who do not read newspapers, suggests a study."
3,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Dr Ian Paterson denies 20 counts of wounding with intent against nine women and one man at Nottingham Crown Court. He said he had never told alleged victims they had ""a ticking bomb"" of cancer inside them. He said the phrase appears in three witness statements which was ""clear evidence"" statements have been coached. ""It's a scary thing, why would I intentionally scare a patient, that you've got a time bomb?"" he said. More updates on this and other stories in Birmingham and the Black Country The 59-year-old also said one patient, John Ingram, who had a double mastectomy after tests showed only potentially abnormal cells, was a ""quivering mass of anxiety"", convinced he would get cancer. Nothing he told him would have changed his mind, Mr Paterson said. Mr Ingram gave evidence saying Mr Paterson, who worked at hospitals run by the Heart of England NHS Trust and Spire Healthcare, told him in 2006 he was ""on the road to developing breast cancer"". But Mr Paterson, of Ashley, Altrincham, Greater Manchester, said on Wednesday that Mr Ingram's memory had become ""confused"" over time. He described his patient as a ""troubled gentleman with multiple phobias - one of them breast cancer, because his mother had died of breast cancer, aged 42"". ""So the minute he had an abnormality in his chest wall, in his head he was on the way to getting breast cancer,"" he said. ""Very little I told him thereafter would disavow him of that view."" Prosecutor Julian Christopher QC asked whether it was ""quite wrong"" to say he would ""travel in time towards cancer"". 
Mr Paterson said: ""I doubt I said that, simply because nobody has a crystal ball."" The trial continues.",39504958,"[0, 14043, 5965, 3769, 4277, 9118, 291, 3948, 9, 21354, 19, 5927, 136, 1117, 390, 8, 65, 313, 23, 17142, 5748, 837, 4, 91, 26, 37, 56, 393, 174, 1697, 1680, 51, 56, 22, 102, 25535, 4840, 113, 9, 1668, 1025, 106, 4, 91, 26, 5, 11054, 2092, 11, 130, 4562, 1997, 61, 21, 22, 18763, 1283, 113, 1997, 33, 57, 12531, 4, 22, 243, 18, 10, 10222, 631, 6, 596, 74, 38, 14149, 13207, 10, 3186, 6, 14, 47, 348, 300, 10, 86, 4840, 1917, 37, 26, 4, 901, 3496, 15, 42, 8, 97, 1652, 11, 8353, 8, 5, ...]","[0, 250, 6181, 16308, 1238, 9, 3406, 66, 10495, 1414, 34, 174, 10, 461, 14, 4562, 1997, 136, 123, 33, 57, 22, 876, 12552, 845, 2]","A breast surgeon accused of carrying out unnecessary operations has told a court that witness statements against him have been ""coached""."
4,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]","The A40 between Nantgaredig and Whitemill is closed and diversions are in place following the incident at about 18:15 GMT on Friday. A grey Smart Fortwo Prime, a black Volkswagon Golf and a grey Volvo were involved in the crash. Dyfed Powys Police are appealing for witnesses.",39008393,"[0, 133, 83, 1749, 227, 234, 927, 571, 6537, 1023, 8, 3990, 36907, 1873, 16, 1367, 8, 6302, 2485, 32, 11, 317, 511, 5, 1160, 23, 59, 504, 35, 996, 5050, 15, 273, 4, 83, 10521, 5900, 3339, 14869, 1489, 6, 10, 909, 42056, 45199, 6111, 8, 10, 10521, 18674, 58, 963, 11, 5, 2058, 4, 10179, 21967, 25347, 2459, 522, 32, 9364, 13, 6057, 4, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","[0, 22113, 82, 33, 57, 551, 7, 1098, 71, 10, 130, 512, 7329, 11, 10636, 2013, 2457, 9959, 4, 2]",Four people have been taken to hospital after a three car collision in Carmarthenshire.





In [19]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

## Compare Machine Summaries to Professional Human Written Summaries
To score our machine-generated summaries against the professionally written human ones, we compute the cosine similarity between sentence embeddings, which measures the semantic similarity between two texts. We make three comparisons: human summary to machine summary, human summary to original document, and machine summary to original document.
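
As a reminder of what the score means, here is a minimal pure-Python sketch of cosine similarity. It is for illustration only: the notebook itself relies on `util.pytorch_cos_sim`, and the function name `cosine_similarity` and the three toy vectors below are made up for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings" (hypothetical values, not real model output)
human = [1.0, 2.0, 0.0]
machine = [2.0, 4.0, 0.0]    # points in the same direction as `human`
unrelated = [0.0, 0.0, 3.0]  # orthogonal to `human`

print(cosine_similarity(human, machine))
print(cosine_similarity(human, unrelated))
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the measure suits comparing a short summary with a much longer document.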

### We are going to focus on 10 articles and build 10 models to inspect each pair individually

In [20]:
def listToString(s):
    # Concatenate the elements of s into a single string.
    # A plain string passed in is returned unchanged.
    return "".join(s)

In [21]:
article1 = tokenized_xsum['test']['document'][0]
article2 = tokenized_xsum['test']['document'][1]
article3 = tokenized_xsum['test']['document'][2]
article4 = tokenized_xsum['test']['document'][3]
article5 = tokenized_xsum['test']['document'][4]
article6 = tokenized_xsum['test']['document'][5]
article7 = tokenized_xsum['test']['document'][7]
article8 = tokenized_xsum['test']['document'][8]
article9 = tokenized_xsum['test']['document'][9]
article10 = tokenized_xsum['test']['document'][10]

summary1 = tokenized_xsum['test']['summary'][0]
summary2 = tokenized_xsum['test']['summary'][1]
summary3 = tokenized_xsum['test']['summary'][2]
summary4 = tokenized_xsum['test']['summary'][3]
summary5 = tokenized_xsum['test']['summary'][4]
summary6 = tokenized_xsum['test']['summary'][5]
summary7 = tokenized_xsum['test']['summary'][7]
summary8 = tokenized_xsum['test']['summary'][8]
summary9 = tokenized_xsum['test']['summary'][9]
summary10 = tokenized_xsum['test']['summary'][10]
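
The twenty assignments above could also be collected in a loop. The sketch below assumes only that the split supports row-wise integer indexing that returns a mapping with `document` and `summary` keys, as a Hugging Face `Dataset` should; `select_pairs`, `INDICES`, and `toy_split` are hypothetical names introduced for this example.

```python
def select_pairs(split, indices):
    """Return (document, summary) tuples for the given row indices."""
    pairs = []
    for i in indices:
        row = split[i]  # one row as a mapping of column name -> value
        pairs.append((row["document"], row["summary"]))
    return pairs

# Same row indices as the cell above (note that index 6 is not used there).
INDICES = [0, 1, 2, 3, 4, 5, 7, 8, 9, 10]

# Tiny stand-in for tokenized_xsum['test'], for illustration only.
toy_split = [{"document": f"doc {i}", "summary": f"sum {i}"} for i in range(11)]
pairs = select_pairs(toy_split, INDICES)
print(len(pairs))
```

With the real data, `select_pairs(tokenized_xsum['test'], INDICES)` would be the analogous call. Indexing by row first also avoids materializing the whole `document` column for every lookup.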


## Model 1

In [22]:
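# Tokenize the article, truncating it to the model's maximum input length
input1 = tokenizer(article1, return_tensors='pt', truncation=True)
# Generate the summary token ids (at most 500 tokens)
summary_ids1 = model.generate(input1['input_ids'], max_length=500, early_stopping=False)
# Decode the generated ids back into text
machineSummary1 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids1]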

In [23]:
# Flatten each text into a single string
machineSummary1 = listToString(machineSummary1)
summary1 = listToString(summary1)
original1 = listToString(article1)

# Order matters: index 0 = human summary, 1 = machine summary, 2 = original article
comparison1 = [summary1, machineSummary1, original1]
# Note: this rebinds `model`, which until now referred to the summarization model
model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings1 = model.encode(comparison1)
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[1])) # human summary to machine summary
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings1[1], comparison_embeddings1[2])) # machine summary to original article

tensor([[0.7415]])
tensor([[0.7645]])
tensor([[0.9807]])
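
Because the same three comparisons are repeated for every article, they could be wrapped in one helper. The following is a sketch under stated assumptions, not the notebook's actual code: `score_article` and the three `toy_*` stand-ins are hypothetical, and the summarizer, embedder, and similarity function are passed in as plain callables so the helper runs without any model loaded.

```python
def score_article(article, human_summary, summarize, embed, similarity):
    """Summarize one article and return the three pairwise similarity scores."""
    machine_summary = summarize(article)
    # Embed in a fixed order: human summary, machine summary, original article
    human_vec, machine_vec, original_vec = embed(
        [human_summary, machine_summary, article]
    )
    return {
        "machine_summary": machine_summary,
        "human_vs_machine": similarity(human_vec, machine_vec),
        "human_vs_original": similarity(human_vec, original_vec),
        "machine_vs_original": similarity(machine_vec, original_vec),
    }

# Toy stand-ins (assumptions for illustration; they are not real models)
def toy_summarize(text):
    return text.split(".")[0] + "."  # keep only the first sentence

def toy_embed(texts):
    return [[float(len(t)), float(t.count(" "))] for t in texts]

def toy_similarity(u, v):
    return sum(a * b for a, b in zip(u, v))  # plain dot product

scores = score_article(
    "A cat sat. It purred.", "A cat sat.",
    toy_summarize, toy_embed, toy_similarity,
)
print(scores["machine_summary"])
```

In the notebook, `summarize` would wrap the `tokenizer` and `model.generate` calls, `embed` would wrap the sentence encoder's `encode`, and `similarity` would be `util.pytorch_cos_sim`. Keeping the summarizer and the encoder in separate callables also avoids binding both to the same variable name.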


In [24]:
comparison1

['There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. Welsh Government said more people than ever were getting help to address housing problems. Changes to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. The Welsh Government said more people than ever were getting help to address housing problems. Changes to the Housing Act in Wales, introduced in 2015, r