# Text Summary & Scoring Project
##### Michael Creegan, Yungfeng Dai, Hong Gyu Ji, Ziling Zeng
##### Python for Data Analysis
##### Columbia University

# Abstract

Summarization is a common problem in the 21st century as the world has become increasingly driven by data. A summary can be very useful for quickly determining whether a piece of text is relevant or worth reading. Summaries of articles can also be stored in a backend to run downstream tasks on, and measuring how well a summary preserves the meaning of its source (its semantic integrity) can serve as an indicator of quality.

To explore this topic, we will leverage the Extreme Summarization (XSUM) dataset, which consists of BBC articles, each accompanied by a single-sentence summary. Each summary is the professionally written introductory sentence that prefaces the article, typically written by the article's author.

To summarize articles, we will use an encoder-decoder (sequence-to-sequence) transformer, which combines an encoder and a decoder because the task involves both reading an input and generating an output: taking in text and then producing a summary. The encoder reads the input text and computes a high-level representation of it, which is then passed to the decoder to generate the output (the summary). This has advantages over a standalone encoder such as BERT, ALBERT, ELECTRA, RoBERTa, or DistilBERT: encoders are pre-trained by filling in randomly masked words in sentences, so every word sees context on both sides, which makes them well suited to understanding tasks such as classification but not to generating new text. A standalone decoder such as GPT-2 would also be suboptimal: decoders are trained to predict the next word in a sequence, so each word only sees the context on one side (to its left). This makes them good at generating text but less suited to building a full representation of an input article.
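To make the difference in context concrete, here is a small NumPy sketch of the two attention masks involved. It is our illustration of the masking idea, not BART's actual implementation, and `encoder_mask` / `decoder_mask` are hypothetical names: an encoder lets every token attend to every other token, while a decoder applies a causal mask so token *i* only sees tokens 0..*i*.

```python
import numpy as np

def encoder_mask(seq_len):
    # bidirectional: every position may attend to every position (left and right context)
    return np.ones((seq_len, seq_len), dtype=bool)

def decoder_mask(seq_len):
    # causal: position i may attend only to positions 0..i (left context only),
    # which is what allows a decoder to be trained on next-word prediction
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

if __name__ == "__main__":
    print(encoder_mask(4).astype(int))
    print(decoder_mask(4).astype(int))
```

In an encoder-decoder model, the encoder uses the first kind of mask over the article, while the decoder uses the second kind over the summary generated so far and additionally attends to all of the encoder's outputs through cross-attention.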

Our scoring will compare the output of the BART encoder-decoder model with the professionally written summaries in the XSUM dataset, to see how semantically similar a machine-generated summary is to a professional one, as well as to its source article. The scoring methodology focuses on semantic textual similarity, computed as the cosine similarity between the embeddings of the professional human-written summary and the machine-generated one.
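Cosine similarity measures the angle between two embedding vectors: 1 means they point in the same direction, 0 means they are orthogonal. A minimal NumPy sketch, assuming the sentence embeddings have already been computed (in this notebook they would come from the imported `SentenceTransformer`, and the sentence-transformers `util.cos_sim` helper computes the same quantity on tensors); the vectors below are toy values, not real embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||), which lies in [-1, 1]
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

if __name__ == "__main__":
    # toy 3-dimensional "embeddings"; real sentence embeddings typically have hundreds of dimensions
    human = [0.2, 0.8, 0.1]
    machine = [0.25, 0.7, 0.05]
    print(round(cosine_similarity(human, machine), 3))
```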

# Importing Transformers & Dependencies

```python
import pandas as pd
import numpy as np
from transformers import BartTokenizer, BartForConditionalGeneration, BartConfig
from datasets import load_dataset, load_metric
from sentence_transformers import SentenceTransformer, util
import random
from IPython.display import display, HTML
```

# Load XSUM Dataset

```python
xsum = load_dataset('xsum')
```



### The dataset is a `DatasetDict`, whose keys are strings naming the splits and whose values are `Dataset` objects. In XSUM the keys are "train", "validation", and "test", and every split has the same three columns: "document", "summary", and "id"

```python
xsum
```

```
DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})
```

# View Underlying Data

```python
xsum['test'][0]
```

```
{'document': 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation.\nWorkers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders.\nThe Welsh Government said more people than ever were getting help to address housing problems.\nChanges to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation.\nPrison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because issues such as children or domestic violence were now considered.\nHowever, the same could not be said for men, the charity said, because issues which often affect them, such as post traumatic stress disorder or drug dependency, were often viewed as less of a priority.\nAndrew Stevens, who works in Welsh prisons trying to secure housing for prison leavers, said the
```

## We can define a function that displays a random selection of articles and summaries from a given split, which gives a more representative picture of the data than a single record

```python
def display_function(dataset, num_examples=3):
    # we cannot sample more records than the split contains
    assert num_examples <= len(dataset), "num_examples exceeds the number of records"

    # pick num_examples distinct row indices at random (sampling without replacement)
    selections = random.sample(range(len(dataset)), num_examples)

    # indexing a Dataset with a list of indices returns a dict of columns,
    # which pandas turns into a DataFrame; display it once as an HTML table
    xsumPd = pd.DataFrame(dataset[selections])
    display(HTML(xsumPd.to_html()))
```
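Because the display function draws from Python's global random generator, every run shows different records. A minimal sketch, assuming you want reproducible picks, uses a dedicated `random.Random(seed)` instance; `sample_indices` is a hypothetical helper, not part of the original notebook:

```python
import random

def sample_indices(n_records, num_examples=3, seed=None):
    # refuse to draw more distinct records than exist
    if num_examples > n_records:
        raise ValueError("num_examples exceeds the number of records")
    # a dedicated generator: the same seed always yields the same indices
    rng = random.Random(seed)
    # sample without replacement, so no index repeats
    return rng.sample(range(n_records), num_examples)

if __name__ == "__main__":
    # 11334 is the size of the XSUM test split
    print(sample_indices(11334, num_examples=3, seed=42))
```

The returned list can be used to index a split directly, e.g. `xsum['test'][sample_indices(len(xsum['test']), seed=42)]`.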

# Cleaning
Our end goal is to create accurate summaries with this model, so we want to remove text characters that provide no contextual value. We can also see that the document column contains characters that are not present in the summary column, such as newline characters (`\n`), which could cause discrepancies between our machine-generated summaries and the professional human-written ones. We therefore replace the newline characters in the document column and strip out its quotation marks.
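One way to check which characters occur in the documents but never in the summaries is to compare the character sets of the two columns. `chars_only_in_documents` is a hypothetical helper, and the sample strings are invented; on the real data you might pass slices such as `xsum['test']['document'][:1000]` and `xsum['test']['summary'][:1000]` (the slice size is arbitrary):

```python
def chars_only_in_documents(documents, summaries):
    # collect every character used in each column, then take the set difference
    doc_chars = set("".join(documents))
    summary_chars = set("".join(summaries))
    return doc_chars - summary_chars

if __name__ == "__main__":
    docs = ["First line.\nSecond line.", "He said \"hello\"."]
    sums = ["A one-sentence summary.", "Another summary."]
    print(sorted(chars_only_in_documents(docs, sums)))
```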

```python
display_function(xsum["test"])
```

|   | document | summary | id |
|---|---|---|---|
| 0 | Mr Tucker will take over on 1 October, succeeding Douglas Flint who has been in the role since 2010.\nThe appointment breaks an HSBC tradition of appointing insiders to the chairmanship.\nOne of his first jobs will be to find a replacement for Stuart Gulliver, the chief executive of HSBC, who plans to step down next year.\nWhile HSBC is Europe's biggest bank, the bulk of its profits are generated in Asia.\nMr Tucker has been chief executive of AIA for seven years, during which he oversaw the insurer's expansion in Asia.\nBefore AIA, he was the chief executive of insurance giant Prudential, and brings to HSBC his experience at the top of a UK financial giant as well as his Asian exposure.\nRichard Dunbar of Aberdeen Asset Management, told the BBC the bank has "obviously decided" that an external perspective would be useful to HSBC at this time.\nHe added that while chief executive of Prudential, Mr Tucker did a good job of expanding its Asian assets, which are seen as the firm's "jewel in the crown".\nHSBC has been through an overhaul in recent years in an attempt to reverse declining profits.\nOver the past six years it has cut more than 40,000 jobs and sold off businesses.\nDespite those efforts, profits tumbled more than 60% last year.\nThe banking industry has been hampered by the extended period of very low interest rates, which makes lending money less profitable.\nFor HSBC, that problem has been compounded by its move into less risky areas of banking since the financial crisis which started in 2007.\nThose challenges make the appointment of a new chief executive even more crucial for investors, a search which will now be led by Mr Tucker.\nHSBC has also been attempting to repair its image after a series of scandals.\nEarlier this year it reached a $470m (£325m) settlement with the US government and states related to dubious mortgage lending and foreclosure practices during the financial crisis.\nIn 2015 Mr Gulliver and Mr Flint apologised for "unacceptable" practices at its Swiss private bank which helped clients to avoid tax.\nIn late 2012 HSBC paid US authorities $1.9bn in a settlement over money laundering.\nAIA said that Ng Keng Hooi, would take over as chief executive from 1 September. | HSBC has appointed Mark Tucker, the chief executive of Asian insurer AIA, as group chairman. | 39251348 |
| 1 | In dollar terms, imports dropped 20.4% from a year earlier to $145.2bn, a steeper fall than had been expected.\nThe drop was due to lower commodity prices and weaker domestic demand.\nNext week, China is due to report its third-quarter growth rate, which is expected to be lower than the 7% annual pace seen in the second quarter.\nChina recently revised down its growth rate for 2014 from 7.4% to 7.3%, the weakest pace for almost 25 years.\nChina has been attempting to shift from an export-led economy to a consumer-led one, although the steep fall in imports suggests domestic demand is not as strong as the government would have hoped.\nIn dollar terms, China's exports fell by 3.7% from a year earlier to $205.6bn - although analysts had forecast a steeper fall.\nThe country's trade surplus nearly doubled to $60.34bn.\nIn yuan-denominated terms, imports fell by 17.7% while exports were down 1.1%.\nIn a research note, economists at ANZ said: "September's import figure does not bode well for industrial production and fixed-asset investment.\n"Overall growth momentum last month remained weak and third-quarter GDP growth to be released next Monday will likely have edged down to 6.4% in the third quarter, compared with 7% in the first half." | China saw a sharp fall in the value of its imports last month, figures show, raising further questions over the strength of its economy. | 34513044 |
| 2 | The interim review of Liverpool's green and open spaces, commissioned by Mayor Joe Anderson, suggests an extra £4.50 contribution is needed per person.\nThe proposal to increase council tax is one of 31 recommendations made in green activist Simon O'Brien's report.\nHe warned that Liverpool was "heading to a brick wall" when it comes to maintaining open spaces in the city.\nMr Anderson explained: "Sadly, the 58% cut to our budget by central government has left us grappling with the challenge of finding new ways to fund non-essential services, including maintenance of and investment in our green and open spaces."\nFormer Brookside actor Mr O'Brien said: "As central government is cutting money left, right and centre, non-statutory provision is the first thing that goes.\n"I've suggested other things like tourist levies, which you can only set up nationally unfortunately. I think if we charge everyone who comes to stay in the city £1 a head, this problem goes away but we're not allowed to do that yet.\n"If I can see a way that maybe you could commercialise a park - perhaps you could put a café or a health centre in and bring in revenue - that's good."\nHe called for residents to give their feedback before a final report is produced. | A city's parks could be funded by an increase in council tax, a report has recommended. | 35040063 |




## We can address this problem by defining a cleaning function that replaces newline characters with spaces and deletes single and double quotation marks.

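```python
def clean(row):
    # replace newline characters with spaces, then delete single and double
    # quotation marks (this also strips apostrophes, so "children's" becomes "childrens")
    row['document'] = (row['document'].replace('\n', ' ')
                                      .replace("'", '')
                                      .replace('"', ''))
    return row
```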
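Because `map` hands each record to the cleaning function as a plain Python dict, the function can be sanity-checked without loading the dataset. The snippet repeats the cleaning logic so it runs on its own; the sample record is invented for illustration:

```python
def clean(row):
    # newlines become spaces; single and double quotation marks are deleted
    row['document'] = (row['document'].replace('\n', ' ')
                                      .replace("'", '')
                                      .replace('"', ''))
    return row

if __name__ == "__main__":
    record = {'document': 'It\'s a "test".\nSecond line.', 'summary': 'A summary.', 'id': '0'}
    print(clean(record)['document'])  # Its a test. Second line.
```

Note that the apostrophe in "It's" is removed as well, which is why cleaned articles contain forms such as "childrens" and "anyones", while the untouched summary column keeps its apostrophes.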

## We can now map the cleaning function onto our data; `DatasetDict.map` applies it to every split (train, validation, and test)
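Conceptually, `DatasetDict.map` calls the function on every record of every split and returns a new `DatasetDict` with the results, leaving the original untouched. The plain-Python sketch below mimics that behaviour under a simplifying assumption: each split is a list of dicts (the real library stores records in Apache Arrow tables and caches processed results on disk). `map_splits` and `strip_newlines` are hypothetical names for illustration:

```python
def map_splits(splits, fn):
    # apply fn to a copy of every record in every split, so the input is not mutated
    return {name: [fn(dict(record)) for record in records]
            for name, records in splits.items()}

def strip_newlines(row):
    # a cut-down cleaning function: newlines become spaces
    row['document'] = row['document'].replace('\n', ' ')
    return row

if __name__ == "__main__":
    toy = {'train': [{'document': 'a\nb'}],
           'validation': [{'document': 'c\nd'}],
           'test': [{'document': 'e\nf'}]}
    print(map_splits(toy, strip_newlines))
```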

```python
xsum = xsum.map(clean)
```



### Voila!

```python
display_function(xsum["test"])
```

|   | document | summary | id |
|---|---|---|---|
| 0 | 10 February 2017 Last updated at 09:27 GMT Some people are worried hospitals are now getting too busy and overcrowded, meaning patients are having to wait a long time to be seen by a doctor. How is this affecting children who get injured and need to go to hospital? Jenny went to a childrens hospital in Sheffield to speak to a doctor and find out. She also meets Lilly and Jake, who have come to hospital needing treatment, to see how their experience went. | Winter is a very busy time of year for hospitals, with more people needing to see a doctor. | 38928746 |
| 1 | The Stanford University team said the findings were incredibly exciting and would now be tested in clinics. Eventually, they believe using AI could revolutionise healthcare by turning anyones smartphone into a cancer scanner. Cancer Research UK said it could become a useful tool for doctors. The AI was repurposed from software developed by Google that had learned to spot the difference between images of cats and dogs. It was shown 129,450 photographs and told what type of skin condition it was looking at in each one. It then learned to spot the hallmarks of the most common type of skin cancer: carcinoma, and the most deadly: melanoma. Only one in 20 skin cancers are melanoma, yet the tumour accounts for three-quarters of skin cancer deaths. The experiment, detailed in the journal Nature, then tested the AI against 21 trained skin cancer doctors. One of the researchers, Dr Andre Esteva, told the BBC News website: We find, in general, that we are on par with board-certified dermatologists. However, the computer software cannot make a full diagnosis, as this is normally confirmed with a tissue biopsy. Dr Esteva said the system now needed to be tested alongside doctors in the clinic. The application of AI to healthcare is, we believe, an incredibly exciting area of research that can be leveraged to achieve a great deal of societal good, he said. One particular route that we find exciting is the use of this algorithm on a mobile device, but to achieve this we would have to build an app and test its accuracy directly from a mobile device. Incredible advances in machine-learning have already led to AI beating one of humanitys best Go players. And a team of doctors in London have trained AI to predict when the heart will fail. Dr Jana Witt, from the charity Cancer Research UK, said: Using artificial intelligence to help diagnose skin cancer is very interesting, as it could support assessments by GPs and dermatologists. Its unlikely that AI will replace all of the other information your clinician would consider when making a diagnosis, but AI could help guide GP referrals to specialists in the future. Follow James on Twitter. | Artificial intelligence can identify skin cancer in photographs with the same accuracy as trained doctors, say scientists. | 38717928 |
| 2 | Bernard Mensah struck a post for Aldershot after 11 minutes, but the visitors were a man down midway through the first half when Jim Kellerman saw red for a foul on Ross Stearn. The Shots regrouped and took a deserved lead in the 42nd minute when Idris Kanu latched on to a through pass and poked the ball under Ryan Clarke. But in the second half Eastleighs extra man began to tell as they pushed forward and, with 10 minutes left, McAllister fired home from close range after getting on to the end of a flick-on. That was enough to earn Eastleigh their first point in five outings, while Aldershot extended their unbeaten run to 10 games. Report supplied by the Press Association. Match ends, Eastleigh 1, Aldershot Town 1. Second Half ends, Eastleigh 1, Aldershot Town 1. Sam Matthews (Eastleigh) is shown the yellow card for a bad foul. Sam Muggleton (Eastleigh) is shown the yellow card for a bad foul. Substitution, Aldershot Town. Nick Arnold replaces Cheye Alexander. Goal! Eastleigh 1, Aldershot Town 1. Craig McAllister (Eastleigh). Ayo Obileye (Eastleigh) is shown the yellow card for a bad foul. Substitution, Eastleigh. Sam Matthews replaces Tyler Garrett. James Constable (Eastleigh) is shown the yellow card for a bad foul. Substitution, Eastleigh. James Constable replaces Ross Stearn. Substitution, Aldershot Town. Shamir Fenelon replaces Bernard Mensah. Second Half begins Eastleigh 0, Aldershot Town 1. First Half ends, Eastleigh 0, Aldershot Town 1. Goal! Eastleigh 0, Aldershot Town 1. Idris Kanu (Aldershot Town). Jim Kellerman (Aldershot Town) is shown the red card. First Half begins. Lineups are announced and players are warming up. | Craig McAllister's late goal saw Eastleigh end a run of four straight defeats as they held the 10 men of high-flying Aldershot in a 1-1 draw. | 39035546 |



## We can view the column names and data types within our dataset using `.features`

In [62]:
xsum['test'].features

{'document': Value(dtype='string', id=None),
 'summary': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None)}

In [63]:
print(xsum['test'].info)

DatasetInfo(description='\nExtreme Summarization (XSum) Dataset.\n\nThere are three features:\n  - document: Input news article.\n  - summary: One sentence summary of the article.\n  - id: BBC ID of the article.\n\n', citation="\n@article{Narayan2018DontGM,\n  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},\n  author={Shashi Narayan and Shay B. Cohen and Mirella Lapata},\n  journal={ArXiv},\n  year={2018},\n  volume={abs/1808.08745}\n}\n", homepage='https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset', license='', features={'document': Value(dtype='string', id=None), 'summary': Value(dtype='string', id=None), 'id': Value(dtype='string', id=None)}, post_processed=None, supervised_keys=SupervisedKeysData(input='document', output='summary'), task_templates=None, builder_name='xsum', config_name='default', version=1.2.0, splits={'train': SplitInfo(name='train', num_bytes=479206615, num_examples=204045, data

# Preparing XSUM Data
Before we can feed the text into a model, we need to convert it into a format the transformer can understand. Encoders and decoders only operate on numbers, so we must split each text into tokens and convert those tokens into numerical IDs. The tokenizer splits the text into tokens (for BART, these are often subword pieces rather than whole words) and adds any special tokens the model expects from pretraining, such as the start-of-sequence `<s>` and end-of-sequence `</s>` markers. It then maps each token to its unique ID in the tokenizer's vocabulary.

Inside the model, each ID is looked up in an embedding matrix to obtain a vector of numbers, and the encoder's self-attention layers turn these vectors into contextualized representations. For example, the vector representation of the word "to" isn't just "to"; it also takes into account the words around it, which are called its context (left and right context). In the sentence "Welcome to NYC", the left context of "to" is "Welcome" and the right context is "NYC". Because self-attention lets every token attend to the tokens on both sides, the output for "to" is a contextualized vector. We could do all of this with the `AutoTokenizer.from_pretrained` method, which returns the tokenizer that corresponds to the model architecture we want to use (facebook/bart-large-cnn); however, we explicitly load `BartTokenizer` and `BartForConditionalGeneration` from the same checkpoint so that the tokenizer's vocabulary and special tokens match the ones the model was pretrained with, which helps us avoid unexpected summaries.
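To make the pipeline above concrete, here is a minimal, self-contained NumPy sketch. The vocabulary, the 8-dimensional embeddings and the random projection matrices are hypothetical stand-ins: the real facebook/bart-large-cnn tokenizer uses byte-level BPE, and the model uses 1024-dimensional vectors, 16 attention heads, 12 encoder layers and learned position embeddings, none of which are reproduced here. It only illustrates the idea that "to" receives the same ID and the same input embedding in both sentences, but a different output vector once single-head self-attention mixes in its left and right context.

```python
import numpy as np

# Hypothetical toy vocabulary. The real facebook/bart-large-cnn tokenizer has about
# 50k byte-level BPE entries; IDs 0 and 2 mirror bos_token_id and eos_token_id.
TOY_VOCAB = {"<s>": 0, "</s>": 2, "Welcome": 3, "to": 4, "NYC": 5, "London": 6}


def encode(sentence):
    """Split on whitespace, map each token to its ID and wrap the result in <s> ... </s>."""
    return [TOY_VOCAB["<s>"]] + [TOY_VOCAB[tok] for tok in sentence.split()] + [TOY_VOCAB["</s>"]]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) matrix."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 8  # toy size; bart-large-cnn uses d_model = 1024
# One random embedding row per ID (ID 1 is simply unused in this toy)
embeddings = rng.normal(size=(max(TOY_VOCAB.values()) + 1, d_model))
# Random projection matrices, scaled so the attention scores stay moderate
w_q, w_k, w_v = (rng.normal(scale=1 / np.sqrt(d_model), size=(d_model, d_model)) for _ in range(3))


def contextual_vectors(sentence):
    ids = encode(sentence)
    x = embeddings[ids]  # static lookup: "to" gets the same row in every sentence
    out, _ = self_attention(x, w_q, w_k, w_v)
    return ids, out


ids_nyc, out_nyc = contextual_vectors("Welcome to NYC")
ids_ldn, out_ldn = contextual_vectors("Welcome to London")
print(ids_nyc)  # [0, 3, 4, 5, 2]
# "to" sits at position 2 in both sentences: same input embedding, different output vector
print(np.array_equal(embeddings[ids_nyc[2]], embeddings[ids_ldn[2]]))  # True
print(np.allclose(out_nyc[2], out_ldn[2]))
```

Swapping "NYC" for "London" changes only the right context of "to", yet its output vector changes; a real encoder repeats this over many heads and layers.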

In [64]:
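from transformers import BartForConditionalGeneration, BartTokenizer

# Load the tokenizer and the model from the same checkpoint so their vocabularies match
checkpoint = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(checkpoint)
model = BartForConditionalGeneration.from_pretrained(checkpoint)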

## We now write a function that preprocesses the data by passing it to the tokenizer. We need the argument `truncation=True` so that any input longer than the model can handle is truncated to the maximum length allowed. We can find this limit in the model config: BART accepts at most 1,024 tokens per sequence, as shown by `max_position_embeddings`.
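As a rough illustration of what truncation does to a single over-long article, here is a plain-Python sketch. It assumes the special-token IDs from the BART config (`bos_token_id` 0, `eos_token_id` 2) and the 1,024-token limit; the helper `add_special_tokens_and_truncate` and the article token IDs are hypothetical, and the real tokenizer performs this internally when `truncation=True` and `max_length` are set.

```python
BOS_ID, EOS_ID = 0, 2     # bos_token_id / eos_token_id from the BART config
MAX_INPUT_LENGTH = 1024   # max_position_embeddings for facebook/bart-large-cnn


def add_special_tokens_and_truncate(content_ids, max_length=MAX_INPUT_LENGTH):
    """Keep as many content tokens as fit once <s> and </s> are added.

    A plain-Python sketch of truncation for a single sequence, not the
    tokenizer's actual implementation.
    """
    budget = max_length - 2  # reserve two slots for <s> and </s>
    return [BOS_ID] + list(content_ids[:budget]) + [EOS_ID]


long_article = list(range(10, 3010))  # hypothetical: 3,000 content token IDs
truncated = add_special_tokens_and_truncate(long_article)
print(len(truncated), truncated[0], truncated[-1])  # 1024 0 2

# Short inputs are left intact apart from the special tokens
print(add_special_tokens_and_truncate([10, 11, 12]))  # [0, 10, 11, 12, 2]

# The same idea, with a limit of 60, applies to the reference summaries
summary_ids = add_special_tokens_and_truncate(list(range(100, 200)), max_length=60)
print(len(summary_ids))  # 60
```

Everything past the first 1,022 content tokens is simply dropped, so very long articles lose their endings; for news articles that lead with the key facts this is usually an acceptable trade-off.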

In [65]:
model.config

BartConfig {
  "_name_or_path": "facebook/bart-large-cnn",
  "_num_labels": 3,
  "activation_dropout": 0.0,
  "activation_function": "gelu",
  "add_final_layer_norm": false,
  "architectures": [
    "BartForConditionalGeneration"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 12,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "early_stopping": true,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 12,
  "eos_token_id": 2,
  "force_bos_token_to_be_generated": true,
  "forced_bos_token_id": 0,
  "forced_eos_token_id": 2,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2"
  },
  "init_std": 0.02,
  "is_encoder_decoder": true,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "L

## We can now create the function, using the maximum input length allowed by the config (1,024 tokens) and a maximum summary (target) length of 60 tokens, a choice explained in the section where we compare human and machine summaries to each other and to the original articles

In [66]:
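max_input_length = 1024  # BART's max_position_embeddings
max_target_length = 60   # longest reference summary (in tokens) kept as a label


def preperation_function(examples):
    # With batched=True, `examples` is a dict of lists, e.g. examples["document"] is a list of articles
    inputs = examples["document"]
    model_inputs = tokenizer(inputs, max_length=max_input_length, truncation=True, padding=True)

    # Tokenize the reference summaries as targets. For BART the source and target
    # vocabularies are the same; newer transformers versions deprecate
    # as_target_tokenizer() in favour of tokenizer(text_target=...).
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(
            examples["summary"], max_length=max_target_length, truncation=True
        )

    # The token IDs of the reference summaries become the labels
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs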

## We can apply this function to our dataset using map

In [67]:
tokenized_xsum = xsum.map(preperation_function, batched=True)

Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-4b553bd8e5c78318.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-4d942dda870775b4.arrow
Loading cached processed dataset at C:\Users\creeg\.cache\huggingface\datasets\xsum\default\1.2.0\32c23220eadddb1149b16ed2e9430a05293768cfffbdfd151058697d4c11f934\cache-3189c97aecc791f8.arrow


In [68]:
tokenized_xsum

DatasetDict({
    train: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['attention_mask', 'document', 'id', 'input_ids', 'labels', 'summary'],
        num_rows: 11334
    })
})

In [69]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

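## The attention mask tells the model which tokens to pay attention to: a value of 1 marks a token to consider and a value of 0 marks a token to ignore (such as padding). The input IDs are the numerical mapping of tokens to BART's vocabulary; each token in BART's vocabulary (often a subword piece rather than a whole word) is assigned a unique integer. The labels are the token IDs of the reference summary, which the model learns to generate.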
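The plain-Python sketch below shows how an attention mask falls out of padding a batch to its longest sequence. It assumes BART's pad ID is 1 (worth confirming via `tokenizer.pad_token_id`) and pads labels with -100, the value PyTorch's cross-entropy loss ignores by default and the default label padding in transformers' `DataCollatorForSeq2Seq`. The helper `pad_batch` is hypothetical rather than a library function, and the token IDs are shortened, illustrative examples.

```python
PAD_ID = 1           # assumed BART <pad> token ID; check tokenizer.pad_token_id
LABEL_PAD_ID = -100  # ignored by PyTorch's cross-entropy loss (ignore_index default)


def pad_batch(input_ids_batch, labels_batch):
    """Right-pad a batch to its longest sequence and build the attention mask."""
    max_in = max(len(ids) for ids in input_ids_batch)
    max_lab = max(len(lab) for lab in labels_batch)
    input_ids, attention_mask, labels = [], [], []
    for ids, lab in zip(input_ids_batch, labels_batch):
        n_pad = max_in - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = attend, 0 = ignore
        labels.append(lab + [LABEL_PAD_ID] * (max_lab - len(lab)))
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


batch = pad_batch(
    input_ids_batch=[[0, 41932, 6, 2], [0, 133, 2]],      # shortened, illustrative IDs
    labels_batch=[[0, 19842, 2], [0, 133, 3738, 9, 2]],
)
print(batch["input_ids"])       # [[0, 41932, 6, 2], [0, 133, 2, 1]]
print(batch["attention_mask"])  # [[1, 1, 1, 1], [1, 1, 1, 0]]
print(batch["labels"][0])       # [0, 19842, 2, -100, -100]
```

In the output above every mask value is 1 for the displayed prefix because the articles shown are long; zeros only appear at the end of shorter articles that were padded to the longest one in their `map` batch.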

In [70]:
display_function(tokenized_xsum['test'])

Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Phoenix, 37, told Interview Magazine: I dont want to be a part of it. I dont believe in it. The actor has been tipped for awards success The Master, in which he plays a war veteran. He was nominated for an Oscar for his role as music legend Johnny Cash in Walk The Line in 2005. Phoenix said he was dreading the upcoming Hollywood awards season which culminates with the Oscars in February 2013. Its a carrot, but its the worst-tasting carrot Ive ever tasted in my whole life. I dont want this carrot, he said. Its totally subjective. Pitting people against each other....Its the stupidest thing in the whole world. Phoenix described the period when Walk The Line was up for multiple awards seven years ago as one of the most uncomfortable periods of my life. I never want to have that experience again, he revealed in the interview. I dont know how to explain it - and its not like Im in this place where I think Im just above it - but I just dont ever want to get comfortable with that part of things. The Master, which also stars Philip Seymour Hoffman, is Phoenixs first feature film since Im Still Here, Casey Afflecks spoof 2010 documentary which chronicled Phoenixs supposed retirement from acting to launch a career as a rapper. Phoenix called that experience unbelievably liberating and said it was hard subsequently to find projects that interested and excited him. I mean, everything that they teach you when youre a kid about acting is completely...wrong. They tell you to memorise your lines, follow your light, and hit your marks. Those are the three things that you shouldnt do. 
You should not learn your lines, you should not hit your mark, and you should never follow your light. Find your light - thats my opinion, he said.",20004220,"[0, 41932, 6, 2908, 6, 174, 21902, 10202, 35, 38, 33976, 236, 7, 28, 10, 233, 9, 24, 4, 38, 33976, 679, 11, 24, 4, 20, 2701, 34, 57, 13402, 13, 4188, 1282, 20, 6935, 6, 11, 61, 37, 1974, 10, 997, 3142, 4, 91, 21, 7076, 13, 41, 5887, 13, 39, 774, 25, 930, 7875, 8781, 7871, 11, 9693, 20, 5562, 11, 4013, 4, 5524, 26, 37, 21, 24506, 154, 5, 2568, 3049, 4188, 191, 61, 32887, 1626, 19, 5, 15300, 11, 902, 1014, 4, 3139, 10, 33129, 6, 53, 63, 5, 2373, 12, 90, 15374, 33129, 38, 548, ...]","[0, 19842, 23186, 5524, 34, 26, 37, 1072, 117, 233, 11, 5, 1569, 539, 4188, 191, 6, 1765, 5, 4188, 22, 620, 35820, 113, 8, 22, 40747, 2088, 845, 2]","Joaquin Phoenix has said he wants no part in the movie industry awards season, calling the awards ""stupid"" and ""subjective""."
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The 23-year-old striker, on loan from Al Hilal, pounced on a sloppy pass from defender Daniel Ayala and slotted a fine finish past Darren Randolph. Boro forward Martin Braithwaite should have levelled at Molineux before the break, but headed wide while unmarked. Britt Assombalonga was denied by goalkeeper John Ruddy as Boro pressed late on and Wolves held on for victory. Wolves new manager Nuno Espirito Santo handed starts to seven summer acquisitions, including the Championships record signing Ruben Neves in midfield. Finding a prolific striker was one of Nunos priorities ahead of the campaign, after midfielder Dave Edwards and winger Helder Costa, missing through injury, topped Wolves scoring charts last season. And it was former Brazil under-17 player Bonatini who made the difference with the most composed of strikes after some terrible defending. Ayala attempted to pass the ball to Ben Gibson, but it was woefully short allowing Bonatini to nip in and place his effort into the bottom corner. Middlesbrough also struggled to find the back of the net in 2016-17, scoring just 27 times as they were relegated from the Premier League. New boss Garry Monk started three of his new forwards - Ashley Fletcher, Assombalonga and Denmark international Braithwaite - but they could not find a way through a resolute Wolves defence. Wolves head coach Nuno Espirito Santo: It was a tough game and in the first half we played very well. We controlled the game and this is the way we should work. I think that we deserved the three points and we are pleased with the boys. We are still not the final product and every game will be better. 
This is the line that we want from the boys, always progress, always get better. Middlesbrough boss Garry Monk: In the first half Wolves were the better team and we made too many mistakes and obviously one of them led to a goal. But I thought we were the better team in the second half and we upped our level of urgency. We need that at the start of games. We had the best chances in the game and on any other day we could have taken one or two of them. That is football and is sometimes the way that it works out. Match ends, Wolverhampton Wanderers 1, Middlesbrough 0. Second Half ends, Wolverhampton Wanderers 1, Middlesbrough 0. Adam Forshaw (Middlesbrough) is shown the yellow card for a bad foul. Foul by Adam Forshaw (Middlesbrough). Jordan Graham (Wolverhampton Wanderers) wins a free kick on the left wing. Hand ball by David Edwards (Wolverhampton Wanderers). Offside, Wolverhampton Wanderers. Matt Doherty tries a through ball, but David Edwards is caught offside. Foul by Daniel Ayala (Middlesbrough). Jordan Graham (Wolverhampton Wanderers) wins a free kick on the left wing. Attempt missed. Cyrus Christie (Middlesbrough) right footed shot from outside the box is just a bit too high. Substitution, Wolverhampton Wanderers. Jordan Graham replaces Diogo Jota. Attempt missed. Romain Saiss (Wolverhampton Wanderers) left footed shot from outside the box is too high. Substitution, Middlesbrough. Rudy Gestede replaces Jonny Howson. Attempt saved. Patrick Bamford (Middlesbrough) left footed shot from outside the box is saved in the centre of the goal. Assisted by Jonny Howson. Substitution, Wolverhampton Wanderers. David Edwards replaces Bright Enobakhare. Romain Saiss (Wolverhampton Wanderers) wins a free kick in the defensive half. Foul by Cyrus Christie (Middlesbrough). Attempt blocked. Rúben Neves (Wolverhampton Wanderers) right footed shot from outside the box is blocked. Corner, Middlesbrough. Conceded by John Ruddy. Attempt saved. 
Britt Assombalonga (Middlesbrough) right footed shot from the centre of the box is saved in the centre of the goal. Assisted by Patrick Bamford with a through ball. Attempt missed. Jonny Howson (Middlesbrough) right footed shot from outside the box misses to the left. Assisted by Adam Clayton following a set piece situation. Foul by Willy Boly (Wolverhampton Wanderers). Patrick Bamford (Middlesbrough) wins a free kick on the right wing. Attempt blocked. Willy Boly (Wolverhampton Wanderers) left footed shot from the centre of the box is blocked. Assisted by Barry Douglas with a cross. Corner, Wolverhampton Wanderers. Conceded by Adam Clayton. Attempt blocked. Bright Enobakhare (Wolverhampton Wanderers) left footed shot from outside the box is blocked. Attempt missed. Romain Saiss (Wolverhampton Wanderers) left footed shot from the right side of the six yard box is close, but misses to the right. Assisted by Barry Douglas with a cross following a corner. Corner, Wolverhampton Wanderers. Conceded by George Friend. Substitution, Middlesbrough. Adam Forshaw replaces Marten de Roon. Offside, Middlesbrough. Marten de Roon tries a through ball, but Adam Clayton is caught offside. Corner, Middlesbrough. Conceded by Conor Coady. Diogo Jota (Wolverhampton Wanderers) wins a free kick on the left wing. Foul by Cyrus Christie (Middlesbrough). Corner, Wolverhampton Wanderers. Conceded by Marten de Roon. Adam Clayton (Middlesbrough) is shown the yellow card for a bad foul. Diogo Jota (Wolverhampton Wanderers) wins a free kick in the defensive half. Foul by Adam Clayton (Middlesbrough). Substitution, Wolverhampton Wanderers. Nouha Dicko replaces Léo Bonatini. Substitution, Middlesbrough. Patrick Bamford replaces Ashley Fletcher. Attempt missed. 
Rúben Neves (Wolverhampton Wanderers) right footed shot from outside the box misses to the left following a corner.",40760787,"[0, 133, 883, 12, 180, 12, 279, 5955, 6, 15, 2541, 31, 726, 14909, 337, 6, 181, 20155, 15, 10, 26654, 1323, 31, 5142, 3028, 5847, 2331, 8, 3369, 14265, 10, 2051, 2073, 375, 11335, 24500, 4, 7943, 139, 556, 1896, 9076, 3432, 2739, 1459, 197, 33, 16066, 9970, 23, 256, 18675, 7073, 137, 5, 1108, 6, 53, 3475, 1810, 150, 30161, 4, 16278, 6331, 5223, 20774, 102, 21, 2296, 30, 7551, 610, 248, 24471, 25, 7943, 139, 11224, 628, 15, 8, 13889, 547, 15, 13, 1124, 4, 13889, 92, 1044, 234, 25217, 11631, 30710, 139, 8550, 139, 4507, 2012, ...]","[0, 31004, 811, 12520, 5520, 415, 2531, 1008, 15, 39, 2453, 7, 492, 13889, 41, 1273, 12, 1208, 3261, 339, 136, 20421, 428, 10344, 4, 2]",Brazilian Leo Bonatini scored on his debut to give Wolves an opening-day Championship win against Middlesbrough.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Sinead Higgins, 37, and son Oisin ODriscoll were found after police forced their way into the house in The Fairway, Ruislip, west London on Wednesday. Police responded to concerns for the pairs welfare at about 10:50 GMT. A Met Police spokeswoman said detectives do not believe a third party was involved in the deaths. Det Insp Dave Bolton said: Inquiries so far lead us to believe there is a likelihood that the tragic events that led to the deaths do not involve a third party. A post-mortem examination is scheduled to take place on Friday. Next of kin have been informed.",38329030,"[0, 104, 833, 625, 19422, 6, 2908, 6, 8, 979, 384, 29761, 384, 14043, 4473, 3937, 58, 303, 71, 249, 1654, 49, 169, 88, 5, 790, 11, 20, 3896, 1970, 6, 10318, 13714, 1588, 6, 3072, 928, 15, 307, 4, 522, 2334, 7, 1379, 13, 5, 15029, 6642, 23, 59, 158, 35, 1096, 5050, 4, 83, 4369, 522, 3582, 26, 10412, 109, 45, 679, 10, 371, 537, 21, 963, 11, 5, 3257, 4, 11185, 19190, 4475, 12160, 26, 35, 28727, 19947, 98, 444, 483, 201, 7, 679, 89, 16, 10, 11801, 14, 5, 8805, 1061, 14, 669, 7, 5, 3257, ...]","[0, 133, 3738, 9, 10, 985, 8, 69, 707, 12, 180, 12, 279, 979, 33, 57, 2967, 23, 49, 184, 4, 2]",The bodies of a mother and her seven-year-old son have been discovered at their home.


Unnamed: 0,attention_mask,document,id,input_ids,labels,summary
0,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Phoenix, 37, told Interview Magazine: I dont want to be a part of it. I dont believe in it. The actor has been tipped for awards success The Master, in which he plays a war veteran. He was nominated for an Oscar for his role as music legend Johnny Cash in Walk The Line in 2005. Phoenix said he was dreading the upcoming Hollywood awards season which culminates with the Oscars in February 2013. Its a carrot, but its the worst-tasting carrot Ive ever tasted in my whole life. I dont want this carrot, he said. Its totally subjective. Pitting people against each other....Its the stupidest thing in the whole world. Phoenix described the period when Walk The Line was up for multiple awards seven years ago as one of the most uncomfortable periods of my life. I never want to have that experience again, he revealed in the interview. I dont know how to explain it - and its not like Im in this place where I think Im just above it - but I just dont ever want to get comfortable with that part of things. The Master, which also stars Philip Seymour Hoffman, is Phoenixs first feature film since Im Still Here, Casey Afflecks spoof 2010 documentary which chronicled Phoenixs supposed retirement from acting to launch a career as a rapper. Phoenix called that experience unbelievably liberating and said it was hard subsequently to find projects that interested and excited him. I mean, everything that they teach you when youre a kid about acting is completely...wrong. They tell you to memorise your lines, follow your light, and hit your marks. Those are the three things that you shouldnt do. 
You should not learn your lines, you should not hit your mark, and you should never follow your light. Find your light - thats my opinion, he said.",20004220,"[0, 41932, 6, 2908, 6, 174, 21902, 10202, 35, 38, 33976, 236, 7, 28, 10, 233, 9, 24, 4, 38, 33976, 679, 11, 24, 4, 20, 2701, 34, 57, 13402, 13, 4188, 1282, 20, 6935, 6, 11, 61, 37, 1974, 10, 997, 3142, 4, 91, 21, 7076, 13, 41, 5887, 13, 39, 774, 25, 930, 7875, 8781, 7871, 11, 9693, 20, 5562, 11, 4013, 4, 5524, 26, 37, 21, 24506, 154, 5, 2568, 3049, 4188, 191, 61, 32887, 1626, 19, 5, 15300, 11, 902, 1014, 4, 3139, 10, 33129, 6, 53, 63, 5, 2373, 12, 90, 15374, 33129, 38, 548, ...]","[0, 19842, 23186, 5524, 34, 26, 37, 1072, 117, 233, 11, 5, 1569, 539, 4188, 191, 6, 1765, 5, 4188, 22, 620, 35820, 113, 8, 22, 40747, 2088, 845, 2]","Joaquin Phoenix has said he wants no part in the movie industry awards season, calling the awards ""stupid"" and ""subjective""."
1,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","The 23-year-old striker, on loan from Al Hilal, pounced on a sloppy pass from defender Daniel Ayala and slotted a fine finish past Darren Randolph. Boro forward Martin Braithwaite should have levelled at Molineux before the break, but headed wide while unmarked. Britt Assombalonga was denied by goalkeeper John Ruddy as Boro pressed late on and Wolves held on for victory. Wolves new manager Nuno Espirito Santo handed starts to seven summer acquisitions, including the Championships record signing Ruben Neves in midfield. Finding a prolific striker was one of Nunos priorities ahead of the campaign, after midfielder Dave Edwards and winger Helder Costa, missing through injury, topped Wolves scoring charts last season. And it was former Brazil under-17 player Bonatini who made the difference with the most composed of strikes after some terrible defending. Ayala attempted to pass the ball to Ben Gibson, but it was woefully short allowing Bonatini to nip in and place his effort into the bottom corner. Middlesbrough also struggled to find the back of the net in 2016-17, scoring just 27 times as they were relegated from the Premier League. New boss Garry Monk started three of his new forwards - Ashley Fletcher, Assombalonga and Denmark international Braithwaite - but they could not find a way through a resolute Wolves defence. Wolves head coach Nuno Espirito Santo: It was a tough game and in the first half we played very well. We controlled the game and this is the way we should work. I think that we deserved the three points and we are pleased with the boys. We are still not the final product and every game will be better. 
This is the line that we want from the boys, always progress, always get better. Middlesbrough boss Garry Monk: In the first half Wolves were the better team and we made too many mistakes and obviously one of them led to a goal. But I thought we were the better team in the second half and we upped our level of urgency. We need that at the start of games. We had the best chances in the game and on any other day we could have taken one or two of them. That is football and is sometimes the way that it works out. Match ends, Wolverhampton Wanderers 1, Middlesbrough 0. Second Half ends, Wolverhampton Wanderers 1, Middlesbrough 0. Adam Forshaw (Middlesbrough) is shown the yellow card for a bad foul. Foul by Adam Forshaw (Middlesbrough). Jordan Graham (Wolverhampton Wanderers) wins a free kick on the left wing. Hand ball by David Edwards (Wolverhampton Wanderers). Offside, Wolverhampton Wanderers. Matt Doherty tries a through ball, but David Edwards is caught offside. Foul by Daniel Ayala (Middlesbrough). Jordan Graham (Wolverhampton Wanderers) wins a free kick on the left wing. Attempt missed. Cyrus Christie (Middlesbrough) right footed shot from outside the box is just a bit too high. Substitution, Wolverhampton Wanderers. Jordan Graham replaces Diogo Jota. Attempt missed. Romain Saiss (Wolverhampton Wanderers) left footed shot from outside the box is too high. Substitution, Middlesbrough. Rudy Gestede replaces Jonny Howson. Attempt saved. Patrick Bamford (Middlesbrough) left footed shot from outside the box is saved in the centre of the goal. Assisted by Jonny Howson. Substitution, Wolverhampton Wanderers. David Edwards replaces Bright Enobakhare. Romain Saiss (Wolverhampton Wanderers) wins a free kick in the defensive half. Foul by Cyrus Christie (Middlesbrough). Attempt blocked. Rúben Neves (Wolverhampton Wanderers) right footed shot from outside the box is blocked. Corner, Middlesbrough. Conceded by John Ruddy. Attempt saved. 
Britt Assombalonga (Middlesbrough) right footed shot from the centre of the box is saved in the centre of the goal. Assisted by Patrick Bamford with a through ball. Attempt missed. Jonny Howson (Middlesbrough) right footed shot from outside the box misses to the left. Assisted by Adam Clayton following a set piece situation. Foul by Willy Boly (Wolverhampton Wanderers). Patrick Bamford (Middlesbrough) wins a free kick on the right wing. Attempt blocked. Willy Boly (Wolverhampton Wanderers) left footed shot from the centre of the box is blocked. Assisted by Barry Douglas with a cross. Corner, Wolverhampton Wanderers. Conceded by Adam Clayton. Attempt blocked. Bright Enobakhare (Wolverhampton Wanderers) left footed shot from outside the box is blocked. Attempt missed. Romain Saiss (Wolverhampton Wanderers) left footed shot from the right side of the six yard box is close, but misses to the right. Assisted by Barry Douglas with a cross following a corner. Corner, Wolverhampton Wanderers. Conceded by George Friend. Substitution, Middlesbrough. Adam Forshaw replaces Marten de Roon. Offside, Middlesbrough. Marten de Roon tries a through ball, but Adam Clayton is caught offside. Corner, Middlesbrough. Conceded by Conor Coady. Diogo Jota (Wolverhampton Wanderers) wins a free kick on the left wing. Foul by Cyrus Christie (Middlesbrough). Corner, Wolverhampton Wanderers. Conceded by Marten de Roon. Adam Clayton (Middlesbrough) is shown the yellow card for a bad foul. Diogo Jota (Wolverhampton Wanderers) wins a free kick in the defensive half. Foul by Adam Clayton (Middlesbrough). Substitution, Wolverhampton Wanderers. Nouha Dicko replaces Léo Bonatini. Substitution, Middlesbrough. Patrick Bamford replaces Ashley Fletcher. Attempt missed. 
Rúben Neves (Wolverhampton Wanderers) right footed shot from outside the box misses to the left following a corner.",40760787,"[0, 133, 883, 12, 180, 12, 279, 5955, 6, 15, 2541, 31, 726, 14909, 337, 6, 181, 20155, 15, 10, 26654, 1323, 31, 5142, 3028, 5847, 2331, 8, 3369, 14265, 10, 2051, 2073, 375, 11335, 24500, 4, 7943, 139, 556, 1896, 9076, 3432, 2739, 1459, 197, 33, 16066, 9970, 23, 256, 18675, 7073, 137, 5, 1108, 6, 53, 3475, 1810, 150, 30161, 4, 16278, 6331, 5223, 20774, 102, 21, 2296, 30, 7551, 610, 248, 24471, 25, 7943, 139, 11224, 628, 15, 8, 13889, 547, 15, 13, 1124, 4, 13889, 92, 1044, 234, 25217, 11631, 30710, 139, 8550, 139, 4507, 2012, ...]","[0, 31004, 811, 12520, 5520, 415, 2531, 1008, 15, 39, 2453, 7, 492, 13889, 41, 1273, 12, 1208, 3261, 339, 136, 20421, 428, 10344, 4, 2]",Brazilian Leo Bonatini scored on his debut to give Wolves an opening-day Championship win against Middlesbrough.
2,"[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]","Sinead Higgins, 37, and son Oisin ODriscoll were found after police forced their way into the house in The Fairway, Ruislip, west London on Wednesday. Police responded to concerns for the pairs welfare at about 10:50 GMT. A Met Police spokeswoman said detectives do not believe a third party was involved in the deaths. Det Insp Dave Bolton said: Inquiries so far lead us to believe there is a likelihood that the tragic events that led to the deaths do not involve a third party. A post-mortem examination is scheduled to take place on Friday. Next of kin have been informed.",38329030,"[0, 104, 833, 625, 19422, 6, 2908, 6, 8, 979, 384, 29761, 384, 14043, 4473, 3937, 58, 303, 71, 249, 1654, 49, 169, 88, 5, 790, 11, 20, 3896, 1970, 6, 10318, 13714, 1588, 6, 3072, 928, 15, 307, 4, 522, 2334, 7, 1379, 13, 5, 15029, 6642, 23, 59, 158, 35, 1096, 5050, 4, 83, 4369, 522, 3582, 26, 10412, 109, 45, 679, 10, 371, 537, 21, 963, 11, 5, 3257, 4, 11185, 19190, 4475, 12160, 26, 35, 28727, 19947, 98, 444, 483, 201, 7, 679, 89, 16, 10, 11801, 14, 5, 8805, 1061, 14, 669, 7, 5, 3257, ...]","[0, 133, 3738, 9, 10, 985, 8, 69, 707, 12, 180, 12, 279, 979, 33, 57, 2967, 23, 49, 184, 4, 2]",The bodies of a mother and her seven-year-old son have been discovered at their home.



In [71]:
tokenized_xsum['test'].features

{'attention_mask': Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None),
 'document': Value(dtype='string', id=None),
 'id': Value(dtype='string', id=None),
 'input_ids': Sequence(feature=Value(dtype='int32', id=None), length=-1, id=None),
 'labels': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'summary': Value(dtype='string', id=None)}

# Compare Machine Summaries to Professional Human Written Summaries
To score our machine-generated summaries against the professional human-written ones, we embed each text with a sentence-embedding model and compute the cosine similarity between the embeddings as a measure of semantic similarity. We make three comparisons per article: human summary to machine summary, human summary to original document, and machine summary to original document. Initially, we wanted to cap each machine summary at the same length as the XSUM summaries. However, because the XSUM summaries are so short (hence the name extreme summarization), the model only returned the first few words of each article. This is plausible because BART's pretraining likely biased it toward treating the start of a text as the most informative part for a summary. As a result, we set `max_length=60` to keep the output brief while giving the model enough room to produce meaningful context. Note that `max_length` in `generate` counts tokens, not words. The cell below computes the average length of the human-written summaries we use (about 19 words each).
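For reference, cosine similarity is the dot product of two vectors divided by the product of their lengths; it ranges from -1 to 1 and ignores vector magnitude, which is why it can compare a one-sentence summary with a full article. Below is a minimal NumPy sketch of the quantity that `util.pytorch_cos_sim` reports for a pair of embeddings. The function name `cosine_similarity` is our own hypothetical helper, and the toy 3-dimensional vectors stand in for real sentence embeddings.

```python
import numpy as np


def cosine_similarity(a, b):
    """Hypothetical helper: cos(theta) = (a . b) / (||a|| * ||b||) for two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy "embeddings"; real sentence embeddings have hundreds of dimensions.
print(cosine_similarity([1, 0, 0], [1, 0, 0]))  # identical direction
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # orthogonal directions
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction, different length
```

The third case shows why length does not matter: scaling a vector leaves its cosine similarity with any other vector unchanged.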

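We focus on 10 articles from the XSUM test split and generate a summary for each one with the same model (the sections labelled Model 1 to Model 10), so that each human/machine pair can be inspected individually.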

In [72]:
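def listToString(s):
    """Concatenate an iterable of strings into one string with no separator.

    For a list of decoded summaries this joins them; for a plain string it
    returns the string unchanged (iterating a str yields its characters).
    """
    return "".join(s)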

In [73]:
article1 = tokenized_xsum['test']['document'][0]
article2 = tokenized_xsum['test']['document'][123]
article3 = tokenized_xsum['test']['document'][99]
article4 = tokenized_xsum['test']['document'][1100]
article5 = tokenized_xsum['test']['document'][1118]
article6 = tokenized_xsum['test']['document'][45]
article7 = tokenized_xsum['test']['document'][13]
article8 = tokenized_xsum['test']['document'][69]
article9 = tokenized_xsum['test']['document'][27]
article10 = tokenized_xsum['test']['document'][9]

summary1 = tokenized_xsum['test']['summary'][0]
summary2 = tokenized_xsum['test']['summary'][123]
summary3 = tokenized_xsum['test']['summary'][99]
summary4 = tokenized_xsum['test']['summary'][1100]
summary5 = tokenized_xsum['test']['summary'][1118]
summary6 = tokenized_xsum['test']['summary'][45]
summary7 = tokenized_xsum['test']['summary'][13]
summary8 = tokenized_xsum['test']['summary'][69]
summary9 = tokenized_xsum['test']['summary'][27]
summary10 = tokenized_xsum['test']['summary'][9]
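The twenty assignments above can also be driven by a single list of test-set indices. This is a minimal sketch, assuming each row of `tokenized_xsum['test']` can be read as `dataset[i]['document']` and `dataset[i]['summary']` (a Hugging Face `Dataset` returns one row as a dict when indexed with an integer). The helper `select_pairs` is hypothetical, and a small list of dicts stands in for the dataset so the block runs on its own.

```python
# Same test-set indices as the cell above, in the same order (article1 ... article10).
ARTICLE_INDICES = [0, 123, 99, 1100, 1118, 45, 13, 69, 27, 9]


def select_pairs(dataset, indices):
    """Hypothetical helper: return (articles, summaries) for the given row indices."""
    articles = [dataset[i]["document"] for i in indices]
    summaries = [dataset[i]["summary"] for i in indices]
    return articles, summaries


# Stand-in for tokenized_xsum['test']: any sequence of row dicts works here.
toy_test = [{"document": f"article {i}", "summary": f"summary {i}"} for i in range(1200)]
articles, summaries = select_pairs(toy_test, ARTICLE_INDICES)
print(articles[1], "|", summaries[1])
```

With the real data the call would be `select_pairs(tokenized_xsum['test'], ARTICLE_INDICES)`, after which `articles[0]` plays the role of `article1`, and so on. Reading one row at a time also avoids pulling a whole column just to take one element from it.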


In [74]:
summaryList = [summary1.split(),
summary2.split(), 
summary3.split(), 
summary4.split(),
summary5.split(),
summary6.split(),
summary7.split(), 
summary8.split(),
summary9.split(), 
summary10.split()]

count = sum(len(listElem) for listElem in summaryList)

print('The total number of words in these summaries is: ', count)
print('The average words per summary is: ', count / len(summaryList))

The total number of words in these summaries is:  186
The average words per summary is:  18.6


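## Early stopping: Models 1 to 5 vs. Models 6 to 10

Models 1 to 5 call `generate` without an `early_stopping` argument, so they use the default from the model's generation config (intended here as `early_stopping=True`), while Models 6 to 10 pass `early_stopping=False` explicitly. We split the runs this way to see whether the setting makes any meaningful difference. Note that `early_stopping` only controls when beam search stops, so it has an effect only if generation uses more than one beam.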

## Model 1

In [75]:
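input1 = tokenizer(article1, return_tensors='pt', truncation=True)  # truncate articles longer than the model's maximum input length
summary_ids1 = model.generate(input1['input_ids'], max_length=60)  # max_length is measured in tokens, not words
machineSummary1 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids1]  # one decoded string per generated sequence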

In [76]:
machineSummary1 = listToString(machineSummary1)
original1 = listToString(article1)

comparison1 = [summary1, machineSummary1, original1]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings1 = token_model.encode(comparison1)
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings1[0], comparison_embeddings1[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings1[1], comparison_embeddings1[2])) # machine summary to original article

tensor([[0.7313]])
tensor([[0.7645]])
tensor([[0.9574]])


In [77]:
comparison1

['There is a "chronic" need for more housing for prison leavers in Wales, according to a charity.',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. Welsh Government said',
 'Prison Link Cymru had 1,099 referrals in 2015-16 and said some ex-offenders were living rough for up to a year before finding suitable accommodation. Workers at the charity claim investment in housing would be cheaper than jailing homeless repeat offenders. The Welsh Government said more people than ever were getting help to address housing problems. Changes to the Housing Act in Wales, introduced in 2015, removed the right for prison leavers to be given priority for accommodation. Prison Link Cymru, which helps people find accommodation after their release, said things were generally good for women because
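Every model section repeats the same three comparisons. The sketch below computes them in one call. It assumes `encode` is any function mapping a list of strings to a 2-D array of embeddings (for example `token_model.encode`, which returns a NumPy array by default) and that no embedding is all zeros; `score_summary` and `toy_encode` are hypothetical names. A toy letter-count encoder stands in for the sentence-transformer so the block runs without downloading a model.

```python
import string

import numpy as np


def score_summary(human, machine, original, encode):
    """Hypothetical helper: cosine similarities for the three comparisons in this section."""
    emb = np.asarray(encode([human, machine, original]), dtype=float)
    # Normalise each row to unit length; the pairwise dot products are then cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    return {
        "human_vs_machine": float(sim[0, 1]),
        "human_vs_original": float(sim[0, 2]),
        "machine_vs_original": float(sim[1, 2]),
    }


def toy_encode(texts):
    """Hypothetical stand-in encoder: counts of each lowercase letter a-z per text."""
    return np.array([[t.lower().count(c) for c in string.ascii_lowercase] for t in texts])


scores = score_summary("prison housing", "housing for prison leavers", "prison leavers need housing", toy_encode)
print(scores)
```

With the real model the call would be `score_summary(summary1, machineSummary1, original1, token_model.encode)`, which should agree with the three `util.pytorch_cos_sim` values up to floating-point precision, and it loads the embedding model once instead of once per cell.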

## Model 2

In [78]:
input2 = tokenizer(article2, return_tensors='pt', truncation=True)
summary_ids2 = model.generate(input2['input_ids'], max_length=60)
machineSummary2 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids2])

In [79]:
machineSummary2 = listToString(machineSummary2)
original2 = listToString(article2)

comparison2 = [summary2, machineSummary2, original2]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings2 = token_model.encode(comparison2)
print(util.pytorch_cos_sim(comparison_embeddings2[0], comparison_embeddings2[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings2[0], comparison_embeddings2[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings2[1], comparison_embeddings2[2])) # machine summary to original article

tensor([[0.7189]])
tensor([[0.5850]])
tensor([[0.6048]])


In [80]:
comparison2

["For a man often described as capricious, Tyson Fury's chaotic reign as world heavyweight champion was strangely predictable.",
 'Fury has been speaking about his mental health struggles for years. The repeated claims from Furys camp that his victory was downplayed by the British media, and that they had an agenda against him from the outset, are delusional. Fury is not the first boxer to lose motivation having reached',

## Model 3

In [81]:
input3 = tokenizer(article3, return_tensors='pt', truncation=True)
summary_ids3 = model.generate(input3['input_ids'], max_length=60)
machineSummary3 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids3])

In [82]:
machineSummary3 = listToString(machineSummary3)
original3 = listToString(article3)

comparison3 = [summary3, machineSummary3, original3]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings3 = token_model.encode(comparison3)
print(util.pytorch_cos_sim(comparison_embeddings3[0], comparison_embeddings3[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings3[0], comparison_embeddings3[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings3[1], comparison_embeddings3[2])) # machine summary to original article

tensor([[0.5551]])
tensor([[0.7642]])
tensor([[0.8500]])


In [83]:
comparison3

['A barrister who was due to move into his own chambers in Huddersfield has pleaded guilty to supplying cocaine.',
 'Omar Khan, 31, had worked at The Johnson Partnership in Nottingham for five years. Partner Digby Johnson said he did not represent Khan, who had set up his own office and was set to leave the company. Erlin Manahasa, Albert Dibra and Naza',
 'Omar Khan, 31, had worked at The Johnson Partnership in Nottingham for five years before he was arrested. Erlin Manahasa, Albert Dibra and Nazaquat Ali joined Khan in admitting the same charge, between 1 October  and 4 December last year, at Nottingham Crown Court. They are due to be sentenced on 15 April. Updates on this story and more from Nottinghamshire The court heard the case involved the recovery of 1kg (2.2lb) of cocaine. Digby Johnson, a partner at the Johnson firm, confirmed they did not represent Khan - who had set up his own office and was set to leave the company. I still find it hard to believe he could do something as

## Model 4

In [84]:
input4 = tokenizer(article4, return_tensors='pt', truncation=True)
summary_ids4 = model.generate(input4['input_ids'], max_length=60)
machineSummary4 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids4])

In [85]:
machineSummary4 = listToString(machineSummary4)
original4 = listToString(article4)

comparison4 = [summary4, machineSummary4, original4]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings4 = token_model.encode(comparison4)
print(util.pytorch_cos_sim(comparison_embeddings4[0], comparison_embeddings4[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings4[0], comparison_embeddings4[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings4[1], comparison_embeddings4[2])) # machine summary to original article

tensor([[0.5436]])
tensor([[0.6342]])
tensor([[0.8264]])


In [86]:
comparison4

['Star Wars fans are being given the opportunity to become Jedi Knights and learn how to wield lightsabers in combat.',
 'The sport began eight years ago in Italy but has only just come to England with the first classes in Cheltenham. Instructor Jordan Court said people were already hooked. The lightsabers used in the sport are all hand-made and are provided for use during the classes.',
 'LudoSport has opened its first academy teaching seven forms of combat from the Star Wars world using flexible blades mounted on weighted hilts. The sport began eight years ago in Italy but has only just come to England with the first classes in Cheltenham. Instructor Jordan Court said people were already hooked. The classes in Cheltenham began last month. So far there are six pupils, but this number is expected to increase. Mr Court attended an international boot camp to learn the different stages of the sport which range in characteristics from defensive in stage one to aggressive and flamboyant in 

## Model 5

In [87]:
input5 = tokenizer(article5, return_tensors='pt', truncation=True)
summary_ids5 = model.generate(input5['input_ids'], max_length=60)
machineSummary5 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids5])

In [88]:
machineSummary5 = listToString(machineSummary5)
original5 = listToString(article5)

comparison5 = [summary5, machineSummary5, original5]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings5 = token_model.encode(comparison5)
print(util.pytorch_cos_sim(comparison_embeddings5[0], comparison_embeddings5[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings5[0], comparison_embeddings5[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings5[1], comparison_embeddings5[2])) # machine summary to original article

tensor([[0.5847]])
tensor([[0.6152]])
tensor([[0.9742]])


In [89]:
comparison5

['Awareness rides are taking place to try and cut the number of people on horseback injured or killed on roads.',
 'The Pass Wide and Slow Wales campaign has collected 1,300 signatures on the assemblys e-petition website. It wants an annual road safety awareness campaign explaining to motorists how to react around horses. The British Horse Society found that since 2010 there have been 2,000 road accidents in',
 'The Pass Wide and Slow Wales campaign has collected 1,300 signatures on the assemblys e-petition website. It wants an annual road safety awareness campaign explaining to motorists how to react around horses. The British Horse Society found that since 2010 there have been 2,000 road accidents in the UK, with 1,500 because of cars passing too closely. As a result of these, 180 horses and 36 riders have died. Awareness rides were planned for Penarth, Vale of Glamorgan, Swansea, Neyland in Pembrokeshire, Machynlleth, Powys, Flintshire and Porthmadog in Gwynedd. Any petition with ov

## Model 6

In [90]:
input6 = tokenizer(article6, return_tensors='pt', truncation=True)
summary_ids6 = model.generate(input6['input_ids'], max_length=60, early_stopping=False)
machineSummary6 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids6])

In [91]:
machineSummary6 = listToString(machineSummary6)
original6 = listToString(article6)

comparison6 = [summary6, machineSummary6, original6]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings6 = token_model.encode(comparison6)
print(util.pytorch_cos_sim(comparison_embeddings6[0], comparison_embeddings6[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings6[0], comparison_embeddings6[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings6[1], comparison_embeddings6[2])) # machine summary to original article

tensor([[0.7071]])
tensor([[0.7340]])
tensor([[0.9464]])


In [92]:
comparison6

['Two new councillors have been elected in a by-election in the City of Edinburgh.',
 'SNP topped the vote in the Leith Walk by-election. Scottish Labour won the second seat from the Greens. Deidre Brock of the SNP and Maggie Chapman of the Scottish Greens stood down. It was the first time the Single Transferable Vote (STV) system had',
 'It was the first time the Single Transferable Vote (STV) system had been used to select two members in the same ward in a by-election. The SNP topped the vote in the Leith Walk by-election, while Scottish Labour won the second seat from the Greens. The by-election was called after Deidre Brock of the SNP and Maggie Chapman of the Scottish Greens stood down. The SNPs John Lewis Ritchie topped the Leith Walk poll with 2,290 votes. He was elected at stage one in the STV process with a swing in first-preference votes of 7.6% from Labour. Labours Marion Donaldson received 1,623 votes, ahead of Susan Jane Rae of the Scottish Greens on 1,381. Ms Donaldson wa

## Model 7

In [93]:
input7 = tokenizer(article7, return_tensors='pt', truncation=True)
summary_ids7 = model.generate(input7['input_ids'], max_length=60, early_stopping=False)
machineSummary7 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids7])

In [94]:
machineSummary7 = listToString(machineSummary7)
original7 = listToString(article7)

comparison7 = [summary7, machineSummary7, original7]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings7 = token_model.encode(comparison7)
print(util.pytorch_cos_sim(comparison_embeddings7[0], comparison_embeddings7[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings7[0], comparison_embeddings7[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings7[1], comparison_embeddings7[2])) # machine summary to original article

tensor([[0.7054]])
tensor([[0.6673]])
tensor([[0.9054]])


In [95]:
comparison7

["Torquay United boss Kevin Nicholson says none of the money from Eunan O'Kane's move to Leeds from Bournemouth will go to the playing squad.",
 ' OKane moved for an undisclosed fee, but Nicholson says any money will go to help the cash-strapped club. The Gulls are still looking for new owners having been taken over by a consortium of local business people last summer. They were forced to close down the clubs academy',
 'The National League sold the Republic of Ireland midfielder to the Cherries for £175,000 in 2012 and had a 15% sell-on clause included in the deal. OKane moved for an undisclosed fee, but Nicholson says any money will go to help the cash-strapped club. I dont think Ill be getting anything, Nicholson told BBC Devon. Theres more important things. The Gulls are still looking for new owners having been taken over by a consortium of local business people last summer. They were forced to close down the clubs academy and drastically reduce the playing budget after millionaire

## Model 8

In [96]:
input8 = tokenizer(article8, return_tensors='pt', truncation=True)
summary_ids8 = model.generate(input8['input_ids'], max_length=60, early_stopping=False)
machineSummary8 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids8])

In [97]:
machineSummary8 = listToString(machineSummary8)
original8 = listToString(article8)

comparison8 = [summary8, machineSummary8, original8]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings8 = token_model.encode(comparison8)
print(util.pytorch_cos_sim(comparison_embeddings8[0], comparison_embeddings8[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings8[0], comparison_embeddings8[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings8[1], comparison_embeddings8[2])) # machine summary to original article

tensor([[0.5923]])
tensor([[0.6410]])
tensor([[0.9681]])


In [98]:
comparison8

['Manufacturers have reported positive business trends, in the latest survey from the Scottish Chambers of Commerce.',
 'Manufacturers reported their highest growth in new orders for nearly three years. In retail, there was also a return to optimism - though only just. In tourism, firms reported improving visitor numbers in the final quarter of the year, but falling sales revenues. Construction is expecting an investment dip.',

## Model 9

In [99]:
input9 = tokenizer(article9, return_tensors='pt', truncation=True)
summary_ids9 = model.generate(input9['input_ids'], max_length=60, early_stopping=False)
machineSummary9 = ([tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids9])

In [100]:
machineSummary9 = listToString(machineSummary9)
original9 = listToString(article9)

comparison9 = [summary9, machineSummary9, original9]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings9 = token_model.encode(comparison9)
print(util.pytorch_cos_sim(comparison_embeddings9[0], comparison_embeddings9[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings9[0], comparison_embeddings9[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings9[1], comparison_embeddings9[2])) # machine summary to original article

tensor([[0.8161]])
tensor([[0.8348]])
tensor([[0.8977]])
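Each scoring cell calls `util.pytorch_cos_sim` three times, once per pair. An alternative sketch (pure Python, toy vectors) normalizes every embedding to unit length once and then takes dot products, producing the full matrix of pairwise similarities; the three printed scores correspond to entries `[0][1]`, `[0][2]` and `[1][2]`. Passing the whole `comparison_embeddings9` array to `util.pytorch_cos_sim` as both arguments should return the same 3-by-3 matrix in a single call.

```python
import math
from typing import List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit length so that a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry [i][j] is the cosine similarity between vectors[i] and vectors[j]."""
    unit = [l2_normalize(v) for v in vectors]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

# Toy embeddings in the notebook's order: human summary, machine summary, article.
toy_embeddings = [[3.0, 4.0], [4.0, 3.0], [0.0, 5.0]]
sims = cosine_matrix(toy_embeddings)
labels = ["human vs machine", "human vs article", "machine vs article"]
for label, (i, j) in zip(labels, [(0, 1), (0, 2), (1, 2)]):
    print(f"{label}: {sims[i][j]:.4f}")
```

Normalizing once means each embedding's norm is computed a single time, however many pairs it takes part in; the matrix is symmetric with ones on the diagonal.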


In [101]:
comparison9

['Of his last 30 matches in 2016, Andy Murray won 28 and lost just two.',
 'The world number one has won 21 of his first 30 matches in 2017. Murray has had shingles and an elbow problem, and now his left hip is proving cause for concern. Opting out of two scheduled exhibition matches at the Hurlingham Club in London may not be too',
 'Media playback is not supported on this device Of his first 30 matches in 2017, the world number one has won 21 and lost nine. Winning his last five tournaments of 2016 to pip Novak Djokovic to the year-end number one position in the final match of the season at Londons O2 Arena was astonishing, dramatic and unforgettable. And yet it appears that relentless run of success, and the 87 matches he played over a season, has come at a price. Murrays straight-set defeat by world number 90 Jordan Thompson in the first round at Queens Club was the sixth time he has lost to a player outside the top 20 this year. He has had shingles and an elbow problem, and now hi

# Model 10

In [102]:
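# Tokenize the article as PyTorch tensors, truncated to the model's maximum input length
input10 = tokenizer(article10, return_tensors='pt', truncation=True)
# Generate summary token IDs, capped at 60 tokens
summary_ids10 = model.generate(input10['input_ids'], max_length=60, early_stopping=False)
# Decode the IDs back to text, dropping special tokens such as <s> and </s>
machineSummary10 = [tokenizer.decode(g, skip_special_tokens=True) for g in summary_ids10]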

In [103]:
machineSummary10 = listToString(machineSummary10)
summary10 = listToString(summary10)
original10 = listToString(article10)

comparison10 = [summary10, machineSummary10, original10]
token_model = SentenceTransformer('distilbert-base-nli-mean-tokens')
comparison_embeddings10 = token_model.encode(comparison10)
print(util.pytorch_cos_sim(comparison_embeddings10[0], comparison_embeddings10[1])) # human summary to machine summary similarity
print(util.pytorch_cos_sim(comparison_embeddings10[0], comparison_embeddings10[2])) # human summary to original article
print(util.pytorch_cos_sim(comparison_embeddings10[1], comparison_embeddings10[2])) # machine summary to original article

tensor([[0.7916]])
tensor([[0.7987]])
tensor([[0.7452]])
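The `-mean-tokens` suffix of `'distilbert-base-nli-mean-tokens'` refers to mean pooling: the sentence embedding is the average of the model's per-token embeddings, skipping padding positions (those marked 0 in the attention mask). A minimal pure-Python sketch of that pooling step, with made-up 2-dimensional token vectors in place of the real 768-dimensional ones:

```python
from typing import List, Sequence

def mean_pool(token_embeddings: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    kept = 0
    for vector, mask in zip(token_embeddings, attention_mask):
        if mask:
            kept += 1
            for k in range(dim):
                totals[k] += vector[k]
    if kept == 0:
        raise ValueError("attention mask selects no tokens")
    return [t / kept for t in totals]

# Three toy token vectors; the last one is padding and must not affect the average.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # -> [2.0, 3.0]
```

One consequence worth checking: the sentence-transformer truncates any input longer than `token_model.max_seq_length` tokens before pooling, so for long articles the "original article" embedding may reflect only the article's opening.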


In [104]:
comparison10

["Manager Brendan Rodgers is sure Celtic can exploit the wide open spaces of Hampden when they meet Rangers in Sunday's League Cup semi-final.",
 "Celtic face Rangers in the Scottish Cup semi-final at Hampden Park. Brendan Rodgers' side beat Rangers 5-1 at Celtic Park last month. Rodgers lost two semi-finals in his time at Liverpool and is aiming to make it third time lucky at the club he joined",
 'Im really looking forward to it - the home of Scottish football, said Rodgers ahead of his maiden visit. I hear the pitch is good, a nice big pitch suits the speed in our team and our intensity. The technical area goes right out to the end of the pitch, but you might need a taxi to get back to your staff. This will be Rodgers second taste of the Old Firm derby and his experience of the fixture got off to a great start with a 5-1 league victory at Celtic Park last month. It was a brilliant performance by the players in every aspect, he recalled. Obviously this one is on a neutral ground, but

# Conclusion

We can see that the machine summary had a higher cosine similarity to the original article than the human summary did in 70% of the models. However, this may be influenced by the fact that the machine summaries were about three times as long as the average human summary. In 20% of the models, the machine and human summaries had roughly equivalent cosine similarities to the article.

The `early_stopping=True/False` argument did not appear to have any real effect on cosine similarity at a `max_length` of 60 (we compared the 10 models with and without it and obtained similar results). This is plausible because `early_stopping` only affects beam-based decoding, so its impact depends on how many beams `model.generate` uses.

The pretrained transformers do provide relevant summaries of these articles, so there appears to be a definite use case for generating news article snippets in products like Bloomberg First Word or in other content editors. However, human summaries appear to be shorter and more semantically similar to their articles than machine summaries for articles about sports and athletes. This may be an area where Hugging Face could focus on pretraining new pipelines, transformers, and models in the future to expand their use cases.
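The percentages above are computed over all ten models. As a sketch of how such a tally could be reproduced, the following applies it to the three score triples printed in this section (Models 8 to 10) and compares the word counts of the human and machine summaries shown in the corresponding outputs. `TIE_TOLERANCE = 0.01` is a hypothetical cutoff for "roughly equivalent"; the notebook does not say which threshold was used.

```python
from collections import Counter

# Scores printed above: (human->machine, human->article, machine->article).
scores = {
    8: (0.5923, 0.6410, 0.9681),
    9: (0.8161, 0.8348, 0.8977),
    10: (0.7916, 0.7987, 0.7452),
}

# Hypothetical threshold for "roughly equivalent"; not taken from the notebook.
TIE_TOLERANCE = 0.01

def closer_to_article(human_to_article: float, machine_to_article: float) -> str:
    """Say which summary's embedding is closer to the article's embedding."""
    gap = machine_to_article - human_to_article
    if abs(gap) <= TIE_TOLERANCE:
        return "tie"
    return "machine" if gap > 0 else "human"

verdicts = {m: closer_to_article(s[1], s[2]) for m, s in scores.items()}
print(verdicts)
print(Counter(verdicts.values()))

# (human summary, machine summary) pairs copied from the comparison outputs above.
summaries = {
    8: ("Manufacturers have reported positive business trends, in the latest survey from the Scottish Chambers of Commerce.",
        "Manufacturers reported their highest growth in new orders for nearly three years. In retail, there was also a return to optimism - though only just. In tourism, firms reported improving visitor numbers in the final quarter of the year, but falling sales revenues. Construction is expecting an investment dip."),
    9: ("Of his last 30 matches in 2016, Andy Murray won 28 and lost just two.",
        "The world number one has won 21 of his first 30 matches in 2017. Murray has had shingles and an elbow problem, and now his left hip is proving cause for concern. Opting out of two scheduled exhibition matches at the Hurlingham Club in London may not be too"),
    10: ("Manager Brendan Rodgers is sure Celtic can exploit the wide open spaces of Hampden when they meet Rangers in Sunday's League Cup semi-final.",
         "Celtic face Rangers in the Scottish Cup semi-final at Hampden Park. Brendan Rodgers' side beat Rangers 5-1 at Celtic Park last month. Rodgers lost two semi-finals in his time at Liverpool and is aiming to make it third time lucky at the club he joined"),
}

# Whitespace-separated word counts, a rough proxy for summary length.
for m, (human, machine) in summaries.items():
    h, mc = len(human.split()), len(machine.split())
    print(f"Model {m}: human {h} words, machine {mc} words, ratio {mc / h:.2f}")
```

For these three models the machine summaries run roughly two to three times as many words as the human ones; the "about three times" figure in the conclusion refers to all ten models, which this sketch does not reproduce. Adding the remaining score triples and summaries to the two dictionaries would extend it to the full set.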