"Sentence Features for Extractive Summarization:
1. Surface Features: Based on structure of documents or sentences, including **sentence position** in the document, the **number of words in the sentence**, and the number of quoted words in the sentence.
2. Content Features: Integrated three well-known sentence features based on content-bearing words i.e., centroid words, **signature terms**, and **high frequency words**.
3. Event Features: An **event** is comprised of an event term and associated event elements.
4. Relevance features: Incorporated to exploit inter-sentence relationships. It is assumed that: (1) sentences related to important sentences are important; (2) sentences related to many other sentences are important. The first sentence in a document or a paragraph is important, and other sentences in a document are compared with the leading ones."

(We have chosen to use the features in bold.)

-- *Extractive Summarization Using Supervised and Semi-Supervised Learning*

http://anthology.aclweb.org/C/C08/C08-1124.pdf

In [1]:
from collections import Counter
import pandas as pd
import spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 0.0/12.8 MB ? eta -:--:--
     --------------------------------------- 0.0/12.8 MB 640.0 kB/s eta 0:00:20
     --------------------------------------- 0.1/12.8 MB 656.4 kB/s eta 0:00:20
      --------------------------------------- 0.2/12.8 MB 1.4 MB/s eta 0:00:10
     - -------------------------------------- 0.5/12.8 MB 2.6 MB/s eta 0:00:05
     --- ------------------------------------ 1.1/12.8 MB 4.6 MB/s eta 0:00:03
     ----- ---------------------------------- 1.7/12.8 MB 6.1 MB/s eta 0:00:02
     ------- -------------------------------- 2.3/12.8 MB 7.1 MB/s eta 0:00:02
     --------- ------------------------------ 3.0/12.8 MB 7.9 MB/s eta 0:00:02
     ---------- ----------------------------- 3.3/12.8 MB 8.0 MB/s eta 0:00:02
     ---------- -----------------------

In [2]:
# Load the processed and cleaned small portion of dataframes.
train_final = pd.read_csv("C:/Users/ankar/Desktop/summarization-models/final_processed-cleaned_train.csv")
val_final = pd.read_csv("C:/Users/ankar/Desktop/summarization-models/final_processed-cleaned_val.csv")
test_final = pd.read_csv("C:/Users/ankar/Desktop/summarization-models/final_processed-cleaned_test.csv")

In [3]:
# Hold the three datasets.
datasets = {'train': train_final, 'validation': val_final, 'test': test_final}

In [4]:
datasets['train'].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28711 entries, 0 to 28710
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Unnamed: 0          28711 non-null  int64 
 1   article             28711 non-null  object
 2   highlights          28711 non-null  object
 3   id                  28711 non-null  object
 4   internet-free_art   28711 non-null  object
 5   internet-free_high  28711 non-null  object
 6   boiler-free_art     28711 non-null  object
 7   boiler-free_high    28711 non-null  object
dtypes: int64(1), object(7)
memory usage: 1.8+ MB


In [5]:
datasets['validation'].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1337 entries, 0 to 1336
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Unnamed: 0          1337 non-null   int64 
 1   article             1337 non-null   object
 2   highlights          1337 non-null   object
 3   id                  1337 non-null   object
 4   internet-free_art   1337 non-null   object
 5   internet-free_high  1337 non-null   object
 6   boiler-free_art     1337 non-null   object
 7   boiler-free_high    1337 non-null   object
dtypes: int64(1), object(7)
memory usage: 83.7+ KB


In [6]:
datasets['test'].info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11490 entries, 0 to 11489
Data columns (total 8 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Unnamed: 0          11490 non-null  int64 
 1   article             11490 non-null  object
 2   highlights          11490 non-null  object
 3   id                  11490 non-null  object
 4   internet-free_art   11490 non-null  object
 5   internet-free_high  11490 non-null  object
 6   boiler-free_art     11490 non-null  object
 7   boiler-free_high    11490 non-null  object
dtypes: int64(1), object(7)
memory usage: 718.3+ KB


In [7]:
#datasets['train'].head(2)
#datasets['validation'].head(2)
datasets['test']['boiler-free_art'].head(2)

0    The Palestinian Authority officially became th...
1    Never mind cats having nine lives. A stray poo...
Name: boiler-free_art, dtype: object

In [8]:
# For readability.
# pd.set_option('display.max_colwidth', 200)
pd.set_option('max_colwidth', None)
# pd.set_option('display.max_columns', None)

### Sentence Tokenization from cleaner, internet-free and boiler-free sentences. And explode the dfs.

In [9]:
# Speed up the following feature extraction, which invloves CPU-intensive processes.
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=True) #  Use ALL of the laptop's CPU cores.

INFO: Pandarallel will run on 14 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.

https://nalepae.github.io/pandarallel/troubleshooting/


In [10]:
def sentence_tokenization(text):
    import spacy
    nlp = spacy.load('en_core_web_sm')
    doc = nlp(text)
    sentences_as_sublists = [[token.text for token in sentence] for sentence in doc.sents]
    return sentences_as_sublists

#example
#t = """Luke Shaw has conceded his debut season at Manchester United has been a frustrating one and admits he doesn't know what he would do if he scored against boyhood club Chelsea on Saturday. Shaw joined United last summer in a £31.5million transfer from Southampton after an impressive campaign on the south coast, which resulted in him playing for England at the World Cup. However, injuries have plagued the 19-year-old's start to life at Old Trafford with the defender picking up his latest fitness setback during United's 2-1 FA Cup sixth-round exit at home to Arsenal last month. Luke Shaw admits he has endured a 'frustrating' debut season at Manchester United giving himself a 'C-' The 19-year-old's campaign has been beset by injuries since joining from Southampton last summer. """
#exa = sentence_tokenization(t)
#print(exa)

In [11]:
for name, df in datasets.items():
    # Tokenize sentences for articles and highlights
    df['art_tokenized'] = df['boiler-free_art'].parallel_apply(sentence_tokenization)
    df['high_tokenized'] = df['boiler-free_high'].parallel_apply(sentence_tokenization)
    # Explode the dataframe so each row contains one sentence.
    df = df.explode('art_tokenized')
    df = df.rename(columns={'art_tokenized': 'sentences'})
    df = df.reset_index(drop=True)
    datasets[name] = df

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=2051), Label(value='0 / 2051'))), …

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=2051), Label(value='0 / 2051'))), …

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=96), Label(value='0 / 96'))), HBox…

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=96), Label(value='0 / 96'))), HBox…

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=821), Label(value='0 / 821'))), HB…

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=821), Label(value='0 / 821'))), HB…

In [12]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train']['high_tokenized'].head(2)

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized'],
      dtype='object')


0    [[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, ., \n], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, ., \n], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]
1    [[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, ., \n], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, ., \n], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]
Name: high_tokenized, dtype: object

In [13]:
datasets['train']['sentences'].head(2)

0    [Lukas, Podolski, scored, his, first, goal, since, returning, to, Cologne, but, it, was, not, enough, as, they, crashed, to, a, defeat, 2, -, 1, against, Schalke, which, leaves, them, bottom, of, the, German, Bundesliga, .]
1                                                                                             [Podolski, celebrates, his, first, goal, since, returning, to, Cologne, ,, but, it, did, not, stop, his, side, sliding, to, defeat, .]
Name: sentences, dtype: object

In [14]:
# Remove newline token from the tokenized highlights (high_tokenized) column.
for name, df in datasets.items():
    if 'high_tokenized' in df.columns:
        df['high_tokenized'] = df['high_tokenized'].parallel_apply(lambda sentences: [[token for token in sentence if token != '\n'] for sentence in sentences])
    datasets[name] = df

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=79435), Label(value='0 / 79435')))…

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=3204), Label(value='0 / 3204'))), …

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=28291), Label(value='0 / 28291')))…

In [15]:
datasets['train']['high_tokenized'].head(2) #OK.

0    [[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]
1    [[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]
Name: high_tokenized, dtype: object

## Sentence Embeddings

In [16]:
!pip install gensim



In [17]:
import gensim.downloader as api
import numpy as np

# Each row in the sentences column contains a list representing a single sentence, with each word as an element in this list.

model = api.load("word2vec-google-news-300")

def get_mean_sentence_embeddings(df, model):
    mean_embeddings = []
    
    for sentence in df['sentences']:
        if isinstance(sentence, list):
            word_embeddings = [model[word] for word in sentence if word in model.key_to_index]
            if word_embeddings:
                mean_embedding = np.mean(word_embeddings, axis=0)
            else:
                mean_embedding = np.zeros(model.vector_size)
            mean_embeddings.append(mean_embedding)
        else:
            mean_embeddings.append(np.zeros(model.vector_size))

    return mean_embeddings

for name, df in datasets.items():
    df['mean_sent_embeddings_art'] = get_mean_sentence_embeddings(df, model)
    datasets[name] = df

In [19]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
#datasets['train']['mean_sent_embeddings_art'].head(2)
datasets['validation'].head()
#datasets['test'].head(2)

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art'],
      dtype='object')


Unnamed: 0.1,Unnamed: 0,article,highlights,id,internet-free_art,internet-free_high,boiler-free_art,boiler-free_high,sentences,high_tokenized,mean_sent_embeddings_art
0,8477,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,911d354c3ae2eefdd0b3e2e4de98f35205da1fe4,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video. Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks and Karl, nine, Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape.",Critics have made cruel taunts about Rebecca Ferguson's post-baby body. \r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead.,"[Since, giving, birth, to, her, daughter, Arabella, four, months, ago, ,, Rebecca, Ferguson, has, become, the, victim, of, cruel, taunts, about, her, size, 12, post, -, pregnancy, body, .]","[[Critics, have, made, cruel, taunts, about, Rebecca, Ferguson, 's, post, -, baby, body, ., \r\n], [But, singer, , is, refusing, to, bow, to, pressure, to, ', snap, back, into, shape, ', \r\n, She, wants, to, enjoy, being, a, new, mother, to, four, -, month, -, old, Arabella, instead, .]]","[0.027662193, 0.05405393, -0.03408747, -0.030848628, 0.013147437, -0.029952671, 0.043759555, -0.12509023, 0.052129332, 0.062703006, 0.06360593, -0.14858279, -0.02269382, 0.09056838, -0.017980991, 0.03302798, 0.022392357, 0.083795965, -0.02055956, -0.050559666, -0.031165082, 0.04600724, 0.027901027, -0.04097217, 0.023095172, -0.08387624, -0.08010068, 0.027871836, 0.07533795, -0.036785625, 0.0012114152, -0.05086086, -0.086457625, -0.04640828, -0.022938604, -0.013645338, 0.027804168, -0.05839207, -0.03867307, 0.037342403, 0.049364172, -0.033886455, 0.050032906, -0.03487811, 0.054727305, -0.008356509, 0.038465418, 0.04365075, 0.043918777, 0.12766697, -0.016175643, -0.038198058, -0.031019127, 0.003736413, -0.030589228, 0.031589676, -0.075643785, -0.08582737, 0.028790018, -0.069006875, 0.040936016, 0.09072942, -0.025946079, -0.036047895, 0.034641434, 0.004111912, 0.03821995, 0.035663106, -0.04493614, 0.09575089, 0.047716223, 0.0016982037, -4.1795814e-05, 0.08574776, -0.116063654, 0.04603643, 0.0991808, 0.04116158, 0.103743844, 0.17792478, 0.040455695, -0.076116145, -0.009408702, -0.0017992102, -0.10372262, -0.05384893, -0.0521081, 0.09281855, 0.02728006, 0.01097306, 0.043600332, -0.04699707, -0.0903453, -0.06373265, -0.06275741, 0.006252123, 0.034121305, -0.02351114, 0.055457074, 0.047249172, ...]"
1,8477,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,911d354c3ae2eefdd0b3e2e4de98f35205da1fe4,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video. Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks and Karl, nine, Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape.",Critics have made cruel taunts about Rebecca Ferguson's post-baby body. \r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead.,"[But, the, singer, has, now, hit, back, at, her, critics, and, vowed, to, resist, the, pressure, to, slim, down, .]","[[Critics, have, made, cruel, taunts, about, Rebecca, Ferguson, 's, post, -, baby, body, ., \r\n], [But, singer, , is, refusing, to, bow, to, pressure, to, ', snap, back, into, shape, ', \r\n, She, wants, to, enjoy, being, a, new, mother, to, four, -, month, -, old, Arabella, instead, .]]","[0.06628418, 0.07335663, 0.03889656, 0.013214111, -0.045043945, 0.0008125305, -0.0082092285, -0.0953064, 0.08425522, 0.07511902, -0.060092926, -0.11382294, 0.01039505, 0.049114227, -0.10839081, 0.013858795, 0.06663132, 0.042678833, -0.016020775, -0.08219147, 0.017364502, 0.08538818, 0.035087585, -0.03607273, 0.06313133, -0.058288574, -0.07086182, -0.04826355, 0.045591354, -0.018550873, 0.05378723, 0.031562805, -0.07662201, -0.0023477077, 0.029876709, -0.016181946, 0.014930725, 0.01802826, 0.030150414, 0.057994843, 0.093198776, -0.12684631, 0.08836365, -0.09148097, 0.00031089783, 0.025989532, -0.063331604, 0.08298111, 0.12960434, 0.027451158, 0.027014613, 0.029894352, -0.023597717, 0.014839172, -0.041412354, 0.07449341, -0.083833694, -0.049171448, -0.0035133362, -0.052749634, -0.049079895, 0.07566643, -0.07615757, -0.09806824, 0.00088500977, -0.0024528503, 0.075302124, 0.14154416, 0.0065727234, 0.16720581, 0.06377792, 0.014798701, 0.10624695, 0.123435974, -0.1520977, -0.03581524, 0.06338501, 0.10535622, 0.050373077, 0.106666565, 0.033546448, -0.03796959, 0.07025528, -0.04150486, -0.024875045, -0.031104565, -0.10054016, 0.1467514, -0.038017273, 0.018484272, 0.018119812, 0.0015487671, -0.04067993, -0.08946228, -0.07416153, -0.0014572144, 0.032115936, 0.0070824623, -0.006454468, -0.010546684, ...]"
2,8477,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,911d354c3ae2eefdd0b3e2e4de98f35205da1fe4,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video. Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks and Karl, nine, Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape.",Critics have made cruel taunts about Rebecca Ferguson's post-baby body. \r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead.,"[Speaking, to, Fabulous, , magazine, ,, the, 28, -, year, -, old, former, X, Factor, star, confessed, that, recent, comments, aimed, at, her, weight, ,, and, questions, about, when, she, 'll, ', drop, the, post, -, baby, pounds, ', ,, have, left, her, more, determined, than, ever, to, stand, up, for, herself, .]","[[Critics, have, made, cruel, taunts, about, Rebecca, Ferguson, 's, post, -, baby, body, ., \r\n], [But, singer, , is, refusing, to, bow, to, pressure, to, ', snap, back, into, shape, ', \r\n, She, wants, to, enjoy, being, a, new, mother, to, four, -, month, -, old, Arabella, instead, .]]","[0.039373543, 0.0461594, 0.037656344, 0.010656014, -0.0042063394, -0.003594814, 0.020003881, -0.08517691, 0.05518204, 0.052551366, 0.009244087, -0.08413618, 0.03212562, 0.035916843, -0.06674507, 0.044044692, 0.0066982172, 0.07384237, -0.022517668, -0.007945917, 0.004550885, 0.04316203, -0.028246757, 0.018836193, 0.018307418, -0.03831047, -0.10495073, 0.07605059, 0.06404701, -0.039921686, -0.021752577, -0.006609794, -0.048145782, -0.03129695, 0.024648422, 0.0032239084, 0.041323148, 0.0037403596, -0.028609056, 0.05843979, 0.046205178, -0.12114833, 0.08325281, -0.00061035156, 0.008480365, 0.0043108035, -0.02550487, 0.043135423, 0.028048784, 0.050926406, 0.021499438, 0.0072095823, -0.011064579, 0.0067643384, -0.007688669, 0.024274778, -0.026511168, -0.09739567, 0.010630877, -0.07104727, -0.010869735, 0.0547591, -0.06546881, -0.027394613, 0.014532628, -0.0063899113, 0.0019241724, 0.07704906, -0.011382177, 0.10902092, 0.025121445, 0.014560235, 0.06299063, 0.08589133, -0.10238256, -0.007702167, 0.08427625, 0.10188137, -0.0062052407, 0.098703235, 0.026389098, -0.10553135, 0.0229344, -0.027184706, -0.07621413, -0.070783176, -0.0404935, 0.13321628, 0.03776785, 0.02529203, 0.03870881, -0.020652967, -0.06426728, -0.097318895, -0.04338817, -0.0330879, 0.06479664, 0.03046926, 0.011280549, 0.01617823, ...]"
3,8477,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,911d354c3ae2eefdd0b3e2e4de98f35205da1fe4,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video. Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks and Karl, nine, Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape.",Critics have made cruel taunts about Rebecca Ferguson's post-baby body. \r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead.,"[Scroll, down, for, video, ., ]","[[Critics, have, made, cruel, taunts, about, Rebecca, Ferguson, 's, post, -, baby, body, ., \r\n], [But, singer, , is, refusing, to, bow, to, pressure, to, ', snap, back, into, shape, ', \r\n, She, wants, to, enjoy, being, a, new, mother, to, four, -, month, -, old, Arabella, instead, .]]","[-0.021438599, -0.024757385, 0.0869751, 0.13012695, 0.051483154, 0.008178711, 0.120155334, -0.1060791, 0.22851562, 0.07556152, -0.04534912, -0.14007568, 0.12475586, 0.12927246, -0.093048096, -0.11682129, 0.18640137, -0.1060791, -0.13684082, -0.2487793, 0.11816406, 0.112854004, -0.029510498, 0.047180176, 0.12237549, -0.038269043, -0.08882141, 0.14978027, 0.15849304, 0.0028076172, -0.016944885, -0.003768921, -0.10290527, -0.05051422, -0.044891357, -0.06829834, -0.023742676, 0.017486572, 0.00090026855, 0.036956787, -0.04071045, -0.13699341, 0.06777954, 0.0949707, 0.03894043, 0.048461914, 0.026306152, -0.06427002, 0.10925293, -0.13470459, -0.16232729, -0.020065308, 0.16973877, -0.06185913, 0.06365967, 0.05354309, -0.043136597, -0.00018310547, 0.0721817, 0.10479736, 0.0017089844, -0.05041504, -0.04296875, -0.06335449, -0.032165527, -0.032592773, -0.029327393, -0.009140015, 0.15637207, 0.12695312, -0.09460449, 0.03717041, -0.07055664, -0.00793457, 0.031921387, -0.20893478, 0.19750977, 0.08099365, 0.11482239, 0.0076904297, 0.08258057, -0.076187134, 0.074920654, -0.08223343, 0.13535309, -0.04296875, -0.12478638, 0.18859863, -0.0064697266, 0.03793335, 0.15106201, 0.15362549, -0.018920898, -0.1517334, -0.1030426, -0.01586914, 0.0066223145, 0.06286621, -0.054504395, 0.14465332, ...]"
4,8477,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,911d354c3ae2eefdd0b3e2e4de98f35205da1fe4,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video . Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks (Left: Rebecca during her third trimester and right: Rebecca in a bikini last month . She said 'There's this culture where we celebrate people snapping back into shape a week after the birth, but I don't want to be one of those people - I just want to enjoy Arabella.' Rebecca, who has never named the father of Arabella publicly, also has two other children called Lillie, aged 10, and Karl, nine, with her former boyfriend and teenage sweetheart Karl Dures, 29. And the singer now admits that she regrets not having had this empowering new mindset after she had her other kids. Describing how she felt as a new mum in previous years, the Get Happy singer said, 'When I look back, I think I should have just enjoyed my babies and not worried about my weight or being skinny. 'I'm curvy, I've got thighs, but I'm not big. I eat healthily and I'm breastfeeding, so I know the weight will come off naturally.' Rebecca pictured with her two older children Lillie 10, (left) and Karl, nine, (right) Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined (Pictured left: In concert at the St James Theatre, London, last month, and right: on This Morning last week) She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible (Rebecca pictured with her newborn) Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape .",Critics have made cruel taunts about Rebecca Ferguson's post-baby body .\r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead .,"Since giving birth to her daughter Arabella four months ago, Rebecca Ferguson has become the victim of cruel taunts about her size 12 post-pregnancy body. But the singer has now hit back at her critics and vowed to resist the pressure to slim down. Speaking to Fabulous magazine, the 28-year-old former X Factor star confessed that recent comments aimed at her weight, and questions about when she'll 'drop the post-baby pounds', have left her more determined than ever to stand up for herself. Scroll down for video. Rebecca Ferguson gave birth to gorgeous baby Arabella just four months ago, but has faced constant questions, even from those close to her, about how and when she'll lose her 'baby weight' The singer says even though she's only a size 12 following her third pregnancy, there has been a lot of pressure on her as a celebrity mum to 'snap back into shape' within a few weeks and Karl, nine, Currently promoting her new album Lady Sings The Blues, it would be easy for Rebecca to feel the need to slim down for upcoming photo shoots and interviews. But while some celebrities are famed for getting their impossibly taut figures back just weeks after giving birth, Rebecca says she's not feeling that pressure this time around. Instead, she's standing up for new mums everywhere, by taking aim at the critics who try to publicly shame women into losing weight. The former X Factor contestant has refused to name and shame her weight critics, but revealed some have been a lot closer to home than she ever imagined She said: 'Women's bodies are amazing, what our bodies can do is incredible so it's sad that we get distracted - all this stuff about being skinny, be this, be that, they're all distractions. 'They rob us of what we should be focusing on and that's sad.' Though the singer has refused to name and shame her own critics, she has revealed that they're a lot closer home than she ever imagined. 'I just want to enjoy Arabella,' said Rebecca, who previously worried about dropping post-baby pounds as quickly as possible Rebecca also spoke of her toughest year yet during the interview, when her partner walked out after discovering she was pregnant with Arabella. She revealed that although it has been hard to live life as a single mother again, she is now moving forward and enjoying her life as a new mum. The star hasn't been left completely on her own though - her ex-partner Karl pops by regularly to help look after the children. And while Rebecca admits that many of their close friends and family would love to see them get back together, and she is open to the idea, she is enjoying being on her own for the time being. Rebecca flaunted her new curves on the red carpet at the Brit Awards last month and says she's proud of her womanly shape.",Critics have made cruel taunts about Rebecca Ferguson's post-baby body. \r\nBut singer is refusing to bow to pressure to 'snap back into shape'\r\nShe wants to enjoy being a new mother to four-month-old Arabella instead.,"[Rebecca, Ferguson, gave, birth, to, gorgeous, baby, Arabella, just, four, months, ago, ,, but, has, faced, constant, questions, ,, even, from, those, close, to, her, ,, about, how, and, when, she, 'll, lose, her, ', baby, weight, ']","[[Critics, have, made, cruel, taunts, about, Rebecca, Ferguson, 's, post, -, baby, body, ., \r\n], [But, singer, , is, refusing, to, bow, to, pressure, to, ', snap, back, into, shape, ', \r\n, She, wants, to, enjoy, being, a, new, mother, to, four, -, month, -, old, Arabella, instead, .]]","[0.07945585, 0.03116878, -0.017548624, 0.0115201315, 0.011223094, -0.028177897, 0.02522583, -0.09628194, 0.03105774, 0.093652345, 0.011807251, -0.12592722, -0.039495848, 0.04215304, -0.070240274, 0.047951255, 0.051271882, 0.106073, -0.04948171, -0.038567606, -0.07802328, 0.017822266, 0.04289932, -0.010419464, 0.03439331, -0.036107317, -0.07584737, 0.03157145, 0.050075278, -0.07332357, -0.022212196, -0.01379598, -0.07698161, -0.044043098, -0.029292997, -0.029631043, 0.037473552, -0.045408122, -0.05388743, 0.031387966, 0.07046712, -0.09456991, 0.087154135, -0.019069545, 0.024141438, 0.02154948, 0.017970784, 0.017529614, 0.051208496, 0.08221232, -0.010247294, 0.0136703495, -0.044740804, 0.016019695, -0.033215333, 0.023954265, -0.035413615, -0.060117595, 0.045495607, -0.06279297, 0.013574218, 0.057317607, -0.054815672, -0.048219807, 0.06969808, -0.006032308, -0.0042531015, 0.054874673, -0.033233516, 0.078875735, 0.05006612, 0.009901683, 0.024999492, 0.07249654, -0.1174174, 0.04287923, 0.07728882, 0.091499835, 0.023565674, 0.15689544, 0.003277588, -0.067134604, 0.031513467, -0.031046549, -0.12768148, -0.076453656, -0.0541982, 0.11446737, -0.008430989, 0.035653688, 0.050678633, 0.02258962, -0.11199341, -0.059719723, -0.10900065, -0.054656472, 0.04755656, 0.026929982, 0.024541982, 0.016853206, ...]"


In [20]:
# Check length. OK.
for name, df in datasets.items():
    print(f"--- {name.upper()} DATASET ---")
    for index, row in df.head().iterrows():
        embedding_length = len(row['mean_sent_embeddings_art'])
        print(f"Row {index}: Embedding Length = {embedding_length}")


--- TRAIN DATASET ---
Row 0: Embedding Length = 300
Row 1: Embedding Length = 300
Row 2: Embedding Length = 300
Row 3: Embedding Length = 300
Row 4: Embedding Length = 300
--- VALIDATION DATASET ---
Row 0: Embedding Length = 300
Row 1: Embedding Length = 300
Row 2: Embedding Length = 300
Row 3: Embedding Length = 300
Row 4: Embedding Length = 300
--- TEST DATASET ---
Row 0: Embedding Length = 300
Row 1: Embedding Length = 300
Row 2: Embedding Length = 300
Row 3: Embedding Length = 300
Row 4: Embedding Length = 300


In [21]:
# Clculate mean embeddings for tokenied highlights column (sublists within lists).

model = api.load("word2vec-google-news-300")

def calculate_avg_word2vec_list(sentence_list, model):
    
    avg_word2vec_list = []
    
    for sentence in sentence_list:
        # Filter out words that are not in the model's vocabulary.
        valid_words = [word for word in sentence if word in model.key_to_index]
        # Check if there are valid words before calculating the average.
        if valid_words:
            # Use numpy's vstack to vertically stack the word vectors.
            word_vectors = np.vstack([model[word] for word in valid_words])

            # Calculate the mean along the first axis (axis=0).
            avg_word2vec = np.mean(word_vectors, axis=0)

            avg_word2vec_list.append(avg_word2vec)
        else:
            # If no valid words, append a zero vector.
            avg_word2vec_list.append(np.zeros(model.vector_size))

    return avg_word2vec_list

for name, df in datasets.items():
    df['mean_sent_embeddings_high'] = df['high_tokenized'].apply(calculate_avg_word2vec_list, model=model)
    datasets[name] = df

In [22]:
datasets['train']['mean_sent_embeddings_high'].head(2)

0    [[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0

In [23]:
# Check length. OK
for name, df in datasets.items():
    print(f"--- {name.upper()} DATASET ---")
    for index, row in df.head().iterrows():
        # Iterate over each sublist in the 'mean_sent_embeddings_high' list.
        for sublist_index, sublist in enumerate(row['mean_sent_embeddings_high']):
            embedding_length = len(sublist)
            print(f"Row {index}, Sublist {sublist_index}: Embedding Length = {embedding_length}")


--- TRAIN DATASET ---
Row 0, Sublist 0: Embedding Length = 300
Row 0, Sublist 1: Embedding Length = 300
Row 0, Sublist 2: Embedding Length = 300
Row 1, Sublist 0: Embedding Length = 300
Row 1, Sublist 1: Embedding Length = 300
Row 1, Sublist 2: Embedding Length = 300
Row 2, Sublist 0: Embedding Length = 300
Row 2, Sublist 1: Embedding Length = 300
Row 2, Sublist 2: Embedding Length = 300
Row 3, Sublist 0: Embedding Length = 300
Row 3, Sublist 1: Embedding Length = 300
Row 3, Sublist 2: Embedding Length = 300
Row 4, Sublist 0: Embedding Length = 300
Row 4, Sublist 1: Embedding Length = 300
Row 4, Sublist 2: Embedding Length = 300
--- VALIDATION DATASET ---
Row 0, Sublist 0: Embedding Length = 300
Row 0, Sublist 1: Embedding Length = 300
Row 1, Sublist 0: Embedding Length = 300
Row 1, Sublist 1: Embedding Length = 300
Row 2, Sublist 0: Embedding Length = 300
Row 2, Sublist 1: Embedding Length = 300
Row 3, Sublist 0: Embedding Length = 300
Row 3, Sublist 1: Embedding Length = 300
Row 4, S

In [24]:
# Check types.
for name, df in datasets.items():
    first_non_null_entry_art = df['mean_sent_embeddings_art'].dropna().iloc[0]
    first_non_null_entry_high = df['mean_sent_embeddings_high'].dropna().iloc[0]
    print(f"Dataset: {name}")
    print(f"Type in 'mean_sent_embeddings_art': {type(first_non_null_entry_art)}")
    print(f"Type in 'mean_sent_embeddings_high': {type(first_non_null_entry_high)}\n")

Dataset: train
Type in 'mean_sent_embeddings_art': <class 'numpy.ndarray'>
Type in 'mean_sent_embeddings_high': <class 'list'>

Dataset: validation
Type in 'mean_sent_embeddings_art': <class 'numpy.ndarray'>
Type in 'mean_sent_embeddings_high': <class 'list'>

Dataset: test
Type in 'mean_sent_embeddings_art': <class 'numpy.ndarray'>
Type in 'mean_sent_embeddings_high': <class 'list'>



## Cosine Similarity (labels)

In [None]:
#Alternative way without 'parallel_apply' for a smaller dataset (1%).

#from sklearn.metrics.pairwise import cosine_similarity

#def calculate_cosine_similarity(sentence, highlights_avg_word2vec_list):
 #   return [cosine_similarity([sentence], [highlight_sentence])[0][0] for highlight_sentence in highlights_avg_word2vec_list]
    
#for name, df in datasets.items():
 #   df['cosine_similarity_scores'] = df.apply(lambda row: calculate_cosine_similarity(row['mean_sent_embeddings_art'], row['mean_sent_embeddings_high']), axis=1)
  #  datasets[name] = df

In [25]:
# For larger dataset.
def calculate_row_cosine_similarity(row):
    from sklearn.metrics.pairwise import cosine_similarity
    def calculate_cosine_similarity(sentence, highlights_avg_word2vec_list):
        return [cosine_similarity([sentence], [highlight_sentence])[0][0] for highlight_sentence in highlights_avg_word2vec_list]

    return calculate_cosine_similarity(row['mean_sent_embeddings_art'], row['mean_sent_embeddings_high'])

for name, df in datasets.items():
    df['cosine_similarity_scores'] = df.parallel_apply(calculate_row_cosine_similarity, axis=1)
    datasets[name] = df

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=79435), Label(value='0 / 79435')))…

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=3204), Label(value='0 / 3204'))), …

VBox(children=(HBox(children=(IntProgress(value=0, description='0.00%', max=28291), Label(value='0 / 28291')))…

In [26]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train']['cosine_similarity_scores'].head()
#datasets['validation'].head()
#datasets['test'].head(2)

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores'],
      dtype='object')


0      [0.8736709, 0.7377715, 0.8184136]
1      [0.82316303, 0.6298492, 0.704492]
2    [0.75521606, 0.71392655, 0.7321466]
3    [0.7498019, 0.75414014, 0.73623145]
4     [0.6759272, 0.51974016, 0.8333621]
Name: cosine_similarity_scores, dtype: object

In [27]:
datasets['train'].head(1)

Unnamed: 0.1,Unnamed: 0,article,highlights,id,internet-free_art,internet-free_high,boiler-free_art,boiler-free_high,sentences,high_tokenized,mean_sent_embeddings_art,mean_sent_embeddings_high,cosine_similarity_scores
0,15378,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,"Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat. \nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win. \nThe defeat leaves Cologne on bottom of the Bundesliga table after five games.,"[Lukas, Podolski, scored, his, first, goal, since, returning, to, Cologne, but, it, was, not, enough, as, they, crashed, to, a, defeat, 2, -, 1, against, Schalke, which, leaves, them, bottom, of, the, German, Bundesliga, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.11470295, 0.075837694, 0.035587706, 0.01364241, -0.03404762, -0.080190465, -0.009287867, -0.12817383, 0.075314224, 0.10144832, 0.014489535, -0.07114384, 0.010094873, 0.051844105, -0.08103456, 0.047095988, 0.05078651, 0.11288952, 0.040007494, -0.01411563, 0.01991364, 0.07349791, 0.015920179, -0.0034584834, 0.067733236, -0.034187052, -0.06529078, 0.013993888, 0.020629358, 0.041065086, -0.032926865, -0.027613146, -0.016833074, -0.075299494, 0.009965568, 0.04200797, -0.028869629, 0.03187403, 0.04321618, 0.054114375, 0.101246804, -0.060622644, 0.1292935, 0.004960159, 0.019059017, 0.0054305503, 0.06652148, -0.049942147, 0.083613954, 0.05633545, -0.05737252, 0.041719634, -0.005005935, -0.02816299, -0.053125843, 0.03927928, -0.047921542, -0.024672937, -0.05697474, -0.04516233, -0.04006853, 0.031866666, -0.09612406, -0.100002944, -0.071878366, -0.047021996, 0.08755046, 0.055411242, -0.0044618803, 0.17888772, 0.025567416, 0.037632383, 0.01860099, 0.0021083436, -0.1098717, -0.084859915, 0.08041592, -0.016483702, 0.04156073, 0.02110343, -0.01001608, -0.16271973, -0.009807192, -0.002482447, 0.045052364, -0.023081813, -0.013436153, 0.103066415, -0.0020857186, -0.0045860554, 0.0072289826, -0.020229997, -0.054310765, 0.0021333366, -0.043879017, -0.121284746, -0.01109998, -0.04515865, -0.024664517, 0.039232977, ...]","[[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0.07494413, -0.038639948, 0.04403452, 0.039926384, -0.048346885, -0.16545223, 0.0055213342, 0.0019437349, 0.051926833, -0.02926401, -0.012282152, 0.028243138, 0.06518085, -0.016564002, 0.022944523, -0.0024977464, -0.056434043, -0.0046686027, -0.06918804, -0.048183735, -0.0055213342, -0.07339243, -0.04894785, 0.04185838, ...], [0.13324529, 0.023004705, 0.056737725, 0.025457209, 0.14613897, -0.1620428, 0.0014703925, -0.1521218, 0.039115213, 0.16493502, -0.024419611, 0.031194514, -0.08743009, 0.15665506, 0.04331554, 0.078352496, 0.0865756, 0.14964156, 0.0052504106, 0.044633344, 0.09302538, 0.08001154, 0.06681685, 0.07500874, 0.08291349, -0.019042969, -0.02117365, -0.1657049, -0.090166956, 0.07063432, 0.012473366, -0.08670876, 0.034773394, -0.02497725, 0.012606534, -0.008891713, 0.001492587, -0.016645951, -0.094415836, 0.040960137, 0.018488104, -0.07697088, 0.1373402, 0.097844906, 0.08389005, -0.05600114, 0.075881265, -0.1275385, 0.057328656, 0.014093572, -0.1451416, 0.08566561, -0.06463623, -0.10904763, -0.051291727, 0.050187543, 0.03726751, -0.011014071, -0.10653548, -0.111039594, 0.023541538, 0.004928589, -0.12213135, -0.089899234, -0.03803045, -0.057084516, 0.06754103, -0.0063143643, 0.03427124, 0.22185725, 0.0148038, 0.0049327505, -0.037852895, -0.06916948, -0.19472434, -0.0649858, 0.12940702, -0.052592885, 0.107022375, 0.008150968, -0.11157782, -0.042236328, -0.022799404, 0.046031605, 0.10929454, -0.004705256, 0.006525213, 0.05543102, -0.0628995, -0.06668923, 0.02631725, -0.05559748, -0.1337738, 0.016512783, -0.06338778, -0.17221694, -0.028617166, -0.065138385, -0.036155008, 0.089377664, ...], [0.08287557, 0.0826416, 0.06524086, 0.037180584, 0.007334391, -0.040649414, -0.051605225, -0.15689214, 0.08514404, 0.15497844, 0.0048472085, -0.013264974, 0.012659709, 0.09476725, -0.034556706, 0.063090004, 0.026980082, 0.049856823, 0.043874104, -0.016255697, 0.036254883, 0.07067871, -0.059000652, -0.0014731089, 0.071634926, -0.012868245, -0.053273518, 0.04812622, 0.04282506, 0.0037434895, -0.08670553, 0.020209631, -0.015749613, 0.035550434, 0.022364298, 0.029851278, -0.07522583, 0.06434631, 0.046028137, 0.025021872, 0.036468506, -0.10751343, 0.07714844, -0.0028457642, 0.03074646, -0.024305979, 0.04122035, -0.12137858, 0.088643394, 0.0872701, -0.078679405, 0.058970135, -0.0025990803, -0.091181435, -0.06669108, -0.012817383, -0.028500875, -0.03829956, 0.014338176, -0.06524658, -0.15384929, 0.05472533, -0.11427816, -0.09490458, -0.09451294, -0.04926936, 0.116249084, 0.062235516, 0.046596527, 0.15892537, 0.06144206, -0.024823507, 0.060872395, 0.025533041, -0.08202108, -0.11945597, 0.081593834, 0.0037282307, 0.032503765, 0.0003744761, -0.03922526, -0.22275798, -0.018636068, 0.0056355796, 0.073903404, -0.06983948, -0.025390625, 0.09335327, -0.008656819, 0.046351116, 0.03605652, 0.043375652, -0.09618632, -0.08617592, -0.05771637, -0.094182335, 0.05159505, -0.104400635, -0.027852377, 0.08984375, ...]]","[0.8736709, 0.7377715, 0.8184136]"


In [28]:
# Inspect cosine similarity scores and the articles/highlights to decide on the threshold.
datasets['train'][['id', 'sentences', 'high_tokenized', 'cosine_similarity_scores']].tail(40)


Unnamed: 0,id,sentences,high_tokenized,cosine_similarity_scores
1112038,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[Embrace, :, James, Holleran, , is, comforted, by, a, mourner, ahead, of, the, funeral, of, his, 19, -, year, -, old, daughter, Madison, ., ]","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.6680949, 0.58524823, 0.64714825, 0.5218967, 0.25912905]"
1112039,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[Heartbreaking, :, Mourners, ,, including, James, Holleran, , leave, a, memorial, mass, for, Madison, Holleran, at, Guardian, Angel, Church, in, Allendale, ,, New, Jersey, on, January, 21, ., ]","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.646528, 0.5998357, 0.44556028, 0.60113263, 0.29603457]"
1112040,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[Days, after, Holleran, 's, death, ,, more, than, 600, devastated, relatives, and, friends, paid, their, respects, at, a, funeral, service, at, the, Guardian, Angel, Church, in, Allendale, ,, New, Jersey, .]","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.6824951, 0.6384062, 0.574244, 0.5617416, 0.39566964]"
1112041,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[During, the, mass, ,, Mr, Holleran, said, that, his, daughter, used, to, rally, her, team, by, saying, :, ', Now, is, a, time, to, be, strong, ., ']","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.6578653, 0.5314129, 0.725236, 0.54141057, 0.4819475]"
1112042,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[He, added, :, ', Today, ,, we, all, have, to, be, strong, for, Madison, ., ']","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.53882, 0.42934653, 0.53986394, 0.45693952, 0.54796517]"
1112043,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[He, then, urged, the, congregation, to, learn, from, the, loss, of, an, ', iconic, ', young, woman, :, ', Please, seek, therapy, if, you, need, it, .]","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.6165684, 0.5082765, 0.64691746, 0.5265027, 0.40964505]"
1112044,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[This, is, not, a, weakness, ,, but, a, struggle, ., ']","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.37111458, 0.35013446, 0.4688241, 0.34386814, 0.41440636]"
1112045,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[He, also, led, them, in, the, Serenity, Prayer, ,, saying, :, ', God, ,, grant, me, the, serenity, to, accept, the, things, I, can, not, change, ,, the, courage, to, change, the, things, I, can, and, the, wisdom, to, know, the, difference, ., ']","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.5672146, 0.47128528, 0.6104403, 0.47538996, 0.41022947]"
1112046,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[Scene, :, Madison, Holleran, jumped, from, this, Spruce, St, ,, Philadelphia, parking, garage, to, her, death, on, January, 17, ., ]","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.7057005, 0.5920926, 0.50649524, 0.54206926, 0.28533745]"
1112047,c6b2303e6ee4af78227b17243dbd0a93c3087e2f,"[For, confidential, support, in, the, U.S., ,, call, the, National, Suicide, Prevention, Line, on, 1, -, 800, -, 273, -, 8255, .]","[[Madison, Holleran, ,, 19, ,, took, her, life, in, Philadelphia, on, January, 17, after, leaving, a, suicide, note, and, gifts, for, her, family, .], [The, University, of, Pennsylvania, freshman, 's, death, shocked, the, school, and, her, hometown, in, New, Jersey, .], [Her, father, said, she, was, struggling, with, school, work, ,, had, shared, her, suicidal, thoughts, with, her, family, and, was, seeing, a, psychiatrist, .], [Her, former, teacher, , Edward, G., Modica, has, posted, a, petition, on, MoveOn.org, seeking, ', Madison, Holleran, Law, ', ,, requiring, universities, to, keep, statistics, on, suicides, and, attempted, suicides, by, students, .], [It, has, more, than, 3380, signatures, .]]","[0.4996145, 0.426006, 0.35944104, 0.5215356, 0.3578741]"


In [29]:
pandarallel.initialize(progress_bar=False) # Hide output becasue the next cell crashes if set to True.

INFO: Pandarallel will run on 14 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.

https://nalepae.github.io/pandarallel/troubleshooting/


In [30]:
def get_max_cosine_similarity(cosine_similarity_list):
    if cosine_similarity_list:
        return max(cosine_similarity_list)
    else:
        return 0
    
for name, df in datasets.items():
    df['max_cosine_similarity'] = df['cosine_similarity_scores'].parallel_apply(get_max_cosine_similarity)
    datasets[name] = df

In [31]:
datasets['train'][['cosine_similarity_scores', 'max_cosine_similarity']].tail(40)


Unnamed: 0,cosine_similarity_scores,max_cosine_similarity
1112038,"[0.6680949, 0.58524823, 0.64714825, 0.5218967, 0.25912905]",0.668095
1112039,"[0.646528, 0.5998357, 0.44556028, 0.60113263, 0.29603457]",0.646528
1112040,"[0.6824951, 0.6384062, 0.574244, 0.5617416, 0.39566964]",0.682495
1112041,"[0.6578653, 0.5314129, 0.725236, 0.54141057, 0.4819475]",0.725236
1112042,"[0.53882, 0.42934653, 0.53986394, 0.45693952, 0.54796517]",0.547965
1112043,"[0.6165684, 0.5082765, 0.64691746, 0.5265027, 0.40964505]",0.646917
1112044,"[0.37111458, 0.35013446, 0.4688241, 0.34386814, 0.41440636]",0.468824
1112045,"[0.5672146, 0.47128528, 0.6104403, 0.47538996, 0.41022947]",0.61044
1112046,"[0.7057005, 0.5920926, 0.50649524, 0.54206926, 0.28533745]",0.705701
1112047,"[0.4996145, 0.426006, 0.35944104, 0.5215356, 0.3578741]",0.521536


In [32]:
def mark_top_four_sentences(df):
    # Create a column for binary labels initialized to 0.
    df['binary_labels'] = 0
    # Group by article ID and process each group.
    for article_id, group in df.groupby('id'):
        # Get indices of the top 4 sentences based on max_cosine_similarity.
        top_indices = group['max_cosine_similarity'].nlargest(4).index
        df.loc[top_indices, 'binary_labels'] = 1

    return df

for name, df in datasets.items():
    datasets[name] = mark_top_four_sentences(df)

In [33]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train']['binary_labels'].tail(20)
#datasets['validation'].head()
#datasets['test'].head(2)

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddi

1112058    0
1112059    0
1112060    0
1112061    0
1112062    0
1112063    0
1112064    0
1112065    0
1112066    0
1112067    1
1112068    0
1112069    0
1112070    0
1112071    0
1112072    0
1112073    0
1112074    0
1112075    0
1112076    0
1112077    0
Name: binary_labels, dtype: int64

## Surface Features: Sentence Position
"The sentences in the earlier parts of a document are more important than
sentences in later parts."

-- p.3, *Extractive Summarization Using Supervised and Semi-Supervised Learning*

In [34]:
def sentence_position_score(row, total_sentences):
    # Calculate the position score for the sentence in its article.
    return 1 - (row['sentence_index'] / (total_sentences - 1)) if total_sentences > 1 else 1

for name, df in datasets.items():
    # Calculate the total number of sentences for each article.
    article_lengths = df.groupby('id').size()

    # Add a column for sentence index within each article.
    df['sentence_index'] = df.groupby('id').cumcount()

    # Apply the sentence position score calculation.
    df['position_score'] = df.apply(lambda row: sentence_position_score(row, article_lengths[row['id']]), axis=1)

    datasets[name] = df


In [35]:
pd.set_option("display.max_columns", None) # show all cols
pd.set_option('display.max_colwidth', None) # show full width of showing cols
pd.set_option('display.max_rows', None)

In [36]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train'][['sentence_index', 'position_score']].head(11)

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
     

Unnamed: 0,sentence_index,position_score
0,0,1.0
1,1,0.888889
2,2,0.777778
3,3,0.666667
4,4,0.555556
5,5,0.444444
6,6,0.333333
7,7,0.222222
8,8,0.111111
9,9,0.0


## Surface Features: Sentence Length

"A sentence is important if the number of words in it is within a certain range."

-- p.3, *Extractive Summarization Using Supervised and Semi-Supervised Learning*

In [37]:
# Sentence length can provide insights into the complexity and information density of a sentence, which might correlate with its summary-worthiness.
# In general, both very short and very long sentences might be less likely to be included in a summary.

def get_sentence_length(sentences):
    return len(sentences)

for name, df in datasets.items():
    # Calculate the length of each sentence
    df['sentence_length'] = df['sentences'].parallel_apply(get_sentence_length)

    # Calculate the maximum sentence length for each article
    max_lengths = df.groupby('id')['sentence_length'].transform('max')

    # Normalize the sentence lengths
    df['normalized_sentence_length'] = df['sentence_length'] / max_lengths

    datasets[name] = df

In [38]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train'][['sentence_length', 'normalized_sentence_length']].head()

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'i

Unnamed: 0,sentence_length,normalized_sentence_length
0,35,0.813953
1,21,0.488372
2,36,0.837209
3,34,0.790698
4,17,0.395349


## Content Features: Signature terms / Named Enitites (counts)
Using the count of NEs in each sentence as a feature can provide a simple yet effective way to measure the richness and importance of the sentence. Sentences containing significant named entities (such as important people, organizations, locations, etc.) are more likely to be essential to the overall narrative or argument of the article, making them strong candidates for inclusion in summaries. For example, a sentence with a higher number of named entities (NEs) may be more informative or central for the article's main theme, and more likely to be summary-worthy.

In [None]:
# For smaller datasets.
#nlp = spacy.load('en_core_web_sm')

#def count_named_entities_normalized(text):
 #   doc = nlp(text)
  #  ne_count = len(doc.ents)
   # total_tokens = len(doc)  # Total number of tokens in the sentence.
    #return ne_count / total_tokens if total_tokens > 0 else 0


#for name, df in datasets.items():
 #   df['normalized_NE_count'] = df['sentences'].apply(lambda x: ' '.join(x)).apply(count_named_entities_normalized)
  #  datasets[name] = df

In [None]:
# pandarallel.initialize(progress_bar=True)

In [None]:
# Too long for our updated dataset. Takes an hour for a 9% progress in the progress bar.

#def count_named_entities_normalized_from_tokens(tokens):
 #   import spacy
  #  nlp = spacy.load('en_core_web_sm')
    # Join tokens into a single string.
  #  text = ' '.join(tokens)
   # doc = nlp(text)
    #ne_count = len(doc.ents)
    #total_tokens = len(doc)  
    #return ne_count / total_tokens if total_tokens > 0 else 0

#for name, df in datasets.items():
 #   df['normalized_NE_count'] = df['sentences'].parallel_apply(count_named_entities_normalized_from_tokens)
  #  datasets[name] = df


In [40]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train'].head()
#datasets['validation'].head()
#datasets['test'].head()

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'i

Unnamed: 0.1,Unnamed: 0,article,highlights,id,internet-free_art,internet-free_high,boiler-free_art,boiler-free_high,sentences,high_tokenized,mean_sent_embeddings_art,mean_sent_embeddings_high,cosine_similarity_scores,max_cosine_similarity,binary_labels,sentence_index,position_score,sentence_length,normalized_sentence_length
0,15378,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,"Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat. \nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win. \nThe defeat leaves Cologne on bottom of the Bundesliga table after five games.,"[Lukas, Podolski, scored, his, first, goal, since, returning, to, Cologne, but, it, was, not, enough, as, they, crashed, to, a, defeat, 2, -, 1, against, Schalke, which, leaves, them, bottom, of, the, German, Bundesliga, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.11470295, 0.075837694, 0.035587706, 0.01364241, -0.03404762, -0.080190465, -0.009287867, -0.12817383, 0.075314224, 0.10144832, 0.014489535, -0.07114384, 0.010094873, 0.051844105, -0.08103456, 0.047095988, 0.05078651, 0.11288952, 0.040007494, -0.01411563, 0.01991364, 0.07349791, 0.015920179, -0.0034584834, 0.067733236, -0.034187052, -0.06529078, 0.013993888, 0.020629358, 0.041065086, -0.032926865, -0.027613146, -0.016833074, -0.075299494, 0.009965568, 0.04200797, -0.028869629, 0.03187403, 0.04321618, 0.054114375, 0.101246804, -0.060622644, 0.1292935, 0.004960159, 0.019059017, 0.0054305503, 0.06652148, -0.049942147, 0.083613954, 0.05633545, -0.05737252, 0.041719634, -0.005005935, -0.02816299, -0.053125843, 0.03927928, -0.047921542, -0.024672937, -0.05697474, -0.04516233, -0.04006853, 0.031866666, -0.09612406, -0.100002944, -0.071878366, -0.047021996, 0.08755046, 0.055411242, -0.0044618803, 0.17888772, 0.025567416, 0.037632383, 0.01860099, 0.0021083436, -0.1098717, -0.084859915, 0.08041592, -0.016483702, 0.04156073, 0.02110343, -0.01001608, -0.16271973, -0.009807192, -0.002482447, 0.045052364, -0.023081813, -0.013436153, 0.103066415, -0.0020857186, -0.0045860554, 0.0072289826, -0.020229997, -0.054310765, 0.0021333366, -0.043879017, -0.121284746, -0.01109998, -0.04515865, -0.024664517, 0.039232977, ...]","[[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0.07494413, -0.038639948, 0.04403452, 0.039926384, -0.048346885, -0.16545223, 0.0055213342, 0.0019437349, 0.051926833, -0.02926401, -0.012282152, 0.028243138, 0.06518085, -0.016564002, 0.022944523, -0.0024977464, -0.056434043, -0.0046686027, -0.06918804, -0.048183735, -0.0055213342, -0.07339243, -0.04894785, 0.04185838, ...], [0.13324529, 0.023004705, 0.056737725, 0.025457209, 0.14613897, -0.1620428, 0.0014703925, -0.1521218, 0.039115213, 0.16493502, -0.024419611, 0.031194514, -0.08743009, 0.15665506, 0.04331554, 0.078352496, 0.0865756, 0.14964156, 0.0052504106, 0.044633344, 0.09302538, 0.08001154, 0.06681685, 0.07500874, 0.08291349, -0.019042969, -0.02117365, -0.1657049, -0.090166956, 0.07063432, 0.012473366, -0.08670876, 0.034773394, -0.02497725, 0.012606534, -0.008891713, 0.001492587, -0.016645951, -0.094415836, 0.040960137, 0.018488104, -0.07697088, 0.1373402, 0.097844906, 0.08389005, -0.05600114, 0.075881265, -0.1275385, 0.057328656, 0.014093572, -0.1451416, 0.08566561, -0.06463623, -0.10904763, -0.051291727, 0.050187543, 0.03726751, -0.011014071, -0.10653548, -0.111039594, 0.023541538, 0.004928589, -0.12213135, -0.089899234, -0.03803045, -0.057084516, 0.06754103, -0.0063143643, 0.03427124, 0.22185725, 0.0148038, 0.0049327505, -0.037852895, -0.06916948, -0.19472434, -0.0649858, 0.12940702, -0.052592885, 0.107022375, 0.008150968, -0.11157782, -0.042236328, -0.022799404, 0.046031605, 0.10929454, -0.004705256, 0.006525213, 0.05543102, -0.0628995, -0.06668923, 0.02631725, -0.05559748, -0.1337738, 0.016512783, -0.06338778, -0.17221694, -0.028617166, -0.065138385, -0.036155008, 0.089377664, ...], [0.08287557, 0.0826416, 0.06524086, 0.037180584, 0.007334391, -0.040649414, -0.051605225, -0.15689214, 0.08514404, 0.15497844, 0.0048472085, -0.013264974, 0.012659709, 0.09476725, -0.034556706, 0.063090004, 0.026980082, 0.049856823, 0.043874104, -0.016255697, 0.036254883, 0.07067871, -0.059000652, -0.0014731089, 0.071634926, -0.012868245, -0.053273518, 0.04812622, 0.04282506, 0.0037434895, -0.08670553, 0.020209631, -0.015749613, 0.035550434, 0.022364298, 0.029851278, -0.07522583, 0.06434631, 0.046028137, 0.025021872, 0.036468506, -0.10751343, 0.07714844, -0.0028457642, 0.03074646, -0.024305979, 0.04122035, -0.12137858, 0.088643394, 0.0872701, -0.078679405, 0.058970135, -0.0025990803, -0.091181435, -0.06669108, -0.012817383, -0.028500875, -0.03829956, 0.014338176, -0.06524658, -0.15384929, 0.05472533, -0.11427816, -0.09490458, -0.09451294, -0.04926936, 0.116249084, 0.062235516, 0.046596527, 0.15892537, 0.06144206, -0.024823507, 0.060872395, 0.025533041, -0.08202108, -0.11945597, 0.081593834, 0.0037282307, 0.032503765, 0.0003744761, -0.03922526, -0.22275798, -0.018636068, 0.0056355796, 0.073903404, -0.06983948, -0.025390625, 0.09335327, -0.008656819, 0.046351116, 0.03605652, 0.043375652, -0.09618632, -0.08617592, -0.05771637, -0.094182335, 0.05159505, -0.104400635, -0.027852377, 0.08984375, ...]]","[0.8736709, 0.7377715, 0.8184136]",0.873671,1,0,1.0,35,0.813953
1,15378,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,"Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat. \nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win. \nThe defeat leaves Cologne on bottom of the Bundesliga table after five games.,"[Podolski, celebrates, his, first, goal, since, returning, to, Cologne, ,, but, it, did, not, stop, his, side, sliding, to, defeat, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.10943065, 0.09664457, 0.03386733, 0.026137408, -0.016226824, -0.08258595, -0.013872932, -0.18560432, 0.093229406, 0.03232709, -0.01847211, -0.10867848, 0.009917764, 0.06313548, -0.07483067, 0.03030732, 0.07075411, 0.10902674, 0.115575455, -0.052046157, 0.038380343, 0.13636331, -0.0018382353, -0.0043011834, 0.03845933, -0.03297155, -0.018965777, 0.04153263, 0.08156092, 0.027806899, -0.008565524, -0.038485806, -0.036301557, 0.009051154, 0.042114258, 0.018177705, -0.015090045, 0.002532959, 0.03969125, 0.045721617, 0.084190816, -0.05550968, 0.16026396, -0.006606158, 0.005577985, 0.039450254, 0.009426341, -0.013201769, 0.11293299, 0.05029297, -0.023214003, 0.09192433, -0.0001077091, 0.010607551, 0.012242935, 0.05610208, -0.06297302, -0.04613293, -0.023265166, -0.07540086, -0.009399414, 0.068258844, -0.07126034, -0.1408835, -0.022717644, -0.03403013, 0.0645532, 0.011725482, -0.035328586, 0.21449909, 0.058403462, -0.0014720244, 0.030047249, 0.05234662, -0.112836055, -0.115231685, 0.075868495, 0.017412972, 0.03304874, 0.04683012, -0.03270228, -0.14306742, 0.016286064, -0.035761215, 0.061609603, -0.048142377, -0.033554975, 0.0216695, 0.003209731, 0.0024557675, 0.028409172, -0.05585076, -0.046402875, 0.021154515, -0.053940717, -0.07708292, -0.018946031, -0.024581011, -0.06407973, 0.042293772, ...]","[[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0.07494413, -0.038639948, 0.04403452, 0.039926384, -0.048346885, -0.16545223, 0.0055213342, 0.0019437349, 0.051926833, -0.02926401, -0.012282152, 0.028243138, 0.06518085, -0.016564002, 0.022944523, -0.0024977464, -0.056434043, -0.0046686027, -0.06918804, -0.048183735, -0.0055213342, -0.07339243, -0.04894785, 0.04185838, ...], [0.13324529, 0.023004705, 0.056737725, 0.025457209, 0.14613897, -0.1620428, 0.0014703925, -0.1521218, 0.039115213, 0.16493502, -0.024419611, 0.031194514, -0.08743009, 0.15665506, 0.04331554, 0.078352496, 0.0865756, 0.14964156, 0.0052504106, 0.044633344, 0.09302538, 0.08001154, 0.06681685, 0.07500874, 0.08291349, -0.019042969, -0.02117365, -0.1657049, -0.090166956, 0.07063432, 0.012473366, -0.08670876, 0.034773394, -0.02497725, 0.012606534, -0.008891713, 0.001492587, -0.016645951, -0.094415836, 0.040960137, 0.018488104, -0.07697088, 0.1373402, 0.097844906, 0.08389005, -0.05600114, 0.075881265, -0.1275385, 0.057328656, 0.014093572, -0.1451416, 0.08566561, -0.06463623, -0.10904763, -0.051291727, 0.050187543, 0.03726751, -0.011014071, -0.10653548, -0.111039594, 0.023541538, 0.004928589, -0.12213135, -0.089899234, -0.03803045, -0.057084516, 0.06754103, -0.0063143643, 0.03427124, 0.22185725, 0.0148038, 0.0049327505, -0.037852895, -0.06916948, -0.19472434, -0.0649858, 0.12940702, -0.052592885, 0.107022375, 0.008150968, -0.11157782, -0.042236328, -0.022799404, 0.046031605, 0.10929454, -0.004705256, 0.006525213, 0.05543102, -0.0628995, -0.06668923, 0.02631725, -0.05559748, -0.1337738, 0.016512783, -0.06338778, -0.17221694, -0.028617166, -0.065138385, -0.036155008, 0.089377664, ...], [0.08287557, 0.0826416, 0.06524086, 0.037180584, 0.007334391, -0.040649414, -0.051605225, -0.15689214, 0.08514404, 0.15497844, 0.0048472085, -0.013264974, 0.012659709, 0.09476725, -0.034556706, 0.063090004, 0.026980082, 0.049856823, 0.043874104, -0.016255697, 0.036254883, 0.07067871, -0.059000652, -0.0014731089, 0.071634926, -0.012868245, -0.053273518, 0.04812622, 0.04282506, 0.0037434895, -0.08670553, 0.020209631, -0.015749613, 0.035550434, 0.022364298, 0.029851278, -0.07522583, 0.06434631, 0.046028137, 0.025021872, 0.036468506, -0.10751343, 0.07714844, -0.0028457642, 0.03074646, -0.024305979, 0.04122035, -0.12137858, 0.088643394, 0.0872701, -0.078679405, 0.058970135, -0.0025990803, -0.091181435, -0.06669108, -0.012817383, -0.028500875, -0.03829956, 0.014338176, -0.06524658, -0.15384929, 0.05472533, -0.11427816, -0.09490458, -0.09451294, -0.04926936, 0.116249084, 0.062235516, 0.046596527, 0.15892537, 0.06144206, -0.024823507, 0.060872395, 0.025533041, -0.08202108, -0.11945597, 0.081593834, 0.0037282307, 0.032503765, 0.0003744761, -0.03922526, -0.22275798, -0.018636068, 0.0056355796, 0.073903404, -0.06983948, -0.025390625, 0.09335327, -0.008656819, 0.046351116, 0.03605652, 0.043375652, -0.09618632, -0.08617592, -0.05771637, -0.094182335, 0.05159505, -0.104400635, -0.027852377, 0.08984375, ...]]","[0.82316303, 0.6298492, 0.704492]",0.823163,1,1,0.888889,21,0.488372
2,15378,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,"Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat. \nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win. \nThe defeat leaves Cologne on bottom of the Bundesliga table after five games.,"[The, hosts, had, already, gone, behind, to, a, second, -, minute, Jefferson, Farfan, strike, when, Podolski, ,, who, rejoined, his, boyhood, club, from, Bayern, Munich, in, the, summer, ,, levelled, matters, with, six, minutes, played, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.10468371, 0.120787814, 0.01572813, 0.046383232, 0.029959843, -0.051225334, -0.032605402, -0.15332505, 0.08026123, 0.0971606, -0.023903683, -0.04830617, -0.03349896, 0.05981235, -0.034837656, 0.043309703, 0.11463875, 0.09354585, 0.031407848, 0.011410417, -0.03126368, 0.10256432, -0.022325186, 0.018040624, 0.052536536, 0.008767293, -0.031301036, -0.009693014, 0.009272608, 0.050306648, -0.0053776708, -0.041120857, -0.0031590955, 0.018289829, 0.017000396, -0.007870642, -0.018386316, -0.06571132, -0.012873025, 0.08089579, 0.041360788, -0.0596545, 0.10084271, -0.006306615, -0.0070022056, 0.02464768, 0.04359173, -0.043332856, 0.088512026, 0.072368756, -0.052325018, 0.017062483, -0.05300193, -0.07456694, -0.039218243, 0.043559898, 0.0038136449, -0.051274266, -0.06698503, -0.052946948, -0.07824497, 0.049987793, -0.08836418, -0.10738505, -0.035947602, -0.020674344, 0.013636095, -0.031260785, -0.010328622, 0.12344255, 0.029170595, 0.01999901, 0.019623328, 0.029873552, -0.12295006, -0.06655674, 0.056856878, -0.009502542, 0.062678896, 0.012015508, -0.0048649227, -0.099480994, 0.032765355, 0.0014111749, 0.06629155, -0.050739158, -0.07177524, 0.042763676, -0.03504523, -0.0012196508, 0.03401447, 0.021926355, -0.06334607, -0.028522754, -0.09807192, -0.09360951, -0.059129912, -0.063443415, -0.048358258, 0.016005944, ...]","[[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0.07494413, -0.038639948, 0.04403452, 0.039926384, -0.048346885, -0.16545223, 0.0055213342, 0.0019437349, 0.051926833, -0.02926401, -0.012282152, 0.028243138, 0.06518085, -0.016564002, 0.022944523, -0.0024977464, -0.056434043, -0.0046686027, -0.06918804, -0.048183735, -0.0055213342, -0.07339243, -0.04894785, 0.04185838, ...], [0.13324529, 0.023004705, 0.056737725, 0.025457209, 0.14613897, -0.1620428, 0.0014703925, -0.1521218, 0.039115213, 0.16493502, -0.024419611, 0.031194514, -0.08743009, 0.15665506, 0.04331554, 0.078352496, 0.0865756, 0.14964156, 0.0052504106, 0.044633344, 0.09302538, 0.08001154, 0.06681685, 0.07500874, 0.08291349, -0.019042969, -0.02117365, -0.1657049, -0.090166956, 0.07063432, 0.012473366, -0.08670876, 0.034773394, -0.02497725, 0.012606534, -0.008891713, 0.001492587, -0.016645951, -0.094415836, 0.040960137, 0.018488104, -0.07697088, 0.1373402, 0.097844906, 0.08389005, -0.05600114, 0.075881265, -0.1275385, 0.057328656, 0.014093572, -0.1451416, 0.08566561, -0.06463623, -0.10904763, -0.051291727, 0.050187543, 0.03726751, -0.011014071, -0.10653548, -0.111039594, 0.023541538, 0.004928589, -0.12213135, -0.089899234, -0.03803045, -0.057084516, 0.06754103, -0.0063143643, 0.03427124, 0.22185725, 0.0148038, 0.0049327505, -0.037852895, -0.06916948, -0.19472434, -0.0649858, 0.12940702, -0.052592885, 0.107022375, 0.008150968, -0.11157782, -0.042236328, -0.022799404, 0.046031605, 0.10929454, -0.004705256, 0.006525213, 0.05543102, -0.0628995, -0.06668923, 0.02631725, -0.05559748, -0.1337738, 0.016512783, -0.06338778, -0.17221694, -0.028617166, -0.065138385, -0.036155008, 0.089377664, ...], [0.08287557, 0.0826416, 0.06524086, 0.037180584, 0.007334391, -0.040649414, -0.051605225, -0.15689214, 0.08514404, 0.15497844, 0.0048472085, -0.013264974, 0.012659709, 0.09476725, -0.034556706, 0.063090004, 0.026980082, 0.049856823, 0.043874104, -0.016255697, 0.036254883, 0.07067871, -0.059000652, -0.0014731089, 0.071634926, -0.012868245, -0.053273518, 0.04812622, 0.04282506, 0.0037434895, -0.08670553, 0.020209631, -0.015749613, 0.035550434, 0.022364298, 0.029851278, -0.07522583, 0.06434631, 0.046028137, 0.025021872, 0.036468506, -0.10751343, 0.07714844, -0.0028457642, 0.03074646, -0.024305979, 0.04122035, -0.12137858, 0.088643394, 0.0872701, -0.078679405, 0.058970135, -0.0025990803, -0.091181435, -0.06669108, -0.012817383, -0.028500875, -0.03829956, 0.014338176, -0.06524658, -0.15384929, 0.05472533, -0.11427816, -0.09490458, -0.09451294, -0.04926936, 0.116249084, 0.062235516, 0.046596527, 0.15892537, 0.06144206, -0.024823507, 0.060872395, 0.025533041, -0.08202108, -0.11945597, 0.081593834, 0.0037282307, 0.032503765, 0.0003744761, -0.03922526, -0.22275798, -0.018636068, 0.0056355796, 0.073903404, -0.06983948, -0.025390625, 0.09335327, -0.008656819, 0.046351116, 0.03605652, 0.043375652, -0.09618632, -0.08617592, -0.05771637, -0.094182335, 0.05159505, -0.104400635, -0.027852377, 0.08984375, ...]]","[0.75521606, 0.71392655, 0.7321466]",0.755216,0,2,0.777778,36,0.837209
3,15378,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,"Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat. \nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win. \nThe defeat leaves Cologne on bottom of the Bundesliga table after five games.,"[But, hopes, Podolski, 's, strike, would, inspire, the, home, side, fell, flat, a, minute, after, half, -, time, when, Levan, Kobiashvili, scored, a, winner, to, move, Schalke, up, to, third, in, the, table, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.07752369, 0.05999982, 0.059902616, 0.039984807, 0.040477045, -0.11609339, 0.054619685, -0.12372504, 0.1007713, 0.12054669, -0.029032389, -0.038427282, -0.02876197, 0.034073442, -0.043869585, 0.036291618, 0.06349012, 0.08969003, 0.058826517, 0.00390625, 0.045642994, 0.06965538, -0.0033038105, -0.011484499, 0.08711638, -0.016531838, -0.040386625, -0.020360876, 0.0175996, 0.06249661, -0.060583893, -0.009650337, -0.03514721, -0.03803253, 0.023111979, -0.00022605613, -0.055812128, 0.01767759, -0.012566037, 0.064889416, 0.057531286, -0.11503092, 0.13779478, 0.0020164207, -0.012339274, -0.0014772768, 0.025131226, -0.025908293, 0.0754485, 0.0419142, -0.050281666, 0.03436053, -0.008234942, 0.0032439055, -0.055067275, 0.021447923, 0.027026141, -0.0024493183, -0.050672743, -0.046461884, -0.030760024, -0.014721482, -0.13037561, -0.12716335, -0.060149018, -0.012738263, 0.036908183, 0.0497931, 0.056382354, 0.14456742, 0.05119041, -0.018866645, 0.07549371, 0.039849177, -0.15531752, -0.08079699, 0.052972864, 0.024946425, 0.041328713, 0.033866882, 0.0026527687, -0.084141836, 0.0021497938, 0.009293168, 0.10914216, -0.023491753, -0.030592741, 0.080115564, -0.01858238, 0.017742015, 0.044483326, 0.0026041667, -0.07914904, -0.041473813, -0.1088675, -0.11467206, -0.051968046, -0.0053055375, -0.06926134, 0.01671685, ...]","[[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0.07494413, -0.038639948, 0.04403452, 0.039926384, -0.048346885, -0.16545223, 0.0055213342, 0.0019437349, 0.051926833, -0.02926401, -0.012282152, 0.028243138, 0.06518085, -0.016564002, 0.022944523, -0.0024977464, -0.056434043, -0.0046686027, -0.06918804, -0.048183735, -0.0055213342, -0.07339243, -0.04894785, 0.04185838, ...], [0.13324529, 0.023004705, 0.056737725, 0.025457209, 0.14613897, -0.1620428, 0.0014703925, -0.1521218, 0.039115213, 0.16493502, -0.024419611, 0.031194514, -0.08743009, 0.15665506, 0.04331554, 0.078352496, 0.0865756, 0.14964156, 0.0052504106, 0.044633344, 0.09302538, 0.08001154, 0.06681685, 0.07500874, 0.08291349, -0.019042969, -0.02117365, -0.1657049, -0.090166956, 0.07063432, 0.012473366, -0.08670876, 0.034773394, -0.02497725, 0.012606534, -0.008891713, 0.001492587, -0.016645951, -0.094415836, 0.040960137, 0.018488104, -0.07697088, 0.1373402, 0.097844906, 0.08389005, -0.05600114, 0.075881265, -0.1275385, 0.057328656, 0.014093572, -0.1451416, 0.08566561, -0.06463623, -0.10904763, -0.051291727, 0.050187543, 0.03726751, -0.011014071, -0.10653548, -0.111039594, 0.023541538, 0.004928589, -0.12213135, -0.089899234, -0.03803045, -0.057084516, 0.06754103, -0.0063143643, 0.03427124, 0.22185725, 0.0148038, 0.0049327505, -0.037852895, -0.06916948, -0.19472434, -0.0649858, 0.12940702, -0.052592885, 0.107022375, 0.008150968, -0.11157782, -0.042236328, -0.022799404, 0.046031605, 0.10929454, -0.004705256, 0.006525213, 0.05543102, -0.0628995, -0.06668923, 0.02631725, -0.05559748, -0.1337738, 0.016512783, -0.06338778, -0.17221694, -0.028617166, -0.065138385, -0.036155008, 0.089377664, ...], [0.08287557, 0.0826416, 0.06524086, 0.037180584, 0.007334391, -0.040649414, -0.051605225, -0.15689214, 0.08514404, 0.15497844, 0.0048472085, -0.013264974, 0.012659709, 0.09476725, -0.034556706, 0.063090004, 0.026980082, 0.049856823, 0.043874104, -0.016255697, 0.036254883, 0.07067871, -0.059000652, -0.0014731089, 0.071634926, -0.012868245, -0.053273518, 0.04812622, 0.04282506, 0.0037434895, -0.08670553, 0.020209631, -0.015749613, 0.035550434, 0.022364298, 0.029851278, -0.07522583, 0.06434631, 0.046028137, 0.025021872, 0.036468506, -0.10751343, 0.07714844, -0.0028457642, 0.03074646, -0.024305979, 0.04122035, -0.12137858, 0.088643394, 0.0872701, -0.078679405, 0.058970135, -0.0025990803, -0.091181435, -0.06669108, -0.012817383, -0.028500875, -0.03829956, 0.014338176, -0.06524658, -0.15384929, 0.05472533, -0.11427816, -0.09490458, -0.09451294, -0.04926936, 0.116249084, 0.062235516, 0.046596527, 0.15892537, 0.06144206, -0.024823507, 0.060872395, 0.025533041, -0.08202108, -0.11945597, 0.081593834, 0.0037282307, 0.032503765, 0.0003744761, -0.03922526, -0.22275798, -0.018636068, 0.0056355796, 0.073903404, -0.06983948, -0.025390625, 0.09335327, -0.008656819, 0.046351116, 0.03605652, 0.043375652, -0.09618632, -0.08617592, -0.05771637, -0.094182335, 0.05159505, -0.104400635, -0.027852377, 0.08984375, ...]]","[0.7498019, 0.75414014, 0.73623145]",0.75414,0,3,0.666667,34,0.790698
4,15378,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"(CNN) -- Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat .\nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win .\nThe defeat leaves Cologne on bottom of the Bundesliga table after five games .,"Lukas Podolski scored his first goal since returning to Cologne but it was not enough as they crashed to a defeat 2-1 against Schalke which leaves them bottom of the German Bundesliga. Podolski celebrates his first goal since returning to Cologne, but it did not stop his side sliding to defeat. The hosts had already gone behind to a second-minute Jefferson Farfan strike when Podolski, who rejoined his boyhood club from Bayern Munich in the summer, levelled matters with six minutes played. But hopes Podolski's strike would inspire the home side fell flat a minute after half-time when Levan Kobiashvili scored a winner to move Schalke up to third in the table. Defeat for Cologne leaves them rock bottom with just one point from their opening five games. Schalke's early goal came courtesy of a corner to the far post where Gerald Asamoah was waiting to head the ball back across goal. Farfan was alert to the opportunity and sent his flying header into the back of the net for his second of the campaign. The mood inside the RheinEnergie Stadion improved significantly four minutes later when Podolski latched onto a through-ball before coolly stepping around Manuel Neuer to stroke home. The decisive moment in the game arrived just a minute after the interval when the dangerous Farfan attracted the home defence in the middle of the area before laying off a pass for Kobiashvili to thrash left-footed into the net. Meanwhile, in the day's other Bundesliga match, Werder Bremen had the better of the chances but were held to a 0-0 draw by Hannover.",Lukas Podolski scores his first goal since returning to Cologne in 2-1 defeat. \nJefferson Farfan and Levan Kobiashvili scored the goals that gave Schalke win. \nThe defeat leaves Cologne on bottom of the Bundesliga table after five games.,"[Defeat, for, Cologne, leaves, them, rock, bottom, with, just, one, point, from, their, opening, five, games, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.08892441, 0.06593567, 0.03864813, 0.0891037, -0.0019760132, -0.009052277, -0.04333687, -0.1583786, 0.026489258, 0.12824154, -0.023043633, -0.032043457, 0.0027999878, 0.061714172, -0.079055786, -0.012519836, 0.05416298, 0.063913345, -0.003730774, -0.024141312, -0.014410019, 0.07348633, -0.0041570663, -0.017017365, 0.07906723, -0.00019073486, -0.049682617, 0.035068512, 0.031032562, -0.01411438, -0.05053425, 0.036685944, -0.046702385, 0.002589941, 0.009580612, 0.03479767, -0.011993408, 0.022119522, 0.011795044, 0.029209137, 0.096069336, -0.11567688, 0.07014465, -0.009469509, 0.0063228607, -0.030568123, -0.00097084045, -0.033645034, 0.03501892, 0.077308655, -0.0500679, 0.1170845, 0.0012397766, -0.055680275, -0.017494202, -0.00951004, -0.07109451, -0.038318634, 0.037213326, -0.04403305, -0.081645966, 0.053592682, -0.046687126, -0.1666336, -0.09030151, -0.048664093, 0.042922974, 0.052814484, -0.0073871613, 0.14949036, -0.0022411346, -0.04584503, 0.044525146, -0.015764236, -0.104904175, -0.13942719, 0.0076036453, 0.08078766, 0.042621613, 0.022136688, -0.029376984, -0.13811493, 0.023611069, 6.1035156e-05, -0.007741928, -0.0694046, -0.061828613, 0.05001831, 0.010902405, 0.02576828, 0.0065710545, 0.036857605, -0.06416702, -0.08675718, -0.036087036, -0.05927658, 0.083602905, -0.0036735535, -0.037534714, 0.072647095, ...]","[[0.13956158, 0.11025766, 0.035254844, 0.020601712, 0.02292809, -0.09172117, -0.052386943, -0.19335938, 0.11620624, 0.06234976, -0.01558744, -0.061077412, 0.045069475, 0.06829834, -0.05606989, 0.0028029222, 0.07958515, 0.10207602, 0.051513672, -0.036817405, 0.04653696, 0.078862116, 0.0033663237, -0.0021597056, 0.0034132737, -0.015256441, -0.027314406, 0.0029437726, 0.009549654, 0.065216064, -0.016995944, -0.08030348, -0.013310359, -0.06997446, -0.024423452, 0.029752292, -0.083364636, 0.07654747, 0.06257072, 0.08146785, 0.10569411, -0.030883789, 0.07677753, 0.04806284, 0.04081374, 0.044950046, 0.08163687, -0.016081883, 0.06109619, 0.094242975, -0.07481502, 0.048865683, -0.021474985, -0.0101095345, -0.031062199, 0.030146085, -0.06642503, -0.01903358, -0.102703385, -0.060528096, -0.01566256, 0.024376502, -0.063495345, -0.1263099, -0.0487577, -0.051095083, 0.12062425, 0.028493442, 0.04538668, 0.22952975, 0.035963792, -0.022932786, -0.0095402645, 0.04220229, -0.08510179, -0.07123272, 0.07494413, -0.038639948, 0.04403452, 0.039926384, -0.048346885, -0.16545223, 0.0055213342, 0.0019437349, 0.051926833, -0.02926401, -0.012282152, 0.028243138, 0.06518085, -0.016564002, 0.022944523, -0.0024977464, -0.056434043, -0.0046686027, -0.06918804, -0.048183735, -0.0055213342, -0.07339243, -0.04894785, 0.04185838, ...], [0.13324529, 0.023004705, 0.056737725, 0.025457209, 0.14613897, -0.1620428, 0.0014703925, -0.1521218, 0.039115213, 0.16493502, -0.024419611, 0.031194514, -0.08743009, 0.15665506, 0.04331554, 0.078352496, 0.0865756, 0.14964156, 0.0052504106, 0.044633344, 0.09302538, 0.08001154, 0.06681685, 0.07500874, 0.08291349, -0.019042969, -0.02117365, -0.1657049, -0.090166956, 0.07063432, 0.012473366, -0.08670876, 0.034773394, -0.02497725, 0.012606534, -0.008891713, 0.001492587, -0.016645951, -0.094415836, 0.040960137, 0.018488104, -0.07697088, 0.1373402, 0.097844906, 0.08389005, -0.05600114, 0.075881265, -0.1275385, 0.057328656, 0.014093572, -0.1451416, 0.08566561, -0.06463623, -0.10904763, -0.051291727, 0.050187543, 0.03726751, -0.011014071, -0.10653548, -0.111039594, 0.023541538, 0.004928589, -0.12213135, -0.089899234, -0.03803045, -0.057084516, 0.06754103, -0.0063143643, 0.03427124, 0.22185725, 0.0148038, 0.0049327505, -0.037852895, -0.06916948, -0.19472434, -0.0649858, 0.12940702, -0.052592885, 0.107022375, 0.008150968, -0.11157782, -0.042236328, -0.022799404, 0.046031605, 0.10929454, -0.004705256, 0.006525213, 0.05543102, -0.0628995, -0.06668923, 0.02631725, -0.05559748, -0.1337738, 0.016512783, -0.06338778, -0.17221694, -0.028617166, -0.065138385, -0.036155008, 0.089377664, ...], [0.08287557, 0.0826416, 0.06524086, 0.037180584, 0.007334391, -0.040649414, -0.051605225, -0.15689214, 0.08514404, 0.15497844, 0.0048472085, -0.013264974, 0.012659709, 0.09476725, -0.034556706, 0.063090004, 0.026980082, 0.049856823, 0.043874104, -0.016255697, 0.036254883, 0.07067871, -0.059000652, -0.0014731089, 0.071634926, -0.012868245, -0.053273518, 0.04812622, 0.04282506, 0.0037434895, -0.08670553, 0.020209631, -0.015749613, 0.035550434, 0.022364298, 0.029851278, -0.07522583, 0.06434631, 0.046028137, 0.025021872, 0.036468506, -0.10751343, 0.07714844, -0.0028457642, 0.03074646, -0.024305979, 0.04122035, -0.12137858, 0.088643394, 0.0872701, -0.078679405, 0.058970135, -0.0025990803, -0.091181435, -0.06669108, -0.012817383, -0.028500875, -0.03829956, 0.014338176, -0.06524658, -0.15384929, 0.05472533, -0.11427816, -0.09490458, -0.09451294, -0.04926936, 0.116249084, 0.062235516, 0.046596527, 0.15892537, 0.06144206, -0.024823507, 0.060872395, 0.025533041, -0.08202108, -0.11945597, 0.081593834, 0.0037282307, 0.032503765, 0.0003744761, -0.03922526, -0.22275798, -0.018636068, 0.0056355796, 0.073903404, -0.06983948, -0.025390625, 0.09335327, -0.008656819, 0.046351116, 0.03605652, 0.043375652, -0.09618632, -0.08617592, -0.05771637, -0.094182335, 0.05159505, -0.104400635, -0.027852377, 0.08984375, ...]]","[0.6759272, 0.51974016, 0.8333621]",0.833362,1,4,0.555556,17,0.395349


## Event Features: Verbs (counts)
Verbs are integral to conveying actions and events within a text. In news articles, for instance, verbs play a crucial role in reporting what happened, when, and to whom. A higher frequency of verbs in a sentence might indicate that it is action-packed or event-rich, which could be a characteristic of summary-worthy content.

In [None]:
# For smaller datasets.

#def count_verbs_normalized(text):
 #   doc = nlp(text)
  #  verb_count = sum(1 for token in doc if token.pos_ == 'VERB')
   # total_tokens = len(doc)  # Total number of tokens in the sentence.
    #return verb_count / total_tokens if total_tokens > 0 else 0

#for name, df in datasets.items():
    # Normalize the verb counts for each sentence
 #   df['normalized_verb_count'] = df['sentences'].apply(lambda x: ' '.join(x)).apply(count_verbs_normalized)

  #  datasets[name] = df

In [None]:
# For larger datasets. but takes too much time, like NE counts

#def count_verbs_normalized_from_tokens(tokens):
    # Load the SpaCy model inside the function
 #   import spacy
  #  nlp = spacy.load('en_core_web_sm')

    # Join tokens into a single string
   # text = ' '.join(tokens)
#    doc = nlp(text)
 #   verb_count = sum(1 for token in doc if token.pos_ == 'VERB')
  #  total_tokens = len(doc)  # Total number of tokens in the text
   # return verb_count / total_tokens if total_tokens > 0 else 0

#for name, df in datasets.items():
 #   df['normalized_verb_count'] = df['sentences'].parallel_apply(count_verbs_normalized_from_tokens)
  #  datasets[name] = df

In [45]:
# Check columns + results. All ok.
for name, df in datasets.items(): print(f"Columns in {name} dataset:", df.columns)
datasets['train'][['sentence_length', 'normalized_sentence_length']].head(6)
#datasets['validation'].head(6)
#datasets['test'].head(2)

Columns in train dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
Columns in validation dataset: Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
Columns in test dataset: Index(['Unnamed: 0', 'article', 'highlights', 'i

Unnamed: 0,sentence_length,normalized_sentence_length
0,35,0.813953
1,21,0.488372
2,36,0.837209
3,34,0.790698
4,17,0.395349
5,26,0.604651


In [44]:
# check how many columns created so far. All ok.
for name, df in datasets.items():
    print(f"THe number of columns in the {name} dataset is", len(df.columns), "and these are: ", df.columns)

THe number of columns in the train dataset is 19 and these are:  Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='object')
THe number of columns in the validation dataset is 19 and these are:  Index(['Unnamed: 0', 'article', 'highlights', 'id', 'internet-free_art',
       'internet-free_high', 'boiler-free_art', 'boiler-free_high',
       'sentences', 'high_tokenized', 'mean_sent_embeddings_art',
       'mean_sent_embeddings_high', 'cosine_similarity_scores',
       'max_cosine_similarity', 'binary_labels', 'sentence_index',
       'position_score', 'sentence_length', 'normalized_sentence_length'],
      dtype='obje

In [46]:
for name, df in datasets.items(): 
    print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1112078 entries, 0 to 1112077
Data columns (total 19 columns):
 #   Column                      Non-Null Count    Dtype  
---  ------                      --------------    -----  
 0   Unnamed: 0                  1112078 non-null  int64  
 1   article                     1112078 non-null  object 
 2   highlights                  1112078 non-null  object 
 3   id                          1112078 non-null  object 
 4   internet-free_art           1112078 non-null  object 
 5   internet-free_high          1112078 non-null  object 
 6   boiler-free_art             1112078 non-null  object 
 7   boiler-free_high            1112078 non-null  object 
 8   sentences                   1112078 non-null  object 
 9   high_tokenized              1112078 non-null  object 
 10  mean_sent_embeddings_art    1112078 non-null  object 
 11  mean_sent_embeddings_high   1112078 non-null  object 
 12  cosine_similarity_scores    1112078 non-null  object 
 1

In [48]:
columns_to_drop = ['high_sent_embeddings', 'article', 'highlights', 'internet-free_art', 
                   'internet-free_high', 'sentence_length', 'cosine_similarity_scores', 
                   'mean_sent_embeddings_high', 'boiler-free_art', 'boiler-free_high']

for name, df in datasets.items():
    df.drop(columns=[col for col in columns_to_drop if col in df.columns], axis=1, inplace=True)

In [49]:
for name, df in datasets.items(): 
    print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1112078 entries, 0 to 1112077
Data columns (total 10 columns):
 #   Column                      Non-Null Count    Dtype  
---  ------                      --------------    -----  
 0   Unnamed: 0                  1112078 non-null  int64  
 1   id                          1112078 non-null  object 
 2   sentences                   1112078 non-null  object 
 3   high_tokenized              1112078 non-null  object 
 4   mean_sent_embeddings_art    1112078 non-null  object 
 5   max_cosine_similarity       1112078 non-null  float64
 6   binary_labels               1112078 non-null  int64  
 7   sentence_index              1112078 non-null  int64  
 8   position_score              1112078 non-null  float64
 9   normalized_sentence_length  1112078 non-null  float64
dtypes: float64(3), int64(3), object(4)
memory usage: 84.8+ MB
None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44844 entries, 0 to 44843
Data columns (total 10 columns):
 #

In [52]:
datasets['train'].head()

Unnamed: 0.1,Unnamed: 0,id,sentences,high_tokenized,mean_sent_embeddings_art,max_cosine_similarity,binary_labels,sentence_index,position_score,normalized_sentence_length
0,15378,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"[Lukas, Podolski, scored, his, first, goal, since, returning, to, Cologne, but, it, was, not, enough, as, they, crashed, to, a, defeat, 2, -, 1, against, Schalke, which, leaves, them, bottom, of, the, German, Bundesliga, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.11470295, 0.075837694, 0.035587706, 0.01364241, -0.03404762, -0.080190465, -0.009287867, -0.12817383, 0.075314224, 0.10144832, 0.014489535, -0.07114384, 0.010094873, 0.051844105, -0.08103456, 0.047095988, 0.05078651, 0.11288952, 0.040007494, -0.01411563, 0.01991364, 0.07349791, 0.015920179, -0.0034584834, 0.067733236, -0.034187052, -0.06529078, 0.013993888, 0.020629358, 0.041065086, -0.032926865, -0.027613146, -0.016833074, -0.075299494, 0.009965568, 0.04200797, -0.028869629, 0.03187403, 0.04321618, 0.054114375, 0.101246804, -0.060622644, 0.1292935, 0.004960159, 0.019059017, 0.0054305503, 0.06652148, -0.049942147, 0.083613954, 0.05633545, -0.05737252, 0.041719634, -0.005005935, -0.02816299, -0.053125843, 0.03927928, -0.047921542, -0.024672937, -0.05697474, -0.04516233, -0.04006853, 0.031866666, -0.09612406, -0.100002944, -0.071878366, -0.047021996, 0.08755046, 0.055411242, -0.0044618803, 0.17888772, 0.025567416, 0.037632383, 0.01860099, 0.0021083436, -0.1098717, -0.084859915, 0.08041592, -0.016483702, 0.04156073, 0.02110343, -0.01001608, -0.16271973, -0.009807192, -0.002482447, 0.045052364, -0.023081813, -0.013436153, 0.103066415, -0.0020857186, -0.0045860554, 0.0072289826, -0.020229997, -0.054310765, 0.0021333366, -0.043879017, -0.121284746, -0.01109998, -0.04515865, -0.024664517, 0.039232977, ...]",0.873671,1,0,1.0,0.813953
1,15378,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"[Podolski, celebrates, his, first, goal, since, returning, to, Cologne, ,, but, it, did, not, stop, his, side, sliding, to, defeat, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.10943065, 0.09664457, 0.03386733, 0.026137408, -0.016226824, -0.08258595, -0.013872932, -0.18560432, 0.093229406, 0.03232709, -0.01847211, -0.10867848, 0.009917764, 0.06313548, -0.07483067, 0.03030732, 0.07075411, 0.10902674, 0.115575455, -0.052046157, 0.038380343, 0.13636331, -0.0018382353, -0.0043011834, 0.03845933, -0.03297155, -0.018965777, 0.04153263, 0.08156092, 0.027806899, -0.008565524, -0.038485806, -0.036301557, 0.009051154, 0.042114258, 0.018177705, -0.015090045, 0.002532959, 0.03969125, 0.045721617, 0.084190816, -0.05550968, 0.16026396, -0.006606158, 0.005577985, 0.039450254, 0.009426341, -0.013201769, 0.11293299, 0.05029297, -0.023214003, 0.09192433, -0.0001077091, 0.010607551, 0.012242935, 0.05610208, -0.06297302, -0.04613293, -0.023265166, -0.07540086, -0.009399414, 0.068258844, -0.07126034, -0.1408835, -0.022717644, -0.03403013, 0.0645532, 0.011725482, -0.035328586, 0.21449909, 0.058403462, -0.0014720244, 0.030047249, 0.05234662, -0.112836055, -0.115231685, 0.075868495, 0.017412972, 0.03304874, 0.04683012, -0.03270228, -0.14306742, 0.016286064, -0.035761215, 0.061609603, -0.048142377, -0.033554975, 0.0216695, 0.003209731, 0.0024557675, 0.028409172, -0.05585076, -0.046402875, 0.021154515, -0.053940717, -0.07708292, -0.018946031, -0.024581011, -0.06407973, 0.042293772, ...]",0.823163,1,1,0.888889,0.488372
2,15378,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"[The, hosts, had, already, gone, behind, to, a, second, -, minute, Jefferson, Farfan, strike, when, Podolski, ,, who, rejoined, his, boyhood, club, from, Bayern, Munich, in, the, summer, ,, levelled, matters, with, six, minutes, played, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.10468371, 0.120787814, 0.01572813, 0.046383232, 0.029959843, -0.051225334, -0.032605402, -0.15332505, 0.08026123, 0.0971606, -0.023903683, -0.04830617, -0.03349896, 0.05981235, -0.034837656, 0.043309703, 0.11463875, 0.09354585, 0.031407848, 0.011410417, -0.03126368, 0.10256432, -0.022325186, 0.018040624, 0.052536536, 0.008767293, -0.031301036, -0.009693014, 0.009272608, 0.050306648, -0.0053776708, -0.041120857, -0.0031590955, 0.018289829, 0.017000396, -0.007870642, -0.018386316, -0.06571132, -0.012873025, 0.08089579, 0.041360788, -0.0596545, 0.10084271, -0.006306615, -0.0070022056, 0.02464768, 0.04359173, -0.043332856, 0.088512026, 0.072368756, -0.052325018, 0.017062483, -0.05300193, -0.07456694, -0.039218243, 0.043559898, 0.0038136449, -0.051274266, -0.06698503, -0.052946948, -0.07824497, 0.049987793, -0.08836418, -0.10738505, -0.035947602, -0.020674344, 0.013636095, -0.031260785, -0.010328622, 0.12344255, 0.029170595, 0.01999901, 0.019623328, 0.029873552, -0.12295006, -0.06655674, 0.056856878, -0.009502542, 0.062678896, 0.012015508, -0.0048649227, -0.099480994, 0.032765355, 0.0014111749, 0.06629155, -0.050739158, -0.07177524, 0.042763676, -0.03504523, -0.0012196508, 0.03401447, 0.021926355, -0.06334607, -0.028522754, -0.09807192, -0.09360951, -0.059129912, -0.063443415, -0.048358258, 0.016005944, ...]",0.755216,0,2,0.777778,0.837209
3,15378,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"[But, hopes, Podolski, 's, strike, would, inspire, the, home, side, fell, flat, a, minute, after, half, -, time, when, Levan, Kobiashvili, scored, a, winner, to, move, Schalke, up, to, third, in, the, table, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.07752369, 0.05999982, 0.059902616, 0.039984807, 0.040477045, -0.11609339, 0.054619685, -0.12372504, 0.1007713, 0.12054669, -0.029032389, -0.038427282, -0.02876197, 0.034073442, -0.043869585, 0.036291618, 0.06349012, 0.08969003, 0.058826517, 0.00390625, 0.045642994, 0.06965538, -0.0033038105, -0.011484499, 0.08711638, -0.016531838, -0.040386625, -0.020360876, 0.0175996, 0.06249661, -0.060583893, -0.009650337, -0.03514721, -0.03803253, 0.023111979, -0.00022605613, -0.055812128, 0.01767759, -0.012566037, 0.064889416, 0.057531286, -0.11503092, 0.13779478, 0.0020164207, -0.012339274, -0.0014772768, 0.025131226, -0.025908293, 0.0754485, 0.0419142, -0.050281666, 0.03436053, -0.008234942, 0.0032439055, -0.055067275, 0.021447923, 0.027026141, -0.0024493183, -0.050672743, -0.046461884, -0.030760024, -0.014721482, -0.13037561, -0.12716335, -0.060149018, -0.012738263, 0.036908183, 0.0497931, 0.056382354, 0.14456742, 0.05119041, -0.018866645, 0.07549371, 0.039849177, -0.15531752, -0.08079699, 0.052972864, 0.024946425, 0.041328713, 0.033866882, 0.0026527687, -0.084141836, 0.0021497938, 0.009293168, 0.10914216, -0.023491753, -0.030592741, 0.080115564, -0.01858238, 0.017742015, 0.044483326, 0.0026041667, -0.07914904, -0.041473813, -0.1088675, -0.11467206, -0.051968046, -0.0053055375, -0.06926134, 0.01671685, ...]",0.75414,0,3,0.666667,0.790698
4,15378,ac2e59e27564f0c77b03513cdea6616d8dd8361d,"[Defeat, for, Cologne, leaves, them, rock, bottom, with, just, one, point, from, their, opening, five, games, .]","[[Lukas, Podolski, scores, his, first, goal, since, returning, to, Cologne, in, 2, -, 1, defeat, .], [Jefferson, Farfan, and, Levan, Kobiashvili, scored, the, goals, that, gave, Schalke, win, .], [The, defeat, leaves, Cologne, on, bottom, of, the, Bundesliga, table, after, five, games, .]]","[0.08892441, 0.06593567, 0.03864813, 0.0891037, -0.0019760132, -0.009052277, -0.04333687, -0.1583786, 0.026489258, 0.12824154, -0.023043633, -0.032043457, 0.0027999878, 0.061714172, -0.079055786, -0.012519836, 0.05416298, 0.063913345, -0.003730774, -0.024141312, -0.014410019, 0.07348633, -0.0041570663, -0.017017365, 0.07906723, -0.00019073486, -0.049682617, 0.035068512, 0.031032562, -0.01411438, -0.05053425, 0.036685944, -0.046702385, 0.002589941, 0.009580612, 0.03479767, -0.011993408, 0.022119522, 0.011795044, 0.029209137, 0.096069336, -0.11567688, 0.07014465, -0.009469509, 0.0063228607, -0.030568123, -0.00097084045, -0.033645034, 0.03501892, 0.077308655, -0.0500679, 0.1170845, 0.0012397766, -0.055680275, -0.017494202, -0.00951004, -0.07109451, -0.038318634, 0.037213326, -0.04403305, -0.081645966, 0.053592682, -0.046687126, -0.1666336, -0.09030151, -0.048664093, 0.042922974, 0.052814484, -0.0073871613, 0.14949036, -0.0022411346, -0.04584503, 0.044525146, -0.015764236, -0.104904175, -0.13942719, 0.0076036453, 0.08078766, 0.042621613, 0.022136688, -0.029376984, -0.13811493, 0.023611069, 6.1035156e-05, -0.007741928, -0.0694046, -0.061828613, 0.05001831, 0.010902405, 0.02576828, 0.0065710545, 0.036857605, -0.06416702, -0.08675718, -0.036087036, -0.05927658, 0.083602905, -0.0036735535, -0.037534714, 0.072647095, ...]",0.833362,1,4,0.555556,0.395349


## Create the train input for ML training. 
Keep only relevant columns for machine learning binary classification.

In [53]:
columns_to_keep = ['id', 'sentence_index', 'mean_sent_embeddings_art', 
                   'max_cosine_similarity', 'position_score',  
                   'normalized_sentence_length', 'binary_labels']

# Creating new dataframes with only the specified columns.
machinelearning_datasets = {}
for name, df in datasets.items():
    machinelearning_datasets[name] = df[columns_to_keep].copy()

# Save the new dataframes to CSV files.
for name, df in machinelearning_datasets.items():
    file_name = f"final_{name}_machinelearning_features.csv"
    df.to_csv(file_name, index=False)
    
# TIME: 30'

In [54]:
machinelearning_datasets['test'].head()

Unnamed: 0,id,sentence_index,mean_sent_embeddings_art,max_cosine_similarity,position_score,normalized_sentence_length,binary_labels
0,f001ec5c4704938247d27a44948eebb37ae98d01,0,"[0.01863037, 0.007519531, 0.045720827, -0.026210938, -0.016850585, -0.12905762, 0.05534668, -0.051710814, 0.08017578, 0.08408692, 0.018985596, 0.014121094, 0.00055541995, 0.06067627, -0.14268342, 0.0023498535, 0.012729492, 0.07739258, 0.03246643, 0.012792969, 0.008016128, 0.032766115, -0.0059484863, 0.059751738, 0.03687866, -0.070896454, -0.061679993, 0.07059326, 0.011687622, 0.062841795, 0.06010498, -0.078544006, -0.048083495, 0.065682374, 0.012094727, -0.01517334, 0.0013354492, 0.0106748585, 0.007697754, 0.06078247, 0.00064575195, -0.013206787, 0.034870606, 0.0022258759, -0.023394775, -0.067088015, -0.031342775, -0.012390137, -0.039172363, 0.09900635, 0.013930664, 0.03378662, -0.00018798828, 0.010806885, -0.045269776, 0.06594696, -0.099726565, -0.050844725, -0.032305297, -0.076411135, -0.05074951, 0.046586916, 0.007199707, -0.0377417, -0.065104984, 0.0017022705, 0.044594727, 0.10273926, -0.03487915, -0.042370606, 0.002319336, 0.01404419, 0.06896485, 0.043642577, 0.005576172, -0.11226318, 0.10727539, 0.052562255, 0.12300293, 0.06959167, 0.014470215, -0.07764892, 0.028283691, 0.059782714, -0.031777345, -0.11373596, -0.059125975, 0.054311525, 0.04573303, -0.0031396484, 0.08372071, 0.003527832, -0.078933716, -0.1238977, -0.051153563, -0.019090882, 0.051157836, 0.049095612, 0.00028167723, -0.00041919708, ...]",0.827125,1.0,0.638298,1
1,f001ec5c4704938247d27a44948eebb37ae98d01,1,"[-0.026345147, 0.072157115, 0.086598925, 0.02704027, -0.0095342, -0.07687039, 0.016142104, -0.12132263, 0.11073409, 0.10550944, 0.0004035102, -0.059637282, 0.04836358, 0.062689885, -0.08909183, 0.008592393, 0.022254096, 0.016411675, -0.05710178, -0.028415255, -0.0023735894, 0.08509488, -0.018029107, 0.026077695, 0.06347656, -0.060458712, -0.07486386, 0.06870524, 0.054563735, 0.024275038, 0.040771484, -0.09765625, -0.061143663, 0.03068712, 0.011206733, -0.0019599067, -0.059258357, 0.009733412, 0.028449165, -0.03638649, 0.040881686, -0.027676053, 0.039364286, 0.033466656, 0.038533527, -0.010462443, -0.025438096, 0.02178955, -0.024007162, 0.14838324, 0.05429416, -0.013916016, -0.0047573512, -0.05378045, -0.04107666, 0.059702132, -0.13191731, -0.06272295, -0.03536966, -0.06548394, -0.10267809, 0.020063613, -0.07535807, -0.13737996, -0.039523654, 0.03265381, 0.014987522, 0.09775119, -0.0018920898, 0.028137207, 0.03901503, 0.028355546, 0.07041422, 0.05050998, -0.0011342367, -0.051596746, 0.15212673, 0.09789361, 0.06604004, 0.15382893, 0.04693773, -0.1507348, 0.014619615, 0.026352776, 0.007529365, -0.14176051, -0.0871582, 0.006863064, 0.025482178, 0.025767008, 0.05051337, -0.006391737, -0.06198035, -0.062211778, -0.011873033, -0.0134887695, 0.0381741, -0.025986565, -0.059024386, 0.0056508384, ...]",0.613928,0.961538,0.468085,0
2,f001ec5c4704938247d27a44948eebb37ae98d01,2,"[0.023723347, 0.05842082, 0.073261514, -0.0004679362, -0.035168458, -0.10735931, 0.026655579, -0.042024232, 0.06806844, 0.10252278, -0.035551976, -0.0163208, -0.046455894, 0.036691286, -0.12680486, -0.026194254, -0.006606038, 0.029671224, 0.05464274, 0.016045125, -0.010900879, 0.012538147, -0.017256673, 0.003009669, 0.03470459, -0.020236524, -0.052922565, 0.048105877, 0.0011123657, 0.02070516, 0.058365885, -0.107573256, -0.093984984, 0.04512749, -0.002255249, -0.014235433, -0.014418539, -0.018254599, 0.004402669, 0.06399231, 0.05887451, 0.018044282, 0.06647949, -0.014736303, -0.0051574707, -0.020678965, -0.018784586, 0.0013926189, -0.024125163, 0.082457475, 0.048760988, -0.020689901, -0.045590464, -0.0068850834, -0.08009898, 0.072246805, -0.10896301, -0.05167338, -0.04591592, -0.062778726, -0.023938498, 0.09613037, -0.009847005, -0.06256307, -0.012492879, -0.0050877887, -0.041153207, 0.07437337, -0.033294678, 0.03435669, -0.001558431, -0.0043986, 0.042836506, 0.05767924, 0.0016510009, -0.13587646, 0.07966105, 0.072630815, 0.05538737, 0.051206462, -0.017904663, -0.07586568, -0.0009536743, 0.04328054, -0.04178365, -0.1328659, -0.054223634, 0.03669332, 0.02167155, 0.0015818278, 0.048583984, -0.0215861, -0.07746785, -0.12525775, -0.04193217, -0.0346639, 0.024420166, -5.9763593e-06, 0.0056744893, -0.009495449, ...]",0.869043,0.923077,0.851064,1
3,f001ec5c4704938247d27a44948eebb37ae98d01,3,"[0.024536133, 0.07804362, 0.05803935, 0.016611734, -0.025812784, -0.09988276, 0.05537033, -0.093561806, 0.067466736, 0.07181803, -0.012496948, -0.037377674, -0.007540385, 0.0893453, -0.10045942, 0.033616383, 0.0013368925, 0.030382792, 0.04282506, -0.022428274, 0.033867914, 0.017810822, -0.037118275, -0.025906563, 0.013827006, 0.0013936361, -0.054079056, 0.011583964, 0.054445267, 0.029413858, -0.015314102, -0.05296834, -0.055914562, 0.02390035, -0.029424032, 0.004122416, 0.019201914, -0.02466329, 0.0657959, 0.015950521, 0.07551321, -0.008098602, 0.077461876, 0.024269104, -0.040922802, -0.017993292, -0.034449894, 0.013195038, -0.035442352, 0.09509119, 0.05739339, 0.012560527, -0.035377502, -0.051223755, -0.059468586, 0.041417122, -0.083016716, -0.032197315, -0.04505666, -0.06969961, -0.10239156, 0.05022494, -0.026313782, -0.055730183, -0.048683167, 0.032211304, 0.0012842814, 0.09145101, -0.056292217, -0.0055745444, -0.018351236, 0.03271675, 0.12825012, 0.040224712, -0.055381138, -0.14956792, 0.13104248, 0.09748459, 0.059755962, 0.09383138, 0.004931132, -0.025268555, -0.014475505, 0.05558141, 0.016381582, -0.044614155, -0.081970215, 0.09496053, -0.013151805, 0.015131633, 0.066249214, 0.018593946, -0.087430954, -0.062795006, -0.022773743, -0.060980797, 0.058750153, 0.0143966675, 0.03346761, -0.015099525, ...]",0.855858,0.884615,0.595745,1
4,f001ec5c4704938247d27a44948eebb37ae98d01,4,"[0.014750163, 0.044555664, 0.01999251, 0.07670593, -0.053777058, -0.019012451, 0.054976147, 0.027837118, 0.11566162, 0.06413778, 0.018636068, -0.069356285, -0.049402874, 0.07106781, -0.06078084, -0.0146484375, 0.08955002, 0.038475037, 0.04212443, -0.009557088, 0.003833453, 0.029724121, 0.104044594, 0.045429867, 0.028564453, -0.04714203, -0.11229452, 0.002937317, 0.025887808, 0.019226074, 0.08543905, -0.07371012, -0.07902527, 0.050565083, -0.010945638, 0.01764234, 0.060028076, 0.009953816, 0.049524944, -0.032310486, 0.06085205, 0.05090332, 0.059143066, -0.030644894, -0.04524231, -0.03781764, -0.020767212, 0.025283813, 0.056035995, 0.041770935, 0.025553385, -0.0024464924, -0.058741253, 0.023305258, -0.019907633, 0.04956754, -0.044362385, -0.005971273, 0.0110321045, -0.039449055, -0.08745321, 0.022196451, -0.10231527, -0.060190838, -0.0209198, 0.045686085, 0.038224537, 0.08543905, -0.06208547, -0.018829346, 0.019500732, 0.0121816, 0.12775676, 0.04040019, -0.06271362, -0.12047323, 0.104024254, 0.11860657, 0.058247883, 0.042551678, 0.030255953, -0.058222454, 0.019907633, 0.052256268, 0.014574051, -0.067855835, -0.099100746, 0.14611816, 0.031743366, -0.0024236043, 0.13309224, 0.035939533, -0.08443705, -0.10168457, 0.011383057, -0.0714035, -0.006375631, 0.013809204, -0.05119578, 0.016805014, ...]",0.708244,0.846154,0.361702,0


In [56]:
feature_columns = ['id', 'sentence_index', 'mean_sent_embeddings_art', 
                   'max_cosine_similarity', 'position_score', 
                   'normalized_sentence_length']

# Create the features dataframe.
test_features_df = machinelearning_datasets['test'][feature_columns]

# Create the labels dataframe.
test_labels_df = machinelearning_datasets['test'][['binary_labels']]

In [57]:
test_features_df.to_csv('final_test_features.csv', index=False)

test_labels_df.to_csv('final_test_labels.csv', index=False)

# TIME: 8'

In [None]:
# test_features_df.head()

In [58]:
feature_columns = ['mean_sent_embeddings_art', 
                   'max_cosine_similarity', 'position_score', 
                   'normalized_sentence_length']

# Initialize dictionaries to hold the new dataframes
train_val_features = {}
train_val_labels = {}

# Splitting the train and validation dataframes
for name in ['train', 'validation']:
    # Create the features dataframe
    train_val_features[name] = machinelearning_datasets[name][feature_columns]

    # Create the labels dataframe
    train_val_labels[name] = machinelearning_datasets[name][['binary_labels']]


In [59]:
train_val_features.keys()

dict_keys(['train', 'validation'])

In [60]:
# Saving the feature dataframes
for name, df in train_val_features.items():
    file_name = f"final_{name}_features.csv"
    df.to_csv(file_name, index=False)

# Saving the label dataframes
for name, df in train_val_labels.items():
    file_name = f"final_{name}_labels.csv"
    df.to_csv(file_name, index=False)
    
# TIME: 20'

In [61]:
# Final_test df

columns_to_keep = ['id', 'sentences', 'high_tokenized']

test_df_summaries = datasets['test'][columns_to_keep]

test_df_summaries.to_csv("summary_generation.csv", index=False)


## Future Work

In [None]:
# thematic word: the top n words with the highest frequency in the cleaned text article.
# It shows that eight out of 10 of the top 10 words are contained in the summary while decrease harshly after the top 10 words. Therefore, the thematic
 # words were set as the top 10 words in the thematic word feature extraction
 # THE TOP 10 WORDS WITH THE HIGHEST FREQUENCY IN THE CLEANED TEXT ORDERED IN DESCDENDING ORDER. NAI
 # FIND TOP 10 verbs FOR THIS ARTICLE