In [1]:
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-xsum")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her "first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," referring to her false statements on the
2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly sneaking into the New York subway through an emergency exit, said Detective
Annette Markowski, a police spokeswoman. In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly after the marriages.
Any divorces happened only after such filings were approved. It was unclear whether any of the men will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))

[{'summary_text': 'A New York woman has pleaded not guilty to falsely claiming to be married 10 times, including to eight men from different countries, in what prosecutors say was an immigration scam.'}]


In [2]:
from datasets import load_dataset

dataset = load_dataset("xsum")

Found cached dataset xsum (C:/Users/srina/.cache/huggingface/datasets/xsum/default/1.2.0/082863bf4754ee058a5b6f6525d0cb2b18eadb62c7b370b095d1364050a52b71)



In [3]:
dataset

DatasetDict({
    train: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 204045
    })
    validation: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11332
    })
    test: Dataset({
        features: ['document', 'summary', 'id'],
        num_rows: 11334
    })
})

In [31]:
text = dataset['train'][3]['document']  # index the row first so the whole 'document' column is not materialized
print(text)

John Edward Bates, formerly of Spalding, Lincolnshire, but now living in London, faces a total of 22 charges, including two counts of indecency with a child.
The 67-year-old is accused of committing the offences between March 1972 and October 1989.
Mr Bates denies all the charges.
Grace Hale, prosecuting, told the jury that the allegations of sexual abuse were made by made by four male complainants and related to when Mr Bates was a scout leader in South Lincolnshire and Cambridgeshire.
"The defendant says nothing of that sort happened between himself and all these individuals. He says they are all fabricating their accounts and telling lies," said Mrs Hale.
The prosecutor claimed Mr Bates invited one 15 year old to his home offering him the chance to look at cine films made at scout camps but then showed him pornographic films.
She told the jury that the boy was then sexually abused leaving him confused and frightened.
Mrs Hale said: "The complainant's recollection is that on a number

In [32]:
dataset['train'][3]['summary']

'A former Lincolnshire Police officer carried out a series of sex attacks on boys, a jury at Lincoln Crown Court was told.'

In [33]:
ARTICLE = dataset['train'][3]['document']
# The pipeline returns a list with one dict per input; pull out the generated summary string.
summary = summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False)[0]['summary_text']
print(summary)

A former police officer has gone on trial accused of sexually abusing two boys in the 1970s and 1980s, the Old Bailey has been told.
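The generated summary shares little beyond its framing with the reference (it names the Old Bailey where the reference names Lincoln Crown Court). As a rough lexical comparison, here is a minimal ROUGE-1-style unigram F1 sketch. It is a simplified stand-in written for illustration (regex tokenizer, no stemming), not the official `rouge-score` implementation, and the function names `word_tokens` and `unigram_f1` are hypothetical.

```python
import re
from collections import Counter


def word_tokens(text: str) -> list[str]:
    """Lower-case and keep runs of letters/digits; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(reference: str, candidate: str) -> float:
    """ROUGE-1-style F1 from clipped unigram overlap (hypothetical helper, not the official scorer)."""
    ref, cand = Counter(word_tokens(reference)), Counter(word_tokens(candidate))
    overlap = sum((ref & cand).values())  # & keeps min(count_ref, count_cand) per word
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Reference and generated summaries copied from the In [32] and In [33] outputs.
reference = ("A former Lincolnshire Police officer carried out a series of sex attacks "
             "on boys, a jury at Lincoln Crown Court was told.")
generated = ("A former police officer has gone on trial accused of sexually abusing two boys "
             "in the 1970s and 1980s, the Old Bailey has been told.")
print(round(unigram_f1(reference, generated), 2))  # → 0.34 (8 shared words; 16/47)
```

Word overlap alone cannot tell a paraphrase from a hallucinated court name, which is why comparing named entities, as the next cells do, is a useful complement.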


In [34]:
from flair.data import Sentence
from flair.models import SequenceTagger
from segtok.segmenter import split_single

tagger = SequenceTagger.load('ner-ontonotes')

def extract_entities(document):
    """Sentence-split a document, tag it with the OntoNotes NER model, and return (text, tag) pairs."""
    sentences = [Sentence(sent, use_tokenizer=True) for sent in split_single(document)]
    tagger.predict(sentences)
    return [(entity.text.strip(), entity.tag)
            for sent in sentences
            for entity in sent.get_spans('ner')]

text_NER = extract_entities(text)
sum_NER = extract_entities(summary)

2023-09-21 12:24:22,366 SequenceTagger predicts: Dictionary with 75 tags: O, S-PERSON, B-PERSON, E-PERSON, I-PERSON, S-GPE, B-GPE, E-GPE, I-GPE, S-ORG, B-ORG, E-ORG, I-ORG, S-DATE, B-DATE, E-DATE, I-DATE, S-CARDINAL, B-CARDINAL, E-CARDINAL, I-CARDINAL, S-NORP, B-NORP, E-NORP, I-NORP, S-MONEY, B-MONEY, E-MONEY, I-MONEY, S-PERCENT, B-PERCENT, E-PERCENT, I-PERCENT, S-ORDINAL, B-ORDINAL, E-ORDINAL, I-ORDINAL, S-LOC, B-LOC, E-LOC, I-LOC, S-TIME, B-TIME, E-TIME, I-TIME, S-WORK_OF_ART, B-WORK_OF_ART, E-WORK_OF_ART, I-WORK_OF_ART, S-FAC
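The tag dictionary logged above uses a BIOES scheme: `S-` marks a single-token entity, `B-`/`I-`/`E-` mark the beginning, inside and end of a multi-token entity, and `O` is outside any entity. flair groups these token tags into spans inside `get_spans('ner')`; the sketch below only illustrates that grouping under simplifying assumptions and is not flair's actual code. The function name `decode_bioes` and the example tags are hypothetical, not real tagger output.

```python
def decode_bioes(tokens, tags):
    """Group token-level BIOES tags (e.g. B-PERSON, E-PERSON, S-GPE, O) into (text, type) spans.

    Simplification: tokens are re-joined with single spaces, and malformed
    sequences (an I-/E- tag without a matching open span) are dropped.
    """
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition('-')
        if prefix == 'S':  # single-token entity
            spans.append((token, label))
            current, current_type = [], None
        elif prefix == 'B':  # start a new multi-token entity
            current, current_type = [token], label
        elif prefix in ('I', 'E') and current and label == current_type:
            current.append(token)
            if prefix == 'E':  # close the open entity
                spans.append((' '.join(current), current_type))
                current, current_type = [], None
        else:  # 'O' or an inconsistent tag: discard any open span
            current, current_type = [], None
    return spans


# Hypothetical tagging of the first words of the article.
tokens = ["John", "Edward", "Bates", ",", "formerly", "of", "Spalding", ","]
tags = ["B-PERSON", "I-PERSON", "E-PERSON", "O", "O", "O", "S-GPE", "O"]
print(decode_bioes(tokens, tags))  # → [('John Edward Bates', 'PERSON'), ('Spalding', 'GPE')]
```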


In [35]:
print(sum_NER)
print("\n")
print(text_NER)

[('two', 'CARDINAL'), ('1970s', 'DATE'), ('1980s', 'DATE'), ('the Old Bailey', 'ORG')]


[('John Edward Bates', 'PERSON'), ('Spalding', 'GPE'), ('Lincolnshire', 'GPE'), ('London', 'GPE'), ('22', 'CARDINAL'), ('two', 'CARDINAL'), ('67-year-old', 'DATE'), ('March 1972 and October 1989', 'DATE'), ('Bates', 'PERSON'), ('Grace Hale', 'PERSON'), ('four', 'CARDINAL'), ('Bates', 'PERSON'), ('South Lincolnshire', 'GPE'), ('Cambridgeshire', 'GPE'), ('Hale', 'PERSON'), ('Bates', 'PERSON'), ('one', 'CARDINAL'), ('15 year old', 'DATE'), ('Hale', 'PERSON'), ('second', 'ORDINAL'), ('Bates', 'PERSON'), ('weekend', 'DATE'), ('London', 'GPE'), ('13', 'DATE'), ('14', 'DATE'), ('Hale', 'PERSON'), ('two', 'CARDINAL'), ('Spalding', 'GPE'), ('Bates', 'PERSON'), ('RAF', 'ORG'), ('Lincolnshire Police', 'ORG'), ('between 1976 and 1983', 'DATE'), ('two weeks', 'DATE')]
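Comparing the two lists by hand shows that most entities in the generated summary have no counterpart in the source. A minimal sketch that automates this check, assuming a case-insensitive exact match on entity text (tags ignored) is good enough, is below; `unsupported_entities` is a hypothetical helper, and the lists are copied from the In [35] output.

```python
def unsupported_entities(summary_ents, source_ents):
    """Return summary entities whose text (case-insensitive) is absent from the source entities,
    and the fraction of summary entities that are supported."""
    source_texts = {text.lower() for text, _ in source_ents}
    missing = [(text, tag) for text, tag in summary_ents if text.lower() not in source_texts]
    supported = 1.0 if not summary_ents else 1 - len(missing) / len(summary_ents)
    return missing, supported


# Entity lists copied from the In [35] output above.
sum_NER = [('two', 'CARDINAL'), ('1970s', 'DATE'), ('1980s', 'DATE'), ('the Old Bailey', 'ORG')]
text_NER = [('John Edward Bates', 'PERSON'), ('Spalding', 'GPE'), ('Lincolnshire', 'GPE'),
            ('London', 'GPE'), ('22', 'CARDINAL'), ('two', 'CARDINAL'), ('67-year-old', 'DATE'),
            ('March 1972 and October 1989', 'DATE'), ('Bates', 'PERSON'), ('Grace Hale', 'PERSON'),
            ('four', 'CARDINAL'), ('Bates', 'PERSON'), ('South Lincolnshire', 'GPE'),
            ('Cambridgeshire', 'GPE'), ('Hale', 'PERSON'), ('Bates', 'PERSON'), ('one', 'CARDINAL'),
            ('15 year old', 'DATE'), ('Hale', 'PERSON'), ('second', 'ORDINAL'), ('Bates', 'PERSON'),
            ('weekend', 'DATE'), ('London', 'GPE'), ('13', 'DATE'), ('14', 'DATE'), ('Hale', 'PERSON'),
            ('two', 'CARDINAL'), ('Spalding', 'GPE'), ('Bates', 'PERSON'), ('RAF', 'ORG'),
            ('Lincolnshire Police', 'ORG'), ('between 1976 and 1983', 'DATE'), ('two weeks', 'DATE')]

missing, supported = unsupported_entities(sum_NER, text_NER)
print(missing)    # → [('1970s', 'DATE'), ('1980s', 'DATE'), ('the Old Bailey', 'ORG')]
print(supported)  # → 0.25
```

Exact matching is a blunt instrument in both directions. "1970s" and "1980s" loosely paraphrase "March 1972 and October 1989" yet are flagged, while "two" counts as supported even though the summary's "two boys" and the source's "two counts" refer to different things. "the Old Bailey", by contrast, appears nowhere in the source entities, and the reference summary names Lincoln Crown Court instead.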
