## SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization

----------------


Repository: https://github.com/andrejmiscic/simcls-pytorch

A PyTorch reimplementation of the work described in [SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization](https://arxiv.org/abs/2106.01890), published at [ACL 2021](https://aclanthology.org/2021.acl-short.135.pdf).

This project was part of the reproducibility challenge in the Machine Learning II course of the [Data Science Master's program at the University of Ljubljana](https://datascience.fri.uni-lj.si/).
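
For orientation, SimCLS summarizes in two stages: a pretrained generator (BART or Pegasus in this notebook) proposes several candidate summaries, and a separately trained scorer picks the candidate whose embedding is most similar to the source document. The sketch below illustrates only the second, re-ranking step. It is not the repository's implementation: `bow_embed`, `cosine`, and `rerank` are hypothetical names, and a bag-of-words count vector stands in for the neural encoder so that the example runs without downloading any model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def bow_embed(text: str) -> Vector:
    """Toy stand-in for the scorer's encoder: a bag-of-words count vector."""
    return dict(Counter(text.lower().split()))


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(document: str,
           candidates: List[str],
           embed: Callable[[str], Vector] = bow_embed) -> str:
    """Return the candidate whose embedding is closest to the document's."""
    doc_vec = embed(document)
    return max(candidates, key=lambda c: cosine(doc_vec, embed(c)))


# Hypothetical document and candidates, for illustration only.
document = "the telescope will study the first galaxies and search for life"
candidates = [
    "the telescope will study the first galaxies",
    "a rocket launched yesterday",
    "gold is expensive",
]
best = rerank(document, candidates)
print(best)
```

In the real system the embedding function would be a trained neural encoder rather than word counts; the selection logic (score every candidate against the document, keep the arg-max) is the part this sketch is meant to convey.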



In [1]:
!nvidia-smi

Sun Oct 17 08:56:42 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P8    26W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

### Setup

In [2]:
%%capture
!pip install transformers datasets sentencepiece

In [3]:
!git clone https://github.com/andrejmiscic/simcls-pytorch.git
!cp -R simcls-pytorch/src src/

Cloning into 'simcls-pytorch'...
remote: Enumerating objects: 58, done.
remote: Counting objects: 100% (58/58), done.
remote: Compressing objects: 100% (45/45), done.
remote: Total 58 (delta 16), reused 49 (delta 11), pack-reused 0
Unpacking objects: 100% (58/58), done.


In [4]:
from src.model import SimCLS, GeneratorType

## CNN/DailyMail

In [5]:
article = """London (CNN)If you're hunting for the earliest galaxies and clues about potential life on other planets you are going to need a very big mirror and a golf ball of gold. They are both necessary for the construction of The James Webb Space Telescope (JWST), intended as the successor to the Hubble instrument that has been operating in space for 25 years. It's going to be a tough act to follow. Hubble has returned spectacular images during the past quarter century but also helped scientists discover that almost every galaxy has a massive black hole at its heart and that the expansion of the universe is speeding up. But there are limits to how far it can see. Now scientists are working on an alternative way to peer into the past and search space for signs of life with JWST -- scheduled to launch in October 2018 on an Ariane 5 rocket from French Guiana. NASA spokesperson Lynn Chandler told CNN that the mission was like opening up the curtains on the universe and peering inside. "Hubble rewrote the text books and we're planning to rewrite the text books again," she said. "JWST will answer the questions which at the moment we can't think to ask." The Webb telescope is a big probe. Hubble is about the size of a school bus but JWST is as big as a tennis court. There isn't a rocket currently capable of carrying that so as Chandler explained: "It has to be folded up like a flower and then unfurled like a transformer." Named after James E. Webb, a former NASA leader, JWST is being designed to study the first stars and galaxies that formed in the early universe. NASA says that to see these objects the telescope will have to detect objects which are 10 to 100 times fainter than Hubble can currently see. Instead of studying visible and ultraviolet light like Hubble, the JWST will work in the infra-red spectrum, allowing scientists to detect more distant targets. 
The new telescope requires a huge mirror of 25 square meters (about 270 square feet) -- and a golf ball of gold (about 48 grams or 1.7 ounces) to optimize it for infra-red light. It is then coated with glass. But technology like this doesn't come cheap. According to NASA, the mission, which is in collaboration with the European Space Agency (ESA) and the Canadian Space Agency (CSA) and involves a total of 14 countries, will cost $8.5 billion. NASA says that the project has four main goals -- namely, to search for the first galaxies formed after the Big Bang, find out how galaxies evolved, observe the birth of stars and planets and investigate the potential for life on other planets . Scientists hope the telescope will be able to tell us more about objects that formed 13 billion years ago -- about 700-800 million years after the Big Bang. But closer to home, scientists also believe the new telescope will able to detect planets around nearby stars. NASA says JWST should be able to operate for between five and 10 years, restricted only by the amount of fuel it has to maintain orbit and the ability of the electronics to stand up to the harsh space environment. Opinion: Why astronomy counts on Earth ."""

In [6]:
summarizer = SimCLS(generator_type=GeneratorType.Bart,
                    generator_path="facebook/bart-large-cnn",
                    scorer_path="andrejmiscic/simcls-scorer-cnndm")

In [7]:
summary = summarizer(article)

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect."
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at  /pytorch/aten/src/ATen/native/BinaryOps.cpp:467.)
  return torch.floor_divide(self, other)


In [8]:
summary

'The James Webb Space Telescope will launch in October 2018. The telescope is designed to study the first galaxies formed after the Big Bang. It will also look for signs of life on other planets. The mission will cost $8.5 billion, according to NASA...'

## XSum

In [9]:
article = """In addition to original versions of the games, Halo: The Master Chief collection will also include a new visually-upgraded version of Halo 2.
Purchasers of the November release are also being promised "beta" access to the multiplayer version of the forthcoming Halo 5.
One company watcher said it illustrated the firm was listening to its fans.
"I think it's a smart move," said David Scarborough, from GamesTM magazine.
"It shows a willingness to satiate the appetite of Xbox hardcore gamers, which is what Microsoft sees the Halo franchise as - it's biggest hardcore gaming franchise.
"It also feeds into the hype for the next entry in the series.
"But I personally don't think it will be something that will incentivise people who haven't yet bought a new console to buy an Xbox One."
Xbox gamers who already bought the titles on earlier versions of the Xbox are being given an added reason to buy the games again: the package will also include access to watch a new live-action series called Halo: Nightfall.
The episodes, produced by Ridley Scott, are currently being filmed in Ireland, and are separate to the Steven Spielberg Halo TV series promised last year.
While new pre-rendered trailer footage of Halo 5 was shown to the audience at the firm's E3 press conference in Los Angeles, developer 343 Studios was not yet ready to demo gameplay or confirm a launch date.
Xbox chief Phil Spencer told the crowd at the Microsoft event  that Halo was the "reason Xbox is here today".
He also acknowledged that his firm had changed its strategy to take account of customer feedback - a nod to it abandoning restrictions on the sale of second-hand disks and releasing a cheaper version of the console without its Kinect voice/camera sensor.
He pointedly said at the event's outset that this year's conference would be dedicated exclusively to showing off new games, rather than sharing the time to show off other multimedia features.
Microsoft's rival Sony had exploited previous attempts to promote the Xbox as both a games machine and a means to control cable TV, pitching the PlayStation 4 as the machine for serious gamers. That move helped the Japanese company to enjoy stronger sales since both machines launched last November.
Other new titles on show for the Xbox One included the hyper-reality game Sunset Overdrive, which features a character who skids across rails blowing up people who have been turned into mutants by a poisonous fizzy drink.
The colourful game is an Xbox One exclusive and due out later this year.
Microsoft Game Studios showed off another new Xbox One exclusive at an earlier stage of development called Phantom Dust - which is billed as a "battle for reality". It had previously released an action-strategy title by the same name for the original Xbox a decade ago.
The in-house games publisher also previewed a dragon-fighting title - developed by Japan's Platinum Games - called Scalebound for the new console, and a fresh version of its open-world third-person shooter Crackdown, originally released for the Xbox 360.
One expert said offering such distinctive titles could prove critical to Microsoft's attempts to woo those who had not yet upgraded to a "new-gen" console.
"Since Xbox dropped the DRM [digital rights management] stuff before launch and then ditched the Kinect as a must-have about a month ago, the actual level of differentiation between PlayStation 4 and Xbox One has really shrunk," said Ed Barton, an analyst at the Ovum consultancy.
"So, one of the only differentiators left is exclusive game titles - there's not much left in to pick between them based on hardware."
Analysis from E3: Dave Lee, Los Angeles
First stop at this year's E3 was the Galen Center, a venue usually used for basketball, where Microsoft rolled out its portfolio for the coming year and a bit.
At its heart, the crowd-pleasing announcement of the Halo Master Collection - a complete compendium of the Halo series, plus a "beta" of the latest entry in the series, Halo 5 Guardians. It won the biggest cheer in the arena, for mostly nostalgia purposes.
But a new console can't rely on old franchises, even if they are of Halo's calibre. That's where Sunset Overdrive comes in - an (almost) open-world game with a lead character that, at first glance, is more than slightly irritating.
While the Xbox event was a typically loud, brash affair - it was intentionally no-nonsense. This was all about games, and the firm has set out a convincing staple for the year ahead, even if it is a little reliant on tried and tested brands.
It all points to a strong 2014 and 2015 for Xbox One. It needs to be if it is to claw back some of the ground lost to the PlayStation 4.
Tellingly, there was only a brief mention of the Kinect - a peripheral touted last year to be integral to the Xbox One experience.
Not so this time - only two titles mentioned the Kinect directly: a dancing game and a baffling music-creation game based on the old Disney film, Fantasia.
Later we get to see what Sony has to offer.
Square Enix picked the show to premiere a trailer for its next Lara Croft game, called Rise of the Tomb Raider. The character appears to be suffering post-traumatic stress disorder after her previous origins story, but little of the new plot was revealed.
Activision was more forthcoming with a long sequence from its forthcoming Call of Duty: Advanced Warfare game, showing soldiers fighting swarms of drones in a battle-hit city. However, there was only a brief glimpse of its star Kevin Spacey in the footage.
Several titles on show took advantage of the Xbox's extra processing power to offer fast-paced multiplayer campaigns."""

In [10]:
summarizer = SimCLS(generator_type=GeneratorType.Pegasus,
                    generator_path="google/pegasus-xsum",
                    scorer_path="andrejmiscic/simcls-scorer-xsum")

In [11]:
summary = summarizer(article)

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect."


In [12]:
summary

'Microsoft is to release a special edition of its Halo games for the Xbox One.'

## BillSum

In [13]:
document = """SECTION 1. SHORT TITLE.

    This Act may be cited as the ``State Innovation Pilot Act of 
2011''.

SEC. 2. PURPOSES.

    The purposes of this Act are--
            (1) to support State, local, and tribal leadership and 
        innovation in preparing all students to meet State-developed 
        college and career ready academic content standards and student 
        academic achievement standards, by establishing a process to 
        permit State, local, and tribal educational leaders to 
        implement alternative and innovative strategies to improve 
        student academic achievement and otherwise meet the purposes of 
        the Elementary and Secondary Education Act of 1965 (20 U.S.C. 
        6301 et seq.); and
            (2) to direct the Secretary of Education to defer to State, 
        local, and tribal judgments regarding how best to accomplish 
        the purposes of the Elementary and Secondary Education Act of 
        1965.

SEC. 3. WAIVERS OF STATUTORY AND REGULATORY REQUIREMENTS.

    Section 9401 of the Elementary and Secondary Education Act of 1965 
(20 U.S.C. 7861) is amended--
            (1) by striking subsection (a) and inserting the following:
    ``(a) In General.--
            ``(1) Request for waiver.--A State educational agency, 
        local educational agency, or Indian tribe that receives funds 
        under a program authorized under this Act may submit a request 
        to the Secretary to waive any statutory or regulatory 
        requirement of this Act.
            ``(2) Receipt of waiver.--Except as provided in subsection 
        (c), the Secretary shall waive any statutory or regulatory 
        requirement of this Act for a State educational agency, local 
        educational agency, Indian tribe, or school (through a local 
        educational agency), that submits a waiver request pursuant to 
        this subsection.'';
            (2) in subsection (b)--
                    (A) in paragraph (1)--
                            (i) in the matter preceding subparagraph 
                        (A), by inserting ``, which shall include a 
                        plan'' after ``waiver request to the 
                        Secretary'';
                            (ii) in subparagraph (B), by striking ``and 
                        how the waiving of those requirements will'' 
                        and all that follows through the end, and 
                        inserting a semicolon;
                            (iii) by redesignating subparagraph (E) as 
                        subparagraph (F); and
                            (iv) by striking subparagraphs (C) and (D), 
                        and inserting the following:
                    ``(C) reasonably demonstrates that the waiver will 
                improve instruction for students, advance student 
                academic achievement, and contribute to student mastery 
                of knowledge and skills, consistent with the State's 
                college and career ready academic content standards and 
                student academic achievement standards;
                    ``(D) describes the methods the State educational 
                agency, local educational agency, or Indian tribe will 
                use to--
                            ``(i) monitor the effectiveness of the 
                        implementation of the plan; and
                            ``(ii) assure regular evaluation and 
                        continuous improvement of the plan;
                    ``(E) as applicable to the waiver request--
                            ``(i) describes the State educational 
                        agency, local educational agency, or Indian 
                        tribe's process for making valid and meaningful 
                        accountability determinations, based on student 
                        academic achievement, to review the success of 
                        schools and local educational agencies or 
                        Indian tribes in implementing the State's 
                        college and career ready academic content 
                        standards and student academic achievement 
                        standards;
                            ``(ii) describes the State educational 
                        agency, local educational agency, or Indian 
                        tribe's process for accurately and meaningfully 
                        identifying, supporting, and intervening in 
                        underperforming schools, consistent with 
                        applicable State or local policy; and
                            ``(iii) includes information on how the 
                        State educational agency, local educational 
                        agency, or Indian tribe will maintain and 
                        improve transparency in reporting to parents 
                        and the public on student achievement and 
                        school performance, including the achievement 
                        of students according to the student subgroups 
                        described in subclauses (I) through (IV) of 
                        section 1111(b)(2)(B)(viii); and'';
                    (B) in paragraph (2)(B)(i)(II), by striking ``(on 
                behalf of, and based on the requests of, local 
                educational agencies)'' and inserting ``(on their own 
                behalf, or on behalf of, and based on the requests of, 
                local educational agencies in the State)'';
                    (C) in paragraph (3)(A), in the matter preceding 
                clause (i), by inserting ``or on behalf of local 
                educational agencies in the State,'' after ``acting on 
                its own behalf,''; and
                    (D) by adding at the end the following:
            ``(4) Peer review.--
                    ``(A) Peer review team.--
                            ``(i) In general.--The Secretary shall 
                        establish multi-disciplinary peer review teams 
                        and appoint members to such teams, including 
                        persons who have experience with a State 
                        educational agency (or local educational agency 
                        or Indian tribe, as appropriate) and broader 
                        education reform experience, to review waiver 
                        requests under this section if--
                                    ``(I) the Secretary requests such 
                                input in order to approve a waiver 
                                request; or
                                    ``(II) the Secretary intends to 
                                disapprove a request.
                            ``(ii) Team in place for all waiver 
                        requests.--The Secretary may, at the 
                        Secretary's discretion, have a peer review team 
                        review all waiver requests submitted under this 
                        section.
                    ``(B) Applicability.--The Secretary may approve a 
                waiver request under this section without conducting a 
                peer review of the request, but shall use the peer 
                review process under this paragraph before disapproving 
                such a request.
                    ``(C) Purpose of peer review.--The peer review 
                process shall be designed to--
                            ``(i) promote effective implementation of 
                        State-developed college and career ready 
                        academic content standards and student academic 
                        achievement standards, through State and local 
                        innovation; and
                            ``(ii) provide transparent feedback to 
                        State educational agencies, local educational 
                        agencies, or Indian tribes, designed to 
                        strengthen the applicant's plan described under 
                        paragraph (1)(C).
                    ``(D) Standard and nature of review.--Peer 
                reviewers shall conduct a good faith review of waiver 
                requests submitted to them under this section. Peer 
                reviewers shall review such waiver requests--
                            ``(i) in their totality;
                            ``(ii) in deference to State and local 
                        judgment; and
                            ``(iii) with the goal of promoting State- 
                        and local-led innovation.
            ``(5) Waiver determination, demonstration, and revision.--
                    ``(A) In general.--The Secretary shall approve a 
                waiver request not more than 90 days after the date on 
                which such request is submitted, unless the Secretary 
                determines and demonstrates that--
                            ``(i) the waiver request does not meet the 
                        requirements of this section;
                            ``(ii) the waiver is not permitted under 
                        subsection (c);
                            ``(iii) the plan that is required under 
                        paragraph (1)(C), and reviewed with deference 
                        to State and local judgment, provides no 
                        reasonable basis to determine that a waiver 
                        will enhance student academic achievement; or
                            ``(iv) the waiver request does not provide 
                        for adequate evaluation to ensure review and 
                        continuous improvement of the plan, consistent 
                        with paragraph (1)(D).
                    ``(B) Waiver determination and revision.--If the 
                Secretary determines and demonstrates that the waiver 
                request does not meet the requirements of this section, 
                the Secretary shall--
                            ``(i) immediately--
                                    ``(I) notify the State educational 
                                agency, local educational agency, or 
                                Indian tribe of such determination; and
                                    ``(II) at the request of the State 
                                educational agency, local educational 
                                agency, or Indian tribe, provide 
                                detailed reasons for such determination 
                                in writing;
                            ``(ii) offer the State educational agency, 
                        local educational agency, or Indian tribe an 
                        opportunity to revise and resubmit the waiver 
                        request not more than 60 days after the date of 
                        such determination; and
                            ``(iii) if the Secretary determines that 
                        the resubmission does not meet the requirements 
                        of this section, at the request of the State 
                        educational agency, local educational agency, 
                        or Indian tribe, conduct a public hearing not 
                        more than 30 days after the date of such 
                        resubmission.
                    ``(C) Waiver disapproval.--The Secretary may 
                disapprove a waiver request if--
                            ``(i) the State educational agency, local 
                        educational agency, or Indian tribe has been 
                        notified and offered an opportunity to revise 
                        and resubmit the waiver request, as described 
                        under clauses (i) and (ii) of subparagraph (B); 
                        and
                            ``(ii) the State educational agency, local 
                        educational agency, or Indian tribe--
                                    ``(I) does not revise and resubmit 
                                the waiver request; or
                                    ``(II) revises and resubmits the 
                                waiver request, and the Secretary 
                                determines that such waiver request 
                                does not meet the requirements of this 
                                section after a hearing conducted under 
                                subparagraph (B)(iii).
                    ``(D) External conditions.--The Secretary shall not 
                disapprove a waiver request under this section based on 
                conditions outside the scope of the waiver request.'';
            (3) in subsection (d)--
                    (A) in the heading, by adding ``; Limitations'' 
                after ``Duration and Extension of Waiver''; and
                    (B) by adding at the end the following:
            ``(3) Specific limitations.--The Secretary shall not 
        require a State educational agency, local educational agency, 
        or Indian tribe, as a condition of approval of a waiver 
        request, to--
                    ``(A) include in, or delete from, such request, 
                specific academic content standards or academic 
                achievement standards;
                    ``(B) use specific academic assessment instruments 
                or items; or
                    ``(C) include in, or delete from, such waiver 
                request any criterion that specifies, defines, or 
                prescribes the standards or measures that a State or 
                local educational agency uses to establish, implement, 
                or improve--
                            ``(i) State academic content standards or 
                        academic achievement standards;
                            ``(ii) assessments;
                            ``(iii) State accountability systems;
                            ``(iv) systems that measure student growth;
                            ``(v) measures of other academic 
                        indicators; or
                            ``(vi) teacher and principal evaluation 
                        systems.'';
            (4) in subsection (e)--
                    (A) in paragraph (1)--
                            (i) by striking the heading and inserting 
                        ``Waiver reports'';
                            (ii) in the matter preceding subparagraph 
                        (A)--
                                    (I) by striking ``local educational 
                                agency that receives'' and inserting 
                                ``State educational agency, local 
                                educational agency, or Indian tribe 
                                that receives''; and
                                    (II) by striking ``submit a report 
                                to the State educational agency that'' 
                                and inserting ``submit a report to the 
                                Secretary that'';
                    (B) by striking paragraphs (2) and (3);
                    (C) by redesignating paragraph (4) as paragraph 
                (2); and
                    (D) in paragraph (2), (as redesignated by 
                subparagraph (C)), by striking ``Beginning in fiscal 
                year 2002 and for each subsequent year, the Secretary 
                shall submit to the Committee'' and inserting ``The 
                Secretary shall annually submit to the Committee''; and
            (5) in subsection (f), by inserting ``and the recipient of 
        the waiver has failed to make revisions needed to carry out the 
        purpose of the waiver,'' after ``has been inadequate to justify 
        a continuation of the waiver''."""

In [14]:
summarizer = SimCLS(generator_type=GeneratorType.Pegasus,
                    generator_path="google/pegasus-billsum",
                    scorer_path="andrejmiscic/simcls-scorer-billsum")

In [15]:
summary = summarizer(document)

  "Passing `max_length` to BeamSearchScorer is deprecated and has no effect."


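The BillSum summary printed by the final cell contains literal `<n>` markers. Pegasus checkpoints commonly emit these as newline placeholders, though that is an assumption about this particular checkpoint rather than something the notebook states. A minimal post-processing sketch, using a hypothetical helper `clean_pegasus_newlines` that is not part of the repository:

```python
def clean_pegasus_newlines(summary: str) -> str:
    """Turn '<n>' markers into newlines and tidy whitespace (hypothetical helper)."""
    # Replace each marker with a real newline, then strip each resulting line.
    lines = [line.strip() for line in summary.replace("<n>", "\n").split("\n")]
    # Drop empty lines produced by repeated markers such as '<n><n>'.
    return "\n".join(line for line in lines if line)


# Made-up string for illustration; not actual model output.
raw = "First sentence.<n><n>Second sentence.<n>third part"
cleaned = clean_pegasus_newlines(raw)
print(cleaned)
```

Applying the helper to `summary` after the call to `summarizer(document)` would give one summary segment per line; whether to keep the line breaks or join the segments with spaces depends on how the output is consumed.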
In [16]:
summary

"State Innovation pilot Act of 2011 - Amends the Elementary and Secondary Education Act of 1965, as amended by the No Child Left Behind Act of 2001, to allow states, local educational agencies, or Indian tribes that receive funds under the Act to request to waive any statutory or regulatory requirement of such Act.<n><n>Requires any request to include a plan that will improve instruction for students, advance student academic achievement, and contribute to student mastery of knowledge and skills, consistent with the state's college and career ready academic content standards and student academic accomplishment standards.<n>satisfying certain requirements with respect to such waivers, such as requiring the state, local Educational Agency, or tribe to review the success of schools and LEAs in implementing the state-developed college and careers ready academic contents and achievement standards and their process for accurately and meaningfully identifying, supporting, and intervening in u