Part 1 of our Major academic project on Automatic Text Summarization
Text Summarization Algorithms

A Comparative Study

Developed as part of our Semester 7 Major Project, this repository contains scripts to run popular text summarization algorithms and evaluate their performance. The algorithms studied are:


For our experiments, the Opinosis dataset was used. It can be obtained here

Ganesan, Kavita, ChengXiang Zhai, and Jiawei Han. "Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions." Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics.

Performance Metric

To compare the algorithms, we used a simple Python implementation of the ROUGE-1 metric, which scores a generated summary by its unigram overlap with a reference summary.
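
For illustration, ROUGE-1 can be computed from clipped unigram counts as shown below. This is a minimal sketch and not the repository's rogue_one script: the names `tokenize` and `rouge_1` are hypothetical, and the lowercase alphanumeric tokenization is an assumption.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token, so a word
    # repeated in the candidate is credited at most as often as it
    # appears in the reference.
    overlap = sum((cand_counts & ref_counts).values())
    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())
    precision = overlap / float(cand_total) if cand_total else 0.0
    recall = overlap / float(ref_total) if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = rouge_1("the battery life is great", "battery life is very good")
    print("precision=%.2f recall=%.2f f1=%.2f" % (p, r, f))
```

Recall is the figure most often reported as ROUGE-1; precision and F1 are returned as well so that overly long summaries are not rewarded by recall alone.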

Replicating project results

To reproduce the results of our project:

  1. Clone this repository and ensure that the Opinosis dataset is present. If it is not, download it from the link above and extract it into data/.

  2. Run the run-project script.

    $ sh +x

    This script cleans the dataset, extracts keywords, runs the algorithms on the dataset, and prints their respective running times and ROUGE-1 scores.

    • To evaluate a single algorithm, first run the $algorithm/$ script, then run the rogue_one script with:
    $ python --gold data/summaries_keywords --test $algorithm/results
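
The running times reported in step 2 could be measured along the following lines. This is a hedged sketch only: `lead_summarizer` is a hypothetical placeholder (it returns the first sentence) standing in for any of the studied algorithms, and `time_algorithm` is an assumed helper name, not part of this repository.

```python
from timeit import default_timer


def lead_summarizer(text, n_sentences=1):
    """Hypothetical placeholder: return the first n sentences of the text."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    if not sentences:
        return ""
    return ". ".join(sentences[:n_sentences]) + "."


def time_algorithm(summarize, documents):
    """Run summarize on every document; return (summaries, elapsed seconds)."""
    start = default_timer()
    summaries = [summarize(doc) for doc in documents]
    return summaries, default_timer() - start


if __name__ == "__main__":
    docs = ["The screen is bright. The battery drains fast.",
            "Rooms were clean. Staff were friendly."]
    summaries, elapsed = time_algorithm(lead_summarizer, docs)
    print("lead: %d summaries in %.4f s" % (len(summaries), elapsed))
```

Timing the whole batch in one pass, rather than each document separately, keeps timer overhead small relative to the work being measured.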


Requirements

  • Python 2.7+
  • nltk
  • sumy
  • networkx