Part 1 of our Major academic project on Automatic Text Summarization


Text Summarization Algorithms

A Comparative Study

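Developed as a part of our Semester 7 Major Project, this repository contains scripts and code to run and test the performance of popular text summarization algorithms. The algorithms studied, each in its own directory of this repository, are:

  • LexRank (lexrank/)
  • LSA, Latent Semantic Analysis (lsa/)
  • TextRank (textrank/)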

Dataset

For our experiments, the Opinosis dataset was used. It can be obtained here

@inproceedings{ganesan2010opinosis,
 title={Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions},
 author={Ganesan, Kavita and Zhai, ChengXiang and Han, Jiawei},
 booktitle={Proceedings of the 23rd International Conference on Computational Linguistics},
 pages={340--348},
 year={2010},
 organization={Association for Computational Linguistics}
}

Performance Metric

To compare the relative performance of the algorithms, a simple Python implementation of the ROUGE-1 metric was used.
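
ROUGE-1 scores a generated summary by its unigram overlap with a reference (gold) summary. The following is a minimal sketch of that idea, assuming lowercasing and whitespace tokenization. The function name `rouge_1` is illustrative and is not taken from rogue_one.py, whose actual implementation may differ.

```python
from collections import Counter


def rouge_1(reference, candidate):
    """Return (recall, precision, f1) from clipped unigram overlap.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count of each shared unigram.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / float(sum(ref_counts.values())) if ref_counts else 0.0
    precision = overlap / float(sum(cand_counts.values())) if cand_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


if __name__ == "__main__":
    gold = "the battery life is very good"
    test = "battery life is good"
    print(rouge_1(gold, test))
```

Recall is the share of reference unigrams that the candidate recovers, and precision is the share of candidate unigrams that appear in the reference.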

Replicating project results

To reproduce the results of our project, do the following:

  1. Clone this repository and ensure that the Opinosis dataset is present. If it is not, download it from the link above and extract it into data/.

  2. Run the run_project script.

    $ sh run_project.sh

    This script will clean the dataset, extract keywords, run the algorithms on the dataset, and print their respective running times and ROUGE-1 scores.

    • The performance of an individual algorithm can be computed by first running the $algorithm/$algorithm.py script and then running the rogue_one script:
    $ python rogue_one.py --gold data/summaries_keywords --test $algorithm/results
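
The per-algorithm steps above can be scripted. This is a hedged sketch, not part of the repository: the algorithm names are assumed from the lexrank, lsa, and textrank directories, and `build_commands` and `run_all` are hypothetical helper names. By default it only prints the commands it would run.

```python
import subprocess

# Assumption: one directory per algorithm, named as in this repository.
ALGORITHMS = ["lexrank", "lsa", "textrank"]


def build_commands(algorithm):
    """Return the two commands for one algorithm: summarize, then score."""
    summarize = ["python", "{0}/{0}.py".format(algorithm)]
    score = [
        "python", "rogue_one.py",
        "--gold", "data/summaries_keywords",
        "--test", "{0}/results".format(algorithm),
    ]
    return [summarize, score]


def run_all(dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for algorithm in ALGORITHMS:
        for command in build_commands(algorithm):
            if dry_run:
                print(" ".join(command))
            else:
                # Raises CalledProcessError if a script exits with an error.
                subprocess.check_call(command)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the repository root would execute the scripts in order, stopping at the first failure.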

Dependencies

  • Python 2.7+
  • nltk
  • sumy
  • networkx