Skip to content

Pareek-Yash/Short-summarizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Short-summarizer

The goal of this repo is to compare the results of summarization between facebook/bart-large-cnn using transformers pipelines API and training a custom summarizer using a news dataset available on TensorFlow.

Overview

Text summarization is an important task in natural language processing, which involves condensing a piece of text into a shorter version, retaining the key information. This repository contains two approaches to text summarization:

  1. Using a pre-trained model facebook/bart-large-cnn with Hugging Face's Transformers library.
  2. Training a custom summarization model using a news dataset available through TensorFlow Datasets.

***************** Work in Progress *****************

Getting Started

Prerequisites

  • Python 3.6 or later
  • TensorFlow 2.x
  • Transformers
  • Requests (for fetching text from a remote source)

Install the required libraries using pip:

pip install tensorflow transformers requests

Directory Structure

short-summarizer/
│
├── pre_trained_summarizer/
│   ├── pre_trained_summarizer.py    # script for summarization using pre-trained model
│   
├── custom_summarizer/
│   ├── data_preprocessing.py        # script for data preprocessing
│   ├── model.py                     # script defining the custom summarizer model
│   ├── train.py                     # script for training the custom summarizer
│   ├── evaluate.py                  # script for evaluating the custom summarizer
│   
└── README.md

Usage

Pre-trained Summarizer

Navigate to the pre_trained_summarizer directory.

To summarize text using the facebook/bart-large-cnn pre-trained model, run:

python pre_trained_summarizer.py --url <URL_OF_TEXT_FILE>

Replace <URL_OF_TEXT_FILE> with the URL of the text file you want to summarize.

Custom Summarizer

Navigate to the custom_summarizer directory.

  1. Data Preprocessing:

    Run data_preprocessing.py to download and preprocess the dataset:

    python data_preprocessing.py
  2. Training:

    Run train.py to train the custom summarization model:

    python train.py
  3. Evaluation:

    After training the model, use evaluate.py to evaluate it on test data:

    python evaluate.py

Comparing Results

After generating summaries using both approaches, you can manually compare the quality of the summaries by reading them. Additionally, you can compute ROUGE scores to quantitatively measure the performance of the summarizers.

Contributing

Contributions are welcome! Please read the contribution guidelines first.

License

This project is licensed under the MIT License - see the LICENSE file for details.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages