This project is a web-based application for automatic text summarization. It generates concise summaries of input text using a pre-trained model loaded through the transformers library by Hugging Face.
The project requires the following:
- Python 3.x
- PyTorch
- Transformers
- NLTK
You can install the required Python packages using pip:
pip install torch transformers nltk
NLTK also needs a few data packages. Download them by running the following commands in a Python session:
import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')
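The same downloads can be wrapped in a small setup helper that reports which resources failed to install. This is a minimal sketch, not part of the repository: the names `download_resources` and `REQUIRED_RESOURCES` are hypothetical, and it relies only on `nltk.download` returning `True` on success and `False` otherwise.

```python
# Hypothetical setup helper (not part of this repository).
REQUIRED_RESOURCES = ("punkt", "wordnet", "omw-1.4")


def download_resources(resources=REQUIRED_RESOURCES, download=None):
    """Download each NLTK resource and return the names that failed."""
    if download is None:
        # Imported lazily so the helper can be imported without NLTK installed.
        import nltk
        download = nltk.download

    failed = []
    for name in resources:
        # nltk.download returns True on success and False otherwise.
        if not download(name, quiet=True):
            failed.append(name)
    return failed
```

Calling `download_resources()` once after installing the packages should return an empty list when everything was fetched.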
To use this application, follow these steps:
- Clone this repository to your local machine.
- Install the required dependencies as mentioned above.
- Run the web application script.
- Access the web application through your browser.
The core functionality of this application lies in the get_summary function defined in summarizer.py. This function takes a text input and generates a summary using a pre-trained language model.
The steps involved in summarization are as follows:
- Tokenize the input text into sentences using NLTK's sentence tokenizer.
- Preprocess the text and append necessary tokens for summarization.
- Encode the tokenized text using the pre-trained tokenizer.
- Generate the summary using the pre-trained language model.
- Decode the generated summary and return the result.
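The five steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual code in summarizer.py: the function names `get_summary_sketch` and `load_default_components` are hypothetical, and the `t5-small` checkpoint with its `summarize: ` task prefix is an assumed stand-in, since the model used by the project is not named here.

```python
def get_summary_sketch(text, tokenizer, model, split_sentences,
                       prefix="summarize: ", max_input_tokens=512,
                       max_summary_tokens=128):
    """Illustrative pipeline; the real get_summary may differ."""
    # 1. Split the input into sentences.
    sentences = [s.strip() for s in split_sentences(text) if s.strip()]
    if not sentences:
        return ""

    # 2. Preprocess: rejoin the cleaned sentences and add the task marker.
    #    ASSUMPTION: a T5-style text prefix; other models need other markers.
    prepared = prefix + " ".join(sentences)

    # 3. Encode with the pre-trained tokenizer, truncating long inputs.
    encoded = tokenizer(prepared, return_tensors="pt",
                        truncation=True, max_length=max_input_tokens)

    # 4. Generate summary token ids with beam search.
    output_ids = model.generate(encoded["input_ids"],
                                attention_mask=encoded["attention_mask"],
                                max_new_tokens=max_summary_tokens,
                                num_beams=4)

    # 5. Decode the first (only) sequence back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def load_default_components(model_name="t5-small"):
    """Load a tokenizer, model, and sentence splitter (assumed checkpoint)."""
    import nltk
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model, nltk.sent_tokenize
```

With the dependencies installed, `get_summary_sketch(text, *load_default_components())` would run the whole pipeline. Passing the tokenizer, model, and sentence splitter as arguments keeps model loading out of the function, so a web handler could load them once at startup and reuse them for every request.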