Text-Summarization

This repo contains text summarization use cases, covering both abstractive and extractive approaches.

Abstractive Text Summarization:

Approaches for Abstractive Text Summarization:

Seq2Seq model with bidirectional LSTM and attention mechanism
Transformer-based architecture (e.g. Google's BERT, Microsoft's UniLM, etc.)

I have tried out the first approach, which consists of an architecture based on a Seq2Seq model with bidirectional LSTM and attention mechanism.
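To illustrate what the attention mechanism contributes, here is a minimal sketch in plain Python. It assumes simple dot-product scoring, which may differ from the custom attention layer in the notebook; the function names `softmax` and `attend` and the toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return attention weights and the context vector for one decoder step.

    Assumption: dot-product scoring between the decoder state and each
    encoder state; the notebook's layer may score differently.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector = weighted sum of the encoder states.
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy example: three encoder time steps with 2-dimensional hidden states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

The encoder steps that align best with the current decoder state receive the largest weights, so the decoder can focus on different parts of the story for each summary word.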

Steps followed:

Read data into Python and apply data cleansing steps
Save the cleaned data in pickle format for further usage
Prepare data for training by tokenizing the stories and summaries (highlights)
Apply pretrained GloVe embeddings
Build a custom layer for attention
Build the model architecture for training
Build the inference logic
Train the model
Model prediction logic (currently uses the actual summary to get the prediction; can be updated to give predictions based on a test set)
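The tokenization and GloVe steps above can be sketched roughly as follows. This is a standard-library illustration only, not the notebook's code: the helper names (`build_vocab`, `load_glove`, `embedding_matrix`) are hypothetical, whitespace tokenization is an assumption, and a tiny in-memory string stands in for a real GloVe file (one word per line followed by its vector values).

```python
import io

def build_vocab(texts):
    """Map each word to an integer index; index 0 is reserved for padding."""
    counts = {}
    for text in texts:
        for token in text.lower().split():  # assumption: whitespace tokenization
            counts[token] = counts.get(token, 0) + 1
    # Most frequent words get the lowest indices; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}

def load_glove(handle):
    """Parse GloVe text format: a word followed by space-separated floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def embedding_matrix(word_index, vectors, dim):
    """Build one row per vocabulary index; unknown words stay all-zero."""
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix

# Stand-in for a real GloVe file, used here only for illustration.
fake_glove = io.StringIO("the 0.1 0.2\ncat 0.3 0.4\n")
word_index = build_vocab(["the cat sat", "the dog"])
matrix = embedding_matrix(word_index, load_glove(fake_glove), dim=2)
```

A matrix like this would typically be passed as the initial weights of the model's embedding layer, so that words seen in GloVe start from pretrained vectors.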

Notes: The code is in working condition. However, I could neither tune the hyperparameters nor train for more epochs to get better accuracy. If a GPU is available, the model can be retrained.

Due to system constraints, I have used only stories with a sequence length <= 1200. This comes to around 577 articles.
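A length filter of this kind could look like the sketch below. It assumes the length is counted in whitespace-separated tokens, which the note does not specify; `filter_by_length` and the sample pairs are hypothetical.

```python
MAX_STORY_LEN = 1200  # limit quoted in the note above

def filter_by_length(pairs, max_len=MAX_STORY_LEN):
    """Keep (story, summary) pairs whose story has at most max_len tokens.

    Assumption: length is measured in whitespace-separated tokens.
    """
    return [(story, summary) for story, summary in pairs
            if len(story.split()) <= max_len]

# Made-up sample data: one short story and one that exceeds the limit.
pairs = [("short story text", "summary a"), ("word " * 1500, "summary b")]
kept = filter_by_length(pairs)
```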

Due to system constraints, I could not tune the hyperparameters.

Due to system constraints, I could not experiment with the model architecture.

I have not implemented model evaluation metrics (ROUGE) as of now.
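For reference, a ROUGE-N score can be approximated in a few lines. This is a simplified sketch and not part of the notebooks: it assumes lowercase whitespace tokenization with no stemming, and the function name `rouge_n` is hypothetical. An established ROUGE package would be preferable for reported results.

```python
from collections import Counter

def rouge_n(reference, candidate, n=1):
    """Compute ROUGE-N precision, recall and F1 from n-gram overlap."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference), ngrams(candidate)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

# Reference summary vs. generated summary (made-up sentences).
scores = rouge_n("the cat sat on the mat", "the cat lay on the mat")
```

Here five of the six unigrams overlap, so precision, recall and F1 all come out at 5/6.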

System Requirement/Dependencies:

TensorFlow 2.2.x
Python 3.7
GloVe

Extractive Text Summarization:

Approaches for Extractive Text Summarization:

Pretrained models such as BERT, GPT-2, XLNet, etc. These can be applied through 'bert-extractive-summarizer'
Fine-tuning of a transformer-based architecture. However, this would require good computational infrastructure
Unsupervised approaches (TextRank)

I have tried out the third approach, which consists of a graph-based representation of documents followed by the TextRank algorithm to find the important sentences.

Steps followed:

Read data into Python and apply data cleansing steps
Select a story for further steps (in step 1.3 of the code)
Apply TF-IDF representation
Apply graph representation
Apply TextRank Algorithm
Define a threshold
Select top n sentences (based on importance) and create summary
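The steps above can be sketched end to end with the standard library alone. This is an illustration under stated assumptions, not the notebook's code: the notebook relies on NLTK and networkx, whereas this sketch uses a regex sentence splitter and a hand-written PageRank iteration; the threshold is assumed to apply to sentence similarity (edges below it are dropped); and all function names and the sample text are hypothetical.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter (the notebook may use NLTK instead)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tfidf_vectors(sentences):
    """Represent each sentence as a sparse TF-IDF dictionary."""
    docs = [Counter(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency.
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
            for doc in docs]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def textrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences with PageRank over a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine(vecs[i], vecs[j])
                sim[i][j] = s if s >= threshold else 0.0  # drop weak edges
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Simplification: isolated sentences pass on no score.
        scores = [(1 - damping) / n
                  + damping * sum(sim[j][i] / out[j] * scores[j]
                                  for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores

def summarize(text, top_n=2, threshold=0.1):
    """Return the top_n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences, threshold)
    best = sorted(sorted(range(len(sentences)), key=lambda i: -scores[i])[:top_n])
    return " ".join(sentences[i] for i in best)

text = ("The cat sat on the mat. The cat chased the mouse on the mat. "
        "Stock prices fell sharply today. The mouse hid from the cat.")
summary = summarize(text, top_n=2)
```

In the sample text, the sentence about stock prices shares no words with the others, so it has no edges in the graph and is not selected.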

Note:

The code is not highly automated. The program needs to be run manually, step by step. It requires manually selecting the story (in section 1.3 of the code) for further processing.
Then run the remaining portion of the code to get the summary.

System Requirement/Dependencies:

Python 3.7
NLTK
networkx
