capGoblin/Transformer-from-Scratch-Text-Summarizer

Transformer from Scratch for Text Summarization

This Jupyter Notebook builds a Transformer model from scratch in TensorFlow, without using the Hugging Face library.
The purpose of the model is text summarization, where short news articles are condensed into a single-line summary.
The model is trained using news text data from Kaggle's news shorts dataset.

Preprocessing

The notebook performs the following preprocessing steps:

Tokenization: The news texts are tokenized into integer tokens for model input.
Padding/Truncating: Sequences are adjusted to ensure uniform sequence lengths.
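The two preprocessing steps above can be sketched in plain Python. This is only an illustration, not the notebook's code: the helper names (`build_vocab`, `encode`, `pad_or_truncate`), the whitespace tokenization, and the reserved ids for padding and out-of-vocabulary words are all assumptions made for the example.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved ids (assumed for this sketch, not taken from the notebook)

def build_vocab(texts, max_words=None):
    """Map each word to an integer id, most frequent words first (ids start at 2)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = counts.most_common(max_words)
    return {word: idx for idx, (word, _) in enumerate(ranked, start=2)}

def encode(text, vocab):
    """Tokenization: turn a string into a list of integer tokens."""
    return [vocab.get(word, OOV) for word in text.lower().split()]

def pad_or_truncate(ids, max_len):
    """Padding/truncating: cut long sequences and right-pad short ones with PAD."""
    ids = ids[:max_len]
    return ids + [PAD] * (max_len - len(ids))

texts = ["stocks rise as markets rally", "markets fall"]
vocab = build_vocab(texts)
batch = [pad_or_truncate(encode(t, vocab), max_len=4) for t in texts]
print(batch)  # [[3, 4, 5, 2], [2, 7, 0, 0]]
```

Every row of `batch` has the same length, which is what lets the sequences be stacked into a single tensor for the model.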

Building the Model:

The Transformer model is constructed following these key components:

Scaled Dot-Product Attention: Computes softmax(QK^T / sqrt(d_k)) V, the core operation of self-attention.
Multi-Head Attention: Runs several attention heads in parallel so the model can attend to different parts of the input.
Feed Forward Network: A core building block of both the Transformer encoder and decoder.
Encoder Layer: The fundamental unit of the Transformer encoder.
Decoder Layer: The fundamental unit of the Transformer decoder.
Encoder: Comprises multiple Encoder Layers.
Decoder: Comprises multiple Decoder Layers.
Transformer: Combines the Encoder and Decoder, followed by a final Dense layer.
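The first component in the list, scaled dot-product attention, is small enough to sketch in full. The version below uses NumPy rather than TensorFlow so that it runs standalone; the function names and the mask convention (1 marks a blocked position) are assumptions for illustration, not the notebook's implementation.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: arrays of shape (..., seq_len, depth).
    mask: broadcastable to (..., seq_len_q, seq_len_k), 1 where attention is blocked.
    """
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d_k)
    if mask is not None:
        scores = scores + mask * -1e9  # blocked positions get ~zero weight
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def look_ahead_mask(size):
    """Upper-triangular mask that stops position i from attending to j > i."""
    return 1.0 - np.tril(np.ones((size, size)))

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 5, 16))  # (batch, seq_len, depth)
out, attn = scaled_dot_product_attention(x, x, x, mask=look_ahead_mask(5))
print(out.shape, attn.shape)  # (1, 5, 16) (1, 5, 5)
```

A look-ahead mask of this kind is what a decoder's self-attention typically uses so that each output position only sees earlier positions; encoder self-attention would normally pass only a padding mask.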

Hyperparameters:

The model is trained with the following hyperparameters:

Number of Layers: 4
Model Dimension (d_model): 128
Hidden Units: 512
Number of Heads: 8
Epochs: 5
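These values fix the per-head depth used by multi-head attention: d_model / number of heads = 128 / 8 = 16. The sketch below collects the hyperparameters in a small config object and shows how a (batch, seq_len, d_model) tensor would be split into heads. The class and field names are hypothetical, and reading "Hidden Units" as the inner width of the feed-forward network (`dff`) is an assumption.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class TransformerConfig:  # hypothetical container, not from the notebook
    num_layers: int = 4
    d_model: int = 128
    dff: int = 512        # "Hidden Units", assumed to be the feed-forward width
    num_heads: int = 8
    epochs: int = 5

    @property
    def depth(self):
        """Dimension handled by each attention head."""
        assert self.d_model % self.num_heads == 0, "d_model must divide evenly"
        return self.d_model // self.num_heads

def split_heads(x, cfg):
    """(batch, seq_len, d_model) -> (batch, num_heads, seq_len, depth)."""
    batch, seq_len, _ = x.shape
    x = x.reshape(batch, seq_len, cfg.num_heads, cfg.depth)
    return x.transpose(0, 2, 1, 3)

cfg = TransformerConfig()
heads = split_heads(np.zeros((2, 10, cfg.d_model)), cfg)
print(cfg.depth, heads.shape)  # 16 (2, 8, 10, 16)
```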

Training and Evaluation:

During the training process, the notebook saves checkpoint files at the end of each epoch. To evaluate the model, a summarization example is provided:

Original News: "A historic achievement has been made in the realm of space exploration. Astronomers have detected the presence of an Earth-like planet orbiting a distant star within the habitable zone. This exciting discovery raises the possibility of finding extraterrestrial life and provides valuable insights into the existence of other habitable worlds beyond our own. Scientists are now planning detailed observations and future missions to explore this intriguing exoplanet further. The discovery marks a significant milestone in our quest to unravel the mysteries of the universe and understand our place in the cosmos."

Generated Summary: "newly made new space detected for earth like planet"
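The README does not say how the summary is decoded. A common choice for this kind of model is greedy decoding, where the decoder is fed its own previous outputs and the highest-scoring token is picked at each step. The sketch below assumes that approach; `START`, `END`, `greedy_decode`, and the stand-in `toy_model` are all hypothetical and only illustrate the loop, not the trained network.

```python
import numpy as np

START, END = 1, 2  # hypothetical special-token ids

def greedy_decode(next_token_logits, source_ids, max_len=20):
    """Generate tokens one at a time, always taking the argmax.

    next_token_logits(source_ids, output_so_far) must return a 1-D array
    of scores over the vocabulary for the next token.
    """
    output = [START]
    for _ in range(max_len):
        logits = next_token_logits(source_ids, output)
        next_id = int(np.argmax(logits))
        if next_id == END:
            break
        output.append(next_id)
    return output[1:]  # drop the START token

def toy_model(source_ids, output):
    """Stand-in for a trained Transformer: copies the source, then emits END."""
    vocab_size = 10
    logits = np.zeros(vocab_size)
    step = len(output) - 1
    target = source_ids[step] if step < len(source_ids) else END
    logits[target] = 1.0
    return logits

summary_ids = greedy_decode(toy_model, [5, 7, 3])
print(summary_ids)  # [5, 7, 3]
```

With a real model, `next_token_logits` would run the encoder on the tokenized article and the decoder on the tokens generated so far, and the resulting ids would be mapped back to words.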

transformer.summary()

Model: "transformer_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 encoder_15 (Encoder)        multiple                  10567424  
                                                                 
 decoder_15 (Decoder)        multiple                  4854912   
                                                                 
 dense_4911 (Dense)          multiple                  3826269   
                                                                 
=================================================================
Total params: 19,248,605
Trainable params: 19,248,605
Non-trainable params: 0
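The parameter counts in this summary can be cross-checked against the hyperparameters. The sketch below assumes a conventional layout that the README does not spell out: each attention block has four Dense projections with biases, each layer has a two-layer feed-forward network, encoder and decoder layers carry two and three layer normalizations respectively, positional encodings are fixed rather than learned, and each stack begins with a token embedding. Under those assumptions the counts imply an input vocabulary of 76,362 tokens and a target vocabulary of 29,661 tokens; both figures are inferred here, not stated in the source.

```python
D_MODEL, DFF, NUM_LAYERS = 128, 512, 4

def dense(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

mha = 4 * dense(D_MODEL, D_MODEL)                # query, key, value, output projections
ffn = dense(D_MODEL, DFF) + dense(DFF, D_MODEL)  # feed-forward network
layer_norm = 2 * D_MODEL                         # scale and offset

encoder_layer = mha + ffn + 2 * layer_norm       # 198,272
decoder_layer = 2 * mha + ffn + 3 * layer_norm   # 264,576

# Totals reported by transformer.summary()
encoder_total, decoder_total, final_dense = 10_567_424, 4_854_912, 3_826_269

# Whatever is left after the stacked layers is attributed to the embedding table.
target_vocab = final_dense // (D_MODEL + 1)
input_vocab = (encoder_total - NUM_LAYERS * encoder_layer) // D_MODEL

assert dense(D_MODEL, target_vocab) == final_dense
assert NUM_LAYERS * encoder_layer + input_vocab * D_MODEL == encoder_total
assert NUM_LAYERS * decoder_layer + target_vocab * D_MODEL == decoder_total
assert encoder_total + decoder_total + final_dense == 19_248_605

print(input_vocab, target_vocab)  # 76362 29661
```

Most of the 19.2 million parameters therefore sit in the two embedding tables and the output projection; the eight stacked layers together account for under 2 million.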
