
News Headline Generator

As the title suggests, I have built an encoder-decoder model with an attention mechanism that generates headlines for news articles. The model is currently about 78% accurate, and I aim to improve it in the future. Now let's walk through the code.

  1. Dataset

We use the news_summary.csv dataset (from Kaggle).

The dataset consists of 4,515 examples and contains the columns Author_name, Headlines, Url of Article, Short text, and Complete Article. I gathered the summarized news from Inshorts and scraped the full articles only from the Hindu, the Indian Times, and the Guardian. The time period ranges from February to August 2017.

Let's look at the head of the dataset:

[image: first few rows of the dataset]
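
If you want to reproduce this quickly, a minimal loading sketch follows. The file name matches the Kaggle download; the latin-1 encoding and the working-directory path are assumptions, so adjust them to your setup.

```python
# Minimal sketch: load the Kaggle news_summary.csv and inspect it.
# The latin-1 encoding is an assumption (this Kaggle file is commonly
# not UTF-8); change the path to wherever you keep the file.
import pandas as pd

df = pd.read_csv('news_summary.csv', encoding='latin-1')
print(df.shape)   # expect 4515 rows
df.head()
```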

  2. Data cleaning and preprocessing

In the notebook titled data_cleaning.ipynb we preprocess the news text and the headlines by removing parts of the text that do not contribute to the goal. Please check out the notebook for further details; a rough sketch of typical steps follows below.
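
The exact cleaning steps live in the notebook; the sketch below only illustrates the usual kind of pipeline for this task (lowercasing, stripping non-letters, optional stopword removal). The column names 'text' and 'headlines' are assumptions about the CSV layout, not confirmed from the repository.

```python
# Illustrative cleaning pass (not the notebook's exact code): lowercase,
# drop possessives, keep letters only, optionally remove stopwords.
import re
from nltk.corpus import stopwords   # requires: nltk.download('stopwords')

STOPWORDS = set(stopwords.words('english'))

def clean_text(text, remove_stopwords=True):
    text = str(text).lower()
    text = re.sub(r"'s\b", "", text)        # drop possessive 's
    text = re.sub(r"[^a-z ]", " ", text)    # keep letters and spaces only
    words = text.split()
    if remove_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return " ".join(words)

# 'text' and 'headlines' are assumed column names in news_summary.csv.
df['cleaned_text'] = df['text'].apply(clean_text)
df['cleaned_headline'] = df['headlines'].apply(
    lambda h: clean_text(h, remove_stopwords=False))  # headlines keep stopwords
```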

  3. Data modeling

Please check out the notebook titled data_modeling.ipynb.

Let's look at the model:

[image: encoder-decoder model architecture]
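
For readers who prefer code to diagrams, here is a minimal Keras sketch of an encoder-decoder with attention. The vocabulary sizes, sequence lengths, and latent dimension are placeholders rather than the values from data_modeling.ipynb, and the attention layer used here is Keras's built-in AdditiveAttention, which may differ from the layer the notebook uses.

```python
# A minimal Keras sketch of an encoder-decoder with additive attention.
# All sizes below are placeholders, not the notebook's actual values.
from tensorflow.keras import layers, Model

max_text_len = 100                           # assumed max article length
x_vocab, y_vocab, latent_dim = 20000, 8000, 256

# Encoder: embed the article and run an LSTM, keeping every hidden
# state (return_sequences=True) so attention can look back over them.
encoder_inputs = layers.Input(shape=(max_text_len,))
enc_emb = layers.Embedding(x_vocab, latent_dim)(encoder_inputs)
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

# Decoder: generate the headline token by token, initialised with the
# encoder's final states.
decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(y_vocab, latent_dim)(decoder_inputs)
decoder_outputs, _, _ = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(
        dec_emb, initial_state=[state_h, state_c])

# Additive (Bahdanau-style) attention: at each decoder timestep, build
# a context vector as a weighted sum of the encoder states.
context = layers.AdditiveAttention()([decoder_outputs, encoder_outputs])
decoder_concat = layers.Concatenate()([decoder_outputs, context])

# Project onto the target vocabulary at every timestep.
outputs = layers.TimeDistributed(
    layers.Dense(y_vocab, activation='softmax'))(decoder_concat)

model = Model([encoder_inputs, decoder_inputs], outputs)
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
```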

We have used an attention layer in this model.

The intuition behind the attention mechanism: how much attention do we need to pay to each word in the input sequence when generating the word at timestep t? That is the key idea behind attention.

Let's consider a simple example to understand how the attention mechanism works:

Source sequence: “Which sport do you like the most?”
Target sequence: “I love cricket”

The first word ‘I’ in the target sequence is connected to the fourth word ‘you’ in the source sequence. Similarly, the second word ‘love’ in the target sequence is associated with the fifth word ‘like’ in the source sequence.

So, instead of weighting all the words in the source sequence equally, we can increase the importance of the specific parts of the source sequence that give rise to each target word. This is the basic idea behind the attention mechanism.
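
To make this concrete, here is a toy NumPy illustration: a softmax over alignment scores turns them into attention weights. The scores below are made up for the example above; in the real model they are learned.

```python
# Toy illustration (made-up scores): a softmax over alignment scores
# becomes the attention weights for generating the target word "I".
import numpy as np

source = ["Which", "sport", "do", "you", "like", "the", "most", "?"]
scores = np.array([0.1, 0.3, 0.2, 2.5, 0.4, 0.1, 0.2, 0.1])

weights = np.exp(scores) / np.exp(scores).sum()   # softmax
for word, w in zip(source, weights):
    print(f"{word:>6}: {w:.2f}")
# "you" receives by far the largest weight, matching the intuition
# that the target word 'I' aligns with the source word 'you'.
```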

  4. Let's look at some examples

[image: sample articles with generated headlines]
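
For reference, producing a headline from a trained model of this shape can be done with a simple greedy decoding loop. This is a hypothetical sketch built on the model above; the padded article sequence of shape (1, max_text_len) and the `<start>`/`<end>` token ids from the headline tokenizer are assumed and do not come from the repository itself.

```python
# Hypothetical greedy decoding loop for the sketch model above.
import numpy as np

def generate_headline(model, article_seq, start_id, end_id, max_len=15):
    decoded = [start_id]
    for _ in range(max_len):
        # Feed the article plus the headline-so-far; take the most
        # probable token at the last timestep.
        preds = model.predict([article_seq, np.array([decoded])], verbose=0)
        next_id = int(np.argmax(preds[0, -1, :]))
        if next_id == end_id:
            break
        decoded.append(next_id)
    return decoded[1:]   # drop the <start> token
```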

  5. Further steps

As the examples show, the model is not very accurate yet, so we need to improve it further.

Future steps to improve the model:

  1. Increase the size of the dataset
  2. Tune the hyperparameters
  3. Change the encoder structure
  4. Train for a longer time

Author: Smit Shah 😃
