Fake-News-Article

Some fake articles make relatively frequent use of terms seemingly intended to inspire outrage, and the writing quality of such articles is generally considerably lower than that of standard news.
This project detects fake news articles by analyzing patterns in their writing.
The model is built by fine-tuning BERT.
It reaches an accuracy of 80% on the custom dataset.

Installation

All the code required to get started.

Clone

Clone this repo to your local machine using https://github.com/abhilashreddys/Fake-News-Article.git

Setup

Install these libraries/packages.

$ pip3 install pandas numpy scikit-learn bs4
$ pip3 install torch
$ pip3 install keras
$ pip3 install pytorch_pretrained_bert
$ pip3 install transformers

Dataset

Data is collected by scraping the websites of popular news publishing sources.
The collected news articles are judged using the score, quality, and bias metrics collected from PolitiFact and the Media Bias Chart (Ad Fontes Media).
Some basic preprocessing is also done on the text collected from scraping websites.

Preprocessing

BeautifulSoup is used for scraping articles from the web. Beautiful Soup is a Python library designed for quick-turnaround projects like screen scraping.
Some custom functions are also used for removing punctuation and similar cleanup.
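The repository's own cleaning functions are not shown here, so the following is only a minimal sketch of what a punctuation-removal helper could look like. The name `clean_text` is hypothetical, and the choice to replace punctuation with spaces and collapse whitespace is an assumption, not the author's confirmed method.

```python
import re
import string

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_text(text: str) -> str:
    """Replace punctuation with spaces and collapse runs of whitespace.

    Hypothetical helper: the actual functions in the scraping scripts may differ.
    """
    no_punct = text.translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", no_punct).strip()


print(clean_text("BREAKING: You won't BELIEVE this!!!"))
```

A helper like this would be applied to the article text after it has been extracted from the scraped HTML.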

Scraping from websites listed in politifact_data.csv:

$ python3 scrape_politifact.py

Scraping from websites listed in Interactive Media Bias Chart - Ad Fontes Media.csv:

$ python3 scrape_media.py

Data after scraping and preprocessing: politifact_text.csv, pre_media.csv

Model

The model is trained by fine-tuning BERT, as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
BERT stands for Bidirectional Encoder Representations from Transformers.
BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.

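import torch.nn as nn
from pytorch_pretrained_bert import BertModel


class BertBinaryClassifier(nn.Module):
    def __init__(self, dropout=0.1):
        super(BertBinaryClassifier, self).__init__()
        # Pre-trained BERT encoder (hidden size 768 for bert-base)
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(dropout)
        # Single output unit for binary classification
        self.linear = nn.Linear(768, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, tokens, masks=None):
        # Keep only the pooled [CLS] representation; discard per-token outputs
        _, pooled_output = self.bert(tokens, attention_mask=masks, output_all_encoded_layers=False)
        dropout_output = self.dropout(pooled_output)
        linear_output = self.linear(dropout_output)
        # Sigmoid maps the logit to a probability in (0, 1)
        proba = self.sigmoid(linear_output)
        return proba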
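The `forward` method takes `tokens` and `masks`, but how they are built is not shown. As a rough sketch, assuming articles are already converted to WordPiece ids, the inputs could be prepared as below. The helper name `build_inputs` is hypothetical; the special-token ids are those of the `bert-base-uncased` vocabulary, and 512 is the maximum sequence length for that model. Whether the repository truncates in this way is an assumption.

```python
from typing import List, Tuple

# Special-token ids in the bert-base-uncased vocabulary.
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102


def build_inputs(token_ids: List[int], max_len: int = 512) -> Tuple[List[int], List[int]]:
    """Wrap WordPiece ids in [CLS]/[SEP], truncate, and pad to max_len.

    Returns (tokens, masks); masks holds 1 for real tokens and 0 for padding.
    Hypothetical helper, not taken from the repository.
    """
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    body = token_ids[: max_len - 2]  # leave room for the two special tokens
    tokens = [CLS_ID] + body + [SEP_ID]
    masks = [1] * len(tokens)
    padding = max_len - len(tokens)
    return tokens + [PAD_ID] * padding, masks + [0] * padding


# Arbitrary example ids, padded to a short length for readability.
tokens, masks = build_inputs([7592, 2088], max_len=6)
print(tokens)  # [101, 7592, 2088, 102, 0, 0]
print(masks)   # [1, 1, 1, 1, 0, 0]
```

Both lists would then be converted to integer tensors with a batch dimension before being passed to the classifier.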

Weights

Download here : Link

Inference

Run inference.py and pass the URL of the article you want to test on the command line.

$ python3 inference.py url
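The contents of inference.py are not reproduced here. As an illustration only, a script of this shape could read the URL argument and turn the classifier's sigmoid output into a label roughly as follows. The function names, the 0.5 threshold, and the convention that outputs near 1 mean "fake" are all assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the article URL as a single positional argument."""
    parser = argparse.ArgumentParser(description="Classify a news article by URL.")
    parser.add_argument("url", help="URL of the article to test")
    return parser.parse_args(argv)


def verdict(proba: float, threshold: float = 0.5) -> str:
    """Map the sigmoid output to a label.

    Assumption: outputs near 1 mean "fake"; the repository may use the
    opposite convention or a different threshold.
    """
    return "fake" if proba >= threshold else "real"


# Example with an explicit argument list and a made-up probability.
args = parse_args(["https://example.com/article"])
print(args.url, verdict(0.83))
```

In a real script, the probability would come from running the scraped and preprocessed article text through the trained model.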

Cautions & Suggestions

Check the file locations carefully and change them if required.
If you face any problems with the script files, use the notebooks: transfrom_spam.ipynb for training and fake_article.ipynb for inference.
The model was trained for only 5 epochs; a better model trained on more data is planned.

References

@article{Wolf2019HuggingFacesTS,
  title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
  author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and R{\'e}mi Louf and Morgan Funtowicz and Jamie Brew},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.03771}
}

@article{devlin2018bert,
  title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},
  year={2018}
}
