
Misinformation-textAnalysis

Misinformation, Fake News and Propaganda

Introduction

Disinformation, defined as the subset of misinformation where there is intent to mislead, has seen an astronomical rise, both in its success in terms of spread and impact and in the effort to combat it. Though known state-backed disinformation campaigns date back to at least the Cold War era, they arguably only caught the public eye after the 2016 US presidential election (Allcott & Gentzkow, 2017). Later in 2016, disinformation was claimed to have influenced the 'Brexit' referendum, and in 2018 a similar development was suspected during the Brazilian presidential election. As this unfolded, so-called 'fact-checking' organisations grew explosively, as did their cooperation with news agencies, social media companies (e.g. Facebook/Meta) and governments. These organisations, often volunteer-based or financed through charity, tend to have their capacity outpaced by the sheer volume of suspected disinformation content.

One proposed solution is to use AI to automate the classification of news articles or posts based on their linguistic features. Though improvements are being made here, the most accurate models depend strongly on metadata, such as the publication network, which is often unavailable when content is shared through social media. They also have other downsides, such as disproportionately high false-positive rates when outlets that have previously shared disinformation publish content that contains no disinformation.

This project offers an alternative, 'intermediate' solution. Instead of classification, we aim to discover the topics present in Russian propaganda. This can streamline fact-checking by establishing a basis on which already fact-checked disinformation and propaganda can be matched with newly published, unchecked articles and posts. To build our mixed-membership model we use Latent Dirichlet Allocation (LDA), a three-level hierarchical Bayesian model in which each item of a collection is modelled as a finite mixture over an underlying set of topics (Blei et al., 2001).
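To make the modelling step concrete, below is a minimal sketch of fitting an LDA model with the tidytext and topicmodels packages listed under 'Packages & Libraries'. It assumes a data frame docs with columns doc_id and text; those names, and the choice of k = 10 topics, are illustrative assumptions rather than the repository's actual settings.

library(dplyr)
library(tidytext)
library(topicmodels)

# Build a document-term matrix from raw text (docs is an assumed input)
dtm <- docs %>%
  unnest_tokens(word, text) %>%                 # one token (word) per row
  anti_join(get_stopwords(), by = "word") %>%   # drop common stopwords
  count(doc_id, word) %>%                       # term counts per document
  cast_dtm(doc_id, word, n)                     # matrix format expected by topicmodels

# Fit LDA; k is the number of topics and would normally be tuned (e.g. with ldatuning)
lda_fit <- LDA(dtm, k = 10, control = list(seed = 1234))

# Per-topic word probabilities (beta) and per-document topic mixtures (gamma)
topic_words <- tidy(lda_fit, matrix = "beta")
doc_mixtures <- tidy(lda_fit, matrix = "gamma")

The gamma matrix is what makes this a mixed-membership model: each document receives a probability over every topic rather than a single hard label.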

Project scope

This project restricts itself to pro-Kremlin disinformation. We say 'pro-Kremlin' rather than 'Kremlin-backed' because direct ties with the Russian Internet Research Agency (IRA) and official backing by the Kremlin are perhaps expected but are not verified with hard proof. The scope is also limited to a preliminary test of the theoretical possibility and validity of LDA mixed-membership modelling, without going into the practical application of the results.

About the data

The disinformation texts were collected by the EUvsDisinfo project, started in 2015, which identifies and fact-checks disinformation cases originating from pro-Kremlin media and spread across the EU. More information about the project can be found here: https://euvsdisinfo.eu/. The dataset collected from EUvsDisinfo runs from 2015 to 2019 and can be found here: https://www.kaggle.com/datasets/stevenpeutz/misinformation-fake-news-text-dataset-79k
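As a starting point, the Kaggle export can be read into R with readr (part of the tidyverse listed below); the file name in this sketch is an assumption about the downloaded file, not a path taken from this repository.

library(readr)

# Read the downloaded Kaggle CSV (file name is assumed; adjust to your download)
docs <- read_csv("EUvsDisinfo_dataset.csv")
str(docs)  # inspect the available columns before preprocessing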

Packages & Libraries

Required packages & libraries

packages <- c("textstem","tokenizers","tidytext","dplyr","stringr","corpus","tidyverse","stopwords","SnowballC","tidyr","topicmodels","ldatuning","wordcloud","stm","Rtsne","ggrepel","knitr")

# Install only the packages that are not already present
install.packages(setdiff(packages, rownames(installed.packages())))
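After installation, the packages can be attached in one step; this loop is a small convenience sketch rather than code from the repository.

# Attach every package in the list (library() needs character.only for string names)
invisible(lapply(packages, library, character.only = TRUE))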
