Marjan-Hosseini/LDAVAE

LDAVAE

LDAVAE is an automated fake news detection method that combines two types of features, one from each of two models:

1. VAE: Deep neural embedding.

  • Motivation: Efficiency
  • Provides a lower dimensional semantic representation of the news article.

2. LDA: Probabilistic topic modelling.

  • Motivation: Interpretability
  • Provides topic-based features
  • Provides a lower dimensional semantic representation of the news article.

Contribution:

(1) We provide interpretability by incorporating Bayesian topic modelling and inferring topic compositions in news articles as added features for classification.

(2) Our model works in the data scarcity scenario where only textual content is available.

(3) We keep our model efficient by coupling a deep architecture (VAE) to LDA.


LDAVAE Notations:

  • $\mathcal{D}$: Dataset.
  • $\mathcal{D}_{tr}$: Training set.
  • $\mathcal{D}_{te}$: Test set.
  • $N$: Number of samples, indexed by $i$.
  • $V$: The set of vocabulary detected by word2vec.
  • $n_f$: Number of latent features obtained from encoder.
  • $w$: word2vec dimension.
  • $L = \max \{ l_i : i = 1, \dots, N \}$: Maximum sample length.
  • $l_i$: Length of sample $i$ (number of words).
  • $t_i^{(j)}$: Word $j$ in sample $i$.
  • $\lambda_1$: Regularization parameter ($=0.05$).
  • $\lambda_2$: Regularization parameter ($=0.3$).
  • $K$: Number of topics.
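The encoder input has shape (None, $L$), so every sample has to be brought to the common length $L$. A minimal sketch of that step, assuming samples are already tokenized and that shorter samples are right-padded; the `pad_samples` helper and the `<pad>` token are hypothetical names for illustration, not taken from the repository:

```python
def pad_samples(samples, pad_token="<pad>"):
    """Pad tokenized samples to the common length L = max_i l_i."""
    lengths = [len(tokens) for tokens in samples]  # l_i for each sample i
    L = max(lengths)                               # L = max{l_i : i = 1..N}
    # Right-pad every sample with the (hypothetical) pad token up to length L.
    padded = [tokens + [pad_token] * (L - len(tokens)) for tokens in samples]
    return padded, lengths, L


samples = [
    ["breaking", "news", "today"],
    ["markets", "fall"],
    ["officials", "deny", "the", "report"],
]
padded, lengths, L = pad_samples(samples)
print(lengths, L)
```

Here $N = 3$, and every row of `padded` has exactly $L$ entries.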

VAE:


VAE Structure:

Encoder:

| Layer | Output Shape | Param # | Other Setting |
|---|---|---|---|
| Input | [(None, $L$)] | 0 | |
| Embedding | (None, $L$, $w$) | $\lvert V \rvert \times w$ | Non-trainable (word2vec) |
| Bi. LSTM | (None, $L$, $2n_f$) | $8n_f(w+n_f+1)$ | activation='tanh' |
| Bi. LSTM | (None, $2n_f$) | $8n_f(3n_f+1)$ | activation='tanh' |
| Dense | (None, $n_f$) | $n_f(2n_f+1)$ | activation='tanh' |
| Dense ($h$) | (None, $n_f$) | $n_f(n_f+1)$ | activation='tanh' |
| Sampling | (None, $n_f$) | 0 | $= \mu_h + \epsilon \exp(\sigma^2_h/2)$ |
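The Sampling layer has no trainable parameters: it shifts and scales standard normal noise $\epsilon$ element-wise. A minimal sketch of that step, assuming the quantity inside $\exp(\cdot/2)$ is stored as a log-variance (a common convention in VAE code, but an assumption here); the `sample_latent` name and its arguments are hypothetical:

```python
import math
import random


def sample_latent(mu, log_var, rng=random):
    """Draw mu + eps * exp(log_var / 2) element-wise, with eps ~ N(0, 1).

    Assumption: `log_var` holds log-variances, so exp(log_var / 2) is the
    standard deviation of each latent feature.
    """
    return [
        m + rng.gauss(0.0, 1.0) * math.exp(lv / 2.0)
        for m, lv in zip(mu, log_var)
    ]


# Example with n_f = 3 latent features and a seeded generator.
rng = random.Random(0)
print(sample_latent([0.0, 1.0, -1.0], [0.0, 0.0, 0.0], rng))
```

Because the noise enters only through $\epsilon$, the result stays a deterministic function of the two inputs, which is what allows gradients to pass through the sampling step during training.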

Decoder:

| Layer | Output Shape | Param # | Other Setting |
|---|---|---|---|
| Input | [(None, $n_f$)] | 0 | |
| Dense | (None, $n_f$) | $n_f(n_f+1)$ | activation='tanh' |
| Repeat Vector | (None, $L$, $n_f$) | 0 | |
| LSTM | (None, $L$, $n_f$) | $4n_f(2n_f+1)$ | activation='tanh' |
| LSTM | (None, $L$, $n_f$) | $4n_f(2n_f+1)$ | activation='tanh' |
| Time Dist. | (None, $L$, $\lvert V \rvert$) | $\lvert V \rvert(n_f+1)$ | activation='softmax' |

Classifier:

| Layer | Output Shape | Param # | Other Setting |
|---|---|---|---|
| Input | [(None, $n_f$)] | 0 | |
| Dense | (None, $2n_f$) | $2n_f(n_f+1)$ | activation='tanh' |
| Dense | (None, $n_f$) | $n_f(2n_f+1)$ | activation='tanh' |
| Output | (None, 1) | $n_f+1$ | activation='sigmoid' |
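The parameter counts in the three tables follow from the standard formulas for dense and LSTM layers. A minimal sketch that recomputes them, assuming each LSTM has four gates with input weights, recurrent weights, and a bias, and that a bidirectional layer doubles the count; the function names and the example sizes ($n_f = 32$, $w = 100$, $|V| = 10000$) are hypothetical and not taken from the repository:

```python
def dense_params(input_dim, units):
    # One weight per input plus one bias, for each output unit.
    return units * (input_dim + 1)


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * units * (input_dim + units + 1)


def count_params(n_f, w, vocab_size):
    """Recompute the per-layer parameter counts listed in the tables."""
    encoder = {
        "embedding": vocab_size * w,               # non-trainable word2vec
        "bilstm_1": 2 * lstm_params(w, n_f),       # 8 n_f (w + n_f + 1)
        "bilstm_2": 2 * lstm_params(2 * n_f, n_f), # 8 n_f (3 n_f + 1)
        "dense": dense_params(2 * n_f, n_f),       # n_f (2 n_f + 1)
        "dense_h": dense_params(n_f, n_f),         # n_f (n_f + 1)
    }
    decoder = {
        "dense": dense_params(n_f, n_f),           # n_f (n_f + 1)
        "lstm_1": lstm_params(n_f, n_f),           # 4 n_f (2 n_f + 1)
        "lstm_2": lstm_params(n_f, n_f),           # 4 n_f (2 n_f + 1)
        "time_dist": dense_params(n_f, vocab_size) # |V| (n_f + 1)
    }
    classifier = {
        "dense_1": dense_params(n_f, 2 * n_f),     # 2 n_f (n_f + 1)
        "dense_2": dense_params(2 * n_f, n_f),     # n_f (2 n_f + 1)
        "output": dense_params(n_f, 1),            # n_f + 1
    }
    return {"encoder": encoder, "decoder": decoder, "classifier": classifier}


counts = count_params(n_f=32, w=100, vocab_size=10000)
for part, layers in counts.items():
    print(part, sum(layers.values()), layers)
```

With these example sizes, the embedding and the time-distributed output layer account for most of the parameters, since both scale with $|V|$.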
