# Topic Modelling
Plan:
- Brief introduction to tagging software for exam projects
  - Including text classification and sequence labeling
  - Discuss results obtained in the guide and any potential issues (short)
- **Examine how topic models are trained**
- Examining topics of topic models
- Application of topic models
- (15-30 minutes guest talk)

---
# The General Idea
<img src="pics/general_idea.png" width="500"/>

Given some data, we need a generative model, i.e. our assumptions about the process that generated the data. Let's start with a few simple ideas:

<img src="./pics/idea1.png" width="500"/>

<img src="./pics/idea2.png" width="600"/>

## What we want
<img src="./pics/what_we_want.png" width="600"/>


# Directed Acyclic Graph 🎉
Or graphical model

<img src="./pics/dag2.png" width="600"/>

- Nodes are random variables; edges indicate dependence.
- Shaded nodes are observed; unshaded nodes are hidden.
- Plates indicate replicated variables.

(You will get more on this next semester)


**Where:**

$\theta$ is the topic distribution of a document

$M$ is the number of documents

$N$ is the number of words in a document

$w$ is a word

$z$ is the topic assigned to a word

$\alpha$ and $\beta$ are hyperparameters
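
The edges of the graph can be read as a factorisation of the joint distribution. As a sketch for a single document, assuming the word probabilities are parameterised as $p(w_n \mid z_n, \beta)$, this gives:

$$
p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_{n} \mid \theta)\, p(w_{n} \mid z_{n}, \beta)
$$

Each factor corresponds to one node and its incoming edges, and the product over $n$ corresponds to the plate around the words.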



## Alternate Explanation

LDA assumes the following generative process for each document $\mathbf{w}$ in a corpus $D$:

1. Choose $N \sim$ Poisson$(\xi)$ (i.e. choose the number of words in the document)
2. Choose $\theta \sim \operatorname{Dir}(\alpha)$ (i.e. choose the document's distribution over topics)
3. For each of the $N$ words $w_{n}$:

    (a) Choose a topic $z_{n} \sim$ Multinomial$(\theta)$.

    (b) Choose a word $w_{n}$ from $p\left(w_{n} \mid z_{n}, \beta\right)$, a multinomial probability conditioned on the topic $z_{n}$.
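
The steps above can be simulated directly. The following is a minimal sketch using NumPy, not a reference implementation: the toy sizes, the hyperparameter values and the names `topic_word` and `generate_document` are all hypothetical, and drawing each topic's word distribution from a Dirichlet with parameter $\beta$ is an assumption (the smoothed variant of LDA).

```python
import numpy as np

rng = np.random.default_rng(0)

n_topics, vocab_size = 3, 8      # hypothetical toy sizes
alpha, beta, xi = 0.5, 0.1, 10   # hypothetical hyperparameter values

# Assumption: each topic's distribution over words is drawn from Dir(beta).
topic_word = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)


def generate_document():
    """Sample one document (as word ids) following the generative process."""
    n_words = rng.poisson(xi)                              # step 1: N ~ Poisson(xi)
    theta = rng.dirichlet(np.full(n_topics, alpha))        # step 2: theta ~ Dir(alpha)
    topics = rng.choice(n_topics, size=n_words, p=theta)   # step 3a: z_n ~ Multinomial(theta)
    words = np.array(
        [rng.choice(vocab_size, p=topic_word[z]) for z in topics], dtype=int
    )                                                      # step 3b: w_n ~ p(w | z_n, beta)
    return theta, topics, words


theta, topics, words = generate_document()
print("theta: ", theta.round(2))
print("topics:", topics)
print("words: ", words)
```

Training a topic model amounts to running this process in reverse: only `words` is observed, and `theta`, `topics` and `topic_word` have to be inferred.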

---
# Infer the Hidden Parameters
We will not do this (😢), but you will learn how you could do this next semester in cognitive modelling.

Here are some algorithms for approximating the posterior:
- **Mean field variational methods (Blei et al., 2001, 2003)** <-- original
- Expectation propagation (Minka and Lafferty, 2002)
- **Collapsed Gibbs sampling (Griffiths and Steyvers, 2002)** <-- most similar to what you will do next semester
- Distributed sampling (Newman et al., 2008; Ahmed et al., 2012)
- Collapsed variational inference (Teh et al., 2006)
- **Online variational inference (Hoffman et al., 2010)** <-- most similar to what gensim does
- Factorization based inference (Arora et al., 2012; Anandkumar et al., 2012)
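
To make the collapsed Gibbs sampling entry in the list above a bit more concrete, here is a minimal sketch in NumPy. It is a toy illustration under stated assumptions, not what gensim does: the function name `collapsed_gibbs_lda`, the tiny corpus of word ids and the hyperparameter values are all hypothetical, and symmetric Dirichlet priors with parameters `alpha` and `beta` are assumed.

```python
import numpy as np


def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iter=200, seed=0):
    """Toy collapsed Gibbs sampler; docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))    # topic counts per document
    n_kw = np.zeros((n_topics, vocab_size))   # word counts per topic
    n_k = np.zeros(n_topics)                  # total token count per topic

    # Start from a random topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Probability of each topic given all other assignments.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the newly sampled assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Point estimates of the document-topic and topic-word distributions.
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)
    return theta, phi


# Hypothetical toy corpus: two documents use words 0-1, two use words 2-3.
docs = [[0, 1, 0, 1, 1, 0], [0, 0, 1, 1, 0, 1],
        [2, 3, 3, 2, 2, 3], [3, 2, 3, 2, 3, 3]]
theta, phi = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=4)
print(theta.round(2))
print(phi.round(2))
```

The sampler never represents $\theta$ or the topic-word distributions explicitly while running (they are "collapsed" out); it only resamples the topic of one token at a time from the counts of all the others.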

# What about those Hyperparameters?
<img src="./pics/a1.png" width="300"/>
<img src="./pics/a2.png" width="300"/>
<img src="./pics/a3.png" width="300"/>

The x-axis shows the topics, and each image is one document. A higher $\alpha$ thus leads to a more uniform topic distribution, while a lower $\alpha$ leads to each document consisting of only a few topics.

$\beta$ plays the same role for the words within a topic rather than the topics within a document, i.e. a lower $\beta$ leads to topics with only a few dominant words.
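
The effect of $\alpha$ can also be checked numerically. The sketch below draws many document-topic distributions from a symmetric Dirichlet and reports how much of a document its single largest topic takes up on average. The sizes, the $\alpha$ values and the helper name `mean_largest_share` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, n_docs = 10, 1000  # hypothetical toy sizes


def mean_largest_share(alpha):
    """Average share of a document taken up by its most probable topic."""
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)
    return theta.max(axis=1).mean()


for alpha in (0.1, 1.0, 10.0):
    share = mean_largest_share(alpha)
    print(f"alpha={alpha:>4}: mean largest topic share = {share:.2f}")
```

The share should shrink towards `1 / n_topics` as $\alpha$ grows, which is the "more uniform" behaviour described above. The same experiment with the vocabulary size in place of `n_topics` illustrates $\beta$.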


---
Plan:
- Brief introduction to tagging software for exam projects
  - Including text classification and sequence labeling
  - Discuss results obtained in the guide and any potential issues (short)
- Examine how topic models are trained
- **Examining topics of topic models**
- Application of topic models

---
## Examining and Interpreting Topics

Visualisation using LDAvis ([ref](https://ldavis.cpsievert.me/reviews/vis/#topic=6&lambda=0.6&term=)) 


<img src="./pics/topics.png" width="700"/>
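
The `lambda` slider in LDAvis re-ranks the terms shown for a topic by their relevance, which mixes a term's probability within the topic with its lift over the term's overall corpus probability. The sketch below computes that ranking with NumPy. The vocabulary, the probabilities, the counts and the function name `top_terms_by_relevance` are made-up illustrations, not output from a real model.

```python
import numpy as np


def top_terms_by_relevance(topic_word, term_freq, vocab, topic, lam=0.6, n=3):
    """Rank terms by lam * log p(w|topic) + (1 - lam) * log(p(w|topic) / p(w))."""
    phi = topic_word[topic]               # p(w | topic)
    p_w = term_freq / term_freq.sum()     # marginal p(w) in the corpus
    relevance = lam * np.log(phi) + (1 - lam) * np.log(phi / p_w)
    order = np.argsort(relevance)[::-1][:n]
    return [vocab[i] for i in order]


# Hypothetical toy model: "the" is frequent in both topics.
vocab = ["the", "gene", "dna", "ball", "team"]
topic_word = np.array([[0.50, 0.30, 0.18, 0.01, 0.01],
                       [0.50, 0.01, 0.01, 0.28, 0.20]])
term_freq = np.array([500, 155, 95, 145, 105])

for lam in (1.0, 0.6, 0.0):
    print(lam, top_terms_by_relevance(topic_word, term_freq, vocab, topic=0, lam=lam))
```

With `lam=1.0` terms are ranked purely by their probability in the topic, so very common words dominate. Lowering `lam` favours terms that are specific to the topic.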



---
Plan:
- Brief introduction to tagging software for exam projects
  - Including text classification and sequence labeling
  - Discuss results obtained in the guide and any potential issues (short)
- Examine how topic models are trained
- Examining topics of topic models
- **Application of topic models**
---
## Application
What applications of topic models can you imagine?

```
if Studygroup.size > 3:
    group_1, group_2 = Studygroup.split()
    group_1.discuss(question)
    group_2.discuss(question)
else:
    Studygroup.discuss(question)
```

<!--
- Information retrieval (recommender system)
- Corpus overview
- Input in downstream task
    - classification
    - q&a
    - chatbots
- Quality check (404 errors theme in online corpora)
- Spam filters
- Clustering of bioinformatics data
- Large scale text analysis (e.g. exploring the great unread - [ref](https://www.sciencedirect.com/science/article/pii/S0304422X13000648))

## Example 1:  Change in topic over time
In topic usage:

<img src="./pics/topic_over_time.png" width="700"/>

And in a topic itself:

<img src="./pics/tot1.png" width="700"/>

Simply by allowing $\beta$ to vary across time:

<img src="./pics/tot_model.png" width="700"/>

## Example 2: Scholarly impact

<img src="./pics/impact_model.png" width="400"/>

<img src="./pics/impact1.png" width="700"/>

-->

Exercises:
- Exercise 0: Discuss with your group: could you use topic modelling in your project, and if so, how?
- Exercise 1: What happens when you change the hyperparameter X?
- Exercise 2: What is perplexity? Why is it a good selection criterion?