*Note: This message is a Jupyter Notebook. You can [download it](https://github.com/bayespy/bayespy-notebooks/blob/master/notebooks/issue47.ipynb) or [run it interactively](http://mybinder.org/repo/bayespy/bayespy-notebooks/notebooks/issue47.ipynb).*

OK, I have now sketched an implementation of the Dirichlet Persona Model. Please double-check that this is what you wanted, because I'm not absolutely sure it is. I think I made at least one minor change: the persona distribution is global, not document/movie specific.

Anyway, first define the configuration:

In [None]:
import bayespy as bp
import numpy as np

numTopics = 10      # number of topics 
numPersonas = 4     # protagonist, villain, ...
numRoles = 3        # agent verb, patient verb, attribute
sizeVocabulary = 50 # size of vocabulary
#numDocuments = 8   # number of documents (not used now)
numCharacters = 15  # total number of characters
sizeCorpus = 10000  # size of the dataset

In [None]:
# Generate random dataset from the model
# Data are a set of tuples (word, role, character)
# So, each "datapoint" has a word-index, role-index and character-index.
data_characters = bp.nodes.Categorical(
    np.ones(numCharacters) / numCharacters,
    plates=(sizeCorpus,)
).random()
data_roles = bp.nodes.Categorical(
    np.ones(numRoles) / numRoles,
    plates=(sizeCorpus,)
).random()
data_personas = bp.nodes.Categorical(
    np.ones(numPersonas) / numPersonas,
    plates=(numCharacters,)
).random()
data_topic_dist = bp.nodes.Dirichlet(
    np.ones(numTopics),
    plates=(numPersonas, numRoles)
).random()
data_topics = bp.nodes.Categorical(
    data_topic_dist[data_personas[data_characters], data_roles]
).random()
data_word_dist = bp.nodes.Dirichlet(
    np.ones(sizeVocabulary) / sizeVocabulary,
    plates=(numTopics,)
).random()
data_words = bp.nodes.Categorical(
    data_word_dist[data_topics],
    plates=(sizeCorpus,)
).random()
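
In case the indexing in `data_topic_dist[data_personas[data_characters], data_roles]` looks opaque, here is a minimal plain-NumPy sketch of the same generative process. It is only an illustration: the toy sizes and variable names below are my own and are not part of the notebook, and it uses NumPy's random generator instead of BayesPy nodes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical, much smaller than the configuration above)
n_topics, n_personas, n_roles = 5, 3, 2
n_vocab, n_characters, n_corpus = 20, 6, 100

# "Inputs": the character and the role that each data point belongs to
characters = rng.integers(n_characters, size=n_corpus)
roles = rng.integers(n_roles, size=n_corpus)

# Latent persona of every character
personas = rng.integers(n_personas, size=n_characters)

# Topic distribution for every (persona, role) pair:
# shape (n_personas, n_roles, n_topics)
topic_dist = rng.dirichlet(np.ones(n_topics), size=(n_personas, n_roles))

# personas[characters] maps each data point to the persona of its character;
# pairing that with roles selects one topic distribution per data point:
# shape (n_corpus, n_topics)
p_topic = topic_dist[personas[characters], roles]

# Draw one topic per data point from its own topic distribution
topics = np.array([rng.choice(n_topics, p=p) for p in p_topic])

# Word distribution for every topic (sparse prior), then one word per data point
word_dist = rng.dirichlet(np.ones(n_vocab) / n_vocab, size=n_topics)
words = np.array([rng.choice(n_vocab, p=word_dist[t]) for t in topics])

print(p_topic.shape, topics.shape, words.shape)
```

Each data point ends up as a (word, role, character) triple, with the persona and the topic as the hidden variables in between.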

Below is the model:

In [None]:
# Word distribution for each topic
# (numTopics) x (sizeVocabulary)
word_dist_in_topics = bp.nodes.Dirichlet(
    np.ones(sizeVocabulary),
    plates=(numTopics,)
)

# Topic distribution for each role and persona
# (numPersonas, numRoles) x (numTopics)
topic_dist_in_personas_and_roles = bp.nodes.Dirichlet(
    np.ones(numTopics),
    plates=(numPersonas, numRoles)
)

# Persona distribution (make this document specific?)
persona_dist = bp.nodes.Dirichlet(
    np.ones(numPersonas)
)

# Persona assignments of the characters
# (numCharacters) x (numPersonas)
personas_of_characters = bp.nodes.Categorical(
    persona_dist,
    plates=(numCharacters,)
)

# Persona assignments for each data point (i.e., each word in the corpus)
# (sizeCorpus) x (numPersonas)
personas = bp.nodes.Gate(
    data_characters,
    personas_of_characters
)

# Topic assignment for each data point (i.e., each word in the corpus)
# (sizeCorpus) x (numTopics)
topics = bp.nodes.Categorical(
    bp.nodes.Gate(
        personas,
        bp.nodes.Gate(
            data_roles[:,None], # a trick to make plates match in this case
            topic_dist_in_personas_and_roles
        )
    )
)

# Words in the corpus
# (sizeCorpus) x (sizeVocabulary)
words = bp.nodes.Categorical(
    bp.nodes.Gate(
        topics,
        word_dist_in_topics
    )
)
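
To see why the `data_roles[:,None]` trick makes the plates match, here is a rough NumPy analogy of the two nested gates. It is only a sketch under my own assumptions: the toy sizes and array names are made up, and it mixes plain probabilities, whereas the actual `Gate` node works on the expected moments of its parents.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, R, T = 6, 4, 3, 10  # toy sizes: data points, personas, roles, topics

# Topic distributions for each (persona, role) pair: shape (P, R, T)
topic_dist = rng.dirichlet(np.ones(T), size=(P, R))
# Observed role of each data point: shape (N,)
roles = rng.integers(R, size=N)
# Soft (uncertain) persona assignment of each data point: shape (N, P)
q_persona = rng.dirichlet(np.ones(P), size=N)

# Inner gate: the role is known, so it simply picks a slice along the role
# axis. The persona axis is kept for every data point: shape (N, P, T)
per_role = np.moveaxis(topic_dist[:, roles, :], 1, 0)

# Outer gate: the persona is uncertain, so the persona axis is averaged out
# with the assignment probabilities as weights: shape (N, T)
mixed = np.einsum('np,npt->nt', q_persona, per_role)

# With hard (one-hot) persona assignments the same operation reduces to
# ordinary fancy indexing.
hard = rng.integers(P, size=N)
one_hot = np.eye(P)[hard]
picked = np.einsum('np,npt->nt', one_hot, per_role)
assert np.allclose(picked, topic_dist[hard, roles])
```

The intermediate array needs both a data-point axis and a persona axis, which is what the extra axis added by `[:,None]` provides on the BayesPy side.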

Create the VB object, initialize some nodes randomly, and observe the data. Note that the character and role data were used as "inputs" in the model above.

In [None]:
Q = bp.inference.VB(
    words,
    word_dist_in_topics,
    topics,
    topic_dist_in_personas_and_roles,
    personas_of_characters,
    persona_dist,
)
topics.initialize_from_random()
personas_of_characters.initialize_from_random()
topic_dist_in_personas_and_roles.initialize_from_random()
persona_dist.initialize_from_random()
words.observe(data_words)

Run inference:

In [None]:
Q.update(repeat=1000)

You can visualize the posteriors of the nodes, for instance, as follows:

In [None]:
%matplotlib notebook
bp.plot.plt.figure(); bp.plot.hinton(personas_of_characters)
bp.plot.plt.figure(); bp.plot.hinton(word_dist_in_topics)
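
When comparing the inferred personas with the generated ones, keep in mind that the persona labels are identified only up to a permutation, so inferred persona 0 need not correspond to generated persona 0. A contingency table makes the comparison easier. The sketch below uses small made-up arrays as stand-ins. In the notebook, I believe the posterior probabilities could be obtained with something like `personas_of_characters.get_moments()[0]`, but treat that as an assumption and check it.

```python
import numpy as np

# Hypothetical stand-in for the posterior persona probabilities,
# shape (numCharacters, numPersonas); each row sums to one.
posterior = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.85, 0.10, 0.05],
    [0.05, 0.05, 0.90],
    [0.20, 0.70, 0.10],
])
# Hypothetical true personas that generated the data
true_personas = np.array([2, 0, 2, 1, 0])

# Most probable persona for each character
inferred = posterior.argmax(axis=1)

# Contingency table: rows are true personas, columns are inferred personas
n_personas = posterior.shape[1]
table = np.zeros((n_personas, n_personas), dtype=int)
np.add.at(table, (true_personas, inferred), 1)
print(table)
```

If every row and every column of the table has a single nonzero entry, as in this toy example, the personas were recovered exactly up to relabeling.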

I hope this helps!