#### Pair Problem

You are given documents as probability distributions over topics, and topics as probability distributions over words.

Implement a function `make_doc` that takes a document's topic probabilities (`topic_probs`) and a number of words (`n_words`). For each word, the function should first choose a topic at random according to `topic_probs`, then choose a word at random according to that topic's word probabilities. It should return the generated words as a single space-separated string.
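
The two-step sampling can also be written compactly. The notation here is an assumption introduced for illustration and does not come from the exercise: $\theta$ stands for the document's `topic_probs`, $\phi_k$ for row $k$ of `topics`, $z_i$ for the topic chosen for the $i$-th word, and $w_i$ for the word itself.

$$
z_i \sim \mathrm{Categorical}(\theta), \qquad w_i \sim \mathrm{Categorical}(\phi_{z_i}), \qquad i = 1, \dots, n_{\text{words}}
$$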

For example:

```python
docs = [[0.98, 0.01, 0.01],
        [0.01, 0.98, 0.01],
        [0.01, 0.01, 0.98]]
topics = [[ 0.4,      0.4,   0.01,        0.01,    0.01,       0.01,
            0.1,     0.04,   0.01,        0.01],
          [0.01,     0.01,    0.4,         0.4,    0.01,       0.01,
            0.1,     0.04,   0.01,        0.01],
          [0.02,     0.02,   0.01,        0.01,     0.4,        0.4,
           0.02,      0.1,   0.01,        0.01]]
words =  ['cat', 'kitten',  'dog',     'puppy',  'deep', 'learning',
          'fur',  'image',  'GPU', 'asparagus']


def make_doc(topic_probs, n_words):
    raise NotImplementedError

for doc in docs:
    print(make_doc(topic_probs=doc, n_words=10))

#  Example output:
## cat learning kitten image kitten cat deep image cat kitten
## puppy puppy learning dog puppy dog dog puppy image dog
## deep learning deep image deep deep deep deep learning learning
```

Extension:

Update your `make_doc` function so that if `topic_probs` isn't specified, it will draw a random set of topic probabilities from a Dirichlet distribution.
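
To get a feel for what a Dirichlet draw looks like, here is a minimal sketch. It assumes NumPy's `Generator.dirichlet`; the seed and the two concentration values are arbitrary choices for illustration, not part of the exercise. Each draw is a non-negative vector that sums to 1, so it can be used directly as `topic_probs`.

```python
import numpy as np

# Seeded only so that the sketch is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)

n_topics = 3  # matches the three topics in the example above

# Small concentration values: most of the mass tends to land on one topic.
sparse = rng.dirichlet(alpha=[0.1] * n_topics)

# Large concentration values: the draw tends to be close to uniform.
dense = rng.dirichlet(alpha=[10.0] * n_topics)

print("alpha=0.1 :", np.round(sparse, 3), "sum =", sparse.sum())
print("alpha=10.0:", np.round(dense, 3), "sum =", dense.sum())
```

A small `alpha` therefore tends to produce documents dominated by a single topic, much like the hand-written rows of `docs`.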


In [4]:
docs = [[0.98, 0.01, 0.01],
        [0.01, 0.98, 0.01],
        [0.01, 0.01, 0.98]]

topics = [[ 0.4,  0.4, 0.01, 0.01, 0.01, 0.01,  0.1, 0.04, 0.01, 0.01],
          [0.01, 0.01,  0.4,  0.4, 0.01, 0.01,  0.1, 0.04, 0.01, 0.01],
          [0.02, 0.02, 0.01, 0.01,  0.4,  0.4, 0.02,  0.1, 0.01, 0.01]]
    
words =  ['cat', 'kitten', 'dog', 'puppy', 'deep', 'learning', 'fur', 'image', 'GPU', 'asparagus']


In [27]:
import numpy as np
import random

In [19]:
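def make_doc(topic_probs, n_words, topics, words):
    """Generate a document by sampling a topic, then a word, for each position."""
    doc = []
    for _ in range(n_words):
        # Pick a topic index according to the document's topic distribution.
        topic = np.random.choice(len(topics), p=topic_probs)
        # Pick a word according to the chosen topic's word distribution.
        word = np.random.choice(words, p=topics[topic])
        doc.append(word)

    return ' '.join(doc)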

In [20]:
for doc in docs:
    print(make_doc(doc, 10, topics, words))

fur cat kitten kitten cat cat kitten kitten kitten deep
dog image dog dog dog puppy fur dog dog dog
deep learning learning learning deep learning image cat learning deep
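
One way to sanity-check a sampler like this is to compare word frequencies in a long generated document against the marginal word distribution, which is the document's topic probabilities multiplied by the topic-word matrix. The sketch below is self-contained and makes a few assumptions: `sample_words` is a hypothetical helper that repeats the same two-step sampling but returns a list, and the seed and sample size are arbitrary.

```python
from collections import Counter

import numpy as np

topics = [[0.4, 0.4, 0.01, 0.01, 0.01, 0.01, 0.1, 0.04, 0.01, 0.01],
          [0.01, 0.01, 0.4, 0.4, 0.01, 0.01, 0.1, 0.04, 0.01, 0.01],
          [0.02, 0.02, 0.01, 0.01, 0.4, 0.4, 0.02, 0.1, 0.01, 0.01]]
words = ['cat', 'kitten', 'dog', 'puppy', 'deep', 'learning',
         'fur', 'image', 'GPU', 'asparagus']


def sample_words(topic_probs, n_words, topics, words, rng):
    """Hypothetical helper: sample a topic, then a word, n_words times."""
    sampled = []
    for _ in range(n_words):
        topic = rng.choice(len(topics), p=topic_probs)
        word_index = rng.choice(len(words), p=topics[topic])
        sampled.append(words[word_index])
    return sampled


rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
doc = [0.98, 0.01, 0.01]

# Marginal P(word) = sum over topics of P(topic) * P(word | topic).
expected = np.dot(doc, topics)

sample = sample_words(doc, 5000, topics, words, rng)
counts = Counter(sample)
observed = np.array([counts[w] / len(sample) for w in words])

for w, e, o in zip(words, expected, observed):
    print(f"{w:>10}  expected={e:.3f}  observed={o:.3f}")
```

With a few thousand words, the observed frequencies should sit close to the expected ones. A large gap usually points to a bug such as sampling from the wrong row of `topics`.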


In [33]:
def make_doc2(topic_probs, n_words, topics, words):
    
    tops = random.choices(topics, weights=topic_probs, k=n_words)
    doc = [random.choices(words, weights=top, k=1)[0] for top in tops]
    
    return ' '.join(doc)

In [34]:
for doc in docs:
    print(make_doc2(doc, 10, topics, words))

cat fur dog image cat dog kitten cat kitten kitten
puppy deep puppy fur asparagus puppy dog fur puppy fur
image learning kitten learning deep learning cat image learning learning


In [35]:
def make_doc3(topic_probs=None, n_words=10, topics=topics, words=words):
    
    if topic_probs is None:
        topic_probs = np.random.dirichlet(alpha=[0.1]*len(topics))
        
    return make_doc(topic_probs, n_words, topics, words)

In [36]:
for doc in docs:
    print(make_doc3(None, 10, topics, words))

deep learning deep learning cat learning puppy deep cat cat
dog learning dog deep dog fur kitten cat cat kitten
deep kitten kitten cat dog dog GPU kitten deep image
