In [1]:
import numpy as np
import pandas as pd
from itertools import product

In [61]:
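def prob(E):
    """Logistic (sigmoid) function: probability that a unit with total input E turns on."""
    return 1 / (1 + np.exp(-E))

def p(nu, h1, h2, w):
    """sigmoid(nu * (w[0]*h1 + w[1]*h2)); with nu=1 this is P(v=1 | h1, h2)."""
    E = nu * h1 * w[0] + nu * h2 * w[1]
    return prob(E)

def fc(w):
    """Tabulate E and sigmoid(E) for all eight (nu, h1, h2) configurations."""
    # Local name differs from the function name to avoid shadowing it.
    df = pd.DataFrame(list(product([0, 1], [0, 1], [0, 1])), columns=['nu', 'h1', 'h2'])
    df['E'] = w[0] * df['h1'] * df['nu'] + w[1] * df['h2'] * df['nu']
    df['P'] = prob(df['E'])
    # Pnorm rescales the sigmoid values to sum to 1; it is not the SBN joint probability.
    df['Pnorm'] = df['P'] / df['P'].sum()
    return df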

## 1. ##

You may find this explanation of SBNs helpful.

This quiz is going to take you through the details of Sigmoid Belief Networks (SBNs). The most relevant videos are the second video ("Belief Nets", especially from 11:44) and third video ("Learning sigmoid belief nets") of lecture 13.

We'll be working with this network:


![Sigmoid belief network with hidden units h1 and h2, each connected to the visible unit v](https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/M8ULpLKdEea3qApInhZCFg_15ef1fe555538cc6c49ff235689c3cbc_sbn.png?expiry=1504137600000&hmac=4fZFsLsLmG4KfEejQ58q-UVQtZtaU7UVmyL6mwDTmtk)


The network has no biases (or equivalently, the biases are always zero), so it has only two parameters: w1 (the weight on the connection from h1 to v) and w2 (the weight on the connection from h2 to v).

Remember, the units in an SBN are all binary, and the logistic function (also known as the sigmoid function) figures prominently in the definition of SBNs. These binary units, with their logistic/sigmoid probability function, are in a sense the stochastic equivalent of the deterministic logistic hidden units that we've seen often in earlier lectures.
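
To make "stochastic binary unit" concrete, here is a minimal sketch of drawing samples from a network of this shape, top down: the hidden units first, then the visible unit given the hidden states. The function name `sample_sbn` and the weights `w_example` are illustrative assumptions, not part of the quiz.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_sbn(w, n, rng):
    """Draw n samples (h1, h2, v) from a bias-free SBN with two hidden units."""
    # No biases: each hidden unit turns on with probability sigmoid(0) = 0.5.
    h1 = rng.random(n) < 0.5
    h2 = rng.random(n) < 0.5
    # The visible unit turns on with probability sigmoid(w1*h1 + w2*h2).
    p_v = sigmoid(w[0] * h1 + w[1] * h2)
    v = rng.random(n) < p_v
    return h1.astype(int), h2.astype(int), v.astype(int)

rng = np.random.default_rng(0)
w_example = (2.0, -1.0)  # arbitrary illustrative weights, not the quiz's values
h1, h2, v = sample_sbn(w_example, 100_000, rng)

# Empirical P(v=1 | h1=1, h2=0) should be close to sigmoid(2.0).
mask = (h1 == 1) & (h2 == 0)
print(v[mask].mean(), sigmoid(2.0))
```

Each unit is a coin flip whose bias is the logistic function of its total input, which is the sense in which these units are the stochastic counterpart of deterministic logistic units.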

Let's start with $w1=-6.90675478$ and $w2=0.40546511$. These numbers were chosen so that many of the answers come out as simple, round values, which should make it easier to see what is going on. Let's also pick a complete configuration to focus on: $h1=0$, $h2=1$, $v=1$ (we'll call that configuration C011).

Ready to Begin? (Please select a response. This question is reflective and selecting a certain answer will not affect your grade.)

In [62]:
w1 = -6.90675478
w2 = 0.40546511
w = [w1, w2]

## 2. ## 
What is $P(v=1|h1=0,h2=1)$? Write your answer with four digits after the decimal point. Hint: the last three of those four digits are zeros. (If you're lost on this question, then I strongly recommend that you do whatever you need to do to figure it out, before proceeding with the rest of this quiz.)

In [63]:
p(nu=1, h1=0, h2=1, w=w)

0.60000000045404056
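
The round result suggests how the weights were picked. As an observation from the numbers (the quiz does not state this), `w1` and `w2` agree with the log-odds of 0.001 and 0.6 to the printed precision, so the sigmoid maps them back to those probabilities. A small check:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

w1 = -6.90675478
w2 = 0.40546511

# Log-odds of the round probabilities the weights appear to encode.
logit_small = np.log(0.001 / 0.999)   # equals -log(999)
logit_large = np.log(0.6 / 0.4)       # equals log(1.5)

print(np.isclose(w1, logit_small), np.isclose(w2, logit_large))
print(round(sigmoid(w1), 4), round(sigmoid(w2), 4))
```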

## 3. ## 
What is the probability of that full configuration, i.e. $P(h1=0,h2=1,v=1)$, which we called P(C011)? Write your answer with four digits after the decimal point. Hint: it's less than a half, and the last two of those four digits are zeros.

In [64]:
fcw = fc(w)
fcw

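|   | nu | h1 | h2 | E | P | Pnorm |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0.0 | 0.5 | 0.16116 |
| 1 | 0 | 0 | 1 | 0.0 | 0.5 | 0.16116 |
| 2 | 0 | 1 | 0 | 0.0 | 0.5 | 0.16116 |
| 3 | 0 | 1 | 1 | 0.0 | 0.5 | 0.16116 |
| 4 | 1 | 0 | 0 | 0.0 | 0.5 | 0.16116 |
| 5 | 1 | 0 | 1 | 0.405465 | 0.6 | 0.193392 |
| 6 | 1 | 1 | 0 | -6.906755 | 0.001 | 0.000322 |
| 7 | 1 | 1 | 1 | -6.50129 | 0.001499 | 0.000483 |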


In [65]:
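# With no biases, each hidden unit is on with probability sigmoid(0) = 0.5.
Ph1 = 0.5
Ph2 = 0.5

# P(C011) = P(v=1 | h1=0, h2=1) * P(h1=0) * P(h2=1)
P_C011 = p(nu=1, h1=0, h2=1, w=w) * (1 - Ph1) * Ph2
P_C011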

0.15000000011351014
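
The same number can be read off a full joint table. The sketch below assumes the factorization $P(h1,h2,v) = P(h1)\,P(h2)\,P(v|h1,h2)$ with $P(h1=1)=P(h2=1)=0.5$, and uses $1 - \sigma(\cdot)$ for the $v=0$ rows. The helper name `joint` is an assumption introduced here.

```python
import numpy as np
from itertools import product

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint(h1, h2, v, w):
    """P(h1, h2, v) = P(h1) P(h2) P(v | h1, h2) for the bias-free SBN."""
    p_on = sigmoid(w[0] * h1 + w[1] * h2)    # P(v=1 | h1, h2)
    p_v = p_on if v == 1 else 1.0 - p_on     # P(v | h1, h2)
    return 0.5 * 0.5 * p_v

w = [-6.90675478, 0.40546511]
table = {(h1, h2, v): joint(h1, h2, v, w)
         for h1, h2, v in product([0, 1], repeat=3)}

for config, prob_value in table.items():
    print(config, round(prob_value, 6))
print("total:", sum(table.values()))
```

The eight entries sum to 1, and the entry for `(0, 1, 1)` is P(C011).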

## 4. ##
What is $\frac{\partial \log P(C011)}{\partial w1}$? Write your answer with at least three digits after the decimal point, and don't be too surprised if it's a very simple answer.

In [66]:
# h1 = 0 in C011, so w1 does not enter log P(C011) and the derivative is zero.
0.000

0.0

## 5. ## 

What is $\frac{\partial \log P(C011)}{\partial w2}$? Write your answer with at least three digits after the decimal point, and don't be too surprised if it's a very simple answer.

In [67]:
# 4 * P_C011 = sigmoid(w2), and d/dw2 log sigmoid(w2) = sigmoid(w2) * exp(-w2) = 1 - sigmoid(w2).
4 * (P_C011) * np.exp(-w2)

0.39999999954595949
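
Both derivatives follow one pattern: for a complete configuration, $\frac{\partial \log P}{\partial w_i} = h_i\,(v - \sigma(w1\,h1 + w2\,h2))$. The sketch below evaluates that expression for C011 and compares it with a central finite difference. The function names are assumptions introduced for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_joint(w, h1, h2, v):
    """log P(h1, h2, v) for the bias-free SBN with uniform hidden priors."""
    p_on = sigmoid(w[0] * h1 + w[1] * h2)
    return np.log(0.25) + (np.log(p_on) if v == 1 else np.log(1.0 - p_on))

def grad_log_joint(w, h1, h2, v):
    """Analytic gradient: h_i * (v - P(v=1 | h1, h2))."""
    p_on = sigmoid(w[0] * h1 + w[1] * h2)
    return np.array([h1 * (v - p_on), h2 * (v - p_on)])

w = np.array([-6.90675478, 0.40546511])
analytic = grad_log_joint(w, 0, 1, 1)

# Central finite differences along each weight as an independent check.
eps = 1e-6
numeric = np.array([
    (log_joint(w + eps * e, 0, 1, 1) - log_joint(w - eps * e, 0, 1, 1)) / (2 * eps)
    for e in np.eye(2)
])
print(analytic, numeric)
```

The first component is zero because $h1=0$ switches off the connection carrying `w1`. The second is $1 - 0.6 = 0.4$.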

## 6. ##

What is $P(h2=1|v=1,h1=0)$? Give your answer with at least four digits after the decimal point. Hint: it's a fairly small number (and not a round number like for the earlier questions); try to intuitively understand why it's small. Second hint: you might find Bayes' rule useful, but even with that rule, this still requires some thought.

Using Bayes' theorem (writing $\nu$ for $v$, as in the code), with $P(h1=0) = P(h2=1) = \frac{1}{2}$ because there are no biases:

$P(h2=1 | \nu=1, h1=0) = \frac{P(\nu=1, h1=0, h2=1)}{P(\nu=1, h1=0)} = \frac{\frac{1}{4} P(\nu=1 | h1=0, h2=1)}{\frac{1}{4} P(\nu=1 | h1=0, h2=0) + \frac{1}{4} P(\nu=1 | h1=0, h2=1)}$

The cells below evaluate this with the weights $w1=10$, $w2=-4$ (`ww`).
 

In [68]:
ww1 = 10
ww2 = -4
ww = [ww1, ww2]
fcww = fc(ww)
fcww

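|   | nu | h1 | h2 | E | P | Pnorm |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0.5 | 0.11073 |
| 1 | 0 | 0 | 1 | 0 | 0.5 | 0.11073 |
| 2 | 0 | 1 | 0 | 0 | 0.5 | 0.11073 |
| 3 | 0 | 1 | 1 | 0 | 0.5 | 0.11073 |
| 4 | 1 | 0 | 0 | 0 | 0.5 | 0.11073 |
| 5 | 1 | 0 | 1 | -4 | 0.017986 | 0.003983 |
| 6 | 1 | 1 | 0 | 10 | 0.999955 | 0.221451 |
| 7 | 1 | 1 | 1 | 6 | 0.997527 | 0.220913 |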


In [69]:
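# Pnorm mass on rows with h1 == 1 and with h2 == 1 (not the SBN marginals, which are 0.5 a priori).
phww = [fcww[fcww['h1'] == 1]['Pnorm'].sum(), fcww[fcww['h2'] == 1]['Pnorm'].sum()]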
phww

[0.66382528977144051, 0.44635760954835663]

In [71]:
# P(h2=1 | v=1, h1=0): the 1/4 prior factors cancel between numerator and denominator.
p(1, 0, 1, ww) / (p(1, 0, 1, ww) + p(1, 0, 0, ww))

0.034723337448322934

In [72]:
# Same ratio with h1=1, i.e. P(h2=1 | v=1, h1=1), the quantity asked for in question 7.
(p(1, 1, 1, ww)) / (p(1, 1, 1, ww) + p(1, 1, 0, ww))

0.49939242873941264
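
The two ratios above are instances of one formula. A minimal sketch that wraps it in a function, assuming uniform priors on the hidden units so that they cancel. The name `posterior_h2` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def posterior_h2(h1, v, w):
    """P(h2=1 | v, h1), obtained by summing the joint over both values of h2."""
    def likelihood(h2):
        p_on = sigmoid(w[0] * h1 + w[1] * h2)   # P(v=1 | h1, h2)
        return p_on if v == 1 else 1.0 - p_on
    # P(h1) and P(h2) are both 0.5 here, so they cancel in the ratio.
    return likelihood(1) / (likelihood(0) + likelihood(1))

ww = [10, -4]
post_h1_off = posterior_h2(h1=0, v=1, w=ww)
post_h1_on = posterior_h2(h1=1, v=1, w=ww)
print(post_h1_off, post_h1_on)
```

With `h1=0`, only `h2` could raise the input to `v`, and its negative weight makes `v=1` less likely, so `h2=1` is improbable. With `h1=1`, the large positive `w1` already accounts for `v=1`, and the posterior for `h2` stays near its prior of 0.5.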

## 7. ## 
What is $P(h2=1|v=1,h1=1)$? Give your answer with at least four digits after the decimal point. Hint: it's quite different from the answer to the previous question; try to understand why. The fact that those two are different shows that, conditional on the state of the visible units, the hidden units have a strong effect on each other, i.e. they're not independent. That is what we call explaining away, and the earthquake vs. truck network is another example of that.

https://www.coursera.org/learn/neural-networks/discussions/weeks/13/threads/qlXPInBZEeeNWAqvpYfEJA

In [141]:
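# Not a valid probability: the ratio exceeds 1 because the denominator uses the h1=0 terms instead of the h1=1 terms.
0.997527 / (0.500000 + 0.017986)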

1.9257798473317815

In [148]:
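# Dividing by all four hidden configurations gives P(h1=1, h2=1 | v=1), not P(h2=1 | v=1, h1=1); In [72] computes the latter.
(p(1, 1, 1, ww)) / (p(1, 0, 1, ww) + p(1, 0, 0, ww) + p(1, 1, 0, ww) + p(1, 1, 1, ww))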

0.39655734118625574
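
One way to decide between candidate answers is to estimate the conditional by simulation. This is a rough sketch using rejection sampling: draw many configurations from the network with $w1=10$, $w2=-4$, keep those matching the condition, and look at the fraction with $h2=1$. The sample size and seed are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n = 400_000
ww = (10.0, -4.0)

# Sample top down: hidden units first, then the visible unit.
h1 = rng.random(n) < 0.5
h2 = rng.random(n) < 0.5
v = rng.random(n) < sigmoid(ww[0] * h1 + ww[1] * h2)

# Keep only samples consistent with each condition, then average h2.
est_h1_on = h2[v & h1].mean()     # estimate of P(h2=1 | v=1, h1=1)
est_h1_off = h2[v & ~h1].mean()   # estimate of P(h2=1 | v=1, h1=0)

exact_h1_on = sigmoid(6.0) / (sigmoid(6.0) + sigmoid(10.0))
exact_h1_off = sigmoid(-4.0) / (sigmoid(-4.0) + sigmoid(0.0))
print(est_h1_on, exact_h1_on)
print(est_h1_off, exact_h1_off)
```

The estimates should land near 0.499 and 0.035 respectively, which is the gap that the question attributes to explaining away.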