Here is a common Bayesian network example about a sidewalk.  It might help a package delivery agent choose the right speed or tires for its delivery route!

Our network structure says that the variable **Season** directly influences both **Sprinkler** and **Rain**, each of which in turn influences **Wet**, which in turn influences **Slippery**.
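The structure can also be written down as a plain dictionary of parents, which makes it easy to list the arcs and the probability tables the network needs. This is only a sketch in plain Python; the names `parents`, `edges`, and `factors` are illustrative and not part of any library.

```python
# Parents of each node, read directly off the structure described above.
parents = {
    "Season": [],
    "Sprinkler": ["Season"],
    "Rain": ["Season"],
    "Wet": ["Sprinkler", "Rain"],
    "Slippery": ["Wet"],
}

# Every (parent, child) pair is one arc in the network.
edges = [(p, child) for child, ps in parents.items() for p in ps]
print(edges)

# Each node contributes one factor P(node | parents) to the joint distribution.
factors = [
    f"P({node} | {', '.join(ps)})" if ps else f"P({node})"
    for node, ps in parents.items()
]
print(" * ".join(factors))
```

Each factor in the printed product corresponds to one distribution you will need to estimate from the data.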

A data set of historical observations of these variables is in the file `slippery.csv`.  In this file, Season is coded as 0 to 3 (0 = Winter, 1 = Spring, 2 = Summer, 3 = Fall) and the other variables are binary (0 for false, 1 for true).

First determine the parameters you need to build this network.

In [1]:
# Read the file into a data frame and look at the first few rows

import pandas as pd
df = pd.read_csv("slippery.csv")
df.head()

Unnamed: 0,Season,Sprinkler,Rain,Wet,Slippery
0,3,1,0,0,0
1,0,0,0,0,0
2,2,1,0,1,1
3,3,0,1,1,1
4,3,1,0,1,0


In [None]:
# The columns came from the csv file
df.columns

In [None]:
# Values in the Season column and count of each value
df.Season.value_counts().sort_index()

In [None]:
#  Now it's easy to get the prior probability on Season
(df.Season.value_counts() / len(df.Season)).sort_index()

In [None]:
##  Probabilities and Conditional Probabilities
##  Counting:
##      Example 1:  Count the number of observations that are in the summer

len(df[df.Season == 2])


In [None]:
##      Example 2:  Count the number of observations where it rained
print(len(df[df.Rain == 1]))

In [None]:
##  For variables that are binary, to count the number of observations where a 
##  variable is true, just use sum (summing the 1 values gives you the count)
#
#  Number of observations where the sidewalk is wet
print(df.Wet.sum())
# Fraction of the observations where the sidewalk is wet
print(df.Wet.sum() / len(df.Wet))

In [None]:
##  To get P(Slippery | Wet = 0) and P(Slippery | Wet = 1)

## The general idea is to first restrict the dataframe to rows where the conditioning 
##  expression is true.  For example, the rows where Wet == 0.  This is a data frame with the 
##  same columns but a subset of the rows

wet0 = df[df.Wet == 0]
print(wet0.head())

wet1 = df[df.Wet == 1]
print(wet1.head())

In [None]:
##  Now to get P(Slippery | Wet = 0), compute the fraction of records with Slippery == 1 in the restricted dataframe
##  This is how often the pavement is slippery when it is not wet, which we expect to be low

print(wet0.Slippery.sum()/len(wet0.Slippery))


In [None]:
###  P(Slippery | Wet = 1) should be higher
print(wet1.Slippery.sum()/len(wet1.Slippery))
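
The restrict-then-count steps above can be cross-checked in a single call. The sketch below uses `pd.crosstab` with `normalize="index"` on a tiny made-up frame called `toy`; the toy values are invented for illustration and are not taken from slippery.csv.

```python
import pandas as pd

# Tiny made-up sample, for illustration only; the real data lives in slippery.csv.
toy = pd.DataFrame({
    "Wet":      [0, 0, 0, 0, 1, 1, 1, 1],
    "Slippery": [0, 0, 0, 1, 1, 1, 1, 0],
})

# Each row is a value of Wet; each column is a value of Slippery.
# normalize="index" divides every row by its total, so each row sums to 1.
cpt = pd.crosstab(toy.Wet, toy.Slippery, normalize="index")
print(cpt)

# The restrict-then-count approach for Wet == 1 should give the same number
wet1_toy = toy[toy.Wet == 1]
manual = wet1_toy.Slippery.sum() / len(wet1_toy.Slippery)
print(manual, cpt.loc[1, 1])
```

The row for Wet = 1 in `cpt` holds both P(Slippery = 0 | Wet = 1) and P(Slippery = 1 | Wet = 1), which is the shape a conditional probability table needs.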

#### Gather the Model Parameters

The network structure tells you the probabilities you need:  a distribution over values for Season, a conditional probability table for Rain that depends on the value of Season, and so on.   Collect these values, either by printing them or by storing them in variables.
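
One possible way to collect all of the parameters is a grouped mean per node, since the mean of a 0/1 column is the fraction of ones. The sketch below runs on a small made-up frame named `toy` (invented values, not slippery.csv); the variable names are illustrative, and the same calls should apply to the frame read from the file.

```python
import pandas as pd

# Tiny made-up sample, for illustration only; use the frame read from slippery.csv instead.
toy = pd.DataFrame({
    "Season":    [0, 0, 1, 1, 2, 2, 3, 3],
    "Sprinkler": [0, 0, 0, 1, 1, 1, 1, 0],
    "Rain":      [1, 0, 1, 0, 0, 0, 0, 1],
    "Wet":       [1, 0, 1, 1, 1, 0, 1, 1],
    "Slippery":  [1, 0, 0, 1, 1, 0, 1, 1],
})

# P(Season): relative frequency of each season code
p_season = toy.Season.value_counts(normalize=True).sort_index()

# For a 0/1 column the mean is the fraction of ones, so a grouped mean
# gives P(child = 1 | parent values); P(child = 0 | ...) is one minus that.
p_sprinkler = toy.groupby("Season")["Sprinkler"].mean()
p_rain = toy.groupby("Season")["Rain"].mean()
p_wet = toy.groupby(["Sprinkler", "Rain"])["Wet"].mean()
p_slippery = toy.groupby("Wet")["Slippery"].mean()

for name, table in [("Season", p_season), ("Sprinkler", p_sprinkler),
                    ("Rain", p_rain), ("Wet", p_wet), ("Slippery", p_slippery)]:
    print(name)
    print(table)
```

Note that a parent combination that never occurs in the data (here Sprinkler = 1 with Rain = 1) has no row in the grouped result, so it is worth checking that every combination is present before filling in a probability table.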

In [None]:
## Calculation of model parameters goes here

#### Build the Network

Using the example networks as a guide, build the distributions, then the nodes, then the model.
You have succeeded when **model.bake()** runs without error.

**Hint:**  It is very easy to make small errors as you go, and if you build the whole model before testing, you will likely get an obscure error message and not know where to look.

Start small and build incrementally, each time building the model and looking at its **predict_proba** output to verify your inputs.

* Start with just Season and its unconditional distribution.   Build a model with only that node and no arcs.
* Once that works, add Sprinkler with its conditional probability table depending on Season
* Then you can add Rain, which should look exactly the same except for the values in the probability table
* Then add Wet, which depends on both Rain and Sprinkler.  Think first about what its probability table should look like.  Draw out the template of the probability table arrays before putting in actual values.  Be careful about 0 and 1.   You will tend to write your 0 entry before your 1 entry, but you also tend to think of True before False, so it is easy to swap the two.
* After Wet works, Slippery is easy, and you're done!
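
To make the template for Wet concrete, the rows can be generated rather than typed by hand. This is a sketch in plain Python under one assumption: that each table row lists the parent values, then the child value, then the probability, as in the example networks. The helper `cpt_rows` is hypothetical, and the probabilities are placeholders, not values estimated from slippery.csv.

```python
from itertools import product

def cpt_rows(p_true, n_parents):
    """Expand P(child = 1 | parents) into rows [parent values..., child value, probability]."""
    rows = []
    for parent_values in product([0, 1], repeat=n_parents):
        p1 = p_true[parent_values]
        rows.append([*parent_values, 0, 1.0 - p1])  # child = 0 row comes first
        rows.append([*parent_values, 1, p1])        # child = 1 row comes second
    return rows

# Placeholder numbers for illustration only; NOT estimated from slippery.csv.
# Keys are (Sprinkler, Rain) in that order.
p_wet_true = {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 1.0}

wet_rows = cpt_rows(p_wet_true, n_parents=2)
for row in wet_rows:
    print(row)
```

Generating the 0 row and the 1 row together from a single P(child = 1 | parents) value keeps each pair summing to 1 and avoids mixing up the True and False entries. The parent order in each row has to match the order in which the parent distributions are passed to the table.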

In [None]:
from pomegranate import *

#seasonDist = DiscreteDistribution({...})
#rainDist = ConditionalProbabilityTable([...], [seasonDist])

#season = Node(seasonDist, name="Season")
#rain = Node(rainDist, name="Rain")

#model = BayesianNetwork("Slippery Sidewalk")
#model.add_states(season, rain, ...)
#model.add_edge(season, rain)

#model.bake()

### Answer some questions

* Run the **predict_proba** method on the network.  What is it telling you?   Are those numbers plausible?  Useful?
* Compare the difference in the probability of **Slippery** based on the season being Summer rather than Winter.  Is it what you were expecting?  Why or why not?
* With **Season** fixed to Spring, suppose you know the sprinklers are running but it's not raining.  What is the joint probability distribution over the values of **Wet** and **Slippery**?
* Suppose you know for sure that **Wet** is true.  What is the probability of **Slippery**?  Now fix the value of **Rain** to true.  Does that change the probability of **Wet**?  Why or why not?
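
One way to sanity-check the network's answers independently of the library is brute-force enumeration: multiply the factors to get the joint probability of every assignment, then sum the assignments that match the evidence. The sketch below is plain Python, not the method the library uses; the helpers `bern`, `joint`, and `query` are hypothetical, and all probabilities are placeholders rather than values estimated from slippery.csv.

```python
from itertools import product

# Placeholder parameters for illustration only; NOT estimated from slippery.csv.
p_season = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
p_sprinkler = {0: 0.1, 1: 0.3, 2: 0.6, 3: 0.2}   # P(Sprinkler = 1 | Season)
p_rain = {0: 0.6, 1: 0.4, 2: 0.1, 3: 0.5}        # P(Rain = 1 | Season)
p_wet = {(0, 0): 0.05, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.95}  # P(Wet = 1 | Sprinkler, Rain)
p_slippery = {0: 0.05, 1: 0.7}                   # P(Slippery = 1 | Wet)

NAMES = ["Season", "Sprinkler", "Rain", "Wet", "Slippery"]

def bern(p_true, value):
    """Probability of a 0/1 value given P(value = 1)."""
    return p_true if value == 1 else 1.0 - p_true

def joint(season, sprinkler, rain, wet, slippery):
    """Joint probability as the product of one factor per node."""
    return (p_season[season]
            * bern(p_sprinkler[season], sprinkler)
            * bern(p_rain[season], rain)
            * bern(p_wet[(sprinkler, rain)], wet)
            * bern(p_slippery[wet], slippery))

def query(target, evidence):
    """P(target = 1 | evidence), summing the joint over all consistent assignments."""
    num = den = 0.0
    for values in product(range(4), [0, 1], [0, 1], [0, 1], [0, 1]):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target] == 1:
            num += p
    return num / den

print(query("Slippery", {"Wet": 1}))
print(query("Slippery", {"Wet": 1, "Rain": 1}))
```

With this structure the two printed values agree up to floating-point rounding, because Slippery depends on the other variables only through Wet. Comparing numbers like these against **predict_proba** is a quick check that the tables were entered in the intended order.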
