## Load PyEpiDAG

In [1]:
import epidag as dag
import numpy as np

## Define a DAG

Compose a script

```
PCore Exp1 {
    # Definitions of nodes
}
```

Then parse the script into a DAG

In [2]:
script = '''
PCore Exp1 {
    n
    a = 0.5 
    b ~ beta(1, 1)
    c = min(a, b)
    y ~ binom(n, c)
    z = f(a,b)
}
'''

js = dag.bn_script_to_json(script)
bn = dag.BayesianNetwork(js)
bn

Name:	Exp1
Nodes:
	n
	b ~ beta(1,1)
	a = 0.5
	z = f(a, b)
	c = min(a,b)
	y ~ binom(n,c)
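
To make the meaning of the script concrete, the sketch below draws one joint sample from the same network by hand, visiting parents before children. It does not use PyEpiDAG: `sample_exp1` is a hypothetical helper built on the standard library, and the pseudo node `z` is skipped because it has no implementation.

```python
import random

def sample_exp1(n, rng):
    """Draw one joint sample from the Exp1 network, parents before children."""
    a = 0.5                                       # single-value node: a = 0.5
    b = rng.betavariate(1, 1)                     # random node: b ~ beta(1, 1)
    c = min(a, b)                                 # equation node: c = min(a, b)
    y = sum(rng.random() < c for _ in range(n))   # y ~ binom(n, c) as n Bernoulli trials
    return {'n': n, 'a': a, 'b': b, 'c': c, 'y': y}

rng = random.Random(1)
draw = sample_exp1(n=5, rng=rng)   # n is exogenous, so it is supplied by the caller
print(draw)
```

Because `c` is the minimum of `a` and `b`, it can never exceed 0.5, whatever value of `b` is drawn.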

### Single value variable


> VariableName = Number 


In [3]:
SingleValue = bn['a']
print('Node \'a\'')
print('Definition:', SingleValue)

print('\nFind the value')
print(SingleValue())

Node 'a'
Definition: a = 0.5

Find the value
0.5


### Exogenous variable

> VariableName

In [4]:
Exogenous = bn['n']
print('Node \'n\'')
print('Definition:', Exogenous)

print('\nFind the value; must append external resources')
print(Exogenous({'n': 5}))

Node 'n'
Definition: n

Find the value; must append external resources
5


### Random variable

> VariableName ~ p(...)

**p(...)**: a probability density/mass function

In [5]:
Random = bn['b']
print('Node \'b\'')
print('Definition:', Random)

print('\nSample a value')
print(Random())

Node 'b'
Definition: b ~ beta(1,1)

Sample a value
0.3064294338139023


### Equation output

> OutputName = g(...)

**g(...)**: a mathematical function

In [6]:
Equation = bn['c']
print('Node \'c\'')
print('Definition:', Equation)

parents = {
    'a': SingleValue(), 
    'b': Random() 
}

print('\nCalculate a value')
print(Equation(parents))

Node 'c'
Definition: c = min(a,b)

Calculate a value
0.29314699778621023


### Pseudo Node
Pseudo nodes cannot be evaluated during simulation; they exist only to declare relations between variables.

> VariableName = f(...)

**f(...)**: a pseudo function, written as **f** followed by a list of parent variables

In [7]:
Pseudo = bn['z']
print('Node \'z\'')
print('Definition:', Pseudo)

parents = {
    'a': SingleValue(), 
    'b': Random()
}
print('Parents:', Pseudo.Parents)

try:
    Pseudo(parents)
except AttributeError:
    print('Pseudo nodes do not provide any implementation')

Node 'z'
Definition: z = f(a, b)
Parents: {'a', 'b'}
Pseudo nodes do not provide any implementation


## For simulation model

**'epidag.simulation'** provides tools for simulation modelling.

### Reasons to use 'epidag.simulation'
- The parameters of my model interact with each other in complicated ways.
- My model includes many random effects, so I don't want to fix the parameters at the beginning.
- I don't want to rebuild my model after the parameters change (intervention analysis).
- My study includes Monte Carlo inference and model fitting, so I need a convenient interface.


### SimulationCore
A SimulationCore is an object that carries all the definitions of a parameter model.


### ParameterCore
A ParameterCore is an object that can be used directly in a simulation model. A ParameterCore can be generated from its parent ParameterCore or from a SimulationCore. Once a ParameterCore is instantiated, 1) the fixed nodes have been assigned values and 2) the random nodes are ready to be sampled.

### Proposed workflow

Monte Carlo simulation
1. Prepare a SimulationCore
2. For each iteration, generate a ParameterCore
3. Build simulation models with the ParameterCores
4. Collect the results and analyse them (summarise or fit to data)
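
The four steps can be sketched in plain Python as follows. This is only an illustration of the loop structure, not PyEpiDAG code: `generate_parameters` and `run_model` are hypothetical stand-ins for generating a ParameterCore and building a simulation model, and the toy model has a single hyper-parameter drawn from an exponential distribution.

```python
import random
import statistics

def generate_parameters(rng):
    # Hypothetical stand-in for steps 1-2: fix the hyper-parameter once per iteration.
    return {'mu_x': 0.0, 'sd': rng.expovariate(1.0)}

def run_model(pars, rng, n_draws=200):
    # Hypothetical stand-in for step 3: a "model" that only averages draws of x.
    xs = [rng.gauss(pars['mu_x'], pars['sd']) for _ in range(n_draws)]
    return statistics.fmean(xs)

rng = random.Random(2024)

# Steps 2-3 repeated: a fresh parameter set for every iteration.
results = [run_model(generate_parameters(rng), rng) for _ in range(500)]

# Step 4: summarise the collected results.
summary = {'mean': statistics.fmean(results), 'sd': statistics.stdev(results)}
print(summary)
```

Each iteration sees a different `sd`, so the spread of `results` reflects both parameter uncertainty and sampling noise.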

### Example 1. Basic syntax and functions, Gaussian distribution

This example shows a normal variable ($x$) with a fixed mean ($mu_x$) and a standard deviation ($sd$) drawn from an exponential distribution.

#### Step 1. Generate a SimulationCore from a Bayesian network

In [8]:
script = '''
PCore Exp1 {
   mu_x = 0
   sd ~ exp(1)
   x ~ norm(mu_x, sd)
}
'''

bn = dag.bn_from_script(script)
sc = dag.as_simulation_core(bn)

#### Step 2. Instantiate a ParameterCore, in which

- The hyper-parameter ($sd$) is fixed to a certain value
- A sampler of the leaf variable ($x$) is prepared

In [9]:
pc = sc.generate(nickname='exp2')
pc

sd: 1.66383, mu_x: 0

#### Step 3. Get the sampler x from the ParameterCore and provide it to a simulation model

- The sampler can be used repeatedly
- You don't need to refer to its hyper-parameters when using it

In [10]:
x = pc.get_sampler('x')
x

Actor x (norm(mu_x,sd)) on exp2

In [11]:
x(), np.mean([x() for _ in range(1000)])

(1.5932490610284251, -0.010377068345066066)

#### Step 4. Intervention

You can set an impulse on the ParameterCore. Then,
- The impulse is passed down to its downstream variables
- You don't need to do anything to the sampler

In [12]:
pc.impulse({'sd': 5, 'mu_x': 100})
pc

sd: 5, mu_x: 100

In [13]:
x(), np.mean([x() for _ in range(1000)])

(105.36702806008006, 100.06651666930844)
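
One way to picture why the existing sampler picks up the new values is a closure that reads its hyper-parameters at call time rather than at creation time. The sketch below illustrates that idea with the standard library only; `ToyParameters` and `make_sampler` are hypothetical names, and nothing here is claimed to match how PyEpiDAG implements impulses internally.

```python
import random
import statistics

class ToyParameters:
    """Hypothetical stand-in for a ParameterCore: holds the fixed values."""
    def __init__(self, **values):
        self.values = dict(values)

    def impulse(self, changes):
        # Overwrite the fixed values in place.
        self.values.update(changes)

def make_sampler(pars, rng):
    # The closure looks up pars.values on every call, so later changes are picked up.
    def x():
        return rng.gauss(pars.values['mu_x'], pars.values['sd'])
    return x

rng = random.Random(7)
pars = ToyParameters(mu_x=0, sd=1.66383)
x = make_sampler(pars, rng)

before = statistics.fmean(x() for _ in range(1000))
pars.impulse({'sd': 5, 'mu_x': 100})
after = statistics.fmean(x() for _ in range(1000))   # same sampler object as before
print(round(before, 2), round(after, 2))
```

The sample mean moves from about 0 to about 100 without the sampler being rebuilt.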

### Example 2. Random effects, Beta-Binomial model

Example 2 is a standard example in Bayesian inference. Beta-Binomial models are used to model count data ($x$) with a latent probability ($prob$).


> dag.as_simulation_core(bn, random=[...])

The **random** option names the variables that should not be fixed when a ParameterCore is instantiated

In [14]:
script = '''
PCore BetaBinom {
    alpha = 1
    beta = 1
    n
    prob ~ beta(alpha, beta)
    x ~ binom(n, prob)
}
'''

bn = dag.bn_from_script(script)
sc = dag.as_simulation_core(bn, random=['prob'])

Since $n$ is an exogenous variable, we pass it to the new ParameterCore with **exo={...}**.

Note that $prob$ has been set as a random effect, so it is resampled whenever a new $x$ is requested

In [15]:
pc = sc.generate('exp1', exo={'n': 100})
pc

n: 100, beta: 1, alpha: 1

Again, we get a sampler $x$ from the generated ParameterCore

In [16]:
x = pc.get_sampler('x')
x()

75

The **list_all** option prints all variables used to sample the outcome variable. You can see that $prob$ is not a fixed variable

In [17]:
x(list_all=True)

(92, {'alpha': 1, 'beta': 1, 'n': 100, 'prob': 0.9240764251105434})

In [18]:
x(list_all=True)

(2, {'alpha': 1, 'beta': 1, 'n': 100, 'prob': 0.04978515678161807})
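
The effect of treating $prob$ as a random effect can be checked with a small standard-library simulation that compares a probability fixed once with one redrawn for every $x$. The helper `binom` is a hypothetical stand-in, not a PyEpiDAG function. With $alpha = beta = 1$, the marginal distribution of $x$ is uniform on $0, ..., n$, so its variance is $n(n+2)/12 = 850$ for $n = 100$, far above the binomial maximum of $n/4 = 25$.

```python
import random
import statistics

def binom(n, p, rng):
    # Binomial draw as the number of successes in n Bernoulli trials.
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(11)
n = 100

# prob fixed once, as it would be if it were not listed as random
p_fixed = rng.betavariate(1, 1)
fixed = [binom(n, p_fixed, rng) for _ in range(2000)]

# prob redrawn for every x, as with random=['prob']
redrawn = [binom(n, rng.betavariate(1, 1), rng) for _ in range(2000)]

var_fixed = statistics.pvariance(fixed)
var_redrawn = statistics.pvariance(redrawn)
print(round(var_fixed, 1), round(var_redrawn, 1))
```

The much larger variance of `redrawn` is the over-dispersion that the two `list_all` outputs above (92 and 2) hint at.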

### Example 3. Exposed variables, Regression model

The example is a linear regression model. The dependent variable ($y$) is composed of $x*b1$, an intercept $b0$, and an error term $eps$; $var=eps^2$ is a measure of variance.

In [19]:
script = '''
PCore Regression {
    b0 = 1
    b1 = 0.5
    x = 1
    eps ~ norm(0, 1)
    y  = x*b1 + b0 + eps
    var = pow(eps, 2)
}
'''

bn = dag.bn_from_script(script)

However, $var$ is not a variable for simulation; it only provides external information, so it does not need to be treated as a sampler.

In [20]:
sc = dag.as_simulation_core(bn)
pc = sc.generate('exp3-1')
print('Fixed variables', pc)
print('Samplers', pc.list_samplers())

Fixed variables eps: -0.822585, x: 1, b1: 0.5, b0: 1
Samplers ['y', 'var']


The **out** option of as_simulation_core indicates the parameters that should be exposed to a simulation model.

In [21]:
sc = dag.as_simulation_core(bn, out=['y'])
pc = sc.generate('exp3-2')
print('Fixed variables', pc)
print('Samplers', pc.list_samplers())

Fixed variables eps: 0.612016, var: 0.374564, x: 1, b1: 0.5, b0: 1
Samplers ['y']


### Example 4. Hierarchy, BMI model

Example 4 describes the parameter model of a BMI (body mass index) simulation model. The model includes three layers: country, area, and people. A country has many areas; an area has many people; there are two types of people: agA and agB.

- Each area has its own number of food stores, which provide food to people.
- agA and agB differ in the variance of BMI
- The simulation model requests the sex of agA individuals in order to model their behaviour

In [22]:
script = '''
PCore BMI {
    b0 ~ norm(12, 1)
    b1 = 0.5
    pf ~ beta(8, 20)
    foodstore ~ binom(100, pf)
    b0r ~ norm(0, .01)
    ageA ~ norm(20, 3)
    ageB ~ norm(30, 2)
    ps ~ beta(5, 6)
    sexA ~ cat({'m': ps, 'f': 1-ps})
    muA = b0 + b0r + b1*ageA
    bmiA ~ norm(muA, sd)
    sdB = sd * 0.5
    muB = b0 + b0r + b1*ageB
    bmiB ~ norm(muB, sdB)
}
'''

bn = dag.bn_from_script(script)

You can define hierarchies with a dictionary by

1. using parameter groups as keys and their respective parameters as values
2. putting their child groups in the values as well

You do not need to list every variable. The variables left out of the list will be placed automatically by the SimulationCore

In [23]:
hie = {
    'country': ['area'],
    'area': ['b0r', 'pf', 'ps', 'foodstore', 'agA', 'agB'],
    'agA': ['bmiA', 'ageA', 'sexA'],
    'agB': ['bmiB', 'ageB']
}

sc = dag.as_simulation_core(bn, hie=hie,
                            random=['muA'],
                            out=['foodstore', 'bmiA', 'bmiB'])
sc.deep_print()

country(sdB, sd, b1, b0)
-- area(b0r, ps, pf, foodstore)
---- agA(ageA, sexA, bmiA, muA)
---- agB(bmiB, ageB, muB)


You can use a SimulationCore to generate a root ParameterCore and use the root to **breed** child ParameterCores

In [24]:
pc = sc.generate('Taiwan', {'sd': 1})
pc_taipei = pc.breed('Taipei', 'area')
pc_taipei.breed('A1', 'agA')
pc_taipei.breed('A2', 'agA')
pc_taipei.breed('B1', 'agB')
pc_taipei.breed('B2', 'agB')

pc.deep_print()

Taiwan (sd: 1, sdB: 0.5, b1: 0.5, b0: 12.7063)
-- Taipei (ps: 0.522209, b0r: -0.00990844, pf: 0.277933)
---- A1 (sexA: m, ageA: 24.1891)
---- A2 (sexA: m, ageA: 17.4916)
---- B1 (ageB: 29.2246, muB: 27.3087)
---- B2 (ageB: 29.7265, muB: 27.5597)


In [25]:
print('Get a node from its parent')
a1 = pc_taipei.get_child('A1')
a1.print()

print('\nGet a node from root: use \'@\' to link names of ParameterCores')
a1 = pc.find_descendant('Taiwan@Taipei@A1')
a1.print()

Get a node from its parent
A1 (sexA: m, ageA: 24.1891)

Get a node from root: use '@' to link names of ParameterCores
A1 (sexA: m, ageA: 24.1891)
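
The parent-child structure and the '@' addressing can be mimicked with a small tree class. The sketch below is a simplified, hypothetical model (`ToyNode` is not part of PyEpiDAG): children are given their values explicitly instead of sampling them, and a child falls back to its ancestors for any value it does not hold itself.

```python
class ToyNode:
    """Hypothetical stand-in for a ParameterCore in a hierarchy."""
    def __init__(self, name, values, parent=None):
        self.name = name
        self.values = dict(values)
        self.parent = parent
        self.children = {}

    def breed(self, name, values):
        # Create a child node and register it under this node.
        child = ToyNode(name, values, parent=self)
        self.children[name] = child
        return child

    def __getitem__(self, key):
        # Look up locally first, then walk up through the ancestors.
        node = self
        while node is not None:
            if key in node.values:
                return node.values[key]
            node = node.parent
        raise KeyError(key)

    def find_descendant(self, address):
        # 'Taiwan@Taipei@A1': the first name is this node, the rest are children.
        names = address.split('@')
        if names[0] != self.name:
            raise KeyError(address)
        node = self
        for name in names[1:]:
            node = node.children[name]
        return node

root = ToyNode('Taiwan', {'sd': 1, 'b1': 0.5, 'b0': 12.7063})
taipei = root.breed('Taipei', {'b0r': -0.00990844})
a1 = taipei.breed('A1', {'sexA': 'm', 'ageA': 24.1891})

found = root.find_descendant('Taiwan@Taipei@A1')
print(found is a1, found['ageA'], found['b0'])
```

Storing each value once, at the highest level that needs it, is what lets many agents share country-level and area-level parameters.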


When an intervention is applied to a node, the impulse is passed to its child nodes automatically.

In [26]:
# see the change of muB
pc.impulse({'b0r': 10})
pc.deep_print()

Taiwan (sd: 1, sdB: 0.5, b1: 0.5, b0: 12.7063)
-- Taipei (ps: 0.522209, b0r: 10, pf: 0.277933)
---- A1 (sexA: m, ageA: 24.1891)
---- A2 (sexA: m, ageA: 17.4916)
---- B1 (ageB: 29.2246, muB: 37.3186)
---- B2 (ageB: 29.7265, muB: 37.5696)
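
The change in muB can be verified by hand from the script's definition, muB = b0 + b0r + b1*ageB. The check below uses the printed values for Taiwan, Taipei, and B1; since those printouts are rounded, agreement is only expected to about four decimal places. The function name `mu_b` is introduced here for illustration.

```python
# Printed (rounded) values for Taiwan and agent B1.
b0, b1, age_b1 = 12.7063, 0.5, 29.2246

def mu_b(b0r):
    # muB = b0 + b0r + b1*ageB, as defined in the BMI script.
    return b0 + b0r + b1 * age_b1

before = mu_b(-0.00990844)   # b0r as originally sampled for Taipei
after = mu_b(10)             # b0r after pc.impulse({'b0r': 10})
print(round(before, 4), round(after, 4))
```

The two results should match the muB printed for B1 before and after the impulse (27.3087 and 37.3186) up to rounding.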


In [27]:
pc.impulse(['b0r'])
pc.deep_print()

Taiwan (sd: 1, sdB: 0.5, b1: 0.5, b0: 12.7063)
-- Taipei (ps: 0.522209, b0r: -0.0136767, pf: 0.277933)
---- A1 (sexA: m, ageA: 24.1891)
---- A2 (sexA: m, ageA: 17.4916)
---- B1 (ageB: 29.2246, muB: 27.3049)
---- B2 (ageB: 29.7265, muB: 27.5559)


### Example 5. A simple agent-based model 

In [28]:
script = '''
PCore BetaBin {
    al = 1
    be = 1
    p ~ beta(al, be)
    x ~ binom(5, p)
}
'''


bn = dag.bn_from_script(script)

hie = {
    'root': ['ag'],
    'ag': ['x']
}

sm = dag.as_simulation_core(bn, hie)


In [29]:
class Agent:
    def __init__(self, name, p):
        self.Name = name
        self.Parameters = p
        self.X = p.get_sampler('x')
        
    def produce(self, k):
        return self.X.sample(k)

class AgentBasedModel:
    def __init__(self, pars, n_agents):
        self.Parameters = pars
        self.Agents = list()
        for i in range(n_agents):
            name = 'Ag{}'.format(i)
            p = pars.breed(name, 'ag')
            self.Agents.append(Agent(name, p))
        
    def product(self, k):
        return np.array([ag.produce(k) for ag in self.Agents]).sum()
        
        
        
def fn_sim(pars, data):
    abm = AgentBasedModel(pars, data['N'])
    return abm.product(data['K'])

def fn_mean(sim, data):
    return -abs(sim - data['X'])

In [30]:
pars = sm.generate('Simu A')

abm = AgentBasedModel(pars, 5)

In [31]:
abm.product(100)

0

In [32]:
d = {
    'N': 10,
    'K': 10,
    'X': 200
}
pars = sm.generate('Simu A')
sim = fn_sim(pars, d)
mea = fn_mean(sim, d)
sim, mea

(398, -198)
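
The pair of functions above has the shape needed for simulation-based fitting: one simulates a result, the other scores it against data. The source does not show a fitting step, so the loop below is only one possible use, written with standard-library stand-ins. `toy_generate` and `toy_sim` are hypothetical replacements for `sm.generate` and `fn_sim`, and the sketch assumes $p$ is shared by all agents.

```python
import random

def toy_generate(rng):
    # Hypothetical stand-in for sm.generate(...): draw p ~ beta(1, 1).
    return {'p': rng.betavariate(1, 1)}

def toy_sim(pars, data, rng):
    # Stand-in for fn_sim: N agents, K draws each of x ~ binom(5, p), summed.
    trials = data['N'] * data['K'] * 5
    return sum(rng.random() < pars['p'] for _ in range(trials))

def fn_mean(sim, data):
    # Same score as in the notebook: closer to the observed X is better.
    return -abs(sim - data['X'])

rng = random.Random(3)
d = {'N': 10, 'K': 10, 'X': 200}

# Generate many parameter sets, simulate each once, and keep the scores.
candidates = []
for _ in range(300):
    pars = toy_generate(rng)
    candidates.append((fn_mean(toy_sim(pars, d, rng), d), pars['p']))

best_score, best_p = max(candidates)
print(best_score, round(best_p, 3))
```

With at most 500 successes possible and an observed total of 200, the best-scoring candidates should have $p$ near 0.4.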

In [33]:
p = pars.get_child('Ag1')