# Pyro – A Deep Probabilistic Programming Language 

Pyro is a flexible and scalable deep probabilistic programming language (PPL) that unifies modern deep learning with Bayesian modelling. It is written in Python and built on top of PyTorch. Uber AI Labs introduced it in 2017, and it is now maintained by a team at the Broad Institute together with its developer community.

Are you unfamiliar with the term ‘probabilistic programming’? Refer to the ‘probabilistic programming’ section of [this](https://analyticsindiamag.com/introduction-to-infer-net-a-framework-for-probabilistic-programming-in-dotnet/) article before proceeding!

Pyro uses PyTorch as its underlying tensor computation engine. To read more about Pyro and its base library PyTorch, please refer to [this](https://analyticsindiamag.com/guide-to-pyro-a-deep-probabilistic-programming-language/) article.

## Code Implementation

### Installation of Pyro

NOTE: Pyro supports Python 3.6+ versions.

Pyro can be installed with pip as follows:

```
pip install pyro-ppl
```

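```
!python -m pip install pip --upgrade --user -q
# Note: the PyPI package for sklearn is named scikit-learn ("sklearn" is a deprecated placeholder)
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras torch --user -q
```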

```
!python -m pip install pyro-ppl --user -q
```

Suppose we have a weighing scale that tells us the weight of an object placed on it. However, the scale is inaccurate and gives a different reading each time we weigh the same object. Assume that the scale's errors follow a normal distribution centred on the object's actual weight, with a standard deviation of 0.1 kg. In the implementation below, we simulate this weighing process and then use Pyro to infer the true weight.

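```python
# Restart the kernel so that the freshly installed packages are picked up
import IPython
IPython.Application.instance().kernel.do_shutdown(True)  # True = restart after shutdown
```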

Import the required libraries and modules

```python
import time

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import torch
import torch.distributions as dist
import pyro

# Optional: check the installed Pyro version
# assert pyro.__version__.startswith('1.5.2')

# Fix the random seed so that the sampled values are reproducible
pyro.set_rng_seed(0)

%matplotlib inline
```

Define a function that simulates a single reading from the scale

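```python
def measure(weight):
    """Simulate one reading of the scale for an object of the given true weight.

    The scale's error is normally distributed around the true weight
    with a standard deviation of 0.1 kg.
    """
    my_dist = dist.Normal(weight, 0.1)
    observation = my_dist.sample()
    return observation
```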

Weigh an object of, say, 0.6 kg on the scale several times

```python
# Weigh the same 0.6 kg object four times
for _ in range(4):
    print(measure(0.6))
```

Note: The readings change every time you execute the cell, because each call to measure() draws a new random sample.

The output shows that repeated weighings of the same object give different readings, and they are rarely exactly 0.6.
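Drawing many readings at once makes the noise model visible. The sketch below is a vectorized NumPy analogue of `measure()` (an assumption for illustration: it uses NumPy's random generator rather than `torch.distributions`). With many draws, the sample mean and standard deviation should land close to the true weight of 0.6 kg and the scale's 0.1 kg error.

```python
import numpy as np

# Vectorized stand-in for measure(): many noisy readings of a 0.6 kg object.
# Assumes the same noise model as above: reading ~ Normal(true_weight, 0.1).
rng = np.random.default_rng(seed=0)
true_weight, noise_sd = 0.6, 0.1
readings = rng.normal(loc=true_weight, scale=noise_sd, size=100_000)

print(f'mean of readings: {readings.mean():.4f}')  # close to 0.6
print(f'std of readings:  {readings.std():.4f}')   # close to 0.1
print(f'first five readings: {np.round(readings[:5], 3)}')
```

Individual readings scatter, but their average is an unbiased estimate of the true weight; this is the intuition the Bayesian model below makes precise.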

Now suppose we do not need an exact reading but want the probability of an event involving the readings. For instance, what is the probability that a reading will be above 0.66?

```python
from scipy.stats import norm


def estimate_prob_above(threshold, true_weight=0.6, n_samples=1000):
    """Monte Carlo estimate of P(reading > threshold): the fraction of simulated readings above it."""
    hits = sum(measure(true_weight).item() > threshold for _ in range(n_samples))
    return hits / n_samples


for label, n in [('Rough', 1000), ('Reasonable', 10000), ('Good', 100000)]:
    print(f'{label} estimate ({n} samples): {estimate_prob_above(0.66, n_samples=n)}')

# Exact value: readings follow Normal(mean=0.6, std=0.1), and norm(...).cdf(x)
# (cumulative distribution function) is P(X <= x), so P(X > 0.66) = 1 - cdf(0.66)
true_measure = 1.0 - norm(0.6, 0.1).cdf(0.66)
print(f'True value: {true_measure}')
```


Note: The Monte Carlo estimates vary from run to run, since they depend on the random readings returned by measure(); the exact value does not.

This sampling approach gives satisfactory results, but it is tedious: the estimate only gets close to the exact value after a large number of samples.
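How many samples is "large"? A standard result for Monte Carlo estimates of a probability $p$ from $n$ independent samples is that the standard error is $\sqrt{p(1-p)/n}$, so each extra decimal digit of accuracy costs roughly 100 times more samples. The sketch below evaluates this for the three sample sizes used above; it computes the exact $p$ with the standard library's error function instead of SciPy (an illustrative choice, and `normal_sf` is a hypothetical helper name).

```python
import math


def normal_sf(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd), computed via the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


p = normal_sf(0.66, 0.6, 0.1)  # exact probability, about 0.274
print(f'exact p = {p:.4f}')

# Standard error of a Monte Carlo proportion estimate: sqrt(p * (1 - p) / n)
for n in (1_000, 10_000, 100_000):
    se = math.sqrt(p * (1.0 - p) / n)
    print(f'n = {n:>7}: standard error ~ {se:.4f}')
```

Going from 1,000 to 100,000 samples only shrinks the typical error by a factor of 10, which is why brute-force sampling becomes expensive for harder queries.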

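Now suppose we have to answer more complex queries, where the distributions involved need not be normal and no closed-form answer is readily available. For instance, suppose we have the following readings of an object whose true weight is unknown, and we want to infer that weight.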

First, store your observations in a torch tensor (a multi-dimensional array whose elements all share one data type).

```python
# Pyro operates on torch tensors, so store the observations as one
observations = torch.tensor([0.77, 0.88, 0.67, 0.77, 0.82, 0.71])
print(f'Mean = {torch.mean(observations)}.')
```

Define the Pyro model, a Python function that describes how the observations were generated

```python
import pyro.distributions as pyrodist


# Define the generative process behind the observations
def model(observations):

    # 1. Define a prior distribution over the likely values of the weight.
    # Its mean, 0.769, is roughly the mean of the observations (0.77).
    weight_prior = pyrodist.Normal(0.769, 1.0)

    # 2. Sample a value from the weight distribution
    weight = pyro.sample("weight1", weight_prior)

    # 3. Use that value to define the scale's reading distribution
    # (remember that the scale gives us values from Normal(weight, 0.1))
    my_dist = pyrodist.Normal(weight, 0.1)

    # 4. For each observation, declare a sample from this distribution.
    # Passing obs= makes it an *observed* sample: instead of drawing a new
    # value, Pyro conditions the model on the reading we actually got.
    for i, observation in enumerate(observations):
        pyro.sample(f'obs_{i}', my_dist, obs=observation)
```
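Written as equations, the model above says the following (a summary of the code, with $w$ standing for `weight1` and $x_i$ for the six observations; note that `pyrodist.Normal` takes a standard deviation, hence the squared scales):

$$
\begin{aligned}
w &\sim \mathcal{N}(0.769,\ 1.0^2) \\
x_i \mid w &\sim \mathcal{N}(w,\ 0.1^2), \qquad i = 1, \dots, 6 \\
p(w \mid x_{1:6}) &\propto \mathcal{N}(w;\ 0.769,\ 1.0^2) \prod_{i=1}^{6} \mathcal{N}(x_i;\ w,\ 0.1^2)
\end{aligned}
$$

Inference means characterizing this posterior $p(w \mid x_{1:6})$; MCMC methods do so by drawing samples from it.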


Infer the actual weight of the object for which we got the observations.

We use the Hamiltonian Monte Carlo (HMC) algorithm, which belongs to the Markov chain Monte Carlo (MCMC) family of algorithms.
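To see what an MCMC algorithm does with this model, here is a minimal random-walk Metropolis sampler in plain NumPy for the same posterior. This is a simpler MCMC method swapped in for illustration, not Pyro's HMC: HMC additionally uses gradients of the log density to make long, informed moves, while random-walk Metropolis proposes small random steps and accepts each with probability min(1, posterior ratio). The names `log_posterior` and `metropolis`, the step size, and the burn-in length are hypothetical choices for this sketch.

```python
import numpy as np

# The six readings from above, as a NumPy array
obs = np.array([0.77, 0.88, 0.67, 0.77, 0.82, 0.71])


def log_posterior(w, prior_mean=0.769, prior_sd=1.0, noise_sd=0.1):
    """Unnormalized log posterior of the weight w under the same model:
    prior w ~ Normal(0.769, 1.0), readings ~ Normal(w, 0.1).
    Additive constants are dropped because Metropolis only uses differences."""
    log_prior = -0.5 * ((w - prior_mean) / prior_sd) ** 2
    log_lik = -0.5 * np.sum(((obs - w) / noise_sd) ** 2)
    return log_prior + log_lik


def metropolis(n_samples=20_000, step=0.05, init=0.5, seed=0):
    """Random-walk Metropolis: propose w' = w + step * N(0, 1) and accept
    it with probability min(1, p(w' | data) / p(w | data))."""
    rng = np.random.default_rng(seed)
    w, lp = init, log_posterior(init)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        proposal = w + step * rng.normal()
        lp_proposal = log_posterior(proposal)
        # Compare in log space to avoid numerical underflow
        if np.log(rng.uniform()) < lp_proposal - lp:
            w, lp = proposal, lp_proposal
        samples[i] = w  # a rejected proposal repeats the current value
    return samples


mh_samples = metropolis()
kept = mh_samples[2_000:]  # discard burn-in, analogous to warmup_steps
print(f'posterior mean ~ {kept.mean():.3f}, posterior sd ~ {kept.std():.3f}')
```

The chain starts far from the data (at 0.5) and drifts toward about 0.77 within the burn-in period; the retained samples then approximate the posterior, just as the HMC samples do in the next cell.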

```python
from pyro.infer import MCMC, HMC

# 1. Clear the store of named parameters (not strictly needed for MCMC,
# but harmless when re-running the notebook)
pyro.clear_param_store()

# 2. Define the MCMC kernel we will employ (HMC), and tell it to use
# the model function we defined as the basis for sampling
my_kernel = HMC(model)

# 3. Define the MCMC algorithm with our kernel of choice: the first
# warmup_steps iterations tune the sampler and are discarded, then
# num_samples draws are kept from the posterior distribution of "weight1"
my_mcmc = MCMC(my_kernel,
               num_samples=30000,
               warmup_steps=150)

# 4. Run the algorithm, passing our observations
# (they become the `observations` argument that model() receives)
my_mcmc.run(observations)
```

Plot a histogram of the posterior samples of the weight

```python
samples = my_mcmc.get_samples()['weight1'].numpy()

plt.figure(figsize=(15, 5))
# sns.distplot is deprecated in recent seaborn releases; histplot draws the same histogram
sns.histplot(samples, label="weight1")
plt.legend()
plt.xlabel("Weight of the object in kg")
plt.ylabel("Number of posterior samples")
plt.show()
```


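Read the estimated weight of the object from the summary of the posterior samples: the mean (or median) of `weight1` is the estimate, and its std and 95% interval show how uncertain that estimate is.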

```python
# Summarize the posterior samples; prob=0.95 reports a 95% credible interval
my_mcmc.summary(prob=0.95)
```
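Because both the prior and the noise model in this example are normal, the posterior over the weight also has a closed form (normal-normal conjugacy), which gives a way to sanity-check the MCMC summary. The sketch below performs that check in plain Python; the function name `normal_posterior` is hypothetical, and it assumes the same prior Normal(0.769, 1.0) and noise standard deviation 0.1 as `model()`.

```python
import math


def normal_posterior(obs, prior_mean, prior_sd, noise_sd):
    """Posterior mean and sd of w for prior w ~ N(prior_mean, prior_sd^2)
    and readings x_i ~ N(w, noise_sd^2) (normal-normal conjugacy):
    precisions add, and the mean is a precision-weighted average."""
    precision = 1.0 / prior_sd**2 + len(obs) / noise_sd**2
    mean = (prior_mean / prior_sd**2 + sum(obs) / noise_sd**2) / precision
    return mean, math.sqrt(1.0 / precision)


observed = [0.77, 0.88, 0.67, 0.77, 0.82, 0.71]
post_mean, post_sd = normal_posterior(observed, prior_mean=0.769, prior_sd=1.0, noise_sd=0.1)
print(f'posterior mean = {post_mean:.4f}, posterior sd = {post_sd:.4f}')
```

The mean and std that `my_mcmc.summary()` reports for `weight1` should be close to these values; a large gap would suggest that the chain has not converged.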