## [Introduction to Pyro](http://pyro.ai/examples/intro_long.html#Introduction-to-Pyro)

```python
%reset -s -f

import logging
import os

import torch
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import pyro

assert pyro.__version__.startswith('1.8')

pyro.enable_validation(True)
pyro.set_rng_seed(234)

logging.basicConfig(format='%(message)s', level=logging.INFO)

%matplotlib inline
plt.style.use(style='ggplot')
```

___

Most data analysis problems can be understood as elaborations on three basic high-level questions:

- What do we know about the problem before observing any data?

- What conclusions can we draw from data given our prior knowledge?

- Do these conclusions make sense?

In the probabilistic or Bayesian approach to data science and machine learning, we formalize these in terms of mathematical operations on probability distributions.
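As a minimal sketch of how the three questions map onto distributions, consider a hypothetical coin with unknown heads probability $z$: the prior answers the first question, the posterior the second, and a posterior predictive check the third. The flips below are made up, and the code uses plain `torch.distributions` with a conjugate Beta prior rather than Pyro's inference machinery.

```python
import torch
import torch.distributions as dist

# (1) Prior knowledge before any data: z ~ Beta(2, 2), a mild belief the coin is fair.
prior = dist.Beta(torch.tensor(2.0), torch.tensor(2.0))
print("prior mean of z:", prior.mean.item())

# Made-up observations: 1 = heads, 0 = tails.
x = torch.tensor([1., 1., 0., 1., 1., 1., 0., 1., 1., 1.])
heads = x.sum()
tails = len(x) - heads

# (2) Conclusions given the data: the Beta prior is conjugate to the Bernoulli
# likelihood, so the posterior is Beta(2 + heads, 2 + tails) in closed form.
posterior = dist.Beta(2.0 + heads, 2.0 + tails)
print("posterior mean of z:", posterior.mean.item())

# (3) Do the conclusions make sense? A simple posterior predictive check:
# draw z from the posterior, simulate replicated datasets of the same size,
# and see how often they contain at least as many heads as the observed data.
torch.manual_seed(0)
z_samples = posterior.sample((1000,))
x_rep = dist.Bernoulli(z_samples.unsqueeze(-1).expand(-1, len(x))).sample()
rep_heads = x_rep.sum(-1)
p_value = (rep_heads >= heads).float().mean()
print("fraction of replicated datasets with >= observed heads:", p_value.item())
```

A fraction very close to 0 or 1 would suggest the model fails to reproduce this aspect of the data.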

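Given observed data ${\bf x}$, latent variables ${\bf z}$, and model parameters $\theta$, Bayes' rule gives the posterior distribution over the latent variables:

$$p_{\theta}({\bf z} | {\bf x}) = \frac{p_{\theta}({\bf x} , {\bf z})}{\int \! d{\bf z}\; p_{\theta}({\bf x} , {\bf z}) }$$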
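When $z$ takes only a few discrete values, the integral in the denominator becomes a sum and Bayes' rule can be evaluated exactly. A small sketch, assuming three hypothetical coins with heads probabilities 0.2, 0.5 and 0.8, a uniform prior over which coin was used, and made-up flips:

```python
import torch

# p(z): uniform prior over three candidate coins.
prior_z = torch.tensor([1 / 3, 1 / 3, 1 / 3])
heads_prob = torch.tensor([0.2, 0.5, 0.8])
x = torch.tensor([1., 1., 0., 1.])  # made-up flips: 1 = heads, 0 = tails

# p(x | z) for each z, treating the flips as independent Bernoulli draws.
# Shapes: (3, 1) broadcast against (4,) gives (3, 4); prod over flips gives (3,).
p = heads_prob.unsqueeze(-1)
lik = (p ** x * (1 - p) ** (1 - x)).prod(-1)

joint = prior_z * lik          # p(x, z)
evidence = joint.sum()         # p(x) = sum_z p(x, z), the discrete analogue of the integral
posterior = joint / evidence   # p(z | x)
print(posterior)
```

Here the posterior comes out to roughly [0.037, 0.365, 0.598], favoring the 0.8 coin after three heads in four flips.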

The denominator is the marginal likelihood, or evidence:

$$p_{\theta}({\bf x}) = \int \! d{\bf z}\; p_{\theta}({\bf x} , {\bf z})$$
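For most models this integral has no closed form, which is why approximate inference is typically needed. To make the quantity concrete, here is a sketch using a hypothetical Gaussian model whose evidence is known exactly, comparing a brute-force numerical integral over $z$ with the closed form. The model and the observed value `x = 1.5` are assumptions chosen for illustration.

```python
import math
import torch
from torch.distributions import Normal

# Hypothetical model with a closed-form evidence:
#   z ~ Normal(0, 1),  x | z ~ Normal(z, 1)   =>   p(x) = Normal(x; 0, sqrt(2))
x = torch.tensor(1.5, dtype=torch.float64)

# Approximate p(x) = ∫ dz p(x, z) with a Riemann sum on a wide, fine grid over z.
z = torch.linspace(-10.0, 10.0, 20001, dtype=torch.float64)
dz = z[1] - z[0]
log_joint = Normal(0.0, 1.0).log_prob(z) + Normal(z, 1.0).log_prob(x)
evidence_numeric = (log_joint.exp() * dz).sum()

# Closed-form evidence for comparison.
evidence_exact = Normal(0.0, math.sqrt(2.0)).log_prob(x).exp()
print(evidence_numeric.item(), evidence_exact.item())
```

Grid integration like this only works for a one- or two-dimensional ${\bf z}$; its cost grows exponentially with the dimension of ${\bf z}$.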

Learning fits the parameters by maximizing the evidence:

$$\theta_{\max} = \arg\max_\theta \, p_{\theta}({\bf x}) = \arg\max_\theta \int \! d{\bf z}\; p_{\theta}({\bf x} , {\bf z})$$
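When the evidence is tractable, $\theta_{\max}$ can be found by gradient ascent on $\log p_\theta({\bf x})$. The sketch below assumes a hypothetical model in which each observation $x_i$ has its own latent $z_i$, so the evidence is Gaussian in closed form and its maximizer is the sample mean. It uses plain PyTorch SGD on synthetic data, not Pyro's SVI.

```python
import math
import torch
from torch.distributions import Normal

# Hypothetical model:
#   z_i ~ Normal(theta, 1),  x_i | z_i ~ Normal(z_i, 1)
#   =>  p_theta(x_i) = Normal(x_i; theta, sqrt(2))
torch.manual_seed(0)
x = torch.randn(100) * 1.5 + 3.0  # synthetic data, not from the tutorial

theta = torch.zeros((), requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=0.01)

for step in range(200):
    optimizer.zero_grad()
    # Negative log evidence; minimizing it maximizes p_theta(x).
    loss = -Normal(theta, math.sqrt(2.0)).log_prob(x).sum()
    loss.backward()
    optimizer.step()

# For this model the maximizer is the sample mean of x.
print(theta.item(), x.mean().item())
```

Because the loss is quadratic in `theta` with curvature 50 for 100 points, a learning rate of 0.01 halves the distance to the optimum on every step, so 200 steps are far more than enough.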

___

```python
data_url = "https://d2hg8soec8ck9v.cloudfront.net/datasets/rugged_data.csv"
data = pd.read_csv(data_url, encoding='ISO-8859-1')
data.columns
```

```
Index(['isocode', 'isonum', 'country', 'rugged', 'rugged_popw', 'rugged_slope',
       'rugged_lsd', 'rugged_pc', 'land_area', 'lat', 'lon', 'soil', 'desert',
       'tropical', 'dist_coast', 'near_coast', 'gemstones', 'rgdppc_2000',
       'rgdppc_1950_m', 'rgdppc_1975_m', 'rgdppc_2000_m', 'rgdppc_1950_2000_m',
       'q_rule_law', 'cont_africa', 'cont_asia', 'cont_europe', 'cont_oceania',
       'cont_north_america', 'cont_south_america', 'legor_gbr', 'legor_fra',
       'legor_soc', 'legor_deu', 'legor_sca', 'colony_esp', 'colony_gbr',
       'colony_fra', 'colony_prt', 'colony_oeu', 'africa_region_n',
       'africa_region_s', 'africa_region_w', 'africa_region_e',
       'africa_region_c', 'slave_exports', 'dist_slavemkt_atlantic',
       'dist_slavemkt_indian', 'dist_slavemkt_saharan', 'dist_slavemkt_redsea',
       'pop_1400', 'european_descent'],
      dtype='object')
```

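```python
# cont_africa: 1 if the nation is in Africa; rugged: Terrain Ruggedness Index;
# rgdppc_2000: real GDP per capita in 2000. .copy() makes df independent of data.
df = data[['cont_africa', 'rugged', 'rgdppc_2000']].copy()
# Drop nations with missing GDP, then add the log of GDP per capita.
df = df[np.isfinite(df.rgdppc_2000)]
df['log_rgdppc_2000'] = np.log(df.rgdppc_2000)
```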
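To check the preprocessing without downloading the CSV, the same steps can be run on a tiny hand-made DataFrame. The rows and values below are invented for illustration and are not from the rugged dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the rugged dataset (values are NOT real).
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "cont_africa": [1, 0, 0, 1],
    "rugged": [0.86, 3.43, 0.77, 1.51],
    "rgdppc_2000": [1200.0, np.nan, 25000.0, 800.0],
})

# Keep the three columns used in the analysis as an independent frame.
df = data[["cont_africa", "rugged", "rgdppc_2000"]].copy()

# Drop rows whose GDP is missing or infinite (np.isfinite is False for NaN and inf).
df = df[np.isfinite(df.rgdppc_2000)]

# Add the logarithm of GDP per capita as a new column.
df["log_rgdppc_2000"] = np.log(df.rgdppc_2000)
print(df)
```

Row `B`, whose GDP is missing, is dropped, and the original index labels of the remaining rows are kept.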