# Simple Model
In this simpler model, each post is treated as being about a story from one of three groups:
- Factual, Disputed Story
- Fake, Disputed Story
- Corrective Story

I have found this to be a useful example of a hierarchical model: https://github.com/pyro-ppl/pyro/blob/dev/examples/baseball.py
as well as https://pyro.ai/examples/forecasting_iii.html

# To start, we will use dummy data

In [89]:
import pandas as pd
import torch
import pyro
from pyro.infer import MCMC, NUTS
import pyro.distributions as dist

In [71]:
pyro.enable_validation(__debug__)
pyro.set_rng_seed(0)

In [72]:
data = pd.DataFrame({"Type": ["Fake", "Fact", "Corrective", "Fake", "Fact", "Corrective", "Fake", "Fact", "Corrective"],
                     "CommentsFirstHour": [100, 50, 20, 250, 100, 40, 125, 150, 30],
                     "Engagement": [1000, 800, 300, 3000, 2500, 500, 1500, 1600, 1000]})
data

,Type,CommentsFirstHour,Engagement
0,Fake,100,1000
1,Fact,50,800
2,Corrective,20,300
3,Fake,250,3000
4,Fact,100,2500
5,Corrective,40,500
6,Fake,125,1500
7,Fact,150,1600
8,Corrective,30,1000


In [73]:
data = torch.Tensor([[[100, 1000], [250, 3000], [125, 1500]],
                     [[50,  800],  [100, 2500], [150, 1600]],
                     [[20,  300],  [40,  500],  [30,  1000]]])
# dim 0: Type: (Fake, Fact, Corrective)
# dim 1: post
# dim 2: obs (vars): (commentsFirstHour, Engagement)

In [74]:
data

tensor([[[ 100., 1000.],
         [ 250., 3000.],
         [ 125., 1500.]],

        [[  50.,  800.],
         [ 100., 2500.],
         [ 150., 1600.]],

        [[  20.,  300.],
         [  40.,  500.],
         [  30., 1000.]]])
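
The tensor above was typed in by hand from the DataFrame. As a minimal sketch, assuming every type has the same number of posts, the same `(type, post, variable)` layout could be built from the frame directly. The names `df`, `type_order`, `value_cols`, and `stacked` are illustrative and not part of the original notebook.

```python
import pandas as pd
import torch

df = pd.DataFrame({
    "Type": ["Fake", "Fact", "Corrective"] * 3,
    "CommentsFirstHour": [100, 50, 20, 250, 100, 40, 125, 150, 30],
    "Engagement": [1000, 800, 300, 3000, 2500, 500, 1500, 1600, 1000],
})

type_order = ["Fake", "Fact", "Corrective"]       # ordering along dim 0
value_cols = ["CommentsFirstHour", "Engagement"]  # ordering along dim 2

# One (num_posts, num_vars) block per type, stacked along a new leading dim.
# Assumption: each type has the same number of posts, otherwise stack() fails.
blocks = [
    torch.tensor(df.loc[df["Type"] == t, value_cols].to_numpy(), dtype=torch.float)
    for t in type_order
]
stacked = torch.stack(blocks)  # shape: (num_types, num_posts, num_vars)
print(stacked.shape)
```

Posts keep their original row order within each type, so `stacked[0]` holds the three "Fake" rows in the order they appear in the frame.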

In [75]:
x = data[:,:,:1]
y = data[:,:,1]

In [76]:
x

tensor([[[100.],
         [250.],
         [125.]],

        [[ 50.],
         [100.],
         [150.]],

        [[ 20.],
         [ 40.],
         [ 30.]]])

In [77]:
y

tensor([[1000., 3000., 1500.],
        [ 800., 2500., 1600.],
        [ 300.,  500., 1000.]])

In [78]:
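# x is a 3D tensor of shape (num_types, num_posts, num_indeps);
# y is a 2D tensor of shape (num_types, num_posts).
def model(x, y):
    num_types, num_posts, num_indeps = x.shape

    # construct necessary plates over each level
    type_plate = pyro.plate("type", num_types, dim=-2)
    post_plate = pyro.plate("post", num_posts, dim=-1)  # not used yet

    # TODO: sample shared (global) variables here with pyro.sample

    # one latent level per story type; dim=-2 gives shape (num_types, 1)
    with type_plate:
        type_level = pyro.sample("type_level", dist.Normal(0, 10))

    # NOTE: y is not observed yet, so MCMC samples type_level from its prior
    prediction = type_level
    return prediction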

In [79]:
nuts_kernel = NUTS(model)

mcmc = MCMC(nuts_kernel, num_samples=2000, warmup_steps=250)
mcmc.run(x, y)

hmc_samples = {k: v.detach().cpu().numpy() for k, v in mcmc.get_samples().items()}

In [84]:
# Utility function to print latent sites' quantile information.
def summary(samples):
    site_stats = {}
    for site_name, values in samples.items():
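        # flatten any plate dims (e.g. (2000, 3, 1) -> (2000, 3)); DataFrame needs 2D input
        marginal_site = pd.DataFrame(values.reshape(values.shape[0], -1))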
        describe = marginal_site.describe(percentiles=[.05, 0.25, 0.5, 0.75, 0.95]).transpose()
        site_stats[site_name] = describe[["mean", "std", "5%", "25%", "50%", "75%", "95%"]]
    return site_stats

In [88]:
hmc_samples["type_level"].shape

(2000, 3, 1)

In [90]:

# for site, values in summary(hmc_samples).items():
#     print("Coefficient: {}".format(site))
#     print(values, "\n")
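
The commented-out loop above prints one quantile table per site. Since `hmc_samples["type_level"]` has shape `(2000, 3, 1)`, one way to get a readable table is to drop the trailing singleton dimension and label each column with its story type. The sketch below uses random stand-in draws of that shape, not the real MCMC output, and the names `type_names`, `type_level_draws`, and `table` are illustrative.

```python
import numpy as np
import pandas as pd

type_names = ["Fake", "Fact", "Corrective"]  # dim 0 ordering of the data tensor

# Stand-in for hmc_samples["type_level"]: random draws, NOT real posterior samples.
rng = np.random.default_rng(0)
type_level_draws = rng.normal(0.0, 10.0, size=(2000, 3, 1))

# Drop the trailing singleton dim so each column is one story type.
per_type = pd.DataFrame(type_level_draws.squeeze(-1), columns=type_names)

table = per_type.describe(percentiles=[.05, .25, .5, .75, .95]).transpose()
table = table[["mean", "std", "5%", "25%", "50%", "75%", "95%"]]
print(table)
```

With the real samples, `type_level_draws` would be replaced by `hmc_samples["type_level"]`.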