In this notebook we experiment with implementing Latent Credibility Analysis (LCA) models. Let's start with the simplest one, simpleLCA.

In [6]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [7]:
import pandas as pd
import numpy as np

In [8]:
import seaborn as sns

In [9]:
import sys
sys.path.insert(0, '../')

import os.path as op

In [10]:
import tensorflow as tf
from tensorflow_probability import edward2 as ed
import tensorflow_probability as tfp


from spectrum.preprocessing import encoders
from spectrum.judge import lca_tf, utils
from spectrum import evaluator

In [11]:
tf.__version__, tfp.__version__

('2.1.0', '0.9.0')

In [12]:
tf.random.set_seed(2020)

# Synthetic Dataset

In [13]:
DATA_DIR = '../data'
DATA_SET = 'population'

In [14]:
truths = pd.read_csv(op.join(DATA_DIR, DATA_SET, 'truths.csv'))
raw_claims = pd.read_csv(op.join(DATA_DIR, DATA_SET, 'claims.csv'))

We model city population as a discrete value and assume that the hidden true value of each object is one of the values claimed for it. We therefore label encode the `value` column of the claims data frame.

### Data Preprocessing 

The values claimed for each object must be label encoded before they can be fed to our simpleLCA model.

In [15]:
claims, le_dict = encoders.fit_and_transform(raw_claims)  # le_dict maps each object_id to its fitted label encoder
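To make the encoding step concrete, here is a minimal sketch of per-object label encoding. It is not the `spectrum.preprocessing.encoders` implementation: the helper `label_encode_per_object`, the toy data, and the `source_id` column name are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical toy claims; the column names are assumptions for this sketch.
toy_claims = pd.DataFrame({
    "source_id": [0, 1, 2, 0, 1],
    "object_id": [0, 0, 0, 1, 1],
    "value": [8_400_000, 8_400_000, 8_100_000, 2_700_000, 2_900_000],
})


def label_encode_per_object(claims):
    """Replace each claimed value by its index among the distinct values
    claimed for the same object, and keep the decoding table per object."""
    encoded = claims.copy()
    decoders = {}
    for object_id, group in claims.groupby("object_id"):
        # codes[i] is the position of group value i in the sorted uniques
        codes, uniques = pd.factorize(group["value"], sort=True)
        encoded.loc[group.index, "value"] = codes
        decoders[object_id] = list(uniques)
    return encoded, decoders


toy_encoded, toy_decoders = label_encode_per_object(toy_claims)

# Decoding a label is a lookup in the per-object table.
decoded_first = toy_decoders[0][toy_encoded.loc[0, "value"]]
```

Encoding per object keeps the label space of each object as small as its number of distinct claimed values, which matches the assumption that the truth is one of the claimed values.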

# Truth Discovery

In [126]:
from spectrum.judge.lca_em import LCA_EM as LCA

In [143]:
lca = LCA(claims)
trust, truth = lca.discover()

difference at step 0: 7.405503638872624 - threshold 0.0001
difference at step 1: 0.0 - threshold 0.0001
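For intuition about what an EM-based simpleLCA fit might do, here is a minimal NumPy sketch. It is not the `LCA_EM` code from `spectrum`: the function `simple_lca_em`, its parameters, the uniform prior over candidate values, and the toy claims are all assumptions. Each source has one honesty parameter; an honest claim asserts the true value, otherwise one of the remaining values is picked uniformly.

```python
import numpy as np


def simple_lca_em(claims, n_values, n_iter=50, tol=1e-4, init_honesty=0.8):
    """claims: list of (source, object, value) integer triples.
    n_values: number of candidate values for each object."""
    n_sources = 1 + max(s for s, _, _ in claims)
    honesty = np.full(n_sources, init_honesty)
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior over each object's true value (uniform prior).
        log_post = [np.zeros(k) for k in n_values]
        for s, o, v in claims:
            k = n_values[o]
            # Likelihood of the claim if the truth is a different value.
            contrib = np.full(k, np.log((1.0 - honesty[s]) / max(k - 1, 1)))
            # Likelihood of the claim if the truth is the claimed value.
            contrib[v] = np.log(honesty[s])
            log_post[o] += contrib
        posteriors = []
        for lp in log_post:
            p = np.exp(lp - lp.max())
            posteriors.append(p / p.sum())
        # M-step: honesty is the expected fraction of correct claims.
        correct = np.zeros(n_sources)
        counts = np.zeros(n_sources)
        for s, o, v in claims:
            correct[s] += posteriors[o][v]
            counts[s] += 1
        new_honesty = np.clip(correct / np.maximum(counts, 1), 1e-6, 1 - 1e-6)
        diff = np.abs(new_honesty - honesty).sum()
        honesty = new_honesty
        if diff < tol:
            break
    return honesty, posteriors


# Toy example: sources 0 and 1 always agree, source 2 always disagrees.
toy_claims = [(0, 0, 0), (1, 0, 0), (2, 0, 1),
              (0, 1, 0), (1, 1, 0), (2, 1, 1),
              (0, 2, 1), (1, 2, 1), (2, 2, 0)]
toy_honesty, toy_posteriors = simple_lca_em(toy_claims, n_values=[2, 2, 2])
toy_truth = [int(np.argmax(p)) for p in toy_posteriors]
```

Taking the argmax of each posterior gives the estimated truth per object, and the loop stops once the total change in the honesty parameters falls below the threshold.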


In [167]:
def compute_truth(truth):
    """Return the most probable (label-encoded) value for each object."""
    o_ids = []
    values = []
    for o, rv in truth.items():
        o_ids.append(o)
        # MAP estimate: index of the largest posterior probability
        values.append(np.argmax(rv.distribution.parameters['probs']))
    return pd.DataFrame(data={'object_id': o_ids, 'value': values})

In [168]:
discovered_truths = compute_truth(truth)

In [169]:
discovered_truths

     object_id  value
0            0      1
1            1      1
2            2      0
3            3      0
4            4      0
..         ...    ...
288        288      0
289        289      0
290        290      0
291        291      1


# Evaluation 

We need to map the discovered truth value of each object back to its original space by inverse transforming the label encoding.

In [170]:
discovered_truths['value'] = discovered_truths.apply(lambda x: le_dict[x['object_id']].inverse_transform([x['value']])[0], axis=1)

In [172]:
evaluator.accuracy(truths, discovered_truths)

0.8156996587030717
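As a rough illustration of what an accuracy score like the one above measures, the sketch below computes the fraction of objects whose discovered value equals the ground-truth value. It is not the `spectrum.evaluator` code: the helper `accuracy_sketch`, the toy frames, and the assumption that both frames carry `object_id` and `value` columns are hypothetical.

```python
import pandas as pd


def accuracy_sketch(truths, discovered):
    """Fraction of objects, present in both frames, whose values match."""
    merged = truths.merge(discovered, on="object_id",
                          suffixes=("_true", "_pred"))
    return float((merged["value_true"] == merged["value_pred"]).mean())


# Hypothetical toy data: three of the four objects are recovered correctly.
toy_truths = pd.DataFrame({"object_id": [0, 1, 2, 3],
                           "value": [10, 20, 30, 40]})
toy_discovered = pd.DataFrame({"object_id": [0, 1, 2, 3],
                               "value": [10, 20, 31, 40]})
toy_accuracy = accuracy_sketch(toy_truths, toy_discovered)
```

The inner merge only scores objects that appear in both frames; the real evaluator may treat objects without ground truth differently.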