# Friendship

This example is taken from the [psl-examples repo](https://github.com/linqs/psl-examples) and uses synthetic data.  
Given a list of people, their locations, and some indications of similarity, we predict the degree of friendship between them.

In this example, we:
- create the model
- ground the model
- learn rules' weights
- infer relations
- evaluate predictions.

To keep things simple, weight learning and inference are performed on the same data (this is not ideal, since the model is then evaluated on the data it was trained on).

In [1]:
from os import listdir
from os.path import join
import pypsl as psl

from utils import fetch_data, read_data, print_data, print_pred

In [2]:
DATA_DIR = 'data/friendship'

# Data

We download and prepare the data for this example.

In [3]:
fetch_data('friendship')

In [4]:
listdir(DATA_DIR)

['friendship_init.tsv',
 'similarity.tsv',
 '.DS_Store',
 'locations.tsv',
 'friendship_gold.tsv']

### Locations  
- col. 0: the ID of a person
- col. 1: the ID of a location
- col. 2: whether this person lives in this location (0 or 1)

In [5]:
print_data(join(DATA_DIR, 'locations.tsv'))

[
  ['0', '0', 0.0],
  ['0', '1', 0.0],
  ['0', '2', 0.0],
  ...
]
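
The rows above suggest that each line of a TSV file holds the predicate's arguments as strings, followed by a numeric value. The `read_data` helper imported from `utils` is not shown in this example, so the following is only a minimal sketch of what such a loader might look like; `read_tsv_rows` is a hypothetical name, not part of `utils` or pypsl.

```python
import csv


def read_tsv_rows(path):
    """Read a tab-separated file into rows of [id, ..., float_value].

    Hypothetical stand-in for the `read_data` helper, which is not shown
    in this example: all columns but the last are kept as strings, and
    the last column is parsed as a float.
    """
    rows = []
    with open(path, newline='') as f:
        for record in csv.reader(f, delimiter='\t'):
            if not record:  # skip blank lines
                continue
            rows.append(record[:-1] + [float(record[-1])])
    return rows
```

Keeping the IDs as strings matches the printed rows, where `'0'` and `'1'` are quoted while the last column is a float.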


### Similarity  
- col. 0: the ID of a person
- col. 1: the ID of a person
- col. 2: a measure of how similar these persons are (0 means completely different, 1 means identical)

In [6]:
print_data(join(DATA_DIR, 'similarity.tsv'))

[
  ['0', '1', 0.033897113089304015],
  ['0', '2', 0.4210862576738333],
  ['0', '3', 0.7846167630610641],
  ...
]


### Friendship (init values)
- col. 0: the ID of a person
- col. 1: the ID of a person
- col. 2: a measure of how strong the friendship between these two persons is (0 means not friends at all, 1 means best friends)

Col. 2 has been populated with random values; these are the values the model will predict.

In [7]:
print_data(join(DATA_DIR, 'friendship_init.tsv'))

[
  ['0', '1', 0.2360677758912635],
  ['0', '2', 0.2619481849323748],
  ['0', '3', 0.3870290373904166],
  ...
]


### Friendship (gold values)
- col. 0: the ID of a person
- col. 1: the ID of a person
- col. 2: a measure of how strong the friendship between these two persons is (0 means not friends at all, 1 means best friends)

Col. 2 contains the true friendship values, those that we are trying to predict.

In [8]:
print_data(join(DATA_DIR, 'friendship_gold.tsv'))

[
  ['0', '1', 0.0],
  ['0', '2', 0.0],
  ['0', '3', 1.0],
  ...
]


# Predicates
Predicates express relations between terms.

### Similar
We create a predicate for similarity, and provide data.

In [9]:
similar = psl.Predicate(
    'similar',
    read_data(join(DATA_DIR, 'similarity.tsv'))
)

### Is_located

Similarly, this predicate records where people are located.

In [10]:
is_located = psl.Predicate(
    'is_located',
    read_data(join(DATA_DIR, 'locations.tsv'))
)

### Friends

Finally, this predicate expresses how strong the friendship between two persons is.  

We want this predicate to be predicted, so we set the `predict` parameter to `True`.  
We provide data with random values, `friendship_init.tsv`, to set the atoms' initial values.  
Optionally, we also provide gold data for weight learning.

In [11]:
friends = psl.Predicate(
    'friends',
    read_data(join(DATA_DIR, 'friendship_init.tsv')),
    read_data(join(DATA_DIR, 'friendship_gold.tsv')),
    predict=True
)

# Rules
Rules express relational dependencies between predicates.

### Prior
*"People are not friends".*

This could be expressed using logic:     
$
\begin{align}
\neg friends(P_1, P_2)
\end{align}
$

In [12]:
rule1 = psl.Rule(
    positive_atoms=[],
    negative_atoms=[
        (friends, ['P1', 'P2'])
    ]
)

### Using similarity
*"People who are at the same location and are similar are friends".*
    
This could be expressed using logic, as a disjunction:  
$
\begin{align}
friends(P_1, P_2) \;\vee\; \neg is\_located(P_1, L) \;\vee\; \neg is\_located(P_2, L) \;\vee\; \neg similar(P_1, P_2)
\end{align}
$
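
Since $A \rightarrow B$ is logically equivalent to $\neg A \vee B$, the same rule can be read as an implication whose body is the conjunction of the negated atoms and whose head is the remaining atom:
$
\begin{align}
is\_located(P_1, L) \;\wedge\; is\_located(P_2, L) \;\wedge\; similar(P_1, P_2) \;\rightarrow\; friends(P_1, P_2)
\end{align}
$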

In [13]:
rule2 = psl.Rule(
    positive_atoms=[
        (friends, ['P1', 'P2'])
    ],
    negative_atoms=[
        (is_located, ['P1', 'L']),
        (is_located, ['P2', 'L']),
        (similar, ['P1', 'P2'])
    ]
)
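
Because atoms take soft truth values in [0, 1], PSL usually relaxes a disjunction with Łukasiewicz logic and measures how far a ground rule is from being satisfied. The sketch below illustrates that standard "distance to satisfaction"; `distance_to_satisfaction` is a hypothetical helper, the truth values are made up, and whether pypsl computes exactly this quantity internally is an assumption.

```python
def distance_to_satisfaction(positive_values, negative_values):
    """Lukasiewicz relaxation of a disjunction of literals.

    positive_values: truth values (in [0, 1]) of the non-negated atoms.
    negative_values: truth values (in [0, 1]) of the negated atoms.
    Returns 0.0 when the clause is fully satisfied, up to 1.0 otherwise.
    """
    satisfaction = sum(positive_values) + sum(1.0 - v for v in negative_values)
    return max(0.0, 1.0 - satisfaction)


# One grounding of the similarity rule, with made-up truth values:
# friends = 0.2, both is_located atoms = 1.0, similar = 0.8.
d = distance_to_satisfaction([0.2], [1.0, 1.0, 0.8])
print(round(d, 2))  # 0.6
```

Here two co-located, fairly similar people with a low friendship value leave the rule largely unsatisfied; raising `friends` towards 0.8 would bring the distance to zero.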

### Using symmetry
*"If P1 is a friend of P2, then P2 is a friend of P1".*
  
$
\begin{align}
friends(P_1, P_2) \;\vee\; \neg friends(P_2, P_1)
\end{align}
$

In [14]:
rule3 = psl.Rule(
    positive_atoms=[
        (friends, ['P1', 'P2'])
    ],
    negative_atoms=[
        (friends, ['P2', 'P1'])
    ]
)

# Model

We create a model by providing a set of rules with their corresponding weights.  

The weights of rules define their relative importance and can be learnt from data.    
We initialize them with reasonable values.

In [15]:
model = psl.Model([
    (1, rule1),  # prior
    (10, rule2), # similarity
    (10, rule3)  # symmetry
])

# Grounding

This step initializes the model: each rule is instantiated with the constants found in the data, which produces ground rules and ground atoms.

In [16]:
model.ground()

20736 ground rules and 31314 ground atoms have been created.
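
To make grounding concrete, here is a minimal sketch that grounds the symmetry rule over three made-up person IDs. `ground_symmetry_rule` is a hypothetical helper, and how pypsl actually enumerates or prunes substitutions is not shown in this example, so treat this as an illustration only.

```python
from itertools import permutations


def ground_symmetry_rule(persons):
    """Enumerate groundings of friends(P1, P2) v !friends(P2, P1).

    Each grounding maps the variables P1 and P2 to two distinct person IDs.
    Excluding P1 == P2 is an assumption made for this illustration.
    """
    return [
        {'positive': [('friends', p1, p2)], 'negative': [('friends', p2, p1)]}
        for p1, p2 in permutations(persons, 2)
    ]


ground_rules = ground_symmetry_rule(['0', '1', '2'])
print(len(ground_rules))  # 3 * 2 = 6 ordered pairs
```

The number of ground rules grows quickly with the number of constants, which is why grounding is a separate, explicit step.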


# Learning weights

We learn the rules' weights from the gold data by maximizing likelihood.

In [17]:
weights = model.learn_weights(
    step_size=1.0,
    max_iterations=10
)

--- iteration 1 ---
gradient: 2.277732140093356

--- iteration 2 ---
gradient: 2.0272174595431807

--- iteration 3 ---
gradient: 1.8041695805467668

--- iteration 4 ---
gradient: 1.605690944500711

--- iteration 5 ---
gradient: 1.4291335778685827

--- iteration 6 ---
gradient: 1.272126229503757

--- iteration 7 ---
gradient: 1.1325600295488414

--- iteration 8 ---
gradient: 1.0085125161067074

--- iteration 9 ---
gradient: 0.8982716474992248

--- iteration 10 ---
gradient: 0.8002993041595723



Let's see what has been learnt:

In [18]:
weights

(0.16980379644939378, 4.051835027805813, 2.5226477463740933)
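
For hinge-loss models such as those used by PSL, the gradient of the log-likelihood with respect to a rule's weight is the expected total distance to satisfaction of that rule minus its total distance on the gold data. A common approximation replaces the expectation with the value at the model's most probable assignment. The sketch below shows one such gradient step; `update_weights` is a hypothetical helper, and whether `learn_weights` performs exactly this update is an assumption.

```python
def update_weights(weights, gold_distances, inferred_distances, step_size=1.0):
    """One approximate maximum-likelihood gradient step.

    weights: current rule weights.
    gold_distances: per-rule total distance to satisfaction on the gold data.
    inferred_distances: per-rule total distance under the model's current
        most probable assignment (used in place of the exact expectation).
    Weights are clipped at zero so they stay non-negative.
    """
    return [
        max(0.0, w + step_size * (inferred - gold))
        for w, gold, inferred in zip(weights, gold_distances, inferred_distances)
    ]
```

Under this update, a rule that the gold data violates more than the model's own predictions do has its weight lowered, and vice versa.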

# Inference

The model's weights were updated automatically during learning.    
We now run inference to predict friendship between people.

In [19]:
pred = model.infer()

--- iteration 10 ---
objective: 20557.381715714804
primal residual: 1.1988644440242422
dual residual: 3.6331437969038496

--- iteration 20 ---
objective: 20613.186108545648
primal residual: 0.3380461884438697
dual residual: 1.1929757046209382

--- iteration 30 ---
objective: 20619.582716948324
primal residual: 0.126207946875198
dual residual: 0.3917477301307529

--- iteration 40 ---
objective: 20619.67382265885
primal residual: 0.046741932877169656
dual residual: 0.1295551702218412

--- iteration 50 ---
objective: 20619.898862990052
primal residual: 0.01699225966688555
dual residual: 0.04313820622204003

Completed after 59 iterations


The model has made the following predictions:

In [20]:
print_pred(pred)

'friends': (
  ('66', '62', 0.000342812777265263),
  ('9', '84', 0.0012836835351577583),
  ('48', '81', 0.0012494037438394786),
  ...
)


# Evaluation

We compute precision and recall for the predictions, counting a predicted value above 0.5 as a positive and a gold value of 1 as a true friendship.

In [21]:
gold = read_data(join(DATA_DIR, 'friendship_gold.tsv'))
gold_index = {tuple(e[:-1]): e[-1] for e in gold}

In [22]:
# Confusion counts: true positives, false positives, false negatives
tp, fp, fn = 0, 0, 0

for p in pred['friends']:
    pred_value = p[-1]
    # Look up the gold value by the (person, person) argument pair
    true_value = gold_index[tuple(p[:-1])]

    if true_value == 1:
        if pred_value > 0.5:
            tp += 1
        else:
            fn += 1
    else:
        if pred_value > 0.5:
            fp += 1

print('precision: {}'.format(round(tp / (tp + fp), 2)))
print('recall: {}'.format(round(tp / (tp + fn), 2)))

precision: 0.86
recall: 0.79
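
Precision and recall are often summarized by their harmonic mean, the F1 score. The following sketch wraps the same counting logic in a reusable function and guards against empty denominators; `precision_recall_f1` is a hypothetical helper and not part of pypsl.

```python
def precision_recall_f1(pairs, threshold=0.5):
    """Compute precision, recall and F1 from (predicted, true) value pairs.

    A prediction counts as positive when it is above `threshold`;
    a gold value counts as positive when it equals 1.
    Returns 0.0 for any metric whose denominator would be zero.
    """
    tp = sum(1 for pred, true in pairs if true == 1 and pred > threshold)
    fp = sum(1 for pred, true in pairs if true != 1 and pred > threshold)
    fn = sum(1 for pred, true in pairs if true == 1 and pred <= threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With the variables defined above, the pairs could be built as `[(p[-1], gold_index[tuple(p[:-1])]) for p in pred['friends']]`. Plugging in the rounded values reported here (precision 0.86, recall 0.79) gives an F1 of about 0.82.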
