# Practical Worksheet

In this worksheet, we will be working with a small dataset of hyponym-hypernym pairs. Hyponymy is the `is-a` relation. So we will have pairs like `(cat, mammal)` meaning 'A cat is a mammal'. The hyponym is the more specific term (e.g., cat) and the hypernym is the more general term (e.g., mammal). In this notebook you will:

1. (3 pts) Use Logical Neural Networks with a very small hyponym dataset to infer a set of facts. You will discuss the kinds of facts that you can infer and the limitations of the model as it is implemented.
2. (5 pts) Set up a Logic Tensor Network to learn word embeddings and predicates that can model a larger hyponymy dataset.
3. (5 pts) Evaluate the effect of different axioms in the LTN system.
4. (2 pts) Query your model.


## Part 0. Setup
Create an environment and install Python 3.12, numpy, pandas, and scikit-learn.

Install LNNs using `pip install git+https://github.com/IBM/LNN`

Install LTNs using `pip install LTNtorch`

Import packages as below.

In [49]:
import pandas as pd
import numpy as np
import torch
import ltn

## Part 1. Inferring facts using Logical Neural Networks

In this first part, we will manually specify a very small dictionary of hyponym facts. We have three hyponym pairs and three non-hyponym pairs. The hyponymy relation is transitive, meaning that if $x$ is a hyponym of $y$ and $y$ is a hyponym of $z$, then $x$ should be a hyponym of $z$.

You will:

a. (1.5 pts) Set up an LNN model with suitable variables, a transitivity axiom, and hyponymy data.

b. (0.5 pt) Run inference over the model.

c. (1 pt) Inspect the output of the model and discuss whether the output is as expected.
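
To make the transitivity property concrete, here is a minimal plain-Python sketch that does not use the LNN library at all. The helper name `transitive_closure` is our own illustration, and the pairs are the three true facts used in this part. It only shows which extra positive pairs transitivity alone would license.

```python
def transitive_closure(pairs):
    """Return the smallest transitive set of (hyponym, hypernym) pairs containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Look for chains (a, b), (b, d) whose conclusion (a, d) is still missing
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


known_true = {("cat", "mammal"), ("dog", "mammal"), ("mammal", "animal")}
inferred = transitive_closure(known_true) - known_true
print(sorted(inferred))
```

This prints `[('cat', 'animal'), ('dog', 'animal')]`. Pairs that are neither given nor reachable through such a chain are left undetermined by this rule.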

In [50]:
# We first set up a small dictionary of hyponyms
from lnn import Fact

hyp_dict = {('cat', 'mammal'):Fact.TRUE,
            ('dog', 'mammal'):Fact.TRUE,
            ('mammal', 'animal'):Fact.TRUE,
            ('cat', 'dog'):Fact.FALSE,
            ('animal', 'mammal'):Fact.FALSE,
            ('mammal', 'dog'):Fact.FALSE,}

### Part 1a) (1.5 pts) Setting up the model.
Set up an LNN model with suitable predicates and variables, a transitivity axiom, and hyponymy data.

In [51]:
import torch.nn as nn
import torch.nn.functional as F

class HyponymNetwork(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2*d, 16),
            nn.ReLU(),
            nn.Linear(16, 1),
            nn.Sigmoid()
        )
    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))

In [52]:
# Create a predicate of arity 2 called Hyps and three variables x, y, z
## YOUR CODE HERE ##
# All possible words
domain = sorted({x for key in hyp_dict for x in key})
# Give number to all words
sym2idx = {s:i for i,s in enumerate(domain)}
n = len(domain)

individuals = torch.stack([
    F.one_hot(torch.tensor(sym2idx[s]), num_classes=n).float()
    for s in domain
])

X = ltn.Variable("X", individuals)
Y = ltn.Variable("Y", individuals)
Z = ltn.Variable("Z", individuals)

C = {s: ltn.Constant(individuals[sym2idx[s]]) for s in domain}

# Add the predicate
Hyps = ltn.Predicate(model=HyponymNetwork(d=n))

In [53]:
# Create a logical rule that encodes the fact that the hyponymy relation is transitive
## YOUR CODE HERE ##
# Add the needed connectives
And = ltn.Connective(ltn.fuzzy_ops.AndProd())
Implies = ltn.Connective(ltn.fuzzy_ops.ImpliesReichenbach())
Forall = ltn.Quantifier(ltn.fuzzy_ops.AggregPMeanError(p=2), quantifier="f")

# Forall[x,y,z]((hyp(x,y) and hyp(y,z)) -> hyp(x,z))
transitivity = Forall([X, Y, Z],
    Implies(And(Hyps(X, Y), Hyps(Y, Z)), Hyps(X, Z))
)

In [54]:
# Add the knowledge and the data (the hyponymy dict) to the model and print.
## YOUR CODE HERE ##
Not = ltn.Connective(ltn.fuzzy_ops.NotStandard())
SatAgg = ltn.fuzzy_ops.SatAgg()

true_facts = []
false_facts = []

for (a, b), value in hyp_dict.items():
    if value == Fact.TRUE:
        true_facts.append(Hyps(C[a], C[b]))
    else:
        false_facts.append(Not(Hyps(C[a], C[b])))

# Aggregating the truth values of the axiom
data = SatAgg(transitivity, *true_facts, *false_facts)

### Part 1b) (0.5 pts) Inferring facts
Run inference over the model and print the output.

In [55]:
# Part 1b (0.5 pts) Run inference over the model and print the output 
## YOUR CODE HERE ##
print(f'Global satisfaction: {data.item()}\n')

for atom1 in domain:
    for atom2 in domain:
        q = Hyps(C[atom1], C[atom2])
        print(f'Hyps({atom1}, {atom2}): {q.value.item()}')


Global satisfaction: 0.5420331954956055

Hyps(animal, animal): 0.4774361252784729
Hyps(animal, cat): 0.4223968982696533
Hyps(animal, dog): 0.44209885597229004
Hyps(animal, mammal): 0.45686227083206177
Hyps(cat, animal): 0.4569424092769623
Hyps(cat, cat): 0.4297742545604706
Hyps(cat, dog): 0.43440279364585876
Hyps(cat, mammal): 0.4539198577404022
Hyps(dog, animal): 0.47335493564605713
Hyps(dog, cat): 0.44004902243614197
Hyps(dog, dog): 0.4624500870704651
Hyps(dog, mammal): 0.4647778868675232
Hyps(mammal, animal): 0.47959277033805847
Hyps(mammal, cat): 0.4342584013938904
Hyps(mammal, dog): 0.4497668147087097
Hyps(mammal, mammal): 0.462757408618927


### Part 1c) (1 pt) Inspecting the output.

You should see that there are various facts whose truth value is unknown. 

Q1: Why can we not infer the truth value of all facts with the given database and axioms?

Q2: Suggest a suitable axiom to add to this system that would help to infer more facts. You do not need to implement the axiom.

YOUR ANSWER HERE

## Part 2 (5 pts) Building Embeddings with Logic Tensor Networks.
In this part, we will build a Logic Tensor Network to learn embeddings for the hyponyms. You will:

a. (1 pt) Describe why learning embeddings for the hyponyms is a suitable approach.

b. (1 pt) Set up a predicate for the hyponymy relation.

c. (1 pt) Train a simple network on the hyponymy task.

d. (2 pts) Assess satisfaction on the test set and the negative sample set.


### Importing the data

Below, we import the data into pandas dataframes. Take a look at the data to familiarise yourself with the format. In each .csv file we have a list of word pairs. 
- In train_hypernyms we have the set of hypernym pairs we will train on. 
- In test_hypernyms we have the set of pairs we will test on. 
- In non_hypernyms we have a set of word pairs that are not hypernym pairs.

In [57]:
import pandas as pd

data_dir = '../data/'

train_df = pd.read_csv(f'{data_dir}train_hypernyms.csv')
test_df = pd.read_csv(f'{data_dir}test_hypernyms.csv')
neg_df = pd.read_csv(f'{data_dir}non_hypernyms.csv')


train_pairs = train_df.values
test_pairs = test_df.values
neg_pairs = neg_df.values

print("Training pairs:")
print(train_pairs[:5])

print("Testing pairs:")
print(test_pairs[:5])

print("Negative pairs:")
print(neg_pairs[:5])


Training pairs:
[['supermarket' 'commercial building']
 ['hand tool' 'tool']
 ['peach' 'fruit']
 ['pike' 'fish']
 ['nail gun' 'power tool']]
Testing pairs:
[['workshop' 'building']
 ['train' 'vehicle']
 ['pine' 'physical object']
 ['snare drum' 'physical object']
 ['grape' 'physical object']]
Negative pairs:
[['jigsaw' 'nail gun']
 ['temple' 'synagogue']
 ['double bass' 'banjo']
 ['turkey' 'turkey']
 ['crocodile' 'snake']]


### Part 2a. (1 pt) Learning Embeddings

When we use a logic tensor network, we can choose to use data from outside sources or to train embeddings within the network. We will be training embeddings. Do you think this is a suitable approach for this dataset? Why or why not?

YOUR ANSWER HERE

Below, we will set up the vocabulary and the initial random word embeddings to be trained.

In [58]:
# Build a set of vocab by taking the union of the hyponyms and hypernyms
vocab = set(train_df.hyper.unique()).union(train_df.hypo.unique())

# Set the dimension of the vocab to 10
vocab_dim = 10

# Build a dictionary of word embeddings initialised randomly and set to be trainable.
word_embeddings = {word: ltn.Constant(torch.rand((vocab_dim,)), trainable=True) \
                   for word in vocab}



### Part 2b. (1 pt) Defining a predicate.
Define a predicate as a feed-forward NN with ELU and sigmoid activation functions and one hidden layer of size 16

In [62]:
# Define a feed-forward NN with ELU and sigmoid activation functions and one hidden layer of size 16.
class ModelHyp(torch.nn.Module):
    def __init__(self):
        ## YOUR CODE HERE ##
        super().__init__()
        # Input is the concatenation of two word embeddings of size vocab_dim each
        self.net = nn.Sequential(
            nn.Linear(2 * vocab_dim, 16),
            nn.ELU(),
            nn.Linear(16, 1),
            nn.Sigmoid()
        )

    def forward(self, *x):
        # Specify the forward pass with ELU on the hidden layers and sigmoid on the output
        x = list(x)
        x = torch.cat(x, dim=1)
        ## YOUR CODE HERE ##
        return self.net(x)
    
# Wrap the feed-forward NN to make it an LTN predicate called Hyp
Hyp = ltn.Predicate(model=ModelHyp())

# Define connectives, quantifiers, and SatAgg
And = ltn.Connective(ltn.fuzzy_ops.AndProd())
Not = ltn.Connective(ltn.fuzzy_ops.NotStandard())
Implies = ltn.Connective(ltn.fuzzy_ops.ImpliesReichenbach())
Forall = ltn.Quantifier(ltn.fuzzy_ops.AggregPMeanError(p=2), quantifier="f")
SatAgg = ltn.fuzzy_ops.SatAgg()

### Part 2c. (1 pt) Training the network

We set up a simple network in which we view our knowledge base as consisting just of those pairs in the training set. So our knowledge base states that for each word pair in the training set, this is a hyponym pair. We want to maximise the satisfaction over this knowledge base. To do this, we write a suitable axiom to aggregate the satisfaction of the hyponymy predicate over these pairs, and train the parameters of the network.
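
As a rough illustration of what "aggregating the satisfaction" means numerically, the sketch below reimplements a p-mean-error aggregator in plain Python. It assumes the aggregator has the form $1 - \left(\frac{1}{n}\sum_i (1 - v_i)^p\right)^{1/p}$, which is our understanding of what `AggregPMeanError` computes. The function name `p_mean_error` and the toy truth values are hypothetical and are not outputs of the model.

```python
def p_mean_error(truth_values, p=2):
    """Aggregate truth values in [0, 1] as 1 - (mean((1 - v)^p))^(1/p)."""
    n = len(truth_values)
    mean_error = sum((1.0 - v) ** p for v in truth_values) / n
    return 1.0 - mean_error ** (1.0 / p)


# Made-up truth values of Hyp(x, y) for three training pairs
toy_truths = [0.9, 0.8, 0.3]

sat = p_mean_error(toy_truths)   # satisfaction of the "for all pairs" axiom
loss = 1.0 - sat                 # the quantity minimised during training

print(round(sat, 3), round(loss, 3))
```

With these toy values the satisfaction is about 0.576, lower than the plain average of about 0.667, because with `p=2` the poorly satisfied pair is penalised more heavily.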

In [None]:
# We have to optimize the parameters of the predicate and also of the embeddings
params = list(Hyp.parameters()) +[i.value for i in word_embeddings.values()]
optimizer = torch.optim.Adam(params, lr=0.001)

# Set up a training loop for 300 epochs
for epoch in range(300):    
    # Set up a variable sat_agg which is the result of aggregating the truth values of all the axioms
    sat_agg = SatAgg(
        # Implement one axiom which aggregates the satisfaction across the (x, y) in train_pairs
        ## YOUR CODE HERE ##
        # Our list of hyponym pairs is in train_pairs.
        # We want to maximise the satisfaction gained by inputting the embeddings of those words into
        # our hyponymy predicate.
        

    )
    
    loss = 1. - sat_agg
    # Clear gradients accumulated in the previous epoch before backpropagating
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print metrics every 20 epochs of training
    if epoch % 20 == 0:
        print(f" epoch {epoch} | loss {loss.item():.4f} | Train Sat {sat_agg.item():.4f}")

### Part 2d (2 pts) Assessing the satisfaction on the test set

Calculate the satisfaction over the test set using SatAgg. Do you think the model is generalising well? Now calculate the satisfaction over the negative samples dataset. Is this a suitable satisfaction level? Why or why not?

YOUR ANSWER HERE

In [None]:
print(f"the satisfaction of the test dataset is: ## YOUR CODE HERE ##")

print(f"the satisfaction of the negative dataset is: ## YOUR CODE HERE ##")

## Part 3. (5 pts) Evaluate the effect of different axioms in the LTN system

In this part you will:

a. (2 pts) Retrain the model and evaluate the performance with negation included

b. (2 pts) Retrain the model and evaluate performance with transitivity included

c. (1 pt) Discuss the effect of the different axioms introduced.

### Part 3a. (2 pts) Retraining the model with negation
Reinitialise the model and retrain, including information from the `neg_pairs` dataset.
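
The sketch below illustrates the standard fuzzy negation, $\lnot a = 1 - a$, which is what we take `NotStandard` to compute. The function name `not_standard` and the predicate scores are made-up toy values rather than model outputs. The two word pairs are taken from the negative examples printed earlier.

```python
def not_standard(a):
    """Standard fuzzy negation: the truth value of 'not A' is 1 - truth(A)."""
    return 1.0 - a


# Hypothetical values of Hyp(x, y) for two negative pairs
toy_neg_scores = {("crocodile", "snake"): 0.2, ("turkey", "turkey"): 0.7}

# Truth value of the negated atom Not(Hyp(x, y)) for each pair
negated = {pair: not_standard(score) for pair, score in toy_neg_scores.items()}

for pair, value in negated.items():
    print(pair, round(value, 3))
```

A negative pair that the predicate scores low gives a negated atom close to 1, so maximising the satisfaction of the negated axiom pushes the predicate towards 0 on these pairs.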

In [None]:
# Reinitialise the model
Hyp = ## YOUR CODE HERE ##

In [None]:
# Set up the parameters and optimizer
## YOUR CODE HERE ##

# Set up a training loop for 300 epochs
    ## YOUR CODE HERE ##
    # Set up a variable sat_agg which is the result of aggregating the truth values of all the axioms
        ## YOUR CODE HERE ##
        # Implement one axiom which aggregates the satisfaction across the (x, y) in train_pairs
        ## YOUR CODE HERE ##

        # Implement one axiom which aggregates the satisfaction across the (x, y) in neg_pairs
        # Note that this statement should involve a negation.
        ## YOUR CODE HERE ##
        
    
    # Calculate the loss and propagate backwards
    ## YOUR CODE HERE ##

    # Print metrics every 20 epochs of training
    ## YOUR CODE HERE ##

In [None]:
# Calculate the satisfaction across the test dataset and the negated dataset
print(f"the satisfaction of the test dataset is: ## YOUR CODE HERE ##")

print(f"the satisfaction of the negative dataset is: ## YOUR CODE HERE ##")

### Part 3b. (2 pts) Retraining the model with transitivity

As we discussed in Part 1, the hyponymy relation is transitive. This should be reflected in the axioms. Reinitialise the model and add an axiom expressing the rule:

$\forall x, y, z \; (Hyp(x, y) \land Hyp(y, z)) \implies Hyp(x, z)$

Retrain the model and evaluate on the test and negated datasets.
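
To see how the transitivity rule is scored for a single triple, the following plain-Python sketch uses the product t-norm for conjunction ($a \cdot b$) and the Reichenbach implication ($1 - a + a \cdot b$), which we take to be what `AndProd` and `ImpliesReichenbach` compute. The function names and the input truth values are hypothetical.

```python
def and_prod(a, b):
    """Product t-norm: fuzzy conjunction."""
    return a * b


def implies_reichenbach(a, b):
    """Reichenbach implication: 1 - a + a * b."""
    return 1.0 - a + a * b


def transitivity_truth(h_xy, h_yz, h_xz):
    """Truth value of (Hyp(x, y) and Hyp(y, z)) -> Hyp(x, z) for one triple."""
    return implies_reichenbach(and_prod(h_xy, h_yz), h_xz)


print(round(transitivity_truth(0.9, 0.9, 0.1), 3))  # premises hold, conclusion fails
print(round(transitivity_truth(0.9, 0.9, 0.9), 3))  # premises and conclusion hold
print(round(transitivity_truth(0.1, 0.9, 0.1), 3))  # one premise fails
```

This prints 0.271, 0.919 and 0.919. The last case shows that a triple whose premises are mostly false satisfies the rule almost vacuously, which is worth keeping in mind when interpreting the aggregated satisfaction of this axiom.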

In [None]:
# Reinitialise the model
## YOUR CODE HERE ##

In [None]:
# Set up the parameters and optimizer
## YOUR CODE HERE ##

# Set up a training loop for 300 epochs
## YOUR CODE HERE ##
    
    # Create variables x_, y_, and z_, grounded with values from the `word_embeddings` dictionary
    ## YOUR CODE HERE ##

    # Set up a variable sat_agg which is the result of aggregating the truth values of all the axioms
    ## YOUR CODE HERE ##
        
        #Positive instances of hyponymy
        ## YOUR CODE HERE ##

        #Negative instances of hyponymy
        ## YOUR CODE HERE ##
        
        # Transitivity axiom
        ## YOUR CODE HERE ##

    
    # Calculate the loss and propagate backwards
    ## YOUR CODE HERE ##

    # Print metrics every 20 epochs of training
    ## YOUR CODE HERE ##

In [None]:
# Calculate the satisfaction across the test dataset and the negated dataset
print(f"the satisfaction of the test dataset is: ## YOUR CODE HERE ##")

print(f"the satisfaction of the negative dataset is: ## YOUR CODE HERE ##")

### Part 3c. (1 pt) Evaluating the model
How has the satisfaction changed across the test set and the set of negative examples as you include different axioms? Why has this happened? Write a couple of sentences with your conclusions about the datasets and the model you have built. 

YOUR ANSWER HERE

## Part 4 (2 pts) Querying the model

One of the strengths of Logic Tensor Networks is that you are able to query the models you have built. In this part you will:

a. (0.5 pts) Define a logical statement that you expect to hold in your model.

b. (1 pt) Query the model.

c. (0.5 pts) Discuss your result.

### Part 4a. (0.5 pts) Defining a query

Thinking about the properties of hyponymy, give a logical statement that you would expect to hold in your model. The statement can be quite simple.

YOUR ANSWER HERE

### Part 4b. (1 pt) Querying the model

Write a function that returns the satisfaction level of your logical statement and determine the satisfaction level.

In [None]:
# this function returns the satisfaction level of your logical formula
def phi():
    # Create variables p, q, and r and initialize with the values from 'word_embeddings'
    ## YOUR CODE HERE ##
    # Return the truth value of phi
    ## YOUR CODE HERE ##


In [None]:
# Evaluate phi
## YOUR CODE HERE ##

### Part 4c. (0.5 pts) Discuss the results

Was the satisfaction value what you expected to see? Why or why not?


YOUR ANSWER HERE

## Wrap up

In this worksheet, we looked at the hyponymy relation that can hold between words.

1. We used Logical Neural Networks with a very small hyponym dataset to infer a set of facts, and discussed the kinds of facts that you can infer and the limitations of the model as it is implemented.
2. We set up a Logic Tensor Network to learn word embeddings and predicates that can model a larger hyponymy dataset.
3. We evaluated the effect of different axioms in the LTN system.
4. Finally, we queried the model with new logical statements.

For another 15 points, you can extend this worksheet in a number of different ways. 

### Possible extensions

1. Use a new dataset for the task of inferring relationships over data.
2. Use the same dataset with a different model that we have covered in class. You could potentially use Logical Neural Networks, although they are a little slow.
3. Extend the investigation already started in this notebook. How do you expect the hyponymy relation to behave? Can you improve performance on novel queries?
4. Extend this investigation by including semantic information into the word embeddings from external sources.
5. Other ideas? Feel free to discuss with me!

