In [1]:
%%capture

%cd ..

## Introduction

This notebook shows how to use ALTK to generate a large number of quantifiers and measure their monotonicity.

First, let's familiarize ourselves with the classes used in this example, which are subclasses of classes in the ALTK package.

### QuantifierModel

In this example, we'd like to create a large number of quantifiers that are modeled by the class `QuantifierModel`. As stated in the `meaning` module, every quantifier model is a triple **<M, A, B>**, where M corresponds to all possible quantifier referents for a given communicative situation, and A and B are different sets of quantifier referents that correspond to the items of comparison in quantificational logic.

Let's begin by creating a simple QuantifierModel object. A QuantifierModel is initialized by defining a sequence of symbols that denote the set composition of each of `M`, `A`, and `B`. 

The definitions for each symbol are as follows:

`0 => in A only (A - B)`

`1 => in B only (B - A)`

`2 => in both A and B (A & B)`

`3 => in M - (A | B)`

`4 => not in M (and therefore not in A or B)`


In [2]:
from learn_quant.quantifier import QuantifierModel

qm = QuantifierModel("101234")

In [3]:
qm

QuantifierModel(name='101234', M=frozenset({0, 1, 2, 3, 4}), A=frozenset({1, 3}), B=frozenset({0, 2, 3}))

In the `QuantifierModel` instantiated above, we defined a string using only symbols 0 through 4, and the internal set representations were created in a `post_init` method. In the sequence "101234", the "objects" at the specified indices pertain to the sets as follows:
0. M, B
1. M, A
2. M, B
3. M, A, B
4. M
5. X (neither M, A, nor B)
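
The mapping from symbols to sets can be sketched in a few lines of plain Python. The helper `decode_model` below is hypothetical and is not the library's implementation; it assumes, as the printed `QuantifierModel` above suggests, that symbol 0 marks an element of A only, 1 an element of B only, 2 an element of both A and B, 3 an element of M outside A and B, and 4 an element outside M.

```python
def decode_model(name: str):
    """Decode a symbol string into (M, A, B) frozensets of indices.

    Hypothetical helper for illustration only; it follows the symbol
    conventions described in the text, not the library's source code.
    """
    M, A, B = set(), set(), set()
    for index, symbol in enumerate(name):
        if symbol == "4":
            continue  # outside M, so outside A and B as well
        M.add(index)
        if symbol in "02":  # 0: A only, 2: both A and B
            A.add(index)
        if symbol in "12":  # 1: B only, 2: both A and B
            B.add(index)
    return frozenset(M), frozenset(A), frozenset(B)


M, A, B = decode_model("101234")
print(sorted(M), sorted(A), sorted(B))
```

Running it on `"101234"` yields the same three sets as the `QuantifierModel` printed above.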

Notice that all sets are typed as `frozenset`s. This makes `QuantifierModel`s hashable, which is required for storing and looking them up in `Meaning` objects in subsequent routines.
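
As a rough illustration of why the set type matters, the sketch below uses a hypothetical frozen dataclass, `ToyModel`, as a stand-in for `QuantifierModel` (whose actual definition may differ). With `frozenset` fields the object can serve as a dictionary key; with plain `set` fields hashing fails.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for QuantifierModel, for illustration only."""

    name: str
    M: frozenset
    A: frozenset
    B: frozenset


model = ToyModel("012", frozenset({0, 1, 2}), frozenset({0, 2}), frozenset({1, 2}))
truth_values = {model: True}  # works: every field is hashable

try:
    # Plain sets are mutable and therefore unhashable.
    hash(ToyModel("012", {0, 1, 2}, {0, 2}, {1, 2}))
    hashing_failed = False
except TypeError:
    hashing_failed = True

print(hashing_failed)
```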

### Importing a `QuantifierGrammar`

A `QuantifierGrammar` is a regular `Grammar` object that also allows primitives representing integers to be added after a basic grammar has been loaded. This lets you create integer primitives on the fly for experiments of different lengths (you would usually want primitives up to the size of M, that is, all referents that are "in play").

In [4]:
from learn_quant.grammar import quantifiers_grammar

You can iterate through the grammar to see what rules it contains:

In [5]:
for rule in quantifiers_grammar:
    print(rule[0], ":", rule[1])

and : bool -> and(bool, bool)
or : bool -> or(bool, bool)
not : bool -> not(bool)
union : frozenset -> union(frozenset, frozenset)
intersection : frozenset -> intersection(frozenset, frozenset)
difference : frozenset -> difference(frozenset, frozenset)
index : frozenset -> index(int, frozenset)
cardinality : int -> cardinality(frozenset)
subset_eq : bool -> subset_eq(frozenset, frozenset)
equals : bool -> equals(int, int)
greater_than : bool -> greater_than(int, int)
A : frozenset -> A
B : frozenset -> B


To add primitives for integer indices, use the `add_indices_as_primitives` method on the `QuantifierGrammar` object, specifying either a list of specific indices or an integer upper bound (exclusive) up to which primitive rules should be added:

In [6]:
from copy import deepcopy
new_grammar = deepcopy(quantifiers_grammar)
new_grammar.add_indices_as_primitives([0,1,2,3], 6.0)
for rule in new_grammar:
    print(rule[0], ":", rule[1])

and : bool -> and(bool, bool)
or : bool -> or(bool, bool)
not : bool -> not(bool)
union : frozenset -> union(frozenset, frozenset)
intersection : frozenset -> intersection(frozenset, frozenset)
difference : frozenset -> difference(frozenset, frozenset)
index : frozenset -> index(int, frozenset)
cardinality : int -> cardinality(frozenset)
subset_eq : bool -> subset_eq(frozenset, frozenset)
equals : bool -> equals(int, int)
greater_than : bool -> greater_than(int, int)
A : frozenset -> A
B : frozenset -> B
0 : int -> 0
1 : int -> 1
2 : int -> 2
3 : int -> 3


In [7]:
new_grammar = deepcopy(quantifiers_grammar)
new_grammar.add_indices_as_primitives(4, 6.0)
for rule in new_grammar:
    print(rule[0], ":", rule[1])

and : bool -> and(bool, bool)
or : bool -> or(bool, bool)
not : bool -> not(bool)
union : frozenset -> union(frozenset, frozenset)
intersection : frozenset -> intersection(frozenset, frozenset)
difference : frozenset -> difference(frozenset, frozenset)
index : frozenset -> index(int, frozenset)
cardinality : int -> cardinality(frozenset)
subset_eq : bool -> subset_eq(frozenset, frozenset)
equals : bool -> equals(int, int)
greater_than : bool -> greater_than(int, int)
A : frozenset -> A
B : frozenset -> B
0 : int -> 0
1 : int -> 1
2 : int -> 2
3 : int -> 3


The second argument defines the default weight assigned to the integer primitive rules. These weights are relevant when generating a universe with a `Grammar`.

In [8]:
for rule in new_grammar:
    print(rule[1].name, ":\t\t", rule[1].weight)

and :		 1.0
or :		 1.0
not :		 1.0
union :		 1.0
intersection :		 1.0
difference :		 1.0
index :		 1.0
cardinality :		 1.0
subset_eq :		 1.0
equals :		 1.0
greater_than :		 1.0
A :		 10.0
B :		 10.0
0 :		 6.0
1 :		 6.0
2 :		 6.0
3 :		 6.0
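
To make the effect of the weights more concrete, here is a small sketch under the assumption that weights act as relative sampling weights among rules sharing a left-hand side. That assumption is not confirmed by this notebook, and the dictionary below is hand-copied from the `int` rules in the output above rather than read from the grammar object.

```python
import random

# Weights of the rules whose left-hand side is `int`, copied from the output above.
int_rule_weights = {"cardinality": 1.0, "0": 6.0, "1": 6.0, "2": 6.0, "3": 6.0}

# Assumption: a rule's chance of being picked is its share of the total weight.
total = sum(int_rule_weights.values())
probabilities = {name: weight / total for name, weight in int_rule_weights.items()}
print(probabilities)

# Drawing one rule with the standard library, passing the raw weights directly.
rng = random.Random(0)
names = list(int_rule_weights)
choice = rng.choices(names, weights=[int_rule_weights[n] for n in names], k=1)[0]
print(choice)
```

Under this reading, each integer primitive would be six times as likely to be chosen as `cardinality` whenever an `int` is needed.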


### Define a universe of referents

In this example, the function `create_universe` creates the `QuantifierModel`s that make up the universe of referents.

In [9]:
from learn_quant.meaning import create_universe

In [10]:
quantifiers_universe = create_universe(m_size=3, x_size=4)

In [11]:
print("The size of the universe is {}".format(len(quantifiers_universe)))

The size of the universe is 256


Access the referents through the `referents` property of the `QuantifierUniverse` object:

In [12]:
quantifiers_universe.referents

(QuantifierModel(name='0410', M=frozenset({0, 2, 3}), A=frozenset({0, 3}), B=frozenset({2})),
 QuantifierModel(name='4332', M=frozenset({1, 2, 3}), A=frozenset({3}), B=frozenset({3})),
 QuantifierModel(name='3104', M=frozenset({0, 1, 2}), A=frozenset({2}), B=frozenset({1})),
 QuantifierModel(name='0422', M=frozenset({0, 2, 3}), A=frozenset({0, 2, 3}), B=frozenset({2, 3})),
 QuantifierModel(name='0341', M=frozenset({0, 1, 3}), A=frozenset({0}), B=frozenset({3})),
 QuantifierModel(name='2143', M=frozenset({0, 1, 3}), A=frozenset({0}), B=frozenset({0, 1})),
 QuantifierModel(name='0034', M=frozenset({0, 1, 2}), A=frozenset({0, 1}), B=frozenset()),
 QuantifierModel(name='4231', M=frozenset({1, 2, 3}), A=frozenset({1}), B=frozenset({1, 3})),
 QuantifierModel(name='3342', M=frozenset({0, 1, 3}), A=frozenset({3}), B=frozenset({3})),
 QuantifierModel(name='0433', M=frozenset({0, 2, 3}), A=frozenset({0}), B=frozenset()),
 QuantifierModel(name='1042', M=frozenset({0, 1, 3}), A=frozenset({1, 3}), 

You can access sizes of `X` and `M` in the QuantifierUniverse object:

In [13]:
print(quantifiers_universe.x_size)
print(quantifiers_universe.m_size)

4
3


We created a universe in which each generated `QuantifierModel` has 4 indices in total, 3 of which belong to `M` and can therefore be assigned to A or B during the generative process. Therefore, in this example, `['1', '2', '0', '0']` would not be valid, since all 4 indices would be in `M` while `M_SIZE` is only 3. On the other hand, `['4', '2', '0', '0']` is OK, since the first index is in `X` but not `M`.
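
As a consistency check on the reported universe size, the brute-force count below enumerates every symbol string of length `x_size` and keeps those with exactly `m_size` indices inside `M`. The helper `count_models` is hypothetical, and the "exactly `m_size`" reading is an assumption inferred from the referents listed above, not taken from `create_universe` itself.

```python
from itertools import product


def count_models(m_size: int, x_size: int) -> int:
    """Count symbol strings of length x_size with exactly m_size symbols in 0-3."""
    count = 0
    for symbols in product("01234", repeat=x_size):
        # Symbol 4 marks an index outside M; every other symbol is inside M.
        in_m = sum(symbol != "4" for symbol in symbols)
        if in_m == m_size:
            count += 1
    return count


print(count_models(m_size=3, x_size=4))
```

With `m_size=3` and `x_size=4` there are 4 positions for the single index outside `M` and 4 symbol choices for each of the other three indices, which gives 4 × 4³ = 256, matching the size printed earlier.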

Let's enumerate expressions that could be created with the Language of Thought described in the `QuantifierGrammar` we have previously defined.

We'll enumerate expressions up to a depth of 4. Higher depth values allow for more complex expressions that depend on a greater number of rules.
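
To get a feel for how depth bounds the search, here is a minimal sketch of depth-limited enumeration over a toy grammar. The rule table, the `enumerate_depth` helper, and the convention that a terminal has depth 1 are all assumptions made for illustration; `enumerate_quantifiers` may count depth differently, and it also groups expressions by meaning, which this sketch does not attempt.

```python
from itertools import product

# Toy grammar: each rule is (left-hand type, name, argument types).
RULES = [
    ("bool", "subset_eq", ("set", "set")),
    ("bool", "not", ("bool",)),
    ("set", "union", ("set", "set")),
    ("set", "A", ()),
    ("set", "B", ()),
]


def enumerate_depth(lhs, depth):
    """Yield every expression string of type `lhs` whose tree depth is <= depth."""
    for rule_lhs, name, args in RULES:
        if rule_lhs != lhs:
            continue
        if not args:
            yield name  # terminal rule, depth 1
        elif depth > 1:
            # Expand each argument with one less level of depth available.
            children = [list(enumerate_depth(arg, depth - 1)) for arg in args]
            for combo in product(*children):
                yield f"{name}({', '.join(combo)})"


counts = {d: len(list(enumerate_depth("bool", d))) for d in range(1, 5)}
print(counts)
```

Even with five rules, the number of syntactically distinct Boolean expressions grows quickly with depth, which is why the depth bound has such a large effect on running time.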

In [14]:
from learn_quant.scripts.generate_expressions import enumerate_quantifiers
expressions_by_meaning = enumerate_quantifiers(4, quantifiers_universe, new_grammar)

In [26]:
len(expressions_by_meaning.values())

3379

Let's save the quantifiers generated in this enumeration process.

In [16]:
from ultk.util.io import write_expressions

write_expressions(expressions_by_meaning.values(), out_path="learn_quant/outputs/generated_expressions.yml")

We can load the expressions we just saved from the YAML file produced in the code block above, provided we also give the load function a relevant grammar:

In [21]:
from ultk.util.io import read_grammatical_expressions

In [36]:
_, expressions = read_grammatical_expressions("learn_quant/outputs/generated_expressions.yml", quantifiers_grammar)

We get an object that pairs grammatical expressions with their respective `Meaning` objects. Each `Meaning` records, for every `QuantifierModel` (<M, A, B>) in the universe in scope, whether the expression is true of it.

In [37]:
print(len(expressions))

3379


In [42]:
list(expressions.keys())[1]

Meaning(mapping=FrozenDict({QuantifierModel(name='0004', M=frozenset({0, 1, 2}), A=frozenset({0, 1, 2}), B=frozenset()): False, QuantifierModel(name='0014', M=frozenset({0, 1, 2}), A=frozenset({0, 1}), B=frozenset({2})): False, QuantifierModel(name='0024', M=frozenset({0, 1, 2}), A=frozenset({0, 1, 2}), B=frozenset({2})): False, QuantifierModel(name='0034', M=frozenset({0, 1, 2}), A=frozenset({0, 1}), B=frozenset()): False, QuantifierModel(name='0040', M=frozenset({0, 1, 3}), A=frozenset({0, 1, 3}), B=frozenset()): False, QuantifierModel(name='0041', M=frozenset({0, 1, 3}), A=frozenset({0, 1}), B=frozenset({3})): False, QuantifierModel(name='0042', M=frozenset({0, 1, 3}), A=frozenset({0, 1, 3}), B=frozenset({3})): False, QuantifierModel(name='0043', M=frozenset({0, 1, 3}), A=frozenset({0, 1}), B=frozenset()): False, QuantifierModel(name='0104', M=frozenset({0, 1, 2}), A=frozenset({0, 2}), B=frozenset({1})): False, QuantifierModel(name='0114', M=frozenset({0, 1, 2}), A=frozenset({0}), B

Every expression object contains a `Meaning` that contains a list of referents that the expression verifies. 
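
The idea of a meaning as a mapping from referents to truth values can be sketched without the library. The code below is a toy reconstruction: `decode` is a hypothetical helper using the symbol conventions described earlier (0: A only, 1: B only, 2: both, 3: M only, 4: outside M), the universe is rebuilt by brute force under the assumption that exactly three of four indices are in M, and the expression `subset_eq(A, B)` is evaluated with Python's subset operator.

```python
from itertools import product


def decode(name):
    """Return (M, A, B) for a symbol string (hypothetical helper)."""
    M = frozenset(i for i, s in enumerate(name) if s != "4")
    A = frozenset(i for i, s in enumerate(name) if s in "02")
    B = frozenset(i for i, s in enumerate(name) if s in "12")
    return M, A, B


# Toy universe: length-4 strings with exactly one index outside M.
names = ["".join(p) for p in product("01234", repeat=4) if p.count("4") == 1]

# Truth-value mapping for the expression subset_eq(A, B).
mapping = {}
for name in names:
    _, A, B = decode(name)
    mapping[name] = A <= B

print(len(mapping), sum(mapping.values()))
```

Two expressions with identical mappings have the same meaning, which is what makes a meaning usable as a dictionary key for grouping expressions.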

## Measuring Monotonicity

In [27]:
import numpy as np
from learn_quant.measures import MonotonicityMeasurer

The enumerated expressions vary in their degree of monotonicity. We can measure the monotonicity of an expression with the `measures` module.

In [28]:
mm = MonotonicityMeasurer(quantifiers_universe, down=False)
mm(expressions)

Calculating monotonicity for:  subset_eq(A, A)
Calculating monotonicity for:  subset_eq(A, B)
Calculating monotonicity for:  subset_eq(B, A)
Calculating monotonicity for:  and(subset_eq(A, B), subset_eq(B, A))
Calculating monotonicity for:  or(subset_eq(A, B), subset_eq(B, A))
Calculating monotonicity for:  not(subset_eq(A, B))
Calculating monotonicity for:  not(subset_eq(B, A))
Calculating monotonicity for:  equals(0, cardinality(A))
Calculating monotonicity for:  subset_eq(A, difference(A, B))
Calculating monotonicity for:  subset_eq(A, index(1, B))
Calculating monotonicity for:  equals(0, cardinality(B))
Calculating monotonicity for:  subset_eq(B, index(1, A))
Calculating monotonicity for:  subset_eq(index(1, A), B)
Calculating monotonicity for:  subset_eq(index(1, B), A)
Calculating monotonicity for:  equals(0, cardinality(union(A, B)))
Calculating monotonicity for:  subset_eq(intersection(A, B), index(1, A))
Calculating monotonicity for:  subset_eq(intersection(A, B), index(1, B))

Calculating monotonicity for:  or(subset_eq(A, B), subset_eq(index(1, A), difference(A, B)))
Calculating monotonicity for:  or(subset_eq(A, B), subset_eq(index(1, A), index(1, B)))
Calculating monotonicity for:  or(subset_eq(A, B), subset_eq(index(1, B), difference(A, A)))
Calculating monotonicity for:  or(subset_eq(A, B), subset_eq(index(1, B), difference(B, A)))
Calculating monotonicity for:  or(subset_eq(A, B), subset_eq(index(1, B), index(1, A)))
Calculating monotonicity for:  or(subset_eq(A, B), equals(1, cardinality(B)))
Calculating monotonicity for:  or(subset_eq(A, B), equals(cardinality(A), cardinality(B)))
Calculating monotonicity for:  or(subset_eq(A, B), greater_than(cardinality(A), 1))
Calculating monotonicity for:  or(subset_eq(A, B), greater_than(cardinality(B), 0))
Calculating monotonicity for:  or(subset_eq(A, B), greater_than(cardinality(B), 1))
Calculating monotonicity for:  or(subset_eq(A, B), greater_than(cardinality(A), cardinality(B)))
Calculating monotonicity fo

Write the monotonicity value of each expression to a file, sorted from most to least monotone.

In [29]:
import csv

sorted_monotonicity = sorted(mm.metrics.items(), key=lambda x: x[1]["monotonicity"], reverse=True)

with open("monotonicity.csv", "w", newline='') as file:
    writer = csv.writer(file)
    for key, value in sorted_monotonicity:
        writer.writerow([key, *value.values()])
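
Because expression names such as `subset_eq(A, B)` contain commas, it is worth confirming that the file round-trips cleanly. The sketch below writes a small hypothetical `metrics` dictionary, shaped like `mm.metrics`, to a temporary file and reads it back with `csv.reader`; the temporary path and the sample values are illustrative assumptions.

```python
import csv
import os
import tempfile

# Hypothetical metrics in the same shape as `mm.metrics`.
metrics = {
    "subset_eq(A, A)": {"monotonicity": 1},
    "subset_eq(A, B)": {"monotonicity": 0.020055189603353396},
}

path = os.path.join(tempfile.mkdtemp(), "monotonicity.csv")
with open(path, "w", newline="") as file:
    writer = csv.writer(file)
    for key, value in metrics.items():
        # The writer quotes the key automatically because it contains a comma.
        writer.writerow([key, *value.values()])

# Read the rows back; csv returns every field as a string, so convert the score.
with open(path, newline="") as file:
    loaded = {row[0]: float(row[1]) for row in csv.reader(file)}

print(loaded)
```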


Let's calculate monotonicity of only a subset of generated expressions.

In [30]:
from learn_quant.util import filter_expressions_by_rules

In [31]:
subset_of_expressions = filter_expressions_by_rules(
    ["subset_eq(A, A)", "subset_eq(A, B)"], expressions
)

In [32]:
mm(subset_of_expressions)

Calculating monotonicity for:  subset_eq(A, A)
Calculating monotonicity for:  subset_eq(A, B)


In [33]:
mm.metrics

{'subset_eq(A, A)': {'monotonicity': 1},
 'subset_eq(A, B)': {'monotonicity': 0.020055189603353396}}

In [34]:
M = set([0,1,2])
X = set([0,2,3])
A = set([0,1,2,3,4])

A - (M | X)

{4}