In [None]:
using Pkg
Pkg.activate(joinpath("..", "environments", "modal-association-rules"))
Pkg.instantiate()

In [None]:
using Plots
using Random
using Statistics

Random.seed!(1605)

In [None]:
include(joinpath("..", "scripts", "parse-natops.jl"))

# Modal Association Rules

In this notebook, we follow a different paradigm from the supervised one we saw in [day04-learning-with-non-classical-logical](day04-learning-with-non-classical-logical.ipynb).

The hypothesis here is that a logical formula is *interesting* if it is
frequently satisfied across the instances of a dataset $\mathcal{I}$.

Given an alphabet $\mathcal{P}$ of propositional literals, the formulas we are dealing with
are conjunctions of literals, called *itemsets*.

An itemset that is also frequent is called a *frequent itemset*.

More formally, given a dataset $\mathcal{I}$, a propositional alphabet $\mathcal{P}$
and a minimum support threshold $s$, a frequent pattern (that is, a frequent itemset) $\mathsf{P} \subseteq \mathcal{P}$ is such that:

$$\text{support}(\mathcal{I}, \mathsf{P}) = \frac{| \{I \in \mathcal{I} \mid I \models \mathsf{P} \} |}{|\mathcal{I}|} \geq s$$

The ratio above is called *support*.
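
To make the ratio concrete, here is a minimal sketch on a toy dataset, where each instance is simply the set of literals it satisfies. The symbols `:p`, `:q`, `:r` and the helper `toy_support` are hypothetical and unrelated to the library functions used later in this notebook.

```julia
# Toy dataset: each instance is the set of literals it satisfies.
# The symbols :p, :q, :r are hypothetical placeholders.
instances = [
    Set([:p, :q]),
    Set([:p, :q, :r]),
    Set([:p]),
    Set([:q, :r]),
]

# Fraction of instances that satisfy every literal of the pattern.
toy_support(instances, pattern) =
    count(I -> issubset(pattern, I), instances) / length(instances)

toy_support(instances, Set([:p]))       # 3 of 4 instances
toy_support(instances, Set([:p, :q]))   # 2 of 4 instances

# With a minimum support threshold s = 0.5, both patterns above are frequent,
# while {p, r} (1 of 4 instances) is not.
s = 0.5
toy_support(instances, Set([:p, :r])) >= s
```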

In [None]:
using ModalAssociationRules

In [None]:
# these are just three toy atoms
p = Atom(ScalarCondition(VariableMax(4), >=, 2)) |> Item
q = Atom(ScalarCondition(VariableMin(5), <=, 1.5)) |> Item
r = Atom(ScalarCondition(VariableMax(6), >=, 0.0)) |> Item

In [None]:
# an Itemset encodes a conjunction of SoleLogics.Formula, 
# but has two advantages:

# 1) performance considerations 
# https://towardsdev.com/set-vs-vector-lookup-in-julia-a-closer-look-9d106d01ccae
# 2) type piracy prevention!

pq = Itemset([p, q])
pr = Itemset([p, r])
qr = Itemset([q, r])
pqr = Itemset([p, q, r])

In [None]:
# an Itemset can wrap any SoleLogics.Formula type;
formula(pq)

##### Exercise:
Define your own `mysupport` function.

Its arguments must be of type `SoleLogics.Formula`, `SoleData.AbstractLogiset` and `SoleLogics.AbstractWorld`, respectively.

We only want to consider the instances that were originally associated with the `I have command` class.

We want to treat the Kripke model as a degenerate propositional logiset.

Then compute the support of the following itemsets: `p`, `q`, `r`, `p ∧ q`, `p ∧ r`, `q ∧ r`, `p ∧ q ∧ r`.

The support must be rounded to the second decimal digit.

Solution (Base64):
ZnVuY3Rpb24gbXlzdXBwb3J0KHBoaTo6RiwgWGs6OkwsIHdvcmxkOjpXKSB3aGVyZSB7CiAgICBGPDpTb2xlTG9naWNzLkZvcm11bGEsIAogICAgTDw6U29sZURhdGEuQWJzdHJhY3RMb2dpc2V0LCAKICAgIFc8OlNvbGVMb2dpY3MuQWJzdHJhY3RXb3JsZAp9CiAgICAKICAgIF9uaW5zdGFuY2VzID0gbmluc3RhbmNlcyhYaykKCiAgICBjaGVja19tYXNrID0gemVyb3MoSW50OCwgX25pbnN0YW5jZXMpCgogICAgQGluYm91bmRzIEBzaW1kIGZvciBpIGluIDE6X25pbnN0YW5jZXMgCiAgICAgICAgY2hlY2tfbWFza1tpXSA9IGNoZWNrKHBoaSwgWGssIGksIHdvcmxkKQogICAgZW5kCgogICAgcmV0dXJuIHJvdW5kKG1lYW4oY2hlY2tfbWFzayk7IGRpZ2l0cyA9IDIpCmVuZA==

In [None]:
# Insert your solution here

In [None]:
try
    for phi in [p, q, r, pq, pr, qr, pqr]
        println(
            mysupport(
                formula(phi), 
                SoleData.slicedataset(Xk, 1:30), 
                Interval(1, X_ndatapoints)
            )
        )
    end
catch e
    if e isa UndefVarError
        println("You need to implement mysupport.")
    else
        # do not silently swallow unrelated errors
        rethrow()
    end
end

Let us consider an alphabet of propositional literals $\mathcal{P}$, and let us suppose that 
$\mathsf{P} \subseteq \mathcal{P}$ is a frequent pattern we found.

We can partition $\mathsf{P}$ into two smaller frequent patterns, $\mathsf{Q}$ and $\mathsf{R}$, such that $\mathsf{Q} \cap \mathsf{R} = \emptyset$.

We denote with $\mathsf{Q} \Rightarrow \mathsf{R}$ the fact that an *interesting* statistical relation occurs between the antecedent and the consequent: if this is the case, then we have an *association rule*.

Similarly to the case of frequent patterns, the interestingness must be established with specific measures, which are called *meaningfulness measures* in the jargon.
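
The partitioning step above can be sketched by enumerating every way of splitting a pattern into a non-empty antecedent and a non-empty consequent. The helper `candidate_rules` is hypothetical and works on plain symbols; it only illustrates the combinatorics and is not how the library generates rules.

```julia
# Enumerate every split of a pattern into a non-empty antecedent Q and a
# non-empty consequent R, with Q ∩ R = ∅ and Q ∪ R equal to the pattern.
# `candidate_rules` is a hypothetical helper, not a library function.
function candidate_rules(pattern::Vector{Symbol})
    n = length(pattern)
    rules = Tuple{Vector{Symbol}, Vector{Symbol}}[]
    # each bitmask selects the literals that go into the antecedent;
    # masks 0 and 2^n - 1 are skipped because they leave one side empty
    for mask in 1:(2^n - 2)
        inQ = [((mask >> (i - 1)) & 1) == 1 for i in 1:n]
        push!(rules, (pattern[inQ], pattern[.!inQ]))
    end
    return rules
end

for (Q, R) in candidate_rules([:p, :q, :r])
    println(Q, " => ", R)
end
```

A pattern of $n$ literals yields $2^n - 2$ candidate rules, which is why each candidate is then filtered with a meaningfulness measure.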

In [None]:
# beware of the difference between an Item (such as p) and an Itemset;
# we need to cast p to Itemset, even if it is a trivial 1-length Itemset.
println(typeof(p))
ARule(Itemset(p), qr)

In [None]:
try
    ARule(pq, qr)
catch e 
    if e isa ArgumentError
        println("Beware: pq ∩ qr is not empty.")
    end
end

In [None]:
rule = ARule(Itemset(p), qr)

In [None]:
ModalAssociationRules.antecedent(rule)

In [None]:
ModalAssociationRules.consequent(rule)

In [None]:
# get the generator Itemset back
Itemset(rule)

##### Quiz

Try to explain the ratio below, which is commonly called *confidence*.

$$\text{confidence}(\mathcal{I}, \mathsf{P} \Rightarrow \mathsf{Q}) = \frac{\text{support}(\mathcal{I}, \mathsf{P} \cup \mathsf{Q})}{\text{support}(\mathcal{I}, \mathsf{P})}$$

##### Exercise

Implement your own `myconfidence` function.

Solution (Base64):
ZnVuY3Rpb24gbXljb25maWRlbmNlKHJ1bGU6OkFSdWxlLCBYazo6TCwgd29ybGQ6OlcpIHdoZXJlIHsKICAgIEw8OlNvbGVEYXRhLkFic3RyYWN0TG9naXNldCwgCiAgICBXPDpTb2xlTG9naWNzLkFic3RyYWN0V29ybGQKfQogICAgZnVsbF9mb3JtdWxhID0gZm9ybXVsYShJdGVtc2V0KHJ1bGUpKQogICAgYW50ZWNlZGVudF9mb3JtdWxhID0gZm9ybXVsYShNb2RhbEFzc29jaWF0aW9uUnVsZXMuYW50ZWNlZGVudChydWxlKSkKCiAgICByZXR1cm4gbXlzdXBwb3J0KGZ1bGxfZm9ybXVsYSwgWGssIHdvcmxkKSAvIG15c3VwcG9ydChhbnRlY2VkZW50X2Zvcm11bGEsIFhrLCB3b3JsZCkgCmVuZA==

In [None]:
# Insert your solution here

In [None]:
try
    # the confidence depends only on the rule, so a single call is enough
    println(
        myconfidence(
            rule, 
            SoleData.slicedataset(Xk, 1:30), 
            Interval(1, X_ndatapoints)
        )
    )
catch e
    if e isa UndefVarError
        println("You need to implement myconfidence.")
    else
        # do not silently swallow unrelated errors
        rethrow()
    end
end
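
As a sanity check on the confidence ratio, here is a minimal sketch on a toy dataset where each instance is the set of literals it satisfies. The symbols and the helpers `toy_support` and `toy_confidence` are hypothetical and independent of `myconfidence`.

```julia
# Toy dataset: each instance is the set of literals it satisfies
# (hypothetical symbols, unrelated to the atoms p, q, r defined above).
toy_instances = [
    Set([:a, :b]),
    Set([:a, :b, :c]),
    Set([:a]),
    Set([:b, :c]),
]

toy_support(instances, pattern) =
    count(I -> issubset(pattern, I), instances) / length(instances)

# confidence(P => Q) = support(P ∪ Q) / support(P)
toy_confidence(instances, P, Q) =
    toy_support(instances, union(P, Q)) / toy_support(instances, P)

# {a} holds in 3 instances and {a, b} in 2 of them, so the ratio is 2/3
toy_confidence(toy_instances, Set([:a]), Set([:b]))
```

Read this way, the confidence is the fraction of instances satisfying the antecedent that also satisfy the consequent.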

##### Enhancing Modal Association Rules with Modalities

When dealing with Kripke models, a natural dichotomy pops up!

Let us consider an alphabet of modal literals $\Lambda_\mathcal{P}$, obtained by enriching a standard,
propositional alphabet $\mathcal{P}$ with modal operators.

Let us also consider a modal dataset $\mathcal{I}$ and an instance $I = (W,R,v) \in \mathcal{I}$ in it, 
as well as a pattern $\mathsf{P}$.

We can assess the interestingness of $\mathsf{P}$ within an instance by computing its *local support*, and comparing it 
with a *minimum local support threshold* $s_l$.

$$\text{lsupport}(I, \mathsf{P}) = \frac{ |\{w \in W \mid I, w \models \mathsf{P} \}| }{|W|}$$
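
A minimal sketch of the local support on a single toy instance follows. It assumes that worlds are intervals represented as plain `(x, y)` tuples covering the points from `x` to `y` inclusive, and it uses a hypothetical literal; the library's own `Interval` type and `lsupport` function may follow different conventions.

```julia
# Worlds of a toy instance: all intervals (x, y) with x < y over 5 points.
# Intervals are plain tuples here, an assumption made for self-containment.
series = [0.2, 1.7, 2.4, 0.9, 0.1]
worlds = [(x, y) for x in 1:length(series) for y in (x + 1):length(series)]

# A hypothetical literal: "the maximum of the series on the interval is >= 2".
holds(w) = maximum(series[w[1]:w[2]]) >= 2

# Local support: fraction of the worlds of this single instance satisfying it.
toy_lsupport = count(holds, worlds) / length(worlds)

# The pattern is locally frequent when the ratio reaches the threshold s_l.
s_l = 0.5
toy_lsupport >= s_l
```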

The other part of the dichotomy, that is, the notion of *global* support, is left as an exercise (see the Quiz below).

In [None]:
lsupport

In [None]:
gsupport

In [None]:
lconfidence

In [None]:
gconfidence

##### Quiz
How would you aggregate many local support computations to compute a *global* support?

##### Mining Association Rules from Time Series Items

We want to probe our instances by looking at the shape of the signal of a given feature within a certain interval.

We also want to increase the expressiveness of the resulting association rules with the help of `HS` logic.
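
For intuition on what `HS` modalities add, the sketch below spells out five interval relations in their usual textbook form, using plain `(x, y)` tuples with `x < y`. The function names are hypothetical, and the conventions used internally by `SoleLogics` may differ in detail.

```julia
# Intervals are (x, y) tuples with x < y: a simplification for illustration.
# Each predicate says whether interval v is reachable from interval w.
rel_after(w, v)    = w[2] == v[1]                  # A: v starts where w ends
rel_begins(w, v)   = w[1] == v[1] && v[2] < w[2]   # B: v is a proper prefix of w
rel_ends(w, v)     = w[2] == v[2] && w[1] < v[1]   # E: v is a proper suffix of w
rel_during(w, v)   = w[1] < v[1] && v[2] < w[2]    # D: v lies strictly inside w
rel_overlaps(w, v) = w[1] < v[1] < w[2] < v[2]     # O: v starts inside w, ends after it

# ⟨X⟩φ holds on w when φ holds on at least one v reachable from w via X.
diamond_holds(rel, phi, w, worlds) = any(v -> rel(w, v) && phi(v), worlds)

# All intervals over 6 time points, and a hypothetical literal true only on (3, 5).
worlds = [(x, y) for x in 1:6 for y in (x + 1):6]
phi(v) = v == (3, 5)

diamond_holds(rel_after, phi, (1, 3), worlds)    # true: (3, 5) starts where (1, 3) ends
diamond_holds(rel_during, phi, (2, 6), worlds)   # true: (3, 5) lies strictly inside (2, 6)
diamond_holds(rel_after, phi, (1, 2), worlds)    # false: no interval starting at 2 satisfies phi
```

A modal literal therefore lets a rule talk about what happens on a related interval, not only on the current one.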

In [None]:
X, y = read(
    joinpath(@__DIR__, "..", "datasets", "natops.arff"), String) |> parse_natops

In [None]:
function _normalize(x::Vector{R}) where {R <: Real}
    # small constant that avoids a division by zero on constant series
    eps = 1e-10
    return (x .- mean(x)) ./ (std(x) + eps)
end

function zeuclidean(x::Vector{R}, y::Vector{R}) where {R <: Real}
    # z-normalize both series
    x_z = _normalize(x)
    y_z = _normalize(y)

    # Euclidean distance between the z-normalized series
    return sqrt(sum((x_z .- y_z).^2))
end

In [None]:
# consider only right hand and right elbow
varids = vcat(collect(4:6), collect(10:12));

In [None]:
mar_res_path = joinpath(@__DIR__, "..", "other-resources", "natops-for-mar")

In [None]:
using Serialization

function load_motifs(filepath, save_filename_prefix)
    ids = [id for id in deserialize(joinpath(filepath, "$(save_filename_prefix)-ids"))];
    motifs = [m for m in deserialize(joinpath(filepath, "$(save_filename_prefix)-motifs"))];
    featurenames = [f for f in deserialize(joinpath(filepath, "$(save_filename_prefix)-featurenames"))];
    return ids, motifs, featurenames
end


ids, motifs, featurenames = load_motifs(mar_res_path, "NATOPS-IHCC");

In this example, we only consider intervals of length 10 and 20.

In particular, given a world encoding an interval of such a length, we compute the z-normalized
Euclidean distance between the signal on that interval and each time series in a pool of reference shapes, called "motifs".

If the distance is low enough, then the gesture encoded by the motif is happening on that interval.
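
The matching idea can be sketched on synthetic data as follows. The motif, the signal and the helper names `znorm` and `zdist` are hypothetical, and the threshold value is only illustrative.

```julia
using Statistics

# z-normalize a series; the small constant guards against constant windows
znorm(x) = (x .- mean(x)) ./ (std(x) + 1e-10)

# z-normalized Euclidean distance between two equal-length series
zdist(x, y) = sqrt(sum((znorm(x) .- znorm(y)) .^ 2))

# hypothetical motif ("rise then fall") and a longer hypothetical signal
motif  = [0.0, 1.0, 2.0, 1.0, 0.0]
signal = [5.0, 5.1, 5.0, 7.0, 9.0, 7.0, 5.0, 5.1, 4.9, 5.0]

# distance between the motif and every window of the same length
m = length(motif)
distances = [zdist(signal[i:i+m-1], motif) for i in 1:(length(signal) - m + 1)]

# the window starting at index 3 is a shifted and scaled copy of the motif,
# so its distance is numerically zero and falls under any positive threshold
threshold = 1.0
matches = findall(d -> d <= threshold, distances)
```

Because of the z-normalization, a window matches when it has the same shape as the motif, regardless of its offset and amplitude.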

Browse the motifs we are playing with by tweaking the plot below.

In [None]:
i = 1
plot(motifs[i], label = "V$(ids[i]) $(featurenames[i])")

In [None]:
_variables = [
    SoleData.VariableDistance(id, m, distance=zeuclidean, featurename=name)
    for (id, m, name) in zip(ids, motifs, featurenames)
]

syntaxstring.(_variables)[1:3]

In [None]:
# we only consider the instances related to the "I have command" class;
# we are not cheating: we just want to describe the instances
IHCC = reduce(vcat, [X[1:30, :], X[(180+1):(180+30), :]]);
IHCCk = scalarlogiset(IHCC, _variables)

In [None]:
propositionalatoms = [
    Atom(ScalarCondition(v, <=, 1.0))
    for v in _variables
]

syntaxstring.(propositionalatoms)[1:3]

In [None]:
atoms = Vector{Item}(
    reduce(vcat, [
        propositionalatoms,
        diamond(IA_A).(propositionalatoms),
        diamond(IA_B).(propositionalatoms),
        diamond(IA_E).(propositionalatoms),
        diamond(IA_D).(propositionalatoms),
        diamond(IA_O).(propositionalatoms),
    ])
)

syntaxstring.(atoms)[1:3]

In [None]:
_items = Vector{Item}(atoms);

In [None]:
miner = Miner(
    # the data from which we want to find all the frequent itemsets
    IHCCk,

    # the strategy we want to leverage for exploring the frequent itemset space
    apriori,

    # the initial alphabet of facts
    _items,

    # the interestingness measures for the frequent itemsets
    [(gsupport, 0.1, 0.1)],

    # the meaningfulness measures for the association rules
    [(gconfidence, 0.5, 0.5)];
    
    worldfilter=SoleLogics.FunctionalWorldFilter(
        x -> (length(x) == 10) || (length(x) == 20), Interval{Int}
    ),

    itemset_policies=Function[
        isanchored_itemset(ignoreuntillength=1),
        isdimensionally_coherent_itemset()
    ],

    arule_policies=Function[
        islimited_length_arule(consequent_maxlength=3),
        isanchored_arule()
    ]
)

In [None]:
mine!(miner)

In [None]:
length(freqitems(miner))

In [None]:
length(arules(miner))

In [None]:
arules(miner)

Here is a list of interesting characteristic gestures hidden in the *I have command* movements.

*Whenever the right hand of the operator is completely stretching in front of them and their elbow goes all the way up on the y-axis, the same elbow started the movement range in a rest position and, near the end of the movement range, the operator’s right hand is moving to the left, but will soon change direction.*

---

*When the right elbow frontally moves away from the operator for about one second and, just after this movement range, the right hand frontally retracts from a stretched position to the same z coordinate of the operator’s ankle, then the right elbow reproduces the same movement but inverting its direction.*

---

We discovered that the entire vertical range movement of the right elbow, first up and then down, in nearly one second, begins with the right hand not moving on the horizontal axis for about 0.2 seconds, and then slightly moving to the right.

The last 0.5 seconds of the movement range involve the right elbow descent phase.

The insight underlying this rule is that, across the 0.48% of candidates in which the local support of the antecedent holds, about half of them perform this movement particularly fast.