# Introduction

:::{.callout-note}
This project is built with nbdev, a full literate programming environment built on Jupyter Notebooks. That means every piece of documentation, including the page you’re reading now, can be accessed as an interactive Jupyter notebook. <br>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DeanLight/spannerlib/blob/master/nbs/introduction.ipynb)
:::

This tutorial will teach you the basics of the spannerlog language and the spannerlib framework.

Spannerlog:
* Is similar to Datalog, but adds type safety features
* Supports aggregation functions
* Lets you use stateless user-defined functions, called IE functions, to derive new relations from existing relations
* Has some DRY features to help you write spannerlog code effectively
* Comes with support for Document Spanners using the `Span` class

Spannerlib, via its `Session` object, enables:
* registering IE functions and aggregation functions to be used as callbacks in spannerlog
* executing spannerlog code programmatically

## Installation

Prerequisites:

* Have [Python](https://www.python.org/downloads/) version 3.8 or above installed

To download and install spannerlib, run the following commands in your terminal:

```bash
git clone https://github.com/DeanLight/spannerlib
cd spannerlib
pip install . 
```

Make sure you are calling the pip version of your current python environment.
To install with another python interpreter, run

```bash
<path_to_python_interpreter> -m pip install .
```

You can also install spannerlib in the current Jupyter kernel:

```python
!git clone https://github.com/DeanLight/spannerlib
!pip install spannerlib
```

In order to use spannerlib in jupyter notebooks, you must first load it:


In [None]:
#| output: false
import spannerlib

Importing spannerlib automatically loads the `%%spannerlog` cell magic, which accepts spannerlog queries as shown below.

In [None]:
%%spannerlog
new uncle(str, str)
uncle("bob", "greg")
?uncle(X,Y)

'?uncle(X,Y)'

Unnamed: 0,X,Y
0,bob,greg


## Type safe Datalog

Spannerlog syntax is very similar to Datalog, but relations and their types must be declared using the `new` keyword.


In [None]:
%%spannerlog
# defining relations
new parent(str,str)
# defining initial facts
parent('xerces', 'brooke')
parent('brooke', 'damocles')

Rules can be defined that describe how to derive new facts from existing facts.
* We call the part to the left of the `<-` the rule's head (or head clause).
* We call the part to the right of the `<-` the rule's body (made up of body clauses).

In [None]:
%%spannerlog
# you can define relations recursively
# and use line escapes for long rules to make them more readable
ancestor(X, Y) <- parent(X, Y)
ancestor(X, Y) <- parent(X, Z),\
     ancestor(Z, Y)

Derived and existing facts can be queried using the `?` operator, with either free variables such as `X` or constants.

In [None]:
%%spannerlog
?parent(X,Y)

?ancestor('xerces',Y)

'?parent(X,Y)'

Unnamed: 0,X,Y
0,brooke,damocles
1,xerces,brooke


"?ancestor('xerces',Y)"

Unnamed: 0,Y
0,brooke
1,damocles
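
To make the recursive rule concrete, here is a minimal pure-Python sketch of how the two `ancestor` rules can be read: apply them repeatedly until no new fact appears. The helper `derive_ancestors` is a hypothetical name used only for illustration, and this is not necessarily how spannerlib evaluates rules internally.

```python
# Facts from the example above, as a set of (parent, child) tuples.
parent = {("xerces", "brooke"), ("brooke", "damocles")}


def derive_ancestors(parent_facts):
    """Apply the two ancestor rules repeatedly until nothing new is derived."""
    # ancestor(X, Y) <- parent(X, Y)
    ancestor = set(parent_facts)
    while True:
        # ancestor(X, Y) <- parent(X, Z), ancestor(Z, Y)
        new_facts = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        }
        if new_facts <= ancestor:
            # A full pass added nothing, so we have reached a fixpoint.
            return ancestor
        ancestor |= new_facts


ancestor = derive_ancestors(parent)
# Equivalent of ?ancestor('xerces', Y)
print(sorted(y for (x, y) in ancestor if x == "xerces"))  # ['brooke', 'damocles']
```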


Spannerlog has built in support for declaring relations for primitive types:
* `int`
* `str`
* `float`
* `bool`

But programmatically, you can define relations and add facts of any Python data type.

## Aggregation

You can use aggregation functions in a rule's head to express group-by logic.
Non-aggregated variables are used for grouping, and the rest are aggregated by their respective functions.

In [None]:
%%spannerlog
numDescendants(X,count(Y)) <- ancestor(X,Y)

?numDescendants(X,N)

'?numDescendants(X,N)'

Unnamed: 0,X,N
0,brooke,1
1,xerces,2
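
As a rough analogy, the `numDescendants` rule behaves like a group-by followed by a count. The sketch below reproduces the result in plain Python. The `ancestor` set is copied from the example above, and `count_by_first` is a hypothetical helper, not a spannerlib function.

```python
from collections import Counter

# The derived ancestor facts from the earlier example.
ancestor = {("xerces", "brooke"), ("brooke", "damocles"), ("xerces", "damocles")}


def count_by_first(facts):
    """Group tuples by their first term and count the tuples in each group."""
    return dict(Counter(x for x, _ in facts))


num_descendants = count_by_first(ancestor)
print(sorted(num_descendants.items()))  # [('brooke', 1), ('xerces', 2)]
```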


Built in aggregations include:
* min
* max
* sum
* avg
* count

But you will see in later sections that external aggregation functions can be defined.

## IE functions

Given a pure (stateless) function `f(X,Y)->(Z)` we can think of `f` as deriving information from `(x,y)` values to generate `(z)` values.
In the relational setting, IE functions are pure functions that take tuples over some input schema and derive a number of new tuples from them over some output schema.
We can use IE functions as body clauses to derive new facts.

IE functions are invoked using the `func_name(InputVars...)->(OutputVars...)` syntax.

In [None]:
%%spannerlog

new Texts(str)
Texts("Hello darkness my old friend")
Texts("I've come to talk with you again")

Words(Word) <- Texts(X), rgx("(\w+)",X)->(Word)

?Words(W)

'?Words(W)'

Unnamed: 0,W
0,"[@9a1d0f,0,5) ""Hello"""
1,"[@9a1d0f,6,14) ""darkness"""
2,"[@9a1d0f,15,17) ""my"""
3,"[@9a1d0f,18,21) ""old"""
4,"[@9a1d0f,22,28) ""friend"""
5,"[@c7e66d,0,1) ""I"""
6,"[@c7e66d,2,4) ""ve"""
7,"[@c7e66d,5,9) ""come"""
8,"[@c7e66d,10,12) ""to"""
9,"[@c7e66d,13,17) ""talk"""


`rgx` is one of the built-in IE functions. It returns Spans over the original text. We will learn more about Spans later.
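
To see where the span boundaries in the output come from, the following sketch computes the same start and end offsets with Python's standard `re` module. It only illustrates the idea of a span as a pair of offsets. `word_spans` is a hypothetical helper and does not reproduce spannerlib's `Span` class.

```python
import re


def word_spans(text):
    """Yield (start, end, matched_text) for each run of word characters."""
    for match in re.finditer(r"\w+", text):
        yield match.start(), match.end(), match.group()


spans = list(word_spans("Hello darkness my old friend"))
print(spans[:2])  # [(0, 5, 'Hello'), (6, 14, 'darkness')]
```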

In [None]:
#TODO from here show vars with DIY

In [None]:
#| hide
# TODO split the into into sections
#TODO change filters into boolean filters

# spannerlog is similar to datalog (also type safe)

# extensions:
# - variables for readability
#   - including reading from files
# - ie functions
# - aggregations
# - spans

# communication between spannerlog and python
# - calling spannerlog code programatically
# - importing relations to spannerlog from python
# - adding ie functions to spannerlog

# TODO extension.
# add . to signal the end of a rule to enable new lines without delimeters
# TODO enable defining relations with non primitive types through the session object.

# Local and free variables<a class="anchor" id="local_and_free_vars"></a>

spannerlog distinguishes two kinds of variables.

The first kind are local variables. These are variables that store a single value (e.g. string). They work similarly to variables in python. A local variable must be defined via assignment before being used.

A local variable name must begin with a lowercase letter or with an underscore (`_`), and can be continued with letters, digits and underscores.

Here are some examples for legal local variable names:
* `a`
* `a_name123`
* `_Some_STRING`

And here are some illegal local variable names:
* `A`
* `A_name`
* `1_a`


The second kind of variables are free variables. Free variables do not hold any value and are used to define relations inside [queries](#queries) and [rules](#rules). Free variables do not need to be declared or defined before being used.

A free variable name must begin with an uppercase letter and can be continued with letters, digits and underscores.

Here are some examples for legal free variable names:
* `A`
* `A_name`

And here are some illegal free variable names:
* `a`
* `a_name`
* `_Some_STRING`
* `1A`
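
The naming rules above can be summarised as two regular expressions. The sketch below is an illustration only: it assumes that "letters" means ASCII letters, and `classify` is a hypothetical helper rather than part of spannerlib.

```python
import re

# Assumption: "letters" means ASCII letters only.
LOCAL_VAR = re.compile(r"[a-z_][A-Za-z0-9_]*")
FREE_VAR = re.compile(r"[A-Z][A-Za-z0-9_]*")


def classify(name):
    """Return 'local', 'free' or 'invalid' according to the naming rules."""
    if LOCAL_VAR.fullmatch(name):
        return "local"
    if FREE_VAR.fullmatch(name):
        return "free"
    return "invalid"


for name in ["a_name123", "_Some_STRING", "A_name", "1_a"]:
    print(name, "->", classify(name))
```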


# Local variable assignment<a class="anchor" id="local_var_assignment"></a>
spannerlog allows you to use three types of variables: strings, integers and spans.
The assignment of a string is intuitive:

In [None]:
%%spannerlog
b = "bob"
b2 = b # b2's value is "bob"
# you can write multiline strings using a line overflow escape like in python
b3 = "this is a multiline  \
string"
b4 = "this is a multiline string" # b4 holds the same value as b3

The assignment of integers is also very simple:

In [None]:
%%spannerlog
n = 4
n2 = n # n2 = 4

A span identifies a substring of a string by specifying its bounding indices. It is constructed from two integers.
You can assign a span value like this:

In [None]:
%%spannerlog
span1 = [3,7)
span2 = span1 # span2 value is [3,7)
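
The `[3,7)` notation denotes a half-open interval: the start index is included and the end index is excluded. This is the same convention as Python slicing, as the short sketch below shows. `span_text` is a hypothetical helper used only for illustration.

```python
def span_text(text, start, end):
    """Return the substring identified by the half-open span [start, end)."""
    return text[start:end]


print(span_text("spannerlog", 3, 7))        # nner
print(span_text("Life finds a way", 0, 4))  # Life
```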

# Reading from a file<a class="anchor" id="read_a_file"></a>
You can also perform a string assignment by reading from a file. You will need to provide a path to a file by either using a string literal or a string variable:

In [None]:
%%spannerlog
a = read("../README.md")
b = "../README.md" 
c = read(b) # c holds the same string value as a

# Declaring a relation<a class="anchor" id="declare_relations"></a>
spannerlog allows you to define and query relations.
You have to declare a relation before you can use it (unless you define it with a rule as we'll see in the "rules" chapter). Each term in a relation could be a string, an integer or a span. Here are some examples for declaring relations:

In [None]:
%%spannerlog
# 'brothers' is a relation with two string terms.
new brothers(str, str)
# 'confused' is a relation with one string term.
new confused(str)
# 'animal' is a relation with one string term and one span term 
new animal(str, span)
# 'scores' is a relation with one string term and one int term
new scores(str, int)

Whenever a relation is declared, a corresponding empty table is created in the database. <br>
The table has the same name as the relation, and its number of columns is equal to the number of terms in the relation.

# Facts<a class="anchor" id="facts"></a>
spannerlog is an extension of Datalog, a declarative logic programming language. In Datalog you can declare "facts", essentially adding tuples to a relation. To do it you use the following syntax:

```
relation_name(term_1,term_2,...,term_n)
```

or

```
relation_name(term_1,term_2,...,term_n) <- True
```

where each `term` is either a constant or a local variable whose type matches the type declared for `relation_name` at that position.

For example:

In [None]:
%%spannerlog
# first declare the relation that you want to use
new noun(str, span)
# now you can add facts (tuples) to that relation
# this span indicates that a noun "Life" can be found at indexes 0 to 3
noun("Life finds a way", [0,4)) 
# another example
new sisters(str, str)
sisters("alice", "rin") <- True
# sisters([0,4), "rin") # illegal as [0,4) is not a string

You can also remove a fact using a similar syntax:

```relation_name(term_1,term_2,...,term_n) <- False```

If a fact that you try to remove does not exist, the remove fact statement is silently ignored.


```python
%%spannerlog
new goals(str, int)
goals("kronovi", 10)
goals("kronovi", 10) <- False  # 'goals' relation is now empty
goals("kronovi", 10) <- False  # this statement does nothing
```

When you add or remove facts from a relation, the relation's corresponding table in the database is updated accordingly.
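
The behaviour described in this section (typed terms, adding facts, and silently ignoring the removal of a missing fact) can be modelled with a small Python class. This is a sketch under stated assumptions: `Relation` is a hypothetical class written for this page, not spannerlib's implementation.

```python
class Relation:
    """A toy typed relation: a named set of tuples with a fixed schema."""

    def __init__(self, name, schema):
        self.name = name
        self.schema = tuple(schema)
        self.rows = set()

    def _check(self, row):
        # Every term must match the declared type at its position.
        if len(row) != len(self.schema) or not all(
            isinstance(value, expected) for value, expected in zip(row, self.schema)
        ):
            raise TypeError(f"{row!r} does not match the schema of {self.name}")

    def add(self, *row):
        self._check(row)
        self.rows.add(row)

    def remove(self, *row):
        self._check(row)
        # discard() does nothing if the fact is absent, mirroring "<- False".
        self.rows.discard(row)


goals = Relation("goals", [str, int])
goals.add("kronovi", 10)
goals.remove("kronovi", 10)  # 'goals' is now empty
goals.remove("kronovi", 10)  # does nothing
print(goals.rows)  # set()
```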

# Rules<a class="anchor" id="rules"></a>
Datalog allows you to deduce new tuples for a relation.
spannerlog includes this feature as well:

In [None]:
%%spannerlog
new parent(str ,str)
parent("bob", "greg")
parent("greg", "alice")
# now add a rule that deduces that bob is a grandparent of alice
grandparent(X,Z) <- parent(X,Y), parent(Y,Z) # ',' is shorthand for the 'and' operator

spannerlog also supports recursive rules:

In [None]:
%%spannerlog
parent("Liam", "Noah")
parent("Noah", "Oliver")
parent("James", "Lucas")
parent("Noah", "Benjamin")
parent("Benjamin", "Mason")
ancestor(X,Y) <- parent(X,Y)
# This is a recursive rule
ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y)

# Queries are explained in the next section
?ancestor("Liam", X)
?ancestor(X, "Mason")
?ancestor("Mason", X)

'?ancestor("Liam",X)'

Unnamed: 0,X
2,Noah
8,Oliver
9,Benjamin
10,Mason


'?ancestor(X,"Mason")'

Unnamed: 0,X
6,Benjamin
10,Liam
11,Noah


'?ancestor("Mason",X)'

Unnamed: 0,X


You could also remove a rule via the session:

```magic_session.remove_rule(rule_to_delete)```

Note: the rule must be written exactly as it appears in the output of `print_rules`.

In [None]:
%%spannerlog
confused("Josh")
brothers("Drake", "Josh")

# oops! this rule was added by mistake!
ancestor(X, Y) <- brothers(X, Y), confused(Y)

?ancestor(X,Y)

'?ancestor(X,Y)'

Unnamed: 0,X,Y
0,bob,greg
1,greg,alice
2,Liam,Noah
3,Noah,Oliver
4,James,Lucas
5,Noah,Benjamin
6,Benjamin,Mason
7,bob,alice
8,Liam,Oliver
9,Liam,Benjamin


In [None]:
from spannerlib import get_magic_session
magic_session = get_magic_session()
print ("before:")
_ = magic_session.print_rules()
magic_session.remove_rule("ancestor(X,Y) <- brothers(X,Y),confused(Y)")

print ("after:")
_ = magic_session.print_rules()


before:
grandparent(X,Z) <- parent(X,Y),parent(Y,Z)
ancestor(X,Y) <- parent(X,Y)
ancestor(X,Y) <- parent(X,Z),ancestor(Z,Y)
ancestor(X,Y) <- brothers(X,Y),confused(Y)
after:
grandparent(X,Z) <- parent(X,Y),parent(Y,Z)
ancestor(X,Y) <- parent(X,Y)
ancestor(X,Y) <- parent(X,Z),ancestor(Z,Y)


Currently, to remove a rule, you must give the exact code string that is printed by the session. A string that differs in whitespace or variable names is not recognized as the same rule.

In [None]:
%%spannerlog
?ancestor(X,Y)

'?ancestor(X,Y)'

Unnamed: 0,X,Y
0,bob,greg
1,greg,alice
2,Liam,Noah
3,Noah,Oliver
4,James,Lucas
5,Noah,Benjamin
6,Benjamin,Mason
7,bob,alice
8,Liam,Oliver
9,Liam,Benjamin


Success! The rule was deleted. Drake and Josh are not part of the `?ancestor` query result.

In addition, you can use `remove_all_rules` to remove all the rules (it won't affect the facts).<br>
You can pass a rule head parameter to remove only the rules related to it.

In [None]:
magic_session.remove_all_rules()
print("after removing all rules:")
magic_session.print_rules()

# facts are not affected...
%spannerlog ?parent(X, Y)

after removing all rules:


'?parent(X,Y)'

Unnamed: 0,X,Y
0,bob,greg
0,greg,alice
0,Liam,Noah
0,Noah,Oliver
0,James,Lucas
0,Noah,Benjamin
0,Benjamin,Mason


# Queries<a class="anchor" id="queries"></a>
A query is a way to retrieve specific information from a dataset. <br>
Querying in spannerlog uses the same syntax and semantics as Datalog. <br>
Under these semantics, we try to find all instantiations of free variables that satisfy the queried relation.

You can query by using constant values, local variables and free variables:

In [None]:
%%spannerlog
# first create a relation with some facts for the example
new grandfather(str, str)
# bob and george are the grandfathers of alice and rin
grandfather("bob", "alice")
grandfather("bob", "rin")
grandfather("george", "alice")
grandfather("george", "rin")
# edward is the grandfather of john
grandfather("edward", "john")

# now for the queries
?grandfather("bob", "alice") # returns True (a single empty tuple) as alice is bob's grandchild
?grandfather("edward", "alice") # returns False (no tuples) as alice is not edward's grandchild
?grandfather("george", X) # returns "rin" and "alice" as both rin
# and alice are george's grandchildren
?grandfather(X, "rin") # returns "bob" and "george" (rin's grandfathers)
?grandfather(X, Y) # returns all the tuples in the 'grandfather' relation

new verb(str, span)
verb("Ron eats quickly.", [4,8))
verb("You write neatly.", [4,9))
?verb("Ron eats quickly.", X) # returns [4,8)
?verb(X,[4,9)) # returns "You write neatly."
         
new orders(str, int)
orders("pie", 4)
orders("pizza", 4)
orders("cake", 0)
?orders(X, 4) # returns "pie" and "pizza"

'?grandfather("bob","alice")'

True

'?grandfather("edward","alice")'

False

'?grandfather("george",X)'

Unnamed: 0,X
0,alice
0,rin


'?grandfather(X,"rin")'

Unnamed: 0,X
0,bob
0,george


'?grandfather(X,Y)'

Unnamed: 0,X,Y
0,bob,alice
0,bob,rin
0,george,alice
0,george,rin
0,edward,john


'?verb("Ron eats quickly.",X)'

Unnamed: 0,X
0,"[4,8)"


'?verb(X,[4,9))'

Unnamed: 0,X
0,You write neatly.


'?orders(X,4)'

Unnamed: 0,X
0,pie
0,pizza


You may have noticed that the query

```
?grandfather("bob", "alice")
```

is displayed as `True` rather than as a table. The query has no free variables, so it asks a specific yes/no question about the dataset: does this exact fact exist?
Under Datalog semantics, a true query with no free variables has exactly one satisfying instantiation, the empty tuple `()`, which is shown as `True`.
A false query has no satisfying instantiation at all, which is shown as `False`.

A good example for using free variables to construct a relation is the query:

```
?grandfather("george", X)
```

which finds all of george's grandchildren (`X`) and constructs a tuple for each one.
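
The query semantics described above can be sketched in a few lines of Python: a query returns one binding of free variables per matching tuple, so a query without free variables returns either a single empty binding or nothing. `Var` and `query` are hypothetical names used only for this illustration, and the sketch assumes the number of terms equals the relation's arity.

```python
grandfather = {
    ("bob", "alice"), ("bob", "rin"),
    ("george", "alice"), ("george", "rin"),
    ("edward", "john"),
}


class Var:
    """Marks a free variable in a query, e.g. Var("X")."""

    def __init__(self, name):
        self.name = name


def query(facts, *terms):
    """Return one {variable name: value} binding per fact matching the terms."""
    results = []
    for row in facts:
        binding = {}
        for term, value in zip(terms, row):
            if isinstance(term, Var):
                # A repeated variable must bind to the same value every time.
                if binding.setdefault(term.name, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


print(query(grandfather, "bob", "alice"))     # [{}]  one empty binding: True
print(query(grandfather, "edward", "alice"))  # []    no binding at all: False
print(sorted(b["X"] for b in query(grandfather, "george", Var("X"))))  # ['alice', 'rin']
```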

### How are rules and queries saved in the database?

Unlike facts, which are immediately stored in the database upon their creation, rules are not computed and added to the database upon declaration. Instead, the logic of a rule is saved separately and is only evaluated when needed (lazy evaluation). When a query is made, the engine utilizes these rules to derive all potential solutions from the existing facts that would fulfill the query.


# Using IE Functions

## Functional regex formulas<a class="anchor" id="RGX_ie"></a>
spannerlog contains IE functions which are registered by default.
Let's go over a couple regex IE functions:


```
rgx_span(regex_input ,regex_formula)->(x_1, x_2, ...,x_n)
```

and

```
rgx_string(regex_input ,regex_formula)->(x_1, x_2, ...,x_n)
```

where:
* `regex_input` is the string that the regex operation will be performed on
* `regex_formula` is either a string literal or a string variable that represents your regular expression.
* `x_1`, `x_2`, ... `x_n` can be either constant terms or free variable terms. They're used to construct the tuples of the resulting relation. The number of terms has to be the same as the number of capture groups used in `regex_formula`. If no capture groups are used, then each returned tuple includes a single, whole regex match, so only one term should be used.

The only difference between the `rgx_span` and `rgx_string` IE functions is that `rgx_string` returns strings, while `rgx_span` returns the spans of those strings. This also means that if you want to use constant terms as return values, they have to be spans if you use `rgx_span`, and strings if you use `rgx_string`.

For example consider the following spannerlog code:

In [None]:
%%spannerlog
input_string = "John Doe: 35 years old, Jane Smith: 28 years old"
regex_pattern = "(\w+\s\w+):\s(\d+)"

age(X,Y) <- py_rgx_string(input_string, regex_pattern) -> (X,Y)
age_span(X,Y) <- py_rgx_span(input_string, regex_pattern) -> (X,Y)
?age(X,Y)
?age_span(X,Y)

'?age(X,Y)'

Unnamed: 0,X,Y
0,John Doe,35
1,Jane Smith,28


'?age_span(X,Y)'

Unnamed: 0,X,Y
0,"[0,8)","[10,12)"
1,"[24,34)","[36,38)"


The variables `X` and `Y` in the output of the above IE functions are the matches of the capture groups used in `regex_pattern`. <br>
Capture groups allow us to extract specific parts of a matched pattern in a text using regular expressions. <br>
When you define a regular expression pattern with parentheses `()`, you create a capture group.

## Creating and Registering a New IE Function<a class="anchor" id="custom_ie"></a>

Using regex is nice, but what if you want to define your own IE function? <br>
spannerlog allows you to define and use your own information extraction functions. You can use them only in rule bodies in the current version. The following is the syntax for custom IE functions:

```
func(term_1,term_2,...term_n)->(x_1, x_2, ..., x_n)
```

where:
* `func` is an IE function that was previously defined and registered
* `term_1`,`term_2`,...,`term_n` are the parameters for func
* `x_1`, ... `x_n` could be any type of terms, and are used to construct tuples of the resulting relation

For example:

### IE function `get_happy`

In [None]:
magic_session.register?

Signature: magic_session.register(name, func, in_schema, out_schema)
Docstring: <no docstring>
File:      ~/tdk/spannerlib/spannerlib/session.py
Type:      method

In [None]:
import re

# the function itself, which should yield an iterable of tuples/lists of primitive types
def get_happy(text):
    """
    get the names of people who are happy in `text`
    """
    compiled_rgx = re.compile(r"(\w+) is happy")
    num_groups = compiled_rgx.groups
    for match in re.finditer(compiled_rgx, text):
        if num_groups == 0:
            matched_strings = [match.group()]
        else:
            matched_strings = [group for group in match.groups()]
        yield matched_strings

# the input types, a list of primitive types
get_happy_in_types = [str]

# the output types, either a list of primitive types or 
# a method which expects an arity and computes the desired types based on it
get_happy_out_types = lambda arity : arity * [str]
# or: `get_happy_out_types = [str]`

# finally, register the function
magic_session.register(
    name='get_happy',
    func = get_happy,
    in_schema = get_happy_in_types,
    out_schema = get_happy_out_types                
                       )

You may have noticed that our custom IE function uses `yield` instead of `return`. <br>
Lazy evaluation is part of what makes spanner-based database systems more performant and memory efficient. <br>
Since building iterators in Python is simple using the generator pattern, IE functions can be written as generators, which lets them be as lazy as their author desires.
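
The effect of `yield` can be observed directly in plain Python. In the sketch below, a generator records each match it produces. Asking for only the first result performs only the work needed for that result. `happy_names` and the `calls` list are hypothetical and exist only to make the laziness visible.

```python
import re
from itertools import islice

calls = []  # records which matches have actually been produced


def happy_names(text):
    """Lazily yield a one-element tuple for each happy person in `text`."""
    for match in re.finditer(r"(\w+) is happy", text):
        calls.append(match.group(1))
        yield (match.group(1),)


text = "rin is happy, denna is sad, joel is happy"
generator = happy_names(text)

first = list(islice(generator, 1))  # consume only the first result
print(first)  # [('rin',)]
print(calls)  # ['rin']
```

The remaining matches are produced only if the consumer keeps iterating over `generator`.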

### Custom IE using `get_happy`

In [None]:
%%spannerlog
new grandmother(str, str)
grandmother("rin", "alice")
grandmother("denna", "joel")
sentence = "rin is happy, denna is sad."
# note that this statement would fail if 'get_happy' were not registered as an IE function
happy_grandmother(X) <- grandmother(X,Z),get_happy(sentence)->(X)
?happy_grandmother(X) # assuming get_happy returned "rin", also returns "rin"

'?happy_grandmother(X)'

Unnamed: 0,X
0,rin


## More information about IE functions
* You can remove an IE function via the session:

```magic_session.remove_ie_function(ie_function_name)```

* If you want to remove all the registered ie functions:

```magic_session.remove_all_ie_functions()```

* If you register an IE function with a name that was already registered, the old IE function will be overwritten by the new one.
* You can inspect all the registered IE functions using the following command:

```magic_session.print_registered_ie_functions()```

```python
# first, let's print all functions:
magic_session.print_registered_ie_functions()
```


# Additional small features<a class="anchor" id="small_features"></a>
You can use line overflow escapes if you want to split your statements into multiple lines:

```python
%%spannerlog
k \
= "some \
string"
```

# spannerlog program example<a class="anchor" id="example_program"></a>

In [None]:
import spannerlib

In [None]:
%%spannerlog
new lecturer(str, str)
lecturer("walter", "chemistry")
lecturer("linus", "operation systems")
lecturer("rick", "physics")

new enrolled(str, str)
enrolled("abigail", "chemistry")
enrolled("abigail", "operation systems")
enrolled("jordan", "chemistry")
enrolled("gale", "operation systems")
enrolled("howard", "chemistry")
enrolled("howard", "physics")

enrolled_in_chemistry(X) <- enrolled(X, "chemistry")
?enrolled_in_chemistry("jordan") # returns True
?enrolled_in_chemistry("gale") # returns False
?enrolled_in_chemistry(X) # returns "abigail", "jordan" and "howard"

enrolled_in_physics_and_chemistry(X) <- enrolled_in_chemistry(X), enrolled(X, "physics")
?enrolled_in_physics_and_chemistry(X) # returns "howard"

lecturer_of(X,Z) <- lecturer(X,Y), enrolled(Z,Y)
?lecturer_of(X,"abigail") # returns "walter" and "linus"

grade_str = "abigail 100 jordan 80 gale 79 howard 60"
grade_of_chemistry_students(Student, Grade) <- \
py_rgx_string(grade_str, "(\w+).*?(\d+)")->(Student, Grade), enrolled_in_chemistry(Student)
?grade_of_chemistry_students(X, "100") # returns "abigail"

'?enrolled_in_chemistry("jordan")'

True

'?enrolled_in_chemistry("gale")'

False

'?enrolled_in_chemistry(X)'

Unnamed: 0,X
0,abigail
1,jordan
2,howard


'?enrolled_in_physics_and_chemistry(X)'

Unnamed: 0,X
0,howard


'?lecturer_of(X,"abigail")'

Unnamed: 0,X
0,walter
3,linus


'?grade_of_chemistry_students(X,"100")'

Unnamed: 0,X
0,abigail


# Useful tricks<a class="anchor" id="Usefull tricks"></a>


## Logical Operators and filters:
Suppose we have a table in which each row contains two strings - pair(str, str).
Our goal is to filter out all the rows that contain the same value twice.
<br>
In other words, we want to implement the relation **not equals (NEQ)**.

We would like to have a rule such as:
<br>
```unique_pair(X, Y) <- pair(X, Y), X != Y```
<br><br>
Unfortunately, spannerlog doesn't support comparison operators out of the box. Therefore, we can't use ```X != Y```.
<br>
Our solution to this problem is to create an ie function that implements NEQ relation:

In [None]:
def NEQ(x, y):
    if x == y:
        # return an empty relation
        return []
    else:
        #return a relation with the values x and y
        return [(x, y)]

magic_session.register("NEQ",NEQ,[str,str],[str,str])

In [None]:
%%spannerlog
#Lets test this solution
new pair(str, str)
pair("Dan", "Tom")
pair("Cat", "Dog")
pair("Apple", "Apple")
pair("Cow", "Cow")
pair("123", "321")

unique_pair(X, Y) <- pair(X, Y), NEQ(X, Y) -> (X, Y)
?unique_pair(X, Y)

'?unique_pair(X,Y)'

Unnamed: 0,X,Y
0,Dan,Tom
1,Cat,Dog
2,123,321


# Python Implementation vs. spannerlog Implementation

Let's compare coding in Python with coding in spannerlog.
We are given two long strings: one of (student, course) enrollment pairs and one of (student, grade) pairs.
Our goal is to find all students who are enrolled in both biology and chemistry and have a GPA of 80.

## python 

In [None]:
import re
enrolled = "dave chemistry dave biology rem biology ram biology emilia physics roswaal chemistry roswaal biology roswaal physics"
grades = "dave 80 rem 66 ram 66 roswaal 100 emilia 88"

enrolled_pairs = re.findall(r"(\w+).*?(\w+)", enrolled)
grade_pairs = re.findall(r"(\w+).*?(\d+)", grades)
for student1, course1 in enrolled_pairs:
    for student2, course2 in enrolled_pairs:
        for student3, grade in grade_pairs:
            if (student1 == student2 == student3):
                if (course1 == "biology" and course2 == "chemistry" and int(grade) == 80):
                    print(student1)

dave


## spannerlog

In [None]:
%%spannerlog
enrolled = "dave chemistry dave biology rem biology ram biology emilia physics roswaal chemistry roswaal biology roswaal physics"
grades = "dave 80 rem 66 ram 66 roswaal 100 emilia 88"

enrolled_in(Student, Course) <- py_rgx_string(enrolled, "(\w+).*?(\w+)")->(Student, Course)
student_grade(Student, Grade) <- py_rgx_string(grades, "(\w+).*?(\d+)") -> (Student, Grade)
interesting_student(X) <- enrolled_in(X, "biology"), enrolled_in(X, "chemistry"), student_grade(X, "80")
?interesting_student(X)

'?interesting_student(X)'

Unnamed: 0,X
0,dave


In this case, the Python implementation was long and unnatural. The spannerlog implementation, on the other hand, was cleaner and allowed us to express our intentions directly, rather than dealing with annoying programming logic.

# Parsing a JSON document using spannerlog

spannerlog's JsonPath/JsonFullPath ie functions allow us to easily parse json documents using path expressions.<br>
We will demonstrate how to use the latter. Check out the [jsonpath repo](https://github.com/json-path/JsonPath) for more information.

First, we would like to remove the built-in jsonpath function, to show how we implement it from scratch:

In [None]:
# TODO redo this with a cleared session, after we taught how to work with the session object
# magic_session.remove_ie_function("JsonPathFull")

After removing the function, implementing and registering it is as easy as:

In [None]:
import json
from jsonpath_ng import parse

def parse_match(match) -> str:
    """
    @param match: a match result of json path query.
    @return: a string that represents the match in string format.
    """
    json_result = match.value
    if type(json_result) != str:
        # we replace for the same reason as in json_path implementation.
        json_result = json.dumps(json_result).replace("\"", "'")
    return json_result

def json_path_full(json_document: str, path_expression: str):
    """
    @param json_document: The document on which we will run the path expression.
    @param path_expression: The query to execute.
    @return: json documents with the full results paths.
    """
    json_document = json.loads(json_document.replace("'", "\""))
    jsonpath_expr = parse(path_expression)
    for match in jsonpath_expr.find(json_document):
        json_result = str(match.full_path)
        # objects in full path are separated by dots.
        yield *json_result.split("."), parse_match(match)


magic_session.register('JsonPathFull', json_path_full, [str, str], lambda arity: [str] * arity)

And now for the usage. <br>
Suppose we have a JSON document of the format `{student: {subject: grade, ...}, ...}`. <br>
We want to create a spannerlog relation containing tuples of (student, subject, grade).

In [None]:
%%spannerlog

# we use strings, as spannerlog doesn't support dicts.
json_string = "{ \
                'abigail': {'chemistry': 80, 'operation systems': 99}, \
                'jordan':  {'chemistry': 65, 'physics': 70}, \
                'gale':    {'operation systems': 100}, \
                'howard':  {'chemistry': 90, 'physics':91, 'biology':92} \
                }"

# path expression is the path to the key of each grade (in our simple case it's *.*)
# then JsonPathFull appends the full path to the value
json_table(Student, Subject, Grade) <- JsonPathFull(json_string, "*.*") -> (Student, Subject, Grade)
?json_table(Student, Subject, Grade)

'?json_table(Student,Subject,Grade)'

Unnamed: 0,Student,Subject,Grade
0,abigail,chemistry,80
1,abigail,operation systems,99
2,jordan,chemistry,65
3,jordan,physics,70
4,gale,operation systems,100
5,howard,chemistry,90
6,howard,physics,91
7,howard,biology,92
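
For intuition about the shape of these tuples, the sketch below flattens a nested JSON object into `(key_1, ..., key_n, value)` tuples using only the standard library. It is an illustration of the output format, not a replacement for `JsonPathFull`: `flatten` is a hypothetical helper, it ignores path expressions entirely, and it keeps leaf values as Python objects instead of converting them to strings.

```python
import json


def flatten(document, prefix=()):
    """Yield (key_1, ..., key_n, leaf_value) for every leaf of a nested dict."""
    for key, value in document.items():
        if isinstance(value, dict):
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key, value)


json_string = '{"abigail": {"chemistry": 80}, "gale": {"operation systems": 100}}'
rows = list(flatten(json.loads(json_string)))
print(rows)  # [('abigail', 'chemistry', 80), ('gale', 'operation systems', 100)]
```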
