# Spanner Workbench Introduction
In this tutorial you will learn the basics of Spanner Workbench:
* [how to install, import and use RGXLog](#use_rgxlog)
* [local and free variables](#local_and_free_vars)
* [local variable assignment](#local_var_assignment)
* [reading from a file](#read_a_file)
* [declaring a relation](#declare_relations)
* [adding facts](#facts)
* [adding rules](#rules)
* [queries](#queries)
* [using RGXLog's primitive information extractor: functional regex formulas](#RGX_ie)
* [using custom information extractors](#custom_ie)
* [additional small features](#small_features)

At the end of this tutorial there is also an [example of a small RGXLog program](#example_program).

# Using RGXLog<a class="anchor" id="use_rgxlog"></a>

### Installation

Prerequisites:

* Have [Python](https://www.python.org/downloads/) version 3.8 or above installed

To download and install RGXLog, run the following commands in your terminal:

```bash
git clone https://github.com/DeanLight/spanner_workbench
cd spanner_workbench

pip install src/rgxlog-interpreter 

```
Make sure you are calling the pip version of your current python environment.
To install with another python interpreter, run

```bash

<path_to_python_interpreter> -m pip install  src/rgxlog-interpreter

```
You can also install RGXLog in the current Jupyter kernel:

In [None]:
import sys
from pathlib import Path
current_python=f"{sys.executable}"
package_path=Path("../src/rgxlog-interpreter")


In [None]:
! {current_python} -m pip install {package_path}

In order to use RGXLog in jupyter notebooks, you must first import it:


In [None]:
import rgxlog

Importing the RGXLog library automatically loads the `%%spanner` cell magic which accepts RGXLog queries as shown below.


In [None]:
%%spanner
new uncle(str, str)
uncle("bob", "greg")
?uncle(X,Y)

# Local and free variables<a class="anchor" id="local_and_free_vars"></a>

RGXLog distinguishes two kinds of variables.

The first kind are local variables. These are variables that store a single value (e.g. a string). They work similarly to variables in Python. A local variable must be defined via assignment before being used.

A local variable name must begin with a lowercase letter or an underscore (`_`), and may continue with letters, digits and underscores.

Here are some examples for legal local variable names:
* `a`
* `a_name123`
* `_Some_STRING`

And here are some illegal local variable names:
* `A`
* `A_name`
* `1_a`


The second kind of variables are free variables. Free variables do not hold any value and are used to define relations inside [queries](#queries) and [rules](#rules). Free variables do not need to be declared or defined before being used.

A free variable name must begin with an uppercase letter and may continue with letters, digits and underscores.

Here are some examples for legal free variable names:
* `A`
* `A_name`

And here are some illegal free variable names:
* `a`
* `a_name`
* `_Some_STRING`
* `1A`
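
The naming rules above can be checked mechanically. The following is a minimal Python sketch, not RGXLog's actual grammar: the two patterns are transcribed from the rules in this section, `classify` is a hypothetical helper, and "letters" is assumed to mean ASCII letters.

```python
import re

# Patterns transcribed from the naming rules above (illustrative only,
# assuming "letters" means ASCII letters).
LOCAL_VAR = re.compile(r"[a-z_][A-Za-z0-9_]*")
FREE_VAR = re.compile(r"[A-Z][A-Za-z0-9_]*")


def classify(name: str) -> str:
    """Return 'local', 'free' or 'illegal' for a candidate variable name."""
    if LOCAL_VAR.fullmatch(name):
        return "local"
    if FREE_VAR.fullmatch(name):
        return "free"
    return "illegal"


for name in ["a", "a_name123", "_Some_STRING", "A", "A_name", "1_a", "1A"]:
    print(f"{name!r}: {classify(name)}")
```

Every name is either a legal local variable, a legal free variable, or neither, because the two patterns never accept the same first character.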


# Local variable assignment<a class="anchor" id="local_var_assignment"></a>
Local variables can hold one of three types of values: strings, integers and spans.
The assignment of a string is intuitive:

In [None]:
%%spanner
b = "bob"
b2 = b # b2's value is "bob"
# you can write multiline strings using a line overflow escape like in python
b3 = "this is a multiline  \
string"
b4 = "this is a multiline string" # b4 holds the same value as b3

The assignment of integers is also very simple:

In [None]:
%%spanner
n = 4
n2 = n # n2 = 4

A span identifies a substring of a string by specifying its bounding indices: a start index (inclusive) and an end index (exclusive). It is constructed from two integers.
You can assign a span value like this:

In [None]:
%%spanner
span1 = [3,7)
span2 = span1 # span2 value is [3,7)
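
To make the span notation concrete, here is a small Python sketch of the half-open interval reading of `[start,end)`. The `Span` class and its `extract` method are hypothetical helpers written for this illustration; they are not part of RGXLog.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Half-open interval [start, end) over a string (hypothetical helper)."""

    start: int
    end: int

    def extract(self, text: str) -> str:
        # Python slices are also half-open, so a span maps directly to a slice.
        return text[self.start:self.end]


text = "Life finds a way"
span = Span(0, 4)
print(span.extract(text))  # the characters at indexes 0 to 3
```

The end index is excluded, so `[0,4)` covers exactly four characters.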

# Reading from a file<a class="anchor" id="read_a_file"></a>
You can also perform a string assignment by reading from a file. You will need to provide a path to a file by either using a string literal or a string variable:

In [None]:
%%spanner
a = read("path/to/file")
b = "path/to/file" 
c = read(b) # c holds the same string value as a

# Declaring a relation<a class="anchor" id="declare_relations"></a>
RGXLog allows you to define and query relations.
You have to declare a relation before you can use it (unless you define it with a rule, as shown in the [rules](#rules) chapter). Each term in a relation can be a string, an integer or a span. Here are some examples of declaring relations:

In [None]:
%%spanner
# 'brothers' is a relation with two string terms.
new brothers(str, str)
# 'confused' is a relation with one string term.
new confused(str)
# 'animal' is a relation with one string term and one span term 
new animal(str, spn)
# 'scores' is a relation with one string term and one int term
new scores(str, int)

# Facts<a class="anchor" id="facts"></a>
RGXLog is an extension of Datalog, a declarative logic programming language. In Datalog you can declare "facts", essentially adding tuples to a relation. To do it you use the following syntax:

```
relation_name(term_1,term_2,...term_n)
```

or

```
relation_name(term_1,term_2,...term_n) <- True
```

where each `term` is either a constant or a local variable whose type matches the type declared for that position in `relation_name`.

For example:

In [None]:
%%spanner
# first declare the relation that you want to use
new noun(str, spn)
# now you can add facts (tuples) to that relation
# this span indicates that a noun "Life" can be found at indexes 0 to 3
noun("Life finds a way", [0,4)) 
# another example
new sisters(str, str)
sisters("alice", "rin") <- True
# sisters([0,4), "rin") # illegal as [0,4) is not a string

You can also remove a fact using a similar syntax:

```
relation_name(term_1,term_2,...term_n) <- False
```

If the fact you try to remove does not exist, the remove statement is silently ignored.


In [None]:
%%spanner
new goals(str, int)
goals("kronovi", 10)
goals("kronovi", 10) <- False  # 'goals' relation is now empty
goals("kronovi", 10) <- False  # this statement does nothing
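
One way to picture adding and removing facts is to model a relation as a Python set of tuples. This is only an analogy under that assumption and says nothing about how RGXLog stores relations internally.

```python
# A relation modelled as a set of tuples (an analogy, not RGXLog's storage).
goals = set()

goals.add(("kronovi", 10))      # goals("kronovi", 10)
goals.discard(("kronovi", 10))  # goals("kronovi", 10) <- False
goals.discard(("kronovi", 10))  # removing a missing fact is silently ignored

print(goals)
```

`set.discard` is used rather than `set.remove` because it does nothing when the element is absent, which mirrors the silent behaviour described above.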

# Rules<a class="anchor" id="rules"></a>
Datalog allows you to deduce new tuples for a relation.
RGXLog includes this feature as well:

In [None]:
%%spanner
new parent(str ,str)
parent("bob", "greg")
parent("greg", "alice")
# now add a rule that deduces that bob is a grandparent of alice
grandparent(X,Z) <- parent(X,Y), parent(Y,Z) # ',' is shorthand for the 'and' operator
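
The rule above behaves like a join of `parent` with itself on the shared free variable `Y`. A minimal Python sketch of that reading, assuming relations are modelled as sets of tuples (this is not the engine's implementation):

```python
# Facts of the 'parent' relation from the cell above.
parent = {("bob", "greg"), ("greg", "alice")}

# grandparent(X,Z) <- parent(X,Y), parent(Y,Z)
# Pair up every two facts whose shared variable Y has the same value.
grandparent = {
    (x, z)
    for (x, y1) in parent
    for (y2, z) in parent
    if y1 == y2
}

print(grandparent)
```

Only `X` and `Z` appear in the rule head, so `Y` is used for matching and then dropped from the result.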

# Queries<a class="anchor" id="queries"></a>
Querying is very simple in RGXLog. You can query by using constant values, local variables and free variables:

In [None]:
%%spanner
# first create a relation with some facts for the example
new grandfather(str, str)
# bob and george are the grandfathers of alice and rin
grandfather("bob", "alice")
grandfather("bob", "rin")
grandfather("george", "alice")
grandfather("george", "rin")
# edward is the grandfather of john
grandfather("edward", "john")

# now for the queries
?grandfather("bob", "alice") # returns an empty tuple () as alice is bob's grandchild
?grandfather("edward", "alice") # returns nothing as alice is not edward's grandchild
?grandfather("george", X) # returns "rin" and "alice" as both rin
# and alice are george's grandchildren
?grandfather(X, "rin") # returns "bob" and "george" (rin's grandfathers)
?grandfather(X, Y) # returns all the tuples in the 'grandfather' relation

new verb(str, spn)
verb("Ron eats quickly.", [4,8))
verb("You write neatly.", [4,9))
?verb("Ron eats quickly.", X) # returns [4,8)
?verb(X,[4,9)) # returns "You write neatly."
         
new orders(str, int)
orders("pie", 4)
orders("pizza", 4)
orders("cake", 0)
?orders(X, 4) # returns "pie" and "pizza"

You may have noticed that the query

```
?grandfather("bob", "alice")
```

returns an empty tuple. This is because the fact that bob is alice's grandfather is true, but the query contains no free variables from which to construct a result tuple, so the result is a single empty tuple.

A good example for using free variables to construct a relation is the query:

```
?grandfather("george", X)
```

which finds all of george's grandchildren (`X`) and constructs a tuple for each one.
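
The behaviour described above (constants filter, free variables select what is returned) can be sketched in Python. The `FREE` marker and the `query` function are hypothetical names used only for this illustration, and the sketch does not handle a free variable repeated within one query.

```python
FREE = object()  # hypothetical marker standing in for a free variable


def query(relation, *terms):
    """Return the set of result tuples for a query over a set of facts."""
    results = set()
    for fact in relation:
        # A fact matches when every constant term equals the value at its position.
        if len(fact) == len(terms) and all(
            term is FREE or term == value for term, value in zip(terms, fact)
        ):
            # Only the positions bound to free variables appear in the result.
            results.add(tuple(v for t, v in zip(terms, fact) if t is FREE))
    return results


grandfather = {
    ("bob", "alice"), ("bob", "rin"),
    ("george", "alice"), ("george", "rin"),
    ("edward", "john"),
}

print(query(grandfather, "bob", "alice"))     # one empty tuple: the fact holds
print(query(grandfather, "edward", "alice"))  # empty set: the fact does not hold
print(query(grandfather, "george", FREE))     # one 1-tuple per grandchild
```

With no free variables there is nothing to put in the result tuple, so a true fact yields a single empty tuple and a false one yields no tuples at all.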

# Functional regex formulas<a class="anchor" id="RGX_ie"></a>
RGXLog supports information extraction using regular expressions and named capture groups (for now in rule bodies only).
You first need to define a string variable, either with a literal or by reading from a file; you can then use the following syntax in a rule body:

```
RGX(regex_input ,regex_formula)->(x_1, x_2, ...,x_n)
```

or

```
RGXString(regex_input ,regex_formula)->(x_1, x_2, ...,x_n)
```

where:
* `regex_input` is the string that the regex operation will be performed on
* `regex_formula` is either a string literal or a string variable that represents your regular expression.
* `x_1`, `x_2`, ... `x_n` can be either constant terms or free variable terms. They are used to construct the tuples of the resulting relation. The number of terms must equal the number of capture groups used in `regex_formula`. If no capture groups are used, each returned tuple contains a single whole-regex match, so exactly one term should be used.

The only difference between the `RGX` and `RGXString` IE functions is that `RGX` returns spans while `RGXString` returns strings. This also means that constant terms used as return values must be spans with `RGX` and strings with `RGXString`.

For example:

In [None]:
%%spanner
report = "In 2019 we earned 2000 EUR"
annual_earning(Year,Amount) <- RGXString(report,"(\d\d\d\d).*?(\d+)")->(Year,Amount)
?annual_earning(X,Y) # returns ("2019", "2000")
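
The difference between returning strings and returning spans can be illustrated with Python's standard `re` module. This is a sketch of the idea only: it assumes non-overlapping matches as produced by `re.finditer`, which may differ from how RGXLog enumerates matches.

```python
import re

report = "In 2019 we earned 2000 EUR"
pattern = re.compile(r"(\d\d\d\d).*?(\d+)")

# String-valued result: the text captured by each group.
strings = [match.groups() for match in pattern.finditer(report)]

# Span-valued result: the (start, end) position of each group instead.
spans = [
    tuple(match.span(i) for i in range(1, pattern.groups + 1))
    for match in pattern.finditer(report)
]

print(strings)
print(spans)
```

Both lists describe the same match; one carries the captured text and the other carries where that text sits in `report`.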

# Custom information extractors<a class="anchor" id="custom_ie"></a>
RGXLog allows you to define and use your own information extractors. You can use them only in rule bodies in the current version. The following is the syntax for custom information extractors:

```
func(term_1,term_2,...term_n)->(x_1, x_2, ..., x_n)
```

where:
* `func` is an IE function that was previously defined and registered (see the 'advanced_usage' tutorial)
* `term_1`,`term_2`,...,`term_n` are the parameters for `func`
* `x_1`, ... `x_n` can be terms of any type and are used to construct the tuples of the resulting relation (the number of output terms need not equal the number of parameters)

For example:

In [None]:
%%spanner
new grandmother(str, str)
grandmother("rin", "alice")
grandmother("denna", "joel")
sentence = "rin is happy, denna is sad."
# note that this statement will fail as 'get_happy' is not registered as an ie_function
happy_grandmother(X) <- grandmother(X,Z),get_happy(sentence)->(X)
?happy_grandmother(X) # assuming get_happy returned "rin", also returns "rin"
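
For illustration, a `get_happy` extractor could be an ordinary Python generator that yields one tuple per match. The body below is a hypothetical sketch: the name comes from the example above, but the matching logic is invented here, and the function would still have to be registered (see the 'advanced_usage' tutorial) before a rule could call it.

```python
import re
from typing import Iterator, Tuple


def get_happy(text: str) -> Iterator[Tuple[str]]:
    """Hypothetical IE function: yield a 1-tuple for every '<name> is happy'."""
    for match in re.finditer(r"(\w+) is happy", text):
        yield (match.group(1),)


sentence = "rin is happy, denna is sad."
print(list(get_happy(sentence)))
```

Each yielded tuple has one element because the rule binds the extractor's output to a single term, `X`.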

# Additional small features<a class="anchor" id="small_features"></a>
You can use line overflow escapes if you want to split your statements into multiple lines:

In [None]:
%%spanner
k \
= "some \
string"

# RGXLog program example<a class="anchor" id="example_program"></a>

In [None]:
%%spanner
new lecturer(str, str)
lecturer("walter", "chemistry")
lecturer("linus", "operating systems")
lecturer("rick", "physics")

new enrolled(str, str)
enrolled("abigail", "chemistry")
enrolled("abigail", "operating systems")
enrolled("jordan", "chemistry")
enrolled("gale", "operating systems")
enrolled("howard", "chemistry")
enrolled("howard", "physics")

enrolled_in_chemistry(X) <- enrolled(X, "chemistry")
?enrolled_in_chemistry("jordan") # returns empty tuple ()
?enrolled_in_chemistry("gale") # returns nothing
?enrolled_in_chemistry(X) # returns "abigail", "jordan" and "howard"

enrolled_in_physics_and_chemistry(X) <- enrolled(X, "chemistry"), enrolled(X, "physics")
?enrolled_in_physics_and_chemistry(X) # returns "howard"

lecturer_of(X,Z) <- lecturer(X,Y), enrolled(Z,Y)
?lecturer_of(X,"abigail") # returns "walter" and "linus"

gpa_str = "abigail 100 jordan 80 gale 79 howard 60"
gpa_of_chemistry_students(Student, Grade) <- \
RGXString(gpa_str, "(\w+).*?(\d+)")->(Student, Grade), enrolled_in_chemistry(Student)
?gpa_of_chemistry_students(X, "100") # returns "abigail"

## Custom example

In [None]:
%%bash
# git clone https://github.com/DeanLight/spanner_workbench
# cd spanner_workbench
# pip install -e src/rgxlog-interpreter 

In [1]:
%load_ext autoreload
%autoreload 2

import rgxlog
from rgxlog import magic_client,Session
import rgxlog.magic.rgxlog_magic

In [2]:
%%spanner
# define relation and some facts about it
new parent(str ,str)
parent("bob", "greg")
parent("greg", "alice")
parent("greg", "janice")

In [3]:
%%spanner
# now add a rule that deduces that bob is a grandparent of alice
grandparent(X,Z) <- parent(X,Y), parent(Y,Z) # ',' is shorthand for the 'and' operator

In [4]:
%%spanner
# now for the queries
?parent("greg", ChildsOfGreg) # returns all children of greg
?grandparent(X, Y) # returns all tuples in the grandparent relation

printing results for query 'parent("greg", ChildsOfGreg)':
  ChildsOfGreg
----------------
     alice
     janice

printing results for query 'grandparent(X, Y)':
  X  |   Y
-----+--------
 bob | alice
 bob | janice



In [5]:
! cat sons.csv

bob,adam
adam,oren
jane,bob
elmo,bernie
bernie,casie

In [6]:
magic_client.import_relation_from_csv("sons.csv",relation_name="parent",delimiter=",")

In [7]:
%%spanner
?parent(X,Y) # returns all tuples in the 'parent' relation

printing results for query 'parent(X, Y)':
   X    |   Y
--------+--------
 bernie | casie
  elmo  | bernie
  jane  |  bob
  adam  |  oren
  bob   |  adam
  greg  | janice
  greg  | alice
  bob   |  greg



In [8]:
%%spanner
?grandparent(X, Y)

printing results for query 'grandparent(X, Y)':
  X   |   Y
------+--------
 elmo | casie
 jane |  adam
 jane |  greg
 bob  |  oren
 bob  | alice
 bob  | janice
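
The table above can be reproduced outside RGXLog as a sanity check. The sketch below is plain Python, assuming relations are sets of tuples; it inlines the contents of `sons.csv` as a string so that it is self-contained, merges those rows into `parent`, and re-derives `grandparent`. It is not how `import_relation_from_csv` is implemented.

```python
import csv
import io

# Contents of sons.csv as printed earlier, inlined to keep the sketch self-contained.
sons_csv = "bob,adam\nadam,oren\njane,bob\nelmo,bernie\nbernie,casie\n"

# Facts that were already in 'parent' before the import.
parent = {("bob", "greg"), ("greg", "alice"), ("greg", "janice")}

# Importing adds one tuple per non-empty CSV row to the existing relation.
rows = csv.reader(io.StringIO(sons_csv), delimiter=",")
parent |= {tuple(row) for row in rows if row}

# grandparent(X,Z) <- parent(X,Y), parent(Y,Z)
grandparent = {(x, z) for (x, y1) in parent for (y2, z) in parent if y1 == y2}

for pair in sorted(grandparent):
    print(pair)
```

The rule is re-evaluated over the enlarged `parent` relation, which is why grandparents such as `jane` and `elmo` now appear.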



In [9]:
grand_parents=magic_client.query_into_df("?grandparent(Grandparent, Grandchild)")
grand_parents

  Grandparent Grandchild
0        elmo      casie
1        jane       adam
2        jane       greg
3         bob       oren
4         bob      alice
5         bob     janice


In [10]:
%load_ext autoreload
%autoreload 2

from rgxlog.engine.datatypes.primitive_types import DataTypes
#from rgxlog.stdlib.nlp import   lemma_wrapper, ner_wrapper, dependency_parse_wrapper                                
from rgxlog.stdlib.regex import rgx_string,rgx_string_out_types



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [11]:
magic_client.register(
                ie_function=rgx_string,
                ie_function_name='RGXString',
                in_rel=[DataTypes.string, DataTypes.string],
                out_rel=rgx_string_out_types,
                )

In [12]:
%%spanner
bank_records= "bob 100 elmo 80 jane 79 alice 60"
money(P, Money) <- \
RGXString(bank_records, "(\w+).*?(\d+)")->(P, Money)



In [13]:
%%spanner
?money(P,M)

printing results for query 'money(P, M)':
   P   |   M
-------+-----
 alice |  60
 jane  |  79
 elmo  |  80
  bob  | 100



In [15]:
%load_ext autoreload
%autoreload 2

from rgxlog.stdlib.nlp import   lemma_wrapper, ner_wrapper, dependency_parse_wrapper                                


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [19]:
sentence = "John fathered Greg wonderfully"
print("NER")
print(list(ner_wrapper(sentence)))
print("lemma")
print(list(lemma_wrapper(sentence)))
print("dep_parse")
print(list(dependency_parse_wrapper(sentence)))


NER
[('John', 'PERSON', (0, 4)), ('Greg', 'PERSON', (14, 18))]
lemma
[('John', 'John', (0, 4)), ('fathered', 'father', (5, 13)), ('Greg', 'Greg', (14, 18)), ('wonderfully', 'wonderfully', (19, 30))]
dep_parse
[('ROOT', 0, 'ROOT', 2, 'fathered'), ('nsubj', 2, 'fathered', 1, 'John'), ('obj', 2, 'fathered', 3, 'Greg'), ('advmod', 2, 'fathered', 4, 'wonderfully')]


In [16]:
magic_client.register(ie_function=lemma_wrapper,
             ie_function_name='Lemma',
             in_rel=[DataTypes.string],
             out_rel=[DataTypes.string, DataTypes.string, DataTypes.span],
             )

magic_client.register(ie_function=ner_wrapper,
           ie_function_name='NER',
           in_rel=[DataTypes.string],
           out_rel=[DataTypes.string, DataTypes.string, DataTypes.span],
           )



magic_client.register(ie_function=dependency_parse_wrapper,
                ie_function_name='DepParse',
                in_rel=[DataTypes.string],
                out_rel=[DataTypes.string, DataTypes.integer, DataTypes.string, DataTypes.integer, DataTypes.string]
                )




In [25]:
%%spanner
sentence="John fathered Greg wonderfully"

father_of123(F,S) <- \
NER(sentence)->(F,"PERSON",T1), \
NER(sentence)->(S,"PERSON",T2), \
Lemma(sentence)->(FATHERLIKE,"father",T3), \
DepParse(sentence)->("nsubj",T41,FATHERLIKE,T42,F), \
DepParse(sentence)->("obj",T51,FATHERLIKE,T52,S)

#?father_of123(F,S)

In [26]:
%%spanner
#sentence="John fathered Greg wonderfully"

#father_of123(F,S)<- NER(sentence)->(F,"PERSON",T1),NER(sentence)->(S,"PERSON",T2),Lemma(sentence)->(FATHERLIKE ,"father",T3),DepParse(sentence)->("nsubj",T41,FATHERLIKE,T42,F),DepParse(sentence)->("obj",T51,FATHERLIKE,T52,S)

?father_of123(F,S)

printing results for query 'father_of123(F, S)':
  F   |  S
------+------
 John | Greg
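
To see why the rule yields exactly this tuple, the same join can be traced in plain Python over the extractor outputs printed earlier for this sentence. This is an illustrative sketch with the tuples copied from that output; it does not call the NLP wrappers or the RGXLog engine.

```python
# Extractor outputs for "John fathered Greg wonderfully", copied from above.
ner = [("John", "PERSON", (0, 4)), ("Greg", "PERSON", (14, 18))]
lemma = [
    ("John", "John", (0, 4)),
    ("fathered", "father", (5, 13)),
    ("Greg", "Greg", (14, 18)),
    ("wonderfully", "wonderfully", (19, 30)),
]
dep = [
    ("ROOT", 0, "ROOT", 2, "fathered"),
    ("nsubj", 2, "fathered", 1, "John"),
    ("obj", 2, "fathered", 3, "Greg"),
    ("advmod", 2, "fathered", 4, "wonderfully"),
]

# NER(sentence)->(F,"PERSON",T1): words tagged as persons.
persons = {word for word, label, _ in ner if label == "PERSON"}
# Lemma(sentence)->(FATHERLIKE,"father",T3): words whose lemma is "father".
father_like = {word for word, lem, _ in lemma if lem == "father"}

# Join the subject and object edges that share the same father-like head word.
father_of = {
    (f, s)
    for (rel1, _, head1, _, f) in dep
    if rel1 == "nsubj" and head1 in father_like and f in persons
    for (rel2, _, head2, _, s) in dep
    if rel2 == "obj" and head2 == head1 and s in persons
}

print(father_of)
```

The shared free variable `FATHERLIKE` is what ties the three extractors together: the subject and object must hang off the same word, and that word must have the lemma "father".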



In [None]:
%%spanner
family_tree_report = "alice "