# Spanner Workbench Introduction
In this tutorial you will learn the basics of Spanner Workbench:
* [how to install, import and use RGXlog](#use_rgxlog)
* [local and free variables](#local_and_free_vars)
* [local variable assignment](#local_var_assignment)
* [reading from a file](#read_a_file)
* [declaring a relation](#declare_relations)
* [adding facts](#facts)
* [adding rules](#rules)
* [queries](#queries)
* [using RGXlog's primitive information extractor: functional regex formulas](#RGX_ie)
* [using custom information extractors](#custom_ie)
* [additional small features](#small_features)

At the end of this tutorial there is also an [example of a small RGXlog program.](#example_program)

# Using RGXlog<a class="anchor" id="use_rgxlog"></a>
Prerequisites:

* Have [Python](https://www.python.org/downloads/) version 3.8 or above installed

To install RGXlog run the following command in your terminal:

```
python3 -m pip install --upgrade --index-url https://test.pypi.org/simple/ --no-deps my-pkg-coldfear-rgxlog-interpreter
```

(This command is likely to change in the future)

* If this command doesn't work, try calling python instead of python3 (or whatever name you have for your python installation)

You can also install RGXlog in the current Jupyter kernel:

In [None]:
import sys
# sys.executable is the Python interpreter that runs the current kernel
!{sys.executable} -m pip install --upgrade --index-url https://test.pypi.org/simple/ --no-deps my-pkg-coldfear-rgxlog-interpreter

In order to use RGXlog, you must first import it:

In [None]:
import rgxlog

Now, whenever you want a cell to use RGXlog, simply type '%%spanner' at the beginning
of that cell. For example:

In [None]:
%%spanner
new uncle(str, str)
uncle("bob", "greg")

# Local and free variables<a class="anchor" id="local_and_free_vars"></a>

RGXlog distinguishes two kinds of variables.

The first kind are local variables. A local variable stores a single value (e.g. a string) and works similarly to a variable in Python. A local variable must be defined via assignment before it is used.

A local variable name must begin with a lowercase letter or an underscore (_), and can continue with letters, digits and underscores.

Here are some examples for legal local variable names:
* a
* a_name123
* _Some_STRING

And here are some illegal local variable names:
* A
* A_name
* 1_a


The second kind of variables are free variables. Free variables do not hold any value and are used to define relations inside queries and rules (you will see a lot of examples for these in later chapters). Free variables do not need to be declared or defined before being used.

A free variable name must begin with an uppercase letter and can continue with letters, digits and underscores.

Here are some examples for legal free variable names:
* A
* A_name

And here are some illegal free variable names:
* a
* a_name
* _Some_STRING
* 1A
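
The naming rules for both kinds of variables can be written down as two regular expressions. The following Python sketch is only an illustration of those rules and is not part of RGXlog; the function names are made up for this example, and it assumes that "letters" means ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters; the real RGXlog grammar may differ.
LOCAL_VAR = re.compile(r"[a-z_][A-Za-z0-9_]*")
FREE_VAR = re.compile(r"[A-Z][A-Za-z0-9_]*")


def is_local_var_name(name: str) -> bool:
    """True if name starts with a lowercase letter or an underscore."""
    return LOCAL_VAR.fullmatch(name) is not None


def is_free_var_name(name: str) -> bool:
    """True if name starts with an uppercase letter."""
    return FREE_VAR.fullmatch(name) is not None


# Print, for each candidate name, whether it is a legal local / free variable name.
for name in ["a", "a_name123", "_Some_STRING", "A", "A_name", "1_a", "1A"]:
    print(name, is_local_var_name(name), is_free_var_name(name))
```

Because the two patterns require different first characters, no name can be both a local and a free variable name.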


# Local variable assignment<a class="anchor" id="local_var_assignment"></a>
A local variable can hold one of two types of values: a string or a span.
The assignment of a string is intuitive:

In [None]:
%%spanner
b = "bob"
b2 = b # b2's value is "bob"
# you can write multiline strings using a line overflow escape like in python
b3 = "this is a multiline " \
"string"
b4 = "this is a multiline string" # b4 holds the same value as b3

A span identifies a substring of a string by specifying its bounding indices. It is constructed from two integers and written as `[start,end)`, where the start index is included and the end index is excluded.
You can assign a span value like this:

In [None]:
%%spanner
span1 = [3,7)
span2 = span1 # span2 value is [3,7)
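
To get a feeling for which characters a span covers, it can help to compare it with Python slicing. The sketch below is a hypothetical Python model of a span, not RGXlog's own implementation; it assumes that `[3,7)` includes index 3 and excludes index 7, as the bracket notation suggests.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical model of a span [start, end): start inclusive, end exclusive."""
    start: int
    end: int

    def extract(self, text: str) -> str:
        # Python slices are half-open too, so the indices map directly.
        return text[self.start:self.end]


span1 = Span(3, 7)
span2 = span1  # span2 holds the same value as span1
print(span2.extract("In 2019 we earned 2000 EUR"))  # prints 2019
```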

# Reading from a file<a class="anchor" id="read_a_file"></a>
You can also perform a string assignment by reading from a file. You will need to provide a path to a file by either using a string literal or a string variable:

In [None]:
%%spanner
a = read("path/to/file")
b = "path/to/file" 
c = read(b) # c holds the same string value as a
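
For comparison, here is roughly what the cell above would look like in plain Python. The `read` function below is a hypothetical stand-in written for this sketch; details such as the encoding RGXlog uses are assumptions. A temporary file replaces "path/to/file" so that the example can run anywhere.

```python
import tempfile
from pathlib import Path


def read(path: str) -> str:
    """Hypothetical stand-in for RGXlog's read: return the file's text."""
    return Path(path).read_text(encoding="utf-8")  # assumption: UTF-8 text


with tempfile.TemporaryDirectory() as tmp:
    file_path = str(Path(tmp) / "report.txt")
    Path(file_path).write_text("In 2019 we earned 2000 EUR", encoding="utf-8")

    a = read(file_path)  # path given directly
    b = file_path
    c = read(b)          # path given through a variable
    print(a == c)        # prints True
```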

# Declaring a relation<a class="anchor" id="declare_relations"></a>
RGXlog allows you to define and query relations.
You have to declare a relation before you can use it (unless you define it with a rule as we'll see in the "rules" chapter). Each term in a relation could be a string or a span. Here are some examples for declaring relations:

In [None]:
%%spanner
# 'brothers' is a relation with two string terms.
new brothers(str, str)
# 'confused' is a relation with one string term.
new confused(str)
# 'verb' is a relation with one string term, and one span term 
new verb(str, spn)

# Facts<a class="anchor" id="facts"></a>
RGXlog is an extension of Datalog, a declarative logic programming language. In Datalog you can declare "facts", which add tuples to a relation. To declare a fact, use the following syntax:

relation_name(term_1,term_2,...,term_n)

where each term is either a constant or a local variable whose type matches the type declared for relation_name at the same position.

For example:

In [None]:
%%spanner
# first declare the relation that you want to use
new noun(str, spn)
# now you can add facts (tuples) to that relation
# this span indicates that a noun "Life" can be found at indexes 0 to 3
noun("Life finds a way", [0,4)) 
# another example
new sisters(str, str)
sisters("alice", "rin")
# sisters([0,4), "rin") # illegal as [0,4) is not a string
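
The type rule for facts can be pictured with a small Python model. Everything in this sketch (the `Relation` and `Span` classes and the `add_fact` method) is hypothetical and only mimics the behaviour described above; it is not how RGXlog is implemented.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int


class Relation:
    """Hypothetical in-memory model of a declared relation."""

    def __init__(self, name, *term_types):
        self.name = name
        self.term_types = term_types  # e.g. (str, Span)
        self.facts = set()

    def add_fact(self, *terms):
        # A fact needs one term per declared position, each of the declared type.
        if len(terms) != len(self.term_types):
            raise TypeError(f"{self.name} expects {len(self.term_types)} terms")
        for position, (term, expected) in enumerate(zip(terms, self.term_types)):
            if not isinstance(term, expected):
                raise TypeError(
                    f"term {position} of {self.name} must be {expected.__name__}"
                )
        self.facts.add(terms)


noun = Relation("noun", str, Span)
noun.add_fact("Life finds a way", Span(0, 4))

sisters = Relation("sisters", str, str)
sisters.add_fact("alice", "rin")
try:
    sisters.add_fact(Span(0, 4), "rin")  # a span where a string was declared
except TypeError as error:
    print("rejected:", error)
```

Storing the facts in a set mirrors the idea that a relation is a set of tuples, so adding the same fact twice has no further effect in this model.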

# Rules<a class="anchor" id="rules"></a>
Datalog allows you to deduce new tuples for a relation.
RGXlog includes this feature as well:

In [None]:
%%spanner
new parent(str ,str)
parent("bob", "greg")
parent("greg", "alice")
# now add a rule that deduces that bob is a grandparent of alice
grandparent(X,Z) <- parent(X,Y), parent(Y,Z) # ',' is shorthand for the 'and' operator
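
One way to read the rule above is as a join of the `parent` relation with itself on the shared free variable `Y`. The following Python sketch computes the same tuples with a set comprehension; it is meant only as an illustration of the rule's meaning, not of how RGXlog evaluates rules.

```python
parent = {("bob", "greg"), ("greg", "alice")}

# grandparent(X,Z) <- parent(X,Y), parent(Y,Z)
grandparent = {
    (x, z)
    for (x, y1) in parent
    for (y2, z) in parent
    if y1 == y2  # the shared free variable Y must take the same value in both atoms
}
print(grandparent)  # {('bob', 'alice')}
```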

# Queries<a class="anchor" id="queries"></a>
Querying is very simple in RGXlog. You can query by using constant values, local variables and free variables:

In [None]:
%%spanner
# first create a relation with some facts for the example
new grandfather(str, str)
# bob and george are the grandfathers of alice and rin
grandfather("bob", "alice")
grandfather("bob", "rin")
grandfather("george", "alice")
grandfather("george", "rin")
# edward is the grandfather of john
grandfather("edward", "john")

# now for the queries
?grandfather("bob", "alice") # returns an empty tuple () as alice is bob's grandchild
?grandfather("edward", "alice") # returns nothing as alice is not edward's grandchild
?grandfather("george", X) # returns "rin" and "alice" as both rin
# and alice are george's grandchildren
?grandfather(X, "rin") # returns "bob" and "george" (rin's grandfathers)
?grandfather(X, Y) # returns all the tuples in the 'grandfather' relation

new verb(str, spn)
verb("Ron eats quickly.", [4,8))
verb("You write neatly.", [4,9))
?verb("Ron eats quickly.", X) # returns [4,8)
?verb(X,[4,9)) # returns "You write neatly."                            

You may have noticed that the query

```
?grandfather("bob", "alice")
```

returns an empty tuple. The fact that bob is alice's grandfather is true, but the query contains no free variables from which to construct result tuples, so the result is a single empty tuple.

A good example for using free variables to construct a relation is the query:

```
?grandfather("george", X)
```

which finds all of george's grandchildren (X) and constructs a tuple for each one.
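
The behaviour of queries can be summarised as "constants filter, free variables build the result". The Python sketch below models this on a subset of the facts; the `FREE` marker and the `query` function are hypothetical helpers written for this illustration, and the sketch ignores the case where the same free variable appears more than once.

```python
FREE = object()  # hypothetical marker that stands in for a free variable


def query(relation, *pattern):
    """Return one tuple per matching fact, built from the free-variable positions."""
    results = set()
    for fact in relation:
        # Constants must be equal to the fact's terms; free variables match anything.
        if all(p is FREE or p == term for p, term in zip(pattern, fact)):
            results.add(tuple(t for p, t in zip(pattern, fact) if p is FREE))
    return results


grandfather = {
    ("bob", "alice"), ("bob", "rin"),
    ("george", "alice"), ("george", "rin"),
    ("edward", "john"),
}

print(query(grandfather, "bob", "alice"))     # {()}: true, but nothing to construct
print(query(grandfather, "edward", "alice"))  # set(): no matching fact
print(query(grandfather, "george", FREE))     # one 1-tuple per grandchild of george
```

In this model a true query without free variables yields a set containing one empty tuple, while a false one yields an empty set.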

# Functional regex formulas<a class="anchor" id="RGX_ie"></a>
RGXlog supports information extraction using regular expressions and named capture groups (for now in rule bodies only).
You will first need to define a string variable either by using a literal or a load from a file, and then you can use the following syntax in a rule body:

extract RGX\<term_1,term_2,...term_n>(x_1, x_2, ...,x_n) from s

where:
* term_1,term_2,...,term_n are strings that represent regular expressions in .NET syntax (for now we also allow all data types, not just strings, TBD what they're used for)
* x_1, x_2, ... x_n can be any terms including capture groups that appear in the regular expressions. They're used to construct the tuples of the resulting relation
* s is a string variable that the information will be extracted from

For example:

In [None]:
%%spanner
report = "In 2019 we earned 2000 EUR"
# you can use line overflow escape to separate your statement (like in python)
# \D+ (one or more non-digits) between the groups lets Amount capture the whole number
annual_earning(Year,Amount) <- extract RGX<".*(?<Year>\d\d\d\d)\D+(?<Amount>\d+)\sEUR"> \
(Year, Amount) from report
?annual_earning(X,Y) # returns ("2019", "2000")
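
Named capture groups can be tried out in plain Python before writing a rule. The sketch below uses Python's `re` module, which is an assumption of this example and not the engine RGXlog uses: Python writes named groups as `(?P<name>...)` rather than the .NET form `(?<name>...)`, and its matching semantics may differ from RGXlog's.

```python
import re

report = "In 2019 we earned 2000 EUR"

# Groups separated by \D+ (non-digits), so each group gets a whole number.
pattern = re.compile(r".*(?P<Year>\d\d\d\d)\D+(?P<Amount>\d+)\sEUR")
match = pattern.fullmatch(report)
print(match.group("Year"), match.span("Year"))      # 2019 (3, 7)
print(match.group("Amount"), match.span("Amount"))  # 2000 (18, 22)

# With a greedy .* directly before the Amount group, Python's backtracking
# engine leaves only the last digit for the group.
greedy = re.fullmatch(r".*(?P<Year>\d\d\d\d).*(?P<Amount>\d+)\sEUR", report)
print(greedy.group("Amount"))  # 0
```

The `span` method reports where each group matched as a (start, end) pair with an exclusive end, which corresponds to the `[start,end)` notation used for spans in this tutorial.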

# Custom information extractors<a class="anchor" id="custom_ie"></a>
RGXlog allows you to define and use your own information extractors. You can use them only in rule bodies (TBD). The following is the syntax for custom information extractors:

func<term_1,term_2,...term_n>(x_1, x_2, ..., x_n)

where:
* func is an IE function that was previously defined (TBD where it was defined)
* term_1,term_2,...,term_n are the parameters for func
* x_1, ... x_n could be any type of terms, and are used to construct tuples of the resulting relation

For example:

In [None]:
%%spanner
new grandmother(str, str)
grandmother("rin", "alice")
grandmother("denna", "joel")
sentence = "rin is happy, denna is sad."
happy_grandmother(X) <- grandmother(X,Z),get_happy<sentence>(X)
?happy_grandmother(X) # assuming get_happy returned "rin", also returns "rin"
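
Since the way IE functions are defined is still TBD, the following Python sketch only guesses at what the logic behind `get_happy` could look like. The function body, its signature and the way its output is combined with `grandmother` are all assumptions made for illustration; none of this is RGXlog's actual API.

```python
import re


def get_happy(sentence: str):
    """Hypothetical IE function: yield a 1-tuple for each name followed by 'is happy'."""
    for match in re.finditer(r"(\w+) is happy", sentence):
        yield (match.group(1),)


grandmother = {("rin", "alice"), ("denna", "joel")}
sentence = "rin is happy, denna is sad."

# happy_grandmother(X) <- grandmother(X,Z), get_happy<sentence>(X)
happy = set(get_happy(sentence))
happy_grandmother = {(x,) for (x, _z) in grandmother if (x,) in happy}
print(happy_grandmother)  # {('rin',)}
```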

# Additional small features<a class="anchor" id="small_features"></a>
You can use line overflow escapes if you want to split your statements into multiple lines:

In [None]:
%%spanner
b \
= "some " \
"string"

# RGXlog program example<a class="anchor" id="example_program"></a>

In [None]:
%%spanner
new lecturer(str, str)
lecturer("walter", "chemistry")
lecturer("linus", "operation systems")
lecturer("rick", "physics")

new enrolled(str, str)
enrolled("abigail", "chemistry")
enrolled("abigail", "operation systems")
enrolled("jordan", "chemistry")
enrolled("gale", "operation systems")
enrolled("howard", "chemistry")
enrolled("howard", "physics")

enrolled_in_chemistry(X) <- enrolled(X, "chemistry")
?enrolled_in_chemistry("jordan") # returns empty tuple ()
?enrolled_in_chemistry("gale") # returns nothing
?enrolled_in_chemistry(X) # returns "abigail", "jordan" and "howard"

enrolled_in_physics_and_chemistry(X) <- enrolled(X, "chemistry"), enrolled(X, "physics")
?enrolled_in_physics_and_chemistry(X) # returns "howard"

lecturer_of(X,Z) <- lecturer(X,Y), enrolled(Z,Y)
?lecturer_of(X,"abigail") # returns "walter" and "linus"

gpa_str = "\n abigail 100\n jordan 80\n gale 79\n howard 60\n"
# each line of gpa_str has the form " <student> <grade>"
gpa_of_chemistry_students(Student, Grade) <- extract \
RGX<".*[\n]\s(?<Student>[a-z]+)\s(?<Grade>\d+)[\n].*">(Student, Grade) from gpa_str, \
enrolled_in_chemistry(Student)
?gpa_of_chemistry_students(X, "100") # returns "abigail"
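
The results claimed in the comments above can be cross-checked with a few lines of plain Python, modelling each relation as a set of tuples. This is a sketch for verification only: the regular expression is a Python pattern written for this example (with `(?P<name>...)` groups), and Python's `re` module may behave differently from RGXlog's regex engine.

```python
import re

lecturer = {("walter", "chemistry"), ("linus", "operation systems"), ("rick", "physics")}
enrolled = {
    ("abigail", "chemistry"), ("abigail", "operation systems"),
    ("jordan", "chemistry"), ("gale", "operation systems"),
    ("howard", "chemistry"), ("howard", "physics"),
}

# enrolled_in_chemistry(X) <- enrolled(X, "chemistry")
enrolled_in_chemistry = {(s,) for (s, c) in enrolled if c == "chemistry"}

# lecturer_of(X,Z) <- lecturer(X,Y), enrolled(Z,Y)
lecturer_of = {(l, s) for (l, c1) in lecturer for (s, c2) in enrolled if c1 == c2}

gpa_str = "\n abigail 100\n jordan 80\n gale 79\n howard 60\n"
# One (student, grade) pair per line of the form " <student> <grade>".
grades = {
    (m["Student"], m["Grade"])
    for m in re.finditer(r"\s(?P<Student>[a-z]+)\s(?P<Grade>\d+)\n", gpa_str)
}
gpa_of_chemistry_students = {
    (s, g) for (s, g) in grades if (s,) in enrolled_in_chemistry
}

print(sorted(l for (l, s) in lecturer_of if s == "abigail"))      # ['linus', 'walter']
print([s for (s, g) in gpa_of_chemistry_students if g == "100"])  # ['abigail']
```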