# Introduction

In this tutorial you will learn the basics of spanner workbench.

:::{.callout-note}
This project is built with nbdev, a full literate programming environment built on Jupyter Notebooks. This means that every piece of documentation, including the page you’re reading now, can be accessed as an interactive Jupyter notebook. <br>
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/DeanLight/spanner_workbench/blob/master/nbs/introduction.ipynb)
:::

## Using RGXLog<a class="anchor" id="use_rgxlog"></a>

### Installation

Prerequisites:

* Have [Python](https://www.python.org/downloads/) version 3.8 or above installed

To download and install RGXLog run the following commands in your terminal:

```bash
git clone https://github.com/DeanLight/spanner_workbench
cd spanner_workbench
pip install . 
```

Make sure the `pip` you call belongs to your current Python environment.
To install with a different Python interpreter, run:

```bash
<path_to_python_interpreter> -m pip install .
```

You can also install RGXLog in the current Jupyter kernel:

In [None]:
#| output: false
import os
!git clone https://github.com/DeanLight/spanner_workbench
os.chdir(os.path.join(os.getcwd(),'spanner_workbench'))
!pip install .

fatal: destination path 'spanner_workbench' already exists and is not an empty directory.
Processing /edit_intro_based_on_reviews/nbs/spanner_workbench
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rgxlog
  Building wheel for rgxlog (setup.py) ... done
  Created wheel for rgxlog: filename=rgxlog-0.0.1-py3-none-any.whl size=81151 sha256=b8979051a6250843826d263504502adebb3c94fae655aa360d6ec6a77c5cfefb
  Stored in directory: /tmp/pip-ephem-wheel-cache-ni2euz_q/wheels/0c/c3/f1/614798b4fea9e9e48774f9b2b64aa9d1ea29cd772d6f305203
Successfully built rgxlog
Installing collected packages: rgxlog
  Attempting uninstall: rgxlog
    Found existing installation: rgxlog 0.0.1
    Uninstalling rgxlog-0.0.1:
      Successfully uninstalled rgxlog-0.0.1
Successfully installed rgxlog-0.0.1

In order to use RGXLog in jupyter notebooks, you must first load it:


In [1]:
#| output: false
import rgxlog  # or `load_ext rgxlog`

Importing the RGXLog library automatically loads the `%rgxlog` and `%%rgxlog` magics, which accept RGXLog queries as shown below.<br>
Use `%rgxlog` to run a single line, and `%%rgxlog` to run a block of code:

In [None]:
%rgxlog new relation(str)
print("this is python code")

this is python code


In [None]:
%%rgxlog
new uncle(str, str)
uncle("bob", "greg")
?uncle(X,Y)

printing results for query 'uncle(X, Y)':
  X  |  Y
-----+------
 bob | greg



# Local and free variables<a class="anchor" id="local_and_free_vars"></a>

RGXLog distinguishes two kinds of variables.

The first kind are local variables. These are variables that store a single value (e.g. a string). They work similarly to variables in Python. A local variable must be defined via assignment before being used.

A local variable name must begin with a lowercase letter or an underscore (_), and can continue with letters, digits and underscores.

Here are some examples of legal local variable names:
* `a`
* `a_name123`
* `_Some_STRING`

And here are some illegal local variable names:
* `A`
* `A_name`
* `1_a`


The second kind of variables are free variables. Free variables do not hold any value and are used to define relations inside [queries](#queries) and [rules](#rules). Free variables do not need to be declared or defined before being used.

A free variable name must begin with an uppercase letter and can continue with letters, digits and underscores.

Here are some examples of legal free variable names:
* `A`
* `A_name`

And here are some illegal free variable names:
* `a`
* `a_name`
* `_Some_STRING`
* `1A`
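
The two naming rules can be checked mechanically. The following is a minimal Python sketch that transcribes them into regular expressions; `classify_name` is a hypothetical helper written for this illustration (it is not part of RGXLog), and it assumes that "letters" means ASCII letters.

```python
import re

# Patterns transcribed from the naming rules above (an illustrative check,
# not RGXLog's actual grammar). ASCII letters are assumed.
LOCAL_VAR = re.compile(r"[a-z_][A-Za-z0-9_]*")
FREE_VAR = re.compile(r"[A-Z][A-Za-z0-9_]*")


def classify_name(name: str) -> str:
    """Return 'local', 'free', or 'illegal' for a candidate variable name."""
    if LOCAL_VAR.fullmatch(name):
        return "local"
    if FREE_VAR.fullmatch(name):
        return "free"
    return "illegal"


for candidate in ["a", "a_name123", "_Some_STRING", "A", "A_name", "1_a", "1A"]:
    print(f"{candidate!r}: {classify_name(candidate)}")
```

Note that a name can never be legal as both kinds, since the first character decides which rule applies.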


# Local variable assignment<a class="anchor" id="local_var_assignment"></a>
RGXLog allows you to use three types of variables: strings, integers and spans.
The assignment of a string is intuitive:

In [None]:
%%rgxlog
b = "bob"
b2 = b # b2's value is "bob"
# you can write multiline strings using a line overflow escape like in python
b3 = "this is a multiline \
string"
b4 = "this is a multiline string" # b4 holds the same value as b3

The assignment of integers is also very simple:

In [None]:
%%rgxlog
n = 4
n2 = n # n2 = 4

A span identifies a substring of a string by specifying its bounding indices. It is constructed from two integers.
You can assign a span value like this:

In [None]:
%%rgxlog
span1 = [3,7)
span2 = span1 # span2 value is [3,7)
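
To make the `[start,end)` notation concrete, here is a small Python sketch that treats a span as a half-open index range, which is how the `noun` example in the Facts section uses `[0,4)` to mark "Life". The helper `span_text` is hypothetical and only mirrors Python slicing; it is not an RGXLog function.

```python
def span_text(text: str, start: int, end: int) -> str:
    """Return the substring covered by the half-open span [start, end)."""
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"span [{start},{end}) is out of range for this string")
    return text[start:end]


sentence = "Life finds a way"
print(span_text(sentence, 0, 4))   # Life
print(span_text(sentence, 5, 10))  # finds
```

The start index is included and the end index is excluded, so the length of the substring is `end - start`.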

# Reading from a file<a class="anchor" id="read_a_file"></a>
You can also perform a string assignment by reading from a file. You will need to provide a path to a file by either using a string literal or a string variable:

In [None]:
%%rgxlog
a = read("README.md")
b = "README.md" 
c = read(b) # c holds the same string value as a

# Declaring a relation<a class="anchor" id="declare_relations"></a>
RGXLog allows you to define and query relations.
You have to declare a relation before you can use it (unless you define it with a rule, as we'll see in the [rules](#rules) section). Each term in a relation can be a string, an integer or a span. Here are some examples of declaring relations:

In [None]:
%%rgxlog
# 'brothers' is a relation with two string terms.
new brothers(str, str)
# 'confused' is a relation with one string term.
new confused(str)
# 'animal' is a relation with one string term and one span term 
new animal(str, span)
# 'scores' is a relation with one string term and one int term
new scores(str, int)

Whenever a relation is declared, a corresponding empty table is created in the database. <br>
The table has the same name as the relation, and its number of columns equals the number of terms in the relation.

# Facts<a class="anchor" id="facts"></a>
RGXLog is an extension of Datalog, a declarative logic programming language. In Datalog you can declare "facts", essentially adding tuples to a relation. To do so, use the following syntax:

```
relation_name(term_1, term_2, ..., term_n)
```

or

```
relation_name(term_1, term_2, ..., term_n) <- True
```

where each `term` is either a constant or a local variable whose type matches the type declared for `relation_name` at that position.

For example:

In [None]:
%%rgxlog
# first declare the relation that you want to use
new noun(str, span)
# now you can add facts (tuples) to that relation
# this span indicates that a noun "Life" can be found at indexes 0 to 3
noun("Life finds a way", [0,4)) 
# another example
new sisters(str, str)
sisters("alice", "rin") <- True
# sisters([0,4), "rin") # illegal as [0,4) is not a string

You can also remove a fact using a similar syntax:

```relation_name(term_1, term_2, ..., term_n) <- False```

If the fact you try to remove does not exist, the statement is silently ignored.


```python
%%rgxlog
new goals(str, int)
goals("kronovi", 10)
goals("kronovi", 10) <- False  # 'goals' relation is now empty
goals("kronovi", 10) <- False  # this statement does nothing
```

When you add or remove facts, the relation's corresponding table in the database is updated accordingly.
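
As a mental model only, a relation can be pictured as a typed schema plus a set of tuples. The Python sketch below uses a hypothetical `Relation` class (not RGXLog's storage layer, which is not described here) to mimic declaring a relation, adding a fact, and removing a fact that may or may not exist.

```python
class Relation:
    """A toy relation: a typed schema plus a set of tuples (facts)."""

    def __init__(self, name, *types):
        self.name = name
        self.types = types
        self.facts = set()

    def add(self, *fact):
        """Mimics `rel(...)` or `rel(...) <- True`, with a type check per position."""
        if len(fact) != len(self.types):
            raise TypeError(f"{self.name} expects {len(self.types)} terms")
        for term, expected in zip(fact, self.types):
            if not isinstance(term, expected):
                raise TypeError(f"{term!r} is not of type {expected.__name__}")
        self.facts.add(fact)

    def remove(self, *fact):
        """Mimics `rel(...) <- False`; a missing fact is silently ignored."""
        self.facts.discard(fact)


goals = Relation("goals", str, int)   # like: new goals(str, int)
goals.add("kronovi", 10)
goals.remove("kronovi", 10)  # the relation is now empty
goals.remove("kronovi", 10)  # does nothing
print(goals.facts)           # set()
```

Using a set means that adding the same fact twice leaves a single tuple, and `discard` gives the "silently ignored" behavior for missing facts.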

# Rules<a class="anchor" id="rules"></a>
Datalog allows you to deduce new tuples for a relation.
RGXLog includes this feature as well:

In [None]:
%%rgxlog
new parent(str, str)
parent("bob", "greg")
parent("greg", "alice")
# now add a rule that deduces that bob is a grandparent of alice
grandparent(X,Z) <- parent(X,Y), parent(Y,Z) # ',' is shorthand for the 'and' operator

RGXLog also supports recursive rules:

In [None]:
%%rgxlog
parent("Liam", "Noah")
parent("Noah", "Oliver")
parent("James", "Lucas")
parent("Noah", "Benjamin")
parent("Benjamin", "Mason")
ancestor(X,Y) <- parent(X,Y)
# This is a recursive rule
ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y)

# Queries are explained in the next section
?ancestor("Liam", X)
?ancestor(X, "Mason")
?ancestor("Mason", X)

printing results for query 'ancestor("Liam", X)':
    X
----------
 Benjamin
  Mason
   Noah
  Oliver

printing results for query 'ancestor(X, "Mason")':
    X
----------
 Benjamin
   Liam
   Noah

printing results for query 'ancestor("Mason", X)':
[]
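
One way to picture what the two `ancestor` rules derive is a naive bottom-up evaluation: keep applying the rules until no new tuple appears. The Python sketch below does this for the `parent` facts used so far. It illustrates the semantics only; `derive_ancestors` is a hypothetical helper and this is not a description of how RGXLog's engine evaluates rules.

```python
parent = {
    ("bob", "greg"), ("greg", "alice"),
    ("Liam", "Noah"), ("Noah", "Oliver"), ("James", "Lucas"),
    ("Noah", "Benjamin"), ("Benjamin", "Mason"),
}


def derive_ancestors(parent_facts):
    """Apply both ancestor rules repeatedly until no new tuple appears."""
    ancestor = set(parent_facts)  # ancestor(X,Y) <- parent(X,Y)
    while True:
        # ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y)
        derived = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        }
        if derived <= ancestor:  # fixpoint reached: nothing new was derived
            return ancestor
        ancestor |= derived


ancestor = derive_ancestors(parent)
print(sorted(y for (x, y) in ancestor if x == "Liam"))
# ['Benjamin', 'Mason', 'Noah', 'Oliver']
print(sorted(x for (x, y) in ancestor if y == "Mason"))
# ['Benjamin', 'Liam', 'Noah']
print(sorted(y for (x, y) in ancestor if x == "Mason"))
# []
```

The loop always terminates because the set of possible tuples over a finite set of names is finite and the derived set only grows.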



You can also remove a rule via the session:

```magic_session.remove_rule(rule_to_delete)```

Note: the rule must be written exactly as it appears in the output of `print_all_rules`.

In [None]:
%%rgxlog
confused("Josh")
brothers("Drake", "Josh")

# oops! this rule was added by mistake!
ancestor(X, Y) <- brothers(X, Y), confused(Y)

?ancestor(X,Y)

printing results for query 'ancestor(X, Y)':
    X     |    Y
----------+----------
 Benjamin |  Mason
  Drake   |   Josh
  James   |  Lucas
   Liam   | Benjamin
   Liam   |  Mason
   Liam   |   Noah
   Liam   |  Oliver
   Noah   | Benjamin
   Noah   |  Mason
   Noah   |  Oliver
   bob    |  alice
   bob    |   greg
   greg   |  alice



In [None]:
from rgxlog import magic_session
# remove the rule from the current session
print("before:")
magic_session.print_all_rules()

magic_session.remove_rule("ancestor(X, Y) <- brothers(X, Y), confused(Y)")

print("after:")
magic_session.print_all_rules()

before:
Printing all the rules:
	1. grandparent(X, Z) <- parent(X, Y), parent(Y, Z)
	2. ancestor(X, Y) <- parent(X, Y)
	3. ancestor(X, Y) <- parent(X, Z), ancestor(Z, Y)
	4. ancestor(X, Y) <- brothers(X, Y), confused(Y)
after:
Printing all the rules:
	1. grandparent(X, Z) <- parent(X, Y), parent(Y, Z)
	2. ancestor(X, Y) <- parent(X, Y)
	3. ancestor(X, Y) <- parent(X, Z), ancestor(Z, Y)


In [None]:
%%rgxlog
?ancestor(X,Y)

printing results for query 'ancestor(X, Y)':
    X     |    Y
----------+----------
 Benjamin |  Mason
  James   |  Lucas
   Liam   | Benjamin
   Liam   |  Mason
   Liam   |   Noah
   Liam   |  Oliver
   Noah   | Benjamin
   Noah   |  Mason
   Noah   |  Oliver
   bob    |  alice
   bob    |   greg
   greg   |  alice



Success! The rule was deleted: Drake and Josh are no longer part of the `?ancestor` query result.

Note that in these examples we used `print_all_rules`.<br>
This function prints all the registered rules.<br>
If you want to print only the rules relevant to a specific rule head, you can pass the rule head as a parameter.

In [None]:
magic_session.print_all_rules("ancestor")

Printing all the rules with head ancestor:
	1. ancestor(X, Y) <- parent(X, Y)
	2. ancestor(X, Y) <- parent(X, Z), ancestor(Z, Y)


In addition, you can use `remove_all_rules` to remove all the rules (it won't affect the facts).<br>
You can pass a rule head as a parameter to remove only the rules with that head.

In [None]:
magic_session.remove_all_rules("ancestor")
print("after removing ancestor rules:")
magic_session.print_all_rules()

magic_session.remove_all_rules()
print("after removing all rules:")
magic_session.print_all_rules()

# facts are not affected...
%rgxlog ?parent(X, Y)

after removing ancestor rules:
Printing all the rules:
	1. grandparent(X, Z) <- parent(X, Y), parent(Y, Z)
after removing all rules:
Printing all the rules:
printing results for query 'parent(X, Y)':
    X     |    Y
----------+----------
   bob    |   greg
   greg   |  alice
   Liam   |   Noah
   Noah   |  Oliver
  James   |  Lucas
   Noah   | Benjamin
 Benjamin |  Mason



# Queries<a class="anchor" id="queries"></a>
A query is a way to retrieve specific information from a dataset. <br>
Querying in RGXLog uses the same syntax and semantics as Datalog. <br>
Under these semantics, we try to find all instantiations of free variables that satisfy the queried relation.

You can query by using constant values, local variables and free variables:

In [None]:
%%rgxlog
# first create a relation with some facts for the example
new grandfather(str, str)
# bob and george are the grandfathers of alice and rin
grandfather("bob", "alice")
grandfather("bob", "rin")
grandfather("george", "alice")
grandfather("george", "rin")
# edward is the grandfather of john
grandfather("edward", "john")

# now for the queries
?grandfather("bob", "alice") # returns an empty tuple () as alice is bob's grandchild
?grandfather("edward", "alice") # returns nothing as alice is not edward's grandchild
?grandfather("george", X) # returns "rin" and "alice" as both rin
# and alice are george's grandchildren
?grandfather(X, "rin") # returns "bob" and "george" (rin's grandfathers)
?grandfather(X, Y) # returns all the tuples in the 'grandfather' relation

new verb(str, span)
verb("Ron eats quickly.", [4,8))
verb("You write neatly.", [4,9))
?verb("Ron eats quickly.", X) # returns [4,8)
?verb(X,[4,9)) # returns "You write neatly."
         
new orders(str, int)
orders("pie", 4)
orders("pizza", 4)
orders("cake", 0)
?orders(X, 4) # returns "pie" and "pizza"

printing results for query 'grandfather("bob", "alice")':
[()]

printing results for query 'grandfather("edward", "alice")':
[]

printing results for query 'grandfather("george", X)':
   X
-------
 alice
  rin

printing results for query 'grandfather(X, "rin")':
   X
--------
  bob
 george

printing results for query 'grandfather(X, Y)':
   X    |   Y
--------+-------
  bob   | alice
  bob   |  rin
 george | alice
 george |  rin
 edward | john

printing results for query 'verb("Ron eats quickly.", X)':
   X
--------
 [4, 8)

printing results for query 'verb(X, [4, 9))':
         X
-------------------
 You write neatly.

printing results for query 'orders(X, 4)':
   X
-------
  pie
 pizza



You may have noticed that the query

```
?grandfather("bob", "alice")
```

returns an empty tuple. The query has no free variables, so it asks a yes-or-no question about the dataset: is bob alice's grandfather? Since that fact is true, there is exactly one way to satisfy the query, and it binds no variables, so the result is a single empty tuple (`[()]`). When such a query is false, as in `?grandfather("edward", "alice")`, no tuple is returned at all (`[]`).

A good example of using free variables to construct a relation is the query:

```
?grandfather("george", X)
```

which finds all of george's grandchildren (`X`) and constructs a tuple for each one.
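
The behavior of both kinds of query can be imitated in a few lines of Python. This is a sketch under stated assumptions: `Var` and `query` are hypothetical names introduced here, free variables are modelled as `Var` objects, and everything else is treated as a constant. Repeated free variables (such as `?rel(X, X)`) are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A free variable, e.g. Var("X"); anything else is treated as a constant."""
    name: str


def query(facts, *pattern):
    """Return one tuple of free-variable bindings per matching fact."""
    free_positions = [i for i, term in enumerate(pattern) if isinstance(term, Var)]
    results = []
    for fact in facts:
        if len(fact) != len(pattern):
            continue
        constants_match = all(
            isinstance(term, Var) or term == value
            for term, value in zip(pattern, fact)
        )
        if constants_match:
            binding = tuple(fact[i] for i in free_positions)
            if binding not in results:  # keep each binding once
                results.append(binding)
    return results


grandfather = [
    ("bob", "alice"), ("bob", "rin"),
    ("george", "alice"), ("george", "rin"),
    ("edward", "john"),
]

print(query(grandfather, "bob", "alice"))      # [()]
print(query(grandfather, "edward", "alice"))   # []
print(query(grandfather, "george", Var("X")))  # [('alice',), ('rin',)]
```

With no free variables there are no positions to bind, so a matching fact contributes the empty tuple, which is why a true ground query prints `[()]`.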

### How are rules and queries saved in the database?

Unlike facts, which are immediately stored in the database upon their creation, rules are not computed and added to the database upon declaration. Instead, the logic of a rule is saved separately and is only evaluated when needed (lazy evaluation). When a query is made, the engine utilizes these rules to derive all potential solutions from the existing facts that would fulfill the query.


# Using IE Functions

## Functional regex formulas<a class="anchor" id="RGX_ie"></a>
RGXLog contains IE functions which are registered by default.
Let's go over a couple regex IE functions:


```
rgx_span(regex_input ,regex_formula)->(x_1, x_2, ...,x_n)
```

and

```
rgx_string(regex_input ,regex_formula)->(x_1, x_2, ...,x_n)
```

where:
* `regex_input` is the string that the regex operation will be performed on
* `regex_formula` is either a string literal or a string variable that represents your regular expression.
* `x_1`, `x_2`, ... `x_n` can be either constant terms or free variable terms. They're used to construct the tuples of the resulting relation. The number of terms has to be the same as the number of capture groups used in `regex_formula`. If no capture groups are used, then each returned tuple includes a single, whole regex match, so only one term should be used.

The only difference between the `rgx_span` and `rgx_string` IE functions is that `rgx_string` returns strings, while `rgx_span` returns the spans of those strings. This also means that if you want to use constant terms as return values, they have to be spans if you use `rgx_span`, and strings if you use `rgx_string`.

For example, consider the following RGXLog code:

In [None]:
%%rgxlog
input_string = "John Doe: 35 years old, Jane Smith: 28 years old"
regex_pattern = "(\w+\s\w+):\s(\d+)"

age(X,Y) <- py_rgx_string(input_string, regex_pattern) -> (X,Y)
age_span(X,Y) <- py_rgx_span(input_string, regex_pattern) -> (X,Y)
?age(X,Y)
?age_span(X,Y)

printing results for query 'age(X, Y)':
     X      |   Y
------------+-----
  John Doe  |  35
 Jane Smith |  28

printing results for query 'age_span(X, Y)':
    X     |    Y
----------+----------
  [0, 8)  | [10, 12)
 [24, 34) | [36, 38)



The variables `X` and `Y` in the output of the above IE functions are the matches of the capture groups used in `regex_pattern`. <br>
Capture groups allow us to extract specific parts of a matched pattern in a text using regular expressions. <br>
When you define a regular expression pattern with parentheses `()`, you create a capturing group.
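
The same capture groups can be inspected with Python's standard `re` module, which makes the relationship between the string results and the span results easy to see. This sketch only uses `re`; it is an illustration of capture groups and does not claim to show how the built-in IE functions are implemented.

```python
import re

input_string = "John Doe: 35 years old, Jane Smith: 28 years old"
regex_pattern = r"(\w+\s\w+):\s(\d+)"

string_rows = []
span_rows = []
for match in re.finditer(regex_pattern, input_string):
    # one string per capture group
    string_rows.append(match.groups())
    # the (start, end) indices of each capture group, end exclusive
    span_rows.append(tuple(match.span(i) for i in range(1, match.re.groups + 1)))

print(string_rows)  # [('John Doe', '35'), ('Jane Smith', '28')]
print(span_rows)    # [((0, 8), (10, 12)), ((24, 34), (36, 38))]
```

Each span, applied to `input_string` as a slice, gives back the corresponding string, for example `input_string[24:34]` is `"Jane Smith"`.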

## Creating and Registering a New IE Function<a class="anchor" id="custom_ie"></a>

Using regex is nice, but what if you want to define your own IE function? <br>
RGXLog allows you to define and use your own information extraction functions. You can use them only in rule bodies in the current version. The following is the syntax for custom IE functions:

```
func(term_1, term_2, ..., term_n)->(x_1, x_2, ..., x_m)
```

where:
* `func` is an IE function that was previously defined and registered (see the 'advanced_usage' tutorial)
* `term_1`, `term_2`, ..., `term_n` are the parameters for `func`
* `x_1`, ..., `x_m` can be any type of term, and are used to construct the tuples of the resulting relation

For example:

### IE function `get_happy`

In [None]:
import re
from rgxlog.primitive_types import DataTypes

# the function itself, which should yield an iterable of primitive types
def get_happy(text):
    """
    get the names of people who are happy in `text`
    """
    # raw string, so that `\w` reaches the regex engine unchanged
    compiled_rgx = re.compile(r"(\w+) is happy")
    num_groups = compiled_rgx.groups
    for match in compiled_rgx.finditer(text):
        if num_groups == 0:
            matched_strings = [match.group()]
        else:
            matched_strings = list(match.groups())
        yield matched_strings

# the input types, a list of primitive types
get_happy_in_types = [DataTypes.string]

# the output types, either a list of primitive types or a method which expects an arity
get_happy_out_types = lambda arity : arity * [DataTypes.string]
# or: `get_happy_out_types = [DataTypes.string]`

# finally, register the function
magic_session.register(ie_function=get_happy,
                       ie_function_name = "get_happy",
                       in_rel=get_happy_in_types,
                       out_rel=get_happy_out_types)

You may have noticed that the IE function uses `yield` instead of `return`. <br>
Part of making spanner-based database systems performant and memory efficient is lazy evaluation. <br>
Since building iterators in Python is simple with the generator pattern, IE functions are written as generators, which lets each one be as lazy as its author desires.
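
The laziness of a generator can be observed directly in plain Python. In the sketch below, `find_happy` is a hypothetical stand-in for an IE function (it is not registered with RGXLog); the `print` inside it shows that a match is only computed when a consumer asks for it.

```python
import re
from itertools import islice


def find_happy(text):
    """Yield one tuple per '<name> is happy' match, computing each on demand."""
    for match in re.finditer(r"(\w+) is happy", text):
        print(f"  matched {match.group(1)!r}")  # shows when work actually happens
        yield match.groups()


text = "rin is happy, denna is sad, joel is happy."
results = find_happy(text)        # nothing has been scanned yet
first = list(islice(results, 1))  # pulls only the first match
print(first)                      # [('rin',)]
```

Only `'rin'` is reported as matched at this point; the second match is found only if the consumer keeps iterating over `results`.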

### custom IE using `get_happy`

In [None]:
%%rgxlog
new grandmother(str, str)
grandmother("rin", "alice")
grandmother("denna", "joel")
sentence = "rin is happy, denna is sad."
# note that this statement would fail if 'get_happy' were not registered as an IE function
happy_grandmother(X) <- grandmother(X,Z),get_happy(sentence)->(X)
?happy_grandmother(X) # assuming get_happy returned "rin", also returns "rin"

printing results for query 'happy_grandmother(X)':
  X
-----
 rin



## More information about IE functions
* You can remove an IE function via the session:

```magic_session.remove_ie_function(ie_function_name)```

* If you want to remove all the registered ie functions:

```magic_session.remove_all_ie_functions()```

* If you register an IE function with a name that was already registered before, the old IE function will be overwritten by the new one.
<br><br>
* You can inspect all the registered IE functions using the following command:

```magic_session.print_registered_ie_functions()```

```python
# first, let's print all functions:
magic_session.print_registered_ie_functions()
```

another tremendous triumph! Coref was deleted from the registered functions

# Additional small features<a class="anchor" id="small_features"></a>
You can use line overflow escapes if you want to split your statements into multiple lines.

```python
%%rgxlog
k \
= "some \
string"
```

# RGXLog program example<a class="anchor" id="example_program"></a>

In [None]:
import rgxlog

In [None]:
%%rgxlog
new lecturer(str, str)
lecturer("walter", "chemistry")
lecturer("linus", "operation systems")
lecturer("rick", "physics")

new enrolled(str, str)
enrolled("abigail", "chemistry")
enrolled("abigail", "operation systems")
enrolled("jordan", "chemistry")
enrolled("gale", "operation systems")
enrolled("howard", "chemistry")
enrolled("howard", "physics")

enrolled_in_chemistry(X) <- enrolled(X, "chemistry")
?enrolled_in_chemistry("jordan") # returns empty tuple ()
?enrolled_in_chemistry("gale") # returns nothing
?enrolled_in_chemistry(X) # returns "abigail", "jordan" and "howard"

enrolled_in_physics_and_chemistry(X) <- enrolled_in_chemistry(X), enrolled(X, "physics")
?enrolled_in_physics_and_chemistry(X) # returns "howard"

lecturer_of(X,Z) <- lecturer(X,Y), enrolled(Z,Y)
?lecturer_of(X,"abigail") # returns "walter" and "linus"

grade_str = "abigail 100 jordan 80 gale 79 howard 60"
grade_of_chemistry_students(Student, Grade) <- \
py_rgx_string(grade_str, "(\w+).*?(\d+)")->(Student, Grade), enrolled_in_chemistry(Student)
?grade_of_chemistry_students(X, "100") # returns "abigail"

printing results for query 'enrolled_in_chemistry("jordan")':
[()]

printing results for query 'enrolled_in_chemistry("gale")':
[]

printing results for query 'enrolled_in_chemistry(X)':
    X
---------
 abigail
 jordan
 howard

printing results for query 'enrolled_in_physics_and_chemistry(X)':
   X
--------
 howard

printing results for query 'lecturer_of(X, "abigail")':
   X
--------
 walter
 linus

printing results for query 'grade_of_chemistry_students(X, "100")':
    X
---------
 abigail



# Useful tricks<a class="anchor" id="Usefull tricks"></a>
## Matching Outputs:
Let's write an RGXLog program that takes a table in which each row is a single string - string(str).
<br>
The program will create a new table in which each row is a string and its length.

### First try:

In [None]:
# Step 1: implement an IE function
def length(string):
    #here we append the input to the output inside the ie function!
    yield len(string), string

magic_session.register(length, "Length", [DataTypes.string], [DataTypes.integer, DataTypes.string])


In [None]:
%%rgxlog
# Let's test this solution:
new string(str)
string("a")
string("ab")
string("abc")
string("abcd")

string_length(Str, Len) <- string(Str), Length(Str) -> (Len, Str)
?string_length(Str, Len)

printing results for query 'string_length(Str, Len)':
  Str  |   Len
-------+-------
   a   |     1
  ab   |     2
  abc  |     3
 abcd  |     4



### It works
Our first IE function yields the input in addition to the output. This ensures that each output is matched with its input. But is this really necessary? Let's try another solution:


In [None]:
# here we don't append the input to the output inside the IE function!
def length2(string):
    yield (len(string),)  # a one-element tuple

# Step 2: register the function
magic_session.register(length2, "Length2", [DataTypes.string], [DataTypes.integer])

In [None]:
%%rgxlog
# Let's test this solution:
new rel(str)
rel("a")
rel("ab")
rel("abc")
rel("abcd")

string_length(Str, Len) <- rel(Str), Length2(Str) -> (Len)
?string_length(Str, Len)

printing results for query 'string_length(Str, Len)':
  Str  |   Len
-------+-------
   a   |     1
  ab   |     2
  abc  |     3
 abcd  |     4



### It looks good, but why?
First, we can see that the IE function yields only an output, without any trace of the input. In addition, RGXLog stores all the inputs of each IE function in an input table and all the outputs in an output table.
Then it joins the input table with the output table. So why did we still get the right result?
This is thanks to the fact that RGXLog stores each input bound to its output.
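
A rough way to picture this binding is an engine that calls the IE function once per input row and keeps each output next to the input that produced it. The Python sketch below is only a mental model: `apply_ie` is a hypothetical helper, and RGXLog's real input and output tables are not reproduced here.

```python
def length2(string):
    yield (len(string),)


def apply_ie(ie_function, input_rows):
    """Call the IE function once per input row and keep input and output together."""
    for row in input_rows:
        for output in ie_function(*row):
            yield row + tuple(output)


rel = [("a",), ("ab",), ("abc",), ("abcd",)]
print(list(apply_ie(length2, rel)))
# [('a', 1), ('ab', 2), ('abc', 3), ('abcd', 4)]
```

Because every output row carries its own input, the function itself never needs to echo the input back.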



## Logical Operators:
Suppose we have a table in which each row contains two strings - pair(str, str).
Our goal is to filter out all the rows that contain the same value twice.
<br>
In other words, we want to implement the relation **not equals (NEQ)**.

We would like to have a rule such as:
<br>
```unique_pair(X, Y) <- pair(X, Y), X != Y```
<br><br>
Unfortunately, RGXLog doesn't support True/False values, so we can't use ```X != Y```.
<br>
Our solution is to create an IE function that implements the NEQ relation:

In [None]:
def NEQ(x, y):
    if x == y:
        # return false (empty tuple represents false)
        yield tuple() 
    else:
        #return true
        yield x, y

in_out_types = [DataTypes.string, DataTypes.string]
magic_session.register(NEQ, "NEQ", in_out_types, in_out_types)

In [None]:
%%rgxlog
#Lets test this solution
new pair(str, str)
pair("Dan", "Tom")
pair("Cat", "Dog")
pair("Apple", "Apple")
pair("Cow", "Cow")
pair("123", "321")

unique_pair(X, Y) <- pair(X, Y), NEQ(X, Y) -> (X, Y)
?unique_pair(X, Y)

printing results for query 'unique_pair(X, Y)':
  X  |  Y
-----+-----
 Dan | Tom
 Cat | Dog
 123 | 321



# Python Implementation vs. RgxLog Implementation

Let's compare coding in Python with coding in RGXLog.
We are given two long strings, one of enrollment pairs and one of grade pairs.
Our goal is to find all students who are enrolled in both biology and chemistry and have a grade of 80.

## python 

In [None]:
import re
enrolled = "dave chemistry dave biology rem biology ram biology emilia physics roswaal chemistry roswaal biology roswaal physics"
grades = "dave 80 rem 66 ram 66 roswaal 100 emilia 88"

enrolled_pairs = re.findall(r"(\w+).*?(\w+)", enrolled)
grade_pairs = re.findall(r"(\w+).*?(\d+)", grades)
for student1, course1 in enrolled_pairs:
    for student2, course2 in enrolled_pairs:
        for student3, grade in grade_pairs:
            if (student1 == student2 == student3):
                if (course1 == "biology" and course2 == "chemistry" and int(grade) == 80):
                    print(student1)

dave


## rgxlog

In [None]:
%%rgxlog
enrolled = "dave chemistry dave biology rem biology ram biology emilia physics roswaal chemistry roswaal biology roswaal physics"
grades = "dave 80 rem 66 ram 66 roswaal 100 emilia 88"

enrolled_in(Student, Course) <- py_rgx_string(enrolled, "(\w+).*?(\w+)")->(Student, Course)
student_grade(Student, Grade) <- py_rgx_string(grades, "(\w+).*?(\d+)") -> (Student, Grade)
interesting_student(X) <- enrolled_in(X, "biology"), enrolled_in(X, "chemistry"), student_grade(X, "80")
?interesting_student(X)

printing results for query 'interesting_student(X)':
  X
------
 dave



In this case, the Python implementation was long and unnatural. The RGXLog implementation, on the other hand, was cleaner and allowed us to express our intentions directly, rather than dealing with annoying programming logic.

# Parsing a JSON document using RgxLog

RGXLog's JsonPath/JsonPathFull IE functions allow us to easily parse JSON documents using path expressions.<br>
We will demonstrate how to use the latter. Check out the [jsonpath repo](https://github.com/json-path/JsonPath) for more information.

First, we remove the built-in JsonPathFull function, to show how to implement it from scratch:

In [None]:
magic_session.remove_ie_function("JsonPathFull")

After removing the function, implementing and registering it is as easy as:

In [None]:
import json
from jsonpath_ng import parse

def parse_match(match) -> str:
    """
    @param match: a match result of json path query.
    @return: a string that represents the match in string format.
    """
    json_result = match.value
    if not isinstance(json_result, str):
        # we replace for the same reason as in json_path implementation.
        json_result = json.dumps(json_result).replace("\"", "'")
    return json_result

def json_path_full(json_document: str, path_expression: str):
    """
    @param json_document: The document on which we will run the path expression.
    @param path_expression: The query to execute.
    @return: json documents with the full results paths.
    """
    json_document = json.loads(json_document.replace("'", "\""))
    jsonpath_expr = parse(path_expression)
    for match in jsonpath_expr.find(json_document):
        json_result = str(match.full_path)
        # objects in full path are separated by dots.
        yield *json_result.split("."), parse_match(match)

JsonPathFull = dict(ie_function=json_path_full,
            ie_function_name='JsonPathFull',
            in_rel=[DataTypes.string, DataTypes.string],
            out_rel=lambda arity: [DataTypes.string] * arity,
            )

magic_session.register(**JsonPathFull)

And now for the usage. <br>
Suppose we have a JSON document of the following format: `{student: {subject: grade, ...}, ...}` <br>
We want to create an RGXLog relation containing tuples of (student, subject, grade).

In [None]:
%%rgxlog

# we use strings, as RgxLog doesn't support dicts.
json_string = "{ \
                'abigail': {'chemistry': 80, 'operation systems': 99}, \
                'jordan':  {'chemistry': 65, 'physics': 70}, \
                'gale':    {'operation systems': 100}, \
                'howard':  {'chemistry': 90, 'physics':91, 'biology':92} \
                }"

# path expression is the path to the key of each grade (in our simple case it's *.*)
# then JsonPathFull appends the full path to the value
json_table(Student, Subject, Grade) <- JsonPathFull(json_string, "*.*") -> (Student, Subject, Grade)
?json_table(Student, Subject, Grade)

printing results for query 'json_table(Student, Subject, Grade)':
  Student  |      Subject      |   Grade
-----------+-------------------+---------
  abigail  |     chemistry     |      80
  abigail  | operation systems |      99
  jordan   |     chemistry     |      65
  jordan   |      physics      |      70
   gale    | operation systems |     100
  howard   |     chemistry     |      90
  howard   |      physics      |      91
  howard   |      biology      |      92

