# Advanced IE functions

spannerlog also provides a set of advanced IE functions.
These functions require additional installations before they can be used. <br>

Rust: To download and install the Rust-based IE functions, run the following code:

In [None]:
#| output: false
from spannerlib.ie_func.rust_spanner_regex import download_and_install_rust_regex
download_and_install_rust_regex()

enum-spanner-rs was not found on your system
installing package. this might take up to 10 minutes...
info: syncing channel updates for '1.34-x86_64-unknown-linux-gnu'
  1.34-x86_64-unknown-linux-gnu unchanged - rustc 1.34.2 (6c2484dc3 2019-05-13)
info: checking for self-update
    Updating git repository `https://github.com/NNRepos/enum-spanner-rs`
  Installing enum-spanner-rs v0.1.0 (https://github.com/NNRepos/enum-spanner-rs#4c8ab5b3)
error: binary `enum-spanner-rs` already exists in destination as part of `enum-spanner-rs v0.1.0 (https://github.com/NNRepos/enum-spanner-rs#4c8ab5b3)`
Add --force to overwrite
installation completed


# Wrapping shell-based functions

spannerlog's `rgx_string` IE function is a good example of running an external shell command as part of spannerlog code. <br>
`rgx_string` is a Rust-based IE function, so it can be used only after the Rust package has been installed. <br>
This time we won't remove the built-in function; we'll just show its implementation:

```python
def rgx(text, regex_pattern, out_type: str):
    """
    An IE function which runs regex using rust's `enum-spanner-rs` and yields tuples of strings/spans (not both).

    @param text: the string on which regex is run.
    @param regex_pattern: the pattern to run.
    @param out_type: string/span - decides which one will be returned.
    @return: a tuple of strings/spans.
    """
    with tempfile.TemporaryDirectory() as temp_dir:
        rgx_temp_file_name = os.path.join(temp_dir, TEMP_FILE_NAME)
        with open(rgx_temp_file_name, "w+") as f:
            f.write(text)

        if out_type == "string":
            rust_regex_args = rf"{REGEX_EXE_PATH} {regex_pattern} {rgx_temp_file_name}"
            format_function = _format_spanner_string_output
        elif out_type == "span":
            rust_regex_args = rf"{REGEX_EXE_PATH} {regex_pattern} {rgx_temp_file_name} --bytes-offset"
            format_function = _format_spanner_span_output
        else:
            assert False, "illegal out_type"

        regex_output = format_function(run_cli_command(rust_regex_args, stderr=True))

        for out in regex_output:
            yield out

def rgx_string(text, regex_pattern):
    """
    @param text: The input text for the regex operation.
    @param regex_pattern: the pattern of the regex operation.
    @return: tuples of strings that represents the results.
    """
    return rgx(text, regex_pattern, "string")

RGX_STRING = dict(ie_function=rgx_string,
                  ie_function_name='rgx_string',
                  in_rel=RUST_RGX_IN_TYPES,
                  out_rel=rgx_string_out_type)

# Another version of these functions (rgx_from_file) also exists; see the source code.
```

`run_cli_command` is a helper function from spannerlog's standard library (STDLIB) that runs a command using Python's `subprocess.Popen`.
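
To make the idea concrete, here is a minimal sketch of what such a `Popen`-based helper could look like. It is not spannerlog's actual `run_cli_command`: the name `run_command_sketch` is hypothetical, and the real helper's `stderr` parameter is not reproduced here.

```python
import shlex
import subprocess
import sys


def run_command_sketch(command):
    """Run a command and return its standard output as a list of lines.

    Hypothetical helper for illustration only; it is not spannerlog's
    `run_cli_command`.
    """
    # Accept either a single command string or a ready-made argument list.
    args = shlex.split(command) if isinstance(command, str) else list(command)
    process = subprocess.Popen(args, stdout=subprocess.PIPE, text=True)
    # communicate() waits for the process to exit and collects its output.
    stdout, _ = process.communicate()
    return stdout.splitlines()


# Usage: run a small Python one-liner as the external command.
lines = run_command_sketch([sys.executable, "-c", "print('hello'); print('world')"])
print(lines)
```

An IE function such as `rgx` can then parse the returned lines into tuples and yield them one by one.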

To denote regex groups, use `(?P<name>pattern)`. The output is in alphabetical order.
Let's run the IE function:

In [None]:
import spannerlib

In [None]:
%%spannerlog
text = "zcacc"
pattern = "(?P<group_not_c>[^c]+)(?P<group_c>[c]+)"
string_rel(X,Y) <- rgx_string(text, pattern) -> (X,Y)
?string_rel(X,Y)

printing results for query 'string_rel(X, Y)':
  X  |  Y
-----+-----
  a  | cc
  a  |  c
  z  |  c
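
Note that the result contains both `(a, cc)` and `(a, c)`, so overlapping matches are reported, unlike the non-overlapping matches returned by Python's `re.finditer`. The following is a rough sketch of that behavior using only Python's `re` module: it tries the pattern against every substring of the text. The function name `all_group_matches` is hypothetical, and this brute-force approach only illustrates the result; it is not how `enum-spanner-rs` works internally.

```python
import re


def all_group_matches(text, pattern):
    """Return the named-group values of every substring that fully matches.

    Hypothetical brute-force illustration; quadratic in the text length.
    """
    compiled = re.compile(pattern)
    # Group names in the order in which they appear in the pattern.
    names = sorted(compiled.groupindex, key=compiled.groupindex.get)
    results = set()
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests the substring text[start:end].
            match = compiled.fullmatch(text, start, end)
            if match:
                results.add(tuple(match.group(name) for name in names))
    return sorted(results)


pattern = "(?P<group_not_c>[^c]+)(?P<group_c>[c]+)"
for row in all_group_matches("zcacc", pattern):
    print(row)
```

For `"zcacc"` this produces the same three tuples as the query above, although the row order may differ.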



Similarly, to use the NLP-based IE functions, you first need to install the NLP dependencies:

In [None]:
from spannerlib.ie_func.nlp import download_and_install_nlp
download_and_install_nlp()

In [None]:
%%spannerlog
sentence = "Hello world. Hello world again."
tokens(X, Y) <- Tokenize(sentence) -> (X, Y)
?tokens(Token, Span)

printing results for query 'tokens(Token, Span)':
  Token  |   Span
---------+----------
  Hello  |  [0, 5)
  world  | [6, 11)
    .    | [11, 12)
  Hello  | [13, 18)
  world  | [19, 24)
  again  | [25, 30)
    .    | [30, 31)
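
The spans in the table are half-open character intervals `[start, end)`, so they map directly onto Python slices: `sentence[0:5]` is `"Hello"`. The sketch below reproduces the same token/span pairs with a simple regex. The function `tokenize_with_spans` is hypothetical and only illustrates the span convention; it is not the tokenizer behind `Tokenize`, which may split other inputs differently.

```python
import re


def tokenize_with_spans(text):
    """Yield (token, (start, end)) pairs, where end is exclusive.

    Hypothetical regex-based tokenizer for illustration only.
    """
    # A token is either a run of word characters or a single punctuation mark.
    for match in re.finditer(r"\w+|[^\w\s]", text):
        yield match.group(), (match.start(), match.end())


sentence = "Hello world. Hello world again."
for token, (start, end) in tokenize_with_spans(sentence):
    # Slicing with a half-open span recovers the token text.
    assert sentence[start:end] == token
    print(f"{token:>6} | [{start}, {end})")
```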

