# Semantic API Demo

`Neuro-symbolic programming` is a paradigm for `artificial intelligence` and cognitive computing that combines the strengths of both neural networks and symbolic reasoning.

`Neural networks`, the models behind deep learning, are machine learning algorithms inspired by the structure and function of the human brain. They are particularly good at tasks such as image recognition, natural language processing, and decision making. However, they are weaker at tasks that require explicit reasoning, such as planning, problem solving, and understanding causal relationships.

`Symbolic reasoning`, on the other hand, is a type of reasoning that uses formal languages and logical rules to represent knowledge and perform tasks such as planning, problem solving, and understanding causal relationships. Symbolic reasoning systems are good at tasks that require explicit reasoning but are not as good at tasks that require pattern recognition or generalization, such as image recognition or natural language processing.

Neuro-symbolic programming aims to combine the strengths of both neural networks and symbolic reasoning to create AI systems that can perform a wide range of tasks. One way this is done is by using neural networks to extract information from data and then using symbolic reasoning to make inferences and decisions based on that information. Another way is by using symbolic reasoning to guide the training of neural networks and make them more interpretable.
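The first pattern described above (neural extraction followed by symbolic inference) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `perceive` is a stub standing in for a trained network, and the rules and fact names are invented.

```python
from typing import List, Set, Tuple


def perceive(pixels: List[float]) -> Set[str]:
    """Stand-in for a neural network: maps raw input to symbolic facts.

    A real system would run a trained model here; this stub only thresholds
    the mean brightness so the example stays self-contained.
    """
    brightness = sum(pixels) / len(pixels)
    return {"light_on"} if brightness > 0.5 else {"light_off"}


# Symbolic knowledge as (premises, conclusion) rules. All names are hypothetical.
RULES: List[Tuple[Set[str], str]] = [
    ({"light_on"}, "someone_home"),
    ({"someone_home", "door_open"}, "no_alarm_needed"),
    ({"light_off", "door_open"}, "raise_alarm"),
]


def infer(facts: Set[str], rules: List[Tuple[Set[str], str]]) -> Set[str]:
    """Forward chaining: apply rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


# The "neural" step produces facts, the symbolic step reasons over them.
facts = perceive([0.9, 0.8, 0.7]) | {"door_open"}
conclusions = infer(facts, RULES)
print(sorted(conclusions))
```

The split keeps the inference step inspectable: every derived fact can be traced back to a rule, regardless of how the initial facts were produced.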

Tasks that may benefit from this integration include robotics and intelligent systems, AI-based games, image captioning, natural language processing, autonomous agents, and assistive technology.

Overall, neuro-symbolic programming is an active field of research, and many AI experts believe that integrating neural networks and symbolic reasoning is crucial to creating truly intelligent AI systems.

Below we show an example of how neuro-symbolic programming can close the gap between classical software engineering and modern data science:

<img src="../assets/images/img5.png" width="720px">

This allows for the following computational stack:

<img src="../assets/images/img1.png" width="720px">

### Get Imports

In [1]:
import os
import warnings
warnings.filterwarnings('ignore')
os.chdir('../') # set the working directory to the root of the project
from botdyn import *
from IPython.display import display

## API Illustration

Similar to `word2vec`, we intend to perform contextualized operations on different symbols.

Word2vec is a machine learning algorithm that is used to generate dense vector representations of words. It works by training a shallow neural network to predict a word given its neighbors in a text corpus. The resulting vectors are then used in a wide range of natural language processing applications, such as sentiment analysis, text classification, and clustering.

Below is an example of how one can perform operations on word embedding vectors (colored boxes).

<img src="../assets/images/img3.png" width="470px">

In [2]:
Symbol('King - Man + Woman').expression()

<class 'botdyn.symbol.Symbol'>(value=Queen)
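For comparison, the classical embedding arithmetic behind this analogy can be sketched in plain Python. The vectors below are hand-made toy values (dimensions: royalty, male, female), not learned word2vec embeddings, and the helper names are assumptions chosen for this illustration.

```python
import math
from typing import Dict, List

# Hand-made toy embeddings; real word2vec vectors are learned from a corpus
# and typically have hundreds of dimensions.
EMBEDDINGS: Dict[str, List[float]] = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "child": [0.0, 0.5, 0.5],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def analogy(a: str, b: str, c: str) -> str:
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])]
    candidates = [w for w in EMBEDDINGS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, EMBEDDINGS[w]))


result = analogy("king", "man", "woman")
print(result)
```

Excluding the input words from the candidates is the usual convention for analogy queries, since the nearest vector is otherwise often one of the inputs.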

We now show the compositional pattern we use to define the `Semantic API`. The `Symbol` class is the base class for all further definitions and is referred to as a terminal symbol. An `Expression` is a non-terminal symbol. It inherits all properties from Symbol and overrides the `__call__` function to evaluate the statement it currently holds. The simplest expression, when resolved, results again in just a Symbol. All other expressions can be derived from the `Expression` class. The Expression class also adds new functions to `fetch` web URLs, `search` on the internet, or `open` files.
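The terminal/non-terminal relationship can be sketched as follows. This is a simplified illustration, not the library's implementation: the `forward` hook and the `Upper` subclass are hypothetical names introduced here.

```python
from typing import Any


class Symbol:
    """Terminal symbol: wraps any value and keeps its original type."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def type(self) -> type:
        return type(self.value)

    def __repr__(self) -> str:
        return f"{type(self).__name__}(value={self.value!r})"


class Expression(Symbol):
    """Non-terminal symbol: calling it evaluates the statement it holds."""

    def __call__(self, *args: Any, **kwargs: Any) -> Symbol:
        return self.forward(*args, **kwargs)

    def forward(self, *args: Any, **kwargs: Any) -> Symbol:
        # Hypothetical hook: the simplest expression resolves to a plain Symbol.
        return Symbol(self.value)


class Upper(Expression):
    """A derived expression that changes how evaluation behaves."""

    def forward(self, *args: Any, **kwargs: Any) -> Symbol:
        return Symbol(str(self.value).upper())


plain = Expression("hello")()
shouted = Upper("hello")()
print(plain, shouted)
```

Because every expression resolves to a `Symbol`, the result of one expression can be passed directly as input to another.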

Expressions can of course have more complex structures, as shown in the example of the DQL expression.
The DQL expression can hold multiple expressions.

<img src="../assets/images/img2.png" width="720px">

The Symbol class provides several operations for manipulating the current Symbol. However, operations can be overridden by subclassing the Symbol or Expression class. The resulting expression can then define new ways or contexts in which an operation should behave.

In more general terms, a curated prompt based on operations from the Symbol class is defined as follows:

<img src="../assets/images/img4.png" width="390px">

Each prompt can have an optional `Global Context`, followed by an `Operation` definition with optional `Examples`. Finally, the prompt can end with an optional `Template` that encloses the starting point of the model prediction.
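A prompt with this layout could be assembled roughly as shown below. The function name `build_prompt`, the `=>` separator, and the example strings are assumptions made for this sketch; the actual prompt format used by the library may differ.

```python
from typing import List, Optional, Tuple


def build_prompt(operation: str,
                 global_context: Optional[str] = None,
                 examples: Optional[List[Tuple[str, str]]] = None,
                 template: Optional[str] = None) -> str:
    """Assemble a prompt: [global context] + operation + [examples] + [template]."""
    parts: List[str] = []
    if global_context:
        parts.append(global_context)
    parts.append(operation)
    for source, target in examples or []:
        parts.append(f"{source} => {target}")
    if template:
        # The template is the start of the text the model should continue.
        parts.append(template)
    return "\n".join(parts)


prompt = build_prompt(
    operation="Translate the following text to German.",
    global_context="You are a careful translator.",
    examples=[("Good morning", "Guten Morgen")],
    template="Hello world =>",
)
print(prompt)
```

Only the operation is mandatory; every other part is skipped when it is not provided.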

### Convert to Symbol

A `Symbol` takes in any type of object and preserves its original value type:

In [3]:
# convert to symbol
sym = Symbol("This is a test string.")
sym.type()

str

In [4]:
import numpy as np
sym = Symbol(np.array([5, 2, 42, 1]))
sym.type()

numpy.ndarray

One can also easily retrieve the object by accessing `value`:

In [5]:
sym.value

array([ 5,  2, 42,  1])

### Showing basic operations

Sometimes we simply want to concatenate two Symbols without any other neural operation. This is easily done with the `@` operator:

In [6]:
# define a second string
sym = Symbol("Welcome to our tutorial.")
test2 = 'Hello world!'
# concatenate strings
res = sym @ test2
res

<class 'botdyn.symbol.Symbol'>(value=Welcome to our tutorial.Hello world!)

When using the `+` operator, the neural engine tries to combine two symbols on a best-effort basis. In this case, it preserves proper spacing between the two strings:

In [7]:
# combine strings with neural engine
res = sym + test2
res

<class 'botdyn.symbol.Symbol'>(value=Welcome to our tutorial. Hello world!)
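The difference between the two operators comes down to Python operator overloading: `@` maps to `__matmul__` and `+` maps to `__add__`. The sketch below uses a hypothetical `Sym` class and a `toy_engine` stub in place of the neural engine, so it only mimics the spacing behavior shown above.

```python
from typing import Any, Callable


def toy_engine(left: str, right: str) -> str:
    """Hypothetical stand-in for the neural engine: joins with one space."""
    return f"{left.rstrip()} {right.lstrip()}"


class Sym:
    """Illustrative wrapper, not the library's Symbol class."""

    def __init__(self, value: Any, engine: Callable[[str, str], str] = toy_engine) -> None:
        self.value = value
        self.engine = engine

    @staticmethod
    def _raw(other: Any) -> Any:
        # Accept either another Sym or a plain Python object.
        return other.value if isinstance(other, Sym) else other

    def __matmul__(self, other: Any) -> "Sym":
        # `@`: plain concatenation, no engine involved.
        return Sym(str(self.value) + str(self._raw(other)), self.engine)

    def __add__(self, other: Any) -> "Sym":
        # `+`: delegate the combination to the engine.
        return Sym(self.engine(str(self.value), str(self._raw(other))), self.engine)


sym = Sym("Welcome to our tutorial.")
joined = sym @ "Hello world!"
combined = sym + "Hello world!"
print(joined.value)
print(combined.value)
```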

As we saw above, expressions are evaluated on a best-effort basis. We can also combine different data types with the basic Symbol operations, as long as one of the operands is a Symbol. The neural engine then tries to resolve the expression:

In [8]:
# here the engine evaluates the expression
res = Symbol('5') + '5'
res

<class 'botdyn.symbol.Symbol'>(value=10)

### More Sophisticated Examples

We can of course define full sentences as Symbols and perform several operations on them.

In [9]:
sym = Symbol("""Dynatrace offers several incident management system integrations, such as PagerDuty, VictorOps or OpsGenie that offer SMS alert notification channels.
Regards,

Wolfgang
The_AM
 Dynatrace Champion The_AM
Dynatrace Champion
In response to wolfgang_beer
27 Aug 2019 04:20 AM

Hi Ruchi, It's also possible to write a custom integration if one of those offerings is not available. Such as receiving an email or webhook which is converted to send through your SMS gateway. The other options involve querying the problems API and triggering SMS from there.
There are many options available to accomplish this.

Regards, Andrew""")

Here we translate the existing Symbol to German:

In [10]:
sym.translate('German')

<class 'botdyn.symbol.Symbol'>(value=Dynatrace bietet verschiedene Integrationsmöglichkeiten für das Incident Management-System wie z.B. PagerDuty, VictorOps oder OpsGenie, die SMS-Benachrichtigungskanäle anbieten. Sollte eines dieser Angebote nicht verfügbar sein, ist es auch möglich, eine benutzerdefinierte Integration zu schreiben, wie beispielsweise das Empfang)

And now we try to classify the mood of the customer:

In [11]:
sym.choice(['angry', 'neutral', 'hate-speech', 'happy', 'unk'], default='unk')

<class 'botdyn.symbol.Symbol'>(value=neutral)
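The `default` argument suggests that the model answer is checked against the allowed options. A minimal sketch of such a constrained choice is shown below; `choose` and `keyword_model` are hypothetical names, and the keyword stub merely stands in for a language model call.

```python
from typing import Callable, Sequence


def choose(text: str,
           options: Sequence[str],
           default: str,
           model: Callable[[str, Sequence[str]], str]) -> str:
    """Return the model's answer if it is an allowed option, else the default.

    Options are assumed to be lowercase, since the answer is normalized.
    """
    answer = model(text, options).strip().lower()
    return answer if answer in options else default


def keyword_model(text: str, options: Sequence[str]) -> str:
    """Hypothetical stand-in for a language model call."""
    lowered = text.lower()
    if "thank" in lowered:
        return "Happy"
    if "regards" in lowered:
        return "neutral "
    return "confused"  # deliberately not one of the allowed options


labels = ["angry", "neutral", "hate-speech", "happy", "unk"]
print(choose("Regards, Andrew", labels, "unk", keyword_model))
print(choose("???", labels, "unk", keyword_model))
```

Falling back to a default keeps downstream code safe when the model produces an answer outside the label set.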

## DQL Example

A more realistic example is generating queries in a domain-specific language. In this case, the `DQL (Dynatrace Query Language)` expression is defined and passes the syntax of the DQL language as a global context. The DQL expression is then used to generate queries based on the given context. We can then use the generated queries to fetch data from the Dynatrace API. To show how DQL operates in the background, we will wrap the DQL expression with a `Log` expression. The Log expression logs the current state of the DQL expression to the `outputs/engine.log` file.

In [1]:
import os
import warnings
warnings.filterwarnings('ignore')
os.chdir('../') # set the working directory to the root of the project
from botdyn import *
from IPython.display import display

In [2]:
from examples.dql import DQL
from examples.docs import Docs, CppDocs
docs = Docs()
dql = DQL()
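The idea of wrapping an expression with a logging layer can be sketched with the standard `logging` module. `LogWrapper` and `ListHandler` are hypothetical names for this illustration and do not reflect how the `Log` expression is implemented; records are collected in memory here, whereas a `logging.FileHandler` could be attached to write to a file instead.

```python
import logging
from typing import Any, Callable, List


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory (keeps the sketch side-effect free)."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record.getMessage())


class LogWrapper:
    """Wraps any callable and logs its inputs and outputs."""

    def __init__(self, expr: Callable[..., Any], logger: logging.Logger) -> None:
        self.expr = expr
        self.logger = logger

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        self.logger.info("input: args=%r kwargs=%r", args, kwargs)
        result = self.expr(*args, **kwargs)
        self.logger.info("output: %r", result)
        return result


logger = logging.getLogger("engine_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep messages out of the root logger
handler = ListHandler()
logger.addHandler(handler)

# `str.upper` stands in for an expression such as a query generator.
wrapped = LogWrapper(str.upper, logger)
result = wrapped("fetch logs")
print(handler.records)
```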

The following query is sent to the neural computation engine and creates a query based on the given context:

In [3]:
res = dql('Query all logs and show the difference between fields and fieldsAdd.\
           While the fields command defines the result table by the fields specified, \
           the fieldsAdd command adds new fields to the existing fields.')
display(res)

We can now try to further manipulate the result by asking the model to incorporate additional information, such as filtering to a specific time range:

In [4]:
res = res << 'limit the query to the last 10 minutes'
display(res)

In [5]:
res.update(feedback="""Explanation: add limits as close as possible to fetch statements:
                    fetch logs, from:now()-10m | limit 50 | fields timestamp, message, content, severity = lower(loglevel)""");

In [6]:
res.clear();

We can also try to remove unwanted fields from the generated query. Notice how the model tries to remove not only the given statement but also the attributes associated with it:

In [7]:
res -= '| fieldsAdd'
display(res)

To wrap up, we might want to come full circle and ask the model to regenerate the explanation from the given query.

We can also query our result to show us suggestions how to further adapt the query:

In [8]:
answer = res.query("How can you limit the number of results to 30 for an DQL query?")
display(answer)

And we can now even try to convert our query to a more familiar domain specific language, such as `SQL`:

In [9]:
sql_res = res.convert("SQL")
display(sql_res)

In [10]:
answer = res.query("What does this query do?")
display(answer)

In [18]:
locale = Symbol(answer).translate('Spanish') # update prompt to avoid field translations
print(locale)

Esta consulta recupera las últimas 100 entradas de registro de los últimos 10 minutos y devuelve la marca de tiempo, la fuente de registro, el mensaje y el nombre del contenedor Kubernetes de cada entrada.


In [13]:
formatted_docs = docs(locale)

In [17]:
print(formatted_docs)

"""Retrieves the last 100 log entries from the last 10 minutes and returns the timestamp, log source, message, and Kubernetes container name of each entry.

Args:
    start_time (int, optional): The timestamp in milliseconds of the start of the time range to query. Defaults to the current time minus 10 minutes.
    end_time (int, optional): The timestamp in milliseconds of the end of the time range to query. Defaults to the current time.
    limit (int, optional): The maximum number of log entries to return. Defaults to 100.

Returns:
    List[Dict]: A list of dictionaries, each with the keys "timestamp", "source", "message", and "container_name".
"""


## Documentation Example

We can use the Semantic API to generate documentation based on a specific documentation style:

In [73]:
from examples.docs import Docs, CppDocs
docs = Log(Docs())

In this example, we generate documentation based on Python coding conventions:

In [33]:
doc = docs("""def execute(default: str = None,
            constraints: List[Callable] = [],
            pre_processor: List[PreProcessor] = [],
            post_processor: List[PostProcessor] = [],
            *wrp_args,
            **wrp_kwargs):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(wrp_self, *args, **kwargs):
            return execute_func(wrp_self, 
                                func=func,
                                code=str(wrp_self),
                                constraints=constraints, 
                                default=default, 
                                pre_processor=pre_processor, 
                                post_processor=post_processor,
                                wrp_args=wrp_args,
                                wrp_kwargs=wrp_kwargs,
                                args=args, kwargs=kwargs)
        return wrapper
    return decorator
""")
doc

<class 'examples.docs.Docs'>(value="""Applies constraints and pre/post-processing to a function.

Args:
    default (str, optional): The default value to be returned if the task cannot be solved. Defaults to None.
    constraints (List[Callable], optional): A list of contrains applied to the model output to verify the output. Defaults to [].
    pre_processor (List[PreProcessor], optional): A list of pre-processors to be applied to the input and shape the input to the model. Defaults to [].
    post_processor (List[PostProcessor], optional): A list of post-processors to be applied to the model output and before returning the result. Defaults to []. 
    *wrp_args: Additional arguments to be passed to the wrapped function.
    **wrp_kwargs: Additional keyword arguments to be passed to the wrapped function.

Returns:
    function: A decorated function that applies the constraints, pre/post-processing, and additional arguments/keyword arguments when executed.
""")

Here is the same for C++ coding conventions:

In [34]:
cppdoc = Log(CppDocs())

In [35]:
doc = cppdoc("""DLLEXPORT int research_native_hyperscan_match_direct_bytebuffers(void * db_ptr, void * scratch_ptr, void * matches_ptr,
    void * scan_from_ptr, int max_matches, int n_chars) {

    hs_database_t * db = (hs_database_t *) db_ptr;
    hs_scratch_t * scratch = (hs_scratch_t *) scratch_ptr;
    char * string = (char *) scan_from_ptr;
    unsigned int * matches_area = (unsigned int *) matches_ptr;

    Hyperscan_buffered_positional_match hbpm = {&matches_area[0], max_matches, 0};

    hs_error_t err = hs_scan(db, string, n_chars, 0, scratch, on_match_counter_record_positions_and_ids_buffered, &hbpm);
    if (err != HS_SUCCESS) {
        if (err != HS_SCAN_TERMINATED) {
            throw std::runtime_error("ERROR: Unable to finish scanning the input string (matches att:"
            + std::to_string(hbpm.current) + "). Exiting.\n");
        }
    }

    return hbpm.current;
}
""")
doc

<class 'examples.docs.CppDocs'>(value=/**
 * Research native Hyperscan match direct bytebuffers.
 * 
 * @param db_ptr The pointer to the Hyperscan database.
 * @param scratch_ptr The pointer to the Hyperscan scratch.
 * @param matches_ptr The pointer to the matches area.
 * @param scan_from_ptr The pointer to the string to be scanned.
 * @param max_matches The maximum number of matches.
 * @param n_chars The number of characters to be scanned.
 * 
 * @return The number of matches.
 */
DLLEXPORT int research_native_hyperscan_match_direct_bytebuffers(void * db_ptr, void * scratch_ptr, void * matches_ptr,
    void * scan_from_ptr, int max_matches, int n_chars);)

## Handling large / long context lengths

As we saw earlier, we create contextual prompts to define the context and operations of our model. However, this consumes a large share of the available context, and since the GPT-3 context length is already fairly limited at 4097 tokens, this can quickly become a problem. Luckily, we can use the `Stream` processing expression. This expression opens a data stream and computes the remaining context length for processing the input data. It then chunks the sequence and computes the result for each chunk. The chunks can be processed with a `Sequence` expression, which allows multiple chained operations in a sequential manner.
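The chunking step can be sketched as follows. This is an illustration under simplifying assumptions: whitespace splitting stands in for a real tokenizer, and `remaining_budget`, `chunk_tokens`, and `stream` are hypothetical helper names, not the library's `Stream` or `Sequence` API.

```python
from typing import Callable, Iterator, List


def remaining_budget(max_context: int, prompt_tokens: int, reserved_output: int) -> int:
    """Tokens left for input data after the prompt and the expected output."""
    budget = max_context - prompt_tokens - reserved_output
    if budget <= 0:
        raise ValueError("prompt leaves no room for input data")
    return budget


def chunk_tokens(tokens: List[str], budget: int) -> Iterator[List[str]]:
    """Split the token sequence into consecutive chunks of at most `budget` tokens."""
    for start in range(0, len(tokens), budget):
        yield tokens[start:start + budget]


def stream(text: str,
           ops: List[Callable[[str], str]],
           max_context: int = 4097,
           prompt_tokens: int = 0,
           reserved_output: int = 0) -> List[str]:
    """Apply a sequence of operations to every chunk of the input."""
    # Whitespace splitting is only a crude approximation of a real tokenizer.
    budget = remaining_budget(max_context, prompt_tokens, reserved_output)
    results: List[str] = []
    for chunk in chunk_tokens(text.split(), budget):
        value = " ".join(chunk)
        for op in ops:  # chained operations, applied in order
            value = op(value)
        results.append(value)
    return results


results = stream("a b c d e f g",
                 [str.upper, lambda s: s.replace(" ", "")],
                 max_context=10, prompt_tokens=4, reserved_output=3)
print(results)
```

The larger the prompt, the smaller each chunk becomes, which is why long contextual prompts increase the number of model calls needed for the same input.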

In the following example, we extract the news from a particular website and recombine the individual chunks by clustering the information across them. This gives us a way to consolidate contextually related information and recombine it in a meaningful way. Furthermore, the clustered information can then be labeled by streaming through the values within each cluster and collecting the most relevant labels.

<img src="../assets/images/img6.png" width="720px">

If we repeat this process, we now get a way of building up a hierarchical cluster with labels as entry points to allow information retrieval from our constructed structure.
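A single level of this cluster-then-label process can be sketched in plain Python. The approach below (greedy grouping by keyword overlap, labels from the most frequent keywords) is a simple stand-in chosen for illustration, not the method used by the `News` expression, and the headlines are invented.

```python
from collections import Counter
from typing import List, Set

STOPWORDS = {"the", "a", "for", "to", "of", "and", "in", "on"}


def keywords(text: str) -> Set[str]:
    """Lowercased words of a chunk without stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two keyword sets, between 0.0 and 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(chunks: List[str], threshold: float = 0.2) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed chunk is similar enough."""
    clusters: List[List[str]] = []
    for chunk in chunks:
        for group in clusters:
            if jaccard(keywords(chunk), keywords(group[0])) >= threshold:
                group.append(chunk)
                break
        else:
            clusters.append([chunk])
    return clusters


def label(group: List[str], top_k: int = 2) -> List[str]:
    """Most frequent keywords in a cluster; ties are broken alphabetically."""
    counts = Counter(w for chunk in group for w in keywords(chunk))
    return sorted(counts, key=lambda w: (-counts[w], w))[:top_k]


chunks = [
    "meta sues voyager labs over data scraping",
    "meta data scraping lawsuit targets voyager",
    "ransomware attack hits hospital network",
    "hospital ransomware recovery continues",
]
clusters = cluster(chunks)
for group in clusters:
    print(label(group), group)
```

Applying the same procedure to the labels themselves would produce the next level of the hierarchy.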

In [2]:
import os
from examples.news import News

The following expression generates a news website based on a URL.

In [3]:
# crawl the website and create a new website based on its facts
news = News(url='https://www.cnbc.com/cybersecurity/',
            pattern='cnbc',
            filters=ExcludeFilter('sentences about subscriptions, licensing, newsletter'),
            render=True)
expr = Log(Trace(news))
res = expr()
os.makedirs('results', exist_ok=True)
path = os.path.abspath('results/news.html')
res.save(path, replace=False)

2023-01-13 11:59:31,980 Driver [/system/user/dinu/.wdm/drivers/chromedriver/linux64/102.0.5005/chromedriver] found in cache


<botdyn.backend.engine_crawler.CrawlerEngine objec, <function Expression.fetch.<locals>._func at 0x7ff2fc4fc550>, {'wrp_self': <class 'examples.news.N ['<html lang="en" prefix="og=https://ogp.me/ns#" itemscope="" itemtype="https://schema.org/WebPage">
<botdyn.backend.engine_gpt3.GPT3Engine object at 0, <function Symbol.clean.<locals>._func at 0x7ff2fc4fc5e0>, {'wrp_self': <class 'botdyn.symbol.Symbo ['\n\nCybersecurity Skip Navigation Markets Pre-Markets U.S. Markets Europe Markets China Markets As
<botdyn.backend.engine_gpt3.GPT3Engine object at 0, <function Symbol.isinstanceof at 0x7ff2fc5494c0>, {'wrp_self': <class 'botdyn.symbol.Symbol'>(valu ['True']
<botdyn.backend.engine_gpt3.GPT3Engine object at 0, <function Symbol.outline.<locals>._func at 0x7ff22f9ea700>, {'wrp_self': <class 'botdyn.symbol.Sym ['\n- Meta sues Voyager Labs for creating fake accounts to scrape user data\n- Dark web criminal min
<botdyn.backend.engine_gpt3.GPT3Engine object at 0, <function Symbol.filter.<locals>

<class 'botdyn.symbol.Symbol'>(value=<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>News</title>
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/css/bootstrap.min.css">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.6.1/jquery.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.2.3/dist/js/bootstrap.bundle.min.js"></script>
    <style>
      body {
        background-color: #303030;
        color: #fbfbfb;
        font-family: 'Arial', sans-serif;
      }
      h1 {
        text-align: center;
      }
    </style>
  </head>
  <body>
  <h1>News Headlines</h1>
  <div class="container">
  <h2>Recent Technology Executive Council Headlines</h2>
  <div class="row">
    <div class="col-md-6">
      <ul class="list-group">
        <li class="list-group-item list-group-item-action list-group-item-primary">Meta has f

Another example is to read in a PDF file and extract the text from it to create a website based on its content.

In [76]:
import os
from examples.paper import Paper

In [77]:
paper = Paper(path='examples/paper.pdf')
expr = Log(Trace(paper))
res = expr(n_pages=1)
os.makedirs('results', exist_ok=True)
path = os.path.abspath('results/news.html')
res.save(path, replace=False)

<botdyn.backend.engine_file.FileEngine object at 0, <function Expression.open.<locals>._func at 0x7f00655fd0d0>, {'wrp_self': <class 'examples.paper.P ['Large Language Models are Zero-Shot Reasoners\nTakeshi Kojima\nThe University of Tokyo\nt.kojima@w
<botdyn.backend.engine_gpt3.GPT3Engine object at 0, <function Symbol.style.<locals>._func at 0x7f006554ef70>, {'wrp_self': <class 'botdyn.symbol.Symbo ['\n<!doctype html>\n<html lang="en">\n  <head>\n    <meta charset="utf-8">\n    <meta name="viewpor


TypeError: 'NoneType' object is not callable