# Patterns: Pattern Specification, Patternic Algebra, and Improved Search

In skilled hands, `gAutoy` patterns provide an efficient mechanism for spotlighting relevant information in logs. 
`gAutoy` patterns were inspired by [`Python` regular expressions](https://docs.python.org/2/library/re.html) and offer similar functionality, extended and adapted to log search (in contrast to conventional text search with regexps).

## Prerequisites

Before analyzing traces with the `gautoy` pattern facilities, configure `gautoy`, create a logger instance with `get_logger()`, and load traces (or connect the logger to listen for traces):

In [1]:
import gautoy
from gautoy import pattern
gautoy.init_printing()
from gautoy.core.output import pprint
#gautoy.core.config.set_option('target.ip', '196.1.1.1')
logs = gautoy.get_logger()

import os
#os.chdir(r'')
logs.load(r'/path/to/trace.xaa')

gautoy.core.config.set_option(r'display.log.output', 'Message TimeStamp')

## Defining Patterns

### Basic patterns

*Basic* (or so-called *atomic*) patterns provide basic log search functionality and serve as building blocks for specifying more complicated patternic structures. 
`gAutoy` provides two ways to define atomic patterns: via C format strings and via [`Python` regular expressions](https://docs.python.org/2/library/re.html). 
Atomic patterns of both types can be created with the function `compile()` from the subpackage `gautoy.pattern`.

The easiest way to define an atomic pattern in `gAutoy` is to use **C format** based patterns. 
In this case, the user must specify a C format string and a list of mnemonic names, one for each formatted pattern field.
The user has no control over the parsing and formatting of pattern fields; these are fully driven by the C format string.
The next cell specifies a pattern for the new car position message:

In [None]:
patternCarPosition = pattern.compile(r'some log entry format for car position: ts[%d] route[%d] lon=%d lat=%d linkId=%d heading=%f link.heading=%f',
                                     ['ts', 'route', 'lon_WGS84', 'lat_WGS84', 'linkId', 'heading', 'link_heading'])
patternCarPosition

A C format based pattern for MOST messages has the following definition:

In [None]:
pattern.compile(r'Some MOST bus trace which contains info about %4X, %4X %4X %2X %2X %3X %1X %4X %s',
                ['no', 'from', 'to', 'funcBlock', 'device', 'funcId', 'opCode', 'size', 'data'])
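As a rough illustration of why a format string plus a list of names is enough to define a pattern, the sketch below turns a C format string into a regular expression with named groups, using only the standard `re` module. It is not `gAutoy`'s implementation: the helper `c_format_to_regex` and the specifier table are hypothetical, and field widths such as `%4X` are simply ignored.

```python
import re

# Assumed mapping from a few C conversion specifiers to regex fragments.
SPEC_TO_REGEX = {
    'd': r'[-+]?\d+',
    'X': r'[0-9A-F]+',
    'f': r'[-+]?\d+(?:\.\d+)?',
    's': r'\S+',
}
# A conversion specifier: '%', an optional width, and the conversion letter.
SPEC = re.compile(r'%(\d*)([dXfs])')


def c_format_to_regex(fmt, names):
    """Build a compiled regex with one named group per conversion specifier."""
    names = iter(names)
    parts = []
    last = 0
    for spec in SPEC.finditer(fmt):
        # Literal text between specifiers must match verbatim.
        parts.append(re.escape(fmt[last:spec.start()]))
        parts.append('(?P<{0}>{1})'.format(next(names), SPEC_TO_REGEX[spec.group(2)]))
        last = spec.end()
    parts.append(re.escape(fmt[last:]))
    return re.compile(''.join(parts))


regex = c_format_to_regex('ts[%d] lon=%d heading=%f', ['ts', 'lon', 'heading'])
fields = regex.search('ts[12] lon=-7 heading=3.5').groupdict()
print(fields)
```

The captured values are still strings here; converting them according to the specifier is what the format-driven parsing described above would add.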

In contrast, **regexp** based patterns are somewhat trickier to define, but they provide full control over the pattern's parsers and formatters.
Here is a regexp based pattern for MOST messages, similar to the previous one:

In [None]:
patternMOST = pattern.compile(r"Some MOST bus trace which contains info about (?P<from>[0-9A-F]+) (?P<to>[0-9A-F]+) "\
                              r"(?P<funcBlock>[0-9A-F]+) (?P<device>[0-9A-F]+) (?P<funcId>[0-9A-F]+) "\
                              r"(?P<opCode>[0-9A-F]) (?P<size>[0-9A-F]+) (?P<data>([0-9A-F][0-9A-F])*)",
                                         parsers = {
                                             'funcBlock': lambda s: int(s, 16),
                                             'device'   : lambda s: int(s, 16),
                                             'funcId'   : lambda s: int(s, 16),
                                             'opCode'   : lambda s: int(s, 16),
                                             'size'     : lambda s: int(s, 16),
                                             'data'     : lambda s: tuple(int(x+y,16) for x,y in zip(s[0::2], s[1::2]))},
                                         formatters = {
                                             'funcBlock': lambda x: r'{0:02X}'.format(x),
                                             'device'   : lambda x: r'{0:02X}'.format(x),
                                             'funcId'   : lambda x: r'{0:03X}'.format(x),
                                             'opCode'   : lambda x: r'{0:1X}'.format(x),
                                             'size'     : lambda x: r'{0:04X}'.format(x),
                                             'data'     : lambda x: ''.join('%02X'%a for a in x),}
                             )
patternMOST

Note that some pattern fields are left without parsers and formatters (e.g., `from` and `to`). These fields are treated as plain strings.
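The role of parsers and formatters can be seen in isolation with the standard `re` module. The following is a minimal sketch, not `gAutoy` code: the message layout and the helpers `parse_fields` and `format_fields` are made up for illustration, while the lambdas are the same ones used for `funcId` and `data` above.

```python
import re

# Hypothetical two-field message, much shorter than the MOST pattern above.
MESSAGE = re.compile(r'id=(?P<funcId>[0-9A-F]+) data=(?P<data>(?:[0-9A-F][0-9A-F])*)')

parsers = {
    'funcId': lambda s: int(s, 16),
    'data': lambda s: tuple(int(x + y, 16) for x, y in zip(s[0::2], s[1::2])),
}
formatters = {
    'funcId': lambda x: '{0:03X}'.format(x),
    'data': lambda x: ''.join('%02X' % a for a in x),
}


def parse_fields(message):
    """Convert the matched text of every named group into a Python value."""
    m = MESSAGE.search(message)
    if m is None:
        return None
    return dict((name, parsers[name](text)) for name, text in m.groupdict().items())


def format_fields(fields):
    """Convert Python values back into the text form used in the log."""
    return dict((name, formatters[name](value)) for name, value in fields.items())


fields = parse_fields('id=C2F data=1A10')
text = format_fields(fields)
print(fields, text)
```

Parsing followed by formatting reproduces the original field text, which is what lets a value such as `0xC2F` be compared against the hexadecimal string in the log.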

Here is a slightly more sophisticated regexp based pattern for a message notifying about screenshots:

In [None]:
import datetime
patternDoingScreenshot = pattern.compile(r'Some trace to catch screenshot with filename '\
                                         r'(?P<fullname>(?P<name>something(?P<ext>\.png)) taken at (?P<datetime>(?P<date>\d+)(?P<time>\d+))(?P<type>.+))',
                                         parsers = {
                                             'datetime': lambda d: datetime.datetime.strptime(d, '%Y%m%d-%H%M%S'),
                                             'date'    : lambda d: datetime.datetime.strptime(d, '%Y%m%d').date(),
                                             'time'    : lambda d: datetime.datetime.strptime(d, '%H%M%S').time()},
                                         formatters = {
                                             'datetime': lambda d: d if isinstance(d, basestring) else d.strftime('%Y%m%d-%H%M%S'),
                                             'date'    : lambda d: d if isinstance(d, basestring) else d.strftime('%Y%m%d'),
                                             'time'    : lambda d: d if isinstance(d, basestring) else d.strftime('%H%M%S')}
                                        )
patternDoingScreenshot

Note that this pattern uses nested groups (e.g., group `time` is part of group `datetime`, which in turn is part of `fullname`). You cannot easily make nested groups inside C format based patterns.
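Nested named groups behave the same way in plain `Python` regular expressions, so they can be tried out without a logger. The sketch below uses only the standard `re` and `datetime` modules; the file name layout `shot_<date>-<time>.png` is a hypothetical example, not the real trace format.

```python
import datetime
import re

# Every inner group is also part of the text captured by the groups around it.
SHOT = re.compile(
    r'(?P<fullname>(?P<name>shot_(?P<datetime>(?P<date>\d{8})-(?P<time>\d{6})))'
    r'(?P<ext>\.png))')

m = SHOT.search('saved shot_20240131-235959.png to disk')
groups = m.groupdict()
taken = datetime.datetime.strptime(m.group('datetime'), '%Y%m%d-%H%M%S')
print(groups['fullname'], groups['date'], groups['time'], taken)
```

A single match therefore yields the whole file name, the combined time stamp, and its date and time parts at once, each of which can get its own parser.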

To narrow a search, the user can **specialize** a pattern, that is, assign certain values to the pattern fields.
To do so, simply list the groups with their new values in parentheses as shown below. 
This returns a new, narrower pattern. 

In [None]:
patternMOST(funcId=0xC2F,device=0, size=1, data = [0x1A, 0x10])

Fields can also be accessed by their positions (which is the only way to access anonymous fields):

In [None]:
patternMOST(1,'0018', 6, 0xC2F, size=1, data=[0x1A])
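One way to picture specialization is as a filter on parsed field values: calling the pattern returns a new object that remembers the required values and rejects messages that disagree. The sketch below is an assumption about the idea, not `gAutoy`'s implementation; the class `FieldPattern`, its `parse()` method, and the message layout are hypothetical, and only keyword access is modeled.

```python
import re


class FieldPattern(object):
    """Toy pattern: a regex with named groups, per-field parsers, and fixed values."""

    def __init__(self, regex, parsers=None, fixed=None):
        self.regex = re.compile(regex)
        self.parsers = dict(parsers or {})
        self.fixed = dict(fixed or {})

    def __call__(self, **values):
        # Specializing returns a new, narrower pattern; the original is untouched.
        merged = dict(self.fixed)
        merged.update(values)
        return FieldPattern(self.regex.pattern, self.parsers, merged)

    def parse(self, message):
        m = self.regex.search(message)
        if m is None:
            return None
        fields = {}
        for name, text in m.groupdict().items():
            parse = self.parsers.get(name, lambda s: s)
            fields[name] = parse(text)
        # Reject messages whose parsed fields differ from the fixed values.
        for name, expected in self.fixed.items():
            if fields.get(name) != expected:
                return None
        return fields


def hex_int(s):
    return int(s, 16)


generic = FieldPattern(r'func=(?P<funcId>[0-9A-F]+) size=(?P<size>[0-9A-F]+)',
                       parsers={'funcId': hex_int, 'size': hex_int})
narrow = generic(funcId=0xC2F, size=1)

messages = ['func=C2F size=0001', 'func=C2F size=0002', 'func=A00 size=0001']
hits = [msg for msg in messages if narrow.parse(msg) is not None]
print(hits)
```

Comparing parsed values rather than raw text is what allows `size=1` to match the log text `0001`.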

### Advanced searchable structures and algebra of patterns

Atomic patterns are not only simple trace search instruments, but also building blocks for advanced **searchable structures**: *multiline patterns* and *pattern multi-sets* (sets with possible repetitions). `gAutoy`'s algebra of patterns provides an easy mechanism to construct both.

#### Multiline patterns

We suggest using multiline patterns when the function/method of interest consecutively writes several log messages, each of which conveys a piece of relevant information.

To define multiline patterns, use either the bit-wise and operator `&` or multiplication `*`:

In [10]:
a = pattern.compile('a')
b = pattern.compile('b')
x = a & b
x, a * b

(pattern(r'a')&pattern(r'b'), pattern(r'a')&pattern(r'b'))

You can use the `&=` and/or `*=` operations as well.
Finally, the power operator `**` allows you to repeat atomic patterns in a multiline pattern:

In [11]:
x &= a**2
x

pattern(r'a')&pattern(r'b')&pattern(r'a')&pattern(r'a')

Multiline patterns search for groups of closest messages positioned in the given order.
E.g., if the log contains the messages 
`['c', 'a', 'c', 'a', 'c', 'b', 'c', 'b', 'a', 'b', 'x', 'a', 'x', 'x']`,
pattern `x` will return a single match with message positions `(3, 7, 8, 11)`:

In [12]:
log_content = list('cacacbcbabxaxx')
match_indices = 3,7,8,11
[log_content[i] for i in match_indices]
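The positions quoted above can be reproduced with a simple two-pass search: a forward pass finds the earliest line at which the whole sequence can end, and a backward pass then picks the nearest earlier message for each element. This is only one strategy that happens to give `(3, 7, 8, 11)` for this log; whether `gAutoy` works this way is an assumption, and `closest_ordered_match` is a hypothetical helper that compares messages by plain equality.

```python
def closest_ordered_match(messages, wanted, start=0):
    """Return indices of the tightest in-order match of `wanted`, or None."""
    # Forward pass: greedily find the earliest position where the sequence can end.
    pos = start
    for item in wanted:
        while pos < len(messages) and messages[pos] != item:
            pos += 1
        if pos == len(messages):
            return None
        pos += 1
    end = pos - 1

    # Backward pass: from `end`, take the nearest earlier message for each item.
    indices = []
    pos = end
    for item in reversed(wanted):
        while messages[pos] != item:
            pos -= 1
        indices.append(pos)
        pos -= 1
    return tuple(reversed(indices))


log_content = list('cacacbcbabxaxx')
match_indices = closest_ordered_match(log_content, ['a', 'b', 'a', 'a'])
print(match_indices)
```

The forward pass alone would pick the `a` at line 1 and the `b` at line 5; the backward pass is what pulls the match together around its end.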

#### Pattern multi-sets

Pattern multi-sets serve the purpose of handling several patterns as a whole, e.g., to apply several patterns to a log in arbitrary order.

To define pattern multi-sets, use either the bit-wise or operator `|` or addition `+`:

In [13]:
x = a + b
x

pattern(r'a') | 
pattern(r'b')

You can also use `|=` and/or `+=` operations.

In [14]:
x |= a**2 * b
x

pattern(r'a') | 
pattern(r'b') | 
pattern(r'a')&pattern(r'a')&pattern(r'b')

#### More on algebra

Note that both ways to define patterns, via arithmetic operators (`*`, `+`) and via bit-wise operators (`&`, `|`), are fully equivalent.

A further note (for mathematicians): patternic "algebra" is not an algebra in the common mathematical sense.
It defines neither an algebra nor even a ring of patterns, though it satisfies most axioms of non-commutative rings. E.g., it follows the associative and distributive laws, but it does not have an additive identity element (and consequently lacks additive inverses).

In [15]:
a = pattern.compile('a')
b = pattern.compile('b')
c = pattern.compile('c')

a**2 * (b + c)

pattern(r'a')&pattern(r'a')&pattern(r'b') | 
pattern(r'a')&pattern(r'a')&pattern(r'c')
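The distributive expansion shown above can be mimicked with ordinary operator overloading. The toy class below is a sketch under stated assumptions, not `gAutoy`'s pattern type: `Expr` is a hypothetical name, atoms are plain strings, and an expression is stored as a list of alternatives, each a tuple of atoms.

```python
class Expr(object):
    """Toy pattern expression: a list of alternatives, each a tuple of atom names."""

    def __init__(self, alternatives):
        self.alternatives = list(alternatives)

    @classmethod
    def atom(cls, name):
        return cls([(name,)])

    def __and__(self, other):
        # Sequencing distributes over alternatives: join every left with every right.
        return Expr([left + right
                     for left in self.alternatives
                     for right in other.alternatives])

    __mul__ = __and__

    def __or__(self, other):
        # A multi-set keeps repetitions, so alternatives are simply concatenated.
        return Expr(self.alternatives + other.alternatives)

    __add__ = __or__

    def __pow__(self, n):
        # Repetition for n >= 1 is repeated sequencing.
        result = self
        for _ in range(n - 1):
            result = result & self
        return result

    def __repr__(self):
        return ' | '.join('&'.join(alt) for alt in self.alternatives)


a, b, c = Expr.atom('a'), Expr.atom('b'), Expr.atom('c')
expanded = a**2 * (b + c)
print(expanded)
```

Because `**` binds tighter than `*`, and `*` tighter than `+`, the arithmetic spelling needs fewer parentheses than the bit-wise one. There is no empty expression that could act as an additive identity, which matches the remark above.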

## Search methods

### Conventional search methods

Search facilities of basic patterns and patternic structures are similar (almost one-to-one) to those provided by `Python` regexps. 
The next table lists the most popular search methods:

<table>
<tr>
<th>Method</th>
<th>Description</th>
</tr>
<tr>
<td>`finditer`</td>
<td>Return an iterator yielding MatchObject instances over all matches for the pattern in `log_frame`. The `log_frame` is scanned from top to bottom, and matches are returned in the order found.</td>
</tr>
<tr>
<td>`findall`</td>
<td>Return all matches of the pattern in `log_frame` as a list of matches. The `log_frame` is scanned from top to bottom, and matches are returned in the order found.</td>
</tr>
<tr>
<td>`match`</td>
<td>If the message at the given line `pos` in `log_frame` matches the pattern, return the corresponding MatchObject instance (otherwise `None`).</td>
</tr>
<tr>
<td>`search`</td>
<td>Scan through `log_frame` looking for the first location where the pattern produces a match, and return a corresponding MatchObject instance (otherwise `None`).</td>
</tr>
</table>

Note that, on success, all these methods return either a MatchObject or a collection of MatchObjects. If nothing is found, they return `None`.

See a couple of examples below:

In [None]:
patternDoingScreenshot.findall(logs)

In [None]:
patternDoingScreenshot(type='KOMBI').findall(logs)

In [None]:
patternDoingScreenshot.search(logs, pos=100)

In [22]:
patternDoingScreenshot.match(logs, pos=100)
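To make the relation between the four methods concrete, here is a line-oriented sketch built on the standard `re` module, with a plain list of strings standing in for `log_frame`. The class `LinePattern` is hypothetical and returns `(line, match)` pairs instead of `gAutoy` match objects; `findall` returns `None` on an empty result only to mirror the note above.

```python
import re


class LinePattern(object):
    """Toy line-oriented pattern over a list of log messages."""

    def __init__(self, regex):
        self.regex = re.compile(regex)

    def match(self, lines, pos=0):
        # Look at the message on line `pos` only.
        if 0 <= pos < len(lines):
            m = self.regex.search(lines[pos])
            if m:
                return (pos, m)
        return None

    def finditer(self, lines, pos=0):
        # Scan from top to bottom, yielding matches in the order found.
        for i in range(pos, len(lines)):
            hit = self.match(lines, i)
            if hit is not None:
                yield hit

    def search(self, lines, pos=0):
        # First match at or after `pos`, or None.
        return next(self.finditer(lines, pos), None)

    def findall(self, lines, pos=0):
        return list(self.finditer(lines, pos)) or None


lines = ['boot', 'shot a', 'idle', 'shot b']
shot = LinePattern(r'shot (?P<name>\w+)')
all_positions = [line for line, _ in shot.findall(lines)]
first_after_2 = shot.search(lines, pos=2)
print(all_positions, first_after_2[0], shot.match(lines, pos=2))
```

In this sketch `search` and `findall` are both thin wrappers around `finditer`, and `match` is the single-line test that `finditer` repeats.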

### Match objects

Match objects not only allow pretty output in the notebook environment, but also give access to the pattern match information the user may be interested in. There are two sorts of match objects: atomic pattern matches and multi-line pattern matches. (Pattern multi-sets do not have their own match type; they return the individual match objects of the atomic and multi-line patterns they are composed of.)

Below is a list of the most common match object properties:

<table>
<tr>
<th>Property</th>
<th>Description</th>
</tr>
<tr>
<td>`pattern`</td>
<td>Host pattern for this match object.</td>
</tr>
<tr>
<td>`pos`</td>
<td>Line number in the log where the match starts.</td>
</tr>
<tr>
<td>`endpos`</td>
<td>Line number of the message which follows the match in the log.</td>
</tr>
</table>

Named match fields can be accessed as attributes for both atomic and multi-line match objects. 
In atomic match objects, you can access message time stamps and other fields directly as properties.
Information about individual matched messages in a multiline match can be accessed by index. 

In [None]:
m = patternDoingScreenshot.search(logs, pos=100)
if m: print( r'Screenshot "{0}" was taken at {1} (TimeStamp:{2}, line:{3})'.format(m.fullname, m.time, m.TimeStamp, m.pos))
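Attribute access to named fields, as in `m.fullname` above, can be provided by `__getattr__`. The class below is a hypothetical stand-in for an atomic match object, written only to illustrate the three properties from the table and the attribute lookup; it does not reflect how `gAutoy` stores its matches.

```python
class LineMatch(object):
    """Toy atomic match: host pattern, line numbers, and named fields."""

    def __init__(self, pattern, pos, fields):
        self.pattern = pattern
        self.pos = pos          # line where the match starts
        self.endpos = pos + 1   # line that follows a single-line match
        self._fields = dict(fields)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails: fall back to the fields.
        fields = self.__dict__.get('_fields', {})
        try:
            return fields[name]
        except KeyError:
            raise AttributeError(name)


m = LineMatch('screenshot pattern', 100, {'fullname': 'something.png', 'type': 'KOMBI'})
print('Screenshot "{0}" of type {1} (line:{2})'.format(m.fullname, m.type, m.pos))
```

Regular attributes such as `pos` take precedence over fields, so a field that happens to share a name with a property would be hidden in this sketch.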

### Message waiting

To wait for messages while the logger is connected and listening for trace messages, use the method `wait()` available in atomic patterns. In this method, you specify the logger and the maximum waiting time (timeout) in seconds. If the *timeout elapses* before the logger has received the message, the pattern throws a *run-time exception*:

In [26]:
line = pattern.compile('x'*10).wait(logs, 10)
if line: print('{0} {1}'.format(line, logs.Message[line-1]))

RuntimeError: Pattern waiting timeout is lapsed

Otherwise (the message is found), it returns the number of the line that follows the observed pattern match.

In [None]:
patternCarPosition = pattern.compile(r'Some trace log for car position which includes ts[%d] route[%d] lon=%d lat=%d linkId=%d heading=%f link.heading=%f',
                                     ['ts', 'route', 'lon_WGS84', 'lat_WGS84', 'linkId', 'heading', 'link_heading'])
line = patternCarPosition.wait(logs, 10)
if line: print('{0} "{1}"'.format(line, logs.Message[line-1]))
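The waiting behaviour can be approximated with a queue of incoming messages and a deadline. The sketch below targets Python 3 and uses only the standard library; the function `wait_for` and the use of `queue.Queue` as the message source are assumptions made for illustration, since the way a connected logger delivers messages is not described here.

```python
import queue
import re
import time


def wait_for(regex, incoming, timeout):
    """Block until a message matching `regex` arrives on `incoming`, or raise."""
    compiled = re.compile(regex)
    deadline = time.monotonic() + timeout
    consumed = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise RuntimeError('Pattern waiting timeout is lapsed')
        try:
            message = incoming.get(timeout=remaining)
        except queue.Empty:
            raise RuntimeError('Pattern waiting timeout is lapsed')
        consumed += 1
        if compiled.search(message):
            # Number of the line that follows the match (lines counted from 0).
            return consumed


incoming = queue.Queue()
for text in ['boot', 'xxxxxxxxxx ready', 'later']:
    incoming.put(text)

line = wait_for('x' * 10, incoming, 1.0)
print(line)
```

Recomputing the remaining time on every iteration keeps the total wait bounded by `timeout`, no matter how many non-matching messages arrive in the meantime.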

### Assigning callbacks to patterns

Assigning callbacks to patterns allows you to handle match objects on the fly (without collecting them in lists). It is particularly useful for pattern multi-sets (for atomic and multi-line patterns, one can use the `finditer()` method for this purpose).

First, obtain callable patterns by calling the method `call()` with the callback function as a parameter. Then apply the method `walk()` to the composed pattern. See the example below:

In [None]:
def cb_mdatDumpManeuver(r): print(r'Maneuver {0.man_id} moves to road "{0.road_prefix}{0.road_no}"'.format(r))
def cb_patternDoingScreenshot(r): 
    if r.date != datetime.date.today(): 
        print(r'Old screenshot "{0.fullname}" (date:{0.date}; today:{1})'.format(r, datetime.date.today()))

( (mdatDumpManeuver & mdatWhereToRoad).call(cb_mdatDumpManeuver) 
 | patternDoingScreenshot.call(cb_patternDoingScreenshot) ).walk(logs) 
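The dispatch idea behind `call()` and `walk()` can be sketched with plain regular expressions: each pattern is paired with a callback, and a single pass over the log invokes the callback of every pattern that matches. The function `walk` below and the message texts are hypothetical; only atomic (single-line) patterns are modeled.

```python
import re


def walk(messages, callbacks):
    """Call callback(match) for every (regex, callback) pair matching a message."""
    compiled = [(re.compile(regex), callback) for regex, callback in callbacks]
    for message in messages:
        for regex, callback in compiled:
            m = regex.search(message)
            if m:
                callback(m)


maneuvers = []
screenshots = []

walk(
    ['maneuver 7 to road A5', 'noise', 'screenshot a.png', 'maneuver 8 to road B27'],
    [
        (r'maneuver (?P<man_id>\d+) to road (?P<road>\w+)',
         lambda m: maneuvers.append((int(m.group('man_id')), m.group('road')))),
        (r'screenshot (?P<name>\S+)',
         lambda m: screenshots.append(m.group('name'))),
    ])
print(maneuvers, screenshots)
```

Nothing is accumulated by `walk` itself; whatever state is kept lives in the callbacks, which is the point of handling matches on the fly.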

### Patternic classes

Patternic classes allow you to customize pattern handling. In particular, the method `walk()` of patternic classes applies methods that handle patterns, in a way similar to applying callbacks with the method `walk()` of patternic structures. Moreover, you can inherit pattern handlers from parent classes. 

To make a class patternic, use the decorator `@patternic`. 
The decorator `@handler` allows you to define methods that handle patterns.

The next example extends the callback example from the section *"Assigning callbacks to patterns"* by printing the car position in front of the information about message matches.

In [None]:
@pattern.patternic
class WithCarPosition(object):
    def __init__(self, log):
        self.car_position = 0,0

    @pattern.handler(patternCarPosition)
    def storeCarPosition(self,r): 
        self.car_position = gautoy.converter.WGS84_to_latlon(r.lon_WGS84, r.lat_WGS84)

@pattern.patternic
class Notifier(WithCarPosition): # inherit from class which stores current car position
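    def __init__(self, log): 
        # Initialize the inherited car position before any handler can read it
        super(Notifier, self).__init__(log)
        # Walk immediately inside constructor
        self.walk(log)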
        
    def message(self, msg): 
        # Decorate string output with current car position
        print('[lat={0[0]} lon={0[1]}] {1}'.format(self.car_position, msg))

    @pattern.handler(patternDoingScreenshot)
    def handleDoingScreenshot(self,r): 
        if r.date != datetime.date.today(): 
            self.message(r'Old screenshot "{0.fullname}" (date:{0.date}; today:{1})'.format(r, datetime.date.today()))

    @pattern.handler(mdatDumpManeuver & mdatWhereToRoad)
    def handleWhereToRoad(self,r): self.message(r'Maneuver {0.man_id} moves to road "{0.road_prefix}{0.road_no}"'.format(r))

Notifier(logs) # Run Notifier constructor (which automatically calls ``walk()``)
None
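For readers curious how such decorators can work, here is a minimal sketch of the mechanism with the standard `re` module: a method decorator tags handlers with their regular expression, and a class decorator collects the tagged methods, including inherited ones, and adds `walk()`. The names `handler` and `patternic` are reused for readability, but this is an assumption about the approach, not the `gAutoy` implementation.

```python
import re


def handler(regex):
    """Tag a method with the regular expression it handles."""
    compiled = re.compile(regex)

    def tag(method):
        method._handled_regex = compiled
        return method
    return tag


def patternic(cls):
    """Collect tagged methods (inherited ones too) and add walk() to the class."""
    handlers = []
    for name in dir(cls):
        regex = getattr(getattr(cls, name, None), '_handled_regex', None)
        if regex is not None:
            handlers.append((regex, name))
    cls._handlers = handlers

    def walk(self, messages):
        for message in messages:
            for regex, name in self._handlers:
                m = regex.search(message)
                if m:
                    getattr(self, name)(m)
    cls.walk = walk
    return cls


@patternic
class PositionTracker(object):
    def __init__(self):
        self.position = None
        self.notes = []

    @handler(r'pos lat=(?P<lat>-?\d+) lon=(?P<lon>-?\d+)')
    def store_position(self, m):
        self.position = (int(m.group('lat')), int(m.group('lon')))


@patternic
class ScreenshotNotifier(PositionTracker):
    @handler(r'screenshot (?P<name>\S+)')
    def on_screenshot(self, m):
        # The inherited handler keeps self.position up to date.
        self.notes.append((self.position, m.group('name')))


notifier = ScreenshotNotifier()
notifier.walk(['pos lat=1 lon=2', 'screenshot a.png', 'noise',
               'pos lat=3 lon=4', 'screenshot b.png'])
print(notifier.notes)
```

Because `dir()` also lists inherited attributes, the subclass picks up `store_position` without mentioning it, which mirrors the handler inheritance described above.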

## Other

Sometimes it is useful to know the pattern that can be used for searching directly inside the logger backend. For this purpose, you can use the method `logger_pattern()`:

In [None]:
print(patternMOST(funcId=0xC2F,device=0, size=1).logger_pattern(logs))