Let's perform model inference on an actual trace generated from a real-world software system (Week 4).
In this exercise, we use a trace from `covasim` and consider only the function calls, ignoring their line numbers.

In [None]:
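trace_file = 'data/lineTrace.csv'
trace = []
with open(trace_file, 'r') as f:
    for line in f:
        # Keep only the part before the first ':' (this drops the line number).
        event = line.strip().split(':')[0]
        # Skip blank lines and internal (double-underscore) functions such as __init__.
        if event and '__' not in event:
            trace.append(event)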

We can see that `trace` contains a sequence of events (without line numbers). To keep the example manageable, we look at only the first 100 events.

In [None]:
short_trace = trace[:100]
short_trace

The set of unique events in the trace becomes the alphabet of the model.

In [None]:
alphabet = set(short_trace)
print(f'length of the trace: {len(short_trace)}, size of the alphabet: {len(alphabet)}')

From the above, we can see that the alphabet is much smaller than the trace is long.
This means the same events recur many times in the trace, so the inferred model can be much more compact than the trace itself.
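
To make the repetition concrete, one can count how often each event occurs. The sketch below uses a small made-up trace; the event names are hypothetical and not taken from `covasim`. In the notebook, `short_trace` could be passed in place of `sample_trace`.

```python
from collections import Counter

# Hypothetical stand-in for `short_trace`; these event names are invented.
sample_trace = ['init', 'step', 'infect', 'step', 'infect', 'step', 'finalize']

# Count occurrences of each distinct event.
counts = Counter(sample_trace)

# List events from most to least frequent.
for event, n in counts.most_common():
    print(f'{event}: {n}')

# Ratio of distinct events to trace length; a low value means many repeats.
ratio = len(counts) / len(sample_trace)
print(f'distinct/total = {ratio:.2f}')
```

A ratio close to 1 would mean almost every event is unique, in which case the inferred model would be barely smaller than the trace.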

Since `ktail` in `ktail.py` takes words as input (a list of traces, where each trace is a single string of events joined by `sep`), we need to convert the trace into a list of words.

In [None]:
separator = ' '
word = separator.join(short_trace)
words = [word]

In [None]:
from ktail import ktail
k = 1
model = ktail(words, k, sep=separator)

In [None]:
from datetime import datetime
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
model.draw(f'{k}-tail-result-{timestamp}')
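
To build intuition for what `k` controls, here is a minimal sketch of the k-tails idea for a single trace. It is not the implementation in `ktail.py`; the function name `k_tails_sketch` and the sample events are assumptions made for illustration. Each position in the trace is treated as a state, and positions whose next `k` events agree are merged into one state.

```python
from collections import defaultdict

def k_tails_sketch(events, k):
    """Merge trace positions that share the same next-k events.

    Returns a dict mapping each state (its k-tail, as a tuple) to the set of
    (event, next_state) transitions leaving it. Illustrative only.
    """
    def tail(i):
        # The k-tail of position i: the next k events (shorter near the end).
        return tuple(events[i:i + k])

    transitions = defaultdict(set)
    for i, event in enumerate(events):
        # Consuming events[i] moves from the state of position i to that of i + 1.
        transitions[tail(i)].add((event, tail(i + 1)))
    return dict(transitions)

# Hypothetical trace; the event names are invented.
sample = ['a', 'b', 'a', 'b', 'c']

for k in (1, 2):
    model_sketch = k_tails_sketch(sample, k)
    print(f'k={k}: {len(model_sketch)} states with outgoing transitions')
    for state, outgoing in sorted(model_sketch.items()):
        print(f'  {state} -> {sorted(outgoing)}')
```

In this sketch a larger `k` distinguishes more positions, so fewer states are merged: the model becomes larger but more specific to the observed trace.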

Can you interpret the model? If not, you may want to try the following:
- Try different `k` values
- Try to filter out unnecessary events from the trace
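
As one possible starting point for the filtering suggestion, the sketch below drops events by name prefix and optionally collapses consecutive repeats. The helper `filter_trace`, its parameters, and the sample event names are all hypothetical; which events count as "unnecessary" depends on what you want the model to show.

```python
from itertools import groupby

def filter_trace(events, ignored_prefixes=(), collapse_repeats=True):
    """Return a copy of `events` with some noise removed (illustrative only)."""
    prefixes = tuple(ignored_prefixes)
    # Drop every event whose name starts with one of the ignored prefixes.
    kept = [e for e in events if not e.startswith(prefixes)]
    if collapse_repeats:
        # Replace each run of identical consecutive events with a single event.
        kept = [e for e, _ in groupby(kept)]
    return kept

# Hypothetical trace; the event names are invented.
sample = ['log_debug', 'step', 'step', 'log_info', 'step', 'infect', 'infect']

filtered = filter_trace(sample, ignored_prefixes=('log_',))
print(filtered)
```

Note that collapsing repeats hides self-loops from the inferred model, so it is worth comparing the results with `collapse_repeats=False` as well. The filtered list can then be joined into a word and passed to `ktail` in the same way as `short_trace`.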