Parsing Log Files with Python
---

Python snippets that reproduce what common Linux command-line one-liners do.

* Logs sourced from [loghub](https://github.com/logpai/loghub/blob/master/Apache/Apache_2k.log)

In [None]:
# Equivalent to grep "error" Apache_2k.log
with open("data/Apache_2k.log", "r", encoding="utf-8", errors="replace") as f:
    for line in f:
        if "error" in line:
            print(line, end="")

In [None]:
# Equivalent to grep -i "error" Apache_2k.log
with open("data/Apache_2k.log", "r", encoding="utf-8") as f:
    for line in f:
        if "error" in line.lower():
            print(line, end="")  # line already ends with "\n"; avoid blank lines between matches

In [None]:
# Equivalent to grep -E "\[error\]" Apache_2k.log
import re

pattern = re.compile(r"\[error\]")
with open("data/Apache_2k.log", "r", encoding="utf-8") as f:
    for line in f:
        if pattern.search(line):
            print(line, end="")

In [None]:
# Equivalent to grep -n "error" Apache_2k.log (prefix each match with its line number)

with open("data/Apache_2k.log", "r", encoding="utf-8") as file:
    # enumerate() yields (count, value) pairs; the count starts at `start`,
    # which defaults to 0, so start=1 gives grep-style 1-based line numbers.
    for i, line in enumerate(file, start=1):
        if "error" in line:
            print(f"{i}:{line}", end="")

In [11]:
# grep "error" file.log | wc -l
count = 0
with open("data/Apache_2k.log", "r", encoding="utf-8") as file:
    for line in file:
        if "error" in line:
            count += 1
count

595

As with grep piped to wc -l, the snippet above counts the lines that contain "error", not the occurrences of "error". Some lines contain the word twice, so the alternative further below counts how many times "error" appears in each line and sums the results.

Example log line with two occurrences: [Mon Dec 05 18:56:04 2005] [error] mod_jk child workerEnv in error state 6

```bash
 grep "error" Apache_2k.log | wc -l
 595
```
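
The difference between the two counts can be checked on the sample log line quoted above. This is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
# Sample line copied from the log excerpt quoted above.
sample = "[Mon Dec 05 18:56:04 2005] [error] mod_jk child workerEnv in error state 6"

# What grep | wc -l measures: the line either matches or it does not.
matches_line = "error" in sample

# What a per-occurrence count measures: how many times the substring appears.
occurrences = sample.count("error")

print(matches_line, occurrences)  # True 2
```

The line contributes 1 to the line count but 2 to the occurrence count, which is why the two totals for the whole file differ.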



In [14]:
# Equivalent to grep -o "error" Apache_2k.log | wc -l
count = 0
with open("data/Apache_2k.log", "r", encoding="utf-8") as file:
    for line in file:
        # str.count() returns the number of non-overlapping occurrences of the substring
        count += line.count("error")
count

1134
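
Both totals can also be collected in a single pass over the file. The sketch below assumes a hypothetical helper named `count_matches` (not part of the original notebook) and runs on a small in-memory sample; only the first sample line comes from the log, the other two are made up for illustration.

```python
from typing import Iterable, Tuple


def count_matches(lines: Iterable[str], needle: str) -> Tuple[int, int]:
    """Return (number of lines containing needle, total occurrences of needle)."""
    matching_lines = 0
    occurrences = 0
    for line in lines:
        n = line.count(needle)  # non-overlapping occurrences in this line
        if n:
            matching_lines += 1
        occurrences += n
    return matching_lines, occurrences


# First line is taken from the log; the other two are invented examples.
sample_lines = [
    "[Mon Dec 05 18:56:04 2005] [error] mod_jk child workerEnv in error state 6\n",
    "[Mon Dec 05 18:56:05 2005] [notice] example line without the keyword\n",
    "[Mon Dec 05 18:56:06 2005] [error] example line with one hit\n",
]

print(count_matches(sample_lines, "error"))  # (2, 3)
```

Because an open file object is itself an iterable of lines, the same helper could be called as `count_matches(file, "error")` inside a `with open(...)` block.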