# Metadata

**L1 Taxonomy** - Software Architecture & Design

**L2 Taxonomy** - Event-Driven Architecture

**Subtopic** - Implementing a Simple Event-Driven System

**Use Case** - Develop a Python-based event-driven system that responds to changes in a local file. The system watches a directory for changes and, on detection, parses the modified contents of the text file. The parser sorts and categorizes inputs based on predefined patterns.

**Programming Language** - Python

**Target Model** - GPT-4o

# Setup

```requirements.txt
```


# Prompt

## Problem Overview

You need to implement a **log classification system** that processes new lines appended to a text file incrementally.

Your system must:

* Track how far it has read in the file (using a byte offset).
* Append new lines to the file.
* Read only the new lines added since the last read position.
* Categorize these new lines according to predefined rules.
* Return the categorized output.
* Update the last read position for the file so subsequent calls continue reading only new content.

This simulates a log ingestion process that incrementally consumes appended log data.
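The incremental read described above can be sketched as follows. This is a minimal illustration, not the required implementation: the helper name `read_new_lines` and the file name `demo.log` are hypothetical, and the file is opened in binary mode on the assumption that the tracked position should be a plain byte count.

```python
import os
import tempfile


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return the lines found after `offset` and the new byte offset."""
    # Binary mode makes seek()/tell() operate on plain byte counts.
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
        new_offset = handle.tell()
    return data.decode("utf-8").splitlines(), new_offset


# Usage: append twice and read only what is new each time.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "demo.log")
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("INFO first\n")
    first_batch, position = read_new_lines(log_path, 0)

    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("ERROR second\n")
    second_batch, position = read_new_lines(log_path, position)

print(first_batch)   # ['INFO first']
print(second_batch)  # ['ERROR second']
```

The second call starts from the offset returned by the first, so earlier lines are never re-read.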


## Input Format

You must implement a class with the following interface:

```python
class LogProcessor:
    def __init__(self, file_path: str):
        """Initialize the processor for the given file."""
        pass

    def append_and_categorize(self, new_lines: list[str]) -> dict:
        """
        Append the provided lines to the file, then read only the
        newly appended content, categorize each line, and return
        the categorized results.

        Returns a dictionary with keys: "INFO", "ERROR", "WARNING", "DATE", "OTHER",
        each mapping to a list of categorized lines.

        Also updates internal read position to support incremental reads.
        """
        pass
```


## Output Format

The `append_and_categorize` method returns a dictionary:

```python
{
    "INFO": [...],
    "ERROR": [...],
    "WARNING": [...],
    "DATE": [...],
    "OTHER": [...]
}
```

Each value is a list of lines (strings) classified according to the rules below.


## Classification Rules

Classify each line into one or more of the following categories:

* `"INFO"` if the line starts with `"INFO"`
* `"ERROR"` if the line starts with `"ERROR"`
* `"WARNING"` if the line starts with `"WARNING"`
* `"DATE"` if the line contains a valid ISO 8601 date in the format `"YYYY-MM-DD"`
* `"OTHER"` if the line does not match any of the above categories

Lines should be stripped of leading and trailing whitespace before classification. Completely blank lines should be ignored and not included in the output.
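One way to express these rules is a small helper that returns every category a line belongs to. The names `classify_line`, `DATE_SHAPE`, and `PREFIXES` are illustrative assumptions, and the regular expression checks only the `YYYY-MM-DD` shape, not whether the date exists on the calendar.

```python
import re

# Shape check only: four digits, two digits, two digits.
DATE_SHAPE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
PREFIXES = ("INFO", "ERROR", "WARNING")


def classify_line(raw_line: str) -> list[str]:
    """Return every category a line belongs to, or [] for blank lines."""
    line = raw_line.strip()
    if not line:
        return []
    # Prefix matching is case-sensitive and anchored at the start of the line.
    categories = [prefix for prefix in PREFIXES if line.startswith(prefix)]
    if DATE_SHAPE.search(line):
        categories.append("DATE")
    # OTHER applies only when no other category matched.
    return categories or ["OTHER"]


print(classify_line("ERROR 2025-07-10 Disk failure"))  # ['ERROR', 'DATE']
print(classify_line("  Random debug message  "))       # ['OTHER']
print(classify_line("   "))                            # []
```

If stricter validation is wanted, the matched text could additionally be passed to `datetime.date.fromisoformat`, which raises `ValueError` for impossible dates such as `2025-13-45`. Whether the task requires that level of checking is not stated here.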


## Example

Suppose the file initially is empty and you call:

```python
processor = LogProcessor("logs.txt")
result = processor.append_and_categorize([
    "INFO Service started",
    "ERROR 2025-07-10 Disk failure",
    "Random debug message"
])
```

The returned dictionary should be:

```python
{
    "INFO": ["INFO Service started"],
    "ERROR": ["ERROR 2025-07-10 Disk failure"],
    "WARNING": [],
    "DATE": ["ERROR 2025-07-10 Disk failure"],
    "OTHER": ["Random debug message"]
}
```

If you then call:

```python
result2 = processor.append_and_categorize([
    "WARNING High memory usage",
    "INFO Service restarted"
])
```

The result should be:

```python
{
    "INFO": ["INFO Service restarted"],
    "ERROR": [],
    "WARNING": ["WARNING High memory usage"],
    "DATE": [],
    "OTHER": []
}
```

# Requirements

## Explicit Requirements

* The class tracks the last read position internally per file.
* The `append_and_categorize` method reads only **newly appended** lines.
* Blank or whitespace-only lines are ignored.
* Categorization supports multiple categories per line.
* Output dictionary always contains all five categories.
* The method appends new lines to the file with newline separators.


## Function Signatures

```python
class LogProcessor:
    def __init__(self, file_path: str):
        pass

    def append_and_categorize(self, new_lines: list[str]) -> dict:
        pass
```


## Constraints

* File encoding is UTF-8.
* The number of new lines per call will not exceed 1000.
* The file size is limited to 10 MB.
* Only standard Python libraries may be used (`os`, `re`, etc.).
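Because the file is UTF-8, a byte offset and a character count can differ whenever a line contains non-ASCII text. The short sketch below, using a made-up sample line, shows the gap. It is meant only to illustrate why the read position should come from the file object itself rather than from `len()` of the decoded string.

```python
line = "WARNING café offline\n"

# Number of characters in the decoded string.
char_count = len(line)
# Number of bytes the same text occupies in a UTF-8 file.
byte_count = len(line.encode("utf-8"))

print(char_count)  # 21
print(byte_count)  # 22
```

In text mode, `tell()` returns an opaque position that is only meaningful when passed back to `seek()` on the same file, so it should not be mixed with hand-computed character counts.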

In [None]:
# code

"""
LogProcessor Module.

This module defines a LogProcessor class that allows incremental reading
and classification of lines appended to a log file. It supports appending
new lines to a text file, reading only newly appended content, and
classifying lines based on predefined patterns.

Each line may belong to multiple categories: INFO, ERROR, WARNING, DATE.
Lines that do not match any of these are classified as OTHER.

Lines are stripped of whitespace before processing. Completely blank
lines are ignored. The last read position is tracked to ensure only
new lines are processed on each call.
"""

import re


class LogProcessor:
    """
    A processor that appends new lines to a log file and categorizes them.

    Tracks the last read position and categorizes only newly added
    content on each operation.
    """

    def __init__(self, file_path: str):
        """
        Initialize the processor for the given file path.

        Creates the file if it does not already exist.

        Args:
            file_path (str): Path to the log file.
        """
        self.file_path = file_path
        self.last_offset = 0

        # Ensure the file exists
        with open(self.file_path, 'a', encoding='utf-8'):
            pass

    def append_and_categorize(self, new_lines: list[str]) -> dict:
        """
        Append new lines to the log file and categorize them.

        Writes each entry of `new_lines` to the file followed by a
        newline, reads everything after the last read position, and
        classifies each non-blank line as INFO, ERROR, WARNING, and/or
        DATE, falling back to OTHER when nothing matches.

        Args:
            new_lines (list[str]): Lines to append to the log file.

        Returns:
            dict: A dictionary with the keys "INFO", "ERROR", "WARNING",
                  "DATE", and "OTHER", each mapping to a list of
                  matching lines.
        """
        # Append new lines to the file
        with open(self.file_path, 'a', encoding='utf-8') as file:
            for line in new_lines:
                file.write(line + '\n')

        # Prepare the result dictionary
        result = {
            "INFO": [],
            "ERROR": [],
            "WARNING": [],
            "DATE": [],
            "OTHER": []
        }

        # Read newly appended lines
        with open(self.file_path, 'r', encoding='utf-8') as file:
            file.seek(self.last_offset)
            new_content = file.readlines()
            self.last_offset = file.tell()

        # Regex for the YYYY-MM-DD shape (does not validate month or day ranges)
        date_pattern = re.compile(r'\b\d{4}-\d{2}-\d{2}\b')

        for raw_line in new_content:
            line = raw_line.strip()
            if not line:
                continue

            categorized = False

            if line.startswith("INFO"):
                result["INFO"].append(line)
                categorized = True

            if line.startswith("ERROR"):
                result["ERROR"].append(line)
                categorized = True

            if line.startswith("WARNING"):
                result["WARNING"].append(line)
                categorized = True

            if date_pattern.search(line):
                result["DATE"].append(line)
                categorized = True

            if not categorized:
                result["OTHER"].append(line)

        return result


In [None]:
# tests
"""Unit tests for the LogProcessor class."""

import unittest
import os
import tempfile
from main import LogProcessor


class TestLogProcessor(unittest.TestCase):
    """Test cases for the LogProcessor class."""

    def setUp(self):
        """Create a temporary file and LogProcessor before each test."""
        self.temp_file = tempfile.NamedTemporaryFile(
            delete=False, mode='w+', encoding='utf-8'
        )
        self.processor = LogProcessor(self.temp_file.name)

    def tearDown(self):
        """Remove the temporary file after each test."""
        self.temp_file.close()
        os.remove(self.temp_file.name)

    def test_single_info_line(self):
        """Test a single INFO line is categorized correctly."""
        result = self.processor.append_and_categorize(
            ["INFO This is an info message"]
        )
        self.assertEqual(result["INFO"], ["INFO This is an info message"])

    def test_single_error_line(self):
        """Test a single ERROR line is categorized correctly."""
        result = self.processor.append_and_categorize(
            ["ERROR Something went wrong"]
        )
        self.assertEqual(result["ERROR"], ["ERROR Something went wrong"])

    def test_single_warning_line(self):
        """Test a single WARNING line is categorized correctly."""
        result = self.processor.append_and_categorize(["WARNING Check this"])
        self.assertEqual(result["WARNING"], ["WARNING Check this"])

    def test_single_date_line(self):
        """Test a line with only a date gets categorized as DATE."""
        result = self.processor.append_and_categorize(
            ["Event on 2025-07-14"]
        )
        self.assertEqual(result["DATE"], ["Event on 2025-07-14"])

    def test_line_with_multiple_categories(self):
        """Test line matching WARNING and DATE categories."""
        line = "WARNING System failure at 2025-07-14"
        result = self.processor.append_and_categorize([line])
        self.assertIn(line, result["WARNING"])
        self.assertIn(line, result["DATE"])

    def test_line_with_no_category(self):
        """Test unclassified lines go to OTHER."""
        result = self.processor.append_and_categorize(["Random text here"])
        self.assertEqual(result["OTHER"], ["Random text here"])

    def test_blank_line_ignored(self):
        """Test completely blank lines are ignored."""
        result = self.processor.append_and_categorize(["   "])
        for category in result:
            self.assertEqual(result[category], [])

    def test_multiple_lines_various_categories(self):
        """Test multiple lines with different categories."""
        lines = [
            "INFO Starting service",
            "ERROR Failed to bind port",
            "WARNING Low memory",
            "Logged on 2025-01-01",
            "Unrecognized format"
        ]
        result = self.processor.append_and_categorize(lines)
        self.assertIn(lines[0], result["INFO"])
        self.assertIn(lines[1], result["ERROR"])
        self.assertIn(lines[2], result["WARNING"])
        self.assertIn(lines[3], result["DATE"])
        self.assertIn(lines[4], result["OTHER"])

    def test_multiple_date_matches(self):
        """Test multiple lines with date patterns."""
        lines = ["Date1: 2022-01-01", "Date2: 1999-12-31"]
        result = self.processor.append_and_categorize(lines)
        self.assertEqual(result["DATE"], lines)

    def test_line_matches_info_and_date(self):
        """Test line matching both INFO and DATE categories."""
        line = "INFO System started on 2023-03-15"
        result = self.processor.append_and_categorize([line])
        self.assertIn(line, result["INFO"])
        self.assertIn(line, result["DATE"])

    def test_multiple_calls_accumulate_correctly(self):
        """Test correct categorization after multiple calls."""
        self.processor.append_and_categorize(["INFO First"])
        result = self.processor.append_and_categorize(["ERROR Second"])
        self.assertEqual(result["ERROR"], ["ERROR Second"])
        self.assertEqual(result["INFO"], [])

    def test_only_new_lines_are_processed(self):
        """Test that only newly added lines are read and categorized."""
        self.processor.append_and_categorize(["INFO First"])
        self.processor.append_and_categorize(["WARNING Second"])
        result = self.processor.append_and_categorize(["2025-07-14"])
        self.assertEqual(result["DATE"], ["2025-07-14"])
        self.assertEqual(result["INFO"], [])
        self.assertEqual(result["WARNING"], [])

    def test_line_with_trailing_newline(self):
        """Test lines with newline characters are stripped."""
        result = self.processor.append_and_categorize(["INFO Hello\n"])
        self.assertIn("INFO Hello", result["INFO"])

    def test_line_with_mixed_case_prefix(self):
        """Test lowercase prefixes are not matched as categories."""
        result = self.processor.append_and_categorize(
            ["info lowercase should be OTHER"]
        )
        self.assertIn("info lowercase should be OTHER", result["OTHER"])
        self.assertEqual(result["INFO"], [])

    def test_line_with_embedded_info(self):
        """Test embedded keywords do not trigger categorization."""
        result = self.processor.append_and_categorize(["This is not INFO"])
        self.assertIn("This is not INFO", result["OTHER"])

    def test_no_lines_passed(self):
        """Test behavior when no lines are passed."""
        result = self.processor.append_and_categorize([])
        for category in result:
            self.assertEqual(result[category], [])

    def test_whitespace_line(self):
        """Test line with only whitespace is ignored."""
        result = self.processor.append_and_categorize(["   \t  \n"])
        for category in result:
            self.assertEqual(result[category], [])

    def test_multiple_categories_accumulate_properly(self):
        """Test multiple categories accumulate correctly."""
        self.processor.append_and_categorize(["INFO A", "INFO B"])
        result = self.processor.append_and_categorize(
            ["ERROR C", "WARNING D", "ERROR E"]
        )
        self.assertEqual(result["INFO"], [])
        self.assertEqual(result["ERROR"], ["ERROR C", "ERROR E"])
        self.assertEqual(result["WARNING"], ["WARNING D"])

    def test_reading_only_appended_content(self):
        """Test processor-appended lines are categorized after an external write."""
        self.processor.append_and_categorize(["INFO Start"])
        with open(self.temp_file.name, 'a', encoding='utf-8') as f:
            f.write("Manually added line\n")
        result = self.processor.append_and_categorize(["ERROR Forced"])
        self.assertEqual(result["ERROR"], ["ERROR Forced"])


# Model Breaking Proof

#### Model Breaking Task URL: https://rlhf-v3.turing.com/prompt/e2f6f980-914c-4a35-892d-0a7d70afc39f?origin=https%3A%2F%2Flabeling-ms.turing.com&redirect_url=https%3A%2F%2Flabeling-ms.turing.com%2Fconversations%2F74261%2Fview

#### Model code:

```python
import os
import re

class LogProcessor:
    def __init__(self, file_path: str):
        """Initialize the processor for the given file."""
        self.file_path = file_path
        self.last_offset = 0
        
        # Ensure the file is created
        with open(self.file_path, 'a', encoding='utf-8') as f:
            pass
    
    def append_and_categorize(self, new_lines: list[str]) -> dict:
        """Append new lines, read new content, categorize, and return results."""
        # Append new lines to the file
        with open(self.file_path, 'a', encoding='utf-8') as f:
            for line in new_lines:
                f.write(line + '\n')
        
        # Prepare to read only the newly appended content
        categorized_logs = {
            "INFO": [],
            "ERROR": [],
            "WARNING": [],
            "DATE": [],
            "OTHER": []
        }
        
        # Read from the file starting at the last read position
        with open(self.file_path, 'r', encoding='utf-8') as f:
            f.seek(self.last_offset)
            new_content = f.readlines() # Read newly appended lines
            self.last_offset = f.tell() # Update offset for next time
        
        # Categorize the new content
        date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
        
        for line in map(str.strip, new_content):
            if not line:
                continue
            
            categorized = False
            
            if line.startswith("INFO"):
                categorized_logs["INFO"].append(line)
                categorized = True
            
            if line.startswith("ERROR"):
                categorized_logs["ERROR"].append(line)
                categorized = True
            
            if line.startswith("WARNING"):
                categorized_logs["WARNING"].append(line)
                categorized = True
            
            if date_pattern.search(line):
                categorized_logs["DATE"].append(line)
                # <Issue>: The `categorized` flag is not set to True when a line matches DATE,
                # so date-only lines are added to both DATE and OTHER.

            if not categorized:
                # This means it wasn't categorized in INFO, ERROR, or WARNING
                categorized_logs["OTHER"].append(line)
        
        return categorized_logs
```