# Metadata

**L1 Taxonomy** - Backend Integration

**L2 Taxonomy** - Webhooks

**Subtopic** - Logging and auditing all received webhook events for debugging

**Use Case** - Develop a Python script that receives webhook events, logs all incoming requests and saves them to a local JSON file for future auditing and debugging.

**Programming Language** - Python

**Target Model** - GPT-4o

# Setup

```requirements.txt
```



# Prompt

I want to build a Python system that receives webhook POST events and logs all incoming requests. Each request is in JSON format and will be simulated as raw string input.

My system should validate every incoming JSON webhook payload for structural correctness, ensure that all required fields are present and correctly formatted, and then persist the entire payload to a local file named webhook_audit_log.json. Each payload should be appended as a new object in a JSON array.

The structure of a valid webhook payload is:

```
{
  "event_id": "string (uuid format)",
  "event_type": "string (user.created, payment.failed)",
  "timestamp": "ISO 8601 UTC string",
  "source": {
    "ip": "IPv4 format",
    "user_agent": "string"
  },
  "payload": {
    "user": {
      "id": "integer",
      "email": "valid email string"
    },
    "metadata": {
      "tags": ["list of strings, optional"],
      "priority": "low | medium | high"
    }
  }
}
```

**Input Format**

- A JSON string

**Output Format**

- String showing the status of the request.
- Output appended to file.

**Examples**

```
Input:
'{"event_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "event_type": "user.created", "timestamp": "2025-07-10T12:00:00Z", "source": {"ip": "192.168.1.1", "user_agent": "Mozilla"}, "payload": {"user": {"id": 101, "email": "user@example.com"}, "metadata": {"priority": "high"}}}'

Output:
"Logged"

webhook_audit_log.json:
[
  {
    "event_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
    "event_type": "user.created",
    "timestamp": "2025-07-10T12:00:00Z",
    "source": {
      "ip": "192.168.1.1",
      "user_agent": "Mozilla"
    },
    "payload": {
      "user": {
        "id": 101,
        "email": "user@example.com"
      },
      "metadata": {
        "priority": "high"
      }
    }
  }
]
```
```
Input:
'{"event_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "event_type": "user.created", "timestamp": "2030-01-01T00:00:00Z", "source": {"ip": "192.168.1.1", "user_agent": "Mozilla"}, "payload": {"user": {"id": 101, "email": "user@example.com"}, "metadata": {"priority": "high"}}}'

Output:
"Invalid timestamp"

```
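The second example is rejected only because its timestamp lies in the future. A minimal sketch of that check follows; the helper name `is_future_timestamp` is hypothetical, and treating a timestamp without an offset as UTC is an assumption, not something the prompt specifies.

```python
import datetime


def is_future_timestamp(ts: str) -> bool:
    """Return True if the ISO 8601 timestamp lies after the current UTC time."""
    # fromisoformat() on older Python versions rejects a trailing "Z",
    # so normalise it to an explicit UTC offset first.
    parsed = datetime.datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is treated as UTC.
        parsed = parsed.replace(tzinfo=datetime.timezone.utc)
    return parsed > datetime.datetime.now(datetime.timezone.utc)


# The result depends on the current date, so no fixed output is shown.
print(is_future_timestamp("2025-07-10T12:00:00Z"))
print(is_future_timestamp("2030-01-01T00:00:00Z"))
```

Comparing timezone-aware values on both sides avoids the `TypeError` that Python raises when a naive and an aware datetime are compared.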


# Requirements

**Explicit and Implicit Points**

- Validate JSON structure before anything else
- Do not allow event timestamps in the future (relative to UTC)
- Validate that all required fields are present
- Prevent duplicate event_ids from being logged
- Use uuid.UUID, datetime.datetime, ipaddress and re to validate formats
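The format checks listed above could be sketched as small helpers like the following. The names `is_uuid`, `is_ipv4`, and `is_email` are hypothetical, and the email pattern is a deliberately simple assumption rather than a complete address grammar.

```python
import ipaddress
import re
import uuid

# Assumption: a simple pattern of the form name@domain.tld; real-world
# address validation is considerably more permissive than this.
EMAIL_PATTERN = re.compile(r"[\w.-]+@[\w.-]+\.\w+")


def is_uuid(value) -> bool:
    """True if value is a string that uuid.UUID can parse."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def is_ipv4(value) -> bool:
    """True if value is a string holding a dotted-quad IPv4 address."""
    if not isinstance(value, str):
        return False  # ipaddress also accepts ints, which is not wanted here
    try:
        ipaddress.IPv4Address(value)
    except ValueError:  # AddressValueError is a subclass of ValueError
        return False
    return True


def is_email(value) -> bool:
    """True if value is a string matching the simple email pattern."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


print(is_uuid("3fa85f64-5717-4562-b3fc-2c963f66afa6"))  # True
print(is_ipv4("999.999.999.999"))  # False
print(is_email("user@example.com"))  # True
```

Checking the type before parsing keeps each helper from raising on non-string JSON values such as numbers or lists.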


**Solution Expectations**

- Open and append to a local JSON array in webhook_audit_log.json
- Perform all validation steps before writing
- Raise no exceptions; all errors must be caught and returned as strings


**Function Signature**

```python
def log_webhook_event(payload: str) -> str:
    pass
```

**Edge Case Behavior**

- Return the following status strings:
  - Same event_id logged twice: "Duplicate event"
  - Missing priority: "Bad payload"
  - Timestamp in the future: "Invalid timestamp"
  - tags list missing: valid, "Logged"
  - priority not in ["low", "medium", "high"]: "Bad payload"
  - IP not a valid IPv4 address: "Bad payload"
- If the output string is an error, nothing is written to the file
- If the payload string length is out of bounds, return -1 and do nothing else; do the same if webhook_audit_log.json does not exist

**Constraints**

- If the existing file is malformed JSON, reset it to an empty array
- Use only built in Python libraries
- Payload string length must satisfy 10 <= len(payload) <= 500
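The file-handling constraints can be illustrated in isolation. This is a minimal sketch under stated assumptions: `load_audit_log` and `append_event` are hypothetical helper names, and the event passed in is assumed to have been validated already.

```python
import json
import os
import tempfile


def load_audit_log(path: str) -> list:
    """Return the logged events, or [] if the file is not a valid JSON array."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            logs = json.load(f)
    except (OSError, ValueError):  # JSONDecodeError is a subclass of ValueError
        return []
    return logs if isinstance(logs, list) else []


def append_event(path: str, event: dict) -> bool:
    """Append event unless its event_id is already logged; True if written."""
    logs = load_audit_log(path)
    if any(isinstance(e, dict) and e.get("event_id") == event["event_id"]
           for e in logs):
        return False
    logs.append(event)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(logs, f, indent=2)
    return True


# Usage against a throwaway file that starts out malformed.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "webhook_audit_log.json")
    with open(log_path, "w", encoding="utf-8") as f:
        f.write("not a json")
    print(append_event(log_path, {"event_id": "a"}))  # True
    print(append_event(log_path, {"event_id": "a"}))  # False
    print(load_audit_log(log_path))
```

Rewriting the whole array on each append keeps the file valid JSON at the cost of re-reading it every time, which is acceptable for a small local audit log.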

In [1]:
# code

"""Webhook audit logger module.

Validates and logs webhook events into a JSON file,
ensuring data integrity and preventing duplicates.
"""

import json
import uuid
import datetime
import ipaddress
import re
import os


def log_webhook_event(payload: str) -> str:
    """Validate and log a webhook event into the audit log file.

    Args:
        payload (str): A JSON string containing the webhook event.

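    Returns:
        str: Status message indicating result of the operation, or the
            integer -1 when the payload length is out of bounds or the
            log file does not exist.
    """
    # The requirements and the tests expect the integer -1 here,
    # even though the signature is annotated as returning str.
    if not (10 <= len(payload) <= 500):
        return -1

    if not os.path.exists("webhook_audit_log.json"):
        return -1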

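    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return "Bad payload"

    # Valid JSON that is not an object (a list, number, or string)
    # cannot be a webhook event.
    if not isinstance(data, dict):
        return "Bad payload"

    required_fields = [
        "event_id", "event_type", "timestamp", "source", "payload"
    ]
    for field in required_fields:
        if field not in data:
            return "Bad payload"

    # uuid.UUID expects a string; reject other types before parsing.
    if not isinstance(data["event_id"], str):
        return "Bad payload"
    try:
        uuid.UUID(data["event_id"])
    except (ValueError, TypeError):
        return "Bad payload"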

    try:
        timestamp = datetime.datetime.fromisoformat(
            data["timestamp"].replace("Z", "+00:00")
        )
        if timestamp > datetime.datetime.now(datetime.timezone.utc):
            return "Invalid timestamp"
    except Exception:
        return "Invalid timestamp"

    source = data.get("source")
    if not isinstance(source, dict):
        return "Bad payload"
    if "ip" not in source or "user_agent" not in source:
        return "Bad payload"

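    # ipaddress.ip_address also accepts integers, so require a string.
    if not isinstance(source["ip"], str):
        return "Bad payload"
    try:
        ip = ipaddress.ip_address(source["ip"])
        if not isinstance(ip, ipaddress.IPv4Address):
            return "Bad payload"
    except ValueError:
        return "Bad payload"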

    if not isinstance(source["user_agent"], str):
        return "Bad payload"

    payload_data = data.get("payload")
    if not isinstance(payload_data, dict):
        return "Bad payload"

    user = payload_data.get("user")
    if not isinstance(user, dict):
        return "Bad payload"

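    # bool is a subclass of int, so exclude JSON true/false explicitly.
    if (
        "id" not in user
        or isinstance(user["id"], bool)
        or not isinstance(user["id"], int)
    ):
        return "Bad payload"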

    if "email" not in user or not isinstance(user["email"], str):
        return "Bad payload"

    email_regex = r"^[\w\.-]+@[\w\.-]+\.\w+$"
    # fullmatch rejects a trailing newline, which "$" with re.match allows.
    if not re.fullmatch(email_regex, user["email"]):
        return "Bad payload"

    metadata = payload_data.get("metadata")
    if not isinstance(metadata, dict):
        return "Bad payload"

    if "priority" not in metadata:
        return "Bad payload"

    if metadata["priority"] not in ["low", "medium", "high"]:
        return "Bad payload"

    try:
        with open("webhook_audit_log.json", "r") as f:
            try:
                logs = json.load(f)
                if not isinstance(logs, list):
                    logs = []
            except json.JSONDecodeError:
                logs = []
    except Exception:
        logs = []

    is_duplicate = any(
        isinstance(entry, dict) and
        entry.get("event_id") == data["event_id"]
        for entry in logs
    )
    if is_duplicate:
        return "Duplicate event"

    logs.append(data)

    try:
        with open("webhook_audit_log.json", "w") as f:
            json.dump(logs, f, indent=2)
    except Exception:
        return "File write error"

    return "Logged"


In [3]:
# tests

import unittest
import tempfile
import os
import json
# Assumes the solution above has been saved as main.py next to this test file.
from main import log_webhook_event

class TestLogWebhookEvent(unittest.TestCase):
    def setUp(self):
        self.tempdir = tempfile.TemporaryDirectory()
        self.original_cwd = os.getcwd()
        os.chdir(self.tempdir.name)

    def tearDown(self):
        os.chdir(self.original_cwd)
        self.tempdir.cleanup()

    def test_payload_length_too_short(self):
        short_payload = '{}'  # length < 10
        result = log_webhook_event(short_payload)
        self.assertEqual(result, -1)
        self.assertFalse(os.path.exists('webhook_audit_log.json'))

    def test_payload_length_too_long(self):
        long_payload = '"' + 'a' * 499 + '"'  # total length 501
        result = log_webhook_event(long_payload)
        self.assertEqual(result, -1)
        self.assertFalse(os.path.exists('webhook_audit_log.json'))

    def test_file_missing_returns_error(self):
        if os.path.exists('webhook_audit_log.json'):
            os.remove('webhook_audit_log.json')
        valid_payload = json.dumps({
            "event_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
            "event_type": "user.created",
            "timestamp": "2020-01-01T00:00:00Z",
            "source": {"ip": "192.168.1.1", "user_agent": "Mozilla"},
            "payload": {
                "user": {"id": 101, "email": "user@example.com"},
                "metadata": {"priority": "low"}
            }
        })
        result = log_webhook_event(valid_payload)
        self.assertEqual(result, -1)
        self.assertFalse(os.path.exists('webhook_audit_log.json'))

    def test_valid_event_logged(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        payload = json.dumps({
            "event_id": "11111111-1111-1111-1111-111111111111",
            "event_type": "payment.failed",
            "timestamp": "2025-01-01T00:00:00Z",
            "source": {"ip": "10.0.0.1", "user_agent": "TestAgent"},
            "payload": {
                "user": {"id": 42, "email": "foo@bar.com"},
                "metadata": {"priority": "medium"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Logged")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 1)
        self.assertEqual(data[0]["event_id"], "11111111-1111-1111-1111-111111111111")
        self.assertEqual(data[0]["payload"]["metadata"]["priority"], "medium")

    def test_duplicate_event_id(self):
        with open('webhook_audit_log.json', 'w') as f:
            json.dump([{"event_id": "22222222-2222-2222-2222-222222222222"}], f)
        payload = json.dumps({
            "event_id": "22222222-2222-2222-2222-222222222222",
            "event_type": "user.created",
            "timestamp": "2024-01-01T00:00:00Z",
            "source": {"ip": "127.0.0.1", "user_agent": "Agent"},
            "payload": {
                "user": {"id": 1, "email": "x@y.com"},
                "metadata": {"priority": "high"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Duplicate event")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 1)

    def test_missing_priority(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        payload = json.dumps({
            "event_id": "33333333-3333-3333-3333-333333333333",
            "event_type": "user.created",
            "timestamp": "2023-01-01T00:00:00Z",
            "source": {"ip": "1.2.3.4", "user_agent": "UA"},
            "payload": {
                "user": {"id": 2, "email": "test@test.com"},
                "metadata": {}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Bad payload")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 0)

    def test_invalid_timestamp_future(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        future_time = "2100-01-01T00:00:00Z"
        payload = json.dumps({
            "event_id": "44444444-4444-4444-4444-444444444444",
            "event_type": "payment.failed",
            "timestamp": future_time,
            "source": {"ip": "8.8.8.8", "user_agent": "UA"},
            "payload": {
                "user": {"id": 3, "email": "a@b.com"},
                "metadata": {"priority": "low"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Invalid timestamp")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 0)

    def test_invalid_ip_format(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        payload = json.dumps({
            "event_id": "55555555-5555-5555-5555-555555555555",
            "event_type": "user.created",
            "timestamp": "2020-01-01T00:00:00Z",
            "source": {"ip": "999.999.999.999", "user_agent": "UA"},
            "payload": {
                "user": {"id": 4, "email": "test@e.com"},
                "metadata": {"priority": "medium"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Bad payload")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 0)

    def test_invalid_priority_value(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        payload = json.dumps({
            "event_id": "66666666-6666-6666-6666-666666666666",
            "event_type": "payment.failed",
            "timestamp": "2020-01-01T00:00:00Z",
            "source": {"ip": "127.0.0.1", "user_agent": "Agent"},
            "payload": {
                "user": {"id": 5, "email": "ok@ok.com"},
                "metadata": {"priority": "urgent"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Bad payload")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 0)

    def test_optional_tags_missing(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        payload = json.dumps({
            "event_id": "77777777-7777-7777-7777-777777777777",
            "event_type": "user.created",
            "timestamp": "2020-01-01T00:00:00Z",
            "source": {"ip": "123.123.123.123", "user_agent": "UA"},
            "payload": {
                "user": {"id": 6, "email": "tt@tt.com"},
                "metadata": {"priority": "high"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Logged")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertNotIn("tags", data[0]["payload"]["metadata"])

    def test_tags_present(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('[]')
        payload = json.dumps({
            "event_id": "99999999-9999-9999-9999-999999999999",
            "event_type": "payment.failed",
            "timestamp": "2025-01-01T00:00:00Z",
            "source": {"ip": "2.2.2.2", "user_agent": "UA"},
            "payload": {
                "user": {"id": 7, "email": "yes@yes.com"},
                "metadata": {"priority": "low", "tags": ["test", "sample"]}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Logged")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(data[0]["payload"]["metadata"]["tags"], ["test", "sample"])

    def test_malformed_json_file_resets(self):
        with open('webhook_audit_log.json', 'w') as f:
            f.write('not a json')
        payload = json.dumps({
            "event_id": "88888888-8888-8888-8888-888888888888",
            "event_type": "payment.failed",
            "timestamp": "2025-01-01T00:00:00Z",
            "source": {"ip": "1.1.1.1", "user_agent": "UA"},
            "payload": {
                "user": {"id": 7, "email": "mm@mm.com"},
                "metadata": {"priority": "low"}
            }
        })
        result = log_webhook_event(payload)
        self.assertEqual(result, "Logged")
        with open('webhook_audit_log.json') as f:
            data = json.load(f)
        self.assertEqual(len(data), 1)

if __name__ == '__main__':
    unittest.main(argv=[''], exit=False)


# Model Breaking Proof

#### Model Breaking Task URL: <https://rlhf-v3.turing.com/prompt/26abccdd-da36-47a6-98be-a914e1c211e2?origin=https%3A%2F%2Flabeling-ms.turing.com&redirect_url=https%3A%2F%2Flabeling-ms.turing.com%2Fconversations%2F74348%2Fview>

#### Model code:

```python
import json
import uuid
import datetime
import ipaddress
import re
import os

def log_webhook_event(payload: str) -> str:
    # Constants
    FILENAME = 'webhook_audit_log.json'
    
    # Helper functions
    def is_valid_email(email):
        return re.match(r"[^@]+@[^@]+\.[^@]+", email)
    
    # <Issue>: <The function never checks whether webhook_audit_log.json exists. It should check for the file first and return -1 if it is missing.>
    
    # Validate payload length
    if not (10 <= len(payload) <= 500):
        return -1
    # <Issue>: <On a JSON parse failure the function should return "Bad payload"; instead it returns -1, which is invalid.>
    # Parse payload
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return -1  # If parsing fails, structure is invalid.
    
    # Validate event_id
    try:
        uuid.UUID(data.get('event_id'))
    except (ValueError, TypeError):
        return "Bad payload"
    
    # Validate event_type
    if data.get('event_type') not in ["user.created", "payment.failed"]:
        return "Bad payload"

    # Validate timestamp
    try:
        event_time = datetime.datetime.fromisoformat(data['timestamp'].replace('Z', '+00:00'))
        if event_time > datetime.datetime.now(datetime.timezone.utc):
            return "Invalid timestamp"
    except (ValueError, KeyError):
        return "Bad payload"
    
    # Validate source
    source = data.get('source', {})
    try:
        ipaddress.IPv4Address(source.get('ip', ''))
    except ipaddress.AddressValueError:
        return "Bad payload"
    
    if not source.get('user_agent'):
        return "Bad payload"
    
    # Validate payload
    payload = data.get('payload', {})
    user = payload.get('user', {})
    if not isinstance(user.get('id'), int) or not is_valid_email(user.get('email', '')):
        return "Bad payload"
    
    metadata = payload.get('metadata', {})
    if metadata.get('priority') not in ["low", "medium", "high"]:
        return "Bad payload"

    # Read existing log file
    if not os.path.exists(FILENAME):
        log = []
    else:
        try:
            with open(FILENAME, 'r') as f:
                log = json.load(f)
        except json.JSONDecodeError:
            log = []  # Reset to empty if malformed
    
    # Check for duplicate event_id
    if any(entry['event_id'] == data['event_id'] for entry in log):
        return "Duplicate event"
    
    # Append new event and write back to log file
    log.append(data)
    with open(FILENAME, 'w') as f:
        json.dump(log, f, indent=2)
    
    return "Logged"
```
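The first flagged issue can be reproduced with a small harness. This is a sketch under stated assumptions: `creates_file_when_missing` and `stub_logger` are hypothetical names, and the stub only mimics the flagged behaviour; in practice the model's `log_webhook_event` would be passed in instead.

```python
import os
import tempfile


def creates_file_when_missing(fn, payload: str) -> bool:
    """Run fn(payload) in an empty directory; report whether a log file appeared."""
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            fn(payload)
            return os.path.exists("webhook_audit_log.json")
        finally:
            # Leave the directory before it is cleaned up.
            os.chdir(original_cwd)


def stub_logger(payload: str) -> str:
    # Hypothetical stand-in that writes a log even though none existed.
    with open("webhook_audit_log.json", "w") as f:
        f.write("[]")
    return "Logged"


print(creates_file_when_missing(stub_logger, "{}"))  # True
```

A function that follows the requirements should make this helper return `False`, since a missing log file must lead to an early return with nothing written.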