feat: add synthetic surveillance event generator utility #139
📝 Walkthrough
This PR adds a synthetic event generator utility module (
Changes
Synthetic Event Generation Utility
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_synthetic_event_generator.py (1)
29-53: ⚡ Quick win
Add tests for deterministic output and interval validation.
Given the feature goal, include a deterministic-generation test (same seed ⇒ same payload sequence) and a negative/zero interval_seconds validation test to prevent regressions.
Suggested test additions
+def test_generate_events_is_reproducible_with_seed():
+    events_a = generate_events(count=3, seed=123)
+    events_b = generate_events(count=3, seed=123)
+    assert events_a == events_b
+
+def test_generate_events_rejects_invalid_interval():
+    with pytest.raises(ValueError):
+        generate_events(count=3, interval_seconds=0)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_synthetic_event_generator.py` around lines 29 - 53, Add two tests to tests/test_synthetic_event_generator.py: one that verifies deterministic output by calling generate_events with a fixed seed (e.g., seed=123) twice and asserting the two returned event lists are identical (use generate_events function and EVENT_TYPES to validate payloads), and another that asserts generate_events raises ValueError when called with interval_seconds=0 or a negative value to enforce interval validation (test the function generate_events for invalid interval_seconds). Ensure the tests use the same helpers already present (generate_events, EVENT_TYPES) and follow the existing pytest patterns (with pytest.raises for the negative case).
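One detail worth checking when writing the reproducibility test: if the event payload includes the timestamp and start_time defaults to the current time, two seeded calls may still differ in their timestamps. The sketch below uses a hypothetical stand-in for generate_events (the EVENT_TYPES values and payload field names are assumptions, not the PR's actual code) to show that pinning start_time makes the full payloads comparable.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the PR's generator; values and field names are assumptions.
EVENT_TYPES = ["person_detected", "loitering", "zone_breach"]


def generate_events(
    count: int = 10,
    start_time: Optional[datetime] = None,
    interval_seconds: int = 30,
    seed: Optional[int] = None,
) -> List[Dict[str, Any]]:
    rng = random.Random(seed)  # seed=None falls back to OS entropy
    if start_time is None:
        start_time = datetime.now(timezone.utc)
    return [
        {
            "event_type": rng.choice(EVENT_TYPES),
            "person_id": rng.randint(1, 50),
            "timestamp": (start_time + timedelta(seconds=i * interval_seconds)).isoformat(),
        }
        for i in range(count)
    ]


def strip_timestamps(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Drop the clock-dependent field so only seeded fields are compared.
    return [{k: v for k, v in e.items() if k != "timestamp"} for e in events]


# Pinned start_time: full payloads match, timestamps included.
fixed_start = datetime(2024, 1, 1, tzinfo=timezone.utc)
events_a = generate_events(count=3, seed=123, start_time=fixed_start)
events_b = generate_events(count=3, seed=123, start_time=fixed_start)
assert events_a == events_b

# Default start_time: only the seeded fields are guaranteed to match.
events_c = generate_events(count=3, seed=123)
events_d = generate_events(count=3, seed=123)
assert strip_timestamps(events_c) == strip_timestamps(events_d)
```

If the real payload does not carry the timestamp, the simpler form of the test suggested above should be sufficient.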
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 59fb0757-95ff-4ff7-a8d9-7815e657d223
📒 Files selected for processing (3)
tests/test_synthetic_event_generator.py
utils/__init__.py
utils/synthetic_event_generator.py
def generate_event(
    event_type: Optional[str] = None,
    person_id: Optional[int] = None,
    timestamp: Optional[datetime] = None,
) -> Dict[str, Any]:
    if event_type is None:
        event_type = random.choice(EVENT_TYPES)
Add deterministic generation support to meet the reproducibility objective.
Event generation always uses global randomness, so callers cannot reproduce the same dataset across runs. Add an optional seed or RNG parameter and thread it through batch generation.
Proposed fix
 def generate_event(
     event_type: Optional[str] = None,
     person_id: Optional[int] = None,
     timestamp: Optional[datetime] = None,
+    rng: Optional[random.Random] = None,
 ) -> Dict[str, Any]:
+    rng = rng or random
     if event_type is None:
-        event_type = random.choice(EVENT_TYPES)
+        event_type = rng.choice(EVENT_TYPES)
@@
     if person_id is None:
-        person_id = random.randint(1, 50)
+        person_id = rng.randint(1, 50)
@@
-            "x": random.randint(0, 1920),
-            "y": random.randint(0, 1080),
+            "x": rng.randint(0, 1920),
+            "y": rng.randint(0, 1080),
         },
-        "confidence": round(random.uniform(0.65, 0.99), 2),
-        "metadata": _build_metadata(event_type),
+        "confidence": round(rng.uniform(0.65, 0.99), 2),
+        "metadata": _build_metadata(event_type, rng),
     }
-def _build_metadata(event_type: str) -> Dict[str, Any]:
+def _build_metadata(event_type: str, rng: random.Random) -> Dict[str, Any]:
@@
-            "zone_id": random.choice(["restricted_lab", "server_room", "staff_only"]),
-            "severity": random.choice(["medium", "high"]),
+            "zone_id": rng.choice(["restricted_lab", "server_room", "staff_only"]),
+            "severity": rng.choice(["medium", "high"]),
@@
 def generate_events(
     count: int = 10,
     start_time: Optional[datetime] = None,
     interval_seconds: int = 30,
+    seed: Optional[int] = None,
 ) -> List[Dict[str, Any]]:
+    rng = random.Random(seed) if seed is not None else random
@@
     return [
-        generate_event(timestamp=start_time + timedelta(seconds=i * interval_seconds))
+        generate_event(
+            timestamp=start_time + timedelta(seconds=i * interval_seconds),
+            rng=rng,
+        )
         for i in range(count)
     ]
Also applies to: 28-44, 81-95
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@utils/synthetic_event_generator.py` around lines 17 - 24, generate_event
currently uses global randomness which prevents reproducible datasets; add an
optional rng: Optional[random.Random] = None (or seed: Optional[int]) parameter
to generate_event and use rng.choice/rng.random instead of random.*; update any
batch functions in this file (e.g., the batch generation functions that call
generate_event in the 28-44 and 81-95 regions) to accept/propagate the same rng
or seed and construct a local random.Random when a seed is provided so callers
can pass a deterministic RNG instance for reproducible output.
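As an illustration of the RNG-threading pattern this comment asks for, here is a minimal self-contained sketch. The names make_event and make_events, the EVENT_TYPES values, and the "location" key are hypothetical stand-ins rather than the PR's actual code. It creates one local random.Random per batch and passes it down, so every random draw comes from the same seeded stream and the global random state is never involved.

```python
import random
from typing import Any, Dict, List, Optional

EVENT_TYPES = ["person_detected", "loitering", "zone_breach"]  # hypothetical values


def make_event(rng: Optional[random.Random] = None) -> Dict[str, Any]:
    # Fall back to a fresh unseeded generator instead of the module-level RNG.
    if rng is None:
        rng = random.Random()
    return {
        "event_type": rng.choice(EVENT_TYPES),
        "person_id": rng.randint(1, 50),
        "location": {"x": rng.randint(0, 1920), "y": rng.randint(0, 1080)},
        "confidence": round(rng.uniform(0.65, 0.99), 2),
    }


def make_events(count: int, seed: Optional[int] = None) -> List[Dict[str, Any]]:
    # One generator per batch: all events draw from the same seeded stream.
    rng = random.Random(seed)
    return [make_event(rng=rng) for _ in range(count)]


batch_a = make_events(5, seed=42)
random.random()  # unrelated use of the global RNG between the two batches
batch_b = make_events(5, seed=42)
assert batch_a == batch_b
```

Every field has to draw from the passed-in generator for this to hold; a single leftover call to the module-level random functions would make that field vary between runs.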
    interval_seconds: int = 30,
) -> List[Dict[str, Any]]:
    if count <= 0:
        raise ValueError("count must be greater than 0")

    if start_time is None:
        start_time = datetime.now(timezone.utc)

    return [
        generate_event(timestamp=start_time + timedelta(seconds=i * interval_seconds))
        for i in range(count)
    ]
Validate interval_seconds to prevent invalid timestamp sequences.
interval_seconds currently accepts 0 or negative values, which can create duplicate or reverse-ordered timelines.
Proposed fix
 def generate_events(
@@
 ) -> List[Dict[str, Any]]:
     if count <= 0:
         raise ValueError("count must be greater than 0")
+    if interval_seconds <= 0:
+        raise ValueError("interval_seconds must be greater than 0")
📝 Committable suggestion
    interval_seconds: int = 30,
) -> List[Dict[str, Any]]:
    if count <= 0:
        raise ValueError("count must be greater than 0")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be greater than 0")
    if start_time is None:
        start_time = datetime.now(timezone.utc)
    return [
        generate_event(timestamp=start_time + timedelta(seconds=i * interval_seconds))
        for i in range(count)
    ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@utils/synthetic_event_generator.py` around lines 84 - 95, The generator
currently allows interval_seconds <= 0 which can produce duplicate or reversed
timestamps; add a validation at the start of the function that checks
interval_seconds is an int > 0 and raise ValueError("interval_seconds must be
greater than 0") (or similar) if not; place this check alongside the existing
count check in the same function (the one returning the list comprehension that
calls generate_event with timestamp=start_time + timedelta(seconds=i *
interval_seconds)) and leave the rest of the logic (start_time defaulting and
list comprehension) unchanged.
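To make the failure mode concrete, the following sketch isolates the timestamp arithmetic. The helper build_timestamps is a hypothetical name used only for illustration. It shows what an interval of 0 or a negative interval would produce without the check, and that the validated version yields a strictly increasing sequence.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def build_timestamps(
    count: int, start_time: datetime, interval_seconds: int = 30
) -> List[datetime]:
    # Same guard order as the suggested fix: count first, then interval.
    if count <= 0:
        raise ValueError("count must be greater than 0")
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be greater than 0")
    return [start_time + timedelta(seconds=i * interval_seconds) for i in range(count)]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)

# What the unvalidated arithmetic would produce:
duplicates = [start + timedelta(seconds=i * 0) for i in range(3)]
reversed_order = [start + timedelta(seconds=i * -30) for i in range(3)]
assert len(set(duplicates)) == 1  # every event shares one timestamp
assert reversed_order == sorted(reversed_order, reverse=True)  # timeline runs backwards

# With validation in place, timestamps are unique and ascending.
stamps = build_timestamps(3, start, interval_seconds=30)
assert stamps == sorted(stamps) and len(set(stamps)) == 3
```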
Thanks for the review and suggestions. I've addressed the requested changes, including:
All tests are now passing. Please let me know if any further changes are needed.
Description
This PR introduces a synthetic surveillance event generator utility to help developers create realistic test data for reasoning, analytics, and future AI workflows without requiring real surveillance footage.
Changes Made
Added utils/synthetic_event_generator.py
Added support for generating synthetic surveillance events:
Added configurable event generation with timestamps and metadata
Added JSON export functionality for generated events
Added unit tests for event creation, validation, and export behavior
Added package initialization file (utils/__init__.py)
Why This Change?
Currently, testing surveillance-related workflows requires manually creating event payloads or relying on external datasets. This utility provides a reusable way to generate realistic surveillance events for development, testing, demonstrations, and future benchmarking.
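For the JSON export listed under Changes Made, a rough sketch of what such a helper might look like follows. The function name export_events_to_json and the event fields are assumptions for illustration only; the PR's actual export API is not shown in this conversation. The sketch assumes timestamps are already stored as ISO 8601 strings, so the events serialize without a custom encoder.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def export_events_to_json(events: List[Dict[str, Any]], path: Path) -> Path:
    """Write events as a JSON array. Hypothetical helper, not the PR's actual API."""
    path.write_text(json.dumps(events, indent=2), encoding="utf-8")
    return path


# Illustrative payload; field names are assumptions.
events = [
    {"event_type": "loitering", "person_id": 7, "timestamp": "2024-01-01T00:00:00+00:00"},
]

# Round-trip through a temporary file to confirm nothing is lost on export.
with tempfile.TemporaryDirectory() as tmp:
    out = export_events_to_json(events, Path(tmp) / "events.json")
    loaded = json.loads(out.read_text(encoding="utf-8"))

assert loaded == events
```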
Testing
All tests pass successfully.
Related Issue
Fixes #138