fix: cap preview_num_records to entity row count in replace workflow #59

Merged
lipikaramaswamy merged 1 commit into main from lipikaramaswamy/fix/preview-row-count
Mar 18, 2026

Conversation

@lipikaramaswamy
Collaborator

Summary

When preview() is called with Substitute replacement, llm_replace_workflow.py filters the input down to only the entity-bearing rows before passing them to the adapter. Previously it passed the original preview_num_records (the total dataset size) through unchanged, so Data Designer's seed generator was asked for more records than there were seed rows, wrapped around, and produced duplicate rows.
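
To illustrate the failure mode, here is a minimal sketch. It assumes the seed generator simply cycles through its seed rows when asked for more records than it holds; `sample_seed_rows` is a hypothetical stand-in, not Data Designer's actual API.

```python
from itertools import cycle, islice


def sample_seed_rows(seed_rows, num_records):
    """Hypothetical stand-in for a seed generator that wraps around.

    Assumption: when num_records exceeds len(seed_rows), the generator
    restarts from the first seed row instead of stopping.
    """
    return list(islice(cycle(seed_rows), num_records))


# Three entity-bearing rows survive filtering, but the original
# dataset size (5) is still passed as the record count.
entity_rows = ["row_a", "row_b", "row_c"]
preview_num_records = 5

preview = sample_seed_rows(entity_rows, preview_num_records)
has_duplicates = len(set(preview)) < len(preview)
```

Under that assumption, any request larger than the filtered row count repeats rows from the start of the seed set.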

The fix caps preview_num_records to len(entity_rows) before the adapter call.
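
A rough sketch of the cap, for readers who have not opened the diff. The helper name `cap_preview_num_records` is hypothetical and introduced only for illustration; the actual change in llm_replace_workflow.py may be written inline and look different.

```python
def cap_preview_num_records(preview_num_records, entity_rows):
    """Return a record count that never exceeds the filtered row count.

    Hypothetical helper: entity_rows is whatever sized collection holds
    the entity-bearing rows after filtering.
    """
    return min(preview_num_records, len(entity_rows))


# Requested count larger than the filtered rows: capped to 3.
capped = cap_preview_num_records(10, ["row_a", "row_b", "row_c"])

# Requested count already within bounds: left unchanged.
unchanged = cap_preview_num_records(2, ["row_a", "row_b", "row_c"])
```

With the count capped this way, the adapter is never asked for more records than there are seed rows, so the wrap-around path is not reached.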

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Refactoring

Testing

  • Tests pass locally
  • Added/updated tests for changes

The fix is a one-liner capping a value passed to a mocked dependency. A unit test would only assert that min() works, not that duplicates are actually prevented; the real invariant lives at the Data Designer seed wrap-around boundary, which requires a live NIM endpoint to exercise.

Related Issues

Closes #58

@lipikaramaswamy lipikaramaswamy requested a review from a team as a code owner March 18, 2026 06:02
@lipikaramaswamy lipikaramaswamy merged commit 1bc52c0 into main Mar 18, 2026
5 checks passed
@lipikaramaswamy lipikaramaswamy deleted the lipikaramaswamy/fix/preview-row-count branch March 18, 2026 16:09

Development

Successfully merging this pull request may close these issues.

bug: preview() produces duplicate rows in substitute replace output