Summary
InputtersFlow.InitFlow constructs each inputFlow with inputFlow.LinkTo(_resultTransformer, FilterInputData), Gridsum's predicated-link overload. With the default Gridsum.DataflowEx 2.0.0 behaviour and no LinkLeftToNull() escape hatch, items the predicate rejects (deduped repeats AND too-short items) are NOT discarded: they accumulate in inputFlow's source-block buffer and prevent inputFlow from ever completing. The downstream _resultTransformer (which has RegisterDependency(inputFlow)) therefore never completes, OutputBlock.Completion never fires, and any caller awaiting end-to-end completion deadlocks.
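The mechanism can be reproduced without Gridsum. The following is a minimal sketch using plain System.Threading.Tasks.Dataflow (the library Gridsum.DataflowEx extends), not the ProjectV code. The block names, the length threshold of 3, and the 2-second wait are assumptions chosen for illustration, and the predicate is a hypothetical stand-in for FilterInputData.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PredicatedLinkHangDemo
{
    public static async Task Main()
    {
        var source = new BufferBlock<string>();
        var target = new ActionBlock<string>(item => Console.WriteLine($"received: {item}"));

        // Hypothetical stand-in for FilterInputData: accept only items of length >= 3.
        // There is no second link, so rejected items have nowhere to go.
        source.LinkTo(
            target,
            new DataflowLinkOptions { PropagateCompletion = true },
            item => item.Length >= 3);

        source.Post("ok");          // rejected by the predicate, stays at the head of the buffer
        source.Post("long enough"); // queued behind the rejected item
        source.Complete();

        // Bounded wait (assumed 2 s) instead of awaiting Completion directly,
        // so the demo terminates even though the source block does not.
        Task finished = await Task.WhenAny(
            source.Completion,
            Task.Delay(TimeSpan.FromSeconds(2)));

        Console.WriteLine(finished == source.Completion
            ? "source completed"
            : "source still pending");
    }
}
```

In this sketch the rejected head item is never consumed, so source.Completion should stay pending when the wait elapses, and completion is never propagated to the target. This mirrors the hang described above.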
Production impact (latent bug)
Any production input data with duplicates OR too-short entries hangs Shell.Run indefinitely. Verified in the integration suite: in the rejection case the run consistently hits the 30-second xUnit timeout with no items received, while the no-rejection happy path (1 unique non-empty item) completes in ~90 ms.
Where
Sources/Libraries/ProjectV.DataPipeline/InputtersFlow.cs — InitFlow method, predicated LinkTo call against FilterInputData.
Suggested fix
Add .LinkLeftToNull() (or an equivalent discard sink) to the predicated LinkTo. After the fix, items the predicate rejects flow to the null sink and inputFlow.Completion fires normally.
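As an illustration of the "equivalent discard sink" idea, here is a minimal sketch in plain System.Threading.Tasks.Dataflow, where a second, unfiltered link to DataflowBlock.NullTarget drains whatever the predicate rejects. It is not the ProjectV change itself: the block names and the length threshold are assumptions, and the predicate is a hypothetical stand-in for FilterInputData. How LinkLeftToNull() is actually invoked (chained, or as a separate call on inputFlow) should be checked against the Gridsum.DataflowEx API.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DiscardSinkDemo
{
    public static async Task Main()
    {
        var source = new BufferBlock<string>();
        var accepted = new List<string>();
        // ActionBlock processes one item at a time by default, so the list needs no locking.
        var target = new ActionBlock<string>(item => accepted.Add(item));

        // Hypothetical stand-in for FilterInputData: accept only items of length >= 3.
        source.LinkTo(
            target,
            new DataflowLinkOptions { PropagateCompletion = true },
            item => item.Length >= 3);

        // Discard sink: links are offered in the order they were added,
        // so only items rejected by the predicate above reach the null target.
        source.LinkTo(DataflowBlock.NullTarget<string>());

        source.Post("ok");          // rejected, then swallowed by the null target
        source.Post("long enough"); // accepted by the filtered link
        source.Complete();

        // With the buffer drained, completion propagates to the target.
        await target.Completion;
        Console.WriteLine(string.Join(", ", accepted)); // prints "long enough"
    }
}
```

The discard link must be added after the predicated link. If it were added first, it would accept every item and nothing would reach the real target.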
Verification today (test-side workaround)
Sources/Tests/ProjectV.DataPipeline.Tests/InputtersFlowTests.cs exercises the dedup + length-filter branches by reflecting on the private FilterInputData(string) predicate directly — a minimal-invariant probe that confirms the production behaviour without depending on Gridsum's deadlocking completion semantics. A separate happy-path smoke test (ProcessAsync_WithSingleUniqueItem_EmitsItDownstream) exercises the no-rejection case end-to-end. The class-level <remarks> block documents the deadlock root cause.
Acceptance
- InputtersFlow.InitFlow adds .LinkLeftToNull() (or equivalent discard) to the predicated link.
- The dedup + length-filter tests in ProjectV.DataPipeline.Tests flip from reflection-probe to end-to-end driving (ProcessAsync(..., completeFlowOnFinish: true) + sink.Completion) and complete within the 30-second xUnit timeout.
- The Shell.Run end-to-end test (currently "tested around" in ProjectV.Core.Tests) becomes drivable when fed input with duplicates or filtered-by-length entries.
Surfaced by
Phase 2 Test Coverage (milestone v0.9.8) — Test (Integration) stage on ProjectV.DataPipeline.Tests. Companion to the DataflowPipeline.Execute(string) terminal-pipeline deadlock (separate issue) — both are Gridsum-completion-semantics latent bugs surfaced together by the Phase 2 test build-out.