Skip to content

Fix recCeiver issues#153

Merged
anderslindho merged 6 commits into
masterfrom
fix-recceiver-issues
May 7, 2026
Merged

Fix recCeiver issues#153
anderslindho merged 6 commits into
masterfrom
fix-recceiver-issues

Conversation

@anderslindho
Copy link
Copy Markdown
Contributor

See individual commits

@anderslindho anderslindho self-assigned this May 6, 2026
@anderslindho anderslindho force-pushed the fix-recceiver-issues branch from 0eb1f08 to 7421a18 Compare May 6, 2026 11:01
Copy link
Copy Markdown
Contributor

@jacomago jacomago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we tests for the non-logging changes?

Comment thread server/recceiver/recast.py
Comment thread server/recceiver/cfstore.py
@jacomago
Copy link
Copy Markdown
Contributor

jacomago commented May 6, 2026

Tried to compare against my per ioc lock branch https://github.com/jacomago/recsync/tree/ioc-per-lock .
Looks like could still do both.

@anderslindho
Copy link
Copy Markdown
Contributor Author

anderslindho commented May 6, 2026

Tried to compare against my per ioc lock branch https://github.com/jacomago/recsync/tree/ioc-per-lock. Looks like could still do both.

Yeah, I think the per-IOC lock really is a better solution than this, but this is the more minimal fix that hopefully would settle our production issues @jacomago

When a session closes, self.C.cancel() was discarding any data commit
still queued behind the global DeferredLock. The disconnect transaction
then acquired the lock first and marked all channels Inactive in CF,
with the channel data never written. Under load with many concurrent
IOCs this caused a near-total loss of active-channel registrations.

Removing the cancel chains the disconnect transaction after self.C so
it executes only once any pending data commit has completed.
Logging the full CFConfig repr at INFO level exposed the CF password
in plaintext to anyone with log access. Override __repr__ to replace
cf_password with '***' so startup logs remain informative without
leaking credentials.
The previous code checked `iocid in self.iocs` before decrementing
channelcount but then accessed self.iocs[iocid] unconditionally on the
next lines, raising KeyError when an iocid was absent — possible when
disconnect transactions race with initial transactions under load.

Early-return when iocid is missing; consolidate the channelcount == 0
and < 0 branches so they share a single iocs lookup.
trlimit defaulted to 0, permanently disabling the count-based flush
branch in flushSafely. Fast IOCs transmit all records in milliseconds
so the timer branch never fires, causing a single giant transaction
per IOC — particularly bad for large IOCs that cause Humongous GC on
the CF JVM.

Set the default to 5000 records so fast large uploads are split
automatically. Add DEBUG logging when either limit triggers so the
behaviour is observable in practice.
The warning serialised the entire ioc dict — several kilobytes per
line under production load. It fired on every reconnect as a direct
side-effect of Issue 1, flooding the log aggregation pipeline.

Log only the count of known IOCs, which is enough to distinguish
a race condition (non-zero) from a completely cold startup (zero).
Silent fallbacks made it hard to find misconfigured IOCs: port numbers
appearing as iocName and the recceiver service account appearing as
owner were invisible unless you already knew what to look for. Log at
DEBUG when either fallback is used so misconfigured IOCs are surfaced
without generating noise on every commit.
@anderslindho anderslindho force-pushed the fix-recceiver-issues branch from 7421a18 to 4db0d9d Compare May 6, 2026 13:11
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented May 6, 2026

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

@anderslindho anderslindho merged commit c83d26a into master May 7, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants