feat: scheduled connector sync with cron expressions (#178)#276
feat: scheduled connector sync with cron expressions (#178)#276
Conversation
Add ConnectorScheduler class that runs connector syncs on cron schedules. - Cron-based scheduling via node-cron - Schedule config stored alongside connector configs - CLI commands: schedule list, schedule set, schedule remove - API endpoint: GET /api/v1/connectors/schedules - Auto-starts with API server (opt-out via enableScheduler: false) - 12 unit tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub. |
There was a problem hiding this comment.
Pull request overview
This PR implements scheduled connector sync using cron expressions, addressing issue #178. It introduces a ConnectorScheduler class that registers and manages node-cron jobs for each connector with a configured schedule, integrates the scheduler with the API server lifecycle, and adds CLI sub-commands for managing schedules.
Changes:
- New
src/core/scheduler.tswithConnectorSchedulerclass andloadScheduleEntries()helper - API server (
src/api/server.ts) auto-starts the scheduler on boot; newGET /api/v1/connectors/schedulesendpoint insrc/api/routes.ts - CLI (
src/cli/index.ts) gainsschedule list,schedule set, andschedule removesub-commands withnode-cronadded as a dependency
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
src/core/scheduler.ts |
New file: ConnectorScheduler class + loadScheduleEntries function |
tests/unit/scheduler.test.ts |
New unit tests covering scheduler lifecycle and config loading |
src/core/index.ts |
Re-exports ConnectorScheduler, loadScheduleEntries, and related types |
src/cli/index.ts |
Adds schedule list/set/remove sub-commands |
src/api/server.ts |
Auto-starts scheduler with server; exposes scheduler in return value |
src/api/routes.ts |
Adds GET /api/v1/connectors/schedules endpoint |
package.json / package-lock.json |
Adds node-cron ^4.2.1 + @types/node-cron ^3.0.11 |
| stop(): void { | ||
| const log = getLogger(); | ||
| for (const [key, job] of this.jobs) { | ||
| void job.task.stop(); |
There was a problem hiding this comment.
The void operator is applied to job.task.stop(), which according to the @types/node-cron type definitions returns void (i.e., is a synchronous call). Using void on an already-void-returning synchronous function call is meaningless and misleading — it signals to readers that the call is async and intentionally unhandled, when it's not. The void keyword should be removed here.
| void job.task.stop(); | |
| job.task.stop(); |
src/core/scheduler.ts
Outdated
| connectorName: string; | ||
| cronExpression: string; | ||
| lastRun?: string | undefined; | ||
| nextRun?: string | undefined; |
There was a problem hiding this comment.
The nextRun field is declared in the ScheduledJob interface but is never assigned a value anywhere in the implementation, and is not surfaced via SchedulerStatus. It is dead code that adds noise without any benefit. Either populate it (e.g., using a cron-next-date library or a calculation from the cron expression) and expose it in SchedulerStatus, or remove it entirely.
| nextRun?: string | undefined; |
src/core/scheduler.ts
Outdated
| }; | ||
| } | ||
| default: | ||
| throw new Error(`Unknown connector type: ${connectorType}`); |
There was a problem hiding this comment.
The default case in executeSync throws a raw new Error(...). The project convention (established in src/connectors/index.ts, src/config.ts, and src/providers/index.ts) is to throw typed errors from src/errors.ts — specifically ConfigError for unknown/invalid configuration, never raw Error. A ConfigError (or a ValidationError) would be more appropriate here so that callers can distinguish it from transient errors and it is properly caught and categorized throughout the system.
| if (pathname === "/api/v1/connectors/schedules" && method === "GET") { | ||
| const entries = loadScheduleEntries(); | ||
| const took = Math.round(performance.now() - start); | ||
| sendJson(res, 200, { schedules: entries }, took); | ||
| return; | ||
| } |
There was a problem hiding this comment.
The new GET /api/v1/connectors/schedules endpoint added in src/api/routes.ts has no test coverage in tests/unit/api.test.ts. Every other connector-related API endpoint has dedicated tests in the describe("Connector status endpoint", ...) block (lines 595–669 of api.test.ts). A test for the schedules endpoint — even a simple one that verifies a 200 response and the expected { schedules: [] } shape — should be added to maintain coverage consistency.
src/core/scheduler.ts
Outdated
| interface ConnectorScheduleEntry { | ||
| connectorType: string; | ||
| connectorName: string; | ||
| cronExpression: string; | ||
| } |
There was a problem hiding this comment.
The ConnectorScheduleEntry interface is used as the parameter type for the public start() method of ConnectorScheduler, but the interface itself is not exported. Consumers of the public API who need to construct these entries independently (rather than via loadScheduleEntries()) cannot use the type explicitly. Since ConnectorScheduler and loadScheduleEntries are both re-exported from src/core/index.ts, the entry type should also be exported so the public API is fully usable by downstream code.
| } | ||
|
|
||
| const config = loadCfg<Record<string, unknown>>(connector); | ||
| delete config.schedule; |
There was a problem hiding this comment.
In schedule remove, the command loads the connector config, deletes the schedule key, and saves it back regardless of whether the connector actually had a schedule configured. If there was no schedule on the connector, the config file is still needlessly rewritten and the user receives the misleading message ✓ Schedule removed for ${connector}. Consider checking whether config.schedule exists before proceeding, and printing a clear message if there was nothing to remove.
| delete config.schedule; | |
| if (!("schedule" in config)) { | |
| console.log(`No schedule configured for ${connector}.`); | |
| return; | |
| } | |
| delete (config as { schedule?: unknown }).schedule; |
src/core/scheduler.ts
Outdated
| stop(): void { | ||
| const log = getLogger(); | ||
| for (const [key, job] of this.jobs) { | ||
| void job.task.stop(); | ||
| log.debug({ job: key }, "Stopped scheduled job"); | ||
| } | ||
| this.jobs.clear(); | ||
| this.started = false; | ||
| log.info("Connector scheduler stopped"); | ||
| } | ||
|
|
There was a problem hiding this comment.
The stop() method calls this.jobs.clear() and sets this.started = false immediately, but does not wait for any in-progress sync jobs to complete. If a cron-triggered sync is currently awaiting executeSync(), the following occurs: the cron task is stopped (no new invocations), the job is removed from the map, and started is set to false. However, the in-progress async runSync call continues running and will complete — including writing to the database — without the scheduler being aware of it. This means close() on the server may return before all syncs have actually finished, potentially leaving database writes mid-flight when the process shuts down. At minimum, this should be documented; ideally, stop() should track and await pending sync promises.
| stop(): void { | |
| const log = getLogger(); | |
| for (const [key, job] of this.jobs) { | |
| void job.task.stop(); | |
| log.debug({ job: key }, "Stopped scheduled job"); | |
| } | |
| this.jobs.clear(); | |
| this.started = false; | |
| log.info("Connector scheduler stopped"); | |
| } | |
| async stop(): Promise<void> { | |
| const log = getLogger(); | |
| // Stop scheduling new runs for all jobs. | |
| for (const [key, job] of this.jobs) { | |
| job.task.stop(); | |
| log.debug({ job: key }, "Stopped scheduled job"); | |
| } | |
| // Wait for any in-progress syncs to complete before clearing state. | |
| await this.waitForRunningJobsToFinish(); | |
| this.jobs.clear(); | |
| this.started = false; | |
| log.info("Connector scheduler stopped"); | |
| } | |
| /** | |
| * Check whether any scheduled job is currently marked as running. | |
| * | |
| * This relies on each job correctly updating its `running` flag. | |
| */ | |
| private hasRunningJobs(): boolean { | |
| for (const job of this.jobs.values()) { | |
| if (job.running) { | |
| return true; | |
| } | |
| } | |
| return false; | |
| } | |
| /** | |
| * Wait until all in-progress scheduled syncs finish. | |
| * | |
| * This method does not cancel running jobs; it only polls their `running` | |
| * flags and resolves once they have all completed. | |
| */ | |
| private async waitForRunningJobsToFinish(): Promise<void> { | |
| const log = getLogger(); | |
| while (this.hasRunningJobs()) { | |
| const runningJobs = [...this.jobs.entries()].filter( | |
| ([, job]) => job.running, | |
| ); | |
| log.debug( | |
| { | |
| runningJobKeys: runningJobs.map(([key]) => key), | |
| runningJobCount: runningJobs.length, | |
| }, | |
| "Waiting for scheduled jobs to finish before stopping scheduler", | |
| ); | |
| await new Promise((resolve) => { | |
| setTimeout(resolve, 100); | |
| }); | |
| } | |
| } |
| "devDependencies": { | ||
| "@types/better-sqlite3": "^7.6.0", | ||
| "@types/node": "^22.0.0", | ||
| "@types/node-cron": "^3.0.11", |
There was a problem hiding this comment.
The @types/node-cron package at version ^3.0.11 provides type definitions for node-cron v2/v3, but this project installs node-cron at ^4.2.1. Starting with node-cron v4, the package ships its own bundled TypeScript type definitions. Installing @types/node-cron v3 alongside node-cron v4 introduces a type mismatch: the bundled v4 types differ from the community v3 types (for example, the ScheduledTask interface changed in v4). This can cause TypeScript compilation errors, incorrect type inference, or silent behavior mismatches. The @types/node-cron dev dependency should be removed since node-cron v4 already includes its own types.
| "@types/node-cron": "^3.0.11", |
- Add connectors-config tests (file/DB config, sync tracker, deleteConnectorDocuments) - Add api-server tests (startApiServer with various options) - Add db-validation tests (validateRow, validateCountRow edge cases) - Add dedup tests for semantic/both strategies (exercises catch blocks) - Add schema tests for createVectorTable validation - Add export tests for import validation branches - Add config tests for env var overrides (ollama URL, private URLs, LLM) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rewrite scheduler.test.ts with vi.mock for node-cron and all connector modules - Test all 5 connector types (Slack, Confluence, OneNote, Notion, Obsidian) - Test error handling, unknown connector types, concurrent run protection - Scheduler coverage: 100% statements, 91.66% branches Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The envOverrides.indexing spread was missing from the config merge, so LIBSCOPE_ALLOW_PRIVATE_URLS and LIBSCOPE_ALLOW_SELF_SIGNED_CERTS environment variables were silently ignored. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Export ConnectorScheduleEntry with JSDoc (#24) - Remove unused nextRun field from ScheduledJob (#21) - Use ValidationError instead of raw Error in default case (#22) - Make stop() async, wait for in-flight syncs (#20, #26) - Add schedule remove check for existing schedule (#25) - Add GET /api/v1/connectors/schedules API test (#23) - Document @types/node-cron v3 vs node-cron v4 compatibility (#27) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- schema.test.ts: keep both initLogger and DatabaseError imports - db-validation.test.ts: take main's more comprehensive test suite Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Implements scheduled connector sync (cron-based), closing #178.
Changes
New:
src/core/scheduler.tsConnectorSchedulerclass: manages cron jobs for each connector with a scheduleloadScheduleEntries(): reads schedule configs from connector config filesnode-cronAPI Server (
src/api/server.ts)enableScheduler: false)API Routes (
src/api/routes.ts)GET /api/v1/connectors/schedules— list all configured schedulesCLI (
src/cli/index.ts)schedule list— show all configured schedulesschedule set <connector> <cron>— set a cron schedule for a connectorschedule remove <connector>— remove a connector's scheduleTests
tests/unit/scheduler.test.tsDependencies
node-cron+@types/node-cronCloses #178