perf: sqlite performance improvements#291
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 20 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (6)
WalkthroughAdds SQLite performance tuning: new Changes
Sequence Diagram(s)sequenceDiagram
participant Sequencer
participant Config
participant Utils as tune_sqlite_connection
participant SQLiteDB
Sequencer->>Config: load config (includes [sqlite])
Sequencer->>SQLiteDB: sqlite3.connect(db_path)
Sequencer->>Utils: tune_sqlite_connection(connection, Config)
Utils->>Config: read sqlite_* attributes
Utils->>SQLiteDB: EXECUTE PRAGMA journal_mode = <value>
Utils->>SQLiteDB: EXECUTE PRAGMA synchronous = <value>
Utils->>SQLiteDB: EXECUTE PRAGMA temp_store = <value>
Utils->>SQLiteDB: EXECUTE PRAGMA mmap_size = <value>
Utils->>SQLiteDB: EXECUTE PRAGMA cache_size = <value>
SQLiteDB-->>Utils: OK / Error
Utils-->>Sequencer: return (logs/warnings)
Sequencer->>Sequencer: continue initialization / run
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/utilities/sqlite_utils.py`:
- Around line 36-52: The current block attempts multiple PRAGMA statements using
con.execute(...) inside a single try/except so one sqlite3.Error aborts all
remaining PRAGMAs; change this to attempt each PRAGMA individually (e.g.,
iterate a list of tuples like ('journal_mode', journal_mode), ('synchronous',
synchronous), etc. or wrap each con.execute(...) in its own try/except) so a
failure on one setting does not stop others, and log a warning that includes the
specific PRAGMA name and exception for each failure; keep the final logger.debug
that reports the effective applied values (or log per-success) to preserve
observability.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fcbf3018-f471-4220-bfd9-7413e320f9ad
📒 Files selected for processing (6)
docs/source/database.rsttemoa/_internal/temoa_sequencer.pytemoa/core/config.pytemoa/extensions/myopic/myopic_sequencer.pytemoa/tutorial_assets/config_sample.tomltemoa/utilities/sqlite_utils.py
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/core/config.py`:
- Around line 324-329: Update the explanatory label and include the missing temp
store field in the config repr: change the "SQLite cache size (pages)" label
used when composing the message (referencing self.sqlite_cache_size) to indicate
that negative values are KiB and positive are pages (e.g. "SQLite cache size
(pages or KiB if negative)"), and add a new line to the same repr/building
message that shows self.sqlite_temp_store so the configured sqlite_temp_store
value is displayed alongside sqlite_journal_mode, sqlite_synchronous,
sqlite_mmap_size and sqlite_cache_size.
- Around line 162-178: Validate and normalize SQLite PRAGMA inputs from
self.sqlite_inputs for sqlite_journal_mode, sqlite_synchronous, and
sqlite_temp_store by accepting both numeric and string forms: if the value is
int-like (int/float/str numeric), coerce to int and ensure it maps to an allowed
numeric PRAGMA value; if it is a non-numeric string, normalize with .upper() and
validate against the allowed string tokens for each PRAGMA; on invalid input
fall back to the existing defaults (or log/raise a clear error) and set
self.sqlite_journal_mode, self.sqlite_synchronous, and self.sqlite_temp_store
accordingly so invalid strings or unsupported types cannot propagate to PRAGMA
execution. Include the same robust numeric/string handling approach used for
sqlite_mmap_size and sqlite_cache_size to avoid rejecting valid numeric mode
values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ac3ca42e-7cee-4abf-80a5-b224ae69cd30
📒 Files selected for processing (1)
temoa/core/config.py
d03311b to
2602ee9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
temoa/core/config.py (1)
165-193:⚠️ Potential issue | 🟠 MajorReject numeric
journal_modeand whitelist numeric ranges for mode PRAGMAs.Line 170 currently permits numeric
journal_mode, and Lines 180/190 accept any digit forsynchronous/temp_storewithout validating allowed numeric ranges. This can propagate unsupported values into PRAGMA execution.Proposed hardening diff
- jm = self.sqlite_inputs.get('journal_mode', 'WAL') - if isinstance(jm, str) and jm.upper() in jm_allowed: - self.sqlite_journal_mode = jm.upper() - elif isinstance(jm, (int, float, str)) and str(jm).isdigit(): - self.sqlite_journal_mode = int(jm) - else: - self.sqlite_journal_mode = 'WAL' + jm = self.sqlite_inputs.get('journal_mode', 'WAL') + if isinstance(jm, str) and jm.upper() in jm_allowed: + self.sqlite_journal_mode = jm.upper() + else: + self.sqlite_journal_mode = 'WAL' @@ - sync = self.sqlite_inputs.get('synchronous', 'NORMAL') - if isinstance(sync, str) and sync.upper() in sync_allowed: - self.sqlite_synchronous = sync.upper() - elif isinstance(sync, (int, float, str)) and str(sync).isdigit(): - self.sqlite_synchronous = int(sync) - else: - self.sqlite_synchronous = 'NORMAL' + sync_num_allowed = {0, 1, 2, 3} + sync = self.sqlite_inputs.get('synchronous', 'NORMAL') + if isinstance(sync, str): + s = sync.strip() + if s.isdigit() and int(s) in sync_num_allowed: + self.sqlite_synchronous = int(s) + elif s.upper() in sync_allowed: + self.sqlite_synchronous = s.upper() + else: + self.sqlite_synchronous = 'NORMAL' + elif isinstance(sync, int) and sync in sync_num_allowed: + self.sqlite_synchronous = sync + else: + self.sqlite_synchronous = 'NORMAL' @@ - ts = self.sqlite_inputs.get('temp_store', 'MEMORY') - if isinstance(ts, str) and ts.upper() in temp_allowed: - self.sqlite_temp_store = ts.upper() - elif isinstance(ts, (int, float, str)) and str(ts).isdigit(): - self.sqlite_temp_store = int(ts) - else: - self.sqlite_temp_store = 'MEMORY' + temp_num_allowed = {0, 1, 2} + ts = self.sqlite_inputs.get('temp_store', 'MEMORY') + if isinstance(ts, str): + t = ts.strip() + if t.isdigit() and int(t) in temp_num_allowed: + self.sqlite_temp_store = int(t) + elif t.upper() in temp_allowed: + self.sqlite_temp_store = t.upper() + else: + self.sqlite_temp_store = 'MEMORY' + elif isinstance(ts, int) and ts in temp_num_allowed: + self.sqlite_temp_store = ts + else: + self.sqlite_temp_store = 'MEMORY'SQLite PRAGMA accepted values for journal_mode, synchronous, and temp_store, including which of those pragmas allow numeric forms.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@temoa/core/config.py` around lines 165 - 193, The code in temoa/core/config.py currently allows numeric journal_mode and accepts any digit for synchronous/temp_store which can produce invalid PRAGMA values; update the setters that read from self.sqlite_inputs (variables jm -> self.sqlite_journal_mode, sync -> self.sqlite_synchronous, ts -> self.sqlite_temp_store) so that journal_mode only accepts the whitelisted string names (no numeric branch), and for synchronous and temp_store allow numeric input only after parsing to int and validating against the allowed numeric ranges (synchronous: 0..3, temp_store: 0..2) while keeping the existing string-name checks (uppercased against sync_allowed and temp_allowed); on invalid inputs fall back to the defaults ('WAL', 'NORMAL', 'MEMORY').
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@temoa/utilities/sqlite_utils.py`:
- Around line 15-22: Multiple production call sites open SQLite connections with
sqlite3.connect() but do not apply the performance PRAGMAs via
tune_sqlite_connection(), so ensure every direct sqlite3.connect() is followed
by tune_sqlite_connection(con, config) (or replace with a single helper like
get_tuned_sqlite_connection() that calls sqlite3.connect(...) then
tune_sqlite_connection(con, config) and returns con); specifically update the
connection in temoa/_internal/table_writer.py (the output DB open) and all other
locations that call sqlite3.connect() to either call tune_sqlite_connection(...)
immediately after obtaining the sqlite3.Connection or switch to the new wrapper;
preserve any config/context values (pass the same TemoaConfig or None) and
ensure tests/imports still resolve tune_sqlite_connection symbol.
---
Duplicate comments:
In `@temoa/core/config.py`:
- Around line 165-193: The code in temoa/core/config.py currently allows numeric
journal_mode and accepts any digit for synchronous/temp_store which can produce
invalid PRAGMA values; update the setters that read from self.sqlite_inputs
(variables jm -> self.sqlite_journal_mode, sync -> self.sqlite_synchronous, ts
-> self.sqlite_temp_store) so that journal_mode only accepts the whitelisted
string names (no numeric branch), and for synchronous and temp_store allow
numeric input only after parsing to int and validating against the allowed
numeric ranges (synchronous: 0..3, temp_store: 0..2) while keeping the existing
string-name checks (uppercased against sync_allowed and temp_allowed); on
invalid inputs fall back to the defaults ('WAL', 'NORMAL', 'MEMORY').
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 72ded3a3-6314-46e8-a701-eae2c481990b
📒 Files selected for processing (2)
temoa/core/config.pytemoa/utilities/sqlite_utils.py
2602ee9 to
35e84b2
Compare
removes VACUUM operations and performance tuning PRAGMAs that are now exposed in the config
35e84b2 to
ccfa004
Compare
Introduces two major performance optimizations for SQLite database operations in Temoa, targeting long-running runs and large-scale databases.
Implemented a centralized
tune_sqlite_connectionutility that appliesPRAGMAsettings to the database connection. These settings are now the default for all Temoa runs:journal_mode = WAL: Enables Write-Ahead Logging for better concurrency and write speed.synchronous = NORMAL: Balances safety and performance by reducing the frequency of disk flushes.mmap_size = 8GB: Uses memory-mapped I/O to significantly speed up database reads.cache_size = 500MB: Increases the internal SQLite page cache to reduce disk I/O.Removed the per-period
VACUUMoperation in MyopicSequencer.Users can override the performance defaults in their configuration file.
Summary by CodeRabbit
New Features
Configuration
Documentation
Improvements