Feat/enrich rowlimit session by thrcle · Pull Request #234 · CausalInferenceLab/Lang2SQL

thrcle · 2026-06-06T05:16:41Z

Summary

EnrichSchemaTool (/enrich 커맨드): 컬럼 샘플 데이터 기반 LLM 자동 메타데이터 보강
RowLimitLayer: 최상위 SELECT에 LIMIT 자동 추가로 대용량 쿼리 안전 실행
Session.compress(): 매 턴 tool call/result 메시지 제거로 세션 오염 방지
Bug fixes: OpenAI null content 400 오류, NeonDB psycopg3 드라이버 자동 변환, Discord tree.sync() rate limit 방지

변경 내용

EnrichSchemaTool (`/enrich`)

DISTINCT 값 샘플링 후 단일 LLM 호출로 컬럼 설명 + FK 관계 추론
결과를 KV 스토어에 영속화 (enriched_desc:table:col, schema_relationships)
시스템 프롬프트에 Column Descriptions + Table Relationships 섹션 주입
Discord /enrich [table] [clear] 슬래시 커맨드 추가

RowLimitLayer

최상위 SELECT 쿼리에 LIMIT N 자동 추가 (서브쿼리 LIMIT은 건드리지 않음)
Safety pipeline 기본 순서: WhitelistLayer → RowLimitLayer → TimeoutLayer

Session.compress()

매 턴 저장 전 tool call / tool result 메시지를 제거해 세션 오염 방지
세션 SQL 추출 범위를 현재 턴(pre_loop_len)으로 제한

Bug Fixes

OpenAI: ASSISTANT 메시지 null content → 400 오류 수정 (content="" fallback)
NeonDB: psycopg3 드라이버 자동 변환 + sslmode=require 자동 추가
Discord: tree.sync() rate limit 방지 — LANG2SQL_SYNC_COMMANDS=true 환경변수 설정 시에만 동기화
gpt-4o-mini: 모델 ID 및 .env.example 업데이트

Test plan

pytest tests/test_integration.py
pytest tests/test_safety.py — RowLimitLayer 동작 확인
Discord /enrich 커맨드 실행 후 시스템 프롬프트에 컬럼 설명 주입 확인
NeonDB DSN 연결 정상 확인

- EnrichSchemaTool: 컬럼 샘플 데이터 기반 LLM 자동 메타데이터 보강 (/enrich 커맨드) - DISTINCT 값 샘플링 후 단일 LLM 호출로 컬럼 설명 + FK 관계 추론 - KV 영속화(enriched_desc:table:col, schema_relationships) - 시스템 프롬프트에 컬럼 설명 + Table relationships 섹션 주입 - RowLimitLayer: 최상위 SELECT에 LIMIT 자동 추가 (서브쿼리 LIMIT 무시) - Safety pipeline 기본 순서: WhitelistLayer → RowLimitLayer → TimeoutLayer - Session.compress(): 매 턴 저장 전 tool call/result 메시지 제거로 세션 오염 방지 - 세션 SQL 추출 범위를 현재 턴(pre_loop_len)으로 제한 - OpenAI ASSISTANT null content 400 오류 수정 - NeonDB psycopg3 드라이버 자동 변환 + sslmode 자동 추가 - Discord tree.sync() rate limit 방지 (LANG2SQL_SYNC_COMMANDS=true 시만 동기화) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

seyoung4503

전반적으로 잘 동작합니다 👍 로컬에서 테스트 113개 통과 확인했고, RowLimit / enrich / NeonDB DSN 핵심 로직도 엣지케이스 포함해서 정상이었어요. 🧪

머지 전에 디버그 print 제거만 부탁드리고(blocking 🚨), 나머지는 제안 사항입니다. 🙏

seyoung4503 · 2026-06-06T12:16:06Z

+        history = ctx.session.history()
+        current_turn = history[pre_loop_len:]
+
+        sql_queries = [


❓ (question) 여기서 현재 턴의 run_sql 호출을 전부 모으는데, 실패한 중간 시도까지 포함되는 것 같아요. 시스템 프롬프트는 "최종 성공 SQL만 표시"라고 안내하고 있어서 동작이 어긋날 수 있을 것 같아서요.

seyoung4503 · 2026-06-06T12:16:06Z

+                schema_lines: list[str] = []
+                for tbl in tables:
+                    try:
+                        described = await ctx.explorer.describe_table(tbl.name)


🐢 (suggestion) enrichment가 있으면 매 쿼리마다 모든 테이블에 describe_table()를 호출하게 되는데, 테이블 수만큼 DB 왕복이 생겨서 스키마가 크면 응답이 느려질 수 있어요. 결과를 캐싱하거나 한 번에 모으는 방식 고려해볼 만합니다. ⚡

seyoung4503 · 2026-06-06T12:16:06Z

-        insp = inspect(self._get_engine())
-        schema = self._schema or insp.default_schema_name
+        engine = self._get_engine()
+        engine.dispose()  # flush stale pool connections so schema changes are visible


♻️ (suggestion) 테이블 목록 조회마다 풀을 비우는데, 매번 dispose()는 비용이 좀 있어요. enrich 직후처럼 스키마가 바뀌는 시점에만 무효화하는 방향은 어떨까요?

seyoung4503 · 2026-06-06T12:16:06Z

    def __init__(
        self,
-        model: str = "gpt-4.1-mini",
+        model: str = "gpt-4o-mini",


🔎 (question) 기본 모델이 gpt-4.1-mini → gpt-4o-mini로 바뀌었는데, 4.1을 사용하는게 더 좋아보입니다!

PR CausalInferenceLab#234 review feedback: replace stdout debug prints with logging, and pair run_sql tool_call_id with results to show only successful queries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

thrcle · 2026-06-06T13:07:07Z

디버그 print 제거 완료

seyoung4503

👍 좋습니다!

thrcle and others added 2 commits June 5, 2026 20:49

fix: OpenAI gpt-4o-mini, Neon DSN 파싱 버그 수정, 자동 재시작 스크립트 추가

f363174

seyoung4503 reviewed Jun 6, 2026

View reviewed changes

seyoung4503 approved these changes Jun 7, 2026

View reviewed changes

seyoung4503 merged commit f81c3b7 into CausalInferenceLab:master Jun 7, 2026
1 check failed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/enrich rowlimit session#234

Feat/enrich rowlimit session#234
seyoung4503 merged 3 commits into
CausalInferenceLab:masterfrom
thrcle:feat/enrich-rowlimit-session

thrcle commented Jun 6, 2026

Uh oh!

seyoung4503 left a comment

Uh oh!

Uh oh!

seyoung4503 Jun 6, 2026 •

edited

Loading

Uh oh!

seyoung4503 Jun 6, 2026

Uh oh!

seyoung4503 Jun 6, 2026 •

edited

Loading

Uh oh!

seyoung4503 Jun 6, 2026 •

edited

Loading

Uh oh!

thrcle commented Jun 6, 2026

Uh oh!

seyoung4503 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

thrcle commented Jun 6, 2026

Summary

변경 내용

EnrichSchemaTool (/enrich)

RowLimitLayer

Session.compress()

Bug Fixes

Test plan

Uh oh!

seyoung4503 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

seyoung4503 Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seyoung4503 Jun 6, 2026

Choose a reason for hiding this comment

Uh oh!

seyoung4503 Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seyoung4503 Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

thrcle commented Jun 6, 2026

Uh oh!

seyoung4503 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EnrichSchemaTool (`/enrich`)

seyoung4503 Jun 6, 2026 •

edited

Loading

seyoung4503 Jun 6, 2026 •

edited

Loading

seyoung4503 Jun 6, 2026 •

edited

Loading