feat(diagnosis): add LLM-powered dataset analysis with redesigned UI by tendtoyj · Pull Request #4 · Fluxloop-AI/pluto-duck

tendtoyj · 2026-01-27T11:28:34Z

Summary

Adds LLM-based dataset analysis to the file diagnosis feature and fully redesigns the scan results UI.

Changes

Backend

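Add new LLM analysis service (llm_analysis_service.py, LLMService)
Integrate LLM analysis into the file diagnosis service (two-phase processing: file scan → LLM analysis)
Add prompt template for dataset analysis
Add a preview of the duplicate row count when merging files
Add the chardet package (for file encoding detection)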

Frontend

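Add new dataset card component (DatasetCard.tsx)
Implement the analyzing screen with per-file progress display and a stacking animation (DatasetAnalyzingView.tsx)
Simplify the diagnosis result UI, restructured around dataset statistics
Restore the agent recommendation checkbox interaction
Change the file list layout to vertical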

Test plan

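Verify the analysis animation displays correctly while a scan runs after file upload
Verify LLM analysis results are displayed correctly on the dataset cards
Verify the duplicate row count preview works correctly when merging multiple files
Verify agent recommendation checkboxes select and deselect correctly
Verify files in various encodings are processed correctly on upload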

🤖 Generated with Claude Code

…e diagnosis
- Add new Pydantic response models: EncodingInfoResponse, ParsingIntegrityResponse, NumericStatsResponse, CategoricalStatsResponse, DateStatsResponse, ColumnStatisticsResponse
- Extend FileDiagnosisResponse with encoding, parsing_integrity, column_statistics, sample_rows
- Update diagnose_files endpoint handler to map all new fields from dataclass to response
- Add _log_diagnosis_result() method for detailed diagnosis logging
- Add chardet dependency for encoding detection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add DatasetAnalyzingView component with 9-step analysis flow
- Extend FileDiagnosis types with encoding, parsing_integrity, column_statistics
- Integrate analyzing step between preview and diagnose in AddDatasetModal
- Display per-file information for each analysis step (files, parsing, columns, quality, statistics)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add LLMAnalysisResult, PotentialItem, IssueItem dataclasses
- Create llm_analysis_service.py with batch processing support
- Add dataset_analysis_prompt.md for LLM prompts
- Update FileDiagnosis to include llm_analysis field
- Add diagnose_files_with_llm() async method
- Update API endpoint to async and include LLM response models
- Add OPENAI_API_KEY fallback in providers.py
- Add unit tests for LLM analysis and caching

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
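The batch processing mentioned above can be pictured as chunking the file list and making one LLM request per chunk. This is a minimal sketch under assumptions: `chunked` and `analyze_in_batches` are hypothetical names, `analyze_batch` stands in for the real LLM call, and the value 5 for `LLM_BATCH_SIZE` (a name taken from the review) is a placeholder, since the PR does not state the actual size.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Placeholder value; the PR does not state the real batch size.
LLM_BATCH_SIZE = 5


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def analyze_in_batches(
    files: Sequence[T],
    analyze_batch: Callable[[Sequence[T]], list[R]],
    size: int = LLM_BATCH_SIZE,
) -> list[R]:
    """Send files to the LLM in batches and concatenate per-file results in order."""
    results: list[R] = []
    for batch in chunked(files, size):
        results.extend(analyze_batch(batch))  # one LLM request per batch
    return results
```

Passing `size` explicitly also gives a natural hook for the review's suggestion to make the batch size configurable.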

- Add include_llm parameter to diagnose endpoint for optional LLM analysis
- Implement two-phase UI: fast diagnosis first, then LLM analysis
- Update DatasetAnalyzingView to handle phased step progression
- Add LLM analysis types to frontend API

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
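One way to read the `include_llm` flow: the UI calls the same diagnosis routine twice, first without the LLM for quick results and then with it for enriched results. The sketch below assumes this shape; `Diagnosis`, `analyze_with_llm`, and `diagnose` are hypothetical stand-ins (not the real FileDiagnosis or `diagnose_files_with_llm()`), and the "LLM call" is a placeholder that just formats a string.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnosis:
    """Hypothetical, trimmed-down stand-in for FileDiagnosis."""
    path: str
    row_count: int
    llm_analysis: Optional[str] = None


async def analyze_with_llm(d: Diagnosis) -> str:
    # Placeholder for the slow LLM request; a real version would await the model.
    await asyncio.sleep(0)
    return f"{d.path}: {d.row_count} rows"


async def diagnose(row_counts: dict[str, int], include_llm: bool = False) -> list[Diagnosis]:
    # Phase 1: fast, purely technical diagnosis (here just row counts).
    results = [Diagnosis(path=p, row_count=n) for p, n in row_counts.items()]
    if include_llm:
        # Phase 2: LLM analysis for every file, run concurrently.
        analyses = await asyncio.gather(*(analyze_with_llm(d) for d in results))
        for d, text in zip(results, analyses):
            d.llm_analysis = text
    return results


# The UI's two calls: quick results first, then the enriched version.
quick = asyncio.run(diagnose({"sales.csv": 120}))
full = asyncio.run(diagnose({"sales.csv": 120}, include_llm=True))
```

Keeping Phase 1 independent of the LLM means a slow or failed model call never delays the basic scan results.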

Use refs instead of state for phase tracking to avoid useEffect cleanup when diagnosisResults updates from the second API call. Add a dataArrived state to detect the first data arrival without re-triggering on subsequent updates.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add new `services/llm` module with centralized LLM configuration
- LLMSettings resolves settings with DB > ENV > default priority
- LLMService provides get_chat_model(), complete(), and complete_structured()
- Pydantic schemas for type-safe structured LLM responses
- Refactor deep agent to use LLMService instead of inline OpenAI config
- Refactor llm_analysis_service to use complete_structured() instead of manual JSON parsing
- Deprecate old get_llm_provider() with migration guide

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
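The "DB > ENV > default" priority for LLMSettings can be sketched as a first-match lookup. Assumptions: `resolve_setting` is a hypothetical helper, the `LLM_MODEL` variable and `"default-model"` value are invented for illustration (only `OPENAI_API_KEY` appears in the PR), and empty strings are treated as unset.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting names and defaults, for illustration only.
DEFAULTS: dict[str, Optional[str]] = {"model": "default-model", "api_key": None}
ENV_VARS = {"model": "LLM_MODEL", "api_key": "OPENAI_API_KEY"}


def resolve_setting(
    key: str,
    db_settings: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty value, checking DB, then ENV, then the default."""
    db_value = db_settings.get(key)
    if db_value:  # an empty string in the DB counts as "not set" (assumption)
        return db_value
    env_value = env.get(ENV_VARS[key])
    if env_value:
        return env_value
    return DEFAULTS[key]
```

Injecting `env` as a parameter keeps the lookup testable without mutating the real process environment.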

- Add POST /files/count-duplicates API endpoint
- Add count_cross_file_duplicates method using DuckDB UNION ALL + DISTINCT
- Skip calculation when total rows exceed 100,000 (returns skipped=true)
- Call API automatically when file schemas match during diagnosis
- Pass duplicateInfo to DiagnosisResultView (UI display in future PR)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
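The UNION ALL + DISTINCT idea works like this: stack all files, then compare the total row count with the number of distinct rows; the difference is how many rows deduplication would drop. The sketch below swaps in Python's built-in `sqlite3` for DuckDB so it runs without extra dependencies, and the return shape is an assumption. Only the 100,000-row threshold and the `skipped` flag come from the PR. Note that duplicates within a single file are counted too.

```python
import sqlite3
from typing import Any, Sequence

MAX_TOTAL_ROWS = 100_000  # threshold stated in the PR description


def _quote(identifier: str) -> str:
    # Double embedded quotes so unusual column names stay valid identifiers.
    return '"' + identifier.replace('"', '""') + '"'


def count_cross_file_duplicates(
    files: Sequence[Sequence[tuple[Any, ...]]], columns: Sequence[str]
) -> dict[str, Any]:
    total = sum(len(rows) for rows in files)
    if total > MAX_TOTAL_ROWS:
        # Too large to check interactively; report that the count was skipped.
        return {"skipped": True, "total_rows": total, "duplicate_rows": None}
    if not files:
        return {"skipped": False, "total_rows": 0, "duplicate_rows": 0}
    col_list = ", ".join(_quote(c) for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn = sqlite3.connect(":memory:")
    try:
        selects = []
        for i, rows in enumerate(files):
            conn.execute(f"CREATE TABLE t{i} ({col_list})")
            conn.executemany(f"INSERT INTO t{i} VALUES ({placeholders})", rows)
            selects.append(f"SELECT {col_list} FROM t{i}")
        union_all = " UNION ALL ".join(selects)
        # Stacked row count minus distinct row count = rows a dedup would drop.
        (distinct,) = conn.execute(
            f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM ({union_all}))"
        ).fetchone()
    finally:
        conn.close()
    return {"skipped": False, "total_rows": total, "duplicate_rows": total - distinct}
```

For example, files `[(1, "x"), (2, "y")]` and `[(2, "y"), (3, "z")]` share one row, so the duplicate count is 1.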

When multiple files with identical schemas are being merged, the LLM now generates a unified suggested name and context description for the combined dataset. This helps users understand what the merged result will represent.

Changes:
- Add MergeContext and MergedAnalysis dataclasses for merge info
- Update diagnose_files_with_llm to accept merge_context parameter
- Extend LLM prompt to handle merge context and generate merged analysis
- Add merged_suggested_name and merged_context to BatchAnalysisSchema
- Update frontend to pass duplicate info as merge context to LLM
- Display merged dataset suggestion in DiagnosisResultView

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
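To illustrate how a merge context might reach the prompt, here is a sketch that renders it as an extra prompt section. The `MergeContext` fields and the `render_merge_context` helper are hypothetical; the PR only names the dataclass and the `merged_suggested_name` / `merged_context` output fields. The section is omitted entirely when no merge is planned.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MergeContext:
    """Hypothetical fields; the PR names the class but not its contents."""
    file_names: list[str]
    total_rows: int
    duplicate_rows: Optional[int]  # None when the duplicate count was skipped


def render_merge_context(ctx: Optional[MergeContext]) -> str:
    """Build an extra prompt section describing the planned merge (empty if none)."""
    if ctx is None:
        return ""
    lines = [
        "These files share an identical schema and will be merged:",
        *(f"- {name}" for name in ctx.file_names),
        f"Total rows before deduplication: {ctx.total_rows}",
    ]
    if ctx.duplicate_rows is not None:
        lines.append(f"Duplicate rows across files: {ctx.duplicate_rows}")
    lines.append(
        "Also return merged_suggested_name and merged_context for the combined dataset."
    )
    return "\n".join(lines)
```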

- Add StatusBadge component for ready/review status display
- Add DatasetCard component with editable dataset name
- Add AgentRecommendation component for merge suggestions
- Redesign DiagnosisResultView with new component composition
- Support custom dataset names during import (LLM suggested or user edited)
- Update header to Korean: "{N}개 파일 스캔 완료" ("{N} files scanned")
- Change import button text to "Create Datasets"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Increase Phase 1 step timing (600-1000ms processing, 500ms transition)
- Replace cycling message with stacking structure that accumulates
- Add 16 sequential LLM waiting messages (up from 10)
- Show column names instead of type counts (max 6 + N more)
- Remove unused getColumnTypeSummaryForFile helper

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
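The "max 6 + N more" column display can be expressed as a small formatter. It is written in Python here even though the component itself is TypeScript. The helper name and the exact output format are assumptions; only the limit of 6 comes from the commit.

```python
from typing import Sequence


def summarize_columns(names: Sequence[str], limit: int = 6) -> str:
    """Show up to `limit` column names, then '+ N more' for the rest."""
    shown = ", ".join(names[:limit])
    hidden = len(names) - limit
    return f"{shown} + {hidden} more" if hidden > 0 else shown
```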

- Add stats (rows, columns, issues) to DatasetCard component
- Remove FileDiagnosisCard expandable schema details
- Update AgentRecommendation to use Package icon and English text
- Change StatusBadge colors (emerald for ready, orange for review)
- Simplify DiagnosisResultView by removing expandable sections

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Connect toggleMerge and toggleRemoveDuplicates click handlers
- Add conditional styling based on checked state (dark gray when selected, light gray when unselected)
- Disable remove duplicates option when merge is not selected

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Replace inline-flex with flex for vertical stacking
- Use flex-col gap-2 for consistent spacing
- Add w-fit to constrain item width to content

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove parse_llm_response tests (function removed in favor of structured output)
- Update TestAnalyzeBatchWithLLM to mock LLMService instead of provider
- Update TestAnalyzeDatasetsWithLLM to use BatchLLMAnalysisResult

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

A DiagnosisError with a "File not found" message now returns 404 instead of 500.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
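The 404 fix amounts to mapping one class of diagnosis error to a client-error status. This sketch is framework-agnostic: `DiagnosisError` here is a stand-in for the service's exception, `http_status_for` is a hypothetical helper, and matching on the message text mirrors the commit description.

```python
class DiagnosisError(Exception):
    """Stand-in for the service's DiagnosisError."""


def http_status_for(exc: Exception) -> int:
    # A missing file is the caller's problem (bad path or id): 404, not 500.
    if isinstance(exc, DiagnosisError) and "File not found" in str(exc):
        return 404
    return 500
```

Matching on message text is brittle if the wording changes; a dedicated exception subclass would make the mapping explicit.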

FluxloopAdmin

✅ LGTM - Approved

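The LLM-based dataset analysis feature is very well designed!

👍 Highlights

LLMService architecture: LangChain + structured output guarantees consistent responses
Two-phase diagnosis: fast technical analysis + optional LLM analysis
Merge Context: unified analysis when merging files with identical schemas
DatasetAnalyzingView: great UX with the sequential message animation
Prompt engineering: clear guidelines

💡 Future improvements (Optional)

User feedback and a retry option when LLM analysis fails
Improve AgentRecommendation accessibility (use a real checkbox)
Make LLM_BATCH_SIZE configurable

A large feature addition, but the structure is clean! 🚀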

tendtoyj and others added 13 commits January 25, 2026 23:41

fix(dataset-card): change file list layout to vertical

5f8ee6d


tendtoyj self-assigned this Jan 27, 2026

tendtoyj requested review from FluxloopAdmin and chuckgu January 27, 2026 11:29

tendtoyj and others added 2 commits January 27, 2026 20:36

fix(api): return 404 for nonexistent file diagnosis

11face0


FluxloopAdmin approved these changes Jan 27, 2026


FluxloopAdmin merged commit d7f9c6a into main Jan 27, 2026
1 check passed

tendtoyj deleted the feat/dataset-llm-scan branch January 27, 2026 11:48

tendtoyj mentioned this pull request Feb 9, 2026

fix(frontend): todo list empty bug in agent chat #21

Merged

4 tasks


FluxloopAdmin merged 15 commits into main from feat/dataset-llm-scan


2 participants
