# 🧠 O-ISAC CoT Master Pipeline (V4)

Main control panel for the **"Optical Integrated Sensing and Communication"** systematic review.

**Stages:**
1. 📦 Setup & Mount
2. 🏭 Phase 1: Data Prep (PDF → Markdown)
3. 👁️ Phase 2: Visual Analysis (Gemini Vision)
4. 🧠 Phase 3: Integrated Reasoning (V4 Llama Engine) **[NEW]**
5. 📊 Results & Export

**Requirements:**
- Colab GPU Runtime (T4 or A100)
- GROQ_API_KEY (Colab Secrets)
- GOOGLE_API_KEY (Colab Secrets)
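
Before running the pipeline, it can help to confirm that both keys are actually present. The sketch below is a minimal check, assuming the keys have already been copied into `os.environ`; the helper name `missing_keys` is hypothetical and not part of the pipeline modules.

```python
import os

# The two secrets listed in the requirements above.
REQUIRED_KEYS = ("GROQ_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in required if not env.get(key)]


missing = missing_keys(os.environ)
if missing:
    print(f"Missing keys: {', '.join(missing)}")
else:
    print("All required keys are set.")
```

Passing the environment as an argument keeps the check easy to try with a plain dictionary before touching real secrets.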

---
**Last updated:** 2025-12-13
**Version:** 4.0 (The Factory)

---
## 📦 Section 1: Setup & Mount

In [None]:
# @title 1.1 Install Dependencies
# Phase 1 & 2 dependencies
!pip install marker-pdf -q
!pip install transformers torch pillow -q

# Phase 3 & V4 Engine dependencies
!pip install groq nest_asyncio pandas pyyaml -q
!pip install -q -U google-generativeai

print("✅ All dependencies installed!")

In [None]:
# @title 1.2 Mount Google Drive & Setup Paths
from google.colab import drive
from google.colab import userdata
import os
import sys

# Mount Drive
drive.mount('/content/drive')

# Project Paths
PROJECT_ROOT = '/content/drive/MyDrive/AKU_WorkSpace/survey_fdgit/OISAC_PRISMA_COMST'
NOTEBOOKS_DIR = os.path.join(PROJECT_ROOT, 'analysis/notebooks')
COT_LAB_DIR = os.path.join(PROJECT_ROOT, 'analysis/cot_laboratory')
PDF_DIR = os.path.join(PROJECT_ROOT, 'data/retrieved_docs')
MARKDOWN_DIR = os.path.join(PROJECT_ROOT, 'data/processed_markdowns')
OUTPUT_DIR_V4 = os.path.join(PROJECT_ROOT, 'data/extraction_results_v4')

# Add to Python Path
sys.path.insert(0, NOTEBOOKS_DIR)
sys.path.insert(0, PROJECT_ROOT)

print(f"📁 Project Root: {PROJECT_ROOT}")
print(f"📄 PDF Directory: {PDF_DIR}")
print(f"📝 Markdown Directory: {MARKDOWN_DIR}")
print(f"📊 V4 Output Directory: {OUTPUT_DIR_V4}")
print("✅ Paths configured!")

In [None]:
# @title 1.3 Load API Keys
try:
    os.environ["GROQ_API_KEY"] = userdata.get('GROQ_API_KEY')
    os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
    print("✅ API keys (Groq + Google) loaded!")
except Exception as e:
    print("❌ ERROR: Add the API keys under 🔑 Secrets in the left sidebar!")
    print(f"   Error details: {e}")

---
## 🏭 Section 2: Phase 1 - Digitization (PDF → Markdown)

**⚠️ Requires a GPU!** This step converts the PDFs to Markdown using OCR.
*Engine: `marker-pdf`*
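
Because conversion is slow, only PDFs that are not yet recorded in the checkpoint need to be converted. The sketch below shows one way that selection could work. It is an illustration, not the logic of `extraction_pipeline_v3`: the helper `pending_pdfs` is hypothetical, and it assumes the checkpoint's `processed` mapping is keyed by PDF file name.

```python
import os


def pending_pdfs(pdf_paths, processed, force_all=False):
    """Return the PDFs that still need conversion, in sorted order.

    Assumption: `processed` is keyed by PDF file name, not by full path.
    """
    if force_all:
        return sorted(pdf_paths)
    return sorted(p for p in pdf_paths if os.path.basename(p) not in processed)


# Made-up inputs for illustration only.
pdfs = ["docs/a.pdf", "docs/b.pdf", "docs/c.pdf"]
done = {"a.pdf": {"status": "ok"}}

print(pending_pdfs(pdfs, done))
print(len(pending_pdfs(pdfs, done, force_all=True)))
```

With `force_all=True` every PDF is returned again, which mirrors the intent of a reprocess switch.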

In [None]:
# @title 2.1 Import & Status Check
import extraction_pipeline_v3 as v3
from extraction_pipeline_v3 import Config

# Initialize
Config.init_dirs()
checkpoint = v3.CheckpointManager(Config.CHECKPOINT_FILE)

# Show status
import glob

processed_count = len(checkpoint.data.get('processed', {}))
pdf_count = len(glob.glob(os.path.join(PDF_DIR, '*.pdf')))

print(f"📊 PDF status: {pdf_count} total, {processed_count} processed.")

In [None]:
# @title 2.2 Run Digitization (Phase 1)
# ⚠️ Takes ~2 min per paper

FORCE_REPROCESS = False # @param {type:"boolean"}

print("⏳ Phase 1: Digitization starting...")
v3.phase1_marker_conversion(checkpoint, force_all=FORCE_REPROCESS)
print("✅ Phase 1 complete!")

---
## 🖼️ Section 3: Phase 2 - Visual Analysis

Interprets the plots and diagrams in each paper.
*Engine: `Gemini 2.5 Flash`*
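
Before figures can be interpreted, they have to be located in the Phase 1 output. The sketch below collects image links from a Markdown string. It assumes the converted files reference figures with standard Markdown image syntax; the helper `find_image_links` and the sample text are hypothetical and do not come from `extraction_pipeline_v3`.

```python
import re

# Matches ![alt](path) and ![alt](path "title"); captures alt text and path.
IMAGE_LINK = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def find_image_links(markdown_text):
    """Return (alt_text, path) pairs for every Markdown image link."""
    return IMAGE_LINK.findall(markdown_text)


# Made-up Markdown for illustration only.
sample = """
## 3. System Model
![Fig. 1: Link geometry](figures/fig1.png)
Some text with an inline image ![](figures/fig2.jpeg "BER curve").
"""

for alt, path in find_image_links(sample):
    print(f"{path!r} alt={alt!r}")
```

Each returned path could then be loaded and passed to a vision model along with its alt text as context.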

In [None]:
# @title 3.1 Run Visual Analysis
print("⏳ Phase 2: Visual analysis starting...")
v3.phase2_visual_analysis(checkpoint)
print("✅ Phase 2 complete!")

---
## 🧠 Section 4: Phase 3 - Integrated Reasoning (V4)

**NEW:** Performs both chain-of-thought (CoT) reasoning and data extraction in a single pass.
*Engine: `Llama 3.3 70B` + `CoTAssembler`*
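
When reasoning and extraction come back in one response, the two parts have to be separated before the data can be stored. The sketch below shows one possible way to do that. It assumes a response made of free-text reasoning followed by a single JSON object; the actual response format of `extraction_pipeline_v4` may differ, and the helper `split_reasoning_and_data` and the field names are hypothetical.

```python
import json


def split_reasoning_and_data(response_text):
    """Split a response into (reasoning, data).

    Assumption: free-text reasoning comes first, followed by one JSON object.
    Returns (response_text, None) if no JSON object can be decoded.
    """
    decoder = json.JSONDecoder()
    start = response_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index `start`.
            data, _ = decoder.raw_decode(response_text, start)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = response_text.find("{", start + 1)
            continue
        return response_text[:start].strip(), data
    return response_text.strip(), None


# Made-up response for illustration only.
response = (
    "The abstract mentions joint ranging and data transfer, "
    "so the sensing flag is set.\n"
    '{"sensing": true, "modality": "LiDAR"}'
)
reasoning, data = split_reasoning_and_data(response)
print(reasoning)
print(data)
```

Keeping the reasoning text next to the extracted fields makes it possible to audit why a value was chosen.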

In [None]:
# @title 4.1 Import V4 Engine
import extraction_pipeline_v4 as v4
from extraction_pipeline_v4 import ConfigV4

# Init V4 environment
ConfigV4.init_dirs()
v4_checkpoint = v4.CheckpointManager(os.path.join(ConfigV4.OUTPUT_DIR, "checkpoint_v4.json"))

print("✅ V4 Engine (Factory) ready!")
print(f"📂 V4 output target: {ConfigV4.OUTPUT_DIR}")

In [None]:
# @title 4.2 Run Integrated Extraction (Phase 3)

LIMIT = 5 # @param {type:"integer"}

print(f"🚀 Phase 3: Reasoning and extraction (max {LIMIT} papers)...")

results = v4.phase3_integrated_extraction(v4_checkpoint, limit=LIMIT)

print(f"\n🎉 Done! {len(results)} papers analyzed.")

---
## 📊 Section 5: Results & Dashboard

In [None]:
# @title 5.1 Show Latest Extractions
import pandas as pd

csv_path = os.path.join(ConfigV4.OUTPUT_DIR, "extraction_v4_summary.csv")

if os.path.exists(csv_path):
    df = pd.read_csv(csv_path)
    print(f"📊 Found {len(df)} records in total.")
    display(df.head())
else:
    print("ℹ️ No results CSV file has been created yet.")