# IPMK Orchestrator v3 (VAST)

Intelligent parsing of methodological knowledge (IPMK).

Stages: Sync (GCS→local) → Render (300 dpi) → GroundedDINO+SAM2 → Analyze (LLM) → Ingest (index) → Review/Eval.

Run in a remote VSCode window (SSH: vast-4090). Before running, source the environment with `. ./.env` (or use scripts/vast_run.sh).

In [1]:
# Optional loading of .env (no external packages)
import os, shlex, pathlib

# Switch the working directory to the project root when launched from the notebooks folder
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
print(f'Current working directory: {os.getcwd()}')

def load_env_file(path='.env'):
    p = pathlib.Path(path)
    if not p.exists():
        return
    for line in p.read_text(encoding='utf-8').splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        k, v = line.split('=', 1)
        k, v = k.strip(), v.strip()
        # Strip one pair of matching surrounding quotes, if present
        if len(v) >= 2 and v[0] == v[-1] and v[0] in ("'", '"'):
            v = v[1:-1]
        os.environ.setdefault(k, v)

# Uncomment to load .env explicitly (usually .env is already loaded into the environment)
# load_env_file()

MODEL_DIR = os.getenv('MODEL_DIR', '/root/models')
DATA_DIR  = os.getenv('DATA_DIR',  '/root/data')
# Sources
PLAYBOOKS_SRC = os.getenv('PLAYBOOKS_SRC', 'gs://pik_source_bucket/playbooks')
FRAMES_SRC    = os.getenv('FRAMES_SRC',    'gs://pik_source_bucket/frames')
PLAYBOOKS_DST = os.getenv('PLAYBOOKS_DST', '/root/data/playbooks')
FRAMES_DST    = os.getenv('FRAMES_DST',    '/root/data/frames')

# Inputs for the current run/integration (when a single file is needed)
PDF_PATH  = os.getenv('PDF_PATH',  '/root/data/playbook.pdf')
JSON_PATH = os.getenv('JSON_PATH', '/root/data/playbook.json')

# Artifacts
OUT_PAGES = os.getenv('OUT_PAGES', 'out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11')
OUT_DET   = os.getenv('OUT_DET',   'out/visual/grounded_regions')
INDEX_PATH = os.getenv('INDEX_PATH', 'out/openai_embeddings.ndjson')

CHAT_MODEL = os.getenv('CHAT_MODEL', 'gpt-4o')
EMB_MODEL  = os.getenv('EMB_MODEL',  'text-embedding-3-large')
OPENAI_SET = bool(os.getenv('OPENAI_API_KEY'))

# Stage toggles
DO_SYNC   = os.getenv('IPMK_DO_SYNC',   '1') == '1'
DO_RENDER = os.getenv('IPMK_DO_RENDER', '1') == '1'
DO_DETECT = os.getenv('IPMK_DO_DETECT', '1') == '1'
DO_ANALYZE= os.getenv('IPMK_DO_ANALYZE','1') == '1'
DO_INGEST = os.getenv('IPMK_DO_INGEST', '1') == '1'
DO_REVIEW = os.getenv('IPMK_DO_REVIEW', '1') == '1'
DO_EVAL   = os.getenv('IPMK_DO_EVAL',   '1') == '1'

print('OPENAI set?', OPENAI_SET)
print('OUT_PAGES =', OUT_PAGES)
print('OUT_DET   =', OUT_DET)
print('INDEX_PATH=', INDEX_PATH)
# Ensure a writable TMPDIR for OCR (pytesseract)
from pathlib import Path
Path('/tmp/tess_tmp').mkdir(parents=True, exist_ok=True)
os.environ.setdefault('TMPDIR','/tmp/tess_tmp')


Current working directory: /root/AiPIK
OPENAI set? True
OUT_PAGES = out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11
OUT_DET   = out/visual/grounded_regions
INDEX_PATH= out/openai_embeddings.ndjson
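
The stage toggles above are enabled only when the variable is exactly `'1'`, so a value such as `true` silently disables a stage. A minimal sketch of a more forgiving reader follows; `env_flag` and the accepted spellings are assumptions for illustration, not part of the project.

```python
import os

# Accepted spellings are an assumption; adjust to the project's conventions.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}

def env_flag(name: str, default: bool = True) -> bool:
    """Read a boolean toggle from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail loudly instead of silently skipping a stage on a typo.
    raise ValueError(f"{name}={raw!r} is not a recognised boolean value")

DO_SYNC = env_flag("IPMK_DO_SYNC")
```

Raising on unknown values is a design choice: a mistyped toggle then stops the run instead of quietly skipping a stage.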


In [2]:
# Helper functions for running commands
import subprocess, shlex
from pathlib import Path

def run(cmd: list[str]):
    print('[run]', ' '.join(shlex.quote(x) for x in cmd))
    subprocess.run(cmd, check=True)

Path('out').mkdir(exist_ok=True)
Path(OUT_PAGES).parent.mkdir(parents=True, exist_ok=True)
Path(OUT_DET).mkdir(parents=True, exist_ok=True)
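
The stage cells call the scripts through a bare `python`, which on a remote kernel may resolve to a different interpreter than the one running the notebook. A possible variant of the helper, sketched under that assumption, pins the interpreter with `sys.executable`; `py_cmd` and `run_logged` are hypothetical names.

```python
import shlex
import subprocess
import sys

def py_cmd(script: str, *args: str) -> list[str]:
    """Build a command that runs `script` with the kernel's own interpreter."""
    return [sys.executable, script, *args]

def run_logged(cmd: list[str]) -> int:
    """Echo the command, run it, and return its exit code.

    Raises subprocess.CalledProcessError on a non-zero exit, like run().
    """
    print("[run]", shlex.join(cmd))
    return subprocess.run(cmd, check=True).returncode
```

For example, `run_logged(py_cmd('scripts/batch_render.py', '--dpi', '300'))` would use the same environment as the notebook kernel.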


## 1) Sync sources from GCS (playbooks, frames)

In [3]:
if DO_SYNC:
    try:
        run(['python', 'scripts/batch_sync_gcs.py', '--playbooks-src', PLAYBOOKS_SRC, '--frames-src', FRAMES_SRC, '--playbooks-dst', PLAYBOOKS_DST, '--frames-dst', FRAMES_DST])
    except subprocess.CalledProcessError:
        print('gsutil/gcloud not installed or sync failed; skip sync')
else:
    print('Skip sync (DO_SYNC=0)')


[run] python scripts/batch_sync_gcs.py --playbooks-src gs://pik_source_bucket/playbooks --frames-src gs://pik_source_bucket/frames --playbooks-dst /root/data/playbooks --frames-dst /root/data/frames


Building synchronization state...


Starting synchronization...


Building synchronization state...


Running: gsutil -m rsync -r gs://pik_source_bucket/playbooks /root/data/playbooks
Running: gsutil -m rsync -r gs://pik_source_bucket/frames /root/data/frames


Starting synchronization...
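
The sync cell reports a missing `gsutil` and a failed transfer with the same message. A small pre-check, sketched here with the standard library only, can tell the two cases apart before the script is launched; `missing_tools` is a hypothetical helper.

```python
import shutil

def missing_tools(*names: str) -> list[str]:
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]

missing = missing_tools("gsutil")
if missing:
    print("Skip sync, not on PATH:", ", ".join(missing))
```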


## 2) Render all PDFs at 300 dpi

In [4]:
if DO_RENDER:
    run(['python', 'scripts/batch_render.py', '--src', PLAYBOOKS_DST, FRAMES_DST, '--out-root', 'out/page_images', '--dpi', '300'])
else:
    print('Skip render (DO_RENDER=0)')


[run] python scripts/batch_render.py --src /root/data/playbooks /root/data/frames --out-root out/page_images --dpi 300


Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 1 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-1.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 2 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-2.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 3 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-3.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 4 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-4.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 5 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-5.png
Rendered 2023-06 - fastbreakOne - Expert Guide - E

Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 43 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-43.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 44 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-44.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 45 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-45.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 46 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-46.png
Rendered 2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf page 47 -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-47.png
Rendered 2023-06 - fastbreakOne - Expert

Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 1 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-1.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 2 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-2.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 3 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-3.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 4 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-4.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 5 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-5.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 6 -> out/page_images/PIK - Expert Guide

Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 47 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-47.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 48 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-48.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 49 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-49.png
Rendered PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf page 50 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-50.png


Rendered PIK 5-0 - Introduction - English.pdf page 1 -> out/page_images/PIK 5-0 - Introduction - English/page-1.png
Rendered PIK 5-0 - Introduction - English.pdf page 2 -> out/page_images/PIK 5-0 - Introduction - English/page-2.png
Rendered PIK 5-0 - Introduction - English.pdf page 3 -> out/page_images/PIK 5-0 - Introduction - English/page-3.png
Rendered PIK 5-0 - Introduction - English.pdf page 4 -> out/page_images/PIK 5-0 - Introduction - English/page-4.png
Rendered PIK 5-0 - Introduction - English.pdf page 5 -> out/page_images/PIK 5-0 - Introduction - English/page-5.png
Rendered PIK 5-0 - Introduction - English.pdf page 6 -> out/page_images/PIK 5-0 - Introduction - English/page-6.png
Rendered PIK 5-0 - Introduction - English.pdf page 7 -> out/page_images/PIK 5-0 - Introduction - English/page-7.png
Rendered PIK 5-0 - Introduction - English.pdf page 8 -> out/page_images/PIK 5-0 - Introduction - English/page-8.png
Rendered PIK 5-0 - Introduction - English.pdf page 9 -> out/page_images/

Rendered L1 - Ecosystems Portfolio Map - v01.pdf page 1 -> out/page_images/L1 - Ecosystems Portfolio Map - v01/page-1.png


Rendered L2 - Ecosystem Strategy Map.pdf page 1 -> out/page_images/L2 - Ecosystem Strategy Map/page-1.png


Rendered L2 - Market Ecosystem Journey.pdf page 1 -> out/page_images/L2 - Market Ecosystem Journey/page-1.png


Rendered L3 - Platform Value Network Canvas.pdf page 1 -> out/page_images/L3 - Platform Value Network Canvas/page-1.png


Rendered PIK - Expert Guide - Platform IT Architecture - Assessment - v01.pdf page 1 -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Assessment - v01/page-1.png


Rendered PIK - Platform IT Architecture Canvas - Table View - v01.pdf page 1 -> out/page_images/PIK - Platform IT Architecture Canvas - Table View - v01/page-1.png


Rendered PIK - Platform IT Architecture Canvases - v01.pdf page 1 -> out/page_images/PIK - Platform IT Architecture Canvases - v01/page-1.png


Rendered PIK 5-0 - Ecosystem Forces Scan - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Ecosystem Forces Scan - ENG/page-1.png


Rendered PIK 5-0 - Longtail Discovery - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Longtail Discovery - ENG/page-1.png


Rendered PIK 5-0 - MVP - ENG.pdf page 1 -> out/page_images/PIK 5-0 - MVP - ENG/page-1.png


Rendered PIK 5-0 - Market Sizing - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Market Sizing - ENG/page-1.png


Rendered PIK 5-0 - NFX Reinforcement Engines - ENG.pdf page 1 -> out/page_images/PIK 5-0 - NFX Reinforcement Engines - ENG/page-1.png


Rendered PIK 5-0 - Network Effects Stimulation - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Network Effects Stimulation - ENG/page-1.png


Rendered PIK 5-0 - Platform AARRR Funnel - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform AARRR Funnel - ENG/page-1.png


Rendered PIK 5-0 - Platform Business Model - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Business Model - ENG/page-1.png


Rendered PIK 5-0 - Platform Crisis Response - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Crisis Response - ENG/page-1.png


Rendered PIK 5-0 - Platform Evolution - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Evolution - ENG/page-1.png


Rendered PIK 5-0 - Platform Experience - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Experience - ENG/page-1.png


Rendered PIK 5-0 - Platform Monetization Canvas - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Monetization Canvas - ENG/page-1.png


Rendered PIK 5-0 - Platform Opportunity - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Opportunity - ENG/page-1.png


Rendered PIK 5-0 - Platform Stakeholder Persona - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Stakeholder Persona - ENG/page-1.png


Rendered PIK 5-0 - Platform Team - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Team - ENG/page-1.png


Rendered PIK 5-0 - Platform USP - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform USP - ENG/page-1.png


Rendered PIK 5-0 - Platform Value Network Canvas - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Platform Value Network Canvas - ENG/page-1.png


Rendered PIK 5-0 - Problem-Prioritization Matrix Canvas - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Problem-Prioritization Matrix Canvas - ENG/page-1.png


Rendered PIK 5-0 - Unfair Advantage - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Unfair Advantage - ENG/page-1.png


Rendered PIK 5-0 - User Behaviour - ENG.pdf page 1 -> out/page_images/PIK 5-0 - User Behaviour - ENG/page-1.png


Rendered PIK 5-0 - Value Chain Scan - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Value Chain Scan - ENG/page-1.png


Rendered PIK 5-0 - Venture Governance - ENG.pdf page 1 -> out/page_images/PIK 5-0 - Venture Governance - ENG/page-1.png
Rendering /root/data/playbooks/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English.pdf -> out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English (pages: 62, dpi=300)
Rendering /root/data/playbooks/PIK - Expert Guide - Platform IT Architecture - Playbook - v11.pdf -> out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11 (pages: 50, dpi=300)
Rendering /root/data/playbooks/PIK 5-0 - Introduction - English.pdf -> out/page_images/PIK 5-0 - Introduction - English (pages: 49, dpi=300)
Rendering /root/data/frames/L1 - Ecosystems Portfolio Map - v01.pdf -> out/page_images/L1 - Ecosystems Portfolio Map - v01 (pages: 1, dpi=300)
Rendering /root/data/frames/L2 - Ecosystem Strategy Map.pdf -> out/page_images/L2 - Ecosystem Strategy Map (pages: 1, dpi=300)
Rendering /root/data/frames/L2 - Market Ecosystem Jou
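
The log above suggests that each PDF is rendered into `<out-root>/<PDF name without extension>/page-N.png`. The sketch below reproduces that mapping and the page size in pixels at a given dpi, assuming the usual 72 points per inch; both function names are hypothetical and the real logic lives in `scripts/batch_render.py`.

```python
from pathlib import Path

POINTS_PER_INCH = 72  # PDF user-space units per inch

def page_png_path(out_root: str, pdf_path: str, page: int) -> Path:
    """Output path for one rendered page, as inferred from the log."""
    return Path(out_root) / Path(pdf_path).stem / f"page-{page}.png"

def pixels_at_dpi(width_pt: float, height_pt: float, dpi: int = 300) -> tuple[int, int]:
    """Convert a page size in points to pixels at the given resolution."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)

# A US Letter page (612 x 792 pt) at 300 dpi:
print(pixels_at_dpi(612, 792))  # (2550, 3300)
```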

## 3) GroundedDINO + SAM2 on all PNGs

In [5]:
if DO_DETECT:
    prompts = ['diagram','canvas','table','legend','node','arrow','textbox']
    run(['python', 'scripts/batch_gdino_sam2.py', '--pages-root', 'out/page_images', '--outdir', OUT_DET, '--prompts', *prompts])
else:
    print('Skip detect (DO_DETECT=0)')


[run] python scripts/batch_gdino_sam2.py --pages-root out/page_images --outdir out/visual/grounded_regions --prompts diagram canvas table legend node arrow

final text_encoder_type: bert-base-uncased
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-1.png: сохранено 3 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-10.png: сохранено 5 регионов -> out/visual/grounded_regions/page-10/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-11.png: сохранено 4 регионов -> out/visual/grounded_regions/page-11/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-12.png: сохранено 3 регионов -> out/visual/grounded_regions/page-12/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-13.png: сохранено 2 регионов -> out/visual/grounded_regions/page-13/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-14.

[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-49.png: сохранено 3 регионов -> out/visual/grounded_regions/page-49/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-5.png: сохранено 4 регионов -> out/visual/grounded_regions/page-5/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-50.png: сохранено 4 регионов -> out/visual/grounded_regions/page-50/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-51.png: сохранено 3 регионов -> out/visual/grounded_regions/page-51/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-52.png: сохранено 3 регионов -> out/visual/grounded_regions/page-52/regions
[OK] out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-53.png: сохранено 2 регионов -> out/visual/gro

final text_encoder_type: bert-base-uncased
[OK] out/page_images/L2 - Market Ecosystem Journey/page-1.png: сохранено 2 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/L3 - Platform Value Network Canvas/page-1.png: сохранено 2 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Assessment - v01/page-1.png: сохранено 3 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-1.png: сохранено 2 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-10.png: сохранено 2 регионов -> out/visual/grounded_regions/page-10/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-11.png: сохранено 4 регионов -> out/visual/grounded_regions/page-11/regions
[OK] out/page_images/PIK - Expert

[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-48.png: сохранено 2 регионов -> out/visual/grounded_regions/page-48/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-49.png: сохранено 3 регионов -> out/visual/grounded_regions/page-49/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-5.png: сохранено 3 регионов -> out/visual/grounded_regions/page-5/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-50.png: сохранено 5 регионов -> out/visual/grounded_regions/page-50/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-6.png: сохранено 4 регионов -> out/visual/grounded_regions/page-6/regions
[OK] out/page_images/PIK - Expert Guide - Platform IT Architecture - Playbook - v11/page-7.png: сохранено 3 регионов -> out/visual/grounded_regions/page-7/regions
[OK] out/page_im

final text_encoder_type: bert-base-uncased
[OK] out/page_images/PIK 5-0 - Introduction - English/page-17.png: сохранено 3 регионов -> out/visual/grounded_regions/page-17/regions
[OK] out/page_images/PIK 5-0 - Introduction - English/page-18.png: сохранено 2 регионов -> out/visual/grounded_regions/page-18/regions
[OK] out/page_images/PIK 5-0 - Introduction - English/page-19.png: сохранено 4 регионов -> out/visual/grounded_regions/page-19/regions
[OK] out/page_images/PIK 5-0 - Introduction - English/page-2.png: сохранено 2 регионов -> out/visual/grounded_regions/page-2/regions
[OK] out/page_images/PIK 5-0 - Introduction - English/page-20.png: сохранено 2 регионов -> out/visual/grounded_regions/page-20/regions
[OK] out/page_images/PIK 5-0 - Introduction - English/page-21.png: сохранено 3 регионов -> out/visual/grounded_regions/page-21/regions
[OK] out/page_images/PIK 5-0 - Introduction - English/page-22.png: сохранено 2 регионов -> out/visual/grounded_regions/page-22/regions
[OK] out/page_

[OK] out/page_images/PIK 5-0 - Platform Stakeholder Persona - ENG/page-1.png: сохранено 3 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - Platform Team - ENG/page-1.png: сохранено 2 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - Platform USP - ENG/page-1.png: сохранено 4 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - Platform Value Network Canvas - ENG/page-1.png: сохранено 2 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - Problem-Prioritization Matrix Canvas - ENG/page-1.png: сохранено 1 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - Unfair Advantage - ENG/page-1.png: сохранено 2 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - User Behaviour - ENG/page-1.png: сохранено 1 регионов -> out/visual/grounded_regions/page-1/regions
[OK] out/page_images/PIK 5-0 - Va

Running: python scripts/grounded_sam_detect.py --images out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-1.png out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-10.png out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-11.png out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-12.png out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-13.png out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-14.png out/page_images/2023-06 - fastbreakOne - Expert Guide - Ecosystem Strategy  - English/page-15.png ... (+more)
Running: python scripts/grounded_sam_detect.py --images out/page_images/L2 - Market Ecosystem Journey/page-1.png out/page_images/L3 - Platform Value Network Canvas/page-1.png out/page_images/PIK - Expert Guide - Platform IT Architecture - Assessme
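
In the log above, `page-1.png` of every document appears to be written to the same `out/visual/grounded_regions/page-1/regions` directory. If that reading is right, regions from different documents can overwrite each other. One possible layout that avoids the collision is sketched below; `region_dir` is a hypothetical helper, and the analyze and ingest scripts would have to follow the same layout.

```python
from pathlib import Path

def region_dir(out_det: str, image_path: str) -> Path:
    """Hypothetical layout: <out_det>/<document>/<page>/regions.

    The document name is taken from the folder that holds the page image.
    """
    img = Path(image_path)
    return Path(out_det) / img.parent.name / img.stem / "regions"

a = region_dir("out/visual/grounded_regions",
               "out/page_images/L2 - Market Ecosystem Journey/page-1.png")
b = region_dir("out/visual/grounded_regions",
               "out/page_images/L3 - Platform Value Network Canvas/page-1.png")
print(a != b)  # True: the two documents no longer share a directory
```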

## 4) LLM analysis of regions

In [6]:
if DO_ANALYZE:
    if not OPENAI_SET:
        raise SystemExit('OPENAI_API_KEY is required for analysis')
    run(['python', 'scripts/analyze_detected_regions.py', '--detected-dir', OUT_DET, '--all', '--outdir', OUT_DET, '--chat-model', CHAT_MODEL, '--skip-existing', '--profile', 'auto', '--synonyms', 'config/semantic_synonyms.yaml', '--weights', 'config/visual_objects_weights.yaml'])
else:
    print('Skip analyze (DO_ANALYZE=0)')


[run] python scripts/analyze_detected_regions.py --detected-dir out/visual/grounded_regions --all --outdir out/visual/grounded_regions --chat-model gpt-4o --skip-existing


Analyzed 4 detected regions in unit page-1 -> out/visual/grounded_regions/page-1/regions
Analyzed 5 detected regions in unit page-10 -> out/visual/grounded_regions/page-10/regions
Analyzed 5 detected regions in unit page-11 -> out/visual/grounded_regions/page-11/regions
Analyzed 3 detected regions in unit page-12 -> out/visual/grounded_regions/page-12/regions
Analyzed 3 detected regions in unit page-13 -> out/visual/grounded_regions/page-13/regions
Analyzed 5 detected regions in unit page-14 -> out/visual/grounded_regions/page-14/regions
Analyzed 3 detected regions in unit page-15 -> out/visual/grounded_regions/page-15/regions
Analyzed 3 detected regions in unit page-16 -> out/visual/grounded_regions/page-16/regions
Analyzed 5 detected regions in unit page-17 -> out/visual/grounded_regions/page-17/regions
Analyzed 4 detected regions in unit page-18 -> out/visual/grounded_regions/page-18/regions
Analyzed 4 detected regions in unit page-19 -> out/visual/grounded_regions/page-19/regions
A
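
Units are processed in lexicographic order (`page-1`, `page-10`, `page-11`, ...), which makes the log harder to follow page by page. A natural sort key, sketched here as a hypothetical helper, orders them numerically.

```python
import re

def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so that page-2 < page-10."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]

units = ["page-10", "page-2", "page-1", "page-11"]
print(sorted(units, key=natural_key))
# ['page-1', 'page-2', 'page-10', 'page-11']
```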

## 5) Ingest the index + 6) Review/Eval

In [7]:
if DO_INGEST:
    run(['python', 'scripts/ingest_visual_artifacts.py', '--source-json', JSON_PATH, '--regions-dir', OUT_DET, '--out', INDEX_PATH, '--model', EMB_MODEL])
else:
    print('Skip ingest (DO_INGEST=0)')

if DO_REVIEW or DO_EVAL:
    pathlib.Path('eval').mkdir(exist_ok=True)

if DO_REVIEW:
    run(['python', 'scripts/generate_visual_review.py', '--inline'])
else:
    print('Skip review (DO_REVIEW=0)')

if DO_EVAL:
    run(['python', 'scripts/eval_metrics.py', '--index', INDEX_PATH, '--eval', 'eval/queries.jsonl', '--prefer-visual'])
else:
    print('Skip eval (DO_EVAL=0)')


[run] python scripts/ingest_visual_artifacts.py --source-json /root/data/playbook.json --regions-dir out/visual/grounded_regions --out out/openai_embeddings.ndjson --model text-embedding-3-large


Ingested 1111 visual items into out/openai_embeddings.ndjson
[run] python scripts/generate_visual_review.py --inline
Wrote eval/visual_review.html
[run] python scripts/eval_metrics.py --index out/openai_embeddings.ndjson --eval eval/queries.jsonl --prefer-visual


recall@1: 0.000
recall@3: 0.000
recall@5: 0.000
ndcg@1: 0.000
ndcg@3: 0.000
ndcg@5: 0.000
MRR: 0.010 (over 78 annotated queries)
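
A recall@5 of 0.000 together with a non-zero MRR is consistent with relevant items being retrieved, but ranked well below the top 5: an MRR of 0.010 is the reciprocal of rank 100. The sketch below shows one common way to compute both metrics. The definitions used by `scripts/eval_metrics.py` are not shown in this notebook, so these functions are assumptions for illustration.

```python
from typing import Optional, Sequence

def first_hit_rank(ranked_ids: Sequence[str], relevant: set) -> Optional[int]:
    """1-based rank of the first relevant item, or None if there is none."""
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            return rank
    return None

def recall_at_k(ranked_ids: Sequence[str], relevant: set, k: int) -> float:
    """Share of relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)

def mean_reciprocal_rank(runs) -> float:
    """Average of 1/rank of the first hit; queries without a hit count as 0."""
    scores = []
    for ranked_ids, relevant in runs:
        rank = first_hit_rank(ranked_ids, relevant)
        scores.append(1.0 / rank if rank else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

ranked = [f"d{i}" for i in range(1, 21)]
runs = [(ranked, {"d10"}), (ranked, {"missing"})]
print(recall_at_k(ranked, {"d10"}, 5))  # 0.0
print(mean_reciprocal_rank(runs))       # 0.05
```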
