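# Samsung Electronics 4Q25: Self-Planning Sell-Side Report Generator

**Core idea**
1. The user asks in natural language (e.g. "write me an 8-page sell-side report").
2. The LLM generates the outline (table of contents) on its own.
3. The LLM writes each section of its own outline separately, using RAG.
4. The sections are concatenated into the final report.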

## 1~6. Existing setup (same as the previous code)

In [1]:
from dotenv import load_dotenv
import os
import re
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

load_dotenv()
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

In [2]:
file_paths = [
    "C:/Users/user/Downloads/삼성_2025Q4_conference_eng_presentation.pdf",
    "C:/Users/user/Downloads/삼성_2025Q4_script_eng_AudioScript.pdf"
]

docs = []
for path in file_paths:
    loader = PyPDFLoader(path)
    docs.extend(loader.load())

print(f"Total pages loaded: {len(docs)}")

Total pages loaded: 49


In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
splits = text_splitter.split_documents(docs)
print(f"Number of chunks: {len(splits)}")

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-m3",
    model_kwargs={'device': 'cuda'}
)
vectorstore = FAISS.from_documents(documents=splits, embedding=embeddings)

# Raise k to 8 so each section-generation call retrieves more context
retriever = vectorstore.as_retriever(search_kwargs={"k": 8})

Number of chunks: 92


In [4]:
repo_id = "google/gemma-2-9b-it"

# ★ Key change: raise max_new_tokens to 2048.
# A too-low output-token limit was the direct cause of the
# "8-page report gets cut off at 1 page" problem.
llm_endpoint = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=2048,
    temperature=0.1,
    huggingfacehub_api_token=hf_token,
)
chat_llm = ChatHuggingFace(llm=llm_endpoint)
print("LLM ready")

LLM ready


## 7. 자동 플래닝 + 섹션별 생성 함수

### Flow
```
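User question
    ↓
[Step 1] PLANNING CHAIN
  - Ask the LLM for an outline, without RAG
  - Output: list of section titles (in a parseable format)
    ↓
[Step 2] SECTION GENERATION CHAIN (repeated per section)
  - Retrieve context with a query built from the section title;
    pass the original question to the section prompt
  - Output: detailed content for each section
    ↓
[Step 3] COMBINE
  - Concatenate all sections in order
  - Output: the finished report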
```
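
The flow above can be sketched without any LLM dependency. This is a minimal illustration, not the notebook's implementation: `run_pipeline`, `fake_plan`, and `fake_write` are hypothetical names, and the two stand-in functions only mimic what the planning and section chains would return.

```python
from typing import Callable


def run_pipeline(
    question: str,
    plan: Callable[[str], list[str]],
    write_section: Callable[[str, str], str],
) -> str:
    """Plan -> write each section -> combine, with the LLM steps injected."""
    titles = plan(question)                                  # Step 1: outline
    bodies = [write_section(t, question) for t in titles]    # Step 2: one call per section
    parts = [
        f"{i}. {title}\n{body.strip()}"
        for i, (title, body) in enumerate(zip(titles, bodies), start=1)
    ]
    return "\n\n".join(parts)                                # Step 3: plain concatenation


# Stand-ins for the real LLM calls (hypothetical, for illustration only)
def fake_plan(question: str) -> list[str]:
    return ["Executive Summary", "Risks"]


def fake_write(title: str, question: str) -> str:
    return f"Body of {title}."


demo = run_pipeline("any question", fake_plan, fake_write)
print(demo)
```

Passing the planning and writing steps in as callables keeps the control flow testable before any model or vector store is loaded.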

In [5]:
# ============================================================
# STEP 1: PLANNING CHAIN
# The LLM reads the user question and builds the outline itself.
# Only the plain LLM is used here, with no RAG retriever
# (outline generation does not need document retrieval).
# ============================================================

PLANNING_PROMPT = ChatPromptTemplate.from_template("""
You are a senior sell-side equity analyst. A client has made the following request:

REQUEST: {user_question}

Your task is ONLY to create a detailed report outline (table of contents).
Do NOT write the report content yet.

Output a numbered list of section titles that together would cover the request completely.
Each section title should be specific enough to guide a detailed financial analysis.
Output ONLY the numbered list, nothing else. No introduction, no explanation.

Example format:
1. Executive Summary & Investment Thesis
2. Consolidated Financial Results (4Q25 P&L)
3. ...

Section titles:
""")

# Build the planning chain (no retrieval step, so no {context} variable)
planning_chain = PLANNING_PROMPT | chat_llm | StrOutputParser()


def parse_sections(outline_text: str) -> list[str]:
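    """Parse the LLM-generated outline text into a list of section titles."""
    lines = outline_text.strip().split("\n")
    sections = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Strip list markers: '1. Title', '1) Title', '- Title', '* Title'.
        # The bullet alternative is needed; a digits-only pattern misses '- Title'.
        cleaned = re.sub(r'^(?:\d+[.)\-]|[-*•])\s*', '', line).strip()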
        if cleaned:
            sections.append(cleaned)
    return sections


print("Planning chain ready\n")
# Example question (kept in Korean, identical to the one used in the run below)
# outline_text = planning_chain.invoke({"user_question": "삼성전자 2025년 Q4 분기실적발표에 대해 상세하게 모든 디테일을 놓치지 않고 financial sellside analyst report 형식으로 8장으로 요약해줘"})
# print(outline_text)

Planning chain ready
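
Outline parsing is the step most sensitive to how the model formats its list, so it helps to check the marker-stripping pattern on a few styles in isolation. The sketch below uses a hypothetical helper, `strip_list_marker`, and made-up sample lines; it is not part of the notebook.

```python
import re

# Hypothetical pattern: a numbered marker ('1.', '1)', '1-') or a bullet ('-', '*', '•')
LIST_MARKER = re.compile(r'^\s*(?:\d+[.)\-]|[-*•])\s*')


def strip_list_marker(line: str) -> str:
    """Remove one leading list marker and the surrounding whitespace."""
    return LIST_MARKER.sub('', line, count=1).strip()


samples = ["1. Executive Summary", "2) Risks", "- Valuation", "4Q25 Results"]
titles = [strip_list_marker(s) for s in samples]
print(titles)
```

The last sample shows why the numbered alternative requires punctuation after the digits: a title that merely starts with a number, such as `4Q25 Results`, is left untouched.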



In [6]:
# ============================================================
# STEP 2: SECTION GENERATION CHAIN
# Each section is generated separately with RAG.
# ============================================================

SECTION_PROMPT = ChatPromptTemplate.from_template("""
You are a senior sell-side equity analyst writing a research report on Samsung Electronics 4Q 2025.

The overall report request is: {user_question}

You are now writing ONLY this specific section:
SECTION: {section_title}

Use the following source material from Samsung's earnings call and presentation:
---
{context}
---

INSTRUCTIONS:
- Write in full sentences and professional analyst prose. No bullet points.
- Include ALL specific numbers, financial figures, and percentages from the context.
- Use financial terms: QoQ, YoY, bps, OP margin, ASP, etc.
- Be thorough and detailed. This section should fill roughly one page.
- Do NOT include a section title header in your output.
- Do NOT write intro/conclusion boilerplate. Just write the section body.

Section content:
""")


def generate_section(section_title: str, user_question: str) -> str:
    """Generate a single section with RAG."""
    # RAG: the search query is the section title plus fixed English keywords.
    # The original question is not part of the query; it is passed to the prompt below.
    search_query = f"{section_title} Samsung Electronics 4Q 2025 earnings"
    context_docs = retriever.invoke(search_query)
    context_text = "\n\n".join([doc.page_content for doc in context_docs])
    
    # Generate the section
    section_chain = SECTION_PROMPT | chat_llm | StrOutputParser()
    content = section_chain.invoke({
        "user_question": user_question,
        "section_title": section_title,
        "context": context_text
    })
    return content


print("Section generation chain ready")

Section generation chain ready
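
With `k=8` and `chunk_size=1000`, each section prompt carries up to roughly 8,000 characters of retrieved text. If `k` or the chunk size is raised, the prompt can outgrow the model's context window. One possible guard, sketched here under the assumption that a simple character budget is good enough, is to keep chunks in rank order until the budget is used up. `join_context` and the `max_chars` default are hypothetical and not part of the notebook.

```python
def join_context(chunks: list[str], max_chars: int = 6000, sep: str = "\n\n") -> str:
    """Join retrieved chunks in rank order, stopping before the budget is exceeded."""
    kept: list[str] = []
    used = 0
    for chunk in chunks:
        # The separator only costs characters once something has been kept
        cost = len(chunk) + (len(sep) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(chunk)
        used += cost
    return sep.join(kept)


demo_chunks = ["a" * 10, "b" * 10, "c" * 10]
demo_context = join_context(demo_chunks, max_chars=25)
print(len(demo_context))
```

In `generate_section()`, this would replace the plain `"\n\n".join(...)` call, taking `[doc.page_content for doc in context_docs]` as input.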


In [7]:
# ============================================================
# STEP 3: COMBINE + full pipeline
# ============================================================

def generate_full_report(user_question: str) -> str:
    """
    Generate a complete sell-side report from a single user question.
    
    Plan → Section-by-Section → Combine
    """
    
    # --- STEP 1: PLANNING ---
    print("[Step 1/3] Generating outline...")
    outline_text = planning_chain.invoke({"user_question": user_question})
    
    print("\nGenerated outline:")
    print("-" * 40)
    print(outline_text)
    print("-" * 40)
    
    sections = parse_sections(outline_text)
    print(f"\nSections parsed: {len(sections)}")
    
    # --- STEP 2: SECTION-BY-SECTION GENERATION ---
    print("\n[Step 2/3] Generating section content...")
    generated_sections = []
    
    for i, section_title in enumerate(sections):
        print(f"  [{i+1}/{len(sections)}] {section_title}")
        content = generate_section(section_title, user_question)
        generated_sections.append((section_title, content))
        print(f"         → {len(content)} characters generated")
    
    # --- STEP 3: COMBINE ---
    print("\n[Step 3/3] Assembling final report...")
    
    report_parts = [
        "=" * 72,
        "SAMSUNG ELECTRONICS (005930 KS)",
        "4Q 2025 Earnings Review — Sell-Side Research Report",
        "=" * 72,
        ""
    ]
    
    for i, (title, content) in enumerate(generated_sections):
        report_parts.append(f"{i+1}. {title}")
        report_parts.append("-" * 40)
        report_parts.append(content.strip())
        report_parts.append("")
    
    report_parts += [
        "=" * 72,
        "DISCLAIMER: Based on Samsung Electronics 4Q 2025 earnings materials.",
        "=" * 72
    ]
    
    full_report = "\n".join(report_parts)
    
    total_chars = len(full_report)
    total_lines = len(full_report.splitlines())
    print(f"\nDone! Total {total_chars:,} characters / {total_lines:,} lines")
    
    return full_report


print("generate_full_report() ready")

generate_full_report() ready


In [8]:
# ============================================================
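# Run: reuse the original question unchanged
# ============================================================

# English gloss of the Korean question: "Summarize Samsung Electronics' Q4 2025 earnings
# announcement in 8 pages, in financial sell-side analyst report format, in detail and
# without missing anything."
user_question = "삼성전자 2025년 Q4 분기실적발표에 대해 상세하게 모든 디테일을 놓치지 않고 financial sellside analyst report 형식으로 8장으로 요약해줘"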

report = generate_full_report(user_question)

[Step 1/3] Generating outline...

Generated outline:
----------------------------------------
1. Executive Summary & Investment Thesis
2. Consolidated Financial Results (4Q25 P&L)
3. Segment Performance Analysis (Semiconductors, Display, Consumer Electronics, etc.)
4. Key Drivers of Performance (Demand Trends, Pricing, Costs, FX)
5. Balance Sheet & Cash Flow Analysis
6. Guidance & Outlook (FY26 Expectations)
7. Valuation & Relative Performance
8. Risks & Opportunities



----------------------------------------

Sections parsed: 8

[Step 2/3] Generating section content...
  [1/8] Executive Summary & Investment Thesis
         → 1686 characters generated
  [2/8] Consolidated Financial Results (4Q25 P&L)
         → 1014 characters generated
  [3/8] Segment Performance Analysis (Semiconductors, Display, Consumer Electronics, etc.)
         → 2121 characters generated
  [4/8] Key Drivers of Performance (Demand Trends, Pricing, Costs, FX)
         → 1574 characters generated
  [5/8] Balance Sheet & Cash Flow Analysis
         → 1902 characters generated
  [6/8] Guidance & Outlook (FY26 Expectations)
         → 1340

In [9]:
print(report)

SAMSUNG ELECTRONICS (005930 KS)
4Q 2025 Earnings Review — Sell-Side Research Report

1. Executive Summary & Investment Thesis
----------------------------------------
Samsung Electronics delivered a stellar fourth quarter of 2025, achieving record quarterly revenue of 93.8 trillion won, a 9% sequential increase. This strong performance was driven by a robust rebound in the DS Division, which more than offset the decline in the DX Division. Operating profit surged to 20.1 trillion won, representing a 7.3 percentage point sequential expansion to an impressive 21.4% operating margin.  

The DS Division's exceptional performance was fueled by significant improvements in memory profitability, driven by robust bit growth in both DRAM and NAND, coupled with a favorable rise in average selling prices (ASP).  Currency movements also played a positive role, with the appreciation of the US dollar and other major currencies adding approximately 1.6 trillion won to company-wide operating profit, pr

In [10]:
# Save to file
output_path = "C:/Users/user/Downloads/Samsung_4Q25_AutoPlan_Report.txt"
with open(output_path, "w", encoding="utf-8") as f:
    f.write(report)
print(f"Saved: {output_path}")

Saved: C:/Users/user/Downloads/Samsung_4Q25_AutoPlan_Report.txt


## Note: Gemma-2-9b-it token limits and realistic expectations

| Step | Token usage | Notes |
|------|------------|------|
| Step 1 (outline generation) | ~300-400 tokens | Fast |
| Step 2 (per section) | ~800-1500 tokens | Requires max_new_tokens=2048 |
| Step 3 (combine) | 0 tokens | Handled in code |

**With 8 sections:** total LLM calls = 1 (planning) + 8 (sections) = 9
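
The same arithmetic generalizes to any section count. The sketch below uses a hypothetical helper, `estimate_budget`, and takes its default ranges from the table above; the figures are rough estimates, not measurements.

```python
def estimate_budget(
    n_sections: int,
    plan_tokens: tuple[int, int] = (300, 400),       # Step 1 range from the table
    section_tokens: tuple[int, int] = (800, 1500),   # Step 2 range from the table
) -> tuple[int, int, int]:
    """Return (LLM calls, low token estimate, high token estimate)."""
    calls = 1 + n_sections  # one planning call plus one call per section
    low = plan_tokens[0] + n_sections * section_tokens[0]
    high = plan_tokens[1] + n_sections * section_tokens[1]
    return calls, low, high


calls, low, high = estimate_budget(8)
print(calls, low, high)  # 9 6700 12400
```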

**HuggingFace Inference API caveats:**
- The free tier limits requests per minute → with many sections, rate-limit errors are possible
- If errors occur, add `time.sleep(5)` between `generate_section()` calls:
  ```python
  import time
  # after each generate_section() call
  time.sleep(5)
  ```
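
A fixed sleep slows every call, including the ones that would have succeeded. An alternative is to retry only when a call fails, with a growing delay. The sketch below is one way to do that: `call_with_retry` is a hypothetical helper, and catching the broad `Exception` is an assumption, since the notebook does not show which exception the client raises on a rate limit.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:  # assumption: narrow this to the client's rate-limit error
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("retries must be >= 1")


# Demo with a function that fails twice, then succeeds (no real waiting)
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


waits: list[float] = []
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```

Inside the section loop this would wrap the existing call, for example `call_with_retry(lambda: generate_section(section_title, user_question))`.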