# Domain Selection & Data Sourcing
The dataset used in this RAG system consists of annual reports from ASUS, a Taiwanese multinational corporation known for its computer, smartphone, and electronics products. The knowledge base includes reports from 2020 to 2024, with the most recent being the 2024 annual report.

These documents were chosen because they contain fact-dense, proprietary, and up-to-date information on ASUS’s financials, ESG strategies, product developments, and governance policies—content that general-purpose LLMs are unlikely to have seen during pretraining.

By grounding generation in this specialized dataset, the RAG system is able to provide accurate and contextually supported answers to domain-specific questions that standalone LLMs would likely hallucinate or answer incompletely.

# Reasons to Use RAG over a Standalone LLM

## Proprietary and Up-to-date Content
ASUS annual reports contain company-specific data such as financial results, ESG actions, corporate governance decisions, and product milestones that are:
- Not publicly indexed on the web in fine detail
- Not likely to appear in the pretraining data of general-purpose LLMs
- Updated annually, so content from 2024, or even late 2023, may fall after a standalone LLM's training cutoff and be missing from its knowledge

➡️ RAG allows real-time grounding in proprietary, evolving data.

## Highly Factual, Precision-sensitive Domain
Questions in the annual report domain often require precise, non-negotiable facts such as:
- Net profit in 2023 or 2024
- EPS (earnings per share)
- Carbon emission targets
- Specific ISO certifications or ESG scores

➡️ Standalone LLMs are prone to hallucination or guessing in such contexts. RAG enables retrieval of verbatim, source-linked facts, ensuring trust and auditability.

## Structured + Unstructured Hybrid Texts
The ASUS annual report includes:
- Tables, KPIs, and numeric data
- Governance principles written in bullet points

Standalone LLMs struggle to handle mixed formats unless specifically trained on such hybrid corpora.

➡️ RAG enables chunking, summarization (e.g., MultiVectorRetriever), and selective grounding from structured sources.

## Query Ambiguity and Semantic Drift
Queries such as:
- "How did ASUS perform in ESG?"
- "What are ASUS's governance priorities?"

are ambiguous and open-ended.
Standalone LLMs might give plausible but vague answers.

➡️ RAG systems can use techniques like HyDE to expand such queries and retrieve targeted content.

# Import Packages

In [19]:
import uuid
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Load Data

Load every annual report into a single list, tagging each page with its report year in the metadata.

In [3]:
docs = []
years = range(2020, 2025)

for year in years:
    path = f"./{year}_Annual_Report.pdf"
    loader = PyPDFLoader(path)
    doc_load = loader.load()
    
    # Add metadata to page to show the year
    for doc in doc_load:
        doc.metadata["year"] = year
        docs.append(doc)
    print(f"{year} Annual Report Loaded")


2020 Annual Report Loaded
2021 Annual Report Loaded
2022 Annual Report Loaded
2023 Annual Report Loaded
2024 Annual Report Loaded
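
Because every page carries a `year` entry in its metadata, later steps can restrict or group results by report year. The following is a minimal sketch of that idea. It uses a hypothetical `Page` class as a stand-in for LangChain's `Document`, so it runs without the PDF files, and the `filter_by_year` helper is an illustration rather than part of the pipeline above.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a loaded PDF page (not LangChain's Document)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_year(pages, year):
    """Keep only the pages whose metadata marks them as belonging to `year`."""
    return [p for p in pages if p.metadata.get("year") == year]


# Toy pages tagged the same way as in the loading loop
pages = [
    Page("Revenue discussion", {"year": 2023}),
    Page("Governance overview", {"year": 2024}),
    Page("ESG roadmap", {"year": 2023}),
]

pages_2023 = filter_by_year(pages, 2023)
print([p.page_content for p in pages_2023])
```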


# Data Preprocessing

## Chunking
The text from the ASUS annual reports is first split into manageable chunks using a sliding window approach with a chunk size of 1000 characters and an overlap of 200 characters (`RecursiveCharacterTextSplitter` measures length in characters by default, not tokens). This ensures that:
- Each chunk contains enough context for semantic understanding
- Overlap preserves continuity across adjacent sections and avoids information loss at boundaries
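
The sliding-window idea can be sketched in plain Python. This is a simplified illustration that measures size with `len` (characters) and cuts at fixed positions. `RecursiveCharacterTextSplitter` additionally prefers to break on separators such as paragraph and line boundaries, so its real chunks vary in length. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# 2500 characters of dummy text
sample = "".join(str(i % 10) for i in range(2500))
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # → [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from at least one chunk.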

In [17]:
# Split the loaded pages into overlapping chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
print(f"Total Chunks: {len(chunks)}")

Total Chunks: 6115


## Embedding
After splitting, text is embedded with OpenAI's text-embedding-3-small model, chosen for its balance between retrieval quality and cost (in this pipeline, the embedded text is the LLM-written summary of each chunk, as described under Vector Store). These embeddings are used for vector similarity search in the RAG retrieval pipeline, allowing semantically relevant chunks to be retrieved even when the original user query uses different phrasing or vocabulary.
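
To make "vector similarity search" concrete, here is a small sketch that ranks toy vectors by cosine similarity. The three-dimensional vectors and their names are invented for illustration. Real embeddings have far more dimensions, and the distance metric actually used depends on how the vector store is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up vectors standing in for embedded chunks
chunk_vecs = {
    "net_profit_2023": [0.9, 0.1, 0.0],
    "esg_targets": [0.1, 0.9, 0.2],
    "board_members": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for an embedded question about profit

ranked = top_k(query_vec, chunk_vecs, k=2)
print(ranked)  # → ['net_profit_2023', 'esg_targets']
```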

In [20]:
embedding_model = OpenAIEmbeddings(model="text-embedding-3-small")
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0)

## Vector Store (Multi-Vector Retriever)
Annual reports often contain a mix of text and tables. Simply splitting the document into chunks and embedding them directly can lead to important information—like entire tables—being lost or misrepresented (Oshin & Campos, 2025).

To handle this, each chunk is first summarized by an LLM. These summaries are then embedded and stored in a MultiVectorRetriever, while the original chunks are stored separately. This allows the system to search using concise semantic summaries, but still return the full, detailed chunks for generation.

In [21]:
prompt_template = "Please summarize the following document:\n\n{doc}"
prompt = ChatPromptTemplate.from_template(prompt_template)
summarize_chain = (
    {"doc": lambda x: x.page_content}
    | prompt
    | llm
    | StrOutputParser()
)

summarises = summarize_chain.batch(chunks, {"max_concurrency": 5})
print("✅ Summaries generated.")

✅ Summaries generated.


The original chunks are stored separately in an InMemoryStore. After retrieving relevant summaries using the MultiVectorRetriever, the system uses their IDs to fetch the corresponding full chunks from memory for final answer generation.
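
The summary-to-chunk link can be sketched with plain dictionaries. This is only an illustration of the ID mapping, with invented placeholder strings and a hypothetical `retrieve_full_chunks` helper. The similarity search itself is faked by picking a summary by hand.

```python
import uuid

# Placeholder texts standing in for real chunks and their LLM summaries
chunks = ["<full text of chunk A, including a table>", "<full text of chunk B>"]
summaries = ["Summary of A: revenue table", "Summary of B: governance policy"]

# One ID per chunk; each summary carries the ID of the chunk it describes
doc_ids = [str(uuid.uuid4()) for _ in chunks]
summary_index = [
    {"text": s, "doc_id": doc_ids[i]} for i, s in enumerate(summaries)
]
docstore = dict(zip(doc_ids, chunks))  # ID -> original chunk


def retrieve_full_chunks(matched_summaries):
    """Swap each matched summary for the original chunk it points to."""
    return [docstore[s["doc_id"]] for s in matched_summaries]


hits = [summary_index[0]]  # pretend the search matched the first summary
full = retrieve_full_chunks(hits)
print(full)
```

Searching over summaries keeps the indexed text short and focused, while the generator still receives the complete chunk, tables included.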

In [None]:
# Prepare the structure used by MultiVectorRetriever
vectorstore = Chroma(
    embedding_function=embedding_model,
    persist_directory="./chroma_multivector",
    collection_name="asus_summary_vectors"
)
docstore = InMemoryStore() # To store original chunk
id_key = "doc_id"

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    id_key=id_key,
)

# Generate a unique ID per chunk; each summary stores its chunk's ID so the two can be linked
doc_ids = [str(uuid.uuid4()) for _ in chunks]

summary_docs = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    for i, s in enumerate(summarises)
]



In [None]:
# Add the summaries to the vector store
BATCH_SIZE = 5000  # per-call insert limit is 5461, so the summaries go in 2 batches

for i in range(0, len(summary_docs), BATCH_SIZE):
    batch = summary_docs[i : i + BATCH_SIZE]
    retriever.vectorstore.add_documents(batch)
    print(f"✅ Added batch {i//BATCH_SIZE + 1}")

# Add the original chunks to the in-memory store
retriever.docstore.mset(list(zip(doc_ids, chunks)))

print("✅ Vector store and document store ready.")


✅ Added batch 1
✅ Added batch 2
✅ Vector store and document store ready.


In [89]:
# Test query to check that the MultiVectorRetriever returns relevant chunks
query = "What sustainability actions did ASUS report in 2023?"
retrieved_docs = retriever.invoke(query)

# Output results
print("📌 Retrieved Chunks:")
for i, doc in enumerate(retrieved_docs):
    print(f"\n--- Document {i+1} ---")
    print(doc.page_content[:500])  # print the first 500 characters as a spot check
    print(f"[Source Year]: {doc.metadata.get('year')}")

📌 Retrieved Chunks:

--- Document 1 ---
85
 
85 
˙ Lead the value chain to net zero 
In 2023, ASUS products averaged 42% better than Energy Star 
standards. ASUS achieved RE30 across its global operational sites 
through self -generation (solar ) and procurement of Renewable 
Energy Certificates (solar/wind). 
˙ For detailed progress on goal attainment, please refer to the 
ASUS 2023 Sustainability Report. 
9. Greenhouse gas 
inventory and 
verification status 
Please refer to the ASUS 2023 Sustainability Report. 
 
(VIII) Corporate S
[Source Year]: 2023

--- Document 2 ---
59 
credits or the 
number of RECs 
should be specified. 
Innovative technologies:  
˙ Invest in innovative technologies 
˙ Remove residual emissions  
˙ Lead the value chain to net zero 
In 2024, ASUS achieved RE50 for its global operations through 
on-site solar energy generation for self -consumption and the 
procurement of renewable energy certificates (solar and wind 
power). Additionally, ASUS’s newly launched

# HyDE (Hypothetical Document Embeddings)

To improve retrieval, this system uses HyDE (Hypothetical Document Embeddings). Instead of directly embedding the user's question, the model first generates a hypothetical passage—a detailed paragraph that imagines what a good answer might look like.

This generated passage is then embedded and used for similarity search. The idea is that a full, coherent passage captures the user's intent better than a short or vague question, leading to more accurate retrieval.

According to Oshin & Campos (2025), a hypothetical document generated by an LLM tends to be semantically closer to the relevant source texts than the original query itself.
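
The effect can be illustrated without any API calls. In the sketch below, a toy bag-of-words vector stands in for a real embedding model, and `fake_hypothetical_passage` is a stub that returns a hard-coded passage instead of calling an LLM. Both are assumptions made for illustration, and the scores only show the direction of the effect, not its real magnitude.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def fake_hypothetical_passage(question):
    """Stub for the LLM call; returns a fixed passage for illustration."""
    return ("ASUS reduced carbon emissions and increased renewable energy use "
            "across global operations as part of its net zero roadmap.")


source_chunk = ("ASUS achieved RE30 across its global operational sites through "
                "renewable energy certificates and aims to lead the value chain "
                "to net zero.")
question = "How did ASUS perform in ESG?"

direct_score = cosine(toy_embed(question), toy_embed(source_chunk))
hyde_score = cosine(toy_embed(fake_hypothetical_passage(question)),
                    toy_embed(source_chunk))
print(f"direct: {direct_score:.3f}  hyde: {hyde_score:.3f}")
```

The short question shares almost no vocabulary with the source text, while the longer hypothetical passage overlaps with it on several terms, so it scores higher.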

In [None]:
def run_hyde_rag(question: str, feedback: str = "", top_k: int = 5) -> dict:
    # HyDE passage prompt (includes critic feedback from the previous round, if any)
    hyde_prompt = ChatPromptTemplate.from_template(
        """You are a financial expert. Based on the question and prior feedback, write a detailed and fact-focused passage that can help retrieve relevant company information.

Question:
{question}

Critic Feedback:
{feedback}

Generated Passage:"""
    )
    hyde_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0)
    hyde_generate = hyde_prompt | hyde_llm | StrOutputParser()
    hypothetical_passage = hyde_generate.invoke({"question": question, "feedback": feedback})

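    # Embed the hypothetical passage and search the summary vectors with it
    query_embedding = embedding_model.embed_query(hypothetical_passage)
    summary_docs = vectorstore.similarity_search_by_vector(query_embedding, k=top_k)
    doc_ids = [doc.metadata["doc_id"] for doc in summary_docs]
    # Map matched summaries back to their original chunks; skip IDs missing from the docstore
    chunks = [chunk for chunk in docstore.mget(doc_ids) if chunk is not None]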

    # RAG Answer Generation
    context = "\n\n".join([doc.page_content for doc in chunks])
    rag_prompt = ChatPromptTemplate.from_template(
        """You are a corporate analyst. Use the information below to answer the question.

Context:
{context}

Question:
{question}

Answer:"""
    )
    formatted_prompt = rag_prompt.format_messages(
        context=context,
        question=question
    )
    answer_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0)
    answer = answer_llm.invoke(formatted_prompt)

    return {
        "answer": answer.content,
        "retrieved_docs": chunks,
        "hyde_passage": hypothetical_passage
    }

# Self Correction

To improve answer reliability, a self-correcting loop is implemented using an LLM as a critic.

After generating an answer with HyDE and retrieval, the response is sent to a second LLM acting as a fact-checking assistant. This critic evaluates whether the answer is fully supported by the retrieved context. If the response is approved, the loop ends.

If the critic finds issues—such as missing details or unsupported claims—it provides feedback, which is then used to guide a new HyDE generation in the next iteration. The system repeats this process for up to three rounds by default, balancing quality with runtime efficiency.

This approach helps filter out hallucinations and encourages grounded, verifiable answers.
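
The control flow of the loop can be sketched independently of any LLM. Here `stub_generate` and `stub_critique` are hypothetical stand-ins for the HyDE-RAG call and the critic call, hard-coded so that the first answer is rejected and the second is approved.

```python
def self_correct(question, generate, critique, max_rounds=3):
    """Generate, critique, and retry with feedback until approved or out of rounds."""
    feedback = ""
    answer, verdict = None, ""
    for round_no in range(1, max_rounds + 1):
        answer = generate(question, feedback)
        verdict = critique(question, answer)
        if verdict.startswith("APPROVED"):
            return {"answer": answer, "approved": True, "rounds": round_no}
        feedback = verdict  # the critic's comments steer the next attempt
    # Not approved within max_rounds: return the last attempt, flagged
    return {"answer": answer, "approved": False, "rounds": max_rounds}


def stub_generate(question, feedback):
    """Stand-in generator: improves only once it has received feedback."""
    return "Detailed answer with a cited figure." if feedback else "Vague answer."


def stub_critique(question, answer):
    """Stand-in critic: approves only answers that cite a figure."""
    if "cited figure" in answer:
        return "APPROVED"
    return "Missing a cited figure; add one."


outcome = self_correct("What was the net profit?", stub_generate, stub_critique)
print(outcome)
```

Capping the number of rounds bounds cost and latency, at the price of sometimes returning an answer the critic never approved, which is why the result carries an explicit `approved` flag.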

In [None]:
def run_self_correcting_hyde_rag(question: str, max_rounds: int = 3) -> dict:
    critic_prompt = ChatPromptTemplate.from_template(
        """You are a fact-checking assistant. Evaluate the following answer based on the provided context.

Context:
{context}

Answer:
{answer}

Question:
{question}

If the answer is factually supported and complete, reply only: APPROVED.
If not, explain what is missing or incorrect and suggest improvements."""
    )
    critic_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0)

    feedback = ""  # no critic feedback in the first round
    for round_idx in range(max_rounds):  # avoid shadowing the built-in round()
        print(f"\n🔁 Iteration {round_idx + 1}:")

        # Add feedback
        result = run_hyde_rag(question, feedback)
        answer = result["answer"]
        retrieved_docs = result["retrieved_docs"]
        context = "\n\n".join([doc.page_content for doc in retrieved_docs])

        # Critic, evaluate the answer
        critic_input = critic_prompt.format_messages(
            context=context,
            answer=answer,
            question=question
        )
        verdict = critic_llm.invoke(critic_input).content.strip()
        print("🧠 Critic verdict:", verdict[:100])

        if verdict.startswith("APPROVED"):
            print("✅ Answer approved by Critic.")
            result["verdict"] = verdict
            result["approved"] = True
            return result
        else:
            print("❌ Critic suggests revision. Continuing loop...")
            feedback = verdict  # pass the critic's feedback to the next round

    # Not approved within max_rounds: return the last attempt, flagged as unapproved
    result["verdict"] = verdict
    result["approved"] = False
    return result


# Baseline: Standalone LLM Call

For a fair comparison with the RAG pipeline, the baseline calls the same model, gpt-4.1-mini, directly and without any retrieved context.

In [None]:
def without_rag(question: str) -> str:
    # Baseline: send the question straight to the LLM, with no retrieval
    llm = ChatOpenAI(model='gpt-4.1-mini', temperature=0)
    result = llm.invoke(question)
    return result.content

# Generation

## Easy

Three straightforward, fact-based questions were selected to evaluate whether the model can correctly retrieve and answer with verifiable data:

- What was ASUS's net profit in 2023?
- How much did ASUS invest in R&D in 2022?
- Who is the Chairman of ASUS as of the 2024 annual report?

### Net profit in 2023

In [None]:
without_rag("What was ASUS's net profit in 2023?")

"I don't have access to ASUS's financial data for 2024. For the most accurate and up-to-date information on ASUS's net profit in 2024, I recommend checking their official financial reports, press releases, or trusted financial news sources. You can also visit ASUS's investor relations website. If you need help finding these resources, let me know!"

In [77]:
result = run_self_correcting_hyde_rag("What was ASUS's net profit in 2023?")
print("📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE passage:\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])
    print(f"[Year]: {doc.metadata.get('year')}")


🔁 Iteration 1:
🧠 Critic verdict: The answer correctly interprets the 97% increase and uses the formula to calculate the 2023 net prof
❌ Critic suggests revision. Continuing loop...

🔁 Iteration 2:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.
📣 Final Answer:
 ASUS's net profit after tax in 2023 was NT$17.9 billion.

🧠 HyDE passage:
 ASUS reported a net profit after tax of approximately NT$17.35 billion in 2023. This figure is derived from the company's 2024 net profit after tax of NT$34.2 billion, which represents a 97% increase compared to the previous year. By applying the formula to reverse-calculate the 2023 net profit—dividing the 2024 profit by 1.97—the 2023 net profit is confirmed to be around NT$17.35 billion. This substantial year-over-year growth highlights ASUS's improved financial performance and profitability in 2024 relative to 2023.

📄 Retrieved Chunks:

--- Chunk 1 ---
3 
compared to 2023. Net profit after tax was NT$34.2 billion, with net profit attributable
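
The reverse calculation described in the HyDE passage can be checked with a few lines of arithmetic. The two inputs below are taken from the output above (NT$34.2 billion for 2024 and a 97% year-over-year increase). This only checks the division. It does not verify either figure against the report, and it does not explain the NT$17.9 billion in the final answer, whose source is not visible in the truncated output.

```python
net_profit_2024 = 34.2  # NT$ billion, as quoted in the output above
growth = 0.97           # 97% increase over 2023, as quoted in the HyDE passage

# If profit_2024 = profit_2023 * (1 + growth), then:
implied_2023 = net_profit_2024 / (1 + growth)
print(round(implied_2023, 2))  # → 17.36
```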

### R&D investment in 2022

In [44]:
without_rag("How much did ASUS invest in R&D in 2022?")

'In 2022, ASUS invested approximately NT$11.5 billion (New Taiwan Dollars) in research and development (R&D).'

In [87]:
result = run_self_correcting_hyde_rag("How much did ASUS invest in R&D in 2022?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])
    


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 ASUS invested NT$20.6 billion in R&D in 2022.

🧠 HyDE Passage (Generated Query):
 As of my knowledge cutoff date in early 2023, ASUS (ASUSTeK Computer Inc.), a Taiwan-based multinational computer and phone hardware and electronics company, had not publicly disclosed the specific amount invested in Research and Development (R&D) for the year 2022. Typically, companies report their annual R&D expenditures in their financial statements, which are included in annual reports or filings with financial regulatory authorities.

To obtain the most accurate and up-to-date information regarding ASUS's R&D investment for 2022, one would need to consult the company's annual report or financial statements for that year, which are usually published in the following year. These documents can often be found on the company's official investor relations website or through financial databases and regulatory bodies th

### Chairman in 2024

In [46]:
without_rag("Who is the Chairman of ASUS as of the 2024 annual report?")

'As of the 2024 annual report, the Chairman of ASUS is Jonney Shih.'

In [86]:
result = run_self_correcting_hyde_rag("Who is the Chairman of ASUS as of the 2024 annual report?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 The Chairman of ASUS as of the 2024 annual report is Jonney Shih.

🧠 HyDE Passage (Generated Query):
 As of my last update in early 2023, I do not have access to the 2024 annual report for ASUS, or any events that may have occurred after that time. Therefore, I cannot provide the current Chairman of ASUS as of the 2024 annual report. As of my last knowledge update, Jonney Shih was the Chairman of ASUS. However, for the most current information, please refer to the latest reports or announcements from ASUS, as executive positions can change due to a variety of reasons including retirement, resignation, or organizational restructuring.

📄 Retrieved Chunks:

--- Chunk 1 ---
3838 
3. Directors’ Professional Qualifications and Independent Directors’ Independence Status 
    Criteria 
 
 Name 
Professional Qualification and Experience 
(Note 1) 
Independence Status 
(Note 2) 
Total number of 
positions 

## Ambiguous


These queries are deliberately general. For example, “strategic priorities” might appear under multiple headings in the report, such as product strategy, ESG roadmap, or risk disclosures.
- Has ASUS shown improvements in ESG performance over the past few years?
- What are the company’s main strategic priorities in 2024?
- How has ASUS addressed risks related to supply chain disruptions?

### Has ASUS shown improvements in ESG performance over the past few years?

In [48]:
without_rag("Has ASUS shown improvements in ESG performance over the past few years?")

"ASUS has demonstrated improvements in its Environmental, Social, and Governance (ESG) performance over the past few years. The company has increasingly integrated sustainability into its business strategy, focusing on areas such as reducing carbon emissions, enhancing energy efficiency, responsible supply chain management, and promoting social responsibility initiatives.\n\nKey indicators of ASUS's ESG progress include:\n\n1. **Environmental Initiatives**: ASUS has set targets to reduce greenhouse gas emissions and improve energy efficiency in its operations and products. The company has also worked on increasing the use of recycled materials and minimizing electronic waste through take-back and recycling programs.\n\n2. **Social Responsibility**: ASUS has invested in employee welfare, diversity and inclusion, and community engagement programs. It has also emphasized responsible labor practices within its supply chain.\n\n3. **Governance**: ASUS has strengthened its corporate governan

In [83]:
result = run_self_correcting_hyde_rag("Has ASUS shown improvements in ESG performance over the past few years?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 Yes, ASUS has demonstrated clear improvements in its ESG (Environmental, Social, and Governance) performance over the past few years, as evidenced by multiple indicators and recognitions outlined in the provided information:

1. **Environmental Initiatives and Achievements:**
   - ASUS has implemented comprehensive environmental protection activities, such as the “Digital Inclusion Project” to recycle IT equipment and donate to remote schools, reducing waste and bridging the urban-rural digital divide.
   - The company actively participates in environmental protection organizations and organizes recycling and donation activities.
   - ASUS has adopted the ISO50001 energy management system, using PDCA (Plan-Do-Check-Act) cycles to set and achieve annual energy-saving goals, ensuring continual improvement.
   - The headquarters received the LEED Platinum certification from the US Green Building Coun

### What are the company’s main strategic priorities in 2024?

In [50]:
without_rag("What are the company’s main strategic priorities in 2024?")

'I don’t have access to specific information about your company’s strategic priorities for 2024. However, if you provide me with the company name or some context, I can help summarize publicly available information or suggest common strategic priorities companies focus on in 2024. Alternatively, you might want to check your company’s official communications, such as annual reports, press releases, or internal strategy documents for the most accurate details.'

In [84]:
result = run_self_correcting_hyde_rag("What are the company’s main strategic priorities in 2024?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 Based on the provided information and the company’s recent strategic directions, ASUS’s main strategic priorities in 2024 are likely to focus on the following areas:

1. **Strengthening Corporate Culture and Organizational Resilience**  
   ASUS emphasizes building a stronger corporate culture that leverages collective wisdom and embraces idea meritocracy. This cultural foundation aims to help the company navigate ongoing global economic unpredictability and industry turbulence.

2. **Balancing Growth Opportunities with Risk Management**  
   The company plans to carefully balance long-term value creation with short-term performance, considering both growth opportunities and risks amid inflation, monetary policy adjustments, and geopolitical conflicts.

3. **Product Innovation and Market Expansion in Key Segments**  
   - **Computer Systems and Open-Platform Products:** Accelerate product innovati

### How has ASUS addressed risks related to supply chain disruptions?

In [53]:
without_rag("How has ASUS addressed risks related to supply chain disruptions?")

'ASUS has addressed risks related to supply chain disruptions through several strategic measures:\n\n1. **Diversification of Suppliers and Manufacturing Locations:** ASUS has diversified its supplier base and manufacturing facilities across multiple countries to reduce dependency on any single source or region. This helps mitigate risks from geopolitical tensions, natural disasters, or localized disruptions.\n\n2. **Inventory Management and Buffer Stocks:** The company maintains strategic inventory levels and buffer stocks of critical components to cushion against short-term supply interruptions.\n\n3. **Close Collaboration with Suppliers:** ASUS works closely with its suppliers to improve supply chain visibility and responsiveness. This collaboration enables quicker adjustments to changes in demand or supply conditions.\n\n4. **Investment in Supply Chain Technology:** ASUS leverages advanced supply chain management systems and data analytics to enhance forecasting accuracy, monitor ri

In [85]:
result = run_self_correcting_hyde_rag("How has ASUS addressed risks related to supply chain disruptions?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 ASUS has addressed risks related to supply chain disruptions through a comprehensive, multi-stage risk management and monitoring framework that integrates internal controls, external audits, and ongoing supplier engagement:

1. **Supplier Qualification and Certification:**  
   - ASUS requires all suppliers to pass a rigorous quality audit before cooperation, ensuring they meet high standards.  
   - Suppliers must obtain ISO 9001 (quality management) and ISO 14001 (environmental management) certifications.  
   - Suppliers are also required to sign the ASUS Code of Conduct Compliance Declaration and a Statement of Assurance Regarding Prison or Forced Labor, ensuring ethical and legal compliance.

2. **Sustained Risk Management Mechanism:**  
   - ASUS implements ongoing risk management by requiring suppliers to commit to human rights, health and safety, and environmental standards as a basis for 

## Domain-Specific

Such queries test whether retrieval can capture non-obvious, section-specific content, and whether generation can faithfully preserve that structure and intent.

- According to ASUS’s 2023 annual report, what are the principles guiding board independence?
- Does ASUS comply with Taiwan’s Corporate Governance 3.0 policy?
- What are ASUS’s carbon neutrality goals and timelines mentioned in recent reports?

### According to ASUS’s 2023 annual report, what are the principles guiding board independence?

In [59]:
without_rag("According to ASUS’s 2023 annual report, what are the principles guiding board independence?")

'According to ASUS’s 2023 annual report, the principles guiding board independence include ensuring that independent directors have no material or pecuniary relationship with the company, its subsidiaries, or management that could interfere with the exercise of independent judgment. The board emphasizes the importance of having a sufficient number of independent directors to provide unbiased oversight and to safeguard the interests of all shareholders. Independent directors are expected to contribute objective perspectives, monitor management performance, and uphold corporate governance standards in line with regulatory requirements and best practices.'

In [90]:
result = run_self_correcting_hyde_rag("According to ASUS’s 2023 annual report, what are the principles guiding board independence?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 According to ASUS’s 2023 annual report, the principles guiding board independence include:

1. **Adherence to Corporate Governance Best-Practice Principles:** ASUS has defined its corporate governance best-practice principles, which are publicly available in the “important internal rules” section on the corporate governance page of the Company’s investor relations website.

2. **Execution of Rights and Integrity:** The Company upholds integrity and maintains long-term cooperation with investors, suppliers, and other stakeholders, ensuring transparent and fair relationships.

3. **Director Liability Insurance:** ASUS has insured liability insurance for all directors, which supports independent decision-making by protecting directors from personal liability risks.

4. **Pursuit of Study for Directors:** The Company encourages continuous education and disclosure related to directors, which helps main

### Does ASUS comply with Taiwan’s Corporate Governance 3.0 policy?

In [61]:
without_rag("Does ASUS comply with Taiwan's Corporate Governance 3.0 policy?")

"ASUS, as a major Taiwanese publicly listed company, generally aligns its corporate governance practices with Taiwan's regulatory framework, including the Corporate Governance 3.0 policy promoted by Taiwan's Financial Supervisory Commission (FSC). \n\nTaiwan's Corporate Governance 3.0 policy aims to enhance transparency, strengthen board independence, improve shareholder rights, and promote sustainable business practices among listed companies. ASUS typically publishes annual corporate governance reports detailing its compliance with these standards, such as board composition, audit committee functions, risk management, and disclosure practices.\n\nFor the most accurate and up-to-date information on ASUS's compliance with Corporate Governance 3.0, you can review their latest corporate governance report or disclosures available on their official website or through Taiwan Stock Exchange filings. These documents provide detailed insights into how ASUS implements governance policies in lin

In [81]:
result = run_self_correcting_hyde_rag("Does ASUS comply with Taiwan's Corporate Governance 3.0 policy?")
print("📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE passage:\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])
    print(f"[Year]: {doc.metadata.get('year')}")


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.
📣 Final Answer:
 Based on the provided information, ASUS demonstrates compliance with Taiwan's Corporate Governance 3.0 policy as follows:

1. **Defined Corporate Governance Best-Practice Principles**: ASUS has clearly defined its corporate governance best-practice principles and makes these available in the "corporate governance" section of its investor relations website. This transparency aligns with the requirements of Corporate Governance 3.0, which emphasizes clear governance frameworks and disclosure.

2. **Execution of Rights and Stakeholder Relationships**: ASUS upholds integrity and maintains long-term cooperation with investors, suppliers, and other stakeholders. This reflects the policy’s focus on protecting stakeholder rights and fostering sustainable relationships.

3. **Director Development and Liability Insurance**: The company pursues ongoing study and development for its directors and has insured l

### What are ASUS’s carbon neutrality goals and timelines mentioned in recent reports?

In [None]:
without_rag("What are ASUS’s carbon neutrality goals and timelines mentioned in recent reports?")

'ASUS has set clear carbon neutrality goals as part of its sustainability initiatives. According to recent reports, ASUS aims to achieve **carbon neutrality by 2050**. The company is focusing on reducing greenhouse gas emissions across its operations and supply chain, increasing energy efficiency, and utilizing renewable energy sources. Additionally, ASUS is working on enhancing product sustainability through eco-friendly designs and materials to support its long-term environmental objectives.\n\nIf you need more detailed information or specific milestones ASUS has announced, please let me know!'

In [82]:
result = run_self_correcting_hyde_rag("What are ASUS’s carbon neutrality goals and timelines mentioned in recent reports?")

print("\n📣 Final Answer:\n", result["answer"])
print("\n🧠 HyDE Passage (Generated Query):\n", result["hyde_passage"])

print("\n📄 Retrieved Chunks:")
for i, doc in enumerate(result["retrieved_docs"]):
    print(f"\n--- Chunk {i+1} ---")
    print(doc.page_content[:500])  # show only the first 500 characters to keep the output short
    print(f"[Year]: {doc.metadata.get('year')}")

print("\n🧠 Critic Verdict:\n", result["verdict"])


🔁 Iteration 1:
🧠 Critic verdict: APPROVED
✅ Answer approved by Critic.

📣 Final Answer:
 ASUS’s carbon neutrality goals and timelines, as outlined in recent reports, are as follows:

1. **Operations:**
   - Reduce carbon emissions from global operations by **50% by 2030** (using 2020 as the base year).
   - Achieve **100% renewable energy use in Taiwan-based operations by 2030**.
   - Achieve **100% renewable energy use in global operations by 2035**.

2. **Supply Chain:**
   - Reduce the carbon emission intensity of key suppliers by **30% by 2025**.

3. **Products:**
   - Ensure that the energy efficiency of key products each year is at least **30% better than the Energy Star standard**.
   - Incorporate low-carbon processes, improve energy efficiency, and select environmentally friendly materials starting from the product design phase.
   - Achieve **carbon neutrality for products** by offsetting emissions with high-quality natural carbon credits.

4. **Overall Strategy:**
   - ASUS

# Evaluation


## Easy
Standalone GPT-4.1 generally responds cautiously to these factual queries, often stating that it does not have access to up-to-date or company-specific financial data rather than hallucinating an answer. This behavior is preferable to guessing, but it limits usability for tasks that require grounded knowledge.

In contrast, the RAG-based system was able to generate accurate and context-supported answers for all three questions by retrieving relevant excerpts from the annual reports.

Notably, the first question about net profit in 2023 proved to be the most challenging. ASUS's 2024 annual report does not explicitly state the 2023 profit figure; it only mentions that 2024’s net profit increased by 97% compared to the previous year. The model therefore has to infer the 2023 value by reverse calculation, dividing the 2024 net profit by 1.97.
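
The reverse calculation can be sketched in a few lines. If the 2024 figure is 97% higher than the 2023 figure, then the 2023 value is the 2024 value divided by (1 + 0.97). The function name and the 2024 figure below are placeholders chosen for illustration; they are not taken from the report or from the notebook's code.

```python
def infer_previous_year(current_value: float, growth_rate: float) -> float:
    """Recover last year's value from this year's value and the year-over-year growth rate."""
    if growth_rate <= -1.0:
        raise ValueError("growth_rate must be greater than -100%")
    return current_value / (1.0 + growth_rate)


# Placeholder figure for illustration only (NOT taken from the ASUS report).
net_profit_2024 = 197.0  # arbitrary units
growth_2024 = 0.97       # "increased by 97%", as described in the text above

net_profit_2023 = infer_previous_year(net_profit_2024, growth_2024)
print(f"Inferred 2023 net profit: {net_profit_2023:.1f}")  # prints 100.0
```

The same check is a useful sanity test on the RAG answer: multiplying the inferred 2023 value by 1.97 should reproduce the 2024 figure quoted in the retrieved chunk.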

During the first iteration, the critic LLM correctly flagged that the answer was missing a specific figure. In the second iteration, with that feedback incorporated into a revised HyDE prompt, the system successfully retrieved and computed the correct net profit for 2023.

This demonstrates the strength of combining LLM reasoning (via HyDE) with retrieved factual context, and shows how a critic loop helps guide the model toward grounded answers.
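
The critic loop described above can be sketched as follows. This is a simplified illustration, not the notebook's actual `run_self_correcting_hyde_rag` implementation: the four callables (`make_hyde`, `retrieve`, `generate`, `critique`) and the toy stubs that stand in for LLM and retriever calls are hypothetical, and the returned keys simply mirror the ones printed in the cells above.

```python
from typing import Callable, List


def self_correcting_loop(
    question: str,
    make_hyde: Callable[[str, str], str],            # (question, feedback) -> hypothetical passage
    retrieve: Callable[[str], List[str]],            # passage -> retrieved chunks
    generate: Callable[[str, List[str]], str],       # (question, chunks) -> answer
    critique: Callable[[str, str, List[str]], str],  # -> "APPROVED" or feedback text
    max_iterations: int = 3,
) -> dict:
    """Run HyDE -> retrieval -> generation -> critic until approved or out of budget."""
    feedback = ""
    result: dict = {}
    for iteration in range(1, max_iterations + 1):
        passage = make_hyde(question, feedback)
        docs = retrieve(passage)
        answer = generate(question, docs)
        verdict = critique(question, answer, docs)
        result = {"answer": answer, "hyde_passage": passage,
                  "retrieved_docs": docs, "verdict": verdict,
                  "iterations": iteration}
        if verdict.strip().upper().startswith("APPROVED"):
            break
        feedback = verdict  # feed the critic's objection into the next HyDE prompt
    return result


# Toy stubs (assumptions) so the loop can be run without any LLM or vector store.
def make_hyde(question: str, feedback: str) -> str:
    return question if not feedback else f"{question} (address: {feedback})"

def retrieve(passage: str) -> List[str]:
    return ["chunk with figure 100"] if "address" in passage else ["chunk without figure"]

def generate(question: str, docs: List[str]) -> str:
    return f"Answer based on: {docs[0]}"

def critique(question: str, answer: str, docs: List[str]) -> str:
    return "APPROVED" if any(ch.isdigit() for ch in answer) else "Missing a specific figure."


demo = self_correcting_loop("What was last year's net profit?",
                            make_hyde, retrieve, generate, critique)
print(demo["iterations"], demo["verdict"])
```

With these stubs the first pass is rejected for lacking a figure and the second pass is approved, which mirrors the two-iteration behavior described for the net profit question.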

## Ambiguous
For more open-ended questions—such as those about ESG performance or strategy—standalone LLMs tend to give broad, generic overviews. These answers are not incorrect, but they often lack specificity or rely on assumptions. In some cases, the model may even suggest the user consult external sources for more details.

By contrast, the RAG-based system can retrieve specific, bullet-pointed summaries from the annual reports, resulting in more structured and fact-grounded answers. This highlights RAG’s advantage in handling vague but document-covered queries, by anchoring generation to actual corporate disclosures.

## Domain-Specific
The domain-specific questions focus on governance frameworks, policy compliance, and sustainability commitments—topics that are typically covered in specific sections of corporate reports, and unlikely to be included in LLM pretraining data.

Standalone LLMs tend to respond vaguely to these queries, offering general definitions or unrelated information. For instance, when asked about Taiwan’s Corporate Governance 3.0 policy, the LLM may explain what corporate governance is in general, but fail to connect it to ASUS or the actual regulatory context.

In contrast, the RAG-based system successfully retrieves and assembles relevant information from the reports—such as ASUS’s listed principles for board independence, statements of policy alignment, and detailed carbon neutrality goals and timelines—demonstrating a clear advantage in handling policy-anchored, document-grounded questions.

# Future Improvement

## 1. Replace InMemoryStore with a persistent backend
The current implementation stores original chunks in an InMemoryStore, which is volatile: its contents are lost once the session ends. While this was sufficient for experimentation, replacing it with a persistent solution (such as a JSON/SQLite store, Redis, or a vector database that supports metadata retrieval) would improve robustness and scalability.
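
As one possible direction, a minimal SQLite-backed key-value store could look like the sketch below. The class name `SQLiteDocStore`, the table layout, and the chunk payload shape are assumptions made for illustration. The `mset`/`mget` method names only mirror the key-value style used with `InMemoryStore`; this class is not a drop-in LangChain component.

```python
import json
import os
import sqlite3
import tempfile
from typing import Iterable, List, Optional, Tuple


class SQLiteDocStore:
    """Hypothetical persistent store: chunk id -> JSON-serialized chunk."""

    def __init__(self, path: str = "docstore.sqlite"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )
        self.conn.commit()

    def mset(self, items: Iterable[Tuple[str, dict]]) -> None:
        rows = [(key, json.dumps(value)) for key, value in items]
        with self.conn:  # commits on success, rolls back on error
            self.conn.executemany(
                "INSERT OR REPLACE INTO docs (id, payload) VALUES (?, ?)", rows
            )

    def mget(self, keys: List[str]) -> List[Optional[dict]]:
        out: List[Optional[dict]] = []
        for key in keys:
            row = self.conn.execute(
                "SELECT payload FROM docs WHERE id = ?", (key,)
            ).fetchone()
            out.append(json.loads(row[0]) if row else None)
        return out


# Write a chunk, close the connection, then reopen the file to show persistence.
path = os.path.join(tempfile.mkdtemp(), "docstore.sqlite")
store = SQLiteDocStore(path)
store.mset([("chunk-1", {"page_content": "example text", "metadata": {"year": 2024}})])
store.conn.close()

reopened = SQLiteDocStore(path)
restored = reopened.mget(["chunk-1", "missing"])
print(restored)
reopened.conn.close()
```

Storing chunks as JSON keeps the `year` metadata used in the retrieval printouts, and missing keys come back as `None` so the caller can decide how to handle them.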
## 2. Refactor into a modular LangGraph workflow
The current RAG system follows a clear, sequential flow: HyDE → Retrieval → Generation → Critic → Loop. This structure makes it well-suited for LangGraph, which would allow the entire pipeline to be modularized into nodes and edges. This would make future extensions—such as adding a confidence scorer, evaluator, or user feedback step—much easier to integrate and maintain.
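
To make the node-and-edge decomposition concrete, here is a library-free sketch in plain Python. It does not call LangGraph; it only shows how the pipeline splits into nodes that return partial state updates and edges that choose the next node. In LangGraph, the same pieces would roughly correspond to a `StateGraph` with `add_node`, `add_edge`, and `add_conditional_edges`. All node bodies are stubs, and the state fields and iteration limit are assumptions.

```python
from typing import Callable, Dict, TypedDict


class RAGState(TypedDict, total=False):
    question: str
    feedback: str
    hyde_passage: str
    docs: list
    answer: str
    verdict: str
    iterations: int


END = "__end__"
MAX_ITERATIONS = 3  # assumed retry budget


# Each node reads the state and returns only the fields it updates (stub logic).
def hyde_node(state: RAGState) -> dict:
    passage = f"{state['question']} {state.get('feedback', '')}".strip()
    return {"hyde_passage": passage, "iterations": state.get("iterations", 0) + 1}

def retrieve_node(state: RAGState) -> dict:
    return {"docs": [f"stub chunk for: {state['hyde_passage']}"]}

def generate_node(state: RAGState) -> dict:
    return {"answer": f"Answer from {len(state['docs'])} chunk(s)"}

def critic_node(state: RAGState) -> dict:
    # Stub critic: reject the first attempt, approve the second.
    if state["iterations"] >= 2:
        return {"verdict": "APPROVED"}
    return {"verdict": "REVISE", "feedback": "add a specific figure"}

def route_after_critic(state: RAGState) -> str:
    # Conditional edge: loop back to HyDE unless approved or out of budget.
    if state["verdict"] == "APPROVED" or state["iterations"] >= MAX_ITERATIONS:
        return END
    return "hyde"


NODES: Dict[str, Callable[[RAGState], dict]] = {
    "hyde": hyde_node, "retrieve": retrieve_node,
    "generate": generate_node, "critic": critic_node,
}
EDGES: Dict[str, Callable[[RAGState], str]] = {
    "hyde": lambda s: "retrieve",
    "retrieve": lambda s: "generate",
    "generate": lambda s: "critic",
    "critic": route_after_critic,
}


def run_graph(state: RAGState, entry: str = "hyde") -> RAGState:
    current = entry
    while current != END:
        state = {**state, **NODES[current](state)}  # merge the node's partial update
        current = EDGES[current](state)
    return state


final_state = run_graph({"question": "What are the carbon neutrality goals?"})
print(final_state["iterations"], final_state["verdict"])
```

Adding a confidence scorer or user feedback step then amounts to registering one more entry in `NODES` and rewiring one entry in `EDGES`, which is the maintainability benefit the refactor is meant to deliver.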
## 3. Expand the dataset across more years or data sources
The current knowledge base includes five years of ASUS annual reports (2020–2024). Including additional years, quarterly reports, or related press releases could increase retrieval coverage and improve answer quality for trend-related or multi-year analysis questions.



# Reference
Oshin, M. and Campos, N. (2025) Learning LangChain: Building AI and LLM Applications with LangChain and LangGraph. Sebastopol, CA: O’Reilly Media, Inc. Available at: https://learning.oreilly.com/library/view/learning-langchain/9781098167271/ (Accessed: 15 May 2025).