# Day 2 - Lab 2: Documenting Key Decisions with ADRs

**Objective:** Use an LLM as a research assistant to compare technical options and synthesize the findings into a formal, version-controlled Architectural Decision Record (ADR).

**Estimated Time:** 60 minutes

**Introduction:**
Great architectural decisions are based on research and trade-offs. A critical practice for healthy, long-lived projects is documenting *why* these decisions were made. In this lab, you will use an LLM to research a key technical choice for our application and then generate a formal ADR to record that decision for the future.

For definitions of key terms used in this lab, please refer to the [GLOSSARY.md](../../GLOSSARY.md).

## Step 1: Setup

We'll start by making sure the environment is ready and by adding the project root to `sys.path` so that the `utils.py` helper can be imported reliably.

**Model Selection:**
For research and synthesis tasks, models with large context windows and strong reasoning abilities are ideal. `gpt-4.1`, `gemini-2.5-pro`, or `meta-llama/Llama-3.3-70B-Instruct` would be excellent choices.

**Helper Functions Used:**
- `setup_llm_client()`: To configure the API client.
- `get_completion()`: To send prompts to the LLM.
- `load_artifact()`: To read the ADR template.
- `save_artifact()`: To save the generated ADR template and the final ADR.

In [1]:
import sys
import os

# Add the project's root directory (two levels above the notebook's working
# directory) to the Python path so that 'utils' can be imported.
# Note: os.path functions do not raise IndexError, so no try/except is needed here.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..', '..'))

if project_root not in sys.path:
    sys.path.insert(0, project_root)

from utils import setup_llm_client, get_completion, save_artifact, load_artifact

client, model_name, api_provider = setup_llm_client(model_name="gemini-2.5-pro")

2025-09-23 14:28:05,304 ag_aisoftdev.utils INFO LLM Client configured provider=google model=gemini-2.5-pro latency_ms=None artifacts_path=None


## Step 2: The Challenges

### Challenge 1 (Foundational): The ADR Template

**Task:** A good ADR follows a consistent format. Your first task is to prompt an LLM to generate a clean, reusable ADR template in markdown.

**Instructions:**
1.  Write a prompt that asks the LLM to generate a markdown template for an Architectural Decision Record.
2.  The template should include sections for: `Title`, `Status` (e.g., Proposed, Accepted, Deprecated), `Context` (the problem or forces at play), `Decision` (the chosen solution), and `Consequences` (the positive and negative results of the decision).
3.  Save the generated template to `templates/adr_template.md`.
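
Before saving, it can help to confirm that the generated template actually contains the sections listed above. The following is a minimal sketch of such a check; the helper name `missing_adr_sections` and the `REQUIRED_SECTIONS` constant are hypothetical and are not part of `utils.py`.

```python
import re

# Section names taken from the instructions above.
REQUIRED_SECTIONS = ["Title", "Status", "Context", "Decision", "Consequences"]

def missing_adr_sections(template_text, required=REQUIRED_SECTIONS):
    """Return the required section names that do not appear in the template."""
    missing = []
    for name in required:
        # Match the name as a whole word, case-insensitively, anywhere in the text.
        if not re.search(rf"\b{re.escape(name)}\b", template_text, re.IGNORECASE):
            missing.append(name)
    return missing

sample = "# ADR-XXX: [Short Title]\n\n**Status:** Proposed\n\n## Context\n\n## Decision\n"
print(missing_adr_sections(sample))  # ['Consequences']
```

This is a loose word-level check, not a structural one: it only tells you whether each name appears somewhere in the text.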

In [None]:
# TODO: Write a prompt to generate a markdown ADR template.
adr_template_prompt = """

Please generate a markdown template for an Architecture Decision Record (ADR). The template should include the following sections:

1. Title
2. Status (e.g., Proposed, Accepted, Deprecated)
3. Context (the problem or forces at play)
4. Decision (the chosen solution)
5. Consequences (the positive and negative results of the decision)

The template should be in markdown format and include placeholders for each section.

"""

print("--- Generating ADR Template ---")
adr_template_content = get_completion(adr_template_prompt, client, model_name, api_provider)

# Extract only the markdown template
import re
match = re.search(r"```(?:markdown)?\s*([\s\S]*?)```", adr_template_content)
if match:
    adr_template_content = match.group(1).strip()
    print(adr_template_content)  # Print only the template
else:
    # No fence found: the full response is kept and saved below.
    print("No fenced markdown block found; using the full LLM response.")

# Save the artifact
if adr_template_content:
    save_artifact(adr_template_content, "templates/adr_template.md", overwrite=True)

--- Generating ADR Template ---
# ADR-XXX: [Short Title of the Decision]

**Status:** [Proposed | Accepted | Rejected | Deprecated | Superseded]

**Date:** YYYY-MM-DD

## Context

*This section describes the "why" of the decision. It outlines the problem, the driving forces, the constraints, and the overall context.*

*   **Problem:** What is the issue, problem, or opportunity that this decision addresses?
*   **Drivers:** What are the key technical, business, or user requirements motivating this change? (e.g., improve performance, reduce costs, support a new feature).
*   **Constraints:** What are the limitations or constraints we must work within? (e.g., budget, timeline, existing technology stack, team skills).

## Decision

*This section clearly and concisely states the chosen solution. It is the "what" of the decision.*

We will [describe the chosen solution here].

For example:
*   "We will adopt PostgreSQL as the primary relational database for the new microservice."
*   "We wil
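
The cell above pulls the template out of a fenced block with a regular expression. As a rough sketch, the same idea can be wrapped in a reusable helper that falls back to the whole response when no fence is present. The name `extract_markdown` is hypothetical, and the fence string is built with `"`" * 3` only to keep literal triple backticks out of the example.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically

# Optional "markdown" or "md" language tag, then the block body up to the next fence.
FENCE_PATTERN = re.compile(FENCE + r"(?:markdown|md)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)

def extract_markdown(response_text):
    """Return the first fenced block's contents, or the whole stripped response."""
    match = FENCE_PATTERN.search(response_text)
    if match:
        return match.group(1).strip()
    return response_text.strip()

reply = "Here is the template:\n" + FENCE + "markdown\n# ADR-XXX\n" + FENCE + "\nLet me know."
print(extract_markdown(reply))  # # ADR-XXX
```

Because the match is non-greedy, a template that itself contains a nested fenced block would be cut off at the inner fence, so the extracted text is worth a quick visual check.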

### Challenge 2 (Intermediate): AI-Assisted Research

**Task:** Use the LLM to perform unbiased research on a key technical decision for our project: choosing a database for semantic search.

**Instructions:**
1.  Write a prompt instructing the LLM to perform a technical comparison.
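2.  Ask it to compare and contrast two technical options: **"Using PostgreSQL with the `pgvector` extension"** versus **"Using a specialized vector database like ChromaDB or FAISS"**. Keep in mind that FAISS is a similarity-search library rather than a full database, so a fair comparison should account for that difference.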
3.  The prompt should ask for a balanced view for the specific use case of our new hire onboarding tool.
4.  Store the output in a variable for the next step.

> **Tip:** To get a balanced comparison, explicitly ask the LLM to 'act as an unbiased research assistant' and to list the 'pros and cons for each approach.' This discourages the model from simply recommending the more popular option and encourages a more critical analysis.
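
The instructions and tip above can be sketched as a small prompt builder, which makes it easy to reuse the same neutral framing for other comparisons. This is one possible shape, not the lab's required solution; `build_comparison_prompt` and its parameters are hypothetical names.

```python
def build_comparison_prompt(use_case, options, criteria):
    """Assemble a neutral comparison prompt from a use case, options, and criteria."""
    numbered = "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))
    return (
        "Act as an unbiased research assistant. "
        f"For {use_case}, compare and contrast the following options:\n\n"
        f"{numbered}\n\n"
        "For each option, list the pros and cons. "
        f"Consider: {', '.join(criteria)}. "
        "Do not recommend one option outright."
    )

prompt = build_comparison_prompt(
    use_case="the new hire onboarding tool",
    options=[
        "Using PostgreSQL with the pgvector extension",
        "Using a specialized vector database like ChromaDB or FAISS",
    ],
    criteria=["scalability", "ease of integration", "cost"],
)
print(prompt)
```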

In [14]:
# TODO: Write a prompt to research database options.
db_research_prompt = """
Act as an unbiased research assistant. For the new hire onboarding tool, compare and contrast the following two technical options for semantic search:

1. Using PostgreSQL with the pgvector extension
2. Using a specialized vector database like ChromaDB or FAISS

For each option, list the pros and cons, and provide a balanced analysis for our use case. Consider factors such as scalability, ease of integration, cost, performance, operational complexity, and long-term maintainability. Avoid recommending one option outright; instead, present a critical comparison to help the team make an informed decision.
"""

print("--- Researching Database Options ---")
db_research_output = get_completion(db_research_prompt, client, model_name, api_provider)

import re
match = re.search(r"```(?:markdown)?\s*([\s\S]*?)```", db_research_output)
if match:
    db_research_output = match.group(1).strip()
    print(db_research_output)
else:
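    # Otherwise drop any preamble before the first horizontal rule ('---')
    # and keep everything after it (maxsplit=1 avoids discarding later sections).
    split_sections = re.split(r"\n---+\n", db_research_output, maxsplit=1)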
    if len(split_sections) > 1:
        db_research_output = split_sections[1].strip()
        print(db_research_output)
    else:
        print(db_research_output)

--- Researching Database Options ---
### Option 1: PostgreSQL with the `pgvector` Extension

This approach involves adding the `pgvector` extension to an existing or new PostgreSQL database. This allows you to store vector embeddings as a native data type directly alongside your other relational data (e.g., document text, author, creation date, access permissions).

#### **Pros:**

*   **Unified Data Management:** Your vectors and their associated metadata live in the same database. This eliminates data synchronization issues between two separate systems. Deleting a document and its vector can be done in a single, atomic transaction (ACID compliance).
*   **Powerful Hybrid Queries:** This is a key advantage. You can combine vector similarity search with standard SQL `WHERE` clauses in a single query. For an onboarding tool, this is highly practical (e.g., "Find documents similar to 'how to set up VPN' that were created in the last year and are relevant to the 'Engineering' department")
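
To make the hybrid-query point above concrete, here is a hedged sketch that only builds the SQL text and parameters for such a query and does not connect to a database. The `documents` table and its `id`, `title`, `department`, `created_at`, and `embedding` columns are assumptions made for illustration, as are the `%s` placeholder style and the helper name `build_hybrid_query`. The `<=>` operator is pgvector's cosine-distance operator.

```python
from datetime import date, timedelta

def build_hybrid_query(query_embedding, department, max_age_days=365, limit=5):
    """Return (sql, params) for a filtered similarity search; nothing is executed."""
    # pgvector accepts a vector literal such as '[0.1,0.2,0.3]'.
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    cutoff = date.today() - timedelta(days=max_age_days)
    sql = (
        "SELECT id, title "
        "FROM documents "                      # hypothetical table
        "WHERE department = %s AND created_at >= %s "
        "ORDER BY embedding <=> %s::vector "   # cosine distance to the query vector
        "LIMIT %s"
    )
    return sql, (department, cutoff, vector_literal, limit)

sql, params = build_hybrid_query([0.1, 0.2], "Engineering")
print(sql)
```

The relational filter and the similarity ordering live in one statement, which is the property the research output highlights.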

### Challenge 3 (Advanced): Synthesizing the ADR

**Task:** Provide the LLM with your research from the previous step and have it formally document the decision.

**Instructions:**
1.  Load the `adr_template.md` you created in the first challenge.
2.  Create a new prompt instructing the LLM to act as a Staff Engineer.
3.  Provide the `db_research_output` as context.
4.  Instruct the LLM to populate the ADR template, formally documenting the decision to **use PostgreSQL with pgvector** and justifying the choice based on the synthesized pros and cons.
5.  Save the final, completed ADR as `artifacts/adr_001_database_choice.md`.
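
If more ADRs follow, a consistent naming scheme keeps them sorted in version control. As a small sketch, a file name like the one in step 5 could be derived from a number and a title; the helper `adr_filename` is hypothetical and is not provided by `utils.py`.

```python
import re

def adr_filename(number, title):
    """Build a zero-padded, snake_case ADR file name from a number and a title."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"adr_{number:03d}_{slug}.md"

print(adr_filename(1, "Database Choice"))  # adr_001_database_choice.md
```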

In [16]:
adr_template = load_artifact("templates/adr_template.md")

# Synthesis prompt for the final ADR
synthesis_prompt = f"""
Act as a Staff Engineer. Your task is to synthesize a formal Architectural Decision Record (ADR) for our new hire onboarding tool.

You are provided with:
- An ADR template in markdown format:
{adr_template}
- Research comparing two database options for semantic search:
{db_research_output}

Using the template, formally document the decision to use PostgreSQL with the pgvector extension for semantic search. Populate each section of the ADR template, justifying the choice based on the synthesized pros and cons from the research. Ensure the ADR is clear, professional, and suitable for version-controlled documentation.
"""

print("--- Synthesizing Final ADR ---")
# db_research_output was already cleaned in Challenge 2 and is already embedded
# in synthesis_prompt above, so re-extracting it here would have no effect.
if adr_template and db_research_output:
    final_adr = get_completion(synthesis_prompt, client, model_name, api_provider)
    print(final_adr)
    save_artifact(final_adr, "artifacts/adr_001_database_choice.md", overwrite=True)
else:
    print("Skipping ADR synthesis because template or research is missing.")

--- Synthesizing Final ADR ---
Of course. Here is the formal Architectural Decision Record, synthesized from the provided information and written from the perspective of a Staff Engineer.

***

# ADR-001: Database for Semantic Search in the Onboarding Tool

**Status:** Accepted

**Date:** 2023-10-27

## Context

This section describes the "why" of the decision. It outlines the problem, the driving forces, the constraints, and the overall context.

*   **Problem:** The new hire onboarding tool requires a robust search functionality that goes beyond simple keyword matching. New hires need to be able to ask natural language questions (e.g., "how do I set up my development environment?") and receive contextually relevant documents. This necessitates a semantic search capability, which relies on storing and querying vector embeddings generated from our documentation. We must choose a database technology that can efficiently handle these vectors alongside traditional metadata.
*   **Drivers:
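
The output above begins with a conversational preamble and a `***` separator before the ADR heading, and that text is saved into the artifact along with the ADR. A minimal sketch of trimming it before saving, assuming the ADR starts at the first top-level markdown heading (the helper name `strip_preamble` is hypothetical):

```python
def strip_preamble(text):
    """Drop everything before the first top-level markdown heading, if one exists."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line.startswith("# "):
            return "\n".join(lines[index:]).strip()
    # No heading found: return the text unchanged apart from surrounding whitespace.
    return text.strip()

raw = "Of course. Here is the ADR.\n\n***\n\n# ADR-001: Database\n\n**Status:** Accepted\n"
print(strip_preamble(raw))
```

The cleaned text could then be passed to `save_artifact()` in place of the raw completion. If a response places a fenced code block containing a line that starts with `# ` before the real heading, this heuristic would cut at the wrong place.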

## Lab Conclusion

Well done! You have used an LLM to automate a complex but critical part of the architectural process. You leveraged its vast knowledge base for research and then used it again for synthesis, turning raw analysis into a formal, structured document. This `adr_001_database_choice.md` file now serves as a permanent, valuable record for anyone who works on this project in the future.

> **Key Takeaway:** The pattern of **Research -> Synthesize -> Format** is a powerful workflow. You can use an LLM to gather unstructured information and then use it again to pour that information into a structured template, creating high-quality, consistent documentation with minimal effort.