# Gen-AI Workshop: Automatic Detection of Misplaced Business Logic in Java

This notebook demonstrates using RAG, Agents, and Workflows to automatically detect Clean Architecture violations in Java code.

**Focus:** Identify misplaced business logic (e.g., in controllers, repositories, entities) and explain violations.

**Tech Stack:** 
- Python, OpenAI GPT-4.1-nano
- sentence-transformers (embeddings)
- FAISS (vector search)
- LangChain (agents, workflows)

In [9]:
# installation of the dependencies
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.




## Setup: Install Dependencies

**Before running the notebook, install required packages:**

```bash
pip install -r requirements.txt
```

**Installed packages:**
- `sentence-transformers` - Text embedding generation
- `faiss-cpu` - Fast similarity search
- `openai` - OpenAI API client
- `langchain` - Agent and workflow framework
- `langchain-openai` - OpenAI integration for LangChain
- `langchain-community` - Additional LangChain tools

**Note:** First execution will download the sentence transformer model (~90MB).

In [10]:
import re
import os
import numpy as np
from sentence_transformers import SentenceTransformer
import faiss
from openai import OpenAI
from langchain_openai import ChatOpenAI
from langchain.agents import Tool, initialize_agent, AgentType
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SequentialChain, TransformChain



In [None]:
# Read the OpenAI API key from the api-key.txt file
try:
    with open('api-key.txt', 'r') as f:
        OPENAI_API_KEY = f.read().strip()
    print("API key loaded from the api-key.txt file.")
except FileNotFoundError:
    raise FileNotFoundError(
        "Error: 'api-key.txt' not found.\n"
        "Please create a file named 'api-key.txt' in the project root directory "
        "containing the OpenAI API key provided and re-run this cell."
    )

API key loaded from the api-key.txt file.
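A key file in the project root works for the workshop, but it is awkward in CI (see the wrap-up). The sketch below checks the `OPENAI_API_KEY` environment variable first, which is the same variable the OpenAI Python client reads when no `api_key` is passed, and then falls back to `api-key.txt`. The helper name `load_api_key` is hypothetical.

```python
import os
from pathlib import Path


def load_api_key(path="api-key.txt", env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, else from a key file.

    Hypothetical helper: the environment comes first (convenient for CI secrets),
    then the workshop's api-key.txt file.
    """
    key = os.environ.get(env_var, "").strip()
    if key:
        return key
    key_file = Path(path)
    if key_file.is_file():
        key = key_file.read_text(encoding="utf-8").strip()
        if key:
            return key
    raise FileNotFoundError(
        f"No API key found: set {env_var} or create '{path}' containing the key."
    )
```

In the notebook, `OPENAI_API_KEY = load_api_key()` would replace the `try`/`except` block above.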


In [12]:
# Load Clean Architecture knowledge base from knowledge-base directory
def load_text_file(filepath):
    """Load text file and return its content."""
    try:
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()
        return content.replace('\u200b', '').replace('\ufeff', '')
    except FileNotFoundError:
        raise FileNotFoundError(f"Error: File not found: {filepath}")

# Load all knowledge base markdown files
kb_files = [
    'knowledge-base/01-layering-principles.md',
    'knowledge-base/02-controller-layer.md',
    'knowledge-base/03-service-layer.md',
    'knowledge-base/04-repository-layer.md',
    'knowledge-base/05-entity-layer.md',
    'knowledge-base/06-anti-patterns-overview.md'
]

# Combine all knowledge base files into single corpus
KB_MARKDOWN = ""
for kb_file in kb_files:
    content = load_text_file(kb_file)
    KB_MARKDOWN += f"\n\n# Source: {kb_file}\n\n{content}"

print("Knowledge base loaded from:")
for kb_file in kb_files:
    print(f"  - {kb_file}")
print(f"\nTotal knowledge base size: {len(KB_MARKDOWN)} characters")

Knowledge base loaded from:
  - knowledge-base/01-layering-principles.md
  - knowledge-base/02-controller-layer.md
  - knowledge-base/03-service-layer.md
  - knowledge-base/04-repository-layer.md
  - knowledge-base/05-entity-layer.md
  - knowledge-base/06-anti-patterns-overview.md

Total knowledge base size: 55076 characters


In [13]:
# Load leaky code samples from dummy-project directory
LEAKY_SAMPLES = {
    "application": load_text_file('dummy-project/LeakyDemoApplication.java'),
    "order_entity": load_text_file('dummy-project/Order.java'),
    "order_controller": load_text_file('dummy-project/OrderController.java'),
    "order_repository": load_text_file('dummy-project/OrderRepository.java')
}

print("Leaky code samples loaded from dummy-project:")
for key in LEAKY_SAMPLES.keys():
    print(f"  - {key}")

print("\nNote: These are intentionally leaky examples for violation detection practice.")

Leaky code samples loaded from dummy-project:
  - application
  - order_entity
  - order_controller
  - order_repository

Note: These are intentionally leaky examples for violation detection practice.


In [14]:
# Initialize RAG components: Sentence transformer and FAISS index
print("Initializing RAG components...")

# Load embedding model (downloads on first run)
model = SentenceTransformer('all-MiniLM-L6-v2')
print("Sentence transformer model loaded (all-MiniLM-L6-v2)")

# Split knowledge base into chunks (by double newlines = paragraphs)
chunks = re.split(r'\n\s*\n', KB_MARKDOWN.strip())
print(f"Knowledge base split into {len(chunks)} chunks")

# Generate embeddings for all chunks
embeddings = model.encode(chunks)
print(f"Generated embeddings with dimension {embeddings.shape[1]}")

# Create FAISS index for similarity search
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(np.array(embeddings))
print(f"FAISS index created with {index.ntotal} vectors")

print("\nRAG setup complete. Ready for semantic retrieval.")

Initializing RAG components...
Sentence transformer model loaded (all-MiniLM-L6-v2)
Knowledge base split into 404 chunks
Generated embeddings with dimension 384
FAISS index created with 404 vectors

RAG setup complete. Ready for semantic retrieval.
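Splitting on blank lines also cuts fenced Java examples in the knowledge base into fragments. The retrieval output in the next cells contains half-methods, such as a `getEligibleOrders()` body that stops after its first line. The sketch below is an alternative chunker. It assumes the knowledge base uses triple-backtick fences and ends a chunk on a blank line only when that line falls outside a fence. The name `split_keep_fences` is hypothetical.

```python
FENCE = "`" * 3  # triple backtick, built this way so it doesn't clash with markdown fences


def split_keep_fences(markdown):
    """Split markdown into paragraph chunks without breaking fenced code blocks."""
    chunks, current, in_fence = [], [], False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # opening or closing fence
        if not line.strip() and not in_fence:
            # Blank line outside a fence: close the current paragraph
            if current:
                chunks.append("\n".join(current))
                current = []
            continue
        current.append(line)  # blank lines inside a fence are kept
    if current:
        chunks.append("\n".join(current))
    return chunks
```

To try it, set `chunks = split_keep_fences(KB_MARKDOWN)` and re-run the encoding and indexing lines. Chunks become longer and fewer. That trade-off is worth comparing against the paragraph split with a few test queries.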


In [15]:
def retrieve_relevant_rules(query, top_k=3):
    """
    Core retrieval function: Embed query, fetch top-k relevant chunks from knowledge base.
    
    Args:
        query (str): Input query (typically Java code or architectural question)
        top_k (int): Number of relevant chunks to retrieve (default: 3)
        
    Returns:
        str: Concatenated relevant rule chunks from knowledge base
    """
    query_embedding = model.encode([query])
    _, indices = index.search(np.array(query_embedding), top_k)
    relevant = "\n\n".join([chunks[i] for i in indices[0]])
    return relevant.replace('\u200b', '').replace('\ufeff', '')

# Test retrieval with sample query
test_query = "business logic in repository layer"
test_result = retrieve_relevant_rules(test_query)
print(f"Test retrieval for query: '{test_query}'")
print(f"Retrieved {len(test_result)} characters from knowledge base\n")
print("Sample output (first 400 chars):")
print(test_result[:400], "...\n")
print("Retrieval function working correctly")

Test retrieval for query: 'business logic in repository layer'
Retrieved 352 characters from knowledge base

Sample output (first 400 chars):
**Problems:**
- Business rules (eligibility, discount) in repository
- Data transformations based on business logic
- Filtering based on business conditions
- Repository knows too much about business domain

## Violation: Business Logic in Repository

Repositories can use database-level operations for performance, but must not include business logic. ...

Retrieval function working correctly
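`faiss.IndexFlatL2` performs exact, exhaustive search. It compares the query against every stored vector by squared Euclidean distance and returns the `top_k` closest. The pure-Python sketch below mirrors that behaviour on toy vectors and returns the distances alongside the indices, which helps when judging whether a retrieved chunk is actually close to the query. `flat_l2_search` is a hypothetical name and is not part of the FAISS API.

```python
def flat_l2_search(vectors, query, top_k=3):
    """Exact nearest-neighbour search by squared L2 distance (like IndexFlatL2).

    Returns (distances, indices), both ordered from closest to farthest.
    """
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the query and a stored vector
        dist = sum((q - v) ** 2 for q, v in zip(query, vec))
        scored.append((dist, i))
    scored.sort()  # ties are broken by the lower index
    best = scored[:top_k]
    return [d for d, _ in best], [i for _, i in best]


# Toy 2-D "embeddings" for three chunks
store = [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]]
distances, indices = flat_l2_search(store, [1.0, 0.1], top_k=2)
print(indices)  # prints [1, 2]
```

`index.search` returns the same pair of arrays, with distances first. `retrieve_relevant_rules` discards them with `_`. Printing them is a quick way to see when a query has no good match in the knowledge base.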


---

# Section 1: RAG (Retrieval-Augmented Generation)

**Goal:** Build a RAG pipeline to retrieve relevant architecture rules and use an LLM to detect violations.

**Why RAG?**
- Augments LLM with domain-specific Clean Architecture knowledge
- Ensures analysis references concrete rules and patterns
- Improves accuracy by grounding responses in retrieved context

**Workflow:**
1. **Retrieve:** Semantic search for relevant rules based on code
2. **Augment:** Inject retrieved rules into LLM prompt
3. **Generate:** LLM analyzes code against rules, detects violations
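The three steps fit in one function. In this minimal sketch the retriever and the LLM call are passed in as plain callables, so the orchestration can be tested without an API key. In the notebook, `retrieve` would be `retrieve_relevant_rules` and `generate` would wrap `client.chat.completions.create`. `rag_analyze` and `build_messages` are hypothetical names, and the prompt is shortened from the one used in the next cells.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def build_messages(code: str, rules: str) -> List[Message]:
    """Augment: inject the retrieved rules into a chat prompt."""
    return [
        {"role": "system",
         "content": "You are a Java architecture expert specializing in Clean Architecture."},
        {"role": "user",
         "content": (f"Java Code to Analyze:\n{code}\n\n"
                     f"Relevant Architecture Rules:\n{rules}\n\n"
                     "Task: Identify all Clean Architecture violations in this code.")},
    ]


def rag_analyze(code: str,
                retrieve: Callable[[str], str],
                generate: Callable[[List[Message]], str]) -> str:
    """Retrieve -> Augment -> Generate, with both external calls injected."""
    rules = retrieve(code)                  # 1. Retrieve
    messages = build_messages(code, rules)  # 2. Augment
    return generate(messages)               # 3. Generate
```

A notebook-side `generate` could be `lambda msgs: client.chat.completions.create(model="gpt-4.1-nano", messages=msgs).choices[0].message.content`.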

**Hands-on:**
- Analyze leaky code samples and observe violations detected
- Experiment with different code snippets

In [16]:
# Select sample for RAG analysis
# Options: "order_controller", "order_repository", "order_entity", "application"
sample_name = "order_controller"
java_code = LEAKY_SAMPLES[sample_name]

print(f"Analyzing: {sample_name} (leaky code from dummy-project)")
print("=" * 70)
print("Code snippet (first 600 chars):")
print(java_code[:600], "...\n")

# Retrieve relevant architecture rules using semantic search
relevant_rules = retrieve_relevant_rules(java_code)
print("\nRetrieved relevant architecture rules:")
print("=" * 70)
print(relevant_rules[:700], "...\n")
print(f"Total retrieved content: {len(relevant_rules)} characters")

Analyzing: order_controller (leaky code from dummy-project)
Code snippet (first 600 chars):
package com.example.leakydemo;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import java.util.List;

@RestController
public class OrderController {

    @Autowired
    private OrderRepository orderRepository;

    // Business logic leakage: Controller handling business rules like approval checks
    @GetMapping("/orders/eligible")
    public List<Order> getEligibleOrders() {
        List<Order> eligibleOrders = orderRepository.findEligibleForDiscount();
   ...


Retrieved relevant architecture rules:
    @GetMapping("/orders/eligible")
    public List<Order> getEligibleOrders() {
        List<Order> orders = orderRepository.findAll();

    @GetMapping("/orders/eligible")
    public List<Order> getEligibleOrders() {
        return orderService.getEligibleOrde

In [17]:
# Augment LLM with retrieved rules for violation analysis
client = OpenAI(api_key=OPENAI_API_KEY)

# gpt-4.1-nano is the OpenAI model used throughout this workshop
response = client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[
        {
            "role": "system", 
            "content": (
                "You are a Java architecture expert specializing in Clean Architecture. "
                "Analyze code using the provided architecture rules to detect misplaced business logic violations."
            )
        },
        {
            "role": "user", 
            "content": (
                f"Java Code to Analyze:\n{java_code}\n\n"
                f"Relevant Architecture Rules:\n{relevant_rules}\n\n"
                f"Task: Identify all Clean Architecture violations in this code.\n\n"
                f"For each violation, provide:\n"
                f"1. Exact location (class, method, line number if visible)\n"
                f"2. Type of violation (e.g., 'Business logic in controller')\n"
                f"3. Why it violates Clean Architecture principles\n"
                f"4. Impact on maintainability and testability\n"
                f"5. How to fix it (move to which layer)\n\n"
                f"Reference specific rules from the provided architecture rules."
            )
        }
    ]
)

print("RAG-Enhanced Analysis:")
print("=" * 70)
print(response.choices[0].message.content)

RAG-Enhanced Analysis:
Analyzing the provided code against the referenced Clean Architecture rules reveals multiple violations related to separation of concerns, specifically regarding the placement of business logic and validation.

---

### Overall Observations:
- Business logic and decision-making are embedded directly within the Controller layer.
- Validation checks that are part of business rules are performed inside controller methods.
- The Controller directly interacts with repositories and performs business decisions, leading to tight coupling and reduced testability.

---

### Violation 1

**Location:**  
`OrderController` (Line 13, method `getEligibleOrders()`)

**Type of Violation:**  
**Business logic in controller**

**Why it violates principles:**  
According to Clean Architecture principles, Controllers should be thin and delegate all business logic to the domain or application layer. Here, the controller not only retrieves data but also applies business rules (e.g., ap

**Exercise:**

1. **Try Different Samples:**
   ```python
   sample_name = "order_repository"  # or "order_entity"
   ```
   Re-run the previous cells to analyze different violation types.

2. **Adjust Retrieval:**
   - Modify `top_k` parameter in `retrieve_relevant_rules()` (try 5 or 10)
   - Does more context improve analysis quality or introduce noise?

3. **Custom Code Analysis:**
   ```python
   java_code = """
   // Paste your own Java code here
   """
   relevant_rules = retrieve_relevant_rules(java_code)
   # Then run LLM analysis
   ```

---

# Section 2: Agents (ReAct Framework)

**Goal:** Create an autonomous agent that reasons about when to retrieve rules and how to analyze code step-by-step.

**Why Agents?**
- **Autonomy:** Agent decides if/when to use the retrieval tool
- **Reasoning:** Breaks down complex analysis into logical steps
- **Flexibility:** Handles multi-file or contextual analysis

**ReAct Pattern:** 
- **Reason (Thought):** Agent thinks about what to do next
- **Act (Action):** Agent uses a tool (e.g., RetrieveArchitectureRules)
- **Observe (Observation):** Agent sees tool output
- **Repeat:** Continue until reaching final answer
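The Thought → Action → Observation cycle is a loop around the LLM. This stripped-down sketch assumes the model replies in the `Action: ...` / `Action Input: ...` / `Final Answer: ...` text format visible in the agent logs below. `react_loop` is hypothetical. It omits LangChain's prompt template, stop sequences, and parsing-error recovery, so treat it as an illustration of the pattern rather than of `initialize_agent` internals.

```python
import re
from typing import Callable, Dict

# "Action: <tool>" followed (possibly after blank lines) by "Action Input: <input>"
ACTION_RE = re.compile(r"Action:[ \t]*(.+?)[ \t]*\n\s*Action Input:[ \t]*(.+)")


def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Minimal ReAct loop: reason, act with a tool, observe, repeat."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # Reason: the model writes Thought + Action (or Final Answer)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            transcript += "Observation: could not parse an action\n"
            continue
        tool_name, tool_input = match.group(1), match.group(2).strip()
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"unknown tool {tool_name}"  # Act
        transcript += f"Observation: {observation}\n"  # Observe, then repeat
    return "Stopped: step limit reached"
```

The step limit plays the same role as the executor's iteration cap. Without it, a model that never emits `Final Answer:` would loop forever.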

**Builds on RAG:** Wraps `retrieve_relevant_rules` as a tool the agent can call autonomously.

**Hands-on:**
- Observe agent's reasoning process (`verbose=True` shows thoughts)
- See how it decides to use the retrieval tool
- Experiment with different prompts

In [18]:
# Wrap retrieval function as an agent tool
tools = [
    Tool(
        name="RetrieveArchitectureRules",
        func=retrieve_relevant_rules,
        description=(
            "Retrieve Clean Architecture rules, anti-patterns, and violation examples "
            "for analyzing Java code. Input should be Java code or a description of "
            "the architectural concern. Returns relevant rules from the knowledge base."
        )
    )
]

print("Agent tools defined:")
for tool in tools:
    print(f"  - Tool: {tool.name}")
    print(f"    Description: {tool.description[:100]}...")

Agent tools defined:
  - Tool: RetrieveArchitectureRules
    Description: Retrieve Clean Architecture rules, anti-patterns, and violation examples for analyzing Java code. In...


In [19]:
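# Initialize ReAct agent with tools
# (initialize_agent is deprecated in recent LangChain releases and emits a warning, but still works here)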
llm = ChatOpenAI(model="gpt-4.1-nano", api_key=OPENAI_API_KEY)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,  # Shows reasoning process (Thought/Action/Observation)
    handle_parsing_errors=True  # Retries if LLM response format is incorrect
)

print("Agent initialized successfully")
print("  - Agent type: ZERO_SHOT_REACT_DESCRIPTION")
print("  - Verbose mode: ON (reasoning will be visible)")
print("  - Error handling: Enabled")
print("\nAgent will now reason step-by-step using the ReAct pattern.")

Agent initialized successfully
  - Agent type: ZERO_SHOT_REACT_DESCRIPTION
  - Verbose mode: ON (reasoning will be visible)
  - Error handling: Enabled

Agent will now reason step-by-step using the ReAct pattern.




In [20]:
# Select sample for agent analysis
agent_sample_name = "order_repository"
agent_code = LEAKY_SAMPLES[agent_sample_name]

# Craft prompt to encourage tool use and step-by-step reasoning
agent_prompt = (
    f"Analyze the following Java repository interface for Clean Architecture violations. "
    f"First, use the RetrieveArchitectureRules tool to get relevant rules about repositories. "
    f"Then, identify all violations step-by-step.\n\n"
    f"Java Code:\n{agent_code}"
)

print(f"Running agent analysis on: {agent_sample_name}")
print("=" * 70)
print("Watch the agent's reasoning process below:\n")

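# Agent will show: Thought → Action → Observation → ... → Final Answer
# invoke() replaces the deprecated run(); the final answer is under the "output" key
result = agent.invoke({"input": agent_prompt})["output"]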

print("\n" + "=" * 70)
print("Agent's Final Analysis:")
print("=" * 70)
print(result)



Running agent analysis on: order_repository
Watch the agent's reasoning process below:



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mAction: RetrieveArchitectureRules

Action Input: Java code of the OrderRepository interface, focusing on repository design and separation principles
[0m
Observation: [36;1m[1;3m// Repository: Data access
@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {
}
```

```java
// GOOD: Repository with pure data access
@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {
    List<Order> findAll();
    List<Order> findByTotalGreaterThan(double threshold);
}

```java
// BAD: Repository with business logic
@Repository
public interface OrderRepository extends JpaRepository<Order, Long> {[0m
Thought:[32;1m[1;3mQuestion: Analyze the following Java repository interface for Clean Architecture violations. First, use the RetrieveArchitectureRules tool to get relevant rules about repositories. 

In [21]:
# Demonstrate agent handling multiple file context
print("Agent Analysis: Multiple Files from dummy-project")
print("=" * 70)

multi_file_prompt = (
    f"I have a Spring Boot application with potential architecture violations. "
    f"Analyze these three files and identify which layers are violating Clean Architecture:\n\n"
    f"1. Order Controller:\n{LEAKY_SAMPLES['order_controller']}\n\n"
    f"2. Order Repository:\n{LEAKY_SAMPLES['order_repository']}\n\n"
    f"3. Order Entity:\n{LEAKY_SAMPLES['order_entity']}\n\n"
    f"For each file, identify violations and explain their impact on maintainability."
)

print("Agent will analyze all three files...\n")
multi_result = agent.invoke({"input": multi_file_prompt})["output"]

print("\n" + "=" * 70)
print("Multi-File Analysis Result:")
print("=" * 70)
print(multi_result)

Agent Analysis: Multiple Files from dummy-project
Agent will analyze all three files...



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3mThought: To accurately identify which layers are violating Clean Architecture principles, I need to analyze the responsibilities and dependencies within each file. 

- **Order Controller**: Contains business logic like approving high-value orders and modifying order totals directly, which should reside in domain or service layers, not in presentation controllers. This is a violation as controllers should delegate business logic to services.

- **Order Repository**: Contains business rules such as filtering orders and mutating entities within the data access layer. These operations should be in service or domain layers, not in the data access layer.

- **Order Entity**: Contains business logic (calculating discounts) within getter methods, which blurs the domain model responsibilities and introduces logic into what should be a simple data 

**Exercise:**

1. **Add Custom Tool:**
   ```python
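   def suggest_fix(code_description):
       # Toy tool: returns the same hint regardless of input
       return "Move business logic to service layer. Create OrderService class."

   tools.append(Tool(
       name="SuggestFix",
       func=suggest_fix,
       description="Suggest a refactoring approach for a violation. Input: a short description of the violation."
   ))

   # Re-initialize the agent so it can see the new tool
   agent = initialize_agent(
       tools=tools,
       llm=llm,
       agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
       verbose=True,
       handle_parsing_errors=True
   )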
   ```

2. **Different Prompts:**
   - "Prioritize violations by severity (critical, major, minor)"
   - "Find only controller violations, ignore other layers"
   - "Explain which violations would fail a code review"

3. **Test Agent Limits:**
   - Provide non-Java code (Python, JavaScript) - what happens?
   - Ask simple questions ("What is Clean Architecture?") - does agent still call tools?
   - Very long code snippets - does reasoning quality degrade?

4. **Observe Reasoning:**
   - Count how many times agent calls RetrieveArchitectureRules
   - Does it retrieve rules for each file separately in multi-file analysis?
   - When does agent decide it has enough information?
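You can count tool calls without reading the verbose log. LangChain's `AgentExecutor` can return its intermediate steps: pass `return_intermediate_steps=True` to `initialize_agent`, which forwards it to the executor. `agent.invoke({"input": ...})["intermediate_steps"]` is then a list of `(AgentAction, observation)` pairs. With that flag set, `agent.run` stops working because the chain has more than one output key. The counting helper below is a hypothetical sketch that only relies on each action having a `.tool` attribute.

```python
from collections import Counter
from typing import Any, Iterable, Tuple


def count_tool_calls(intermediate_steps: Iterable[Tuple[Any, Any]]) -> Counter:
    """Count how often each tool was called, given (action, observation) pairs."""
    return Counter(action.tool for action, _observation in intermediate_steps)


# Usage in the notebook (agent built with return_intermediate_steps=True):
# out = agent.invoke({"input": multi_file_prompt})
# print(count_tool_calls(out["intermediate_steps"]))
```

An empty counter for the multi-file prompt means the agent answered from the prompt alone, without calling `RetrieveArchitectureRules`.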

---

# Section 3: Workflows (Deterministic Pipelines)

**Goal:** Orchestrate a fixed, predictable sequence of steps: Retrieve → Analyze → Output.

**Why Workflows?**
- **Deterministic:** Same input always produces same sequence of operations
- **Production-ready:** Suitable for CI/CD integration
- **Debuggable:** Easy to trace execution with verbose logging
- **Consistent:** Every code sample analyzed the same way

**Workflow Steps:**
1. **TransformChain (Retrieval):** Fetch relevant rules based on input code
2. **LLMChain (Analysis):** Analyze code with retrieved rules, generate violation report
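A `SequentialChain` threads one set of named values through its steps. Each step reads the keys it needs and adds its own outputs, so the analysis prompt can use both `{code}` and `{rules}`. The plain-Python sketch below illustrates that idea under the assumption that each step is a function from a dict of inputs to a dict of new keys. It is not LangChain's implementation, and `run_sequential` is hypothetical. It also rejects a step that re-emits an existing key, which is the reason the retrieval step returns only `rules`. LangChain performs that check on the declared keys when the chain is built, whereas this sketch checks at run time.

```python
from typing import Callable, Dict, List, Sequence

Step = Callable[[Dict[str, str]], Dict[str, str]]


def run_sequential(steps: Sequence[Step], inputs: Dict[str, str],
                   output_keys: List[str]) -> Dict[str, str]:
    """Run steps in a fixed order, threading one dict of named values through them."""
    known = dict(inputs)
    for step in steps:
        produced = step(known)
        clash = set(produced) & set(known)
        if clash:
            # Re-emitting an existing key (e.g. "code") is treated as an error
            raise ValueError(f"step returned existing keys: {sorted(clash)}")
        known.update(produced)
    return {key: known[key] for key in output_keys}
```

The fixed order is what makes the pipeline deterministic in its *sequence of operations*. The LLM step's wording can still vary between runs.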

**Builds on RAG:** Uses retrieval from Section 1, chains it with LLM analysis.

**vs Agents:** Workflows lack autonomy but guarantee predictable execution.

**Hands-on:**
- Run workflow on different samples
- Observe verbose logging showing each step
- Compare deterministic workflow vs agent flexibility

In [22]:
# Initialize LLM for workflow chains
llm = ChatOpenAI(model="gpt-4.1-nano", api_key=OPENAI_API_KEY)
print("LLM initialized for workflow chains")

LLM initialized for workflow chains


In [23]:
# Chain 1: Retrieval step - wraps retrieval function as TransformChain
def transform_retrieval(inputs):
    """
    Transform function for retrieval chain.
    Takes code as input, retrieves relevant rules from knowledge base.
    Note: Returns only 'rules' to avoid key duplication in SequentialChain.
    """
    code = inputs["code"]
    rules = retrieve_relevant_rules(code)
    return {"rules": rules}

retrieval_chain = TransformChain(
    input_variables=["code"],
    output_variables=["rules"],
    transform=transform_retrieval
)

print("Retrieval chain created (TransformChain)")
print("  - Input: code")
print("  - Output: rules")
print("  - Function: retrieve_relevant_rules() via transform")

Retrieval chain created (TransformChain)
  - Input: code
  - Output: rules
  - Function: retrieve_relevant_rules() via transform


In [24]:
# Chain 2: Analysis step - LLM analyzes code with retrieved rules
analysis_prompt = PromptTemplate.from_template(
    "You are a Java architecture expert analyzing code for Clean Architecture violations.\n\n"
    "Java Code:\n{code}\n\n"
    "Relevant Architecture Rules:\n{rules}\n\n"
    "Task:\n"
    "1. List all violations with exact locations (class, method, line)\n"
    "2. Explain why each violates Clean Architecture principles\n"
    "3. Cite specific rules from the provided architecture rules\n"
    "4. Describe impact on maintainability, testability, and scalability\n"
    "5. Provide refactoring recommendations (which layer should contain the logic)\n\n"
    "Format your analysis clearly with numbered sections for each violation."
)

analysis_chain = LLMChain(
    llm=llm,
    prompt=analysis_prompt,
    output_key="analysis"
)

print("Analysis chain created (LLMChain)")
print("  - Input: code, rules")
print("  - Output: analysis")
print("  - Prompt: Structured violation analysis with citations")

Analysis chain created (LLMChain)
  - Input: code, rules
  - Output: analysis
  - Prompt: Structured violation analysis with citations




In [25]:
# Compose full workflow: Retrieval → Analysis
workflow = SequentialChain(
    chains=[retrieval_chain, analysis_chain],
    input_variables=["code"],
    output_variables=["analysis"],
    verbose=True  # Logs each chain execution
)

print("Workflow created with SequentialChain")
print("  Step 1: TransformChain (retrieval)")
print("  Step 2: LLMChain (analysis)")
print("  - Verbose mode: ON (execution logs will be shown)")
print("\nWorkflow ready for execution")

Workflow created with SequentialChain
  Step 1: TransformChain (retrieval)
  Step 2: LLMChain (analysis)
  - Verbose mode: ON (execution logs will be shown)

Workflow ready for execution


In [26]:
# Execute workflow on leaky controller
workflow_sample_name = "order_controller"
workflow_code = LEAKY_SAMPLES[workflow_sample_name]

print(f"Executing workflow on: {workflow_sample_name} (leaky code)")
print("=" * 70)

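# invoke() replaces calling the chain directly (deprecated); it returns a dict containing "analysis"
result = workflow.invoke({"code": workflow_code})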

print("\n" + "=" * 70)
print("Workflow Output:")
print("=" * 70)
print(result["analysis"])

Executing workflow on: order_controller (leaky code)


[1m> Entering new SequentialChain chain...[0m





[1m> Finished chain.[0m

Workflow Output:
### Analysis of Violations of Clean Architecture Principles

---

### 1. List of Violations with Exact Locations

| Violation # | Class & Method                                              | Line(s) | Description of Violation                                                      |
|--------------|--------------------------------------------------------------|---------|-------------------------------------------------------------------------------|
| 1            | `OrderController.getEligibleOrders()`                        | 12-20   | Controller performs business logic (e.g., applying discounts and approval logic). |
| 2            | `OrderController.getEligibleOrders()`                        | 14-17   | Business decision (e.g., high-value approval) is embedded directly in the controller. |
| 3            | `PostMapping("/orders") createOrder()`                        | 25      | Business validation (`order.getTotal() < 0`) conducted in Co

In [27]:
# Run workflow on all leaky samples for comprehensive analysis
print("Batch Workflow Execution: All Leaky Samples from dummy-project")
print("=" * 70)

batch_results = {}

for sample_name, code in LEAKY_SAMPLES.items():
    if sample_name == "application":
        # Skip the Spring Boot main class (no violations expected)
        continue

    print(f"\nAnalyzing: {sample_name}")
    print("-" * 70)

    try:
        result = workflow.invoke({"code": code})
        batch_results[sample_name] = result["analysis"]
        print(f"Analysis complete for {sample_name}")
        print("Summary (first 400 chars):")
        print(result["analysis"][:400], "...\n")
    except Exception as e:
        # Record the failure and continue with the remaining samples
        print(f"Error analyzing {sample_name}: {e}")
        batch_results[sample_name] = f"Error: {e}"

# Count only real analyses; failed samples are stored with an "Error:" prefix
succeeded = [name for name, text in batch_results.items() if not text.startswith("Error:")]

print("\n" + "=" * 70)
print("Batch Execution Complete")
print(f"Successfully analyzed {len(succeeded)} of {len(batch_results)} files")
print("\nAll results stored in batch_results dictionary")

Batch Workflow Execution: All Leaky Samples from dummy-project

Analyzing: order_entity
----------------------------------------------------------------------


[1m> Entering new SequentialChain chain...[0m

[1m> Finished chain.[0m
Analysis complete for order_entity
Summary (first 400 chars):
**Analysis of Clean Architecture Violations in the Provided Java Code**

---

### 1. List of Violations with Exact Locations

| Violation No. | Location (Class, Method, Line) | Description |
|---|---|---|
| **1.** | `com.example.leakydemo.Order`, method `getDiscountedTotal()`, **line 14** | Business logic (discount calculation) embedded in the Entity getter |

---

### 2. Explanation of Why Each V ...


Analyzing: order_controller
----------------------------------------------------------------------


[1m> Entering new SequentialChain chain...[0m

[1m> Finished chain.[0m
Analysis complete for order_controller
Summary (first 400 chars):
Analysis of Violations of Clean Architecture Principl

**Exercise:**

1. **Extend Workflow with Summarization:**
   ```python
   summary_prompt = PromptTemplate.from_template(
       "Violations: {analysis}\n\n"
       "Provide executive summary: violation count, severity, priority fixes."
   )
   summary_chain = LLMChain(llm=llm, prompt=summary_prompt, output_key="summary")
   
   extended_workflow = SequentialChain(
       chains=[retrieval_chain, analysis_chain, summary_chain],
       input_variables=["code"],
       output_variables=["summary"],
       verbose=True
   )
   ```

2. **Custom Workflow for God Class Detection:**
   - Chain 1: Extract all methods and their responsibilities
   - Chain 2: Identify if multiple layers are mixed in one class
   - Chain 3: Generate refactoring plan (split into separate classes)

3. **Production Integration Ideas:**
   - How to integrate into GitHub Actions?
   - What error handling needed for CI/CD?
   - Rate limiting strategy for OpenAI API?
   - Caching strategy for unchanged files?

4. **Performance Analysis:**
   - Profile execution time per chain
   - Identify bottleneck (retrieval vs LLM call)
   - Optimization: Batch embeddings, cache results?
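Chain 1 of the God Class exercise (extracting methods) does not strictly need an LLM. The sketch below is a `TransformChain`-style function (dict in, dict out) that lists method signatures with a regular expression. The regex is a heuristic assumption. It handles simple Spring-style classes like the dummy project's but misses package-private methods, annotated parameters, and many generics-heavy signatures. `extract_methods` is a hypothetical name.

```python
import re
from typing import Dict

# Heuristic: access modifier, optional static/final, a return type, a name, and a parameter list
METHOD_RE = re.compile(
    r"^[ \t]*((?:public|protected|private)\s+(?:static\s+)?(?:final\s+)?"
    r"[\w<>\[\], ]+?\s+(\w+)\s*\([^)]*\))",
    re.MULTILINE,
)


def extract_methods(inputs: Dict[str, str]) -> Dict[str, str]:
    """Return one method signature per line under the "methods" key."""
    signatures = [m.group(1) for m in METHOD_RE.finditer(inputs["code"])]
    return {"methods": "\n".join(signatures)}
```

To wire it into a workflow, build `TransformChain(input_variables=["code"], output_variables=["methods"], transform=extract_methods)` and let Chain 2's prompt reference `{methods}`.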

---

# Section Comparison: RAG vs Agents vs Workflows

| Aspect | RAG (Section 1) | Agents (Section 2) | Workflows (Section 3) |
|--------|-----------------|--------------------|-----------------------|
| **Autonomy** | None - manual orchestration | High - decides tool usage | Low - fixed sequence |
| **Determinism** | High - fixed steps (LLM wording may vary) | Low - reasoning varies | High - fixed steps (LLM wording may vary) |
| **Use Case** | Quick analysis, experimentation | Complex reasoning, multi-file | Production pipelines, CI/CD |
| **Debugging** | Easy - direct function calls | Moderate - trace reasoning | Easy - verbose logs per step |
| **Latency** | Low - single LLM call | High - multiple reasoning rounds | Low - one retrieval plus one LLM call per file |
| **Flexibility** | Low - requires manual setup | High - adapts to context | Low - predefined flow |
| **Best For** | Interactive exploration | Research, complex scenarios | Automated reviews, batch jobs |

**When to use each approach:**

- **RAG (Section 1):** 
  - Quick violation checks during development
  - Educational purposes (understanding violations)
  - Interactive code review assistance

- **Agents (Section 2):**
  - Complex codebases with multiple files
  - Research and deep analysis requiring reasoning
  - Exploratory analysis with uncertain scope

- **Workflows (Section 3):**
  - CI/CD integration for automated checks
  - Batch processing of many files
  - Production deployments requiring consistency
  - Regular reporting (e.g., weekly architecture reports)

---

# Wrap-Up: Next Steps

## Integration Strategies

### 1. CI/CD Integration (GitHub Actions)
```yaml
name: Architecture Check
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
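      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the PR base branch can be diffed
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Run Architecture Analysis
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          git diff --name-only "origin/${{ github.base_ref }}...HEAD" -- '*.java' > changed_java_files.txt
          # workflow_runner.py is a hypothetical script; a non-zero exit fails the job
          python workflow_runner.py --files changed_java_files.txt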
```
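In CI, API rate limits (HTTP 429) are a common transient failure. The OpenAI Python client already retries some errors itself (configurable via `max_retries`). An outer retry with exponentially growing, jittered waits adds headroom for batch runs. The generic sketch below leaves the choice of retryable exceptions to the caller; with the OpenAI SDK you would pass `openai.RateLimitError` as `retry_on`. `with_backoff` is a hypothetical helper.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T],
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                 max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call()`, retrying on `retry_on` with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^attempt, plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

Assuming the LangChain chain lets the SDK's exception propagate, a runner could call `with_backoff(lambda: workflow.invoke({"code": code}), retry_on=(openai.RateLimitError,))`. The injectable `sleep` keeps the helper testable without real waits.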

### 2. Pre-commit Hooks
- Run workflow on staged Java files
- Block commits with critical violations
- Generate fix suggestions automatically
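Below is a sketch of the hook logic, assuming a Python script installed as `.git/hooks/pre-commit`. Git aborts the commit when that script exits non-zero. `git diff --cached --name-only --diff-filter=ACM` lists staged files that were added, copied, or modified. `is_critical` is a placeholder for whatever wraps the Section 3 workflow and decides what counts as critical. All function names are hypothetical.

```python
import subprocess
import sys
from typing import Callable, List


def staged_java_files(git_output: str) -> List[str]:
    """Keep only .java paths from `git diff --cached --name-only` output."""
    return [p.strip() for p in git_output.splitlines() if p.strip().endswith(".java")]


def run_hook(is_critical: Callable[[str], bool]) -> int:
    """Return 1 (block the commit) if any staged Java file has a critical violation."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    flagged = [path for path in staged_java_files(result.stdout) if is_critical(path)]
    for path in flagged:
        print(f"Blocked: critical architecture violation in {path}", file=sys.stderr)
    return 1 if flagged else 0
```

The hook script itself would end with `sys.exit(run_hook(my_checker))`, where `my_checker` reads the file and applies your severity rule. Note that it reads the working-tree copy, which can differ from what is staged.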

### 3. IDE Integration
- Real-time violation detection as you type
- Inline suggestions and quick fixes
- Integration with existing tools (SonarQube, Checkstyle)

## Scaling Considerations

### Knowledge Base Enhancement
- **Custom Rules:** Add project-specific patterns to knowledge base
- **Domain Examples:** Include examples from your actual codebase
- **Team Patterns:** Document common violations from your team's history
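Custom rule files are easier to adopt if the loader picks up every Markdown file in `knowledge-base/` instead of relying on the hard-coded `kb_files` list. This sketch assumes new files keep the numeric-prefix naming of the existing ones (e.g. `01-layering-principles.md`), so sorted order stays stable. `discover_kb_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def discover_kb_files(directory: str = "knowledge-base") -> List[str]:
    """Return every .md file in `directory`, sorted by name (numeric prefixes keep order)."""
    return [path.as_posix() for path in sorted(Path(directory).glob("*.md"))]
```

With `kb_files = discover_kb_files()`, the existing loading loop and the RAG index rebuild pick up new rule files without further code changes.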

### Performance Optimization
- **Caching:** Cache embeddings for unchanged files
- **Batch Processing:** Analyze multiple files in parallel
- **Incremental:** Only analyze changed files in PRs
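Caching for unchanged files can key on a hash of the file contents rather than its path. That way a renamed but identical file is still a hit, and any edit is a miss. The minimal in-memory sketch below assumes `analyze` is whatever produces the per-file result, for example `lambda code: workflow.invoke({"code": code})["analysis"]`. The same idea applies to cached embeddings. A version tag is mixed into the key so that changing the model or knowledge base invalidates old entries. Persisting the store between CI runs is left out, and `AnalysisCache` is a hypothetical class.

```python
import hashlib
from typing import Callable, Dict


class AnalysisCache:
    """Memoize per-file results by SHA-256 of the file content (plus a version tag)."""

    def __init__(self, analyze: Callable[[str], str], version: str = "v1"):
        self._analyze = analyze
        self._version = version  # bump when the model or knowledge base changes
        self._store: Dict[str, str] = {}
        self.hits = 0

    def _key(self, code: str) -> str:
        return hashlib.sha256(f"{self._version}\n{code}".encode("utf-8")).hexdigest()

    def get(self, code: str) -> str:
        key = self._key(code)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = self._analyze(code)  # miss: run the expensive analysis
        return self._store[key]
```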

### Model Fine-tuning
- Fine-tune embedding model on your codebase
- Collect violation examples from your project
- Build project-specific knowledge base

## Discussion Points

1. **Current Pain Points:**
   - Which violations are most common in your codebase?
   - Where does business logic leak most often?
   - Main refactoring challenges?

2. **Adoption Strategy:**
   - Start with workflows for automated checks
   - Use RAG for education and training
   - Deploy agents for complex legacy code

3. **Metrics:**
   - Track violation frequency over time
   - Measure time saved in code reviews
   - Monitor architecture debt reduction

## Resources

- **LangChain Documentation:** https://python.langchain.com/
- **Clean Architecture (Book):** Robert C. Martin
- **FAISS:** https://github.com/facebookresearch/faiss
- **OpenAI API:** https://platform.openai.com/docs
- **Spring Best Practices:** https://docs.spring.io/spring-framework/reference/

---

**Thank you for participating!**

Questions? Feedback? Discuss with your team and adapt these patterns for your specific context.