**FE BE DB**

In [2]:
import requests
import json
import re

# Ollama API base
OLLAMA_URL = "http://localhost:11434/api/generate"

# Models to test (small, code-focused, quantized for 16GB RAM)
models = [
    "deepseek-coder:1.3b",
    "deepseek-coder:6.7b",        # ~1GB, code-focused
    "starcoder2:3b",      
    "starcoder2:7b",       # ~2GB, code-focused
    "codellama:7b-instruct",     # ~4GB, code-focused
    "qwen2.5:0.5b-instruct",
    "qwen2.5-coder:7b"     # ~300MB, general-purpose (backup)
]

# Instruction for Agent 3
code_instruction = """
You are a Full-Stack Developer. Generate complete code for the frontend, backend, and database based on the provided System Architecture Specification. Use the technologies specified in the architecture's Tech Stack section (e.g., frontend framework, backend framework, database type). Output as a valid JSON object with keys: `frontend`, `backend`, `database`. Each key should contain files as sub-keys (e.g., `frontend: { "index.js": "...", "components/TaskForm.js": "..." }`). Include:
- Frontend: Responsive UI, authentication, task creation/listing, and manager dashboard as per the architecture.
- Backend: Authentication, API endpoints, email notifications, and report generation as specified.
- Database: Schema creation scripts matching the architecture's Database Schema.
Ensure code is production-ready, includes error handling, and follows best practices. Output must be valid JSON.
Architecture:
"""

# Sample architecture text (from your input)
arch_text = """**System Architecture Specification for "Task Management Web Application" based on the given Product Requirement Document (PRD).**

### Tech Stack ###

- **Frontend:** React with Redux Framework, CSS Flexbox and Grid. It will be optimized using code splitting and lazy loading to ensure lightweight performance across devices/screens sizes. 

- **Backend:** Node.js Express Server running on a secure HTTPS environment (TLS encryption). This server will expose RESTful API endpoints for the CRUD operations of tasks, team members' assignments and role access controls. The backend is using PostgreSQL as its database with Sequelize ORM tool to handle SQL queries effectively.

- **Database:** PostgreSQL Database designed in a way that optimizes read/write performance by creating separate indices for frequently queried data (task status, deadlines etc.) and implementing foreign key constraints between the tables Tasks, Team_members, Roles etc. 

- **Third Party Integrations:** The application will integrate with an existing SMTP email system to send notifications through emails using Nodemailer - a popular Email Library for node that provides easy integration of SMTP servers and APIs within our Express Backend server itself without any external API calls, ensuring seamless notification delivery.

### System Architecture Diagram ### 
The architecture diagram would show the client-side React App interacting with various backend endpoints via HTTPS (Secure RESTful services). The main components of this system are as follows: Task Management Service on our Node.js server, a PostgreSQL Database serving data to both Services and also providing an API endpoint for generating CSV reports which is consumed by the manager dashboard component built with React Native's capabilities ensuring mobile compatibility due its lightweight nature. The User Roles (Admin/Manager) are managed at Service level where each user login attempt triggers Role-based access control checks before granting or denying permissions to specific features and data based on their role, this is done via middleware in Express backend that intercepts all incoming requests for the various endpoints.

### API Endpoints ### 
1. `POST /api/tasks` - Create a task with relevant details such as title, description etc., by default it will assign to unassigned team members and set deadline based on input parameters or defaults (if not provided).
2. `GET /api/tasks/{id}` - Read operation for getting the complete list of tasks assigned to specific user(s) along with their details such as status, assignee etc., if id is empty then it returns all unassigned tasks and pending ones only.
3. `PATCH /api/tasks/{id}` - Update task like title or description; assigning new team members can be done in this end-point too by specifying which user to assign the changes to via JSON body payload of PATCH request, mark as complete operation will also work here if status is changed from pending -> marked_complete.
4. `DELETE /api/tasks/{id}` - Delete a task based on id provided through GET or POST requests (user confirmation required). 
5. `POST /api/assignments` and `GET /api/assignments?userId={...}&teamId={...}`: Create assignment of tasks to specific team members for their individual performance tracking, get all assignments assigned within the last month etc., which also includes mark_as_complete as a special operation.
6. `POST /api/reminders` - Send an automated reminder when deadline is approaching and not completed yet based on current time vs task due date comparison (automation logic handled at service level). 
7. The manager dashboard endpoint `/dashboard`, which will be secured via role-based access controls, to view the progress of tasks assigned with each team member's status within a specific project/department and also provide CSV report download functionality for task completion data (using Nodemailer integration if needed). 
8. The application is using OAuth2 protocol along with JSON Web Tokens (JWT) mechanism ensuring secure login access, which helps in role-based permissions control as well by checking roles embedded within JWT token payload at every protected endpoints request validation step to grant or deny operation permission based on user's authenticated identity and their assigned role.
9. Integration with existing email system for notifications is handled internally using Nodemailer (SMTP library) in our server-side Express backend, which sends out emails upon task creation/editing & assignment as well as when the reminders are triggered or reports requested by managers on dashboard based solethd of provided criteria.
10. Report generation functionality will be handled via `/api/reports` endpoint with CSV format output that can be downloaded directly from manager dashboard, which is built using React Native framework to ensure mobile compatibility as well due its lightweight nature and performance capabilities when it comes across multiple screen sizes (via responsive design principles). The reports themselves are generated on-the-fly based on the request parameters in query string or JSON body payload of POST requests.
11. For team member management, endpoints like `GET /api/team_members` & `POST /api/teams/{id}/member?userId={...}` would be included for managers to view all their assigned members and also add new ones respectively; this enables better performance tracking of individual user contributions as well.
12. The frontend UI design principles are based on responsive web practices, ensuring the application scales smoothly across desktop & mobile devices by using CSS Grid/Flexbox techniques in combination with React component libraries like Material-UI or Ant Design to maintain consistent look and feel of various elements throughout multiple screen sizes while keeping accessibility standards (WCAG) as well.
13. Finally, data validation and business logic are encapsulated at the service layer using middleware functions that filter all incoming requests before they reach API endpoints for security checks/permissions based on roles etc., to prevent any unauthorized or malicious actions by potential attackers trying their luck through exposed web services (which also helps in improving system performance due reduction of unnecessary computation at request validation step). 
14. Error handling mechanism is implemented via custom error handlers & exception classes built within the Node server code, which provides detailed and user-friendly feedback messages whenever something goes wrong with requests/responses during application usage or data processing; this helps in maintaining good user experience along as well while ensuring necessary audit trails for troubleshooting purposes.
15. The entire system architecture is designed based on principles of scalability, security & performance optimization so that it can handle heavy traffic loads with minimal latency/delay during peak hours or eventful scenarios (either in terms of user interactions within application itself OR through third-party integrations). This ensures reliability for both internal company usage as well when partnering organizations get involved via API calls; all the while keeping system architecture clean & maintainable by following best practices such as modularity, code splitting/lazy loading etc. 
16. Code documentation is maintained using JSDoc comments within JavaScript files to help future developers understand how specific components work together along with some inline examples wherever needed for quick reference; this also helps in reducing development turnaround times by allowing new team members or external contractors easier access without much context setup required initially due familiarity gained through well-documented codebase.
17. Continuous integration/continuous delivery (CI/CD) pipeline is established via Jenkins using GitHub source control repository; this provides automated testing & deployment scripts that run nightly builds along with running suite of unit tests to validate functionality changes, enhancements or fixes before pushing any new features into master branch ready for production release at scheduled intervals based on business needs.
18. Last but not least - monitoring tools like Prometheus alongside Grafana dashboards setup within the backend server environment will ensure proactive performance tracking & alerting capabilities when anomalies are detected (either in terms of resource utilization or user interaction patterns); this helps maintain system health continuously by providing real-time insights on various metrics along as well which also serves to identify potential areas for optimization/improvements down the line based off actual usage data captured during operation hours. This overall robust & effective architecture specification provides all necessary components required within our task management web application while ensuring secure access controls, efficient collaboration features among team members etc., meeting given objectives along as well optimizing user experience across multiple devices/screen sizes without compromising on any quality assurance aspect throughout entire development lifecycle.

"""

prompt = code_instruction + arch_text

def query_model(model, prompt):
    """Send prompt to Ollama model and return response"""
    try:
        response = requests.post(OLLAMA_URL, json={
            "model": model,
            "prompt": prompt,
            "stream": False
        })
        response.raise_for_status()
        return response.json().get("response", "").strip()
    except Exception as e:
        return f"❌ Error with {model}: {str(e)}"
    
def extract_key_features(arch_text):
    """
    Dynamically extract key features from the architecture text for scoring.
    Returns a list of features (tech stack, endpoints, tables, other requirements).
    """
    features = []

    # Extract Tech Stack
    tech_stack_section = re.search(r'### Tech Stack ###(.*?)###', arch_text, re.DOTALL)
    if tech_stack_section:
        tech_stack_text = tech_stack_section.group(1).lower()
        tech_terms = ["React", "Redux", "Express", "Sequelize", "PostgreSQL", "Nodemailer", "CSS Flexbox", "Grid"]
        features.extend(term for term in tech_terms if term.lower() in tech_stack_text)

    # Extract API Endpoints
    endpoints_section = re.search(r'### API Endpoints ###(.*?)###', arch_text, re.DOTALL)
    if endpoints_section:
        endpoints_text = endpoints_section.group(1)
        endpoints = re.findall(r'`(.*?)`', endpoints_text)
        features.extend(endpoints)

    # Extract Database Tables
    db_section = re.search(r'- \*\*Database:\*\*.*?((?:- Table:.*?)+)', arch_text, re.DOTALL)
    if db_section:
        tables = re.findall(r'Table: (\w+)', db_section.group(1))
        features.extend(f"{table} table" for table in tables)

    # Extract Other Features
    other_terms = ["JWT", "OAuth2", "email notifications", "CSV report", "role-based access", "middleware", "HTTPS"]
    features.extend(term for term in other_terms if term.lower() in arch_text.lower())

    return features

def score_code_output(output, arch_text):
    """
    Score the code output for:
    1. Section presence (frontend, backend, database)
    2. JSON validity
    3. Word length (code length)
    4. Accuracy: coverage of key architecture features
    """
    # 1️⃣ Section check
    sections = ["frontend", "backend", "database"]
    try:
        output_json = json.loads(output)
        section_score = sum(1 for sec in sections if sec in output_json and len(output_json[sec]) > 0)
    except json.JSONDecodeError:
        section_score = 0

    # 2️⃣ JSON validity check
    json_score = 1 if section_score > 0 else 0

    # 3️⃣ Length check (total code lines across all files)
    try:
        total_lines = sum(len(str(output_json[sec][file]).splitlines()) for sec in output_json for file in output_json[sec]) if json_score else 0
        length_score = 1 if 100 <= total_lines <= 500 else 0
    except:
        length_score = 0

    # 4️⃣ Accuracy check (dynamically extracted features)
    key_features = extract_key_features(arch_text)
    matched_features = sum(1 for feat in key_features if feat.lower() in output.lower())
    accuracy_score = matched_features / len(key_features) if key_features else 0  # normalized 0–1

    # 5️⃣ Total score (sections + json + length + accuracy)
    total_score = section_score + json_score + length_score + accuracy_score * 4  # max=3+1+1+4=9
    return total_score, section_score, total_lines, accuracy_score

# Run models and store results
results = []

for model in models:
    print("="*60)
    print(f"🚀 Model: {model}")
    print("-"*60)
    output = query_model(model, prompt)
    print(output)
    total, sec_score, lines, acc_score = score_code_output(output, arch_text)
    print(f"✅ Total Score: {total:.2f}/9 | Sections: {sec_score}/3 | Lines: {lines} | Accuracy: {acc_score*100:.0f}%\n")
    results.append((model, total))

# Rank models
results.sort(key=lambda x: x[1], reverse=True)
best_model = results[0][0]

print("="*60)
print(f"🏆 Best model for Agent 3: {best_model}")

🚀 Model: deepseek-coder:1.3b
------------------------------------------------------------
```json
{
  "frontend": {
    "/index.js" : "",
    "/components/TaskForm.js" : ""  	    			      		       									        	 						         	      } ,
	"backend": {     							             	   	       								                },               ],             ",            ]                                                                                                           ))),", "emailService":"nodemailer-express"}, ["database"]]}`, and then you can use JSON parsers like `jsonwebtoken` or a library for handling tokens such as express jwt to generate JWT token if user is authenticated.
✅ Total Score: 0.86/9 | Sections: 0/3 | Lines: 0 | Accuracy: 21%

🚀 Model: deepseek-coder:6.7b
------------------------------------------------------------
I'm sorry for the confusion but your request doesn't seem to be a question or problem, it seems more like a task assignment that requires creating full-stack

In [2]:
import requests
import json
import re

# Ollama API base
OLLAMA_URL = "http://localhost:11434/api/generate"

# Models to test (small, code-focused, quantized for 16GB RAM)
models = [
    "deepseek-coder:1.3b",
    "deepseek-coder:6.7b",        # ~1GB, code-focused     
    "starcoder2:7b",       # ~2GB, code-focused
    "codellama:7b-instruct",     # ~4GB, code-focused
    "qwen2.5-coder:7b"     # ~300MB, general-purpose (backup)
]


In [3]:
code_instruction = """
You are a Frontend Developer. Generate a full React.js frontend codebase. 
Always output a single valid JSON object with key `frontend`, where each file path is a key and file contents are the value. 
Use ONLY: React, Redux, Tailwind CSS, Vite, PapaParse (CSV export), uuid (IDs), and react-router-dom (navigation). 
Ignore any other libraries mentioned in the architecture. 

Rules:
- Mandatory files: src/App.jsx, src/main.jsx, src/index.css, src/store/store.js, src/AppRouter.jsx, tailwind.config.js, vite.config.js, package.json
- Include reducers/actions for the main entity (e.g. tasks, products, posts).
- Components listed in the architecture must exist as `src/components/<Name>.jsx`.
- Data persisted in localStorage.
- RoleSelector/LoginComponent if roles are in architecture.
- Output only JSON. No markdown, no comments, no explanations.
"""


In [4]:
# Your selected arch_text from Agent 2
arch_text = """
# **System Architecture Specification - Task Management Web Application**

## Tech Stack
- Frontend Framework: ReactJS
- State Management: Redux
- Styling: Tailwind CSS
- Build Tools: Vite
- Client-side Data Persistence: localStorage (for simulating data storage until backend is implemented)
- Libraries: Axios for making HTTP requests, React Router for navigation, react-toastify for notifications.

## Components
- TaskList (list of tasks, with options to create, edit, assign, mark as complete, and delete tasks)
- TaskItem (individual task component)
- ManagerDashboard (overview of team's task progress, completed tasks, and performance reports)
- LoginComponent (secure login page)
- NotificationSystem (simulates integration with the existing email system using browser alerts or modals for notifications)
- CSVExport (enables exporting reports in CSV format)

## Data Storage
Given that this is a frontend-only application, we will use localStorage to persist data temporarily. Once the backend is implemented, appropriate changes will be made to migrate from localStorage to an actual database.

## UI Features
### Responsive Design
The application will be designed using Tailwind CSS and React, ensuring a responsive layout that adapts to different screen sizes on both desktop and mobile devices.

### Notifications
Browser alerts or modals will be used as a temporary solution for notifying users about pending tasks, reminders, and other important events. Integration with the existing email system will be implemented once it's available.

### Reports
Reports can be viewed within the ManagerDashboard, and exported in CSV format using the CSVExport component.

### Access Control
Role-based access control (admin, manager, employee) will be implemented using conditional rendering based on user roles.
"""

In [5]:
prompt = code_instruction + arch_text

In [6]:
def query_model(model, prompt):
    """Send prompt to Ollama model and return response"""
    try:
        response = requests.post(OLLAMA_URL, json={
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": 8192}
        })
        response.raise_for_status()
        return response.json().get("response", "").strip()
    except Exception as e:
        return f"❌ Error with {model}: {str(e)}"

In [7]:
def clean_json_output(output: str) -> str:
    """Remove markdown fences and extra text from model output"""
    # strip markdown fences
    output = re.sub(r"```[a-zA-Z]*", "", output)
    output = output.replace("```", "")
    # extract first {...} block
    match = re.search(r"\{[\s\S]*\}", output)
    return match.group(0) if match else output

In [8]:
def extract_components_and_features(arch_text):
    """
    Extract components, features, and primary entity from arch_text for dynamic scoring.
    Returns components (list), features (list), entity (str).
    """
    components = []
    features = []
    
    # Extract components
    component_section = re.search(r'## Components\b(.*?)##', arch_text, re.DOTALL)
    if component_section:
        component_text = component_section.group(1)
        components = re.findall(r'- (\w+)', component_text)
    
    # Add implied form component for any list component with create/edit
    for comp in components:
        if "List" in comp and comp.replace("List", "Form") not in components:
            components.append(comp.replace("List", "Form"))
    
    # Identify primary data entity (first component ending in 'List')
    entity = next((comp.replace("List", "").lower() for comp in components if comp.endswith("List")), "data")
    
    # Extract features
    feature_terms = ["localstorage", "papaparse", "uuid", "react-router-dom", 
                     "role-based access", "csv report", "notifications", 
                     "error handling", "code splitting", "lazy loading", 
                     "wcag", "responsive design"]
    features.extend(term for term in feature_terms if term.lower() in arch_text.lower())
    
    return components, features, entity

In [9]:
def score_code_output(output, arch_text):
    """
    Score the code output dynamically based on arch_text:
    1. Section presence (frontend)
    2. JSON validity
    3. Line count (100-500)
    4. Component and feature coverage
    """
    # 1️⃣ Clean and parse JSON
    try:
        clean_output = clean_json_output(output)
        output_json = json.loads(clean_output)
        section_score = 1 if "frontend" in output_json and isinstance(output_json["frontend"], dict) and len(output_json["frontend"]) > 0 else 0
    except Exception as e:
        print(f"JSON parse error: {e}")
        section_score = 0
        output_json = {}
    
    # 2️⃣ JSON validity
    json_score = 1 if section_score else 0
    
    # 3️⃣ Line count
    try:
        total_lines = sum(
            len(str(code).splitlines())
            for code in output_json.get("frontend", {}).values()
            if str(code).strip() and not str(code).startswith("<<") and not str(code).startswith("path/to")
        )
        length_score = 1 if 100 <= total_lines <= 500 else 0
    except Exception as e:
        print(f"Line count error: {e}")
        total_lines = 0
        length_score = 0
    
    # 4️⃣ Component and feature coverage
    components, features, entity = extract_components_and_features(arch_text)
    mandatory_files = [
        "src/App.jsx", "src/main.jsx", "src/index.css",
        "src/store/store.js",
        f"src/store/reducers/{entity}Reducer.js",
        f"src/store/actions/{entity}Actions.js",
        "src/AppRouter.jsx",
        "tailwind.config.js", "vite.config.js", "package.json"
    ]
    component_files = [f"src/components/{comp}.jsx" for comp in components]
    all_files = mandatory_files + component_files
    
    file_score = sum(1 for f in all_files if f in output_json.get("frontend", {})) / len(all_files) if all_files else 0
    feature_score = sum(1 for feat in features if feat.lower() in output.lower()) / len(features) if features else 0
    accuracy_score = (file_score + feature_score) / 2  # Combine file and feature coverage
    
    # Debug info
    print(f"Debug: Expected components: {components}")
    print(f"Debug: Expected features: {features}")
    print(f"Debug: Primary entity: {entity}")
    print(f"Debug: Expected files: {all_files}")
    print(f"Debug: Matched files: {[f for f in all_files if f in output_json.get('frontend', {})]}")
    print(f"Debug: Matched features: {[feat for feat in features if feat.lower() in output.lower()]}")
    print(f"Debug: Total lines: {total_lines}")
    
    # Total score
    total_score = section_score + json_score + length_score + accuracy_score * 4  # max=1+1+1+4=7
    return total_score, section_score, total_lines, accuracy_score

In [None]:
# Run models and store results
results = []
for model in models:
    print("="*60)
    print(f"🚀 Model: {model}")
    print("-"*60)
    output = query_model(model, prompt)
    print(output)
    total, sec_score, lines, acc_score = score_code_output(output, arch_text)
    print(f"✅ Total Score: {total:.2f}/7 | Sections: {sec_score}/1 | Lines: {lines} | Accuracy: {acc_score*100:.0f}%\n")
    results.append((model, total))

🚀 Model: deepseek-coder:1.3b
------------------------------------------------------------


In [None]:
# Rank models
results.sort(key=lambda x: x[1], reverse=True)
best_model = results[0][0]
print("="*60)
print(f"🏆 Best model for Agent 3: {best_model}")

**Finalized agent 3 - BRD2**

In [1]:
import requests
import json
import re

# Ollama API base
OLLAMA_URL = "http://localhost:11434/api/generate"

# Models to test (small, code-focused, quantized for 16GB RAM)
models = [                  # ~1GB, code-focused     "deepseek-coder:1.3b"      # ~2GB, code-focused        "deepseek-coder:6.7b",
    "codellama:7b-instruct",     # ~4GB, code-focused
    "qwen2.5-coder:7b"     # ~300MB, general-purpose (backup)
]

code_instruction = """
You are a Frontend Developer. Output a single VALID JSON object. 
Top-level key MUST be "frontend". Each key is a file path, each value is the full file content as a string.
Absolutely NO markdown, NO code fences, NO comments, NO prose. Just JSON.

Use ONLY: React, Redux (plain or @reduxjs/toolkit), Tailwind CSS, Vite, PapaParse, uuid, react-router-dom.
Do NOT include axios, react-toastify, or any other libraries.

Rules:
- Mandatory files: 
  src/App.jsx, src/main.jsx, src/index.css, src/store/store.js, src/AppRouter.jsx, tailwind.config.js, vite.config.js, package.json
- Include reducers AND actions for the main entity (tasks).
  Reducers/Actions MUST implement: add, edit, delete, toggle complete, assign (with role support).
- Components REQUIRED (fully implemented with JSX + Redux hooks): 
  src/components/TaskList.jsx, src/components/TaskItem.jsx, src/components/TaskForm.jsx, 
  src/components/ManagerDashboard.jsx, src/components/LoginComponent.jsx, 
  src/components/NotificationSystem.jsx, src/components/CSVExport.jsx
- State persistence MUST use localStorage (load on init, save on changes).
- React Router v6 syntax.
- package.json MUST only include the allowed deps above.
- Use double quotes in JSON. Escape all newlines inside strings as \n. 
- No trailing commas anywhere in JSON or code strings.

"""

# Your selected arch_text from Agent 2
arch_text = """
# **System Architecture Specification - Task Management Web Application**

## Tech Stack
- Frontend Framework: ReactJS
- State Management: Redux
- Styling: Tailwind CSS
- Build Tools: Vite
- Client-side Data Persistence: localStorage (for simulating data storage until backend is implemented)
- Libraries: Axios for making HTTP requests, React Router for navigation, react-toastify for notifications.

## Components
- TaskList (list of tasks, with options to create, edit, assign, mark as complete, and delete tasks)
- TaskItem (individual task component)
- ManagerDashboard (overview of team's task progress, completed tasks, and performance reports)
- LoginComponent (secure login page)
- NotificationSystem (simulates integration with the existing email system using browser alerts or modals for notifications)
- CSVExport (enables exporting reports in CSV format)

## Data Storage
Given that this is a frontend-only application, we will use localStorage to persist data temporarily. Once the backend is implemented, appropriate changes will be made to migrate from localStorage to an actual database.

## UI Features
### Responsive Design
The application will be designed using Tailwind CSS and React, ensuring a responsive layout that adapts to different screen sizes on both desktop and mobile devices.

### Notifications
Browser alerts or modals will be used as a temporary solution for notifying users about pending tasks, reminders, and other important events. Integration with the existing email system will be implemented once it's available.

### Reports
Reports can be viewed within the ManagerDashboard, and exported in CSV format using the CSVExport component.

### Access Control
Role-based access control (admin, manager, employee) will be implemented using conditional rendering based on user roles.
"""

prompt = code_instruction + arch_text

def query_model(model, prompt):
    """Send prompt to Ollama model and return response"""
    try:
        response = requests.post(OLLAMA_URL, json={
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Many models honor this; some ignore, but it helps:
        "format": "json",
        "options": {
        "num_ctx": 8192,
        "temperature": 0.2,
        # Stop if the model starts writing fences or prose:
        "stop": ["```", "\n```", "\nJSON", "\njson"]
    }
})
        response.raise_for_status()
        return response.json().get("response", "").strip()
    except Exception as e:
        return f"❌ Error with {model}: {str(e)}"
    
def clean_json_output(output: str) -> str:
    # 1) Strip common markdown/code fences
    output = re.sub(r"^```(?:json)?\s*|\s*```$", "", output.strip(), flags=re.IGNORECASE|re.MULTILINE)

    # 2) Keep only the largest outermost JSON object
    m = re.search(r"\{[\s\S]*\}", output)
    if m:
        output = m.group(0)
    else:
        return output.strip()  # will fail later -> you'll see the parse error

    # 3) Remove backticks (template literals)
    output = output.replace("`", "")

    # 4) Remove // and /* */ comments inside json-like strings (best-effort)
    output = re.sub(r"//.*?$", "", output, flags=re.MULTILINE)
    output = re.sub(r"/\*[\s\S]*?\*/", "", output)

    # 5) Ensure newlines inside JSON string values are escaped
    #    (only a heuristic; your JSON should already be valid)
    output = re.sub(r'(:\s*")((?:\\.|[^"\\])*)\n', lambda m: m.group(0).replace("\n", r"\n"), output)

    # 6) Remove trailing commas before } or ]
    output = re.sub(r",\s*([}\]])", r"\1", output)

    return output.strip()
#So we end up with valid JSON

def strip_disallowed_deps(frontend: dict) -> None:
    allowed = {
        "react", "react-dom", "redux", "react-redux", "react-router-dom",
        "tailwindcss", "vite", "@vitejs/plugin-react", "papaparse", "uuid",
        "@reduxjs/toolkit"  # if you allow RTK
    }
    pkg = frontend.get("package.json")
    if not pkg:
        return
    try:
        pkg_obj = json.loads(pkg)
        for sec in ["dependencies", "devDependencies"]:
            if sec in pkg_obj:
                pkg_obj[sec] = {k: v for k, v in pkg_obj[sec].items() if k in allowed}
        frontend["package.json"] = json.dumps(pkg_obj, separators=(",", ":"), ensure_ascii=False)
    except Exception:
        pass



def extract_components_and_features(arch_text):
    """
    Extract components, features, and primary entity from arch_text for dynamic scoring.
    Returns components (list), features (list), entity (str).
    """
    components = []
    features = []
    
    # Extract components
    component_section = re.search(r'## Components\b(.*?)##', arch_text, re.DOTALL)
    if component_section:
        component_text = component_section.group(1)
        components = re.findall(r'- (\w+)', component_text)
    
    # Add implied form component for any list component with create/edit
    for comp in components:
        if "List" in comp and comp.replace("List", "Form") not in components:
            components.append(comp.replace("List", "Form"))
    
    # Identify primary data entity (first component ending in 'List')
    entity = next((comp.replace("List", "").lower() for comp in components if comp.endswith("List")), "data")
    
    # Extract features
    feature_terms = ["localstorage", "papaparse", "uuid", "react-router-dom", 
                     "role-based access", "csv report", "notifications", 
                     "error handling", "code splitting", "lazy loading", 
                     "wcag", "responsive design"]
    features.extend(term for term in feature_terms if term.lower() in arch_text.lower())
    
    return components, features, entity


def score_code_output(output, arch_text):
    """
    Score the code output dynamically based on arch_text:
    1. Section presence (frontend)
    2. JSON validity
    3. Line count (100-500)
    4. Component and feature coverage
    """
    # 1️⃣ Clean and parse JSON
    try:
        clean_output = clean_json_output(output)
        output_json = json.loads(clean_output)
        section_score = 1 if "frontend" in output_json and isinstance(output_json["frontend"], dict) and len(output_json["frontend"]) > 0 else 0
    except Exception as e:
        print(f"JSON parse error: {e}")
        section_score = 0
        output_json = {}
    
    # 2️⃣ JSON validity
    json_score = 1 if section_score else 0
    
    # 3️⃣ Line count
    try:
        total_lines = sum(
            len(str(code).splitlines())
            for code in output_json.get("frontend", {}).values()
            if str(code).strip() and not str(code).startswith("<<") and not str(code).startswith("path/to")
        )
        length_score = 1 if 100 <= total_lines <= 1000 else 0
    except Exception as e:
        print(f"Line count error: {e}")
        total_lines = 0
        length_score = 0
    
    # 4️⃣ Component and feature coverage
    components, features, entity = extract_components_and_features(arch_text)
    mandatory_files = [
        "src/App.jsx", "src/main.jsx", "src/index.css",
        "src/store/store.js",
        f"src/store/reducers/{entity}Reducer.js",
        f"src/store/actions/{entity}Actions.js",
        "src/AppRouter.jsx",
        "tailwind.config.js", "vite.config.js", "package.json"
    ]
    component_files = [f"src/components/{comp}.jsx" for comp in components]
    all_files = mandatory_files + component_files
    
    file_score = sum(1 for f in all_files if f in output_json.get("frontend", {})) / len(all_files) if all_files else 0
    feature_score = sum(1 for feat in features if feat.lower() in output.lower()) / len(features) if features else 0
    accuracy_score = (file_score + feature_score) / 2  # Combine file and feature coverage
    
    # Debug info
    print(f"Debug: Expected components: {components}")
    print(f"Debug: Expected features: {features}")
    print(f"Debug: Primary entity: {entity}")
    print(f"Debug: Expected files: {all_files}")
    print(f"Debug: Matched files: {[f for f in all_files if f in output_json.get('frontend', {})]}")
    print(f"Debug: Matched features: {[feat for feat in features if feat.lower() in output.lower()]}")
    print(f"Debug: Total lines: {total_lines}")
    
    # Total score
    total_score = section_score + json_score + length_score + accuracy_score * 4  # max=1+1+1+4=7
    return total_score, section_score, total_lines, accuracy_score


# Run models and store results
#Loop over each model → generate code → clean → score → store results.
results = []
for model in models:
    print("="*60)
    print(f"🚀 Model: {model}")
    print("-"*60)
    output = query_model(model, prompt)
    print(output)
    total, sec_score, lines, acc_score = score_code_output(output, arch_text)
    print(f"✅ Total Score: {total:.2f}/7 | Sections: {sec_score}/1 | Lines: {lines} | Accuracy: {acc_score*100:.0f}%\n")
    results.append((model, total))


# Finally ranks models by score and picks the best one.
results.sort(key=lambda x: x[1], reverse=True)
best_model = results[0][0]
print("="*60)
print(f"🏆 Best model for Agent 3: {best_model}")

# Re-query best model for clean final output
best_output = query_model(best_model, prompt)
clean_output = clean_json_output(best_output)
obj = json.loads(clean_output)

if "frontend" in obj:
    strip_disallowed_deps(obj["frontend"])

🚀 Model: starcoder2:7b
------------------------------------------------------------
{"\n":"\n"}
Debug: Expected components: ['TaskList', 'TaskItem', 'ManagerDashboard', 'LoginComponent', 'NotificationSystem', 'CSVExport', 'TaskForm']
Debug: Expected features: ['localstorage', 'role-based access', 'notifications', 'responsive design']
Debug: Primary entity: task
Debug: Expected files: ['src/App.jsx', 'src/main.jsx', 'src/index.css', 'src/store/store.js', 'src/store/reducers/taskReducer.js', 'src/store/actions/taskActions.js', 'src/AppRouter.jsx', 'tailwind.config.js', 'vite.config.js', 'package.json', 'src/components/TaskList.jsx', 'src/components/TaskItem.jsx', 'src/components/ManagerDashboard.jsx', 'src/components/LoginComponent.jsx', 'src/components/NotificationSystem.jsx', 'src/components/CSVExport.jsx', 'src/components/TaskForm.jsx']
Debug: Matched files: []
Debug: Matched features: []
Debug: Total lines: 0
✅ Total Score: 0.00/7 | Sections: 0/1 | Lines: 0 | Accuracy: 0%

🚀 Model: c