EQUIREMENTS_UPGRADED.md
Web2API transforms any web service into an OpenAI-compatible API endpoint using intelligent browser automation and auto-discovery.
Core Flow: URL + Credentials → Auto-Discovery → OpenAI API Endpoint
Key Innovation: Zero manual configuration - the system discovers service capabilities, authentication flows, and operations automatically through intelligent analysis.
- INTEGRATE all 157 Owl-Browser commands properly
- Complete Owl-Browser Command Mapping
- Architecture Transformation Strategy
- Discovery Pipeline Implementation
- Execution Engine with Owl-Browser Integration
- Live Viewport Streaming
- Database Schema Extensions
- Implementation Phases
- Success Criteria
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_detect_captcha` | Detect CAPTCHA during login | web2api/builder/discovery/form_analyzer.py |
| `browser_classify_captcha` | Classify CAPTCHA type | NEW: auth/captcha_handler.py |
| `browser_solve_text_captcha` | Solve text-based CAPTCHAs | NEW: auth/captcha_handler.py |
| `browser_solve_image_captcha` | Solve image-based CAPTCHAs | NEW: auth/captcha_handler.py |
| `browser_solve_captcha` | Universal CAPTCHA solver | NEW: auth/captcha_handler.py |
| `browser_find_element` | Find login forms, buttons | web2api/builder/discovery/form_analyzer.py |
| `browser_get_cookies` | Extract session cookies after login | NEW: auth/session_manager.py |
| `browser_set_cookie` | Restore saved session | NEW: auth/session_manager.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_ai_analyze` | AI-powered page analysis | web2api/builder/analyzer/visual_analyzer.py |
| `browser_query_page` | Query page capabilities | web2api/builder/analyzer/page_analyzer.py |
| `browser_screenshot` | Capture UI for vision models | web2api/builder/analyzer/visual_analyzer.py |
| `browser_get_html` | Extract DOM structure | web2api/builder/analyzer/page_analyzer.py |
| `browser_get_markdown` | Extract content as markdown | web2api/builder/analyzer/page_analyzer.py |
| `browser_extract_text` | Extract visible text | web2api/builder/analyzer/page_analyzer.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_is_enabled` | CRITICAL: Detect when "Send" button re-enables (response complete) | MODIFY: runner/test_runner.py → execution/operation_runner.py |
| `browser_ai_extract` | Extract response from chat interface | MODIFY: runner/test_runner.py → execution/operation_runner.py |
| `browser_extract_text` | Extract text content | MODIFY: runner/test_runner.py → execution/operation_runner.py |
| `browser_click` | Click buttons, submit forms | web2api/concurrency/browser_pool.py |
| `browser_type` | Fill input fields | web2api/concurrency/browser_pool.py |
| `browser_upload_file` | Upload files for services that support it | MODIFY: runner/test_runner.py |
| `browser_select_option` | Select dropdown options | web2api/concurrency/browser_pool.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `start_live_stream` | CRITICAL: Live viewport during discovery | NEW: execution/live_viewport.py |
| `stop_live_stream` | Stop live viewport | NEW: execution/live_viewport.py |
| `get_live_stream_stats` | Get stream statistics | NEW: execution/live_viewport.py |
| `list_live_streams` | List active streams | NEW: execution/live_viewport.py |
| `get_live_frame` | Get current frame | NEW: execution/live_viewport.py |
| `start_video_recording` | Record discovery process | NEW: execution/video_recorder.py |
| `stop_video_recording` | Stop recording | NEW: execution/video_recorder.py |
| `download_video_recording` | Download recorded video | NEW: execution/video_recorder.py |
| `get_video_recording_stats` | Get recording stats | NEW: execution/video_recorder.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `new_tab` | Create isolated service context | MODIFY: concurrency/browser_pool.py |
| `switch_tab` | Switch between service contexts | MODIFY: concurrency/browser_pool.py |
| `close_tab` | Close service context | MODIFY: concurrency/browser_pool.py |
| `get_tabs` | List all service tabs | MODIFY: concurrency/browser_pool.py |
| `get_active_tab` | Get current service tab | MODIFY: concurrency/browser_pool.py |
| `get_tab_count` | Count service tabs | MODIFY: concurrency/browser_pool.py |
| `set_popup_policy` | Handle popups during discovery | MODIFY: concurrency/browser_pool.py |
| `get_blocked_popups` | Check blocked popups | MODIFY: concurrency/browser_pool.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_navigate` | Navigate to service URL | web2api/concurrency/browser_pool.py |
| `browser_reload` | Reload page | web2api/concurrency/browser_pool.py |
| `browser_go_back` | Go back in history | web2api/concurrency/browser_pool.py |
| `browser_go_forward` | Go forward in history | web2api/concurrency/browser_pool.py |
| `browser_wait_for_selector` | Wait for elements | web2api/concurrency/browser_pool.py |
| `browser_wait_for_navigation` | Wait for page load | web2api/concurrency/browser_pool.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_get_element_count` | Count elements | web2api/builder/analyzer/page_analyzer.py |
| `browser_get_element_text` | Get element text | web2api/builder/analyzer/page_analyzer.py |
| `browser_get_element_html` | Get element HTML | web2api/builder/analyzer/page_analyzer.py |
| `browser_get_element_attribute` | Get attributes | web2api/builder/analyzer/page_analyzer.py |
| `browser_is_visible` | Check element visibility | web2api/builder/analyzer/page_analyzer.py |
| `browser_is_hidden` | Check if hidden | web2api/builder/analyzer/page_analyzer.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_fill_form` | Auto-fill login forms | MODIFY: builder/discovery/form_analyzer.py → auth/form_filler.py |
| `browser_check` | Check checkboxes | web2api/concurrency/browser_pool.py |
| `browser_uncheck` | Uncheck checkboxes | web2api/concurrency/browser_pool.py |
| `browser_hover` | Hover over elements | web2api/concurrency/browser_pool.py |
| `browser_drag` | Drag and drop | web2api/concurrency/browser_pool.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_evaluate` | Execute custom JS | web2api/concurrency/browser_pool.py |
| `browser_evaluate_on_page` | Evaluate in page context | web2api/concurrency/browser_pool.py |
| `browser_get_current_url` | Get current URL | web2api/concurrency/browser_pool.py |
| `browser_get_page_title` | Get page title | web2api/concurrency/browser_pool.py |
| Owl-Browser Command | Web2API Use Case | Integration File |
|---|---|---|
| `browser_get_network_requests` | Monitor API calls | MODIFY: builder/discovery/api_detector.py |
| `browser_wait_for_response` | Wait for API response | web2api/concurrency/browser_pool.py |
| `browser_get_console_logs` | Get console errors | web2api/concurrency/browser_pool.py |
| `browser_get_metrics` | Get performance metrics | web2api/concurrency/browser_pool.py |
Target Purpose: Web2API service management + OpenAI endpoints
- POST /build
- POST /run
- GET /results
- POST /api/services # Register service
- GET /api/services # List services
- GET /api/services/:id # Get service
- PUT /api/services/:id # Update config
- DELETE /api/services/:id # Delete service
- POST /api/services/:id/discover # Trigger discovery
- WS /ws/services/:id # WebSocket for live updates
- POST /v1/chat/completions # Main endpoint
- GET /v1/models # List services as models
- CORS middleware
- Error handlers
- Health check
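
The `/v1/chat/completions` route listed above has to translate between OpenAI's request shape and a registered service. The following is a minimal sketch of that translation, assuming model names follow a `service_id:model` convention and that token counts are only rough estimates. `parse_model`, `estimate_tokens`, and `build_chat_response` are hypothetical helpers for illustration, not part of Web2API or any OpenAI SDK.

```python
import time
import uuid
from typing import Optional, Tuple


def parse_model(model: str) -> Tuple[str, Optional[str]]:
    """Split 'service_id:model' into its parts; the model part is optional."""
    service_id, _, variant = model.partition(":")
    return service_id, (variant or None)


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4) if text else 0


def build_chat_response(model: str, prompt: str, content: str) -> dict:
    """Wrap extracted text in an OpenAI-style chat.completion envelope."""
    prompt_tokens = estimate_tokens(prompt)
    completion_tokens = estimate_tokens(content)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:24]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            # The total is the sum of the two estimates, so the fields stay consistent.
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Summing the two estimates keeps `total_tokens` consistent with the other two fields, which clients that track usage may rely on.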
---
#### `concurrency/browser_pool.py` → `concurrency/browser_pool.py` (Enhance)
**Current Purpose**: Pool browser contexts for parallel tests
**Target Purpose**: Pool browser contexts per service (isolation)
**Changes Required**:
```python
# ADD service-specific context management
+ async def acquire_service_context(self, service_id: str) -> BrowserContext:
+ """Get or create context for specific service."""
+ if service_id not in self._service_contexts:
+ context = await self._create_service_context(service_id)
+ self._service_contexts[service_id] = context
+ return self._service_contexts[service_id]
# ADD tab management for multi-service
+ async def new_service_tab(self, service_id: str):
+ """Create new tab for service isolation."""
+
+ async def switch_service_tab(self, service_id: str, tab_index: int):
+ """Switch to service tab."""
# ADD session persistence
+ async def save_service_session(self, service_id: str):
+ """Save cookies for service."""
+ cookies = await self._browser.browser_get_cookies(
+ context_id=self._service_contexts[service_id].id
+ )
+ await self._storage.save_session(service_id, cookies)
+ async def restore_service_session(self, service_id: str):
+ """Restore saved session."""
+ cookies = await self._storage.get_session(service_id)
+ await self._browser.browser_set_cookie(
+ context_id=self._service_contexts[service_id].id,
+ cookies=cookies
+ )
# KEEP existing pooling logic
- Lifecycle management
- Health checks
- Resource cleanup
```

**Current Purpose**: Analyze page structure for test generation
**Target Purpose**: Extract service capabilities (features)
**Changes Required**:
# ADD Owl-Browser AI commands
+ async def detect_service_capabilities(self, browser, context_id):
+ """Use browser_ai_analyze to detect features."""
+ analysis = await browser.browser_ai_analyze({
+ "context_id": context_id,
+ "question": "What capabilities does this service offer? (chat, image generation, code execution, etc.)"
+ })
+ return self._parse_capabilities(analysis)
+ async def detect_ui_elements(self, browser, context_id):
+ """Use browser_query_page to find interactive elements."""
+ elements = await browser.browser_query_page({
+ "context_id": context_id,
+ "question": "List all buttons, inputs, dropdowns, and their purposes"
+ })
+ return self._parse_elements(elements)
# ADD model/feature detection
+ async def detect_available_models(self, browser, context_id):
+ """Find model selector options."""
+ models = await browser.browser_query_page({
+ "context_id": context_id,
+ "question": "What models or AI engines are available?"
+ })
+ return models
# KEEP existing DOM analysis
- HTML parsing
- Element classification
- Interactive element detection

**Current Purpose**: Vision model integration for UI analysis
**Target Purpose**: Same (perfect for feature detection)
**No Changes Needed**: Already integrated with vision models.
**Reuse %**: 100%
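
`detect_service_capabilities` above hands the raw answer from `browser_ai_analyze` to `_parse_capabilities`, which is not shown. One way such a helper might work is simple keyword matching, sketched below. It assumes the answer arrives as free text under an `answer` key, which is an assumption, since the response shape of Owl-Browser is not specified here. The keyword table and the function name `parse_capabilities` are likewise hypothetical.

```python
# Hypothetical keyword table: capability name -> phrases that suggest it.
CAPABILITY_KEYWORDS = {
    "chat": ("chat", "conversation", "message"),
    "image_generation": ("image generation", "generate images", "text-to-image"),
    "code_execution": ("code execution", "run code", "interpreter"),
    "file_upload": ("file upload", "attachment", "upload"),
    "search": ("search", "web browsing"),
}


def parse_capabilities(analysis: dict) -> list:
    """Return capability names whose keywords appear in the analysis text."""
    # Assumption: the free-text answer lives under the "answer" key.
    text = str(analysis.get("answer", "")).lower()
    return [
        name
        for name, phrases in CAPABILITY_KEYWORDS.items()
        if any(phrase in text for phrase in phrases)
    ]
```

Keyword matching is crude and will miss paraphrases; a structured (JSON) answer requested in the prompt would likely be more reliable if the command supports it.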
**Current Purpose**: Detect input fields and generate test cases
**Target Purpose**: Detect authentication forms + CAPTCHA handling
**Changes Required**:
# ADD auth-specific detection
+ async def detect_login_form(self, elements, context_id):
+     """Identify login form elements in the given browser context."""
+     email_field = await self._browser.browser_find_element({
+         "context_id": context_id,
+         "description": "email or username input field"
+     })
+     password_field = await self._browser.browser_find_element({
+         "context_id": context_id,
+         "description": "password input field"
+     })
+     return {"email": email_field, "password": password_field}
+ async def handle_captcha(self, context_id):
+ """Detect and solve CAPTCHA if present."""
+ has_captcha = await self._browser.browser_detect_captcha({
+ "context_id": context_id
+ })
+
+ if has_captcha.get("found"):
+ captcha_type = await self._browser.browser_classify_captcha({
+ "context_id": context_id
+ })
+
+ if captcha_type["type"] == "text":
+ solved = await self._browser.browser_solve_text_captcha({
+ "context_id": context_id
+ })
+ elif captcha_type["type"] == "image":
+ solved = await self._browser.browser_solve_image_captcha({
+ "context_id": context_id
+ })
+ else:
+ solved = await self._browser.browser_solve_captcha({
+ "context_id": context_id
+ })
+
+ return solved
+ return None
# KEEP existing field detection
- Field type detection (email, password, etc.)
- Validation rule inference
- Test case generation

**Current Purpose**: Execute test steps sequentially
**Target Purpose**: Execute discovered operations with streaming
**Critical Changes**:
# ADD response detection using browser_is_enabled
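+ async def wait_for_response_complete(self, browser, context_id, submit_selector, timeout=60.0):
+     """
+     CRITICAL: Use browser_is_enabled to detect when the response is ready.
+
+     Most chat interfaces disable the "Send" button while generating a response.
+     When it re-enables, the response is complete. Raises TimeoutError if the
+     button has not re-enabled after `timeout` seconds, so a stuck page cannot
+     block the runner forever.
+     """
+     loop = asyncio.get_running_loop()
+     deadline = loop.time() + timeout
+     while loop.time() < deadline:
+         is_enabled = await browser.browser_is_enabled({
+             "context_id": context_id,
+             "selector": submit_selector
+         })
+
+         if is_enabled.get("enabled"):
+             return  # Response complete
+
+         await asyncio.sleep(0.5)
+     raise TimeoutError(f"Response not complete after {timeout} seconds")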
# ADD response extraction
+ async def extract_response(self, browser, context_id, output_selector):
+ """Extract AI response from chat interface."""
+ # Use AI-powered extraction
+ response = await browser.browser_ai_extract({
+ "context_id": context_id,
+ "selector": output_selector,
+ "prompt": "Extract the AI assistant's response text"
+ })
+
+ # Fallback to text extraction
+ if not response.get("content"):
+ response = await browser.browser_extract_text({
+ "context_id": context_id,
+ "selector": output_selector
+ })
+
+ return response.get("content", "")
# ADD file upload support
+ async def upload_file_if_needed(self, browser, context_id, file_path):
+ """Handle file upload for services that support it."""
+ await browser.browser_upload_file({
+ "context_id": context_id,
+ "files": [file_path]
+ })
# MODIFY execution loop to support streaming
- async def run_test(self, test):
+ async def execute_operation(self, service_id, operation, params, websocket=None):
+ """Execute operation with real-time streaming."""
+ if websocket:
+ await websocket.send_json({"type": "log", "message": "Starting operation"})
+
+ for step in operation.execution_steps:
+ await self._execute_step(step, websocket)
+
+ # Extract and return result
+ result = await self.extract_response(...)
+ return result
# KEEP existing step execution
- Navigate
- Click
- Type
- Wait for selectors

**Current Purpose**: Selector recovery when UI changes
**Target Purpose**: Same
**No Changes Needed**: Perfect for error recovery.
**Reuse %**: 100%
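
The execution loop above delegates each step to `_execute_step`, and the step dictionaries built during discovery use `{{message}}`-style placeholders. The following is a minimal sketch of how such a dispatcher could look. It assumes each step is a dict with an `action` key and that browser commands take a single dict argument, as in the snippets above. `ACTION_TO_COMMAND`, `render_step`, and `execute_step` are hypothetical names, not existing Web2API code.

```python
import asyncio

# Step action -> Owl-Browser command name (each command appears in the tables above).
ACTION_TO_COMMAND = {
    "navigate": "browser_navigate",
    "click": "browser_click",
    "type": "browser_type",
    "wait_for_selector": "browser_wait_for_selector",
}


def render_step(step: dict, params: dict) -> dict:
    """Replace {{name}} placeholders in string values with request parameters."""
    rendered = {}
    for key, value in step.items():
        if isinstance(value, str):
            for name, replacement in params.items():
                value = value.replace("{{" + name + "}}", str(replacement))
        rendered[key] = value
    return rendered


async def execute_step(browser, context_id: str, step: dict, params: dict):
    """Render one step and dispatch it to the matching browser command."""
    step = render_step(step, params)
    command = ACTION_TO_COMMAND.get(step["action"])
    if command is None:
        raise ValueError(f"Unsupported action: {step['action']}")
    # Drop bookkeeping keys; the rest is passed through as command arguments.
    args = {k: v for k, v in step.items() if k not in ("action", "description")}
    # Steps carry "value" for typed text, while browser_type is called with "text".
    if "value" in args:
        args["text"] = args.pop("value")
    return await getattr(browser, command)({"context_id": context_id, **args})

# Usage (inside an event loop): await execute_step(browser, context_id, step, params)
```

A lookup table keeps the mapping from step actions to browser commands in one place, so supporting a new action only requires adding an entry rather than another branch.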
**Current Purpose**: LLM integration for AutoQA tools
**Target Purpose**: Add Web2API-specific prompts
**Changes Required**:
# ADD new prompt types
+ PromptType.AUTH_DETECTION
+ PromptType.FEATURE_EXTRACTION
+ PromptType.OPERATION_MAPPING
+ PromptType.CAPABILITY_ANALYSIS
# ADD new methods
+ async def detect_auth_type(self, url, page_content, forms):
+ """Detect authentication mechanism using LLM."""
+ return await self._execute_prompt(
+ ToolName.WEB2API, # New tool
+ PromptType.AUTH_DETECTION,
+ {"url": url, "page": page_content, "forms": json.dumps(forms)}
+ )
+ async def extract_service_capabilities(self, screenshot, dom):
+ """Extract service capabilities."""
+ return await self._execute_prompt(
+ ToolName.WEB2API,
+ PromptType.FEATURE_EXTRACTION,
+ {"screenshot": screenshot, "dom": dom}
+ )
+ async def map_operation(self, features, purpose):
+ """Map features to executable operation."""
+ return await self._execute_prompt(
+ ToolName.WEB2API,
+ PromptType.OPERATION_MAPPING,
+ {"features": json.dumps(features), "purpose": purpose}
+ )
# KEEP existing LLM methods
- Test generation
- Step transformation
- Assertion validation
- Selector enhancement

**Current Purpose**: Store test artifacts and results
**Target Purpose**: Store services, credentials, executions
**Changes Required**:
# ADD new tables (append-only)
+ class Service(Base):
+ __tablename__ = "services"
+
+ id = Column(UUID, primary_key=True)
+ name = Column(String(255))
+ url = Column(Text)
+ type = Column(String(50))
+ status = Column(String(50))
+ login_status = Column(String(50))
+ discovery_status = Column(String(50))
+ config = Column(JSONB)
+ created_at = Column(DateTime, default=datetime.utcnow)
+ updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
+ class ServiceCredential(Base):
+ __tablename__ = "service_credentials"
+
+ id = Column(UUID, primary_key=True)
+ service_id = Column(UUID, ForeignKey("services.id"))
+ auth_type = Column(String(50))
+ encrypted_email = Column(Text)
+ encrypted_password = Column(Text)
+ encrypted_api_key = Column(Text)
+ session_cookie = Column(JSONB)
+ created_at = Column(DateTime, default=datetime.utcnow)
+ class Execution(Base):
+ __tablename__ = "executions"
+
+ id = Column(UUID, primary_key=True)
+ service_id = Column(UUID, ForeignKey("services.id"))
+ operation_id = Column(String(100))
+ parameters = Column(JSONB)
+ result = Column(Text)
+ status = Column(String(50))
+ started_at = Column(DateTime, default=datetime.utcnow)
+ completed_at = Column(DateTime)
+ error = Column(Text)
+ class ServiceStats(Base):
+ __tablename__ = "service_stats"
+
+ service_id = Column(UUID, ForeignKey("services.id"), primary_key=True)
+ total_requests = Column(Integer, default=0)
+ successful_requests = Column(Integer, default=0)
+ failed_requests = Column(Integer, default=0)
+ avg_latency_ms = Column(Integer, default=0)
+ last_request_at = Column(DateTime)
+ uptime_start = Column(DateTime, default=datetime.utcnow)
# KEEP existing tables
- artifacts
- test_results
- test_suites

"""
Service CRUD endpoints for TOWER frontend integration.
"""
from urllib.parse import urlsplit

from fastapi import APIRouter, Depends, HTTPException
from web2api.storage.database import Service, ServiceCredential
from web2api.auth.credential_store import CredentialStore
from web2api.execution.queue_manager import ExecutionQueue

router = APIRouter(prefix="/api/services", tags=["services"])


@router.post("/")
async def register_service(
    url: str,
    email: str,
    password: str,
    credential_store: CredentialStore = Depends()
):
    """Register a new service."""
    # NOTE: FastAPI reads plain `str` parameters from the query string;
    # a request-body model would keep credentials out of URLs and logs.
    # Create service, naming it after the URL's hostname
    # (falls back to the raw URL when no scheme is given)
    service = Service(
        name=urlsplit(url).hostname or url,
        url=url,
        status="offline",
        login_status="pending",
        discovery_status="pending"
    )
    # Encrypt and store credentials
    credential_store.store_credentials(service.id, email, password)
    # Trigger background login
    await trigger_login_process(service.id)
    return service


@router.get("/")
async def list_services():
    """List all registered services."""
    return await Service.all()


@router.get("/{service_id}")
async def get_service(service_id: str):
    """Get service details."""
    return await Service.get(service_id)


@router.put("/{service_id}")
async def update_service_config(service_id: str, config: dict):
    """Update service configuration."""
    service = await Service.get(service_id)
    service.config = config
    await service.save()
    return service


@router.delete("/{service_id}")
async def delete_service(service_id: str):
    """Delete a service."""
    await Service.delete(service_id)
    return {"success": True}


@router.post("/{service_id}/discover")
async def trigger_discovery(service_id: str):
    """Trigger auto-discovery process."""
    from web2api.discovery.orchestrator import DiscoveryOrchestrator
    orchestrator = DiscoveryOrchestrator()
    task_id = await orchestrator.start_discovery(service_id)
    return {
        "message": "Discovery started",
        "task_id": task_id
    }

"""
WebSocket handler for real-time updates to TOWER frontend.
"""
import time  # used for the execution-log timestamps below

from fastapi import WebSocket
from web2api.execution.live_viewport import LiveViewportManager
class WebSocketHandler:
"""Manages WebSocket connections for live updates."""
def __init__(self):
self.active_connections: dict[str, WebSocket] = {}
async def connect(self, service_id: str, websocket: WebSocket):
"""Accept WebSocket connection."""
await websocket.accept()
self.active_connections[service_id] = websocket
async def disconnect(self, service_id: str):
"""Remove WebSocket connection."""
self.active_connections.pop(service_id, None)
async def send_login_update(self, service_id: str, status: str, message: str):
"""Send login status update."""
if service_id in self.active_connections:
await self.active_connections[service_id].send_json({
"type": "login_update",
"loginStatus": status,
"message": message
})
async def send_discovery_update(
self,
service_id: str,
status: str,
progress: int,
message: str
):
"""Send discovery progress update."""
if service_id in self.active_connections:
await self.active_connections[service_id].send_json({
"type": "discovery_update",
"discoveryStatus": status,
"progress": progress,
"message": message
})
async def send_execution_log(
self,
service_id: str,
level: str,
message: str
):
"""Send execution log."""
if service_id in self.active_connections:
await self.active_connections[service_id].send_json({
"type": "execution_log",
"timestamp": int(time.time() * 1000),
"level": level,
"message": message
})
async def send_live_viewport_frame(self, service_id: str, frame_data: bytes):
"""Send live viewport frame."""
if service_id in self.active_connections:
await self.active_connections[service_id].send_bytes(frame_data)

"""
OpenAI-compatible API endpoints.
"""
import time
import uuid

from fastapi import APIRouter, HTTPException
from web2api.execution.operation_runner import OperationRunner
from web2api.storage.database import Service

router = APIRouter()


@router.post("/v1/chat/completions")
async def chat_completion(request: ChatCompletionRequest):
    """
    Main OpenAI-compatible endpoint.
    Maps OpenAI request to Web2API service execution.
    """
    # Parse model to get service_id
    service_id = request.model.split(":")[0]
    # Get service config
    service = await Service.get(service_id)
    if not service:
        raise HTTPException(404, "Service not found")
    # Execute operation with the latest message as the prompt
    prompt = request.messages[-1]["content"]
    runner = OperationRunner()
    result = await runner.execute_operation(
        service_id=service_id,
        operation_id="chat_completion",
        parameters={
            "message": prompt,
            "model": request.model
        }
    )
    # Format as OpenAI response; total_tokens is the sum of both estimates
    prompt_tokens = estimate_tokens(prompt)
    completion_tokens = estimate_tokens(result)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:24]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": request.model,
        "choices": [{
            "index": 0,
            "message": {
                "role": "assistant",
                "content": result
            },
            "finish_reason": "stop"
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens
        }
    }
@router.get("/v1/models")
async def list_models():
"""List all services as models."""
services = await Service.all()
return {
"object": "list",
"data": [
{
"id": service.id,
"object": "model",
"created": int(service.created_at.timestamp()),
"owned_by": "web2api"
}
for service in services
]
}

"""
Authentication mechanism detection using Owl-Browser commands.
"""
class AuthDetector:
"""Detects authentication mechanisms on web services."""
async def detect_auth_mechanism(self, browser, context_id, url):
"""Detect how service authenticates users."""
await browser.browser_navigate({
"context_id": context_id,
"url": url
})
# Check for form login
has_email = await browser.browser_find_element({
"context_id": context_id,
"description": "email or username input field"
})
has_password = await browser.browser_find_element({
"context_id": context_id,
"description": "password input field"
})
if has_email.get("found") and has_password.get("found"):
return {
"type": "form_login",
"email_selector": has_email["selector"],
"password_selector": has_password["selector"]
}
# Check for OAuth
has_oauth = await browser.browser_find_element({
"context_id": context_id,
"description": "Sign in with Google, GitHub, or other OAuth button"
})
if has_oauth.get("found"):
return {
"type": "oauth",
"oauth_selector": has_oauth["selector"]
}
# Check for API key
has_api_key = await browser.browser_find_element({
"context_id": context_id,
"description": "API key input field"
})
if has_api_key.get("found"):
return {
"type": "api_key",
"api_key_selector": has_api_key["selector"]
}
return {"type": "unknown"}
    async def execute_login(
        self,
        browser,
        context_id,
        auth_config,
        credentials,
        websocket_handler=None,
        service_id=None
    ):
        """Execute login flow with CAPTCHA handling.

        service_id selects the WebSocket connection that receives progress updates.
        """
        if auth_config["type"] == "form_login":
            # Fill email
            await browser.browser_type({
                "context_id": context_id,
                "selector": auth_config["email_selector"],
                "text": credentials["email"]
            })
            if websocket_handler:
                await websocket_handler.send_login_update(
                    service_id, "processing", "Entered email"
                )
            # Fill password
            await browser.browser_type({
                "context_id": context_id,
                "selector": auth_config["password_selector"],
                "text": credentials["password"]
            })
            if websocket_handler:
                await websocket_handler.send_login_update(
                    service_id, "processing", "Entered password"
                )
            # Check for CAPTCHA before submit
            captcha_result = await self._handle_captcha_if_present(
                browser, context_id, websocket_handler, service_id
            )
            # Submit form
            submit_button = await browser.browser_find_element({
                "context_id": context_id,
                "description": "submit, login, or sign in button"
            })
            await browser.browser_click({
                "context_id": context_id,
                "selector": submit_button["selector"]
            })
            # Wait for successful login
            await browser.browser_wait_for_navigation({
                "context_id": context_id
            })
            return True
        return False

    async def _handle_captcha_if_present(
        self,
        browser,
        context_id,
        websocket_handler=None,
        service_id=None
    ):
        """Detect and solve CAPTCHA using Owl-Browser commands."""
        has_captcha = await browser.browser_detect_captcha({
            "context_id": context_id
        })
        if not has_captcha.get("found"):
            return None
        if websocket_handler:
            await websocket_handler.send_login_update(
                service_id, "processing", "CAPTCHA detected, solving..."
            )
        # Classify CAPTCHA type
        captcha_type = await browser.browser_classify_captcha({
            "context_id": context_id
        })
        # Solve based on type
        if captcha_type["type"] == "text":
            result = await browser.browser_solve_text_captcha({
                "context_id": context_id
            })
        elif captcha_type["type"] == "image":
            result = await browser.browser_solve_image_captcha({
                "context_id": context_id
            })
        else:
            result = await browser.browser_solve_captcha({
                "context_id": context_id
            })
        if websocket_handler:
            await websocket_handler.send_login_update(
                service_id, "processing", "CAPTCHA solved"
            )
        return result

"""
Service capability detection using Owl-Browser AI commands.
"""
class FeatureMapper:
"""Maps web service features using AI analysis."""
async def detect_service_capabilities(
self,
browser,
context_id,
llm_service
):
"""Detect what the service can do."""
# Take screenshot
screenshot = await browser.browser_screenshot({
"context_id": context_id
})
# Get DOM
html = await browser.browser_get_html({
"context_id": context_id
})
# Use browser_ai_analyze for capability detection
ai_analysis = await browser.browser_ai_analyze({
"context_id": context_id,
"question": """
Analyze this web interface and identify:
1. What is the primary service? (chat, image generation, search, etc.)
2. What models/AI engines are available?
3. What features are configurable? (temperature, max tokens, web browsing, etc.)
4. Where is the main input area?
5. Where do responses appear?
6. Are there file upload capabilities?
"""
})
# Use browser_query_page for element discovery
input_elements = await browser.browser_query_page({
"context_id": context_id,
"question": "Find the main input area (textarea, input field) for user prompts"
})
submit_button = await browser.browser_query_page({
"context_id": context_id,
"question": "Find the submit, send, or generate button"
})
output_area = await browser.browser_query_page({
"context_id": context_id,
"question": "Find where the AI response or output appears"
})
# Parse results
capabilities = {
"primary_operation": self._extract_primary_operation(ai_analysis),
"available_models": await self._extract_models(browser, context_id),
"features": await self._extract_features(browser, context_id),
"input_selector": input_elements.get("selector"),
"submit_selector": submit_button.get("selector"),
"output_selector": output_area.get("selector"),
"has_file_upload": await self._check_file_upload(browser, context_id)
}
return capabilities
async def _extract_models(self, browser, context_id):
"""Extract available model options."""
models = await browser.browser_query_page({
"context_id": context_id,
"question": "List all available AI models or engines shown in dropdowns or options"
})
return models.get("models", [])
async def _extract_features(self, browser, context_id):
"""Extract configurable features."""
features = await browser.browser_query_page({
"context_id": context_id,
"question": "Find all toggles, sliders, dropdowns for configuring behavior"
})
return features.get("features", [])
async def _check_file_upload(self, browser, context_id):
"""Check if service supports file uploads."""
upload_button = await browser.browser_find_element({
"context_id": context_id,
"description": "file upload, attachment, or upload button"
})
return upload_button.get("found", False)

"""
Dynamic operation generation from discovered features.
"""
class OperationBuilder:
"""Builds executable operations from discovered features."""
async def build_chat_completion_operation(
self,
service_id,
features,
auth_config
):
"""Build chat completion operation from detected features."""
execution_steps = []
# Step 1: Navigate (if needed)
execution_steps.append({
"action": "navigate",
"url": auth_config.get("url", ""),
"description": "Navigate to service"
})
# Step 2: Wait for input
execution_steps.append({
"action": "wait_for_selector",
"selector": features["input_selector"],
"timeout": 5000,
"description": "Wait for input field"
})
# Step 3: Type message
execution_steps.append({
"action": "type",
"selector": features["input_selector"],
"value": "{{message}}",
"description": "Type user message"
})
# Step 4: Select model (only if models and a model selector were discovered)
if features.get("available_models") and features.get("model_selector"):
    execution_steps.append({
        "action": "select_model",
        "selector": features["model_selector"],
        "value": "{{model}}",
        "description": "Select AI model"
    })
# Step 5: Upload file (if provided)
execution_steps.append({
"action": "upload_file_if_present",
"description": "Upload file if present in request"
})
# Step 6: Click submit
execution_steps.append({
"action": "click",
"selector": features["submit_selector"],
"description": "Submit request"
})
# Step 7: Wait for response (CRITICAL: use browser_is_enabled)
execution_steps.append({
"action": "wait_for_response_complete",
"submit_selector": features["submit_selector"],
"timeout": 60000,
"description": "Wait for response to complete",
"implementation": "browser_is_enabled polling"
})
# Step 8: Extract response
execution_steps.append({
"action": "extract_response",
"selector": features["output_selector"],
"description": "Extract AI response",
"implementation": "browser_ai_extract + browser_extract_text fallback"
})
return {
"id": "chat_completion",
"name": "Chat Completion",
"description": "Send message and get AI response",
"parameters": {
"message": {
"type": "string",
"required": True,
"description": "User message to send"
},
"model": {
"type": "string",
"required": False,
"enum": features.get("available_models", []),
"description": "AI model to use"
}
},
"execution_steps": execution_steps
}

"""
Generate final service configuration from discovery results.
"""
class ConfigGenerator:
"""Generates service configuration JSON."""
async def generate_service_config(
self,
service_id: str,
url: str,
auth_config: dict,
features: dict,
operations: list
):
"""Generate complete service configuration."""
config = {
"service_id": service_id,
# Host part of the URL; [-1] also handles URLs given without a scheme
"name": url.split("//")[-1].split("/")[0],
"url": url,
"type": self._determine_service_type(features),
"auth": {
"type": auth_config["type"],
"config": auth_config
},
"capabilities": {
"primary_operation": features["primary_operation"],
"available_models": features.get("available_models", []),
"features": features.get("features", []),
"has_file_upload": features.get("has_file_upload", False)
},
"operations": operations,
"ui_selectors": {
"input": features["input_selector"],
"submit": features["submit_selector"],
"output": features["output_selector"]
},
"discovered_at": datetime.utcnow().isoformat(),
"version": "1.0"
}
return config
def _determine_service_type(self, features):
"""Determine service type from features."""
primary = features["primary_operation"].lower()
if "chat" in primary:
return "chat"
elif "image" in primary or "generation" in primary:
return "image_generation"
elif "search" in primary:
return "search"
else:
return "generic"

"""
Discovery pipeline coordinator with live viewport streaming.
"""
import uuid


class DiscoveryOrchestrator:
    """Orchestrates the complete discovery process."""

    def __init__(self):
        self.auth_detector = AuthDetector()
        self.feature_mapper = FeatureMapper()
        self.operation_builder = OperationBuilder()
        self.config_generator = ConfigGenerator()
        self.live_viewport = LiveViewportManager()

    async def start_discovery(
        self,
        service_id: str,
        websocket_handler=None
    ):
        """Start full discovery pipeline with live viewport."""
        # NOTE: browser, credential_store and llm_service are assumed to be
        # module-level objects; they are not defined in this snippet.
        task_id = str(uuid.uuid4())

        async def notify(status: str, progress: int, message: str):
            # websocket_handler is optional, so guard every progress update
            if websocket_handler:
                await websocket_handler.send_discovery_update(
                    service_id, status, progress, message
                )

        try:
            # Step 1: Get service from DB
            service = await Service.get(service_id)
            # Step 2: Acquire browser context
            from web2api.concurrency.browser_pool import BrowserPool
            pool = BrowserPool(...)
            async with pool.acquire_service_context(service_id) as context_id:
                try:
                    # Start live viewport streaming (requires the browser context)
                    if websocket_handler:
                        await self.live_viewport.start_streaming(
                            service_id, websocket_handler, browser, context_id
                        )
                    # Step 3: Detect auth
                    await notify("scanning", 10, "Detecting authentication...")
                    auth_config = await self.auth_detector.detect_auth_mechanism(
                        browser, context_id, service.url
                    )
                    # Step 4: Execute login
                    await notify("scanning", 30, "Logging in...")
                    credentials = await credential_store.get_credentials(service_id)
                    login_success = await self.auth_detector.execute_login(
                        browser, context_id, auth_config, credentials, websocket_handler
                    )
                    if not login_success:
                        raise Exception("Login failed")
                    # Save session cookies
                    await pool.save_service_session(service_id)
                    # Step 5: Detect features
                    await notify("scanning", 50, "Discovering capabilities...")
                    features = await self.feature_mapper.detect_service_capabilities(
                        browser, context_id, llm_service
                    )
                    # Step 6: Build operations
                    await notify("scanning", 70, "Building operations...")
                    operations = []
                    chat_op = await self.operation_builder.build_chat_completion_operation(
                        service_id, features, auth_config
                    )
                    operations.append(chat_op)
                    # Step 7: Generate config
                    await notify("scanning", 90, "Generating configuration...")
                    config = await self.config_generator.generate_service_config(
                        service_id, service.url, auth_config, features, operations
                    )
                    # Step 8: Save to database
                    service.config = config
                    service.discovery_status = "complete"
                    service.login_status = "success"
                    await service.save()
                    await notify("complete", 100, "Discovery complete!")
                finally:
                    # Stop live viewport while the context is still held;
                    # stop_streaming is a no-op if no stream was started
                    await self.live_viewport.stop_streaming(service_id, browser)
            return task_id
        except Exception as e:
            await notify("failed", 0, f"Discovery failed: {e}")
            raise
"""
Queue-based execution manager (NO rate limiting).
"""
import asyncio
import time
import uuid


class ExecutionQueue:
    """FIFO queue for service execution requests."""

    def __init__(self):
        self.queues: dict[str, asyncio.Queue] = {}
        self.processors: dict[str, asyncio.Task] = {}

    async def add_task(
        self,
        service_id: str,
        operation_id: str,
        parameters: dict,
        websocket_handler=None
    ):
        """Add task to service queue (NO rate limiting)."""
        if service_id not in self.queues:
            self.queues[service_id] = asyncio.Queue()
            # Start processor for this service
            self.processors[service_id] = asyncio.create_task(
                self._process_queue(service_id, websocket_handler)
            )
        task = {
            "operation_id": operation_id,
            "parameters": parameters,
            "task_id": str(uuid.uuid4()),
            "created_at": time.time()
        }
        await self.queues[service_id].put(task)
        return task["task_id"]

    async def _process_queue(
        self,
        service_id: str,
        websocket_handler=None
    ):
        """Process tasks from queue (sequential, NO rate limiting)."""
        from web2api.concurrency.browser_pool import BrowserPool
        from web2api.execution.operation_runner import OperationRunner
        pool = BrowserPool(...)
        runner = OperationRunner()
        while True:
            task = await self.queues[service_id].get()
            try:
                # Execute operation
                result = await runner.execute_operation(
                    service_id,
                    task["operation_id"],
                    task["parameters"],
                    websocket_handler
                )
                # Send result (websocket_handler is optional)
                if websocket_handler:
                    await websocket_handler.send_execution_log(
                        service_id, "info", f"Task {task['task_id']} completed"
                    )
            except Exception as e:
                if websocket_handler:
                    await websocket_handler.send_execution_log(
                        service_id, "error", f"Task failed: {e}"
                    )
            finally:
                # Always mark the task done so queue.join() cannot hang
                self.queues[service_id].task_done()
"""
Live viewport streaming for TOWER frontend.
"""
class LiveViewportManager:
"""Manages live viewport streaming during discovery."""
def __init__(self):
self.active_streams: dict[str, str] = {} # service_id → stream_id
async def start_streaming(
self,
service_id: str,
websocket_handler,
browser,
context_id
):
"""Start live viewport stream using Owl-Browser."""
# Start live stream
stream_info = await browser.start_live_stream({
"context_id": context_id,
"quality": "medium",
"fps": 10
})
self.active_streams[service_id] = stream_info["stream_id"]
# Start background task to send frames
asyncio.create_task(
self._stream_frames(
service_id,
stream_info["stream_id"],
websocket_handler,
browser
)
)
async def _stream_frames(
self,
service_id: str,
stream_id: str,
websocket_handler,
browser
):
"""Stream frames to WebSocket."""
while service_id in self.active_streams:
try:
# Get current frame
frame = await browser.get_live_frame({
"stream_id": stream_id
})
# Send to frontend
await websocket_handler.send_live_viewport_frame(
service_id,
frame["data"]
)
# Small delay to control FPS
await asyncio.sleep(0.1)
except Exception as e:
logger.error(f"Frame streaming error: {e}")
break
async def stop_streaming(self, service_id: str, browser):
"""Stop live viewport stream."""
if service_id in self.active_streams:
stream_id = self.active_streams[service_id]
await browser.stop_live_stream({
"stream_id": stream_id
})
del self.active_streams[service_id]
"""
Video recording for discovery process playback.
"""
class VideoRecorder:
"""Records discovery process for later review."""
async def start_recording(self, service_id: str, browser, context_id):
"""Start video recording."""
recording = await browser.start_video_recording({
"context_id": context_id
})
return recording["recording_id"]
async def stop_and_save(
self,
service_id: str,
recording_id: str,
browser
):
"""Stop recording and save to storage."""
await browser.stop_video_recording({
"recording_id": recording_id
})
# Download video
video_data = await browser.download_video_recording({
"recording_id": recording_id
})
# Save to artifact storage
from web2api.storage.artifact_manager import ArtifactManager
artifact_manager = ArtifactManager()
artifact_path = await artifact_manager.save_video(
service_id,
video_data["data"],
filename=f"discovery_{service_id}_{int(time.time())}.mp4"
)
return artifact_path
"""
Session persistence using Owl-Browser cookie commands.
"""
class SessionManager:
    """Manages browser session persistence."""

    async def save_session(
        self,
        service_id: str,
        browser,
        context_id,
        storage
    ):
        """Save session cookies for reuse."""
        cookies = await browser.browser_get_cookies({
            "context_id": context_id
        })
        await storage.save_session_cookies(service_id, cookies)

    async def restore_session(
        self,
        service_id: str,
        browser,
        context_id,
        storage
    ):
        """Restore saved session."""
        cookies = await storage.get_session_cookies(service_id)
        if cookies:
            await browser.browser_set_cookie({
                "context_id": context_id,
                "cookies": cookies
            })

    async def is_session_valid(
        self,
        service_id: str,
        browser,
        context_id,
        service_url: str
    ):
        """Check if saved session is still valid."""
        try:
            # Try to navigate to service
            await browser.browser_navigate({
                "context_id": context_id,
                "url": service_url
            })
            # Check if we're redirected to login
            current_url = await browser.browser_get_current_url({
                "context_id": context_id
            })
            if "login" in current_url.lower():
                return False
            return True
        except Exception:
            # Treat any navigation or lookup failure as an invalid session
            return False
"""
Encrypted credential storage.
"""
from cryptography.fernet import Fernet
class CredentialStore:
"""Encrypts and stores service credentials."""
def __init__(self, encryption_key: bytes):
self.cipher = Fernet(encryption_key)
async def store_credentials(
self,
service_id: str,
email: str,
password: str
):
"""Encrypt and store credentials."""
encrypted_email = self.cipher.encrypt(email.encode())
encrypted_password = self.cipher.encrypt(password.encode())
credential = ServiceCredential(
service_id=service_id,
auth_type="form_login",
encrypted_email=encrypted_email.decode(),
encrypted_password=encrypted_password.decode()
)
await credential.save()
async def get_credentials(
self,
service_id: str
):
"""Retrieve and decrypt credentials."""
credential = await ServiceCredential.get(service_id)
email = self.cipher.decrypt(
credential.encrypted_email.encode()
).decode()
password = self.cipher.decrypt(
credential.encrypted_password.encode()
).decode()
return {"email": email, "password": password}

---
## 3. Architecture Transformation Strategy
### 3.1 Directory Structure Transformation
Current layout (AutoQA):
- `web2api/`
  - `api/`
    - `main.py`
  - `builder/`
    - `analyzer/`
      - `page_analyzer.py`
      - `visual_analyzer.py`
      - `element_classifier.py`
    - `discovery/`
      - `form_analyzer.py`
      - `api_detector.py`
      - `flow_detector.py`
    - `crawler/`
  - `concurrency/`
    - `browser_pool.py`
  - `llm/`
    - `service.py`
  - `runner/`
    - `test_runner.py`
    - `self_healing.py`
  - `storage/`
    - `database.py`
    - `artifact_manager.py`

Target layout (Web2API):
- `web2api/`
  - `api/`
    - `main.py` (MODIFY)
    - `service_manager.py` (NEW)
    - `openai_compat.py` (NEW)
    - `websocket_handler.py` (NEW)
  - `discovery/` (RENAME from builder)
    - `auth_detector.py` (NEW)
    - `feature_mapper.py` (ADAPT from page_analyzer.py)
    - `operation_builder.py` (NEW)
    - `config_generator.py` (NEW)
    - `orchestrator.py` (NEW)
  - `auth/` (NEW directory)
    - `form_filler.py` (ADAPT from form_analyzer.py)
    - `session_manager.py` (NEW)
    - `credential_store.py` (NEW)
  - `concurrency/`
    - `browser_pool.py` (ENHANCE for service isolation)
  - `llm/`
    - `service.py` (EXTEND with Web2API prompts)
  - `execution/` (RENAME from runner)
    - `operation_runner.py` (ADAPT from test_runner.py)
    - `queue_manager.py` (NEW)
    - `streaming.py` (NEW)
    - `live_viewport.py` (NEW)
    - `video_recorder.py` (NEW)
    - `self_healing.py` (KEEP)
  - `storage/`
    - `database.py` (EXTEND with service tables)
    - `artifact_manager.py` (KEEP)
### 3.2 Code Reuse Summary
| Component | AutoQA File | Web2API File | Reuse % |
|-----------|-------------|--------------|---------|
| Browser Pooling | `concurrency/browser_pool.py` | `concurrency/browser_pool.py` | 95% |
| Page Analysis | `builder/analyzer/page_analyzer.py` | `discovery/feature_mapper.py` | 85% |
| Visual Analysis | `builder/analyzer/visual_analyzer.py` | `discovery/visual_analyzer.py` | 100% |
| Form Analysis | `builder/discovery/form_analyzer.py` | `auth/form_filler.py` | 90% |
| LLM Integration | `llm/service.py` | `llm/service.py` | 90% |
| Test Execution | `runner/test_runner.py` | `execution/operation_runner.py` | 60% |
| Error Recovery | `runner/self_healing.py` | `execution/self_healing.py` | 100% |
| Database | `storage/database.py` | `storage/database.py` | 80% |
| API Framework | `api/main.py` | `api/main.py` | 40% |
**Overall Reuse: ~70%**
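As a quick sanity check on the table above, the unweighted mean of the nine per-component figures can be computed directly. This is only a sketch of the arithmetic: the ~70% overall figure is presumably weighted (for example by file size, or by counting AutoQA modules that are dropped entirely), which a plain average does not capture.

```python
# Per-component reuse percentages copied from the table above.
reuse_percent = {
    "Browser Pooling": 95,
    "Page Analysis": 85,
    "Visual Analysis": 100,
    "Form Analysis": 90,
    "LLM Integration": 90,
    "Test Execution": 60,
    "Error Recovery": 100,
    "Database": 80,
    "API Framework": 40,
}

# Plain average: every component counts equally, regardless of its size.
unweighted_mean = sum(reuse_percent.values()) / len(reuse_percent)
print(f"Unweighted mean reuse: {unweighted_mean:.1f}%")  # prints 82.2%
```

The gap between this average and the stated ~70% suggests the overall number accounts for code that has no Web2API counterpart at all.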
---
## 4. Discovery Pipeline Implementation
### 4.1 Complete Discovery Flow with Owl-Browser Commands
```python
"""
Complete auto-discovery pipeline using specific Owl-Browser commands.
"""


async def discover_service(service_id: str, url: str, credentials: dict):
    """Full discovery with Owl-Browser command integration."""
    # NOTE: browser, browser_pool, storage, llm_service and websocket are
    # assumed to be module-level objects; they are not defined in this snippet.
    # 1. Acquire browser context (from pool)
    context_id = await browser_pool.acquire_service_context(service_id)
    try:
        # 2. Navigate to service
        await browser.browser_navigate({
            "context_id": context_id,
            "url": url
        })
        # 3. Detect auth mechanism
        auth_detector = AuthDetector()
        auth_config = await auth_detector.detect_auth_mechanism(
            browser, context_id, url
        )
        # Uses: browser_find_element
        # 4. Execute login with CAPTCHA handling
        login_success = await auth_detector.execute_login(
            browser, context_id, auth_config, credentials, websocket
        )
        # Uses: browser_type, browser_click, browser_detect_captcha,
        #       browser_classify_captcha, browser_solve_captcha,
        #       browser_wait_for_navigation
        if not login_success:
            # Nothing after this point works without an authenticated session
            raise RuntimeError("Login failed")
        # 5. Save session cookies
        cookies = await browser.browser_get_cookies({
            "context_id": context_id
        })
        await storage.save_session(service_id, cookies)
        # Uses: browser_get_cookies
        # 6. Start live viewport streaming
        stream = await browser.start_live_stream({
            "context_id": context_id,
            "quality": "medium"
        })
        # Uses: start_live_stream
        try:
            # 7. Detect service capabilities
            feature_mapper = FeatureMapper()
            features = await feature_mapper.detect_service_capabilities(
                browser, context_id, llm_service
            )
            # Uses: browser_screenshot, browser_ai_analyze, browser_query_page,
            #       browser_get_html, browser_find_element
            # 8. Build operations
            operation_builder = OperationBuilder()
            operations = []
            chat_op = await operation_builder.build_chat_completion_operation(
                service_id, features, auth_config
            )
            operations.append(chat_op)
            # 9. Generate config
            config_generator = ConfigGenerator()
            config = await config_generator.generate_service_config(
                service_id, url, auth_config, features, operations
            )
            # 10. Save to database
            service = await Service.get(service_id)
            service.config = config
            service.discovery_status = "complete"
            await service.save()
        finally:
            # 11. Stop live viewport (by stream_id, as in LiveViewportManager)
            await browser.stop_live_stream({
                "stream_id": stream["stream_id"]
            })
            # Uses: stop_live_stream
        return config
    finally:
        # 12. Release context, even if a step above failed
        await browser_pool.release_service_context(service_id)
```
"""
Execute chat completion with browser_is_enabled for response detection.
"""
import asyncio
import time


async def execute_chat_completion(
    service_id: str,
    message: str,
    model: str = None,
    file_path: str = None,
    timeout_s: float = 120.0  # assumed default; tune per service
):
    """Execute chat completion using discovered configuration."""
    # Get service config
    service = await Service.get(service_id)
    config = service.config
    selectors = config["ui_selectors"]
    # Acquire context
    context_id = await browser_pool.acquire_service_context(service_id)
    try:
        # Restore session
        await browser.browser_set_cookie({
            "context_id": context_id,
            "cookies": await storage.get_session(service_id)
        })
        # Uses: browser_set_cookie
        # Navigate to service
        await browser.browser_navigate({
            "context_id": context_id,
            "url": config["url"]
        })
        # Wait for input field
        await browser.browser_wait_for_selector({
            "context_id": context_id,
            "selector": selectors["input"],
            "timeout": 5000
        })
        # Type message
        await browser.browser_type({
            "context_id": context_id,
            "selector": selectors["input"],
            "text": message
        })
        # Select model if requested and a model selector was discovered
        if model and selectors.get("model"):
            await browser.browser_select_option({
                "context_id": context_id,
                "selector": selectors["model"],
                "value": model
            })
        # Upload file if present
        if file_path:
            await browser.browser_upload_file({
                "context_id": context_id,
                "files": [file_path]
            })
            # Uses: browser_upload_file
        # Click submit
        await browser.browser_click({
            "context_id": context_id,
            "selector": selectors["submit"]
        })
        # CRITICAL: Wait for response using browser_is_enabled
        # Most chat interfaces disable "Send" button while generating
        deadline = time.monotonic() + timeout_s
        while True:
            is_enabled = await browser.browser_is_enabled({
                "context_id": context_id,
                "selector": selectors["submit"]
            })
            # Uses: browser_is_enabled (THE KEY COMMAND!)
            if is_enabled.get("enabled"):
                break  # Response complete!
            if time.monotonic() >= deadline:
                # Bound the wait so a stuck page cannot block the queue forever
                raise TimeoutError("Response did not complete within timeout_s")
            await asyncio.sleep(0.5)
        # Extract response
        response = await browser.browser_ai_extract({
            "context_id": context_id,
            "selector": selectors["output"],
            "prompt": "Extract the AI assistant's response text"
        })
        # Uses: browser_ai_extract
        # Fallback to text extraction
        if not response.get("content"):
            response = await browser.browser_extract_text({
                "context_id": context_id,
                "selector": selectors["output"]
            })
            # Uses: browser_extract_text
        return response.get("content", "")
    finally:
        # Release context, even if a step above failed
        await browser_pool.release_service_context(service_id)
"""
Live viewport streaming using Owl-Browser video commands.
"""
class LiveViewportStreamer:
"""Stream live viewport to TOWER frontend."""
async def start_streaming(
self,
service_id: str,
websocket: WebSocket,
browser,
context_id
):
"""Start live viewport stream."""
# Start live stream
stream = await browser.start_live_stream({
"context_id": context_id,
"quality": "medium",
"fps": 10
})
stream_id = stream["stream_id"]
# Stream frames in background
asyncio.create_task(
self._stream_loop(service_id, stream_id, websocket, browser)
)
async def _stream_loop(
self,
service_id: str,
stream_id: str,
websocket: WebSocket,
browser
):
"""Continuously send frames to frontend."""
while True:
try:
# Get current frame
frame = await browser.get_live_frame({
"stream_id": stream_id
})
# Send to WebSocket
await websocket.send_bytes(frame["data"])
await asyncio.sleep(0.1) # 10 FPS
except Exception as e:
logger.error(f"Stream error: {e}")
break
async def stop_streaming(self, stream_id: str, browser):
"""Stop live stream."""
await browser.stop_live_stream({
"stream_id": stream_id
})

-- Services table
CREATE TABLE services (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
url TEXT NOT NULL,
type VARCHAR(50) DEFAULT 'generic',
status VARCHAR(50) DEFAULT 'offline',
login_status VARCHAR(50) DEFAULT 'pending',
discovery_status VARCHAR(50) DEFAULT 'pending',
config JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Credentials table (encrypted)
CREATE TABLE service_credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
auth_type VARCHAR(50),
encrypted_email TEXT,
encrypted_password TEXT,
encrypted_api_key TEXT,
session_cookie JSONB,
created_at TIMESTAMP DEFAULT NOW()
);
-- Executions table
CREATE TABLE executions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
operation_id VARCHAR(100),
parameters JSONB,
result TEXT,
status VARCHAR(50),
started_at TIMESTAMP DEFAULT NOW(),
completed_at TIMESTAMP,
error TEXT
);
-- Service stats table
CREATE TABLE service_stats (
service_id UUID PRIMARY KEY REFERENCES services(id) ON DELETE CASCADE,
total_requests INT DEFAULT 0,
successful_requests INT DEFAULT 0,
failed_requests INT DEFAULT 0,
avg_latency_ms INT DEFAULT 0,
last_request_at TIMESTAMP,
uptime_start TIMESTAMP DEFAULT NOW()
);
-- CREATE INDEXES
CREATE INDEX idx_services_status ON services(status);
CREATE INDEX idx_executions_service_id ON executions(service_id);
CREATE INDEX idx_executions_status ON executions(status);

Goal: Basic database and API structure

Tasks:
- Extend database schema with service tables
- Set up credential encryption
- Create basic FastAPI app structure
- Implement health check endpoint

Files Modified:
- `storage/database.py` - Add service tables
- `api/main.py` - Basic structure

Goal: CRUD operations for services

Tasks:
- Implement service registration endpoint
- Create credential store
- Implement WebSocket handler
- Test service creation flow

Files Created:
- `api/service_manager.py`
- `auth/credential_store.py`
- `api/websocket_handler.py`

Goal: Automated login with CAPTCHA handling

Tasks:
- Implement auth detection
- Create form filler with CAPTCHA solving
- Implement session persistence
- Test login on k2think.ai

Files Created:
- `discovery/auth_detector.py`
- `auth/form_filler.py` (adapted from form_analyzer.py)
- `auth/session_manager.py`

Owl-Browser Commands Integrated:
- `browser_find_element`
- `browser_detect_captcha`
- `browser_classify_captcha`
- `browser_solve_captcha`
- `browser_get_cookies`
- `browser_set_cookie`

Goal: Detect service capabilities automatically

Tasks:
- Implement AI-powered feature detection
- Create model/options detection
- Build operation generator
- Test feature extraction

Files Created:
- `discovery/feature_mapper.py` (adapted from page_analyzer.py)
- `discovery/operation_builder.py`
- `discovery/config_generator.py`

Owl-Browser Commands Integrated:
- `browser_ai_analyze`
- `browser_query_page`
- `browser_screenshot`
- `browser_get_html`
- `browser_extract_text`

Goal: Live streaming during discovery

Tasks:
- Implement live viewport streaming
- Create video recording
- Integrate with WebSocket
- Test streaming performance

Files Created:
- `execution/live_viewport.py`
- `execution/video_recorder.py`

Owl-Browser Commands Integrated:
- `start_live_stream`
- `stop_live_stream`
- `get_live_frame`
- `start_video_recording`
- `stop_video_recording`
- `download_video_recording`

Goal: Complete discovery pipeline

Tasks:
- Implement discovery orchestrator
- Connect all discovery components
- Add progress tracking
- Test full discovery flow

Files Created:
- `discovery/orchestrator.py`

Goal: Execute discovered operations

Tasks:
- Implement operation runner
- Add response detection with `browser_is_enabled`
- Implement queue manager
- Add file upload support

Files Created:
- `execution/operation_runner.py` (adapted from test_runner.py)
- `execution/queue_manager.py`

Owl-Browser Commands Integrated:
- `browser_is_enabled` (CRITICAL for response detection)
- `browser_ai_extract`
- `browser_extract_text`
- `browser_upload_file`

Goal: OpenAI-compatible endpoints

Tasks:
- Implement `/v1/chat/completions`
- Implement `/v1/models`
- Add SSE streaming support
- Test with OpenAI client libraries

Files Created:
- `api/openai_compat.py`

Goal: Parallel service execution

Tasks:
- Enhance browser pool with tab management
- Implement service isolation
- Test concurrent service execution

Files Modified:
- `concurrency/browser_pool.py`

Owl-Browser Commands Integrated:
- `new_tab`
- `switch_tab`
- `close_tab`
- `get_tabs`
- `get_active_tab`

Goal: End-to-end testing and polish

Tasks:
- Test with k2think.ai
- Test with multiple services
- Performance optimization
- Error handling improvements
✅ Service Registration: Register k2think.ai via API
✅ Auto-Login: Login without manual intervention
✅ CAPTCHA Handling: Detect and solve CAPTCHAs automatically
✅ Feature Discovery: Detect input, submit, output areas automatically
✅ Live Viewport: Stream discovery process to frontend
✅ Response Detection: Use browser_is_enabled to detect response completion
✅ OpenAI API: /v1/chat/completions returns valid response
✅ Queue Execution: Multiple requests queue properly (NO rate limiting)
✅ Session Persistence: Save and restore sessions automatically
✅ Multi-Service: Run multiple services in parallel using tabs
# Register service
curl -X POST http://localhost:8000/api/services \
-H "Content-Type: application/json" \
-d '{
"url": "https://www.k2think.ai",
"email": "test@example.com",
"password": "password123"
}'
# Trigger discovery
curl -X POST http://localhost:8000/api/services/{service_id}/discover
# Test OpenAI API
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "{service_id}",
"messages": [{"role": "user", "content": "Hello!"}]
}'
# Expected response
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "{service_id}",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you today?"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}

The "Secret Sauce" Commands:
1. `browser_is_enabled` - THE most critical command for chat interfaces
   - Detects when the "Send" button re-enables after response generation
   - Avoids fixed, sleep-based waits
   - Works for chat services that disable the "Send" button while generating
2. `browser_ai_extract` + `browser_extract_text` - Robust response extraction
   - AI-powered extraction understands context
   - Text extraction as fallback
   - Tolerates DOM changes
3. `start_live_stream` - Live viewport for user trust
   - User can watch discovery happen
   - Debug failures easily
   - Transparent operation
4. `browser_detect_captcha` + `browser_solve_captcha` - Auth automation
   - Handle login friction automatically
   - No manual intervention needed
5. Tab Management Commands - Multi-service isolation
   - Run multiple services simultaneously
   - True parallelization
   - Clean separation
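The `browser_is_enabled` polling idea can be isolated into a small helper with a bounded wait. The following is a minimal sketch, not the project's implementation: `wait_until_enabled` and `FakeBrowser` are hypothetical names, the fake only mimics the `{"enabled": bool}` result shape used elsewhere in this document, and the timeout values are arbitrary.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_enabled(
    check_enabled: Callable[[], Awaitable[bool]],
    timeout_s: float = 120.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll check_enabled until it returns True or timeout_s elapses.

    Returns True if the control became enabled, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if await check_enabled():
            return True
        await asyncio.sleep(poll_interval_s)
    return False


class FakeBrowser:
    """Hypothetical stand-in that reports 'disabled' for the first few polls."""

    def __init__(self, disabled_polls: int):
        self.remaining = disabled_polls

    async def browser_is_enabled(self, params: dict) -> dict:
        if self.remaining > 0:
            self.remaining -= 1
            return {"enabled": False}
        return {"enabled": True}


async def demo() -> bool:
    browser = FakeBrowser(disabled_polls=3)

    async def check() -> bool:
        result = await browser.browser_is_enabled(
            {"context_id": "ctx-1", "selector": "Send button"}
        )
        return bool(result.get("enabled"))

    return await wait_until_enabled(check, timeout_s=2.0, poll_interval_s=0.01)


finished = asyncio.run(demo())
```

Returning `False` on timeout, rather than looping forever, lets the caller decide whether to retry, extract a partial response, or fail the request.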
Why Modify Instead of Create New?
- 70% code reuse from AutoQA
- Browser pooling already perfect
- Page analysis already sophisticated
- Form detection already robust
- Just need to pivot from "testing" to "service discovery"

Why Queue-Based (No Rate Limiting)?
- Services handle their own rate limits
- Web2API just passes requests through
- FIFO is sufficient
- Simpler architecture

Why OpenAI Compatibility?
- Standard API format
- Works with all LLM frameworks
- Easy integration
- Largest ecosystem
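The queue-based design described above (one FIFO queue per service, processed sequentially, no rate limiting) can be sketched with `asyncio.Queue`. This is an illustrative sketch only: `ServiceQueues`, its `submit` method and the `handler` callback are hypothetical names, and the handler stands in for whatever actually drives the browser.

```python
import asyncio


class ServiceQueues:
    """Hypothetical sketch: one FIFO queue and one worker task per service."""

    def __init__(self, handler):
        self.handler = handler  # async callable(service_id, payload) -> result
        self.queues: dict[str, asyncio.Queue] = {}
        self.workers: dict[str, asyncio.Task] = {}

    async def submit(self, service_id: str, payload: str):
        """Enqueue a request and wait for its result."""
        if service_id not in self.queues:
            self.queues[service_id] = asyncio.Queue()
            self.workers[service_id] = asyncio.create_task(self._worker(service_id))
        future = asyncio.get_running_loop().create_future()
        await self.queues[service_id].put((payload, future))
        return await future

    async def _worker(self, service_id: str):
        """Process one service's requests strictly in arrival order."""
        queue = self.queues[service_id]
        while True:
            payload, future = await queue.get()
            try:
                future.set_result(await self.handler(service_id, payload))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                queue.task_done()

    async def close(self):
        """Cancel all workers and wait for them to finish."""
        for worker in self.workers.values():
            worker.cancel()
        await asyncio.gather(*self.workers.values(), return_exceptions=True)


async def demo():
    order = []

    async def handler(service_id, payload):
        await asyncio.sleep(0.01)  # stands in for browser work
        order.append((service_id, payload))
        return payload.upper()

    queues = ServiceQueues(handler)
    results = await asyncio.gather(
        queues.submit("svc-a", "first"),
        queues.submit("svc-a", "second"),
        queues.submit("svc-b", "other"),
    )
    await queues.close()
    return results, order


results, order = asyncio.run(demo())
```

Requests to the same service never overlap, while different services proceed independently, which matches the "FIFO is sufficient" argument without introducing any throttling.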
Ready for Implementation!
This document provides:
✅ Complete Owl-Browser command mapping (157 commands)
✅ File-by-file transformation guide (all 61 AutoQA files)
✅ In-place modification strategy (no new module)
✅ Discovery pipeline with specific command sequences
✅ Execution engine with response detection
✅ Live viewport streaming implementation
✅ Database schema extensions
✅ 10-phase implementation plan (22 days)
✅ Clear success criteria with test commands
Key Innovation: Using browser_is_enabled to detect when chat responses are complete - this is the breakthrough insight that makes Web2API possible!
Next Step: Begin Phase 1 - Infrastructure Setup
Web2API Project Requirements
Web2API is a universal API proxy system that transforms any web service (with URL + credentials) into an OpenAI-compatible API endpoint. By intelligently discovering service features, forms, and APIs through browser automation, Web2API creates a unified interface for interacting with diverse web applications, enabling seamless integration with AI agents, LLMs, and automation tools.
Core Flow: URL + Credentials → Auto-Discovery → OpenAI API Endpoint
Input: URL + Credentials (username/password, API key, OAuth)
Output: OpenAI-compatible API endpoint (/v1/chat/completions, /v1/models, etc.)
Key Innovation: Zero manual configuration - the system discovers service capabilities, authentication flows, and operations automatically through intelligent analysis.
Key Features:
- 🔍 Auto-Discovery: Automatically maps web service capabilities to API endpoints
- 🔐 Credential Management: Secure handling of authentication (forms, OAuth2, API keys)
- ⚙️ Configuration Layer: Toggle features, configure models, set rate limits
- 🤖 OpenAI Compatibility: Standard API format for LLM/AI agent integration
- 🌐 Anti-Detection: Leverages Owl Browser for undetectable automation
┌─────────────────────────────────────────────────────────────┐
│ TOWER Frontend (React) │
│ - Service Management UI │
│ - Real-time Status Updates │
│ - Configuration Editor │
└────────────────┬────────────────────────────────────────────┘
│ HTTP/REST + WebSocket
┌────────────────▼────────────────────────────────────────────┐
│ Web2API Server (FastAPI) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ API Layer │ │
│ │ - OpenAI-compatible endpoints (/v1/chat/completions)│ │
│ │ - Service management (CRUD) │ │
│ │ - WebSocket streaming │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Discovery Engine │ │
│ │ - Authentication detection │ │
│ │ - Feature extraction (vision + DOM) │ │
│ │ - Operation mapping │ │
│ │ - Configuration generation │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Execution Engine │ │
│ │ - Queue-based task processing │ │
│ │ - Browser session management │ │
│ │ - Error recovery & self-healing │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ - Service configurations (PostgreSQL) │ │
│ │ - Encrypted credentials │ │
│ │ - Execution artifacts │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────┬────────────────────────────────────────────┘
│ Owl-Browser Python SDK
┌────────────────▼────────────────────────────────────────────┐
│ Owl-Browser Pool │
│ - Stealth browser automation │
│ - Anti-detection measures │
│ - Session pooling & reuse │
│ - Natural language selectors │
└────────────────┬────────────────────────────────────────────┘
│ HTTPS
┌────────────────▼────────────────────────────────────────────┐
│ Target Web Services │
│ - ChatGPT, Claude, Z.ai, K2Think, etc. │
└─────────────────────────────────────────────────────────────┘
Backend:
- FastAPI (async Python web framework)
- Owl-Browser Python SDK (stealth automation)
- PostgreSQL (service configurations & state)
- Redis (optional: queue & caching)
- SQLAlchemy (ORM)
- Pydantic (data validation)
- Cryptography (credential encryption)
- LiteLLM (LLM orchestration)
Automation:
- Owl-Browser (anti-detection browser automation)
- Vision models (GPT-4V, Claude 3.5 Sonnet) for UI analysis
- DOM analysis & element classification
- Network traffic inspection
Frontend:
- React + TypeScript
- TailwindCSS
- WebSocket client for real-time updates
- Stealth Automation: Anti-bot detection, browser fingerprinting, human-like interactions
- Natural Language Selectors: `click("Login button")`, `type("email field", "user@example.com")`
- AI-Powered Element Finding: Vision models identify UI elements semantically
- Session Management: Browser profile persistence, cookie management
- Multi-mode Operation: Local, Remote (WebSocket), MCP server
from owl_browser import OwlBrowser
# AutoQA's browser_pool.py already uses this!
browser = OwlBrowser(headless=True, stealth=True)
await browser.navigate("https://example.com")
await browser.click("Login button")
await browser.type("email input", "user@example.com")
await browser.type("password input", "secret123")
await browser.click("Submit")
result = await browser.extract("response text")
await browser.close()

- ✅ Already integrated in web2api/concurrency/browser_pool.py
- ✅ Session pooling implemented
- ✅ Anti-detection active
- ✅ Natural language selectors ready
- Need: Enhanced credential injection, auth flow detection
| Module | File | Functionality | Reusable for Web2API |
|---|---|---|---|
| Browser Pool | `concurrency/browser_pool.py` | Owl-browser session pooling | ✅ 100% - Already perfect |
| Page Analyzer | `builder/analyzer/page_analyzer.py` | DOM structure + visual analysis | ✅ 90% - Extract features |
| Form Analyzer | `builder/discovery/form_analyzer.py` | Detect input fields, buttons | ✅ 95% - Auth detection |
| API Detector | `builder/discovery/api_detector.py` | Network pattern recognition | ✅ 80% - Endpoint discovery |
| Intelligent Crawler | `builder/crawler/intelligent_crawler.py` | State-based navigation | ✅ 70% - Multi-step flows |
| Element Classifier | `builder/analyzer/element_classifier.py` | ML-based element classification | ✅ 85% - UI understanding |
| Visual Analyzer | `builder/analyzer/visual_analyzer.py` | Vision model integration | ✅ 100% - Feature detection |
| LLM Service | `llm/service.py` | Prompt orchestration | ✅ 90% - Analysis prompts |
| Test Runner | `runner/test_runner.py` | Execution orchestration | ✅ 60% - Operation executor |
| Self-Healing | `runner/self_healing.py` | Selector recovery | ✅ 100% - Error recovery |
| Database | `storage/database.py` | PostgreSQL ORM | ✅ 80% - Service storage |
| Artifact Manager | `storage/artifact_manager.py` | Screenshot/video storage | ✅ 70% - Execution logs |
- `assertions/` - Test-specific, not needed for API proxy
- `dsl/models.py` - Test DSL, replaced with service config schema
- `ci/generator.py` - CI/CD integration, not needed
- `generative/chaos_agents.py` - Chaos testing, not needed
- `visual/regression_engine.py` - Visual testing, not needed
- `versioning/` - Code versioning, not needed
What AutoQA Has:
✅ Browser automation with Owl-Browser
✅ Page & form analysis
✅ Element discovery
✅ LLM integration
✅ Database storage
✅ Error recovery
What Web2API Needs:
❌ OpenAI-compatible API layer
❌ Service registration & lifecycle management
❌ Auto-discovery → configuration pipeline
❌ Credential encryption & auth handling
❌ Queue-based execution (no rate limiting)
❌ WebSocket streaming for frontend
❌ Service-specific feature detection
❌ Dynamic operation generation
├── api/
│   ├── main.py                 # FastAPI app (modify from autoqa)
│   ├── openai_compat.py        # NEW: OpenAI API endpoints
│   ├── service_manager.py      # NEW: Service CRUD
│   └── websocket_handler.py    # NEW: Real-time updates
├── discovery/
│   ├── auth_detector.py        # NEW: Auth mechanism detection
│   ├── feature_mapper.py       # NEW: Service capability extraction
│   ├── operation_builder.py    # NEW: Dynamic operation creation
│   └── config_generator.py     # NEW: Config synthesis
├── execution/
│   ├── queue_manager.py        # NEW: FIFO task queue
│   ├── operation_runner.py     # ADAPT: From test_runner
│   └── streaming.py            # NEW: WebSocket log streaming
├── auth/
│   ├── credential_store.py     # NEW: Encrypted storage
│   ├── session_manager.py      # NEW: Browser session + cookies
│   └── form_filler.py          # ADAPT: From form_analyzer
├── models/
│   ├── service.py              # NEW: Service data model
│   ├── operation.py            # NEW: Operation schema
│   └── openai_schemas.py       # NEW: OpenAI request/response
└── storage/
    └── database.py             # MODIFY: Add service tables
Service Data Model:
interface Service {
  id: string;
  name: string;
  type: 'zai' | 'chatgpt' | 'claude' | 'k2think' | 'generic';
  url: string;
  status: 'online' | 'offline' | 'maintenance' | 'analyzing';
  loginStatus: 'pending' | 'processing' | 'success' | 'failed';
  discoveryStatus: 'pending' | 'scanning' | 'complete';
  config: ServiceConfig;
  availableModels?: string[];
  stats: {
    uptime: string;
    requests24h: number;
    avgLatency: number;
  };
}

Required REST Endpoints:
POST /api/services
Body: { url: string, email: string, password: string }
Response: Service (with loginStatus: 'pending', discoveryStatus: 'pending')
GET /api/services
Response: Service[]
GET /api/services/:id
Response: Service
PUT /api/services/:id
Body: { config: ServiceConfig }
Response: Service
DELETE /api/services/:id
Response: { success: boolean }
POST /api/services/:id/discover
Triggers background discovery process
Response: { message: string, taskId: string }
POST /api/services/:id/execute
Body: { message: string, model?: string }
Response: { response: string, model: string }
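
The CRUD contract above can be sketched with an in-memory registry. This is only an illustration under stated assumptions: `ServiceRecord` and `ServiceRegistry` are hypothetical names, a plain dict stands in for the PostgreSQL `services` table, and credentials are deliberately left out of the record because they belong in encrypted storage.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class ServiceRecord:
    """Subset of the Service fields returned by the REST endpoints."""
    id: str
    name: str
    url: str
    type: str = "generic"
    status: str = "offline"
    loginStatus: str = "pending"
    discoveryStatus: str = "pending"
    config: dict = field(default_factory=dict)


class ServiceRegistry:
    """Hypothetical in-memory stand-in for the services table."""

    def __init__(self) -> None:
        self._services: Dict[str, ServiceRecord] = {}

    def create(self, url: str) -> dict:
        # POST /api/services: a new service starts with both statuses 'pending'.
        # Deriving the name from the host is an assumption of this sketch.
        record = ServiceRecord(id=str(uuid.uuid4()), name=urlparse(url).netloc, url=url)
        self._services[record.id] = record
        return asdict(record)

    def list_all(self) -> List[dict]:
        # GET /api/services
        return [asdict(record) for record in self._services.values()]

    def get(self, service_id: str) -> Optional[dict]:
        # GET /api/services/:id (None maps to a 404 in the HTTP layer)
        record = self._services.get(service_id)
        return asdict(record) if record else None

    def update_config(self, service_id: str, config: dict) -> Optional[dict]:
        # PUT /api/services/:id
        record = self._services.get(service_id)
        if record is None:
            return None
        record.config = config
        return asdict(record)

    def delete(self, service_id: str) -> dict:
        # DELETE /api/services/:id
        return {"success": self._services.pop(service_id, None) is not None}
```

In a FastAPI app, each method would sit behind the matching route, with `None` results translated into 404 responses.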
WebSocket Endpoint:
WS /ws/services/:id
Events:
- login_update: { loginStatus: string, message: string }
- discovery_update: { discoveryStatus: string, progress: number, message: string }
- execution_log: { timestamp: number, level: string, message: string }
- status_change: { status: string }
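
The event list above fixes the payload fields but not the wire format. A minimal sketch, assuming each message is a JSON envelope of the form `{"event": ..., "data": ...}` (the envelope shape and the `make_event` helper are assumptions, not part of the frontend contract):

```python
import json
import time

# Payload keys per event, taken from the event list above.
EVENT_FIELDS = {
    "login_update": {"loginStatus", "message"},
    "discovery_update": {"discoveryStatus", "progress", "message"},
    "execution_log": {"timestamp", "level", "message"},
    "status_change": {"status"},
}


def make_event(event: str, **payload) -> str:
    """Validate the payload keys and serialize one WebSocket message."""
    if event not in EVENT_FIELDS:
        raise ValueError(f"unknown event: {event}")
    if event == "execution_log":
        # Fill in the timestamp when the caller does not supply one.
        payload.setdefault("timestamp", time.time())
    expected = EVENT_FIELDS[event]
    provided = set(payload)
    if provided != expected:
        missing = sorted(expected - provided)
        extra = sorted(provided - expected)
        raise ValueError(f"{event}: missing={missing} unexpected={extra}")
    return json.dumps({"event": event, "data": payload})
```

Rejecting unexpected keys keeps the server from drifting away from the payload shapes the frontend was written against.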
POST /v1/chat/completions
Body: {
model: string, // Service ID or "service-id:model-name"
messages: Array<{role: string, content: string}>,
stream?: boolean,
temperature?: number,
max_tokens?: number
}
Response: {
id: string,
object: "chat.completion",
created: number,
model: string,
choices: [{
index: 0,
message: {role: "assistant", content: string},
finish_reason: "stop"
}],
usage: {prompt_tokens: number, completion_tokens: number, total_tokens: number}
}
GET /v1/models
Response: {
object: "list",
data: [{
id: string, // Service ID
object: "model",
created: number,
owned_by: "web2api"
}]
}
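
The request and response shapes above can be exercised with two small helpers. This is a sketch under assumptions: `parse_model`, `estimate_tokens`, and `build_completion` are hypothetical names, and the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer.

```python
import time
import uuid
from typing import Dict, List, Optional, Tuple


def parse_model(model: str) -> Tuple[str, Optional[str]]:
    """Split "service-id:model-name" into its parts; the model name is optional."""
    service_id, separator, model_name = model.partition(":")
    return service_id, (model_name if separator and model_name else None)


def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token.
    return max(1, len(text) // 4) if text else 0


def build_completion(model: str, messages: List[Dict[str, str]], content: str) -> dict:
    """Wrap text extracted from the target service in a chat.completion object."""
    prompt_tokens = sum(estimate_tokens(m["content"]) for m in messages)
    completion_tokens = estimate_tokens(content)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Splitting on the first colon is safe as long as service IDs themselves never contain one, which holds for UUIDs.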
Service Registration
↓
1. Authentication Discovery
- Detect login form fields (email, password, 2FA)
- Identify OAuth flow if present
- Check for cookie-based auth
- Store auth mechanism
↓
2. Login Execution
- Auto-fill credentials
- Submit form
- Handle 2FA (manual intervention if needed)
- Verify successful login
- Save session cookies
↓
3. Feature Detection (Vision + DOM)
- Screenshot main interface
- LLM vision analysis: "What capabilities does this service offer?"
- DOM analysis: buttons, dropdowns, toggles
- Identify:
* Model selectors
* Feature toggles (web browsing, image generation)
* Input areas (chat, prompt)
* Output areas (response, result)
↓
4. Operation Mapping
- Detect primary operation (e.g., "Send Message")
- Map input fields → parameters
- Map UI controls → optional parameters
- Generate selector strategy (natural language)
↓
5. Configuration Generation
- Create service config JSON
- Define operations with steps
- Set up parameter schema
- Store in database
↓
Service Ready (discoveryStatus: 'complete')
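
The five stages above run strictly in sequence, each one building on what the previous stage found. A minimal orchestration sketch, assuming every stage is an async callable that receives the accumulated context and returns new keys for it (`run_discovery`, the stage signature, and the `notify` callback are illustrative assumptions):

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

Stage = Callable[[Dict], Awaitable[Dict]]
Notify = Callable[[Dict], Awaitable[None]]


async def run_discovery(stages: List[Tuple[str, Stage]], notify: Notify) -> Dict:
    """Run the stages in order and report progress after each one."""
    context: Dict = {"discoveryStatus": "scanning"}
    total = len(stages)
    for index, (name, stage) in enumerate(stages, start=1):
        # Each stage sees everything discovered so far and adds its own keys.
        context.update(await stage(context))
        await notify({
            "discoveryStatus": "scanning",
            "progress": int(100 * index / total),
            "message": f"{name} finished",
        })
    context["discoveryStatus"] = "complete"
    await notify({"discoveryStatus": "complete", "progress": 100, "message": "Service ready"})
    return context
```

The `notify` payload matches the `discovery_update` event fields, so the callback can forward it to the WebSocket unchanged. A stage that raises aborts the run; how a failed discovery is reported is left open here because the status enum above defines no failure value.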
Auth Detection (auth_detector.py):
async def detect_auth_mechanism(browser, url):
    await browser.navigate(url)

    # Check for login form. The natural-language queries double as the
    # selectors stored in the result, so login can replay them later.
    email_selector = "email input"
    password_selector = "password input"
    has_email = await browser.find(email_selector)
    has_password = await browser.find(password_selector)
    if has_email and has_password:
        return {
            "type": "form_login",
            "fields": {"email": email_selector, "password": password_selector}
        }

    # Check for OAuth
    has_oauth = await browser.find("Sign in with Google")
    if has_oauth:
        return {"type": "oauth", "provider": "google"}

    # Check for API key
    has_api_key = await browser.find("API key input")
    if has_api_key:
        return {"type": "api_key"}

    return {"type": "unknown"}

Feature Detection (feature_mapper.py):
async def detect_features(browser, llm_service):
    screenshot = await browser.screenshot()
    dom = await browser.extract_dom()

    # Vision analysis
    vision_prompt = """
    Analyze this web interface screenshot.
    Identify:
    1. What is the primary purpose? (chat, image generation, etc.)
    2. What models/options are available?
    3. What configurable features exist? (toggles, dropdowns)
    4. What is the main input method?
    5. Where does the output appear?
    """
    analysis = await llm_service.analyze_vision(screenshot, vision_prompt)

    # DOM analysis for selectors
    buttons = await browser.find_all("button")
    inputs = await browser.find_all("input, textarea")
    selects = await browser.find_all("select")

    # find_best_input, find_submit_button and find_output_area are helper
    # functions that still have to be written.
    return {
        "primary_operation": analysis.primary_purpose,
        "available_models": analysis.models,
        "features": analysis.features,
        "input_selector": find_best_input(inputs),
        "submit_selector": find_submit_button(buttons),
        "output_selector": find_output_area(dom),
        # Consumed by the select_model step; None when the UI has no dropdown
        "model_selector": selects[0] if selects else None
    }

Configuration Generation (config_generator.py):
from datetime import datetime, timezone

async def generate_config(service_info, features):
    # `features` is the dict returned by detect_features(), so use key access.
    return {
        "service_id": service_info.id,
        "name": service_info.name,
        "url": service_info.url,
        "auth": {
            "type": service_info.auth_type,
            "session_cookie": service_info.session_cookie
        },
        "operations": [{
            "id": "chat_completion",
            "name": "Send Message",
            "parameters": {
                "message": {"type": "string", "required": True},
                "model": {"type": "string", "enum": features["available_models"], "required": False}
            },
            "execution_steps": [
                {"action": "navigate", "url": service_info.url},
                {"action": "wait_for", "selector": features["input_selector"]},
                {"action": "type", "selector": features["input_selector"], "value": "{{message}}"},
                # model_selector may be missing when the UI has no model picker
                {"action": "select_model", "selector": features.get("model_selector"), "value": "{{model}}"},
                {"action": "click", "selector": features["submit_selector"]},
                {"action": "wait_for_response", "selector": features["output_selector"], "timeout": 60},
                {"action": "extract", "selector": features["output_selector"]}
            ]
        }],
        # Timezone-aware ISO 8601 string, so the config can be stored as JSON
        "discovered_at": datetime.now(timezone.utc).isoformat()
    }

Queue Manager (queue_manager.py):
from asyncio import Queue
from typing import Dict

class ExecutionQueue:
    def __init__(self):
        self.queues: Dict[str, Queue] = {}  # service_id → queue

    async def add_task(self, service_id: str, task: dict):
        if service_id not in self.queues:
            self.queues[service_id] = Queue()
        await self.queues[service_id].put(task)

    async def process_queue(self, service_id: str, browser_pool):
        queue = self.queues.get(service_id)
        if queue is None:
            return
        while True:
            task = await queue.get()
            browser = None
            try:
                browser = await browser_pool.acquire(service_id)
                # Assumed task keys: service_config, operation_id, parameters,
                # ws_client, client_id
                result = await execute_operation(
                    browser, task["service_config"], task["operation_id"],
                    task["parameters"], task["ws_client"],
                )
                await send_result(task["client_id"], result)
            finally:
                # Release only a browser that was actually acquired
                if browser is not None:
                    await browser_pool.release(browser)
                queue.task_done()

Operation Runner (operation_runner.py):
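async def execute_operation(browser, service_config, operation_id, parameters, ws_client):
    operation = service_config.get_operation(operation_id)
    result = None  # stays None if the operation has no "extract" step
    # NOTE: "select_model" steps are not handled yet and are skipped.
    for step in operation.execution_steps:
        await ws_client.send_log(f"Executing: {step.action}")
        if step.action == "navigate":
            await browser.navigate(step.url)
        elif step.action == "type":
            # "{{message}}" -> "message": resolve the template placeholder
            value = parameters.get(step.value.strip("{}"))
            await browser.type(step.selector, value)
        elif step.action == "click":
            await browser.click(step.selector)
        elif step.action in ("wait_for", "wait_for_response"):
            # "wait_for" steps carry no timeout in the generated config
            if getattr(step, "timeout", None) is not None:
                await browser.wait_for(step.selector, timeout=step.timeout)
            else:
                await browser.wait_for(step.selector)
        elif step.action == "extract":
            result = await browser.extract(step.selector)
    return result

-- Services table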
CREATE TABLE services (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
url TEXT NOT NULL,
type VARCHAR(50) DEFAULT 'generic',
status VARCHAR(50) DEFAULT 'offline',
login_status VARCHAR(50) DEFAULT 'pending',
discovery_status VARCHAR(50) DEFAULT 'pending',
config JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Credentials table (encrypted)
CREATE TABLE service_credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
auth_type VARCHAR(50), -- form_login, oauth, api_key
encrypted_email TEXT,
encrypted_password TEXT,
encrypted_api_key TEXT,
session_cookie TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Execution history
CREATE TABLE executions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
operation_id VARCHAR(100),
parameters JSONB,
result TEXT,
status VARCHAR(50), -- pending, running, success, failed
started_at TIMESTAMP DEFAULT NOW(),
completed_at TIMESTAMP,
error TEXT
);
-- Service stats (aggregated)
CREATE TABLE service_stats (
service_id UUID PRIMARY KEY REFERENCES services(id) ON DELETE CASCADE,
total_requests INT DEFAULT 0,
successful_requests INT DEFAULT 0,
failed_requests INT DEFAULT 0,
avg_latency_ms INT DEFAULT 0,
last_request_at TIMESTAMP,
uptime_start TIMESTAMP DEFAULT NOW()
);

Goal: Basic FastAPI server with database
Files to Create:
- __init__.py - Package init
- models/service.py - Service data model
- models/openai_schemas.py - OpenAI request/response schemas
- storage/database.py - SQLAlchemy models + DB connection
- api/main.py - FastAPI app with CORS, middleware
Tasks:
- Set up PostgreSQL database
- Create tables with migrations
- Basic FastAPI server running on port 8000
- Health check endpoint: /health
Goal: CRUD operations for services
Files to Create:
- api/service_manager.py - Service CRUD routes
- auth/credential_store.py - Encrypted credential storage
- api/websocket_handler.py - WebSocket connection manager
Endpoints to Implement:
- POST /api/services - Register service
- GET /api/services - List all services
- GET /api/services/:id - Get single service
- PUT /api/services/:id - Update service config
- DELETE /api/services/:id - Delete service
- WS /ws/services/:id - WebSocket for updates
Test: Register k2think.ai service, verify database storage
Goal: Automated service discovery
Files to Create:
- discovery/auth_detector.py - Auth mechanism detection
- discovery/feature_mapper.py - UI feature extraction
- discovery/operation_builder.py - Operation definition creation
- discovery/config_generator.py - Config synthesis
- discovery/orchestrator.py - Discovery pipeline coordinator
Reuse from AutoQA:
- web2api/builder/analyzer/page_analyzer.py
- web2api/builder/analyzer/visual_analyzer.py
- web2api/builder/discovery/form_analyzer.py
- web2api/llm/service.py
Tasks:
- Implement auth detection (form login focus)
- Login execution with credential injection
- Vision-based feature detection
- Operation mapping with natural language selectors
- Config generation and storage
Test: Discover k2think.ai capabilities automatically
Goal: Execute discovered operations
Files to Create:
- execution/queue_manager.py - FIFO task queue
- execution/operation_runner.py - Operation execution logic
- execution/streaming.py - WebSocket log streaming
Reuse from AutoQA:
- web2api/concurrency/browser_pool.py - Owl-browser pooling
- web2api/runner/self_healing.py - Selector recovery
Tasks:
- Queue-based task processing (no rate limiting)
- Browser session acquisition from pool
- Step-by-step execution with logging
- Real-time log streaming via WebSocket
- Error recovery with self-healing selectors
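
One simple way to approximate selector recovery is to try alternative selectors in order until one works. The sketch below is not the logic in `runner/self_healing.py`, which is not shown here; `SelectorError` and `with_selector_fallback` are hypothetical names, and the browser action is passed in as a plain async callable.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class SelectorError(Exception):
    """Raised by an action when its selector matches nothing (hypothetical)."""


async def with_selector_fallback(
    action: Callable[[str], Awaitable[T]],
    selectors: Sequence[str],
    retries_per_selector: int = 1,
    delay: float = 0.0,
) -> T:
    """Run `action` with each candidate selector until one succeeds."""
    last_error: Optional[SelectorError] = None
    for selector in selectors:
        for _ in range(retries_per_selector):
            try:
                return await action(selector)
            except SelectorError as exc:
                last_error = exc
                # Give the page time to settle before the next attempt.
                await asyncio.sleep(delay)
    raise SelectorError(f"all selectors failed: {list(selectors)}") from last_error
```

Ordering the candidates from most specific (a stored CSS selector) to most general (a natural-language description) lets the fast path win when the page has not changed.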
Test: Execute chat completion on k2think.ai

Goal: OpenAI-compatible endpoints
Files to Create:
- web2api/api/openai_compat.py - OpenAI API routes
Endpoints to Implement:
- POST /v1/chat/completions - Main execution endpoint
- GET /v1/models - List services as models
- Streaming support with SSE
Tasks:
- Map OpenAI request → service execution
- Format response as OpenAI completion
- Handle streaming responses
- Token counting (estimated)
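
For the streaming task, OpenAI-style clients expect Server-Sent Events frames that carry `chat.completion.chunk` objects and end with a `data: [DONE]` line. A minimal sketch, assuming the full response text has already been extracted from the page and is sliced into fixed-size pieces (the `sse_chunks` helper and the slicing strategy are assumptions; true incremental streaming would need to poll the output area while the service is still typing):

```python
import json
import time
from typing import Iterator, Optional


def sse_chunks(completion_id: str, model: str, text: str, chunk_size: int = 20) -> Iterator[str]:
    """Yield SSE frames that replay `text` as a streamed chat completion."""
    created = int(time.time())

    def frame(delta: dict, finish_reason: Optional[str] = None) -> str:
        payload = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        }
        return f"data: {json.dumps(payload)}\n\n"

    # The first frame announces the role, then content arrives in pieces.
    yield frame({"role": "assistant"})
    for start in range(0, len(text), chunk_size):
        yield frame({"content": text[start:start + chunk_size]})
    # The final chunk has an empty delta and carries the finish reason.
    yield frame({}, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

With FastAPI, the generator can be handed to a `StreamingResponse` with media type `text/event-stream` when the request sets `stream: true`.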
Test:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "k2think",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'

Goal: End-to-end working system
Test Credentials:
- URL: https://www.k2think.ai
- Email: developer@pixelium.uk
- Password: developer123?
Test Plan:
- Register service via API
- Monitor login via WebSocket
- Trigger discovery
- Verify discovered configuration
- Execute chat completion via OpenAI API
- Verify response correctness
- Test multiple concurrent requests
- Test error recovery
Success Criteria:
✅ Service registers successfully
✅ Auto-login completes
✅ Features discovered automatically
✅ OpenAI API returns valid response
✅ Multiple requests queue properly
✅ Errors self-heal and retry
# Rename the autoqa directory to web2api_base
mv backend/web2api/autoqa-ai-testing/src/autoqa backend/web2api/autoqa-ai-testing/src/web2api_base
# Keep as reference, create new web2api alongside

1. api/main.py: Transform from test API to service API
- Remove: /build, /run, /results endpoints
- Add: Service CRUD, discovery trigger, execute operation
- Add: WebSocket handler integration
2. concurrency/browser_pool.py: Enhance for service isolation
- Add: Service-specific browser profiles
- Add: Session persistence per service
- Keep: Existing pooling logic
3. llm/service.py: Adapt for discovery prompts
- Add: Auth detection prompts
- Add: Feature extraction prompts
- Add: Operation mapping prompts
4. storage/database.py: Add service tables
- Add: Service model
- Add: ServiceCredential model
- Add: Execution model
- Keep: Existing artifact storage
Discovery Module:
discovery/
├── __init__.py
├── auth_detector.py # NEW
├── feature_mapper.py # NEW
├── operation_builder.py # NEW
├── config_generator.py # NEW
└── orchestrator.py # NEW
Execution Module:
web2api/execution/
├── __init__.py
├── queue_manager.py # NEW
├── operation_runner.py # ADAPT from runner/test_runner.py
└── streaming.py # NEW
Auth Module:
auth/
├── __init__.py
├── credential_store.py # NEW
├── session_manager.py # NEW
└── form_filler.py # ADAPT from discovery/form_analyzer.py
API Module:
api/
├── __init__.py
├── main.py # MODIFY from web2api/api/main.py
├── service_manager.py # NEW
├── openai_compat.py # NEW
└── websocket_handler.py # NEW
Models:
models/
├── __init__.py
├── service.py # NEW - Pydantic models
├── operation.py # NEW
└── openai_schemas.py # NEW
System is considered COMPLETE when:
- ✅ K2Think.ai service registers automatically
- ✅ Login completes without manual intervention
- ✅ Discovery identifies at least: input field, submit button, output area
- ✅ Configuration generates valid operation definition
- ✅ OpenAI API request: POST /v1/chat/completions returns valid response
- ✅ Response contains actual output from k2think.ai
- ✅ Multiple concurrent requests queue properly
- ✅ TOWER frontend can connect and manage services
- ✅ WebSocket streams real-time updates
- ✅ Error recovery works (self-healing selectors)
Test Command:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "k2think",
"messages": [
{"role": "user", "content": "Write a haiku about programming"}
]
}'

Expected Response:
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "k2think",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Code flows like water\nBugs emerge then disappear\nCommit, push, repeat"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}

Web2API transforms any web service into an OpenAI-compatible API endpoint using intelligent browser automation and auto-discovery.
Core Flow: URL + Credentials → Auto-Discovery → OpenAI API Endpoint
Key Innovation: Zero manual configuration - the system discovers service capabilities, authentication flows, and operations automatically through intelligent analysis.
┌─────────────────────────────────────────────────────────────┐
│ TOWER Frontend (React) │
│ - Service Management UI │
│ - Real-time Status Updates │
│ - Configuration Editor │
└────────────────┬────────────────────────────────────────────┘
│ HTTP/REST + WebSocket
┌────────────────▼────────────────────────────────────────────┐
│ Web2API Server (FastAPI) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ API Layer │ │
│ │ - OpenAI-compatible endpoints (/v1/chat/completions)│ │
│ │ - Service management (CRUD) │ │
│ │ - WebSocket streaming │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Discovery Engine │ │
│ │ - Authentication detection │ │
│ │ - Feature extraction (vision + DOM) │ │
│ │ - Operation mapping │ │
│ │ - Configuration generation │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Execution Engine │ │
│ │ - Queue-based task processing │ │
│ │ - Browser session management │ │
│ │ - Error recovery & self-healing │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Storage Layer │ │
│ │ - Service configurations (PostgreSQL) │ │
│ │ - Encrypted credentials │ │
│ │ - Execution artifacts │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────┬────────────────────────────────────────────┘
│ Owl-Browser Python SDK
┌────────────────▼────────────────────────────────────────────┐
│ Owl-Browser Pool │
│ - Stealth browser automation │
│ - Anti-detection measures │
│ - Session pooling & reuse │
│ - Natural language selectors │
└────────────────┬────────────────────────────────────────────┘
│ HTTPS
┌────────────────▼────────────────────────────────────────────┐
│ Target Web Services │
│ - ChatGPT, Claude, Z.ai, K2Think, etc. │
└─────────────────────────────────────────────────────────────┘
Backend:
- FastAPI (async Python web framework)
- Owl-Browser Python SDK (stealth automation)
- PostgreSQL (service configurations & state)
- Redis (optional: queue & caching)
- SQLAlchemy (ORM)
- Pydantic (data validation)
- Cryptography (credential encryption)
- LiteLLM (LLM orchestration)
Automation:
- Owl-Browser (anti-detection browser automation)
- Vision models (GPT-4V, Claude 3.5 Sonnet) for UI analysis
- DOM analysis & element classification
- Network traffic inspection
Frontend:
- React + TypeScript
- TailwindCSS
- WebSocket client for real-time updates

- Executive Summary & Architecture - Complete system diagram
- Owl-Browser Analysis - Stealth capabilities & integration
- AutoQA Functionality Matrix - 70% reusable components mapped
- Gap Analysis - Clear 30% new development required
- API Specifications - OpenAI + TOWER frontend contracts
- Discovery Pipeline - Zero-template auto-discovery flow
- Execution System - Queue-based (NO rate limiting)
- Database Schema - PostgreSQL tables for services/credentials/executions
- Implementation Phases - 6 phases, 14-day timeline
- File Modification Plan - Detailed file-by-file changes
- Success Metrics - Clear acceptance criteria with K2Think.ai test
📋 TOWER Frontend Contract Identified
Clear API requirements from App.tsx & types.ts:
- Service CRUD with status tracking
- Discovery trigger endpoint
- Execute endpoint for chat completions
- WebSocket for real-time updates
- Status fields: loginStatus, discoveryStatus, status
├── api/ │ ├── main.py # FastAPI app (modify from autoqa) │ ├── openai_compat.py # NEW: OpenAI API endpoints │ ├── service_manager.py # NEW: Service CRUD │ └── websocket_handler.py # NEW: Real-time updates ├── discovery/ │ ├── auth_detector.py # NEW: Auth mechanism detection │ ├── feature_mapper.py # NEW: Service capability extraction │ ├── operation_builder.py # NEW: Dynamic operation creation │ └── config_generator.py # NEW: Config synthesis ├── execution/ │ ├── queue_manager.py # NEW: FIFO task queue │ ├── operation_runner.py # ADAPT: From test_runner │ └── streaming.py # NEW: WebSocket log streaming ├── auth/ │ ├── credential_store.py # NEW: Encrypted storage │ ├── session_manager.py # NEW: Browser session + cookies │ └── form_filler.py # ADAPT: From form_analyzer ├── models/ │ ├── service.py # NEW: Service data model │ ├── operation.py # NEW: Operation schema │ └── openai_schemas.py # NEW: OpenAI request/response └── storage/ └── database.py # MODIFY: Add service tables
Service Data Model:
interface Service {
id: string;
name: string;
type: 'zai' | 'chatgpt' | 'claude' | 'k2think' | 'generic';
url: string;
status: 'online' | 'offline' | 'maintenance' | 'analyzing';
loginStatus: 'pending' | 'processing' | 'success' | 'failed';
discoveryStatus: 'pending' | 'scanning' | 'complete';
config: ServiceConfig;
availableModels?: string[];
stats: {
uptime: string;
requests24h: number;
avgLatency: number;
};
}Required REST Endpoints:
POST /api/services
Body: { url: string, email: string, password: string }
Response: Service (with loginStatus: 'pending', discoveryStatus: 'pending')
GET /api/services
Response: Service[]
GET /api/services/:id
Response: Service
PUT /api/services/:id
Body: { config: ServiceConfig }
Response: Service
DELETE /api/services/:id
Response: { success: boolean }
POST /api/services/:id/discover
Triggers background discovery process
Response: { message: string, taskId: string }
POST /api/services/:id/execute
Body: { message: string, model?: string }
Response: { response: string, model: string }
WebSocket Endpoint:
WS /ws/services/:id
Events:
- login_update: { loginStatus: string, message: string }
- discovery_update: { discoveryStatus: string, progress: number, message: string }
- execution_log: { timestamp: number, level: string, message: string }
- status_change: { status: string }
POST /v1/chat/completions
Body: {
model: string, // Service ID or "service-id:model-name"
messages: Array<{role: string, content: string}>,
stream?: boolean,
temperature?: number,
max_tokens?: number
}
Response: {
id: string,
object: "chat.completion",
created: number,
model: string,
choices: [{
index: 0,
message: {role: "assistant", content: string},
finish_reason: "stop"
}],
usage: {prompt_tokens: number, completion_tokens: number, total_tokens: number}
}
GET /v1/models
Response: {
object: "list",
data: [{
id: string, // Service ID
object: "model",
created: number,
owned_by: "web2api"
}]
}
Service Registration
↓
1. Authentication Discovery
- Detect login form fields (email, password, 2FA)
- Identify OAuth flow if present
- Check for cookie-based auth
- Store auth mechanism
↓
2. Login Execution
- Auto-fill credentials
- Submit form
- Handle 2FA (manual intervention if needed)
- Verify successful login
- Save session cookies
↓
3. Feature Detection (Vision + DOM)
- Screenshot main interface
- LLM vision analysis: "What capabilities does this service offer?"
- DOM analysis: buttons, dropdowns, toggles
- Identify:
* Model selectors
* Feature toggles (web browsing, image generation)
* Input areas (chat, prompt)
* Output areas (response, result)
↓
4. Operation Mapping
- Detect primary operation (e.g., "Send Message")
- Map input fields → parameters
- Map UI controls → optional parameters
- Generate selector strategy (natural language)
↓
5. Configuration Generation
- Create service config JSON
- Define operations with steps
- Set up parameter schema
- Store in database
↓
Service Ready (discoveryStatus: 'complete')
Auth Detection (auth_detector.py):
async def detect_auth_mechanism(browser, url):
await browser.navigate(url)
# Check for login form
has_email = await browser.find("email input")
has_password = await browser.find("password input")
if has_email and has_password:
return {
"type": "form_login",
"fields": {"email": email_selector, "password": password_selector}
}
# Check for OAuth
has_oauth = await browser.find("Sign in with Google")
if has_oauth:
return {"type": "oauth", "provider": "google"}
# Check for API key
has_api_key = await browser.find("API key input")
if has_api_key:
return {"type": "api_key"}
return {"type": "unknown"}Feature Detection (feature_mapper.py):
async def detect_features(browser, llm_service):
screenshot = await browser.screenshot()
dom = await browser.extract_dom()
# Vision analysis
vision_prompt = """
Analyze this web interface screenshot.
Identify:
1. What is the primary purpose? (chat, image generation, etc.)
2. What models/options are available?
3. What configurable features exist? (toggles, dropdowns)
4. What is the main input method?
5. Where does the output appear?
"""
analysis = await llm_service.analyze_vision(screenshot, vision_prompt)
# DOM analysis for selectors
buttons = await browser.find_all("button")
inputs = await browser.find_all("input, textarea")
selects = await browser.find_all("select")
return {
"primary_operation": analysis.primary_purpose,
"available_models": analysis.models,
"features": analysis.features,
"input_selector": find_best_input(inputs),
"submit_selector": find_submit_button(buttons),
"output_selector": find_output_area(dom)
}Configuration Generation (config_generator.py):
async def generate_config(service_info, features):
return {
"service_id": service_info.id,
"name": service_info.name,
"url": service_info.url,
"auth": {
"type": service_info.auth_type,
"session_cookie": service_info.session_cookie
},
"operations": [{
"id": "chat_completion",
"name": "Send Message",
"parameters": {
"message": {"type": "string", "required": True},
"model": {"type": "string", "enum": features.available_models, "optional": True}
},
"execution_steps": [
{"action": "navigate", "url": service_info.url},
{"action": "wait_for", "selector": features.input_selector},
{"action": "type", "selector": features.input_selector, "value": "{{message}}"},
{"action": "select_model", "selector": features.model_selector, "value": "{{model}}"},
{"action": "click", "selector": features.submit_selector},
{"action": "wait_for_response", "selector": features.output_selector, "timeout": 60},
{"action": "extract", "selector": features.output_selector}
]
}],
"discovered_at": datetime.utcnow()
}Queue Manager (queue_manager.py):
from asyncio import Queue
from typing import Dict
class ExecutionQueue:
def __init__(self):
self.queues: Dict[str, Queue] = {} # service_id → queue
async def add_task(self, service_id: str, task: dict):
if service_id not in self.queues:
self.queues[service_id] = Queue()
await self.queues[service_id].put(task)
async def process_queue(self, service_id: str, browser_pool):
queue = self.queues.get(service_id)
if not queue:
return
while True:
task = await queue.get()
try:
browser = await browser_pool.acquire(service_id)
result = await execute_operation(browser, task)
await send_result(task.client_id, result)
finally:
await browser_pool.release(browser)
queue.task_done()Operation Runner (operation_runner.py):
async def execute_operation(browser, service_config, operation_id, parameters, ws_client):
operation = service_config.get_operation(operation_id)
for step in operation.execution_steps:
await ws_client.send_log(f"Executing: {step.action}")
if step.action == "navigate":
await browser.navigate(step.url)
elif step.action == "type":
value = parameters.get(step.value.strip("{{}}"))
await browser.type(step.selector, value)
elif step.action == "click":
await browser.click(step.selector)
elif step.action == "wait_for_response":
await browser.wait_for(step.selector, timeout=step.timeout)
elif step.action == "extract":
result = await browser.extract(step.selector)
return result## 8. DATABASE SCHEMA
```sql
-- Services table
CREATE TABLE services (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
url TEXT NOT NULL,
type VARCHAR(50) DEFAULT 'generic',
status VARCHAR(50) DEFAULT 'offline',
login_status VARCHAR(50) DEFAULT 'pending',
discovery_status VARCHAR(50) DEFAULT 'pending',
config JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Credentials table (encrypted)
CREATE TABLE service_credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
auth_type VARCHAR(50), -- form_login, oauth, api_key
encrypted_email TEXT,
encrypted_password TEXT,
encrypted_api_key TEXT,
session_cookie TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Execution history
CREATE TABLE executions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
service_id UUID REFERENCES services(id) ON DELETE CASCADE,
operation_id VARCHAR(100),
parameters JSONB,
result TEXT,
status VARCHAR(50), -- pending, running, success, failed
started_at TIMESTAMP DEFAULT NOW(),
completed_at TIMESTAMP,
error TEXT
);
-- Service stats (aggregated)
CREATE TABLE service_stats (
service_id UUID PRIMARY KEY REFERENCES services(id) ON DELETE CASCADE,
total_requests INT DEFAULT 0,
successful_requests INT DEFAULT 0,
failed_requests INT DEFAULT 0,
avg_latency_ms INT DEFAULT 0,
last_request_at TIMESTAMP,
uptime_start TIMESTAMP DEFAULT NOW()
);Goal: Basic FastAPI server with database
Files to Create:
__init__.py- Package initmodels/service.py- Service data modelmodels/openai_schemas.py- OpenAI request/response schemasstorage/database.py- SQLAlchemy models + DB connectionapi/main.py- FastAPI app with CORS, middleware
Tasks:
- Set up PostgreSQL database
- Create tables with migrations
- Basic FastAPI server running on port 8000
- Health check endpoint
/health
Goal: CRUD operations for services
Files to Create:
api/service_manager.py- Service CRUD routesauth/credential_store.py- Encrypted credential storageapi/websocket_handler.py- WebSocket connection manager
Endpoints to Implement:
POST /api/services- Register serviceGET /api/services- List all servicesGET /api/services/:id- Get single servicePUT /api/services/:id- Update service configDELETE /api/services/:id- Delete serviceWS /ws/services/:id- WebSocket for updates
Test: Register k2think.ai service, verify database storage
Goal: Automated service discovery
Files to Create:
discovery/auth_detector.py- Auth mechanism detectiondiscovery/feature_mapper.py- UI feature extractiondiscovery/operation_builder.py- Operation definition creationdiscovery/config_generator.py- Config synthesisdiscovery/orchestrator.py- Discovery pipeline coordinator
Reuse from AutoQA:
web2api/builder/analyzer/page_analyzer.pyweb2api/builder/analyzer/visual_analyzer.pyweb2api/builder/discovery/form_analyzer.pyweb2api/llm/service.py
Tasks:
- Implement auth detection (form login focus)
- Login execution with credential injection
- Vision-based feature detection
- Operation mapping with natural language selectors
- Config generation and storage
Test: Discover k2think.ai capabilities automatically
Goal: Execute discovered operations
Files to Create:
execution/queue_manager.py- FIFO task queueexecution/operation_runner.py- Operation execution logicexecution/streaming.py- WebSocket log streaming
Reuse from AutoQA:
web2api/concurrency/browser_pool.py- Owl-browser poolingweb2api/runner/self_healing.py- Selector recovery
Tasks:
- Queue-based task processing (no rate limiting)
- Browser session acquisition from pool
- Step-by-step execution with logging
- Real-time log streaming via WebSocket
- Error recovery with self-healing selectors
Test: Execute chat completion on k2think.ai
Goal: OpenAI-compatible endpoints
Files to Create:
web2api/api/openai_compat.py- OpenAI API routes
Endpoints to Implement:
POST /v1/chat/completions- Main execution endpointGET /v1/models- List services as models- Streaming support with SSE
Tasks:
- Map OpenAI request → service execution
- Format response as OpenAI completion
- Handle streaming responses
- Token counting (estimated)
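The response-formatting and token-counting tasks above can be sketched as follows. The four-characters-per-token heuristic is an assumption chosen for illustration, not a real tokenizer, and `build_completion` is a hypothetical helper name.

```python
import time
import uuid
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (assumption, not a tokenizer)."""
    if not text:
        return 0
    return max(1, len(text) // 4)


def build_completion(model: str, messages: List[Dict[str, str]], answer: str) -> dict:
    """Wrap text extracted from the web service in an OpenAI-style completion."""
    prompt_tokens = sum(estimate_tokens(m.get("content", "")) for m in messages)
    completion_tokens = estimate_tokens(answer)
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": answer},
            "finish_reason": "stop",
        }],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Because the counts are estimates, clients should treat the `usage` block as approximate.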
Test:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "k2think",
"messages": [{"role": "user", "content": "Hello, how are you?"}]
}'
Goal: End-to-end working system
Test Credentials:
- URL: https://www.k2think.ai
- Email: developer@pixelium.uk
- Password: developer123?
Test Plan:
- Register service via API
- Monitor login via WebSocket
- Trigger discovery
- Verify discovered configuration
- Execute chat completion via OpenAI API
- Verify response correctness
- Test multiple concurrent requests
- Test error recovery
Success Criteria:
- ✅ Service registers successfully
- ✅ Auto-login completes
- ✅ Features discovered automatically
- ✅ OpenAI API returns valid response
- ✅ Multiple requests queue properly
- ✅ Errors self-heal and retry
1. api/main.py: Transform from test API to service API
- Remove: /build, /run, /results endpoints
- Add: Service CRUD, discovery trigger, execute operation
- Add: WebSocket handler integration
2. concurrency/browser_pool.py: Enhance for service isolation
- Add: Service-specific browser profiles
- Add: Session persistence per service
- Keep: Existing pooling logic
3. llm/service.py: Adapt for discovery prompts
- Add: Auth detection prompts
- Add: Feature extraction prompts
- Add: Operation mapping prompts
4. storage/database.py: Add service tables
- Add: Service model
- Add: ServiceCredential model
- Add: Execution model
- Keep: Existing artifact storage
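The three service tables named for storage/database.py could look roughly like the schema below. This is a sketch using the standard-library `sqlite3` module purely for illustration: the existing storage layer may use a different database or an ORM, and every table and column name here is hypothetical.

```python
import sqlite3

# Hypothetical schema for the Service, ServiceCredential and Execution models.
SCHEMA = """
CREATE TABLE IF NOT EXISTS services (
    id          TEXT PRIMARY KEY,
    url         TEXT NOT NULL,
    name        TEXT NOT NULL,
    config_json TEXT NOT NULL DEFAULT '{}'
);
CREATE TABLE IF NOT EXISTS service_credentials (
    service_id     TEXT PRIMARY KEY REFERENCES services(id) ON DELETE CASCADE,
    encrypted_blob BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS executions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    service_id TEXT NOT NULL REFERENCES services(id) ON DELETE CASCADE,
    status     TEXT NOT NULL DEFAULT 'queued',
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection and create the service tables if they are missing."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With the cascade rules in place, deleting a service also removes its stored credentials and execution history.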
Discovery Module:
discovery/
├── __init__.py
├── auth_detector.py # NEW
├── feature_mapper.py # NEW
├── operation_builder.py # NEW
├── config_generator.py # NEW
└── orchestrator.py # NEW
Execution Module:
execution/
├── __init__.py
├── queue_manager.py # NEW
├── operation_runner.py # ADAPT from runner/test_runner.py
└── streaming.py # NEW
Auth Module:
auth/
├── __init__.py
├── credential_store.py # NEW
├── session_manager.py # NEW
└── form_filler.py # ADAPT from discovery/form_analyzer.py
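For auth/session_manager.py, the command mapping pairs browser_get_cookies (extract session cookies after login) with browser_set_cookie (restore a saved session). A minimal persistence sketch follows. The file layout and function names are assumptions, and `set_cookie` is an injected callable standing in for whatever wrapper ends up calling browser_set_cookie, since the SDK signature is not shown in this document.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List


def save_cookies(service_id: str, cookies: List[dict], directory: Path) -> Path:
    """Write the cookies captured after login to one JSON file per service."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{service_id}.cookies.json"
    path.write_text(json.dumps(cookies))
    return path


def restore_cookies(service_id: str, directory: Path,
                    set_cookie: Callable[[dict], None]) -> int:
    """Replay saved cookies through set_cookie; return how many were restored."""
    path = directory / f"{service_id}.cookies.json"
    if not path.exists():
        return 0  # No saved session: the caller should fall back to a fresh login.
    cookies = json.loads(path.read_text())
    for cookie in cookies:
        set_cookie(cookie)
    return len(cookies)


def roundtrip_demo() -> List[dict]:
    """Save one cookie and restore it into a list acting as a fake browser."""
    restored: List[dict] = []
    with tempfile.TemporaryDirectory() as tmp:
        save_cookies("k2think", [{"name": "session", "value": "abc"}], Path(tmp))
        restore_cookies("k2think", Path(tmp), restored.append)
    return restored
```

Session cookies grant access to the account, so a real implementation would likely store them with the same encryption used for credentials.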
API Module:
api/
├── __init__.py
├── main.py # MODIFY from web2api/api/main.py
├── service_manager.py # NEW
├── openai_compat.py # NEW
└── websocket_handler.py # NEW
Models:
models/
├── __init__.py
├── service.py # NEW - Pydantic models
├── operation.py # NEW
└── openai_schemas.py # NEW
System is considered COMPLETE when:
- ✅ K2Think.ai service registers automatically
- ✅ Login completes without manual intervention
- ✅ Discovery identifies at least: input field, submit button, output area
- ✅ Configuration generates valid operation definition
- ✅ OpenAI API request: POST /v1/chat/completions returns valid response
- ✅ Response contains actual output from k2think.ai
- ✅ Multiple concurrent requests queue properly
- ✅ TOWER frontend can connect and manage services
- ✅ WebSocket streams real-time updates
- ✅ Error recovery works (self-healing selectors)
Test Command:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "k2think",
"messages": [
{"role": "user", "content": "Write a haiku about programming"}
]
}'
Expected Response:
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "k2think",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Code flows like water\nBugs emerge then disappear\nCommit, push, repeat"
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
}
}
NEXT STEPS: BEGIN IMPLEMENTATION
Phase 1 starts now with core infrastructure setup.
- FeatureMapper - Map UI elements to capabilities
- OperationBuilder - Generate executable operations
- CredentialStore - Encrypted credential storage
- Complete execution logic - Navigate, login, chat, extract response

🧪 Test Commands (once complete)
curl -X POST http://localhost:8080/api/services \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.k2think.ai",
       "credentials": {"type": "login_password",
                       "username": "developer@pixelium.uk",
                       "password": "developer123?"}}'
curl -X POST http://localhost:8080/api/services/{id}/discover
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "{service_id}", "messages": [{"role": "user", "content": "Hello!"}]}'
In the TOWER project, fully analyze https://github.com/Zeeeepa/TOWER/tree/main/backend/web2api/autoqa-ai-testing/src/autoqa
- View ALL .py files in full
- View in full: https://github.com/Zeeeepa/TOWER/blob/main/backend/web2api/autoqa-ai-testing/docs/TECHNICAL.md
- View in full: https://github.com/Zeeeepa/TOWER/blob/main/backend/packages/owl-browser/README.md
- View in full: https://github.com/Zeeeepa/TOWER/blob/main/backend/packages/owl-browser-sdk/README.md
- https://owlbrowser.net/docs - View all 157 commands
VIEW ALL 157 commands available for owl browser ->
https://owlbrowser.net/docs/browser_ai_extract - use this to identify the area of the webchat interface from which the response should be extracted.
https://owlbrowser.net/docs/browser_extract_text - use this to extract the response from the webchat interface.
https://owlbrowser.net/docs/browser_detect_captcha - use this to detect a CAPTCHA during login, if one is present.
https://owlbrowser.net/docs/browser_classify_captcha - use this to classify the CAPTCHA type during login, if one is present.
https://owlbrowser.net/docs/browser_solve_text_captcha - use this to solve text-based CAPTCHAs during login, if present.
https://owlbrowser.net/docs/browser_solve_image_captcha - use this to solve image-based CAPTCHAs during login, if present.
https://owlbrowser.net/docs/browser_solve_captcha - use this as the universal CAPTCHA solver during login, if a CAPTCHA is present.
https://owlbrowser.net/docs/browser_query_page - this could be used to identify functions/features on the page?
https://owlbrowser.net/docs/browser_ai_analyze - this could be used to identify functions/features on the page?
https://owlbrowser.net/docs/browser_find_element - this could be used to identify functions/features on the page?
https://owlbrowser.net/docs/browser_get_cookies - use this to extract session cookies after login so the same service can be reused in future.
https://owlbrowser.net/docs/browser_set_cookie - use this to restore the saved session when the same service is used again.
https://owlbrowser.net/docs/browser_is_enabled - use this to detect when the "Send" button changes state from disabled to enabled, meaning the response on the page is complete and ready for retrieval.
https://owlbrowser.net/docs/browser_upload_file - use this for services that allow uploading files.
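The browser_is_enabled note above describes a completion signal: the response is ready once the "Send" button becomes enabled again. A minimal polling sketch is shown below. The `is_enabled` argument is an injected callable standing in for a wrapper around browser_is_enabled, since the real SDK call and its parameters are not shown in this document, and the timeout and interval defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def wait_until_enabled(is_enabled: Callable[[], bool],
                       timeout: float = 60.0,
                       interval: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_enabled() until it returns True or the timeout elapses.

    Returns True if the element became enabled, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if is_enabled():
            return True          # Button re-enabled: the response is complete.
        if clock() >= deadline:
            return False         # Timed out: caller decides whether to retry.
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays. A caller would typically extract the response text only after this returns True.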
For different parallel services, it should use the Tab Management commands (8 commands):
- set_popup_policy
- get_tabs
- switch_tab
- close_tab
- new_tab
- get_active_tab
- get_tab_count
- get_blocked_popups
WHEN ADDING A SERVICE VIA THE TOWER FRONTEND, it should show a live-record feed viewport for the actions and the identification flow, using the commands below, so that the user can watch live how the system identifies, tests, and saves the flows used to access and configure the functions/features of the service.
Video Recording (11 commands):
- start_video_recording
- pause_video_recording
- resume_video_recording
- stop_video_recording
- get_video_recording_stats
- download_video_recording
- start_live_stream
- stop_live_stream
- get_live_stream_stats
- list_live_streams
- get_live_frame
Can you fully analyze the requirements, all .py code, all documentation website pages, all other documentation, and the owl-browser usages and features, and then properly UPGRADE THE DOCUMENTATION so that it matches them better?
GOAL: Do not create a new module. Instead, modify https://github.com/Zeeeepa/TOWER/tree/main/backend/web2api/autoqa-ai-testing/src/autoqa so that the autoqa folder becomes a working web2api server, changing ALL NEEDED code files within it. Upgrade the existing module rather than creating separate modules. To do that, first identify what needs to be implemented using the EXISTING owl-browser functions.