
# 🧰 06: MCP Tools & Connectors (Full Design Patterns, Option A)

This notebook is your **pattern library** for designing **MCP tools and connectors**.

It focuses on:

- How to **shape tools** so LLMs and agents can use them safely and effectively  
- Patterns for **HTTP, DB, filesystem, search/RAG, DevOps, and domain tools**  
- **Bad → Good** transformations to train your intuition  
- **Checklists** you can reuse whenever you design a new tool or server  

You can treat this notebook as:

- A **design manual** for yourself and your team  
- A **review checklist** when doing MCP design/code reviews  
- The “**style guide**” for tools in your world-class MCP foundation  



## 1. Tool Design Philosophy

Before diving into categories, it helps to lock in a few principles.

### 1.1 Tools Are Typed Capabilities, Not Mini-LLMs

Bad mental model:

> “A tool is just a function where I dump a prompt and get back some text.”

Good mental model:

> “A tool is a **typed capability** with a clear contract and predictable behavior.”

Implications:

- Tools are **not** your main place for “prompting”: the LLM already does that.  
- Tools should have:
  - explicit **inputs**
  - explicit **outputs**
  - well-defined **errors**
- Tools should be reusable across:
  - multiple LLMs
  - agents
  - products  

### 1.2 Tools Should Be Easy for LLMs to Use

Ask:

- Can a model easily infer:
  - when to call this tool?
  - what arguments to supply?  
- Does your description:
  - clearly say what the tool does?
  - provide examples in human terms?  

### 1.3 Tools Should Be Safe and Composable

- Constrain inputs to **safe shapes**  
- Avoid global side effects unless necessary  
- Design outputs so that:
  - they can be consumed by other tools
  - they can be logged and inspected by humans  



## 2. General Tool Design Checklist

Use this when defining any new tool.

### 2.1 Inputs Checklist

- [ ] Are all **required** inputs clearly marked as required?  
- [ ] Are optional inputs either:
  - truly optional, or
  - given safe defaults?  
- [ ] Are types precise?
  - `string` vs `enum` vs `number` vs structured objects  
- [ ] Are there **validation rules**? (e.g., `limit <= 100`)  
- [ ] Are there **explicit constraints** on:
  - paths
  - hostnames
  - IDs  

### 2.2 Outputs Checklist

- [ ] Is the **main data** returned in a structured form?  
- [ ] Is there **metadata**, such as:
  - `total`, `count`
  - `truncated`
  - info about source system  
- [ ] Are error conditions represented clearly (e.g., partial success)?  
- [ ] Could another tool or agent use this output as input easily?

### 2.3 Error Semantics

- [ ] Do you have **domain error codes** (e.g., `not_found`, `forbidden`)?  
- [ ] Do you distinguish:
  - validation errors
  - upstream errors
  - internal server errors?  
- [ ] Are error messages informative but not leaking secrets?  
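A minimal sketch of how these error categories could be returned as structured results, assuming a Python MCP server whose tool handlers return plain dicts. `ErrorCode`, `ToolError`, and `to_error_result` are hypothetical helpers for illustration, not part of any MCP SDK.

```python
from enum import Enum


class ErrorCode(str, Enum):
    # Validation error: the caller sent bad arguments and can fix them.
    INVALID_ARGUMENT = "invalid_argument"
    # Domain errors: well-formed request that cannot be satisfied.
    NOT_FOUND = "not_found"
    FORBIDDEN = "forbidden"
    # Upstream error: a dependency (API, DB) failed; retrying may help.
    UPSTREAM_ERROR = "upstream_error"
    # Internal error: a bug or unexpected state inside this server.
    INTERNAL = "internal"


RETRYABLE = {ErrorCode.UPSTREAM_ERROR}


class ToolError(Exception):
    """An error whose message is safe to show to the model (no secrets, no traces)."""

    def __init__(self, code: ErrorCode, message: str):
        super().__init__(message)
        self.code = code
        self.message = message


def to_error_result(exc: Exception) -> dict:
    """Convert any exception into a structured, secret-free error payload."""
    if isinstance(exc, ToolError):
        code, message = exc.code, exc.message
    else:
        # Never echo str(exc) for unknown errors: it may contain DSNs or tokens.
        code, message = ErrorCode.INTERNAL, "Internal error; see server logs."
    return {
        "ok": False,
        "error": {
            "code": code.value,
            "message": message,
            "retryable": code in RETRYABLE,
        },
    }
```

The `retryable` flag is one way to let an agent distinguish "try again later" from "change your arguments" without parsing message text.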

### 2.4 Safety & Limits

- [ ] Are potentially expensive operations:
  - rate-limited?
  - bounded by `limit`, `max_bytes`, or timeouts?  
- [ ] Are high-impact changes protected by:
  - additional checks
  - special tools with limited access?  

### 2.5 Documentation

- [ ] Does the `description` clearly:
  - say what the tool does?
  - explain key arguments?
  - mention limits and expected use?  

You can convert this into a **tool review checklist** for PRs.



## 3. HTTP / REST / GraphQL Tools

HTTP tools are extremely common for MCP because:

- Most systems already expose REST/GraphQL APIs  
- You want LLMs and agents to leverage existing services  

### 3.1 Anti-Pattern: Raw HTTP Proxy Tool

**Bad design:**

- Tool: `http_request`
- Input:
  - arbitrary URL
  - arbitrary headers
  - arbitrary method
  - arbitrary body  

Problems:

- Huge security surface (SSRF, data exfiltration)  
- Unpredictable behavior  
- Hard to reason about / monitor  

### 3.2 Pattern: Domain-Specific HTTP Tools

Prefer tools that:

- Are **tied to a specific backend service or domain**  
- Expose **structured operations**, not raw HTTP  

Example:

> Instead of `http_request`, define:
> - `getIssueById(issue_id)`
> - `searchIssues(query, status?)`
> - `createIssue(title, description, labels?)`

The HTTP details stay inside the tool implementation.

### 3.3 Pattern: Semi-Generic HTTP Fetch with Allow-Lists

If you really need a more generic HTTP fetch tool:

- Restrict to:
  - `https://` only  
  - allow-listed hostnames  
  - specific HTTP methods (e.g., GET only)  
- Enforce:
  - max response size via `max_bytes`
  - timeouts  

Tool shape (conceptually):

```jsonc
{
  "name": "http_fetch",
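  "description": "Fetch content from allow-listed HTTPS APIs (GET only). Responses larger than max_bytes are truncated.",
  "input_schema": {
    "type": "object",
    "properties": {
      "url": { "type": "string", "format": "uri", "pattern": "^https://" },
      // Illustrative bounds; the server must still enforce its own hard cap.
      "max_bytes": { "type": "integer", "minimum": 1, "maximum": 1000000, "default": 65536 }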
    },
    "required": ["url"]
  }
}
```

Output:

- `status`, `headers`, `body`
- meta: `truncated`, `max_bytes`, `url`  
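The schema alone cannot enforce the allow-list, so the server should validate arguments before fetching. A minimal sketch using only the standard library; `ALLOWED_HOSTS`, the size limits, and `validate_fetch_args` are hypothetical values and names chosen for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical allow-list; in practice load this from server configuration.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}
DEFAULT_MAX_BYTES = 64_000
HARD_MAX_BYTES = 1_000_000


def validate_fetch_args(url: str, max_bytes: Optional[int] = None) -> Tuple[str, int]:
    """Return a (url, max_bytes) pair that is safe to fetch, or raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("only https:// URLs are allowed")
    if parts.username or parts.password:
        raise ValueError("credentials in URLs are not allowed")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError(f"host {host!r} is not on the allow-list")
    # .port raises ValueError itself if the port is malformed.
    if parts.port not in (None, 443):
        raise ValueError("non-default ports are not allowed")
    # Clamp instead of trusting the model's requested size.
    if max_bytes is None:
        max_bytes = DEFAULT_MAX_BYTES
    max_bytes = max(1, min(max_bytes, HARD_MAX_BYTES))
    return url, max_bytes
```

This only checks the requested URL. A real fetcher would also need to disable redirects (or re-validate every hop) and stop reading the response body once `max_bytes` is reached.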

### 3.4 Handling Authentication

Avoid letting the LLM see or manage **secrets** directly.

Patterns:

- Use environment-configured API keys  
- Use per-tenant tokens injected by the MCP server  
- If needed, create:
  - separate MCP servers for different auth scopes  
  - or tools that operate on behalf of specific users with pre-configured tokens  
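A minimal sketch combining a domain-specific tool with server-side credentials: the model supplies only an issue ID, while the base URL and token come from server configuration. `ISSUES_API_BASE`, the `ISSUES_API_TOKEN` environment variable, and the `PROJ-123` ID format are assumptions for illustration; the sketch builds the request but does not send it.

```python
import os
import re
import urllib.request
from urllib.parse import quote

# Hypothetical configuration; set per deployment, never exposed in the tool schema.
ISSUES_API_BASE = "https://issues.example.com/api/v1"
TOKEN_ENV_VAR = "ISSUES_API_TOKEN"
ISSUE_ID_RE = re.compile(r"[A-Z]+-\d+")


def build_get_issue_request(issue_id: str) -> urllib.request.Request:
    """getIssueById: the model supplies only issue_id; URL and credentials are server-side."""
    # Strict ID format keeps the model from injecting path segments like "../admin".
    if not ISSUE_ID_RE.fullmatch(issue_id):
        raise ValueError("issue_id must look like PROJ-123")
    token = os.environ[TOKEN_ENV_VAR]  # KeyError if unset: fail closed
    return urllib.request.Request(
        f"{ISSUES_API_BASE}/issues/{quote(issue_id)}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )
```

The token never appears in tool arguments or tool output, so it cannot end up in the model's context or in transcript logs.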



## 4. Database Tools (SQL/NoSQL, Analytics, Data Lakes)

Databases are powerful but dangerous when exposed to LLMs.

### 4.1 Anti-Pattern: Arbitrary SQL Tool

**Bad design:**

- Tool: `run_sql(query: string)`  

Problems:

- LLM hallucination → destructive queries  
- Data exfiltration  
- Hard to constrain  

### 4.2 Pattern: Predefined Queries with Parameters

Better approach:

- Each tool corresponds to a **pre-approved query or query family**  
- Inputs:
  - parameters to filter/slice results  
- Example:

```jsonc
{
  "name": "getCustomerById",
  "input_schema": {
    "type": "object",
    "properties": {
      "customer_id": { "type": "string" }
    },
    "required": ["customer_id"]
  }
}
```

Implementation:

- Maps to a specific SQL/ORM query  
- Returns a **structured customer object**  
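A minimal sketch of `getCustomerById` backed by one fixed, parameterized query, using the standard-library `sqlite3` module for illustration. The `customers(id, name, tier)` table is a hypothetical schema.

```python
import sqlite3
from typing import Optional


def get_customer_by_id(conn: sqlite3.Connection, customer_id: str) -> Optional[dict]:
    """Tool handler for getCustomerById: one fixed, parameterized query."""
    # The SQL text never changes; model input only reaches the bound "?" parameter.
    row = conn.execute(
        "SELECT id, name, tier FROM customers WHERE id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None  # the tool layer would map this to a `not_found` error
    return {"id": row[0], "name": row[1], "tier": row[2]}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, tier TEXT)")
    conn.execute("INSERT INTO customers VALUES ('c_1', 'Acme', 'gold')")
    print(get_customer_by_id(conn, "c_1"))  # {'id': 'c_1', 'name': 'Acme', 'tier': 'gold'}
    print(get_customer_by_id(conn, "c_1' OR '1'='1"))  # None: treated as a literal ID
```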

### 4.3 Pattern: Analytics View Tools

For analytics:

- Use **views** / **materialized views** or **curated tables**  
- Tools operate only on those curated objects, never on raw tables.

Examples:

- `listKpiDefinitions()`
- `getKpiSnapshot(kpi_name, window)`
- `querySalesByRegion(region, start_date, end_date)`

### 4.4 Pagination & Limits

Database tools must:

- Accept `limit`, `offset` or `cursor`  
- Enforce a **max limit** even if LLM requests more  
- Return metadata:
  - `returned`
  - `total` (if cheap)
  - `truncated` or `next_cursor`  
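A minimal sketch of a capped limit plus an opaque cursor over an in-memory list. Encoding an offset as base64 JSON is a simplification chosen for illustration; a real database tool would more likely encode a keyset position (e.g. the last seen ID). `MAX_LIMIT` and `paginate` are hypothetical names.

```python
import base64
import json
from typing import Optional

MAX_LIMIT = 100  # hard cap enforced no matter what the model asks for


def encode_cursor(offset: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"o": offset}).encode()).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    if not cursor:
        return 0
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["o"])


def paginate(items: list, limit: int = 20, cursor: Optional[str] = None) -> dict:
    """Return one page of `items` plus `returned`, `total`, `truncated`, `next_cursor`."""
    limit = max(1, min(limit, MAX_LIMIT))
    start = decode_cursor(cursor)
    page = items[start : start + limit]
    end = start + len(page)
    has_more = end < len(items)
    return {
        "items": page,
        "returned": len(page),
        "total": len(items),  # cheap here; omit it when counting is expensive
        "truncated": has_more,
        "next_cursor": encode_cursor(end) if has_more else None,
    }
```

Because the cursor is opaque, the model just passes back whatever `next_cursor` it received and never has to do offset arithmetic.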

### 4.5 Masking and Redaction

Plan for:

- Removing or masking PII in results where not needed  
- Different result schemas for:
  - admins
  - normal users
  - external contexts  
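A minimal sketch of audience-dependent result shaping, assuming a simple field-level policy. The audiences, field names, and masking rule are hypothetical; fields an audience may not see are masked if they are PII and dropped otherwise.

```python
from typing import Any

# Hypothetical field policy: which fields each audience may see unmasked.
VISIBLE_FIELDS = {
    "admin": {"id", "name", "email", "phone", "tier"},
    "user": {"id", "name", "tier"},
    "external": {"id", "tier"},
}
PII_FIELDS = {"name", "email", "phone"}


def mask(value: Any) -> str:
    text = str(value)
    # Keep a short prefix so humans can still correlate records.
    return text[:2] + "***" if len(text) > 2 else "***"


def redact(record: dict, audience: str) -> dict:
    """Shape a record for one audience before it leaves the tool."""
    visible = VISIBLE_FIELDS.get(audience, set())  # unknown audience: nothing is visible
    out = {}
    for key, value in record.items():
        if key in visible:
            out[key] = value
        elif key in PII_FIELDS:
            out[key] = mask(value)
        # Any other non-visible field (e.g. internal notes) is dropped entirely.
    return out
```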



## 5. Filesystem Tools (Local & Remote Storage)

Filesystem tools are often used to:

- Access documentation  
- Read logs / reports  
- Manage small config files  

### 5.1 Sandbox Design

Always:

- Use a **base directory** per MCP server
- Reject paths that resolve outside base  
- Provide:
  - `listFiles(path, recursive?, max_items?)`
  - `readFile(path, max_bytes?)`  
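A minimal sketch of the sandbox check with `pathlib` (Python 3.9+ for `Path.is_relative_to`). `resolve_in_sandbox`, `read_file`, and `PathOutsideSandbox` are hypothetical names; `read_file` shows how `readFile(path, max_bytes?)` could sit on top of the check.

```python
from pathlib import Path


class PathOutsideSandbox(ValueError):
    pass


def resolve_in_sandbox(base_dir: Path, user_path: str) -> Path:
    """Resolve a model-supplied relative path, refusing anything that escapes base_dir."""
    base = base_dir.resolve()
    candidate = Path(user_path)
    if candidate.is_absolute():
        raise PathOutsideSandbox("absolute paths are not allowed")
    # resolve() collapses '..' and follows symlinks, so escapes via either are caught.
    resolved = (base / candidate).resolve()
    if not resolved.is_relative_to(base):
        raise PathOutsideSandbox(f"{user_path!r} resolves outside the sandbox")
    return resolved


def read_file(base_dir: Path, path: str, max_bytes: int = 64_000) -> dict:
    """Sketch of readFile(path, max_bytes?) on top of the sandbox check."""
    target = resolve_in_sandbox(base_dir, path)
    with target.open("rb") as f:
        data = f.read(max_bytes + 1)  # read one extra byte to detect truncation
    return {
        "path": path,
        "content": data[:max_bytes].decode("utf-8", errors="replace"),
        "truncated": len(data) > max_bytes,
    }
```

Checking the *resolved* path, rather than searching the raw string for `..`, is what makes symlink and `docs/../../x` style escapes fail.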

### 5.2 Pattern: Path-Based Tools with Relative Paths

Tools should:

- Accept **relative paths only**  
- Provide:
  - `is_dir` info
  - file sizes  

### 5.3 Pattern: File Content Summary Tools

Instead of always returning raw content:

- Consider tools like:
  - `summarizeFile(path, max_bytes?)`
  - `getFileMetadata(path)`  

This pairs well with RAG and reduces the number of tokens the model has to read.



## 6. Search & RAG Tools

You can treat your **RAG system** as being behind MCP tools.

### 6.1 Typical RAG-Related Tools

Examples:

- `searchDocuments(query, top_k, filters?)`
- `getDocumentById(id)`
- `getDocumentChunk(id, chunk_id)`
- `summarizeContext(chunks)`  

### 6.2 Pattern: Separate Retrieval from Generation

Design pattern:

1. A **retrieval tool** returns structured context objects
2. The LLM:
   - reads/ingests these contexts
   - decides how to answer  

Avoid tools that:

- take a prompt and do retrieval + generation internally with no visibility  
- hide context from upstream logs/analysis  
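A minimal sketch of a retrieval-only tool that returns scored, attributable context objects and leaves generation to the LLM. The keyword-overlap score is a toy stand-in for a real vector or hybrid search, and `Chunk` and `search_documents` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass
class Chunk:
    doc_id: str
    chunk_id: str
    source: str
    text: str


def search_documents(chunks: list, query: str, top_k: int = 3) -> dict:
    """Retrieval only: return structured context objects; no answer generation."""
    terms = set(query.lower().split())
    scored = []
    for c in chunks:
        # Toy relevance score: number of distinct query terms present in the chunk.
        score = len(terms & set(c.text.lower().split()))
        if score > 0:
            scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep corpus order
    top_k = max(1, min(top_k, 20))
    results = [{**asdict(c), "score": s} for s, c in scored[:top_k]]
    return {"query": query, "results": results, "returned": len(results)}
```

Because each result carries `doc_id`, `chunk_id`, and `source`, the exact context the model saw can be logged and cited.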

### 6.3 Pattern: Multi-Index / Multi-Source Tools

Expose:

- `searchProductDocs`
- `searchInternalPolicies`
- `searchFinancialReports`

Each tool can have:

- separate indexes  
- separate filters  
- separate access rules  



## 7. DevOps, Logs, and Observability Tools

MCP is powerful for **DevOps** and **SRE** workflows, but must be designed carefully.

### 7.1 Read-Only DevOps Tools

Safer first step:

- `getRecentLogs(service, window)`
- `getMetricSnapshot(metric_name, window)`
- `listDeployments(service, window)`  

These should:

- Limit windows (e.g., last 1h, last 24h)  
- Limit log volume (max lines / bytes)  
- Provide structure, not raw plain-text dumps  
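A minimal sketch of `getRecentLogs(service, window)` applying these limits to already-fetched log records. The record shape, the allowed windows, and `MAX_LINES` are assumptions for illustration; a real tool would push the filters down into the log backend's query.

```python
from datetime import datetime, timedelta, timezone

# Only these windows are accepted; anything else is rejected, not guessed.
ALLOWED_WINDOWS = {
    "15m": timedelta(minutes=15),
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
}
MAX_LINES = 200


def get_recent_logs(log_records: list, service: str, window: str, now: datetime) -> dict:
    """Sketch of getRecentLogs(service, window) with bounded window and volume."""
    if window not in ALLOWED_WINDOWS:
        raise ValueError(f"window must be one of {sorted(ALLOWED_WINDOWS)}")
    since = now - ALLOWED_WINDOWS[window]
    matching = [r for r in log_records if r["service"] == service and r["ts"] >= since]
    matching.sort(key=lambda r: r["ts"], reverse=True)  # newest first
    lines = matching[:MAX_LINES]
    return {
        "service": service,
        "window": window,
        # Structured entries instead of one raw text blob.
        "entries": [
            {"ts": r["ts"].isoformat(), "level": r["level"], "msg": r["msg"]} for r in lines
        ],
        "returned": len(lines),
        "truncated": len(matching) > MAX_LINES,
    }
```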

### 7.2 High-Impact Tools (Deploys, Rollbacks)

If you expose:

- `triggerDeploy(service, version)`
- `rollbackDeploy(service, to_version)`

Be very strict:

- consider human-in-the-loop approval  
- strong logging and auditing  
- limited environments (e.g., dev/stage only from MCP)  
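A minimal sketch of a guarded `triggerDeploy`, assuming approvals are recorded out-of-band (for example through a chat or ticketing workflow) in a server-side store that the model cannot write to. The environment allow-list, the approval store shape, and the logger name are hypothetical; the actual deploy call is omitted.

```python
import logging

audit_log = logging.getLogger("mcp.audit")

# Hypothetical policy: the MCP server may only deploy to these environments.
MCP_DEPLOYABLE_ENVS = {"dev", "stage"}


def trigger_deploy(service: str, version: str, env: str, requested_by: str,
                   approvals: dict) -> dict:
    """Guarded triggerDeploy: environment allow-list + human approval + audit log."""
    if env not in MCP_DEPLOYABLE_ENVS:
        audit_log.warning("deploy denied env=%s service=%s by=%s", env, service, requested_by)
        return {"status": "denied", "reason": f"env {env!r} not deployable via MCP"}
    # Approval is looked up server-side, keyed by (service, version); it is never a tool argument.
    approver = approvals.get((service, version))
    if approver is None or approver == requested_by:  # no self-approval
        return {"status": "pending_approval", "service": service, "version": version}
    audit_log.info("deploy started service=%s version=%s env=%s by=%s approver=%s",
                   service, version, env, requested_by, approver)
    # A real server would call the deploy system here (not shown).
    return {"status": "started", "service": service, "version": version, "env": env}
```

If the approval were an input argument, the model could simply fabricate one, which is why the sketch reads it from server state instead.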

### 7.3 Pattern: Diagnostic Bundles

Tools that assemble summaries:

- `getServiceHealthSummary(service)`
  - error rates
  - p95 latency
  - last incidents  

One bundled call is more LLM-friendly than many separate calls, each returning a single raw metric.
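A minimal sketch of `getServiceHealthSummary` computing error rate, nearest-rank p95 latency, and recent incidents from raw inputs. The request and incident record shapes are assumptions; in practice these values would come from the metrics and incident systems.

```python
def p95(values: list) -> float:
    """Nearest-rank 95th percentile (values must be non-empty)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n), 1-based
    return ordered[rank - 1]


def get_service_health_summary(service: str, requests: list, incidents: list) -> dict:
    """Bundle error rate, p95 latency and recent incidents into one object."""
    total = len(requests)
    errors = sum(1 for r in requests if r["status"] >= 500)
    latencies = [r["latency_ms"] for r in requests]
    # ISO-8601 date strings sort chronologically, so plain sorting works here.
    recent = sorted(incidents, key=lambda i: i["opened_at"], reverse=True)[:3]
    return {
        "service": service,
        "requests": total,
        "error_rate": round(errors / total, 4) if total else None,
        "p95_latency_ms": p95(latencies) if latencies else None,
        "last_incidents": [{"id": i["id"], "title": i["title"]} for i in recent],
    }
```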



## 8. Domain-Specific Tools (Finance, Legal, Healthcare, etc.)

MCP shines when you create **high-level domain tools**.

### 8.1 Pattern: Ask for Intent, Use Tools for Data

In a financial assistant MCP:

- Tools:
  - `getAccountSummary(account_id)`
  - `listHoldings(portfolio_id)`
  - `getTransactionHistory(account_id, window)`  

The LLM uses these tools to ground its reasoning in real account data.

### 8.2 Pattern: Guarded Actions

For domains like finance/healthcare:

- Distinguish between:
  - **informational tools** (safe, read-only)
  - **action tools** (e.g., place trade, schedule appointment)  

Action tools should:

- be few
- be well-guarded
- have strong logs and approvals  
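A minimal sketch of tagging tools as informational or action and gating action tools per session, both when listing tools and again at call time. `ToolKind`, `ToolSpec`, `tools_for_session`, and `call_tool` are hypothetical names, and `placeTrade` is an illustrative action tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ToolKind(Enum):
    INFORMATIONAL = "informational"  # read-only, safe to expose broadly
    ACTION = "action"                # side effects: needs approval and audit


@dataclass(frozen=True)
class ToolSpec:
    name: str
    kind: ToolKind
    handler: Callable[..., dict]


def tools_for_session(registry: list, actions_enabled: bool) -> list:
    """Advertise action tools only to sessions explicitly configured for them."""
    return [t.name for t in registry if t.kind is ToolKind.INFORMATIONAL or actions_enabled]


def call_tool(registry: list, name: str, actions_enabled: bool, **kwargs) -> dict:
    spec = next((t for t in registry if t.name == name), None)
    if spec is None:
        raise KeyError(name)
    # Re-check at call time: hiding a tool from the list is not access control.
    if spec.kind is ToolKind.ACTION and not actions_enabled:
        raise PermissionError(f"{name} is an action tool and is disabled for this session")
    return spec.handler(**kwargs)
```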



## 9. Bad → Good: Concrete Examples

### 9.1 “Do Anything” Tool → Specific Tools

**Bad:**

```jsonc
{
  "name": "do_anything",
  "input_schema": {
    "type": "object",
    "properties": {
      "prompt": { "type": "string" }
    }
  }
}
```

**Good split:**

- `searchKnowledgeBase(question, tags?)`
- `getUserProfile(user_id)`
- `getLatestAnnouncements(limit)`  

Each with:

- well-defined inputs
- typed outputs
- easier reasoning/testing  

### 9.2 Raw SQL ‚Üí Filtered Query Tool

**Bad:**

```jsonc
{
  "name": "run_sql",
  "input_schema": {
    "type": "object",
    "properties": {
      "query": { "type": "string" }
    },
    "required": ["query"]
  }
}
```

**Better:**

```jsonc
{
  "name": "listOpenTickets",
  "input_schema": {
    "type": "object",
    "properties": {
      "assignee": { "type": "string" },
      "limit": { "type": "integer", "minimum": 1, "maximum": 100, "default": 20 }
    }
  }
}
```

Internally, you generate a safe SQL query with parameter binding.
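A minimal sketch of what "generate a safe SQL query with parameter binding" can look like for `listOpenTickets`, using `sqlite3` for illustration. The `tickets` table is a hypothetical schema; the query is assembled only from fixed fragments, and model input reaches only the `?` placeholders.

```python
import sqlite3
from typing import Optional

MAX_LIMIT = 100


def list_open_tickets(conn: sqlite3.Connection, assignee: Optional[str] = None,
                      limit: int = 20) -> list:
    """Build SQL from fixed fragments; model input only ever reaches bound parameters."""
    sql = "SELECT id, title, assignee FROM tickets WHERE status = 'open'"
    params = []
    if assignee is not None:
        sql += " AND assignee = ?"
        params.append(assignee)
    sql += " ORDER BY id LIMIT ?"
    params.append(max(1, min(limit, MAX_LIMIT)))  # server-side cap, even if the schema is bypassed
    rows = conn.execute(sql, params).fetchall()
    return [{"id": r[0], "title": r[1], "assignee": r[2]} for r in rows]
```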



## 10. Composability Patterns

Design tools so they can be **chained** and **reused**.

### 10.1 ID-Centric Flows

Pattern:

1. `searchSomething(...)` → returns list of items with IDs  
2. `getSomethingById(id)` → returns full details  
3. Optional: `updateSomething(id, patch)`  

This lets:

- LLMs “roughly search”, then “zoom in”  
- Agents build multi-step reasoning easily  

### 10.2 Summary + Detail Combo

- Tool A: `summarizeEntity(id)`  
- Tool B: `getEntityDetails(id)`  

The LLM can start with the summary and call the detail tool only if it needs deeper context.

### 10.3 Pre-Composed High-Level Tools

You can add tools that:

- internally orchestrate calls to multiple lower-level tools  
- then return a **domain-specific summary**  

Example:

- `getIncidentOverview(incident_id)`:
  - logs
  - metrics
  - related tickets  
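A minimal sketch of `getIncidentOverview` orchestrating lower-level tools, which are passed in as plain callables so the example stays self-contained. Their signatures, the `"1h"` window, and the per-source caps are assumptions. Each sub-call is isolated, so one failing source yields a partial overview instead of an error.

```python
from typing import Callable


def get_incident_overview(
    incident_id: str,
    get_incident: Callable[[str], dict],
    get_logs: Callable[[str, str], list],
    get_metrics: Callable[[str], dict],
    search_tickets: Callable[[str], list],
) -> dict:
    """Orchestrate lower-level tools and return one domain-level summary."""
    incident = get_incident(incident_id)
    service = incident["service"]
    overview = {"incident_id": incident_id, "service": service, "partial": False, "errors": []}
    for key, call in (
        ("logs", lambda: get_logs(service, "1h")[:20]),        # cap log lines
        ("metrics", lambda: get_metrics(service)),
        ("related_tickets", lambda: search_tickets(incident_id)[:10]),
    ):
        try:
            overview[key] = call()
        except Exception as exc:
            # Record which source failed (type only, no message) and keep going.
            overview[key] = None
            overview["partial"] = True
            overview["errors"].append({"source": key, "error": type(exc).__name__})
    return overview
```

The `partial` flag and `errors` list are one way to represent partial success explicitly, as the outputs checklist in section 2.2 asks.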



## 11. Multi-Tenant / Multi-User Patterns

When multiple tenants/users share an MCP server:

### 11.1 Contextual Tool Behavior

Tools should:

- derive **tenant/user context** from:
  - session metadata
  - auth tokens presented by the MCP client and verified by the server  
- never rely solely on user-supplied IDs  

### 11.2 Scoping & Isolation

Per-tenant:

- base directories  
- DB schemas or row-level filters  
- rate limits  

Design tools so cross-tenant access is impossible by construction.
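A minimal sketch combining both rules: tenant identity comes from a server-populated session object, and every lookup applies a row-level tenant filter. `Session`, the shared `INVOICES` table, and `get_invoice` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    # Populated by the server from the authenticated connection, never from tool arguments.
    tenant_id: str
    user_id: str


# Hypothetical shared table: rows from several tenants live side by side.
INVOICES = [
    {"id": "inv_1", "tenant_id": "t_acme", "amount": 120},
    {"id": "inv_2", "tenant_id": "t_globex", "amount": 75},
]


def get_invoice(session: Session, invoice_id: str) -> dict:
    """Tenant scoping comes from the session; the model only supplies the invoice ID."""
    for row in INVOICES:
        # Row-level filter: both the ID and the session's tenant must match.
        if row["id"] == invoice_id and row["tenant_id"] == session.tenant_id:
            return {"id": row["id"], "amount": row["amount"]}
    # Same answer for "missing" and "other tenant's row": don't leak existence across tenants.
    return {"error": {"code": "not_found"}}
```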



## 12. Tool Pattern Library: Quick Reference

You can treat this as a **micro-TOC** for patterns in this notebook.

- HTTP:
  - Domain-specific HTTP tools
  - Safe semi-generic fetch with allow-list  
- DB:
  - Predefined queries with parameters
  - Analytics views & KPIs  
- Filesystem:
  - Sandbox base dir
  - List / read / summarize patterns  
- Search & RAG:
  - Retrieve vs generate separation
  - Multi-index tools  
- DevOps:
  - Read-only observability tools
  - Guarded deploy/rollback tools  
- Domain:
  - Informational vs action tools
  - Guarded actions with strong audits  

Whenever you define a new tool, ask:

- Which **pattern family** does it belong to?  
- Am I following the **checklist** for safety & usability?  
