Web search, content fetching, and research for Pi. Multi-provider search, specialized fetchers for GitHub/Reddit/Twitter/YouTube/PDF, and a scout subagent that keeps noise out of your context.
pi install git:github.com/JJGO/pi-internet

Or try without installing:

pi -e git:github.com/JJGO/pi-internet

Search the web using multiple providers in parallel. Results are deduplicated by URL and the richer snippet is kept.
Search for "typescript monorepo best practices 2025"
- Primary providers run in parallel (default: Brave + Kagi)
- Fallback providers fill gaps when primaries fail or return too few unique results (default: Tavily)
- Brave rate-limit / usage-limit responses disable Brave for the rest of the current session
- Override per-call with provider: "brave"/"kagi"/"tavily". Explicit provider selection does not auto-fallback.
- Default: 10 results (max 20). Each provider returns 10, merged and deduplicated.
- Freshness filter: day, week, month, year
- Concise warnings are surfaced when a provider is disabled or a fallback had to fill results
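
The merge step described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the `Result` shape, the `merge_results` name, treating the longer snippet as the "richer" one, and ignoring a trailing slash when comparing URLs are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    title: str
    snippet: str
    provider: str


def merge_results(batches, limit=10):
    """Merge provider batches, deduplicating by URL.

    Assumption: "richer" is approximated by snippet length.
    """
    merged = {}  # normalized url -> Result; dicts keep first-seen order
    for batch in batches:
        for result in batch:
            key = result.url.rstrip("/")  # assumed normalization
            existing = merged.get(key)
            if existing is None or len(result.snippet) > len(existing.snippet):
                merged[key] = result
    return list(merged.values())[:limit]
```

A URL returned by both Brave and Kagi keeps its original position in the list but carries whichever snippet is longer.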
Fetch any URL and get clean, token-efficient markdown. Auto-detects content type:
| URL Type | Handler |
|---|---|
| GitHub | Clones repo locally, returns tree + README. Use read/bash on the local path. |
| Reddit | Uses a configurable Redlib-compatible proxy when configured. Structured posts + nested comments (depth 4). |
| Twitter/X | Uses a configurable Nitter-compatible proxy when configured. Profiles, threads, tweets with RT/quote detection. |
| YouTube | Videos return transcripts via yt-dlp. Playlists and channels preview 25 entries inline and write the full list to cache. |
| PDF | Extracts text via unpdf. Large extractions are also saved to ~/Downloads/. |
| HTML | Readability → RSC parser → Jina Reader fallback chain. |
- Links stripped by default (saves ~50 tokens/link). Set includeLinks: true to keep.
- CSS selector support: selector: ".docs-content" narrows extraction.
- verbose: true for Reddit: full comment depth. For Twitter: untruncated tweets. For YouTube collections: no internal entry cap.
- YouTube playlists/channels write full lists to ~/.cache/pi-internet/youtube-lists/. Default output previews the first 25 items inline.
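
To illustrate what stripping links means for the returned markdown, here is a rough sketch. The regular expressions and the `strip_links` helper are hypothetical and only handle simple inline links and images; the package's own renderer may work differently.

```python
import re

# Simple inline markdown patterns (assumption: no nested brackets).
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")


def strip_links(markdown, include_links=False):
    """Drop images always; replace links with their text unless kept."""
    text = IMAGE.sub("", markdown)
    if not include_links:
        text = LINK.sub(r"\1", text)
    return text
```

With the default, `[the docs](https://x.dev/docs)` becomes `the docs`, which is where the per-link token saving comes from.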
web_research (hidden by default)
Spawns a lightweight scout subagent that searches + fetches pages, then returns only relevant findings. All noise stays in the scout's disposable context.
The /toggle-research switch is session-only by design.
/toggle-research # Enable the tool
Auto-detected scout model based on your current provider:
| Your Provider | Scout Uses |
|---|---|
| Anthropic | claude-haiku-4-5 |
| OpenAI | gpt-4.1-mini |
| Google | gemini-2.0-flash |
Override per-call with model: "...".
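
The selection rule amounts to a lookup with an override. The sketch below is illustrative only: the lowercase provider keys (including "google" for the Gemini row) and the behavior for an unlisted provider are assumptions.

```python
# Provider keys are assumed identifiers; model names come from the table above.
SCOUT_MODELS = {
    "anthropic": "claude-haiku-4-5",
    "openai": "gpt-4.1-mini",
    "google": "gemini-2.0-flash",
}


def pick_scout_model(provider, override=None):
    """Return the per-call override if given, else the provider's default."""
    if override:
        return override
    # Assumption: unknown providers yield None and the caller decides what to do.
    return SCOUT_MODELS.get(provider.lower())
```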
Settings live in Pi's settings files (~/.pi/agent/settings.json or .pi/settings.json):
{
"piInternet": {
"searchProviders": ["brave", "kagi"],
"fallbackProviders": ["tavily"],
"reddit": {
"commentDepth": 4
},
"twitter": {},
"github": {
"enabled": true,
"maxRepoSizeMB": 350,
"clonePath": "/absolute/path/to/github-repos",
"refreshTtlMs": 300000
},
"youtube": {
"enabled": true
},
"fetch": {
"includeLinks": false,
"timeoutMs": 30000,
"socksProxy": "socks5h://127.0.0.1:25344"
}
}
}

fetch.socksProxy is optional and disabled by default. When set, pi-internet routes its outbound HTTP requests through that SOCKS proxy.
github.clonePath defaults to ~/.cache/pi-internet/github-repos when omitted.
github.refreshTtlMs defaults to 300000 (5 minutes). Cached repos are refreshed with git fetch + hard reset when they are older than the TTL. If the cached clone has local edits, pi-internet keeps it untouched and creates a fresh sibling clone instead.
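
The refresh rule can be read as a small decision function. This is a sketch under assumptions: the function name, the returned labels, and the choice to check for local edits only once the clone is stale are illustrative, not taken from the source code.

```python
import time

DEFAULT_REFRESH_TTL_MS = 300_000  # 5 minutes, the documented default


def refresh_action(last_fetched_ms, has_local_edits, now_ms=None,
                   ttl_ms=DEFAULT_REFRESH_TTL_MS):
    """Decide what to do with a cached clone."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if now_ms - last_fetched_ms <= ttl_ms:
        return "use-cache"            # still fresh
    if has_local_edits:
        return "fresh-sibling-clone"  # leave the edited clone untouched
    return "fetch-and-reset"          # git fetch + hard reset
```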
piWebSurf is still accepted as a legacy config key for backward compatibility, but piInternet is preferred.
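
A loader that honors both keys might look like the sketch below. The `extract_config` helper is hypothetical, and giving piInternet precedence when both keys are present is an assumption based on it being the preferred key.

```python
import json


def extract_config(settings):
    """Return the extension config from a parsed settings object."""
    if "piInternet" in settings:
        return settings["piInternet"]
    # Legacy key, kept for backward compatibility.
    return settings.get("piWebSurf", {})


legacy = json.loads('{"piWebSurf": {"fetch": {"timeoutMs": 1000}}}')
config = extract_config(legacy)
```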
| Provider | Env Var |
|---|---|
| Brave Search | BRAVE_API_KEY |
| Kagi | KAGI_SESSION_TOKEN |
| Tavily | TAVILY_API_KEY |
Kagi also checks ~/.pi/kagi-search.json and ~/.kagi_session_token as fallbacks.
| Feature | Env Var | Value |
|---|---|---|
| Package-wide SOCKS proxy for outbound HTTP | PI_INTERNET_SOCKS_PROXY | Full proxy URL, e.g. socks5h://127.0.0.1:25344 |
| Reddit via Redlib-compatible proxy | PI_INTERNET_REDLIB_PROXY | Hostname only, e.g. redlib.example.com |
| Twitter/X via Nitter-compatible proxy | PI_INTERNET_NITTER_PROXY | Hostname only, e.g. nitter.example.com |
PI_INTERNET_SOCKS_PROXY overrides piInternet.fetch.socksProxy when both are set.
If these env vars are unset, SOCKS proxying stays disabled and Reddit/X URLs fall through to regular HTTP fetching.
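
The precedence between the environment variable and the settings file can be sketched like this. The `resolve_socks_proxy` helper is hypothetical; only the variable name, the settings path, and the override order come from the text above.

```python
import os


def resolve_socks_proxy(config, env=None):
    """Return the SOCKS proxy URL to use, or None when proxying is disabled."""
    env = os.environ if env is None else env
    from_env = env.get("PI_INTERNET_SOCKS_PROXY")
    if from_env:
        return from_env  # env var wins when both are set
    return config.get("fetch", {}).get("socksProxy")
```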
| Binary | Required For |
|---|---|
| yt-dlp | YouTube transcripts and playlist/channel metadata |
| git or gh | GitHub cloning |
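
A quick way to check these prerequisites before relying on the YouTube or GitHub handlers is sketched below. The `missing_binaries` helper is hypothetical and is not part of the package; it only uses the standard library's `shutil.which`.

```python
import shutil


def missing_binaries(which=shutil.which):
    """List the required binaries that are not on PATH."""
    missing = []
    if which("yt-dlp") is None:
        missing.append("yt-dlp")
    # Either git or gh is enough for GitHub cloning.
    if which("git") is None and which("gh") is None:
        missing.append("git or gh")
    return missing
```

The `which` parameter exists only so the lookup can be swapped out when testing.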
| Command | Description |
|---|---|
| /search-providers | List configured providers, availability, and session-disabled status |
| /kagi-login | Set Kagi session token interactively |
| /toggle-research | Show/hide the web_research tool for the current session |
web_search(query)
→ Run primary providers in parallel (Brave + Kagi)
→ Merge: deduplicate by URL, keep richer snippet
→ If merged results are short → try fallback providers (Tavily) to fill the gap
→ If Brave returns a rate-limit or usage-limit response → disable Brave for the rest of the session
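
The web_search flow above, as a simplified sketch. Several things here are assumptions made for illustration: providers are plain callables returning (url, snippet) pairs, they run sequentially rather than in parallel, a hypothetical `RateLimited` exception stands in for a rate-limit or usage-limit response, and any provider raising it is session-disabled (the text only states this for Brave).

```python
class RateLimited(Exception):
    """Hypothetical signal for a rate-limit or usage-limit response."""


def search(query, primaries, fallbacks, disabled, want=10):
    """Run primaries, then fallbacks if the merged result set is short.

    `disabled` is a set that outlives a single call (session state).
    """
    merged = {}    # url -> richest (longest) snippet seen so far
    warnings = []

    def run(name, fetch):
        if name in disabled:
            return
        try:
            pairs = fetch(query)
        except RateLimited:
            disabled.add(name)
            warnings.append(f"{name} disabled for the rest of the session")
            return
        except Exception as exc:
            warnings.append(f"{name} failed: {exc}")
            return
        for url, snippet in pairs:
            if url not in merged or len(snippet) > len(merged[url]):
                merged[url] = snippet

    for name, fetch in primaries.items():
        run(name, fetch)
    if len(merged) < want and fallbacks:
        for name, fetch in fallbacks.items():
            run(name, fetch)
        warnings.append("fallback providers were used to fill results")
    return merged, warnings
```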
fetch_url(url)
→ Reddit? Configured Redlib-compatible proxy → parse posts/comments → render markdown
→ Twitter? Configured Nitter-compatible proxy → parse tweets/profile → render markdown
→ GitHub? Clone repo → tree + README + file content
→ YouTube? Video → yt-dlp subtitles → parse VTT → timestamped transcript
Playlist/channel → yt-dlp flat JSON → first 25 inline + full list file
→ PDF? unpdf extraction → inline markdown (+ save large outputs)
→ HTTP? Readability → RSC parser → Jina Reader fallback
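
The routing in fetch_url can be pictured as a classifier over the URL. The sketch below is not the package's detection logic: the host lists and the `.pdf` suffix check are assumptions, and real detection may also look at response headers.

```python
from urllib.parse import urlparse


def classify(url):
    """Map a URL to a handler name, checked in the order of the flow above."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "reddit.com" or host.endswith(".reddit.com"):
        return "reddit"
    if host in ("twitter.com", "x.com"):
        return "twitter"
    if host == "github.com":
        return "github"
    if host in ("youtube.com", "m.youtube.com", "youtu.be"):
        return "youtube"
    if parsed.path.lower().endswith(".pdf"):
        return "pdf"
    return "html"  # generic Readability → RSC → Jina chain
```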
web_research(task)
→ Spawn scout: pi --mode json --no-session -e <this-ext>
→ Scout searches + fetches + analyzes
→ Returns only relevant findings
- Links stripped from extracted content by default
- Images always stripped
- Reddit comments capped at depth 4 with truncation notices
- Tweet content truncated to 500 chars in non-verbose mode
- Output truncated to Pi's standard limits (50KB / 2000 lines)
- Social proxy auto-disables on failure (falls through to HTTP for session)
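
As a rough picture of the final size cap, the sketch below truncates by line count first and then by UTF-8 byte size. The `truncate_output` helper is hypothetical, and interpreting 50KB as 50 * 1024 bytes is an assumption.

```python
MAX_BYTES = 50 * 1024  # assumed reading of "50KB"
MAX_LINES = 2000


def truncate_output(text, max_bytes=MAX_BYTES, max_lines=MAX_LINES):
    """Return (possibly shortened text, whether anything was cut)."""
    lines = text.splitlines()
    truncated = len(lines) > max_lines
    out = "\n".join(lines[:max_lines])
    encoded = out.encode("utf-8")
    if len(encoded) > max_bytes:
        # errors="ignore" drops a multi-byte character split by the cut.
        out = encoded[:max_bytes].decode("utf-8", errors="ignore")
        truncated = True
    return out, truncated
```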
MIT
This project is a pragmatic blend of original code plus ideas and implementation patterns adapted from a few prior Pi- and web-fetch-related projects.
- Search builds on patterns from pi-websearch, pi-web-access, and pi-kagi-search for provider routing, result normalization, and Kagi session-based scraping.
- Fetchers reuse or adapt ideas from pi-web-access and pi-fetch for GitHub extraction, Readability/RSC/Jina fallback behavior, and PDF extraction.
- Research/scout mode borrows the disposable subagent pattern from pi-surf.
- Utilities and glue code were simplified, consolidated, or rewritten to fit this package, especially around config loading, markdown rendering, routing, and packaging.
Where code was adapted, comments in the relevant source files point back to the upstream project or module.