Lyra-stellAI/Web-browser-agent

Web browser agent

A lightweight Flask web app that helps you read the web faster and incrementally build a personal knowledge graph:

  • Search the web from a single input bar (DuckDuckGo).
  • Summarize any page by pasting its URL and clicking Summarize.
  • Summarize pasted text by pasting raw text instead of a URL.
  • One-click "Summarize this" on every search result.
  • Knowledge Graph — highlight any text in a summary, or paste a chunk in the input and click + KG, to save it (with source metadata, tags, and notes) to a staging area. When you're ready, Integrate curated chunks into the persistent overall graph — the LLM extracts entities and links chunks to existing nodes, so the graph compounds over time the more you read.

Summaries can be generated by any of:

Provider            Env var for API key   Default model
Anthropic (Claude)  ANTHROPIC_API_KEY     claude-haiku-4-5-20251001
OpenAI              OPENAI_API_KEY        gpt-4o-mini
Qwen (DashScope)    DASHSCOPE_API_KEY     qwen-plus
DeepSeek            DEEPSEEK_API_KEY      deepseek-chat

Pick the provider and model from the dropdowns in the UI. The model field is free-form, so any model ID the provider supports also works. If no API keys are set, the app falls back to a local extractive summarizer that needs no key.
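
The fallback behaviour described above can be sketched roughly as follows. This is an illustration only, not the app's actual code: `resolve_provider` is a hypothetical helper, and the order in which `auto` tries providers (table order here) is an assumption. The key variables and default models are taken from the table above.

```python
import os

# Default models as listed in the provider table.
DEFAULT_MODELS = {
    "anthropic": "claude-haiku-4-5-20251001",
    "openai": "gpt-4o-mini",
    "qwen": "qwen-plus",
    "deepseek": "deepseek-chat",
}

# Environment variable that holds each provider's API key.
KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def resolve_provider(provider="auto", model=None, env=None):
    """Return (provider, model). Hypothetical helper for illustration."""
    env = os.environ if env is None else env
    if provider == "auto":
        # Assumption: take the first provider (in table order) that has a key.
        provider = next(
            (name for name, var in KEY_VARS.items() if env.get(var)),
            "extractive",
        )
    elif provider != "extractive" and not env.get(KEY_VARS.get(provider, "")):
        # Assumption: a named provider without a key also falls back.
        provider = "extractive"
    if provider == "extractive":
        # The local extractive summarizer needs no model ID.
        return "extractive", None
    # The model field is free-form; use the default only when none is given.
    return provider, model or DEFAULT_MODELS[provider]
```

With no keys set, `resolve_provider()` returns `("extractive", None)`, which mirrors the keyless fallback described above.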

Quick start

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Optional — set any one or more for higher-quality AI summaries:
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
export DASHSCOPE_API_KEY=sk-...       # Qwen
export DEEPSEEK_API_KEY=sk-...

python app.py

Then open http://localhost:5000.

How to use

Read tab

  • Search: type a query and press Search (or Enter).
  • Summarize a URL: paste a URL and press Summarize (or Shift+Enter).
  • Summarize text: paste raw text into the input and press Summarize.
  • Summarize a search result: click Summarize this on any result to summarize that page.
  • Save a chunk to KG: either paste text in the input and click + KG, or highlight any text in a summary to surface a floating "Save to KG" button. A modal lets you add tags, a note, and source metadata.

Knowledge Graph tab

  • Ingest a corpus — at the top of the tab, three modes:
    • Files: drag-drop or pick .txt, .md, .html, .pdf (up to 50 MB).
    • URLs: one URL per line — each page is fetched and chunked.
    • Text: paste a large document with an optional source title.
  • All modes share chunk size (default 800 chars, ~200 tokens) and overlap (default 120 chars, ~15%). The chunker splits on paragraph boundaries first, then sentence boundaries inside long paragraphs, and carries an overlap (snapped to a word boundary) between chunks. Chunks land in staging tagged with their source and part:i/N.
  • Browse what's in staging (recently saved chunks awaiting integration).
  • Browse the integrated graph (chunks + extracted entities + edges).
  • Click Integrate → to move all staged chunks into the overall graph. The selected provider/model (from the Read tab) extracts entities and links them. If no API key is set or you select Extractive, the app falls back to a heuristic (capitalized phrase) entity extractor.
  • Click a node in the graph or sidebar to see its full text, source, tags, note, and linked nodes.
  • Search the graph by free text — matches chunk text, entity names, tags, notes, and source titles.
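
The chunking strategy described in the ingest bullets (paragraphs first, then sentences inside long paragraphs, with a word-snapped overlap) could look roughly like this. It is a minimal sketch under stated assumptions, not the app's actual chunker: the function names, the naive sentence regex, and the returned dict shape are all hypothetical. Chunks may run slightly over `chunk_size` once the overlap is prepended.

```python
import re


def split_units(text, chunk_size):
    """Split into paragraphs; break paragraphs longer than chunk_size into sentences."""
    units = []
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if len(para) <= chunk_size:
            units.append(para)
        else:
            # Naive sentence split: ., ! or ? followed by whitespace.
            units.extend(s for s in re.split(r"(?<=[.!?])\s+", para) if s)
    return units


def overlap_tail(chunk, overlap):
    """Return roughly the last `overlap` characters, snapped to a word boundary."""
    if overlap <= 0 or len(chunk) <= overlap:
        return ""
    tail = chunk[-overlap:]
    # If the cut landed mid-word, drop the partial word at the start of the tail.
    if not chunk[-overlap - 1].isspace() and not tail[0].isspace():
        parts = tail.split(None, 1)
        tail = parts[1] if len(parts) == 2 else ""
    return tail.strip()


def chunk_text(text, chunk_size=800, overlap=120):
    """Greedily pack units into chunks and tag each one as part:i/N."""
    chunks, current = [], ""
    for unit in split_units(text, chunk_size):
        if current and len(current) + 1 + len(unit) > chunk_size:
            chunks.append(current)
            # Carry the end of the finished chunk into the next one.
            current = overlap_tail(current, overlap)
        current = f"{current}\n{unit}" if current else unit
    if current:
        chunks.append(current)
    total = len(chunks)
    return [{"text": c, "tag": f"part:{i}/{total}"} for i, c in enumerate(chunks, 1)]
```

Snapping the overlap to a word boundary means each chunk begins with whole words, so the carried-over context stays readable for the model that later extracts entities.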

The graph is stored as plain JSON in two files: data/current.json (staging) and data/overall.json (integrated). Both are git-friendly files you can diff, back up, or edit by hand.

Endpoints

  • GET /api/providers — list configured providers and their suggested models.
  • POST /api/search { "query": "..." } → DuckDuckGo results.
  • POST /api/summarize { "input": "<url-or-text>", "provider": "...", "model": "..." } → page summary. provider may be auto, anthropic, openai, qwen, deepseek, or extractive. model is optional and falls back to the provider's default.
  • GET /api/kg/stats → {current: {chunks, entities, edges}, overall: {...}}.
  • GET /api/kg/graph?where=current|overall — full nodes + edges.
  • POST /api/kg/add { "text", "source_title?", "source_url?", "tags?", "note?" } adds a chunk to the staging graph.
  • POST /api/kg/integrate { "provider?", "model?", "use_ai?" } moves staged chunks into the overall graph, extracting entities and edges.
  • POST /api/kg/query { "query", "where?" } returns matching nodes.
  • DELETE /api/kg/node/<id>?where=current|overall — remove a node.
  • POST /api/kg/ingest/text — chunk a pasted document into staging. Body: { "text", "source_title?", "source_url?", "tags?", "chunk_size?", "overlap?" }.
  • POST /api/kg/ingest/urls — fetch each URL, parse, chunk into staging. Body: { "urls": [...], "tags?", "chunk_size?", "overlap?" }.
  • POST /api/kg/ingest/files — multipart upload of .txt/.md/.html/.pdf, with form fields tags, chunk_size, overlap.
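
A small client for the JSON endpoints above might look like the sketch below. It uses only the standard library and assumes the app is running at the Quick start address and that the endpoints accept and return JSON. The helper names (`build_request`, `call`, `demo`) are hypothetical, and response shapes beyond what the list above states are not assumed, so results are printed as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from Quick start


def build_request(path, payload=None, method=None):
    """Build a request for one of the endpoints; POST when a JSON body is given."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers=headers,
        method=method or ("POST" if data is not None else "GET"),
    )


def call(path, payload=None, method=None):
    """Send the request and decode the JSON response (the app must be running)."""
    request = build_request(path, payload, method)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    """Example session: list providers, search, summarize, then stage a chunk."""
    print(call("/api/providers"))
    print(call("/api/search", {"query": "flask knowledge graph"}))
    print(call("/api/summarize", {"input": "https://example.com", "provider": "auto"}))
    print(call("/api/kg/add", {"text": "A chunk worth keeping.", "tags": ["demo"]}))
    print(call("/api/kg/stats"))
```

Call `demo()` from a Python shell once `python app.py` is running. The file-upload endpoint is omitted because it takes multipart form data rather than JSON.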

Configuration

Variable            Purpose
ANTHROPIC_API_KEY   Enables Claude-powered summaries.
OPENAI_API_KEY      Enables OpenAI-powered summaries.
DASHSCOPE_API_KEY   Enables Qwen (DashScope) summaries.
DEEPSEEK_API_KEY    Enables DeepSeek summaries.
OPENAI_BASE_URL     Override OpenAI base URL (proxy, Azure, etc.).
QWEN_BASE_URL       Override DashScope base URL (use China endpoint, etc.).
DEEPSEEK_BASE_URL   Override DeepSeek base URL.
KG_DATA_DIR         Directory for KG JSON files (default data/).
PORT                Port to bind (default 5000).
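
As a rough illustration of how these variables might be read at startup, here is a sketch using only the defaults stated in the table. `load_config` and the shape of the returned dict are hypothetical. No default base URLs are assumed: an unset override is represented as `None`.

```python
import os


def load_config(env=None):
    """Collect settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Defaults below come from the configuration table.
        "port": int(env.get("PORT", "5000")),
        "kg_data_dir": env.get("KG_DATA_DIR", "data/"),
        # None means no override was set for that provider.
        "base_urls": {
            "openai": env.get("OPENAI_BASE_URL"),
            "qwen": env.get("QWEN_BASE_URL"),
            "deepseek": env.get("DEEPSEEK_BASE_URL"),
        },
    }
```

For example, `load_config({"PORT": "8080"})["port"]` evaluates to `8080`, while an empty environment yields port `5000` and data directory `data/`.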

About

An agent that helps users understand and analyze web content.
