🦜🔗 LangChain CatchAll

License: MIT

The official LangChain integration for CatchAll by NewsCatcher.

Build autonomous web search agents, financial analysts, and research assistants that can find, read, and analyze millions of web pages.


🌟 Features

  • Smart Caching: "Fetch Once, Query Many." Search for a topic once, then ask unlimited follow-up questions instantly from the local cache.
  • Agent Toolkit: Ready-to-use CatchAllTools for LangGraph agents.
  • Dual-Mode: Supports both granular control (for scripts) and autonomous agents.
  • LLM Agnostic: Works with OpenAI, Gemini, Anthropic, or any LangChain-compatible model.
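The "Fetch Once, Query Many" idea can be illustrated with a plain-Python sketch (hypothetical code, not this library's API): the expensive remote search runs once per topic, and every later query for that topic is served from memory.

```python
class SearchCache:
    """Illustrative local cache: fetch results once, answer many queries from memory."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn  # the expensive remote search
        self._store = {}

    def search(self, topic):
        if topic not in self._store:  # fetch only on the first request
            self._store[topic] = self.fetch_fn(topic)
        return self._store[topic]


calls = []

def fake_fetch(topic):
    """Stand-in for a slow remote search; records how often it runs."""
    calls.append(topic)
    return [f"{topic} article {i}" for i in range(3)]


cache = SearchCache(fake_fetch)
first = cache.search("data breaches")
second = cache.search("data breaches")  # served from cache, no re-fetch
print(len(calls))  # 1
```

The real integration applies the same principle: `client.search(...)` does the slow fetch, and follow-up analysis runs against the cached result set.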

🚀 Quick Start

Installation

pip install langchain-catchall

Basic Usage (One-Shot Search)

import os
from langchain_catchall import CatchAllClient

os.environ["CATCHALL_API_KEY"] = "your_key"

client = CatchAllClient(api_key=os.environ["CATCHALL_API_KEY"])

# Search and wait for results
result = client.search("Find all articles about security incidents (data breaches, ransomware, hacks) disclosed in the last 3 days")

print(f"Found {result.valid_records} records.")
for record in result.all_records[:3]:
    print(f"- {record.record_title}")

🤖 Building an Autonomous Agent

The real power comes when you connect CatchAll to a LangGraph agent. The agent decides when to search for new data and when to analyze what it has already found.

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain.messages import HumanMessage
from langchain_catchall import CatchAllTools, CATCHALL_AGENT_PROMPT

# For reference, CATCHALL_AGENT_PROMPT (imported above) contains:
"""You are a News Research Assistant powered by CatchAll.

Your workflow is strictly defined:

1. SEARCH: Use `catchall_search_data` to get a broad initial dataset (e.g., 'Find all US office openings').
   - WARNING: This tool takes 15 minutes. NEVER call it twice in a row.
   - After searching, STOP and return what you found. WAIT for the user's next question.
   - DO NOT automatically analyze or summarize unless explicitly asked.
   - If the user asks for a limited number of results (e.g., "top 50", "limit to 20"), pass that limit into the search tool.
   
2. ANALYZE: Use `catchall_analyze_data` ONLY when the user asks a follow-up question.
   - FILTERING & SORTING: 'Show me only Florida deals', 'Sort by date', 'Find top 3'.
   - AGGREGATION: 'Group by state', 'Count by industry'.
   - QA: 'What are the main trends?', 'Summarize key findings'.
   
CRITICAL RULES:
- After a search completes, report the number of results found and STOP. Wait for user input.
- ONLY call analyze_data when the user explicitly asks a follow-up question.
- If user says "Find X", just search and report results. If they say "Summarize Y" or "Show me Z", then analyze.
- Never use `catchall_search_data` to filter. Always use `catchall_analyze_data` for filtering.
- If the user asks for a subset of data (like 'only Florida deals'), assume it is ALREADY in your search results.
- Only use `catchall_search_data` if the user explicitly asks for a 'new search' or a completely different topic.
"""

# 1. Setup Tools
llm = ChatOpenAI(model="gpt-4o")
toolkit = CatchAllTools(api_key="...", llm=llm, verbose=True)
tools = toolkit.get_tools()

# 2. Create Agent
agent = create_agent(
    model=ChatOpenAI(model="gpt-4o"),
    tools=tools,
    system_prompt=CATCHALL_AGENT_PROMPT
)

# 3. Run
messages = [HumanMessage(content="Find all articles about corporate HQ relocations or office closures in the US in the last 3 days")]

response = agent.invoke({"messages": messages})
print(response["messages"][-1].content)
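The prompt above enforces a two-phase turn structure: search on the first request, then analyze only on explicit follow-ups. That conversation shape can be sketched with a stand-in agent (a hypothetical stub, so it runs without API keys; the real `agent.invoke` takes the same `{"messages": [...]}` shape):

```python
# Hypothetical stub standing in for the CatchAll agent, to show the
# search-then-analyze conversation shape without network calls.
def stub_invoke(state):
    last = state["messages"][-1]
    if last.lower().startswith("find"):
        # Phase 1: search, report the count, and STOP.
        reply = "Search complete: 42 records found. Ask a follow-up to analyze."
    else:
        # Phase 2: answer from the cached dataset, no new search.
        reply = "From cached records: 7 Florida deals."
    return {"messages": state["messages"] + [reply]}


# Turn 1: a "Find ..." request triggers the search phase only.
state = stub_invoke({"messages": ["Find all US office openings"]})
# Turn 2: the follow-up is answered from the cached results.
state = stub_invoke({"messages": state["messages"] + ["Show me only Florida deals"]})
print(state["messages"][-1])
```

With the real agent, you carry the returned message history into the next `invoke` call the same way, so follow-ups analyze the cached dataset instead of triggering another slow search.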

📚 Advanced Patterns

Fetch Once, Query Many (Financial Analyst Mode)

Perfect for deep dives where you don't want to re-run the search every time.

from langchain_catchall import CatchAllClient, query_with_llm
from langchain_openai import ChatOpenAI

# 1. Set up LLM
llm = ChatOpenAI(model="gpt-4o")

# 2. Grab needed data using CatchAllClient
client = CatchAllClient(api_key="...")
result = client.search("Find all articles about seed rounds over $5M announced this week")

# 3. The Fast Analysis (Local Cache)
# Ask as many questions as you want
print(query_with_llm(result, "List top 3 deals", llm))
print(query_with_llm(result, "Who are the CEOs?", llm))
print(query_with_llm(result, "What is the total amount of money raised in the US market?", llm))

📄 License

MIT License
