Python SDK for TABStack AI - Extract, Generate, and Automate web content using AI.
- 🔍 Extract: Convert web content to markdown or structured JSON
- ✨ Generate: Transform and enhance web data with AI
- 🤖 Automate: Execute complex web automation tasks using natural language
- ⚡ Async/Await: Modern async Python API for efficient concurrent operations
- 🔄 Connection Pooling: Configurable HTTP connection pooling for optimal performance
- 📘 Fully Typed: Complete type hints for better IDE support and type safety
- 🔒 JSON Schema: Use standard JSON Schema for structured data extraction
- 🛡️ Error Handling: Comprehensive custom exceptions for all API errors
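Install with uv:

    uv pip install tabstack

Or add to your project:

    uv add tabstack

Alternatively, install with pip, Poetry, or Pipenv:

    pip install tabstack
    poetry add tabstack
    pipenv install tabstack

To install from source for development:

    git clone https://github.com/Mozilla-Ocho/tabstack-python.git
    cd tabstack-python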
    pip install -e ".[dev]"

The following example uses all three operators:

    import asyncio
    import os

    from tabstack import TABStack


    async def main():
        # Initialize the client with connection pooling
        async with TABStack(
            api_key=os.getenv('TABSTACK_API_KEY'),
            max_connections=100,
            max_keepalive_connections=20
        ) as tabs:
            # Extract markdown from a URL
            result = await tabs.extract.markdown(
                url="https://news.ycombinator.com",
                metadata=True
            )
            print(result.content)
            print(result.metadata.title)

            # Extract structured JSON data
            schema = {
                "type": "object",
                "properties": {
                    "stories": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "title": {"type": "string"},
                                "points": {"type": "number"},
                                "author": {"type": "string"}
                            }
                        }
                    }
                }
            }
            data = await tabs.extract.json(
                url="https://news.ycombinator.com",
                schema=schema
            )

            # Generate transformed content with AI
            summary_schema = {
                "type": "object",
                "properties": {
                    "summaries": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "title": {"type": "string"},
                                "category": {"type": "string"},
                                "summary": {"type": "string"}
                            }
                        }
                    }
                }
            }

            # Transform URL content with AI
            summaries = await tabs.generate.json(
                url="https://news.ycombinator.com",
                schema=summary_schema,
                instructions="For each story, categorize it and write a one-sentence summary"
            )

            # Automate web tasks (streaming)
            async for event in tabs.automate.execute(
                task="Find the top 3 trending repositories and extract their details",
                url="https://github.com/trending"
            ):
                if event.type == "task:completed":
                    print(f"Result: {event.data.final_answer}")
                elif event.type == "agent:extracted":
                    print(f"Extracted: {event.data.extracted_data}")


    # Run the async function
    asyncio.run(main())

All methods are async and must be awaited. The client can be used as an async context manager, which cleans up connections automatically.
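Because every method is a coroutine, several requests can run concurrently over the shared connection pool. The sketch below shows one common way to do that with `asyncio.gather` and a semaphore. The names `fetch_all` and `fake_extract` are hypothetical helpers for illustration, not part of the SDK; `fake_extract` stands in for a real call so the example runs without an API key.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_concurrency=5):
    """Run fetch_one(url) for every URL, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        # The semaphore caps how many calls are in flight at once.
        async with semaphore:
            return await fetch_one(url)

    # gather preserves input order; return_exceptions=True returns a failure
    # as a value instead of raising it, so the other results are still delivered.
    return await asyncio.gather(
        *(bounded(url) for url in urls), return_exceptions=True
    )


async def fake_extract(url):
    """Hypothetical stand-in for a call such as tabs.extract.markdown(url=url)."""
    await asyncio.sleep(0.01)
    return f"# content of {url}"


if __name__ == "__main__":
    pages = asyncio.run(
        fetch_all(["https://example.com/a", "https://example.com/b"], fake_extract)
    )
    print(pages)
```

With a real client, `fetch_one` could be something like `lambda url: tabs.extract.markdown(url=url)`, called inside the `async with TABStack(...)` block. Keeping `max_concurrency` at or below `max_connections` avoids queuing requests behind the pool limit.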
Client initialization:

    from tabstack import TABStack

    # Run this inside a coroutine (async def); async with is not valid at module level
    async with TABStack(
        api_key="your-api-key",
        base_url="https://api.tabstack.ai/",  # optional
        max_connections=100,                  # optional
        max_keepalive_connections=20,         # optional
        keepalive_expiry=30.0,                # optional, in seconds
        timeout=60.0                          # optional, in seconds
    ) as tabs:
        # Your code here
        pass

Parameters:

- `api_key` (str, required): Your TABStack API key
- `base_url` (str, optional): API base URL. Default: `https://api.tabstack.ai/`
- `max_connections` (int, optional): Maximum concurrent connections. Default: `100`
- `max_keepalive_connections` (int, optional): Maximum idle connections to keep alive. Default: `20`
- `keepalive_expiry` (float, optional): Seconds to keep idle connections alive. Default: `30.0`
- `timeout` (float, optional): Request timeout in seconds. Default: `60.0`
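The parameters above can be collected from the environment before the client is created. This is a minimal sketch, not SDK functionality: `client_options_from_env` is a hypothetical helper, and every variable name other than `TABSTACK_API_KEY` (which the quick-start example reads) is an assumption made for illustration. The defaults mirror the ones documented above.

```python
import os


def client_options_from_env(env=None):
    """Build keyword arguments for TABStack(...) from environment variables.

    Only TABSTACK_API_KEY appears in this README; the other variable names
    are hypothetical and chosen for this example.
    """
    env = os.environ if env is None else env
    api_key = env.get("TABSTACK_API_KEY")
    if not api_key:
        raise RuntimeError("TABSTACK_API_KEY is not set")
    return {
        "api_key": api_key,
        "base_url": env.get("TABSTACK_BASE_URL", "https://api.tabstack.ai/"),
        "max_connections": int(env.get("TABSTACK_MAX_CONNECTIONS", "100")),
        "max_keepalive_connections": int(env.get("TABSTACK_MAX_KEEPALIVE", "20")),
        "keepalive_expiry": float(env.get("TABSTACK_KEEPALIVE_EXPIRY", "30.0")),
        "timeout": float(env.get("TABSTACK_TIMEOUT", "60.0")),
    }
```

The resulting dict can then be unpacked into the constructor, for example `async with TABStack(**client_options_from_env()) as tabs:`. Failing early on a missing key gives a clearer error than a later `UnauthorizedError`.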
The Extract operator converts web content into structured formats without AI transformation.
`extract.markdown` converts URL content to Markdown format.

Parameters:

- `url` (str): URL to convert
- `metadata` (bool): If True, return metadata as a separate field. If False, embed it as YAML frontmatter. Default: `False`
- `nocache` (bool): Bypass the cache and force a fresh retrieval. Default: `False`
Returns: MarkdownResponse with url, content, and optional metadata fields
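When `metadata` is False, the metadata travels inside the content as YAML frontmatter, so callers may want to separate it from the body themselves. The sketch below assumes the frontmatter is delimited by `---` lines at the very top of the content, which is the usual convention but is not confirmed by this README. `split_frontmatter` is a hypothetical helper, not part of the SDK, and it returns the frontmatter as raw text without parsing the YAML.

```python
def split_frontmatter(markdown_text):
    """Return (frontmatter_text, body); frontmatter_text is None if absent."""
    lines = markdown_text.splitlines()
    # Frontmatter must start on the first line with a '---' delimiter.
    if not lines or lines[0].strip() != "---":
        return None, markdown_text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return frontmatter, body
    # Opening delimiter without a closing one: treat everything as body.
    return None, markdown_text
```

The raw frontmatter text can be handed to a YAML parser of your choice if structured access is needed.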
Example:

    result = await tabs.extract.markdown(
        url="https://example.com",
        metadata=True
    )
    print(result.content)
    print(result.metadata.title)

`extract.schema` generates a JSON Schema by analyzing the structure of a webpage.

Parameters:

- `url` (str): URL to analyze
- `instructions` (str): Instructions for what data to extract (max 1000 characters)
- `nocache` (bool): Bypass the cache. Default: `False`
Returns: SchemaResponse with generated schema dict
Example:

    result = await tabs.extract.schema(
        url="https://example.com/products",
        instructions="Extract product listings with name, price, and availability"
    )
    # Use the schema for extraction
    data = await tabs.extract.json(url="https://example.com/products", schema=result.schema)

`extract.json` extracts structured JSON data from a URL using a schema.

Parameters:

- `url` (str): URL to extract from
- `schema` (dict): JSON Schema defining the structure
- `nocache` (bool): Bypass the cache. Default: `False`
Returns: JsonResponse with extracted data
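Extracted data is often easier to work with once it is mapped onto typed objects. The following sketch assumes the extracted data is a plain dict shaped like a schema with `title` (string) and `price` (number) properties. `Product` and `to_product` are hypothetical names introduced for illustration and are not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    price: float


def to_product(data):
    """Build a Product from a dict with 'title' and 'price' keys."""
    missing = [key for key in ("title", "price") if key not in data]
    if missing:
        # A schema without 'required' does not guarantee every field is present.
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return Product(title=str(data["title"]), price=float(data["price"]))
```

Checking for missing keys up front matters because a JSON Schema treats properties as optional unless they are listed under `required`.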
Example:

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "price": {"type": "number"}
        }
    }
    result = await tabs.extract.json(url="https://example.com", schema=schema)
    print(result.data)

The Generate operator uses AI to transform and enhance web content.

`generate.json` fetches URL content and transforms it into structured JSON using AI.

Parameters:

- `url` (str): URL to fetch content from
- `schema` (dict): JSON Schema for the output structure
- `instructions` (str): AI instructions for the transformation
- `nocache` (bool): Bypass the cache and force a fresh retrieval. Default: `False`
Returns: JsonResponse with generated data
Example:

    # Transform URL content with AI
    schema = {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "topics": {"type": "array", "items": {"type": "string"}}
        }
    }
    result = await tabs.generate.json(
        url="https://news.ycombinator.com",
        schema=schema,
        instructions="Summarize the content and extract main topics"
    )

The Automate operator executes complex web automation tasks using natural language.

`automate.execute` runs an AI-powered browser automation task and returns an async iterator over Server-Sent Events.

Parameters:

- `task` (str): Natural language description of the task
- `url` (str, optional): Starting URL for the task
- `schema` (dict, optional): JSON Schema for structured data extraction

Yields: AutomateEvent objects with type and data fields

Event Types:

- `start`: Automation started
- `agent:navigating`: Agent is navigating to a URL
- `agent:thinking`: Agent is analyzing the page
- `agent:action`: Agent performed an action (click, scroll, etc.)
- `agent:extracted`: Agent extracted structured data
- `task:completed`: Task finished successfully
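One way to consume such an event stream is to fold it into a final result while keeping a record of the event types seen. The sketch below is self-contained and does not call the API: `fake_events` is a hypothetical stand-in for `tabs.automate.execute(...)` that yields objects with the same `type` and `data` attributes, and `collect` is an illustrative helper, not part of the SDK.

```python
import asyncio
from types import SimpleNamespace


async def fake_events():
    """Hypothetical stand-in for tabs.automate.execute(...)."""
    yield SimpleNamespace(type="start", data=SimpleNamespace())
    yield SimpleNamespace(type="agent:navigating", data=SimpleNamespace())
    yield SimpleNamespace(
        type="agent:extracted",
        data=SimpleNamespace(extracted_data=[{"name": "demo"}]),
    )
    yield SimpleNamespace(
        type="task:completed",
        data=SimpleNamespace(final_answer="done"),
    )


async def collect(events):
    """Consume an event stream; return (extracted_items, final_answer, seen_types)."""
    extracted, final_answer, seen = [], None, []
    async for event in events:
        seen.append(event.type)
        if event.type == "agent:extracted":
            extracted.append(event.data.extracted_data)
        elif event.type == "task:completed":
            final_answer = event.data.final_answer
    return extracted, final_answer, seen


if __name__ == "__main__":
    print(asyncio.run(collect(fake_events())))
```

Event types the handler does not recognize are recorded but otherwise ignored, which keeps the consumer from breaking if the stream contains events beyond those listed above.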
Example:

    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "stars": {"type": "number"}
            }
        }
    }
    async for event in tabs.automate.execute(
        task="Find trending repositories and extract their names and star counts",
        url="https://github.com/trending",
        schema=schema
    ):
        if event.type == "agent:extracted":
            print(f"Extracted: {event.data.extracted_data}")
        elif event.type == "task:completed":
            print(f"Final answer: {event.data.final_answer}")

TABStack uses standard JSON Schema for defining data structures. Here are common patterns:
A flat object:

    schema = {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "price": {"type": "number"},
            "in_stock": {"type": "boolean"}
        }
    }

An array of objects:

    schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "id": {"type": "number"},
                "name": {"type": "string"}
            }
        }
    }

Nested objects:

    schema = {
        "type": "object",
        "properties": {
            "product": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "details": {
                        "type": "object",
                        "properties": {
                            "weight": {"type": "number"},
                            "dimensions": {"type": "string"}
                        }
                    }
                }
            }
        }
    }

An array of primitive values:

    schema = {
        "type": "object",
        "properties": {
            "tags": {
                "type": "array",
                "items": {"type": "string"}
            }
        }
    }

For more information on JSON Schema, see json-schema.org.
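To make the meaning of these schemas concrete, the sketch below checks a Python value against the small subset of JSON Schema used in the patterns above: the `type`, `properties`, and `items` keywords only. `matches_schema` is a hypothetical helper written for illustration; it is not how the service validates data and it is no substitute for a full JSON Schema validator.

```python
def matches_schema(value, schema):
    """Check value against the 'type', 'properties', and 'items' keywords only."""
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        props = schema.get("properties", {})
        # Properties are optional unless listed under 'required' (not handled
        # here), so only the keys that are present are checked.
        return all(
            matches_schema(value[key], sub)
            for key, sub in props.items()
            if key in value
        )
    if expected == "array":
        if not isinstance(value, list):
            return False
        items = schema.get("items", {})
        return all(matches_schema(item, items) for item in value)
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        # bool is a subclass of int in Python but is not a JSON number.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "boolean":
        return isinstance(value, bool)
    # Missing or unsupported 'type': accept the value.
    return True
```

For example, `matches_schema({"tags": ["a", "b"]}, schema)` holds for the last pattern above, while a list containing a number in place of a string does not.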
The SDK provides specific exception classes for different error scenarios:
| Exception | Status Code | Description | Retryable |
|---|---|---|---|
| `BadRequestError` | 400 | Invalid request parameters | No |
| `UnauthorizedError` | 401 | Invalid or missing API key | No |
| `InvalidURLError` | 422 | URL is invalid or inaccessible | No |
| `ServerError` | 500 | Internal server error | Yes (with backoff) |
| `ServiceUnavailableError` | 503 | Service temporarily unavailable | Yes (after delay) |
| `APIError` | Other | Generic API error | Depends on status |
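For the retryable errors in the table, exponential backoff with jitter is a common approach. The sketch below is one possible shape for it, not behavior built into the SDK. `with_retries` is a hypothetical helper, and `RetryableError` is a local stand-in so the example runs without the library; the attempt count and delays are illustrative values.

```python
import asyncio
import random


class RetryableError(Exception):
    """Local stand-in for ServerError / ServiceUnavailableError."""


async def with_retries(operation, retry_on=(RetryableError,), attempts=4,
                       base_delay=0.5, max_delay=8.0):
    """Await operation(), retrying on the given exceptions with backoff."""
    for attempt in range(attempts):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts - 1:
                # Out of attempts: let the last error propagate.
                raise
            # Delay doubles each attempt, capped at max_delay, plus jitter
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
```

With the SDK, this could be called as `await with_retries(lambda: tabs.extract.markdown(url="https://example.com"), retry_on=(ServerError, ServiceUnavailableError))`. Non-retryable errors such as `BadRequestError` are not caught and propagate immediately.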
Example:

    import asyncio

    from tabstack import TABStack
    from tabstack.exceptions import (
        BadRequestError,
        UnauthorizedError,
        InvalidURLError,
        ServerError,
        ServiceUnavailableError,
    )


    async def main():
        async with TABStack(api_key="your-api-key") as tabs:
            try:
                result = await tabs.extract.markdown(url="https://example.com")
            except UnauthorizedError:
                print("Error: Invalid API key")
            except InvalidURLError as e:
                print(f"Error: URL is invalid or inaccessible - {e.message}")
            except BadRequestError as e:
                print(f"Error: Bad request - {e.message}")
            except ServerError as e:
                print(f"Server error (retryable): {e.message}")
                # Implement retry logic with exponential backoff
            except ServiceUnavailableError as e:
                print(f"Service unavailable (retryable): {e.message}")
                # Wait and retry


    asyncio.run(main())

To set up a development environment:

    # Clone the repository
    git clone https://github.com/Mozilla-Ocho/tabstack-python.git
    cd tabstack-python
    # Install with development dependencies
    pip install -e ".[dev]"

Run the tests:

    # Run all tests
    pytest
    # Run with coverage
    pytest --cov=tabstack --cov-report=html
    # Run specific test file
    pytest tests/test_extract.py
    # Run with verbose output
    pytest -v

Format, lint, and type-check the code:

    # Format code with ruff
    ruff format .
    # Lint code
    ruff check .
    # Type checking
    mypy tabstack/

Test layout:

    tests/
    ├── conftest.py            # Shared pytest fixtures
    ├── test_client.py         # TABStack client tests
    ├── test_extract.py        # Extract operator tests
    ├── test_generate.py       # Generate operator tests
    ├── test_automate.py       # Automate operator tests
    ├── test_http_client.py    # HTTP client tests
    ├── test_types.py          # Response type tests
    ├── test_exceptions.py     # Exception tests
    ├── test_utils.py          # Utility function tests
    └── test_integration.py    # End-to-end integration tests

All tests use mocked HTTP responses - no real API calls are made during testing.
Contributions are welcome! Here's a quick checklist:
- Fork the repository and create a feature branch
- Write tests for new functionality
- Ensure all tests pass (`pytest`)
- Format code with ruff (`ruff format .`)
- Ensure linting passes (`ruff check .`)
- Update documentation as needed
- Submit a pull request with a clear description
- Python 3.10+ (tested on 3.10, 3.11, 3.12, 3.13, 3.14)
- httpx >= 0.27.0
Apache License 2.0 - see LICENSE for details.
- Homepage: https://tabstack.ai
- Documentation: https://docs.tabstack.ai
- PyPI: https://pypi.org/project/tabstack/
- Repository: https://github.com/Mozilla-Ocho/tabstack-python
- Issues: https://github.com/Mozilla-Ocho/tabstack-python/issues
- Email: support@tabstack.ai
- Discord: Join our community