# 05 - Data Ingestion & Refresh

This notebook covers the data ingestion pipeline that keeps our local market database up to date.

## Ingestion Pipeline

The project ingests data in two main ways:
1. **Refresh Markets Script**: A batch script that fetches hundreds of events and categorizes them.
2. **Ingestion Team**: A specialized team of (simulated) agents that fetches markets and researches related news before storing them.

### Categorization Logic

The refresh script uses a rule-based system to assign each market to a topic such as Politics, Sports, or Crypto, based on the market's tags and keywords.

In [None]:
from scripts.python.refresh_markets import categorize_event, CATEGORIES
from pprint import pprint

sample_event = {
    "title": "2024 Presidential Election",
    "description": "Who will win the next election?",
    "tags": [{"label": "Politics"}]
}

category = categorize_event(sample_event)
print(f"Detected Category: {category}")
print("\nSupported Categories:")
pprint(list(CATEGORIES.keys()))
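To make the idea of rule-based categorization concrete, here is a minimal sketch. It is not the code in `refresh_markets`: the rule table `EXAMPLE_RULES`, the function `categorize_sketch`, the `"Other"` fallback, and the tags-before-keywords ordering are all assumptions made for illustration.

```python
# Hypothetical rule table (category -> keywords). The real CATEGORIES
# mapping in refresh_markets may be structured differently.
EXAMPLE_RULES = {
    "Politics": ["election", "president", "senate"],
    "Sports": ["nba", "nfl", "world cup"],
    "Crypto": ["bitcoin", "ethereum", "crypto"],
}


def categorize_sketch(event, rules=EXAMPLE_RULES, default="Other"):
    """Return a category for an event dict using tags first, then keywords."""
    tag_labels = {tag.get("label", "").lower() for tag in event.get("tags", [])}
    text = f"{event.get('title', '')} {event.get('description', '')}".lower()

    # Tags are explicit labels, so an exact tag match wins.
    for category in rules:
        if category.lower() in tag_labels:
            return category

    # Otherwise fall back to substring matching on title and description.
    for category, keywords in rules.items():
        if any(keyword in text for keyword in keywords):
            return category

    return default


print(categorize_sketch({"title": "Will Bitcoin hit 100k?", "tags": []}))
```

Checking tags before keywords is a design choice in this sketch: a tag is an explicit label, while a keyword hit in free text is more likely to be a false positive.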

## Running a Refresh

You can manually trigger a database refresh. This will fetch new markets and clean up expired ones.

In [None]:
from pprint import pprint  # re-imported so this cell also runs on its own

from scripts.python.refresh_markets import refresh_database

# Note: This will actually update your data/markets.db
print("Refreshing database (limit 10 events for demo)...\n")
stats = refresh_database(max_events=10)
pprint(stats)
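The "fetch new, clean up expired" pattern can be sketched against an in-memory SQLite database. Everything here is an assumption for illustration: the `markets` table layout, the `refresh_sketch` function, and the keys of the returned stats dict are hypothetical and may not match what `refresh_database` does or returns.

```python
import sqlite3
from datetime import datetime, timezone


def refresh_sketch(conn, fetched_markets, now=None):
    """Upsert fetched markets, then delete those whose end date has passed."""
    now = now or datetime.now(timezone.utc)

    # Hypothetical schema; end_date is stored as an ISO-8601 UTC string.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS markets "
        "(id TEXT PRIMARY KEY, question TEXT, end_date TEXT)"
    )

    # INSERT OR REPLACE overwrites a row that has the same primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO markets (id, question, end_date) VALUES (?, ?, ?)",
        [(m["id"], m["question"], m["end_date"]) for m in fetched_markets],
    )

    # String comparison is only valid because every date uses the same
    # ISO-8601 format and the same UTC offset.
    removed = conn.execute(
        "DELETE FROM markets WHERE end_date < ?", (now.isoformat(),)
    ).rowcount
    conn.commit()

    return {"upserted": len(fetched_markets), "removed": removed}


conn = sqlite3.connect(":memory:")  # in-memory, so data/markets.db is untouched
demo_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
demo_markets = [
    {"id": "a", "question": "Q1",
     "end_date": datetime(2024, 6, 1, tzinfo=timezone.utc).isoformat()},
    {"id": "b", "question": "Q2",
     "end_date": datetime(2025, 6, 1, tzinfo=timezone.utc).isoformat()},
]
print(refresh_sketch(conn, demo_markets, now=demo_now))
```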

## Ingestion Team

The Ingestion Team takes a more "agentic" approach: it fetches each market and then uses research tools to find relevant news before storing the result.

In [None]:
from polymarket_agents.team.ingestion import IngestionTeam

team = IngestionTeam()
print("Ingestion Team ready to run cycles.")
# Uncomment to actually run an ingestion cycle (here with limit=2):
# team.run_cycle(limit=2)
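The fetch, research, store sequence described above can be sketched with plain functions. This is not how `IngestionTeam.run_cycle` is implemented; `run_cycle_sketch` and the three callables passed into it are hypothetical stand-ins, and the stubs return canned data instead of calling any API.

```python
def run_cycle_sketch(fetch_markets, find_news, store, limit=2):
    """Fetch up to `limit` markets, attach news to each, then store them."""
    stored = []
    for market in fetch_markets(limit):
        # Research happens before storage, so each record carries its news.
        record = {**market, "news": find_news(market["question"])}
        store(record)
        stored.append(record)
    return stored


# Stub implementations (assumptions, no network access).
def fake_fetch(limit):
    return [{"id": str(i), "question": f"Question {i}?"} for i in range(limit)]


def fake_news(question):
    return [f"Headline about: {question}"]


database = []
run_cycle_sketch(fake_fetch, fake_news, database.append, limit=2)
print(len(database), "records stored")
```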

## Continuous Refresh

In production, the refresh script usually runs in continuous mode, launched from a bash script or as a daemon, so the agent's prices are refreshed at a fixed interval rather than only on demand.

In [None]:
print("To run continuous refresh in the background:")
print("nohup python scripts/python/refresh_markets.py --continuous --interval 300 &")
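A continuous mode of this kind is typically a loop that refreshes, sleeps, and repeats. The sketch below illustrates that pattern only: `run_continuously` is a hypothetical helper, not the script's code, and it assumes the interval is given in seconds (the notebook does not state the unit of `--interval`).

```python
import time


def run_continuously(refresh, interval_seconds, max_cycles=None, sleep=time.sleep):
    """Call `refresh` repeatedly, pausing `interval_seconds` between calls.

    `max_cycles=None` loops forever; a number makes the loop finite, which
    is convenient for demos and tests.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        try:
            refresh()
        except Exception as exc:
            # One failed refresh should not stop the whole loop.
            print(f"Refresh failed: {exc}")
        cycles += 1
        # Skip the pause after the final cycle.
        if max_cycles is None or cycles < max_cycles:
            sleep(interval_seconds)
    return cycles


# Demo with a no-op refresh and a zero-second interval.
print(run_continuously(lambda: None, interval_seconds=0, max_cycles=3))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting, and catching exceptions per cycle means a transient fetch error does not end a long-running process.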