MAJ224/TelegramScrapper

Telegram Sync API

Fetch Telegram channel messages and extract regex patterns via FastAPI.

Features

  • Named regex patterns for organized extraction results.
  • Async FastAPI + Telethon client; graceful FloodWait handling.
  • JSON config file for channels and patterns.
  • Stateless - fetches latest N messages on each request.

Requirements

Setup

  1. Create a virtual environment and install dependencies:

    python -m venv .venv
    .venv/Scripts/activate       # Windows
    source .venv/bin/activate    # macOS/Linux
    pip install -r requirements.txt
  2. Copy the environment and config templates:

    cp .env.example .env
    cp config.example.json config.json
  3. Configure your .env:

    TG_API_ID=123456
    TG_API_HASH=your_api_hash_here
    API_KEY=your_secret_key
    DEFAULT_LIMIT=100
  4. Configure your config.json:

    {
      "channels": [
        "@channel1",
        "@channel2"
      ],
      "patterns": {
        "digits": "\\d+",
        "urls": "https?://\\S+",
        "proxy_links": "https?://t\\.me/proxy\\?.*"
      }
    }
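To illustrate how the named patterns in config.json map to named results, here is a minimal sketch. The function names compile_patterns and extract are hypothetical and not taken from the project; the sketch assumes each pattern is applied with re.findall, which may differ from what the service actually does.

```python
import json
import re

# Same shape as config.json above. Backslashes are doubled because JSON
# requires escaping them; after parsing, "\\d+" becomes the regex \d+.
CONFIG_TEXT = r'''
{
  "channels": ["@channel1", "@channel2"],
  "patterns": {
    "digits": "\\d+",
    "urls": "https?://\\S+",
    "proxy_links": "https?://t\\.me/proxy\\?.*"
  }
}
'''


def compile_patterns(config: dict) -> dict[str, re.Pattern]:
    """Compile every named regex once (hypothetical helper)."""
    return {name: re.compile(rx) for name, rx in config["patterns"].items()}


def extract(text: str, patterns: dict[str, re.Pattern]) -> dict[str, list[str]]:
    """Return all matches per pattern name (assumes findall semantics)."""
    return {name: rx.findall(text) for name, rx in patterns.items()}


config = json.loads(CONFIG_TEXT)
patterns = compile_patterns(config)
matches = extract("Port 8080, docs at https://example.com/docs", patterns)
print(matches)
```

Compiling once up front also surfaces an invalid regex as an re.error at load time instead of in the middle of a request.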

Authenticate with Telegram

Run the login helper once to create an authorized session file:

python -m app.auth_login
  • Prompts for phone and code; stores session at TG_SESSION_PATH (default ./data/session.session).
  • If two-factor is enabled, you will be prompted for the password.

Run the API

uvicorn app.main:app --host 0.0.0.0 --port 8000

Endpoints

  • GET /health - Health check

    { "status": "ok" }
  • GET /sync?limit=N - Sync messages (requires X-API-KEY header)

    • limit (optional): Total messages to check across all channels. Defaults to DEFAULT_LIMIT.
    • The limit is divided evenly among configured channels.
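The even split of limit across channels can be sketched as integer division. This is an assumption about the arithmetic only: the README does not say how a remainder is handled or whether each channel gets at least one message, so the floor division and the minimum of 1 below are illustrative choices, and per_channel_limit is a hypothetical name.

```python
def per_channel_limit(total_limit: int, channel_count: int) -> int:
    """Split a total message budget evenly across channels.

    Assumption: the remainder is dropped and every channel gets at least
    one message. The real service may distribute the remainder differently.
    """
    if channel_count <= 0:
        raise ValueError("at least one channel must be configured")
    return max(1, total_limit // channel_count)


# With DEFAULT_LIMIT=100 and the two channels from the example config:
print(per_channel_limit(100, 2))
```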

Response shape

{
  "results": [
    {
      "name": "digits",
      "items": ["123", "456"]
    },
    {
      "name": "urls",
      "items": ["https://example.com"]
    }
  ]
}
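A client for this endpoint might look like the following sketch, which uses only the Python standard library. The helper names (build_sync_request, results_by_name, fetch_sync), the base URL, and the 30-second client timeout are assumptions for illustration; only the /sync path, the limit parameter, the X-API-KEY header, and the response shape come from this README.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_sync_request(base_url: str, api_key: str, limit=None) -> urllib.request.Request:
    """Build a GET /sync request carrying the X-API-KEY header."""
    query = f"?{urlencode({'limit': limit})}" if limit is not None else ""
    return urllib.request.Request(
        f"{base_url}/sync{query}",
        headers={"X-API-KEY": api_key},
        method="GET",
    )


def results_by_name(payload: dict) -> dict[str, list[str]]:
    """Turn the "results" list into a mapping of pattern name to items."""
    return {entry["name"]: entry["items"] for entry in payload["results"]}


def fetch_sync(base_url: str, api_key: str, limit=None) -> dict[str, list[str]]:
    """Call the API and return matches keyed by pattern name."""
    request = build_sync_request(base_url, api_key, limit)
    with urllib.request.urlopen(request, timeout=30) as response:
        return results_by_name(json.load(response))


# Parsing the example response shown above, without any network call:
sample = {
    "results": [
        {"name": "digits", "items": ["123", "456"]},
        {"name": "urls", "items": ["https://example.com"]},
    ]
}
print(results_by_name(sample))
```

Against a locally running server, fetch_sync("http://localhost:8000", "your_secret_key", limit=50) would issue the request; the key must match API_KEY in .env.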

Configuration

Environment Variables (.env)

Variable | Description | Default
TG_API_ID | Telegram API ID | required
TG_API_HASH | Telegram API hash | required
API_KEY | API authentication key | required
TG_SESSION_PATH | Path to session file | ./data/session.session
DEFAULT_LIMIT | Default message limit | 100
REQUEST_TIMEOUT_SECONDS | Request timeout (seconds) | 25
SLEEP_BETWEEN_CHANNELS_MS | Delay between channels (milliseconds) | 200
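The variables and defaults listed above could be read into a typed settings object along these lines. This is a sketch only: the Settings class and load_settings function are hypothetical, and the project may load its configuration differently (for example through a settings library). Only the variable names and default values come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    tg_api_id: int
    tg_api_hash: str
    api_key: str
    tg_session_path: str = "./data/session.session"
    default_limit: int = 100
    request_timeout_seconds: float = 25.0
    sleep_between_channels_ms: int = 200


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    required = ("TG_API_ID", "TG_API_HASH", "API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required variables: {', '.join(missing)}")
    return Settings(
        tg_api_id=int(env["TG_API_ID"]),
        tg_api_hash=env["TG_API_HASH"],
        api_key=env["API_KEY"],
        tg_session_path=env.get("TG_SESSION_PATH", "./data/session.session"),
        default_limit=int(env.get("DEFAULT_LIMIT", 100)),
        request_timeout_seconds=float(env.get("REQUEST_TIMEOUT_SECONDS", 25)),
        sleep_between_channels_ms=int(env.get("SLEEP_BETWEEN_CHANNELS_MS", 200)),
    )


# Example with an explicit mapping instead of the real environment:
settings = load_settings(
    {"TG_API_ID": "123456", "TG_API_HASH": "your_api_hash_here", "API_KEY": "your_secret_key"}
)
print(settings)
```

Failing fast on missing required variables keeps a misconfigured deployment from starting and then erroring on the first request.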

JSON Config (config.json)

Field | Description
channels | Array of channel usernames (with or without @)
patterns | Object mapping pattern names to regex strings

Tests

Run unit tests:

pytest

Docker (optional)

A simple compose file is included for local runs:

docker-compose up --build

Provide API_KEY and the Telegram credentials (TG_API_ID, TG_API_HASH) through environment variables or the .env file.
