Pastebin Integration for Large Outputs #7

@anchitjain1234

Task: Pastebin Integration for Large Outputs

Description

Implement external pastebin service integration to handle command outputs that exceed Discord's message size limits. This system will automatically detect large outputs, upload them to external pastebin services, and provide Discord users with convenient links to view the complete output. The implementation includes support for multiple pastebin providers with automatic failover and secure upload handling.

This task provides an alternative approach to real-time streaming by focusing on complete output upload after command execution, making it suitable for commands with very large outputs or when real-time updates are not critical.

Acceptance Criteria

Core Functionality

  • Automatic detection of outputs too large for a single Discord message (>1900 chars, leaving headroom under Discord's 2000-character limit)
  • Upload complete command output to external pastebin services
  • Support for multiple pastebin providers (hastebin, pastebin.com, etc.)
  • Automatic failover between services when primary provider fails
  • Generate and return Discord-formatted links with output summaries
  • Configurable upload thresholds and service preferences

Pastebin Provider Support

  • Hastebin/Haste-server: Primary provider, self-hostable
  • GitHub Gist: Secure, authenticated uploads with version control
  • Pastebin.com: Popular service with API key authentication
  • ix.io: Simple, anonymous uploads as fallback
  • Extensible provider interface for adding new services

Security & Privacy

  • Secure API key management for authenticated services
  • Optional output scrubbing to remove sensitive data before upload
  • Configurable expiration times for uploaded content
  • Option to use private/unlisted uploads where supported
  • Audit logging of all upload attempts and results

Error Handling & Resilience

  • Robust failover between providers when services are down
  • Rate limiting compliance for each pastebin service
  • Retry logic with exponential backoff for temporary failures
  • Graceful degradation when all pastebin services are unavailable
  • Clear error messages for upload failures

Technical Details

Architecture Components

Pastebin Manager (output/pastebin.go)

type PastebinManager struct {
    providers  []PastebinProvider
    config     *PastebinConfig
    httpClient *http.Client
    logger     *logrus.Logger
}

type PastebinConfig struct {
    Providers     []ProviderConfig
    Threshold     int           // Size threshold for upload
    Timeout       time.Duration
    MaxRetries    int
    ScrubRegex    []string      // Patterns to scrub from output
    DefaultExpiry time.Duration
}

Provider Interface (output/providers.go)

type PastebinProvider interface {
    Name() string
    Upload(content string, opts UploadOptions) (*PasteResult, error)
    HealthCheck() error
    RateLimit() *RateLimiter
}

type UploadOptions struct {
    Title      string
    Syntax     string        // Language for syntax highlighting
    Expiry     time.Duration
    Private    bool
    AuthToken  string
}

type PasteResult struct {
    URL        string
    RawURL     string        // Direct text access URL
    ID         string
    ExpiresAt  time.Time
    Provider   string
}

Provider Implementations

// Hastebin provider
type HastebinProvider struct {
    baseURL     string
    httpClient  *http.Client
    rateLimiter *RateLimiter
}

// GitHub Gist provider
type GistProvider struct {
    token       string
    httpClient  *http.Client
    rateLimiter *RateLimiter
}

// Pastebin.com provider (named PastebinComProvider so it does not collide
// with the PastebinProvider interface declared in the same package)
type PastebinComProvider struct {
    apiKey      string
    userKey     string
    httpClient  *http.Client
    rateLimiter *RateLimiter
}
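
As an illustration of what the Hastebin provider's upload path might do, here is a sketch against a local stand-in server. It assumes the usual haste-server convention (POST the text to `/documents`, receive JSON with a `key`, view at `/<key>` and `/raw/<key>`), which should be checked against the target instance. `hasteUpload` and `newFakeHaste` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// hasteUpload posts content to baseURL+"/documents" and returns the view URL
// and raw URL built from the returned key. The endpoint shape is an assumption.
func hasteUpload(client *http.Client, baseURL, content string) (string, string, error) {
	resp, err := client.Post(baseURL+"/documents", "text/plain; charset=utf-8", strings.NewReader(content))
	if err != nil {
		return "", "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", "", fmt.Errorf("haste-server returned status %d", resp.StatusCode)
	}
	var body struct {
		Key string `json:"key"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", "", err
	}
	if body.Key == "" {
		return "", "", fmt.Errorf("haste-server response had no key")
	}
	return baseURL + "/" + body.Key, baseURL + "/raw/" + body.Key, nil
}

// newFakeHaste starts a local stand-in server so the sketch runs offline.
func newFakeHaste() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/documents" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"key":"abc123"}`)
	}))
}

func main() {
	srv := newFakeHaste()
	defer srv.Close()
	url, raw, err := hasteUpload(srv.Client(), srv.URL, "hello from a long command")
	fmt.Println(url, raw, err)
}
```

The real provider would wrap this in the `Upload` method and fill in a `PasteResult`. The mock server is also the kind of fixture the integration tests on Day 3 could reuse.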

Implementation Strategy

1. Output Size Detection

  • Monitor command output size during execution
  • Trigger pastebin upload when threshold exceeded
  • Support both total size and individual message size thresholds
  • Early detection to avoid unnecessary Discord message attempts
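
One way to monitor size during execution is to capture output through a writer that knows the threshold. The sketch below is an assumption about how that could work: `sizeTracker` is a hypothetical type, and it counts bytes rather than characters, which is a conservative choice because a string never has more characters than bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// sizeTracker (hypothetical) buffers command output and reports when the
// buffered size has passed the upload threshold.
type sizeTracker struct {
	buf       bytes.Buffer
	threshold int // in bytes; a conservative stand-in for a character limit
}

// Write makes sizeTracker an io.Writer, so it can be used as a command's
// stdout or stderr destination.
func (t *sizeTracker) Write(p []byte) (int, error) { return t.buf.Write(p) }

// Exceeded reports whether the output is too large to send inline.
func (t *sizeTracker) Exceeded() bool { return t.buf.Len() > t.threshold }

// String returns everything captured so far.
func (t *sizeTracker) String() string { return t.buf.String() }

func main() {
	tracker := &sizeTracker{threshold: 1900}
	for i := 0; i < 100; i++ {
		fmt.Fprintf(tracker, "line %d of some long command output\n", i)
	}
	if tracker.Exceeded() {
		fmt.Println("output too large for Discord, upload to pastebin instead")
	} else {
		fmt.Println(tracker.String())
	}
}
```

Checking `Exceeded()` before sending anything covers the early-detection bullet: no Discord message is attempted once the threshold has been crossed.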

2. Provider Management

  • Priority-ordered provider list with automatic failover
  • Health checks for provider availability monitoring
  • Rate limiting per provider to respect service limits
  • Circuit breaker pattern for temporarily unavailable services
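
The priority list and circuit breaker could be combined along these lines. This is a sketch only: the `uploader` interface is a trimmed-down stand-in for `PastebinProvider`, the failure count and cooldown are assumed values, and the type is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// uploader is a hypothetical, trimmed-down stand-in for PastebinProvider.
type uploader interface {
	Name() string
	Upload(content string) (string, error)
}

// breaker tracks consecutive failures for one provider.
type breaker struct {
	failures  int
	openUntil time.Time
}

const (
	maxFailures = 3                // assumed: failures before the circuit opens
	cooldown    = 30 * time.Second // assumed: how long an open circuit is skipped
)

// failover tries providers in priority order, skipping any whose circuit is open.
type failover struct {
	providers []uploader
	breakers  map[string]*breaker
	now       func() time.Time
}

func newFailover(providers ...uploader) *failover {
	f := &failover{providers: providers, breakers: map[string]*breaker{}, now: time.Now}
	for _, p := range providers {
		f.breakers[p.Name()] = &breaker{}
	}
	return f
}

// Upload returns the paste URL and the name of the provider that accepted it.
func (f *failover) Upload(content string) (string, string, error) {
	var errs []error
	for _, p := range f.providers {
		b := f.breakers[p.Name()]
		if f.now().Before(b.openUntil) {
			errs = append(errs, fmt.Errorf("%s: circuit open", p.Name()))
			continue
		}
		url, err := p.Upload(content)
		if err == nil {
			b.failures = 0
			return url, p.Name(), nil
		}
		b.failures++
		if b.failures >= maxFailures {
			b.openUntil = f.now().Add(cooldown)
			b.failures = 0
		}
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return "", "", fmt.Errorf("all providers failed: %v", errs)
}

// stubProvider simulates a provider that always fails or always succeeds.
type stubProvider struct {
	name string
	fail bool
}

func (s stubProvider) Name() string { return s.name }
func (s stubProvider) Upload(string) (string, error) {
	if s.fail {
		return "", errors.New("service unavailable")
	}
	return "https://paste.example/" + s.name, nil
}

func main() {
	f := newFailover(stubProvider{"primary", true}, stubProvider{"backup", false})
	url, provider, err := f.Upload("large output")
	fmt.Println(url, provider, err)
}
```

Once the primary has failed three times in a row, later uploads skip it for the cooldown period, which helps keep failover latency low while a service is down.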

3. Content Processing

  • Output scrubbing to remove sensitive information (tokens, passwords, IPs)
  • Syntax detection for appropriate highlighting (bash, json, logs, etc.)
  • Content compression for very large outputs
  • Metadata inclusion (command, timestamp, exit code, duration)
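
Scrubbing driven by the `ScrubRegex` patterns might be as simple as the sketch below. The two patterns and the `[REDACTED]` placeholder are illustrative assumptions; real deployments would need patterns tuned to their own secrets.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileScrubbers compiles the configured patterns once, up front, so that
// a bad pattern is reported at startup rather than during an upload.
func compileScrubbers(patterns []string) ([]*regexp.Regexp, error) {
	res := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid scrub pattern %q: %w", p, err)
		}
		res = append(res, re)
	}
	return res, nil
}

// scrub replaces every match of every pattern with a fixed placeholder.
func scrub(output string, res []*regexp.Regexp) string {
	for _, re := range res {
		output = re.ReplaceAllString(output, "[REDACTED]")
	}
	return output
}

func main() {
	// Example patterns (assumptions): key=value secrets and IPv4 addresses.
	res, err := compileScrubbers([]string{
		`(?i)(password|token)=\S+`,
		`\b\d{1,3}(\.\d{1,3}){3}\b`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(scrub("token=abc123 host=10.0.0.5 ok", res))
}
```

Regex scrubbing is best-effort: it only removes what the patterns anticipate, so private/unlisted uploads and expiry remain important safeguards.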

4. Discord Integration

  • Rich embed formatting with upload results
  • Summary statistics (lines, size, upload time)
  • Direct link and raw text link provision
  • Fallback message when all upload attempts fail

Upload Flow

Command Output → Size Check → Content Scrub → Provider Selection → Upload → Discord Link
                     ↓              ↓                  ↓              ↓           ↓
               Skip if small Remove secrets       Try primary     Retry on   Rich embed
                                                       ↓           failure   with stats
                                                  Try backup
                                                   providers

Dependencies

Internal Dependencies

  • Task 003: Command execution infrastructure for output capture
  • executor/command.go: Access to complete command output
  • discord/bot.go: Discord message formatting and sending
  • config/config.go: Pastebin provider configuration
  • auth/audit.go: Upload attempt logging

External Dependencies

  • net/http package for API communication
  • Provider-specific API documentation and SDKs
  • encoding/json for API request/response handling
  • regexp package for output scrubbing
  • compress/gzip for large content compression

Effort Estimate

Total: 3 days

Day 1: Core Infrastructure (8 hours)

  • Design and implement PastebinProvider interface
  • Create PastebinManager with provider registration
  • Implement size threshold detection and triggering logic
  • Build content scrubbing and processing pipeline
  • Write unit tests for core components

Day 2: Provider Implementations (8 hours)

  • Implement Hastebin provider with API integration
  • Add GitHub Gist provider with authentication
  • Create Pastebin.com provider with API key handling
  • Add fallback ix.io provider for simple uploads
  • Implement rate limiting and health checking for each provider

Day 3: Integration & Error Handling (8 hours)

  • Integrate pastebin system with command execution flow
  • Add comprehensive error handling and failover logic
  • Implement Discord message formatting for upload results
  • Write integration tests with mock APIs
  • Add configuration management and audit logging

Definition of Done

Code Quality

  • All provider implementations follow a consistent interface
  • Unit test coverage >90% for pastebin components
  • Integration tests with mock external APIs
  • Error handling tests for all failure scenarios
  • Performance tests with large output uploads

Functionality

  • Automatic upload triggered by configurable size thresholds
  • All specified pastebin providers working with proper failover
  • Output scrubbing removes configured sensitive patterns
  • Discord integration shows rich upload results with statistics
  • Configuration allows customization of providers and settings

Security

  • API keys securely stored and managed
  • Sensitive data scrubbing working correctly
  • Private/unlisted uploads used where configured
  • Comprehensive audit logging of all upload attempts
  • No sensitive information leaked in error messages

Performance

  • Upload operations complete within 10 seconds for normal outputs
  • Failover between providers happens within 5 seconds
  • Large outputs (>10MB) handled without memory issues
  • Concurrent uploads from multiple commands work correctly

Integration

  • Seamless integration with command execution system
  • Provider configuration loaded from YAML config files
  • Error messages clearly indicate upload failures and alternatives
  • Clean resource management and connection pooling

Ready for Production: Large command outputs are automatically and reliably uploaded to external pastebin services with robust error handling and security controls.
