# Sisal Selenium Scraper Test

This notebook tests the Selenium-based scraper against a specific live match: **Burundi vs Kenya**.

The scraper connects to the Sisal page, extracts the betting odds, and returns them as a `BettingOdds` instance.
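To make the shape of the result concrete, here is a minimal sketch of what such an object could look like. The class name `BettingOddsSketch` and the `missing_markets` helper are hypothetical; only the attribute names are taken from the cells below, and the project's real `BettingOdds` class may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class BettingOddsSketch:
    """Hypothetical stand-in for the project's BettingOdds class."""

    home_team: str
    away_team: str
    source: str = "Sisal"
    timestamp: datetime = field(default_factory=datetime.now)
    # 1X2 market
    home_win: Optional[float] = None
    draw: Optional[float] = None
    away_win: Optional[float] = None
    # Over/Under 2.5 goals
    over_2_5: Optional[float] = None
    under_2_5: Optional[float] = None
    # Both teams to score
    both_teams_score_yes: Optional[float] = None
    both_teams_score_no: Optional[float] = None
    # Double chance
    home_or_draw: Optional[float] = None
    away_or_draw: Optional[float] = None
    home_or_away: Optional[float] = None

    def missing_markets(self) -> List[str]:
        """Return the names of all fields that were not extracted (still None)."""
        return [name for name, value in vars(self).items() if value is None]


# A partially extracted result: only two of the 1X2 prices are present
sample = BettingOddsSketch(home_team="Burundi", away_team="Kenya", home_win=26.0, draw=15.0)
print(sample.missing_markets())
```

Keeping every price optional lets a scrape succeed even when a market is suspended or hidden on the live page, and a helper like `missing_markets` makes such gaps easy to spot.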

In [1]:
import sys
import os

# Add the project root to the Python path.
# Notebooks do not define __file__, so fall back to the notebook's filename;
# either way, current_dir is the notebook directory and its parent is the project root.
current_dir = os.path.dirname(os.path.abspath(__file__ if '__file__' in globals() else 'scraper_test.ipynb'))
project_root = os.path.dirname(current_dir)

if project_root not in sys.path:
    sys.path.insert(0, project_root)

# Import the Selenium scraper class
from src.scraper import SisalSeleniumScraper

In [2]:
# Test scraper with the specified match
# Target match URL
url = "https://www.sisal.it/scommesse-live/evento/calcio/cecafa-cup-f/burundi-kenya"
print("Testing Sisal Selenium Scraper")
print(f"URL: {url}")
print("=" * 60)

# Create scraper instance and scrape the odds; release the browser afterwards
scraper = SisalSeleniumScraper(headless=False)
try:
    odds = scraper.scrape_betting_odds(url)
finally:
    scraper.close()

if odds:
    print("✅ SUCCESS! Betting odds extracted")
    print(f"Match: {odds.home_team} vs {odds.away_team}")
    print(f"Timestamp: {odds.timestamp}")
    print(f"Source: {odds.source}")
    print()
    
    # Display odds
    print("Extracted odds:")
    print(f"  1X2: {odds.home_win} / {odds.draw} / {odds.away_win}")
    print(f"  O/U 2.5: {odds.over_2_5} / {odds.under_2_5}")
    print(f"  BTTS: {odds.both_teams_score_yes} / {odds.both_teams_score_no}")
    print(f"  Double Chance: {odds.home_or_draw} / {odds.away_or_draw} / {odds.home_or_away}")
    print()
    
    # Show complete BettingOdds object
    print("Complete BettingOdds object:")
    print(repr(odds))
    
else:
    print("❌ Failed to extract odds")
    print("Possible reasons:")
    print("- Match not live")
    print("- Geographic restrictions") 
    print("- Page structure changed")
    print("- Betting markets not available")

Testing Sisal Selenium Scraper
URL: https://www.sisal.it/scommesse-live/evento/calcio/cecafa-cup-f/burundi-kenya
Starting scraping for: https://www.sisal.it/scommesse-live/evento/calcio/cecafa-cup-f/burundi-kenya
✓ CSV storage initialized: data\sisal_odds_20250615_153114.csv
✓ Chrome WebDriver setup successful
✓ Cookie banner accepted
✓ Page content loaded
✓ Teams: Burundi vs Kenya
✓ 1X2 Main odds extracted
✓ Double Chance odds extracted
✓ Over/Under odds extracted
✓ Goal/NoGoal odds extracted

=== EXTRACTED BETTING ODDS ===
Match: Burundi vs Kenya
Match ID: burundi-kenya
Source: Sisal
Timestamp: 2025-06-15 15:31:21.437684
1X2: 26.0 / 15.0 / None
Double Chance: 8.68 / None / None
O/U 2.5: 1.84 / 1.68
BTTS: 1.93 / 1.61

✓ Stored odds for Burundi vs Kenya

## Continuous Scraping Test

The scraper also supports continuous data extraction at regular intervals, which is useful for monitoring how live betting odds change over time.

In [4]:
# Example of continuous scraping (kept short to avoid long execution)
# This scrapes data every 5 seconds for 1 minute

from src.storage import CSVBettingOddsStorage
from datetime import datetime

# Create a storage instance for continuous scraping
continuous_storage = CSVBettingOddsStorage(
    session_id=f"demo_continuous_{datetime.now().strftime('%H%M%S')}",
    output_dir="data/continuous_demo",
    filename_prefix="sisal_live_demo"
)

# Create scraper with the continuous storage
continuous_scraper = SisalSeleniumScraper(headless=True, storage=continuous_storage)

print("🎯 Continuous scraping setup ready!")
print(f"📁 Storage session: {continuous_storage.session_id}")
print(f"📂 Output will be saved to: {continuous_storage.output_dir}")

# Run the continuous scraping session; close the scraper even if it fails
try:
    successful_scrapes = continuous_scraper.scrape_continuously(
        url=url,  # Use the same URL from above
        duration_minutes=1,  # Run for 1 minute
        interval_seconds=5  # Scrape every 5 seconds
    )
    print(f"✅ Completed! {successful_scrapes} successful scrapes")
finally:
    continuous_scraper.close()

🎯 Continuous scraping setup ready!
📁 Storage session: demo_continuous_153609
📂 Output will be saved to: data\continuous_demo
🚀 Starting continuous scraping session
   📍 URL: https://www.sisal.it/scommesse-live/evento/calcio/cecafa-cup-f/burundi-kenya
   ⏱️  Duration: 1 minutes
   🔄 Interval: 5 seconds
   📁 Storage: CSVBettingOddsStorage
🎯 Session started at 2025-06-15 15:36:09
⏰ Session will end at 2025-06-15 15:37:09
   Press Ctrl+C to stop early
------------------------------------------------------------
🌐 Performing initial page setup...
Starting scraping for: https://www.sisal.it/scommesse-live/evento/calcio/cecafa-cup-f/burundi-kenya
✓ CSV storage initialized: data\continuous_demo\sisal_live_demo_demo_continuous_153609.csv
✓ Chrome WebDriver setup successful
✓ Cookie banner accepted
✓ Page content loaded
✓ Teams: Burundi vs Kenya
✓ 1X2 Main odds extracted
✓ Double Chance odds extracted

### Command Line Usage

For production use, run the command-line script:

```bash
# Basic usage - scrape for 10 minutes every 10 seconds
python continuous_scraping_example.py "https://www.sisal.it/scommesse/sport/calcio/match/123"

# Custom duration and interval
python continuous_scraping_example.py "https://..." --duration 30 --interval 15

# Headless mode with custom output directory
python continuous_scraping_example.py "https://..." --headless --output-dir data/match_analysis

# Show help
python continuous_scraping_example.py --help
```
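The flags shown above could be parsed with `argparse` along the following lines. This is only a sketch: the internals of `continuous_scraping_example.py` are not shown in this notebook, so `build_parser`, the help texts, and the `data` default for `--output-dir` are assumptions. The 10-minute and 10-second defaults follow the "basic usage" comment.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flags used in the commands above (hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Continuously scrape live odds from a Sisal match page."
    )
    parser.add_argument("url", help="URL of the match page to scrape")
    parser.add_argument("--duration", type=int, default=10,
                        help="session length in minutes (assumed default: 10)")
    parser.add_argument("--interval", type=int, default=10,
                        help="seconds between scrapes (assumed default: 10)")
    parser.add_argument("--headless", action="store_true",
                        help="run the browser without a visible window")
    parser.add_argument("--output-dir", default="data",
                        help="directory for the CSV output (assumed default: data)")
    return parser


# Parse the "custom duration and interval" example from above
args = build_parser().parse_args(["https://...", "--duration", "30", "--interval", "15"])
print(args.duration, args.interval, args.headless, args.output_dir)  # 30 15 False data
```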

### Key Features

1. **Timed Sessions**: Automatically stops after specified duration
2. **Regular Intervals**: Extracts data at consistent intervals
3. **Session Management**: Each session gets a unique CSV file
4. **Graceful Shutdown**: Handles Ctrl+C interruption properly
5. **Progress Monitoring**: Shows real-time progress and statistics
6. **Error Handling**: Continues running even if individual scrapes fail
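
The features listed above can be illustrated with a small timed loop. This is a sketch of the general pattern, not the scraper's actual `scrape_continuously` implementation: `run_timed_session` and its parameters are hypothetical, and `scrape_once` stands in for whatever performs a single extraction.

```python
import time
from typing import Callable


def run_timed_session(scrape_once: Callable[[], bool],
                      duration_seconds: float,
                      interval_seconds: float,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> int:
    """Call scrape_once repeatedly until the deadline; return the success count."""
    deadline = clock() + duration_seconds
    successes = 0
    attempts = 0
    try:
        while clock() < deadline:
            attempts += 1
            try:
                if scrape_once():
                    successes += 1
            except Exception as exc:
                # A single failed scrape must not end the session
                print(f"Scrape {attempts} failed: {exc}")
            print(f"Progress: {successes}/{attempts} successful")
            remaining = deadline - clock()
            if remaining <= 0:
                break
            # Never sleep past the end of the session
            sleep(min(interval_seconds, remaining))
    except KeyboardInterrupt:
        # Ctrl+C stops the session early but still reports the count
        print("Interrupted, stopping early")
    return successes


# Tiny demo with a stub that always succeeds
total = run_timed_session(lambda: True, duration_seconds=0.2, interval_seconds=0.1)
print(f"Completed with {total} successful scrapes")
```

Passing `sleep` and `clock` as parameters is a design choice made here so the loop can be tested with a fake clock instead of waiting in real time.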