# Testing Options Scraper with PostgreSQL Database - ROBUST VERSION

This notebook has been **enhanced** with robust Chrome handling and writes data directly to PostgreSQL:

## üöÄ **Robust Chrome Startup Features**
1. **Process Cleanup**: Kills existing Chrome processes before starting
2. **Incognito Mode**: Avoids user data directory conflicts entirely  
3. **Random Debug Ports**: Prevents port conflicts between sessions
4. **Retry Logic**: Up to 3 attempts with cleanup between failures
5. **Comprehensive Chrome Flags**: Optimized for server/VM environments

## üîß **Enhanced Login Handling**
1. **Multiple Click Methods**: Normal, ActionChains, JavaScript, form submit
2. **Element Visibility**: Scrolls to elements before interaction
3. **Robust Error Handling**: Graceful fallbacks for click interception
4. **Retry Mechanism**: Multiple login attempts with detailed feedback

## üíæ **PostgreSQL Integration**
1. **Direct Database Writes**: No intermediate SQLite database
2. **Secure Credentials**: Uses `config/credentials.json` for all authentication
3. **Real-time Availability**: Data appears instantly in utility notebooks
4. **PostgreSQL Syntax**: Proper `%s` parameter placeholders

## üõ†Ô∏è **Setup Requirements**

1. **Update web scraping credentials** in `config/credentials.json`:
   ```json
   {
     "web_scraping": {
       "optionrecom": {
         "username": "your_actual_username_or_email",
         "password": "your_actual_password"
       }
     }
   }
   ```

2. **Ensure PostgreSQL access** is working (credentials already configured)

3. **Run the scraper** - it will handle all Chrome conflicts automatically

## üéØ **Expected Behavior**

```
üßπ Cleaning up existing Chrome processes...
üöÄ Attempting to start Chrome (attempt 1/3)...
‚úÖ Chrome started successfully!
üîê Using credentials from config/credentials.json
üîë Starting automated login process...
‚úÖ Login successful! Proceeding with data scraping...
üéØ Total records saved to PostgreSQL database: 7
üìä Records by strategy and tab (last hour) in PostgreSQL:
  Bear Call - Mild Risk 95-97% accuracy > shorter expiry: 2 records
  Bull Put - Minimal Risk 97-99% accuracy > longer expiry: 2 records
üìà Total records in PostgreSQL database: 518
```

In [6]:
import os
import sys
import json
import psycopg2
import tempfile
import uuid
import shutil
import random
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.chrome.options import Options as ChromeOptions
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException, ElementClickInterceptedException
from selenium.webdriver.common.action_chains import ActionChains
import re
import hashlib
import coolname
from datetime import datetime

# Add paths for database configuration
sys.path.append('../config')
sys.path.append('../database')

# Load PostgreSQL credentials
with open('../config/credentials.json', 'r') as f:
    creds = json.load(f)

pg_creds = creds['database']['postgresql']

In [7]:
def load_credentials_from_json():
    """
    Load username and password from credentials.json file
    
    Returns:
    tuple: (username, password) or (None, None) if credentials not found
    """
    try:
        credentials_file = '../config/credentials.json'
        
        if not os.path.exists(credentials_file):
            print(f"Credentials file '{credentials_file}' not found.")
            print("Please ensure config/credentials.json exists with web_scraping section.")
            return None, None
        
        with open(credentials_file, 'r') as f:
            creds = json.load(f)
        
        # Extract web scraping credentials
        web_creds = creds.get('web_scraping', {}).get('optionrecom', {})
        username = web_creds.get('username')
        password = web_creds.get('password')
        
        if not username or not password:
            print("Web scraping credentials not found in credentials.json")
            print("Please add the following to config/credentials.json:")
            print('"web_scraping": {')
            print('  "optionrecom": {')
            print('    "username": "your_username_or_email",')
            print('    "password": "your_password"')
            print('  }')
            print('}')
            return None, None
        
        if username == "your_username_or_email" or password == "your_password":
            print("Please update the web scraping credentials in config/credentials.json")
            print("Current values are placeholder text that need to be replaced.")
            return None, None
        
        print("‚úÖ Web scraping credentials loaded from config/credentials.json")
        return username, password
        
    except Exception as e:
        print(f"Error reading credentials from JSON file: {str(e)}")
        return None, None

In [8]:
def automated_login(driver, username, password, max_retries=3):
    """
    Perform automated login to optionrecom.com with improved click handling
    
    Parameters:
    driver: Selenium WebDriver instance
    username (str): Username or email
    password (str): Password
    max_retries (int): Maximum number of login attempts
    
    Returns:
    bool: True if login successful, False otherwise
    """
    for attempt in range(max_retries):
        try:
            print(f"Login attempt {attempt + 1} of {max_retries}...")
            
            # Navigate to login page
            driver.get("https://optionrecom.com/my-account-2/")
            time.sleep(3)
            
            # Wait for and find username field
            username_field = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.NAME, "username"))
            )
            
            # Clear and enter username
            username_field.clear()
            username_field.send_keys(username)
            print("Username entered successfully")
            
            # Find and enter password
            password_field = driver.find_element(By.NAME, "password")
            password_field.clear()
            password_field.send_keys(password)
            print("Password entered successfully")
            
            # Find and click login button with improved handling
            login_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, "//button[@name='login']"))
            )
            
            # Scroll to button and ensure it's visible
            driver.execute_script("arguments[0].scrollIntoView({block: 'center', behavior: 'smooth'});", login_button)
            time.sleep(1)
            
            # Try multiple click methods
            click_successful = False
            
            # Method 1: Wait for element to be clickable and try normal click
            try:
                WebDriverWait(driver, 5).until(EC.element_to_be_clickable((By.XPATH, "//button[@name='login']")))
                login_button.click()
                click_successful = True
                print("Login button clicked (normal click)")
            except ElementClickInterceptedException:
                print("Normal click intercepted, trying alternative methods...")
            
            # Method 2: ActionChains click
            if not click_successful:
                try:
                    actions = ActionChains(driver)
                    actions.move_to_element(login_button).click().perform()
                    click_successful = True
                    print("Login button clicked (ActionChains)")
                except Exception as e:
                    print(f"ActionChains click failed: {str(e)}")
            
            # Method 3: JavaScript click
            if not click_successful:
                try:
                    driver.execute_script("arguments[0].click();", login_button)
                    click_successful = True
                    print("Login button clicked (JavaScript)")
                except Exception as e:
                    print(f"JavaScript click failed: {str(e)}")
            
            # Method 4: Submit the form instead
            if not click_successful:
                try:
                    form = driver.find_element(By.XPATH, "//form[.//button[@name='login']]")
                    driver.execute_script("arguments[0].submit();", form)
                    click_successful = True
                    print("Login form submitted")
                except Exception as e:
                    print(f"Form submit failed: {str(e)}")
            
            if not click_successful:
                raise Exception("All click methods failed")
            
            # Wait for page to load and check if login was successful
            time.sleep(5)
            
            # Check if we're still on the login page (indicates failed login)
            current_url = driver.current_url
            if "my-account" in current_url and "login" not in current_url.lower():
                print("Login successful!")
                return True
            
            # Check for error messages
            error_elements = driver.find_elements(By.CSS_SELECTOR, ".woocommerce-error, .error, [class*='error']")
            if error_elements:
                error_text = error_elements[0].text
                print(f"Login failed: {error_text}")
            else:
                print("Login may have failed - still on login page")
                
        except TimeoutException:
            print(f"Timeout on attempt {attempt + 1} - page took too long to load")
        except NoSuchElementException as e:
            print(f"Could not find login element on attempt {attempt + 1}: {str(e)}")
        except Exception as e:
            print(f"Unexpected error on attempt {attempt + 1}: {str(e)}")
        
        if attempt < max_retries - 1:
            print(f"Retrying in 3 seconds...")
            time.sleep(3)
    
    print(f"Login failed after {max_retries} attempts")
    return False

In [9]:
def generate_trade_id(scrape_date, strategy_type, tab_name, ticker, trigger_price, strike_price):
    """
    Generate a human-readable 3-word trade_id using coolname library.
    
    Args:
        scrape_date: Date when data was scraped
        strategy_type: Type of strategy (Bear Call, Bull Put, etc.)
        tab_name: Risk level and expiry category
        ticker: Stock ticker symbol
        trigger_price: Price that triggers the strategy
        strike_price: Strike prices for the option spread
    
    Returns:
        str: Human-readable trade ID like 'certain-magpie-dancing'
    """
    # Convert all inputs to strings and handle None values
    components = [
        str(scrape_date) if scrape_date is not None else '',
        str(strategy_type) if strategy_type is not None else '',
        str(tab_name) if tab_name is not None else '',
        str(ticker) if ticker is not None else '',
        str(trigger_price) if trigger_price is not None else '',
        str(strike_price) if strike_price is not None else ''
    ]
    
    # Join components with delimiter
    combined_string = '|'.join(components)
    
    # Generate hash and use as seed for reproducible results
    hash_value = hashlib.sha256(combined_string.encode('utf-8')).hexdigest()
    hash_seed = int(hash_value[:8], 16)
    
    # Set random seed for deterministic results
    random.seed(hash_seed)
    
    # Generate 3-word coolname
    trade_id = '-'.join(coolname.generate(3))
    
    return trade_id

def connect_to_database():
    """Connect to the PostgreSQL database"""
    try:
        conn = psycopg2.connect(
            host=pg_creds['host'],
            port=pg_creds['port'],
            database=pg_creds['database'],
            user=pg_creds['user'],
            password=pg_creds['password']
        )
        cursor = conn.cursor()
        
        # Verify the database has the required table
        cursor.execute("SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_name = 'option_strategies')")
        if not cursor.fetchone()[0]:
            print(f"Error: PostgreSQL database does not contain the option_strategies table.")
            print("Please run the database setup script first.")
            conn.close()
            return None, None
            
        print(f"‚úÖ Connected to PostgreSQL: {pg_creds['host']}")
        return conn, cursor
    except Exception as e:
        print(f"Error connecting to PostgreSQL database: {str(e)}")
        return None, None

In [10]:
def extract_date(driver):
    """Extract the date from the page"""
    try:
        # Search the page text
        page_text = driver.find_element(By.TAG_NAME, "body").text
        date_pattern = r'(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d+\w*,\s+\d{4}'
        matches = re.findall(date_pattern, page_text)
        
        if matches:
            return matches[0]
        
        return "Date not found"
        
    except Exception as e:
        print(f"Error extracting date: {str(e)}")
        return "Date extraction error"

In [11]:
def extract_options_expiry_date(driver, tab_content=None):
    """Extract the Options Expiry Date"""
    try:
        # Search entire page
        page_text = driver.find_element(By.TAG_NAME, "body").text
        date_match = re.search(r'Options Expiry Date:?\s*(\d{4}-\d{2}-\d{2})', page_text)
        if date_match:
            return date_match.group(1)
        
        # More general search
        date_match = re.search(r'(\d{4}-\d{2}-\d{2})', page_text)
        if date_match:
            return date_match.group(1)
        
        return "Expiry date not found"
        
    except Exception as e:
        print(f"Error extracting options expiry date: {str(e)}")
        return "Expiry date extraction error"

In [12]:
def is_active_trades_table(table):
    """Check if a table contains active trades (not closed trade perspectives)"""
    try:
        # Get table text to check for closed trade indicators
        table_text = table.text.lower()
        
        # Look for indicators of closed trades
        closed_indicators = [
            'closed trade perspectives',
            'historical',
            'past performance',
            'expired',
            'completed'
        ]
        
        # Check if any closed indicators are present
        for indicator in closed_indicators:
            if indicator in table_text:
                return False
        
        # Check the parent container for closed trade indicators
        parent_elements = []
        current = table
        for _ in range(3):  # Check up to 3 levels up
            try:
                current = current.find_element(By.XPATH, '..')
                parent_elements.append(current)
            except:
                break
        
        for parent in parent_elements:
            try:
                parent_text = parent.text.lower()
                for indicator in closed_indicators:
                    if indicator in parent_text:
                        return False
            except:
                continue
        
        return True
        
    except Exception as e:
        print(f"Error checking if table is active trades: {str(e)}")
        return True  # Default to True if we can't determine

In [13]:
def find_best_table_in_tab(driver, tab_content):
    """Find the best table within a tab, prioritizing active trades"""
    tables = []
    
    # First try to find tables within the tab content
    if tab_content:
        try:
            tables_in_tab = tab_content.find_elements(By.TAG_NAME, "table")
            tables.extend(tables_in_tab)
        except:
            pass
    
    # If no tables in tab content, look for visible tables on the page
    if not tables:
        all_tables = driver.find_elements(By.TAG_NAME, "table")
        visible_tables = [t for t in all_tables if t.is_displayed()]
        tables = visible_tables
    
    if not tables:
        return None
    
    # Filter for active trades tables and tables with data
    best_table = None
    best_score = -1
    
    for table in tables:
        try:
            # Check if table has data rows
            rows = table.find_elements(By.TAG_NAME, "tr")
            data_rows = [r for r in rows[1:] if r.find_elements(By.TAG_NAME, "td")]  # Skip header
            
            if not data_rows:
                continue
            
            score = len(data_rows)  # Base score on number of data rows
            
            # Prioritize active trades tables
            if is_active_trades_table(table):
                score += 1000  # Big bonus for active trades
            
            # Check if table has expected columns
            headers = table.find_elements(By.TAG_NAME, "th")
            header_texts = [h.text.upper() for h in headers]
            
            expected_columns = ['TICKER', 'SYMBOL', 'TRIGGER', 'STRIKE', 'PREMIUM']
            column_matches = sum(1 for col in expected_columns if any(col in h for h in header_texts))
            score += column_matches * 10
            
            if score > best_score:
                best_score = score
                best_table = table
                
        except Exception as e:
            print(f"Error evaluating table: {str(e)}")
            continue
    
    return best_table

In [14]:
def extract_table_data(driver, tab, tab_index, date_info, strategy_type, conn, cursor):
    """Extract data from the table in the current tab and save to PostgreSQL database - WITH TRADE_ID"""
    try:
        tab_name = tab.text.strip().replace('\n', ' ')
        print(f"\nProcessing Tab {tab_index+1}: '{tab_name}'")
        
        # Click the tab
        driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", tab)
        time.sleep(1)
        
        try:
            tab.click()
        except:
            driver.execute_script("arguments[0].click();", tab)
        
        time.sleep(3)  # Give more time for content to load
        
        # Find tab content
        tab_href = tab.get_attribute("href")
        tab_content = None
        
        if tab_href and "#" in tab_href:
            tab_id = tab_href.split("#")[1]
            try:
                tab_content = WebDriverWait(driver, 10).until(
                    EC.presence_of_element_located((By.ID, tab_id))
                )
                print(f"Found tab content for ID: {tab_id}")
            except Exception as e:
                print(f"Could not find tab content for ID {tab_id}: {str(e)}")
        
        # Extract options expiry date
        options_expiry_date = extract_options_expiry_date(driver, tab_content)
        
        # Find the best table for this tab
        table = find_best_table_in_tab(driver, tab_content)
        
        if not table:
            print(f"No suitable tables found in tab #{tab_index+1}")
            return 0
        
        # Check if this is an active trades table
        is_active = is_active_trades_table(table)
        print(f"Table type: {'Active trades' if is_active else 'Closed/Historical trades'}")
        
        if not is_active:
            print(f"Skipping closed trade perspectives table in tab #{tab_index+1}")
            return 0
        
        # Extract headers
        headers = table.find_elements(By.TAG_NAME, "th")
        header_texts = [header.text.strip() for header in headers]
        print(f"Table headers: {header_texts}")
        
        # Find column indices with improved matching
        column_map = {
            'ID': -1,
            'Ticker': -1,
            'Trigger Price': -1,
            'Strike Price': -1,
            'Estimated Premium': -1
        }
        
        for i, header in enumerate(header_texts):
            h_upper = header.upper()
            if 'ID' in h_upper and column_map['ID'] == -1:
                column_map['ID'] = i
            elif ('TICKER' in h_upper or 'SYMBOL' in h_upper) and column_map['Ticker'] == -1:
                column_map['Ticker'] = i
            elif ('TRIGGER' in h_upper and 'PRICE' in h_upper) or 'ENTRY' in h_upper and column_map['Trigger Price'] == -1:
                column_map['Trigger Price'] = i
            elif 'STRIKE' in h_upper and 'PRICE' in h_upper and column_map['Strike Price'] == -1:
                column_map['Strike Price'] = i
            elif ('PREMIUM' in h_upper or 'ESTIMATED' in h_upper) and column_map['Estimated Premium'] == -1:
                column_map['Estimated Premium'] = i
        
        print(f"Column mapping: {column_map}")
        
        # Extract rows
        rows = table.find_elements(By.TAG_NAME, "tr")[1:]  # Skip header
        records_count = 0
        
        print(f"Found {len(rows)} data rows")
        
        for row_idx, row in enumerate(rows):
            try:
                cells = row.find_elements(By.TAG_NAME, "td")
                if not cells:
                    continue
                
                print(f"Row {row_idx + 1}: {len(cells)} cells")
                
                # Extract data from cells with bounds checking
                item_id = cells[column_map['ID']].text.strip() if column_map['ID'] != -1 and column_map['ID'] < len(cells) else f"AUTO_{row_idx+1}"
                ticker_raw = cells[column_map['Ticker']].text.strip() if column_map['Ticker'] != -1 and column_map['Ticker'] < len(cells) else 'N/A'

                # Check if ticker contains (ER) and process accordingly
                er_value = 0
                if "(ER)" in ticker_raw:
                    ticker = ticker_raw.replace("(ER)", "").strip()
                    er_value = 1
                else:
                    ticker = ticker_raw

                # Skip records with invalid ticker values
                if not ticker or ticker.lower() in ['n/a', 'none', '', 'null']:
                    print(f"Skipping row {row_idx + 1} - invalid ticker: '{ticker}'")
                    continue

                trigger_price = cells[column_map['Trigger Price']].text.strip() if column_map['Trigger Price'] != -1 and column_map['Trigger Price'] < len(cells) else 'N/A'
                strike_price = cells[column_map['Strike Price']].text.strip() if column_map['Strike Price'] != -1 and column_map['Strike Price'] < len(cells) else 'N/A'
                estimated_premium = cells[column_map['Estimated Premium']].text.strip() if column_map['Estimated Premium'] != -1 and column_map['Estimated Premium'] < len(cells) else 'N/A'

                # Parse 'strike_price' to extract 'buy' and 'sell' values
                strike_buy_value, strike_sell_value = 0.0, 0.0
                if " - " in strike_price:
                    parts = strike_price.split(" - ")
                    if len(parts) == 2:
                        try:
                            strike_sell_part = parts[0].strip()
                            strike_buy_part = parts[1].strip()

                            # Extract numerical values more robustly
                            sell_match = re.search(r'(\d+\.?\d*)', strike_sell_part)
                            buy_match = re.search(r'(\d+\.?\d*)', strike_buy_part)
                            
                            if sell_match:
                                strike_sell_value = float(sell_match.group(1))
                            if buy_match:
                                strike_buy_value = float(buy_match.group(1))
                        except ValueError as e:
                            print(f"Error parsing strike prices: {str(e)}")

                # Generate trade_id for this record
                scrape_date_iso = datetime.now().isoformat()
                trade_id = generate_trade_id(
                    scrape_date_iso, strategy_type, tab_name, 
                    ticker, trigger_price, strike_price
                )

                print(f"Processing: {ticker} | Trigger: {trigger_price} | Strike: {strike_price} | Premium: {estimated_premium} | Trade ID: {trade_id[:20]}...")
                
                # PostgreSQL database insert with trade_id - UPDATED
                cursor.execute('''
                INSERT INTO option_strategies (
                    scrape_date, strategy_type, tab_name, ticker, trigger_price, 
                    strike_price, strike_buy, strike_sell, estimated_premium, item_id, 
                    options_expiry_date, date_info, er, trade_id
                ) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
                ''', (
                    scrape_date_iso, strategy_type, tab_name, ticker, trigger_price, 
                    strike_price, strike_buy_value, strike_sell_value, estimated_premium, item_id, 
                    options_expiry_date, date_info, er_value, trade_id
                ))

                conn.commit()
                records_count += 1
                
            except Exception as e:
                print(f"Error processing row {row_idx + 1}: {str(e)}")
                continue
        
        print(f"Successfully saved {records_count} records from tab #{tab_index+1}")
        return records_count
        
    except Exception as e:
        print(f"Error processing tab #{tab_index+1}: {str(e)}")
        import traceback
        print(f"Full traceback: {traceback.format_exc()}")
        return 0

In [15]:
def process_strategy_page(driver, strategy_url, strategy_type, conn, cursor):
    """Process a single strategy page and extract data from all tabs"""
    try:
        print(f"\n===== Processing {strategy_type} Strategy Page =====")
        driver.get(strategy_url)
        time.sleep(5)  # Give more time for page to load
        
        # Extract the date
        date_info = extract_date(driver)
        print(f"Page date: {date_info}")
        
        # Find all tabs using multiple methods
        tabs = []
        tab_selectors = [
            "//div[contains(@class, 'ep_tabs_header')]//a[contains(@class, 'ep_label_main')]",
            "//a[contains(@class, 'ep_label_main')]",
            "//div[contains(@class, 'tabs')]//a",
            "//ul[contains(@class, 'tabs')]//a",
            "//div[contains(@class, 'tab')]//a"
        ]
        
        for selector in tab_selectors:
            try:
                found_tabs = driver.find_elements(By.XPATH, selector)
                if found_tabs:
                    tabs = found_tabs
                    print(f"Found {len(tabs)} tabs using selector: {selector}")
                    break
            except Exception as e:
                print(f"Selector {selector} failed: {str(e)}")
                continue
        
        if not tabs:
            print(f"No tab elements found on the {strategy_type} page")
            # Try to process any tables found on the page
            tables = driver.find_elements(By.TAG_NAME, "table")
            if tables:
                print(f"Found {len(tables)} tables without tabs, attempting to process...")
                # Create a dummy tab for processing
                class DummyTab:
                    def __init__(self, index):
                        self.index = index
                    def text(self):
                        return f"Table {self.index + 1}"
                    def get_attribute(self, attr):
                        return None
                    def click(self):
                        pass
                
                dummy_tab = DummyTab(0)
                return extract_table_data(driver, dummy_tab, 0, date_info, strategy_type, conn, cursor)
            return 0
        
        # Print tab information
        for i, tab in enumerate(tabs):
            try:
                tab_text = tab.text.strip().replace('\n', ' ')
                tab_href = tab.get_attribute('href')
                print(f"Tab {i+1}: '{tab_text}' -> {tab_href}")
            except:
                print(f"Tab {i+1}: Unable to get text/href")
        
        # Process only the first 4 tabs
        num_tabs_to_process = min(4, len(tabs))
        total_records = 0
        
        for i, tab in enumerate(tabs[:num_tabs_to_process]):
            records = extract_table_data(driver, tab, i, date_info, strategy_type, conn, cursor)
            total_records += records
            time.sleep(2)  # Small delay between tabs
        
        return total_records
            
    except Exception as e:
        print(f"Error processing {strategy_type} strategy page: {str(e)}")
        import traceback
        print(f"Full traceback: {traceback.format_exc()}")
        return 0

In [16]:
def scrape_option_strategies_automated(browser_type="chrome", 
                                     keep_browser_open=True):
    """
    HEADLESS VERSION: Uses headless mode to completely avoid user data directory conflicts
    This approach should work even in restricted environments
    
    Parameters:
    browser_type (str): 'chrome' or 'edge'
    keep_browser_open (bool): Keep browser open after scraping to maintain session
    
    Returns:
    int: Number of records added to the database
    """
    
    # Load credentials from JSON file
    username, password = load_credentials_from_json()
    if not username or not password:
        return 0
    
    # Connect to PostgreSQL database
    conn, cursor = connect_to_database()
    if not conn or not cursor:
        return 0
    
    # Aggressive cleanup
    try:
        print("üßπ Aggressive cleanup of Chrome processes and temp files...")
        os.system("pkill -9 -f chrome 2>/dev/null || true")
        os.system("pkill -9 -f chromium 2>/dev/null || true")
        os.system("rm -rf /tmp/.org.chromium.* 2>/dev/null || true")
        os.system("rm -rf /tmp/chrome_* 2>/dev/null || true")
        time.sleep(3)
    except:
        pass
    
    # Try headless mode first - this avoids most user data conflicts
    chrome_options = ChromeOptions()
    
    # HEADLESS mode - avoids user interface conflicts entirely
    chrome_options.add_argument('--headless=new')  # Use new headless mode
    
    # Essential arguments for headless operation
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--disable-plugins')
    chrome_options.add_argument('--disable-images')
    chrome_options.add_argument('--no-first-run')
    chrome_options.add_argument('--disable-default-apps')
    chrome_options.add_argument('--disable-sync')
    chrome_options.add_argument('--disable-background-networking')
    chrome_options.add_argument('--disable-background-timer-throttling')
    chrome_options.add_argument('--disable-renderer-backgrounding')
    chrome_options.add_argument('--disable-features=TranslateUI,VizDisplayCompositor')
    
    # Set a reasonable window size for headless mode
    chrome_options.add_argument('--window-size=1920,1080')
    
    # Use random port
    debug_port = random.randint(20000, 60000)
    chrome_options.add_argument(f'--remote-debugging-port={debug_port}')
    
    # Disable automation detection
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    
    # Add user agent to avoid detection
    chrome_options.add_argument('--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')
    
    # Try creating the driver
    driver = None
    max_attempts = 3
    
    for attempt in range(max_attempts):
        try:
            print(f"üöÄ Attempting to start Chrome in HEADLESS mode (attempt {attempt + 1}/{max_attempts})...")
            print(f"   Debug port: {debug_port}")
            print("   Mode: Headless (should avoid user data conflicts)")
            
            driver = webdriver.Chrome(
                service=ChromeService(ChromeDriverManager().install()), 
                options=chrome_options
            )
            
            # Test the driver
            driver.get("about:blank")
            print("‚úÖ Chrome started successfully in headless mode!")
            break
            
        except Exception as e:
            print(f"‚ùå Attempt {attempt + 1} failed: {str(e)}")
            
            if driver:
                try:
                    driver.quit()
                except:
                    pass
                driver = None
            
            if attempt < max_attempts - 1:
                # Aggressive cleanup before retry
                try:
                    os.system("pkill -9 -f chrome 2>/dev/null || true")
                    time.sleep(2)
                except:
                    pass
                
                # New port and fresh options
                debug_port = random.randint(20000, 60000)
                chrome_options = ChromeOptions()
                chrome_options.add_argument('--headless=new')
                chrome_options.add_argument('--no-sandbox')
                chrome_options.add_argument('--disable-dev-shm-usage')
                chrome_options.add_argument('--disable-gpu')
                chrome_options.add_argument('--window-size=1920,1080')
                chrome_options.add_argument(f'--remote-debugging-port={debug_port}')
                chrome_options.add_argument('--disable-blink-features=AutomationControlled')
                chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
                chrome_options.add_experimental_option('useAutomationExtension', False)
                chrome_options.add_argument('--user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36')
                
                print(f"   Retrying with new debug port: {debug_port}")
                time.sleep(2)
            else:
                print("‚ùå All headless attempts failed")
                return 0
    
    if not driver:
        print("‚ùå Failed to start Chrome driver")
        return 0
    
    try:
        # Perform automated login
        print("üîë Starting automated login process...")
        if not automated_login(driver, username, password):
            print("‚ùå Login failed. Cannot proceed with scraping.")
            return 0
        
        print("‚úÖ Login successful! Proceeding with data scraping...")
        
        # Define strategies to scrape
        strategies = [
            {
                "url": "https://optionrecom.com/bear-call-spread-strategy/",
                "type": "Bear Call"
            },
            {
                "url": "https://optionrecom.com/bull-put-spread-strategy/",
                "type": "Bull Put"
            }
        ]
        
        # Process each strategy page
        total_records = 0
        
        for strategy in strategies:
            records = process_strategy_page(driver, strategy["url"], strategy["type"], conn, cursor)
            total_records += records
            time.sleep(3)  # Delay between strategy pages
        
        print(f"\nüéØ Total records saved to PostgreSQL database: {total_records}")
        
        # Query to show what was saved to PostgreSQL (FIXED: proper timestamp casting)
        cursor.execute("""
            SELECT strategy_type, tab_name, COUNT(*) as count 
            FROM option_strategies 
            WHERE scrape_date::timestamp >= NOW() - INTERVAL '1 hour' 
            GROUP BY strategy_type, tab_name
        """)
        results = cursor.fetchall()
        
        print("\nüìä Records by strategy and tab (last hour) in PostgreSQL:")
        for strategy, tab, count in results:
            print(f"  {strategy} - {tab}: {count} records")
        
        # Show total record count in PostgreSQL
        cursor.execute("SELECT COUNT(*) FROM option_strategies")
        total_count = cursor.fetchone()[0]
        print(f"\nüìà Total records in PostgreSQL database: {total_count}")
        
        # Note: headless mode can't keep browser open for manual use
        if keep_browser_open:
            print("\nüí° Note: Headless mode doesn't support keeping browser open.")
            print("   The scraping completed successfully in the background.")
        
        return total_records
        
    except Exception as e:
        print(f"‚ùå An error occurred: {str(e)}")
        import traceback
        print(f"Full traceback: {traceback.format_exc()}")
        return 0
    finally:
        if driver:
            try:
                driver.quit()
            except:
                pass
        
        conn.close()

## Run the Fixed Scraper

Execute the cell below to run the fixed automated scraper with improved table detection.

In [17]:
# Run the improved automated scraper with robust Chrome startup
if __name__ == "__main__":
    print("üöÄ Starting IMPROVED PostgreSQL scraper with robust Chrome handling")
    print("üìù Features: Process cleanup, incognito mode, retry logic, random ports")
    print("üîê Using credentials from config/credentials.json")
    
    result = scrape_option_strategies_automated(
        keep_browser_open=True  # Set to False if you want the browser to close automatically
    )
    
    print(f"\nüéâ PostgreSQL scraper completed successfully!")
    print(f"üìä Total records processed: {result}")
    print("üíæ Data written directly to PostgreSQL database at 35.204.11.121")
    print("üîÑ Run your utility notebook to see the new data!")

üöÄ Starting IMPROVED PostgreSQL scraper with robust Chrome handling
üìù Features: Process cleanup, incognito mode, retry logic, random ports
üîê Using credentials from config/credentials.json
‚úÖ Web scraping credentials loaded from config/credentials.json
‚úÖ Connected to PostgreSQL: 35.204.11.121
üßπ Aggressive cleanup of Chrome processes and temp files...
üöÄ Attempting to start Chrome in HEADLESS mode (attempt 1/3)...
   Debug port: 21216
   Mode: Headless (should avoid user data conflicts)
‚úÖ Chrome started successfully in headless mode!
üîë Starting automated login process...
Login attempt 1 of 3...
Username entered successfully
Password entered successfully
Login button clicked (normal click)
Login successful!
‚úÖ Login successful! Proceeding with data scraping...

===== Processing Bear Call Strategy Page =====
Page date: August
Found 8 tabs using selector: //div[contains(@class, 'ep_tabs_header')]//a[contains(@class, 'ep_label_main')]
Tab 1: 'Mild Risk 95-97% accuracy >