
Academic Job Search Tool: User Guide

A Systematic Approach to Faculty Position Discovery for Sociology Graduates

Overview

This notebook scrapes academic job boards for sociology positions, with particular emphasis on family studies, childhood research, and care work. Its keyword weights and target-institution lists are tuned for a doctoral graduate from Oxford University seeking faculty appointments at leading institutions worldwide.

System Design and Scope

The scraper targets positions at a curated list of institutions in four regions:
United States: a selection of US News top 200 institutions with established sociology programmes
United Kingdom: a selection of QS top 300 universities, including Russell Group institutions
Europe: leading universities in the Netherlands, Germany, France, Switzerland, and the Nordic countries
Asia-Pacific: Hong Kong, Singapore, and Sino-foreign cooperative institutions in mainland China
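
Scraped records store a `country` value such as "United Kingdom" or "Hong Kong". The same four regions could be applied to those records. Below is a minimal sketch, assuming a hypothetical helper `region_for_country` and a country-to-region table built from the countries named above. The Nordic entries cover the countries of the Nordic universities listed later.

```python
# Hypothetical country -> region table, based on the four target regions above
REGION_BY_COUNTRY = {
    "United States": "United States",
    "United Kingdom": "United Kingdom",
    "Netherlands": "Europe", "Germany": "Europe", "France": "Europe",
    "Switzerland": "Europe", "Denmark": "Europe", "Sweden": "Europe",
    "Norway": "Europe", "Finland": "Europe",
    "Hong Kong": "Asia-Pacific", "Singapore": "Asia-Pacific", "China": "Asia-Pacific",
}

def region_for_country(country: str) -> str:
    """Return the target region for a country, or 'Other' if it is out of scope."""
    return REGION_BY_COUNTRY.get(country.strip(), "Other")

print(region_for_country("Hong Kong"))  # Asia-Pacific
print(region_for_country("Canada"))     # Other
```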


In [None]:
# Block 1: Setup and Installation
# Run this cell first to install all required packages

!pip install requests beautifulsoup4 selenium pandas openpyxl fake-useragent
!pip install webdriver-manager lxml html5lib

# Import all required libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
from datetime import datetime, timedelta
import time
import random
from urllib.parse import urljoin, urlparse
import re
from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager
import warnings
warnings.filterwarnings('ignore')

print("All required packages installed successfully!")
print("Libraries imported successfully!")
print("Ready for academic job scraping configuration.")

All required packages installed successfully!
Libraries imported successfully!
Ready for academic job scraping configuration.
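
The cell above installs `lxml`, `html5lib` and `openpyxl` but never imports them, so its success messages do not cover those three packages. Below is a minimal sketch of an explicit check using `importlib.util.find_spec` from the standard library. The module list and the `missing_modules` name are illustrative assumptions. Import names differ from some pip names; for example, `beautifulsoup4` is imported as `bs4`.

```python
import importlib.util

# Import names (not pip names) of the packages installed above
REQUIRED_MODULES = ["requests", "bs4", "selenium", "pandas", "openpyxl",
                    "fake_useragent", "webdriver_manager", "lxml", "html5lib"]

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```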


In [None]:
# Block 2: Target Universities Configuration
# Configure target universities based on US News and QS rankings

# Top US Universities (US News Top 200) - Strong Sociology Programs
US_UNIVERSITIES = [
    # Top Tier US Universities
    "Harvard University", "Stanford University", "Massachusetts Institute of Technology",
    "University of California Berkeley", "University of Chicago", "Princeton University",
    "Yale University", "Columbia University", "University of Pennsylvania", "Northwestern University",
    "Duke University", "Johns Hopkins University", "University of California Los Angeles",
    "Brown University", "Cornell University", "Rice University", "Vanderbilt University",
    "Washington University in St. Louis", "University of Notre Dame", "Emory University",

    # Strong Sociology Programs
    "University of Wisconsin Madison", "University of Michigan Ann Arbor", "New York University",
    "University of North Carolina Chapel Hill", "University of Texas Austin", "Boston University",
    "Georgetown University", "Carnegie Mellon University", "University of Southern California",
    "University of California San Diego", "University of California Davis", "University of Florida",
    "Ohio State University", "Pennsylvania State University", "University of Washington",
    "University of Minnesota Twin Cities", "Arizona State University", "Rutgers University",
    "University of Maryland College Park", "University of Illinois Urbana-Champaign",
    "Indiana University Bloomington", "University of Virginia", "Purdue University",
    "Michigan State University", "University of Iowa", "University of California Irvine"
]

# QS World Rankings Top 300 - UK Universities
UK_UNIVERSITIES = [
    # Russell Group and Top UK Universities
    "University of Oxford", "University of Cambridge", "Imperial College London",
    "University College London", "London School of Economics", "King's College London",
    "University of Edinburgh", "University of Manchester", "University of Warwick",
    "University of Bristol", "University of Glasgow", "Durham University",
    "University of Sheffield", "University of Birmingham", "University of Leeds",
    "University of Southampton", "University of York", "Lancaster University",
    "University of Exeter", "Cardiff University", "Queen Mary University of London",
    "University of Bath", "University of Liverpool", "Newcastle University",
    "University of Nottingham", "University of Sussex", "Loughborough University",
    "University of Leicester", "University of Reading", "Brunel University London",
    "University of Essex", "Goldsmiths University of London", "University of Kent"
]

# Asia-Pacific Universities (Sino-Foreign Cooperative + Top Regional)
ASIA_PACIFIC_UNIVERSITIES = [
    # Hong Kong
    "University of Hong Kong", "Chinese University of Hong Kong",
    "Hong Kong University of Science and Technology", "City University of Hong Kong",
    "Hong Kong Polytechnic University", "Hong Kong Baptist University",

    # Singapore
    "National University of Singapore", "Nanyang Technological University",

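    # China - Sino-Foreign Cooperative Universities
    # Note: some entries are alternative names for the same institution (NYU Shanghai,
    # Wenzhou-Kean), so the printed count overstates the number of distinct universities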
    "Shanghai New York University", "University of Nottingham Ningbo China",
    "Xi'an Jiaotong-Liverpool University", "Beijing Institute of Technology Zhuhai",
    "Duke Kunshan University", "New York University Shanghai", "Wenzhou-Kean University",
    "The Chinese University of Hong Kong Shenzhen", "Kean University at Wenzhou"
]

# European Universities (QS Top 300)
EUROPEAN_UNIVERSITIES = [
    # Netherlands
    "University of Amsterdam", "Delft University of Technology", "Utrecht University",
    "Leiden University", "Erasmus University Rotterdam", "University of Groningen",
    "VU Amsterdam", "Tilburg University",

    # Germany
    "Technical University of Munich", "Ludwig Maximilian University of Munich",
    "Heidelberg University", "Humboldt University of Berlin", "University of Freiburg",
    "Free University of Berlin", "University of Göttingen",

    # France
    "Sorbonne University", "École Normale Supérieure Paris", "Sciences Po",
    "University of Paris", "École Polytechnique",

    # Switzerland
    "ETH Zurich", "University of Zurich", "University of Geneva", "University of Lausanne",

    # Nordic Countries
    "University of Copenhagen", "Stockholm University", "University of Oslo",
    "University of Helsinki", "Lund University", "Uppsala University"
]

# Combine all target universities
ALL_TARGET_UNIVERSITIES = US_UNIVERSITIES + UK_UNIVERSITIES + ASIA_PACIFIC_UNIVERSITIES + EUROPEAN_UNIVERSITIES

# Research focus keywords for sociology positions
SOCIOLOGY_KEYWORDS = [
    "sociology", "social science", "family studies", "childhood studies",
    "care work", "unpaid care", "social policy", "demography", "social research",
    "qualitative research", "quantitative methods", "social theory",
    "gender studies", "migration studies", "urban sociology", "cultural sociology",
    "education sociology", "medical sociology", "digital sociology", "ethnography"
]

print(f"UNIVERSITY CONFIGURATION COMPLETE")
print(f"US Universities: {len(US_UNIVERSITIES)}")
print(f"UK Universities: {len(UK_UNIVERSITIES)}")
print(f"Asia-Pacific Universities: {len(ASIA_PACIFIC_UNIVERSITIES)}")
print(f"European Universities: {len(EUROPEAN_UNIVERSITIES)}")
print(f"Total target universities: {len(ALL_TARGET_UNIVERSITIES)}")
print(f"Sociology keywords configured: {len(SOCIOLOGY_KEYWORDS)}")
print("Ready for job data structure setup!")

UNIVERSITY CONFIGURATION COMPLETE
US Universities: 46
UK Universities: 33
Asia-Pacific Universities: 17
European Universities: 30
Total target universities: 126
Sociology keywords configured: 20
Ready for job data structure setup!
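
Scraped employer strings rarely match these list entries exactly. For example, a board may print "University of California, Berkeley" where the list has "University of California Berkeley". Below is a minimal sketch of normalising names before matching. `normalize_institution` and `match_target` are hypothetical helpers, not part of the notebook.

```python
import re
import unicodedata
from typing import Iterable, Optional

def normalize_institution(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse spaces, drop a leading 'the'."""
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())  # punctuation becomes a space
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"^the ", "", text)

def match_target(name: str, targets: Iterable[str]) -> Optional[str]:
    """Return the first target whose normalized form appears as whole words in `name`."""
    padded = f" {normalize_institution(name)} "
    for target in targets:
        if f" {normalize_institution(target)} " in padded:
            return target
    return None

targets = ["University of California Berkeley", "King's College London"]
print(match_target("University of California, Berkeley", targets))  # University of California Berkeley
print(match_target("Department of Sociology, King's College London", targets))  # King's College London
print(match_target("Example Institute", targets))  # None
```

Matching returns the first hit, so a short name listed earlier can shadow a longer one. With `ALL_TARGET_UNIVERSITIES`, a posting from "University of Nottingham Ningbo China" would match "University of Nottingham". Passing `sorted(targets, key=len, reverse=True)` checks longer names first.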


In [None]:
# Block 3: Job Data Structure and Relevance Scoring
# Define the job data structure and a keyword-weighted relevance scoring system

class AcademicJob:
    """Data structure for academic job postings"""

    def __init__(self, title="", institution="", location="", country="",
                 salary="", deadline="", url="", source_platform="", description=""):
        self.title = title
        self.institution = institution
        self.location = location
        self.country = country
        self.salary = salary
        self.deadline = deadline
        self.url = url
        self.source_platform = source_platform
        self.description = description
        self.scraped_date = datetime.now().strftime('%Y-%m-%d')
        self.relevance_score = 0

def calculate_relevance_score(job_title, job_description, institution):
    """
    Calculate relevance score based on research interests and career level
    Optimized for sociology PhD with family/childhood/care work expertise
    """

    score = 0
    text_to_analyze = f"{job_title} {job_description}".lower()

    # HIGH PRIORITY KEYWORDS (Your research specializations)
    high_priority_keywords = {
        'family': 15, 'childhood': 15, 'care work': 15, 'unpaid care': 15,
        'family studies': 18, 'childhood studies': 18, 'care studies': 15,
        'sociology': 12, 'social science': 10, 'social policy': 12,
        'qualitative research': 10, 'mixed methods': 8, 'ethnography': 10,
        'china studies': 15, 'asia': 8, 'comparative': 10, 'cross-cultural': 8
    }

    # MEDIUM PRIORITY KEYWORDS (Career level and relevant areas)
    medium_priority_keywords = {
        'assistant professor': 20, 'lecturer': 15, 'senior lecturer': 18,
        'tenure track': 18, 'postdoc': 12, 'research fellow': 12,
        'social research': 8, 'gender': 8, 'migration': 6, 'urban': 6,
        'cultural': 6, 'education': 8, 'health': 6, 'digital': 5,
        'interview': 5, 'survey': 5, 'participant observation': 8
    }

    # LOW PRIORITY BUT RELEVANT KEYWORDS
    low_priority_keywords = {
        'social theory': 5, 'quantitative': 4, 'statistics': 4,
        'community': 4, 'policy': 5, 'welfare': 6, 'social work': 4,
        'anthropology': 3, 'psychology': 3, 'economics': 2
    }

    # Calculate keyword scores
    for keyword, points in high_priority_keywords.items():
        if keyword in text_to_analyze:
            score += points

    for keyword, points in medium_priority_keywords.items():
        if keyword in text_to_analyze:
            score += points

    for keyword, points in low_priority_keywords.items():
        if keyword in text_to_analyze:
            score += points

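    # INSTITUTION PRESTIGE BONUS
    # Substring matching, as in the scraper filters, so that scraped strings such as
    # "Department of Sociology, University of Oxford" still match a target name
    if (any(uni in institution for uni in US_UNIVERSITIES[:20])
            or any(uni in institution for uni in UK_UNIVERSITIES[:10])):
        score += 10  # Top tier institutions
    elif any(uni in institution for uni in ASIA_PACIFIC_UNIVERSITIES):
        score += 8   # Asia-Pacific (your regional expertise)
    elif any(uni in institution for uni in ALL_TARGET_UNIVERSITIES):
        score += 5   # Other target institutions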

    # GEOGRAPHIC PREFERENCE BONUS (based on your background)
    # Whole-word matching: a plain substring test for 'uk' would also fire on 'Duke'
    if re.search(r'\b(uk|britain|england)\b', text_to_analyze):
        score += 5  # UK familiarity
    if re.search(r'\b(china|chinese|asia|asian)\b', text_to_analyze):
        score += 8  # Your research expertise region

    # PENALTY FOR IRRELEVANT POSITIONS
    irrelevant_keywords = [
        'engineering', 'computer science', 'physics', 'chemistry',
        'biology', 'mathematics', 'business administration', 'finance',
        'medical doctor', 'clinical', 'laboratory'
    ]
    for keyword in irrelevant_keywords:
        if keyword in text_to_analyze and 'social' not in text_to_analyze:
            score -= 10

    return max(0, score)  # Ensure non-negative score

def get_priority_tier(score):
    """Convert relevance score to priority tier for easy filtering"""
    if score >= 40:
        return "High Priority"
    elif score >= 25:
        return "Medium Priority"
    elif score >= 15:
        return "Low Priority"
    else:
        return "Consider"

# Test the scoring system with sample job
test_job = AcademicJob(
    title="Assistant Professor of Sociology - Family and Childhood Studies",
    institution="Boston University",
    description="Seeking candidates with expertise in family sociology, childhood studies, qualitative research methods, and experience with Chinese family structures"
)

test_score = calculate_relevance_score(test_job.title, test_job.description, test_job.institution)
test_job.relevance_score = test_score

print("RELEVANCE SCORING SYSTEM CONFIGURED")
print(f"Test job: {test_job.title}")
print(f"Institution: {test_job.institution}")
print(f"Relevance score: {test_score}")
print(f"Priority tier: {get_priority_tier(test_score)}")
print("Scoring system ready for job evaluation!")

RELEVANCE SCORING SYSTEM CONFIGURED
Test job: Assistant Professor of Sociology - Family and Childhood Studies
Institution: Boston University
Relevance score: 103
Priority tier: High Priority
Scoring system ready for job evaluation!
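
`calculate_relevance_score` tests keywords with plain substring checks, which can fire inside unrelated words; for example, `'uk' in 'duke university'` is `True`. Below is a minimal sketch of whole-word matching with regular-expression word boundaries. `contains_term` and `keyword_score` are hypothetical names.

```python
import re

def contains_term(text: str, term: str) -> bool:
    """True if `term` occurs in `text` as a whole word or phrase (case-insensitive)."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

def keyword_score(text: str, weights: dict) -> int:
    """Sum the weights of every term that occurs in `text` as whole words."""
    return sum(points for term, points in weights.items() if contains_term(text, term))

print("uk" in "duke university")               # True: plain substring test
print(contains_term("Duke University", "uk"))  # False
print(contains_term("Based in the UK", "uk"))  # True
print(keyword_score("Lecturer in Sociology", {"sociology": 12, "lecturer": 15, "uk": 5}))  # 27
```

Word boundaries have two side effects. They do not stop overlapping phrases from both scoring: 'lecturer' still matches inside 'senior lecturer'. They also stop 'asia' from matching 'asian', so inflected forms need their own entries.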


In [None]:
# Block 4: Web Scraping Infrastructure
# Core scraping infrastructure with error handling and anti-blocking measures

class GlobalAcademicJobScraper:
    """Main scraper class for academic job platforms"""

    def __init__(self):
        self.session = requests.Session()
        self.ua = UserAgent()
        self.jobs_found = []
        self.failed_urls = []
        self.setup_session()
        print("Academic job scraper initialized!")

    def setup_session(self):
        """Configure session with realistic headers and settings"""
        self.session.headers.update({
            'User-Agent': self.ua.random,
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
            'DNT': '1'
        })

    def setup_chrome_driver(self):
        """Set up a headless Chrome driver for JavaScript-heavy sites (needs Chrome installed)"""
        # Selenium 4 takes the driver path through a Service object; recent releases
        # no longer accept it as the first positional argument
        from selenium.webdriver.chrome.service import Service

        chrome_options = Options()
        chrome_options.add_argument('--headless')
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument('--disable-gpu')
        chrome_options.add_argument('--window-size=1920,1080')
        chrome_options.add_argument(f'--user-agent={self.ua.random}')
        chrome_options.add_argument('--disable-blink-features=AutomationControlled')

        try:
            service = Service(ChromeDriverManager().install())
            driver = webdriver.Chrome(service=service, options=chrome_options)
            # Hide the navigator.webdriver flag set by automated browsers
            driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
            return driver
        except Exception as e:
            print(f"Chrome driver setup failed: {e}")
            return None

    def random_delay(self, min_seconds=1, max_seconds=3):
        """Add random delay to avoid being blocked"""
        delay = random.uniform(min_seconds, max_seconds)
        print(f"Waiting {delay:.1f} seconds...")  # Announce the wait before sleeping
        time.sleep(delay)

    def safe_get(self, url, use_selenium=False, timeout=10):
        """Fetch a page and return it as BeautifulSoup, or None on failure"""
        try:
            if use_selenium:
                driver = self.setup_chrome_driver()
                if not driver:
                    return None
                try:
                    driver.get(url)
                    time.sleep(random.uniform(2, 4))  # Random wait for page load
                    content = driver.page_source
                finally:
                    driver.quit()  # Release the browser even if loading fails
                return BeautifulSoup(content, 'html.parser')

            # The timeout applies to plain HTTP requests only
            response = self.session.get(url, timeout=timeout)
            response.raise_for_status()
            return BeautifulSoup(response.content, 'html.parser')

        except requests.exceptions.RequestException as e:
            print(f"Request failed for {url}: {e}")
            self.failed_urls.append(url)
            return None
        except Exception as e:
            print(f"Unexpected error fetching {url}: {e}")
            self.failed_urls.append(url)
            return None

    def extract_text_safely(self, element, default="N/A"):
        """Safely extract text from BeautifulSoup element"""
        if element:
            text = element.get_text(strip=True)
            return text if text else default
        return default

    def clean_text(self, text):
        """Clean and normalize text data"""
        if not text or text == "N/A":
            return "N/A"

        # Collapse runs of whitespace
        text = re.sub(r'\s+', ' ', text.strip())
        # Drop unusual symbols, but keep apostrophes (e.g. "King's College London")
        # and the currency and percent signs that appear in salary strings
        text = re.sub(r"[^\w\s\-.,():/&'’£$€%+]", '', text)
        return text[:500]  # Limit length to prevent issues

    def parse_deadline(self, deadline_text):
        """Extract the date portion of a deadline string (the date format is not normalised)"""
        if not deadline_text or deadline_text == "N/A":
            return "N/A"

        # Common deadline patterns. ISO dates must be tried first: otherwise the
        # numeric pattern matches '25-01-15' inside '2025-01-15'
        patterns = [
            r'(\d{4}-\d{2}-\d{2})',  # YYYY-MM-DD
            r'(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})',  # DD/MM/YYYY or MM/DD/YYYY (ambiguous)
            r'(\d{1,2}\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w*\s+\d{4})',
            r'((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w*\s+\d{1,2},?\s+\d{4})'
        ]

        for pattern in patterns:
            match = re.search(pattern, deadline_text, re.IGNORECASE)
            if match:
                return match.group(1)

        return self.clean_text(deadline_text)

    def get_status_summary(self):
        """Get current scraping status"""
        return {
            'jobs_found': len(self.jobs_found),
            'failed_urls': len(self.failed_urls),
            'high_priority': len([job for job in self.jobs_found if job.relevance_score >= 40]),
            'medium_priority': len([job for job in self.jobs_found if 25 <= job.relevance_score < 40])
        }

# Initialize the global scraper instance
scraper = GlobalAcademicJobScraper()

print("WEB SCRAPING INFRASTRUCTURE READY")
print(f"User agent: {scraper.ua.random[:50]}...")
print("Anti-blocking measures configured")
print("Chrome driver ready for dynamic sites")
print("Ready to scrape academic job platforms!")

Academic job scraper initialized!
WEB SCRAPING INFRASTRUCTURE READY
User agent: Mozilla/5.0 (iPhone; CPU iPhone OS 18_4 like Mac O...
Anti-blocking measures configured
Chrome driver ready for dynamic sites
Ready to scrape academic job platforms!
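
`parse_deadline` returns the matched substring unchanged, so deadlines from different boards stay in different formats and cannot be sorted. Below is a minimal sketch that converts the extracted string to ISO `YYYY-MM-DD` with `datetime.strptime`. The format list and the day-first reading of numeric dates are assumptions. English month names assume the default C locale. `normalize_deadline` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional

# Formats tried in order. Numeric dates are ambiguous (UK day-first vs US
# month-first); this sketch assumes day-first, which suits UK listings only.
DEADLINE_FORMATS = [
    "%Y-%m-%d",   # 2025-01-15
    "%d %B %Y",   # 15 January 2025
    "%d %b %Y",   # 15 Jan 2025
    "%B %d, %Y",  # January 15, 2025
    "%B %d %Y",   # January 15 2025
    "%b %d, %Y",  # Jan 15, 2025
    "%b %d %Y",   # Jan 15 2025
    "%d/%m/%Y",   # 15/01/2025
    "%d/%m/%y",   # 15/01/25
    "%d-%m-%Y",   # 15-01-2025
]

def normalize_deadline(text: str) -> Optional[str]:
    """Convert an extracted deadline string to ISO YYYY-MM-DD, or None if no format fits."""
    cleaned = " ".join(text.split())  # collapse internal whitespace
    for fmt in DEADLINE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

print(normalize_deadline("15 January 2025"))  # 2025-01-15
print(normalize_deadline("Jan 15, 2025"))     # 2025-01-15
print(normalize_deadline("N/A"))              # None
```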


In [None]:
# Block 5: Jobs.ac.uk Scraper Implementation
# UK's premier academic job platform scraper

def scrape_jobs_ac_uk(scraper, keywords=None, max_pages=5):
    """
    Scrape Jobs.ac.uk for sociology positions
    Primary source for UK academic positions
    """

    if keywords is None:
        keywords = ["sociology", "social science", "family studies"]

    base_url = "https://www.jobs.ac.uk"
    jobs_found = []

    print("STARTING JOBS.AC.UK SCRAPING")
    print("="*50)

    for keyword in keywords:
        print(f"\nSearching for: '{keyword}'")

        for page in range(1, max_pages + 1):
            search_url = f"{base_url}/search?keywords={keyword.replace(' ', '+')}&page={page}"
            print(f"   Page {page}")

            soup = scraper.safe_get(search_url)
            if not soup:
                print(f"   Failed to load page {page}")
                continue

            # Multiple selector strategies for job listings
            job_selectors = [
                'div.job-result',
                'div.job-listing',
                'article.job',
                'div.result',
                'li.job',
                'tr.job-row'
            ]

            job_listings = []
            for selector in job_selectors:
                job_listings = soup.select(selector)
                if job_listings:
                    break

            if not job_listings:
                print(f"   No jobs found on page {page}")
                if page == 1:
                    # Try alternative approach for first page
                    all_links = soup.find_all('a', href=True)
                    job_links = [link for link in all_links if '/job/' in link.get('href', '')]
                    if job_links:
                        print(f"   Found {len(job_links)} job links via alternative method")
                        job_listings = [link.parent for link in job_links[:10]]  # Limit to 10

                if not job_listings:
                    break

            print(f"   Found {len(job_listings)} job listings")

            for job_element in job_listings:
                try:
                    # Extract job title
                    title_selectors = ['h3 a', 'h2 a', 'a.job-title', '.job-title', 'h3', 'h2']
                    title = ""
                    title_link = ""

                    for selector in title_selectors:
                        title_elem = job_element.select_one(selector)
                        if title_elem:
                            title = scraper.extract_text_safely(title_elem)
                            if title_elem.name == 'a':
                                title_link = title_elem.get('href', '')
                            elif title_elem.find('a'):
                                title_link = title_elem.find('a').get('href', '')
                            break

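                    # Fallback for listings found via the '/job/' link method, whose
                    # parent elements may contain no heading: use the link text itself
                    if not title:
                        link_elem = job_element.find('a', href=lambda h: h and '/job/' in h)
                        if link_elem:
                            title = scraper.extract_text_safely(link_elem, default="")
                            title_link = link_elem.get('href', '')

                    if not title or len(title) < 5:
                        continue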

                    # Get full job URL
                    job_url = urljoin(base_url, title_link) if title_link else ""

                    # Extract institution
                    institution_selectors = ['.employer', '.institution', '.company', 'p.employer']
                    institution = ""
                    for selector in institution_selectors:
                        inst_elem = job_element.select_one(selector)
                        if inst_elem:
                            institution = scraper.extract_text_safely(inst_elem)
                            break

                    # Extract location
                    location_selectors = ['.location', 'span.location', 'p.location']
                    location = ""
                    for selector in location_selectors:
                        loc_elem = job_element.select_one(selector)
                        if loc_elem:
                            location = scraper.extract_text_safely(loc_elem)
                            break

                    # Extract salary
                    salary_selectors = ['.salary', 'span.salary', 'p.salary']
                    salary = ""
                    for selector in salary_selectors:
                        sal_elem = job_element.select_one(selector)
                        if sal_elem:
                            salary = scraper.extract_text_safely(sal_elem)
                            break

                    # Extract deadline
                    deadline_selectors = ['.deadline', '.closing-date', 'time', '.date']
                    deadline = ""
                    for selector in deadline_selectors:
                        dead_elem = job_element.select_one(selector)
                        if dead_elem:
                            deadline = scraper.parse_deadline(scraper.extract_text_safely(dead_elem))
                            break

                    # Determine country based on institution and location
                    country = "United Kingdom"
                    if any(asia_uni in institution for asia_uni in ASIA_PACIFIC_UNIVERSITIES):
                        if "Hong Kong" in institution or "Hong Kong" in location:
                            country = "Hong Kong"
                        elif "Singapore" in institution or "Singapore" in location:
                            country = "Singapore"
                        else:
                            country = "China"

                    # Create job object
                    job = AcademicJob(
                        title=scraper.clean_text(title),
                        institution=scraper.clean_text(institution),
                        location=scraper.clean_text(location),
                        country=country,
                        salary=scraper.clean_text(salary),
                        deadline=deadline,
                        url=job_url,
                        source_platform="Jobs.ac.uk"
                    )

                    # Calculate relevance score
                    job.relevance_score = calculate_relevance_score(job.title, "", job.institution)

                    # Filter: only add if relevant or from target universities
                    if job.relevance_score > 10 or any(uni in job.institution for uni in ALL_TARGET_UNIVERSITIES):
                        jobs_found.append(job)
                        priority = get_priority_tier(job.relevance_score)
                        print(f"   Added {priority}: {job.title[:50]}... at {job.institution}")
                        print(f"      Score: {job.relevance_score} | Deadline: {job.deadline}")

                except Exception as e:
                    print(f"   Error processing job: {e}")
                    continue

            scraper.random_delay(2, 4)  # Delay between pages

        scraper.random_delay(3, 5)  # Delay between keywords

    # Add to main scraper results
    scraper.jobs_found.extend(jobs_found)

    print(f"\nJOBS.AC.UK SCRAPING COMPLETE")
    print(f"Found {len(jobs_found)} relevant positions")
    print(f"High priority: {len([job for job in jobs_found if job.relevance_score >= 40])}")
    print(f"Medium priority: {len([job for job in jobs_found if 25 <= job.relevance_score < 40])}")

    return jobs_found

# Test the Jobs.ac.uk scraper
print("TESTING JOBS.AC.UK SCRAPER")
print("Running limited test search...")
test_jobs = scrape_jobs_ac_uk(scraper, keywords=["sociology"], max_pages=1)
print(f"Test completed! Found {len(test_jobs)} jobs in test search.")

TESTING JOBS.AC.UK SCRAPER
Running limited test search...
STARTING JOBS.AC.UK SCRAPING

Searching for: 'sociology'
   Page 1
   No jobs found on page 1
   Found 25 job links via alternative method
   Found 10 job listings
Waiting 3.9 seconds...
Waiting 4.1 seconds...

JOBS.AC.UK SCRAPING COMPLETE
Found 0 relevant positions
High priority: 0
Medium priority: 0
Test completed! Found 0 jobs in test search.
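
With the default keywords ("sociology", "social science", "family studies"), a posting that matches more than one keyword is appended once per keyword. Below is a minimal sketch of de-duplication by URL, which falls back to title and institution when no URL was captured. `deduplicate_jobs` is a hypothetical helper that relies only on the `url`, `title` and `institution` attributes of `AcademicJob`.

```python
from types import SimpleNamespace

def deduplicate_jobs(jobs):
    """Keep the first occurrence of each posting.

    Postings are keyed by URL (ignoring a trailing slash); postings without a
    URL fall back to a case-insensitive (title, institution) key.
    """
    seen = set()
    unique = []
    for job in jobs:
        if job.url:
            key = job.url.rstrip("/")
        else:
            key = (job.title.lower(), job.institution.lower())
        if key not in seen:
            seen.add(key)
            unique.append(job)
    return unique

# Stand-ins for AcademicJob objects (only the attributes used above)
sample = [
    SimpleNamespace(url="https://example.org/job/1", title="Lecturer in Sociology", institution="Example University"),
    SimpleNamespace(url="https://example.org/job/1/", title="Lecturer in Sociology", institution="Example University"),
    SimpleNamespace(url="", title="Research Fellow", institution="Example College"),
]
print(len(deduplicate_jobs(sample)))  # 2
```

Applied to the scraper, this would be `scraper.jobs_found = deduplicate_jobs(scraper.jobs_found)` after all platforms have run.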


In [None]:
# Block 6: HigherEdJobs and Times Higher Education Scrapers
# US and international academic job platform scrapers

def scrape_higher_ed_jobs(scraper, keywords=None, max_pages=3):
    """
    Scrape HigherEdJobs.com for US academic positions
    Major platform for American university positions
    """

    if keywords is None:
        keywords = ["sociology", "social science"]

    base_url = "https://www.higheredjobs.com"
    jobs_found = []

    print("STARTING HIGHEREDJOBS.COM SCRAPING")
    print("="*50)

    for keyword in keywords:
        print(f"\nSearching for: '{keyword}'")

        for page in range(1, max_pages + 1):
            # Faculty positions in social sciences; spaces encoded as '+', as in the Jobs.ac.uk scraper
            search_url = f"{base_url}/search/advanced_action.cfm?JobCat=18&Keywords={keyword.replace(' ', '+')}&PosType=1&NumJobs=25&PageNum={page}"
            print(f"   Page {page}")

            soup = scraper.safe_get(search_url)
            if not soup:
                print(f"   Failed to load page {page}")
                continue

            # Find job listings - HigherEdJobs uses table format
            job_selectors = [
                'tr.data-row-even, tr.data-row-odd',
                'div.job-listing',
                'tr[class*="row"]',
                'article.job'
            ]

            job_listings = []
            for selector in job_selectors:
                job_listings = soup.select(selector)
                if job_listings:
                    break

            if not job_listings:
                print(f"   No jobs found on page {page}")
                break

            print(f"   Found {len(job_listings)} job listings")

            for job_element in job_listings:
                try:
                    # Extract job title
                    title_selectors = ['a.jobTitle', 'td.jobTitle a', '.job-title a', 'h3 a']
                    title = ""
                    job_url = ""

                    for selector in title_selectors:
                        title_elem = job_element.select_one(selector)
                        if title_elem:
                            title = scraper.extract_text_safely(title_elem)
                            job_url = urljoin(base_url, title_elem.get('href', ''))
                            break

                    if not title or len(title) < 5:
                        continue

                    # Extract institution
                    institution_selectors = ['td.institution', '.employer', '.institution']
                    institution = ""
                    for selector in institution_selectors:
                        inst_elem = job_element.select_one(selector)
                        if inst_elem:
                            institution = scraper.extract_text_safely(inst_elem)
                            break

                    # Extract location
                    location_selectors = ['td.location', '.location']
                    location = ""
                    for selector in location_selectors:
                        loc_elem = job_element.select_one(selector)
                        if loc_elem:
                            location = scraper.extract_text_safely(loc_elem)
                            break

                    # Extract deadline
                    deadline_selectors = ['td.deadline', '.deadline', '.closing-date']
                    deadline = ""
                    for selector in deadline_selectors:
                        dead_elem = job_element.select_one(selector)
                        if dead_elem:
                            deadline = scraper.parse_deadline(scraper.extract_text_safely(dead_elem))
                            break

                    # Create job object
                    job = AcademicJob(
                        title=scraper.clean_text(title),
                        institution=scraper.clean_text(institution),
                        location=scraper.clean_text(location),
                        country="United States",
                        salary="N/A",
                        deadline=deadline,
                        url=job_url,
                        source_platform="HigherEdJobs"
                    )

                    # Calculate relevance score
                    job.relevance_score = calculate_relevance_score(job.title, "", job.institution)

                    # Filter for relevant positions
                    if job.relevance_score > 10 or any(uni in job.institution for uni in US_UNIVERSITIES):
                        jobs_found.append(job)
                        priority = get_priority_tier(job.relevance_score)
                        print(f"   Added {priority}: {job.title[:50]}... at {job.institution}")

                except Exception as e:
                    print(f"   Error processing job: {e}")
                    continue

            scraper.random_delay(2, 4)

        scraper.random_delay(3, 5)

    scraper.jobs_found.extend(jobs_found)
    print(f"\nHIGHEREDJOBS SCRAPING COMPLETE")
    print(f"Found {len(jobs_found)} relevant positions")
    return jobs_found

def scrape_times_higher_education(scraper, keywords=None, max_pages=3):
    """
    Scrape Times Higher Education Jobs
    International academic positions platform
    """

    if keywords is None:
        keywords = ["sociology", "social science"]

    base_url = "https://www.timeshighereducation.com"
    jobs_url = f"{base_url}/unijobs"
    jobs_found = []

    print("STARTING TIMES HIGHER EDUCATION SCRAPING")
    print("="*50)

    for keyword in keywords:
        print(f"\nSearching for: '{keyword}'")

        # THE Jobs search URL structure
        search_url = f"{jobs_url}/listings?search={keyword.replace(' ', '%20')}"

        soup = scraper.safe_get(search_url, use_selenium=True)  # May need Selenium for dynamic content
        if not soup:
            print(f"   Failed to load THE Jobs search")
            continue

        # Find job listings
        job_selectors = [
            'article.job-listing',
            'div.job-item',
            '.job-result',
            'div[class*="job"]'
        ]

        job_listings = []
        for selector in job_selectors:
            job_listings = soup.select(selector)
            if job_listings:
                break

        if not job_listings:
            print(f"   No jobs found for '{keyword}'")
            continue

        print(f"   Found {len(job_listings)} job listings")

        for job_element in job_listings[:20]:  # Limit to avoid overloading
            try:
                # Extract job title
                title_selectors = ['h3 a', 'h2 a', '.job-title a', 'h3', 'h2']
                title = ""
                job_url = ""

                for selector in title_selectors:
                    title_elem = job_element.select_one(selector)
                    if title_elem:
                        title = scraper.extract_text_safely(title_elem)
                        if title_elem.name == 'a':
                            job_url = urljoin(base_url, title_elem.get('href', ''))
                        elif title_elem.find('a'):
                            job_url = urljoin(base_url, title_elem.find('a').get('href', ''))
                        break

                if not title or len(title) < 5:
                    continue

                # Extract institution
                institution_selectors = ['.employer', '.institution', '.company']
                institution = ""
                for selector in institution_selectors:
                    inst_elem = job_element.select_one(selector)
                    if inst_elem:
                        institution = scraper.extract_text_safely(inst_elem)
                        break

                # Extract location
                location_selectors = ['.location', 'span.location']
                location = ""
                for selector in location_selectors:
                    loc_elem = job_element.select_one(selector)
                    if loc_elem:
                        location = scraper.extract_text_safely(loc_elem)
                        break

                # Extract salary
                salary_selectors = ['.salary', 'span.salary']
                salary = ""
                for selector in salary_selectors:
                    sal_elem = job_element.select_one(selector)
                    if sal_elem:
                        salary = scraper.extract_text_safely(sal_elem)
                        break

                # Determine country based on institution and location
                country = "United Kingdom"  # Default for THE
                if any(us_uni in institution for us_uni in US_UNIVERSITIES):
                    country = "United States"
                elif any(asia_uni in institution for asia_uni in ASIA_PACIFIC_UNIVERSITIES):
                    if "Hong Kong" in institution or "Hong Kong" in location:
                        country = "Hong Kong"
                    elif "Singapore" in institution or "Singapore" in location:
                        country = "Singapore"
                    else:
                        country = "Asia-Pacific"
                elif any(eu_uni in institution for eu_uni in EUROPEAN_UNIVERSITIES):
                    country = "Europe"

                # Create job object
                job = AcademicJob(
                    title=scraper.clean_text(title),
                    institution=scraper.clean_text(institution),
                    location=scraper.clean_text(location),
                    country=country,
                    salary=scraper.clean_text(salary),
                    deadline="N/A",
                    url=job_url,
                    source_platform="Times Higher Education"
                )

                # Calculate relevance score
                job.relevance_score = calculate_relevance_score(job.title, "", job.institution)

                # Filter for relevant positions
                if job.relevance_score > 10 or any(uni in job.institution for uni in ALL_TARGET_UNIVERSITIES):
                    jobs_found.append(job)
                    priority = get_priority_tier(job.relevance_score)
                    print(f"   Added {priority}: {job.title[:50]}... at {job.institution}")

            except Exception as e:
                print(f"   Error processing job: {e}")
                continue

        scraper.random_delay(3, 5)

    scraper.jobs_found.extend(jobs_found)
    print(f"\nTIMES HIGHER EDUCATION SCRAPING COMPLETE")
    print(f"Found {len(jobs_found)} relevant positions")
    return jobs_found

print("US AND INTERNATIONAL JOB SCRAPERS CONFIGURED")
print("HigherEdJobs.com scraper ready for US positions")
print("Times Higher Education scraper ready for international positions")
print("Ready to search major academic job platforms!")

US AND INTERNATIONAL JOB SCRAPERS CONFIGURED
HigherEdJobs.com scraper ready for US positions
Times Higher Education scraper ready for international positions
Ready to search major academic job platforms!
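The THE and AcademicPositions.com scrapers build their query strings by hand (`keyword.replace(' ', '%20')` and `keyword.replace(' ', '+')`). That handles spaces but leaves characters such as `&` or `#` unescaped, which would break a keyword like "family & childhood". Below is a minimal sketch that uses the standard library's `urllib.parse.urlencode` instead. `build_search_url` is a hypothetical helper. Whether each site accepts `+` for spaces (rather than `%20`) is an assumption worth checking against its live search form.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_search_url(base, path, **params):
    """Hypothetical helper: join a base URL, a path and query parameters with proper escaping."""
    # urlencode percent-escapes reserved characters ('&' -> %26, '#' -> %23)
    # and, by default, encodes spaces as '+'.
    query = urlencode(params)
    return f"{base.rstrip('/')}/{path.lstrip('/')}?{query}"

url = build_search_url("https://academicpositions.com", "/ad/search",
                       q="family & childhood", page=2)
print(url)
# https://academicpositions.com/ad/search?q=family+%26+childhood&page=2

# Round-trip check: a standard query-string parser recovers the original keyword.
print(parse_qs(urlparse(url).query))
# {'q': ['family & childhood'], 'page': ['2']}
```

If a site expects `%20` rather than `+`, pass `quote_via=urllib.parse.quote` to `urlencode`.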


In [None]:
# Block 7: Academic Positions and University Career Pages
# European academic platform and direct university scraping

def scrape_academic_positions_org(scraper, keywords=None, max_pages=3):
    """
    Scrape AcademicPositions.com (European focus)
    Major platform for European university positions
    """

    if keywords is None:
        keywords = ["sociology", "social science"]

    base_url = "https://academicpositions.com"
    jobs_found = []

    print("STARTING ACADEMICPOSITIONS.COM SCRAPING")
    print("="*50)

    for keyword in keywords:
        print(f"\nSearching for: '{keyword}'")

        for page in range(1, max_pages + 1):
            search_url = f"{base_url}/ad/search?q={keyword.replace(' ', '+')}&page={page}"
            print(f"   Page {page}")

            soup = scraper.safe_get(search_url)
            if not soup:
                print(f"   Failed to load page {page}")
                continue

            # Find job listings
            job_selectors = [
                'div.position-item',
                'article.job-ad',
                '.job-listing',
                'div[class*="position"]',
                'div[class*="job"]'
            ]

            job_listings = []
            for selector in job_selectors:
                job_listings = soup.select(selector)
                if job_listings:
                    break

            if not job_listings:
                print(f"   No jobs found on page {page}")
                break

            print(f"   Found {len(job_listings)} job listings")

            for job_element in job_listings:
                try:
                    # Extract job title and URL
                    title_selectors = ['h3 a', 'h2 a', 'a.job-title', '.title a', 'h3', 'h2']
                    title = ""
                    job_url = ""

                    for selector in title_selectors:
                        title_elem = job_element.select_one(selector)
                        if title_elem:
                            title = scraper.extract_text_safely(title_elem)
                            if title_elem.name == 'a':
                                job_url = urljoin(base_url, title_elem.get('href', ''))
                            elif title_elem.find('a'):
                                job_url = urljoin(base_url, title_elem.find('a').get('href', ''))
                            break

                    if not title or len(title) < 5:
                        continue

                    # Extract institution
                    institution_selectors = ['.employer', '.institution', '.company', '.university']
                    institution = ""
                    for selector in institution_selectors:
                        inst_elem = job_element.select_one(selector)
                        if inst_elem:
                            institution = scraper.extract_text_safely(inst_elem)
                            break

                    # Extract location
                    location_selectors = ['.location', 'span.location', '.place']
                    location = ""
                    for selector in location_selectors:
                        loc_elem = job_element.select_one(selector)
                        if loc_elem:
                            location = scraper.extract_text_safely(loc_elem)
                            break

                    # Determine country based on location
                    # ("uk" is matched as a whole word so that e.g. "Ukraine" is not tagged as United Kingdom)
                    country = "Europe"  # Default
                    location_lower = location.lower()
                    if re.search(r'\buk\b', location_lower) or "united kingdom" in location_lower or "england" in location_lower or "scotland" in location_lower:
                        country = "United Kingdom"
                    elif "netherlands" in location_lower or "amsterdam" in location_lower or "utrecht" in location_lower:
                        country = "Netherlands"
                    elif "germany" in location_lower or "berlin" in location_lower or "munich" in location_lower:
                        country = "Germany"
                    elif "france" in location_lower or "paris" in location_lower:
                        country = "France"
                    elif "switzerland" in location_lower or "zurich" in location_lower or "geneva" in location_lower:
                        country = "Switzerland"
                    elif "sweden" in location_lower or "stockholm" in location_lower:
                        country = "Sweden"
                    elif "denmark" in location_lower or "copenhagen" in location_lower:
                        country = "Denmark"
                    elif "norway" in location_lower or "oslo" in location_lower:
                        country = "Norway"

                    # Extract deadline
                    deadline_selectors = ['.deadline', '.closing-date', 'time', '.date']
                    deadline = ""
                    for selector in deadline_selectors:
                        dead_elem = job_element.select_one(selector)
                        if dead_elem:
                            deadline = scraper.parse_deadline(scraper.extract_text_safely(dead_elem))
                            break

                    # Create job object
                    job = AcademicJob(
                        title=scraper.clean_text(title),
                        institution=scraper.clean_text(institution),
                        location=scraper.clean_text(location),
                        country=country,
                        salary="N/A",
                        deadline=deadline,
                        url=job_url,
                        source_platform="AcademicPositions.com"
                    )

                    # Calculate relevance score
                    job.relevance_score = calculate_relevance_score(job.title, "", job.institution)

                    # Filter for relevant positions
                    if job.relevance_score > 10 or any(uni in job.institution for uni in EUROPEAN_UNIVERSITIES):
                        jobs_found.append(job)
                        priority = get_priority_tier(job.relevance_score)
                        print(f"   Added {priority}: {job.title[:50]}... at {job.institution}")

                except Exception as e:
                    print(f"   Error processing job: {e}")
                    continue

            scraper.random_delay(2, 4)

        scraper.random_delay(3, 5)

    scraper.jobs_found.extend(jobs_found)
    print(f"\nACADEMICPOSITIONS.COM SCRAPING COMPLETE")
    print(f"Found {len(jobs_found)} relevant positions")
    return jobs_found

def scrape_university_career_pages(scraper, target_universities=None, max_unis=10):
    """
    Scrape direct university career pages for top institutions
    Direct access to university HR systems
    """

    if target_universities is None:
        target_universities = US_UNIVERSITIES[:5] + UK_UNIVERSITIES[:5]  # Top 5 from each

    # Known career-page URLs for selected universities (these change over time; verify before relying on them)
    career_page_patterns = {
        "Harvard University": "https://sjobs.brassring.com/TGnewUI/Search/Home/HomeWithPreLoad?partnerid=25240",
        "Stanford University": "https://careersearch.stanford.edu/jobs",
        "University of Oxford": "https://www.jobs.ox.ac.uk/home",
        "University of Cambridge": "https://www.jobs.cam.ac.uk/job/",
        "MIT": "https://careers.peopleclick.com/careerscp/client_mit/external/search.do",
        "Boston University": "https://www.bu.edu/careers/jobs/",
        "University College London": "https://www.ucl.ac.uk/human-resources/jobs",
        "London School of Economics": "https://jobs.lse.ac.uk/",
        "University of Edinburgh": "https://www.ed.ac.uk/jobs",
        "University of Manchester": "https://www.manchester.ac.uk/discover/jobs/",
        "Yale University": "https://your.yale.edu/work-yale/find-job",
        "Columbia University": "https://jobs.columbia.edu/",
        "University of Chicago": "https://jobs.uchicago.edu/",
        "Northwestern University": "https://www.northwestern.edu/hr/careers/"
    }

    jobs_found = []
    print("STARTING UNIVERSITY CAREER PAGES SCRAPING")
    print("="*50)

    processed_count = 0
    for university in target_universities:
        if processed_count >= max_unis:
            break

        if university not in career_page_patterns:
            continue

        print(f"\nScraping {university}...")
        career_url = career_page_patterns[university]

        soup = scraper.safe_get(career_url, use_selenium=True)
        if not soup:
            print(f"   Failed to access {university} careers page")
            continue

        # Generic job listing selectors (varies by university)
        job_selectors = [
            'div.job-listing', 'tr.job-row', 'article.job',
            'div.position', 'li.job-item', 'div.vacancy',
            '.job-result', 'div[class*="job"]', 'tr[class*="row"]'
        ]

        job_listings = []
        for selector in job_selectors:
            job_listings = soup.select(selector)
            if job_listings:
                break

        if not job_listings:
            print(f"   No job listings found for {university}")
            continue

        print(f"   Found {len(job_listings)} potential positions")

        for job_element in job_listings[:15]:  # Limit to first 15 jobs per university
            try:
                # Extract title
                title_selectors = ['h3', 'h2', 'a.job-title', '.title', '.job-name', 'td.title']
                title = ""
                job_url = career_url  # Default to main page

                for selector in title_selectors:
                    title_elem = job_element.select_one(selector)
                    if title_elem:
                        title = scraper.extract_text_safely(title_elem)
                        # Try to get specific job URL
                        link_elem = title_elem if title_elem.name == 'a' else title_elem.find('a')
                        if link_elem and link_elem.get('href'):
                            job_url = urljoin(career_url, link_elem.get('href'))
                        break

                if not title or len(title) < 5:
                    continue

                # Quick relevance check before processing further
                title_lower = title.lower()
                relevant_keywords = [
                    'sociology', 'social', 'lecturer', 'professor', 'research',
                    'family', 'childhood', 'care', 'policy', 'faculty'
                ]

                if not any(keyword in title_lower for keyword in relevant_keywords):
                    continue

                # Extract additional details if available
                location_elem = job_element.select_one('.location, .place, .campus')
                location = scraper.extract_text_safely(location_elem) if location_elem else "N/A"

                # Create job object
                job = AcademicJob(
                    title=scraper.clean_text(title),
                    institution=university,
                    location=scraper.clean_text(location),
                    country="United States" if university in US_UNIVERSITIES else "United Kingdom",
                    salary="N/A",
                    deadline="N/A",
                    url=job_url,
                    source_platform=f"{university} Careers"
                )

                # Calculate relevance score
                job.relevance_score = calculate_relevance_score(job.title, "", job.institution)

                # Add if relevant (higher threshold for direct university scraping)
                if job.relevance_score > 15:
                    jobs_found.append(job)
                    priority = get_priority_tier(job.relevance_score)
                    print(f"   Added {priority}: {job.title[:50]}...")

            except Exception:
                # Skip malformed listings silently; career page layouts vary widely
                continue

        processed_count += 1
        scraper.random_delay(5, 8)  # Longer delay for university sites to be respectful

    scraper.jobs_found.extend(jobs_found)
    print(f"\nUNIVERSITY CAREER PAGES SCRAPING COMPLETE")
    print(f"Found {len(jobs_found)} relevant positions from university websites")
    return jobs_found

print("EUROPEAN AND UNIVERSITY SCRAPERS CONFIGURED")
print("AcademicPositions.com scraper ready for European positions")
print("University career page scraper ready for direct institutional access")
print("Ready to access premium academic job sources!")

EUROPEAN AND UNIVERSITY SCRAPERS CONFIGURED
AcademicPositions.com scraper ready for European positions
University career page scraper ready for direct institutional access
Ready to access premium academic job sources!
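The country detection in `scrape_academic_positions_org` relies on plain substring tests, so a short keyword such as `"uk"` also fires inside longer words like "Ukraine". Below is a sketch of table-driven, whole-word matching with `re`. The `COUNTRY_KEYWORDS` table is a hypothetical, abridged version of the keywords in that function. Checking countries in dictionary insertion order mirrors its `if/elif` chain.

```python
import re

# Hypothetical, abridged keyword table mirroring scrape_academic_positions_org.
COUNTRY_KEYWORDS = {
    "United Kingdom": ["uk", "united kingdom", "england", "scotland"],
    "Netherlands": ["netherlands", "amsterdam", "utrecht"],
    "Germany": ["germany", "berlin", "munich"],
    "Switzerland": ["switzerland", "zurich", "geneva"],
}

# One precompiled pattern per country; \b stops "uk" from matching inside "Ukraine".
_COUNTRY_PATTERNS = {
    country: re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    for country, words in COUNTRY_KEYWORDS.items()
}

def infer_country(location, default="Europe"):
    """Return the first country whose keywords appear as whole words in location."""
    for country, pattern in _COUNTRY_PATTERNS.items():
        if pattern.search(location):
            return country
    return default

print(infer_country("Oxford, UK"))           # United Kingdom
print(infer_country("Kyiv, Ukraine"))        # Europe
print(infer_country("Zurich, Switzerland"))  # Switzerland
```

Keeping the keywords in data rather than in a long `elif` chain also makes it easy to add countries without touching the scraper logic.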


In [None]:
# Block 8: Asia-Pacific Scraping and Main Search Orchestrator
# Complete search coordination and Asia-Pacific specialized scraping

def scrape_asia_pacific_positions(scraper, keywords=None):
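    """
    Identify Asia-Pacific positions among the results already collected by the
    other scrapers (no new web requests are made). Focuses on Hong Kong,
    Singapore, and Sino-foreign cooperative universities.
    The keywords argument is currently unused.
    """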

    if keywords is None:
        keywords = ["sociology", "social science"]

    jobs_found = []
    print("STARTING ASIA-PACIFIC POSITIONS SCRAPING")
    print("="*50)

    # Search-term groups for each Asia-Pacific region of interest
    asia_sources = [
        {
            "name": "Academic Jobs Singapore",
            "search_terms": ["singapore", "NUS", "NTU"],
            "country": "Singapore"
        },
        {
            "name": "Hong Kong Universities",
            "search_terms": ["hong kong", "HKU", "CUHK", "HKUST"],
            "country": "Hong Kong"
        },
        {
            "name": "Sino-Foreign Universities",
            "search_terms": ["NYU Shanghai", "Nottingham Ningbo", "XJTLU", "Duke Kunshan"],
            "country": "China"
        }
    ]

    # Match each group's terms against positions already found on the other platforms
    for source in asia_sources:
        print(f"\nSearching for {source['name']} positions...")

        # Whole-word matching so that short acronyms such as "NTU" do not match
        # inside unrelated names (e.g. "Kentucky")
        patterns = [re.compile(r'\b' + re.escape(term.lower()) + r'\b')
                    for term in source['search_terms']]
        asia_jobs = []
        for job in scraper.jobs_found:
            text = f"{job.institution.lower()} {job.location.lower()}"
            if any(pattern.search(text) for pattern in patterns):
                asia_jobs.append(job)

        print(f"   Found {len(asia_jobs)} positions for {source['name']}")
        jobs_found.extend(asia_jobs)

    # Additionally, look for specific Sino-foreign universities in scraped data
    sino_foreign_unis = [
        "Shanghai New York University", "NYU Shanghai",
        "University of Nottingham Ningbo", "Nottingham Ningbo",
        "Xi'an Jiaotong-Liverpool University", "XJTLU",
        "Duke Kunshan University", "Wenzhou-Kean University",
        "Chinese University of Hong Kong Shenzhen"
    ]

    additional_asia_jobs = []
    for job in scraper.jobs_found:
        for uni in sino_foreign_unis:
            if uni.lower() in job.institution.lower():
                if job not in jobs_found:  # Avoid duplicates
                    additional_asia_jobs.append(job)
                    job.country = "China"  # Update country for Sino-foreign unis
                    break

    jobs_found.extend(additional_asia_jobs)

    print(f"\nASIA-PACIFIC SCRAPING COMPLETE")
    print(f"Identified {len(jobs_found)} Asia-Pacific positions")
    print(f"Singapore positions: {len([j for j in jobs_found if j.country == 'Singapore'])}")
    print(f"Hong Kong positions: {len([j for j in jobs_found if j.country == 'Hong Kong'])}")
    print(f"China/Sino-foreign positions: {len([j for j in jobs_found if j.country == 'China'])}")

    return jobs_found

def run_comprehensive_job_search(scraper, comprehensive=True):
    """
    Run comprehensive job search across all platforms
    Main orchestrator function
    """

    print("STARTING COMPREHENSIVE ACADEMIC JOB SEARCH")
    print("="*60)
    print(f"Target: Sociology positions at top universities worldwide")
    print(f"Research focus: Family studies, childhood, care work")
    print(f"Level: Oxford sociology PhD")
    print(f"Search time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print(f"Comprehensive mode: {comprehensive}")
    print("="*60)

    # Clear previous results
    scraper.jobs_found = []
    scraper.failed_urls = []

    # Define search parameters optimized for sociology
    sociology_keywords = ["sociology", "social science", "family studies", "childhood studies"]

    search_phases = []

    try:
        # Phase 1: Jobs.ac.uk (UK academic jobs - primary source)
        print("\nPHASE 1: JOBS.AC.UK")
        print("="*40)
        phase1_jobs = scrape_jobs_ac_uk(scraper, keywords=sociology_keywords, max_pages=4)
        search_phases.append(("Jobs.ac.uk", len(phase1_jobs)))

        # Phase 2: HigherEdJobs (US academic positions)
        print("\nPHASE 2: HIGHEREDJOBS.COM")
        print("="*40)
        phase2_jobs = scrape_higher_ed_jobs(scraper, keywords=["sociology", "social+science"], max_pages=3)
        search_phases.append(("HigherEdJobs", len(phase2_jobs)))

        # Phase 3: Times Higher Education (International)
        print("\nPHASE 3: TIMES HIGHER EDUCATION")
        print("="*40)
        phase3_jobs = scrape_times_higher_education(scraper, keywords=sociology_keywords, max_pages=2)
        search_phases.append(("Times Higher Education", len(phase3_jobs)))

        # Phase 4: AcademicPositions.com (European focus)
        print("\nPHASE 4: ACADEMICPOSITIONS.COM")
        print("="*40)
        phase4_jobs = scrape_academic_positions_org(scraper, keywords=sociology_keywords, max_pages=3)
        search_phases.append(("AcademicPositions.com", len(phase4_jobs)))

        if comprehensive:
            # Phase 5: University Career Pages (Premium institutions)
            print("\nPHASE 5: UNIVERSITY CAREER PAGES")
            print("="*40)
            target_unis = (US_UNIVERSITIES[:8] + UK_UNIVERSITIES[:8] +
                          ["University of Hong Kong", "National University of Singapore"])
            phase5_jobs = scrape_university_career_pages(scraper, target_universities=target_unis, max_unis=10)
            search_phases.append(("University Careers", len(phase5_jobs)))

            # Phase 6: Asia-Pacific Focus
            print("\nPHASE 6: ASIA-PACIFIC SPECIALIZATION")
            print("="*40)
            phase6_jobs = scrape_asia_pacific_positions(scraper, keywords=sociology_keywords)
            search_phases.append(("Asia-Pacific", len(phase6_jobs)))

    except KeyboardInterrupt:
        print("\n\nSearch interrupted by user.")
    except Exception as e:
        print(f"\n\nUnexpected error during search: {e}")

    # Process and deduplicate results
    print("\nPROCESSING AND ANALYZING RESULTS")
    print("="*40)

    # Remove duplicates based on title and institution combination
    unique_jobs = []
    seen_combinations = set()

    for job in scraper.jobs_found:
        # Create a unique identifier for each job
        combination = f"{job.title.lower().strip()}||{job.institution.lower().strip()}"
        if combination not in seen_combinations:
            unique_jobs.append(job)
            seen_combinations.add(combination)
        else:
            print(f"   Removed duplicate: {job.title[:40]}... at {job.institution}")

    scraper.jobs_found = unique_jobs

    # Sort by relevance score (highest first)
    scraper.jobs_found.sort(key=lambda x: x.relevance_score, reverse=True)

    # Generate comprehensive statistics
    total_jobs = len(scraper.jobs_found)
    high_priority = len([job for job in scraper.jobs_found if job.relevance_score >= 40])
    medium_priority = len([job for job in scraper.jobs_found if 25 <= job.relevance_score < 40])
    low_priority = len([job for job in scraper.jobs_found if 15 <= job.relevance_score < 25])

    # Geographic and institutional analysis
    country_counts = {}
    platform_counts = {}
    institution_counts = {}

    for job in scraper.jobs_found:
        country_counts[job.country] = country_counts.get(job.country, 0) + 1
        platform_counts[job.source_platform] = platform_counts.get(job.source_platform, 0) + 1
        if job.institution in ALL_TARGET_UNIVERSITIES:
            institution_counts[job.institution] = institution_counts.get(job.institution, 0) + 1

    # Display comprehensive results
    print(f"\nCOMPREHENSIVE JOB SEARCH COMPLETE!")
    print("="*50)
    print(f"SUMMARY STATISTICS:")
    print(f"   Total unique positions found: {total_jobs}")
    print(f"   High priority (40+ score): {high_priority}")
    print(f"   Medium priority (25-39 score): {medium_priority}")
    print(f"   Low priority (15-24 score): {low_priority}")
    print(f"   Failed URLs: {len(scraper.failed_urls)}")

    print(f"\nGEOGRAPHIC DISTRIBUTION:")
    for country, count in sorted(country_counts.items(), key=lambda x: x[1], reverse=True):
        print(f"   {country}: {count} positions")

    print(f"\nPLATFORM PERFORMANCE:")
    for platform, count in sorted(platform_counts.items(), key=lambda x: x[1], reverse=True):
        print(f"   {platform}: {count} positions")

    print(f"\nTOP TARGET INSTITUTIONS:")
    for inst, count in sorted(institution_counts.items(), key=lambda x: x[1], reverse=True)[:10]:
        print(f"   {inst}: {count} positions")

    print(f"\nTOP 10 HIGHEST SCORING POSITIONS:")
    for i, job in enumerate(scraper.jobs_found[:10], 1):
        priority = get_priority_tier(job.relevance_score)
        print(f"   {i}. {priority} | Score: {job.relevance_score}")
        print(f"      {job.title}")
        print(f"      {job.institution} | {job.country}")
        print(f"      Deadline: {job.deadline} | Platform: {job.source_platform}")
        print()

    print(f"\nSEARCH PHASE RESULTS:")
    for phase_name, job_count in search_phases:
        print(f"   {phase_name}: {job_count} positions")

    return scraper.jobs_found

def filter_jobs_by_criteria(jobs, min_score=15, target_countries=None, target_institutions=None):
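    """Filter jobs by minimum relevance score, country, and (optionally) institution name.

    The default country list does not include "Europe" or the individual European
    countries assigned by the AcademicPositions.com scraper; pass target_countries
    explicitly to keep those positions.
    """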

    if target_countries is None:
        target_countries = ["United Kingdom", "United States", "Singapore", "Hong Kong", "China"]

    filtered_jobs = []

    print(f"FILTERING JOBS BY CRITERIA")
    print(f"   Minimum score: {min_score}")
    print(f"   Target countries: {', '.join(target_countries)}")

    for job in jobs:
        # Score filter
        if job.relevance_score < min_score:
            continue

        # Country filter
        if target_countries and job.country not in target_countries:
            continue

        # Institution filter (if specified)
        if target_institutions:
            if not any(inst.lower() in job.institution.lower() for inst in target_institutions):
                continue

        filtered_jobs.append(job)

    print(f"Filtered to {len(filtered_jobs)} positions matching criteria")
    return filtered_jobs

def analyze_job_market_trends(jobs):
    """Analyze trends in the scraped job data for market insights"""

    if not jobs:
        return {}

    analysis = {
        'total_positions': len(jobs),
        'avg_relevance_score': sum(job.relevance_score for job in jobs) / len(jobs),
        'position_types': {},
        'geographic_distribution': {},
        'institution_prestige': {'top_tier': 0, 'mid_tier': 0, 'other': 0},
        'application_urgency': {'urgent': 0, 'soon': 0, 'later': 0, 'unknown': 0}
    }

    # Analyze position types
    for job in jobs:
        title_lower = job.title.lower()
        if 'assistant professor' in title_lower:
            analysis['position_types']['Assistant Professor'] = analysis['position_types'].get('Assistant Professor', 0) + 1
        elif 'lecturer' in title_lower:  # also covers "senior lecturer"
            analysis['position_types']['Lecturer'] = analysis['position_types'].get('Lecturer', 0) + 1
        elif 'professor' in title_lower and 'assistant' not in title_lower:
            analysis['position_types']['Professor'] = analysis['position_types'].get('Professor', 0) + 1
        elif 'postdoc' in title_lower or 'post-doc' in title_lower:
            analysis['position_types']['Postdoc'] = analysis['position_types'].get('Postdoc', 0) + 1
        elif 'research' in title_lower:
            analysis['position_types']['Research Position'] = analysis['position_types'].get('Research Position', 0) + 1
        else:
            analysis['position_types']['Other'] = analysis['position_types'].get('Other', 0) + 1

    # Geographic distribution
    for job in jobs:
        analysis['geographic_distribution'][job.country] = analysis['geographic_distribution'].get(job.country, 0) + 1

    # Institution prestige analysis
    for job in jobs:
        if job.institution in (US_UNIVERSITIES[:20] + UK_UNIVERSITIES[:10] +
                              ["University of Hong Kong", "National University of Singapore"]):
            analysis['institution_prestige']['top_tier'] += 1
        elif job.institution in ALL_TARGET_UNIVERSITIES:
            analysis['institution_prestige']['mid_tier'] += 1
        else:
            analysis['institution_prestige']['other'] += 1

    # Application urgency (based on deadlines)
    # Simple keyword heuristic; robust date parsing would be needed for production use
    for job in jobs:
        deadline_text = (job.deadline or "").strip().lower()
        # Scrapers leave the deadline as an empty string when no deadline element is
        # found, or set it to "N/A"; treat both as unknown
        if not deadline_text or deadline_text == "n/a":
            analysis['application_urgency']['unknown'] += 1
        elif any(urgent in deadline_text for urgent in ['asap', 'urgent', 'immediate']):
            analysis['application_urgency']['urgent'] += 1
        elif any(month in deadline_text for month in ['jan', 'feb', 'mar']):
            analysis['application_urgency']['soon'] += 1
        else:
            analysis['application_urgency']['later'] += 1

    return analysis

print("MAIN SEARCH ORCHESTRATOR CONFIGURED")
print("Comprehensive job search function ready")
print("Asia-Pacific specialization scraper ready")
print("Job filtering and analysis functions ready")
print("Ready to execute full academic job search!")

MAIN SEARCH ORCHESTRATOR CONFIGURED
Comprehensive job search function ready
Asia-Pacific specialization scraper ready
Job filtering and analysis functions ready
Ready to execute full academic job search!
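The urgency analysis above classifies deadlines by keyword, and its own comment notes that production use would need more robust parsing. The following is a minimal sketch of date-based classification, assuming deadlines arrive as free-text strings in a handful of common formats. The format list (including the day-first reading of "dd/mm/yyyy"), the function names parse_deadline and classify_deadline, and the 14-day and 42-day cut-offs are illustrative assumptions, not part of the tool.

```python
from datetime import date, datetime
from typing import Optional

# Assumed deadline formats; "%d/%m/%Y" assumes UK-style day-first dates
DEADLINE_FORMATS = ["%d %B %Y", "%d %b %Y", "%B %d, %Y", "%b %d, %Y", "%Y-%m-%d", "%d/%m/%Y"]

def parse_deadline(text: str) -> Optional[date]:
    """Return the deadline as a date, or None if no known format matches."""
    cleaned = text.strip()
    for fmt in DEADLINE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue  # try the next format
    return None

def classify_deadline(text: str, today: date) -> str:
    """Map a deadline string to 'urgent', 'soon', 'later', 'passed' or 'unknown'."""
    lowered = text.lower()
    # Keep the original keyword rule for "ASAP"-style deadlines
    if any(word in lowered for word in ("asap", "urgent", "immediate")):
        return "urgent"
    deadline = parse_deadline(text)
    if deadline is None:
        return "unknown"
    days_left = (deadline - today).days
    if days_left < 0:
        return "passed"
    if days_left <= 14:   # assumed cut-off: two weeks
        return "urgent"
    if days_left <= 42:   # assumed cut-off: six weeks
        return "soon"
    return "later"

today = date(2025, 6, 17)
for text in ["30 June 2025", "2025-07-20", "1 May 2025", "Rolling"]:
    print(text, "->", classify_deadline(text, today))
# 30 June 2025 -> urgent
# 2025-07-20 -> soon
# 1 May 2025 -> passed
# Rolling -> unknown
```

Unlike the month-keyword rule, this compares each deadline with a given "today", so a January deadline seen in December is treated differently from one seen in June. Strings that match no format fall back to "unknown" rather than being guessed.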


In [None]:
# Block 9: Results Export and Report Generation
# Comprehensive reporting and export functionality

def save_results(scraper, filename_prefix="sociology_academic_jobs"):
    """
    Export results to Excel with multiple sheets and detailed analysis
    Creates downloadable Excel file with organized job data
    """

    if not scraper.jobs_found:
        print("No job data to save")
        return ""

    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    excel_filename = f"{filename_prefix}_{timestamp}.xlsx"

    try:
        print("CREATING EXCEL EXPORT")
        print("="*30)

        # Prepare main job data
        jobs_data = []
        for job in scraper.jobs_found:
            jobs_data.append({
                'Relevance Score': job.relevance_score,
                'Priority Tier': get_priority_tier(job.relevance_score),
                'Job Title': job.title,
                'Institution': job.institution,
                'Location': job.location,
                'Country': job.country,
                'Salary Information': job.salary,
                'Application Deadline': job.deadline,
                'Job URL': job.url,
                'Source Platform': job.source_platform,
                'Date Scraped': job.scraped_date,
                'Target University': 'Yes' if job.institution in ALL_TARGET_UNIVERSITIES else 'No'
            })

        df_all = pd.DataFrame(jobs_data)

        # Create filtered dataframes for different priority levels
        df_high_priority = df_all[df_all['Relevance Score'] >= 40]
        df_medium_priority = df_all[(df_all['Relevance Score'] >= 25) & (df_all['Relevance Score'] < 40)]
        df_low_priority = df_all[(df_all['Relevance Score'] >= 15) & (df_all['Relevance Score'] < 25)]
        df_target_institutions = df_all[df_all['Target University'] == 'Yes']

        # Create geographic and career level filters
        df_uk_positions = df_all[df_all['Country'] == 'United Kingdom']
        df_us_positions = df_all[df_all['Country'] == 'United States']
        df_asia_positions = df_all[df_all['Country'].isin(['Hong Kong', 'Singapore', 'China'])]

        # Assistant professor positions (primary target)
        df_assistant_prof = df_all[df_all['Job Title'].str.contains('Assistant Professor', case=False, na=False)]
        df_lecturer = df_all[df_all['Job Title'].str.contains('Lecturer', case=False, na=False)]

        # Create analysis summary
        analysis = analyze_job_market_trends(scraper.jobs_found)

        # Prepare analysis data for Excel
        analysis_data = [
            ['MARKET ANALYSIS SUMMARY', ''],
            ['Generated', datetime.now().strftime('%Y-%m-%d %H:%M:%S')],
            ['', ''],
            ['OVERALL STATISTICS', ''],
            ['Total Positions Found', analysis['total_positions']],
            ['Average Relevance Score', f"{analysis['avg_relevance_score']:.2f}"],
            ['High Priority Positions (40+)', len(df_high_priority)],
            ['Medium Priority Positions (25-39)', len(df_medium_priority)],
            ['Low Priority Positions (15-24)', len(df_low_priority)],
            ['', ''],
            ['POSITION TYPE DISTRIBUTION', ''],
        ]

        for pos_type, count in analysis['position_types'].items():
            analysis_data.append([pos_type, count])

        analysis_data.extend([
            ['', ''],
            ['GEOGRAPHIC DISTRIBUTION', ''],
        ])

        for country, count in sorted(analysis['geographic_distribution'].items(),
                                   key=lambda x: x[1], reverse=True):
            analysis_data.append([country, count])

        analysis_data.extend([
            ['', ''],
            ['INSTITUTION PRESTIGE ANALYSIS', ''],
            ['Top Tier Universities', analysis['institution_prestige']['top_tier']],
            ['Mid Tier Universities', analysis['institution_prestige']['mid_tier']],
            ['Other Institutions', analysis['institution_prestige']['other']],
            ['', ''],
            ['CAREER LEVEL OPPORTUNITIES', ''],
            ['Assistant Professor Positions', len(df_assistant_prof)],
            ['Lecturer Positions', len(df_lecturer)],
            ['', ''],
            ['REGIONAL OPPORTUNITIES', ''],
            ['UK Positions', len(df_uk_positions)],
            ['US Positions', len(df_us_positions)],
            ['Asia-Pacific Positions', len(df_asia_positions)]
        ])

        df_analysis = pd.DataFrame(analysis_data, columns=['Metric', 'Value'])

        # Create Excel workbook with multiple sheets
        with pd.ExcelWriter(excel_filename, engine='openpyxl') as writer:
            # Main sheets
            df_all.to_excel(writer, sheet_name='All Positions', index=False)

            if not df_high_priority.empty:
                df_high_priority.to_excel(writer, sheet_name='High Priority', index=False)

            if not df_medium_priority.empty:
                df_medium_priority.to_excel(writer, sheet_name='Medium Priority', index=False)

            if not df_low_priority.empty:
                df_low_priority.to_excel(writer, sheet_name='Low Priority', index=False)

            # Specialized filters
            if not df_target_institutions.empty:
                df_target_institutions.to_excel(writer, sheet_name='Target Universities', index=False)

            if not df_assistant_prof.empty:
                df_assistant_prof.to_excel(writer, sheet_name='Assistant Professor', index=False)

            if not df_lecturer.empty:
                df_lecturer.to_excel(writer, sheet_name='Lecturer Positions', index=False)

            # Geographic sheets
            if not df_uk_positions.empty:
                df_uk_positions.to_excel(writer, sheet_name='UK Positions', index=False)

            if not df_us_positions.empty:
                df_us_positions.to_excel(writer, sheet_name='US Positions', index=False)

            if not df_asia_positions.empty:
                df_asia_positions.to_excel(writer, sheet_name='Asia-Pacific', index=False)

            # Analysis summary
            df_analysis.to_excel(writer, sheet_name='Market Analysis', index=False)

            # Format all sheets
            workbook = writer.book
            for sheet_name in workbook.sheetnames:
                worksheet = workbook[sheet_name]

                # Auto-adjust column widths
                for column in worksheet.columns:
                    max_length = 0
                    column_letter = column[0].column_letter

                    for cell in column:
                        # Skip empty cells so str(None) ("None") does not inflate the width
                        if cell.value is not None:
                            max_length = max(max_length, len(str(cell.value)))

                    adjusted_width = min(max_length + 2, 60)  # Cap width at 60 characters
                    worksheet.column_dimensions[column_letter].width = adjusted_width

                # Add filters to data sheets (except analysis)
                if sheet_name != 'Market Analysis' and worksheet.max_row > 1:
                    worksheet.auto_filter.ref = worksheet.dimensions

        print(f"Excel file created: {excel_filename}")
        print("Sheets created (filtered sheets with no rows are skipped):")
        sheet_info = [
            f"   All Positions ({len(df_all)} jobs)",
            f"   High Priority ({len(df_high_priority)} jobs)",
            f"   Medium Priority ({len(df_medium_priority)} jobs)",
            f"   Low Priority ({len(df_low_priority)} jobs)",
            f"   Target Universities ({len(df_target_institutions)} jobs)",
            f"   Assistant Professor ({len(df_assistant_prof)} jobs)",
            f"   Lecturer Positions ({len(df_lecturer)} jobs)",
            "   Geographic breakdowns (UK, US, Asia-Pacific)",
            "   Market Analysis summary"
        ]
        for info in sheet_info:
            print(info)

        return excel_filename

    except Exception as e:
        print(f"Error creating Excel file: {e}")
        return ""

def create_quick_summary_report(scraper):
    """Create a concise text summary of job search results"""

    if not scraper.jobs_found:
        return "No jobs found to summarize."

    total_jobs = len(scraper.jobs_found)
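    # Sort so that "TOP HIGH PRIORITY POSITIONS" really lists the highest scores first
    high_priority = sorted((job for job in scraper.jobs_found if job.relevance_score >= 40),
                           key=lambda job: job.relevance_score, reverse=True)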
    medium_priority = [job for job in scraper.jobs_found if 25 <= job.relevance_score < 40]

    # Geographic and institutional analysis
    country_counts = {}
    institution_counts = {}
    for job in scraper.jobs_found:
        country_counts[job.country] = country_counts.get(job.country, 0) + 1
        if job.institution in ALL_TARGET_UNIVERSITIES:
            institution_counts[job.institution] = institution_counts.get(job.institution, 0) + 1

    report = f"""
ACADEMIC JOB SEARCH SUMMARY REPORT
Sociology PhD Position Search
{'='*50}

Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Research Focus: Family Studies, Childhood, Care Work

SEARCH RESULTS OVERVIEW
{'='*50}
Total Positions Found: {total_jobs}
High Priority (Score 40+): {len(high_priority)}
Medium Priority (Score 25-39): {len(medium_priority)}
Low Priority (Score 15-24): {len([job for job in scraper.jobs_found if 15 <= job.relevance_score < 25])}

TOP HIGH PRIORITY POSITIONS
{'='*35}"""

    for i, job in enumerate(high_priority[:8], 1):
        report += f"""
{i}. {job.title}
   Institution: {job.institution}
   Location: {job.location}, {job.country}
   Relevance Score: {job.relevance_score}
   Deadline: {job.deadline}
   URL: {job.url[:60]}{'...' if len(job.url) > 60 else ''}
"""

    if len(high_priority) > 8:
        report += f"\n   ... and {len(high_priority) - 8} more high priority positions"

    report += f"""

GEOGRAPHIC DISTRIBUTION
{'='*25}"""
    for country, count in sorted(country_counts.items(), key=lambda x: x[1], reverse=True):
        report += f"\n{country}: {count} positions"

    if institution_counts:
        report += f"""

TOP TARGET INSTITUTIONS
{'='*26}"""
        for inst, count in sorted(institution_counts.items(), key=lambda x: x[1], reverse=True)[:8]:
            report += f"\n{inst}: {count} positions"

    # Career-specific recommendations
    assistant_prof_count = len([job for job in high_priority + medium_priority
                               if 'assistant professor' in job.title.lower()])
    lecturer_count = len([job for job in high_priority + medium_priority
                         if 'lecturer' in job.title.lower()])

    report += f"""

STRATEGIC RECOMMENDATIONS
{'='*27}
• Priority Focus: Apply to {len(high_priority)} high-priority positions first
• Assistant Professor Opportunities: {assistant_prof_count} positions available
• Lecturer Track Options: {lecturer_count} positions for alternative pathway
• Geographic Strategy: Consider {', '.join(sorted(country_counts, key=country_counts.get, reverse=True)[:3])} as primary targets
• Research Fit: Check each listing's stated focus against your family/childhood expertise
• Application Timeline: Review deadlines and prioritize urgent applications

NEXT STEPS
{'='*12}
1. Focus on high-priority positions (score 40+)
2. Customize applications highlighting family studies expertise
3. Leverage China research experience for Asia-Pacific positions
4. Create application timeline based on deadlines
5. Access full details in Excel export for comprehensive planning
"""

    return report

def generate_application_timeline(scraper, weeks_ahead=16):
    """Generate strategic application timeline based on deadlines and priorities"""

    timeline_jobs = []

    for job in scraper.jobs_found:
        if job.deadline == "N/A" or job.relevance_score < 20:
            continue

        # Estimate urgency from keywords in the free-text deadline. Month names
        # map to fixed week offsets and are not compared with today's date,
        # so these estimates are rough.
        deadline_lower = job.deadline.lower()

        # Simple deadline categorization
        if any(urgent in deadline_lower for urgent in ['asap', 'immediate', 'urgent']):
            weeks_until = 1
        elif any(soon in deadline_lower for soon in ['january', 'jan', 'february', 'feb']):
            weeks_until = 4
        elif any(medium in deadline_lower for medium in ['march', 'mar', 'april', 'apr']):
            weeks_until = 8
        elif any(later in deadline_lower for later in ['may', 'june', 'jul']):
            weeks_until = 12
        else:
            weeks_until = 16  # Unknown, assume later

        if weeks_until <= weeks_ahead:
            urgency = 'URGENT' if weeks_until <= 2 else 'SOON' if weeks_until <= 6 else 'UPCOMING'
            timeline_jobs.append({
                'job': job,
                'weeks_until': weeks_until,
                'urgency': urgency
            })

    # Sort by urgency and then by relevance score
    timeline_jobs.sort(key=lambda x: (x['weeks_until'], -x['job'].relevance_score))

    if not timeline_jobs:
        return "No specific deadlines identified. Focus on high-priority positions and create your own timeline."

    timeline_report = f"""
APPLICATION TIMELINE STRATEGY
Next {weeks_ahead} Weeks Planning
{'='*50}

Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

URGENT APPLICATIONS (Next 2 weeks)
{'='*35}"""

    urgent_jobs = [job_data for job_data in timeline_jobs if job_data['urgency'] == 'URGENT']
    if urgent_jobs:
        for job_data in urgent_jobs[:5]:  # Top 5 urgent
            job = job_data['job']
            timeline_report += f"""
{job.title}
  Institution: {job.institution} | {job.country}
  Score: {job.relevance_score} | Deadline: {job.deadline}
  URL: {job.url}
"""
    else:
        timeline_report += "\nNo urgent deadlines identified"

    timeline_report += f"""

PRIORITY APPLICATIONS (Next 2-6 weeks)
{'='*40}"""

    soon_jobs = [job_data for job_data in timeline_jobs if job_data['urgency'] == 'SOON']
    if soon_jobs:
        for job_data in soon_jobs[:6]:  # Top 6 priority
            job = job_data['job']
            timeline_report += f"""
{job.title}
  Institution: {job.institution} | Score: {job.relevance_score}
  Estimated timeline: {job_data['weeks_until']} weeks
"""
    else:
        timeline_report += "\nCheck individual job listings for specific deadlines"

    timeline_report += f"""

UPCOMING OPPORTUNITIES (6+ weeks)
{'='*35}"""

    upcoming_jobs = [job_data for job_data in timeline_jobs if job_data['urgency'] == 'UPCOMING']
    for job_data in upcoming_jobs[:8]:  # Top 8 upcoming
        job = job_data['job']
        timeline_report += f"\n{job.title} at {job.institution} (Score: {job.relevance_score})"

    if len(upcoming_jobs) > 8:
        timeline_report += f"\n   ... and {len(upcoming_jobs) - 8} more upcoming positions"

    timeline_report += f"""

STRATEGIC APPLICATION PLAN
{'='*28}
Week 1-2:  Focus on {len(urgent_jobs)} urgent applications
Week 3-6:  Submit {len(soon_jobs)} priority applications
Week 7+:   Prepare for {len(upcoming_jobs)} upcoming deadlines

Application Preparation Checklist:
• Update CV with recent research and publications
• Prepare research statement highlighting family/childhood expertise
• Draft teaching philosophy emphasizing sociology pedagogy
• Secure recommendation letters from supervisors
• Tailor cover letters for each institution and position
• Prepare for potential video interviews or presentations
"""

    return timeline_report

print("EXPORT AND REPORTING SYSTEM CONFIGURED")
print("Excel export with multiple analysis sheets")
print("Quick summary report generation")
print("Strategic application timeline creation")
print("Ready to generate comprehensive job search reports!")

EXPORT AND REPORTING SYSTEM CONFIGURED
Excel export with multiple analysis sheets
Quick summary report generation
Strategic application timeline creation
Ready to generate comprehensive job search reports!
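The score bands used in this block (40+ high, 25-39 medium, 15-24 low) are re-derived separately in save_results, create_quick_summary_report and create_personalized_recommendations. Below is a minimal sketch of one shared helper so every report draws the same boundaries. The names score_band and band_counts and the "Below threshold" label are hypothetical; the tool's own get_priority_tier is defined elsewhere and may label bands differently.

```python
from bisect import bisect_right
from collections import Counter

# Band edges matching this section: 15-24 low, 25-39 medium, 40+ high.
# Scores below 15 land in an extra "Below threshold" band.
BAND_EDGES = [15, 25, 40]
BAND_NAMES = ["Below threshold", "Low Priority", "Medium Priority", "High Priority"]

def score_band(score: float) -> str:
    """Return the priority band for a relevance score (hypothetical helper)."""
    # bisect_right counts the edges <= score, e.g. 25 -> 2 -> "Medium Priority"
    return BAND_NAMES[bisect_right(BAND_EDGES, score)]

def band_counts(scores):
    """Count scores per band, listing every band even when its count is zero."""
    counts = Counter(score_band(s) for s in scores)
    return {name: counts.get(name, 0) for name in BAND_NAMES}

print(band_counts([12, 15, 24, 25, 39, 40, 52]))
# {'Below threshold': 1, 'Low Priority': 2, 'Medium Priority': 2, 'High Priority': 2}
```

Because each band is counted directly, positions scoring below 15 are reported in their own bucket instead of being folded into "Low Priority" by subtraction.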


In [None]:
# Block 10: Complete Execution and Interactive Interface
# Main execution functions and user interface

def create_personalized_recommendations(scraper):
    """
    Create highly personalized recommendations based on Oxford sociology PhD profile
    Tailored for family studies, childhood, and care work expertise
    """

    # Categorize positions by career track
    assistant_prof_positions = []
    lecturer_positions = []
    postdoc_positions = []
    research_positions = []

    for job in scraper.jobs_found:
        title_lower = job.title.lower()

        if 'postdoc' in title_lower or 'post-doc' in title_lower:
            postdoc_positions.append(job)
        elif 'assistant professor' in title_lower:
            assistant_prof_positions.append(job)
        elif 'lecturer' in title_lower:  # also matches "senior lecturer"
            lecturer_positions.append(job)
        elif 'research' in title_lower and job.relevance_score >= 20:
            research_positions.append(job)

    # Sort each category by relevance score
    for positions in [assistant_prof_positions, lecturer_positions, postdoc_positions, research_positions]:
        positions.sort(key=lambda x: x.relevance_score, reverse=True)

    recommendations = f"""
PERSONALIZED CAREER RECOMMENDATIONS
Oxford Sociology PhD - Family & Childhood Studies
{'='*50}

Profile: Oxford Sociology PhD
Expertise: Family Studies, Childhood, Care Work, China Research
Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

PRIMARY TARGET: ASSISTANT PROFESSOR POSITIONS
{'='*50}"""

    if assistant_prof_positions:
        recommendations += f"\nFound {len(assistant_prof_positions)} assistant professor opportunities:\n"
        for i, job in enumerate(assistant_prof_positions[:6], 1):
            tier = "TOP TIER" if job.relevance_score >= 45 else "HIGH FIT" if job.relevance_score >= 35 else "GOOD FIT"
            recommendations += f"""
{i}. {tier} | Score: {job.relevance_score}
   {job.title}
   Institution: {job.institution}
   Location: {job.location}, {job.country}
   Deadline: {job.deadline}
   URL: {job.url}
"""
        if len(assistant_prof_positions) > 6:
            recommendations += f"\n   ... plus {len(assistant_prof_positions) - 6} more assistant professor positions"
    else:
        recommendations += "\nNo assistant professor positions found in current search"

    recommendations += f"""

ALTERNATIVE TRACK: LECTURER POSITIONS
{'='*43}"""

    if lecturer_positions:
        recommendations += f"\nFound {len(lecturer_positions)} lecturer opportunities:\n"
        for i, job in enumerate(lecturer_positions[:4], 1):
            recommendations += f"""
{i}. {job.title}
   Institution: {job.institution} | {job.country}
   Score: {job.relevance_score} | Deadline: {job.deadline}
"""
    else:
        recommendations += "\nLimited lecturer positions found"

    recommendations += f"""

RESEARCH DEVELOPMENT: POSTDOC OPPORTUNITIES
{'='*48}"""

    if postdoc_positions:
        recommendations += f"\nFound {len(postdoc_positions)} postdoc opportunities:\n"
        for i, job in enumerate(postdoc_positions[:4], 1):
            recommendations += f"""
{i}. {job.title}
   Institution: {job.institution} | {job.country}
   Score: {job.relevance_score}
"""
    else:
        recommendations += "\nConsider searching specialized postdoc databases"

    # Geographic analysis with strategic recommendations
    uk_positions = [job for job in scraper.jobs_found if job.country == "United Kingdom" and job.relevance_score >= 25]
    us_positions = [job for job in scraper.jobs_found if job.country == "United States" and job.relevance_score >= 25]
    asia_positions = [job for job in scraper.jobs_found if job.country in ["Hong Kong", "Singapore", "China"] and job.relevance_score >= 20]

    recommendations += f"""

STRATEGIC GEOGRAPHIC ANALYSIS
{'='*35}

UNITED KINGDOM ({len(uk_positions)} opportunities)
   Advantages: Familiar system, post-study visa, research networks
   Top Matches:"""

    if uk_positions:
        top_uk = sorted(uk_positions, key=lambda x: x.relevance_score, reverse=True)[:3]
        for job in top_uk:
            recommendations += f"\n   {job.title} at {job.institution} (Score: {job.relevance_score})"
    else:
        recommendations += "\n   Limited high-relevance UK positions found"

    recommendations += f"""

UNITED STATES ({len(us_positions)} opportunities)
   Advantages: Large academic market, research funding, career growth
   Focus Areas: Family sociology programs, childhood studies centers"""

    if us_positions:
        top_us = sorted(us_positions, key=lambda x: x.relevance_score, reverse=True)[:3]
        for job in top_us:
            recommendations += f"\n   {job.title} at {job.institution} (Score: {job.relevance_score})"

    recommendations += f"""

ASIA-PACIFIC ({len(asia_positions)} opportunities)
   Advantages: Regional expertise, language skills, cultural knowledge
   Strategic Value: Leverage China research background"""

    if asia_positions:
        top_asia = sorted(asia_positions, key=lambda x: x.relevance_score, reverse=True)[:3]
        for job in top_asia:
            recommendations += f"\n   {job.title} at {job.institution} (Score: {job.relevance_score})"

    # Research specialization recommendations
    family_positions = [job for job in scraper.jobs_found if any(keyword in job.title.lower()
                       for keyword in ['family', 'childhood', 'care']) and job.relevance_score >= 30]

    recommendations += f"""

RESEARCH SPECIALIZATION MATCHES
{'='*35}
Family/Childhood Expertise Positions: {len(family_positions)}"""

    if family_positions:
        for job in family_positions[:3]:
            recommendations += f"\n{job.title} at {job.institution} (Score: {job.relevance_score})"

    recommendations += f"""

STRATEGIC ACTION PLAN
{'='*23}
IMMEDIATE PRIORITIES (Next 2 weeks):
1. Apply to top 3 assistant professor positions (scores 40+)
2. Tailor research statements to emphasize family studies expertise
3. Network with faculty at target institutions

SHORT-TERM GOALS (Next 1-2 months):
4. Submit applications to high-fit lecturer positions
5. Explore Asia-Pacific opportunities leveraging China expertise
6. Consider collaborative research opportunities

LONG-TERM STRATEGY (Next 6 months):
7. Continue publishing in family sociology and childhood studies
8. Present research at major sociology conferences
9. Expand search to include emerging positions

PROFILE OPTIMIZATION RECOMMENDATIONS
{'='*37}
• Emphasize quantitative and qualitative mixed-methods expertise
• Highlight cross-cultural research experience (China focus)
• Showcase policy implications of family and care work research
• Develop teaching portfolio in sociology of family and childhood
• Build connections with family studies and childhood research networks

FIT SUMMARY BY RELEVANCE SCORE
{'='*30}
High-fit positions (40+ score): {len([j for j in scraper.jobs_found if j.relevance_score >= 40])} - strongest match
Medium-fit positions (25-39): {len([j for j in scraper.jobs_found if 25 <= j.relevance_score < 40])} - good match
Consider positions (15-24): {len([j for j in scraper.jobs_found if 15 <= j.relevance_score < 25])} - partial match

Scores measure keyword fit with your research profile, not the likelihood of an offer.
Focus your energy on high- and medium-fit positions first.
"""

    return recommendations

def main_job_search_execution():
    """
    Execute complete academic job search with full reporting
    One-click solution for comprehensive job hunting
    """

    print("""
    GLOBAL ACADEMIC JOB SEARCH SYSTEM
    Sociology PhD Career Opportunities
    Family Studies • Childhood • Care Work
    """)

    print("INITIATING COMPREHENSIVE JOB SEARCH...")
    print(f"Start time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

    # Initialize fresh scraper instance
    global scraper
    scraper = GlobalAcademicJobScraper()

    try:
        # Execute comprehensive search
        print("\nRUNNING COMPREHENSIVE SEARCH ACROSS ALL PLATFORMS...")
        jobs = run_comprehensive_job_search(scraper, comprehensive=True)

        if not jobs:
            print("No relevant jobs found. Check your internet connection; if listings were fetched, they may all have scored below the relevance threshold.")
            return None

        print(f"\nSEARCH COMPLETED! Found {len(jobs)} total positions")

        # Generate all reports and exports
        print("\nGENERATING COMPREHENSIVE REPORTS...")

        # 1. Excel Export (primary output)
        print("\nCreating Excel workbook...")
        excel_file = save_results(scraper, "oxford_sociology_academic_jobs")

        # 2. Quick Summary Report
        print("\nGenerating summary report...")
        summary_report = create_quick_summary_report(scraper)

        # 3. Application Timeline
        print("\nCreating application timeline...")
        timeline_report = generate_application_timeline(scraper)

        # 4. Personalized Recommendations
        print("\nDeveloping personalized recommendations...")
        recommendations = create_personalized_recommendations(scraper)

        # Save all text reports
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

        # Save summary
        summary_filename = f"job_search_summary_{timestamp}.txt"
        with open(summary_filename, 'w', encoding='utf-8') as f:
            f.write(summary_report)

        # Save timeline
        timeline_filename = f"application_timeline_{timestamp}.txt"
        with open(timeline_filename, 'w', encoding='utf-8') as f:
            f.write(timeline_report)

        # Save recommendations
        rec_filename = f"career_recommendations_{timestamp}.txt"
        with open(rec_filename, 'w', encoding='utf-8') as f:
            f.write(recommendations)

        # Display results
        print(summary_report)
        print("\n" + "="*60)
        print("JOB SEARCH COMPLETED SUCCESSFULLY!")
        print("="*60)

        print(f"\nFILES GENERATED AND READY FOR DOWNLOAD:")
        print(f"   {excel_file or '(Excel export failed)'} - Complete job database with analysis")
        print(f"   {summary_filename} - Executive summary report")
        print(f"   {timeline_filename} - Strategic application timeline")
        print(f"   {rec_filename} - Personalized career recommendations")

        # Final statistics
        high_priority_count = len([j for j in jobs if j.relevance_score >= 40])
        medium_priority_count = len([j for j in jobs if 25 <= j.relevance_score < 40])

        print(f"\nFINAL SEARCH STATISTICS:")
        print(f"   High Priority Positions: {high_priority_count}")
        print(f"   Medium Priority Positions: {medium_priority_count}")
        print(f"   Countries Covered: {len(set(job.country for job in jobs))}")
        print(f"   Institutions with Positions: {len(set(job.institution for job in jobs))}")
        print(f"   Finished at: {datetime.now().strftime('%H:%M:%S')}")

        print(f"\nNEXT STEPS:")
        print(f"   1. Download and review the Excel file for complete job details")
        print(f"   2. Follow the personalized recommendations for strategic applications")
        print(f"   3. Use the timeline to prioritize urgent applications")
        print(f"   4. Customize cover letters and research statements for top positions")

        return {
            'jobs': jobs,
            'excel_file': excel_file,
            'summary_file': summary_filename,
            'timeline_file': timeline_filename,
            'recommendations_file': rec_filename,
            'statistics': {
                'total_jobs': len(jobs),
                'high_priority': high_priority_count,
                'medium_priority': medium_priority_count,
                'countries': len(set(job.country for job in jobs)),
                'institutions': len(set(job.institution for job in jobs))
            }
        }

    except KeyboardInterrupt:
        print("\nSearch interrupted by user. Partial results may be available.")
        return None
    except Exception as e:
        print(f"\nError during job search: {e}")
        import traceback
        traceback.print_exc()
        return None

def quick_demo_search():
    """Quick demonstration search for testing purposes"""

    print("RUNNING QUICK DEMO SEARCH")
    print("="*30)
    print("Testing scraper functionality with limited search...")

    demo_scraper = GlobalAcademicJobScraper()

    # Run limited search for testing
    print("\nSearching Jobs.ac.uk for sociology positions...")
    demo_jobs = scrape_jobs_ac_uk(demo_scraper, keywords=["sociology"], max_pages=1)

    if demo_jobs:
        print(f"\nDemo successful! Found {len(demo_jobs)} positions.")
        print("\nTop positions from demo:")
        for i, job in enumerate(demo_jobs[:3], 1):
            priority = get_priority_tier(job.relevance_score)
            print(f"   {i}. {priority} | Score: {job.relevance_score}")
            print(f"      {job.title}")
            print(f"      {job.institution} | {job.country}")

        # Quick export
        excel_file = save_results(demo_scraper, "demo_sociology_jobs")
        summary = create_quick_summary_report(demo_scraper)

        print(f"\nDemo results exported to: {excel_file}")
        print("\nDemo Summary:")
        print(summary[:500] + "..." if len(summary) > 500 else summary)

    else:
        print("No relevant positions found in demo search. Check your internet connection; listings may also have been fetched but scored below the relevance threshold.")

    return demo_jobs

# MAIN EXECUTION COMMANDS
print("""
ACADEMIC JOB SCRAPER READY!

EXECUTION OPTIONS:

FULL COMPREHENSIVE SEARCH (Recommended):
   results = main_job_search_execution()

QUICK DEMO (For Testing):
   demo_results = quick_demo_search()

MANUAL STEP-BY-STEP:
   scraper = GlobalAcademicJobScraper()
   jobs = run_comprehensive_job_search(scraper)
   excel_file = save_results(scraper)
   summary = create_quick_summary_report(scraper)

The system will automatically:
• Search 6+ major academic job platforms worldwide
• Score positions based on your sociology PhD expertise
• Filter for family studies, childhood, and care work relevance
• Export comprehensive Excel workbook with multiple analysis sheets
• Generate personalized career recommendations
• Create strategic application timeline
• Save all reports as downloadable files

Optimized for sociology PhD with China expertise
Covers US, UK, Europe, and Asia-Pacific opportunities
Relevance scoring to rank positions by research fit

READY TO LAUNCH YOUR ACADEMIC CAREER SEARCH!
""")

# Optional: uncomment the line below to run the demo automatically when this cell executes
# demo_results = quick_demo_search()




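create_personalized_recommendations assigns each title to the first matching category, so rule order matters: "Postdoctoral Research Fellow" must hit the postdoc rule before the broader "research" rule. A minimal sketch of the same first-match-wins idea as an ordered rule table follows. TITLE_RULES and classify_title are hypothetical names, the order mirrors the if/elif chain in that function, and the original's extra condition that research positions need a relevance score of at least 20 is deliberately left out.

```python
# Ordered (label, keywords) rules: the first match wins, so specific roles
# must precede broad ones such as "research".
TITLE_RULES = [
    ("Postdoc", ("postdoc", "post-doc")),  # "postdoc" also matches "postdoctoral"
    ("Assistant Professor", ("assistant professor",)),
    ("Lecturer", ("lecturer",)),           # also matches "senior lecturer"
    ("Research Position", ("research",)),
]

def classify_title(title: str, default: str = "Other") -> str:
    """Return the first category whose keywords appear in the lower-cased title."""
    lowered = title.lower()
    for label, keywords in TITLE_RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    return default

for title in ["Postdoctoral Research Fellow in Family Sociology",
              "Senior Lecturer in Sociology",
              "Research Associate, Childhood Studies",
              "Professor of Social Policy"]:
    print(title, "->", classify_title(title))
# Postdoctoral Research Fellow in Family Sociology -> Postdoc
# Senior Lecturer in Sociology -> Lecturer
# Research Associate, Childhood Studies -> Research Position
# Professor of Social Policy -> Other
```

Keeping the rules in one list makes the precedence visible and lets the analysis and recommendation functions share a single ordering instead of two hand-written if/elif chains.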
In [None]:
demo_results = quick_demo_search()

RUNNING QUICK DEMO SEARCH
Testing scraper functionality with limited search...
Academic job scraper initialized!

Searching Jobs.ac.uk for sociology positions...
STARTING JOBS.AC.UK SCRAPING

Searching for: 'sociology'
   Page 1
   No jobs found on page 1
   Found 25 job links via alternative method
   Found 10 job listings
Waiting 2.9 seconds...
Waiting 3.3 seconds...

JOBS.AC.UK SCRAPING COMPLETE
Found 0 relevant positions
High priority: 0
Medium priority: 0
No positions found in demo search. Check internet connection.
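The demo log above parsed 10 job listings (after the primary lookup found nothing on page 1 and a fallback link search was used) yet reported 0 relevant positions, and the closing message still blamed the internet connection. A small diagnostic can separate "nothing fetched" from "fetched but filtered out". This is a hypothetical sketch: it assumes listings are available as (title, score) pairs and uses 15 as the cut-off because that is the lowest band in this section; the scraper's actual threshold is not shown here.

```python
from typing import Iterable, Tuple

def explain_empty_result(listings: Iterable[Tuple[str, float]], min_score: float = 15) -> str:
    """Distinguish 'nothing fetched' from 'fetched but filtered out' (hypothetical helper)."""
    listings = list(listings)
    if not listings:
        return "No listings fetched: check the connection or the page structure."
    kept = [title for title, score in listings if score >= min_score]
    if kept:
        return f"{len(kept)} of {len(listings)} listings passed the relevance filter."
    # Everything was filtered out: report the closest miss to help tune keywords
    best_title, best_score = max(listings, key=lambda item: item[1])
    return (f"{len(listings)} listings fetched, none scored >= {min_score}; "
            f"best was {best_score} ({best_title!r}).")

print(explain_empty_result([("Lecturer in Economics", 5), ("Research Fellow", 10)]))
# 2 listings fetched, none scored >= 15; best was 10 ('Research Fellow').
```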


In [None]:
results = main_job_search_execution()


    GLOBAL ACADEMIC JOB SEARCH SYSTEM
    Sociology PhD Career Opportunities
    Family Studies • Childhood • Care Work
    
INITIATING COMPREHENSIVE JOB SEARCH...
Start time: 2025-06-17 23:10:25
Academic job scraper initialized!

RUNNING COMPREHENSIVE SEARCH ACROSS ALL PLATFORMS...
STARTING COMPREHENSIVE ACADEMIC JOB SEARCH
Target: Sociology positions at top universities worldwide
Research focus: Family studies, childhood, care work
Level: Oxford sociology PhD
Search time: 2025-06-17 23:10:25
Comprehensive mode: True

PHASE 1: JOBS.AC.UK
STARTING JOBS.AC.UK SCRAPING

Searching for: 'sociology'
   Page 1
   No jobs found on page 1
   Found 25 job links via alternative method
   Found 10 job listings
Waiting 2.9 seconds...
   Page 2
   No jobs found on page 2
Waiting 3.2 seconds...

Searching for: 'social science'
   Page 1
   No jobs found on page 1
   Found 25 job links via alternative method
   Found 10 job listings
Waiting 3.9 seconds...
   Page 2
   No jobs found on page 2
Waiting