<a href="https://colab.research.google.com/github/Harooniqbal4879/AgenticAI/blob/main/Lead_Gen_%26_Outreach_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I've created a comprehensive Colab notebook for healthcare lead generation focused on finding nursing homes, hospitals, and other healthcare facilities that need shift nurses. Here are the key features:

# 🏥 Core Capabilities:

Multi-Source Lead Generation:

- Google search integration for healthcare facilities
- Web scraping for facility information
- LinkedIn and other social media link extraction
- Contact information discovery
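The search step sends four query templates per facility type and then merges the results. The sketch below shows that step using hypothetical `build_queries` and `dedupe_preserving_order` helpers. It removes duplicates with `dict.fromkeys`, which keeps results in search-rank order. That matters because only the first `max_facilities` URLs are processed: order-preserving de-duplication keeps the top-ranked hits, while `list(set(...))` keeps an arbitrary subset.

```python
def build_queries(location, facility_type):
    """Return the four search queries sent for one facility type and location."""
    suffixes = ["hiring nurses", "job openings", "careers", "contact"]
    return [f"{facility_type} {location} {suffix}" for suffix in suffixes]


def dedupe_preserving_order(urls):
    """Drop duplicate URLs but keep first-seen (search-rank) order.

    dict keys are unique and keep insertion order (Python 3.7+),
    whereas list(set(urls)) returns them in arbitrary order.
    """
    return list(dict.fromkeys(urls))


queries = [q for facility_type in ["nursing home", "hospital"]
           for q in build_queries("Austin, TX", facility_type)]
print(len(queries))  # 8
print(queries[0])    # nursing home Austin, TX hiring nurses

results = ["https://a.example/", "https://b.example/", "https://a.example/"]
print(dedupe_preserving_order(results))  # ['https://a.example/', 'https://b.example/']
```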


Healthcare Facility Types Covered:

- Nursing Homes
- Hospitals & Medical Centers
- Assisted Living Facilities
- Senior Care Centers
- Home Health Agencies
- Medical Clinics
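Each facility type is assigned by keyword matching in `determine_facility_type`, and the first matching category wins. The sketch below reproduces that rule order as a standalone function. The name `classify_facility` is hypothetical. It also uses whole-word matching, which is an assumption: the notebook's version uses plain substring checks.

```python
import re

# Ordered rules: the first matching rule wins, so a page that mentions both
# "skilled nursing" and "hospital" is labelled a Nursing Home.
FACILITY_RULES = [
    ("Nursing Home", ["nursing home", "skilled nursing", "long term care"]),
    ("Hospital", ["hospital", "medical center", "health system"]),
    ("Assisted Living", ["assisted living", "senior living", "retirement"]),
    ("Clinic", ["clinic", "medical clinic"]),
    ("Home Health", ["home health", "home care"]),
]


def classify_facility(text):
    """Return the first facility type whose keywords appear as whole words."""
    text_lower = text.lower()
    for facility_type, keywords in FACILITY_RULES:
        for kw in keywords:
            if re.search(r"\b" + re.escape(kw) + r"\b", text_lower):
                return facility_type
    return "Healthcare Facility"


print(classify_facility("Oak Grove Skilled Nursing & Rehabilitation"))  # Nursing Home
print(classify_facility("Riverside Medical Center - Careers"))          # Hospital
print(classify_facility("Hospital-affiliated skilled nursing unit"))    # Nursing Home
print(classify_facility("Sunrise Home Care Services"))                  # Home Health
print(classify_facility("Downtown Dental Office"))                      # Healthcare Facility
```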


Data Extraction Features:

- Facility name and type identification
- Contact information (phone, email)
- Address, state, and ZIP code (city is not extracted)
- Careers page detection
- Nursing job keyword indicators
- Social media links (Facebook, LinkedIn, Twitter, Instagram)
- Lead scoring and qualification
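Phone and email discovery is regex-based. The sketch below runs the matching step on plain text, which makes it easy to test patterns before pointing them at live pages. The sample text and the `extract_contacts` name are hypothetical. A single combined phone pattern is used so that matches come back in page order.

```python
import re

# US phone formats: (555) 123-4567, 555-123-4567, 555.123.4567, 5551234567.
# One combined pattern, so findall returns matches in the order they appear.
PHONE_PATTERN = r"\(\d{3}\)\s*\d{3}[-.\s]?\d{4}|\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"
# The TLD class is [A-Za-z]; writing [A-Z|a-z] would also accept a literal "|".
EMAIL_PATTERN = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"


def extract_contacts(text):
    """Return (phones, emails) found in plain page text, in page order."""
    return re.findall(PHONE_PATTERN, text), re.findall(EMAIL_PATTERN, text)


sample = "Call (512) 555-0142 or 512.555.0199. Careers: jobs@oakgrove.example.org"
phones, emails = extract_contacts(sample)
print(phones)  # ['(512) 555-0142', '512.555.0199']
print(emails)  # ['jobs@oakgrove.example.org']
```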


# 🎯 Lead Qualification System:
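The system automatically scores each lead on the criteria below. A lead that scores 4 or more is marked as qualified.

- Contact information: +2 for a phone number, +2 for an email address
- Careers page link detected: +3
- Nursing role keywords (e.g. RN, LPN, CNA, charge nurse): +4
- Social media presence: +1
- More than two job-related keywords: +2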
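These scoring rules can also be written as a data-driven table, so that the weights and the cut-off are tuned in one place. `qualify_leads` marks a lead as qualified at a score of 4 or more. The sketch below mirrors it; `SCORING_RULES`, `score_lead`, and the example record are hypothetical.

```python
# Each rule: (points, note, test applied to one facility record)
SCORING_RULES = [
    (2, "Has phone number", lambda f: bool(f.get("phone"))),
    (2, "Has email", lambda f: bool(f.get("email"))),
    (3, "Has careers page", lambda f: bool(f.get("has_careers_page"))),
    (4, "Actively hiring nurses", lambda f: bool(f.get("hiring_nurses"))),
    (1, "Has social media presence", lambda f: bool(f.get("social_media"))),
    (2, "Multiple job-related keywords found",
     lambda f: len(f.get("job_keywords_found", [])) > 2),
]
QUALIFIED_THRESHOLD = 4  # same cut-off as qualify_leads()


def score_lead(facility):
    """Return (score, notes, qualified) for one facility record."""
    score, notes = 0, []
    for points, note, applies in SCORING_RULES:
        if applies(facility):
            score += points
            notes.append(note)
    return score, notes, score >= QUALIFIED_THRESHOLD


# Hypothetical record shaped like extract_facility_info() output (fields trimmed)
example = {
    "phone": "512-555-0142",
    "email": "",
    "has_careers_page": True,
    "hiring_nurses": False,
    "social_media": {"facebook": "https://www.facebook.com/example"},
    "job_keywords_found": ["careers", "staff"],
}
print(score_lead(example))
# (6, ['Has phone number', 'Has careers page', 'Has social media presence'], True)
```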

# 📊 Output & Export:

- Excel export with three sheets (All_Leads, Qualified_Leads, Summary)
- Qualified leads filtering
- Summary statistics
- Lead scores and qualification notes
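The Summary sheet's counts and the qualified-lead ordering used by `display_results` can also be reproduced in plain Python, for example to sanity-check an export. The sketch below does this with hypothetical `summarize` and `qualified_sorted` helpers that operate on lead dicts like those produced by `qualify_leads`.

```python
def summarize(leads):
    """Return the same metrics as the Summary sheet, from a list of lead dicts."""
    return {
        "Total Facilities": len(leads),
        "Qualified Leads": sum(1 for f in leads if f.get("qualified")),
        "With Phone": sum(1 for f in leads if f.get("phone")),
        "With Email": sum(1 for f in leads if f.get("email")),
        "With Careers Page": sum(1 for f in leads if f.get("has_careers_page")),
    }


def qualified_sorted(leads):
    """Qualified leads only, highest lead_score first (as display_results shows them)."""
    qualified = [f for f in leads if f.get("qualified")]
    return sorted(qualified, key=lambda f: f.get("lead_score", 0), reverse=True)


# Hypothetical records shaped like qualify_leads() output (fields trimmed)
leads = [
    {"name": "A", "phone": "512-555-0142", "email": "", "has_careers_page": True,
     "qualified": True, "lead_score": 5},
    {"name": "B", "phone": "", "email": "hr@b.example.org", "has_careers_page": False,
     "qualified": False, "lead_score": 2},
    {"name": "C", "phone": "512-555-0199", "email": "jobs@c.example.org",
     "has_careers_page": True, "qualified": True, "lead_score": 11},
]

print(summarize(leads))
# {'Total Facilities': 3, 'Qualified Leads': 2, 'With Phone': 2, 'With Email': 2, 'With Careers Page': 2}
print([f["name"] for f in qualified_sorted(leads)])  # ['C', 'A']
```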

# 📧 Outreach Templates:

Includes pre-written email templates for:

- Initial contact
- Follow-up emails
- Careers page outreach
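Each template is a string with a `{facility_name}` placeholder, so filling one in is a single `str.format` call. The sketch below also picks a template from the lead's attributes. That selection policy, the `pick_template`/`render_email` names, and the shortened subject-line stand-ins are all hypothetical.

```python
# Shortened stand-ins for the notebook's templates (subject lines only); the
# full versions from create_outreach_templates() use the same placeholder.
TEMPLATES = {
    "initial_contact": "Subject: Staffing Solutions for {facility_name} - Qualified Nurses Available",
    "careers_page_contact": "Subject: Nursing Staffing Partnership Opportunity - {facility_name}",
    "follow_up": "Subject: Follow-up: Nursing Staffing Solutions for {facility_name}",
}


def pick_template(lead, already_contacted=False):
    """Choose a template key for a lead (hypothetical selection policy)."""
    if already_contacted:
        return "follow_up"
    if lead.get("has_careers_page"):
        return "careers_page_contact"
    return "initial_contact"


def render_email(lead, templates=TEMPLATES, already_contacted=False):
    """Fill the chosen template; fall back to a generic name if none was found."""
    key = pick_template(lead, already_contacted)
    return templates[key].format(facility_name=lead.get("name") or "your facility")


print(render_email({"name": "Oak Grove Care Center", "has_careers_page": True}))
# Subject: Nursing Staffing Partnership Opportunity - Oak Grove Care Center
print(render_email({"name": "Oak Grove Care Center"}, already_contacted=True))
# Subject: Follow-up: Nursing Staffing Solutions for Oak Grove Care Center
```

In the notebook, passing `templates=templates` (the dict returned by `create_outreach_templates()`) would fill in the full email bodies. Their only placeholder is `{facility_name}`, so `str.format` works the same way.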

# 🚀 How to Use:

1. Run the notebook in Google Colab.
2. Execute: `df, qualified_df = run_lead_generation('Your City, State', max_facilities=20)`
3. View results: `display_results(df)`
4. Export: `export_results(df, 'healthcare_leads.xlsx')`

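The system pauses for 1 second after each Google query and each page request, and it passes `sleep_interval=2` to the search library, to limit load on the sites it visits. It does not check `robots.txt` or site terms of service, so review those before running it at scale. It focuses specifically on healthcare facilities that are likely to need nursing staff, which keeps it highly targeted for your use case.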
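The delays limit the request rate but do not respect a site's `robots.txt`. The sketch below adds a `robots.txt` check using the standard library's `urllib.robotparser`. It parses an inline example file so that it runs offline. The `USER_AGENT` string and the robots.txt content are hypothetical.

```python
from urllib import robotparser

# Example robots.txt content (hypothetical). Against a live site you would call
# rp.set_url("https://<site>/robots.txt") and then rp.read() instead of parse().
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

USER_AGENT = "HealthcareLeadBot"  # hypothetical; use the User-Agent your session sends

print(rp.can_fetch(USER_AGENT, "https://www.example.org/careers/"))     # True
print(rp.can_fetch(USER_AGENT, "https://www.example.org/admin/login"))  # False
print(rp.crawl_delay(USER_AGENT))                                       # 5
```

A check like this could gate each URL in `process_facilities` before `extract_facility_info` runs. When a site declares a `Crawl-delay`, `crawl_delay()` could replace the fixed 1-second sleep.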
Would you like me to add any specific features or modify the search criteria for particular types of healthcare facilities?

In [1]:
# Healthcare Lead Generation & Outreach Agent
# Focused on Nursing Homes, Hospitals, and Healthcare Facilities

# Install required packages
!pip install requests beautifulsoup4 pandas selenium webdriver-manager lxml openpyxl googlesearch-python

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import time
import json
from urllib.parse import urljoin, urlparse
import warnings
warnings.filterwarnings('ignore')

# Data storage
leads_data = []
processed_urls = set()

class HealthcareLeadGenerator:
    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        })
        self.leads = []

    def search_google_for_facilities(self, location="", facility_type="nursing home"):
        """Search Google for healthcare facilities"""
        try:
            from googlesearch import search

            queries = [
                f"{facility_type} {location} hiring nurses",
                f"{facility_type} {location} job openings",
                f"{facility_type} {location} careers",
                f"{facility_type} {location} contact"
            ]

            urls = []
            for query in queries:
                try:
                    search_results = search(query, num_results=10, sleep_interval=2)
                    urls.extend(list(search_results))
                    time.sleep(1)
                except Exception as e:
                    print(f"Search error for '{query}': {e}")
                    continue

            return list(dict.fromkeys(urls))  # Remove duplicates, keep search-rank order

        except ImportError:
            print("Google search not available. Using manual URL collection.")
            return []

    def extract_facility_info(self, url):
        """Extract facility information from a website"""
        try:
            response = self.session.get(url, timeout=10)
            response.raise_for_status()
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract basic information
            title = soup.find('title').text.strip() if soup.find('title') else ""

            # Look for contact information
            contact_info = self.extract_contact_info(soup)

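            # Look for job/career indicators in the page text rather than the raw
            # HTML source, whose markup and scripts produce false keyword matches
            job_indicators = self.find_job_indicators(soup, soup.get_text(" "))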

            # Extract address information
            address_info = self.extract_address_info(soup)

            facility_data = {
                'name': self.extract_facility_name(soup, title),
                'url': url,
                'title': title,
                'phone': contact_info.get('phone', ''),
                'email': contact_info.get('email', ''),
                'address': address_info.get('address', ''),
                'city': address_info.get('city', ''),
                'state': address_info.get('state', ''),
                'zip_code': address_info.get('zip', ''),
                'facility_type': self.determine_facility_type(title, soup.get_text(" ")),
                'has_careers_page': job_indicators.get('has_careers', False),
                'hiring_nurses': job_indicators.get('hiring_nurses', False),
                'job_keywords_found': job_indicators.get('keywords', []),
                'social_media': self.extract_social_media(soup),
                'description': self.extract_description(soup)
            }

            return facility_data

        except Exception as e:
            print(f"Error processing {url}: {e}")
            return None

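    def extract_contact_info(self, soup):
        """Extract the first phone number and email address from a webpage"""
        # A separator keeps text from adjacent elements from running together
        text = soup.get_text(" ")

        # US phone numbers: (555) 123-4567, 555-123-4567, 555.123.4567, 5551234567.
        # One combined pattern keeps matches in page order.
        phone_pattern = r'\(\d{3}\)\s*\d{3}[-.\s]?\d{4}|\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'
        phones = re.findall(phone_pattern, text)

        # Prefer explicit mailto: links, then fall back to addresses in the text
        emails = [a['href'][len('mailto:'):].split('?')[0].strip()
                  for a in soup.find_all('a', href=re.compile(r'^mailto:', re.I))]

        # TLD must be letters only ([A-Z|a-z] would also have allowed a literal '|')
        email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
        emails.extend(re.findall(email_pattern, text))

        return {
            'phone': phones[0] if phones else '',
            'email': emails[0] if emails else ''
        }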

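    def extract_address_info(self, soup):
        """Extract address, state, and ZIP code information"""
        # Look for elements whose class suggests an address or contact block
        address_elements = soup.find_all(['address', 'div', 'p'],
                                         class_=re.compile(r'address|location|contact', re.I))
        address_text = " ".join(elem.get_text(" ", strip=True) for elem in address_elements)

        # Fall back to the whole page text for state/ZIP matching only
        search_text = address_text or soup.get_text(" ")

        # Match a state abbreviation followed by a ZIP code (e.g. "NY 10001") and check it
        # against real state codes; a bare [A-Z]{2} also matches "RN", "OR", "US", etc.
        us_states = set(
            "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
            "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY DC".split()
        )
        state, zip_code = '', ''
        for match in re.finditer(r'\b([A-Z]{2}),?\s+(\d{5}(?:-\d{4})?)\b', search_text):
            if match.group(1) in us_states:
                state, zip_code = match.group(1), match.group(2)
                break

        return {
            'address': address_text[:200],  # Truncated; empty if no address block was found
            'state': state,
            'zip': zip_code,
            'city': ''  # City extraction is not implemented
        }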

    def find_job_indicators(self, soup, text):
        """Look for job and career indicators"""
        job_keywords = [
            'hiring', 'jobs', 'careers', 'employment', 'positions',
            'rn', 'nurse', 'nursing', 'lpn', 'cna', 'staff', 'shift'
        ]

        nursing_keywords = [
            'registered nurse', 'rn', 'lpn', 'cna', 'nursing assistant',
            'shift nurse', 'staff nurse', 'charge nurse'
        ]

        text_lower = text.lower()

        # Match whole words/phrases only: a plain substring test lets 'rn' match
        # "return" or "modern", which flags almost every page as hiring nurses
        def contains(keyword):
            return re.search(r'\b' + re.escape(keyword) + r'\b', text_lower) is not None

        found_keywords = [kw for kw in job_keywords if contains(kw)]
        nursing_found = any(contains(kw) for kw in nursing_keywords)

        # Look for career/job links
        career_links = soup.find_all('a', href=re.compile(r'career|job|employment', re.I))
        has_careers = len(career_links) > 0

        return {
            'has_careers': has_careers,
            'hiring_nurses': nursing_found,
            'keywords': found_keywords
        }

    def extract_facility_name(self, soup, title):
        """Extract facility name from various sources"""
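        # Try h1 tag first (skip empty h1s, e.g. ones that only wrap a logo image)
        h1 = soup.find('h1')
        if h1 and h1.get_text(strip=True):
            return h1.get_text(strip=True)

        # Try title tag
        if title:
            # Drop site suffixes such as " | Careers" or " - Home" while keeping
            # hyphenated names like "Long-Term Care"
            name = re.split(r'\s*\|\s*|\s+[-–]\s+', title)[0].strip()
            if name:
                return name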

        # Try meta property
        og_title = soup.find('meta', property='og:title')
        if og_title:
            return og_title.get('content', '').strip()

        return "Unknown Facility"

    def determine_facility_type(self, title, text):
        """Determine the type of healthcare facility"""
        text_lower = (title + " " + text).lower()

        if any(term in text_lower for term in ['nursing home', 'skilled nursing', 'long term care']):
            return 'Nursing Home'
        elif any(term in text_lower for term in ['hospital', 'medical center', 'health system']):
            return 'Hospital'
        elif any(term in text_lower for term in ['assisted living', 'senior living', 'retirement']):
            return 'Assisted Living'
        elif any(term in text_lower for term in ['clinic', 'medical clinic']):
            return 'Clinic'
        elif any(term in text_lower for term in ['home health', 'home care']):
            return 'Home Health'
        else:
            return 'Healthcare Facility'

    def extract_social_media(self, soup):
        """Extract social media links"""
        social_links = {}
        social_patterns = {
            'facebook': r'facebook\.com',
            'linkedin': r'linkedin\.com',
            'twitter': r'twitter\.com',
            'instagram': r'instagram\.com'
        }

        links = soup.find_all('a', href=True)
        for link in links:
            href = link['href']
            for platform, pattern in social_patterns.items():
                if re.search(pattern, href, re.I):
                    social_links[platform] = href
                    break

        return social_links

    def extract_description(self, soup):
        """Extract facility description"""
        # Try meta description first
        meta_desc = soup.find('meta', attrs={'name': 'description'})
        if meta_desc:
            return meta_desc.get('content', '').strip()

        # Try first paragraph
        paragraphs = soup.find_all('p')
        if paragraphs:
            return paragraphs[0].get_text().strip()[:200]

        return ""

# Initialize the lead generator
lead_gen = HealthcareLeadGenerator()

# Sample healthcare facility URLs for testing
sample_urls = [
    "https://www.sunriseseniorliving.com/",
    "https://www.brookdale.com/",
    "https://www.gentiva.com/",
    "https://www.goldenliving.com/",
    "https://www.amedisys.com/"
]

print("Healthcare Lead Generation & Outreach Agent")
print("=" * 50)

# Function to search for facilities by location and type
def search_facilities(location="New York", facility_types=None):
    """Search for healthcare facilities in a specific location"""
    if facility_types is None:
        facility_types = ["nursing home", "hospital", "assisted living", "senior care"]

    all_urls = []

    for facility_type in facility_types:
        print(f"\nSearching for {facility_type} facilities in {location}...")
        urls = lead_gen.search_google_for_facilities(location, facility_type)
        all_urls.extend(urls)
        print(f"Found {len(urls)} potential leads for {facility_type}")

    return list(dict.fromkeys(all_urls))  # Remove duplicates, keep search-rank order

# Function to process URLs and extract facility data
def process_facilities(urls, max_facilities=20):
    """Process a list of URLs to extract facility information"""
    facilities = []

    print(f"\nProcessing up to {max_facilities} facilities...")

    for i, url in enumerate(urls[:max_facilities]):
        if url in processed_urls:
            continue

        print(f"Processing {i+1}/{min(len(urls), max_facilities)}: {url}")

        facility_data = lead_gen.extract_facility_info(url)
        if facility_data:
            facilities.append(facility_data)
            processed_urls.add(url)

        # Add delay to be respectful to websites
        time.sleep(1)

    return facilities

# Function to qualify leads
def qualify_leads(facilities):
    """Qualify leads based on various criteria"""
    qualified_leads = []

    for facility in facilities:
        score = 0
        qualification_notes = []

        # Scoring criteria
        if facility['phone']:
            score += 2
            qualification_notes.append("Has phone number")

        if facility['email']:
            score += 2
            qualification_notes.append("Has email")

        if facility['has_careers_page']:
            score += 3
            qualification_notes.append("Has careers page")

        if facility['hiring_nurses']:
            score += 4
            qualification_notes.append("Actively hiring nurses")

        if facility['social_media']:
            score += 1
            qualification_notes.append("Has social media presence")

        if len(facility['job_keywords_found']) > 2:
            score += 2
            qualification_notes.append("Multiple job-related keywords found")

        facility['lead_score'] = score
        facility['qualification_notes'] = qualification_notes
        facility['qualified'] = score >= 4

        if facility['qualified']:
            qualified_leads.append(facility)

    return qualified_leads

# Function to create outreach templates
def create_outreach_templates():
    """Create email templates for outreach"""
    templates = {
        'initial_contact': """
Subject: Staffing Solutions for {facility_name} - Qualified Nurses Available

Dear Hiring Manager,

I hope this email finds you well. I'm reaching out regarding potential staffing needs at {facility_name}.

We specialize in providing qualified nursing professionals for healthcare facilities, including:
- Registered Nurses (RN)
- Licensed Practical Nurses (LPN)
- Certified Nursing Assistants (CNA)
- Shift and temporary staffing solutions

Our nurses are thoroughly vetted, licensed, and ready to support your facility's needs. We understand the challenges of maintaining adequate staffing levels while ensuring quality patient care.

Would you be interested in learning more about our staffing solutions? I'd be happy to discuss how we can support {facility_name}.

Best regards,
[Your Name]
[Your Contact Information]
        """,

        'follow_up': """
Subject: Follow-up: Nursing Staffing Solutions for {facility_name}

Dear Hiring Manager,

I wanted to follow up on my previous email regarding nursing staffing solutions for {facility_name}.

Given the current healthcare staffing challenges, many facilities are finding value in having reliable backup staffing options. Our services include:
- Emergency shift coverage
- Seasonal staffing support
- Specialized nursing skills
- Flexible scheduling options

Would you have 10 minutes this week for a brief conversation about your current staffing needs?

Best regards,
[Your Name]
        """,

        'careers_page_contact': """
Subject: Nursing Staffing Partnership Opportunity - {facility_name}

Dear Hiring Team,

I noticed that {facility_name} has an active careers page, which suggests you may have ongoing staffing needs.

We partner with healthcare facilities to provide qualified nursing staff when needed. This can help you:
- Reduce overtime costs
- Maintain quality patient care during staff shortages
- Access specialized nursing skills
- Provide flexibility during peak periods

Would you be open to exploring a partnership discussion?

Best regards,
[Your Name]
        """
    }

    return templates

# Main execution functions
def run_lead_generation(location="New York", max_facilities=10):
    """Run the complete lead generation process"""
    print("Starting Healthcare Lead Generation Process...")

    # Step 1: Search for facilities
    facility_urls = search_facilities(location)
    print(f"\nTotal unique URLs found: {len(facility_urls)}")

    # Step 2: Process facilities
    facilities = process_facilities(facility_urls, max_facilities)
    print(f"\nSuccessfully processed {len(facilities)} facilities")

    # Step 3: Qualify leads
    qualified_leads = qualify_leads(facilities)
    print(f"\nQualified {len(qualified_leads)} leads out of {len(facilities)} total")

    # Step 4: Create DataFrame for analysis
    df = pd.DataFrame(facilities)
    qualified_df = pd.DataFrame(qualified_leads)

    return df, qualified_df

# Function to export results
def export_results(df, filename="healthcare_leads.xlsx"):
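    """Export results to an Excel file with All_Leads, Qualified_Leads, and Summary sheets"""
    if df.empty:
        print("No results to export.")
        return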
    try:
        # Create Excel writer object
        with pd.ExcelWriter(filename, engine='openpyxl') as writer:
            # All leads
            df.to_excel(writer, sheet_name='All_Leads', index=False)

            # Qualified leads only
            qualified_df = df[df['qualified'] == True]
            qualified_df.to_excel(writer, sheet_name='Qualified_Leads', index=False)

            # Summary statistics
            summary_data = {
                'Metric': ['Total Facilities', 'Qualified Leads', 'With Phone', 'With Email', 'With Careers Page'],
                'Count': [
                    len(df),
                    len(qualified_df),
                    len(df[df['phone'] != '']),
                    len(df[df['email'] != '']),
                    len(df[df['has_careers_page'] == True])
                ]
            }
            summary_df = pd.DataFrame(summary_data)
            summary_df.to_excel(writer, sheet_name='Summary', index=False)

        print(f"Results exported to {filename}")

    except Exception as e:
        print(f"Error exporting results: {e}")

# Function to display results
def display_results(df):
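    """Display results in a formatted way"""
    # An empty run yields a DataFrame with no columns, so guard before indexing
    if df.empty or 'qualified' not in df.columns:
        print("No facilities to display. Try another location or a larger max_facilities.")
        return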
    print("\n" + "="*80)
    print("HEALTHCARE LEAD GENERATION RESULTS")
    print("="*80)

    print(f"\nTotal Facilities Processed: {len(df)}")
    print(f"Qualified Leads: {len(df[df['qualified'] == True])}")
    print(f"Facilities with Phone: {len(df[df['phone'] != ''])}")
    print(f"Facilities with Email: {len(df[df['email'] != ''])}")
    print(f"Facilities with Careers Page: {len(df[df['has_careers_page'] == True])}")

    print("\n" + "-"*80)
    print("TOP QUALIFIED LEADS:")
    print("-"*80)

    qualified_leads = df[df['qualified'] == True].sort_values('lead_score', ascending=False)

    for idx, lead in qualified_leads.head(10).iterrows():
        print(f"\n{lead['name']}")
        print(f"Type: {lead['facility_type']}")
        print(f"URL: {lead['url']}")
        print(f"Phone: {lead['phone']}")
        print(f"Email: {lead['email']}")
        print(f"Lead Score: {lead['lead_score']}")
        print(f"Notes: {', '.join(lead['qualification_notes'])}")
        print("-" * 40)

# Example usage
print("Healthcare Lead Generation System Ready!")
print("\nTo run lead generation, use:")
print("df, qualified_df = run_lead_generation('Your City, State', max_facilities=20)")
print("\nTo display results, use:")
print("display_results(df)")
print("\nTo export results, use:")
print("export_results(df, 'your_filename.xlsx')")

# Get outreach templates
templates = create_outreach_templates()
print("\nOutreach templates are available in the 'templates' variable")
print("Template types:", list(templates.keys()))
