High-performance async web scraper for extracting real estate listings and phone numbers from multiple sources.

Ismat-Samadov/lead_generator

Lead Generator

Full-stack application for scraping and managing real estate leads from EVV.AZ and Villa.AZ.

Project Structure

.
├── scraper/              # Python scraper backend
│   ├── sources/          # Website-specific scrapers
│   ├── scripts/          # Utilities (validator, telegram)
│   └── main.py           # Scraper orchestrator
│
├── app/                  # Next.js frontend
│   ├── api/              # API routes
│   ├── dashboard/        # User dashboard
│   ├── admin/            # Admin panel
│   └── login/            # Authentication
│
├── lib/                  # Shared utilities
│   ├── db.ts             # Database connection
│   └── auth.ts           # Authentication helpers
│
└── components/           # React components

Features

Backend (Python Scraper)

  • Automated daily scraping from EVV.AZ and Villa.AZ
  • Phone number validation (Azerbaijan mobile numbers)
  • Telegram notifications with detailed reports
  • PostgreSQL storage with duplicate prevention
  • GitHub Actions for automated scheduling
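The phone validation step mentioned above could look roughly like the sketch below. It is not the repository's validator: the function name `normalize_az_mobile` and the `MOBILE_PREFIXES` set are assumptions for illustration, and the prefix list should be checked against the current Azerbaijani numbering plan.

```python
import re
from typing import Optional

# Assumed two-digit mobile operator prefixes (hypothetical list, verify before use).
MOBILE_PREFIXES = {"10", "50", "51", "55", "60", "70", "77", "99"}


def normalize_az_mobile(raw: str) -> Optional[str]:
    """Return the number as +994XXXXXXXXX, or None if it is not a mobile number."""
    digits = re.sub(r"\D", "", raw)  # drop '+', spaces, dashes, brackets
    if len(digits) == 12 and digits.startswith("994"):
        national = digits[3:]        # international form: 994 + 9 digits
    elif len(digits) == 10 and digits.startswith("0"):
        national = digits[1:]        # local form: 0 + 9 digits
    else:
        national = digits            # bare 9-digit form, or invalid input
    if len(national) != 9 or national[:2] not in MOBILE_PREFIXES:
        return None
    return "+994" + national
```

Normalizing to a single canonical form before storage also helps duplicate prevention, since the same number written in two styles maps to one value.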

Frontend (Next.js)

  • Admin authentication with NextAuth.js
  • User management (CRUD operations)
  • Leads dashboard with statistics
  • Search and pagination
  • Excel export functionality
  • Role-based access control (Admin/User)

Quick Start

1. Database Setup

# Initialize leads table (quote the URL so characters such as ? and & are passed through unchanged)
psql "$DATABASE_URL" -f scraper/init_db.sql

# Initialize users table
psql "$DATABASE_URL" -f init_users_db.sql

2. Frontend Development

# Install dependencies
npm install

# Set up environment
cp .env.example .env.local
# Edit .env.local with your credentials

# Run development server
npm run dev

Visit http://localhost:3000

Default credentials: admin / admin123

3. Backend (Scraper)

cd scraper

# Local testing
docker compose up scraper

# Or with Python
python main.py

Environment Variables

Frontend (.env.local)

DATABASE_URL=postgresql://...
NEXTAUTH_URL=http://localhost:3000
NEXTAUTH_SECRET=generate-with-openssl-rand-base64-32
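If `openssl` is not available, an equivalent secret can be produced with the Python standard library. This is a minimal sketch; the function name `generate_nextauth_secret` is illustrative and not part of the repository.

```python
import base64
import secrets


def generate_nextauth_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into .env.local as NEXTAUTH_SECRET.
    print(generate_nextauth_secret())
```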

Backend (scraper/.env)

DATABASE_URL=postgresql://...
TELEGRAM_BOT_TOKEN=your-bot-token
TELEGRAM_CHAT_ID=your-chat-id

Deployment

See DEPLOYMENT.md for detailed deployment instructions.

Quick Deploy to Vercel

# Install Vercel CLI
npm install -g vercel

# Deploy
vercel

Don't forget to:

  1. Initialize the users database
  2. Set the environment variables in Vercel
  3. Generate a secure NEXTAUTH_SECRET
  4. Update NEXTAUTH_URL to the production domain

User Roles

Admin

  • Full access to all features
  • Create/update/delete users
  • View and export all leads
  • Access to admin panel

User

  • View leads dashboard
  • Search and filter leads
  • Export data to Excel
  • No user management access

API Endpoints

Authentication

  • POST /api/auth/signin - Login
  • POST /api/auth/signout - Logout

Users (Admin only)

  • GET /api/users - List all users
  • POST /api/users - Create user
  • PUT /api/users - Update user
  • DELETE /api/users?id=X - Delete user

Leads

  • GET /api/leads - Get leads (paginated)
  • GET /api/leads/export - Export to Excel
  • GET /api/stats - Dashboard statistics

Scraper Schedule

GitHub Actions runs the scraper:

  • Daily at 13:00 UTC (5:00 PM Azerbaijan Time, UTC+4)
  • Manual trigger available via GitHub Actions UI
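GitHub Actions cron schedules are expressed in UTC, so it can help to check the local run time explicitly. The sketch below assumes Azerbaijan Time is a fixed UTC+4 offset with no daylight saving; the helper name `utc_to_baku` is illustrative.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: Azerbaijan Time (AZT) is a fixed UTC+4 offset, no DST.
AZT = timezone(timedelta(hours=4), name="AZT")


def utc_to_baku(utc_time: time) -> time:
    """Convert a wall-clock time in UTC to the wall-clock time in Baku."""
    today = datetime.now(timezone.utc).date()
    moment = datetime.combine(today, utc_time, tzinfo=timezone.utc)
    return moment.astimezone(AZT).time()


if __name__ == "__main__":
    print(utc_to_baku(time(13, 0)))
```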

Tech Stack

Frontend

  • Next.js 14 (App Router)
  • TypeScript
  • Tailwind CSS
  • NextAuth.js (Authentication)
  • XLSX (Excel export)

Backend

  • Python 3.11
  • aiohttp (Async HTTP)
  • BeautifulSoup + lxml (HTML parsing)
  • PostgreSQL (psycopg2)
  • Docker
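Async scraping usually means fetching many pages concurrently while capping the number of requests in flight. The sketch below shows that pattern with only the standard library; `fetch_all` and its `fetch` parameter are hypothetical names, and with aiohttp the `fetch` coroutine would typically wrap a `session.get(url)` call. It is not the repository's orchestrator.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> Dict[str, str]:
    """Fetch every URL concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str):
        async with semaphore:  # wait here if too many requests are active
            return url, await fetch(url)

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

Capping concurrency keeps the load on the source websites modest, which also reduces the chance of being rate limited.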

Infrastructure

  • Vercel (Frontend hosting)
  • Neon Tech (PostgreSQL)
  • GitHub Actions (Automation)
  • Telegram Bot (Notifications)

Database Schema

leads

id, phone_number (unique), source_url, scraped_at
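Because phone_number is unique, duplicate prevention can be delegated to PostgreSQL with `ON CONFLICT DO NOTHING`. The sketch below is one way to do it, not necessarily how the scraper does: `save_leads` is a hypothetical helper, it accepts any DB-API cursor (for example one from psycopg2), and it assumes scraped_at is filled by a column default.

```python
from typing import Iterable, Tuple

# Assumption: scraped_at has a DEFAULT (e.g. NOW()) and is therefore omitted.
INSERT_LEAD_SQL = """
    INSERT INTO leads (phone_number, source_url)
    VALUES (%s, %s)
    ON CONFLICT (phone_number) DO NOTHING
"""


def save_leads(cursor, leads: Iterable[Tuple[str, str]]) -> int:
    """Insert (phone_number, source_url) pairs and return how many were new."""
    inserted = 0
    for phone_number, source_url in leads:
        cursor.execute(INSERT_LEAD_SQL, (phone_number, source_url))
        # rowcount is 1 for a new row and 0 when the conflict clause skipped it.
        inserted += cursor.rowcount
    return inserted
```

The returned count gives the number of new leads in a run, which is a natural figure to include in a notification report.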

users

id, username (unique), password_hash, role, is_active, created_at, last_login

sessions

id, user_id, session_token, expires, created_at

GitHub Secrets Required

For automated scraping:

  • DATABASE_URL - PostgreSQL connection string
  • TELEGRAM_BOT_TOKEN - Bot token from @BotFather
  • TELEGRAM_CHAT_ID - Chat/group ID(s) for notifications
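As an illustration of how these secrets might be consumed, the sketch below sends one message per chat through the Telegram Bot API `sendMessage` method. It assumes that multiple chat IDs are given as a comma-separated list, which the README does not confirm; `parse_chat_ids`, `build_request`, and `notify` are hypothetical names, not the contents of scripts/telegram.py.

```python
import json
import os
import urllib.request
from typing import List

API_URL = "https://api.telegram.org/bot{token}/sendMessage"


def parse_chat_ids(raw: str) -> List[str]:
    """Split a comma-separated ID list (assumed format), ignoring blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def build_request(token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request for the sendMessage method."""
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(token=token),
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify(text: str) -> None:
    """Send text to every chat listed in TELEGRAM_CHAT_ID."""
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    for chat_id in parse_chat_ids(os.environ["TELEGRAM_CHAT_ID"]):
        request = build_request(token, chat_id, text)
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
```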

Development

Install Dependencies

# Frontend
npm install

# Backend
cd scraper
pip install -r requirements.txt

Run Tests

# Test scraper locally
cd scraper
python main.py

# Test Telegram notifications
python scripts/telegram.py

Get Telegram Chat ID

cd scraper
python scripts/get_chat_id.py

Troubleshooting

Database Connection Issues

  • Verify that DATABASE_URL is correct
  • Check that SSL mode is enabled (for example, sslmode=require in the connection string)
  • Ensure the database accepts connections from your network

Authentication Not Working

  • Verify NEXTAUTH_SECRET is set
  • Check NEXTAUTH_URL matches current domain
  • Clear browser cookies

Scraper Failures

  • Check phone validation rules
  • Verify website structure hasn't changed
  • Review GitHub Actions logs

License

Proprietary - All rights reserved

Support

For issues and questions, contact the development team.
