An advanced web application that allows users to input a URL, scrape data from the provided website, and download the data as a CSV file. The application automatically deletes the CSV file from the server after download.
- Modern, responsive web interface with dark mode support
- Real-time progress updates during scraping
- Data preview before download
- Advanced scraping options:
  - Multi-level scraping depth
  - Pagination support
  - Wait time configuration
  - Image inclusion options
- Automatic data cleaning and formatting
- CSV export with one-click download
- Automatic file deletion after download
- Built with FastAPI for high performance
- Email Marketing Features:
  - Send personalized emails to scraped contacts
  - Template system with variable substitution
  - Real-time email sending status updates
  - SMTP server configuration options
- Modern UI with dark mode support
- Real-time progress tracking
- Data preview functionality
- Advanced configuration options
- Email marketing interface
- Clone the repository
- Create a virtual environment:
  python -m venv venv
- Activate the virtual environment:
  - Windows: venv\Scripts\activate
  - macOS/Linux: source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Install Chrome browser (if not already installed). The scraper uses Chrome, so make sure it is installed on your system.
- Run the application:
  uvicorn app:app --host 0.0.0.0 --port 8000 --reload
- Access the web interface: open your browser and navigate to http://127.0.0.1:8000
- Enter the URL of the website you want to scrape in the input field
- Configure advanced options if needed:
  - Scraping depth: Choose how many levels deep to scrape
  - Wait time: Set the delay between requests to avoid rate limiting
  - Follow pagination: Enable to scrape multiple pages
  - Include images: Option to include image data in results
- Click the "Scrape Data" button to start scraping
- Monitor real-time progress with detailed status updates
- Once scraping is complete, preview the data before downloading
- Click the "Download CSV" button to download the data
- The CSV file will be automatically deleted from the server after download
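
The scraping depth and wait time options above can be pictured as a depth-limited, breadth-first walk over links with a pause between requests. The following is only a minimal sketch of that idea, not the application's actual scraper: `crawl` and `get_links` are hypothetical names, the in-memory `SITE` dictionary stands in for real pages, and the way depth is counted here (0 means the start page only) is an assumption.

```python
import time
from collections import deque
from typing import Callable, Iterable


def crawl(
    start_url: str,
    get_links: Callable[[str], Iterable[str]],
    max_depth: int = 1,
    wait_seconds: float = 0.0,
) -> list[str]:
    """Visit pages breadth-first, up to max_depth levels below start_url.

    get_links is a hypothetical callback returning the links found on a page;
    in the real application this work is done by the browser-based scraper.
    """
    visited: list[str] = []
    seen = {start_url}
    queue = deque([(start_url, 0)])

    while queue:
        url, depth = queue.popleft()
        if visited and wait_seconds > 0:
            time.sleep(wait_seconds)  # pause between requests
        visited.append(url)

        if depth >= max_depth:
            continue  # do not follow links past the configured depth
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

    return visited


# Usage with an in-memory "site" instead of real HTTP requests.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": ["https://example.com/"],
}

pages = crawl("https://example.com/", lambda u: SITE.get(u, []), max_depth=1)
print(pages)
```

With `max_depth=1` the walk covers the start page and the pages it links to directly; raising the depth to 2 would also reach `https://example.com/a/1`. A larger `wait_seconds` trades speed for a lower chance of being rate limited.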
- After scraping data, scroll down to the "Send Emails to Contacts" section
- Enter your email credentials and SMTP server information
- Compose your email subject and body
- Use placeholders like {{Company Name}} or {{Email}} to personalize your message
- Click "Send Emails" to start the email campaign
- Monitor real-time progress of email sending
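
To illustrate how placeholders such as {{Company Name}} might be filled in for each contact, here is a minimal sketch using only the Python standard library. It assumes placeholder names match CSV column names exactly and that unknown placeholders are left untouched; `render` and `build_message` are hypothetical helpers, not the application's own functions, and the addresses are made up.

```python
import re
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Matches {{Column Name}}, tolerating spaces just inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(.+?)\s*\}\}")


def render(template: str, row: dict[str, str]) -> str:
    """Replace each {{Column Name}} with the value from one CSV row.

    Unknown placeholders are left as-is (an assumption; the application
    may handle them differently).
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(row[key]) if key in row else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


def build_message(sender: str, row: dict[str, str], subject: str, body: str) -> MIMEMultipart:
    """Build one personalized message for the contact described by row."""
    message = MIMEMultipart()
    message["From"] = sender
    message["To"] = row["Email"]
    message["Subject"] = render(subject, row)
    message.attach(MIMEText(render(body, row), "plain"))
    return message


# One scraped contact, as it might appear in a CSV row (made-up values).
row = {"Company Name": "Acme Ltd", "Email": "info@acme.example"}
msg = build_message(
    "me@example.com",
    row,
    subject="Hello {{Company Name}}",
    body="Your contact address on file is ({{Email}}). {{Unknown}}",
)
print(msg["Subject"])  # Hello Acme Ltd
```

The sketch only builds the message; actually sending it would go through an SMTP connection configured with the credentials entered in the form.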
FastAPI automatically generates interactive API documentation:
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
- GET /: Web interface
- POST /scrape: Start a new scraping job
- GET /job_status/{job_id}: Check the status of a job
- GET /preview/{job_id}: Get a preview of scraped data
- GET /download/{job_id}: Download the scraped data as CSV
- GET /jobs: List all active and recent jobs
- POST /send-emails/{job_id}: Send emails to contacts in the CSV
- GET /email-status/{job_id}/{email_job_id}: Check email sending status
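
As a rough sketch of how a script could use these endpoints, the snippet below polls GET /job_status/{job_id} until a job finishes. The response schema is not documented here, so the `"status"` field and its `"completed"` / `"failed"` values are assumptions; check the Swagger UI at /docs for the real shape. `wait_for_job` and `get_json` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://127.0.0.1:8000"


def get_json(url: str) -> dict:
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict] = get_json,
    poll_seconds: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Poll GET /job_status/{job_id} until the job finishes.

    ASSUMPTION: the response is JSON with a "status" field that eventually
    becomes "completed" or "failed". The real schema may differ.
    """
    url = f"{BASE_URL}/job_status/{job_id}"
    for _ in range(max_polls):
        status = fetch(url)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


# Usage with canned responses standing in for a running server.
responses = iter([{"status": "running"}, {"status": "completed"}])
result = wait_for_job("demo-job", fetch=lambda url: next(responses), poll_seconds=0)
print(result["status"])  # completed
```

Passing `fetch` as a parameter keeps the polling logic testable without a live server; against a running instance the default `get_json` would be used instead.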
- Frontend: HTML, CSS, JavaScript, Bootstrap 5
- Backend: FastAPI, Python 3.10+
- Scraping: Selenium, Undetected ChromeDriver
- Data Processing: Pandas
- Email: SMTP, MIMEText, MIMEMultipart
- The scraper is configured to work with websites that require login
- For websites with CAPTCHA, manual intervention may be required
- Scraping performance depends on website structure and size
- Respect website terms of service and robots.txt when scraping
- For email sending with Gmail, use an App Password; the older "Less secure app access" setting has been retired for most accounts
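
One way to respect robots.txt before scraping is Python's standard `urllib.robotparser` module. The sketch below parses an inline, made-up robots.txt so it runs without network access; against a real site you would call `set_url()` and `read()` on the parser instead. Whether this application performs such a check itself is not stated here, so treat this as a pre-flight check you could run yourself.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "https://example.com/products"))   # True
print(parser.can_fetch("*", "https://example.com/private/x"))  # False
print(parser.crawl_delay("*"))                                 # 5
```

If a site declares a crawl delay, it is a reasonable lower bound for the wait time option described earlier.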