Skip to content

Vancarii/gpt-web-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🔎 Lead Generation & Email Scraper

This is a Streamlit-based web application designed to help users generate leads by finding business websites and extracting verified email addresses. It combines the power of OpenAI's GPT for generating website lists and web scraping for email extraction and validation.


🚀 Features

1️⃣ Find Business Websites

  • Enter a search query (e.g., "AI companies in Canada").
  • Uses OpenAI's GPT to generate a list of relevant websites.
  • Download the list of websites for further use.

2️⃣ Extract & Verify Emails

  • Scrapes websites for email addresses.
  • Validates emails using DNS MX records to ensure they are real.
  • Saves the results as a clean CSV file.

3️⃣ Interactive UI

  • Built with Streamlit for an easy-to-use and visually appealing interface.
  • Includes tabs for finding websites and extracting emails.

🛠️ Installation

1. Clone the Repository

git clone https://github.com/your-username/your-repo-name.git
cd your-repo-name

2. Set up Virtual Environment

Create a virtual environment to manage dependencies

source .venv/bin/activate  # On macOS/Linux
# On Windows: .venv\Scripts\activate

3. Install Dependencies

  • Install the required Python packages:
pip install -r requirements.txt

4. Setup the OpenAI API Key

Set your OpenAI API key as an environment variable

export OPENAI_API_KEY="your_openai_api_key"

Running the app:

  • To run the application, run the following command in your terminal:
streamlit run webscraper.py
  • Open up the localhost port that it provides, and you're done!

About

A simple open ai web scraper built with streamlit

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages