JPOORNA/web-scraper-mysql-docker

Web Data Scraper with MySQL and Docker

This project demonstrates an end-to-end workflow: it extracts structured data (e.g., the product name and price of sourdough bread) from a simple HTML page and stores it in a MySQL database. The scraper is containerized with Docker for portable deployment.


Tech Stack

  • Python 3.10
  • BeautifulSoup (HTML parsing)
  • MySQL (Data storage)
  • Docker (Environment containerization)
  • Ubuntu / WSL2
  • VS Code

Use Case

A Python script extracts product data from a local HTML page and stores it in MySQL. The flow simulates scraping a bakery-style website: the script looks up a named item, such as Sourdough Bread, and saves its name and price.


🔄 How It Works

  1. HTML Structure:

    • index.html contains the product listing.
  2. Python Script:

    • Scrape.py uses BeautifulSoup to locate a specific item.
    • Connects to a local MySQL database.
    • Inserts extracted data into a defined table.
  3. Docker Integration:

    • The scraper runs in an isolated Docker container and reaches the host's MySQL server through host.docker.internal.
    • Dockerfile builds the image and installs all dependencies.
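
The parsing step described above can be sketched as follows. This is a minimal illustration, not the actual Scrape.py: the contents of index.html are not shown in this README, so the markup in `SAMPLE_HTML`, the `product`, `name`, and `price` class names, and the `find_price` helper are all assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real index.html is not shown in this README.
SAMPLE_HTML = """
<div class="product">
  <h2 class="name">Sourdough Bread</h2>
  <span class="price">200</span>
</div>
<div class="product">
  <h2 class="name">Baguette</h2>
  <span class="price">120</span>
</div>
"""


def find_price(html, wanted):
    """Return the price of the product named `wanted`, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Walk every product card and compare its name against the requested item.
    for product in soup.find_all("div", class_="product"):
        name = product.find("h2", class_="name").get_text(strip=True)
        if name == wanted:
            price_text = product.find("span", class_="price").get_text(strip=True)
            return int(price_text)
    return None


if __name__ == "__main__":
    price = find_price(SAMPLE_HTML, "Sourdough Bread")
    print(f"Sourdough Bread: {price}")
```

Returning None for a missing item lets the caller decide whether to skip the database insert or report an error.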

Getting Started

1. Clone the Repository

git clone https://github.com/JPOORNA/web-scraper-mysql-docker.git
cd web-scraper-mysql-docker

2. Update Scrape.py with Your MySQL Credentials

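import mysql.connector

# Replace these values with your own MySQL credentials.
conn = mysql.connector.connect(
    host="host.docker.internal",  # resolves to the Docker host from inside the container
    user="root",
    password="poorna@610",  # replace with your own password
    database="bakery"
)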
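
As an alternative to editing credentials directly in the script, they could be read from environment variables. The sketch below is one possible approach, not part of the repository: the `load_db_config` helper and the `MYSQL_*` variable names are hypothetical, and the defaults simply mirror the host, user, and database from the snippet above.

```python
import os


def load_db_config(env=os.environ):
    """Build mysql.connector.connect() keyword arguments from environment variables.

    The MYSQL_* variable names are hypothetical; the defaults mirror the
    host, user, and database shown in the README snippet.
    """
    password = env.get("MYSQL_PASSWORD")
    if password is None:
        # Fail early rather than attempting a connection without a password.
        raise RuntimeError("Set MYSQL_PASSWORD before running the scraper")
    return {
        "host": env.get("MYSQL_HOST", "host.docker.internal"),
        "user": env.get("MYSQL_USER", "root"),
        "password": password,
        "database": env.get("MYSQL_DATABASE", "bakery"),
    }


# In Scrape.py this could be used as:
#   conn = mysql.connector.connect(**load_db_config())
```

With this in place, the password can be supplied at run time using Docker's `-e` flag (for example, `docker run -e MYSQL_PASSWORD=... bakery-scraper`) instead of being committed to the repository.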

3. Build Docker Image

docker build -t bakery-scraper .

4. Run the Container

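docker run bakery-scraper

On Docker Desktop (including the WSL2 backend), host.docker.internal resolves to the host automatically. On a native Linux Docker Engine, add --add-host=host.docker.internal:host-gateway to the docker run command.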

✅ Expected Output

checking Sourdough Bread
Sourdough Bread: 200
data inserted

What’s Covered

  • Tag-based HTML parsing with BeautifulSoup
  • Host-to-container database communication via Docker
  • Local WSL2 setup with no cloud services required
  • Insertion of extracted data into MySQL as soon as it is scraped
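
The insertion step listed above might look like the sketch below. The table name `products`, its columns, and the `insert_product` helper are assumptions, since this README does not show the schema. The function accepts any DB-API 2.0 connection; the demo uses an in-memory SQLite database purely as a stand-in so it runs without a MySQL server.

```python
import sqlite3


def insert_product(conn, name, price, placeholder="%s"):
    """Insert one row into a hypothetical `products(name, price)` table.

    `conn` is any DB-API 2.0 connection. mysql.connector uses the "%s"
    placeholder; sqlite3 (used below only as a stand-in) uses "?".
    """
    sql = f"INSERT INTO products (name, price) VALUES ({placeholder}, {placeholder})"
    cursor = conn.cursor()
    try:
        # Values are passed separately so the driver handles escaping.
        cursor.execute(sql, (name, price))
        conn.commit()
    finally:
        cursor.close()


# Stand-in demo with an in-memory SQLite database, so no MySQL server is needed.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE products (name TEXT, price INTEGER)")
insert_product(demo, "Sourdough Bread", 200, placeholder="?")
rows = demo.execute("SELECT name, price FROM products").fetchall()
print(rows)
```

With a mysql.connector connection, the call would keep the default placeholder: `insert_product(conn, "Sourdough Bread", 200)`.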

Possible Enhancements

  • Automate runs with cron or a scheduler
  • Add a UI or dashboard using Flask / Streamlit
  • Connect scraped data to an analytics layer (Power BI / Pandas)
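
For the automation idea above, one simple option is an in-process loop using only the standard library. This is a rough sketch under stated assumptions: `run_periodically` and `scrape_once` are hypothetical names, and `scrape_once` stands in for the scrape-and-insert logic in Scrape.py.

```python
import time


def run_periodically(job, interval_seconds, max_runs):
    """Call `job()` `max_runs` times, sleeping `interval_seconds` between calls."""
    results = []
    for run in range(max_runs):
        results.append(job())
        # Skip the sleep after the final run so the function returns promptly.
        if run < max_runs - 1:
            time.sleep(interval_seconds)
    return results


def scrape_once():
    # Hypothetical stand-in for the scrape-and-insert logic in Scrape.py.
    return "data inserted"


print(run_periodically(scrape_once, interval_seconds=0.01, max_runs=3))
```

A cron entry on the host that invokes `docker run bakery-scraper` would achieve the same effect without keeping a container running between scrapes.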

👤 Author

Poorna Chandra
Python | DevOps | Cloud
🔗 GitHub: github.com/JPOORNA
🌐 LinkedIn: linkedin.com/in/yourprofile


✨ This setup helped build confidence working with Docker containers, database integrations, and local scraping logic without relying on AWS or other cloud platforms.

About

Extract product data using Python, store in MySQL, containerized with Docker
