This setup demonstrates an end-to-end approach to extracting structured data (e.g., the product name and price of sourdough bread) from a simple HTML page and storing it in a MySQL database, all containerized with Docker for portable deployment.
- Python 3.10
- BeautifulSoup (HTML parsing)
- MySQL (Data storage)
- Docker (Environment containerization)
- Ubuntu / WSL2
- VS Code
The project extracts and stores product data from a local HTML layout using Python. The flow simulates scraping a bakery-style website where items like Sourdough Bread are located and saved.
- **HTML Structure:**
  `index.html` contains the product listing.
- **Python Script:**
  - `Scrape.py` uses `BeautifulSoup` to locate a specific item (see the sketch after this list).
  - Connects to a local MySQL database.
  - Inserts extracted data into a defined table.
- **Docker Integration:**
  - Everything runs in an isolated Docker container.
  - The `Dockerfile` builds the image and installs all dependencies.
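As a rough illustration of the parsing step referenced above, a product entry in `index.html` might look like the following. The tag structure and the class names `product`, `name`, and `price` are assumptions for the sketch, not the repository's actual markup:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real index.html may use different tags/classes.
html = """
<div class="product">
  <span class="name">Sourdough Bread</span>
  <span class="price">200</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
product = soup.find("div", class_="product")
name = product.find("span", class_="name").get_text(strip=True)
price = product.find("span", class_="price").get_text(strip=True)
print(f"checking {name}")
print(f"{name}: {price}")
```

The same `find` pattern extends to any other product card on the page.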
Clone the repository:

```bash
git clone https://github.com/JPOORNA/web-scraper-mysql-docker.git
cd web-scraper-mysql-docker
```

The script connects to MySQL on the host machine from inside the container:

```python
conn = mysql.connector.connect(
    host="host.docker.internal",
    user="root",
    password="poorna@610",
    database="bakery"
)
```

Build and run the container:

```bash
docker build -t bakery-scraper .
docker run bakery-scraper
```

Expected output:

```
checking Sourdough Bread
Sourdough Bread: 200
data inserted
```
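The insert statement itself is not shown above; here is a minimal sketch of how the extracted row might be written, assuming a `products` table with `name` and `price` columns (both the table and column names are assumptions):

```python
import mysql.connector

conn = mysql.connector.connect(
    host="host.docker.internal",
    user="root",
    password="poorna@610",
    database="bakery",
)
cursor = conn.cursor()

# Parameterized insert; assumes a table like:
#   CREATE TABLE products (name VARCHAR(100), price INT);
cursor.execute(
    "INSERT INTO products (name, price) VALUES (%s, %s)",
    ("Sourdough Bread", 200),
)
conn.commit()           # persist the row
print("data inserted")  # matches the script's log line
cursor.close()
conn.close()
```

Using `%s` placeholders keeps the insert safe from injection even though the data here comes from a local page.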
- Tag-based HTML parsing with `BeautifulSoup`
- Host-to-container database communication via Docker
- WSL2 environment support without using the cloud
- Real-time insertion of extracted data into MySQL
- Add `cron` or `schedule` for automation (a sketch follows this list)
- Include a UI or dashboard using Flask / Streamlit
- Connect scraped data to an analytics layer (Power BI / Pandas)
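As a starting point for the `schedule`-based automation listed above, a minimal sketch; the `run_scraper` function name is a placeholder for the scrape-and-insert logic in `Scrape.py`:

```python
import time

import schedule  # pip install schedule


def run_scraper():
    # Placeholder for the scrape-and-insert logic in Scrape.py.
    print("running bakery scraper...")


# Run the scraper once a day at 08:00.
schedule.every().day.at("08:00").do(run_scraper)

# Keep the process alive, checking for due jobs once a minute.
while True:
    schedule.run_pending()
    time.sleep(60)
```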
Poorna Chandra
Python | DevOps | Cloud
🔗 GitHub: github.com/JPOORNA
🌐 LinkedIn: linkedin.com/in/yourprofile
✨ This setup helped build confidence working with Docker containers, database integrations, and local scraping logic without relying on AWS or cloud platforms.