# Automated Data Collection System (Python)

Web scraping project completed for the Master's in Data Science at Dalarna University.

This project demonstrates automated data collection techniques using Python web scraping. Three different scrapers were built to collect data from various online sources.
## Task 1

- File: `Task1.py`
- Output: `task1_output.txt`

Scrapes articles about Machine Learning and AI from:

- TechTarget
- IBM

Extracts headlines and full article content from both sources.
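The headline-and-body extraction pattern can be sketched as follows. The HTML and the `headline` / `article-body` selectors below are illustrative placeholders; the real TechTarget and IBM pages use different markup, so `Task1.py` would adapt the selectors per site.

```python
from bs4 import BeautifulSoup

# Illustrative stand-in for a fetched article page.
SAMPLE_HTML = """
<html><body>
  <article>
    <h1 class="headline">What Is Machine Learning?</h1>
    <div class="article-body">
      <p>Machine learning is a branch of AI.</p>
      <p>Models learn patterns from data.</p>
    </div>
  </article>
</body></html>
"""

def extract_article(html):
    """Return (headline, full_text) from an article page."""
    soup = BeautifulSoup(html, "html.parser")
    headline = soup.find("h1", class_="headline").get_text(strip=True)
    # Collect every paragraph inside the article body and join them.
    paragraphs = [p.get_text(strip=True) for p in soup.select("div.article-body p")]
    return headline, "\n".join(paragraphs)

headline, body = extract_article(SAMPLE_HTML)
print(headline)   # What Is Machine Learning?
```

In the actual script the HTML would come from `requests.get(url).text` rather than a string literal.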
## Task 2

- File: `task2.py`
- Output: `task2_products.csv`

Scrapes product information from the Books to Scrape website:

- Product names
- Prices
- Exports to CSV format
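A minimal sketch of the scrape-and-export step, using a snippet that mirrors the `product_pod` markup on books.toscrape.com (the full title lives in the link's `title` attribute, the price in `p.price_color`). The CSV is written to an in-memory buffer here for illustration; `task2.py` would write to `task2_products.csv` instead.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed-down copy of the Books to Scrape product listing markup.
SAMPLE_PAGE = """
<article class="product_pod">
  <h3><a title="A Light in the Attic" href="#">A Light in ...</a></h3>
  <p class="price_color">£51.77</p>
</article>
<article class="product_pod">
  <h3><a title="Tipping the Velvet" href="#">Tipping the ...</a></h3>
  <p class="price_color">£53.74</p>
</article>
"""

def parse_products(html):
    """Return a list of (name, price) tuples from a listing page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for pod in soup.select("article.product_pod"):
        name = pod.h3.a["title"]  # visible link text is truncated; title attr is full
        price = pod.select_one("p.price_color").get_text(strip=True)
        rows.append((name, price))
    return rows

products = parse_products(SAMPLE_PAGE)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["name", "price"])
writer.writerows(products)
csv_text = buf.getvalue()
```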
## Task 3

- File: `task3.py`
- Output: `task3_weather.txt`

Collects weather data from multiple sources:

- Wttr.in (weather API)
- TimeAndDate.com

Features fault-tolerant error handling that continues running even when sources fail.
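The fault-tolerant pattern boils down to wrapping each source in its own `try`/`except` so one failure never stops the loop. The sketch below uses stub fetchers (one deliberately broken) instead of live network calls; in `task3.py` the fetchers would call wttr.in and TimeAndDate.com via `requests`.

```python
def fetch_wttr():
    # Stand-in for a network call that fails (e.g. timeout, DNS error).
    raise ConnectionError("wttr.in unreachable")

def fetch_timeanddate():
    # Stand-in for a successful scrape.
    return "Stockholm: 3°C, overcast"

SOURCES = {
    "wttr.in": fetch_wttr,
    "timeanddate.com": fetch_timeanddate,
}

def collect_weather(sources):
    """Query every source; record errors instead of crashing."""
    results = {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch()
        except Exception as exc:
            results[name] = f"ERROR: {exc}"  # log the failure, keep going
    return results

report = collect_weather(SOURCES)
```

Because each iteration catches its own exception, the working source still produces data even though the first one failed.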
## Technologies

- Python 3.x
- BeautifulSoup4 - HTML parsing
- Requests - HTTP requests
- CSV - data export
## Installation

```bash
pip install beautifulsoup4 requests
```

## Usage

```bash
python Task1.py
python task2.py
python task3.py
```

Each script generates its own output file:

- `task1_output.txt` - extracted articles
- `task2_products.csv` - product data (opens in Excel)
- `task3_weather.txt` - weather information
## Ethical Scraping

This project follows ethical web scraping practices:
- Respects robots.txt
- Uses appropriate User-Agent headers
- Implements delays between requests
- Only accesses publicly available data
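The first three practices can be sketched with the standard library's `urllib.robotparser` plus a fixed delay between requests. The robots.txt content, user-agent string, and delay value below are illustrative assumptions, not the project's actual configuration.

```python
import time
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; a real scraper would fetch it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Descriptive User-Agent so site operators can identify the scraper.
HEADERS = {"User-Agent": "DalarnaStudentScraper/1.0 (educational use)"}
REQUEST_DELAY = 2.0  # seconds to sleep between requests, e.g. time.sleep(REQUEST_DELAY)

def allowed(url, user_agent="*"):
    """Check robots.txt before requesting a URL."""
    return rp.can_fetch(user_agent, url)

ok = allowed("https://example.com/catalogue/")
blocked = allowed("https://example.com/private/data")
```

In the scrapers, every `requests.get(url, headers=HEADERS)` call would be preceded by an `allowed(url)` check and followed by `time.sleep(REQUEST_DELAY)`.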
## Course Information

- Program: Master's in Data Science
- University: Dalarna University, Sweden
- Course: Data Collection and Quality

## Author

Mhmoud Ahmad
LinkedIn

This project is for educational purposes.
Repository: https://github.com/Mhmoud94/Automated-Data-Collection-System---Python