Mhmoud94/Automated-Data-Collection-System---Python

Automated Data Collection System - Python

Web scraping project completed for Master's in Data Science at Dalarna University.

📊 Project Overview

This project demonstrates automated data collection with Python web scraping. It contains three scrapers, each collecting data from a different kind of online source.

🎯 Tasks

Task 1: Textual Data Scraping

File: Task1.py
Output: task1_output.txt

Scrapes articles about Machine Learning and AI from:

  • TechTarget
  • IBM

Extracts headlines and full article content from both sources.
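The extraction step can be sketched roughly as follows. This is a minimal illustration, not the code in Task1.py: the function name `extract_article`, the choice of the first `<h1>` as the headline, and the use of every `<p>` as article content are assumptions, and real pages from either source will likely need more specific selectors.

```python
from bs4 import BeautifulSoup


def extract_article(html: str) -> dict:
    """Return the headline and paragraph text found in an HTML page.

    Assumption: the headline is the first <h1> and the article body is
    made up of the page's <p> elements.
    """
    soup = BeautifulSoup(html, "html.parser")

    h1 = soup.find("h1")
    headline = " ".join(h1.get_text().split()) if h1 else ""

    # Normalize whitespace inside each paragraph and drop empty ones.
    paragraphs = [" ".join(p.get_text().split()) for p in soup.find_all("p")]
    content = "\n\n".join(text for text in paragraphs if text)

    return {"headline": headline, "content": content}


# Hypothetical sample markup, used so the sketch runs without network access.
sample_html = """
<html><body>
  <h1>What is machine learning?</h1>
  <p>First paragraph.</p>
  <p>Second   <b>paragraph</b>.</p>
</body></html>
"""

article = extract_article(sample_html)
print(article["headline"])
print(article["content"])
```

In a real run, the HTML string would come from an HTTP response, for example `requests.get(url, timeout=10).text`.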

Task 2: E-commerce Product Scraping

File: task2.py
Output: task2_products.csv

Scrapes product information from the Books to Scrape website and exports it to CSV. The collected fields are:

  • Product names
  • Prices
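The export step might look something like the sketch below. It assumes the scraper has already produced `(name, price_text)` pairs; the helper names `parse_price` and `write_products`, the column headers, and the sample rows are illustrative and not taken from task2.py.

```python
import csv
import io


def parse_price(text: str) -> float:
    """Convert a price string such as '£51.77' to a float."""
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return float(cleaned)


def write_products(rows, handle) -> None:
    """Write (name, price_text) pairs as CSV rows with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["name", "price"])  # assumed column names
    for name, price_text in rows:
        writer.writerow([name, f"{parse_price(price_text):.2f}"])


# Illustrative rows standing in for scraped data.
sample_rows = [
    ("A Light in the Attic", "£51.77"),
    ("Tipping the Velvet", "£53.74"),
]

# An in-memory buffer keeps the example free of file-system side effects.
buffer = io.StringIO()
write_products(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, open it with `open("task2_products.csv", "w", newline="", encoding="utf-8")` and pass the handle to `write_products`; `newline=""` is what the `csv` module documentation recommends to avoid blank lines on Windows.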

Task 3: Weather Data with Error Handling

File: task3.py
Output: task3_weather.txt

Collects weather data from multiple sources:

  • Wttr.in (Weather API)
  • TimeAndDate.com

Error handling is fault-tolerant: if one source fails, the script records the failure and continues with the remaining sources.
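One common way to get this behavior is to wrap each source in its own `try`/`except` so a failure is recorded rather than propagated. The sketch below is an assumption about the general pattern, not the code in task3.py; `collect`, the two stub sources, and the source labels are hypothetical.

```python
def collect(sources: dict) -> tuple:
    """Call every source and keep going when one of them raises.

    `sources` maps a label to a zero-argument callable that returns data.
    Returns (results, errors), both keyed by source label.
    """
    results, errors = {}, {}
    for name, fetch in sources.items():
        try:
            results[name] = fetch()
        except Exception as exc:  # broad on purpose: one bad source must not stop the run
            errors[name] = f"{type(exc).__name__}: {exc}"
    return results, errors


# Stub sources standing in for real HTTP calls.
def working_source() -> str:
    return "placeholder weather report"


def failing_source() -> str:
    raise ConnectionError("source unreachable")


results, errors = collect({"source_a": failing_source, "source_b": working_source})
print(results)
print(errors)
```

Here `source_a` fails first, yet `source_b` is still queried and its result is kept. In a real scraper each callable would perform the request with a timeout and raise on HTTP errors.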

🛠️ Technologies Used

  • Python 3.x
  • BeautifulSoup4 - HTML parsing
  • Requests - HTTP requests
  • csv (Python standard library) - Data export

📦 Installation

pip install beautifulsoup4 requests

▶️ How to Run

python Task1.py
python task2.py
python task3.py

📄 Output Files

Each script generates its own output file:

  • task1_output.txt - Extracted articles
  • task2_products.csv - Product data (opens in Excel)
  • task3_weather.txt - Weather information

⚖️ Ethical Scraping

This project follows ethical web scraping practices:

  • Respects robots.txt
  • Uses appropriate User-Agent headers
  • Implements delays between requests
  • Only accesses publicly available data
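As a rough illustration of the first three practices, the sketch below checks URLs against robots.txt rules with the standard library's `urllib.robotparser` and pauses between permitted requests. The user-agent string, the sample rules, the example.com URLs, and the helper names are assumptions made for the example; the project's scripts may implement these checks differently.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "coursework-scraper"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a robots.txt parser from the file's text content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(parser: RobotFileParser, urls, delay_seconds: float = 1.0) -> list:
    """Return the URLs robots.txt permits, pausing after each permitted one."""
    allowed = []
    for url in urls:
        if parser.can_fetch(USER_AGENT, url):
            allowed.append(url)  # a real scraper would fetch the page here
            time.sleep(delay_seconds)
    return allowed


# Sample rules so the sketch runs offline.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(sample_robots)
allowed = polite_urls(
    parser,
    [
        "https://example.com/public/page",
        "https://example.com/private/data",
        "https://example.com/index.html",
    ],
    delay_seconds=0.1,
)
print(allowed)
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` loads the real robots.txt, and the same user-agent string can be sent with Requests via `headers={"User-Agent": USER_AGENT}`.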

🎓 Academic Context

Program: Master's in Data Science
University: Dalarna University, Sweden
Course: Data Collection and Quality

📧 Contact

Mhmoud Ahmad
LinkedIn

📝 License

For educational purposes.


https://github.com/Mhmoud94/Automated-Data-Collection-System---Python
