This project demonstrates web scraping using Python by extracting content from Wikipedia pages.
The scraper fetches webpage content, parses HTML, and saves the extracted information into structured text files for further use and analysis.
- Scrapes content directly from Wikipedia pages
- Fetches webpage data using HTTP requests
- Parses HTML using BeautifulSoup
- Stores extracted content in `.txt` files
- Simple and beginner-friendly implementation
- Python 🐍
- Requests – HTTP requests
- BeautifulSoup (bs4) – HTML parsing
- Jupyter Notebook
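The third-party libraries above can be installed with pip (a typical setup; exact versions are not pinned by this project):

```shell
# Install the scraping dependencies: Requests for HTTP, BeautifulSoup for parsing
pip install requests beautifulsoup4 jupyter
```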
- 📓 wikkipediascraper.ipynb – Main notebook with scraping logic
- 📄 Anime.txt – Scraped Wikipedia content about Anime
- 📄 Mahatma Gandhi.txt – Scraped content about Mahatma Gandhi
- 📄 README.md – Project documentation
- Sends requests to Wikipedia pages
- Parses HTML content
- Extracts relevant text data
- Stores cleaned content into text files
- Anime.txt – Wikipedia data about Anime
- Mahatma Gandhi.txt – Wikipedia data about Mahatma Gandhi
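The steps above can be sketched in a minimal form. This is an illustrative example, not the notebook's exact code; the function names (`extract_paragraphs`, `scrape_to_file`) and the `User-Agent` string are assumptions:

```python
import requests
from bs4 import BeautifulSoup


def extract_paragraphs(html: str) -> str:
    """Parse HTML and return the text of all <p> tags, one per line."""
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))


def scrape_to_file(url: str, filename: str) -> None:
    """Fetch a Wikipedia page and save its paragraph text to a .txt file."""
    # A descriptive User-Agent is polite and expected by Wikipedia (assumed value)
    response = requests.get(url, headers={"User-Agent": "learning-scraper/0.1"})
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(extract_paragraphs(response.text))


# Example usage (performs a live request):
# scrape_to_file("https://en.wikipedia.org/wiki/Anime", "Anime.txt")
```

Separating parsing from fetching keeps the parser easy to test on a local HTML string, without hitting the network.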
This project is built to:
- Learn web scraping fundamentals
- Understand HTML structure and parsing
- Work with real-world web data
- Strengthen Python programming skills
This project is for educational purposes only.
Always follow Wikipedia’s terms of service and scraping guidelines.
- Scrape multiple pages dynamically
- Improve text cleaning and formatting
- Add error handling and logging
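One possible shape for the error handling and logging improvement, assuming a hypothetical `fetch_page` helper that returns `None` instead of raising on failure:

```python
import logging
from typing import Optional

import requests

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("scraper")


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a page, returning its HTML, or None if the request fails."""
    try:
        response = requests.get(
            url,
            timeout=timeout,
            headers={"User-Agent": "learning-scraper/0.1"},  # assumed value
        )
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs, and HTTP error codes
        log.error("Failed to fetch %s: %s", url, exc)
        return None
    log.info("Fetched %s (%d bytes)", url, len(response.text))
    return response.text
```

Returning `None` lets a multi-page scraping loop log the failure and move on to the next page rather than crashing.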
Anupam Singh
Aspiring Data Analyst & Developer