This project demonstrates web scraping using Python by extracting content from Wikipedia pages.
The scraper fetches webpage content, parses HTML, and saves the extracted information into structured text files for further use and analysis.
- Scrapes content directly from Wikipedia pages
- Fetches webpage data using HTTP requests
- Parses HTML using BeautifulSoup
- Stores extracted content in `.txt` files
- Simple and beginner-friendly implementation
- Python 🐍
- Requests – HTTP requests
- BeautifulSoup (bs4) – HTML parsing
- Jupyter Notebook
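The third-party libraries above can be installed with pip (a typical setup; exact versions are not pinned by this project):

```shell
# Install the scraping dependencies: Requests for HTTP, BeautifulSoup for parsing
pip install requests beautifulsoup4 jupyter
```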
- 📓 wikkipediascraper.ipynb – Main notebook with scraping logic
- 📄 Anime.txt – Scraped Wikipedia content about Anime
- 📄 Mahatma Gandhi.txt – Scraped content about Mahatma Gandhi
- 📄 README.md – Project documentation
- Sends requests to Wikipedia pages
- Parses HTML content
- Extracts relevant text data
- Stores cleaned content into text files
- Anime.txt – Wikipedia data about Anime
- Mahatma Gandhi.txt – Wikipedia data about Mahatma Gandhi
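The steps above can be sketched in a minimal form. This is an illustrative example, not the notebook's exact code; the function names (`extract_paragraphs`, `scrape_to_file`) and the `User-Agent` string are assumptions:

```python
import requests
from bs4 import BeautifulSoup


def extract_paragraphs(html: str) -> str:
    """Parse HTML and return the text of all <p> tags, one per line."""
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))


def scrape_to_file(url: str, filename: str) -> None:
    """Fetch a Wikipedia page and save its paragraph text to a .txt file."""
    # A descriptive User-Agent is polite and expected by Wikipedia (assumed value)
    response = requests.get(url, headers={"User-Agent": "learning-scraper/0.1"})
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(extract_paragraphs(response.text))


# Example usage (performs a live request):
# scrape_to_file("https://en.wikipedia.org/wiki/Anime", "Anime.txt")
```

Separating parsing from fetching keeps the parser easy to test on a local HTML string, without hitting the network.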
This project is built to:
- Learn web scraping fundamentals
- Understand HTML structure and parsing
- Work with real-world web data
- Strengthen Python programming skills
This project is for educational purposes only.
Always follow Wikipedia’s terms of service and scraping guidelines.
- Scrape multiple pages dynamically
- Improve text cleaning and formatting
- Add error handling and logging
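One possible shape for the error handling and logging improvement, assuming a hypothetical `fetch_page` helper that returns `None` instead of raising on failure:

```python
import logging
from typing import Optional

import requests

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("scraper")


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch a page, returning its HTML, or None if the request fails."""
    try:
        response = requests.get(
            url,
            timeout=timeout,
            headers={"User-Agent": "learning-scraper/0.1"},  # assumed value
        )
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs, and HTTP error codes
        log.error("Failed to fetch %s: %s", url, exc)
        return None
    log.info("Fetched %s (%d bytes)", url, len(response.text))
    return response.text
```

Returning `None` lets a multi-page scraping loop log the failure and move on to the next page rather than crashing.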
Anupam Singh
Aspiring Data Analyst & Developer