Kirisaki00/WikipediaScraper

📘 Wikipedia Web Scraper (Python)

📌 Overview

This project demonstrates web scraping in Python by extracting article content from Wikipedia pages.

The scraper fetches each page over HTTP, parses the HTML, and saves the extracted text to .txt files for later use and analysis.


✨ Features

  • Scrapes content directly from Wikipedia pages
  • Fetches webpage data using HTTP requests
  • Parses HTML using BeautifulSoup
  • Stores extracted content in .txt files
  • Simple and beginner-friendly implementation

🛠️ Technologies Used

  • Python 🐍
  • Requests – HTTP requests
  • BeautifulSoup (bs4) – HTML parsing
  • Jupyter Notebook

📂 Files Included

  • 📓 wikkipediascraper.ipynb – Main notebook with scraping logic
  • 📄 Anime.txt – Scraped Wikipedia content about Anime
  • 📄 Mahatma Gandhi.txt – Scraped content about Mahatma Gandhi
  • 📄 README.md – Project documentation

🔍 What This Project Does

  • Sends requests to Wikipedia pages
  • Parses HTML content
  • Extracts relevant text data
  • Stores cleaned content into text files
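
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`fetch_html`, `extract_paragraphs`, `save_text`), the `User-Agent` string, and the choice to keep only `<p>` elements are all assumptions made for the example.

```python
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Hypothetical header for illustration; the notebook may send different headers.
HEADERS = {"User-Agent": "WikipediaScraperDemo/0.1 (educational project)"}


def fetch_html(url: str) -> str:
    """Download a page and return its HTML, raising on HTTP error statuses."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def extract_paragraphs(html: str) -> str:
    """Return the text of every non-empty <p> element, one paragraph per line."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for p in soup.find_all("p"):
        # Collapse runs of whitespace and newlines inside the paragraph.
        text = " ".join(p.get_text().split())
        if text:
            lines.append(text)
    return "\n".join(lines)


def save_text(text: str, path: str) -> None:
    """Write the extracted text to a UTF-8 encoded file."""
    Path(path).write_text(text, encoding="utf-8")
```

Under these assumptions, `save_text(extract_paragraphs(fetch_html("https://en.wikipedia.org/wiki/Anime")), "Anime.txt")` would produce a file like the Anime.txt included here, though the notebook may select and clean the text differently.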

📤 Output

  • Anime.txt – Wikipedia data about Anime
  • Mahatma Gandhi.txt – Wikipedia data about Mahatma Gandhi

🎯 Purpose

This project is built to:

  • Learn web scraping fundamentals
  • Understand HTML structure and parsing
  • Work with real-world web data
  • Strengthen Python programming skills

⚠️ Disclaimer

This project is for educational purposes only.
Always follow Wikipedia’s terms of service and scraping guidelines.


🚀 Future Improvements

  • Scrape multiple pages dynamically
  • Improve text cleaning and formatting
  • Add error handling and logging
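
One possible shape for these improvements is sketched below. None of it exists in the project yet: `clean_text`, `scrape_many`, the logger name, and the injected `fetch_text` callable are hypothetical names chosen for the example, and stripping `[1]`-style citation markers is only one guess at what "text cleaning" might involve.

```python
import logging
import re
from typing import Callable, Dict, Iterable

# Hypothetical logger name for illustration.
logger = logging.getLogger("wikipedia_scraper")


def clean_text(text: str) -> str:
    """Remove numeric citation markers such as [1] and collapse whitespace."""
    without_refs = re.sub(r"\[\d+\]", "", text)
    return " ".join(without_refs.split())


def scrape_many(
    titles: Iterable[str],
    fetch_text: Callable[[str], str],
) -> Dict[str, str]:
    """Scrape several article titles, logging and skipping any that fail.

    fetch_text is any callable that takes a URL and returns the page text,
    for example a function built on Requests and BeautifulSoup.
    """
    results: Dict[str, str] = {}
    for title in titles:
        url = "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")
        try:
            results[title] = clean_text(fetch_text(url))
            logger.info("Scraped %s", url)
        except Exception as exc:
            # Broad catch keeps the sketch short; real code would likely
            # catch network and parsing errors separately.
            logger.warning("Skipping %s: %s", url, exc)
    return results
```

Passing the fetcher in as an argument keeps the loop independent of the network, so it can be exercised with a stub function before pointing it at real pages.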

✨ Author

Anupam Singh
Aspiring Data Analyst & Developer

About

Python web scraper that extracts Wikipedia content using BeautifulSoup and saves it into structured text files.
