WikiScraper is a Python library for scraping Wikipedia articles. It can scrape a single page or recursively follow all linked articles, and it writes output as .txt or .csv.
- Scrape a single Wikipedia page or all linked articles recursively.
- Supports .txt and .csv output formats.
- Optionally add titles to scraped content.
- Logging options for file saves and all actions.
- Append all scraped articles into a single file or save separately.
- Handles multiple languages and errors gracefully.
- Polite crawling with configurable delay between requests.

pip install wikiscraper-py

from wikiscraper import WikiScraper
scraper = WikiScraper(file_type="txt", add_title=True)
scraper.scrape_one("https://en.wikipedia.org/wiki/Python")

from wikiscraper import WikiScraper
scraper = WikiScraper(file_type="txt", add_title=True, all_on_one_file=True, polite_time=2)
scraper.scrape_all("https://en.wikipedia.org/wiki/Python")

from wikiscraper import WikiScraper
scraper = WikiScraper(file_type="csv", add_title=True, all_on_one_file=True)
scraper.scrape_all("https://en.wikipedia.org/wiki/Ethiopia")

- If add_title=True and output is CSV:
  - The first column will contain the article title.
  - The second column will contain the article text.
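
To make the two-column CSV layout concrete, here is a minimal sketch that uses only the standard library. It does not call WikiScraper; the sample rows are invented, and it assumes the file has no header row, which the description above does not confirm.

```python
import csv
import io

# Hypothetical sample rows: (article title, article text).
rows = [
    ("Python", "Python is a high-level programming language."),
    ("Ethiopia", "Ethiopia is a country in the Horn of Africa."),
]

# Write the rows in the title-then-text column order described above.
# Assumption: no header row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

# Read them back: column 0 is the title, column 1 is the text.
buffer.seek(0)
articles = {title: text for title, text in csv.reader(buffer)}
print(articles["Python"])
```

The same reading loop should work on a real output file by replacing the in-memory buffer with open(path, newline="", encoding="utf-8"), provided the file follows this layout.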

- file_type: 'txt' or 'csv'. Default is 'txt'.
- add_title: Add the article title at the top of the file or in the first CSV column. Default is False.
- log_saving: Log only file saves. Default is True.
- log_all: Log all actions, including errors and skipped links. Default is False.
- polite_time: Delay between requests in seconds. Default is 3.
- all_on_one_file: Append all articles into a single file when scraping multiple pages. Default is True.

Scraped files are saved in a data/ folder, which is created automatically in the working directory.
- If all_on_one_file=True: all articles are appended to data/wikipedia_all.txt or .csv.
- If all_on_one_file=False: each article is saved as a separate file with its title as the filename.
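
The naming rule above could be sketched roughly as follows. The helper output_path is hypothetical and not part of WikiScraper; in particular, replacing "/" in titles is an assumption about how unsafe characters might be handled.

```python
from pathlib import Path

DATA_DIR = Path("data")


def output_path(title, file_type="txt", all_on_one_file=True):
    """Hypothetical helper: choose the output file for one article."""
    if all_on_one_file:
        # Every article goes into the same combined file.
        return DATA_DIR / f"wikipedia_all.{file_type}"
    # Assumption: path separators in a title are replaced so the
    # title can be used as a single filename.
    safe_title = title.replace("/", "_").strip()
    return DATA_DIR / f"{safe_title}.{file_type}"


print(output_path("Python"))
print(output_path("AC/DC", file_type="csv", all_on_one_file=False))
```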
- Skips invalid Wikipedia URLs.
- Logs network errors and pages without titles.
- Automatically filters out non-article links (categories, special pages, user pages, etc.).
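
One way such URL validation and link filtering might work is sketched below. This is not the library's actual implementation: is_article_url and the prefix list are assumptions for illustration, and the list covers only English namespace names.

```python
from urllib.parse import unquote, urlparse

# Assumed namespace prefixes that mark non-article pages (English only).
NON_ARTICLE_PREFIXES = (
    "Category:", "Special:", "User:", "Talk:", "File:",
    "Help:", "Template:", "Wikipedia:", "Portal:",
)


def is_article_url(url):
    """Hypothetical check: True if url looks like a Wikipedia article."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme not in ("http", "https"):
        return False
    # Accept any language subdomain, e.g. en.wikipedia.org.
    if not host.endswith(".wikipedia.org"):
        return False
    if not parsed.path.startswith("/wiki/"):
        return False
    title = unquote(parsed.path[len("/wiki/"):])
    if not title:
        return False
    # Reject categories, special pages, user pages, and similar.
    return not title.startswith(NON_ARTICLE_PREFIXES)


print(is_article_url("https://en.wikipedia.org/wiki/Python"))
print(is_article_url("https://en.wikipedia.org/wiki/Category:Programming_languages"))
```

Other language editions use localized namespace names, so a filter along these lines would need a longer prefix list to cover them.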
MIT License