Wiker

A library for collecting Wikipedia text datasets.

Installation

pip install wiker

Quickstart

Warning

Before running the code, create "data" and "extra" folders inside the project folder, then create empty "pre_urls.txt" and "post_urls.txt" files inside the "extra" folder.
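The folders and files can also be created from Python. This is a minimal sketch using only the standard library's pathlib, assuming the script runs from the project folder. The helper name `create_project_layout` is hypothetical and is not part of Wiker.

```python
from pathlib import Path


def create_project_layout(root: str = ".") -> Path:
    """Create data/, extra/ and the two URL files described in the Warning.

    create_project_layout is a hypothetical helper, not part of the wiker package.
    """
    base = Path(root)
    # parents=True also creates `root` if needed; exist_ok=True makes reruns safe
    (base / "data").mkdir(parents=True, exist_ok=True)
    extra = base / "extra"
    extra.mkdir(parents=True, exist_ok=True)
    # touch() creates each file if missing and never truncates existing content
    for name in ("pre_urls.txt", "post_urls.txt"):
        (extra / name).touch(exist_ok=True)
    return base


if __name__ == "__main__":
    create_project_layout()
```

Because existing files are only touched, not overwritten, running the helper again does not erase links already stored in pre_urls.txt or post_urls.txt.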

File structure

my-app/
├─ data/
├─ extra/
│  ├─ pre_urls.txt
│  ├─ post_urls.txt
├─ main.py # your file

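from wiker import Wiker

# lang='uz' selects Uzbek Wikipedia; crawling starts from the "Turkiston" article
wk = Wiker(lang='uz', first_article_link="Turkiston")

# scrape_limit caps how many articles are scraped in this run
wk.run(scrape_limit=50)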
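After run() finishes, the output can be inspected with the standard library. This is a minimal sketch, assuming that articles are saved as files in "data" and that the names of saved articles end up in "extra/post_urls.txt", one per line. The helper `summarize_output` is hypothetical, not part of Wiker.

```python
from pathlib import Path


def summarize_output(root: str = ".") -> dict:
    """Count saved article files and list scraped article names.

    Hypothetical helper; assumes the layout from the File structure section.
    """
    base = Path(root)
    # Count regular files in data/ (similar in spirit to Wiker's dir_scanner())
    article_files = [p for p in (base / "data").iterdir() if p.is_file()]
    post_urls = base / "extra" / "post_urls.txt"
    # One name per line is an assumption about the file format; blank lines are skipped
    names = [
        line.strip()
        for line in post_urls.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    return {"article_files": len(article_files), "scraped_names": names}
```

For example, calling print(summarize_output()) right after wk.run(scrape_limit=50) in main.py shows how many article files were written and which articles were recorded.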

Other methods

from wiker import Wiker

wk = Wiker(lang='uz', first_article_link="Turkiston")

wk.reader()  # read pre_urls.txt and return its links as a list
wk.read_url_count()  # return the number of links read from pre_urls.txt
wk.extra_file_writer()  # if pre_urls.txt is empty, write first_article_link to it
wk.scraper()  # fetch the articles for all links in pre_urls.txt
wk.text_cleaner()  # strip HTML and other tags from the retrieved articles
wk.next_urls()  # collect links for the next round of scraping
wk.dir_scanner()  # count the files in the "data" folder
wk.cleaned_text_writer(text_dict=wk.text_cleaner())  # write the cleaned article texts to files
wk.post_url_writer(url_list=wk.scraper().keys())  # write the names of the saved articles to post_urls.txt
wk.pre_url_writer(url_list=wk.next_urls())  # write the links from next_urls() to pre_urls.txt for the next run
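Taken together, these methods suggest a breadth-first crawl: pre_urls.txt acts as the queue of links still to scrape and post_urls.txt as the record of articles already saved. The sketch below illustrates that bookkeeping in memory, with a toy link graph standing in for Wikipedia. It is not Wiker's implementation; `crawl`, `fetch_links` and the sample graph are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(
    seed: str,
    fetch_links: Callable[[str], Iterable[str]],
    scrape_limit: int,
) -> list:
    """Breadth-first crawl returning article names in the order they were scraped.

    `pending` plays the role of pre_urls.txt and `done` the role of
    post_urls.txt; both mappings are assumptions for illustration.
    """
    pending = deque([seed])  # the seed mirrors first_article_link
    queued = {seed}          # prevents queueing the same article twice
    done = []
    while pending and len(done) < scrape_limit:
        name = pending.popleft()
        done.append(name)                # "scrape" and record the article
        for link in fetch_links(name):   # like next_urls(): links found on the page
            if link not in queued:
                queued.add(link)
                pending.append(link)
    return done


# Hypothetical link graph standing in for real Wikipedia pages
GRAPH = {
    "Turkiston": ["Toshkent", "Samarqand"],
    "Toshkent": ["Buxoro", "Turkiston"],
    "Samarqand": ["Xiva"],
}

if __name__ == "__main__":
    print(crawl("Turkiston", lambda n: GRAPH.get(n, []), scrape_limit=4))
```

With scrape_limit=4 this prints ['Turkiston', 'Toshkent', 'Samarqand', 'Buxoro']. Tracking queued names separately from scraped ones keeps an article that is linked from several pages from being fetched more than once.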

anorprogrammer/wiker

Folders and files

Latest commit

History

Repository files navigation

Wiker

Installation

Quickstart

Another methods

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages