PyTorch implementation of a BiLSTM model for the Wikification project.

Wikifier

Wikification is the process of labeling input sentences with concepts from Wikipedia. The repository contains a main script for scraping text from Wikipedia dumps and parsing it into a dataset, the model for annotating sentences, and an asynchronous web scraper for generating the dataset dynamically, starting from a Wikipedia page used as a seed.
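To make the task concrete, here is a minimal, hypothetical sketch of wikification as mention lookup. The actual model is a BiLSTM; a toy dictionary stands in for it here, and the lexicon entries are illustrative only.

```python
def wikify(sentence, mention_to_title):
    """Label known mentions in a sentence with Wikipedia concept titles.

    Toy stand-in for the BiLSTM annotator: each token is mapped to a
    Wikipedia title if it appears in the lexicon, or None otherwise.
    """
    return [mention_to_title.get(token) for token in sentence.split()]


# Hypothetical lexicon mapping surface mentions to Wikipedia titles.
lexicon = {"Paris": "Paris", "France": "France"}
labels = wikify("Paris is in France", lexicon)
print(labels)  # ['Paris', None, None, 'France']
```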

Prerequisites

You can install the required dependencies using the Python package manager (pip):

pip3 install aiohttp cchardet aiodns wikipedia requests
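The aiohttp and aiodns dependencies suggest the scraper issues its requests concurrently. Below is a sketch of that asynchronous pattern using only the standard library; `fetch` is a stub standing in for a real aiohttp request so the example runs offline, and the page names are placeholders.

```python
import asyncio

async def fetch(url):
    """Stub for an HTTP request (aiohttp.ClientSession.get in practice)."""
    await asyncio.sleep(0)          # stand-in for network latency
    return f"<html>{url}</html>"    # stand-in for the page content

async def scrape(urls):
    # Schedule all requests concurrently and collect results in order.
    return await asyncio.gather(*(fetch(u) for u in urls))

pages = asyncio.run(scrape(["PageA", "PageB"]))
print(pages)  # ['<html>PageA</html>', '<html>PageB</html>']
```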

Getting Started

First, we need to get the data. WikiParser is a web scraper that loads dumps from XML files and stores the resulting dataset as a collection of compressed files. You can run the script using the following syntax:

python3 WikiParser.py [OPTION]... URL... [-n NUM]
python3 WikiParser.py [OPTION]... [-n NUM]
python3 WikiParser.py [OPTION]... URL...
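The usage lines above could be parsed with argparse roughly as follows. This is a hypothetical reconstruction based only on the documented syntax; the actual option names and semantics in WikiParser.py may differ.

```python
import argparse

# Sketch of a CLI matching: WikiParser.py [OPTION]... [URL]... [-n NUM]
parser = argparse.ArgumentParser(prog="WikiParser.py")
parser.add_argument("urls", nargs="*", metavar="URL",
                    help="seed Wikipedia pages to scrape (optional)")
parser.add_argument("-n", type=int, dest="num", metavar="NUM",
                    help="number of pages to collect (assumed meaning)")

# Example invocation with a placeholder seed URL.
args = parser.parse_args(["-n", "100", "https://en.wikipedia.org/wiki/Italy"])
print(args.urls, args.num)
```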

Built With

Authors