PyTorch implementation of a BiLSTM model for the Wikification project.

Wikifier

Wikification is the process of labeling input sentences with concepts from Wikipedia. The repository contains a main script for scraping text from Wikipedia dumps and parsing it into a dataset, the model for annotating sentences, and an asynchronous web scraper for generating the dataset dynamically, starting from a Wikipedia page used as a seed.
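To make the task concrete, here is a minimal, hypothetical sketch of wikification as mention lookup. The actual model is a BiLSTM; a toy dictionary stands in for it here, and the lexicon entries are illustrative only.

```python
def wikify(sentence, mention_to_title):
    """Label known mentions in a sentence with Wikipedia concept titles.

    Toy stand-in for the BiLSTM annotator: each token is mapped to a
    Wikipedia title if it appears in the lexicon, or None otherwise.
    """
    return [mention_to_title.get(token) for token in sentence.split()]


# Hypothetical lexicon mapping surface mentions to Wikipedia titles.
lexicon = {"Paris": "Paris", "France": "France"}
labels = wikify("Paris is in France", lexicon)
print(labels)  # ['Paris', None, None, 'France']
```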

Prerequisites

You can install the required dependencies using the Python package manager (pip):

pip3 install aiohttp cchardet aiodns wikipedia requests
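The aiohttp and aiodns dependencies suggest the scraper issues its requests concurrently. Below is a sketch of that asynchronous pattern using only the standard library; `fetch` is a stub standing in for a real aiohttp request so the example runs offline, and the page names are placeholders.

```python
import asyncio

async def fetch(url):
    """Stub for an HTTP request (aiohttp.ClientSession.get in practice)."""
    await asyncio.sleep(0)          # stand-in for network latency
    return f"<html>{url}</html>"    # stand-in for the page content

async def scrape(urls):
    # Schedule all requests concurrently and collect results in order.
    return await asyncio.gather(*(fetch(u) for u in urls))

pages = asyncio.run(scrape(["PageA", "PageB"]))
print(pages)  # ['<html>PageA</html>', '<html>PageB</html>']
```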

Getting Started

First, we need to get the data. WikiParser is a web scraper that loads dumps from XML files and stores the resulting dataset as a collection of compressed files. You can run the script using the following syntax:

python3 WikiParser.py [OPTION]... URL... [-n NUM]
python3 WikiParser.py [OPTION]... [-n NUM]
python3 WikiParser.py [OPTION]... URL...
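The usage lines above could be parsed with argparse roughly as follows. This is a hypothetical reconstruction based only on the documented syntax; the actual option names and semantics in WikiParser.py may differ.

```python
import argparse

# Sketch of a CLI matching: WikiParser.py [OPTION]... [URL]... [-n NUM]
parser = argparse.ArgumentParser(prog="WikiParser.py")
parser.add_argument("urls", nargs="*", metavar="URL",
                    help="seed Wikipedia pages to scrape (optional)")
parser.add_argument("-n", type=int, dest="num", metavar="NUM",
                    help="number of pages to collect (assumed meaning)")

# Example invocation with a placeholder seed URL.
args = parser.parse_args(["-n", "100", "https://en.wikipedia.org/wiki/Italy"])
print(args.urls, args.num)
```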

Built With

Authors