JulianMaurin/wikicrawl

Wikicrawl

Introduction

Wikicrawl is a web crawler written in Python 3 and built on Celery. What sets it apart is its focus on wikipedia.org and its use of a graph database.

Have a look at the architecture diagram and the data representation for a quick overview of the software.

Motivation

Wikipedia is one of the richest knowledge bases on the internet, and such a dataset could offer great opportunities for developing semantic technologies.

Wikicrawl has been designed to operate with any Wikimedia website in any language (see configuration), which widens the possibilities.

This prospect encouraged me to polish this personal project and to make it open source.

Documentation