Indexing and querying Wikipedia articles using Apache Lucene

Parsing and indexing Wikipedia articles

This is a piece of coursework I completed during my studies. The program consists of four main modules:

  • Indexer: handles indexing of documents.
  • Parser: parses the XML data file, creates documents (Page objects), and sends them to the Indexer for indexing.
  • Searcher: handles searching the index.
  • WikipediaRetriever: a wrapper module that exposes all of the program's functionality; run this module to use the program. On start it offers two options: index the data, or query an index that has already been created.
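The project itself builds on Apache Lucene, whose API is not reproduced here. As a rough illustration of how the modules hand data to one another, the following sketch uses only the Java standard library: it parses a tiny MediaWiki-style XML sample into Page records (Parser), builds a toy in-memory inverted index standing in for Lucene's (Indexer), and answers AND queries against it (Searcher). The class and method names (`WikiPipelineSketch`, `parse`, `index`, `search`) are hypothetical, and the parser assumes each page has a single `<text>` element, as in a latest-revision articles dump.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.stream.*;

public class WikiPipelineSketch {
    // Hypothetical stand-in for the program's Page object.
    record Page(String title, String text) {}

    // Parser sketch: stream through the XML and emit one Page per <text> element,
    // pairing it with the most recent <title> seen.
    static List<Page> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Page> pages = new ArrayList<>();
        String title = null;
        while (r.hasNext()) {
            if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
            switch (r.getLocalName()) {
                case "title" -> title = r.getElementText();
                case "text" -> pages.add(new Page(title, r.getElementText()));
                default -> { }
            }
        }
        r.close();
        return pages;
    }

    // Lowercase and split on non-word characters, dropping empty tokens.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Indexer sketch: map each term to the sorted set of page ids containing it
    // (a toy replacement for a Lucene index, not Lucene itself).
    static Map<String, Set<Integer>> index(List<Page> pages) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int id = 0; id < pages.size(); id++) {
            Page p = pages.get(id);
            for (String tok : tokenize(p.title() + " " + p.text())) {
                postings.computeIfAbsent(tok, k -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    // Searcher sketch: return titles of pages that contain every query term.
    static List<String> search(Map<String, Set<Integer>> postings, List<Page> pages, String query) {
        Set<Integer> hits = null;
        for (String term : tokenize(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (hits == null) hits = new TreeSet<>(docs);
            else hits.retainAll(docs);
        }
        List<String> titles = new ArrayList<>();
        if (hits != null) {
            for (int id : hits) titles.add(pages.get(id).title());
        }
        return titles;
    }

    // Tiny end-to-end run, loosely mirroring "index the data, then query it".
    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki>"
                + "<page><title>Apache Lucene</title><revision>"
                + "<text>Lucene is a search library.</text></revision></page>"
                + "<page><title>Wikipedia</title><revision>"
                + "<text>Wikipedia is a free encyclopedia.</text></revision></page>"
                + "</mediawiki>";
        List<Page> pages = parse(xml);
        Map<String, Set<Integer>> postings = index(pages);
        System.out.println(search(postings, pages, "search library"));
        System.out.println(search(postings, pages, "free"));
    }
}
```

Running `main` prints `[Apache Lucene]` and then `[Wikipedia]`. A real Lucene-based indexer would add what this toy omits: configurable analyzers, relevance scoring, and an on-disk index that can be reopened for later queries.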