
MIaSMath – Math processing for Lucene / Solr

MIaSMath is a math processing plugin for Lucene or Solr.

Usage

To integrate MIaSMath including MathTokenizer into a Solr instance:

  1. Copy the following libraries to the solr/lib directory:
  2. Configure the following in schema.xml for the tokenizer MathTokenizer:
  • subformulae: true for analyzer type index, and false for analyzer type query, as follows:

    <fieldType name="math" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="true"/> 
      </analyzer>
      <analyzer type="query">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="false"/> 
      </analyzer>
    </fieldType>
  • Declare a field for storing math as follows:

    <field name="math" type="math" indexed="true" stored="false" multiValued="true" />
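Why the two analyzers differ can be illustrated with a small, self-contained Java sketch. It is not MIaSMath code. The formula tree (Node), the tokens method and the string serialization are hypothetical stand-ins for the Presentation MathML that MathTokenizer actually processes, and any normalization or weighting the real tokenizer performs is omitted. The sketch assumes that with subformulae enabled every subtree of a formula becomes a token, and with it disabled only the whole formula is emitted. Under that assumption a query for y^2 matches a document containing x + y^2.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Conceptual sketch, NOT MIaSMath's implementation: Node and the string
 * serialization are hypothetical stand-ins for Presentation MathML.
 * Assumption: subformulae=false emits only the whole formula as one token.
 */
public class SubformulaSketch {

    /** A formula as a labelled tree, e.g. plus(x, pow(y, 2)). */
    record Node(String label, List<Node> children) {
        static Node leaf(String label) {
            return new Node(label, List.of());
        }

        static Node op(String label, Node... kids) {
            return new Node(label, List.of(kids));
        }

        /** Canonical string form of this subtree, used as the token text. */
        String token() {
            if (children.isEmpty()) {
                return label;
            }
            StringBuilder sb = new StringBuilder(label).append('(');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(children.get(i).token());
            }
            return sb.append(')').toString();
        }
    }

    /** Tokens for one formula: every subtree if subformulae is true, else just the root. */
    static List<String> tokens(Node formula, boolean subformulae) {
        List<String> out = new ArrayList<>();
        if (subformulae) {
            collect(formula, out);
        } else {
            out.add(formula.token());
        }
        return out;
    }

    /** Pre-order walk: a node's token is added before those of its children. */
    private static void collect(Node node, List<String> out) {
        out.add(node.token());
        for (Node child : node.children()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // Document formula x + y^2, analysed as at index time (subformulae="true").
        Node doc = Node.op("plus", Node.leaf("x"),
                Node.op("pow", Node.leaf("y"), Node.leaf("2")));
        // Query formula y^2, analysed as at query time (subformulae="false").
        Node query = Node.op("pow", Node.leaf("y"), Node.leaf("2"));

        List<String> indexed = tokens(doc, true);
        List<String> queried = tokens(query, false);
        System.out.println(indexed);  // [plus(x,pow(y,2)), x, pow(y,2), y, 2]
        System.out.println(queried);  // [pow(y,2)]
        System.out.println("match: " + indexed.containsAll(queried));  // match: true
    }
}
```

Matching here is plain token containment. Lucene's actual scoring is more involved, but the asymmetry is the same: expanding subformulae on both sides would make every small fragment of a query (such as y or 2) match on its own.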

That's it. You can now run your Solr instance and test MathTokenizer in the Analysis screen of the Solr Admin UI.
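The same analysis can also be requested over HTTP, which is handy for scripted checks. The following is a minimal Java 11+ sketch, not part of MIaSMath. It assumes Solr's implicit /analysis/field handler (the handler the Analysis screen calls), a Solr instance at localhost:8983, a core named mycore (hypothetical), and that MathTokenizer accepts a Presentation MathML string as input. The response lists the index-time and query-time tokens separately, so the two subformulae settings can be compared side by side.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

/**
 * Sketch: call Solr's field-analysis handler (/analysis/field) for the
 * "math" field type declared in schema.xml.
 * ASSUMPTIONS: Solr on localhost:8983, a hypothetical core named "mycore",
 * and that MathTokenizer accepts a Presentation MathML string as input.
 */
public class AnalyzeMath {

    /** Builds the request URI; index-time and query-time inputs are analysed separately. */
    static URI analysisUri(String solrBase, String core, String fieldType,
                           String indexValue, String queryValue) {
        String query = "analysis.fieldtype=" + enc(fieldType)
                + "&analysis.fieldvalue=" + enc(indexValue)
                + "&analysis.query=" + enc(queryValue)
                + "&wt=json";
        return URI.create(solrBase + "/" + core + "/analysis/field?" + query);
    }

    /** URL-encodes a parameter value as UTF-8 (spaces become '+'). */
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // y^2 in Presentation MathML (assumed input format).
        String mathml = "<math><msup><mi>y</mi><mn>2</mn></msup></math>";
        URI uri = analysisUri("http://localhost:8983/solr", "mycore", "math", mathml, mathml);

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // The JSON body lists the tokens produced for the index and query analyzers.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Using analysis.fieldtype targets the field type directly; analysis.fieldname=math would resolve the same type through the field declaration.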

Citing MIaSMath

Text

SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. p. 57–60. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.

BibTeX

@inproceedings{doi:10.1145:2034691.2034703,
     author = "Petr Sojka and Martin L{\'\i}{\v s}ka",
      title = "{The Art of Mathematics Retrieval}",
  booktitle = "{Proceedings of the ACM Conference on Document Engineering,
  		DocEng 2011}",
  publisher = "{Association for Computing Machinery}",
    address = "{Mountain View, CA}",
       year = 2011,
      month = Sep,
       isbn = "978-1-4503-0863-2",
      pages = "57--60",
        url = {http://doi.acm.org/10.1145/2034691.2034703},
        doi = {10.1145/2034691.2034703},
   abstract = {The design and architecture of MIaS (Math Indexer and Searcher), 
	       a system for mathematics retrieval is presented, and design 
	       decisions are discussed. We argue for an approach based on 
	       Presentation MathML using a similarity of math subformulae. The 
	       system was implemented as a math-aware search engine based on the 
	       state-of-the-art system Apache Lucene. Scalability issues were 
	       checked against more than 400,000 arXiv documents with 158 
	       million mathematical formulae. Almost three billion MathML 
	       subformulae were indexed using a Solr-compatible Lucene.},
}