Skip to content

MIR-MU/MIaSMath

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

91 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MIaSMath – Math processing for Lucene / Solr

ci

MIaSMath is a math processing plugin for Lucene or Solr.

Usage

To integrate MIaSMath including MathTokenizer into a Solr instance:

  1. Copy the following libraries to the solr/lib directory:
  1. Configure the following attributes in schema.xml for the tokenizer MathTokenizer:
  • subformulaetrue for analyzer type index, and false for analyzer type query, as follows:

    <fieldType name="math" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="true"/> 
      </analyzer>
      <analyzer type="query">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="false"/> 
      </analyzer>
    </fieldType>
  • Declare a field for storing math as follows:

    <field name="math" type="math" indexed="true" stored="false" multiValued="true" />

That's it. You can now run your Solr instance and test MathTokenizer in the analysis interface.

Citing MIaSMath

Text

SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. p. 57–60. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.

BibTeX

@inproceedings{doi:10.1145:2034691.2034703,
     author = "Petr Sojka and Martin L\'{i}\v{s}ka",
      title = "{The Art of Mathematics Retrieval}",
  booktitle = "{Proceedings of the ACM Conference on Document Engineering,
  		DocEng 2011}",
  publisher = "{Association of Computing Machinery}",
    address = "{Mountain View, CA}",
       year = 2011,
      month = Sep,
       isbn = "978-1-4503-0863-2",
      pages = "57--60",
        url = {http://doi.acm.org/10.1145/2034691.2034703},
        doi = {10.1145/2034691.2034703},
   abstract = {The design and architecture of MIaS (Math Indexer and Searcher), 
	       a system for mathematics retrieval is presented, and design 
	       decisions are discussed. We argue for an approach based on 
	       Presentation MathML using a similarity of math subformulae. The 
	       system was implemented as a math-aware search engine based on the 
	       state-of-the-art system Apache Lucene. Scalability issues were 
	       checked against more than 400,000 arXiv documents with 158 
	       million mathematical formulae. Almost three billion MathML 
	       subformulae were indexed using a Solr-compatible Lucene.},
}