Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Misspellings lead IR systems to provide bad results without users even realizing their mistakes. Users assume that the system lacks quality, which decreases their satisfaction and likelihood of returning to the system. We detected the misspelling problem in the Portuguese Web Archive. Hence, we analyzed existing solutions for spelling suggestion and integrated the solution that provided the best results in our user interface.
The following technical report Query Suggestion for Web Archive Search describes the adopted methodology, the obtained results and the chosen solution.
The datasets (in Portuguese) used can be downloaded from http://www.linguateca.pt/Repositorio/CorrOrtog/.
git clone https://github.com/arquivo/pwa-technologies.git
The WAR file is available in:
- Install Hunspell from http://hunspell.sourceforge.net/
- Install the proper dictionaries available at http://wiki.services.openoffice.org/wiki/Dictionaries
- (Portuguese dictionaries are available at http://natura.di.uminho.pt/download/sources/Dictionaries/)
- Unzip all files to a directory.
- Copy pwaspellchecker-1.0.0.war to servlet container (e.g. Tomcat) webapps directory
- Configure webapps/pwaspellchecker/WEB-INF/web.xml
- Test at http://machine:8080/pwaspellchecker/checker?query=xxx
(where xxx is the query to test)