Skip to content

a crawler for checking that all site is translated to a specified language

License

Notifications You must be signed in to change notification settings

bogdartysh/languagechecker-crawler

Repository files navigation

This crawler is designed for single purpouse:  to check that available web site (or portion of it) is fully translated from one language to another.

As a user you could open distro folder, find the last shipped distro, edit task.properties file (set task_external_id, start_urls, url_pattern, origin_language_code, shouldbe_language_code, pages_to_fill_all_forms) and run from terminal start.sh (or start.bat). Results will be in the results folder

Developer will need to:
1. install maven (maven.apache.org) and Java 8+
3. open terminal and execute
3.1 mvn clean install
3.2 mvn assembly:single
3.3 cd target
3.4 adjust settings of the project (file task.properties, examples in tasks folder)
3.5 execute : java -jar -Xsx1024m -Xmx1024 languagechecker-crawler-1.0-SNAPSHOT-jar-with-dependencies.jar >results.csv
results will be in results.csv (some times in cases of error that csv will be not readable in OpenOffice, so please check it with editor first).
log files are in target/logs
Dictionary files of you language should be named as ru.dict, en.dict, pl.dict and be places in target/dict folder. These files should be in utf-8 encoding and should consist of all words of chosen language.
For example please see en.dict and pl.dict files (they are extracted from aspell project)



For license info please see license file

About

a crawler for checking that all site is translated to a specified language

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages