Skip to content
/ moss Public

Searching for misspelling, bad grammar, and violations of the Manual of Style in Wikipedia

License

Notifications You must be signed in to change notification settings

cdbeland/moss

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

See https://en.wikipedia.org/wiki/Wikipedia:Typo_Team/moss

HARDWARE REQUIREMENTS
* It takes at least ~6GB of RAM just for the spell-check dictionary to load
* Parallelized Python code is expecting 8 available CPU cores for
  optimal performance (if tweaking, look for "Pool(8)" in Python
  scripts)

UBUNTU DEPENDENCIES
sudo apt-get install git pyflakes3 pylint python3-pep8 python3-dev g++ protobuf-c-compiler

FEDORA DEPENDENCIES
sudo dnf install git python3-flake8 protobuf-c-compiler pylint python3-devel protobuf-devel

GIT SETUP (optional)
git config --global color.ui true
git config --global push.default simple

CLONE AND SET UP:
cd ~
# Or wherever you'd like the clone to be parented
git clone https://github.com/cdbeland/moss.git
cd moss/
./reset_environment.sh
sudo mkdir /var/local/moss/bulk-wikipedia/
sudo chown $USER /var/local/moss/bulk-wikipedia/
Run "FIRST TIME SETUP" steps in ./update_downloads_parallel.sh

DOWNLOAD DATA:
# Do not do this if Wikipedia or Wiktionary dumps are in progress at:
#  https://dumps.wikimedia.org/backup-index-bydb.html
./update_downloads_parallel.sh

# Wait for download to complete; you can watch progress with:
# tail -f /var/local/moss/bulk-wikipedia/*log

RUN SPELL CHECK AND FRIENDS:
./run_moss_parallel.sh

TO SPELL CHECK ONLY ARTICLES WITH TITLES STARTING WITH "X":
./run_moss_parallel.sh --spell-check-only X

Omit X to run spell check only and skip other reports. You can also
specify any letter or number, or "BEFORE_A" and "AFTER_Z".

About

Searching for misspelling, bad grammar, and violations of the Manual of Style in Wikipedia

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published