C language spell checker using a Patricia trie and Damerau-Levenshtein distance.

Kawaboongawa/Spellchecker

################################################################################
################################################################################

   _____            _ _  _____ _               _
  / ____|          | | |/ ____| |             | |
 | (___  _ __   ___| | | |    | |__   ___  ___| | _____ _ __
  \___ \| '_ \ / _ \ | | |    | '_ \ / _ \/ __| |/ / _ \ '__|
  ____) | |_) |  __/ | | |____| | | |  __/ (__|   <  __/ |
 |_____/| .__/ \___|_|_|\_____|_| |_|\___|\___|_|\_\___|_|
        | |
        |_|

################################################################################
################################################################################

VERSION :


                        Last updated on 31/08/2017
                                 Spellchecker v1.0


AUTHORS :


                        LUGAND Jérémy lugand_j@epita.fr
                        CETRE Cyril   cyril.cetre@epita.fr


PREVIEW :


This is a spell checker written in C. It first builds a dictionary and writes
it to disk, then uses that dictionary to return every word that matches a
request within a given Damerau-Levenshtein distance.
This project was written as an EPITA school project.
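
For readers unfamiliar with the metric, here is a minimal sketch of a
Damerau-Levenshtein distance computation. It is not the project's code: the
README does not say which variant is used, so this sketch assumes the
restricted variant (optimal string alignment), compares raw bytes, and uses a
full matrix rather than a trie traversal. The name osa_distance is
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Smallest of three values. */
static size_t min3(size_t a, size_t b, size_t c)
{
    size_t m = a < b ? a : b;
    return m < c ? m : c;
}

/*
 * Restricted Damerau-Levenshtein (optimal string alignment) distance
 * between two byte strings. Hypothetical helper, not the project's code.
 * Returns (size_t)-1 if memory allocation fails.
 */
size_t osa_distance(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a), lb = strlen(b);
    size_t cols = lb + 1;
    size_t *d = malloc((la + 1) * cols * sizeof *d);
    if (d == NULL)
        return (size_t)-1;

    for (size_t i = 0; i <= la; i++)
        d[i * cols] = i; /* delete the first i characters of a */
    for (size_t j = 0; j <= lb; j++)
        d[j] = j;        /* insert the first j characters of b */

    for (size_t i = 1; i <= la; i++) {
        for (size_t j = 1; j <= lb; j++) {
            size_t cost = a[i - 1] == b[j - 1] ? 0 : 1;
            size_t best = min3(d[(i - 1) * cols + j] + 1,           /* deletion */
                               d[i * cols + (j - 1)] + 1,           /* insertion */
                               d[(i - 1) * cols + (j - 1)] + cost); /* substitution */

            /* Transposition of two adjacent characters. */
            if (i > 1 && j > 1 &&
                a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                size_t t = d[(i - 2) * cols + (j - 2)] + 1;
                if (t < best)
                    best = t;
            }
            d[i * cols + j] = best;
        }
    }

    size_t result = d[la * cols + lb];
    free(d);
    return result;
}
```

With this sketch, "exmaple" is at distance 1 from "example" because swapping
two adjacent letters counts as a single edit.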


REQUIREMENT :


This project runs on both macOS and Linux.
The only requirement is gcc (gcc-7 on macOS).


BUILD :


To build the project, run the make command in the root directory of the
project, as shown below :

42sh$ make

This should generate two binaries : TextMiningCompiler and
TextMiningApp.

To check that everything works properly, you can run our test suite, which
compares our output to the reference provided by the teacher. To do so, type :

42sh$ make test

To remove the binaries and temporary files generated by the project, type :

42sh$ make clean


THE COMPILER :


usage :
42sh$ ./ref/osx/TextMiningCompiler /path/to/word/freq.txt /path/to/output/dict.bin

The binary takes a text file as its first argument and writes the compiled
dictionary to the path given as its second argument. Each line of the text
file must contain a word, followed by at least one space, followed by its
frequency and a line feed. Here is an example of input :

42sh$ cat -e example_word.txt
this     705$
was      695$
a        2014$
cool     758$
project  810$
to       69619$
do       5349$


THE REQUEST APPLICATION :


usage :
42sh$ echo "approx 0 example" | ./TextMiningApp /path/to/compiled/dict.bin
[{"word":"example","freq":984528,"distance":0}]

42sh$ echo "approx 0 anotherone" | ./TextMiningApp /path/to/compiled/dict.bin
[{"word":"anotherone","freq":933,"distance":0}]

42sh$ ./TextMiningApp /path/to/compiled/dict.bin < /path/to/file_with_request.txt
...

This binary takes the dictionary produced by the compiler as its first
argument and reads requests from stdin. Each request must follow the format
shown above. The number is the maximum distance allowed between the requested
word and a match. A distance of 0 means that only the exact word matches.
Note that the greater the distance, the longer the request takes to process.
The output is given in JSON format.
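
To make the request and output formats concrete, here is a minimal sketch of
parsing one "approx <distance> <word>" request and formatting one match as a
JSON object. It is an illustration only, not the project's code: the names
parse_request, format_match, struct request and MAX_WORD are hypothetical, and
the sketch assumes that words contain no characters that need JSON escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 256 /* hypothetical upper bound on word length */

/* One parsed request; the field names are illustrative. */
struct request {
    unsigned max_distance;
    char word[MAX_WORD];
};

/*
 * Parse a request of the form "approx <distance> <word>".
 * Returns 1 on success and 0 if the line does not match that shape.
 */
int parse_request(const char *line, struct request *out)
{
    assert(line != NULL && out != NULL);

    /* %255s stores at most MAX_WORD - 1 characters plus the terminator. */
    return sscanf(line, "approx %u %255s",
                  &out->max_distance, out->word) == 2;
}

/*
 * Write one match as a JSON object into buf.
 * Returns the value of snprintf: the length the full text needs,
 * which is >= cap when the output was truncated.
 */
int format_match(char *buf, size_t cap, const char *word,
                 unsigned long freq, unsigned distance)
{
    assert(buf != NULL && word != NULL);

    return snprintf(buf, cap,
                    "{\"word\":\"%s\",\"freq\":%lu,\"distance\":%u}",
                    word, freq, distance);
}
```

Wrapping the formatted objects in square brackets and separating them with
commas would give an array like the ones shown in the usage examples.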
