Vietnamese-Spell-Correction

Training and evaluation phase, carried out before deploying the web application

Data:

Data source (.txt files)
Data folder (a single .pkl file)

Usage:

Download the Data folder and put it into this repo.
Then you can explore the approach in the notebooks.
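
As a minimal sketch of the first step in a notebook, the .pkl file can be read with Python's standard pickle module. The helper name load_dataset and the path in the usage comment are assumptions made for illustration; the actual file name and the structure of the loaded object depend on the downloaded Data folder.

```python
import pickle
from pathlib import Path


def load_dataset(path):
    """Read one pickled object from `path` and return it.

    Hypothetical helper: the repo's notebooks may load the data differently.
    """
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Usage (assumed location; replace with the real file inside the Data folder):
# data = load_dataset("Data/<your-file>.pkl")
# print(type(data))
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.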

Detail:

Result

Notes

This model is adapted from the seq2seq (sequence-to-sequence) architecture. Sequence-to-sequence models are deep learning models that have been very successful in tasks such as machine translation, text summarization, and image captioning.
The model has a weakness on social media text (Facebook, Zalo, Twitter, ...): it overfits to the artificial data generated by add_noise.py.

Practical application requires labeled real-world data.
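
To make the idea of artificial training data concrete, here is a rough sketch of how character-level noise could be injected into clean text to form (noisy, clean) training pairs. This is not the repo's add_noise.py: the function name noisy_copy, the three edit operations, and the rate parameter are all assumptions chosen for illustration.

```python
import random


def noisy_copy(text, rate=0.1, seed=None):
    """Return a copy of `text` with random character-level typos.

    Hypothetical illustration only. Each non-space character is corrupted
    with probability `rate` by one of: delete, duplicate, or swap with the
    next character. Whitespace is never changed, so word boundaries survive.
    """
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        # Keep whitespace and, with probability 1 - rate, any other character.
        if c.isspace() or rng.random() >= rate:
            out.append(c)
            i += 1
            continue
        op = rng.choice(("delete", "duplicate", "swap"))
        if op == "delete":
            pass  # drop the character
        elif op == "duplicate":
            out.extend([c, c])
        elif op == "swap" and i + 1 < len(chars) and not chars[i + 1].isspace():
            out.extend([chars[i + 1], c])
            i += 1  # the next character has already been emitted
        else:
            out.append(c)  # swap not possible here, keep the character
        i += 1
    return "".join(out)


if __name__ == "__main__":
    clean = "xin chào các bạn"
    # One (noisy, clean) pair, as a seq2seq corrector would see it in training.
    print((noisy_copy(clean, rate=0.3, seed=0), clean))
```

Noise of this kind follows a fixed, simple distribution, which is one plausible reason a model trained only on it would generalize poorly to the errors people actually make on social media.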