Deep learning model for query auto-correction in mixed script


Auto-correction-for-transliterated-queries

Please refer to my blog post "Transliterated Queries 2 – Deep Learning" for the implementation details.

The project is inspired by my following papers:

For an implementation of the above papers, refer to my blog post "Simple Markov Model for correcting Transliterated Queries".

Dependencies:

Install the following packages for using the project:

pip install nltk
pip install keras
pip install tensorflow
pip install h5py
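
To confirm that the packages are importable before running the project, a quick check like the one below can help. This is a minimal sketch and not part of the repository; `missing_packages` is a hypothetical helper name, and the package list simply mirrors the pip commands above.

```python
import importlib.util

# Package names taken from the pip commands above.
REQUIRED = ("nltk", "keras", "tensorflow", "h5py")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```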

Usage:

import auto_correct as auto
model = auto.auto_correct()
model.run()

enter a query
hw to lrn pythn anddeeplearning eas ily
how to learn python and deep learning easily    11.2134873867

Parameters of the model

auto_correct(data=,re_train=,vocab_size=,step=,batch_size=,nb_epoch=,embed_dims=)
        

To retrain the model, set re_train=True and pass the training queries as the data argument. The queries must be given as a list of strings in the following format:

queries = ['how to handle a 1.5 year old when hitting',
 'how can i avoid getting sick in china',
 'how do male penguins survive without eating for four months',
 'how do i remove candle wax from a polar fleece jacket',
 'how do i find an out of print book']

model = auto.auto_correct(re_train=True,data=queries)
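
If the training queries come from a raw source such as a text file, a small helper can bring them into this list-of-strings format. The sketch below is an assumption and not part of the project: `clean_queries` is a hypothetical name, and the lower-casing and de-duplication steps simply mirror the look of the example queries above.

```python
def clean_queries(lines):
    """Turn an iterable of raw lines into a list of unique, tidy queries.

    Each line is lower-cased and its whitespace collapsed; empty lines
    and repeated queries are dropped, keeping first-seen order.
    """
    queries = []
    seen = set()
    for line in lines:
        query = " ".join(line.lower().split())
        if query and query not in seen:
            seen.add(query)
            queries.append(query)
    return queries
```

Because an open file is an iterable of lines, `clean_queries(open(path, encoding="utf-8"))` would work on a file with one query per line, and the result could then be passed as `data` together with `re_train=True`.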

The other parameters of the model are:

  • vocab_size - The size of the vocabulary, i.e. the number of unique words
  • step - The size of the sliding window
  • batch_size - The number of training samples passed to the model in one iteration
  • nb_epoch - The total number of training epochs (full passes over the training data)
  • embed_dims - The dimensionality of the word embeddings
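
To make vocab_size and step more concrete, here is a minimal sketch of how they might shape the training data for a word-level language model. It illustrates the general technique under stated assumptions and is not the code in learning_model.py: `build_vocab`, `make_windows`, and the `<unk>` token are hypothetical names.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def build_vocab(queries, vocab_size):
    """Map the most frequent words to integer ids; id 0 is reserved for UNK."""
    counts = Counter(word for q in queries for word in q.split())
    most_common = [w for w, _ in counts.most_common(vocab_size - 1)]
    return {w: i for i, w in enumerate([UNK] + most_common)}


def make_windows(queries, vocab, step):
    """Slide a window of `step` words over each query.

    Each sample pairs `step` consecutive word ids (the context)
    with the id of the word that follows them (the target).
    """
    contexts, targets = [], []
    for q in queries:
        ids = [vocab.get(w, vocab[UNK]) for w in q.split()]
        for start in range(len(ids) - step):
            contexts.append(ids[start:start + step])
            targets.append(ids[start + step])
    return contexts, targets


queries = ["how do i find an out of print book",
           "how do i remove candle wax"]
vocab = build_vocab(queries, vocab_size=8)
contexts, targets = make_windows(queries, vocab, step=3)
```

In a setup like this, batch_size and nb_epoch would control how the resulting samples are fed to the network during training, and embed_dims would set the length of the vector learned for each word id.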