Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KenLM wrapper #71

Closed
wants to merge 1 commit into from
Closed

Add KenLM wrapper #71

wants to merge 1 commit into from

Conversation

pkuyym
Copy link
Contributor

@pkuyym pkuyym commented Jun 3, 2017

resolves #2229

Copy link
Contributor

@xinghai-sun xinghai-sun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LM class should be able to directly evaluate a sentence string instead of a list of word token indexes. String tokenizing, dictionary loading, dictionary lookup etc. should be included inside it to make the usage easier.


.. code-block:: python

pip install https://github.com/kpu/kenlm/archive/master.zip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move it to "requirements.txt"?


pip install https://github.com/kpu/kenlm/archive/master.zip

Please refer to **Scalable Modified Kneser-Ney Language Model Estimation** for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Add a reference link.
  2. Add a dot at the end of the sentence.

class KenLMModel(object):
"""
Wrapper for KenLM language model.
You should install KenLM python interface first.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should install --> Please install.


class KenLMModel(object):
"""
Wrapper for KenLM language model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any full name for KenLM ? Add a reference link.

"""
Initialize variables and load model.

:param model_path: Path of language model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a dot. Same below.

:param id_list: Id list of word.
:param id_str_dict: list
"""
assert len(id_list) > 0, 'invalid id list'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace it with raise ValueError.

import kenlm
self._model = kenlm.LanguageModel(self._model_path)

def score_sentence_ids(self, id_list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A string type is better than an id list.
And this class should be responsible for tokenization, dictionary loading and dictionary looking-up. Don't leave it to users.

end token explicitly if the input sentence has been completed.

:param id_list: Id list of word.
:param id_str_dict: list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add return and rtype in the comments.

return math.pow(10, score)


if __name__ == '__main__':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove below or add this to a unit test file.

id_str_dict,
verbose=False):
"""
Initialize variables and load model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function comments should be moved above to class comments.

@pkuyym pkuyym closed this Jun 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants