

@lalital
Contributor

@lalital lalital commented Jun 14, 2019

As requested in issue #71, docstring documentation (for pythainlp v2.0.5, released 10 May 2019) was added for the following modules:

  1. pythainlp.tokenize
  2. pythainlp.tag
  3. pythainlp.word_vector
  4. pythainlp.ulmfit
  5. pythainlp.util
  6. pythainlp.soundex
  7. pythainlp.spell
  8. pythainlp.transliterate
  9. pythainlp.corpus
  10. pythainlp.tools
  11. pythainlp.summarize

Code examples were added to most functions/methods (e.g. tokenize.word_tokenize, tag.pos_tag). Return types of functions/methods were specified, and brief explanations of the functionality were added for important functions/methods.

Fixing issues:

  • fix formatting issues (PEP 8) (due 15 June)

Todos:

High Priority

Due date: Mon 10 June 2019

  1. pythainlp.tokenize
  • todo_tokenize_1: provide example for sent_tokenize
  • todo_tokenize_2: provide example for word_tokenize
  • todo_tokenize_3: provide example for syllable_tokenize
  • todo_tokenize_4: provide example for subword_tokenize
  • todo_tokenize_5: briefly explain the algorithm of maximum matching (newmm) and cite the reference
  • todo_tokenize_6: briefly explain the algorithm of pyicu and cite the reference
  • todo_tokenize_7: briefly explain the algorithm of etcc and cite the reference
  • todo_tokenize_8: briefly explain the algorithm of deepcut and cite the reference
  • todo_tokenize_9: briefly explain the algorithm of multicut and cite the reference
  • todo_tokenize_10: briefly explain the algorithm of longest matching (longest) and cite the reference
  • todo_tokenize_11: Fix docstring format in tcc.py
  2. pythainlp.tag
  • todo_tag_1: provide lists of NER, POS tags
  • todo_tag_2: provide examples for pos_tag
  • todo_tag_3: provide examples for pos_tag_sents
  • todo_tag_4: provide examples for pos_tag_provinces
  • todo_tag_5: format docstring examples for named_entity.ThaiNameTagger
  • todo_tag_6: briefly explain unigram engine and cite the reference
  • todo_tag_7: briefly explain perceptron engine and cite the reference
  • todo_tag_8: briefly explain artagger engine and cite the reference
  • todo_tag_9: briefly explain orchid and cite the reference
  • todo_tag_10: briefly explain orchid_ud and cite the reference
  • todo_tag_11: briefly explain pud and cite the reference
  • todo_tag_12: briefly explain each tagger engine in a separate section
  • todo_tag_13: add reference section
  3. pythainlp.word_vector
  • todo_word_vector_1: give example for most_similar_cosmul
  • todo_word_vector_2: give example for doesnt_match
  • todo_word_vector_3: give example for similarity
  • todo_word_vector_4: give example for sentence_vectorizer
  4. pythainlp.ulmfit
  • todo_ulmfit_1: provide example for document_vector
  • todo_ulmfit_2: provide example for merge_wgts (Canceled)
  • todo_ulmfit_3: explain and show example for pythainlp.ulmfit.ThaiTokenizer
  5. pythainlp.util
  • todo_word_util_1: provide example for thaicheck
  • todo_word_util_2: provide example for thaiword_to_num
  • todo_word_util_3: provide example for thai_to_eng
  • todo_word_util_4: provide example for rank
  • todo_word_util_5: provide example for num_to_thaiword
  • todo_word_util_6: provide example for now_reign_year
  • todo_word_util_7: provide example for normalize
  • todo_word_util_8: provide example for countthai
  • todo_word_util_9: provide example for find_keyword
  • todo_word_util_10: provide example for bahttext
  • todo_word_util_11: provide example for collate and briefly explain
  • todo_word_util_12: rewrite docstring for isthaichar
  • todo_word_util_13: provide example for arabic_digit_to_thai_digit
  • todo_word_util_14: provide example for num_to_thaiword
  • todo_word_util_15: provide example for deletetone
  • todo_word_util_16: provide example for eng_to_thai
  • todo_word_util_17: provide example for thai_digit_to_arabic_digit
  • todo_word_util_18: provide example for reign_year_to_ad
  • todo_word_util_19: rewrite docstring for isthai
  • todo_word_util_20: provide example for text_to_arabic_digit
  • todo_word_util_21: provide example for text_to_thai_digit
  • todo_word_util_22: provide example for thai_strftime
  • todo_word_util_23: format docstring for thai_strftime
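For todo_tokenize_5 and todo_tokenize_10, the difference between longest matching and maximal matching can be shown with a small, self-contained sketch. This is not PyThaiNLP's code: the function names `longest_matching` and `maximal_matching` and the toy dictionary are hypothetical, and PyThaiNLP's `newmm` engine additionally constrains word boundaries with Thai Character Clusters (TCC), which this sketch omits. Longest matching greedily takes the longest dictionary word at each position; maximal matching considers all segmentations and keeps one with the fewest unknown characters, then the fewest words.

```python
from typing import AbstractSet, List, Optional, Tuple


def longest_matching(text: str, dictionary: AbstractSet[str]) -> List[str]:
    """Greedy longest matching: at each position take the longest dictionary
    word starting there; fall back to a single (unknown) character."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


def maximal_matching(text: str, dictionary: AbstractSet[str]) -> List[str]:
    """Dynamic-programming maximal matching: minimize (unknown chars, words)."""
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=1)
    # best[i] = (unknown_chars, token_count, tokens) for the suffix text[i:]
    best: List[Optional[Tuple[int, int, List[str]]]] = [None] * (n + 1)
    best[n] = (0, 0, [])
    for i in range(n - 1, -1, -1):
        # Fallback: treat text[i] as an unknown single-character token.
        unk, cnt, toks = best[i + 1]
        candidate = (unk + 1, cnt + 1, [text[i]] + toks)
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            if word in dictionary:
                unk, cnt, toks = best[j]
                option = (unk, cnt + 1, [word] + toks)
                if option[:2] < candidate[:2]:
                    candidate = option
        best[i] = candidate
    return best[0][2]


if __name__ == "__main__":
    words = {"ไป", "หา", "หาม", "มเหสี", "เห", "สี"}
    print(longest_matching("ไปหามเหสี", words))  # ['ไป', 'หาม', 'เห', 'สี']
    print(maximal_matching("ไปหามเหสี", words))  # ['ไป', 'หา', 'มเหสี']
```

On the classic example ไปหามเหสี, greedy matching commits to หาม and ends with four tokens, while maximal matching finds the three-token reading ไป / หา / มเหสี.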

Medium Priority

Due date: Thu 13 June 2019

  1. pythainlp.soundex
  • todo_soundex_1: provide more examples for metasound
  • todo_soundex_2: provide examples for udom83
  • todo_soundex_3: provide examples for lk82
  • todo_soundex_4: provide examples for soundex
  • todo_soundex_5: briefly explain lk82
  • todo_soundex_6: briefly explain udom83
  • todo_soundex_7: briefly explain metasound
  2. pythainlp.spell
  • todo_spell_1: briefly explain Peter Norvig’s algorithm
  • todo_spell_2: cite the reference of Peter Norvig’s algorithm
  • todo_spell_3: provide examples for correct
  • todo_spell_4: provide examples for spell
  • todo_spell_5: provide examples for NorvigSpellChecker.correct
  • todo_spell_6: provide examples for NorvigSpellChecker.dictionary
  • todo_spell_7: provide examples for NorvigSpellChecker.freq
  • todo_spell_8: provide examples for NorvigSpellChecker.known
  • todo_spell_9: provide examples for NorvigSpellChecker.prob
  • todo_spell_10: provide examples for NorvigSpellChecker.spell
  • todo_spell_11: briefly explain constant variable DEFAULT_SPELL_CHECKER
  • todo_spell_12: explain spell
  3. pythainlp.transliterate
  • todo_transliterate_1: format docstring for romanize
  • todo_transliterate_2: format docstring for transliterate
  • todo_transliterate_3: provide examples for romanize
  • todo_transliterate_4: provide examples for transliterate
  • todo_transliterate_5: add reference
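For todo_spell_1, the core of Peter Norvig's spelling corrector can be sketched briefly: generate every string one edit (delete, transpose, replace, insert) away from the input, keep those found in a word-frequency dictionary, fall back to two edits if none are found, and return the most frequent candidate. The class name `ToyNorvigSpellChecker`, the toy frequencies, and the choice to derive the alphabet from the dictionary's own characters (so the same code works for Thai) are assumptions for illustration; the method names only loosely mirror those listed above, and PyThaiNLP's `NorvigSpellChecker` may differ in its details.

```python
from collections import Counter
from typing import Dict, Iterable, Set


class ToyNorvigSpellChecker:
    """Hypothetical sketch of Norvig's corrector, not PyThaiNLP's class."""

    def __init__(self, word_freqs: Dict[str, int]):
        self.freqs = Counter(word_freqs)
        self.total = sum(self.freqs.values())
        # Assumption: the alphabet is every character seen in the dictionary.
        self.alphabet = sorted({ch for w in self.freqs for ch in w})

    def prob(self, word: str) -> float:
        # Relative frequency; unknown words get 0 (Counter returns 0 for them).
        return self.freqs[word] / self.total if self.total else 0.0

    def known(self, words: Iterable[str]) -> Set[str]:
        return {w for w in words if w in self.freqs}

    def edits1(self, word: str) -> Set[str]:
        # All strings exactly one edit away from `word`.
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [a + b[1:] for a, b in splits if b]
        transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
        replaces = [a + c + b[1:] for a, b in splits if b for c in self.alphabet]
        inserts = [a + c + b for a, b in splits for c in self.alphabet]
        return set(deletes + transposes + replaces + inserts)

    def edits2(self, word: str) -> Set[str]:
        return {e2 for e1 in self.edits1(word) for e2 in self.edits1(e1)}

    def candidates(self, word: str) -> Set[str]:
        # Prefer the word itself, then 1-edit, then 2-edit known words.
        return (self.known([word])
                or self.known(self.edits1(word))
                or self.known(self.edits2(word))
                or {word})

    def correct(self, word: str) -> str:
        return max(self.candidates(word), key=self.prob)


if __name__ == "__main__":
    checker = ToyNorvigSpellChecker({"แมว": 5, "หมา": 3})
    print(checker.correct("แมน"))  # แมว (one replacement away)
```

Generating two-edit candidates grows quickly with word length and alphabet size, which is why the sketch only falls back to `edits2` when no one-edit candidate is known.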

Low Priority

Due date: Thu 13 June 2019

  1. pythainlp.corpus
  • todo_corpus_1: provide link to thai_stopwords
  • todo_corpus_2: provide link to thai_words
  • todo_corpus_3: provide link to thai_syllables
  • todo_corpus_4: provide link to thai_negations
  • todo_corpus_5: provide link to countries
  • todo_corpus_6: provide link to provinces
  • todo_corpus_7: provide example for corpus.download
  • todo_corpus_8: provide example for corpus.remove
  • todo_corpus_9: provide example for corpus.get_corpus
  • todo_corpus_10: provide example for corpus.get_corpus_path
  • todo_corpus_11: provide examples for conceptnet.edges
  • todo_corpus_12: provide examples for wordnet.synset
  • todo_corpus_13: provide examples for wordnet.synsets
  • todo_corpus_14: provide examples for wordnet.all_lemma_names
  • todo_corpus_15: provide examples for wordnet.all_synsets
  • todo_corpus_16: provide examples for wordnet.langs
  • todo_corpus_17: provide examples for wordnet.lemmas
  • todo_corpus_18: provide examples for wordnet.lemma
  • todo_corpus_19: provide examples for wordnet.lemma_from_key
  • todo_corpus_20: provide examples for wordnet.path_similarity
  • todo_corpus_21: provide examples for wordnet.lch_similarity
  • todo_corpus_22: provide examples for wordnet.wup_similarity
  • todo_corpus_23: provide examples for wordnet.morphy
  • todo_corpus_24: briefly explain conceptnet.edges
  • todo_corpus_25: briefly explain wordnet.custom_lemmas
  2. pythainlp.tools
  • provide example for tools.get_full_data_path
  • provide example for tools.get_pythainlp_data_path
  • provide examples for tools.get_pythainlp_path
  3. pythainlp.summarize
  • provide examples for summarize.summarize
  • briefly explain functionality of summarize.summarize
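For the summarize.summarize item, one common way to explain extractive summarization is word-frequency scoring: count content words, normalize by the highest count, score each sentence by the sum of its word weights, and keep the top-scoring sentences in their original order. Whether `summarize.summarize` works exactly this way is an assumption; the function `frequency_summarize` below is hypothetical and expects sentences that are already tokenized (for example by a word tokenizer).

```python
from collections import Counter
from typing import AbstractSet, List, Sequence


def frequency_summarize(
    sentences: Sequence[List[str]],
    n: int,
    stopwords: AbstractSet[str] = frozenset(),
) -> List[List[str]]:
    """Hypothetical sketch: return the n highest-scoring tokenized sentences,
    kept in their original document order."""
    # Count content words across the whole document.
    freqs = Counter(tok for sent in sentences for tok in sent if tok not in stopwords)
    top = max(freqs.values(), default=0)
    # Normalize so the most frequent content word has weight 1.0.
    weights = {tok: count / top for tok, count in freqs.items()} if top else {}
    # A sentence's score is the sum of its words' weights (stopwords score 0).
    scores = [sum(weights.get(tok, 0.0) for tok in sent) for sent in sentences]
    # Stable sort: sentences with equal scores keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n])
    return [list(sentences[i]) for i in chosen]
```

Because scores are summed, longer sentences are favored; dividing each score by sentence length is a common variation.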

bact and others added 30 commits May 9, 2019 17:56
Merge from 2.0.5 release
Specify that package `pythainlp` is not in the same directory as the configuration file `docs/conf.py`

Reference: https://medium.com/@eikonomega/getting-started-with-sphinx-autodoc-part-1-2cebbbca5365
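The commit message above refers to the usual Sphinx autodoc setup, where `docs/conf.py` must put the directory containing the package on `sys.path`. A minimal sketch of the relevant part, assuming the `pythainlp` package directory sits one level above `docs/` and that only the `sphinx.ext.autodoc` extension is shown (the real configuration may enable more):

```python
# docs/conf.py (sketch), assuming the repository layout:
#   <repo>/pythainlp/    the package
#   <repo>/docs/conf.py  this file
import os
import sys

# autodoc imports modules by name, so the repository root (one level above
# docs/) must be importable. Sphinx executes conf.py from the docs/ directory,
# so ".." resolves to the repository root.
sys.path.insert(0, os.path.abspath(".."))

extensions = ["sphinx.ext.autodoc"]
```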
@PyThaiNLP PyThaiNLP deleted a comment from pep8speaks Jun 16, 2019
@p16i
Contributor

p16i commented Jun 20, 2019

Hi,

I have a question regarding writing consistency. Currently, we use both "segmentation" and "tokenization". Although these words are interchangeable, do you think it would be better to consistently use one of them, i.e. tokenization?

@wannaphong wannaphong requested a review from bact July 9, 2019 07:14
Member

@bact bact left a comment

this is fantastic.

@wannaphong wannaphong requested review from cstorm125 and korakot July 17, 2019 16:01
Member

@cstorm125 cstorm125 left a comment

excellent job

Member

@korakot korakot left a comment

Good details krub.

@wannaphong wannaphong merged commit 0a57ae9 into 2.0 Jul 27, 2019
@bact bact deleted the issue71_add_documentation branch September 7, 2019 22:05

Labels

documentation (improve documentation and test cases)

8 participants