Merged
25 changes: 9 additions & 16 deletions README-pypi.md
@@ -1,22 +1,12 @@
![PyThaiNLP Logo](https://avatars0.githubusercontent.com/u/32934255?s=200&v=4)

# PyThaiNLP 2.0.2

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
[![Build Status](https://travis-ci.org/PyThaiNLP/pythainlp.svg?branch=develop)](https://travis-ci.org/PyThaiNLP/pythainlp)
[![Build status](https://ci.appveyor.com/api/projects/status/9g3mfcwchi8em40x?svg=true)](https://ci.appveyor.com/project/wannaphongcom/pythainlp-9y1ch)
[![Coverage Status](https://coveralls.io/repos/github/PyThaiNLP/pythainlp/badge.svg?branch=dev)](https://coveralls.io/github/PyThaiNLP/pythainlp?branch=dev)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
# PyThaiNLP 2.0.3

PyThaiNLP is a Python library for natural language processing (NLP) of Thai language.

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📖 [Upgrading from PyThaiNLP 1.7 to 2.0](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)

📖 [Upgrade ThaiNER from PyThaiNLP 1.7 to 2.0](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)

📫 follow us on Facebook [Pythainlp](https://www.facebook.com/pythainlp/)
📫 follow us on Facebook [PyThaiNLP](https://www.facebook.com/pythainlp/)

## What's new in version 2.0 ?

@@ -28,8 +18,11 @@ PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, pa
- Remove sentiment analysis
- Improved word_tokenize (newmm, mm) and dict_word_tokenize
- Improved POS-tagging
- More and improved examples
- see [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118)
- See examples in [Get Started notebook](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb)
- [Full change log](https://github.com/PyThaiNLP/pythainlp/issues/118)
- [Upgrading from 1.7](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)
- [Upgrade ThaiNER from 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)


## Install

@@ -62,8 +55,8 @@ Install it with pip, for example: `pip install marisa_trie‑0.7.5‑cp36‑cp36

## Links

- User guide : [English](https://colab.research.google.com/drive/1MQ10D1mJC5r1vQAHcj4ShoRS14vz8ZF-) , [ภาษาไทย](https://colab.research.google.com/drive/1rEkB2Dcr1UAKPqz4bCghZV7pXx2qxf89)
- User guide: [English](https://github.com/PyThaiNLP/pythainlp/blob/dev/notebooks/pythainlp-get-started.ipynb), [ภาษาไทย](https://colab.research.google.com/drive/1rEkB2Dcr1UAKPqz4bCghZV7pXx2qxf89)
- Docs: https://thainlp.org/pythainlp/docs/2.0/
- GitHub: https://github.com/PyThaiNLP/pythainlp
- Issues: https://github.com/PyThaiNLP/pythainlp/issues
- Facebook : [Pythainlp](https://www.facebook.com/pythainlp/)
- Facebook: [PyThaiNLP](https://www.facebook.com/pythainlp/)
17 changes: 10 additions & 7 deletions README.md
@@ -14,11 +14,13 @@ Thai Natural Language Processing in Python.

PyThaiNLP is a Python package for text processing and linguistic analysis, similar to `nltk` but with focus on Thai language.

- [Current PyThaiNLP stable release is 2.0](https://github.com/PyThaiNLP/pythainlp/tree/master)
- PyThaiNLP 2.0 supports Python 3.6+. Some functions may work with older version of Python 3, but it is not well-tested and will not be supported. See [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
- Python 2.7+ users can use PyThaiNLP 1.6.
**This is a document for development branch (post 2.0). Things will break.**

**This is a document for development branch (post 2.0). Things will break. For a stable branch document, see [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**
- The latest stable release is [2.0.3](https://github.com/PyThaiNLP/pythainlp/tree/master)
- PyThaiNLP 2 supports Python 3.6+. Some functions may work with older version of Python 3, but it is not well-tested and will not be supported. See [change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
- [Upgrading from 1.7](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)
- [Upgrade ThaiNER from 1.7](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
- Python 2.7+ users can use PyThaiNLP 1.6.

📫 follow us on Facebook [PyThaiNLP](https://www.facebook.com/pythainlp/)

@@ -102,10 +104,11 @@ PyThaiNLP เป็นไลบารีภาษาไพทอนเพื่

> เพราะโลกขับเคลื่อนต่อไปด้วยการแบ่งปัน

- PyThaiNLP 2.0 รองรับ Python 3.6 ขึ้นไป
- ผู้ใช้ Python 2.7+ ยังสามารถใช้ PyThaiNLP 1.6 ได้
**เอกสารนี้สำหรับรุ่นพัฒนา อาจมีการเปลี่ยนแปลงได้ตลอด**

**เอกสารนี้สำหรับรุ่นพัฒนา (หลัง 2.0) อาจมีการเปลี่ยนแปลงได้ตลอด สำหรับเอกสารรุ่นเสถียร ดูที่ [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**
- รุ่นเสถียรล่าสุดคือรุ่น [2.0.3](https://github.com/PyThaiNLP/pythainlp/tree/master)
- PyThaiNLP 2 รองรับ Python 3.6 ขึ้นไป
- ผู้ใช้ Python 2.7+ ยังสามารถใช้ PyThaiNLP 1.6 ได้

📫 ติดตามข่าวสารได้ที่ Facebook [Pythainlp](https://www.facebook.com/pythainlp/)

39 changes: 22 additions & 17 deletions bin/pythainlp
@@ -1,41 +1,46 @@
#!python3
# -*- coding: utf-8 -*-

_VERSION = "2.0.3"

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-t","--text",default=None, help="text",type=str)
parser.add_argument("-seg", "--segment", help="word segment",action="store_true")
parser.add_argument("-c", "--corpus", help="mange corpus",action="store_true")
parser.add_argument("-pos", "--postag", help="postag",action="store_true")
parser.add_argument("-soundex", "--soundex", help="soundex",default=None)
parser.add_argument("-e","--engine",default="newmm", help="the engine",type=str)
parser.add_argument("-pos-e","--postag_engine",default="perceptron", help="the engine for word tokenize",type=str)
parser.add_argument("-pos-c","--postag_corpus",default="orchid", help="corpus for postag",type=str)
parser.add_argument("-t", "--text", default=None, help="text", type=str)
parser.add_argument("-seg", "--segment", help="word segment", action="store_true")
parser.add_argument("-c", "--corpus", help="mange corpus", action="store_true")
parser.add_argument("-pos", "--postag", help="postag", action="store_true")
parser.add_argument("-soundex", "--soundex", help="soundex", default=None)
parser.add_argument("-e", "--engine", default="newmm", help="the engine", type=str)
parser.add_argument("-pos-e", "--postag_engine", default="perceptron", help="the engine for word tokenize", type=str)
parser.add_argument("-pos-c", "--postag_corpus", default="orchid", help="corpus for postag", type=str)
args = parser.parse_args()

if args.corpus:
from pythainlp.corpus import *
print("PyThaiNLP Corpus")
temp=""
while temp!="exit":
print("\n\nPlease fill this out.\n1. install\n2. remove\n3. update\n4. exit\nex 1 or 2")
temp=input("input (1,2,3 or 4) : ")
print("\n1. Install\n2. Remove\n3. Update\n4. Exit\n")
temp=input("Choose 1, 2, 3, or 4: ")
if temp=="1":
name=input("name corpus : ")
name=input("Corpus name:")
download(name)
elif temp=="2":
name=input("name corpus : ")
name=input("Corpus name:")
remove(name)
elif temp=="3":
name=input("name corpus : ")
name=input("Corpus name:")
download(name)
elif temp=="4":
break
else:
print("Please input 1,2,3 or 4.")
print("Choose 1, 2, 3, or 4:")
elif args.text!=None:
from pythainlp.tokenize import word_tokenize
tokens=word_tokenize(args.text,engine=args.engine)
tokens=word_tokenize(args.text, engine=args.engine)
if args.segment:
print('|'.join(tokens))
print("|".join(tokens))
elif args.postag:
from pythainlp.tag import pos_tag
print("\t".join([i[0]+"/"+i[1] for i in pos_tag(tokens, engine=args.postag_engine, corpus=args.postag_corpus)]))
@@ -45,4 +50,4 @@ elif args.soundex!=None:
args.engine="lk82"
print(soundex(args.soundex, engine=args.engine))
else:
print("PyThaiNLP 2.0.2")
print(f"PyThaiNLP {_VERSION}")
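The option handling in `bin/pythainlp` can be tried in isolation. The following is a minimal sketch, not the script itself: it rebuilds only a subset of the flags shown in the diff (`-t`, `-seg`, `-pos`, `-e`) and replaces the real tokenizer and tagger calls with a hypothetical `describe` helper that reports which branch would run, so it needs no PyThaiNLP installation.

```python
import argparse

_VERSION = "2.0.3"  # version string as it appears in the diff


def build_parser() -> argparse.ArgumentParser:
    """Rebuild a subset of the options shown in the diff (assumed equivalent)."""
    parser = argparse.ArgumentParser(prog="pythainlp")
    parser.add_argument("-t", "--text", default=None, type=str, help="text")
    parser.add_argument("-seg", "--segment", action="store_true", help="word segment")
    parser.add_argument("-pos", "--postag", action="store_true", help="postag")
    parser.add_argument("-e", "--engine", default="newmm", type=str, help="the engine")
    return parser


def describe(argv):
    """Hypothetical helper: name the branch the script would take for argv."""
    args = build_parser().parse_args(argv)
    if args.text is not None:
        if args.segment:
            return f"segment {args.text!r} with engine {args.engine}"
        if args.postag:
            return f"pos-tag {args.text!r} with engine {args.engine}"
        return "text given but no action selected"
    # No text supplied: fall through to the version banner.
    return f"PyThaiNLP {_VERSION}"


if __name__ == "__main__":
    print(describe(["-t", "abc", "-seg"]))
    print(describe([]))
```

Keeping the version in a single `_VERSION` constant, as the diff does, means the banner in the final `else` branch no longer has to be edited by hand on each release.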
2 changes: 1 addition & 1 deletion conda.recipe/meta.yaml
@@ -1,4 +1,4 @@
{% set version = "2.0.2" %}
{% set version = "2.0.3" %}

package:
name: pythainlp
2 changes: 1 addition & 1 deletion docs/api/tokenize.rst
@@ -14,4 +14,4 @@ Modules
.. autofunction:: sent_tokenize
.. autofunction:: dict_trie
.. autoclass:: Tokenizer
:members: word_tokenize,set_tokenize_engine
:members: word_tokenize, set_tokenize_engine
2 changes: 1 addition & 1 deletion meta.yaml
@@ -1,4 +1,4 @@
{% set version = "2.0.2" %}
{% set version = "2.0.3" %}

package:
name: pythainlp