6 changes: 3 additions & 3 deletions README-pypi.md
@@ -1,6 +1,6 @@
![PyThaiNLP Logo](https://avatars0.githubusercontent.com/u/32934255?s=200&v=4)

# PyThaiNLP 2.0
# PyThaiNLP 2.0.2

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/cb946260c87a4cc5905ca608704406f7)](https://www.codacy.com/app/pythainlp/pythainlp_2?utm_source=github.com&utm_medium=referral&utm_content=PyThaiNLP/pythainlp&utm_campaign=Badge_Grade)[![pypi](https://img.shields.io/pypi/v/pythainlp.svg)](https://pypi.python.org/pypi/pythainlp)
[![Build Status](https://travis-ci.org/PyThaiNLP/pythainlp.svg?branch=develop)](https://travis-ci.org/PyThaiNLP/pythainlp)
@@ -12,9 +12,9 @@ PyThaiNLP is a Python library for natural language processing (NLP) of Thai lang

PyThaiNLP includes Thai word tokenizers, transliterators, soundex converters, part-of-speech taggers, and spell checkers.

📖 For details on upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see [From PyThaiNLP 1.7 to PyThaiNLP 2.0](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)
📖 [Upgrading from PyThaiNLP 1.7 to 2.0](https://thainlp.org/pythainlp/docs/2.0/notes/pythainlp-1_7-2_0.html)

📖 For ThaiNER user after upgrading from PyThaiNLP 1.7 to PyThaiNLP 2.0, see [Upgrade ThaiNER from PyThaiNLP 1.7 to PyThaiNLP 2.0](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)
📖 [Upgrade ThaiNER from PyThaiNLP 1.7 to 2.0](https://github.com/PyThaiNLP/pythainlp/wiki/Upgrade-ThaiNER-from-PyThaiNLP-1.7-to-PyThaiNLP-2.0)

📫 follow us on Facebook [Pythainlp](https://www.facebook.com/pythainlp/)

15 changes: 6 additions & 9 deletions README.md
@@ -15,10 +15,10 @@ Thai Natural Language Processing in Python.
PyThaiNLP is a Python package for text processing and linguistic analysis, similar to `nltk` but with focus on Thai language.

- [Current PyThaiNLP stable release is 2.0](https://github.com/PyThaiNLP/pythainlp/tree/master)
- PyThaiNLP 2.0 will support only Python 3.6+. Some functions may work with older version of Python 3, but it is not well-tested and will not be supported. See [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
- Python 2 users can use PyThaiNLP 1.6, our latest released that tested with Python 2.7.
- PyThaiNLP 2.0 supports Python 3.6+. Some functions may work with older version of Python 3, but it is not well-tested and will not be supported. See [PyThaiNLP 2.0 change log](https://github.com/PyThaiNLP/pythainlp/issues/118).
- Python 2.7+ users can use PyThaiNLP 1.6.

**This is a document for development branch (post 1.7.x). Things will break. For a stable branch document, see [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**
**This is a document for development branch (post 2.0). Things will break. For a stable branch document, see [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**

📫 follow us on Facebook [PyThaiNLP](https://www.facebook.com/pythainlp/)

@@ -102,13 +102,10 @@ PyThaiNLP is a Python library for

> Because the world keeps moving forward through sharing

Supports Python 3.6 and above
- PyThaiNLP 2.0 supports Python 3.6 and above
- Python 2.7+ users can still use PyThaiNLP 1.6

- Starting with version 1.7, PyThaiNLP will drop support for Python 2 (some functions may still work, but they will not be supported)
- Starting with version 2.0, Python 2 support will end entirely
- Python 2 users can still use PyThaiNLP 1.6

**This document is for the development version (post 1.7.x) and may change at any time. For the stable version's documentation, see [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**
**This document is for the development version (post 2.0) and may change at any time. For the stable version's documentation, see [master](https://github.com/PyThaiNLP/pythainlp/tree/master).**

📫 Follow our news on Facebook [Pythainlp](https://www.facebook.com/pythainlp/)

2 changes: 1 addition & 1 deletion bin/pythainlp
@@ -45,4 +45,4 @@ elif args.soundex!=None:
args.engine="lk82"
print(soundex(args.soundex, engine=args.engine))
else:
print("PyThaiNLP 2.0")
print("PyThaiNLP 2.0.2")
2 changes: 1 addition & 1 deletion conda.recipe/meta.yaml
@@ -1,4 +1,4 @@
{% set version = "2.0.1" %}
{% set version = "2.0.2" %}

package:
name: pythainlp
2 changes: 1 addition & 1 deletion meta.yaml
@@ -1,4 +1,4 @@
{% set version = "2.0.1" %}
{% set version = "2.0.2" %}

package:
name: pythainlp
2 changes: 1 addition & 1 deletion pythainlp/__init__.py
@@ -1,6 +1,6 @@
# -*- coding: utf-8 -*-

__version__ = "2.0.1"
__version__ = "2.0.2"

thai_consonants = "กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรลวศษสหฬอฮ" # 44 chars
thai_vowels = "ฤฦะ\u0e31าำ\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39เแโใไ\u0e45\u0e47" # 19
1 change: 0 additions & 1 deletion pythainlp/tag/__init__.py
@@ -54,7 +54,6 @@
"DIAC": "DET",
"DIBQ": "DET",
"DIAQ": "DET",
"DCNM": "DET",
# NUM
"NUM": "NUM",
"NCNM": "NUM",
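
The hunk above drops the `"DCNM": "DET"` entry without showing why. One plausible reason, assumed here and not confirmed by the diff, is that the same key appears again later in the dict literal with a different value. In Python, a repeated key in a dict literal is not an error: the last value silently wins, so the earlier entry is dead code. A minimal sketch, where `tag_map` and its second `"DCNM"` entry are hypothetical:

```python
# Hypothetical excerpt of a tag-mapping dict. The second "DCNM" entry is an
# assumption for illustration; the hunk does not show it.
tag_map = {
    "DIAQ": "DET",
    "DCNM": "DET",  # earlier entry, silently overridden below
    "NUM": "NUM",
    "NCNM": "NUM",
    "DCNM": "NUM",  # later entry wins
}

# Only one "DCNM" key survives, carrying the last value assigned to it.
print(tag_map["DCNM"])
print(len(tag_map))
```

If that is the situation, removing the earlier entry changes no behavior and only makes the mapping easier to read.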
12 changes: 10 additions & 2 deletions pythainlp/tokenize/__init__.py
@@ -122,17 +122,25 @@ def sent_tokenize(text: str, engine: str = "whitespace+newline") -> List[str]:
def subword_tokenize(text: str, engine: str = "tcc") -> List[str]:
    """
    :param str text: text to be tokenized
    :param str engine: choosing 'tcc' uses the Thai Character Cluster rule to segment words into the smallest unique units.
    :param str engine: subword tokenizer
    :Parameters for engine:
        * tcc (default) - Thai Character Cluster (Theeramunkong et al. 2000)
        * etcc - Enhanced Thai Character Cluster (Inrut et al. 2001) [In development]
    :return: a list of tokenized strings.
    """
    if not text:
        return ""

    from .tcc import tcc
    from .etcc import etcc

    if engine == "tcc":
        return tcc(text)
    elif engine == "etcc":
        return etcc(text).split("/")
    #default
    return tcc(text)


def syllable_tokenize(text: str) -> List[str]:
"""
:param str text: input string to be tokenized
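
The engine selection in the `subword_tokenize` hunk above can be sketched with a lookup table instead of an `if`/`elif` chain. This is a self-contained illustration, not the library's code: `tcc_stub` and `etcc_stub` are hypothetical stand-ins for the real tokenizers, and the only behavior carried over from the hunk is that the ETCC result is a `/`-joined string that gets split, and that an unknown engine falls back to TCC. The sketch also returns an empty list for empty input, whereas the hunk returns an empty string despite the `List[str]` annotation.

```python
from typing import Callable, Dict, List


def tcc_stub(text: str) -> List[str]:
    """Hypothetical stand-in for the TCC tokenizer: one token per character."""
    return list(text)


def etcc_stub(text: str) -> str:
    """Hypothetical stand-in for the ETCC tokenizer: a '/'-joined string of 2-char chunks."""
    return "/".join(text[i:i + 2] for i in range(0, len(text), 2))


# Each engine name maps to a callable that returns a list of tokens.
ENGINES: Dict[str, Callable[[str], List[str]]] = {
    "tcc": tcc_stub,
    "etcc": lambda text: etcc_stub(text).split("/"),
}


def subword_tokenize_sketch(text: str, engine: str = "tcc") -> List[str]:
    if not text:
        return []  # keeps the return type consistent with the annotation
    tokenizer = ENGINES.get(engine, tcc_stub)  # unknown engines fall back to TCC
    return tokenizer(text)
```

With the table in place, adding another engine is a one-line change to `ENGINES`, and the fallback stays in a single spot.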
2 changes: 2 additions & 0 deletions pythainlp/tokenize/etcc.py
@@ -3,6 +3,8 @@
ETCC program in Python
Developed by Mr. Wannaphong Phatthiyaphaibun (วรรณพงษ์ ภัททิยไพบูลย์)
19 Jun 2017 (B.E. 2560)
Reference: Inrut, Jeeragone, Patiroop Yuanghirun, Sarayut Paludkong, Supot Nitsuwat, and Para Limmaneepraserth. "Thai word segmentation using combination of forward and backward longest matching techniques." In International Symposium on Communications and Information Technology (ISCIT), pp. 37-40. 2001.


Usage
etcc(word)
5 changes: 3 additions & 2 deletions pythainlp/tokenize/tcc.py
@@ -1,8 +1,9 @@
# -*- coding: utf-8 -*-
"""
Separate Thai text into Thai Character Cluster (TCC).
Based on "Character cluster based Thai information retrieval" (Theeramunkong et al. 2002)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.2548
Based on "Character cluster based Thai information retrieval" (Theeramunkong et al. 2000)
https://dl.acm.org/citation.cfm?id=355225
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.2548

Credits:
- TCC: Jakkrit TeCho
2 changes: 1 addition & 1 deletion setup.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 2.0.1
current_version = 2.0.2
commit = True
tag = True

2 changes: 1 addition & 1 deletion setup.py
@@ -34,7 +34,7 @@

setup(
name="pythainlp",
version="2.0.1",
version="2.0.2",
description="Thai Natural Language Processing library",
long_description=readme,
long_description_content_type="text/markdown",