Add pythainlp.tag.chunk #524

Merged
wannaphong merged 7 commits into dev from add-chunk on Feb 4, 2021

wannaphong (Member) commented Jan 16, 2021

What does this change

This PR adds pythainlp.tag.chunk, a chunk parser.

Your checklist for this pull request

🚨Please review the guidelines for contributing to this repository.

  • Passed code styles and structures
  • Passed code linting checks and unit test

pep8speaks commented Jan 16, 2021

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-01-26 05:52:10 UTC

wannaphong (Member, Author) commented Jan 17, 2021

Model Card

Model Details

  • Developer: Wannaphong Phatthiyaphaibun
  • Model date: 2021-01-21
  • Model version: 0.2
  • Used in PyThaiNLP version: __
  • GitHub: ___
  • License: CC0
  • Training notebook: Add chunk notebook (pythainlp_notebook#1)
  • Dataset: ORCHID++ from the Thai Treebanks Dataset. Sentence subtrees are extracted from the trees to build the training data (5,000 trees up to 5,935 trees).

Intended Use

  • Parse Thai sentences into phrase structure.
  • Not suitable for other languages or for domains outside the ORCHID corpus.

Factors

  • Based on the problems of Thai chunk parsing.

Metrics

  • Evaluation metrics are precision, recall, and F1-score, reported per tag.
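
To make the metrics concrete, here is a minimal sketch of how token-level precision, recall, and F1 can be computed per tag. It is not the evaluation code from the training notebook; the function name `per_label_scores` and the example tags are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_label_scores(
    gold: List[str], pred: List[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, f1)} for token-level tags."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    true_pos: Counter = Counter()    # tokens where prediction matches gold
    pred_count: Counter = Counter()  # how often each label was predicted
    gold_count: Counter = Counter()  # how often each label occurs in gold
    for g, p in zip(gold, pred):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            true_pos[g] += 1

    scores = {}
    for label in sorted(set(gold_count) | set(pred_count)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical tags for illustration only, not output from the model.
gold_tags = ["B-NP", "I-NP", "B-VP", "I-VP", "I-VP", "O"]
pred_tags = ["B-NP", "I-NP", "B-VP", "I-VP", "O", "O"]
scores = per_label_scores(gold_tags, pred_tags)
```

In this toy example one `I-VP` token is mislabeled as `O`, so `I-VP` keeps perfect precision but loses recall, while `O` keeps perfect recall but loses precision.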

Training Data

ORCHID++ (90%) from Thai Treebanks Dataset

Evaluation Data

ORCHID++ (10%) from Thai Treebanks Dataset

Quantitative Analyses

              precision    recall  f1-score   support

        B-NP       0.95      0.98      0.96       518
        I-NP       0.86      0.91      0.88      2128
           O       0.87      0.91      0.89       280
        B-PP       0.91      0.77      0.83        65
        I-PP       0.66      0.52      0.59       252
         B-S       0.65      0.49      0.56        90
         I-S       0.67      0.49      0.56      1082
        B-VP       0.86      0.89      0.88       515
        I-VP       0.90      0.94      0.92      4565

   micro avg       0.86      0.86      0.86      9495
   macro avg       0.81      0.77      0.79      9495
weighted avg       0.86      0.86      0.86      9495
 samples avg       0.86      0.86      0.86      9495
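
As a sanity check, the macro and weighted averages can be recomputed from the per-tag rows of the report. The sketch below copies the rounded values from the table, so the results are only accurate to about two decimal places; the helper names `macro` and `weighted` are illustrative.

```python
# (label, precision, recall, f1, support) copied from the report.
rows = [
    ("B-NP", 0.95, 0.98, 0.96, 518),
    ("I-NP", 0.86, 0.91, 0.88, 2128),
    ("O", 0.87, 0.91, 0.89, 280),
    ("B-PP", 0.91, 0.77, 0.83, 65),
    ("I-PP", 0.66, 0.52, 0.59, 252),
    ("B-S", 0.65, 0.49, 0.56, 90),
    ("I-S", 0.67, 0.49, 0.56, 1082),
    ("B-VP", 0.86, 0.89, 0.88, 515),
    ("I-VP", 0.90, 0.94, 0.92, 4565),
]

total_support = sum(row[4] for row in rows)


def macro(col: int) -> float:
    """Unweighted mean of one metric column across all tags."""
    return sum(row[col] for row in rows) / len(rows)


def weighted(col: int) -> float:
    """Mean of one metric column, weighted by each tag's support."""
    return sum(row[col] * row[4] for row in rows) / total_support


# Columns 1, 2, 3 are precision, recall, and F1.
macro_avg = tuple(round(macro(c), 2) for c in (1, 2, 3))
weighted_avg = tuple(round(weighted(c), 2) for c in (1, 2, 3))
```

The gap between the macro and weighted averages comes from the low-scoring `I-PP`, `B-S`, and `I-S` tags: they pull the macro average down, while the weighted average is dominated by `I-VP`, which accounts for almost half of the tokens.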


Ethical Considerations

None identified so far.

Caveats and Recommendations

  • The input is one Thai sentence given as a list of (word, part-of-speech) pairs, [(word, part-of-speech)], where the part-of-speech tags come from a model trained on the ORCHID corpus.
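
The tags in the report (B-NP, I-NP, O, and so on) follow a begin/inside/outside scheme, so a tagged sentence can be folded back into phrases. The sketch below shows one way to do that. The helper `group_chunks` is hypothetical and not part of this PR, and the POS tags and chunk tags in the example are assumed for illustration rather than taken from the model; the parser call itself is omitted because the PR description does not give its signature.

```python
from typing import List, Tuple


def group_chunks(
    words: List[str], tags: List[str]
) -> List[Tuple[str, List[str]]]:
    """Fold per-word B-/I-/O tags into (label, [words]) chunks."""
    chunks: List[Tuple[str, List[str]]] = []
    for word, tag in zip(words, tags):
        if tag == "O":
            # Words outside any phrase become single-word "O" chunks.
            chunks.append(("O", [word]))
            continue
        prefix, _, label = tag.partition("-")
        # An I- tag extends the previous chunk only if the labels match;
        # otherwise (B- tag or mismatched I- tag) a new chunk starts.
        if prefix == "I" and chunks and chunks[-1][0] == label:
            chunks[-1][1].append(word)
        else:
            chunks.append((label, [word]))
    return chunks


# Input in the shape described above: one sentence of (word, POS) pairs.
# The POS tags are assumed ORCHID-style tags, for illustration only.
sentence = [("ผม", "PPRS"), ("รัก", "VACT"), ("คุณ", "PPRS")]
# Hypothetical chunk tags, one per word, not actual model output.
chunk_tags = ["B-NP", "B-VP", "I-VP"]

words = [word for word, _pos in sentence]
chunks = group_chunks(words, chunk_tags)
```

With these assumed tags, the first word forms a noun phrase on its own and the remaining two words are grouped into one verb phrase.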

@wannaphong wannaphong changed the title [WIP] Add pythainlp.tag.chunk Add pythainlp.tag.chunk Jan 18, 2021
@wannaphong wannaphong requested a review from bact January 18, 2021 14:55
@wannaphong wannaphong merged commit 7b224ef into dev Feb 4, 2021
@wannaphong wannaphong added this to the 2.3 milestone Mar 11, 2021
@wannaphong wannaphong deleted the add-chunk branch March 15, 2021 20:23
@wannaphong wannaphong mentioned this pull request Apr 4, 2021