Add Pyarabic library (araby/stack modules) #581

Merged
merged 4 commits into from Sep 22, 2017

Member

LBenzahia commented Sep 18, 2017

@kylepjohnson, I'm back 👍
Ready for review.
See issue #555 (add pyarabic/docs).

codecov-io commented Sep 18, 2017

Codecov Report

Merging #581 into master will decrease coverage by 0.1%.
The diff coverage is 83.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #581      +/-   ##
==========================================
- Coverage   86.28%   86.18%   -0.11%     
==========================================
  Files         128      131       +3     
  Lines        7730     8224     +494     
==========================================
+ Hits         6670     7088     +418     
- Misses       1060     1136      +76
Impacted Files                                 Coverage Δ
cltk/tokenize/word.py                          92.62% <100%> (+2.54%) ⬆️
cltk/stop/arabic/stopword_filter.py            100% <100%> (+20%) ⬆️
cltk/corpus/arabic/utils/pyarabic/stack.py     100% <100%> (ø)
cltk/corpus/arabic/utils/pyarabic/araby.py     79.13% <79.13%> (ø)
cltk/tests/test_arabic_utils.py                99% <99%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 68b737f...93be9df.

kylepjohnson self-requested a review Sep 18, 2017

Member

kylepjohnson commented Sep 18, 2017

@LBenzahia Thanks!

I have pulled your code locally and have run into this error (from the docs):

In [1]: from cltk.support.arabic.pyarabic import araby as araby
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-3c525c55b7b8> in <module>()
----> 1 from cltk.support.arabic.pyarabic import araby as araby

ModuleNotFoundError: No module named 'cltk.support'

Are the docs wrong or is it the code?

Member

LBenzahia commented Sep 18, 2017

You're welcome.
Yes, I forgot to update the docs.
Use from cltk.corpus.arabic.utils.pyarabic import araby as araby instead of from cltk.support.arabic.pyarabic import araby as araby.
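
One way to tell whether the docs or the code is out of date is to ask Python which of the two module paths can be located. The sketch below is only an illustration: `first_importable` and `CANDIDATES` are hypothetical names, not part of CLTK, and the result depends on which CLTK version (if any) is installed.

```python
import importlib.util


def first_importable(candidates):
    """Return the first dotted module path that can be located, else None."""
    for name in candidates:
        try:
            if importlib.util.find_spec(name) is not None:
                return name
        except (ImportError, ValueError):
            # A missing parent package (such as 'cltk.support') raises
            # ModuleNotFoundError instead of returning None.
            continue
    return None


# Hypothetical candidate list built from the two paths mentioned in this thread.
CANDIDATES = [
    "cltk.corpus.arabic.utils.pyarabic.araby",  # path used by the code in this PR
    "cltk.support.arabic.pyarabic.araby",       # stale path from the docs
]

print(first_importable(CANDIDATES))
```

Note that `find_spec` imports the parent packages of a dotted name, so this check has the same side effects as importing `cltk` itself.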

Member

LBenzahia commented Sep 18, 2017

@kylepjohnson I fixed it 👍
I used the --amend option, so fetch my changes again and use a force update.

Member

kylepjohnson commented Sep 18, 2017

Great, we're getting closer!

I ran the following examples from the docs and see two differences: araby.is_sukun and araby.is_shadda both return False for me, but True in the docs.

Do you know if these are OK?

In [1]: from cltk.corpus.arabic.utils.pyarabic import araby as araby

In [2]: char = 'ْ '

In [3]: araby.is_sukun(char)
Out[3]: False

In [4]: char = ''

In [5]: araby.is_shadda(char)
Out[5]: False

In [6]: text = u"الْعَرَبِيّةُ"

In [7]: araby.strip_harakat(text)
Out[7]: 'العربيّة'

In [8]: text = u"الْعَرَبِيّةُ"

In [9]: araby.strip_lastharaka(text)
Out[9]: 'الْعَرَبِيّة'

In [10]: text = u"الْعَرَبِيّةُ"

In [12]: araby.strip_tashkeel(text)
Out[12]: 'العربية'

I'm looking at a few more things now, will get back to you as soon as I can.
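
The three strip calls in the session above differ only in which diacritics they remove. The sketch below reproduces those three outputs with plain Python stand-ins. It is not the araby implementation: `strip_marks`, `strip_last_mark`, and the two sets are hypothetical names, and the mark ranges are an assumption based on the Unicode Arabic block (U+064B to U+0652).

```python
SHADDA = "\u0651"
# U+064B..U+0652: the three tanwin marks, fatha, damma, kasra, shadda, sukun.
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}
# Assumption: "harakat" here means every mark above except shadda.
HARAKAT = TASHKEEL - {SHADDA}


def strip_marks(text, marks):
    """Drop every character of `text` that is in `marks`."""
    return "".join(ch for ch in text if ch not in marks)


def strip_last_mark(text, marks):
    """Drop only a trailing mark, leaving the rest of the word vocalized."""
    return text[:-1] if text and text[-1] in marks else text


# The vocalized word used in the session, written with escapes to avoid
# any encoding ambiguity.
text = "\u0627\u0644\u0652\u0639\u064e\u0631\u064e\u0628\u0650\u064a\u0651\u0629\u064f"

no_harakat = strip_marks(text, HARAKAT)    # shadda is kept
no_last = strip_last_mark(text, HARAKAT)   # only the final damma goes
no_tashkeel = strip_marks(text, TASHKEEL)  # bare letters only

print(no_harakat, no_last, no_tashkeel)
```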

Member

LBenzahia commented Sep 18, 2017

@kylepjohnson No, the code is fine; just delete the whitespace. It should look like this: sukun is 'ْ' and shadda is 'ّ'. I fixed it in the docs.
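
A short sketch of why the trailing space matters. It assumes that araby.is_sukun is a plain equality test against the sukun code point; the `is_sukun` below is a stand-in written for this example, not the library function, and `describe` is a hypothetical helper.

```python
import unicodedata

SUKUN = "\u0652"   # ARABIC SUKUN
SHADDA = "\u0651"  # ARABIC SHADDA


def is_sukun(char):
    # Stand-in for araby.is_sukun, assumed to compare against one code point.
    return char == SUKUN


def describe(s):
    """List the Unicode name of every character, making stray spaces visible."""
    return [unicodedata.name(c, "UNKNOWN") for c in s]


padded = SUKUN + " "  # what the docs example contained: the mark plus a space

print(describe(padded))          # the second entry is the stray SPACE
print(is_sukun(padded))          # the two-character string never matches
print(is_sukun(padded.strip()))  # stripping whitespace leaves the bare mark
```

Combining marks are hard to see next to whitespace in a terminal, so printing the character names is a quick way to check a copied example.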

kylepjohnson merged commit 101faec into cltk:master Sep 22, 2017

1 check was pending

continuous-integration/travis-ci/pr The Travis CI build is in progress
Member

kylepjohnson commented Sep 22, 2017

@LBenzahia I have merged your code and made a new release, so it should appear in about 20 mins on PyPI. Thank you!

Member

LBenzahia commented Sep 22, 2017

@kylepjohnson, you're welcome 😄
