Add documentation for Arabic Swadesh list (#960)

* Added Swadesh list from Wikipedia, changed all words to Coptic script, added some missing words

* added tests

* updated tests and list to conform to discussed format

* one change from test

* Make small adjustment to Coptic Swadesh test

* Added documentation

* added doc
nolanee authored and kylepjohnson committed Nov 16, 2019
1 parent 18c6ebf commit ae0c5bab1a46e27043a741fbbdea29daf400dc0f
Showing with 15 additions and 0 deletions.
  1. +15 −0 docs/arabic.rst
@@ -166,6 +166,21 @@ To use the CLTK's built-in stopwords list:
In [3]: ar_stop_filter(text)
Out[3]: ['سئل', 'الكتاب', 'الخط', '،', 'يستحق', 'يوصف', 'بالجودة', '؟']

Swadesh
=======

The corpus module has a class for generating a Swadesh list for Arabic.

.. code-block:: python

   In [1]: from cltk.corpus.swadesh import Swadesh

   In [2]: swadesh = Swadesh('ar')

   In [3]: swadesh.words()[:10]
   Out[3]: ['أنا' ,'أنت‎, أنتِ‎', 'هو‎,هي' ,'نحن' ,'أنتم‎,‎ أنتن‎,‎ أنتما‎', 'هم‎,‎ هن‎,‎ هما' ,'هذا' ,'ذلك' ,'هنا‎']

Word Tokenization
=================
.. code-block:: python
