Commit

Merge pull request #2 from kylepjohnson/master
update
marpozzi committed Sep 3, 2015
2 parents b44cfc7 + 85cff3c commit 350a7c4
Showing 6 changed files with 28 additions and 25 deletions.
13 changes: 8 additions & 5 deletions cltk/corpus/greek/tlgu.py
@@ -39,8 +39,9 @@

 class TLGU(object):
     """Check, install, and call TLGU."""
-    def __init__(self):
+    def __init__(self, testing=False):
         """Check whether tlgu is installed, if not, import and install."""
+        self.testing = testing
         self._check_import_source()
         self._check_install()

@@ -57,8 +58,7 @@ def _check_import_source():
                 logger.error('Failed to import TLGU: %s', exc)
                 raise

-    @staticmethod
-    def _check_install():
+    def _check_install(self):
         """Check if tlgu installed, if not install it."""
         try:
             subprocess.check_output(['which', 'tlgu'])
Expand All @@ -70,8 +70,11 @@ def _check_install():
else:
tlgu_path_rel = '~/cltk_data/greek/software/greek_software_tlgu'
tlgu_path = os.path.expanduser(tlgu_path_rel)
print('Do you want to install TLGU? To continue, press Return. To exit, Control-C.')
input()
if not self.testing:
print('Do you want to install TLGU? To continue, press Return. To exit, Control-C.')
input()
else:
print('Automated or test build, skipping keyboard input confirmation for installation of TLGU.')
try:
p_out = subprocess.call('cd {0} && make install'.format(tlgu_path), shell=True)
if p_out == 0:
16 changes: 8 additions & 8 deletions cltk/tests/test_corpus.py
@@ -31,6 +31,11 @@ def test_greek_betacode_to_unicode(self):
         target_unicode = 'ὅπως οὖν μὴ ταὐτὸ '
         self.assertEqual(unicode, target_unicode)

+    def test_tlgu_init(self):
+        """Test constructors of TLGU module for check, import, and install."""
+        tlgu = TLGU(testing=True)
+        self.assertTrue(tlgu)
+
     def test_import_greek_software_tlgu(self):
         """Test cloning TLGU."""
         corpus_importer = CorpusImporter('greek')
@@ -40,11 +45,6 @@ def test_import_greek_software_tlgu(self):
         file_exists = os.path.isfile(file)
         self.assertTrue(file_exists)

-    def test_tlgu_init(self):
-        """Test constructors of TLGU module for check, import, and install."""
-        tlgu = TLGU()
-        self.assertTrue(tlgu)
-
     def test_tlgu_convert(self):
         """Test TLGU convert. This reads the file
         ``tlgu_test_text_beta_code.txt``, which mimics a TLG file, and
@@ -53,7 +53,7 @@ def test_tlgu_convert(self):
         """
         in_test = os.path.abspath('cltk/tests/tlgu_test_text_beta_code.txt')
         out_test = os.path.expanduser('~/cltk_data/tlgu_test_text_unicode.txt')
-        tlgu = TLGU()
+        tlgu = TLGU(testing=True)
         tlgu.convert(in_test, out_test)
         with open(out_test) as out_file:
             new_text = out_file.read()
@@ -65,14 +65,14 @@ def test_tlgu_convert(self):

     def test_tlgu_convert_fail(self):
         """Test the TLGU to fail when importing a corpus that doesn't exist."""
-        tlgu = TLGU()
+        tlgu = TLGU(testing=True)
         with self.assertRaises(AssertionError):
             tlgu.convert('~/Downloads/corpora/TLG_E/bad_path.txt',
                          '~/Documents/thucydides.txt')

     def test_tlgu_convert_corpus_fail(self):
         """Test the TLGU to fail when trying to convert an unsupported corpus."""
-        tlgu = TLGU()
+        tlgu = TLGU(testing=True)
         with self.assertRaises(AssertionError):
             tlgu.convert_corpus(corpus='bad_corpus')

8 changes: 4 additions & 4 deletions docs/citation.rst
@@ -1,16 +1,16 @@
 Citation
 ********

-Each major release of the CLTK is given a `DOI <http://en.wikipedia.org/wiki/Digital_object_identifier>`_, a type of unique identity for digital documents. This DOI ought to be included in your citation, as it will allow your readers to reproduce your scholarship should the CLTK's API or codebase change. To find the CLTK's current DOI, observe the blue ``DOI`` button in the `repository's home (``README.md``) <https://github.com/kylepjohnson/cltk>`. To the end of your bibliographic entry, append `DOI ` plus the current identifier.
+Each major release of the CLTK is given a `DOI <http://en.wikipedia.org/wiki/Digital_object_identifier>`_, a type of unique identity for digital documents. This DOI ought to be included in your citation, as it will allow your readers to reproduce your scholarship should the CLTK's API or codebase change. To find the CLTK's current DOI, observe the blue ``DOI`` button in the repository's home (`README.md <https://github.com/kylepjohnson/cltk>`_). To the end of your bibliographic entry, append `DOI ` plus the current identifier.

-Therefore, please cite the CLTK as follows:
+Therefore, please cite the CLTK similar to:

 .. code-block:: none
 Kyle P. Johnson et al.. (2014-2015). CLTK: The Classical Language Toolkit. DOI 10.5281/zenodo.15442
-A style-neutral BibTex entry would look like this:
+A style-neutral BibTex entry looks like:

 .. code-block:: none
@@ -22,4 +22,4 @@ A style-neutral BibTex entry would look like this:
 year = {2014--2015}
 }
-Optionally you may add version/release number, e.g., ``v0.0.1.7``, to the entry.
+Optionally you may add version/release number, e.g., ``v0.0.1.7``, to the entry.
2 changes: 1 addition & 1 deletion docs/greek.rst
@@ -22,7 +22,7 @@ Converting TLG texts with TLGU
 ======================================


-The `TLGU <http://tlgu.carmen.gr/>`_ is excellent C language software for converting the TLG and PHI corpora into human-readable Unicode. The CLTK has an automated downloader and installer, as well as a wrapper which facilitates its use. When ``TLGU()`` is instantiated, it checks the local OS for a functioning version of the software. If not found it is installed.
+The `TLGU <http://tlgu.carmen.gr/>`_ is excellent C language software for converting the TLG and PHI corpora into human-readable Unicode. The CLTK has an automated downloader and installer, as well as a wrapper which facilitates its use. When ``TLGU()`` is instantiated, it checks the local OS for a functioning version of the software. If not found it is, following the user's confirmation, downloaded and installed.

Most users will want to do a bulk conversion of the entirety of a corpus without any text markup (such as chapter or line numbers). Note that you must `import a local corpus <http://docs.cltk.org/en/latest/importing_corpora.html#importing-a-corpus>`_ before converting it.
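
The check-then-confirm behaviour described for ``TLGU()`` in this hunk, combined with the ``testing`` flag added in ``cltk/corpus/greek/tlgu.py``, can be sketched in isolation. This is a minimal illustration under stated assumptions, not the CLTK implementation: the class name ``InstallerSketch``, its ``prompt`` parameter, and the use of ``shutil.which`` in place of the ``which`` subprocess call are hypothetical choices made for the example.

```python
import shutil


class InstallerSketch:
    """Hypothetical stand-in for a TLGU-like wrapper; not the CLTK class."""

    def __init__(self, binary='tlgu', testing=False, prompt=input):
        self.binary = binary
        self.testing = testing
        # Assumption: the prompt is injectable so callers can replace stdin.
        self._prompt = prompt

    def is_installed(self):
        """Return True if the binary is found on PATH."""
        return shutil.which(self.binary) is not None

    def confirm_install(self):
        """Ask for confirmation unless this is an automated or test build."""
        if self.testing:
            print('Automated or test build, skipping keyboard input confirmation.')
            return True
        print('Do you want to install? To continue, press Return. To exit, Control-C.')
        self._prompt()  # blocks until the user presses Return
        return True
```

With ``testing=True`` the prompt is never called, which is what lets a test suite or CI build construct the object without blocking on keyboard input.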

2 changes: 1 addition & 1 deletion docs/importing_corpora.rst
@@ -1,6 +1,6 @@
 Importing Corpora
 *****************
-The CLTK works solely out of the local directory ``cltk_data``, which is created at a user's root directory upon the first initialization of the ``CorpusImporter()`` class. Within this is ``originals``, in which copies of downloaded or copied files are left, and also a directory for every language for which a corpus has been downloaded. Also within ``cltk_data`` is ``cltk.log``, which contains all of the cltk's logging.
+The CLTK stores all data in the local directory ``cltk_data``, which is created at a user's root directory upon first initialization of the ``CorpusImporter()`` class. Within this are an ``originals`` directory, in which untouched copies of downloaded or copied files are preserved, and a directory for every language for which a corpus has been downloaded. It also contains ``cltk.log`` for all CLTK logging.

 Listing corpora
 ===============
12 changes: 6 additions & 6 deletions docs/installation.rst
@@ -8,7 +8,7 @@ With Pip

 .. note::

-The CLTK is only compatible with Python 3 on a POSIX-compatible operating system (Mac OS X, Linux, BSD, etc.).
+The CLTK is only compatible with Python 3.

 First, you'll need a working installation of `Python 3.4 <https://www.python.org/downloads/>`_, which now includes Pip. Create a virtual environment and activate it as follows:

@@ -27,21 +27,21 @@ Then, install the CLTK, which automatically includes all dependencies.
 Second, you will need an installation of `Git <http://git-scm.com/downloads>`_, which the CLTK uses to download and update corpora, if you want to automatically import any of the `CLTK's corpora <https://github.com/cltk/>`_. Installation of Git will depend on your operating system.


-.. note::
+.. tip::

-For a user–friendly interactive shell environment, consider trying IPython, which may be invoked with ``ipython`` or ``ipython notebook`` from the command line. You may install it with ``pip install ipython``.
+For a user–friendly interactive shell environment, try IPython, which may be invoked with ``ipython`` from the command line. You may install it with ``pip install ipython``.


 From source
 ===========
-The `CLTK source is available at GitHub <https://github.com/kylepjohnson/cltk>`_. To build from source, clone the repository, make a virtual environment (as above), and finally run:
+The `CLTK source is available at GitHub <https://github.com/kylepjohnson/cltk>`_. To build from source, clone the repository, make a virtual environment (as above), and run:

 .. code-block:: shell
 $ python setup.py install
-If you have modified the CLTK source, rebuild the project with this same command. If you make any changes, it is a good idea to run the test suite to ensure you did not introduce any breakage. Test with:
+If you have modified the CLTK source, rebuild the project with this same command. If you make any changes, it is a good idea to run the test suite to ensure you did not introduce any breakage. Test with ``nose`` (obtained with ``pip install nose``):

 .. code-block:: shell
-$ python cltk/tests/test_cltk.py
+$ nosetests
