Changes according to @PonteIneptique
Updated the init documentation on the build class to include the new parameters (tar, txt, cites)

Updated the CHANGES.txt file to reflect the addition of the plain_text function

Bumped the version number to 1.1.0

Updated the setup.py to show this version
sonofmun committed Jun 6, 2017
1 parent b03adf7 commit 142bb06
Showing 3 changed files with 35 additions and 1 deletion.
7 changes: 7 additions & 0 deletions CHANGES.txt
@@ -1,3 +1,10 @@
## 1.1.0 - 2017-06-06

Added the --txt parameter to the build class to allow all passing XML files to be converted to plain text and included
in the build.
In addition, added the --cites parameter (also to the build class). When this parameter is true, the citation unit of
each textual node in the plain text file will be introduced with a string representing the citation unit, e.g., 1.1.1

## 1.0.3 - 2017-05-12

Made the creation of the tar archive on build an option with the --tar flag.
27 changes: 27 additions & 0 deletions HookTest/build.py
@@ -16,9 +16,28 @@ class Build(object):
:type path: str
:param dest: the folder in which to save the cleaned corpus
:type dest: str
:param tar: whether to zip the contents of the build into an extra tar.gz file
:type tar: bool
:param txt: whether to create plain text files for all of the passing XML text files
:type txt: bool
:param cites: whether to include the citation string for each of the lowest level citation elements
:type cites: bool
"""

def __init__(self, path, dest, tar=False, txt=False, cites=False):
"""
:param path: the path to the directory that contains the corpus's data directory
:type path: str
:param dest: the folder in which to save the cleaned corpus
:type dest: str
:param tar: whether to zip the contents of the build into an extra tar.gz file
:type tar: bool
:param txt: whether to create plain text files for all of the passing XML text files
:type txt: bool
:param cites: whether to include the citation string for each of the lowest level citation elements
:type cites: bool
"""

if path.endswith('/'):
self.path = path
@@ -72,6 +91,14 @@ def remove_failing(self, files, passing):
def plain_text(self):
""" Extracts the text from the citation nodes of all passing texts in the repository and saves them
in the ./text directory under their text identifier (e.g., tlg001.tlg001.1st1K-grc1.txt)
Each of the lowest-level citation units in these files is separated by \n\n.
If self.cites == True, then each of these citation units will be introduced with #CITATION_STRING#, e.g.:
\n
#1.1.1#\n
Lorem ipsum...
\n
#1.1.2#\n
Lorem ipsum...
"""
os.mkdir('{}text'.format(self.dest))
passing_texts = [x for x in glob('{}data/*/*/*.xml'.format(self.dest)) if '__cts__' not in x]
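
The output layout described in the plain_text docstring can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `format_plain_text` and its `units` argument (a list of citation/text pairs) are hypothetical, and the exact placement of newlines around the citation marker is an assumption based on the docstring.

```python
def format_plain_text(units, cites=False):
    """Join (citation, text) pairs into one plain-text string.

    Hypothetical helper for illustration only; not part of HookTest.
    """
    blocks = []
    for citation, text in units:
        if cites:
            # Introduce the unit with its citation string, e.g. "#1.1.1#"
            blocks.append("#{}#\n{}".format(citation, text))
        else:
            blocks.append(text)
    # Lowest-level citation units are separated by a blank line
    return "\n\n".join(blocks)


units = [("1.1.1", "Lorem ipsum..."), ("1.1.2", "Dolor sit amet...")]
print(format_plain_text(units, cites=True))
```

With `cites=False` the same call would return only the text of each unit, still separated by blank lines.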
2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@

setup(
name='HookTest',
- version="1.0.3",
+ version="1.1.0",
description='Hook Test Script for GitHub/CapiTainS repositories',
long_description=long_description,
url='http://github.com/Capitains/HookTest',