Commit

tidying up some of the code and adding comments.
Nathan Davies committed Feb 1, 2017
1 parent 3701b77 commit 663a3de
Showing 2 changed files with 1 addition and 8 deletions.
3 changes: 0 additions & 3 deletions .gitignore
@@ -85,6 +85,3 @@ ENV/
 
 # Spyder project settings
 .spyderproject
-
-# Go files used for local testing
-*.go
6 changes: 1 addition & 5 deletions WikiExtractor.py
@@ -525,11 +525,7 @@ def extract(self, out):
 #
 text = self.transform(text)
 text = self.wiki2text(text)
-# the text is still present
-# NOTE: Something in the combination of clean and compact is dropping the tables
-# If we remove these calls thant the data is there but the odd wikiml formatting type characters remain
-text = self.clean(text)
-text = compact(text)
+text = compact(self.clean(text))
 footer = "\n</doc>\n"
 if sum(len(line) for line in text) < Extractor.min_text_length:
     return
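
The WikiExtractor.py hunk folds two sequential assignments into one nested call, which should not change behavior: the table-dropping issue described in the removed NOTE would presumably remain. A minimal sketch of that equivalence follows. The `clean` and `compact` functions here are hypothetical stand-ins (the real ones in WikiExtractor.py do far more, and `clean` is a method there), `MIN_TEXT_LENGTH` is an assumed value standing in for `Extractor.min_text_length`, and `compact` is assumed to return a list of lines, as the length check in the diff implies.

```python
# Hypothetical stand-ins; only the call shape mirrors the diff.
def clean(text):
    """Stand-in: strip leftover wiki quote markup ('' and ''')."""
    return text.replace("'''", "").replace("''", "")


def compact(text):
    """Stand-in: split into lines and drop the empty ones."""
    return [line.strip() for line in text.split("\n") if line.strip()]


def two_step(text):
    # Form removed by the commit.
    text = clean(text)
    text = compact(text)
    return text


def one_step(text):
    # Form added by the commit.
    return compact(clean(text))


MIN_TEXT_LENGTH = 10  # assumed threshold, not the project's actual default


def long_enough(lines):
    # Mirrors the check in the diff: total characters across all lines.
    return sum(len(line) for line in lines) >= MIN_TEXT_LENGTH


sample = "'''Title'''\n\nSome ''body'' text\n"
assert two_step(sample) == one_step(sample)
```

Both forms evaluate `clean` first and pass its result to `compact`, so any content lost by that combination is lost either way.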

0 comments on commit 663a3de
