
Commit

fix regex to filter out *leading and trailing* punctuation marks from ocr text
keshavmagge committed Oct 31, 2013
1 parent 4266c99 commit 8431f85
Showing 1 changed file with 1 addition and 1 deletion.
core/ocr_extractor.py: 2 changes (1 addition, 1 deletion)
@@ -3,7 +3,7 @@
 from xml.sax.handler import ContentHandler, feature_namespaces
 from xml.sax import make_parser
 
-trailing_punctuation = re.compile('''[^a-zA-Z0-9]+$''')
+trailing_punctuation = re.compile('''^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$''')
 
 class OCRHandler(ContentHandler):
 
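A minimal sketch of the behavior change, assuming the pattern is applied with `re.sub` to delete matches (the call site inside `OCRHandler` is not part of this diff). The helper `strip_punctuation` and the sample tokens are hypothetical; only the two regular expressions come from the commit.

```python
import re

# Old pattern from the diff: matches only a run of non-alphanumeric
# characters at the end of the string.
OLD_PATTERN = re.compile('''[^a-zA-Z0-9]+$''')

# New pattern from the diff: the alternation matches a run at the start
# (^...) or a run at the end (...$); inner characters are left alone.
NEW_PATTERN = re.compile('''^[^a-zA-Z0-9]+|[^a-zA-Z0-9]+$''')


def strip_punctuation(word, pattern=NEW_PATTERN):
    """Hypothetical helper: delete every match of `pattern` from `word`.

    Assumes the extractor uses re.sub with an empty replacement; the
    real usage in OCRHandler is not shown in this commit.
    """
    return pattern.sub('', word)


if __name__ == '__main__':
    # Compare old vs. new pattern on a few OCR-style tokens.
    for token in ['"Hello,', '(1913).', "don't", '--']:
        print(repr(token),
              repr(strip_punctuation(token, OLD_PATTERN)),
              repr(strip_punctuation(token)))
```

With the old pattern, `'"Hello,'` keeps its opening quote; with the new one both ends are trimmed, while the apostrophe inside `don't` survives because neither branch can match mid-word. Note that the character class is ASCII-only, so a token such as `café.` would also lose its `é`.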
