First version of the tokenizer #1

arsduo · 2015-09-27T17:13:38Z

This PR implements the first version of the tokenizer! It can take in texts and split them apart into words, each containing punctuation and sentence flow information.

While the tokenizer has several important gaps (as described in the readme), it should be sufficient to integrate into Markovian's text parsing without any loss of functionality. The Markov text can then start incorporating punctuation and sentence start/end data at the same time as the gaps in Tokeneyes get fixed.
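To illustrate the kind of output described above, here is a minimal sketch of splitting a text into words that carry their surrounding punctuation and sentence start/end flags. It is not the Tokeneyes implementation: `SketchWord`, `sketch_tokenize`, and all field names are hypothetical, and the whitespace-splitting approach is an assumption made for brevity. The only details taken from this PR are that a word can be rendered with `to_s` and that `#` and `@` are treated as word characters for now.

```ruby
# Hypothetical sketch only; these names are NOT the Tokeneyes API.
SketchWord = Struct.new(:text, :punctuation_before, :punctuation_after,
                        :begins_sentence, :ends_sentence) do
  # Rebuild the word as it appeared in the text, punctuation included.
  def to_s
    "#{punctuation_before}#{text}#{punctuation_after}"
  end
end

# Assumption: only these characters end a sentence.
SENTENCE_ENDERS = /[.!?]/

def sketch_tokenize(input)
  starts_sentence = true
  input.split(/\s+/).reject(&:empty?).map do |chunk|
    # Leading punctuation, the word itself, trailing punctuation.
    # '#' and '@' count as word characters, as in this PR.
    before, text, after = chunk.match(/\A([^\w#@]*)(.*?)([^\w#@]*)\z/).captures
    ends = after.match?(SENTENCE_ENDERS)
    word = SketchWord.new(text, before, after, starts_sentence, ends)
    # The word after a sentence-ending word begins the next sentence.
    starts_sentence = ends
    word
  end
end

sketch_tokenize("Hello, world! How are you?").each do |word|
  puts [word.text, word.punctuation_after.inspect,
        word.begins_sentence, word.ends_sentence].join("\t")
end
```

A character-by-character reader, like the `WordReader` and `WordBuilder` classes named in the commits below, would handle cases this sketch ignores, such as punctuation that is not separated from the next word by whitespace.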

arsduo added 16 commits September 24, 2015 18:35

More test setup

63c59c9

Add Word class

08d9dca

Add Word#to_s

7e85f86

Add WordBoundarySurveyor

39ca4e0

Properly handle pairs of possible boundaries in WordBoundarySurveyor

9598b7a

Initialize instance vars to avoid warnings

24407d4

Add WordReader w/ mostly-passing spec

13692cd

Refactor WordBoundarySurveyor into WoundBuilder

89b3819

Get WordReader specs passing

b4cf28f

Add Word#length

292786d

WordBuilder#sentence_ended? should return a bool

17fd807

Minor WordReader renames/refactors

457d2e0

For now, treat # and @ as word characters

c511643

Create Tokenizer to read in texts

3523466

Add readme/changelog

7c7ee05

Fix jruby failures

5e815e9

arsduo force-pushed the first-go branch 2 times, most recently from 7425d0a to 187fd92 Compare September 27, 2015 18:00

Run jruby tests in jruby-9000

278d4ed

arsduo force-pushed the first-go branch from 187fd92 to 278d4ed Compare September 27, 2015 18:02

arsduo added a commit that referenced this pull request Sep 27, 2015

Merge pull request #1 from arsduo/first-go

36e5bf8

First version of the tokenizer

arsduo merged commit 36e5bf8 into master Sep 27, 2015
