A sentence segmenter that actually works! Now for English, French and Italian.

The Demo is available at

This repository contains the code and pre-trained models for "DeepCorrection 1: Sentence Segmentation of unpunctuated text", as explained in the Medium posts at and

The pre-trained models are available at



# if you are using gpu for prediction, please see for restricting memory usage

from deepsegment import DeepSegment
# The config file can be found at in the pre-trained model zip. Change the model paths in the config file before loading.
# Since the complete GloVe embeddings are not needed for prediction, "glove_path" can be left empty in the config file.
segmenter = DeepSegment('path_to_config')
segmenter.segment('I am Batman i live in gotham')
# ['I am Batman', 'i live in gotham']

To Do:

Add a sliding window for processing very long texts.

Update the seqtag model to work with tf 2.0 (Change to may be).

Update to add Indic languages.
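
The sliding window item above could look roughly like the sketch below. This is not part of deepsegment: `segment_long_text` and `window_size` are hypothetical names, and the sketch assumes the segmenter returns the input words unchanged and in order, only split into sentences. Each window is segmented on its own, and the last sentence of every window except the final one is held back and re-segmented with the next window, since it may have been cut off at the window boundary.

```python
from typing import Callable, List


def segment_long_text(
    text: str,
    segment_fn: Callable[[str], List[str]],
    window_size: int = 200,
) -> List[str]:
    """Segment a long text window by window (hypothetical helper).

    segment_fn is any callable that maps a string to a list of sentences,
    for example the `segment` method of a loaded DeepSegment object.
    """
    words = text.split()
    sentences: List[str] = []
    start = 0
    while start < len(words):
        window = words[start:start + window_size]
        is_last_window = start + window_size >= len(words)
        pieces = segment_fn(" ".join(window))
        if not pieces:
            # Assumption: an empty result means "no split found",
            # so keep the whole window as one piece.
            pieces = [" ".join(window)]
        # The last piece of a non-final window may be cut off mid-sentence,
        # so hold it back and let the next window see it again.
        if not is_last_window and len(pieces) > 1:
            pieces = pieces[:-1]
        sentences.extend(pieces)
        consumed = sum(len(piece.split()) for piece in pieces)
        # Always move forward, even if the segmenter returned empty strings.
        start += consumed if consumed > 0 else len(window)
    return sentences
```

With a loaded model this would be called as `segment_long_text(long_text, segmenter.segment)`. A sentence longer than `window_size` words is still emitted as a single piece per window, so the window should be comfortably larger than the longest expected sentence.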


Of all the sentence segmentation models I evaluated, deepsegment is without doubt the best in terms of accuracy on real-world text (bad punctuation, wrong punctuation).

I trained flair's NER model on the same data, and flair gets better results, but the difference is minuscule (a 0.3% absolute increase in accuracy).

Since I want to keep using tf and keras for now, and since flair embeddings are not available for all the languages I want deepsegment to work on, I am going to keep using seqtag for this project.
