# Sentence Boundary Detection

> This module provides `sentence_splitter`, which takes a string of text, splits it after sentence-ending punctuation (`.`, `!` or `?`) that is followed by a space, skips common title abbreviations such as `Mr.` and `Dr.`, and returns the sentences joined by newlines.

In [1]:
#| default_exp sentence_boundary_detection

In [2]:
#| hide
from nbdev.showdoc import *
from fastcore.test import *

In [3]:
#| export
import re

In [4]:
#| export
def sentence_splitter(text:str # The input text to be split
                     ):
    '''
    Split `text` after each '.', '!' or '?' that is followed by a space, except after
    titles such as 'Mr.' or 'Prof.', and return the sentences joined by newlines
    '''
    # Titles whose trailing full stop does not end a sentence
    abbr = ['[MDJ]r', 'Hon', 'Esq', 'Prof', 'Mrs','Ms']
    # One fixed-width negative lookbehind per title, e.g. (?<!Mrs\.); the dot is
    # escaped so that only a literal full stop after a title blocks the split
    pattern = re.compile(r"(?<!{}\.)(?<=[.!?]) ".format(r'\.)(?<!'.join(abbr)))
    sentences = re.split(pattern, text)
    return '\n'.join(sentences)

In [5]:
show_doc(sentence_splitter)

---

### sentence_splitter

>      sentence_splitter (text:str)

Split `text` after each '.', '!' or '?' that is followed by a space, except after
titles such as 'Mr.' or 'Prof.', and return the sentences joined by newlines

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| text | str | The input text to be split |
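The splitting rule relies on a chain of negative lookbehinds, one per title. The following is a minimal sketch of that idea, not the exported function itself: `build_pattern` is a hypothetical helper introduced only for illustration, it uses a shortened title list, and it assumes the full stop after each title is matched literally.

```python
import re

# Hypothetical helper for illustration; not part of the exported module
def build_pattern(abbreviations):
    # One fixed-width negative lookbehind per abbreviation, e.g. (?<!Mrs\.)
    lookbehinds = ''.join(r'(?<!{}\.)'.format(a) for a in abbreviations)
    # Match a space that directly follows '.', '!' or '?'
    return re.compile(lookbehinds + r'(?<=[.!?]) ')

pattern = build_pattern(['[MDJ]r', 'Mrs', 'Ms'])
print(pattern.pattern)

# The spaces after 'Dr.' and 'Mrs.' are skipped; the others split the text
parts = pattern.split('Dr. Hudson left. Mrs. Hudson stayed! Why?')
print(parts)
```

Python's `re` module requires each lookbehind to have a fixed width, which is why every title gets its own `(?<!...)` group rather than sharing one alternation of different lengths.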

As an example, take the following passage:
> The difference for them can be significant. A 10-year loan of €50,000 from you at zero interest would involve monthly payments of €417, or €5,000 a year. With An Post’s market-leading rate, the friend would be paying €525.45 a month, or over €6,300 a year and over €13,000 interest over the life of the loan.

Passing it to `sentence_splitter` gives:

In [6]:
text = '''The difference for them can be significant. A 10-year loan of €50,000 from you at zero interest would involve monthly payments of €417, or €5,000 a year. With An Post’s market-leading rate, the friend would be paying €525.45 a month, or over €6,300 a year and over €13,000 interest over the life of the loan.'''
sentence_splitter(text)

'The difference for them can be significant.\nA 10-year loan of €50,000 from you at zero interest would involve monthly payments of €417, or €5,000 a year.\nWith An Post’s market-leading rate, the friend would be paying €525.45 a month, or over €6,300 a year and over €13,000 interest over the life of the loan.'
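The notebook shows the return value as a single string with literal `\n` separators. A short sketch of how such a value could be displayed or turned into a list follows; `joined` is a made-up stand-in for a returned string, not actual output of the function.

```python
# Stand-in for a newline-joined string such as sentence_splitter returns
joined = 'First sentence.\nSecond one!\nA third?'

# print renders each sentence on its own line
print(joined)

# Splitting on the newline recovers the individual sentences as a list
sentences = joined.split('\n')
print(len(sentences))
```

If the input text already contains newlines, they are indistinguishable from the separators in the returned string, so splitting on `'\n'` would also break at those positions.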

In [7]:
#| hide
text = '''Mr. Hudson and Ms. Johnson want to buy a house costing €20.4M. They are having trouble with money so they will ask Henry Jr. who is not Dr. Hudson whose wife is Mrs. Hudson. I think they are using Henry's money.'''
result = '''Mr. Hudson and Ms. Johnson want to buy a house costing €20.4M.
They are having trouble with money so they will ask Henry Jr. who is not Dr. Hudson whose wife is Mrs. Hudson.
I think they are using Henry's money.'''
test_eq(sentence_splitter(text), result)

In [8]:
#| hide
import nbdev; nbdev.nbdev_export()