## SENTENCE DETECTION
Sentence detection is the process of locating where sentences start and end in a given text. This allows you to divide a text into linguistically meaningful units. You’ll use these units when you’re processing your text to perform tasks such as part-of-speech (POS) tagging (https://en.wikipedia.org/wiki/Part-of-speech_tagging) and named-entity recognition.
In spaCy, the .sents property of a Doc object is used to extract its sentences; it yields one Span per detected sentence.


In [7]:
import spacy

nlp = spacy.load("en_core_web_sm")

In [8]:
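# Note: Python joins adjacent string literals with no separator, and most
# lines below lack a trailing period and space, so the text runs together
# (e.g. "...in PythonIt's designed..."). That is why so few sentences are found.
about_txt = (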
    "spaCy is an open-source software library for advanced Natural Language Processing (NLP) in Python"
    "It's designed to be fast, efficient, and highly accessible, making it a popular choice for building NLP applications"
    "spaCy excels in tasks like part-of-speech tagging, named entity recognition, dependency parsing, and more"
    "It supports multiple languages and offers pre-trained models for various NLP tasks"
    "optimized for both performance and accuracy.Its design is geared towards practical"
    "real-world applications and it's widely used in industry and academia for building NLP pipelines and applications."
)

In [14]:
# Keep the raw string and the processed Doc under separate names
about_doc = nlp(about_txt)
sentences = list(about_doc.sents)
len(sentences)

2

In [17]:
for sent in sentences:
    print(f"{sent[:5]}...")

spaCy is an open-...
Its design is geared towards...
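Only two sentences come back because the string literals in about_txt are joined without separators. As a rough illustration of the difference explicit punctuation makes, the sketch below uses a shortened, properly punctuated version of the text. It assumes spaCy's rule-based sentencizer on a blank English pipeline instead of en_core_web_sm, so no model download is needed; the names nlp_rules, punctuated_txt, and punctuated_sentences are made up for this example.

```python
import spacy

# Assumption: a blank English pipeline with the rule-based "sentencizer",
# which splits on sentence-final punctuation rather than a dependency parse.
nlp_rules = spacy.blank("en")
nlp_rules.add_pipe("sentencizer")

# Each literal ends with a period and a space, so the joined text
# contains real sentence boundaries.
punctuated_txt = (
    "spaCy is an open-source software library for advanced Natural Language Processing (NLP) in Python. "
    "It supports multiple languages and offers pre-trained models for various NLP tasks. "
    "It is widely used in industry and academia."
)

punctuated_doc = nlp_rules(punctuated_txt)
punctuated_sentences = list(punctuated_doc.sents)

for sent in punctuated_sentences:
    print(sent.text)
```

With the boundaries written out, each literal should come back as its own sentence. The statistical parser in en_core_web_sm may segment differently from this rule-based component, so treat the result as indicative only.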


In [19]:
from spacy.language import Language

# Customize sentence detection by treating "..." as an extra delimiter
ellipsis_txt = (
    "Gus, can you, ... never mind, I forgot"
    " what I was saying. So, do you think"
    " we should ..."
)


@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    """Mark the token that follows each "..." as the start of a sentence."""
    # Skip the last token: nothing follows it, so there is no token to mark
    for token in doc[:-1]:
        if token.text == "...":
            doc[token.i + 1].is_sent_start = True
    return doc

In [20]:
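custom_nlp = spacy.load("en_core_web_sm")
# Run the component before the parser so the boundaries it sets are already
# in place when the parser assigns the remaining sentence starts
custom_nlp.add_pipe("set_custom_boundaries", before="parser")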
custom_ellipsis_doc = custom_nlp(ellipsis_txt)
custom_ellipsis_sentences = list(custom_ellipsis_doc.sents)
for sentence in custom_ellipsis_sentences:
    print(sentence)

Gus, can you, ...
never mind, I forgot what I was saying.
So, do you think we should ...
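To see what the custom rule contributes on its own, separate from the parser, one option is to run the same logic in a blank pipeline. This is a minimal sketch under stated assumptions: the component name ellipsis_boundaries_only and the variables nlp_blank, blank_doc, and blank_sentences are hypothetical, and the component is registered under a new name only to avoid clashing with set_custom_boundaries above.

```python
import spacy
from spacy.language import Language


# Hypothetical name; same rule as set_custom_boundaries
@Language.component("ellipsis_boundaries_only")
def ellipsis_boundaries_only(doc):
    # Mark the token after each "..." as a sentence start
    for token in doc[:-1]:
        if token.text == "...":
            doc[token.i + 1].is_sent_start = True
    return doc


# Blank pipeline: tokenizer only, no parser to add further boundaries
nlp_blank = spacy.blank("en")
nlp_blank.add_pipe("ellipsis_boundaries_only")

blank_doc = nlp_blank(
    "Gus, can you, ... never mind, I forgot"
    " what I was saying. So, do you think"
    " we should ..."
)
blank_sentences = [sent.text for sent in blank_doc.sents]

for text in blank_sentences:
    print(text)
```

Without a parser, the only split should be the one after the first ellipsis; the period before "So" is not treated as a boundary here. In the full pipeline that split comes from the parser, which is why the component is added before it rather than in place of it.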
