SpacySentencizer

SpacySentencizer takes a DocumentArray and, for each Document:

- Checks whether it has a .text attribute
- If so, splits the text into sentences
- Stores each sentence as a chunk of the Document
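
The three steps above can be sketched in plain Python. This is only an illustration of the flow, not the executor's actual code: `Doc` is a hypothetical stand-in for a Jina Document, and `toy_split` is a placeholder for the sentence splitting that the executor presumably hands to spaCy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Jina Document (not the real class)."""
    text: str = ""
    chunks: List["Doc"] = field(default_factory=list)


def sentencize(docs: List[Doc], split: Callable[[str], List[str]]) -> None:
    """Attach one chunk per sentence to every Doc that has text."""
    for doc in docs:
        if not doc.text:  # step 1: skip Documents without text
            continue
        for sentence in split(doc.text):  # step 2: split into sentences
            doc.chunks.append(Doc(text=sentence))  # step 3: store as a chunk


def toy_split(text: str) -> List[str]:
    """Placeholder splitter (one sentence per line), assumed for this sketch."""
    return [line.strip() for line in text.split("\n") if line.strip()]


docs = [Doc(text="First sentence.\nSecond sentence."), Doc()]
sentencize(docs, toy_split)
print([chunk.text for chunk in docs[0].chunks])
print(len(docs[1].chunks))
```

The second Doc has no text, so it passes through without any chunks.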

Why SpacySentencizer over "Vanilla" Sentencizer?

In English, a . (full stop/period) ends most sentences.
However, . is also used in:
- URLs: docs.jina.ai
- Decimals: 3.14
- Initials: J.R.R. Tolkien, H. Sapiens
- Abbreviations: Turn to p. 13
This means that the "Vanilla" Sentencizer splits text at periods that don't end a sentence.
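
To make the problem concrete, here is a small sketch of what happens when every period is treated as a sentence boundary. `naive_split` is a hypothetical helper written for this example. It is not the "Vanilla" Sentencizer's real implementation, but it shows the same failure.

```python
from typing import List


def naive_split(text: str) -> List[str]:
    """Treat every '.' as a sentence boundary (deliberately naive)."""
    return [part.strip() for part in text.split(".") if part.strip()]


text = "Turn to p. 13 of the guide at docs.jina.ai. Pi is roughly 3.14."
pieces = naive_split(text)
for piece in pieces:
    print(repr(piece))
```

The input contains two sentences, but the naive splitter returns six fragments: the abbreviation, the URL and the decimal are each cut apart.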

SpacySentencizer should also work for other languages, though I haven't tested that yet.

Usage

via Docker image (recommended)

from jina import Flow

f = Flow().add(uses='jinahub+docker://SpacySentencizer')

via source code

from jina import Flow

f = Flow().add(uses='jinahub://SpacySentencizer')

To override __init__ args & kwargs, use .add(..., uses_with={'key': 'value'})
To override class metas, use .add(..., uses_metas={'key': 'value'})

alexcg1/executor-spacy-sentencizer
