issues Search Results · repo:tsproisl/SoMaJo language:Python
29 results
Is there any way to just split the text into sentences, like the sent_tokenize function (from nltk.tokenize import sent_tokenize)?
sambaPython24
- 3
- Opened on May 12
- #33
Hi,
I noticed that the English tokenizer splits the term 3G into 3 and G, while it should not. Also, I wonder whether
time-varying and location-dependent should have been split by the hyphens, as it ...
g3rfx
- 2
- Opened on Aug 4, 2024
- #32
Hi Thomas, I am hoping to use SoMaJo's sentence_splitter in Rust, and I am wondering if it would be possible to
formulate it in terms of SRX rules? I would happily contribute to making that happen, but ...
fewzee
- 1
- Opened on May 29, 2024
- #31
same as here: #28
But with: [Konfiguration](https://a-link.com)\\
PhilipMay
- 2
- Opened on Feb 13, 2024
- #29
Links in this format: *[Neubau](https://www.some-link.com)* have an issue.
Code:
text = "*[Neubau](https://www.some-link.com)*"
sentences = somajo.tokenize_text([text])
for sentence in sentences:
...
PhilipMay
- 1
- Opened on Feb 13, 2024
- #28
Links in this format: MD link https://heise.de example. have an issue.
Code:
text = "MD link https://heise.de example."
sentences = somajo.tokenize_text([text])
for sentence in sentences:
    for token ...
PhilipMay
- 3
- Opened on Feb 6, 2024
- #27
Hi. I have this text: This is a Markdown link: [https://one_link.com](https://other_link.com).
And split it with SoMaJo:
from somajo import SoMaJo
tokenizer = SoMaJo("de_CMC")
paragraphs = ["This is ...
PhilipMay
- 6
- Opened on Sep 18, 2023
- #26
- full stops at the end of dates are not correctly split from the date
- steps to recreate the issue using somajo-2.2.4:
from somajo import SoMaJo
tokenizer = SoMaJo("de_CMC", split_camel_case=True) ...
ausgerechnet
- 1
- Opened on Aug 14, 2023
- #25
From user perspective it would be nice to know all possible values for token_class.
Here the doc only mentions some:
https://github.com/tsproisl/SoMaJo/blob/87e9cb00ff4f122e714643f55d9dc9b2bb5ab723/somajo/token.py#L20 ...
PhilipMay
- 6
- Opened on Jul 5, 2023
- #24
Hi, looks like you created a great package! I have built a tool that utilizes SoMaJo, amongst others, and would like to
make it available on conda-forge in addition to PyPI. Since SoMaJo is not yet available ...
iulusoy
- 2
- Opened on Nov 16, 2022
- #22
