Issue search results · repo:tsproisl/SoMaJo language:Python

29 results

Is there any way to just split the text into sentences, like the sent_tokenize function (from nltk.tokenize import sent_tokenize)?
  • sambaPython24
  • 3
  • Opened on May 12
  • #33
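One way to get plain sentence strings, similar to what sent_tokenize returns, is to rebuild each sentence from the tokenizer's output. The following is a minimal sketch, assuming each token exposes text and space_after attributes as SoMaJo's Token objects do. The helper sentences_as_strings and the FakeToken stand-in are hypothetical names used only for illustration:

```python
from typing import Any, Iterable, List, NamedTuple


def sentences_as_strings(sentences: Iterable[Iterable[Any]]) -> List[str]:
    """Rebuild one plain string per sentence from tokenized output.

    Assumes every token has .text and .space_after (hypothetical helper).
    """
    result = []
    for sentence in sentences:
        parts = []
        for token in sentence:
            parts.append(token.text)
            # Re-insert a single space only where the token was followed by whitespace.
            if token.space_after:
                parts.append(" ")
        result.append("".join(parts).strip())
    return result


# Hypothetical stand-in for SoMaJo's Token so the sketch runs without the package.
class FakeToken(NamedTuple):
    text: str
    space_after: bool = True


if __name__ == "__main__":
    demo = [
        [FakeToken("Hello", False), FakeToken(","), FakeToken("world", False), FakeToken(".")],
        [FakeToken("Bye", False), FakeToken("!")],
    ]
    print(sentences_as_strings(demo))
```

With SoMaJo installed, the same helper could be applied to SoMaJo("en_PTB").tokenize_text(paragraphs), where paragraphs is a list of strings. Runs of whitespace collapse to single spaces, so the rebuilt sentences are not guaranteed to match the input character for character.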

Hi, I noticed that the English tokenizer splits the term "3G" into "3" and "G", while it should not. Also, I wonder whether "time-varying" and "location-dependent" should have been split at the hyphens, as it ...
  • g3rfx
  • 2
  • Opened on Aug 4, 2024
  • #32

Hi Thomas, I am hoping to use SoMaJo's sentence_splitter in Rust, and I am wondering whether it would be possible to formulate it in terms of SRX rules. I would happily contribute to making that happen, but ...
  • fewzee
  • 1
  • Opened on May 29, 2024
  • #31
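SRX (Segmentation Rules eXchange) describes segmentation as an ordered list of break and no-break rules, each consisting of one regular expression for the text before a candidate position and one for the text after it; the first rule that matches at a position decides. The following is a minimal Python sketch of that evaluation order. The Rule class, the segment function, and the two example rules are hypothetical and are not SoMaJo's actual sentence-splitting logic:

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """One SRX-style rule: break or no-break between two regex contexts."""
    is_break: bool
    before: str  # must match the text ending at the candidate position
    after: str   # must match the text starting at the candidate position


def segment(text: str, rules: List[Rule]) -> List[str]:
    """Split text wherever the first matching rule says 'break'.

    Positions matched by no rule are treated as non-breaks.
    """
    compiled = [
        (r.is_break, re.compile("(?:" + r.before + r")\Z"), re.compile(r.after))
        for r in rules
    ]
    pieces, start = [], 0
    for pos in range(1, len(text)):
        head, tail = text[:pos], text[pos:]
        for is_break, before, after in compiled:
            if before.search(head) and after.match(tail):
                if is_break:
                    pieces.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides; later rules are skipped
    pieces.append(text[start:].strip())
    return [p for p in pieces if p]


# Hypothetical example rules: the exception comes first so it wins over the break rule.
RULES = [
    Rule(False, r"\b(?:Dr|Mr|Mrs|e\.g)\.", r"\s"),  # no break after these abbreviations
    Rule(True, r"[.!?]", r"\s"),                    # break after final punctuation + space
]

if __name__ == "__main__":
    print(segment("Dr. Smith arrived. He sat down! Was it late?", RULES))
```

Ordering the exceptions before the general break rule is what lets a small rule set handle abbreviations; porting a splitter to SRX would mostly mean translating its exceptions into such no-break rules.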

Same as #28, but with: [Konfiguration](https://a-link.com)\\
  • PhilipMay
  • 2
  • Opened on Feb 13, 2024
  • #29

Links in this format: *[Neubau](https://www.some-link.com)* have an issue. Code: text = "*[Neubau](https://www.some-link.com)*" sentences = somajo.tokenize_text([text]) for sentence in sentences: ...
  • PhilipMay
  • 1
  • Opened on Feb 13, 2024
  • #28
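Until Markdown links like these are handled by the tokenizer, one possible workaround is to rewrite inline links before tokenizing, keeping only the link text and collecting the URLs separately. The following is a minimal sketch, assuming simple [text](url) links with no nested brackets in the text and no whitespace or parentheses in the URL. The function strip_markdown_links is hypothetical and not part of SoMaJo, and whether the tokenizer then handles the cleaned text as desired has not been verified here:

```python
import re
from typing import List, Tuple

# [link text](url): no brackets inside the text, no whitespace or parentheses in the URL.
MD_LINK = re.compile(r"\[([^\[\]]*)\]\(([^()\s]+)\)")


def strip_markdown_links(text: str) -> Tuple[str, List[str]]:
    """Replace each inline Markdown link with its link text.

    Returns the cleaned text and the list of removed URLs, in order.
    """
    urls = [m.group(2) for m in MD_LINK.finditer(text)]
    return MD_LINK.sub(r"\1", text), urls


if __name__ == "__main__":
    print(strip_markdown_links("*[Neubau](https://www.some-link.com)*"))
```

Dropping the URL from the text loses information, which is why the URLs are returned alongside the cleaned string; a caller could re-attach them to the tokens afterwards if needed.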

Links in this format: MD link https://heise.de example. have an issue. Code: text = "MD link https://heise.de example." sentences = somajo.tokenize_text([text]) for sentence in sentences: for token ...
  • PhilipMay
  • 3
  • Opened on Feb 6, 2024
  • #27

Hi. I have this text: "This is a Markdown link: [https://one_link.com](https://other_link.com)." I split it with SoMaJo: from somajo import SoMaJo tokenizer = SoMaJo("de_CMC") paragraphs = ["This is ...
  • PhilipMay
  • 6
  • Opened on Sep 18, 2023
  • #26

Full stops at the end of dates are not correctly split from the date. Steps to recreate the issue using somajo-2.2.4: from somajo import SoMaJo tokenizer = SoMaJo("de_CMC", split_camel_case=True) ...
  • ausgerechnet
  • 1
  • Opened on Aug 14, 2023
  • #25

From a user's perspective, it would be nice to know all possible values for token_class. The documentation here only mentions some: https://github.com/tsproisl/SoMaJo/blob/87e9cb00ff4f122e714643f55d9dc9b2bb5ab723/somajo/token.py#L20 ...
  • PhilipMay
  • 6
  • Opened on Jul 5, 2023
  • #24
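Until the documentation lists every value, the token classes that a given corpus actually produces can be collected from the tokenizer output. The following is a minimal sketch, assuming each token exposes a token_class attribute as in the linked token.py. The helper token_class_counts, the FakeToken stand-in, and the class names in the demo are hypothetical placeholders and not an authoritative list:

```python
from collections import Counter
from typing import Any, Iterable, NamedTuple


def token_class_counts(sentences: Iterable[Iterable[Any]]) -> Counter:
    """Count how often each token_class value occurs in tokenized output."""
    return Counter(
        token.token_class for sentence in sentences for token in sentence
    )


# Hypothetical stand-in for SoMaJo's Token so the sketch runs without the package.
class FakeToken(NamedTuple):
    text: str
    token_class: str


if __name__ == "__main__":
    demo = [[FakeToken("See", "regular"),
             FakeToken("https://example.com", "URL"),
             FakeToken("!", "symbol")]]
    for token_class, count in token_class_counts(demo).most_common():
        print(token_class, count)
```

With SoMaJo installed, the input would be tokenizer.tokenize_text(paragraphs) for a tokenizer such as SoMaJo("de_CMC"). An empirical count only reveals the classes that occur in the sample, so it complements rather than replaces a documented list.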

Hi, it looks like you created a great package! I have built a tool that uses SoMaJo, among others, and would like to make it available on conda-forge in addition to PyPI. Since SoMaJo is not yet available ...
  • iulusoy
  • 2
  • Opened on Nov 16, 2022
  • #22