The Vision and goals of the Open Natural Language Processing in Hebrew Project

The Open Natural Language Processing in Hebrew Initiative

The Open Natural Language Processing in Hebrew (NLPH) initiative is a joint effort by members of DataHack and The Public Knowledge Workshop to promote open tools and resources for Natural Language Processing in Hebrew.


Our vision is to bring Natural Language Processing capabilities in Hebrew to a level on par with international industry standards, to keep up with state-of-the-art techniques by providing open source implementations of new algorithms and tools, and to make these capabilities freely available for both public and commercial use.


Our goals are:

  1. Creating, maintaining, adapting and spreading resources that enable high-quality, production-ready, open-licensed Natural Language Processing in Hebrew.
  2. Enabling, fostering and catalyzing cooperation between stakeholders in academia, the private sector and the public sector, in order to promote better open source Hebrew NLP solutions and to share existing knowledge and tools.

Who's taking part?

Active projects

What's our current focus?

  • Forming a group of volunteers to start work on the core projects, both during the developer meetings of the Public Knowledge Workshop and in other settings, such as hackathons and educational and research projects.
  • Encouraging the release of high-quality, tagged and labelled datasets under open licenses, covering various domains (social media, articles, research papers, etc.) and various tasks (part-of-speech tagging, text classification, sentiment analysis, named entity recognition, etc.), and helping to author such datasets where they are missing.
  • Adapting and integrating existing Hebrew NLP Python tools with existing popular frameworks:
  • Creating those tools when they are missing, focusing on:
    • Tokenization, and specifically stemming and lemmatization
    • A word embeddings model for Hebrew
    • A part-of-speech tagger
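
To make the tokenization item concrete, here is a minimal sketch of a naive baseline. It is not one of the initiative's tools: the function name `tokenize_hebrew` and the choice of characters allowed inside a word are assumptions made for illustration. It only splits text into runs of Hebrew letters, keeping acronyms and abbreviations written with gershayim or geresh together. It does not handle pointed text (niqqud), and it does no stemming or lemmatization, which in Hebrew requires morphological analysis of attached prefixes and suffixes.

```python
import re

# Hebrew letters (alef through tav, including final forms) are U+05D0..U+05EA.
HEBREW_LETTERS = "\u05D0-\u05EA"

# Characters assumed to be allowed inside a word: ASCII double and single quotes,
# plus Hebrew geresh (U+05F3) and gershayim (U+05F4), as used in acronyms.
INNER_MARKS = "\"'\u05F3\u05F4"

# One or more letters, optionally followed by (inner mark + letters) groups.
HEBREW_TOKEN = re.compile(
    f"[{HEBREW_LETTERS}]+(?:[{INNER_MARKS}][{HEBREW_LETTERS}]+)*"
)


def tokenize_hebrew(text):
    """Return the Hebrew word tokens in text, dropping everything else."""
    return HEBREW_TOKEN.findall(text)
```

For example, `tokenize_hebrew("שלום עולם, hello!")` yields the two Hebrew words and discards the punctuation and the Latin-script word. A production tokenizer would also need to decide how to treat numbers, mixed-script text and punctuation, which this sketch simply ignores.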

How can I help?

  • Help expand our list of resources for NLP in Hebrew!
  • Join our mailing list for updates and for opportunities to contribute!
  • Need something more specific? Email us at
  • Join the discussion in our Facebook group.
  • If you are associated with an organization that already has good, working solutions for some of the problems we are interested in, and you'd like to consider sharing those solutions (or a subset of them) under a suitable open license, we'd love to hear from you!