The Open Natural Language Processing in Hebrew Initiative
The Open Natural Language Processing in Hebrew (NLPH) initiative is a joint effort by members of DataHack and The Public Knowledge Workshop to promote open tools and resources for Natural Language Processing in Hebrew.
Our vision is to bring Natural Language Processing capabilities in Hebrew to a level on par with international industry standards, to keep up with state-of-the-art techniques by providing open source implementations of new algorithms and tools, and to make these capabilities publicly available for both public and commercial use.
What are our goals?
- Creating, maintaining, adapting and spreading resources that enable high-quality, production-ready, open-licensed Natural Language Processing in Hebrew.
- Enabling, fostering and catalyzing cooperation between stakeholders in academia, the private sector and the public sector, in order to promote better open source Hebrew NLP solutions and to share existing knowledge and tools.
Who's taking part?
- The Public Knowledge Workshop
- Dr. Reut Tsarfaty's Natural Language Processing Lab at the Open University of Israel
- Dr. Yoav Goldberg's lab at Bar-Ilan University
- Your company/organization/lab/faculty, we hope!
What are our projects?
- Hebrew NLP Resources List - Maintaining a list of relevant resources: https://github.com/NLPH/NLPH_Resources
- Common Voice: Hebrew - A Hebrew version of Mozilla's Common Voice apps.
What's our current focus?
- Forming a group of volunteers to start work on the core projects during the developer meetings of the Public Knowledge Workshop and in other frameworks - including events like hackathons and as part of educational and research projects.
- Encouraging the release of high-quality, open-licensed, tagged and labelled datasets from various domains (social media, articles, research papers, etc.) and for various tasks (part-of-speech tagging, text classification, sentiment analysis, named entity recognition, etc.), and helping to author such datasets where they are missing.
- Adapting existing Hebrew NLP Python tools and integrating them with popular existing frameworks.
- Creating those tools when they are missing, focusing on:
- Tokenization. Specifically stemming and lemmatization.
- A word embeddings model for Hebrew
- Part-of-speech tagger
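To make the first item on this list concrete, here is a minimal sketch of surface-level tokenization for Hebrew text in Python. It is not one of the initiative's tools. The `tokenize` function and the `TOKEN_RE` pattern are hypothetical names chosen for illustration, and the rules are deliberately naive assumptions. In particular, it does not split off attached prefixes (such as ו, ה, ב, ל) and performs no stemming or lemmatization, which are the harder problems the list above refers to.

```python
import re

# Hypothetical, illustrative pattern (not an NLPH tool).
# Hebrew letters: U+05D0..U+05EA. Niqqud and cantillation marks:
# U+0591..U+05BD, U+05BF, U+05C1..U+05C2, U+05C4..U+05C5, U+05C7.
HEBREW = r"\u05D0-\u05EA\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7"

TOKEN_RE = re.compile(
    # A Hebrew word, optionally containing an internal geresh/gershayim
    # (U+05F3/U+05F4) or their ASCII stand-ins, as in acronyms.
    rf"[{HEBREW}]+(?:[\"'\u05F3\u05F4][{HEBREW}]+)*"
    r"|[A-Za-z]+"           # a run of Latin letters
    r"|\d+(?:[.,]\d+)*"     # a number, possibly with separators
    r"|[^\s\w]"             # any single punctuation or symbol character
)


def tokenize(text):
    """Split text into surface tokens; whitespace is discarded."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    for token in tokenize('צה"ל הודיע על 3.5 אחוזים!'):
        print(token)
```

A production-quality tokenizer for Hebrew would also need morphological segmentation, since a single written word often bundles several syntactic units. That is one reason this area is listed as a focus.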
How can I help?
- Help expand our list of resources for NLP in Hebrew!
- Join our mailing list for updates and for opportunities to contribute!
- Need something more specific? Email us at firstname.lastname@example.org.
- Join the discussion in our Facebook group.
- If you are associated with an organization that already has good, working solutions for some of the problems we are interested in, and you'd like to consider sharing those solutions (or a subset thereof) under a suitable open license, we'd love to hear from you!