We are an open-source, community-driven initiative working to document, preserve, and digitally represent Himachali dialects by building open translation datasets between local dialects and Hindi.
Many Himachali dialects are actively spoken but remain largely absent from modern digital tools and research. Our goal is to change that by creating freely available language resources that can be used for education, research, and future language technologies.

Our goals:

- Build open, high-quality parallel datasets for Himachali dialects
- Preserve linguistic and cultural knowledge through community contribution
- Enable research and development in low-resource language NLP
- Ensure Himachali languages are represented in the digital ecosystem

How it works:

- Native speakers contribute sentences via simple submission forms
- Data is cleaned, structured, and released openly
- Technical contributors help with validation, modeling, and tools
- All work is transparent and community-driven

Get involved:

- Contribute sentences: [Google Form link]
- Join discussions: https://discord.com/invite/PgJWcFXRTB
- Technical contributions: See individual repositories
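
For technical contributors, the "cleaned, structured, and released" step could look roughly like the sketch below. It is an illustration only, not the project's actual pipeline: the field names `dialect`, `sentence`, and `hindi`, the helper names, and the JSON Lines output format are all assumptions. It normalizes Unicode and whitespace, drops incomplete rows, removes exact duplicates, and serializes each sentence pair as one JSON object per line.

```python
import json
import unicodedata

# Hypothetical column names for a form export; the real form may differ.
FIELDS = ("dialect", "sentence", "hindi")


def normalize(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFC", text or "").split())


def clean_submissions(rows):
    """Return cleaned, de-duplicated sentence pairs from raw form rows.

    Rows with any empty field are dropped. Duplicates are detected after
    normalization, so copies that differ only in spacing count as one.
    """
    seen = set()
    cleaned = []
    for row in rows:
        record = {field: normalize(row.get(field)) for field in FIELDS}
        if not all(record.values()):
            continue  # incomplete submission
        key = tuple(record[field] for field in FIELDS)
        if key in seen:
            continue  # exact duplicate of an earlier submission
        seen.add(key)
        cleaned.append(record)
    return cleaned


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping Devanagari text readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Placeholder rows; real submissions would hold actual sentences.
    raw = [
        {"dialect": "Kangri", "sentence": "  placeholder   sentence ", "hindi": "placeholder translation"},
        {"dialect": "Kangri", "sentence": "placeholder sentence", "hindi": "placeholder translation"},
        {"dialect": "Kangri", "sentence": "missing translation", "hindi": ""},
    ]
    print(to_jsonl(clean_submissions(raw)))
```

NFC normalization is included because visually identical Devanagari strings can be encoded with different code point sequences, which would otherwise defeat duplicate detection.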
All datasets and resources are released under the CC BY-SA 4.0 license unless stated otherwise.