The Online Support Conversations Dataset contains Hebrew and Arabic text-based emotional support chats that have undergone an extensive cleaning and anonymization process.
These conversations are part of a larger collection recorded between 2019 and 2023 by Sahar (Support and Listening Online), a nonprofit organization that provides online emotional assistance.
Access to the dataset is restricted.
Data sharing will be considered only for approved research purposes or projects that advance social well-being, mental health support, and natural language processing (NLP) in Hebrew and Arabic.
Due to the highly sensitive nature of the content, researchers and developers must adhere to strict ethical and privacy guidelines.
The data access review process ensures:
- Responsible and secure use of the dataset
- Prevention of biased or misleading data representations
- Protection from any potential exposure of sensitive or identifiable information
The dataset includes approximately 300 anonymized conversations in Hebrew and Arabic.
Each conversation entry includes:
- Timestamps and duration
- Speaker age and gender
- Primary emotional difficulty discussed (as labeled by human annotators)
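
As a rough illustration of how one entry with the fields listed above might be represented in code, here is a minimal Python sketch. The class name, field names, types, and language codes are all hypothetical and are not the dataset's actual schema. The values in the example are placeholders, not real data. Duration is derived here from two timestamps, which is an assumption; the dataset may store it directly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ConversationRecord:
    """Hypothetical container for one conversation's metadata (not the official schema)."""

    started_at: datetime            # assumed: timestamp of the first message
    ended_at: datetime              # assumed: timestamp of the last message
    speaker_age: Optional[int]      # None when the age is not recorded
    speaker_gender: Optional[str]   # None when the gender is not recorded
    primary_difficulty: str         # label assigned by human annotators
    language: str                   # assumed codes: "he" (Hebrew) or "ar" (Arabic)

    @property
    def duration(self) -> timedelta:
        # Derived from the two timestamps; an assumption made for this sketch.
        return self.ended_at - self.started_at


# Placeholder values only, for demonstration.
example = ConversationRecord(
    started_at=datetime(2021, 1, 1, 20, 0),
    ended_at=datetime(2021, 1, 1, 20, 45),
    speaker_age=None,
    speaker_gender=None,
    primary_difficulty="placeholder-label",
    language="he",
)
print(example.duration)
```

Keeping age and gender optional lets a record stay valid when those attributes are missing for a given conversation.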
To initiate the data access request process, please complete the official Data Access Request Form.
Hebrew NLP development in this project was supported by tools from the ONLP LAB, including the AlephBERT model.
Automated anonymization was performed using the HebSafeHarbor – Clalit Validation Project.
For any questions or collaboration inquiries regarding this project, please contact:
Avi Segal — avisegal@gmail.com
