Finnish version of the databricks-dolly-15k instruction dataset, machine translated from the original English using DeepL.
The data is found in the file dolly-15k-fi.jsonl. The format and uses of this data match those of the original English dataset. For more information, please see https://github.com/databrickslabs/dolly/tree/master/data.
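Since the file is JSONL (one JSON object per line), it can be read with the Python standard library alone. The following is a minimal sketch, not part of the original tooling: the helper names load_jsonl and field_counts are hypothetical, and no field names are assumed, so the script simply reports whichever keys it finds in the records.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, one per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def field_counts(records):
    """Count how often each top-level key occurs across all records."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Path taken from this README; nothing happens if the file is absent.
    path = Path("dolly-15k-fi.jsonl")
    if path.exists():
        records = load_jsonl(path)
        print(len(records), "records")
        print(dict(field_counts(records)))
```

Listing the keys first is a convenient way to see which fields the file actually contains before relying on any of them.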
The data was processed from the original as follows:
- Convert original data from JSONL to DOCX files:
    python3 jsonl2doc.py original-data/databricks-dolly-15k.jsonl
- Translate DOCX files from dolly-doc-in/ using DeepL and save outputs in dolly-doc-out/.
- Convert back to JSONL:
    python3 doc2jsonl.py \
        --add-id \
        --include-original \
        original-data/databricks-dolly-15k.jsonl \
        dolly-doc-out/dolly-000*.docx \
        > dolly-15k-fi.jsonl
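After the round trip through DOCX, a quick sanity check is to confirm that the translated file has as many records as the original. The sketch below is an illustration and not part of the processing scripts: count_records and counts_match are hypothetical helper names, and the check assumes that each original record yields exactly one translated record.

```python
import json
from pathlib import Path


def count_records(path):
    """Count the non-empty lines in a JSONL file, checking each parses as JSON."""
    n = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed line
                n += 1
    return n


def counts_match(original_path, translated_path):
    """Return True if both JSONL files contain the same number of records."""
    return count_records(original_path) == count_records(translated_path)


if __name__ == "__main__":
    # Paths taken from the commands above; skipped if either file is missing.
    original = Path("original-data/databricks-dolly-15k.jsonl")
    translated = Path("dolly-15k-fi.jsonl")
    if original.exists() and translated.exists():
        print("record counts match:", counts_match(original, translated))
```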
This dataset is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA).
Note that under the DeepL terms and conditions, this data may not be used to develop, market or train a machine translation algorithm.