Alpaca-style instruction-following dataset for Estonian

TartuNLP/alpaca-est


Alpaca-est

Alpaca-est is an instruction dataset generated for Estonian with gpt-3.5-turbo-0613, following Alpaca (https://github.com/tatsu-lab/stanford_alpaca/tree/main).

alpaca_est.json contains 52,006 instruction-following examples. The seed tasks and the prompt used to generate the dataset are in seed_tasks_est.jsonl and prompt_est.txt, respectively.
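As a quick way to inspect the data, here is a minimal loading sketch. It assumes alpaca_est.json uses the same layout as the original Alpaca release (a top-level JSON list of records with `instruction`, `input`, and `output` keys); this README does not state the schema, so check the file before relying on it. The helper names `load_examples` and `format_example` are hypothetical and are not part of this repository.

```python
import json
from pathlib import Path


def load_examples(path):
    """Load a JSON file that holds a top-level list of example records."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of examples")
    return data


def format_example(example):
    """Join the instruction and the optional input into one prompt string.

    Assumes Alpaca-style keys ("instruction", "input"); missing or null
    values are treated as empty strings.
    """
    instruction = (example.get("instruction") or "").strip()
    context = (example.get("input") or "").strip()
    if context:
        return f"{instruction}\n\n{context}"
    return instruction


if __name__ == "__main__":
    path = Path("alpaca_est.json")  # assumed location of the dataset file
    if path.exists():
        examples = load_examples(path)
        print(len(examples))
        print(format_example(examples[0]))
```

Records with an empty `input` yield the instruction alone, so the same helper covers both kinds of example.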

Example of generating the dataset with generate_instruction.py:

python -u generate_instruction.py \
  --output-dir data \
  --seed-tasks-path data/seed_tasks_est.jsonl \
  --prompt-path data/prompt_est.txt \
  --num-instructions-to-generate 52000 \
  --num-prompt-instructions 3 --num-cpus 16 --num-parallel-requests 8
