
LaMini-instruction #548

Open
AkihikoWatanabe opened this issue Apr 26, 2023 · 2 comments

Comments

@AkihikoWatanabe
Owner

https://huggingface.co/datasets/MBZUAI/LaMini-instruction

@AkihikoWatanabe
Owner Author

AkihikoWatanabe commented Oct 22, 2023

We distill the knowledge from large language models by performing sentence/offline distillation (Kim and Rush, 2016). We generate a total of 2.58M pairs of instructions and responses using gpt-3.5-turbo based on several existing resources of prompts, including self-instruct (Wang et al., 2022), P3 (Sanh et al., 2022), FLAN (Longpre et al., 2023) and Alpaca (Taori et al., 2023). For more information about the process for generating our instruction dataset, please refer to our paper.

Translation (by gpt-3.5-turbo)

  • We perform sentence/offline distillation (Kim and Rush, 2016) to extract knowledge from large language models. Using gpt-3.5-turbo, we generate a total of 2.58M instruction-response pairs based on several existing prompt resources, including self-instruct (Wang et al., 2022), P3 (Sanh et al., 2022), FLAN (Longpre et al., 2023), and Alpaca (Taori et al., 2023). For details of the generation process for our instruction dataset, please refer to our paper.

Summary (by gpt-3.5-turbo)

  • We perform sentence/offline distillation to extract knowledge from large language models. Specifically, we generate a total of 2.58M instruction-response pairs based on several existing prompt resources. See the paper for details.

@AkihikoWatanabe
Owner Author

AkihikoWatanabe commented Oct 22, 2023

A dataset in which instructions from existing instruction datasets are used as seeds, and gpt-3.5-turbo generates new instructions and responses from them.
[image]
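The seed-based generation step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual pipeline: the prompt wording, the `build_generation_prompt` helper, and the example seed instructions are all invented for this sketch.

```python
# Sketch: build a few-shot prompt that shows seed instructions from an
# existing instruction dataset and asks a model (e.g. gpt-3.5-turbo) to
# write new instructions in a similar style. The prompt text here is a
# hypothetical stand-in for whatever template the authors actually used.

def build_generation_prompt(seed_instructions: list[str], n_new: int = 1) -> str:
    """Assemble a prompt listing seed instructions as examples and
    requesting n_new new, diverse instructions of the same kind."""
    examples = "\n".join(f"- {s}" for s in seed_instructions)
    return (
        "Here are some example instructions:\n"
        f"{examples}\n\n"
        f"Write {n_new} new, diverse instruction(s) in a similar style."
    )

prompt = build_generation_prompt(
    ["Explain photosynthesis in simple terms.",
     "Translate 'good morning' into French."]
)
print(prompt)
```

In the actual pipeline, a prompt like this would be sent to gpt-3.5-turbo, and the model would then also be asked to answer each newly generated instruction, yielding the instruction-response pairs that make up the dataset.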
