Code for the paper "Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes".
- Add support for GCS
- Add command line invocation with arguments
- Add support for hosting distilled models using Docker
- Add support for hosting models as Vertex AI endpoints
- Add support for hosting models as TF Serving endpoints
- Add Kedro pipeline for distillation
- Add support for Vertex AI pipelines
Work in progress.
This is a fork of the distilling-step-by-step repository with the aim of creating a task-specific LLM distillation framework for healthcare. The data should be in the following format (this may change):
```json
{
  "input": "The input here",
  "label": "The output here",
  "rationale": "The rationale generated by chain of thought"
}
```
The rationale may be generated using MEDprompt's Self Generated COT Chain.
Place the train and test splits at:
- datasets/generic/generic_test.json
- datasets/generic/generic_train.json
- WIP: GCS support
- You can use a "teacher LLM" to generate labels and rationales; a minimal sketch follows below.
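For the generic dataset, one way to produce records in the format above is to ask a teacher LLM for a chain-of-thought rationale and then for the final label. The following is only a sketch: the `ask_teacher` helper, the sample inputs, and the JSON Lines layout are assumptions for illustration, not part of this repository.

```python
import json
from pathlib import Path


def ask_teacher(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to your teacher LLM and return its reply.
    Replace this stub with a real client call (e.g. a PaLM / Vertex AI wrapper)."""
    return "stubbed teacher response"


def build_record(text: str) -> dict:
    # Self-generated chain of thought: first elicit the reasoning, then the label.
    rationale = ask_teacher(f"{text}\nLet's think step by step.")
    label = ask_teacher(f"{text}\nReasoning: {rationale}\nTherefore, the answer is:")
    return {"input": text, "label": label.strip(), "rationale": rationale.strip()}


def write_split(texts: list[str], path: Path) -> None:
    # One JSON object per line; switch to a single JSON array if your loader expects one.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        for text in texts:
            f.write(json.dumps(build_record(text)) + "\n")


write_split(["The input here"], Path("datasets/generic/generic_train.json"))
write_split(["Another input here"], Path("datasets/generic/generic_test.json"))
```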
```bash
git clone https://github.com/dermatologist/distilling-step-by-step.git
cd distilling-step-by-step
pip install -e .
```

This installs the `distillm` command-line entry point.
- Distilling step-by-step with `PaLM label` and `PaLM rationale`:
```bash
distillm --from_pretrained google/t5-v1_1-small \
  --alpha 0.5 \
  --batch_size 4 \
  --max_steps 100 \
  --eval_steps 2 \
  --no_log \
  --output_dir output
```
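After a run like the one above finishes, the distilled model can be loaded for inference with the standard Hugging Face API. This is a minimal sketch, assuming the trainer leaves a regular seq2seq checkpoint under `output`; the exact checkpoint subdirectory, and whether your inputs need a task prefix such as `predict:`, depend on the chosen `--model_type` and save settings.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

ckpt = "output"  # adjust if the run writes checkpoints into a subdirectory
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

# With --model_type task_prefix, the paper's setup trains on prefixed inputs
# ("predict: ..." for the label, "explain: ..." for the rationale). This is an
# assumption here; verify against your training configuration.
inputs = tokenizer("predict: The input here", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```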
Supported arguments:

- `--from_pretrained`: `google/t5-v1_1-small`, `google/t5-v1_1-base`, `google/t5-v1_1-large`, `google/t5-v1_1-xxl`
- `--dataset`: `esnli`, `anli1`, `cqa`, `svamp`, `generic`
- `--label_type`:
  - `--label_type gt`: Use the ground-truth label for training
  - `--label_type llm`: Use the LLM-predicted label for training
  - `--label_type generic`: Use the provided label for training
- `--alpha`: Task weight for multi-task training: `Loss = alpha * label_prediction_loss + (1 - alpha) * rationale_generation_loss`
  - `--alpha 0.5`: recommended (see the sketch after this list)
- `--batch_size`: Batch size
- `--grad_steps`: Gradient accumulation steps
- `--max_input_length`: Maximum input length
- `--eval_steps`: How often (in steps) to evaluate the model during training
- `--max_steps`: Maximum number of training steps
- `--run`: Random seed to use
- `--model_type`:
  - `standard`: Standard finetuning (`--label_type gt`) or distillation (`--label_type llm`)
  - `task_prefix`: Distilling step-by-step
- `--parallelize`: Model parallelism
- `--output_dir`: The directory for saving the distilled model
- `--gcs_project`: The GCP project name
- `--gcs_path`: The GCS path; `_train.json` and `_test.json` will be appended to it
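To make the `--alpha` weighting concrete, the sketch below spells out the multi-task objective from the list above. It is illustrative only; the actual trainer computes both terms as cross-entropy losses over the two task-specific passes.

```python
def multitask_loss(label_loss: float, rationale_loss: float, alpha: float = 0.5) -> float:
    """Loss = alpha * label_prediction_loss + (1 - alpha) * rationale_generation_loss."""
    return alpha * label_loss + (1.0 - alpha) * rationale_loss


# With --alpha 0.5 both tasks contribute equally:
print(multitask_loss(label_loss=2.0, rationale_loss=1.0, alpha=0.5))  # 1.5
# With --alpha 1.0 the rationale term is ignored (plain label finetuning):
print(multitask_loss(label_loss=2.0, rationale_loss=1.0, alpha=1.0))  # 2.0
```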
If you find this repository useful, please consider citing:
```bibtex
@article{hsieh2023distilling,
  title={Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes},
  author={Hsieh, Cheng-Yu and Li, Chun-Liang and Yeh, Chih-Kuan and Nakhost, Hootan and Fujii, Yasuhisa and Ratner, Alexander and Krishna, Ranjay and Lee, Chen-Yu and Pfister, Tomas},
  journal={arXiv preprint arXiv:2305.02301},
  year={2023}
}
```
- Contact Bell Eapen for information related to this fork.