[AIR] LightningTrainer Dolly V2 FSDP Fine-tuning Example (ray-project#34990)

Signed-off-by: woshiyyya <xiaoyunxuan1998@gmail.com>
woshiyyya authored and architkulkarni committed May 16, 2023
1 parent 3d7b2ff commit 8492f80
Showing 14 changed files with 1,130 additions and 1 deletion.
1 change: 1 addition & 0 deletions doc/source/_toc.yml
@@ -82,6 +82,7 @@ parts:
- file: ray-air/examples/gptj_batch_prediction
- file: ray-air/examples/gptj_serving
- file: ray-air/examples/dreambooth_finetuning
- file: ray-air/examples/dolly_lightning_fsdp_finetuning
- file: ray-air/api/api
- file: ray-air/benchmarks

1 change: 1 addition & 0 deletions doc/source/ray-air/examples/BUILD
@@ -52,6 +52,7 @@ py_test_run_all_notebooks(
"stablediffusion_batch_prediction.ipynb", # Requires GPUs
"gptj_deepspeed_fine_tuning.ipynb", # Requires release test
"opt_deepspeed_batch_inference.ipynb", # Requires release test
"dolly_lightning_fsdp_finetuning.ipynb", # Requires release test
],
data = ["//doc/source/ray-air/examples:air_examples"],
tags = ["exclusive", "team:ml", "ray_air"],
1,043 changes: 1,043 additions & 0 deletions doc/source/ray-air/examples/dolly_lightning_fsdp_finetuning.ipynb

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions doc/source/ray-air/examples/gptj_deepspeed_fine_tuning.ipynb
@@ -5,6 +5,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"(gptj_deepspeed_finetune)=\n",
"\n",
"# GPT-J-6B Fine-Tuning with Ray AIR and DeepSpeed\n",
"\n",
"In this example, we will showcase how to use the Ray AIR for **GPT-J fine-tuning**. GPT-J is a GPT-2-like causal language model trained on the Pile dataset. This particular model has 6 billion parameters. For more information on GPT-J, click [here](https://huggingface.co/docs/transformers/model_doc/gptj).\n",
1 change: 1 addition & 0 deletions doc/source/ray-air/examples/index.rst
@@ -30,6 +30,7 @@ Text/NLP
- :doc:`/ray-air/examples/gptj_serving`: How to use Ray AIR to do online serving with the Hugging Face Transformers GPT-J model.
- :doc:`/ray-air/examples/dreambooth_finetuning`: How to fine-tune a DreamBooth text-to-image model with your own images.
- :doc:`/ray-air/examples/opt_deepspeed_batch_inference`: How to run batch inference on a dataset of texts with a 30B OPT model.
- :doc:`/ray-air/examples/dolly_lightning_fsdp_finetuning`: How to fine-tune a dolly-v2-7b model with Ray AIR LightningTrainer and FSDP.

Image/CV
--------
8 changes: 8 additions & 0 deletions doc/source/train/examples.rst
@@ -83,6 +83,14 @@ Distributed Training Examples using Ray Train

Use LightningTrainer with Ray Data and Batch Predictor

.. grid-item-card::
:img-top: /images/pytorch_lightning_small.png
:class-img-top: pt-2 w-75 d-block mx-auto fixed-height-img

.. button-ref:: dolly_lightning_fsdp_finetuning

Fine-tune LLM with AIR LightningTrainer and FSDP


Ray Train Examples Using Loggers & Callbacks
--------------------------------------------
11 changes: 11 additions & 0 deletions doc/source/train/examples/lightning/lightning_cola_advanced.ipynb
@@ -1483,6 +1483,17 @@
"print(results.head(10))\n",
"print(matthews_corr)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's next?\n",
"\n",
"- {ref}`Fine-tune a Large Language Model with LightningTrainer and FSDP <dolly_lightning_fsdp_finetuning>`\n",
"- {ref}`Hyperparameter searching with LightningTrainer + Ray Tune. <tune-pytorch-lightning-ref>`"
]
}
],
"metadata": {
Original file line number Diff line number Diff line change
@@ -741,6 +741,7 @@
"## What's next?\n",
"\n",
"- {ref}`Use LightningTrainer with Ray Data and Batch Predictor <lightning_advanced_example>`\n",
"- {ref}`Fine-tune a Large Language Model with LightningTrainer and FSDP <dolly_lightning_fsdp_finetuning>`\n",
"- {ref}`Hyperparameter searching with LightningTrainer + Ray Tune. <tune-pytorch-lightning-ref>`"
]
}
3 changes: 2 additions & 1 deletion doc/source/tune/examples/tune-pytorch-lightning.ipynb
@@ -582,6 +582,7 @@
"\n",
"- {ref}`Use LightningTrainer for Image Classification <lightning_mnist_example>`.\n",
"- {ref}`Use LightningTrainer with Ray Data and Batch Predictor <lightning_advanced_example>`\n",
"- {ref}`Fine-tune a Large Language Model with LightningTrainer and FSDP <dolly_lightning_fsdp_finetuning>`\n",
"- {doc}`/tune/examples/includes/mlflow_ptl_example`: Example for using [MLflow](https://github.com/mlflow/mlflow/)\n",
" and [Pytorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) with Ray Tune.\n",
"- {doc}`/tune/examples/includes/mnist_ptl_mini`:\n",
@@ -607,7 +608,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.7.15"
+ "version": "3.8.16"
}
},
"nbformat": 4,
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
cloud_id: {{env["ANYSCALE_CLOUD_ID"]}}
region: us-west-2

head_node_type:
name: head_node
instance_type: g4dn.8xlarge

worker_node_types:
- name: worker_node
instance_type: g4dn.4xlarge
min_workers: 15
max_workers: 15
use_spot: false

aws:
TagSpecifications:
- ResourceType: "instance"
Tags:
- Key: ttl-hours
Value: '24'
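The compute config above pins one `g4dn.8xlarge` head node plus 15 `g4dn.4xlarge` workers. As a rough, hedged sanity check on why FSDP suits this cluster, the sketch below counts its GPUs and estimates per-GPU memory for fully sharded training state. Several inputs are assumptions rather than values from this commit: one 16 GB NVIDIA T4 per node (AWS's published g4dn specs), roughly 7 billion parameters for dolly-v2-7b, 16 bytes per parameter for mixed-precision Adam, and one training worker per GPU. Activations, buffers and framework overhead are ignored.

```python
# Hedged sketch: GPU inventory and a back-of-the-envelope FSDP memory estimate
# for the release-test cluster above. All specs and byte counts are assumptions.

# Assumed AWS specs: g4dn.4xlarge and g4dn.8xlarge each carry one 16 GB T4.
INSTANCE_GPUS = {"g4dn.4xlarge": 1, "g4dn.8xlarge": 1}
T4_MEMORY_GB = 16


def total_gpus(head_type: str, worker_type: str, num_workers: int) -> int:
    """Count GPUs across the head node and a homogeneous worker group."""
    return INSTANCE_GPUS[head_type] + num_workers * INSTANCE_GPUS[worker_type]


def sharded_state_gb(num_params: float, bytes_per_param: int, world_size: int) -> float:
    """Per-GPU memory (GB) when model + optimizer state is split evenly
    across world_size ranks, as full sharding does; activations ignored."""
    return num_params * bytes_per_param / world_size / 1e9


# Assumed: fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + two fp32 Adam moments (8) = 16 bytes per parameter.
BYTES_PER_PARAM = 16
NUM_PARAMS = 7e9  # assumed, roughly the size of dolly-v2-7b

gpus = total_gpus("g4dn.8xlarge", "g4dn.4xlarge", num_workers=15)
unsharded = sharded_state_gb(NUM_PARAMS, BYTES_PER_PARAM, world_size=1)
per_gpu = sharded_state_gb(NUM_PARAMS, BYTES_PER_PARAM, world_size=gpus)

print(f"GPUs: {gpus}")
print(f"Unsharded state: {unsharded:.0f} GB vs {T4_MEMORY_GB} GB per T4")
print(f"Sharded state per GPU: {per_gpu:.1f} GB")
```

Under these assumptions the unsharded state (about 112 GB) cannot fit on a single T4, while a 16-way shard (about 7 GB per GPU) leaves some headroom for activations, which is consistent with pairing FSDP with many small GPUs.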
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
base_image: {{ env["RAY_IMAGE_ML_NIGHTLY_GPU"] | default("anyscale/ray:nightly-py38-cu118") }}
env_vars: {}
debian_packages:
- curl

python:
pip_packages:
- "datasets"
- "evaluate"
- "scikit-learn"
- "boto3"
- myst-parser==0.15.2
- myst-nb==0.13.1
- jupytext==1.13.6
conda_packages: []

post_build_cmds:
- pip uninstall -y ray || true && pip3 install -U {{ env["RAY_WHEELS"] | default("ray") }}
- {{ env["RAY_WHEELS_SANITY_CHECK"] | default("echo No Ray wheels sanity check") }}
- pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
- pip3 install "pytorch_lightning>=2.0.0" "transformers>=4.28.0" "accelerate>=0.18.0"
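The post-build step above installs `pytorch_lightning>=2.0.0`, `transformers>=4.28.0` and `accelerate>=0.18.0`. A hypothetical pre-flight check like the one below (not part of this commit) could confirm those minimums inside the cluster image before the notebook runs. It relies only on the standard library (`importlib.metadata`, available from Python 3.8, matching the `py38` base image) and a deliberately simplified comparison of numeric release components; for full PEP 440 semantics, such as ranking `2.0.0rc1` below `2.0.0`, `packaging.version` would be the safer choice.

```python
# Hypothetical pre-flight check for the version pins in post_build_cmds.
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the pip3 install line above.
PINS = {
    "pytorch_lightning": "2.0.0",
    "transformers": "4.28.0",
    "accelerate": "0.18.0",
}


def release_tuple(ver: str) -> tuple:
    """Leading numeric components only: '4.28.1' -> (4, 28, 1).
    Simplification: suffixes such as 'rc1' or '+cu118' are dropped."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at the first component with a non-numeric suffix
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, padding the shorter tuple with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_pins(pins: dict) -> dict:
    """Map each failing distribution name to a short reason; empty dict = OK."""
    problems = {}
    for name, minimum in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    print(check_pins(PINS) or "all pins satisfied")
```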
17 changes: 17 additions & 0 deletions release/release_tests.yaml
@@ -827,6 +827,23 @@
cluster_compute: gptj_deepspeed_compute_gce.yaml


- name: air_example_dolly_v2_lightning_fsdp_finetuning
group: AIR examples
working_dir: air_examples/dolly_v2_lightning_fsdp_finetuning

python: "3.8"

frequency: weekly
team: ml
cluster:
cluster_env: dolly_v2_fsdp_env.yaml
cluster_compute: dolly_v2_fsdp_compute_aws.yaml

run:
timeout: 4700
script: python test_myst_doc.py --path lightning-llm-finetuning-7b.ipynb


- name: air_example_opt_deepspeed_batch_inference
group: AIR examples
working_dir: air_examples/opt_deepspeed_batch_inference