In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI: Distill a large language model

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/distillation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fgenerative_ai%2Fdistillation.ipynb">
      <img width="32px" src="https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/generative_ai/distillation.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/generative_ai/distillation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

## Overview

This tutorial demonstrates how to use the distilling step-by-step method on Vertex AI.

The distilling step-by-step (DSS) method ([paper](https://arxiv.org/abs/2305.02301v1)) enriches a customer’s data by eliciting the reasoning process (rationales) from a large language model (LLM). The method has been shown to (a) train smaller models that outperform LLMs, and (b) do so with less training data than fine-tuning or standard distillation requires. It uses the extracted LLM rationales as additional supervision within a multi-task training framework.

Learn more about [distill-text-models](https://cloud.google.com/vertex-ai/generative-ai/docs/models/distill-text-models).

**_NOTE_**: This notebook has been tested in the following environment:

* Python version = 3.9

### Objective

In this tutorial, you learn how to distill and deploy a large language model using Vertex AI LLM.

This tutorial uses the following Vertex AI services:

- Vertex AI LLM
- Vertex AI Model Garden
- Vertex AI Online prediction


The steps performed include:

- Get the Vertex AI LLM model.
- Distill the model (this automatically creates a Vertex AI endpoint and deploys the model to the endpoint).
- Make a prediction using Vertex AI LLM.

### Dataset

Distillation works on either a labeled or an unlabeled dataset. If you have a high-quality labeled dataset with hundreds of examples, use the labeled dataset. Otherwise, you can use an unlabeled prompt dataset, in which case the teacher model generates the labels and the rationales for distillation. More than 1,000 examples are recommended if you use an unlabeled dataset.

For this tutorial, you use a dataset stored in a public Cloud Storage bucket at the following paths:
- Train sample: `gs://cloud-samples-data/vertex-ai/model-evaluation/peft_train_sample.jsonl`
- Validation sample: `gs://cloud-samples-data/vertex-ai/model-evaluation/peft_eval_sample.jsonl`

#### Input format requirement

The labeled or unlabeled distillation dataset must be in JSON Lines (JSONL) format where each line contains a single tuning example. Before you distill your model, upload your dataset to a Cloud Storage bucket.

Each dataset example contains an `input_text` field with the model prompt and an optional `output_text` field that contains an example response that the distilled model is expected to produce.

The maximum token length for `input_text` is 7,168 and the maximum token length for `output_text` is 1,024. If either field exceeds the maximum token length, the excess tokens are truncated.

The maximum number of examples that a dataset for a text generation model can contain is 10,000.


Example:

```
{"input_text": "question: How many people live in Beijing? context: With over 21 million residents, Beijing is the world's most populous national capital city and is China's second largest city after Shanghai. It is located in Northern China, and is governed as a municipality under the direct administration of the State Council with 16 urban, suburban, and rural districts.[14] Beijing is mostly surrounded by Hebei Province with the exception of neighboring Tianjin to the southeast; together, the three divisions form the Jingjinji megalopolis and the national capital region of China.", "output_text": "over 21 million people"}
{"input_text": "question: How many parishes are there in Louisiana? context: The U.S. state of Louisiana is divided into 64 parishes (French: paroisses) in the same manner that 48 other states of the United States are divided into counties, and Alaska is divided into boroughs.", "output_text": "64"}
```
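
Before uploading a dataset, it can help to check it locally against the format rules above. The following is a minimal sketch, not part of the Vertex AI SDK: the helper name `validate_jsonl_lines` and the sample lines are hypothetical. It checks only that each line is a JSON object with a non-empty `input_text` string, that `output_text` (if present) is a string, and that the dataset stays within the 10,000-example limit. It does not check token lengths, because that would require the model's tokenizer.

```python
import json

MAX_EXAMPLES = 10_000  # dataset limit for text generation models, as stated above


def validate_jsonl_lines(lines):
    """Return (num_examples, num_labeled, errors) for an iterable of JSONL lines.

    Hypothetical helper for illustration; it is not part of the Vertex AI SDK.
    """
    errors = []
    num_examples = 0
    num_labeled = 0
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            example = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(example, dict):
            errors.append(f"line {line_number}: expected a JSON object")
            continue
        # `input_text` is required and must be a non-empty string.
        if not isinstance(example.get("input_text"), str) or not example["input_text"]:
            errors.append(f"line {line_number}: missing or empty string field 'input_text'")
            continue
        # `output_text` is optional; when present, the example counts as labeled.
        if "output_text" in example:
            if not isinstance(example["output_text"], str):
                errors.append(f"line {line_number}: 'output_text' must be a string")
                continue
            num_labeled += 1
        num_examples += 1
    if num_examples > MAX_EXAMPLES:
        errors.append(
            f"dataset has {num_examples} examples; at most {MAX_EXAMPLES} are allowed"
        )
    return num_examples, num_labeled, errors


# Made-up sample lines: one labeled, one unlabeled, one invalid.
sample_lines = [
    '{"input_text": "question: How many parishes are there in Louisiana?", "output_text": "64"}',
    '{"input_text": "question: How many people live in Beijing?"}',
    '{"output_text": "this line has no prompt"}',
]
num_examples, num_labeled, errors = validate_jsonl_lines(sample_lines)
print(num_examples, num_labeled, errors)
```

To check a local file, you could pass an open file handle, for example `validate_jsonl_lines(open("train.jsonl", encoding="utf-8"))`, where `train.jsonl` is a placeholder path.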

### Costs

This tutorial uses billable components of Google Cloud:

* Vertex AI
* Cloud Storage

Learn about [Vertex AI
pricing](https://cloud.google.com/vertex-ai/pricing), [Cloud Storage
pricing](https://cloud.google.com/storage/pricing), and use the [Pricing
Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Get started

### Install Vertex AI SDK for Python and other required packages


In [None]:
! pip3 install --upgrade --quiet google-cloud-aiplatform \
                                 "shapely<2.0.0" \
                                 PyYAML

### Restart runtime (Colab only)

To use the newly installed packages, you must restart the runtime on Google Colab.

In [None]:
import sys

if "google.colab" in sys.modules:

    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

Authenticate your environment on Google Colab.


In [None]:
import sys

if "google.colab" in sys.modules:

    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and initialize Vertex AI SDK for Python

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
LOCATION = "us-central1"  # @param {type:"string"}

import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)

### Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.

In [None]:
BUCKET_URI = f"gs://your-bucket-name-{PROJECT_ID}-unique"  # @param {type:"string"}

**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket.

In [None]:
! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

#### Copy the dataset to your bucket

Before you start the distillation, copy the dataset from the source to your Cloud Storage bucket.

**Note**: Alternatively, you can specify the source path for the data directly when you perform distillation. Copying the data to your own Cloud Storage bucket is optional.

In [None]:
! gsutil cp gs://cloud-samples-data/vertex-ai/model-evaluation/peft_eval_sample.jsonl {BUCKET_URI}/peft_eval_sample.jsonl
! gsutil cp gs://cloud-samples-data/vertex-ai/model-evaluation/peft_train_sample.jsonl {BUCKET_URI}/peft_train_sample.jsonl

### Import libraries

In [None]:
from google.cloud import aiplatform
from vertexai.preview.language_models import (TextGenerationModel,
                                              TuningEvaluationSpec)

## Load pretrained model

Load the pretrained student model (`text-bison@002`) and teacher model (`text-unicorn@001`) from Vertex AI Model Garden.
See the [list of models that support distillation](https://cloud.google.com/vertex-ai/docs/generative-ai/models/distill-text-models#supported_models).

In [None]:
student_model = TextGenerationModel.from_pretrained("text-bison@002")
teacher_model = TextGenerationModel.from_pretrained(
    "text-unicorn@001"
)  # distill_from() also accepts the model name string "text-unicorn@001"

## Distill the model

Next, you distill the model using the `distill_from()` method, with the following parameters:

- `teacher_model`: The teacher model that you would like to distill the knowledge from.
- `dataset`: A pandas DataFrame or the Cloud Storage location of the training data for tuning the model.
- `learning_rate_multiplier`: A multiplier to apply to the recommended learning rate. To use the recommended learning rate, use 1.0.
- `train_steps`: The number of steps to run for model tuning. The default value is 300. For 8k models such as `text-bison@002`, the batch size varies by tuning location:
    
    - us-central1 has a batch size of 8.
    - europe-west4 has a batch size of 24.

For parameter definitions and further context, see [Create a text model distilling job](https://cloud.google.com/vertex-ai/docs/generative-ai/models/distill-text-models#create_a_text_model_distilling_job). 
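
Because the batch size is fixed per location, `train_steps` determines how many training examples the job processes. The following sketch works through that arithmetic using the batch sizes listed above. The helper names and the 1,000-example dataset size are hypothetical and chosen only for illustration.

```python
import math

# Batch sizes quoted above for 8k models such as text-bison@002.
BATCH_SIZE_BY_LOCATION = {"us-central1": 8, "europe-west4": 24}


def examples_processed(train_steps: int, location: str) -> int:
    """Total number of examples consumed over the whole tuning run."""
    return train_steps * BATCH_SIZE_BY_LOCATION[location]


def approximate_epochs(train_steps: int, location: str, num_examples: int) -> float:
    """Approximate number of passes over a dataset of `num_examples` examples."""
    return examples_processed(train_steps, location) / num_examples


def steps_for_epochs(target_epochs: float, location: str, num_examples: int) -> int:
    """Smallest `train_steps` that covers `target_epochs` passes over the dataset."""
    return math.ceil(target_epochs * num_examples / BATCH_SIZE_BY_LOCATION[location])


# Hypothetical dataset of 1,000 examples tuned for 200 steps in us-central1.
print(examples_processed(200, "us-central1"))        # 200 steps * 8 examples per step
print(approximate_epochs(200, "us-central1", 1000))  # passes over the 1,000 examples
print(steps_for_epochs(3, "europe-west4", 1000))     # steps needed for about 3 passes
```

The same `train_steps` value processes three times as many examples in `europe-west4` as in `us-central1`, so you may want to adjust it when you change the tuning location.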

In [None]:
# Optional: TuningEvaluationSpec
# see https://cloud.google.com/vertex-ai/docs/generative-ai/models/distill-text-models#create_a_text_model_distilling_job for full context

eval_spec = TuningEvaluationSpec()
eval_spec.evaluation_data = f"{BUCKET_URI}/peft_eval_sample.jsonl"
eval_spec.evaluation_interval = 20

Set a display name for your model resource and the endpoint resource using the `DISPLAY_NAME` parameter.

**Note**: In the tuning pipeline, the model and endpoint share the same display name.

In [None]:
# Set the display name
DISPLAY_NAME = "vertex-distillation-model-unique"  # @param {type:"string"}

# Create the tuning pipeline job
pipeline = student_model.distill_from(
    teacher_model=teacher_model,
    dataset=f"{BUCKET_URI}/peft_train_sample.jsonl",
    train_steps=200,
    learning_rate_multiplier=1,
    accelerator_type="TPU",
    model_display_name=DISPLAY_NAME,
    evaluation_spec=eval_spec,
)

# Wait until the tuning pipeline job finishes
pipeline._job.wait()

## Make a prediction with Vertex AI LLM

Now, make a prediction using the `predict()` method from the Vertex AI LLM interface.

In [None]:
# Define the prompt
prompt = "TRANSCRIPT: \nPROCEDURE PERFORMED: , Umbilical hernia repair.,PROCEDURE:,  After informed consent was obtained, the patient was brought to the operative suite and placed supine on the operating table.  The patient was sedated, and an adequate local anesthetic was administered using 1% lidocaine without epinephrine.  The patient was prepped and draped in the usual sterile manner.,A standard curvilinear umbilical incision was made, and dissection was carried down to the hernia sac using a combination of Metzenbaum scissors and Bovie electrocautery.  The sac was cleared of overlying adherent tissue, and the fascial defect was delineated.  The fascia was cleared of any adherent tissue for a distance of 1.5 cm from the defect.  The sac was then placed into the abdominal cavity and the defect was closed primarily using simple interrupted 0 Vicryl sutures.  The umbilicus was then re-formed using 4-0 Vicryl to tack the umbilical skin to the fascia.,The wound was then irrigated using sterile saline, and hemostasis was obtained using Bovie electrocautery.  The skin was approximated with 4-0 Vicryl in a subcuticular fashion.  The skin was prepped with benzoin, and Steri-Strips were applied.  A dressing was then applied.  All surgical counts were reported as correct.,Having tolerated the procedure well, the patient was subsequently taken to the recovery room in good and stable condition.\n\n LABEL: "

In [None]:
# Print the model's prediction for the prompt
print(student_model.predict(prompt))

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# Fetch the endpoint resource using the display name and create time
endpoints = aiplatform.Endpoint.list(
    filter=f"display_name={DISPLAY_NAME}", order_by="create_time"
)
if len(endpoints) > 0:
    # Undeploy the model from the endpoint
    endpoints[0].undeploy_all()
    # Delete the endpoint
    endpoints[0].delete()

# Fetch the model resource using the display name and create time
models = aiplatform.Model.list(
    filter=f"display_name={DISPLAY_NAME}", order_by="create_time"
)
if len(models) > 0:
    # Delete the model
    models[0].delete()

# Delete the pipeline job
pipeline._job.delete()

# Delete the Cloud Storage bucket
delete_bucket = True
if delete_bucket:
    ! gsutil rm -rf {BUCKET_URI}