<a href="https://colab.research.google.com/github/HarshShinde0/Fine-Tuning-Large-Language-Models-How-Vertex-AI-Takes-LLMs-to-the-Next-Level/blob/main/llm_fine_tuning_supervised.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Model Tuning with Vertex AI Foundation Model

LLMOps, or Large Language Model Operations, is the set of tools, processes, and best practices for managing the lifecycle of large language models (LLMs), from development and deployment to monitoring and maintenance. It becomes more important as organizations adopt LLMs for a wide range of applications. Vertex AI offers services to manage LLMOps pipelines, as well as mechanisms to evaluate the quality of the new model after every pipeline run.

Model fine-tuning is a technique for improving the performance of pre-trained LLMs on specific tasks or domains. It adjusts the model's parameters using task-specific data so that the model makes more accurate predictions or generates more relevant text. Fine-tuning builds on the model's existing knowledge and adapts it to a specific context, which yields better-tailored outputs. For more details on tuning, see the official [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune-models).

# Objective

This lab teaches you how to tune a foundation model on new, unseen data. You will use the following Google Cloud products:
*   Vertex AI Pipelines
*   Vertex AI Evaluation Services
*   Vertex AI Model Registry
*   Vertex AI Endpoints

# Use Case

Using generative AI, we will generate a suitable title for a news article body taken from the BBC full-text dataset (sourced from the BigQuery public dataset *bigquery-public-data.bbc_news.fulltext*). We will fine-tune text-bison@002 into a new tuned model called "bbc-news-summary-tuned" and compare its output with the response from the base model.
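
Each tuning example for this use case pairs an article body with its title. The sketch below shows one way such a record could be built. It assumes the `input_text` / `output_text` field names used by Vertex AI supervised tuning for text models and reuses the prompt prefix from the prediction cells in this notebook. The helpers `make_record` and `to_jsonl` and the sample strings are hypothetical and are not part of the original lab.

```python
import json

# Prompt prefix matching the one used in the prediction cells of this notebook.
PROMPT_PREFIX = "Summarize this text to generate a title: \n "


def make_record(body: str, title: str) -> dict:
    """Build one tuning example (hypothetical helper; field names are assumed)."""
    return {
        "input_text": PROMPT_PREFIX + body.strip(),
        "output_text": title.strip(),
    }


def to_jsonl(rows) -> str:
    """Serialize (body, title) pairs as JSON Lines, one record per line."""
    return "\n".join(json.dumps(make_record(body, title)) for body, title in rows)


# Made-up sample pair, for illustration only.
sample = [("Plane seats appear to be getting smaller.", "Shrinking plane seats")]
jsonl_text = to_jsonl(sample)
print(jsonl_text)
```

Writing `jsonl_text` to a file and uploading it to Cloud Storage would give a file with the same shape as the `TRAIN.jsonl` referenced later in this notebook.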

# Install and Import Dependencies

In [None]:
!pip install google-cloud-aiplatform
!pip install --user datasets
!pip install --user google-cloud-pipeline-components

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

In [None]:
from google.cloud import aiplatform
from google.colab import auth as google_auth

# Authenticate the Colab session so the SDK can call Google Cloud APIs.
google_auth.authenticate_user()

In [None]:
import vertexai
PROJECT_ID = "structured-unstructured-data" #@param
vertexai.init(project=PROJECT_ID)

In [None]:
REGION = "us-central1"
region = REGION  # lowercase alias of REGION
project_id = PROJECT_ID  # reuse the project set above instead of repeating the literal

In [None]:
! gcloud config set project {project_id}

In [None]:
# Import the necessary libraries

import os
import warnings

# Silence TensorFlow C++ logs and Python warnings before the heavier imports.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
warnings.filterwarnings('ignore')

import json
import sys
import uuid

import kfp
import pandas as pd
import vertexai
from datasets import load_dataset
from google.auth import default
from google.cloud import aiplatform
from vertexai.preview.language_models import TextGenerationModel, EvaluationTextSummarizationSpec

# Initialize the Vertex AI SDK with the project and region set above.
vertexai.init(project=PROJECT_ID, location=REGION)


# Prepare & Load Training Data

In [None]:
BUCKET_NAME = 'llmtuning_ssn'
BUCKET_URI = f"gs://{BUCKET_NAME}/TRAIN.jsonl"  # build the URI from the bucket name
REGION = "us-central1"

In [None]:
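df = pd.read_json(BUCKET_URI, lines=True)  # assumption: the cell defining df is missing; gs:// paths need gcsfs
print(df.shape)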
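
Before launching a tuning job, it can help to confirm that every training record carries the fields the job expects. The following sketch checks a list of records for missing or empty values. The field names `input_text` and `output_text` are assumed from the Vertex AI tuning data format, and `validate_records` and the sample records are hypothetical.

```python
def validate_records(records, required=("input_text", "output_text")):
    """Return (index, problem) tuples for records with missing or empty fields."""
    problems = []
    for index, record in enumerate(records):
        for key in required:
            value = record.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append((index, f"missing or empty {key}"))
    return problems


# Made-up records: the second one has an empty target title.
records = [
    {"input_text": "Summarize this text to generate a title: \n Some body", "output_text": "Some title"},
    {"input_text": "Summarize this text to generate a title: \n Another body", "output_text": ""},
]
problems = validate_records(records)
print(problems)
```

With a pandas DataFrame such as `df`, the same check could be run as `validate_records(df.to_dict("records"))`.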

# Fine Tune Text Bison@002 Model

In [None]:
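model_display_name = 'bbc-finetuned-model' # @param {type:"string"}
tuned_model = TextGenerationModel.from_pretrained("text-bison@002")
# Launch the supervised tuning job on the training DataFrame.
tuned_model.tune_model(
    training_data=df,
    train_steps=100,
    tuning_job_location="us-central1",
    tuned_model_location="us-central1",
    model_display_name=model_display_name,  # previously defined but never passed
)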
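
The `train_steps` value sets how many batches the tuning job processes, so the number of passes over the data depends on the batch size and the dataset size. The arithmetic can be sketched as below. The batch size of 8 is an assumption (it is the value the Vertex AI documentation has listed for `text-bison` tuning in `us-central1`, and it should be checked against the current documentation), and the dataset size is a made-up figure rather than the size of the BBC training set.

```python
def examples_seen(train_steps: int, batch_size: int) -> int:
    """Total number of training examples processed across all steps."""
    return train_steps * batch_size


def approx_epochs(train_steps: int, batch_size: int, num_examples: int) -> float:
    """Approximate number of passes over a dataset of num_examples rows."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return examples_seen(train_steps, batch_size) / num_examples


ASSUMED_BATCH_SIZE = 8           # assumption: verify for your region and model
HYPOTHETICAL_NUM_EXAMPLES = 400  # made-up dataset size, for illustration only

seen = examples_seen(100, ASSUMED_BATCH_SIZE)
epochs = approx_epochs(100, ASSUMED_BATCH_SIZE, HYPOTHETICAL_NUM_EXAMPLES)
print(seen, epochs)
```

Under these assumptions, 100 steps would cover 800 examples, or about two passes over a 400-row dataset.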


# Predict with the new Fine Tuned Model

In [None]:
response = tuned_model.predict("Summarize this text to generate a title: \n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)

In [None]:
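# Look up the resource name of the tuned model deployed behind the endpoint.
# Note: `_endpoint` is a private SDK attribute and may change between versions.
tuned_model_name = tuned_model._endpoint.gca_resource.deployed_models[0].model
# Reload the tuned model by its resource name, for example in a later session.
tuned_model_1 = TextGenerationModel.get_tuned_model(tuned_model_name)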
response = tuned_model_1.predict("Summarize this text to generate a title: \n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)

# Predict with Base Model for comparison

In [None]:
base_model = TextGenerationModel.from_pretrained("text-bison@002")
response = base_model.predict("Summarize this text to generate a title: \n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)
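
To compare the base and tuned responses with more than a visual check, a simple token-overlap score against a reference title is one option. The sketch below computes a unigram F1 score. It is a rough stand-in and not the Vertex AI evaluation service, and the reference and output strings are placeholders rather than actual model responses.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def overlap_f1(candidate: str, reference: str) -> float:
    """Unigram F1 between a candidate title and a reference title."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    common = sum((cand & ref).values())  # shared tokens, counted with multiplicity
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Placeholder strings for illustration; these are NOT real model outputs.
reference = "Shrinking plane seats put passengers at risk"
outputs = {
    "base": "Experts question whether packed planes are safe",
    "tuned": "Shrinking plane seats put health at risk",
}
scores = {name: overlap_f1(text, reference) for name, text in outputs.items()}
print(scores)
```

In practice, `outputs` would hold `response.text` from the base and tuned models, and `reference` would be the original article title.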