Objectives
Learn to quickly build speech recognition applications using existing APIs.

In [None]:
# Install packages.
!pip install srt==3.5.3
!pip install datasets==2.20.0
!pip install DateTime==5.5
!pip install OpenCC==1.1.7
!pip install opencv-contrib-python==4.8.0.76
!pip install opencv-python==4.8.0.76
!pip install opencv-python-headless==4.10.0.84
!pip install openpyxl==3.1.4
!pip install openai==1.35.3
!pip install git+https://github.com/openai/whisper.git@ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab
!pip install numpy==1.25.2
!pip install soundfile==0.12.1
!pip install -q -U google-generativeai==0.7.0
!pip install anthropic==0.29.0
!pip install librosa

In [None]:
# Import packages.
import whisper
import srt
import datetime
import time
import os
import re
import pathlib
import textwrap
import numpy as np
import soundfile as sf
from opencc import OpenCC
from tqdm import tqdm
from datasets import load_dataset
from openai import OpenAI
import google.generativeai as genai
import anthropic




Download data
The code block below takes about 10 seconds to run, although the time may vary slightly depending on the state of the Colab runtime.

In [4]:
# Load dataset.
dataset_name = "kuanhuggingface/NTU-GenAI-2024-HW9"
dataset = load_dataset(dataset_name)

Downloading readme: 100%|██████████| 305/305 [00:00<?, ?B/s] 
Downloading data: 100%|██████████| 3.14M/3.14M [00:00<00:00, 3.77MB/s]
Generating test split: 100%|██████████| 1/1 [00:00<00:00, 19.01 examples/s]


In [10]:
# Prepare audio.
input_audio = dataset["test"]["audio"][0]
input_audio_name = input_audio["path"]
input_audio_array = input_audio["array"].astype(np.float32)
sampling_rate = input_audio["sampling_rate"]

print(f"Now, we are going to transcribe the audio: 李琳山教授 信號與人生 (2023) ({input_audio_name}).")

Now, we are going to transcribe the audio: 李琳山教授 信號與人生 (2023) (ntu-gen-ai-2024-hw9-16k.mp3).
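Whisper's own `whisper.load_audio` returns 16 kHz mono float32 arrays, and the cell above casts the dataset array to `np.float32` to match. A minimal sketch of a guard you could run before transcription follows; `prepare_waveform` and `duration_seconds` are hypothetical helpers, and the `(samples, channels)` layout assumed for multi-channel input matches what `soundfile.read` returns. If the rate were not 16 kHz, you would need to resample first, for example with `librosa.resample(y, orig_sr=..., target_sr=16000)`.

```python
import numpy as np

# Whisper's audio front end assumes 16 kHz input (whisper.audio.SAMPLE_RATE).
WHISPER_SAMPLE_RATE = 16_000


def prepare_waveform(array, sampling_rate: int) -> np.ndarray:
    """Return a 1-D float32 waveform, refusing audio that is not 16 kHz."""
    if sampling_rate != WHISPER_SAMPLE_RATE:
        raise ValueError(
            f"Expected {WHISPER_SAMPLE_RATE} Hz audio, got {sampling_rate} Hz; resample first."
        )
    waveform = np.asarray(array, dtype=np.float32)
    if waveform.ndim == 2:
        # Assumed (samples, channels) layout: average the channels down to mono.
        waveform = waveform.mean(axis=1)
    return waveform


def duration_seconds(waveform: np.ndarray, sampling_rate: int) -> float:
    """Length of the waveform in seconds."""
    return waveform.shape[0] / sampling_rate
```

In the notebook this would be used as `input_audio_array = prepare_waveform(input_audio["array"], sampling_rate)`, which fails early with a clear message instead of producing a poor transcription from audio at the wrong rate.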


Part 2 - Automatic Speech Recognition (ASR)

In [11]:
def speech_recognition(model_name, input_audio, output_subtitle_path, decode_options, cache_dir="./"):
    '''
        (1) Objective:
            - This function aims to convert audio to subtitle.

        (2) Arguments:

            - model_name (str):
                The name of the model. There are five model sizes, including tiny, base, small, medium, large-v3.
                For example, you can use 'tiny', 'base', 'small', 'medium', 'large-v3' to specify the model name.
                You can see 'https://github.com/openai/whisper' for more details.

            - input_audio (Union[str, np.ndarray, torch.Tensor]):
                The path to the audio file to open, or the audio waveform.
                - For example, if your input audio path is 'input.wav', you can use 'input.wav' to specify the input audio path.
                - For example, if your input audio array is 'audio_array', you can use 'audio_array' to specify the input audio array.

            - output_subtitle_path (str):
                The path of the output subtitle file.
                For example, if you want to save the subtitle file as 'output.srt', you can use 'output.srt' to specify the output subtitle path.

            - decode_options (dict):
                The options for decoding the audio, with the keys 'language', 'initial_prompt', and 'temperature'.
                - language (str):
                    The language spoken in the audio, e.g. 'zh' or 'en'. None lets Whisper detect the language.
                - initial_prompt (str):
                    Optional text to provide as a prompt for the first window. This can be used to provide, or
                    "prompt-engineer" a context for transcription, e.g. custom vocabularies or proper nouns
                    to make it more likely to predict those words correctly.
                    Default: None.
                - temperature (float or tuple of float):
                    The temperature for sampling from the model. Higher values mean more randomness.
                    Whisper's own default is the fallback schedule (0.0, 0.2, 0.4, 0.6, 0.8, 1.0);
                    passing a single value such as 0 disables the fallback.

                You can see "https://github.com/openai/whisper/blob/main/whisper/decoding.py" and "https://github.com/openai/whisper/blob/main/whisper/transcribe.py"
                for more details.

            - cache_dir (str):
                The path of the cache directory for saving the model.
                For example, if you want to save the cache files in 'cache' directory, you can use 'cache' to specify the cache directory.
                Default: './'

        (3) Example:

            - If you want to use the 'base' model to convert 'input.wav' to 'output.srt' and save the cache files in 'cache' directory,
            you can call this function as follows:

                speech_recognition(model_name='base', input_audio='input.wav', output_subtitle_path='output.srt',
                                   decode_options={'language': 'zh', 'initial_prompt': None, 'temperature': 0.0}, cache_dir='cache')
    '''

    # Record the start time.
    start_time = time.time()

    print(f"=============== Loading Whisper-{model_name} ===============")

    # Load the model.
    model = whisper.load_model(name=model_name, download_root=cache_dir)

    print(f"Begin to utilize Whisper-{model_name} to transcribe the audio.")

    # Transcribe the audio.
    transcription = model.transcribe(audio=input_audio, language=decode_options["language"], verbose=False,
                                     initial_prompt=decode_options["initial_prompt"], temperature=decode_options["temperature"])

    # Record the end time.
    end_time = time.time()

    print(f"The process of speech recognition costs {end_time - start_time} seconds.")

    subtitles = []
    # Convert the transcription to subtitle and iterate over the segments.
    for i, segment in tqdm(enumerate(transcription["segments"])):

        # Convert the start time to subtitle format.
        start_time = datetime.timedelta(seconds=segment["start"])

        # Convert the end time to subtitle format.
        end_time = datetime.timedelta(seconds=segment["end"])

        # Get the subtitle text.
        text = segment["text"]

        # Append the subtitle to the subtitle list.
        subtitles.append(srt.Subtitle(index=i, start=start_time, end=end_time, content=text))

    # Convert the subtitle list to subtitle content.
    srt_content = srt.compose(subtitles)

    print(f"\n=============== Saving the subtitle to {output_subtitle_path} ===============")

    # Save the subtitle content to the subtitle file.
    with open(output_subtitle_path, "w", encoding="utf-8") as file:
        file.write(srt_content)
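The loop above hands each segment's `start` and `end` offsets (floats, in seconds) to the `srt` library as `datetime.timedelta` objects, and `srt.compose` renders them in the `HH:MM:SS,mmm --> HH:MM:SS,mmm` cue format, renumbering cues from 1 by default (which is why the file starts at 1 even though the loop passes 0-based indices). To make that rendering concrete, here is a standard-library-only sketch; it is not the `srt` library's implementation, and `format_srt_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render Whisper-style segments (dicts with 'start', 'end', 'text') as SRT text."""
    cues = []
    for number, segment in enumerate(segments, start=1):  # SRT cue numbers start at 1
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        # Whisper segment texts usually begin with a space; trim it.
        cues.append(f"{number}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(segments_to_srt([{"start": 0.0, "end": 4.0, "text": " hello"},
                       {"start": 64.5, "end": 65.25, "text": " world"}]))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids float artifacts such as `64.499999` being shown as `00:01:04,499`.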

In [12]:
# @title Parameter Setting of Whisper { run: "auto" }

''' In this block, you can modify your desired parameters and the path of input file. '''

# The name of the model you want to use.
# For example, you can use 'tiny', 'base', 'small', 'medium', 'large-v3' to specify the model name.
# @markdown **model_name**: The name of the model you want to use.
model_name = "medium" # @param ["tiny", "base", "small", "medium", "large-v3"]

# Define the suffix of the output file.
# @markdown **suffix**: The output file name is "output-{suffix}.*", where .* is the file extension (.txt or .srt)
suffix = "信號與人生" # @param {type: "string"}

# Path to the output file.
output_subtitle_path = f"./output-{suffix}.srt"

# Path of the output raw text file from the SRT file.
output_raw_text_path = f"./output-{suffix}.txt"

# Path to the directory where the Whisper model will be cached (the dataset uses the default Hugging Face cache).
cache_dir = "./"

# The language of the lecture video.
# @markdown **language**: The language of the lecture video.
language = "zh" # @param {type:"string"}

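# Optional text to provide as a prompt for the first window.
# Here "請用繁體中文" means "Please use Traditional Chinese"; a prompt written in Traditional characters tends to nudge the transcription toward them.
# @markdown **initial_prompt**: Optional text to provide as a prompt for the first window.
initial_prompt = "請用繁體中文" #@param {type:"string"}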

# The temperature for sampling from the model. Higher values mean more randomness.
# @markdown  **temperature**: The temperature for sampling from the model. Higher values mean more randomness.
temperature = 0 # @param {type:"slider", min:0, max:1, step:0.1}

In [13]:
# Construct the decoding options passed to speech_recognition().
decode_options = {
    "language": language,
    "initial_prompt": initial_prompt,
    "temperature": temperature
}
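With `temperature = 0`, Whisper decodes each 30-second window greedily and never retries. Its `transcribe` function also accepts a tuple of temperatures: a window is decoded at the first value and re-decoded at the next one whenever it fails Whisper's quality checks (for example, a compression ratio that is too high, which usually signals repetition, or an average log-probability that is too low). A minimal sketch that puts such a schedule into the same dictionary; `build_decode_options` and its `increment` parameter are hypothetical, modelled on the Whisper CLI's `--temperature_increment_on_fallback` option.

```python
def build_decode_options(language, initial_prompt, temperature=0.0, increment=None):
    """Build the decode_options dict used by speech_recognition().

    If increment is given, temperature becomes a fallback schedule
    (temperature, temperature + increment, ..., 1.0); otherwise it stays a single value.
    """
    if increment:
        steps = max(int(round((1.0 - temperature) / increment)), 0)
        # Round each value to avoid float noise such as 0.6000000000000001.
        schedule = tuple(round(temperature + i * increment, 2) for i in range(steps + 1))
    else:
        schedule = temperature
    return {"language": language, "initial_prompt": initial_prompt, "temperature": schedule}


print(build_decode_options("zh", "請用繁體中文", 0.0, 0.2))
```

Because `speech_recognition` forwards `decode_options["temperature"]` unchanged, `decode_options = build_decode_options(language, initial_prompt, temperature, increment=0.2)` would enable the fallback without touching the function. The trade-off is speed: windows that fail the checks are decoded more than once.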

In [14]:
# Print the settings.
message = "Transcribe 李琳山教授 信號與人生 (2023)"
print(f"Setting: (1) Model: whisper-{model_name} (2) Language: {language} (3) Initial Prompt: {initial_prompt} (4) Temperature: {temperature}")
print(message)

Setting: (1) Model: whisper-medium (2) Language: zh (3) Initial Prompt: 請用繁體中文 (4) Temperature: 0
Transcribe 李琳山教授 信號與人生 (2023)


In [15]:
# Running ASR.
speech_recognition(model_name=model_name, input_audio=input_audio_array, output_subtitle_path=output_subtitle_path, decode_options=decode_options, cache_dir=cache_dir)



100%|█████████████████████████████████████| 1.42G/1.42G [02:11<00:00, 11.6MiB/s]


Begin to utilize Whisper-medium to transcribe the audio.


100%|██████████| 104500/104500 [13:45<00:00, 126.53frames/s]


The process of speech recognition costs 968.5297017097473 seconds.


370it [00:00, 370733.99it/s]
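The reported 968.5 seconds covers everything between the two `time.time()` calls in `speech_recognition`, so it includes downloading the model (first run only) and loading it, not just decoding. The transcription progress bar counts mel-spectrogram frames, which Whisper computes with a 160-sample hop at 16 kHz, i.e. 100 frames per second. The numbers in the log above can therefore be turned into an approximate audio length and a real-time factor (RTF, processing time divided by audio duration):

```python
# Whisper: 16 kHz audio with a 160-sample hop -> 100 mel frames per second.
FRAMES_PER_SECOND = 16_000 // 160

audio_seconds = 104_500 / FRAMES_PER_SECOND  # frames reported by the progress bar
total_elapsed = 968.5297017097473            # printed by speech_recognition (download + load + decode)
decode_elapsed = 13 * 60 + 45                # 13:45 shown on the transcription progress bar

print(f"Audio length: {audio_seconds:.0f} s ({audio_seconds / 60:.1f} min)")  # 1045 s (17.4 min)
print(f"RTF including download and load: {total_elapsed / audio_seconds:.2f}")  # 0.93
print(f"RTF of decoding alone: {decode_elapsed / audio_seconds:.2f}")  # 0.79
```

An RTF below 1 means the audio was transcribed faster than it plays; on a second run, with the model already cached in `cache_dir`, the total should move closer to the decoding-only figure.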







You can check the result of automatic speech recognition.

In [16]:
''' Open the SRT file and read its content.
The format of each SRT cue is:

[Index]
[Begin time] (HH:MM:SS,mmm) --> [End time] (HH:MM:SS,mmm)
[Transcription]
(a blank line separates consecutive cues)
'''

with open(output_subtitle_path, 'r', encoding='utf-8') as file:
    content = file.read()

print(content)

1
00:00:00,000 --> 00:00:04,000
每次說這個學問是做出來的

2
00:00:06,000 --> 00:00:08,000
什麼意思?

3
00:00:08,000 --> 00:00:12,000
要做才會獲得學問

4
00:00:13,000 --> 00:00:16,000
你如果每天光是坐在那裡聽

5
00:00:17,000 --> 00:00:20,000
學問很可能是左耳進右耳出的

6
00:00:21,000 --> 00:00:23,000
你光是坐在那兒讀

7
00:00:23,000 --> 00:00:26,000
學問可能從眼睛進入腦海之後就忘掉了

8
00:00:26,000 --> 00:00:29,000
如何能夠學問在腦海裡面

9
00:00:31,000 --> 00:00:33,000
真的變成你自己學問

10
00:00:33,000 --> 00:00:35,000
就是要做

11
00:00:36,000 --> 00:00:39,000
可能有很多同學有這個經驗

12
00:00:39,000 --> 00:00:41,000
你如果去修某一門課

13
00:00:41,000 --> 00:00:44,000
或者做某一個實驗

14
00:00:44,000 --> 00:00:47,000
在期末就是要教一個final project

15
00:00:48,000 --> 00:00:50,000
那個final project就是要你把

16
00:00:51,000 --> 00:00:53,000
學到的很多東西

17
00:00:53,000 --> 00:00:56,000
最後整合在你的final project裡面

18
00:00:56,000 --> 00:00:58,000
最後做出來的時候

19
00:00:58,000 --> 00:01:00,000
就是把它們都整合了

20
00:01:00,000 --> 00:01:02,000
當你學期結束

21
00:01:02,000 --> 00:01:04,000
真的把final project做完的時候

22
00:01:04,000 --> 00:01:05,00

Part 3 - Preprocess the results of automatic speech recognition

In [17]:
def extract_and_save_text(srt_filename, output_filename):

    '''
    (1) Objective:
        - This function extracts the text from an SRT file and saves it to a new text file.
        - It also converts the Simplified Chinese to Traditional Chinese.

    (2) Arguments:

        - srt_filename: The path to the SRT file.

        - output_filename: The name of the output text file.

    (3) Returns:
        - The extracted text, converted to Traditional Chinese.

    (4) Example:
        - If your SRT file is named 'subtitle.srt' and you want to save the extracted text to a file named 'output.txt', you can use the function like this:
            extract_and_save_text('subtitle.srt', 'output.txt')

    '''

    # Open the SRT file and read its content.
    with open(srt_filename, 'r', encoding='utf-8') as file:
        content = file.read()

    # Use a regular expression to remove each cue's index and timecode lines.
    pure_text = re.sub(r'\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n', '', content)

    # Remove the empty lines.
    pure_text = re.sub(r'\n\n+', '\n', pure_text)

    # Creating an instance of OpenCC for Simplified to Traditional Chinese conversion.
    cc = OpenCC('s2t')
    pure_text_conversion = cc.convert(pure_text)

    # Write the extracted text to a new file.
    with open(output_filename, 'w', encoding='utf-8') as output_file:
        output_file.write(pure_text_conversion)

    print(f'Extracted text has been saved to {output_filename}.\n\n')

    return pure_text_conversion
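The first regular expression in `extract_and_save_text` deletes each cue's index line together with its timecode line, and the second collapses the blank lines between cues, leaving one transcript line per cue. A quick check of the same two patterns on a made-up two-cue sample (`strip_srt_timecodes` is a hypothetical wrapper, without the OpenCC step):

```python
import re

# Same pattern as extract_and_save_text(): an index line followed by a timecode line.
TIMECODE_BLOCK = re.compile(r'\d+\n\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}\n')


def strip_srt_timecodes(content: str) -> str:
    """Remove SRT index/timecode lines and collapse the blank lines between cues."""
    text = TIMECODE_BLOCK.sub('', content)
    return re.sub(r'\n\n+', '\n', text)


sample = (
    "1\n00:00:00,000 --> 00:00:04,000\nfirst cue\n\n"
    "2\n00:00:06,000 --> 00:00:08,000\nsecond cue\n\n"
)
print(repr(strip_srt_timecodes(sample)))  # 'first cue\nsecond cue\n'
```

Because `open()` in text mode translates `\r\n` to `\n` by default, the same patterns also work on SRT files saved with Windows line endings.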

In [18]:
def chunk_text(text, max_length):
    """
    (1) Objective:
        - This function splits a long string into smaller strings of at most a specified length.
        - It relies on textwrap.wrap, which breaks at whitespace where possible and replaces newlines with spaces.

    (2) Arguments:
        - text: str, the long string to be split.
        - max_length: int, the maximum number of characters in each smaller string.

    (3) Returns:
        - split_text: list, a list of smaller strings.

    (4) Example:
        - If you want to split a string named "long_string" into smaller strings of at most 100 characters, you can use the function like this:
            chunk_text(long_string, 100)

    """

    return textwrap.wrap(text, max_length)
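`textwrap.wrap` measures `max_length` in characters (not words or model tokens) and prefers to break at whitespace. Since `extract_and_save_text` keeps one subtitle cue per line and `wrap` turns those newlines into spaces, chunks here usually end between cues (or at a space inside a cue, such as in "final project"); Chinese text with no spaces at all would instead be cut at exactly `max_length` characters, possibly mid-sentence. A quick check with made-up strings:

```python
import textwrap

# Spaces available: wrap() breaks between words, and no piece exceeds the width.
with_spaces = textwrap.wrap("aaaa bbbb cccc", 9)
# No spaces (as in unsegmented Chinese): the run is cut at exactly `width` characters.
without_spaces = textwrap.wrap("一二三四五六", 4)
# Newlines are replaced by spaces before wrapping.
with_newlines = textwrap.wrap("line one\nline two", 80)

print(with_spaces)     # ['aaaa bbbb', 'cccc']
print(without_spaces)  # ['一二三四', '五六']
print(with_newlines)   # ['line one line two']
```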

In [19]:
''' In this block, you can modify your desired parameters and the path of input file. '''

# The maximum length (in characters) of each text chunk.
chunk_length = 512

In [20]:
# Extract the text from the SRT file and save it to a new text file.
pure_text = extract_and_save_text(srt_filename=output_subtitle_path, output_filename=output_raw_text_path)

# Split the long document into chunks of at most chunk_length characters.
chunks = chunk_text(text=pure_text, max_length=chunk_length)


def ordinal(n):
    """Return the English ordinal for n, e.g. 1 -> '1st', 2 -> '2nd', 11 -> '11th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")


# Show the number of characters and the contents of each chunk.
print("Review the results of splitting the long text into several short texts.\n")
for index, chunk in enumerate(chunks):
    print(f"\n========== The {ordinal(index + 1)} segment of the split ({len(chunk)} characters) ==========\n\n")
    for text in textwrap.wrap(chunk, 80):
        print(f"{text}\n")

Extracted text has been saved to ./output-信號與人生.txt.


Review the results of splitting the long text into several short texts.




每次說這個學問是做出來的 什麼意思? 要做才會獲得學問 你如果每天光是坐在那裡聽 學問很可能是左耳進右耳出的 你光是坐在那兒讀

學問可能從眼睛進入腦海之後就忘掉了 如何能夠學問在腦海裡面 真的變成你自己學問 就是要做 可能有很多同學有這個經驗 你如果去修某一門課 或者做某一個實驗

在期末就是要教一個final project 那個final project就是要你把 學到的很多東西 最後整合在你的final project裡面

最後做出來的時候 就是把它們都整合了 當你學期結束 真的把final project做完的時候 你會忽然發現 我真的學到很多東西 那就是做出來的學問

也許可以舉另外一個例子 就是你如果學了某一些很複雜的演算法 或者什麼 好像覺得那些不見得在你的腦海裡 可是後來老師出了個習題 那個習題教你寫一個很大的程式

要把所有東西都包進去 當你把這個程式寫完的時候你會發現 你忽然把演算法裡所有東西都弄通了 那就是學問是做出來的 所以我們永遠要記得 盡量多動手多做

在動手跟做的過程之中 學問纔可以變成是自己的 同樣的情形就是說 很多時候這樣動手或者做的表現或者成績 沒有一個成績單上的數字




使得很多人覺得那不重要 很多人甚至覺得這門課要做final project 我就不修了太累了 或者說那門課需要怎麼樣怎麼樣太累 我就不要做了

而不知道其實那個纔是讓你做的機會 然後可以學到最多 也就是說雖然很可能那麼辛苦的做很多事 沒有讓你獲得什麼具體成績 對你的overfitting可能沒有幫助

可是對你的全面學習是很有幫助 是該學的 那不要漏掉這些事 那這是我所說的 那這個課業內可以做的這些事 那剛才我們講到思考的時候 我覺得我漏掉一點

你如果修我的信號課你可能會發現 我上課沒講到一個數學式子的時候 我通常都不推他的 我是在解釋那個數學式子在說什麼話 同樣的呢 沒講到一個什麼什麼事情的時候

我通常就在解釋他在說什麼話 也就是說 我在講的就是我讀到特本那裡的時候 我心裡怎麼想的 也就是我

Summarization

In [21]:
def summarization(summarization_prompt, model_name="gemini-pro", temperature=0.0, top_p=1.0, max_tokens=512):
    """
    (1) Objective:
        - Use the Google Gemini API to summarize a given text.

    (2) Arguments:
        - summarization_prompt: The summarization prompt, which already contains the text to summarize.
        - model_name: The model name, default is "gemini-pro". You can refer to "https://ai.google.dev/models/gemini" for more details.
        - temperature: Controls randomness in the response. Lower values make responses more deterministic, default is 0.0.
        - top_p: Controls diversity via nucleus sampling. Higher values lead to more diverse responses, default is 1.0.
        - max_tokens: The maximum number of tokens to generate in the completion, default is 512.

    (3) Return:
        - The summarized text.

    (4) Example:
        - If the summarization prompt is "DEF", model_name is "gemini-pro",
          temperature is 0.0, top_p is 1.0, and max_tokens is 512, then you can call the function like this:

              summarization(summarization_prompt="DEF", model_name="gemini-pro", temperature=0.0, top_p=1.0, max_tokens=512)

    """

    # The user prompt is the summarization prompt, into which the text has already been inserted.
    user_prompt = summarization_prompt

    # Load the generative model.
    model = genai.GenerativeModel(model_name)

    # Set the generation configuration.
    generation_config = genai.GenerationConfig(temperature=temperature, top_p=top_p, max_output_tokens=max_tokens)

    while True:

        try:
            # Use the Gemini API to summarize the text.
            response = model.generate_content(contents=user_prompt, generation_config=generation_config)

            break

        except Exception as e:
            # If the API call fails, wait for 1 second and try again.
            print(f"The API call failed ({e}); waiting 1 second before trying again.")
            time.sleep(1)

    return response.text
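The retry loop in `summarization` waits a fixed 1 second and never gives up, so a permanent error such as an invalid API key keeps the cell running forever. One common alternative is a bounded retry with exponential backoff. The sketch below is generic and makes no Gemini-specific assumptions; `call_with_retry` is a hypothetical helper, and treating every `Exception` as retryable is a simplification.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0) -> T:
    """Call fn(); after a failure, wait base_delay * 2**attempt seconds and try again.

    The last exception is re-raised once max_retries attempts have failed.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:  # KeyboardInterrupt is not caught, so the cell can still be stopped
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Call failed ({exc!r}); retrying in {delay:.1f} s.")
            time.sleep(delay)
    # Not reached: the loop either returns or re-raises.
    raise AssertionError("unreachable")
```

Inside `summarization`, the `while True` loop could then become `response = call_with_retry(lambda: model.generate_content(contents=user_prompt, generation_config=generation_config))`, waiting 1, 2, 4, and 8 seconds before giving up with the real error.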

In [28]:
# @title Parameter Setting of Gemini { run: "auto" }
''' In this block, you can modify your desired parameters and set your api key. '''

# Your Google API key, read here from the GEMINI_API_KEY environment variable instead of a local credentials module.
# @markdown **google_api_key**: Your Google API key.
google_api_key = os.environ["GEMINI_API_KEY"]

# The model name. You can refer to "https://ai.google.dev/models/gemini" for more details.
# @markdown **model_name**: The model name. You can refer to "https://ai.google.dev/models/gemini" for more details.
model_name = "gemini-1.5-flash" # @param {type:"string"}

# Controls randomness in the response. Lower values make responses more deterministic
# @markdown **temperature**: Controls randomness in the response. Lower values make responses more deterministic.
temperature = 0.0 # @param {type:"slider", min:0, max:1, step:0.1}

# Controls diversity via nucleus sampling. Higher values lead to more diverse responses
# @markdown **top_p**: Controls diversity via nucleus sampling. Higher values lead to more diverse responses.
top_p = 1.0 # @param {type:"slider", min:0, max:1, step:0.1}

# Set Google API key.
genai.configure(api_key=google_api_key)

In [26]:
# @title Prompt Setting of Gemini Multi-Stage Summarization: Paragraph { run: "auto" }
''' You can modify the summarization prompt and maximum number of tokens. '''
''' However, DO NOT modify the part of <text>.'''

# The maximum number of tokens to generate in the completion.
# @markdown **max_tokens**: The maximum number of tokens to generate in the completion.
max_tokens = 350 # @param {type:"integer"}

# @markdown #### Changing **summarization_prompt_template**
# @markdown You can modify the summarization prompt and maximum number of tokens. However, **DO NOT** modify the part of `<text>`.
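# English: "Summarize this passage in 300 characters or fewer, including the key points and all important details: <text>"
summarization_prompt_template = "用 300 個字內寫出這段文字的摘要，其中包括要點和所有重要細節：<text>" # @param {type:"string"}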

In [29]:
paragraph_summarizations = []

# First, we summarize each section that has been split up separately.
for index, chunk in enumerate(chunks):

    # Record the start time.
    start = time.time()

    # Construct summarization prompt.
    summarization_prompt = summarization_prompt_template.replace("<text>", chunk)

    # We summarize each section that has been split up separately.
    response = summarization(summarization_prompt=summarization_prompt, model_name=model_name, temperature=temperature, top_p=top_p, max_tokens=max_tokens)

    # Calculate the execution time and round it to 2 decimal places.
    cost_time = round(time.time() - start, 2)

    # Print the summary and its length.
    print(f"----------------------------Summary of Segment {index + 1}----------------------------\n")
    for text in textwrap.wrap(response, 80):
        print(f"{text}\n")
    print(f"Length of summary for segment {index + 1}: {len(response)}")
    print(f"Time taken to generate summary for segment {index + 1}: {cost_time} sec.\n")

    # Record the result.
    paragraph_summarizations.append(response)

----------------------------Summary of Segment 1----------------------------

這段文字的核心觀點是：「學問是做出來的」。  單純聽課或閱讀，知識容易遺忘，唯有透過實踐才能真正內化。  文中以期末專題 (final project)

和編寫大型程式為例，說明在動手做的過程中，能將零散知識整合，並深刻理解其原理，最終將學問變成自己的東西。

即使沒有成績單上的數字來衡量，動手實踐的過程本身就是學習和掌握知識的關鍵。  因此，作者強調要積極動手實踐，才能真正獲得學問。

Length of summary for segment 1: 195
Time taken to generate summary for segment 1: 1.34 sec.

----------------------------Summary of Segment 2----------------------------

許多學生因課程作業繁重（例如期末專題）而放棄學習機會，卻不知這些挑戰正是最佳學習途徑。

即使這些作業可能不會直接提升成績或避免過擬合，卻能全面提升學習能力，培養重要的思考能力和習慣。  作者以自身教學為例，強調理解概念而非死記公式的重要性。

他鼓勵學生在學習過程中，針對每個數學公式和文本內容深入思考其含義，藉此培養思考能力和習慣，這才是學習的關鍵。

作者認為，主動思考並理解文本內容，而非僅僅完成作業，才是學習中最重要的一環。

Length of summary for segment 2: 218
Time taken to generate summary for segment 2: 1.34 sec.

----------------------------Summary of Segment 3----------------------------

這段文字的核心概念是將「學習」定義為帶來成長、進步和快樂的任何活動，並非僅限於課業。作者認為課業外的活動，例如打球、爬山、旅行等，都屬於學習的範疇。

打球能增進健康、手腦協調和團隊合作；爬山能拓展見聞；旅行則能增長見識和拓展人脈。

總之，任何能帶來成長、進步和快樂的活動，都值得投入時間和精力，並視為學習的一部分。  作

In [30]:
# First, we collect all the summarizations obtained before and print them.

collected_summarization = ""
for index, paragraph_summarization in enumerate(paragraph_summarizations):
    collected_summarization += f"Summary of segment {index + 1}: {paragraph_summarization}\n"

print(collected_summarization)

Summary of segment 1: 這段文字的核心觀點是：「學問是做出來的」。  單純聽課或閱讀，知識容易遺忘，唯有透過實踐才能真正內化。  文中以期末專題 (final project) 和編寫大型程式為例，說明在動手做的過程中，能將零散知識整合，並深刻理解其原理，最終將學問變成自己的東西。  即使沒有成績單上的數字來衡量，動手實踐的過程本身就是學習和掌握知識的關鍵。  因此，作者強調要積極動手實踐，才能真正獲得學問。

Summary of segment 2: 許多學生因課程作業繁重（例如期末專題）而放棄學習機會，卻不知這些挑戰正是最佳學習途徑。  即使這些作業可能不會直接提升成績或避免過擬合，卻能全面提升學習能力，培養重要的思考能力和習慣。  作者以自身教學為例，強調理解概念而非死記公式的重要性。  他鼓勵學生在學習過程中，針對每個數學公式和文本內容深入思考其含義，藉此培養思考能力和習慣，這才是學習的關鍵。  作者認為，主動思考並理解文本內容，而非僅僅完成作業，才是學習中最重要的一環。

Summary of segment 3: 這段文字的核心概念是將「學習」定義為帶來成長、進步和快樂的任何活動，並非僅限於課業。作者認為課業外的活動，例如打球、爬山、旅行等，都屬於學習的範疇。  打球能增進健康、手腦協調和團隊合作；爬山能拓展見聞；旅行則能增長見識和拓展人脈。  總之，任何能帶來成長、進步和快樂的活動，都值得投入時間和精力，並視為學習的一部分。  作者強調學習的廣泛性，並鼓勵將課業外的興趣愛好也納入學習的範圍。

Summary of segment 4: 這段文字的核心論點是：人際互動與參與活動是學習的重要途徑。作者以談戀愛和交友為例，說明這些經歷能讓人體驗人際關係中的各種感受、期待與互動，從而提升個人能力。即使不談戀愛，交友也能達到同樣的學習效果，尤其在大學校園中，同學之間便提供了良好的互動機會。此外，參與社團活動，例如戲劇社的演出或幕後工作，也能帶來成長與進步。總之，作者鼓勵積極參與人際互動和各種活動，藉此學習與成長。

Summary of segment 5: 這段文字強調課外活動（如戲劇社、校內外活動）的重要性，即使這些活動沒有成績，也不會體現在成績單上。作者認為參與這些活動能帶來成長和進步，是寶貴的學習機會，培養重要的軟實力（soft 

Get a summary of the entire text.
Step 2: After summarizing each chunk separately, combine these summaries and summarize them once more to produce the final summary.
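Together, the per-chunk step and this final step form a map-reduce style summarization: each chunk is summarized independently (map), then the collected partial summaries are summarized again (reduce). The sketch below condenses the loops of this section into one function so the data flow is visible; `multi_stage_summarize` is a hypothetical name, and the two callables stand for any prompt-to-text function, such as `summarization` above with its settings bound. The `<text>` placeholder is filled with `str.replace`, as in the notebook, which (unlike `str.format`) is unaffected by braces that may appear in a transcript.

```python
from typing import Callable, List, Optional


def multi_stage_summarize(
    chunks: List[str],
    paragraph_template: str,
    total_template: str,
    summarize_paragraph: Callable[[str], str],
    summarize_total: Optional[Callable[[str], str]] = None,
) -> str:
    """Summarize each chunk, then summarize the collected chunk summaries."""
    if summarize_total is None:
        summarize_total = summarize_paragraph

    # Step 1 (map): summarize every chunk independently.
    partial = [summarize_paragraph(paragraph_template.replace("<text>", chunk)) for chunk in chunks]

    # Collect the partial summaries in the same "Summary of segment i: ..." layout as above.
    collected = "".join(f"Summary of segment {i + 1}: {s}\n" for i, s in enumerate(partial))

    # Step 2 (reduce): summarize the collected summaries once more.
    return summarize_total(total_template.replace("<text>", collected))
```

Separate callables allow different budgets per stage, matching the notebook's 350 tokens per chunk and 550 for the final summary, e.g. `functools.partial(summarization, model_name=model_name, temperature=temperature, top_p=top_p, max_tokens=350)` for the first and `max_tokens=550` for the second.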

In [31]:
# @title Prompt Setting of Gemini Multi-Stage Summarization: Total { run: "auto" }
''' You can modify the summarization prompt and maximum number of tokens. '''
''' However, DO NOT modify the part of <text>.'''

# We set the maximum number of tokens to ensure that the final summary does not exceed 550 tokens.
# @markdown **max_tokens**: We set the maximum number of tokens to ensure that the final summary does not exceed 550 tokens.
max_tokens = 550 # @param {type:"integer"}

# @markdown ### Changing **summarization_prompt_template**
# @markdown You can modify the summarization prompt and maximum number of tokens. However, **DO NOT** modify the part of `<text>`.
# English: "Write a concise summary of the following text in 500 characters or fewer: <text>"
summarization_prompt_template = "在 500 字以內寫出以下文字的簡潔摘要：<text>" # @param {type:"string"}

In [32]:
# Finally, we compile a final summary from the summaries of each section.

# Record the start time.
start = time.time()

# Run final summarization.
summarization_prompt = summarization_prompt_template.replace("<text>", collected_summarization)
final_summarization = summarization(summarization_prompt=summarization_prompt, model_name=model_name, temperature=temperature, top_p=top_p, max_tokens=max_tokens)

# Calculate the execution time and round it to 2 decimal places.
cost_time = round(time.time() - start, 2)

# Print the summary and its length.
print("----------------------------Final Summary----------------------------\n")
for text in textwrap.wrap(final_summarization, 80):
    print(text)
print(f"\nLength of final summary: {len(final_summarization)}")
print(f"Time taken to generate the final summary: {cost_time} sec.")

----------------------------Final Summary----------------------------

This text argues that true learning extends far beyond academic coursework and
grades.  It emphasizes the importance of **hands-on experience** (Segment 1),
highlighting that actively engaging with material, like completing a final
project, leads to deeper understanding than passive learning.  The author
stresses the value of **challenging coursework** (Segment 2), arguing that
difficult assignments, even if not directly impacting grades, cultivate crucial
thinking skills.  Learning is broadly defined (Segment 3) to include any
activity promoting growth and happiness, encompassing hobbies and social
interactions.  **Interpersonal skills and active participation** (Segments 4, 5,
6) are crucial, with extracurricular activities building soft skills vital for
success, especially in engineering.  These soft skills, alongside hard skills,
contribute to long-term career success (Segment 7), which depends on a
combination 

In [33]:
''' In this block, you can modify your desired output path of final summary. '''

output_path = f"./final-summary-{suffix}-gemini-multi-stage.txt"

# To convert Simplified Chinese to Traditional Chinese, set this option to True; otherwise, set it to False.
convert_to_traditional_chinese = False

if convert_to_traditional_chinese:
    # Create an OpenCC converter for Simplified-to-Traditional Chinese conversion.
    cc = OpenCC('s2t')
    final_summarization = cc.convert(final_summarization)

# Write the final summary as UTF-8 so that Chinese characters are saved correctly on every platform.
with open(output_path, "w", encoding="utf-8") as fp:
    fp.write(final_summarization)

print(f"Final summary has been saved to {output_path}")

Final summary has been saved to ./final-summary-信號與人生-gemini-multi-stage.txt
