The following code snippets for a YouTube video transcript summarizer are adapted from

https://github.com/royca/yt-gpt/tree/master

by replacing
- the original URL with the URL of the video to be summarized
- the OpenAI LLM with the VertexAI model for summarization
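The summarizer below splits the transcript with `RecursiveCharacterTextSplitter` and then runs `load_summarize_chain(..., chain_type="map_reduce")`: each chunk is summarized on its own with the map prompt (`prompt`), and the partial summaries are then merged with `combine_prompt`. The toy sketch below shows only that control flow in plain Python. It is not LangChain's implementation: chunking by word count and the `fake_map` / `fake_combine` functions are hypothetical stand-ins for the character-based splitter and the two LLM calls.

```python
from typing import Callable, List


def chunk_words(text: str, words_per_chunk: int) -> List[str]:
    """Split text into chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def map_reduce_summarize(
    text: str,
    map_fn: Callable[[str], str],
    combine_fn: Callable[[List[str]], str],
    words_per_chunk: int,
) -> str:
    """Map step: summarize each chunk independently.
    Reduce step: merge the partial summaries into one final summary."""
    partial_summaries = [map_fn(chunk) for chunk in chunk_words(text, words_per_chunk)]
    return combine_fn(partial_summaries)


# Hypothetical stand-ins for the two LLM calls (map prompt / combine prompt)
def fake_map(chunk: str) -> str:
    return chunk.split()[0]  # toy "summary": the chunk's first word


def fake_combine(summaries: List[str]) -> str:
    return " | ".join(summaries)


if __name__ == "__main__":
    transcript = "alpha beta gamma delta epsilon zeta"
    print(map_reduce_summarize(transcript, fake_map, fake_combine, words_per_chunk=2))
    # prints: alpha | gamma | epsilon
```

Because every chunk is summarized independently, the map step could run in parallel; the cost is that the combine step only sees the partial summaries, not the original text.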

In [19]:
import re
import os
import textwrap
# from langchain.llms import OpenAI
from langchain.llms import VertexAI
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Path to the Google Cloud service-account key used to authenticate Vertex AI
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './aiap-13-ds-7e16bb946970.json'


def youtube_video_url_is_valid(url: str) -> bool:
    pattern = r'^https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]+)(\&ab_channel=[\w\d]+)?$'
    match = re.match(pattern, url)
    return match is not None


# def find_insights(api_key: str, url: str) -> str:
def find_insights(url: str) -> str:
    try:
        loader = YoutubeLoader.from_youtube_url(url, language="en-US")
        transcript = loader.load()
    except Exception as e:
        return f"Error while loading YouTube video and transcript: {e}"
    try:
        # llm = OpenAI(temperature=0.6, openai_api_key=api_key)
        llm = VertexAI(temperature=0.3)
        prompt = PromptTemplate(
            template="""Summarize the youtube video whose transcript is provided within backticks \
            ```{text}```
            """, input_variables=["text"]
        )
        combine_prompt = PromptTemplate(
            template="""Combine all the youtube video transcripts  provided within backticks \
            ```{text}```
            Provide a summary between 50-100 sentences.
            """, input_variables=["text"]
        )
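        # Split the transcript into large chunks; each chunk is summarized
        # with `prompt` (map step), then the partial summaries are merged
        # with `combine_prompt` (reduce step)
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=100000, chunk_overlap=50)
        docs = text_splitter.split_documents(transcript)
        chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True,
                                     map_prompt=prompt, combine_prompt=combine_prompt)
        answer = chain.run(docs)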
    except Exception as e:
        return f"Error while processing and summarizing text: {e}"

    return answer.strip()


# youtube_video_url = "https://www.youtube.com/watch?v=-hxeDjAxvJ8"
# Version 3.8 Special Program｜Genshin Impact YouTube video
youtube_video_url = "https://www.youtube.com/watch?v=jlivBvu3Jrc"
if not youtube_video_url_is_valid(youtube_video_url):
    print("Please enter a valid YouTube video URL.")
else:
    # Only summarize when the URL passed validation
    answer = find_insights(youtube_video_url)



> Entering new  chain...


> Entering new  chain...
Prompt after formatting:
Summarize the youtube video whose transcript is provided within backticks             ```Long, long ago, there was a place in the desert called Bottleland... Hmm? You wanted to learn more about Bottleland? I left that place long ago... But I still clearly remember its beautiful sights, even to this day Whenever travelers ask me how to travel to Bottleland I warn them that the desert is a very dangerous place Despite the risks... There will always be enthusiastic newcomers who venture into the desert... Haha! The treasures of Bottleland are as good as mine! I'm sorry, Mom... I shouldn't have played with magic like that... Now I can't find my way back home The desert entices people with hope... But it devours that hope from the moment they step foot within its scorching domain... But that magic bottle can bring forth an endless and refreshing spring Revitalizing the hopes that had 

In [20]:
# Apply word wrapping and print the wrapped text
width = 80
wrapped_text = textwrap.fill(answer, width=width)
print(wrapped_text)
# print(answer)
print("---------------------------------")
print("\n")

Genshin Impact Version 3.8 Special Program  The Special Program begins with a
short introduction to the new version, which will feature a new limited-time map
called the Veluriyam Mirage. The map is located in a desert and is said to be
home to many secrets.  The program then introduces the new characters and events
that will be coming in Version 3.8. Klee and Eula will be returning with their
own event wishes, and Kaeya will be getting his own Hangout Event. The main
event for the version will take place in the Veluriyam Mirage and will feature a
variety of new activities
---------------------------------
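The `youtube_video_url_is_valid` check in the first cell only accepts URLs of the exact form `https://www.youtube.com/watch?v=<id>`, optionally followed by `&ab_channel=...`, so a timestamped link (`&t=42s`) or a `youtu.be` short link is rejected. A more forgiving sketch based on the standard library's `urllib.parse` follows; `extract_video_id` and the set of accepted hosts are our own assumptions, not part of LangChain.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed set of hosts that serve /watch?v=<id> pages
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # parse_qs tolerates extra parameters such as &t=42s or &ab_channel=...
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=jlivBvu3Jrc&t=42s"))
    print(extract_video_id("https://youtu.be/jlivBvu3Jrc"))
    print(extract_video_id("https://example.com/watch?v=jlivBvu3Jrc"))
```

The validity check then becomes `extract_video_id(url) is not None`, and the same ID could be used to rebuild a canonical watch URL before calling `YoutubeLoader.from_youtube_url`.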




In [36]:
# Save the summary to a text file
with open('summary_english.txt', 'w', encoding='utf-8') as f:
    f.write(wrapped_text)

To convert text into speech, we can use a Text-to-Speech (TTS) service. Several companies offer one, including Google (Cloud Text-to-Speech), Amazon (Amazon Polly), and Microsoft (Azure Cognitive Services Text to Speech).

The following cells convert text into speech in Python with gTTS (Google Text-to-Speech), a library that calls Google Translate's text-to-speech API. It is separate from the Google Cloud Text-to-Speech service tried further below.

First, install the gTTS library with pip:

In [None]:
!pip install gTTS

In [21]:
wrapped_text

'Genshin Impact Version 3.8 Special Program  The Special Program begins with a\nshort introduction to the new version, which will feature a new limited-time map\ncalled the Veluriyam Mirage. The map is located in a desert and is said to be\nhome to many secrets.  The program then introduces the new characters and events\nthat will be coming in Version 3.8. Klee and Eula will be returning with their\nown event wishes, and Kaeya will be getting his own Hangout Event. The main\nevent for the version will take place in the Veluriyam Mirage and will feature a\nvariety of new activities'

In [24]:
from gtts import gTTS
import os

# The text to convert to audio
# text = "your text here"
text = wrapped_text
# Language of the speech
language = 'en'

# Pass the text and language to the engine. slow=False requests
# normal-speed speech (slow=True reads the text more slowly)
myobj = gTTS(text=text, lang=language, slow=False)

# Save the converted audio to an MP3 file named output.mp3
myobj.save("output.mp3")

# Play the file with VLC (assumed to be installed);
# 'cvlc' runs VLC without its graphical interface
os.system("cvlc output.mp3")

[000055f9092ada00] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.


0

In [27]:
from gtts import gTTS
import os

# The text to convert to audio. gTTS does not translate: wrapped_text is
# still English, so this only reads the English text with a Mandarin voice.
# Translate the text first (see the googletrans cells below) to get
# genuine Mandarin speech
# text = "your text here"
text = wrapped_text
# Language of the speech: 'zh-CN' for Mandarin Chinese
language = 'zh-CN'

# slow=False requests normal-speed speech
myobj = gTTS(text=text, lang=language, slow=False)

# Save the converted audio to an MP3 file named output-zh-CN.mp3
myobj.save("output-zh-CN.mp3")

# Play the file with VLC (assumed to be installed);
# 'cvlc' runs VLC without its graphical interface
os.system("cvlc output-zh-CN.mp3")

[0000555fe759ba00] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.


0

https://www.thepythoncode.com/article/translate-text-in-python


https://pypi.org/project/googletrans-py/

Translate the English text into other languages. We are still trying to resolve technical issues with the googletrans library.
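googletrans is an unofficial client for the Google Translate web service, and its own documentation warns that it is not guaranteed to work at all times, so some failures may be transient. A minimal sketch of a generic retry helper with exponential backoff follows; `call_with_retries` and its defaults are our own assumptions, and it will not help if the failure is permanent (for example an incompatible library version).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(); on an exception, wait and retry, doubling the wait each time.

    The last exception is re-raised once all attempts have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:  # unofficial web APIs can fail in many different ways
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(base_delay * 2 ** attempt)  # waits base_delay, 2x, 4x, ...
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts_made = []

    def flaky_translate() -> str:
        """Hypothetical stand-in that fails twice before succeeding."""
        attempts_made.append(1)
        if len(attempts_made) < 3:
            raise ConnectionError("temporary failure")
        return "你好"

    print(call_with_retries(flaky_translate, attempts=3, base_delay=0.0))
    # prints: 你好
```

In the cells that follow, a translation call could be wrapped as `call_with_retries(lambda: translator.translate(wrapped_text, dest="zh-CN"))`.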


In [None]:
from googletrans import Translator

# Initialize the (unofficial) Google Translate client
translator = Translator()

# Translate the English summary into Simplified Chinese
translation = translator.translate(wrapped_text, dest="zh-CN")
print(f"{translation.origin} ({translation.src}) --> {translation.text} ({translation.dest})")

In [None]:
from googletrans import Translator

translator = Translator()
text_to_translate = "Hello, world!"
languages = ['ms', 'ta', 'zh-CN']  # Malay, Tamil, and Mandarin

for language in languages:
    translation = translator.translate(text_to_translate, dest=language)

    # Save the translation to a text file
    with open(f'translation_{language}.txt', 'w', encoding='utf-8') as f:
        f.write(translation.text)

    print(f"In {language}, '{text_to_translate}' translates to: '{translation.text}' and is saved in 'translation_{language}.txt'")

The following Google Cloud Text-to-Speech example does not seem to work yet with the AISG Google Cloud account; this needs further investigation.

In [None]:
!pip install --upgrade google-cloud-texttospeech

In [None]:
from google.cloud import texttospeech

# Create a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized (wrapped_text is English; translate
# it first to get genuine Chinese speech)
synthesis_input = texttospeech.SynthesisInput(text=wrapped_text)

# Build the voice request: language code "zh-CN" (Chinese) and a female voice.
# A specific voice (e.g. a WaveNet voice) could be selected with the `name`
# parameter, which is not set here
voice = texttospeech.VoiceSelectionParams(
    language_code="zh-CN", ssml_gender=texttospeech.SsmlVoiceGender.FEMALE
)

# Select the type of audio file you want
audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("output_zh-CN.mp3", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output_zh-CN.mp3"')
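Cloud Text-to-Speech limits how much input a single `synthesize_speech` request may carry. The limit we have seen documented is 5,000 bytes, but treat that number as an assumption and check the current quotas. Because the limit is in bytes, Chinese text (most characters take 3 bytes in UTF-8) reaches it with far fewer characters than English. A standard-library sketch that packs whole sentences into byte-limited chunks follows; each chunk would then be synthesized by a separate request. `split_by_byte_limit` and the 5,000-byte default are hypothetical.

```python
import re
from typing import List

# A "sentence" is a run of text ending in Western or CJK terminal punctuation
# (plus trailing whitespace), or the unterminated tail of the text.
_SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+$")


def _hard_split(text: str, max_bytes: int) -> List[str]:
    """Cut text at character boundaries so each piece fits in max_bytes (UTF-8)."""
    pieces: List[str] = []
    piece, size = "", 0
    for ch in text:
        ch_size = len(ch.encode("utf-8"))
        if piece and size + ch_size > max_bytes:
            pieces.append(piece)
            piece, size = "", 0
        piece += ch
        size += ch_size
    if piece:
        pieces.append(piece)
    return pieces


def split_by_byte_limit(text: str, max_bytes: int = 5000) -> List[str]:
    """Pack whole sentences into chunks of at most max_bytes (UTF-8).

    Joining the returned chunks with "" reproduces the input text exactly.
    """
    if max_bytes < 4:  # one UTF-8 character can take up to 4 bytes
        raise ValueError("max_bytes must be at least 4")
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE.findall(text):
        if len((current + sentence).encode("utf-8")) <= max_bytes:
            current += sentence
            continue
        if current:
            chunks.append(current)
        if len(sentence.encode("utf-8")) <= max_bytes:
            current = sentence
        else:
            # A single over-long sentence is cut at character boundaries
            pieces = _hard_split(sentence, max_bytes)
            chunks.extend(pieces[:-1])
            current = pieces[-1]
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_by_byte_limit("Hello world. 你好。Bye!", max_bytes=14):
        print(repr(chunk), len(chunk.encode("utf-8")), "bytes")
```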

Alternatively, we could use a Python package such as pygame to play the audio file directly from our script:
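The control loop in the pygame cell below mixes keyboard input with playback calls, which makes it awkward to try without a sound device. A sketch, assuming only `pause`, `unpause`, and `stop` are needed, that moves the command handling into a plain function; `handle_command`, `Player`, and `FakePlayer` are hypothetical names, and `FakePlayer` is a test double rather than part of pygame.

```python
from typing import List, Protocol


class Player(Protocol):
    """The subset of pygame.mixer.music that the control loop uses."""

    def pause(self) -> None: ...
    def unpause(self) -> None: ...
    def stop(self) -> None: ...


def handle_command(command: str, player: Player) -> bool:
    """Apply one keyboard command; return False when the loop should end."""
    command = command.strip().lower()
    if command == "p":
        player.pause()
    elif command == "r":
        player.unpause()
    elif command == "e":
        player.stop()
        return False
    return True  # unknown commands are ignored


class FakePlayer:
    """Hypothetical test double that records calls instead of playing audio."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def pause(self) -> None:
        self.calls.append("pause")

    def unpause(self) -> None:
        self.calls.append("unpause")

    def stop(self) -> None:
        self.calls.append("stop")


if __name__ == "__main__":
    player = FakePlayer()
    for key in ["p", "r", "x", "e"]:
        if not handle_command(key, player):
            break
    print(player.calls)  # prints: ['pause', 'unpause', 'stop']
```

With pygame, the loop could shrink to `while handle_command(input(" "), mixer.music): pass`, relying on duck typing since `mixer.music` provides `pause()`, `unpause()`, and `stop()`.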

In [None]:
!pip install pygame

In [25]:
from pygame import mixer

# Start the mixer
mixer.init()

# Load the generated speech file
mixer.music.load("output.mp3")

# Set the volume (0.0 to 1.0)
mixer.music.set_volume(0.7)

# Start playback
mixer.music.play()

# Read commands until the user exits
while True:
    print("Press 'p' to pause, 'r' to resume")
    print("Press 'e' to exit the program")
    query = input(" ")

    if query == 'p':
        # Pause the playback
        mixer.music.pause()
    elif query == 'r':
        # Resume the playback
        mixer.music.unpause()
    elif query == 'e':
        # Stop the playback and leave the loop
        mixer.music.stop()
        break

pygame 2.5.0 (SDL 2.28.0, Python 3.10.11)
Hello from the pygame community. https://www.pygame.org/contribute.html
Press 'p' to pause, 'r' to resume
Press 'e' to exit the program


The following code snippets are adapted from

https://github.com/JorisdeJong123/7-Days-of-LangChain/blob/main/day_5/podcast.py

by replacing
- the podcast URL with a game release YouTube video URL
- the LangChain OpenAI chat model with the VertexAI chat model


In [3]:
# GAME RELEASE VIDEO Q&A BOT
import sys
import textwrap

from langchain.text_splitter import TokenTextSplitter
from langchain.chat_models import ChatVertexAI
from langchain.document_loaders import YoutubeLoader
from langchain.embeddings import VertexAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

In [7]:
# Version 3.8 Special Program｜Genshin Impact YouTube video
loader = YoutubeLoader.from_youtube_url(
    'https://www.youtube.com/watch?v=jlivBvu3Jrc',
    add_video_info=True,
    language=["en-US"])

data = loader.load()
data

[Document(page_content='Long, long ago, there was a place in the desert called Bottleland... Hmm? You wanted to learn more about Bottleland? I left that place long ago... But I still clearly remember its beautiful sights, even to this day Whenever travelers ask me how to travel to Bottleland I warn them that the desert is a very dangerous place Despite the risks... There will always be enthusiastic newcomers who venture into the desert... Haha! The treasures of Bottleland are as good as mine! I\'m sorry, Mom... I shouldn\'t have played with magic like that... Now I can\'t find my way back home The desert entices people with hope... But it devours that hope from the moment they step foot within its scorching domain... But that magic bottle can bring forth an endless and refreshing spring Revitalizing the hopes that had shriveled in the arid desert My dear, weary traveler... Why do you wish to enter the desert? Are you so determined to find Bottleland? Alright... I\'ll help you then She\

In [8]:
# Initialize text splitter for summarization (large chunks; not used in this notebook)
text_splitter_summary = TokenTextSplitter(
    chunk_size=10000, chunk_overlap=250)
# Initialize text splitter for QA (smaller chunks for better QA)
text_splitter_qa = TokenTextSplitter(chunk_size=1000, chunk_overlap=200)
# Split text into docs for QA
docs_qa = text_splitter_qa.split_documents(data)
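
`TokenTextSplitter` measures chunks in model tokens and lets consecutive chunks overlap, so text near a chunk boundary appears in both neighbouring chunks. With `chunk_size=1000` and `chunk_overlap=200`, a new chunk starts every 1000 - 200 = 800 tokens. The sliding-window idea can be sketched in plain Python as follows; this is an illustration under the assumption that words stand in for tokens, not LangChain's exact implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_with_overlap(tokens: Sequence[T], chunk_size: int, chunk_overlap: int) -> List[List[T]]:
    """Sliding window: every chunk has at most chunk_size tokens and repeats
    the last chunk_overlap tokens of the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # distance between chunk start positions
    chunks: List[List[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    words = "a b c d e f g".split()  # words stand in for model tokens
    print(split_with_overlap(words, chunk_size=4, chunk_overlap=2))
    # prints: [['a', 'b', 'c', 'd'], ['c', 'd', 'e', 'f'], ['e', 'f', 'g']]
```

Larger overlap means more context survives a boundary, at the price of more chunks to embed and store.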

In [12]:
# Create the LLM model for the question answering
llm_question_answer = ChatVertexAI(temperature=0.2)
# Create the vector database and RetrievalQA Chain
embeddings = VertexAIEmbeddings()
# embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
db = FAISS.from_documents(docs_qa, embeddings)
qa = RetrievalQA.from_chain_type(
    llm=llm_question_answer, chain_type="stuff", retriever=db.as_retriever())
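
`RetrievalQA` with `chain_type="stuff"` roughly works in two steps: the retriever returns the stored chunks whose embeddings lie nearest to the question's embedding, and the "stuff" chain pastes all of them into a single prompt for the chat model. A plain-Python sketch under stated assumptions: toy 2-D vectors stand in for `VertexAIEmbeddings` output, distance is Euclidean (which we understand to be the default metric of LangChain's FAISS wrapper), `k=4` mirrors what we believe is the retriever's default, and the prompt wording is hypothetical.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 4) -> List[int]:
    """Indices of the k stored vectors closest to the query (smallest L2 distance)."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return ranked[:k]


def stuff_prompt(question: str, chunks: List[str]) -> str:
    """'Stuff' strategy: paste every retrieved chunk into one prompt.
    The wording is a hypothetical template, not LangChain's built-in prompt."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    chunks = ["Kaeya gets a new outfit.", "Bottleland is in the desert.", "Klee returns."]
    # Hypothetical 2-D "embeddings"; real embeddings have many more dimensions
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    query = [1.0, 0.05]
    top = retrieve(query, vectors, k=2)
    print(stuff_prompt("What is new for Kaeya?", [chunks[i] for i in top]))
```

The "stuff" strategy is the simplest option, but it only works while the retrieved chunks fit in the model's context window, which is one reason the QA splitter uses much smaller chunks than the summary splitter.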

In [13]:
# Set the desired width limit for wrapping
width = 80

try:
    history = []
    # Run the QA chain continuously; stop when the user types "exit" or "quit"
    while True:
        # Get the user question
        question = input("Ask a question or enter exit to close the app: ")
        if question.strip().lower() in ["exit", "quit"]:
            break
        # Run the QA chain to query the YouTube video transcript
        # answer = qa.run(question, callbacks=[cb])
        answer = qa.run(question)
        history.append(answer)
        # Apply word wrapping and print the wrapped text
        wrapped_text = textwrap.fill(answer, width=width)
        print(wrapped_text)
        # print(answer)
        print("---------------------------------")
        print("\n")

except KeyboardInterrupt:
    sys.exit()

The video is about the new events and updates coming to Genshin Impact in
version 3.8. The events include Shared Sight, Perilous Expedition, Adventurer's
Trials: Advanced, and Bing-Bang Finchball. The updates include new character
cards and a new game mode for Genius Invokation TCG, as well as new outfits for
Kaeya and Klee. The video also features a new trailer for the upcoming HoYo FEST
2023 and a preview of the third Commemorative OST album "The Shimmering Voyage
Vol. 3".
---------------------------------


The main points of the video are: - Version 3.8's main event will take place
inside a bottle located somewhere in the desert. - Travelers will be able to
collect Joyeux Vouchers to obtain rewards, including Kaeya's new outfit. - There
are two new attractions in Bottleland: Preprints and Choo-Choo Cart. - Preprints
are two-dimensional objects that can be used to construct items. - Choo-Choo
Cart is a new way to get around the Veluriyam Mirage. - There are also several
new mini-gam

In [14]:
history

['The video is about the new events and updates coming to Genshin Impact in version 3.8. The events include Shared Sight, Perilous Expedition, Adventurer\'s Trials: Advanced, and Bing-Bang Finchball. The updates include new character cards and a new game mode for Genius Invokation TCG, as well as new outfits for Kaeya and Klee. The video also features a new trailer for the upcoming HoYo FEST 2023 and a preview of the third Commemorative OST album "The Shimmering Voyage Vol. 3".',
 "The main points of the video are:\n- Version 3.8's main event will take place inside a bottle located somewhere in the desert.\n- Travelers will be able to collect Joyeux Vouchers to obtain rewards, including Kaeya's new outfit.\n- There are two new attractions in Bottleland: Preprints and Choo-Choo Cart.\n- Preprints are two-dimensional objects that can be used to construct items.\n- Choo-Choo Cart is a new way to get around the Veluriyam Mirage.\n- There are also several new mini-games in Version 3.",
 'Go

In [38]:
# Save the answers from the Q&A session to a text file
with open('QA_conversation_history.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(history))
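
The `history` list above keeps only the answers, so the saved file loses which question each answer belongs to. A minimal sketch, assuming the loop is changed to `history.append((question, answer))`, that stores each pair as one JSON object per line (JSON Lines); the helper names and the `.jsonl` file name are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, Iterable, List, Tuple


def save_qa_history(pairs: Iterable[Tuple[str, str]], path: str) -> None:
    """Write one JSON object per line: {"question": ..., "answer": ...}."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"question": question, "answer": answer}
            # ensure_ascii=False keeps non-Latin text (e.g. Chinese) readable
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_qa_history(path: str) -> List[Dict[str, str]]:
    """Read back the records written by save_qa_history."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "QA_conversation_history.jsonl")
        save_qa_history([("What is the video about?", "Genshin Impact version 3.8.")], path)
        print(load_qa_history(path))
```

`json.dumps` escapes newlines inside answers, so multi-line answers (such as bullet lists) still occupy exactly one line each in the file.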