![image.png](../background_photos/py_18_hafo.jpg)
Thirsty Hafo, [photo link](https://unsplash.com/photos/black-labrador-retriever-puppy-biting-purple-and-white-ball-tj0XGdGWUmE), author: [Rafael Ishkhanyan](https://unsplash.com/@rafael_ishkhanyan)


# Plan

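Build a small pipeline that uses all four pillars of OOP (encapsulation, abstraction, inheritance, polymorphism) together with data classes:
1. Extract metadata for a given YouTube video
2. Extract the transcript
3. Translate the transcript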


First, you may want to create a dedicated environment (conda is used here):

- `conda create -n youtube`
- `conda activate youtube`


# YouTube video metadata (title, description, etc.)

In [None]:
!pip install pytubefix 



We use [pytube**fix**](https://github.com/JuanBindez/pytubefix) because the original pytube no longer works.

In [77]:
from typing import List
from datetime import datetime
from pytubefix import YouTube
from dataclasses import dataclass, field, asdict

In [79]:
sample_video = "https://www.youtube.com/watch?v=tERRFWuYG48"

In [5]:
yt = YouTube(sample_video)

In [6]:
yt.__dir__()[15:20]

['fallback_clients',
 '_signature_timestamp',
 '_visitor_data',
 'stream_monostate',
 '_author']

In [7]:
yt.video_id, yt.title, yt.keywords, yt.views, yt.length

('tERRFWuYG48',
 'Barfuß Am Klavier - AnnenMayKantereit',
 ['AnnenMayKantereit',
  'Barfuß Am Klavier',
  'oft gefragt',
  'henning may',
  'klavier'],
 76842615,
 201)

In [26]:
@dataclass()
class VideoInfo:
    video_id: str
    title: str
    keywords: List[str]
    publish_date: str  
    length_seconds: int


In [27]:
VideoInfo(video_id=1, title="Sample Video", keywords=["sample", "video"], publish_date="2023-10-01", length_seconds=300)

VideoInfo(video_id=1, title='Sample Video', keywords=['sample', 'video'], publish_date='2023-10-01', length_seconds=300)

No error is raised even though `video_id` is annotated as `str` and we passed an `int`: dataclasses do not enforce type hints at runtime.
We'll use [pydantic](https://docs.pydantic.dev/latest/) to validate the data in the future.
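
The missing validation can be reproduced and patched with the standard library alone. This is only a minimal sketch: `LooseVideoInfo` and `StrictVideoInfo` are hypothetical names, and the check handles plain classes such as `str` and `int` only (generic hints like `List[str]` would need extra work, which is part of what a validation library takes care of).

```python
from dataclasses import dataclass, fields


@dataclass
class LooseVideoInfo:  # hypothetical name: no validation at all
    video_id: str
    length_seconds: int


@dataclass
class StrictVideoInfo:  # hypothetical name: validates in __post_init__
    video_id: str
    length_seconds: int

    def __post_init__(self):
        # Compare every value against its annotation (plain classes only).
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, got {type(value).__name__}"
                )


# Wrong types are accepted silently by a plain dataclass.
loose = LooseVideoInfo(video_id=1, length_seconds="long")

try:
    StrictVideoInfo(video_id=1, length_seconds=300)
    strict_error = None
except TypeError as exc:
    strict_error = str(exc)

print(strict_error)  # video_id must be str, got int
```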


In [40]:
class YouTubeVideo:
    def __init__(self, url: str):
        self.video = YouTube(url)

    def get_metadata(self) -> VideoInfo:
        return VideoInfo(
            video_id=self.video.video_id,
            title=self.video.title, 
            keywords=self.video.keywords,
            publish_date=self.video.publish_date,
            length_seconds=self.video.length
        )
    

In [41]:
video = YouTubeVideo(sample_video)

In [42]:
vid_info = video.get_metadata()

In [43]:
asdict(vid_info)

{'video_id': 'tERRFWuYG48',
 'title': 'Barfuß Am Klavier - AnnenMayKantereit',
 'keywords': ['AnnenMayKantereit',
  'Barfuß Am Klavier',
  'oft gefragt',
  'henning may',
  'klavier'],
 'publish_date': datetime.datetime(2014, 10, 27, 4, 6, 51, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=61200))),
 'length_seconds': 201}

In [93]:
@dataclass()
class VideoInfo:
    video_id: str
    title: str
    keywords: List[str]
    publish_date: datetime  # pytubefix returns a datetime, not a str
    length_seconds: int
    days_since_publish: int = field(init=False)  # computed in __post_init__

    @staticmethod
    def get_days_since_publish(publish_date) -> int:
        # Accepts a datetime or a "YYYY-MM-DD" string. Going through the string
        # form drops the timezone, so the date can be subtracted from the
        # naive datetime.now() below.
        if isinstance(publish_date, datetime):
            publish_date = publish_date.strftime("%Y-%m-%d")
        publish_date = datetime.strptime(publish_date, "%Y-%m-%d")
        current_date = datetime.now()
        return (current_date - publish_date).days

    def __post_init__(self):
        # Dataclasses do not enforce type hints, so validate by hand
        if not isinstance(self.video_id, str):
            raise ValueError("video_id must be a string")
        if not isinstance(self.length_seconds, int):
            raise ValueError("length_seconds must be an integer")

        # set days_since_publish attribute
        self.days_since_publish = self.get_days_since_publish(self.publish_date)

In [59]:
vid_info = video.get_metadata()

In [61]:
vid_info.days_since_publish

3894
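
An alternative to the string round-trip is to keep the timezone and subtract aware datetimes directly. The sketch below is one possible way to do that, not the notebook's method: `days_since` is a hypothetical helper, naive timestamps are assumed to be UTC, and `now` (when given) is assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def days_since(published: datetime, now: Optional[datetime] = None) -> int:
    """Whole days elapsed between `published` and `now` (hypothetical helper)."""
    if published.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        published = published.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - published).days


# Same publish date as in the metadata output above (UTC-7).
published = datetime(2014, 10, 27, 4, 6, 51, tzinfo=timezone(timedelta(hours=-7)))
# A fixed "now" keeps the example reproducible.
fixed_now = datetime(2014, 11, 6, 12, 0, 0, tzinfo=timezone.utc)

print(days_since(published, now=fixed_now))  # 10
```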

# YouTube transcript

In [62]:
yt = YouTube(sample_video)

In [67]:
yt.captions

{'de': <Caption lang="German" code="de">}

In [68]:
captions = yt.captions["de"]

captions.generate_txt_captions()

'Ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren wunderlich. Nicht für mich. Für die, die es störte, wenn man uns nachts hörte. Ich hab mit dir gemeinsam einsam rumgesessen und geschwiegen. Ich erinnere mich am Besten ans gemeinsam einsam Liegen. Jeden Morgen danach bei dir; du nackt im Bett – und ich barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, das ging so nicht. Du wolltest alles wissen und das hat mich vertrieben. Eigentlich dich, du bist nicht länger geblieben; bei mir. Also sitz ich, um zu lieben, lieber barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren zu wenig. Ich sitz schon wieder barfuß am Klavier. Und träum dabei von dir. Ich träum dabei von dir.'

In [None]:
# Note: output_path is treated as a directory, so this creates a folder named "captions.txt"
captions.download(title="song", output_path="captions.txt")

'c:\\Users\\hayk_\\OneDrive\\Desktop\\python_math_ml_course\\python\\captions.txt\\song (de).srt'
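
The downloaded file is in SRT format: numbered entries separated by blank lines, each with a timing line and one or more text lines. If the timings are needed later, a small parser is enough. This is a rough sketch under those assumptions; `Cue` and `parse_srt` are hypothetical names, and the timestamps in the sample are made up for illustration.

```python
from typing import List, NamedTuple


class Cue(NamedTuple):  # hypothetical container for one subtitle entry
    start: str
    end: str
    text: str


def parse_srt(srt_text: str) -> List[Cue]:
    cues = []
    # Normalize Windows line endings, then split on the blank lines between entries.
    normalized = srt_text.replace("\r\n", "\n")
    for block in normalized.strip().split("\n\n"):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty entries
        start, _, end = lines[1].partition(" --> ")
        cues.append(Cue(start.strip(), end.strip(), " ".join(lines[2:])))
    return cues


# Made-up timings, lyrics taken from the transcript above.
sample = """1
00:00:00,000 --> 00:00:04,000
Ich sitz schon wieder barfuß am Klavier.

2
00:00:04,000 --> 00:00:08,500
Ich träume Liebeslieder und sing dabei von dir.
"""

cues = parse_srt(sample)
for cue in cues:
    print(cue.start, cue.end, cue.text)
```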

## Class

In [76]:
class YouTubeVideo:
    def __init__(self, url: str):
        self.video = YouTube(url)

    def get_metadata(self) -> VideoInfo:
        return VideoInfo(
            video_id=self.video.video_id,
            title=self.video.title, 
            keywords=self.video.keywords,
            publish_date=self.video.publish_date,
            length_seconds=self.video.length
        )
    
    def get_transcript(self, language: str = "de") -> str:
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        self.text = captions.generate_txt_captions()
        return self.text
    
    def download_transcript(self, language: str = "de", title: str = "transcript", output_path: str = "transcripts") -> None:
        # output_path is a directory; the file name is built from title and language
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        captions.download(title=title, output_path=output_path)
        

In [77]:
video = YouTubeVideo(sample_video)

transcript = video.get_transcript(language="de")

In [79]:
transcript

'Ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren wunderlich. Nicht für mich. Für die, die es störte, wenn man uns nachts hörte. Ich hab mit dir gemeinsam einsam rumgesessen und geschwiegen. Ich erinnere mich am Besten ans gemeinsam einsam Liegen. Jeden Morgen danach bei dir; du nackt im Bett – und ich barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, das ging so nicht. Du wolltest alles wissen und das hat mich vertrieben. Eigentlich dich, du bist nicht länger geblieben; bei mir. Also sitz ich, um zu lieben, lieber barfuß am Klavier. Und ich sitz schon wieder barfuß am Klavier. Ich träume Liebeslieder und sing dabei von dir. Du und ich, wir waren zu wenig. Ich sitz schon wieder barfuß am Klavier. Und träum dabei von dir. Ich träum dabei von dir.'

# Audio, video

In [80]:
yt = YouTube(sample_video)


In [87]:

yt.streams.get_audio_only()

<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" sabr="False" type="audio">

In [None]:
# The stream is audio/mp4: the ".mp3" extension only names the file, it does not convert the audio
output_path = f"{yt.video_id}.mp3"

yt.streams.get_audio_only().download(filename=output_path)

'c:\\Users\\hayk_\\OneDrive\\Desktop\\python_math_ml_course\\python\\tERRFWuYG48.mp3'

In [91]:
output_path = f"{yt.video_id}.mp4"

yt.streams.get_lowest_resolution().download(filename=output_path)

'c:\\Users\\hayk_\\OneDrive\\Desktop\\python_math_ml_course\\python\\tERRFWuYG48.mp4'

In [100]:
class YouTubeVideo:
    def __init__(self, url: str):
        self.video = YouTube(url)

    def get_metadata(self) -> VideoInfo:
        return VideoInfo(
            video_id=self.video.video_id,
            title=self.video.title, 
            keywords=self.video.keywords,
            publish_date=self.video.publish_date,
            length_seconds=self.video.length
        )
    
    def get_transcript(self, language: str = "de") -> str:
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        self.text = captions.generate_txt_captions()
        return self.text
    
    def download_transcript(self, language: str = "de", title: str = "transcript", output_path: str = "transcripts") -> None:
        # output_path is a directory; the file name is built from title and language
        captions = self.video.captions.get(language)
        if not captions:
            raise ValueError(f"No captions available for language: {language}")
        captions.download(title=title, output_path=output_path)

    def download_audio(self, output_path=None) -> None:
        # output_path is a directory; by default files go into a folder named after the video id
        if output_path is None:
            output_path = self.video.video_id

        audio_stream = self.video.streams.get_audio_only()
        if not audio_stream:
            raise ValueError("No audio stream available")
        audio_stream.download(output_path=output_path)

    def download_video(self, output_path=None) -> None:
        # Same convention as download_audio: output_path is a directory
        if output_path is None:
            output_path = self.video.video_id

        video_stream = self.video.streams.get_lowest_resolution()
        if not video_stream:
            raise ValueError("No video stream available")
        video_stream.download(output_path=output_path)

In [101]:
yt = YouTubeVideo(sample_video)

In [102]:
yt.download_audio()

In [103]:
yt.download_video()

# Translate

We'll use [googletrans](https://github.com/ssut/py-googletrans/tree/main) and [DeepL](https://developers.deepl.com/). More services may be added later.

## Google Translate

In [None]:
!pip install googletrans 

In [55]:
import googletrans
from googletrans import Translator

# https://github.com/ssut/py-googletrans/tree/main
# https://stackoverflow.com/questions/55409641/asyncio-run-cannot-be-called-from-a-running-event-loop-when-using-jupyter-no

async def translate_text(text):
    async with Translator() as translator:
        result = await translator.translate(text, dest='hy')
        print(result)  # Translated(src=en, dest=hy, text=պանիր, pronunciation=panir, extra_data="{'translat...")

await translate_text(text="cheese")

Translated(src=en, dest=hy, text=պանիր, pronunciation=panir, extra_data="{'translat...")
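
Top-level `await` works in the cell above because Jupyter already runs an event loop. In a plain Python script there is no running loop, so the coroutine has to be started with `asyncio.run`. A minimal sketch of that pattern follows; `fake_translate` and `translate_sync` are hypothetical names, and `fake_translate` is only an offline stand-in for the googletrans call.

```python
import asyncio


async def fake_translate(text: str, dest: str) -> str:
    # Hypothetical stand-in for the real network call, so the sketch runs offline.
    await asyncio.sleep(0)
    return f"[{dest}] {text}"


def translate_sync(text: str, dest: str = "hy") -> str:
    # asyncio.run creates an event loop, runs the coroutine, and closes the loop.
    # It raises RuntimeError inside Jupyter, where a loop is already running.
    return asyncio.run(fake_translate(text, dest))


if __name__ == "__main__":
    print(translate_sync("cheese"))
```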


## DeepL

- `pip install deepl` 
- [Docs](https://developers.deepl.com/docs/getting-started/your-first-api-request)
- [API keys](https://www.deepl.com/en/your-account/keys)

In [None]:
SETX DEEPL_API_KEY some_string


SUCCESS: Specified value was saved.


In [1]:
echo %DEEPL_API_KEY%

%DEEPL_API_KEY%


In [2]:
import os

In [None]:
os.environ

In [9]:
# Set the key for the current session only; never commit a real key to a notebook
os.environ['DEEPL_API_KEY'] = "your-deepl-api-key"

In [14]:
import deepl

auth_key = os.getenv('DEEPL_API_KEY')
assert auth_key is not None, "Please set the DEEPL_API_KEY environment variable."

deepl_client = deepl.Translator(auth_key)
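
The lookup above can be wrapped in a small helper that fails with a clear message and never prints the full key. This is just a sketch: `load_api_key`, `mask`, and the `DEMO_API_KEY` variable are hypothetical names used for illustration.

```python
import os


def load_api_key(name: str = "DEEPL_API_KEY") -> str:
    """Read a secret from the environment and fail loudly if it is missing."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Please set the {name} environment variable.")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key can be logged safely."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Demo value only, stored under a hypothetical variable name.
os.environ["DEMO_API_KEY"] = "abcdef123456"
print(mask(load_api_key("DEMO_API_KEY")))  # ********3456
```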



In [None]:
print("Source languages:")
for language in deepl_client.get_source_languages():
    print(f"{language.name} ({language.code})")  # Example: "German (DE)"

print("Target languages:")
for language in deepl_client.get_target_languages():
    if language.supports_formality:
        print(f"{language.name} ({language.code}) supports formality")
        # Example: "Italian (IT) supports formality"
    else:
        print(f"{language.name} ({language.code})")
        # Example: "Lithuanian (LT)"

Source languages:
Arabic (AR)
Bulgarian (BG)
Czech (CS)
Danish (DA)
German (DE)
Greek (EL)
English (EN)
Spanish (ES)
Estonian (ET)
Finnish (FI)
French (FR)
Hungarian (HU)
Indonesian (ID)
Italian (IT)
Japanese (JA)
Korean (KO)
Lithuanian (LT)
Latvian (LV)
Norwegian (NB)
Dutch (NL)
Polish (PL)
Portuguese (PT)
Romanian (RO)
Russian (RU)
Slovak (SK)
Slovenian (SL)
Swedish (SV)
Turkish (TR)
Ukrainian (UK)
Chinese (ZH)
Target languages:
Arabic (AR)
Bulgarian (BG)
Czech (CS)
Danish (DA)
German (DE) supports formality
Greek (EL)
English (British) (EN-GB)
English (American) (EN-US)
Spanish (ES) supports formality
Estonian (ET)
Finnish (FI)
French (FR) supports formality
Hungarian (HU)
Indonesian (ID)
Italian (IT) supports formality
Japanese (JA) supports formality
Korean (KO)
Lithuanian (LT)
Latvian (LV)
Norwegian (NB)
Dutch (NL) supports formality
Polish (PL) supports formality
Portuguese (Brazilian) (PT-BR) supports formality
Portuguese (European) (PT-PT) supports formality
Romanian (RO)
Russ

Oops, no Armenian :)

In [None]:
result = deepl_client.translate_text("Ich liebe dich", 
                                     target_lang="EN-US")
print(result.text) 

I love you


## Class

In [22]:
from abc import ABC, abstractmethod
from typing import List

In [23]:
class BaseTranslator(ABC):
    """Common interface that every translation backend must implement."""

    @abstractmethod
    def translate(self, text: str, target_lang: str) -> str:
        pass
    
    @abstractmethod
    def detect_language(self, text: str) -> str:
        pass
    
    @abstractmethod
    def get_supported_languages(self) -> List[str]:
        pass
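
What `ABC` and `@abstractmethod` buy us is easy to see on a toy interface: a subclass that skips an abstract method cannot be instantiated. The names below (`Greeter`, `Incomplete`, `German`) are hypothetical and unrelated to the translators.

```python
from abc import ABC, abstractmethod


class Greeter(ABC):  # hypothetical interface
    @abstractmethod
    def greet(self, name: str) -> str:
        ...


class Incomplete(Greeter):
    pass  # forgets to implement greet()


class German(Greeter):
    def greet(self, name: str) -> str:
        return f"Hallo, {name}!"


try:
    Incomplete()  # abstract methods are still missing
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)    # TypeError
print(German().greet("Hafo"))  # Hallo, Hafo!
```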

### Google Translate

In [154]:
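class GoogleTranslator(BaseTranslator):
    """Google Translate (googletrans) implementation of BaseTranslator."""

    # Unlike the base signature, this is a coroutine and must be awaited
    @staticmethod
    async def translate(text: str, target_lang: str) -> str:
        async with Translator() as translator:
            result = await translator.translate(text, dest=target_lang)
            return result.text

    def detect_language(self, text: str) -> str:
        raise NotImplementedError("Language detection is not implemented yet.")

    def get_supported_languages(self) -> List[str]:
        # googletrans.LANGUAGES is a dict of code -> name; return the codes
        return list(googletrans.LANGUAGES)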


In [155]:
gt = GoogleTranslator()


In [156]:
# gt.get_supported_languages()

In [157]:
gt = GoogleTranslator()

# gt.translate(text="Ich liebe dich", target="EN-US")
await gt.translate(text="Ich liebe dich", target_lang="en")

'I love you'

### DeepL

In [158]:
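class DeepLTranslator(BaseTranslator):
    """DeepL API implementation of BaseTranslator."""

    def __init__(self, api_key: str) -> None:
        try:
            deepl_client = deepl.DeepLClient(api_key)
        except Exception as exc:  # a bare except would also swallow KeyboardInterrupt
            raise ValueError("Invalid DeepL API key provided.") from exc
        self._client = deepl_client

    def translate(self, text: str, target_lang: str) -> str:
        result = self._client.translate_text(text, target_lang=target_lang)
        return result.text

    def detect_language(self, text: str) -> str:
        raise NotImplementedError("Language detection is not implemented yet.")

    def get_supported_languages(self) -> List[str]:
        # Target language codes, e.g. "EN-US"
        return [language.code for language in self._client.get_target_languages()]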

In [159]:
dl = DeepLTranslator(api_key=os.getenv('DEEPL_API_KEY'))

In [160]:
dl.translate(text="Ich liebe dich", target_lang="EN-US")

'I love you'
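
Note that the two backends are not fully interchangeable yet: the Google one returns a coroutine, the DeepL one returns a string. One possible way to call both through the same code path is to await the result only when it is awaitable. The sketch below uses hypothetical offline stand-ins (`SyncEcho`, `AsyncEcho`) instead of the real translators, and `translate_with` is likewise a made-up helper name.

```python
import asyncio
import inspect


class SyncEcho:  # hypothetical stand-in for a synchronous backend
    def translate(self, text: str, target_lang: str) -> str:
        return f"{text} -> {target_lang}"


class AsyncEcho:  # hypothetical stand-in for an asynchronous backend
    async def translate(self, text: str, target_lang: str) -> str:
        await asyncio.sleep(0)
        return f"{text} -> {target_lang}"


async def translate_with(translator, text: str, target_lang: str) -> str:
    result = translator.translate(text, target_lang)
    # Await only if the backend handed back a coroutine.
    if inspect.isawaitable(result):
        result = await result
    return result


async def main():
    backends = (SyncEcho(), AsyncEcho())
    return [await translate_with(b, "Ich liebe dich", "en") for b in backends]


# In a notebook, use `await main()` instead of asyncio.run.
results = asyncio.run(main())
print(results)
```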

# Putting everything together

In [184]:
import json
from typing import Dict


class Pipeline(YouTubeVideo):
    def __init__(self, url: str, deepl_api_key: str):
        super().__init__(url)
        self.dl_translator = DeepLTranslator(api_key=deepl_api_key)
        self.google_translator = GoogleTranslator()

        self.result = None

    async def get_translated_transcript(self) -> Dict[str, str]:
        self.transcript = self.get_transcript()

        if not self.transcript:
            raise ValueError("No transcript available for this video.")

        # googletrans is asynchronous and must be awaited; the DeepL client is synchronous
        text_google = await self.google_translator.translate(text=self.transcript, target_lang="en")
        text_deepl = self.dl_translator.translate(text=self.transcript, target_lang="EN-US")

        self.result = {
            "original": self.transcript,
            "google": text_google,
            "deepl": text_deepl
        }

        return self.result

    def save_transcript_json(self, output_path: str = "transcript.json") -> None:
        # This method is synchronous, so it cannot await the translation itself
        if self.result is None:
            raise ValueError("No result yet: run `await get_translated_transcript()` first.")

        with open(output_path, 'w', encoding='utf-8') as f:
            json.dump(self.result, f, ensure_ascii=False, indent=4)

In [185]:
p = Pipeline(sample_video, deepl_api_key=os.getenv('DEEPL_API_KEY'))

In [186]:
res = await p.get_translated_transcript()

In [180]:
for k, v in res.items():
    print(f"{k}: {type(v)}")

original: <class 'str'>
google: <class 'str'>
deepl: <class 'str'>


In [187]:
p.save_transcript_json()
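
The `ensure_ascii=False` and `encoding='utf-8'` arguments matter for this transcript because of characters such as `ß`. A small round-trip sketch, using a temporary directory and illustrative data rather than the real pipeline result:

```python
import json
import os
import tempfile

# Illustrative data only, not actual pipeline output.
result = {"original": "barfuß am Klavier", "google": "barefoot at the piano"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "transcript.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, ensure_ascii=False, indent=4)

    with open(path, encoding="utf-8") as f:
        raw = f.read()

loaded = json.loads(raw)
escaped = json.dumps(result)  # default ensure_ascii=True escapes non-ASCII

print("ß" in raw)           # True: the character is stored as-is
print("\\u00df" in escaped)  # True: the default output escapes it
print(loaded == result)     # True: both forms load back to the same data
```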



