# Video Object Detection and Translation

In this notebook, we will use Google Cloud APIs to perform object detection and translation on a video file. The process involves:

1. **Object Detection**: Using Google Cloud's Video Intelligence API to detect objects within the video.
2. **Object Translation**: Using Google Cloud's Translation API to translate the detected object labels into a target language (Hebrew by default).
3. **Output**: Writing the detected and translated objects along with timestamps to a text file.

Note: processing is slow. Turning the video's visual content into text took about 318 seconds for the sample clip.

Note: the API is very good at recognizing cats.
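The logged run time covers both API stages (object tracking and per-label translation) in a single number. To see which stage dominates, you could time each one separately. The following is a minimal sketch, assuming you wrap the stages of `main()` yourself. `timed` and the `timings` dict are hypothetical names introduced here and are not part of the notebook.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate, so the same label can wrap several separate calls.
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


# Hypothetical wiring inside main():
#     timings = {}
#     with timed("detection", timings):
#         object_annotations = detect_objects(src_path)
#     with timed("translation", timings):
#         ...translate and write each label...
#     print(timings)
if __name__ == "__main__":
    demo = {}
    with timed("sleep", demo):
        time.sleep(0.05)
    print(demo)
```

The timer accumulates rather than overwrites, so it also works when wrapped around each individual translation call inside the loop.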

```python
from google.cloud import videointelligence_v1 as videointelligence
from google.cloud import translate_v2 as translate
import os
import time
from pathlib import Path

src_path = "TestingSamples/cats.mp4"
dst_path = "TestingOutputs/google_video_to_objects.txt"
log_path = "TestingLogs/google_video_to_objects_testing_results.txt"

# Service-account key used by both API clients; must be set before they are created.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'kaleidoo-435715-96fdd3ef71f6.json'

def detect_objects(video_path):
    """Run Video Intelligence object tracking and return one dict per tracked object."""
    client = videointelligence.VideoIntelligenceServiceClient()

    # Send the video bytes inline with the request.
    with open(video_path, "rb") as video_file:
        input_content = video_file.read()

    features = [videointelligence.Feature.OBJECT_TRACKING]

    operation = client.annotate_video(
        request={"features": features, "input_content": input_content}
    )

    print("Processing video for object detection...")
    # Object tracking is a long-running operation; the sample run took about
    # 5 minutes end to end, so allow more than 300 s before timing out.
    result = operation.result(timeout=600)

    object_annotations = []
    for annotation_result in result.annotation_results:
        for object_annotation in annotation_result.object_annotations:
            # Segment offsets come back as datetime.timedelta objects.
            start_time = object_annotation.segment.start_time_offset.total_seconds()
            end_time = object_annotation.segment.end_time_offset.total_seconds()
            object_annotations.append({
                "start_time": start_time,
                "end_time": end_time,
                "entity": object_annotation.entity.description,
                "confidence": object_annotation.confidence
            })

    return object_annotations

def translate_text(client, text, target_language='he'):
    """Translate one English label with a shared Translation client."""
    # The labels are English. Fixing the source language keeps short words from
    # being auto-detected as another language ('hat' came back as 'יש',
    # "there is", plausibly because it was read as German).
    # format_='text' returns plain text instead of HTML-escaped output.
    translation = client.translate(
        text, target_language=target_language, source_language='en', format_='text'
    )
    return translation['translatedText']

def main():
    start_time = time.time()

    object_annotations = detect_objects(src_path)

    # Create the Translation client once and reuse it for every label.
    translate_client = translate.Client()

    # Make sure the output and log directories exist.
    Path(dst_path).parent.mkdir(parents=True, exist_ok=True)
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)

    with open(dst_path, 'w', encoding='utf-8') as file:
        for annotation in object_annotations:
            file.write(f"Detected object: {annotation['entity']} (from {annotation['start_time']}s to {annotation['end_time']}s, confidence: {annotation['confidence']})\n")
            translated_text = translate_text(translate_client, annotation['entity'])
            file.write(f"Translated text: {translated_text}\n")

    end_time = time.time()

    print("Done processing.")
    print(f"Time to process = {end_time - start_time}")

    with open(log_path, 'a', encoding='utf-8') as file:
        file.write(f"Time to process - {Path(src_path).name}: {end_time - start_time}\n")

if __name__ == "__main__":
    main()
```
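`main()` sends one Translation API request per detection, even though a handful of labels (cat, dog, shoe, person, animal) repeat dozens of times in the output. A minimal sketch, assuming you translate each distinct label once and look the result up while writing the file: `translate_unique_labels`, `translate_batch` and `batch_size` are hypothetical names, and the default batch size is a conservative assumption, not a documented API limit. The Translation v2 client's `translate` method accepts a list of strings and returns a list of result dicts, so the real call can be wrapped as shown in the comment.

```python
from typing import Callable, Dict, Iterable, List


def translate_unique_labels(
    labels: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 100,
) -> Dict[str, str]:
    """Translate each distinct label once and return a label -> translation map.

    `translate_batch` (hypothetical) takes a list of strings and returns their
    translations in the same order. `batch_size` is an assumed per-request cap.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(labels))
    translations: Dict[str, str] = {}
    for i in range(0, len(unique), batch_size):
        chunk = unique[i:i + batch_size]
        results = translate_batch(chunk)
        if len(results) != len(chunk):
            raise ValueError("translate_batch must return one result per input")
        translations.update(zip(chunk, results))
    return translations


# With the real client (not executed here), translate_batch could be:
#     client = translate.Client()
#     def google_batch(values):
#         results = client.translate(values, target_language="he",
#                                    source_language="en", format_="text")
#         return [r["translatedText"] for r in results]

if __name__ == "__main__":
    calls = []

    def fake_batch(values):
        # Stand-in for the Translation API: records calls, upper-cases input.
        calls.append(list(values))
        return [v.upper() for v in values]

    table = translate_unique_labels(["cat", "dog", "cat", "shoe", "dog"], fake_batch)
    print(table)
    print(len(calls))
```

Translating each label exactly once also puts every translation in one dictionary, which makes context-free mistranslations in the output (for example `plant` → `לִשְׁתוֹל`, the verb "to plant") easy to review and override in a single place.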


Console output (excerpt):

Time to process = 317.94519686698914

Contents of `TestingOutputs/google_video_to_objects.txt`:

Detected object: cat (from 0.0s to 3.2s, confidence: 0.8528473973274231)
Translated text: חָתוּל
Detected object: dog (from 0.0s to 1.6s, confidence: 0.735107958316803)
Translated text: כֶּלֶב
Detected object: mouse (from 0.9s to 2.1s, confidence: 0.5991301536560059)
Translated text: עַכְבָּר
Detected object: dog (from 1.3s to 3.8s, confidence: 0.6973498463630676)
Translated text: כֶּלֶב
Detected object: shoe (from 3.4s to 3.7s, confidence: 0.5302704572677612)
Translated text: נַעַל
Detected object: sunglasses (from 3.6s to 3.6s, confidence: 0.6832783818244934)
Translated text: מִשְׁקפֵי שֶׁמֶשׁ
Detected object: shoe (from 4.1s to 4.2s, confidence: 0.7420737743377686)
Translated text: נַעַל
Detected object: top (from 4.4s to 6.0s, confidence: 0.5669806599617004)
Translated text: רֹאשׁ
Detected object: animal (from 5.6s to 6.0s, confidence: 0.7249440550804138)
Translated text: חַיָה
Detected object: cat (from 6.1s to 8.3s, confidence: 0.854455828666687)
Translated text: חָתוּל
Detected object: dog (from 6.1s to 12.9s, confidence: 0.7776948809623718)
Translated text: כֶּלֶב
Detected object: dog (from 6.1s to 7.0s, confidence: 0.753810465335846)
Translated text: כֶּלֶב
Detected object: shoe (from 6.9s to 6.9s, confidence: 0.5160681009292603)
Translated text: נַעַל
Detected object: person (from 7.6s to 7.9s, confidence: 0.7607039213180542)
Translated text: אָדָם
Detected object: packaged goods (from 7.8s to 9.8s, confidence: 0.6232653856277466)
Translated text: סחורה ארוזה
Detected object: box (from 8.1s to 8.3s, confidence: 0.6340911984443665)
Translated text: קוּפסָה
Detected object: sunglasses (from 8.4s to 9.2s, confidence: 0.6289539337158203)
Translated text: מִשְׁקפֵי שֶׁמֶשׁ
Detected object: shoe (from 9.0s to 9.2s, confidence: 0.6788097620010376)
Translated text: נַעַל
Detected object: shoe (from 9.0s to 9.8s, confidence: 0.6556146740913391)
Translated text: נַעַל
Detected object: shoe (from 9.2s to 9.5s, confidence: 0.6489077806472778)
Translated text: נַעַל
Detected object: shoe (from 9.3s to 9.3s, confidence: 0.5396280884742737)
Translated text: נַעַל
Detected object: box (from 9.6s to 14.3s, confidence: 0.7753773927688599)
Translated text: קוּפסָה
Detected object: sunglasses (from 9.8s to 9.9s, confidence: 0.6432665586471558)
Translated text: מִשְׁקפֵי שֶׁמֶשׁ
Detected object: baby (from 10.3s to 14.1s, confidence: 0.8112463355064392)
Translated text: תִינוֹק
Detected object: cat (from 10.4s to 12.1s, confidence: 0.8114438056945801)
Translated text: חָתוּל
Detected object: animal (from 10.6s to 13.3s, confidence: 0.7944740056991577)
Translated text: חַיָה
Detected object: food (from 12.0s to 12.2s, confidence: 0.5528773069381714)
Translated text: מָזוֹן
Detected object: box (from 12.0s to 12.0s, confidence: 0.5712153911590576)
Translated text: קוּפסָה
Detected object: packaged goods (from 12.2s to 12.3s, confidence: 0.7187315225601196)
Translated text: סחורה ארוזה
Detected object: person (from 12.5s to 13.1s, confidence: 0.840459942817688)
Translated text: אָדָם
Detected object: animal (from 13.6s to 14.7s, confidence: 0.7058385610580444)
Translated text: חַיָה
Detected object: person (from 14.0s to 15.4s, confidence: 0.8561058640480042)
Translated text: אָדָם
Detected object: person (from 14.0s to 14.3s, confidence: 0.7866703867912292)
Translated text: אָדָם
Detected object: hamster (from 14.2s to 15.1s, confidence: 0.7650876045227051)
Translated text: אוֹגֵר
Detected object: packaged goods (from 14.8s to 15.4s, confidence: 0.7266969680786133)
Translated text: סחורה ארוזה
Detected object: shelf (from 15.5s to 17.9s, confidence: 0.6529454588890076)
Translated text: מַדָף
Detected object: person (from 15.6s to 15.9s, confidence: 0.7610477805137634)
Translated text: אָדָם
Detected object: cat (from 15.6s to 17.9s, confidence: 0.6085233092308044)
Translated text: חָתוּל
Detected object: toy (from 15.7s to 17.9s, confidence: 0.5295617580413818)
Translated text: צַעֲצוּעַ
Detected object: shelf (from 18.0s to 19.8s, confidence: 0.69512939453125)
Translated text: מַדָף
Detected object: animal (from 18.1s to 18.3s, confidence: 0.7433465123176575)
Translated text: חַיָה
Detected object: cat (from 18.1s to 19.6s, confidence: 0.7545060515403748)
Translated text: חָתוּל
Detected object: toy (from 18.2s to 19.5s, confidence: 0.5365860462188721)
Translated text: צַעֲצוּעַ
Detected object: furniture (from 18.7s to 18.7s, confidence: 0.5124227404594421)
Translated text: רְהִיטִים
Detected object: person (from 19.1s to 19.9s, confidence: 0.8919133543968201)
Translated text: אָדָם
Detected object: furniture (from 19.3s to 19.3s, confidence: 0.5593839287757874)
Translated text: רְהִיטִים
Detected object: animal (from 19.6s to 20.1s, confidence: 0.6958495378494263)
Translated text: חַיָה
Detected object: shelf (from 19.8s to 22.1s, confidence: 0.6843827366828918)
Translated text: מַדָף
Detected object: animal (from 20.2s to 20.4s, confidence: 0.7746719121932983)
Translated text: חַיָה
Detected object: toy (from 20.5s to 20.5s, confidence: 0.5127670168876648)
Translated text: צַעֲצוּעַ
Detected object: furniture (from 20.6s to 20.8s, confidence: 0.6055014133453369)
Translated text: רְהִיטִים
Detected object: furniture (from 21.4s to 21.4s, confidence: 0.5281039476394653)
Translated text: רְהִיטִים
Detected object: person (from 21.6s to 21.6s, confidence: 0.8224933743476868)
Translated text: אָדָם
Detected object: animal (from 21.7s to 22.0s, confidence: 0.7426542043685913)
Translated text: חַיָה
Detected object: cat (from 22.2s to 26.5s, confidence: 0.8722850680351257)
Translated text: חָתוּל
Detected object: shelf (from 22.2s to 22.3s, confidence: 0.6582334041595459)
Translated text: מַדָף
Detected object: table top (from 22.4s to 22.6s, confidence: 0.5966082215309143)
Translated text: פלטת שולחן
Detected object: tableware (from 22.9s to 25.9s, confidence: 0.6397745013237)
Translated text: כלי שולחן
Detected object: table top (from 24.4s to 26.1s, confidence: 0.6642993092536926)
Translated text: פלטת שולחן
Detected object: cat (from 24.8s to 26.5s, confidence: 0.7939125895500183)
Translated text: חָתוּל
Detected object: animal (from 24.8s to 25.1s, confidence: 0.6898297667503357)
Translated text: חַיָה
Detected object: shoe (from 25.4s to 25.4s, confidence: 0.5513880848884583)
Translated text: נַעַל
Detected object: animal (from 26.6s to 26.7s, confidence: 0.7323909997940063)
Translated text: חַיָה
Detected object: tableware (from 26.9s to 27.0s, confidence: 0.6515241861343384)
Translated text: כלי שולחן
Detected object: remote control (from 27.1s to 27.1s, confidence: 0.5767147541046143)
Translated text: שְׁלַט רָחוֹק
Detected object: pillow (from 27.3s to 27.3s, confidence: 0.6477858424186707)
Translated text: כָּרִית
Detected object: top (from 27.4s to 28.8s, confidence: 0.5624907612800598)
Translated text: רֹאשׁ
Detected object: home appliance (from 27.6s to 27.6s, confidence: 0.5552805662155151)
Translated text: מכשיר ביתי
Detected object: glove (from 27.6s to 27.6s, confidence: 0.5463806390762329)
Translated text: כְּפָפָה
Detected object: luggage & bags (from 27.9s to 27.9s, confidence: 0.6919721961021423)
Translated text: מזוודות ותיקים
Detected object: shoe (from 28.6s to 28.6s, confidence: 0.7179497480392456)
Translated text: נַעַל
Detected object: table top (from 28.6s to 28.6s, confidence: 0.5617731809616089)
Translated text: פלטת שולחן
Detected object: person (from 28.8s to 28.8s, confidence: 0.8616508841514587)
Translated text: אָדָם
Detected object: mobile phone (from 28.8s to 28.8s, confidence: 0.5622220039367676)
Translated text: טֶלֶפוֹן סֶלוּלָרי
Detected object: dog (from 28.9s to 31.8s, confidence: 0.9218456149101257)
Translated text: כֶּלֶב
Detected object: bowl (from 28.9s to 32.1s, confidence: 0.8322234153747559)
Translated text: קְעָרָה
Detected object: table (from 28.9s to 30.5s, confidence: 0.5970646739006042)
Translated text: לוּחַ
Detected object: person (from 30.0s to 31.1s, confidence: 0.7745580673217773)
Translated text: אָדָם
Detected object: food (from 31.2s to 31.4s, confidence: 0.5394467711448669)
Translated text: מָזוֹן
Detected object: furniture (from 31.8s to 31.8s, confidence: 0.6819330453872681)
Translated text: רְהִיטִים
Detected object: ball (from 31.8s to 31.8s, confidence: 0.5890166759490967)
Translated text: כַּדוּר
Detected object: packaged goods (from 31.9s to 31.9s, confidence: 0.5276054739952087)
Translated text: סחורה ארוזה
Detected object: container (from 32.0s to 32.0s, confidence: 0.5217052102088928)
Translated text: מְכוֹלָה
Detected object: animal (from 32.1s to 32.4s, confidence: 0.7548351287841797)
Translated text: חַיָה
Detected object: ball (from 32.3s to 32.3s, confidence: 0.556037187576294)
Translated text: כַּדוּר
Detected object: dog (from 32.4s to 34.4s, confidence: 0.8136547803878784)
Translated text: כֶּלֶב
Detected object: dog (from 32.5s to 34.2s, confidence: 0.8229188323020935)
Translated text: כֶּלֶב
Detected object: plant (from 32.7s to 32.8s, confidence: 0.6997487545013428)
Translated text: לִשְׁתוֹל
Detected object: bowl (from 33.0s to 34.0s, confidence: 0.8177229762077332)
Translated text: קְעָרָה
Detected object: tire (from 34.4s to 34.4s, confidence: 0.5635820031166077)
Translated text: צְמִיג
Detected object: nightstand (from 34.5s to 41.6s, confidence: 0.610463559627533)
Translated text: שידת לילה
Detected object: shoe (from 34.6s to 34.6s, confidence: 0.5666685700416565)
Translated text: נַעַל
Detected object: cat (from 35.3s to 41.6s, confidence: 0.837104856967926)
Translated text: חָתוּל
Detected object: cat (from 36.2s to 37.5s, confidence: 0.8095089197158813)
Translated text: חָתוּל
Detected object: cat (from 36.2s to 37.6s, confidence: 0.6687825322151184)
Translated text: חָתוּל
Detected object: home appliance (from 38.6s to 41.5s, confidence: 0.5376156568527222)
Translated text: מכשיר ביתי
Detected object: cat (from 38.9s to 40.2s, confidence: 0.8194187879562378)
Translated text: חָתוּל
Detected object: animal (from 40.5s to 40.6s, confidence: 0.7350361347198486)
Translated text: חַיָה
Detected object: cat (from 41.7s to 48.8s, confidence: 0.911783754825592)
Translated text: חָתוּל
Detected object: mobile phone (from 41.7s to 42.3s, confidence: 0.5573334097862244)
Translated text: טֶלֶפוֹן סֶלוּלָרי
Detected object: mobile phone (from 44.9s to 48.1s, confidence: 0.6987113952636719)
Translated text: טֶלֶפוֹן סֶלוּלָרי
Detected object: cat (from 45.7s to 45.7s, confidence: 0.8585442304611206)
Translated text: חָתוּל
Detected object: cat (from 47.3s to 50.7s, confidence: 0.8182305693626404)
Translated text: חָתוּל
Detected object: cat (from 50.8s to 60.6s, confidence: 0.9374937415122986)
Translated text: חָתוּל
Detected object: person (from 50.8s to 57.9s, confidence: 0.893791913986206)
Translated text: אָדָם
Detected object: person (from 52.8s to 53.1s, confidence: 0.834334135055542)
Translated text: אָדָם
Detected object: electronic device (from 53.0s to 53.0s, confidence: 0.52102130651474)
Translated text: מכשיר אלקטרוני
Detected object: tableware (from 55.9s to 55.9s, confidence: 0.5352984666824341)
Translated text: כלי שולחן
Detected object: person (from 56.0s to 56.3s, confidence: 0.9335148334503174)
Translated text: אָדָם
Detected object: animal (from 61.1s to 61.5s, confidence: 0.8047589063644409)
Translated text: חַיָה
Detected object: animal (from 61.6s to 62.8s, confidence: 0.8163132071495056)
Translated text: חַיָה
Detected object: person (from 61.6s to 61.9s, confidence: 0.8584477305412292)
Translated text: אָדָם
Detected object: cat (from 62.7s to 67.9s, confidence: 0.8655117750167847)
Translated text: חָתוּל
Detected object: cat (from 65.5s to 67.7s, confidence: 0.8874375224113464)
Translated text: חָתוּל
Detected object: person (from 66.8s to 67.1s, confidence: 0.8009416460990906)
Translated text: אָדָם
Detected object: person (from 68.0s to 68.3s, confidence: 0.9265026450157166)
Translated text: אָדָם
Detected object: cat (from 68.0s to 70.9s, confidence: 0.6699553728103638)
Translated text: חָתוּל
Detected object: cat (from 68.0s to 75.8s, confidence: 0.7303812503814697)
Translated text: חָתוּל
Detected object: blanket (from 68.0s to 73.9s, confidence: 0.5980523824691772)
Translated text: שְׂמִיכָה
Detected object: clothing (from 68.1s to 68.5s, confidence: 0.5289710760116577)
Translated text: הַלבָּשָׁה
Detected object: cat (from 68.1s to 75.9s, confidence: 0.6949251294136047)
Translated text: חָתוּל
Detected object: table (from 68.4s to 70.4s, confidence: 0.531315267086029)
Translated text: לוּחַ
Detected object: animal (from 69.1s to 72.3s, confidence: 0.6916438937187195)
Translated text: חַיָה
Detected object: person (from 70.0s to 70.3s, confidence: 0.7563865184783936)
Translated text: אָדָם
Detected object: clothing (from 72.6s to 72.8s, confidence: 0.5207765102386475)
Translated text: הַלבָּשָׁה
Detected object: person (from 73.2s to 73.5s, confidence: 0.785195529460907)
Translated text: אָדָם
Detected object: animal (from 74.7s to 75.9s, confidence: 0.7069710493087769)
Translated text: חַיָה
Detected object: cat (from 75.1s to 75.3s, confidence: 0.7898706793785095)
Translated text: חָתוּל
Detected object: table (from 75.2s to 75.9s, confidence: 0.5778510570526123)
Translated text: לוּחַ
Detected object: animal (from 75.3s to 75.6s, confidence: 0.7192040681838989)
Translated text: חַיָה
Detected object: hat (from 75.8s to 80.2s, confidence: 0.767134964466095)
Translated text: יש
Detected object: animal (from 75.9s to 76.0s, confidence: 0.709578275680542)
Translated text: חַיָה
Detected object: cat (from 75.9s to 76.0s, confidence: 0.6805353760719299)
Translated text: חָתוּל
Detected object: table (from 75.9s to 80.2s, confidence: 0.7157275676727295)
Translated text: לוּחַ
Detected object: person (from 76.0s to 76.3s, confidence: 0.9007478356361389)
Translated text: אָדָם
Detected object: top (from 76.0s to 80.2s, confidence: 0.6261129379272461)
Translated text: רֹאשׁ
Detected object: houseplant (from 76.4s to 76.4s, confidence: 0.5437384247779846)
Translated text: צמח בית
Detected object: plant (from 76.5s to 76.6s, confidence: 0.5662490129470825)
Translated text: לִשְׁתוֹל
Detected object: animal (from 77.0s to 77.4s, confidence: 0.6933488249778748)
Translated text: חַיָה
Detected object: animal (from 77.0s to 77.0s, confidence: 0.6773064136505127)
Translated text: חַיָה
Detected object: cat (from 77.0s to 77.5s, confidence: 0.6915778517723083)
Translated text: חָתוּל
Detected object: cat (from 77.0s to 77.8s, confidence: 0.606193482875824)
Translated text: חָתוּל
Detected object: person (from 77.2s to 80.2s, confidence: 0.8174652457237244)
Translated text: אָדָם
Detected object: animal (from 77.8s to 78.3s, confidence: 0.718535304069519)
Translated text: חַיָה
Detected object: plant (from 78.8s to 80.1s, confidence: 0.5648661255836487)
Translated text: לִשְׁתוֹל
Detected object: cat (from 80.5s to 85.9s, confidence: 0.7302202582359314)
Translated text: חָתוּל
Detected object: shoe (from 80.5s to 81.3s, confidence: 0.6107122898101807)
Translated text: נַעַל
Detected object: shoe (from 81.3s to 82.9s, confidence: 0.5842328071594238)
Translated text: נַעַל
Detected object: shoe (from 81.4s to 81.5s, confidence: 0.703336238861084)
Translated text: נַעַל
Detected object: shoe (from 82.9s to 83.0s, confidence: 0.6004396677017212)
Translated text: נַעַל
Detected object: cat (from 86.8s to 88.5s, confidence: 0.6459187865257263)
Translated text: חָתוּל
Detected object: cat (from 88.6s to 90.2s, confidence: 0.8090400695800781)
Translated text: חָתוּל

and so on...
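To back the note about cats with numbers, you could count detections and average the confidence per label directly from the output file. This is a minimal sketch, assuming the file follows exactly the `Detected object: ...` line format that `main()` writes. `summarize_detections` and `min_confidence` are hypothetical names introduced here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches the "Detected object: ..." lines written by main().
LINE_RE = re.compile(
    r"Detected object: (?P<entity>.+) \(from (?P<start>[0-9.]+)s "
    r"to (?P<end>[0-9.]+)s, confidence: (?P<conf>[0-9.eE+-]+)\)"
)


def summarize_detections(
    lines: Iterable[str], min_confidence: float = 0.0
) -> Dict[str, Tuple[int, float]]:
    """Return {label: (count, mean_confidence)} for detections >= min_confidence."""
    confs: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        m = LINE_RE.fullmatch(line.strip())
        if m is None:
            continue  # skip "Translated text:" lines and anything else
        conf = float(m.group("conf"))
        if conf >= min_confidence:
            confs[m.group("entity")].append(conf)
    return {label: (len(v), sum(v) / len(v)) for label, v in confs.items()}


# Reading the real file (not executed here):
#     with open("TestingOutputs/google_video_to_objects.txt", encoding="utf-8") as f:
#         summary = summarize_detections(f, min_confidence=0.6)
#     for label, (n, mean) in sorted(summary.items(), key=lambda kv: -kv[1][0]):
#         print(f"{label}: {n} detections, mean confidence {mean:.2f}")

if __name__ == "__main__":
    sample = [
        "Detected object: cat (from 0.0s to 3.2s, confidence: 0.8528473973274231)",
        "Translated text: חָתוּל",
        "Detected object: dog (from 0.0s to 1.6s, confidence: 0.735107958316803)",
        "Detected object: cat (from 6.1s to 8.3s, confidence: 0.854455828666687)",
        "Detected object: shoe (from 6.9s to 6.9s, confidence: 0.5160681009292603)",
    ]
    print(summarize_detections(sample, min_confidence=0.6))
```

The threshold is optional. Many of the detections above (shoes, furniture, single-frame boxes) sit just above 0.5, so raising `min_confidence` is a quick way to see how the label counts change when only confident tracks are kept.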