<a href="https://colab.research.google.com/github/atilatech/atlas-service/blob/master/notebooks/deploy_whisper_huggingface.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deploy Whisper and Sentence Transformer Model to HuggingFace

### Install Dependencies

In [1]:
!pip install transformers pytube sentence-transformers

# optional install pytorch so you can use a gpu for faster transcription
# command below is for Linux. See instructions for mac and windows: https://pytorch.org/get-started/locally/
!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

!pip install git+https://github.com/openai/whisper.git -q
!apt install ffmpeg # https://stackoverflow.com/questions/51856340/how-to-install-package-ffmpeg-in-google-colab

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Downloading transformers-4.25.1-py3-none-any.whl (5.8 MB)
[K     |████████████████████████████████| 5.8 MB 29.6 MB/s 
[?25hCollecting pytube
  Downloading pytube-12.1.2-py3-none-any.whl (57 kB)
[K     |████████████████████████████████| 57 kB 5.5 MB/s 
[?25hCollecting sentence-transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
[K     |████████████████████████████████| 85 kB 5.8 MB/s 
[?25hCollecting tokenizers!=0.11.3,<0.14,>=0.11.1
  Downloading tokenizers-0.13.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
[K     |████████████████████████████████| 7.6 MB 55.2 MB/s 
Collecting huggingface-hub<1.0,>=0.10.0
  Downloading huggingface_hub-0.11.1-py3-none-any.whl (182 kB)
[K     |████████████████████████████████| 182 kB 42.2 MB/s 
Collecting sentencepiece
  Downloading sentencepiece-0.1.97-cp38-cp38-manylinux_2_17_x86_64

## Create a Custom inference handler

See:

1. https://huggingface.co/docs/inference-endpoints/guides/custom_handler
2. https://www.philschmid.de/custom-inference-handler

Another alternative is using a [Serverless Deployment of the sentence-transformer model](https://aseifert.com/p/serverless-sentence-transformer/)

In [None]:
!pip install transformers pytube sentence-transformers

# optional install pytorch so you can use a gpu for faster transcription
# command below is for Linux. See instructions for mac and windows: https://pytorch.org/get-started/locally/
!pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu

!pip install git+https://github.com/openai/whisper.git -q
!apt install ffmpeg # https://stackoverflow.com/questions/51856340/how-to-install-package-ffmpeg-in-google-colab

In [28]:
from typing import Dict

from sentence_transformers import SentenceTransformer
from tqdm import tqdm
import whisper
import torch
import pytube
import time


class EndpointHandler():
    def __init__(self, path=""):
        # load the model
        WHISPER_MODEL_NAME = "tiny.en"
        SENTENCE_TRANSFORMER_MODEL_NAME = "multi-qa-mpnet-base-dot-v1"

        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f'whisper will use: {device}')

        t0 = time.time()
        self.whisper_model = whisper.load_model(WHISPER_MODEL_NAME).to(device)
        t1 = time.time()

        total = t1 - t0
        print(f'Finished loading whisper_model in {total} seconds')

        t0 = time.time()
        self.sentence_transformer_model = SentenceTransformer(SENTENCE_TRANSFORMER_MODEL_NAME)
        t1 = time.time()

        total = t1 - t0
        print(f'Finished loading sentence_transformer_model in {total} seconds')

    def __call__(self, data: Dict[str, str]) -> Dict:
        """
        Args:
            data (:obj:):
                includes the URL to video for transcription
        Return:
            A :obj:`dict`:. transcribed dict
        """
        # process input
        print('data', data)

        if "inputs" not in data:
            raise Exception(f"data is missing 'inputs' key which  EndpointHandler expects. Received: {data}"
                            f" See: https://huggingface.co/docs/inference-endpoints/guides/custom_handler#2-create-endpointhandler-cp")
        video_url = data.pop("video_url", None)
        query = data.pop("query", None)
        encoded_segments = {}
        if video_url:
            video_with_transcript = self.transcribe_video(video_url)
            encode_transcript = data.pop("encode_transcript", True)
            if encode_transcript:
                encoded_segments = self.combine_transcripts(video_with_transcript)
                encoded_segments = {
                    "encoded_segments": self.encode_sentences(encoded_segments)
                }
            return {
                **video_with_transcript,
                **encoded_segments
            }
        elif query:
            query = [{"text": query, "id": ""}]
            encoded_segments = self.encode_sentences(query)

            return {
                "encoded_segments": encoded_segments
            }

    def transcribe_video(self, video_url):
        decode_options = {
            # Set language to None to support multilingual,
            # but it will take longer to process while it detects the language.
            # Realized this by running in verbose mode and seeing how much time
            # was spent on the decoding language step
            "language": "en",
            "verbose": True
        }
        yt = pytube.YouTube(video_url)
        video_info = {
            'id': yt.video_id,
            'thumbnail': yt.thumbnail_url,
            'title': yt.title,
            'views': yt.views,
            'length': yt.length,
            # Althhough, this might seem redundant since we already have id
            # but it allows the link to the video be accessed in 1-click in the API response
            'url': f"https://www.youtube.com/watch?v={yt.video_id}"
        }
        stream = yt.streams.filter(only_audio=True)[0]
        path_to_audio = f"{yt.video_id}.mp3"
        stream.download(filename=path_to_audio)
        t0 = time.time()
        transcript = self.whisper_model.transcribe(path_to_audio, **decode_options)
        t1 = time.time()
        for segment in transcript['segments']:
            # Remove the tokens array, it makes the response too verbose
            segment.pop('tokens', None)

        total = t1 - t0
        print(f'Finished transcription in {total} seconds')

        # postprocess the prediction
        return {"transcript": transcript, 'video': video_info}

    def encode_sentences(self, transcripts, batch_size=64):
        """
        Encoding all of our segments at once or storing them locally would require too much compute or memory.
        So we do it in batches of 64
        :param transcripts:
        :param batch_size:
        :return:
        """
        # loop through in batches of 64
        all_batches = []
        for i in tqdm(range(0, len(transcripts), batch_size)):
            # find end position of batch (for when we hit end of data)
            i_end = min(len(transcripts), i + batch_size)
            # extract the metadata like text, start/end positions, etc
            batch_meta = [{
                **row
            } for row in transcripts[i:i_end]]
            # extract only text to be encoded by embedding model
            batch_text = [
                row['text'] for row in batch_meta
            ]
            # create the embedding vectors
            batch_vectors = self.sentence_transformer_model.encode(batch_text).tolist()

            batch_details = [
                {
                    **batch_meta[x],
                    'vectors':batch_vectors[x]
                } for x in range(0, len(batch_meta))
            ]
            all_batches.extend(batch_details)

        return all_batches

    @staticmethod
    def combine_transcripts(video, window=6, stride=3):
        """

        :param video:
        :param window: number of sentences to combine
        :param stride: number of sentences to 'stride' over, used to create overlap
        :return:
        """
        new_transcript_segments = []

        video_info = video['video']
        transcript_segments = video['transcript']['segments']
        for i in tqdm(range(0, len(transcript_segments), stride)):
            i_end = min(len(transcript_segments), i + window)
            text = ' '.join(transcript['text']
                            for transcript in
                            transcript_segments[i:i_end])
            # TODO: Should int (float to seconds) conversion happen at the API level?
            start = int(transcript_segments[i]['start'])
            end = int(transcript_segments[i]['end'])
            new_transcript_segments.append({
                **video_info,
                **{
                    'start': start,
                    'end': end,
                    'title': video_info['title'],
                    'text': text,
                    'id': f"{video_info['id']}-t{start}",
                    'url': f"https://youtu.be/{video_info['id']}?t={start}",
                    'video_id': video_info['id'],
                }
            })
        return new_transcript_segments


In [3]:
# Initialize the Handler

my_handler = EndpointHandler(path="")


whisper will use: cuda


100%|█████████████████████████████████████| 72.1M/72.1M [00:02<00:00, 33.8MiB/s]


Finished loading whisper_model in 9.460813283920288 seconds


Downloading:   0%|          | 0.00/737 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/190 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/8.65k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/571 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/116 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/25.5k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/438M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/239 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/363 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/13.9k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/229 [00:00<?, ?B/s]

Finished loading sentence_transformer_model in 35.54280710220337 seconds


In [None]:
# prepare sample payload
# payload = {"video_url": "https://www.youtube.com/watch?v=aNxigRg1yEQ"}
payload = {"video_url": "https://www.youtube.com/watch?v=bGk8qcHc1A0"} # Joe Rogan & Lex Fridman: Lionel Messi Is The GOAT Over Cristiano Ronaldo


# # test the handler
payload_pred=my_handler(payload)
payload_pred

In [30]:
my_handler = EndpointHandler(path="")
payload = {"query": "mighty mouse", "inputs": ""} # Joe Rogan & Lex Fridman: Lionel Messi Is The GOAT Over Cristiano Ronaldo


# # test the handler
payload_pred=my_handler(payload)
payload_pred

whisper will use: cuda
Finished loading whisper_model in 0.687432050704956 seconds
Finished loading sentence_transformer_model in 1.2009577751159668 seconds
data {'query': 'mighty mouse', 'inputs': ''}


100%|██████████| 1/1 [00:00<00:00,  5.25it/s]


{'encoded_segments': [{'text': 'mighty mouse',
   'id': '',
   'vectors': [0.027853503823280334,
    -0.40070658922195435,
    -0.2346675992012024,
    0.09729225188493729,
    0.2813168466091156,
    -0.05503133311867714,
    -0.3142769932746887,
    -0.031377118080854416,
    -0.8565509915351868,
    -0.08481072634458542,
    0.1783233880996704,
    0.0891503244638443,
    -0.29496511816978455,
    -0.2774517834186554,
    0.3086421489715576,
    0.03930829092860222,
    0.1941390484571457,
    0.263176828622818,
    -0.5084221363067627,
    -0.009546858258545399,
    -0.06513401865959167,
    -0.09507754445075989,
    -0.12632127106189728,
    0.2841081917285919,
    -0.029323918744921684,
    0.10884086787700653,
    0.16561704874038696,
    0.1498178392648697,
    -0.14696060121059418,
    -0.4477223753929138,
    -0.007809673435986042,
    0.06591729074716568,
    -0.07277068495750427,
    0.4331079423427582,
    -0.00010477645264472812,
    -0.2418029010295868,
    0.50076675415

In [22]:
my_handler = EndpointHandler(path="")
payload = {"video_url": "https://www.youtube.com/watch?v=ciKdF97JWpU", "encode_transcript": True} # Jimmy Butler Reveals What Made Him Leave the Philadelphia 76ers | The JJ Redick Podcast | The Ringer


# # test the handler
payload_pred=my_handler(payload)
payload_pred


whisper will use: cuda
Finished loading whisper_model in 0.6834800243377686 seconds
Finished loading sentence_transformer_model in 1.0409791469573975 seconds
data {'video_url': 'https://www.youtube.com/watch?v=ciKdF97JWpU', 'encode_transcript': True}
[00:00.000 --> 00:06.080]  Was last your difficult for you? Yeah, not just getting traded, but the whole summer preseason. Yeah
[00:06.640 --> 00:12.640]  Hell yeah, it was difficult man. It was it was so different and on any given day
[00:14.000 --> 00:19.040]  Me as a as a person as a player. I didn't know who the f***ing charge
[00:19.040 --> 00:20.720]  I think that was that was my biggest thing
[00:20.720 --> 00:26.000]  I didn't know what the f*** to expect whenever I were going to the to the gym whenever I go into the plane whenever I go into the game
[00:26.560 --> 00:28.560]  I was a man
[00:28.560 --> 00:31.120]  I think I was as losses in next month
[00:32.000 --> 00:39.040]  Meaning there was just that was just a lot of voices.

100%|██████████| 51/51 [00:00<00:00, 92521.41it/s]
100%|██████████| 1/1 [00:00<00:00,  1.66it/s]


{'transcript': {'text': " Was last your difficult for you? Yeah, not just getting traded, but the whole summer preseason. Yeah Hell yeah, it was difficult man. It was it was so different and on any given day Me as a as a person as a player. I didn't know who the f***ing charge I think that was that was my biggest thing I didn't know what the f*** to expect whenever I were going to the to the gym whenever I go into the plane whenever I go into the game I was a man I think I was as losses in next month Meaning there was just that was just a lot of voices. Yeah, a lot of influence from a lot of replaces and just so much going on in a given day I was like, yep, I guess I'm just hit a work. I didn't know who to talk to at what point did you like realize that? What point did and that f***ing meeting in the office that I told you that I Was like I cannot believe wait when did he brought when bre Obviously, I'll tell my side story about what happened in Portland with you with you in the meetin

# Transcribe and Search Video

1. Call the Model Inference we created in the last step
    1. Using the Inference endpoint class locally
    1. Using the deployed model on Huggingface

1. Combine 6 segments together to create more meaningful sentences

1. Embed sentences into vectors using transformers

1. Save vectors into a vector database

1. Query phrases using vector database

1. [Fixing YouTube Search with OpenAI's Whisper](https://www.pinecone.io/learn/openai-whisper/)

## Transcribe Video

In [None]:
!pip install requests

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
from getpass import getpass

HUGGING_FACE_ENDPOINT_URL = getpass('Enter HUGGING_FACE_ENDPOINT_URL')
HUGGING_FACE_API_KEY = getpass('Enter HUGGING_FACE_API_KEY')


Enter HUGGING_FACE_ENDPOINT_URL··········
Enter HUGGING_FACE_API_KEY··········


In [None]:
import json
from typing import List
import requests
import base64
import mimetypes

def send_transcription_request(url:str=None):
    payload = json.dumps({
      "inputs": video_url
    })
    headers = {
      'Authorization': f'Bearer {HUGGING_FACE_API_KEY}',
      'Content-Type': 'application/json'
    }

    response = requests.request("POST", HUGGING_FACE_ENDPOINT_URL, headers=headers, data=payload)
    return response.json()

video_url="https://www.youtube.com/watch?v=bGk8qcHc1A0" # Joe Rogan & Lex Fridman: Lionel Messi Is The GOAT Over Cristiano Ronaldo
video_data = send_transcription_request(video_url)

In [None]:
# verify that it worked
video_data['transcript']['segments'][0]

{'id': 0,
 'seek': 0,
 'start': 0.0,
 'end': 1.56,
 'text': " You're a part of mixed martial arts.",
 'temperature': 0.0,
 'avg_logprob': -0.3029073941505561,
 'compression_ratio': 1.6091549295774648,
 'no_speech_prob': 0.13397082686424255}

## Combine Transcript Segments

In [None]:
!pip install tqdm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
from tqdm.auto import tqdm

new_transcript_segments = []

def combine_transcripts(video):
  window = 6  # number of sentences to combine
  stride = 3  # number of sentences to 'stride' over, used to create overlap

  video_info=video['video']
  transcript_segments=video['transcript']['segments']
  for i in tqdm(range(0, len(transcript_segments), stride)):
      i_end = min(len(transcript_segments)-1, i+window)
      text = ' '.join(transcript['text'] 
                    for transcript in
                    transcript_segments[i:i_end])
      # TODO: Should int (float to seconds) conversion happen at the API level?
      start=int(transcript_segments[i]['start'])
      end=int(transcript_segments[i]['end'])
      new_transcript_segments.append({
          **video_info,
          **{
          'start': start,
          'end': end,
          'title': video_info['title'],
          'text': text,
          'id': f"{video_info['id']}-t{start}",
          'url': f"https://youtu.be/{video_info['id']}?t={start}",
          'video_id': video_info['id'],
          }
      })
  return new_transcript_segments
combined_transcripts = combine_transcripts(video_data)

  0%|          | 0/31 [00:00<?, ?it/s]

In [None]:
combined_transcripts[3]

{'id': 'bGk8qcHc1A0-t25',
 'thumbnail': 'https://i.ytimg.com/vi/bGk8qcHc1A0/sddefault.jpg',
 'title': 'Joe Rogan & Lex Fridman: Lionel Messi Is The GOAT Over Cristiano Ronaldo',
 'views': 187530,
 'length': 218,
 'url': 'https://youtu.be/bGk8qcHc1A0?t=25',
 'start': 25,
 'end': 28,
 'text': " But what is the division?  They're just both incredible.  But what is it like who's better LeBron or Michael Jordan?  Is it like that kind of thing?  People get very passionate about that.  They extremely passionate.",
 'video_id': 'bGk8qcHc1A0'}

## Convert Transcripts to Vectors

1. Use Sentence Tranformers



In [None]:
!pip install -U sentence-transformers pinecone-client

In [None]:
from getpass import getpass

PINECONE_API_KEY = getpass('Enter PINECONE_API_KEY')

Enter PINECONE_API_KEY··········


In [None]:
from sentence_transformers import SentenceTransformer

model_id = "multi-qa-mpnet-base-dot-v1"

sentence_transformer_model = SentenceTransformer(model_id)
sentence_transformer_model

dimensions = sentence_transformer_model.get_sentence_embedding_dimension()

Downloading:   0%|          | 0.00/737 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/190 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/8.65k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/571 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/116 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/25.5k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/438M [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/239 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/466k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/363 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/13.9k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/232k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/229 [00:00<?, ?B/s]

In [None]:
import pinecone  # !pip install pinecone-client

index_id = "youtube-search"

pinecone.init(
    api_key=PINECONE_API_KEY,  # app.pinecone.io
    environment="us-west1-gcp"
)

if index_id not in pinecone.list_indexes():
    pinecone.create_index(
        index_id,
        dimensions,
        metric="dotproduct"
    )

pinecone_index = pinecone.Index(index_id)
pinecone_index.describe_index_stats()

{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 30}},
 'total_vector_count': 30}

## Upload to Vector Database

In [None]:
# we encode and insert in batches of 64
batch_size = 64

def upload_transcripts_to_vector_db(transcripts_for_upload):
  # loop through in batches of 64
  for i in tqdm(range(0, len(transcripts_for_upload), batch_size)):
      # find end position of batch (for when we hit end of data)
      i_end = min(len(transcripts_for_upload)-1, i+batch_size)
      # extract the metadata like text, start/end positions, etc
      batch_meta = [{
          **transcripts_for_upload[x]
      } for x in range(i, i_end)]
      # extract only text to be encoded by embedding model
      batch_text = [
          row['text'] for row in transcripts_for_upload[i:i_end]
      ]
      # create the embedding vectors
      batch_embeds = sentence_transformer_model.encode(batch_text).tolist()
      # extract IDs to be attached to each embedding and metadata
      batch_ids = [
          row['id'] for row in transcripts_for_upload[i:i_end]
      ]
      # 'upsert' (insert) IDs, embeddings, and metadata to index
      to_upsert = list(zip(
          batch_ids, batch_embeds, batch_meta
      ))
      pinecone_index.upsert(to_upsert)
      print(f'Uploaded Batches: {i} to {i_end}')

upload_transcripts_to_vector_db(combined_transcripts)

  0%|          | 0/4 [00:00<?, ?it/s]

Uploaded Batches: 0 to 64
Uploaded Batches: 64 to 128
Uploaded Batches: 128 to 192
Uploaded Batches: 192 to 238


{'dimension': 768,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 54}},
 'total_vector_count': 54}

## Search Transcript

In [None]:
def query_model(query, video_id=""):
  encoded_query = sentence_transformer_model.encode(query).tolist()
  metadata_filter = { "video_id": {"$eq": video_id}} if video_id else None
  return pinecone_index.query(encoded_query, top_k=5,
                              include_metadata=True,
                              filter=metadata_filter)

query_phrase = "basketball"
results = query_model(query_phrase, "bGk8qcHc1A0")

results['matches'][0]

{'id': 'bGk8qcHc1A0-t25',
 'metadata': {'end': 28.0,
              'id': 'bGk8qcHc1A0-t25',
              'length': 218.0,
              'start': 25.0,
              'text': " But what is the division?  They're just both "
                      "incredible.  But what is it like who's better LeBron or "
                      'Michael Jordan?  Is it like that kind of thing?  People '
                      'get very passionate about that.  They extremely '
                      'passionate.',
              'thumbnail': 'https://i.ytimg.com/vi/bGk8qcHc1A0/sddefault.jpg',
              'title': 'Joe Rogan & Lex Fridman: Lionel Messi Is The GOAT Over '
                       'Cristiano Ronaldo',
              'url': 'https://youtu.be/bGk8qcHc1A0?t=25',
              'video_id': 'bGk8qcHc1A0',
              'views': 187530.0},
 'score': 18.0920982,
 'sparseValues': {},
 'values': []}

## Add some utility functions

In [None]:
from datetime import timedelta
import urllib

def convert_seconds_to_string(seconds):
    days, seconds = divmod(seconds, 86400)
    return str(timedelta(days=days, seconds=seconds)).split(',')[-1].strip()


def parse_video_id(url):
    # Parse the URL
    parsed_url = urllib.parse.urlparse(url)
    
    # Check if the URL is a YouTube URL
    if parsed_url.netloc in ['www.youtube.com', 'youtu.be']:
        # Extract the video ID from the path or query parameters
        if parsed_url.netloc == 'www.youtube.com':
            video_id = urllib.parse.parse_qs(parsed_url.query)['v'][0]
        else:
            video_id = parsed_url.path.split('/')[-1]
        return video_id
    else:
        return None

def does_video_exist(video_url):
  # create a placeholder vector of zeros to see if any vectors with the 
  # given video_id match.
  video_id = parse_video_id(video_url)
  query_response = pinecone_index.query(
      top_k=1,
      vector=[0] * dimensions,
      filter={
          "video_id": {"$eq": video_id}
      }
  )
  return len(query_response['matches']) > 0

# Putting it all Together

In [None]:
import time
video_url="https://www.youtube.com/watch?v=lKXv19eRLZg" # Making Friends with Machine Learning
query_phrase = "three degrees"


def transcribe_and_search_video(url, query, verbose=True):
  t0 = time.time()
  if not does_video_exist(url):
    video_with_transcript = send_transcription_request(url)
    video_with_transcript_combined = combine_transcripts(video_with_transcript)

    upload_transcripts_to_vector_db(video_with_transcript_combined)
  else:
    print(f'Skipping transcribing and embedding.'\
    ' Video already exists:{url}')
  results = query_model(query)
  t1 = time.time()
  total = t1-t0
  if verbose:
    video_length = f"{convert_seconds_to_string(results['matches'][0]['metadata']['length'])} "\
                      "long video" \
      if len(results['matches']) > 0 else 'no video found'
    print(f'Transcribed and searched {video_length} in {total} seconds')
  return results

search_results = transcribe_and_search_video(video_url, query_phrase)
search_results['matches'][0:3]

Skipping transcribing and embedding. Video already exists:{url}
Transcribed and searched 0:04:21 long video in 0.5128779411315918 seconds


[{'id': 'lKXv19eRLZg-t159',
  'metadata': {'end': 162.0,
               'id': 'lKXv19eRLZg-t159',
               'length': 261.0,
               'start': 159.0,
               'text': ' worse even than I play.  I give it a little while '
                       "longer, and it's a flawless expert.  It's getting some "
                       'pretty steep angles there.  And then, see what we want '
                       'it to do.  We give it controls of a joystick, '
                       'essentially,  and its sensory input is this '
                       'environment.',
               'thumbnail': 'https://i.ytimg.com/vi/lKXv19eRLZg/sddefault.jpg',
               'title': 'MFML 000 - Welcome to machine learning!',
               'url': 'https://youtu.be/lKXv19eRLZg?t=159',
               'video_id': 'lKXv19eRLZg',
               'views': 6918.0},
  'score': 12.0709095,
  'sparseValues': {},
  'values': []}, {'id': 'lKXv19eRLZg-t146',
  'metadata': {'end': 150.0,
               'id'

## Examples

In [None]:
video_url="https://www.youtube.com/watch?v=s5yguqapy6s" # How Jimmy Butler and Mark Wahlberg Became Close Friends
query_phrase = "filming transformers"

results = transcribe_and_search_video(video_url, query_phrase)
results['matches'][0]

Skipping transcribing and embedding. Video already exists:{url}
Transcribed and searched 0:04:51 long video in 0.5042881965637207 seconds


{'id': 's5yguqapy6s-t156',
 'metadata': {'end': 160.0,
              'id': 's5yguqapy6s-t156',
              'length': 291.0,
              'start': 156.0,
              'text': ' Chicago filming for Transformers and then he wanted '
                      'to  He wanted to play basketball with like his guys  So '
                      'they put him in the Birdo Center, which was where the '
                      "Bulls used to practice way up north now  We're down we "
                      'now there downtown  I do that all the time like all '
                      'science 21 on stuff 22 on stuff 23 on stuff like yeah  '
                      "But yeah, that's where they used to practice and he was "
                      'in their filming one day and then they call me back to',
              'thumbnail': 'https://i.ytimg.com/vi/s5yguqapy6s/sddefault.jpg',
              'title': 'How Jimmy Butler and Mark Wahlberg Became Close '
                       'Friends | The JJ Redick Podcast

In [None]:
video_url="https://www.youtube.com/watch?v=Gqev5NrWnvM" # Rio Ferdinand on Messi | Most Embarrassing Night of my Life
query_phrase = "we would have won"

results = transcribe_and_search_video(video_url, query_phrase)
results['matches'][0]


Skipping transcribing and embedding. Video already exists:{url}
Transcribed and searched 0:06:18 long video in 0.9052891731262207 seconds


{'id': 'Gqev5NrWnvM-t182',
 'metadata': {'end': 186.0,
              'id': 'Gqev5NrWnvM-t182',
              'length': 378.0,
              'start': 182.0,
              'text': ' And I feel it will kind of felt if we played against '
                      'that Barcelona team  without Messi, we probably would '
                      "have won.  And it's had mixed response.  Some people "
                      'were very supportive.  Some not so supportive.  So Tony '
                      'Bell, who, on Cruiserweight World Champions, said this,',
              'thumbnail': 'https://i.ytimg.com/vi/Gqev5NrWnvM/sddefault.jpg',
              'title': 'Rio Ferdinand on Messi | Most Embarrassing Night of my '
                       'Life',
              'url': 'https://youtu.be/Gqev5NrWnvM?t=182',
              'video_id': 'Gqev5NrWnvM',
              'views': 6594929.0},
 'score': 21.0212822,
 'sparseValues': {},
 'values': []}