# Custom Video Key Frame Image Classification and Brand Detection for Azure Video Indexer

We often want to extract useful tags from video content. On social media services such as Instagram, Facebook, and YouTube, such tags can make the difference between a post that attracts engagement and one that does not. This tutorial shows how to use Azure Video Indexer, the Computer Vision API, and the Custom Vision Service to extract key frames from a video and detect custom brand logos in them. The code can be extended to support almost any image classification or object detection task.



## Step 1 Download A Sample Video with the pyTube API

In [None]:
## Install pyTube3 https://github.com/nficano/pytube
!pip install pytube3 --upgrade

In [4]:
#importing the module 
from pathlib import Path
from pytube import YouTube
video2Index = YouTube('https://www.youtube.com/watch?v=ijtKxXiS4hE').streams[0].download()
video_name = Path(video2Index).stem

In [5]:
video2Index

'c:\\Users\\abornst\\Documents\\video_indexer_python\\The Whos Who of the Azure AI Platform (Azure Mythbusters).mp4'

In [7]:
video_name

'The Whos Who of the Azure AI Platform (Azure Mythbusters)'

## Step 2 Use the Unofficial Video Indexer Python API to Process the Video

In [None]:
# Install the unofficial video-indexer client https://github.com/bklim5/python_video_indexer_lib
!pip install video-indexer

In [28]:
from video_indexer import VideoIndexer
# These IDs can be found in the screenshot here https://docs.microsoft.com/en-us/azure/media-services/video-indexer/video-indexer-use-apis?WT.mc_id=vikeyframedetection-notebook-abornst

vi = VideoIndexer(
    vi_subscription_key='SUBSCRIPTION_KEY',
    vi_location='LOCATION',
    vi_account_id='ACCOUNT_ID'
)

In [10]:
video_id = vi.upload_to_video_indexer(
   input_filename = video2Index,
   video_name=video_name,  # identifier for video in Video Indexer platform, must be unique during indexing time
   video_language='English'
)


Uploading video to video indexer...


In [None]:
# Get Video Info
info = vi.get_video_info(
    video_id,
    video_language='English'
)
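
Indexing takes a while, so the info returned right after upload may not contain key frames yet. The sketch below polls until indexing finishes. It assumes the returned dictionary carries a top-level `"state"` field that becomes `"Processed"` or `"Failed"`; check this against your own response. `wait_until_indexed` and `fetch_info` are hypothetical names, not part of the `video-indexer` library.

```python
import time


def wait_until_indexed(fetch_info, poll_seconds=30.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_info() repeatedly until the returned dict reports a final state.

    Assumption: the dict has a top-level "state" key whose final values are
    "Processed" (success) or "Failed". Any other value means "keep waiting".
    """
    for _ in range(max_attempts):
        info = fetch_info()
        state = info.get("state")
        if state == "Processed":
            return info
        if state == "Failed":
            raise RuntimeError("Video Indexer reported that indexing failed")
        sleep(poll_seconds)
    raise TimeoutError("Video was not indexed after {} attempts".format(max_attempts))


# Offline demonstration with canned responses instead of a real API call.
fake_responses = iter([{"state": "Processing"}, {"state": "Processed", "videos": []}])
result = wait_until_indexed(lambda: next(fake_responses), sleep=lambda seconds: None)
print(result["state"])  # Processed
```

In the notebook, the same helper could wrap the real call, for example `wait_until_indexed(lambda: vi.get_video_info(video_id, video_language='English'))`.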

In [None]:
# Extract keyframes
keyframes = []
for shot in info["videos"][0]["insights"]["shots"]:
    for keyframe in shot["keyFrames"]:
        keyframes.append(keyframe["instances"][0]['thumbnailId'])
print("Found {} keyframes in video".format(len(keyframes)))
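
The loop above raises a `KeyError` if a shot has no `keyFrames` entry. The sketch below is a more defensive variant, shown on a hand-written dictionary that only mimics the nesting used above. Unlike the original loop, it collects every instance rather than only the first and drops duplicate ids. `extract_keyframe_ids` and `sample_info` are illustrative names, not part of any library.

```python
def extract_keyframe_ids(info):
    """Collect unique thumbnail ids from a Video Indexer style info dict.

    Missing keys are treated as empty, so partially indexed videos do not
    raise KeyError. Order of first appearance is preserved.
    """
    thumbnail_ids = []
    for video in info.get("videos", []):
        shots = video.get("insights", {}).get("shots", [])
        for shot in shots:
            for keyframe in shot.get("keyFrames", []):
                for instance in keyframe.get("instances", []):
                    thumbnail_id = instance.get("thumbnailId")
                    if thumbnail_id and thumbnail_id not in thumbnail_ids:
                        thumbnail_ids.append(thumbnail_id)
    return thumbnail_ids


# Hand-written sample that mirrors the structure accessed in the cell above.
sample_info = {
    "videos": [{
        "insights": {
            "shots": [
                {"keyFrames": [{"instances": [{"thumbnailId": "a1"}]}]},
                {"keyFrames": [{"instances": [{"thumbnailId": "b2"}, {"thumbnailId": "a1"}]}]},
                {},  # a shot without key frames is skipped, not an error
            ]
        }
    }]
}
print(extract_keyframe_ids(sample_info))  # ['a1', 'b2']
```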

In [None]:
# Visualize Key Frames
from IPython.display import display
from PIL import Image
import io

for keyframe in keyframes:
    img_str = vi.get_thumbnail_from_video_indexer(
             video_id,
             keyframe)
    img = Image.open(io.BytesIO(img_str))
    display(img)
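
Saving the key frames to disk can also be handy, for example to reuse them later as training images. The sketch below writes the raw thumbnail bytes to a file named after the key frame id. `save_keyframe` is a hypothetical helper, and the `.jpg` extension is an assumption about the thumbnail format.

```python
import tempfile
from pathlib import Path


def save_keyframe(image_bytes, keyframe_id, out_dir):
    """Write raw image bytes to <out_dir>/<keyframe_id>.jpg and return the path.

    Assumption: the bytes are JPEG data; change the extension if they are not.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    out_path = out_dir / "{}.jpg".format(keyframe_id)
    out_path.write_bytes(image_bytes)
    return out_path


# Offline demonstration with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_keyframe(b"\xff\xd8\xff", "example-id", tmp_dir)
    print(saved.name)  # example-id.jpg
```

In the notebook, `image_bytes` would be the `img_str` value returned by `vi.get_thumbnail_from_video_indexer`.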

## Step 3 Send Key Frames to the Azure Computer Vision API to Detect Brands

https://docs.microsoft.com/en-us/azure/cognitive-services/computer-vision/concept-brand-detection?WT.mc_id=videoindexer-github-abornst


In [None]:
# Install Azure Computer Vision Client API 
!pip install --upgrade azure-cognitiveservices-vision-computervision

In [20]:
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from msrest.authentication import CognitiveServicesCredentials
endpoint = "Azure Computer Vision Endpoint"
subscription_key = "Azure Computer Vision Key"
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))

In [None]:
import time
timeout_interval, timeout_time = 5, 10.0
image_features = ["brands"]

for index, keyframe in enumerate(keyframes):
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)
    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)  
    # Analyze with Azure Computer Vision
    cv_results = computervision_client.analyze_image_in_stream(img_stream, image_features) 
    print("Detecting brands in keyframe {}: ".format(keyframe))
    if len(cv_results.brands) == 0:
        print("No brands detected.")
    else:
        for brand in cv_results.brands:
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format( \
            brand.name, brand.confidence * 100, brand.rectangle.x, brand.rectangle.x + brand.rectangle.w, \
            brand.rectangle.y, brand.rectangle.y + brand.rectangle.h))
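
The pause-every-few-requests pattern used above can be factored into a small generator so the same logic serves both detection loops. This is a minimal sketch; `throttled` is a hypothetical helper. Unlike the loop above, it does not pause before the very first item.

```python
import time


def throttled(items, batch_size=5, pause_seconds=10.0, sleep=time.sleep):
    """Yield items one by one, pausing after every `batch_size` items.

    The `sleep` argument is injectable so the pacing can be tested without
    actually waiting.
    """
    for index, item in enumerate(items):
        if index > 0 and index % batch_size == 0:
            sleep(pause_seconds)
        yield item


# Offline demonstration: record the pauses instead of sleeping.
pauses = []
for frame_id in throttled(["k1", "k2", "k3"], batch_size=2, pause_seconds=10.0, sleep=pauses.append):
    pass
print(pauses)  # [10.0]
```

With this helper, the loop header above could become `for keyframe in throttled(keyframes):`, leaving the body to deal only with the API call.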

## Step 4 Send Key Frames to the Azure Custom Vision API to Detect Custom Brands



In [None]:
!pip install azure-cognitiveservices-vision-customvision

In [None]:
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
prediction_threshold = .8
prediction_key =  "Custom Vision Service Key" 
custom_endpoint = "Custom Vision Service Endpoint"
project_id = "Custom Vision Service Model ProjectId"
published_name = "Custom Vision Service Model Iteration Name"

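# Create a prediction client for the trained and published model.
# Pass the service endpoint here, not the published iteration name.
# Note: this keyword form matches older SDK releases; newer releases may expect
# CustomVisionPredictionClient(endpoint, credentials) instead.
predictor = CustomVisionPredictionClient(prediction_key, endpoint=custom_endpoint)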

In [None]:
import io
import time
timeout_interval, timeout_time = 5, 10.0

for index, keyframe in enumerate(keyframes):
    if index % timeout_interval == 0:
        print("Trying to prevent exceeding request limit waiting {} seconds".format(timeout_time))
        time.sleep(timeout_time)
    # Get KeyFrame Image Byte String From Video Indexer
    img_str = vi.get_thumbnail_from_video_indexer(video_id, keyframe)
    # Convert Byte Stream to Image Stream
    img_stream = io.BytesIO(img_str)  
    # Detect objects with the published Azure Custom Vision model
    cv_results = predictor.detect_image(project_id, published_name, img_stream)
    predictions = [pred for pred in cv_results.predictions if pred.probability > prediction_threshold]
    print("Detecting brands in keyframe {}: ".format(keyframe))
    if len(predictions) == 0:
        print("No custom brands detected.")
    else:
        for brand in predictions:
            print("'{}' brand detected with confidence {:.1f}% at location {}, {}, {}, {}".format( \
                brand.tag_name, brand.probability * 100, brand.bounding_box.left, brand.bounding_box.top, \
                brand.bounding_box.width, brand.bounding_box.height))
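
Custom Vision reports bounding boxes as fractions of the image size, so the values printed above lie between 0 and 1. The sketch below converts such a box into pixel corners. `to_pixel_box` is a hypothetical helper, and the image dimensions must come from the key frame itself (for example `img.size` from PIL).

```python
def to_pixel_box(left, top, width, height, image_width, image_height):
    """Convert a normalized (0 to 1) box into pixel corners.

    Returns (left, top, right, bottom) as integers, clamped to the image.
    """
    x0 = round(left * image_width)
    y0 = round(top * image_height)
    x1 = round((left + width) * image_width)
    y1 = round((top + height) * image_height)
    # Clamp so a box that spills past the edge still fits inside the image.
    x0, x1 = max(0, x0), min(image_width, x1)
    y0, y1 = max(0, y0), min(image_height, y1)
    return (x0, y0, x1, y1)


print(to_pixel_box(0.25, 0.5, 0.5, 0.25, image_width=640, image_height=480))
# (160, 240, 480, 360)
```

The returned tuple follows the (left, upper, right, lower) order that PIL's `Image.crop` accepts, so a detected logo can be cropped out of the key frame directly.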

## About the Author
Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.