<a href="https://colab.research.google.com/github/Rohan581/Video-Recommendation/blob/main/Video_Recommendation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Installing pytube

In [None]:
!pip install pytube

Importing required libraries

In [2]:
import pandas as pd
import cv2
import numpy as np
import re
from pytube import YouTube
import ast

# Setting Up YOLO

In [None]:
import os
os.environ['PATH'] += ':/usr/local/cuda/bin'
!rm -fr darknet
!git clone https://github.com/AlexeyAB/darknet
%cd /content/darknet
!sed -i 's/GPU=0/GPU=1/g' Makefile
!sed -i 's/OPENCV=0/OPENCV=1/g' Makefile
!make
!wget https://pjreddie.com/media/files/yolov3.weights
!chmod a+x ./darknet

**Setting up requirements for YOLO**

In [None]:
!apt install ffmpeg libopencv-dev libgtk-3-dev python-numpy python3-numpy libdc1394-22 libdc1394-22-dev libjpeg-dev libtiff5-dev libavcodec-dev libavformat-dev libswscale-dev libxine2-dev libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libv4l-dev libtbb-dev qtbase5-dev libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev x264 v4l-utils unzip

# **Initialising YOLO**

In [5]:
net = cv2.dnn.readNet("yolov3.weights", "/content/darknet/cfg/yolov3.cfg")
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
classes = []
with open("/content/darknet/data/coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

The function below downloads a YouTube video, runs YOLO on one frame out of every 100, and returns a list of the unique objects detected in the video (the `person` class is skipped).

In [6]:
def extract_frames_with_yolo(url):

    # Download YouTube video and extract frames
    yt = YouTube(url)
    stream = yt.streams.filter(progressive=True, file_extension="mp4").order_by("resolution").desc().first()
    stream.download(filename="video.mp4")
    cap = cv2.VideoCapture("video.mp4")
    frame_count = 0
    frames = []
    class_ids_list = [] # List of class_ids for each frame
    objects_list = []
    objects = []  # Stays empty if no frame is ever sampled (video shorter than 100 frames)
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        frame_count += 1
        if frame_count % 100 == 0:  # Extract one frame every 100 frames
            # Detect objects in the frame using YOLO
            blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
            net.setInput(blob)
            layer_names = net.getLayerNames()
            output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
            outs = net.forward(output_layers)
            # Extract detected objects and their confidence scores
            class_ids = []
            confidences = []
            boxes = []
            for out in outs:
                for detection in out:
                    scores = detection[5:]
                    class_id = np.argmax(scores)
                    confidence = scores[class_id]
                    if confidence > 0.5:
                        # Object detected
                        center_x = int(detection[0] * frame.shape[1])
                        center_y = int(detection[1] * frame.shape[0])
                        width = int(detection[2] * frame.shape[1])
                        height = int(detection[3] * frame.shape[0])
                        left = int(center_x - width / 2)
                        top = int(center_y - height / 2)
                        if class_id != 0:  # Skip class 0 ("person" in coco.names)
                            class_ids.append(class_id)
                        confidences.append(float(confidence))
                        boxes.append([left, top, width, height])
            # Add class_ids to list for each frame
            class_ids_list.append(class_ids)
            frames.append(frame)
            objects = []
            for class_id in class_ids_list:
                for c_id in class_id:
                    objects.append(classes[c_id])
            objects = list(set(objects))  # Keep only unique objects in the list
            objects_list.append(objects)
    cap.release()
    return objects
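
The sampling rule inside the loop (a frame is analysed whenever the 1-based frame counter is divisible by 100) can be checked without downloading anything. This is a minimal sketch; `sampled_frame_numbers` is a hypothetical helper and not part of the notebook.

```python
def sampled_frame_numbers(total_frames, step=100):
    """Return the 1-based frame numbers that the sampling rule would keep."""
    return [n for n in range(1, total_frames + 1) if n % step == 0]


# A 10-second clip at 30 fps has 300 frames.
print(sampled_frame_numbers(300))
# A clip shorter than `step` frames leaves nothing to analyse.
print(sampled_frame_numbers(99))
```

At 30 fps, a step of 100 corresponds to roughly one analysed frame every 3.3 seconds. A smaller step would cover the video more densely at the cost of more YOLO forward passes.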

The code below adds a new 'Objects' column to the dataframe, extracts the objects from the last 50 videos of the dataset using YOLO, and stores each video's list of objects in that column.
**NOTE:** **DO NOT RUN THE CELL BELOW**

In [37]:
url = 'https://raw.githubusercontent.com/Rohan581/Video-Recommendation/main/Youtube_Video_Dataset.csv'
data = pd.read_csv(url)

data = data.iloc[-50:].copy()  # Copy so new columns are not written to a slice

data['Videourl'] = "https://www.youtube.com" + data['Videourl']

url_list = data['Videourl'].values.tolist()

data['Objects'] = None
data['Objects'] = data['Objects'].astype(object)

for i, video_url in enumerate(url_list):
    ext_objs = extract_frames_with_yolo(video_url)
    print(ext_objs)
    # Write to the row's own index label instead of a hard-coded offset
    data.at[data.index[i], 'Objects'] = ext_objs

data.to_csv('Extracted Objects.csv')
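
A list stored in a CSV cell is written out as its string representation, which is why the 'Objects' column has to be parsed again after loading. The sketch below shows that round trip with the standard library only; the row and its `Title` column are made up for illustration.

```python
import ast
import csv
import io

# Hypothetical row; the real dataset has more columns.
rows = [{"Title": "Example video", "Objects": ["dog", "car"]}]

# Write the row to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Title", "Objects"])
writer.writeheader()
writer.writerows(rows)

# Read it back: the list is now a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw = loaded[0]["Objects"]
parsed = ast.literal_eval(raw)  # Safely rebuild the Python list

print(type(raw).__name__, raw)
print(type(parsed).__name__, parsed)
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for text read from a file.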

This function detects the objects in a single image and returns their unique class names.

In [7]:
def detect_objects(image_path):

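    # Load image (cv2.imread returns None if the file cannot be read)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"Could not read image: {image_path}")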

    # Detect objects in the image using YOLO
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    layer_names = net.getLayerNames()
    output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
    outs = net.forward(output_layers)

    # Extract detected objects and their confidence scores
    class_ids = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > 0.5:
                # Object detected
                center_x = int(detection[0] * image.shape[1])
                center_y = int(detection[1] * image.shape[0])
                width = int(detection[2] * image.shape[1])
                height = int(detection[3] * image.shape[0])
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                if class_id != 0:  # Skip class 0 ("person" in coco.names)
                    class_ids.append(class_id)
                    confidences.append(float(confidence))
                    boxes.append([left, top, width, height])

    # Map class ids to names once all detections are collected; keep unique names
    objects = list({classes[c_id] for c_id in class_ids})
    return objects
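
Each row of the YOLOv3 output holds the box centre and size (normalised to the image dimensions), an objectness score, and then one score per class. The sketch below decodes one such row in plain Python, following the same arithmetic as the function above; `decode_detection` is a hypothetical helper and the sample row is made up, with three classes instead of the 80 in COCO.

```python
def decode_detection(detection, image_width, image_height, threshold=0.5):
    """Turn one YOLO output row into (class_id, confidence, [left, top, w, h]).

    Returns None when the best class score does not exceed the threshold.
    """
    scores = detection[5:]  # Per-class scores follow the 5 box/objectness values
    class_id = max(range(len(scores)), key=lambda i: scores[i])
    confidence = scores[class_id]
    if confidence <= threshold:
        return None
    # Scale the normalised centre and size to pixel units.
    center_x = int(detection[0] * image_width)
    center_y = int(detection[1] * image_height)
    width = int(detection[2] * image_width)
    height = int(detection[3] * image_height)
    # Convert from centre coordinates to the top-left corner.
    left = int(center_x - width / 2)
    top = int(center_y - height / 2)
    return class_id, confidence, [left, top, width, height]


# Made-up row: centre (0.5, 0.5), size (0.25, 0.5), objectness 0.9, 3 class scores.
row = [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05]
print(decode_detection(row, image_width=640, image_height=480))
```

For this row the best class is index 1 with score 0.8, and the box is 160 by 240 pixels with its top-left corner at (240, 120).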

The function below calculates the cosine similarity between two lists of strings.

In [8]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cosine_similarity_strings(str_list1, str_list2):
    # With no labels on either side there is nothing to compare
    # (CountVectorizer raises an error when both strings are empty)
    if not str_list1 or not str_list2:
        return 0.0

    # Combine the input lists of strings into single strings
    str1 = ' '.join(str_list1)
    str2 = ' '.join(str_list2)

    # Turn the two strings into a sparse matrix of token counts (one row per string)
    counts = CountVectorizer().fit_transform([str1, str2])

    # Compute the cosine similarity between the two rows
    cosine_sim = cosine_similarity(counts[0], counts[1])[0][0]

    return cosine_sim
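
To make the score easier to interpret, here is the same calculation written out with the standard library. It is a sketch that assumes `CountVectorizer`'s default settings (lowercasing, tokens of two or more word characters); `token_counts` and `cosine` are hypothetical names.

```python
import math
import re
from collections import Counter

# Mirrors CountVectorizer's default token pattern: words of 2+ characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def token_counts(labels):
    """Count the word tokens in a list of labels after joining them."""
    return Counter(TOKEN_PATTERN.findall(" ".join(labels).lower()))


def cosine(labels_a, labels_b):
    """Cosine similarity between the token-count vectors of two label lists."""
    a, b = token_counts(labels_a), token_counts(labels_b)
    dot = sum(a[token] * b[token] for token in a)  # Missing tokens count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine(["dog", "car"], ["dog"]))
print(cosine(["baseball bat"], ["baseball glove"]))
```

The first pair shares one of two tokens and scores about 0.71. The second pair shows a side effect of joining the labels into one string: multi-word class names are split into separate tokens, so the two different classes `baseball bat` and `baseball glove` still score about 0.5 because they share the token `baseball`.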

To save time, I have already extracted the objects from 50 data points, of which around 32 were valid. I saved the extracted objects as explained above and stored them in a **csv** file in my GitHub account. Run the cell below to upload an image and get the videos similar to it, sorted by similarity score.

In [None]:
from google.colab import files

data = pd.read_csv('https://raw.githubusercontent.com/Rohan581/Video-Recommendation/main/Extracted%20Objects.csv')
data['Objects'] = data['Objects'].apply(ast.literal_eval)


uploaded = files.upload()
filename = list(uploaded.keys())[0]

# Rename the file
os.rename(filename, 'input_file.jpg')
objects = detect_objects('input_file.jpg')
# Score every video against the objects found in the uploaded image
data['Similarity'] = data['Objects'].apply(
    lambda video_objects: cosine_similarity_strings(objects, video_objects)
)

data = data[data['Similarity'] > 0]
data = data.sort_values(by='Similarity', ascending=False)  # sort_values returns a new dataframe

for index, row in data.iterrows():
    print(f"Title: {row.iloc[0]} , Link: {row.iloc[1]}")
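
The final filter-and-rank step can be sketched on plain Python data. The titles and scores below are made up and only stand in for the dataframe rows.

```python
# Hypothetical (title, similarity) pairs standing in for the dataframe rows.
scored = [("Cooking pasta", 0.0), ("Dog training", 0.71), ("City traffic", 0.5)]

# Drop videos that share no objects with the image, then rank the rest.
matches = [item for item in scored if item[1] > 0]
ranked = sorted(matches, key=lambda item: item[1], reverse=True)

for title, score in ranked:
    print(f"{title}: {score:.2f}")
```

Like `DataFrame.sort_values`, `sorted` returns a new object rather than reordering in place, so the result has to be kept in a variable for the ordering to take effect.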