agade007/Video-Recommendation-System-2.0

Video-Recommendation-System-2.0

Software Engineer - ML and Search - Coding Exercise

A program that takes a clip_id (also called a video_id) as input and outputs "similar" clips based on the clip's title, description, and/or categories. The included dataset holds metadata for around 4,000 staff picks, along with their categories.

Accepts a clip_id as its only command-line argument.
Returns the 10 most similar videos in the dataset, ordered by similarity, for a single clip_id (which must itself be in the dataset).
Returns the results as an ordered JSON list of JSON objects, with the following fields for each clip:

id
title
description
categories (comma-separated list)
image (URL of thumbnail)
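
The output format described above can be sketched as follows. This is only an illustration, not the repository's code: the input record layout (a dict with a `thumbnail` key and a list of category names) and the function names `to_result` and `render_results` are assumptions made for the example.

```python
import json


def to_result(clip):
    """Map one clip record (hypothetical dict layout) to the output object."""
    categories = clip.get("categories", [])
    # Accept either a list of names or an already comma-separated string.
    if not isinstance(categories, str):
        categories = ", ".join(categories)
    return {
        "id": clip["id"],
        "title": clip.get("title", ""),
        "description": clip.get("description", ""),
        "categories": categories,
        # "thumbnail" is an assumed input key; the output field is "image".
        "image": clip.get("thumbnail", ""),
    }


def render_results(ranked_clips, limit=10):
    """Serialize clips, already sorted most-similar first, as a JSON list."""
    return json.dumps([to_result(c) for c in ranked_clips[:limit]], indent=2)
```

A command-line entry point would read the clip_id from `sys.argv[1]`, rank the other clips, and print `render_results(...)` to standard output.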

Approaches and Deliverables:

Used an inverted index and/or TF-IDF information directly to find the closest matches.
Generated a vector/embedding for each clip and found the closest vector matches. For instance:
Used an existing word embedding model (Word2Vec) to create vectors.
Vectorized the TF-IDF information and used that as the vector.
Implemented an additional web interface where you can submit any clip_id in the dataset and get the similar-clips JSON as a response.
Submitted the work as a git repo with clean commit messages.
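
As a rough illustration of the TF-IDF vector approach listed above, the sketch below builds unit-length TF-IDF vectors from each clip's text and ranks the other clips by cosine similarity. It uses only the Python standard library and is not taken from this repository. The function names, the tokenizer, and the smoothed IDF formula are assumptions chosen for the example. A library such as scikit-learn could replace the hand-written parts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """docs: {clip_id: text}. Returns {clip_id: {term: weight}}, L2-normalized."""
    counts = {cid: Counter(tokenize(text)) for cid, text in docs.items()}
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    n = len(docs)
    vectors = {}
    for cid, term_counts in counts.items():
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vec = {
            term: tf * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, tf in term_counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors[cid] = {t: w / norm for t, w in vec.items()} if norm else {}
    return vectors


def most_similar(clip_id, vectors, k=10):
    """Return the ids of the k clips closest to clip_id, most similar first."""
    query = vectors[clip_id]
    scores = []
    for cid, vec in vectors.items():
        if cid == clip_id:
            continue
        # Vectors are unit length, so the dot product is the cosine similarity.
        score = sum(w * vec.get(term, 0.0) for term, w in query.items())
        scores.append((score, cid))
    # Highest score first; ties are broken by id for a deterministic order.
    scores.sort(key=lambda s: (-s[0], str(s[1])))
    return [cid for _, cid in scores[:k]]
```

In this sketch, the text for each clip would be its title, description, and categories concatenated into one string before calling `tfidf_vectors`.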
