Aim of the project

Ever wondered how Netflix or Hotstar recommends new movies based on your watch history, or how Amazon or Flipkart suggests new products based on your order or search history? In this machine learning project, I build a recommendation system from the ground up that suggests movies to a user based on their preferences.

DataSet:

https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata/download?datasetVersionNumber=2

The dataset contains two CSV files, credits and movies. The credits file contains the cast and crew of each movie, and the movies file contains information such as the title and id of the movie, its budget, the languages it was released in, etc.

About Dataset

The dataset contains the metadata (cast, crew, budget, etc.) of each movie. Several of the columns contain JSON. A separate file contains the full credits for both the cast and crew. All fields are filled out by users, so don't expect them to agree on keywords, genres, ratings, or the like.
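Because several columns hold JSON, each cell has to be parsed before it can be used. Below is a minimal sketch of one way to do that with the standard library. The sample cell, the `name` field, and the helper `extract_names` are illustrative assumptions, not code taken from the notebook.

```python
import json

# Hypothetical cell value, shaped like the JSON stored in a column such as cast.
raw_cast = (
    '[{"name": "Actor A", "order": 0},'
    ' {"name": "Actor B", "order": 1},'
    ' {"name": "Actor C", "order": 2}]'
)


def extract_names(cell, limit=None):
    """Return the "name" field of every JSON object in a cell, optionally capped."""
    entries = json.loads(cell)
    names = [entry["name"] for entry in entries]
    return names if limit is None else names[:limit]


# Keep only the first two names, e.g. the top-billed cast members.
top_two = extract_names(raw_cast, limit=2)
print(top_two)
```

Capping the list with `limit` is one possible design choice: it keeps the most prominent entries and stops very long cast lists from dominating the text built from them.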

Important Features

We only need the following features:

- id
- title
- cast
- crew

Run Locally

Clone the project

  git clone https://github.com/anilbarmola/TMDV-5000-movie-Recommendation--project

Go to the project directory

  cd TMDV-5000-movie-Recommendation--project

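Install dependencies (the repository has no requirements file; pandas, scikit-learn, and Jupyter are assumed here)

  pip install pandas scikit-learn notebook

Open the notebook

  jupyter notebook EDA.ipynb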

Steps to build a Movie Recommendation System using Machine Learning

The approach to build the movie recommendation engine consists of the following steps.

1. Perform Exploratory Data Analysis (EDA) on the data
2. Build the recommendation system
3. Get recommendations

EDA

 a. Load the data
 b. Check for null values
 c. Check for duplicate values
 d. Select the relevant columns
 e. Preprocessing
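The EDA steps above could look roughly like this in pandas. This is a sketch on tiny stand-in tables, not the notebook's actual code. The column name `movie_id` and the sample rows are assumptions, and the real data would be loaded with `pd.read_csv` instead.

```python
import pandas as pd

# a. Tiny stand-ins for the two CSV files (assumed columns, made-up rows).
movies = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "title": ["Movie A", "Movie B", "Movie B", "Movie C"],
    "budget": [100, 200, 200, None],
})
credits = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "cast": ["[]", "[]", None],
    "crew": ["[]", "[]", "[]"],
})

# b. Count missing values per column.
null_counts = movies.isnull().sum()

# c. Count exact duplicate rows, then drop them.
duplicate_count = int(movies.duplicated().sum())
movies = movies.drop_duplicates()

# d. Join the two tables and keep only the columns of interest.
merged = movies.merge(credits, left_on="id", right_on="movie_id")
features = merged[["id", "title", "cast", "crew"]]

# e. Minimal preprocessing: drop rows that still have missing values.
features = features.dropna().reset_index(drop=True)
print(features)
```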

MODEL BUILDING

Our movie recommendation engine suggests movies to the user based on their metadata. The similarity between movies is calculated and then used to make recommendations. For that, the text data is preprocessed and converted into vectors using CountVectorizer. As the name suggests, CountVectorizer counts the frequency of each word and outputs a 2D matrix of counts, with one row per movie and one column per word. We ignore words like a, an, and the (these are called “stopwords”) because they occur very frequently in text and carry little meaning.

 a. Build the model (vectorization)
 b. Bag of words
 c. Cosine similarity

There are several similarity measures, such as cosine similarity and the Pearson correlation coefficient. Here, we use cosine similarity, which is the dot product of two of the count vectors produced by CountVectorizer divided by the product of their lengths.
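To make the definition concrete, here is a small pure-Python sketch of cosine similarity between two count vectors. The function and the sample vectors are illustrative. In practice a library routine such as `sklearn.metrics.pairwise.cosine_similarity` would typically compute the whole matrix at once.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Made-up count vectors: a and b point in the same direction, c shares no words.
a = [1, 1, 0]
b = [2, 2, 0]
c = [0, 0, 3]

print(cosine_similarity(a, b))
print(cosine_similarity(a, c))
```

Because the score is normalized by vector length, two movies that use the same words in the same proportions score close to 1 even if one description is much longer than the other.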

Summary

In this machine learning project, we built a content-based movie recommendation engine that makes recommendations given the title of a movie as input.
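A title-based lookup of this kind might be sketched as below. The function `recommend`, the titles, and the similarity values are all hypothetical. A real run would use the titles from the dataset and the cosine similarity matrix computed from the count vectors.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to the given title."""
    if title not in titles:
        raise KeyError(f"unknown title: {title}")
    index = titles.index(title)
    # Pair every movie index with its similarity to the chosen movie.
    scores = list(enumerate(similarity[index]))
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    # Skip the movie itself, then keep the best matches.
    return [titles[i] for i, _ in ranked if i != index][:top_n]


# Made-up titles and a made-up symmetric similarity matrix.
titles = ["Movie A", "Movie B", "Movie C"]
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]

print(recommend("Movie A", titles, similarity, top_n=2))
```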

Content-based Filtering:

Content-based filtering makes recommendations based on item metadata (for a movie, product, song, etc.). The main idea is that if a user likes an item, the user will also like items similar to it.
