<a href="https://colab.research.google.com/github/GaurRitika/Movie_Recommender_System/blob/main/Movie_Recommender_system.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Types of recommender systems
---

## 1️⃣ Content-Based Recommender System

**Logic:** *“Show content similar to what the user already likes”*

* Uses item features
  (movie genre, actors, director, keywords, description)
* Builds a user profile from what the user likes or watches

### Tech used

* TF-IDF
* Cosine similarity
* Embeddings
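
To make the TF-IDF and cosine-similarity idea concrete, here is a minimal pure-Python sketch. The three toy `docs` strings and the helper names `tfidf_vectors` and `cosine` are invented for illustration, and the smoothed IDF formula is one common variant, not necessarily the one a given library uses.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Term frequency times a smoothed inverse document frequency
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "action adventure space war",      # toy description of movie 0
    "action adventure space travel",   # toy description of movie 1
    "romance comedy wedding",          # toy description of movie 2
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])   # movies 0 and 1 share three terms
sim_02 = cosine(vecs[0], vecs[2])   # movies 0 and 2 share no terms
print(sim_01 > sim_02)
```

Movies with overlapping descriptions get a higher score, which is what a content-based recommender ranks by.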

### Pros

* Works well for new users, since a few likes are enough to start
* Personalised recommendations

### Cons

* Limited variety: long-time users keep seeing similar content

---

## 2️⃣ Collaborative Filtering Recommender System

**Logic:** *“Show what similar people like”*

### Types:

#### a) User-Based CF

* Find similar users

#### b) Item-Based CF

* Find similar items

### Tech used

* User-Item Matrix
* Cosine / Pearson similarity
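
As a rough illustration of item-based CF on a user-item matrix, the sketch below compares rating columns with cosine similarity. The ratings, the user names, and the helpers `item_vector` and `most_similar_item` are all hypothetical. Treating an unrated movie as 0 is a simplification.

```python
import math

# Hypothetical user-item matrix: one row per user, one rating per movie (0 = not rated)
titles = ["Avatar", "Spectre", "John Carter"]
ratings = {
    "user_a": [5, 4, 0],
    "user_b": [4, 5, 1],
    "user_c": [1, 0, 5],
}

def item_vector(j):
    """Column j of the matrix: every user's rating for one movie."""
    return [row[j] for row in ratings.values()]

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def most_similar_item(title):
    """Item-based CF: the other movie whose rating column is closest."""
    j = titles.index(title)
    scores = {
        other: cosine(item_vector(j), item_vector(k))
        for k, other in enumerate(titles) if k != j
    }
    return max(scores, key=scores.get)

print(most_similar_item("Avatar"))
```

User-based CF would do the same comparison between rows (users) instead of columns (movies).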

### Pros

* Diverse recommendations

### Cons

* Cold start problem (new user / new movie)

---

## 3️⃣ Hybrid Recommender System

**Logic:** Content-Based + Collaborative

### Example

* Netflix
* Amazon

### Pros

* Reduces the cold-start problem
* Better accuracy

### Cons

* System becomes complex
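
One simple way to combine the two approaches is a weighted blend of their scores. The sketch below is only an assumption about how such a blend could look: the function `hybrid_score`, the weight `alpha`, and the example scores are hypothetical and are not taken from any real system.

```python
def hybrid_score(content_score, collab_score=None, alpha=0.5):
    """Blend a content-based score with a collaborative score.

    alpha is the weight given to the content-based score. When no
    collaborative score exists yet (cold start), fall back to content only.
    """
    if collab_score is None:
        return content_score
    return alpha * content_score + (1 - alpha) * collab_score

# A movie with both signals, and a brand-new movie with no ratings yet
print(hybrid_score(0.8, 0.4))
print(hybrid_score(0.8))
```

The fallback branch is what makes the cold-start problem less severe: a new movie can still be scored from its content alone.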

---

## 4️⃣ Knowledge-Based Recommender System

**Logic:** *Based on rules and conditions*

### Example

* “I want a family movie under 2 hours”

### Used when

* User history is limited
* Explicit requirements are needed
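
The family-movie example can be sketched as a plain filter over a catalogue. The catalogue entries, field names, and the function `recommend_by_rules` below are invented for illustration.

```python
# Hypothetical catalogue; titles, genres and runtimes are made up
catalogue = [
    {"title": "Movie A", "genres": ["Family", "Animation"], "runtime": 95},
    {"title": "Movie B", "genres": ["Family"], "runtime": 130},
    {"title": "Movie C", "genres": ["Horror"], "runtime": 90},
]

def recommend_by_rules(items, genre, max_runtime):
    """Keep only the items that satisfy every explicit requirement."""
    return [
        item["title"]
        for item in items
        if genre in item["genres"] and item["runtime"] < max_runtime
    ]

# "A family movie under 2 hours" (120 minutes)
picks = recommend_by_rules(catalogue, genre="Family", max_runtime=120)
print(picks)
```

No user history is involved: the result depends only on the stated rules.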

---

## 5️⃣ Popularity-Based Recommender

**Logic:** *Show content that is trending*

### Example

* Trending movies
* Most watched

### Pros

* Simple
* Cold start isn't an issue

### Cons

* No personalisation
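
A minimal sketch of the idea: rank by a popularity signal and return the same top list to everyone. The `vote_count` values are the ones that appear in the `movies.head()` output of this notebook. Using `vote_count` as the popularity signal and the helper name `most_popular` are assumptions made for the example.

```python
# vote_count values for the first five movies in the dataset
vote_counts = {
    "Avatar": 11800,
    "Pirates of the Caribbean: At World's End": 4500,
    "Spectre": 4466,
    "The Dark Knight Rises": 9106,
    "John Carter": 2124,
}

def most_popular(counts, n=3):
    """Return the n titles with the highest counts, best first."""
    return sorted(counts, key=counts.get, reverse=True)[:n]

top3 = most_popular(vote_counts)
print(top3)
```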

---

## 6️⃣ Demographic-Based Recommender

**Logic:** Based on age, gender, and location

### Example

* Teen users → teen movies
* Kids → cartoon

---

## 7️⃣ Model-Based Recommender (Advanced)

**Logic:** Use ML / DL models

### Tech

* Matrix Factorization (SVD)
* Neural Networks
* Deep Learning
* LLM-based recommenders (recent)

### Used in

* Netflix
* YouTube
* Spotify

Of course, in this notebook we are going to build a content-based recommender.


In [238]:
import pandas as pd
import numpy as np


In [239]:
movies = pd.read_csv('/content/sample_data/tmdb_5000_movies.csv')
credits = pd.read_csv('/content/sample_data/tmdb_5000_credits.csv')

In [240]:
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


In [241]:
credits.head()

Unnamed: 0,movie_id,title,cast,crew
0,19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [242]:
credits.head(1)['cast'].values

array(['[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_id": "52fe48009251416c750ac9cb", "gender": 1, "id": 8691, "name": "Zoe Saldana", "order": 1}, {"cast_id": 25, "character": "Dr. Grace Augustine", "credit_id": "52fe48009251416c750aca39", "gender": 1, "id": 10205, "name": "Sigourney Weaver", "order": 2}, {"cast_id": 4, "character": "Col. Quaritch", "credit_id": "52fe48009251416c750ac9cf", "gender": 2, "id": 32747, "name": "Stephen Lang", "order": 3}, {"cast_id": 5, "character": "Trudy Chacon", "credit_id": "52fe48009251416c750ac9d3", "gender": 1, "id": 17647, "name": "Michelle Rodriguez", "order": 4}, {"cast_id": 8, "character": "Selfridge", "credit_id": "52fe48009251416c750ac9e1", "gender": 2, "id": 1771, "name": "Giovanni Ribisi", "order": 5}, {"cast_id": 7, "character": "Norm Spellman", "credit_id": "52fe48009251416c750ac9dd", "ge

In [243]:
credits.head(1)['crew'].values

array(['[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cam

Working with two separate datasets is cumbersome, so let's merge them on the movie ID or the movie title.

In [244]:
movies.shape

(4803, 20)

In [245]:
credits.shape

(4803, 4)

In [246]:
# Merge both datasets on the shared 'title' column

movies = movies.merge(credits, on='title')

In [247]:
movies.shape

(4809, 23)
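
The row count grows from 4803 to 4809 after the merge. A likely reason is that a few titles occur more than once, so merging on `title` pairs every matching row on the left with every matching row on the right. The toy frames below (invented data) reproduce the effect and show merging on the ID columns as one possible alternative. Note that this is not what the notebook does, and on the real data it would change the resulting column names.

```python
import pandas as pd

# Toy frames (invented data): two different movies share the title "Batman"
movies_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
})
credits_toy = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Batman", "Batman", "Avatar"],
    "cast": ["cast 1", "cast 2", "cast 3"],
})

# Merging on title: each "Batman" row matches both "Batman" credits rows
by_title = movies_toy.merge(credits_toy, on="title")
print(by_title.shape)

# Merging on the unique IDs keeps exactly one row per movie
by_id = movies_toy.merge(
    credits_toy.drop(columns="title"), left_on="id", right_on="movie_id"
)
print(by_id.shape)
```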

In [248]:
movies.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,...,runtime,spoken_languages,status,tagline,title,vote_average,vote_count,movie_id,cast,crew
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...",...,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800,19995,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de..."
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...",...,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500,285,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de..."
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...",...,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466,206647,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de..."
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...",...,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106,49026,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de..."
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]",...,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124,49529,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de..."


In [249]:
# Remove irrelevant columns.
# A high budget doesn't decide my taste in movies, so drop budget, and drop homepage too.
# Around 4510 movies have original_language set to English, so that column carries little information: drop it as well.
# Also remove original_title, popularity, production_companies, production_countries, release_date, revenue, runtime, spoken_languages, status, tagline and vote_average.

# So the columns we want to keep are:
# genres, movie_id, title, cast, crew, overview, keywords

movies[['genres' , 'movie_id' , 'title' , 'cast' , 'crew' , 'overview' , 'keywords']]

Unnamed: 0,genres,movie_id,title,cast,crew,overview,keywords
0,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",19995,Avatar,"[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","In the 22nd century, a paraplegic Marine is di...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":..."
1,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",285,Pirates of the Caribbean: At World's End,"[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","Captain Barbossa, long believed to be dead, ha...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na..."
2,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",206647,Spectre,"[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...",A cryptic message from Bond’s past sends him o...,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name..."
3,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",49026,The Dark Knight Rises,"[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...",Following the death of District Attorney Harve...,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,..."
4,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",49529,John Carter,"[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","John Carter is a war-weary, former military ca...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":..."
...,...,...,...,...,...,...,...
4804,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",9367,El Mariachi,"[{""cast_id"": 1, ""character"": ""El Mariachi"", ""c...","[{""credit_id"": ""52fe44eec3a36847f80b280b"", ""de...",El Mariachi just wants to play his guitar and ...,"[{""id"": 5616, ""name"": ""united states\u2013mexi..."
4805,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",72766,Newlyweds,"[{""cast_id"": 1, ""character"": ""Buzzy"", ""credit_...","[{""credit_id"": ""52fe487dc3a368484e0fb013"", ""de...",A newlywed couple's honeymoon is upended by th...,[]
4806,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",231617,"Signed, Sealed, Delivered","[{""cast_id"": 8, ""character"": ""Oliver O\u2019To...","[{""credit_id"": ""52fe4df3c3a36847f8275ecf"", ""de...","""Signed, Sealed, Delivered"" introduces a dedic...","[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam..."
4807,[],126186,Shanghai Calling,"[{""cast_id"": 3, ""character"": ""Sam"", ""credit_id...","[{""credit_id"": ""52fe4ad9c3a368484e16a36b"", ""de...",When ambitious New York attorney Sam is sent t...,[]


In [250]:
movies = movies[['movie_id' , 'title' , 'overview' ,'genres', 'cast' , 'crew' , 'keywords']]

In [251]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":..."


Now, in data preprocessing, we need to reduce the data to 3 columns (movie_id, title and tags) by combining overview, genres, keywords, cast and crew into one single column called tags.

This needs a lot of preprocessing, because much of the data is in an awkward format. For cast, we keep only the top 3 actors. For crew, we keep only the director's name.
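
As a preview of where the preprocessing is heading, here is a small sketch of building one tags string from already-cleaned lists. The `row` dictionary is a shortened, hand-written example. Removing the spaces inside names (so "James Cameron" becomes one token) and lowercasing are assumptions of this sketch, not steps shown in the notebook so far.

```python
# Hand-written example of one movie after cleaning (values shortened)
row = {
    "overview": ["In", "the", "22nd", "century,"],
    "genres": ["Action", "Science Fiction"],
    "keywords": ["culture clash"],
    "cast": ["Sam Worthington"],
    "crew": ["James Cameron"],
}

def squash(names):
    """Remove inner spaces so each multi-word name stays a single token."""
    return [name.replace(" ", "") for name in names]

tags = (
    row["overview"]
    + squash(row["genres"])
    + squash(row["keywords"])
    + squash(row["cast"])
    + squash(row["crew"])
)
tag_string = " ".join(tags).lower()
print(tag_string)
```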

In [252]:
# Check for missing values
movies.isnull().sum()

Unnamed: 0,0
movie_id,0
title,0
overview,3
genres,0
cast,0
crew,0
keywords,0


In [253]:
# That means there are 3 movies whose overview is missing, so drop those 3 movies.
movies.dropna(inplace = True)

# dropna() deletes every row that contains a missing value (NaN / None).
# inplace = True means the original DataFrame is changed directly.
# With inplace = False (the default), a new DataFrame is returned instead.
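
The difference between the two modes can be seen on a tiny DataFrame (invented data):

```python
import pandas as pd

df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "overview": ["some text", None, "more text"],
})

# Default (inplace=False): a new DataFrame is returned, df itself is untouched
cleaned = df.dropna()
rows_before = len(df)
rows_in_copy = len(cleaned)

# inplace=True: df itself is modified and the call returns None
result = df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_in_copy, rows_after, result)
```

In both cases the surviving rows keep their original index labels, so the index can have gaps afterwards.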

In [254]:
# Now check for duplicate rows

movies.duplicated().sum()

np.int64(0)

In [255]:
movies.iloc[0].genres

# iloc stands for integer location: pick the row at position 0, then read its genres value.
# Here this is equivalent to movies.at[0, "genres"], because the row at position 0 also has index label 0.

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
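
Position-based and label-based access only agree while the index labels match the row positions. A small sketch with invented data shows how they diverge once a row has been dropped:

```python
import pandas as pd

df = pd.DataFrame({"genres": ["g0", "g1", "g2"]})
df = df.drop(index=0)   # remove the row labelled 0; remaining labels are 1 and 2

# iloc uses the position: position 0 is now the row labelled 1
first_by_position = df.iloc[0].genres

# at uses the label: label 1 is that same row, while label 0 no longer exists
first_by_label = df.at[1, "genres"]

print(first_by_position, first_by_label, 0 in df.index)
```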

In [256]:
# This is actually a list of dictionaries, and we only want the names:
# ['Action', 'Adventure', 'Fantasy', 'Science Fiction']
# Let's create a helper function


def convert(obj):
  L = []
  for i in obj:
    L.append(i['name'])
  return L




In [257]:
convert([{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}])

['Action', 'Adventure', 'Fantasy', 'Science Fiction']

I first tried the following, which raised an error because the data is a string, not a list:

movies['genres'].apply(convert)

In [258]:
# The problem is that the content of genres is a string, not a list.
# To convert I passed [{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}], which is a list, but
# in reality the column holds '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]', which is a string. Check it:

type(movies['genres'].iloc[0])


str

In [259]:
# So first convert the string to a list
import ast

def convert(obj):
  L = []
  obj = ast.literal_eval(obj)
  for i in obj:
    L.append(i['name'])
  return L
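
The strings in these columns also look like valid JSON, so `json.loads` should work as an alternative to `ast.literal_eval`. The sketch below uses a shortened copy of the genres string shown above and checks that both parsers give the same result for it.

```python
import ast
import json

# Shortened copy of the genres string from the first row
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed_ast = ast.literal_eval(raw)   # parses the text as a Python literal
parsed_json = json.loads(raw)        # parses the text as JSON

names = [entry["name"] for entry in parsed_json]
print(type(raw).__name__, type(parsed_json).__name__, names)
```

One difference to keep in mind: JSON values such as `null`, `true` and `false` are not Python literals, so `ast.literal_eval` would fail on a string that contains them, while `json.loads` would not.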

In [260]:
movies['genres'].apply(convert)

Unnamed: 0,genres
0,"[Action, Adventure, Fantasy, Science Fiction]"
1,"[Adventure, Fantasy, Action]"
2,"[Action, Adventure, Crime]"
3,"[Action, Crime, Drama, Thriller]"
4,"[Action, Adventure, Science Fiction]"
...,...
4804,"[Action, Crime, Thriller]"
4805,"[Comedy, Romance]"
4806,"[Comedy, Drama, Romance, TV Movie]"
4807,[]


In [261]:
movies['genres'] = movies['genres'].apply(convert)

In [262]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[{""id"": 1463, ""name"": ""culture clash""}, {""id"":..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[{""id"": 818, ""name"": ""based on novel""}, {""id"":..."


In [263]:
# Apply the same conversion to keywords

movies['keywords'].apply(convert)

Unnamed: 0,keywords
0,"[culture clash, future, space war, space colon..."
1,"[ocean, drug abuse, exotic island, east india ..."
2,"[spy, based on novel, secret agent, sequel, mi..."
3,"[dc comics, crime fighter, terrorist, secret i..."
4,"[based on novel, mars, medallion, space travel..."
...,...
4804,"[united states–mexico barrier, legs, arms, pap..."
4805,[]
4806,"[date, love at first sight, narration, investi..."
4807,[]


In [264]:
movies['keywords'] = movies['keywords'].apply(convert)

In [265]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[{""cast_id"": 242, ""character"": ""Jake Sully"", ""...","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[{""cast_id"": 4, ""character"": ""Captain Jack Spa...","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[{""cast_id"": 1, ""character"": ""James Bond"", ""cr...","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[{""cast_id"": 2, ""character"": ""Bruce Wayne / Ba...","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[{""cast_id"": 5, ""character"": ""John Carter"", ""c...","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[based on novel, mars, medallion, space travel..."


In [266]:
# For cast we only need the first 3 dictionaries; we don't care about the rest.
# Let's make a small change to the convert helper function.

def convert3(obj):
  L = []
  counter = 0
  for i in ast.literal_eval(obj):
    if counter != 3:
      L.append(i['name'])
      counter += 1
    else:
      break
  return L
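
The counter loop can also be written with list slicing. The helper below, `top_names`, is a hypothetical generalisation with a configurable `n`. The `cast` string is a shortened version of the Avatar cast shown earlier.

```python
import ast

def top_names(obj, n=3):
    """Names of the first n entries in a stringified list of dictionaries."""
    # Slicing never fails on short lists: it simply returns fewer items
    return [entry["name"] for entry in ast.literal_eval(obj)[:n]]

cast = (
    '[{"name": "Sam Worthington", "order": 0}, '
    '{"name": "Zoe Saldana", "order": 1}, '
    '{"name": "Sigourney Weaver", "order": 2}, '
    '{"name": "Stephen Lang", "order": 3}]'
)
print(top_names(cast))
print(top_names("[]"))
```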

In [267]:
movies['cast'] = movies['cast'].apply(convert3)

In [268]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[based on novel, mars, medallion, space travel..."


In [269]:
# Let's check whether the Director role is missing from any movie's crew list
movies['crew'].iloc[0]


'[{"credit_id": "52fe48009251416c750aca23", "department": "Editing", "gender": 0, "id": 1721, "job": "Editor", "name": "Stephen E. Rivkin"}, {"credit_id": "539c47ecc3a36810e3001f87", "department": "Art", "gender": 2, "id": 496, "job": "Production Design", "name": "Rick Carter"}, {"credit_id": "54491c89c3a3680fb4001cf7", "department": "Sound", "gender": 0, "id": 900, "job": "Sound Designer", "name": "Christopher Boyes"}, {"credit_id": "54491cb70e0a267480001bd0", "department": "Sound", "gender": 0, "id": 900, "job": "Supervising Sound Editor", "name": "Christopher Boyes"}, {"credit_id": "539c4a4cc3a36810c9002101", "department": "Production", "gender": 1, "id": 1262, "job": "Casting", "name": "Mali Finn"}, {"credit_id": "5544ee3b925141499f0008fc", "department": "Sound", "gender": 2, "id": 1729, "job": "Original Music Composer", "name": "James Horner"}, {"credit_id": "52fe48009251416c750ac9c3", "department": "Directing", "gender": 2, "id": 2710, "job": "Director", "name": "James Cameron"},

In [270]:
import ast

def has_director(crew):
    crew = ast.literal_eval(crew)   # string → list
    for member in crew:
        if member['job'] == 'Director':
            return True
    return False


In [271]:
movies['has_director'] = movies['crew'].apply(has_director)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords,has_director
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[culture clash, future, space war, space colon...",True
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[ocean, drug abuse, exotic island, east india ...",True
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[spy, based on novel, secret agent, sequel, mi...",True
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[dc comics, crime fighter, terrorist, secret i...",True
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[based on novel, mars, medallion, space travel...",True


In [272]:
movies[~movies['has_director']]


Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords,has_director
3665,19615,Flying By,A real estate developer goes to his 25th high ...,[Drama],"[Billy Ray Cyrus, Heather Locklear, Ahnaise Ch...",[],[],False
3674,447027,Running Forever,After being estranged since her mother's death...,[Family],[],[],[],False
3734,26379,Paa,He suffers from a progeria like syndrome. Ment...,"[Drama, Family, Foreign]","[Amitabh Bachchan, Abhishek Bachchan, Vidya Ba...","[{""credit_id"": ""52fe44fec3a368484e042a29"", ""de...",[],False
3982,55831,Boynton Beach Club,A handful of men and women of a certain age pi...,"[Comedy, Drama, Romance]","[Brenda Vaccaro, Dyan Cannon, Joseph Bologna]",[],[independent film],False
4073,371085,Sharkskin,The Post War II story of Manhattan born Mike E...,[],[],[],[],False
4110,48382,"The Book of Mormon Movie, Volume 1: The Journey",The story of Lehi and his wife Sariah and thei...,[],"[Kirby Heyborne, Michael Flynn]",[],[],False
4123,325140,Hum To Mohabbat Karega,"Raju, a waiter, is in love with the famous TV ...",[],[],[],[],False
4128,20653,Roadside Romeo,This is the story of Romeo. A dude who was liv...,"[Animation, Family, Foreign]","[Saif Ali Khan, Kareena Kapoor, Javed Jaffrey]",[],[],False
4252,361505,Me You and Five Bucks,"A womanizing yet lovable loser, Charlie, a wai...","[Romance, Comedy, Drama]",[],[],[],False
4311,114065,Down & Out With The Dolls,"The raunchy, spunky tale of the rise and fall ...","[Comedy, Music]",[],[],[],False


I did this just to see how to find the movies whose crew has no Director entry, so now we can drop the has_director column.

In [273]:
#let's drop the has_director column
movies.drop('has_director', axis=1, inplace=True)
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]","[{""credit_id"": ""52fe48009251416c750aca23"", ""de...","[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]","[{""credit_id"": ""52fe4232c3a36847f800b579"", ""de...","[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]","[{""credit_id"": ""54805967c3a36829b5002c41"", ""de...","[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]","[{""credit_id"": ""52fe4781c3a36847f81398c3"", ""de...","[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]","[{""credit_id"": ""52fe479ac3a36847f813eaa3"", ""de...","[based on novel, mars, medallion, space travel..."


In [274]:
# In crew we only need the dictionary whose job is 'Director', and from it we extract just the name.
# Let's make another helper function.

def fetch_director(obj):
  L = []
  for i in ast.literal_eval(obj):
    if i['job'] == 'Director':
      L.append(i['name'])
      break
  return L
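
An equivalent way to write the search is `next()` over a generator, which stops at the first match. The function name `first_director` is hypothetical, and using `.get("job")` to tolerate entries without a job key is an extra precaution of this sketch. The `crew` string is a shortened version of the Avatar crew shown earlier.

```python
import ast

def first_director(obj):
    """Return [name] of the first crew member whose job is Director, else []."""
    crew = ast.literal_eval(obj)
    name = next(
        (member["name"] for member in crew if member.get("job") == "Director"),
        None,  # default when no director is listed
    )
    return [name] if name is not None else []

crew = (
    '[{"job": "Editor", "name": "Stephen E. Rivkin"}, '
    '{"job": "Director", "name": "James Cameron"}]'
)
print(first_director(crew))
print(first_director("[]"))
```

Returning an empty list for a missing director keeps the column type consistent, which matters for the movies whose crew list is empty.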

In [275]:
movies['crew'].apply(fetch_director)

Unnamed: 0,crew
0,[James Cameron]
1,[Gore Verbinski]
2,[Sam Mendes]
3,[Christopher Nolan]
4,[Andrew Stanton]
...,...
4804,[Robert Rodriguez]
4805,[Edward Burns]
4806,[Scott Smith]
4807,[Daniel Hsia]


In [276]:
movies['crew'] = movies['crew'].apply(fetch_director)

In [277]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron],"[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski],"[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...,"[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes],"[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...,"[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan],"[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,"John Carter is a war-weary, former military ca...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton],"[based on novel, mars, medallion, space travel..."


In [278]:
movies['overview'] = movies['overview'].apply(lambda x: x.split() if isinstance(x, str) else x)

In [279]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, Science Fiction]","[Sam Worthington, Zoe Saldana, Sigourney Weaver]",[James Cameron],"[culture clash, future, space war, space colon..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[Johnny Depp, Orlando Bloom, Keira Knightley]",[Gore Verbinski],"[ocean, drug abuse, exotic island, east india ..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[Daniel Craig, Christoph Waltz, Léa Seydoux]",[Sam Mendes],"[spy, based on novel, secret agent, sequel, mi..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[Christian Bale, Michael Caine, Gary Oldman]",[Christopher Nolan],"[dc comics, crime fighter, terrorist, secret i..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, Science Fiction]","[Taylor Kitsch, Lynn Collins, Samantha Morton]",[Andrew Stanton],"[based on novel, mars, medallion, space travel..."


In [280]:
# the last step is to remove the spaces inside names, e.g. 'Sam Altman' -> 'SamAltman', because there may also be a 'Sam Chadwick' and the shared token 'Sam' would make the two look alike
movies['genres'] = movies['genres'].apply(lambda x:[i.replace(" " , "") for i in x])
movies['keywords'] = movies['keywords'].apply(lambda x:[i.replace(" " , "") for i in x])
movies['cast'] = movies['cast'].apply(lambda x:[i.replace(" " , "") for i in x])
movies['crew'] = movies['crew'].apply(lambda x:[i.replace(" " , "") for i in x])

In [281]:
movies.head()
# the data is now properly cleaned

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[cultureclash, future, spacewar, spacecolony, ..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[ocean, drugabuse, exoticisland, eastindiatrad..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[spy, basedonnovel, secretagent, sequel, mi6, ..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[dccomics, crimefighter, terrorist, secretiden..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[basedonnovel, mars, medallion, spacetravel, p..."
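
Why the spaces are removed: once the tags are split into words, two different people who share a first name would contribute the same token. The sketch below illustrates this with two names that appear in the table above; the helper `tokens` is a hypothetical stand-in for word-level tokenisation, not part of the notebook.

```python
def tokens(names, collapse_spaces):
    """Lowercase word tokens for a list of names, optionally removing inner spaces first."""
    result = set()
    for name in names:
        if collapse_spaces:
            name = name.replace(" ", "")
        result.update(name.lower().split())
    return result

movie_a = ["Sam Worthington"]  # cast member of one movie
movie_b = ["Sam Mendes"]       # director of another movie

# Tokens the two movies would share with and without collapsing spaces
shared_raw = tokens(movie_a, collapse_spaces=False) & tokens(movie_b, collapse_spaces=False)
shared_collapsed = tokens(movie_a, collapse_spaces=True) & tokens(movie_b, collapse_spaces=True)

print(shared_raw)        # the two movies falsely share 'sam'
print(shared_collapsed)  # no shared token once names are collapsed
```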


In [282]:
# let's concatenate the token lists into a single 'tags' column (note: 'crew' is not included here)

movies['tags'] = movies['overview'] + movies['genres'] + movies['keywords'] + movies['cast']

In [283]:
movies.head()

Unnamed: 0,movie_id,title,overview,genres,cast,crew,keywords,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin...","[Action, Adventure, Fantasy, ScienceFiction]","[SamWorthington, ZoeSaldana, SigourneyWeaver]",[JamesCameron],"[cultureclash, future, spacewar, spacecolony, ...","[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d...","[Adventure, Fantasy, Action]","[JohnnyDepp, OrlandoBloom, KeiraKnightley]",[GoreVerbinski],"[ocean, drugabuse, exoticisland, eastindiatrad...","[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send...","[Action, Adventure, Crime]","[DanielCraig, ChristophWaltz, LéaSeydoux]",[SamMendes],"[spy, basedonnovel, secretagent, sequel, mi6, ...","[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney...","[Action, Crime, Drama, Thriller]","[ChristianBale, MichaelCaine, GaryOldman]",[ChristopherNolan],"[dccomics, crimefighter, terrorist, secretiden...","[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili...","[Action, Adventure, ScienceFiction]","[TaylorKitsch, LynnCollins, SamanthaMorton]",[AndrewStanton],"[basedonnovel, mars, medallion, spacetravel, p...","[John, Carter, is, a, war-weary,, former, mili..."


In [284]:
new_df = movies[['movie_id' , 'title' , 'tags']]

In [285]:
new_df
#amazing!!

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"[In, the, 22nd, century,, a, paraplegic, Marin..."
1,285,Pirates of the Caribbean: At World's End,"[Captain, Barbossa,, long, believed, to, be, d..."
2,206647,Spectre,"[A, cryptic, message, from, Bond’s, past, send..."
3,49026,The Dark Knight Rises,"[Following, the, death, of, District, Attorney..."
4,49529,John Carter,"[John, Carter, is, a, war-weary,, former, mili..."
...,...,...,...
4804,9367,El Mariachi,"[El, Mariachi, just, wants, to, play, his, gui..."
4805,72766,Newlyweds,"[A, newlywed, couple's, honeymoon, is, upended..."
4806,231617,"Signed, Sealed, Delivered","[""Signed,, Sealed,, Delivered"", introduces, a,..."
4807,126186,Shanghai Calling,"[When, ambitious, New, York, attorney, Sam, is..."


In [286]:
#convert the list in tags to a single string
#new_df is a slice of movies, so take an explicit copy first; this avoids pandas' SettingWithCopyWarning

new_df = new_df.copy()
new_df['tags'] = new_df['tags'].apply(lambda x:" ".join(x))


In [287]:
new_df.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


In [288]:
new_df['tags'][0]

'In the 22nd century, a paraplegic Marine is dispatched to the moon Pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. Action Adventure Fantasy ScienceFiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d SamWorthington ZoeSaldana SigourneyWeaver'

In [289]:
new_df['tags'] = new_df['tags'].apply(lambda x:x.lower())



In [290]:
new_df.head()

Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"in the 22nd century, a paraplegic marine is di..."
1,285,Pirates of the Caribbean: At World's End,"captain barbossa, long believed to be dead, ha..."
2,206647,Spectre,a cryptic message from bond’s past sends him o...
3,49026,The Dark Knight Rises,following the death of district attorney harve...
4,49529,John Carter,"john carter is a war-weary, former military ca..."


In [291]:
new_df['tags'][0]

'in the 22nd century, a paraplegic marine is dispatched to the moon pandora on a unique mission, but becomes torn between following orders and protecting an alien civilization. action adventure fantasy sciencefiction cultureclash future spacewar spacecolony society spacetravel futuristic romance space alien tribe alienplanet cgi marine soldier battle loveaffair antiwar powerrelations mindandsoul 3d samworthington zoesaldana sigourneyweaver'

In [292]:
new_df['tags'][1]

"captain barbossa, long believed to be dead, has come back to life and is headed to the edge of the earth with will turner and elizabeth swann. but nothing is quite as it seems. adventure fantasy action ocean drugabuse exoticisland eastindiatradingcompany loveofone'slife traitor shipwreck strongwoman ship alliance calypso afterlife fighter pirate swashbuckler aftercreditsstinger johnnydepp orlandobloom keiraknightley"

In [293]:
# let's convert the tags into vectors

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 5000 , stop_words = 'english')

In [294]:
#array
cv.fit_transform(new_df['tags']).toarray()

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [295]:
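# this way, every movie is represented as a vector; here we store only the shape of the resulting matrix
vectors_shape = cv.fit_transform(new_df['tags']).toarray().shape

In [296]:
vectors_shape

(4806, 5000)

In [297]:
vectors_shape[0]   # number of movies

4806

In [298]:
vectors_shape[1]   # vocabulary size (max_features)

5000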

In [299]:
cv.get_feature_names_out()

array(['000', '007', '10', ..., 'zone', 'zoo', 'zooeydeschanel'],
      dtype=object)
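
What the vectorizer is doing can be sketched from scratch on a tiny corpus. This is a simplified illustration using only the standard library, not scikit-learn's implementation: it skips stop-word removal and the `max_features` cut-off, and the three toy documents are made up.

```python
from collections import Counter

# Made-up "tags" strings for three toy movies
docs = [
    "action adventure space alien",
    "action crime spy",
    "space alien alien planet",
]

# Vocabulary: every distinct word, sorted so the column order is stable
vocab = sorted({word for doc in docs for word in doc.split()})

def to_count_vector(doc):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

toy_vectors = [to_count_vector(doc) for doc in docs]

print(vocab)
for row in toy_vectors:
    print(row)
```

Each row has one column per vocabulary word, which is why the real matrix above has 5000 columns: one per retained word.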

In [300]:
# apply stemming, as covered in NLP
!pip install nltk



In [301]:
from nltk.stem.porter import PorterStemmer
ps = PorterStemmer()

In [302]:
def stem(text):
  y = []

  for i in text.split(): # split the text into a list of words and stem each word
      y.append(ps.stem(i))

  return " ".join(y)

In [303]:
ps.stem('caring')

'care'
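
Why stemming helps the vectorizer: different surface forms of a word collapse to one token, so they are counted in the same column. The sketch below makes that effect visible without extra dependencies. It uses a deliberately crude suffix stripper, the hypothetical `toy_stem`, which is not the Porter algorithm and will give different results from `PorterStemmer` on many words.

```python
def toy_stem(word):
    """Crude suffix stripping for illustration only; NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonably long stem remains
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def stem_text(text):
    """Stem every whitespace-separated word and join the result back into a string."""
    return " ".join(toy_stem(word) for word in text.split())

stemmed = stem_text("jump jumps jumped jumping")
distinct_before = len(set("jump jumps jumped jumping".split()))
distinct_after = len(set(stemmed.split()))

print(stemmed)
print(distinct_before, "->", distinct_after)
```

Four distinct tokens become one, so a bag-of-words model spends one vocabulary slot on them instead of four.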

In [304]:
new_df['tags'].apply(stem)

Unnamed: 0,tags
0,"in the 22nd century, a parapleg marin is dispa..."
1,"captain barbossa, long believ to be dead, ha c..."
2,a cryptic messag from bond’ past send him on a...
3,follow the death of district attorney harvey d...
4,"john carter is a war-weary, former militari ca..."
...,...
4804,el mariachi just want to play hi guitar and ca...
4805,a newlyw couple' honeymoon is upend by the arr...
4806,"""signed, sealed, delivered"" introduc a dedic q..."
4807,when ambiti new york attorney sam is sent to s...


In [305]:
new_df['tags'] = new_df['tags'].apply(stem)



In [306]:
# now let's come back to the CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 5000 , stop_words = 'english')


In [307]:
vectors = cv.fit_transform(new_df['tags']).toarray()

In [308]:
vectors

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [309]:
vectors[0]

array([0, 0, 0, ..., 0, 0, 0])

In [310]:
cv.get_feature_names_out()
# now, after stemming, 'above' becomes 'abov', 'ability' becomes 'abil', 'academia' becomes 'academi', 'accident' becomes 'accid', and so on; variants of the same word are no longer repeated

array(['000', '007', '10', ..., 'zone', 'zoo', 'zooeydeschanel'],
      dtype=object)

In [311]:
# there are 4806 movies, hence 4806 vectors; now we need the similarity of every movie with every other movie
# we do not use the Euclidean distance (the straight-line distance between the vector tips); we use the cosine of the angle between the vectors
# the larger the angle, the less similar the movies; Euclidean distance is not a reliable measure in high dimensions

from sklearn.metrics.pairwise import cosine_similarity
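
The difference between the two measures can be checked by hand on small vectors. The following is a minimal from-scratch sketch using only the standard library; the helper names `cosine_sim` and `euclidean` and the three toy vectors are assumptions for illustration, not part of the notebook.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def euclidean(a, b):
    """Straight-line distance between the tips of two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

short_doc = [1, 2, 0]
long_doc = [3, 6, 0]   # same word proportions, three times as long
other_doc = [0, 0, 4]  # no words in common with short_doc

sim_same = cosine_sim(short_doc, long_doc)
sim_other = cosine_sim(short_doc, other_doc)
dist_same = euclidean(short_doc, long_doc)

print(sim_same, sim_other, dist_same)
```

The two documents with identical word proportions get a cosine similarity of 1 even though their Euclidean distance is well above zero, which is the reason length-insensitive cosine similarity is preferred for count vectors.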

In [312]:
# for each vector we compute the similarity with all 4806 vectors (itself included), so there are 4806 * 4806 values in total


similarity = cosine_similarity(vectors)

In [313]:
similarity.shape

(4806, 4806)

In [314]:
similarity[0]


array([1.        , 0.08471737, 0.08740748, ..., 0.04499213, 0.        ,
       0.        ])

In [315]:
similarity[0].shape

(4806,)

In [316]:
list(enumerate(similarity[0]))

# i.e. the similarity of movie 0 with movies 0, 1, 2, 3, 4, ..., 4805


[(0, np.float64(1.0000000000000004)),
 (1, np.float64(0.08471737420873576)),
 (2, np.float64(0.08740748201220976)),
 (3, np.float64(0.07394738666465357)),
 (4, np.float64(0.1892994097121204)),
 (5, np.float64(0.10936965981495178)),
 (6, np.float64(0.03905832834322535)),
 (7, np.float64(0.14673479641335554)),
 (8, np.float64(0.05923488777590923)),
 (9, np.float64(0.0978231976089037)),
 (10, np.float64(0.10390486669322622)),
 (11, np.float64(0.09567297464698798)),
 (12, np.float64(0.09037128496931669)),
 (13, np.float64(0.04543108504242546)),
 (14, np.float64(0.12988108336653278)),
 (15, np.float64(0.06282808624375433)),
 (16, np.float64(0.07894736842105264)),
 (17, np.float64(0.13872950617564817)),
 (18, np.float64(0.09558988911273408)),
 (19, np.float64(0.0837707816583391)),
 (20, np.float64(0.057807331301608)),
 (21, np.float64(0.10968169942141635)),
 (22, np.float64(0.06765100914917384)),
 (23, np.float64(0.08885233166386385)),
 (24, np.float64(0.05407380704358751)),
 (25, np.float64

In [317]:
# apply sorting here
sorted(list(enumerate(similarity[0])),reverse = True )
# this sorts by index number, which is not what we want

[(4805, np.float64(0.0)),
 (4804, np.float64(0.0)),
 (4803, np.float64(0.04499212706658476)),
 (4802, np.float64(0.05407380704358751)),
 (4801, np.float64(0.019389168358237032)),
 (4800, np.float64(0.0)),
 (4799, np.float64(0.052631578947368425)),
 (4798, np.float64(0.042601432284230495)),
 (4797, np.float64(0.0)),
 (4796, np.float64(0.0)),
 (4795, np.float64(0.0)),
 (4794, np.float64(0.0)),
 (4793, np.float64(0.05407380704358751)),
 (4792, np.float64(0.0)),
 (4791, np.float64(0.0)),
 (4790, np.float64(0.057353933467640436)),
 (4789, np.float64(0.060833032924035954)),
 (4788, np.float64(0.0)),
 (4787, np.float64(0.019672236884115842)),
 (4786, np.float64(0.0)),
 (4785, np.float64(0.020121090914638345)),
 (4784, np.float64(0.043355498476206004)),
 (4783, np.float64(0.0)),
 (4782, np.float64(0.027036903521793755)),
 (4781, np.float64(0.0582716546748065)),
 (4780, np.float64(0.0)),
 (4779, np.float64(0.0)),
 (4778, np.float64(0.0)),
 (4777, np.float64(0.11470786693528087)),
 (4776, np.flo

In [318]:
sorted(list(enumerate(similarity[0])),reverse = True , key = lambda x:x[1])

[(0, np.float64(1.0000000000000004)),
 (1214, np.float64(0.2847987184339659)),
 (2405, np.float64(0.26600795837367097)),
 (3728, np.float64(0.2605130246476754)),
 (507, np.float64(0.257841025556124)),
 (539, np.float64(0.25038669783359574)),
 (582, np.float64(0.24511108480187255)),
 (1202, np.float64(0.24455799402225922)),
 (1192, np.float64(0.2367785320221084)),
 (61, np.float64(0.23179316248638276)),
 (778, np.float64(0.23174488732966075)),
 (4046, np.float64(0.2278389747471728)),
 (1916, np.float64(0.2252817784447915)),
 (2782, np.float64(0.21853668936906193)),
 (972, np.float64(0.2108663315950723)),
 (322, np.float64(0.2105263157894737)),
 (172, np.float64(0.2075143391598224)),
 (151, np.float64(0.20751433915982237)),
 (973, np.float64(0.2073221072156823)),
 (2329, np.float64(0.20647416048350561)),
 (74, np.float64(0.20443988269091456)),
 (3606, np.float64(0.20437977982832192)),
 (260, np.float64(0.20395079136182276)),
 (4190, np.float64(0.2029530274475215)),
 (1440, np.float64(0.2

In [319]:
# top 5 most similar movies (position 0 is skipped because it is the movie itself)

sorted(list(enumerate(similarity[0])),reverse = True , key = lambda x:x[1])[1:6]

[(1214, np.float64(0.2847987184339659)),
 (2405, np.float64(0.26600795837367097)),
 (3728, np.float64(0.2605130246476754)),
 (507, np.float64(0.257841025556124)),
 (539, np.float64(0.25038669783359574))]
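
The enumerate-then-sort pattern used above can be tried on a short list. This is a small sketch with made-up scores; the variable names are illustrative only.

```python
# Made-up similarities of movie 0 to movies 0..4
scores = [1.0, 0.08, 0.28, 0.19, 0.26]

# Attach each score to its original position before sorting
indexed = list(enumerate(scores))

# Sort by the score (second element of each pair), highest first
by_score = sorted(indexed, key=lambda pair: pair[1], reverse=True)

# Skip the first entry, which is the movie compared with itself
top_3 = by_score[1:4]
top_3_indices = [idx for idx, _ in top_3]

print(top_3)
print(top_3_indices)
```

Slicing from 1 assumes the movie itself ranks first. That holds when no other row has an identical vector, but an exact duplicate entry could tie with it.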

In [320]:
# let's sort them and fetch the top 5 movies
# similarity[0] holds the similarity of movie 0 with movies 0, 1, 2, ..., 4805; sorting the values directly would lose the original positions
# so we keep the position attached to each score with the enumerate function
def recommend(movie):
  # positional row of the movie; new_df's index labels have gaps (they run to 4807 for 4806 rows), so convert the label to a position
  movie_index = new_df.index.get_loc(new_df[new_df['title'] == movie].index[0])
  distances = similarity[movie_index]

  # sort by similarity, highest first, and skip the first entry (the movie itself)
  movies_list = sorted(list(enumerate(distances)),reverse = True , key = lambda x:x[1])[1:6]

  for i in movies_list:
    print(i[0])

  return

In [321]:
recommend('Avatar')
# this prints the positional indices of the recommendations

1214
2405
3728
507
539


In [322]:
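# so print the titles instead

def recommend(movie):
  # positional row of the movie; new_df's index labels have gaps (they run to 4807 for 4806 rows), so convert the label to a position
  movie_index = new_df.index.get_loc(new_df[new_df['title'] == movie].index[0])
  distances = similarity[movie_index]

  # sort by similarity, highest first, and skip the first entry (the movie itself)
  movies_list = sorted(list(enumerate(distances)),reverse = True , key = lambda x:x[1])[1:6]

  for i in movies_list:
    print(new_df.iloc[i[0]].title)  # iloc looks the title up by position

  return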

In [323]:
recommend('Avatar')

Aliens vs Predator: Requiem
Aliens
Falcon Rising
Independence Day
Titan A.E.
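
A variant that returns the titles instead of printing them, and that does not fail on an unknown title, could look like the sketch below. It is an illustration under assumptions rather than the notebook's method: `recommend_titles`, `toy_titles` and `toy_similarity` are hypothetical names, and the 4 x 4 matrix is made up so the example runs without the dataset.

```python
def recommend_titles(movie, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `movie`, or [] if the title is unknown."""
    if movie not in titles:
        return []
    movie_pos = titles.index(movie)  # positional row in the similarity matrix
    ranked = sorted(enumerate(similarity[movie_pos]), key=lambda pair: pair[1], reverse=True)
    # Drop the query movie by position rather than assuming it is ranked first
    return [titles[pos] for pos, _ in ranked if pos != movie_pos][:top_n]

# Made-up data: four toy movies and a symmetric similarity matrix
toy_titles = ["A", "B", "C", "D"]
toy_similarity = [
    [1.0, 0.2, 0.7, 0.1],
    [0.2, 1.0, 0.3, 0.9],
    [0.7, 0.3, 1.0, 0.4],
    [0.1, 0.9, 0.4, 1.0],
]

result = recommend_titles("A", toy_titles, toy_similarity, top_n=2)
missing = recommend_titles("Z", toy_titles, toy_similarity)

print(result)
print(missing)
```

Returning a list makes the function easier to reuse, for example in a web front end, and the explicit membership check replaces the `IndexError` that `.index[0]` raises when the title is not found.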
