# NLP (Natural Language Processing)
A branch of AI that focuses on the interaction between computers and human language.

* Text Preprocessing
* Language Understanding
* Sentiment Analysis
* Information Extraction
* Text Summarization

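* Tokenization --> Breaking text into smaller units (tokens) such as words or phrases
* Lemmatization & Stemming --> Techniques used to reduce words to their root / base form
* POS Tagging --> Labeling each word with its part of speech (noun, verb, adjective, ...)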
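
The preprocessing steps above can be illustrated with a minimal, standard-library-only sketch. The sample sentence, the `toy_stem` helper, and the `TOY_LEMMAS` lookup are all hypothetical stand-ins; real stemmers and lemmatizers use far richer rules and vocabularies.

```python
import re

# Hypothetical sample sentence, not taken from the dataset.
text = "The cats were running faster than the dogs."

# Tokenization: split the text into lowercase word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

def toy_stem(word):
    """Crude stemmer: strip one common suffix (illustration only)."""
    for suffix in ("ing", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Toy lemma lookup; real lemmatizers use a full vocabulary and POS tags.
TOY_LEMMAS = {"were": "be", "running": "run", "cats": "cat",
              "dogs": "dog", "faster": "fast"}

stems = [toy_stem(t) for t in tokens]
lemmas = [TOY_LEMMAS.get(t, t) for t in tokens]

print(tokens)
print(stems)
print(lemmas)
```

Note how the stem of "running" comes out as "runn" while the lemma is "run": stemming only chops suffixes, so its output need not be a dictionary word, whereas lemmatization maps to a real base form.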

# Recommendation System

* Content-Based Filtering
* Collaborative Filtering
  * User-Based
  * Item-Based
* Hybrid
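
The notebook below implements content-based filtering only. For contrast, here is a rough sketch of the item-based collaborative idea, assuming a small made-up rating matrix (the `ratings` values are hypothetical). Items are compared by how users rated them, not by their descriptions.

```python
import numpy as np

# Hypothetical user x item rating matrix (rows: users, columns: items); 0 = not rated.
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Item-based: cosine similarity between item columns.
norms = np.linalg.norm(ratings, axis=0)
item_sim = (ratings.T @ ratings) / np.outer(norms, norms)

# Items ranked by similarity to item 0, excluding item 0 itself.
order = np.argsort(-item_sim[0])
most_similar = [int(i) for i in order if i != 0]

print(most_similar[0])
```

A user-based variant would compare the rows (users) instead of the columns, and a hybrid system would combine such scores with content-based ones.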

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('/content/netflix_titles.csv')

data

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...
...,...,...,...,...,...,...,...,...,...,...,...,...
8802,s8803,Movie,Zodiac,David Fincher,"Mark Ruffalo, Jake Gyllenhaal, Robert Downey J...",United States,"November 20, 2019",2007,R,158 min,"Cult Movies, Dramas, Thrillers","A political cartoonist, a crime reporter and a..."
8803,s8804,TV Show,Zombie Dumb,,,,"July 1, 2019",2018,TV-Y7,2 Seasons,"Kids' TV, Korean TV Shows, TV Comedies","While living alone in a spooky town, a young g..."
8804,s8805,Movie,Zombieland,Ruben Fleischer,"Jesse Eisenberg, Woody Harrelson, Emma Stone, ...",United States,"November 1, 2019",2009,R,88 min,"Comedies, Horror Movies",Looking to survive in a world taken over by zo...
8805,s8806,Movie,Zoom,Peter Hewitt,"Tim Allen, Courteney Cox, Chevy Chase, Kate Ma...",United States,"January 11, 2020",2006,PG,88 min,"Children & Family Movies, Comedies","Dragged from civilian life, a former superhero..."


In [3]:
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(stop_words='english')

In [4]:
# fit_transform raises an error on NaN, so replace missing descriptions with ''
tfidf_matrix = tfidf_vectorizer.fit_transform(data['description'].fillna(''))

In [5]:
from sklearn.metrics.pairwise import cosine_similarity

cosine_sim = cosine_similarity(tfidf_matrix, tfidf_matrix)
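
To see what the TF-IDF and cosine similarity steps produce, the same two scikit-learn calls can be run on a tiny corpus. The three descriptions below are made up for illustration and are not rows from `netflix_titles.csv`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical descriptions, not taken from the dataset.
docs = [
    "A detective hunts a serial killer in the city",
    "A detective investigates a killer in a small town",
    "Two friends open a bakery and fall in love",
]

# Each row is one document, each column one non-stop-word term.
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(docs)

# toy_sim[i, j] is the cosine similarity between documents i and j.
toy_sim = cosine_similarity(toy_matrix)

print(toy_sim.round(2))
```

The two detective descriptions share terms and so score above zero against each other, while the bakery description shares no non-stop-word terms with the first and scores zero. Each document has similarity 1 with itself, which is why the recommender has to exclude the query title.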

In [8]:
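def get_recommendation(title, num_recommendations=10, cosine_sim=cosine_sim):
  # Case-insensitive match on the title column
  matches = data.index[data['title'].str.lower() == title.lower()]

  if len(matches) == 0:
    return "Title not found in the dataset"

  # Use the first match; assumes the default RangeIndex, so labels equal row positions
  idx = matches[0]

  # Pair every row position with its similarity to the query title
  sim_scores = list(enumerate(cosine_sim[idx]))

  # Drop the query title itself, then sort by similarity, highest first
  sim_scores = [s for s in sim_scores if s[0] != idx]
  sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

  top_indices = [i[0] for i in sim_scores[:num_recommendations]]

  recommendations = data.loc[top_indices, ['title', 'type']].reset_index(drop=True)

  return recommendations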
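
For N titles the full similarity matrix has N x N entries, which can get large. One possible alternative, sketched here under the assumption that only one query is needed at a time, is to compare a single row against all rows. The `top_similar_rows` helper and the toy corpus are hypothetical and not part of the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_similar_rows(matrix, idx, k=10):
    """Return positions of the k rows most similar to row idx (hypothetical helper)."""
    # Compare one row against all rows: a 1 x N result instead of N x N.
    scores = cosine_similarity(matrix[idx:idx + 1], matrix).ravel()
    scores[idx] = -np.inf  # never return the query row itself
    return np.argsort(-scores)[:k].tolist()

# Hypothetical descriptions, not taken from the dataset.
docs = [
    "A detective hunts a serial killer in the city",
    "A detective investigates a killer in a small town",
    "Two friends open a bakery and fall in love",
]
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(docs)

nearest = top_similar_rows(toy_matrix, 0, k=1)
print(nearest)
```

This trades a little extra work per query for a much smaller memory footprint; precomputing the full matrix, as the notebook does, remains reasonable when many lookups are expected and memory allows.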

In [10]:
input_title = input('Enter Title: ')

recommendations = get_recommendation(input_title)

print(f'Recommended Titles for "{input_title}":')
recommendations

Enter Title: Dick Johnson Is Dead
Recommended Titles for "Dick Johnson Is Dead":


Unnamed: 0,title,type
0,End Game,Movie
1,The Soul,Movie
2,Moon,Movie
3,The Cloverfield Paradox,Movie
4,The Death and Life of Marsha P. Johnson,Movie
5,Kazoops!,TV Show
6,Alelí,Movie
7,Secrets in the Hot Spring,Movie
8,Tere Naal Love Ho Gaya,Movie
9,Kannum Kannum Kollaiyadithaal,Movie
