#### **Import Pandas and Load the Dataset**
We're using a tool called **pandas** to help organize and work with data more easily.



In [10]:
# pandas provides the DataFrame structure used to load and inspect the dataset.
import pandas as pd  # type: ignore

# Load 'data_preprocessing.csv' from the 'data' folder.
# Each row holds a movie ID, its title, and a 'tags' text field.
df = pd.read_csv('data/data_preprocessing.csv')  # Forward slashes work across operating systems.

# Now, we'll take a quick look at the first five rows of the data to understand what it looks like.
df.head()



Unnamed: 0,movie_id,title,tags
0,19995,Avatar,"In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,Following the death of District Attorney Harve...
4,49529,John Carter,"John Carter is a war-weary, former military ca..."


#### **Import CountVectorizer for Text Feature Extraction**
We will use **CountVectorizer** from the `sklearn` library to convert text data into a matrix of token counts. This helps in preparing the text data for machine learning models.

In [11]:
from sklearn.feature_extraction.text import CountVectorizer
# Keep the 5,000 most frequent terms and drop common English stop words.
cv = CountVectorizer(max_features=5000, stop_words='english')

In [12]:
# Learn the vocabulary from 'tags' and build a dense (movies x terms) count matrix.
vector = cv.fit_transform(df['tags']).toarray()

In [13]:
vector.shape

(4809, 5000)
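
To make the idea of a "matrix of token counts" concrete, here is a minimal plain-Python sketch of the same bag-of-words idea. It is not how `CountVectorizer` is implemented: the toy corpus, the tiny stop-word set, and the helper names `tokenize` and `count_vectors` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the notebook itself uses df['tags'].
docs = [
    "a marine explores an alien moon",
    "a pirate captain sails the sea",
    "the marine and the pirate meet at sea",
]

# Tiny illustrative stop-word set (the 'english' list in scikit-learn is much longer).
stop_words = {"a", "an", "and", "at", "the"}


def tokenize(text):
    # Lowercase, keep tokens of two or more word characters, drop stop words.
    tokens = re.findall(r"\b\w\w+\b", text.lower())
    return [t for t in tokens if t not in stop_words]


def count_vectors(docs, max_features=None):
    # Count the tokens of each document separately.
    counts = [Counter(tokenize(d)) for d in docs]
    # Total frequency of each term across the whole corpus.
    totals = Counter()
    for c in counts:
        totals.update(c)
    # Keep the most frequent terms, then order the columns alphabetically.
    vocab = sorted(term for term, _ in totals.most_common(max_features))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


vocab, matrix = count_vectors(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row corresponds to one document and each column to one vocabulary term, which is the same layout as the `(4809, 5000)` array above: 4,809 movies by 5,000 terms.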

#### **Import Cosine Similarity for Measuring Similarity**
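We will use **cosine_similarity** from the `sklearn` library to compute the pairwise similarity between the movies' count vectors. Called with a single matrix, it returns a square matrix whose entry (i, j) is the cosine similarity between row i and row j.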


In [14]:
from sklearn.metrics.pairwise import cosine_similarity

In [15]:
similarity = cosine_similarity(vector)
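
For intuition, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python on a hypothetical three-row matrix; the helper names `cosine` and `pairwise_cosine` are illustrative, and returning 0.0 for an all-zero vector is a choice made here rather than something taken from the notebook.

```python
import math


def cosine(u, v):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Assumption: an all-zero vector is similar to nothing.
    return dot / (norm_u * norm_v)


def pairwise_cosine(rows):
    # Square matrix: entry [i][j] compares row i with row j.
    return [[cosine(u, v) for v in rows] for u in rows]


# Hypothetical count vectors; row 2 is row 0 scaled by two.
rows = [
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [2, 0, 2, 2],
]
toy_similarity = pairwise_cosine(rows)
for line in toy_similarity:
    print([round(x, 3) for x in line])
```

Rows 0 and 2 point in the same direction, so their similarity is 1 up to floating-point rounding even though their counts differ. This is why cosine similarity suits count vectors: it compares the mix of terms rather than the raw length of the text.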

In [16]:
df[df['title'] == 'The Lego Movie'].index[0]

np.int64(744)

#### **Define a Function to Recommend Movies**
The following function `recommend` takes a movie title as input and suggests similar movies based on precomputed similarity scores.


In [17]:
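def recommend(movie):
    # Row index of the requested title (raises IndexError if the title is not in df).
    index = df[df['title'] == movie].index[0]
    # Pair every movie index with its similarity score, highest first.
    distances = sorted(enumerate(similarity[index]), key=lambda x: x[1], reverse=True)
    # Skip the first entry (normally the movie itself) and print the next five titles.
    for i, _score in distances[1:6]:
        print(df.iloc[i].title)
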

In [29]:
recommend('The Dark Knight Rises')


The Dark Knight
Batman Begins
Batman
Batman Returns
Batman Forever


In [28]:
recommend('Gandhi')

The Wind That Shakes the Barley
A Passage to India
Ramanujan
Guiana 1838
Chariots of Fire
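
The `recommend` function above prints its results and fails with an `IndexError` for an unknown title. One possible variant, sketched here on plain Python lists so it runs without the dataset, returns the titles instead and yields an empty list when the title is missing. The name `top_similar`, the parameter `k`, and the toy data are assumptions for illustration, not part of the notebook.

```python
def top_similar(titles, similarity, movie, k=5):
    # Unknown title: return an empty list instead of raising.
    if movie not in titles:
        return []
    index = titles.index(movie)
    # Rank every other movie by its similarity to the requested one, highest first.
    ranked = sorted(
        (i for i in range(len(titles)) if i != index),
        key=lambda i: similarity[index][i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:k]]


# Hypothetical titles and a matching 4 x 4 similarity matrix.
toy_titles = ["A", "B", "C", "D"]
toy_scores = [
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

print(top_similar(toy_titles, toy_scores, "A", k=2))
print(top_similar(toy_titles, toy_scores, "Z"))
```

Excluding the query index explicitly, rather than skipping the first sorted entry, keeps the result correct even when another movie has an identical tag vector and therefore also scores 1.0.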
