<a href="https://colab.research.google.com/github/alicetw40342/AI-Recommender-System/blob/main/Project2_Task2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

📖 Research: Content-based vs Collaborative Filtering
There are two primary approaches to building recommender systems:

Content-based Filtering

Recommends items similar in content to what a user has liked.

Uses item features (e.g. movie plots, product descriptions).

Works well when user data is limited, because it needs no ratings from other users.

Collaborative Filtering

Recommends items based on the preferences of similar users.

Learns hidden relationships between users and items.

Needs a large amount of user-item interaction data, such as ratings or clicks.
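
To make the collaborative idea concrete, here is a minimal sketch of user-based collaborative filtering. It is not part of this project: the ratings matrix is made up, and the names `ratings`, `cosine`, and `predict_rating` are hypothetical. It predicts a missing rating as a similarity-weighted average of what other users gave the same item.

```python
import math

# Hypothetical ratings: one row per user, one column per item; 0 means "not rated".
ratings = [
    [5, 4, 0, 1],   # user 0 has not rated item 2
    [4, 5, 5, 1],   # user 1
    [1, 1, 2, 5],   # user 2
]

def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    weighted_sum = weight_total = 0.0
    for other, row in enumerate(ratings):
        if other == user or row[item] == 0:
            continue  # skip the user themselves and users who did not rate the item
        sim = cosine(ratings[user], row)
        weighted_sum += sim * row[item]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else 0.0

predicted = predict_rating(ratings, user=0, item=2)
print(f"Predicted rating of user 0 for item 2: {predicted:.2f}")
```

User 0's tastes are closer to user 1's than to user 2's, so the prediction lands nearer to user 1's rating of 5 than to user 2's rating of 2. With only a handful of ratings such estimates are unreliable, which is why this approach needs a lot of interaction data.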

✅ My Chosen Approach: Content-based Filtering
I chose a content-based recommender because my dataset contains textual movie plots but no user ratings. By transforming each plot into a TF-IDF vector, I can compute the cosine similarity between plots and recommend movies with similar storylines.
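
To show what the TF-IDF step does, here is a small pure-Python sketch. The three plots are made up for illustration, and the names `plots`, `tfidf_vectors`, and `cosine` are hypothetical. It uses the smoothed IDF formula and L2 normalization that scikit-learn documents as the `TfidfVectorizer` defaults, but tokenization is simplified to whitespace splitting and no stop words are removed, so the numbers will not match scikit-learn exactly.

```python
import math
from collections import Counter

# Hypothetical plots, made up for illustration (not taken from the dataset).
plots = {
    "Movie A": "a secret agent fights crime in a futuristic city",
    "Movie B": "a detective fights crime in a rainy city",
    "Movie C": "a group of friends goes on an epic road trip",
}

def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term count times smoothed IDF: ln((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors

def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

vecs = tfidf_vectors(list(plots.values()))
sim_ab = cosine(vecs[0], vecs[1])
sim_ac = cosine(vecs[0], vecs[2])
print(f"A vs B: {sim_ab:.3f}, A vs C: {sim_ac:.3f}")
```

Movies A and B share several plot words ("fights", "crime", "city"), so they score higher than A and C, which share only the word "a". Removing stop words, as the notebook does, would push the A vs C score to zero.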



In [8]:
# Install the required packages
!pip install pandas scikit-learn

# Upload the CSV file
from google.colab import files
uploaded = files.upload()

# Imports for reading the CSV and building the recommender
import pandas as pd
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Use the name of the first uploaded file
csv_file = list(uploaded.keys())[0]

df = pd.read_csv(csv_file)

# Show the column names found in the CSV
print("Original columns:", df.columns)

# Change these if your column names are different:
title_col = "Title"
plot_col = "Plot"

# Keep only the required columns and drop rows with missing values
df = df[[title_col, plot_col]].dropna()

# Drop rows whose plot is an empty string, then reset the index so that
# row labels match row positions in the TF-IDF matrix
df = df[df[plot_col].str.strip() != ""].reset_index(drop=True)

print("Total rows:", len(df))
print(df.head())

# Build the stop-word set once instead of on every call to clean_text
stopwords = set(TfidfVectorizer(stop_words='english').get_stop_words())

# Clean the text
def clean_text(text):
    text = str(text).lower()
    text = re.sub(r'[^a-z\s]', '', text)     # keep only letters and whitespace
    words = [w for w in text.split() if w not in stopwords]
    return ' '.join(words)

df['Cleaned_Plot'] = df[plot_col].apply(clean_text)

# Compute TF-IDF
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(df['Cleaned_Plot'])

print("TF-IDF matrix shape:", tfidf_matrix.shape)

# Define the recommendation function
def recommend_movies(movie_title, top_n=5):
    if movie_title not in df[title_col].values:
        print(f"Could not find {movie_title}")
        return

    # Row position of the first movie with this title
    idx = df[df[title_col] == movie_title].index[0]

    cosine_sim = cosine_similarity(tfidf_matrix[idx], tfidf_matrix).flatten()

    # Rank all movies by similarity, skip the query movie itself, keep the top_n
    ranked = cosine_sim.argsort()[::-1]
    similar_indices = [i for i in ranked if i != idx][:top_n]

    print(f'The {top_n} movies with plots most similar to "{movie_title}":\n')
    for i in similar_indices:
        print(f"{df.iloc[i][title_col]} (Similarity: {cosine_sim[i]:.4f})")

# Test the recommender
recommend_movies(df.iloc[0][title_col])



Saving movie_plots.csv to movie_plots (1).csv
Original columns: Index(['Title', 'Plot'], dtype='object')
Total rows: 100
       Title                                               Plot
0  Movie 001  A secret agent fights crime in a futuristic city.
1  Movie 002  A young woman discovers magical powers within ...
2  Movie 003      A group of friends goes on an epic road trip.
3  Movie 004  A detective investigates mysterious disappeara...
4  Movie 005  Aliens visit Earth and reveal secrets of the u...
TF-IDF matrix shape: (100, 339)
The 5 movies with plots most similar to "Movie 001":

Movie 019 (Similarity: 0.2701)
Movie 083 (Similarity: 0.1562)
Movie 092 (Similarity: 0.1474)
Movie 097 (Similarity: 0.1245)
Movie 033 (Similarity: 0.1244)
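
When ranking by similarity, the query movie normally scores 1.0 against itself, so it has to be left out of the results. Here is a small sketch of doing that by position rather than by assuming the query is always ranked first. The helper name `top_similar` and the scores are hypothetical, made up to show the case where another movie has an identical plot.

```python
def top_similar(scores, query_index, top_n=5):
    """Return (index, score) pairs for the top_n highest scores, skipping the query itself."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked if i != query_index][:top_n]

# Hypothetical similarity scores of item 0 against five items.
# Item 3 is an exact duplicate of the query, so it also scores 1.0.
scores = [1.0, 0.27, 0.0, 1.0, 0.15]
result = top_similar(scores, query_index=0, top_n=3)
print(result)
```

Because the query is excluded by its index, the duplicate item 3 still shows up as the best match instead of being dropped by accident. Returning pairs instead of printing also makes the results easier to reuse, for example in a table.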
