# Hybrid Content Recommendation System Using AI

Course: AI Applications – Individual Open Project  
Track: Recommendation Systems (Hybrid AI Application)


## 1. Problem Definition & Objective

This project builds a hybrid recommender that blends content-based
filtering (TF-IDF over item text) with collaborative filtering (ALS over
user-item interactions) to rank items for a user based on past interactions.


In [32]:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

from implicit.als import AlternatingLeastSquares


## 2. Data Understanding & Preparation

Dataset Type: Synthetic  
Reason: Ethical safety and full explainability  
Size: 20 users, 15 items, 5 binary interactions per user (100 rows)


In [33]:
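np.random.seed(42)  # fix NumPy's RNG so the sampled interactions are reproducible

users = range(1, 21)  # user ids 1..20
items = range(1, 16)  # item ids 1..15

# Item metadata. Every item shares the same content and tags,
# so only the title differs between items.
titles = [f"Item {i}" for i in items]
contents = ["machine learning data science" for _ in items]
tags = ["ml data ai" for _ in items]

rows = []

# Each user interacts with 5 distinct items; the interaction is a binary flag (1).
for u in users:
    interacted = np.random.choice(items, size=5, replace=False)
    for i in interacted:
        rows.append([u, i, 1, titles[i-1], contents[i-1], tags[i-1]])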

df = pd.DataFrame(
    rows,
    columns=["user_id", "item_id", "interaction", "title", "content", "tags"]
)

df.head()


|   | user_id | item_id | interaction | title | content | tags |
|---|---|---|---|---|---|---|
| 0 | 1 | 10 | 1 | Item 10 | machine learning data science | ml data ai |
| 1 | 1 | 12 | 1 | Item 12 | machine learning data science | ml data ai |
| 2 | 1 | 1 | 1 | Item 1 | machine learning data science | ml data ai |
| 3 | 1 | 14 | 1 | Item 14 | machine learning data science | ml data ai |
| 4 | 1 | 6 | 1 | Item 6 | machine learning data science | ml data ai |


In [34]:
print("Dataset shape:", df.shape)


Dataset shape: (100, 6)
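
Because every item in the dataset above shares the same content and tags, TF-IDF can only tell items apart by their titles. The sketch below shows one way the synthetic text could be varied so that content similarity carries some signal. The topic names, word lists, and the `make_item_text` helper are hypothetical and not part of the original notebook.

```python
import random

# Hypothetical topic vocabularies; any word lists would do.
TOPICS = {
    "ml": ["machine", "learning", "model", "training", "neural"],
    "web": ["frontend", "browser", "javascript", "css", "http"],
    "data": ["sql", "pipeline", "warehouse", "etl", "analytics"],
}

def make_item_text(item_id, rng, n_words=4):
    """Pick one topic for the item and sample distinct words from its vocabulary."""
    topic = rng.choice(sorted(TOPICS))
    words = rng.sample(TOPICS[topic], n_words)
    return {
        "item_id": item_id,
        "topic": topic,
        "content": " ".join(words),
        "tags": topic,
    }

rng = random.Random(42)  # seeded so the catalog is reproducible
catalog = [make_item_text(i, rng) for i in range(1, 16)]
```

Items drawn from the same topic would then share vocabulary, while items from different topics would not, which gives the content-based component something to distinguish.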


## 3. Model / System Design

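The system has three components:
- TF-IDF vectors over each item's title, content, and tags for content-based filtering
- ALS (alternating least squares) matrix factorization over the user-item interaction matrix for collaborative filtering
- A weighted hybrid score, `alpha * collaborative + (1 - alpha) * content`, with `alpha = 0.6` by default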
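
The weighted blend can be sketched in isolation as follows. This version rescales both score arrays to [0, 1] before mixing, so that neither component dominates only because of its numeric range. That rescaling step is an assumption added here, since the notebook blends the raw scores. The helper names `min_max_scale` and `blend_scores` and the toy numbers are illustrative only.

```python
import numpy as np

def min_max_scale(scores):
    """Rescale a 1-D score array to [0, 1]; a constant array maps to all zeros."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def blend_scores(collab_scores, content_scores, alpha=0.6):
    """Weighted hybrid: alpha * collaborative + (1 - alpha) * content."""
    collab = min_max_scale(collab_scores)
    content = min_max_scale(content_scores)
    return alpha * collab + (1 - alpha) * content

# Toy scores for four items (made-up numbers, not output of the notebook).
collab = np.array([2.0, 0.0, 1.0, 0.5])
content = np.array([0.1, 0.9, 0.5, 0.1])

hybrid = blend_scores(collab, content, alpha=0.6)
ranking = np.argsort(-hybrid)  # item indices, best first
```

Raising `alpha` shifts weight toward the collaborative scores, and lowering it favors content similarity.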


In [35]:
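# One row per item, sorted by item_id so row i matches item_id i + 1
# (the same order as the interaction-matrix columns).
texts = df.drop_duplicates("item_id").sort_values("item_id")[["title", "content", "tags"]].copy()
texts["combined"] = texts["title"] + " " + texts["content"] + " " + texts["tags"]

vectorizer = TfidfVectorizer()
item_tfidf = vectorizer.fit_transform(texts["combined"])
# TfidfVectorizer L2-normalizes rows by default, so this is a no-op safeguard.
item_tfidf = normalize(item_tfidf)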


In [36]:
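# Users as rows, items as columns; pivot_table sorts both axes by id.
interaction_matrix = df.pivot_table(
    index="user_id",
    columns="item_id",
    values="interaction",
    fill_value=0
)

# implicit expects a sparse CSR matrix; a user-by-item orientation is assumed
# here (the convention in recent implicit releases).
interaction_csr = csr_matrix(interaction_matrix.values, dtype=np.float32)

# random_state makes the factorization reproducible across runs.
als = AlternatingLeastSquares(factors=10, iterations=5, random_state=42)
als.fit(interaction_csr)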



100%|████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 2531.87it/s]
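
The matrix built above relies on user and item ids being contiguous and starting at 1, so that row `user_id - 1` and column `item_id - 1` point at the right entries. A minimal sketch of explicit id-to-index maps, which would remove that assumption, is shown below. The `build_index` helper and the example ids are hypothetical. Sorted order is used because `pivot_table` sorts its index and columns by default.

```python
def build_index(ids):
    """Map each distinct id to a 0-based position, in sorted order."""
    ordered = sorted(set(ids))
    to_index = {raw_id: pos for pos, raw_id in enumerate(ordered)}
    return to_index, ordered

# Hypothetical, non-contiguous ids.
user_ids = [101, 205, 101, 999]
item_ids = [7, 3, 3, 42]

user_to_row, row_to_user = build_index(user_ids)
item_to_col, col_to_item = build_index(item_ids)

row = user_to_row[205]   # matrix row that holds user 205
item = col_to_item[2]    # item id stored in matrix column 2
```

With maps like these, a recommender would look up the row through `user_to_row` and translate ranked column indices back through `col_to_item`, rather than adding or subtracting 1.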


## 4. Recommendation Output


In [37]:
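def hybrid_recommend(user_id, k=5, alpha=0.6):
    """Return the top-k item ids for user_id, ranked by a weighted hybrid score."""
    # Assumes user ids are contiguous and start at 1 (matrix row 0 is user 1).
    user_index = user_id - 1

    # Collaborative part: predicted preference of this user for every item.
    collab_scores = als.user_factors[user_index] @ als.item_factors.T
    # Content part: each item's mean TF-IDF cosine similarity to all items.
    # This score is identical for every user; it is not personalized.
    content_scores = cosine_similarity(item_tfidf).mean(axis=0)

    # alpha weights the collaborative score; (1 - alpha) weights the content score.
    hybrid_scores = alpha * collab_scores + (1 - alpha) * content_scores
    # Items the user has already interacted with are not filtered out.
    top_items = np.argsort(-hybrid_scores)[:k]

    return list(top_items + 1)  # convert 0-based indices back to item ids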

recommendations = hybrid_recommend(1)

print("Recommended items for user 1:")
print(recommendations)


Recommended items for user 1:
[np.int64(12), np.int64(14), np.int64(1), np.int64(10), np.int64(6)]
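
The recommended items above are exactly the five items user 1 already interacted with, because seen items are not excluded and the content score does not depend on the user. The sketch below illustrates one possible variant: a content score measured against the user's own profile (the mean vector of their seen items), followed by a top-k selection that skips seen items. It uses small dense toy vectors. With the notebook's sparse `item_tfidf`, a conversion such as `item_tfidf.toarray()` would be needed first, which is an assumption here. Both helper names are hypothetical.

```python
import numpy as np

def personalized_content_scores(item_vectors, seen_idx):
    """Cosine similarity of every item to the mean vector of the seen items."""
    item_vectors = np.asarray(item_vectors, dtype=float)
    profile = item_vectors[seen_idx].mean(axis=0)
    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(profile)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero vectors
    return item_vectors @ profile / norms

def top_k_unseen(scores, seen_idx, k):
    """Indices of the k best-scoring items, skipping ones already seen."""
    scores = np.array(scores, dtype=float)  # copy so the caller's array is untouched
    scores[seen_idx] = -np.inf
    return np.argsort(-scores)[:k]

# Toy item vectors (rows = items, columns = hypothetical text features).
item_vectors = np.array([
    [1.0, 0.0, 0.0],  # item 0
    [0.9, 0.1, 0.0],  # item 1, close to item 0
    [0.2, 1.0, 0.0],  # item 2, weakly related to item 0
    [0.0, 0.0, 1.0],  # item 3, unrelated
])
seen = [0]  # the user has interacted with item 0 only

content = personalized_content_scores(item_vectors, seen)
recommended = top_k_unseen(content, seen, k=2)
```

In a hybrid setting the same masking step could be applied to the blended scores, so that the collaborative component cannot push known items back to the top.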


## 5. Evaluation


In [38]:
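def precision_at_k(actual, predicted, k):
    """Fraction of the top-k predicted items that appear in actual."""
    return len(set(actual) & set(predicted[:k])) / k

# Note: actual_items are the same interactions the ALS model was trained on,
# and seen items are not excluded from the recommendations, so this measures
# how well the model reproduces known interactions, not generalization.
actual_items = df[df.user_id == 1]["item_id"].tolist()
score = precision_at_k(actual_items, recommendations, 5)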

print("Precision@5:", score)


Precision@5: 1.0
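
A score of 1.0 is expected here, since the model is evaluated on the interactions it was fitted on. A common alternative is to hold out one interaction per user, train on the rest, and check whether the held-out item shows up in the top-k list. The following is a minimal sketch of that protocol in plain Python. The split helper, the hit-rate metric, and the stand-in `popular_recommender` are all hypothetical and do not reproduce the notebook's model.

```python
import random

def leave_one_out_split(user_items, seed=42):
    """Hold out one random item per user; return (train, held_out) dicts."""
    rng = random.Random(seed)
    train, held_out = {}, {}
    for user, items in user_items.items():
        items = list(items)
        test_item = rng.choice(items)
        held_out[user] = test_item
        train[user] = [i for i in items if i != test_item]
    return train, held_out

def hit_rate_at_k(recommend, held_out, k):
    """Share of users whose held-out item appears in their top-k list."""
    hits = sum(1 for user, item in held_out.items() if item in recommend(user, k))
    return hits / len(held_out)

# Toy interactions (distinct items per user) and a stand-in recommender.
user_items = {1: [10, 12, 1], 2: [3, 4, 5], 3: [7, 8, 9]}
train, held_out = leave_one_out_split(user_items)

def popular_recommender(user, k):
    """Placeholder model: a fixed candidate list minus the user's training items."""
    candidates = [10, 3, 7, 12, 4, 8]
    return [i for i in candidates if i not in train[user]][:k]

score = hit_rate_at_k(popular_recommender, held_out, k=3)
```

To evaluate the hybrid model this way, it would need to be fitted on `train` only, and its recommendation function would take the place of `popular_recommender`.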


## 6. Ethical Considerations

- Synthetic data avoids privacy issues
- No real user data is used
- No automated decision-making impact


## 7. Conclusion

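The notebook demonstrates an end-to-end hybrid recommender that blends
TF-IDF content similarity with ALS collaborative filtering through a single
weighted score. The Precision@5 of 1.0 is measured on the training
interactions, so it confirms that the pipeline runs; showing that the hybrid
improves personalization would require a held-out comparison against each
component alone.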


