<a href="https://colab.research.google.com/github/MehrdadJalali-AI/RecommenderSystems/blob/main/POI_Recommender_Yelp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


<div style="background-color:#ffffff; border:3px solid #f15a22; border-radius:10px; padding:25px; text-align:center; font-family:'Segoe UI',sans-serif;">
  <h1 style="color:#003366;">Point-of-Interest (POI) Recommender System</h1>
  <h2 style="color:#f15a22;">Yelp Dataset · Collaborative Filtering · Geolocation</h2>
  <p style="color:#003366;"><strong>Author:</strong> Prof. Mehrdad Jalali | SRH University Heidelberg</p>
  <p style="font-style:italic; color:#444;">An interactive and visually engaging tutorial for building location-aware recommender systems.</p>
</div>



## 🎯 Learning Objectives

By completing this notebook, you will:
1. **Load and explore** real Yelp review data  
2. **Build a user–item matrix** for collaborative filtering  
3. **Compute cosine similarity** between businesses  
4. **Predict missing ratings** using item-item CF  
5. **Integrate geolocation** for distance-aware recommendations  
6. **Generate personalized POIs** ranked by proximity and preference  
7. **Visualize results** interactively on a map  

---

## 🧭 Notebook Roadmap

| Step | Goal |
|------|------|
| 1 | Setup & import libraries |
| 2 | Load & explore data |
| 3 | Build rating matrix |
| 4 | Compute item similarity |
| 5 | Predict ratings |
| 6 | Add distance weighting |
| 7 | Generate recommendations |
| 8 | Visualize results |
| 9 | Summary & reflections |



## Step 1️⃣: Setup & Import Libraries

We use the following:
- **pandas** / **numpy** for data manipulation  
- **cosine_similarity** from `sklearn` for CF  
- **geopy** for geographic distance  
- **folium** for map visualization  

Let's load them and confirm success.


In [1]:

import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from geopy.distance import geodesic
import folium
from IPython.display import IFrame, display
import warnings
warnings.filterwarnings("ignore")

print("✅ All libraries imported successfully.")


✅ All libraries imported successfully.



## Step 2️⃣: Load and Explore the Yelp Dataset

We use a **sample** from the Yelp dataset hosted in your course repository.  
This includes:
- User IDs  
- Business IDs  
- Ratings (`stars`)  
- Geographic coordinates  
- Categories


In [2]:

url = "https://raw.githubusercontent.com/MehrdadJalali-AI/RecommenderSystems/main/YelpDataset/yelp_sample.csv"
yelp = pd.read_csv(url)
print(f"✅ Yelp dataset loaded successfully! Shape: {yelp.shape}")
display(yelp.head())


✅ Yelp dataset loaded successfully! Shape: (5000, 6)


| | user_id | business_id | stars | latitude | longitude | categories |
|---|---|---|---|---|---|---|
| 0 | JJ-qgqyl4M3N88owKJRJHw | H3JjbID9Zhukpqr9uupHAA | 4 | 36.13663 | -86.800437 | Burgers, Bars, Restaurants, Nightlife, America... |
| 1 | 6j4NN66UESCtdcNJulB4fw | _Xo-JzgxbMaDi5cvlfHpwg | 1 | 27.994221 | -82.220255 | Restaurants, Barbeque |
| 2 | nE90dt6_P6UyFe80LUrnvQ | WbA5ud4InNWkizW7HE5kRQ | 1 | 32.22097 | -110.970157 | Mexican, Restaurants, Nightlife, Bars, Venues ... |
| 3 | kaHmTcEoVS3oQvfJfDe8kg | KhBUg5QhBYuK8RZAe5gDMQ | 5 | 27.99195 | -82.459645 | Restaurants, Nightlife, Pubs, Food, Bars, Amer... |
| 4 | SIhJRPMoUxspakLdm6NS_w | GBTPC53ZrG1ZBY3DT8Mbcw | 5 | 29.950742 | -90.070416 | German, Restaurants, Seafood, Cocktail Bars, F... |



## Step 3️⃣: Build the User–Item Rating Matrix

Each **row** represents a user, each **column** represents a business,  
and each cell stores that user's star rating. Missing ratings are filled with 0, which here means "not rated" rather than a zero-star review.


In [3]:

ratings = yelp.pivot_table(values='stars', index='user_id', columns='business_id', fill_value=0)
print(f"Matrix created → Users: {ratings.shape[0]}, Businesses: {ratings.shape[1]}")
display(ratings.iloc[:5, :5].style.background_gradient(cmap='Blues'))


Matrix created → Users: 4906, Businesses: 2034


| user_id / business_id | -1MhPXk1FglglUAmuPLIGg | -2Axhv9AZ_n7qjQefECpVw | -3AooxIkg38UyUdlz5oXdw | -7GDqSUaXrpC8Ql7nDBxWA | -ATiAtTikuGuqvaW2O6tNA |
|---|---|---|---|---|---|
| --2bpE5vyR-2hAP7sZZ4lA | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| --t1GgwabT-J_6OQG8f5QQ | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| --u09WAjW741FdfkJXxNmg | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| -0Ath8bD1-e01-oMSPw9ig | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| -0EcgtUXe1rzrkmdih_tYg | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
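
The preview is all zeros because the matrix is very sparse: 5,000 reviews spread over 4,906 × 2,034 cells can fill at most about 0.05% of it. A minimal sketch of the same pivot on a made-up review log (the toy IDs and the names `toy`, `toy_ratings`, and `density` are illustrative, not part of the Yelp sample) shows one way to measure that share:

```python
import pandas as pd

# Toy review log (hypothetical IDs, not from the Yelp sample)
toy = pd.DataFrame({
    "user_id":     ["u1", "u1", "u2", "u3"],
    "business_id": ["b1", "b2", "b2", "b3"],
    "stars":       [5, 3, 4, 2],
})

# Same pivot as in the notebook: users as rows, businesses as columns, 0 = not rated
toy_ratings = toy.pivot_table(values="stars", index="user_id",
                              columns="business_id", fill_value=0)

# Share of cells that hold an actual rating
density = (toy_ratings > 0).to_numpy().mean()
print(toy_ratings)
print(f"Density: {density:.1%}")
```

Running the last two lines of the calculation on the real `ratings` frame should give the density of the Yelp sample.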



## Step 4️⃣: Compute Item–Item Similarity

We use **cosine similarity** to measure how similar two businesses are  
based on how users rated them.


In [4]:

print("⏳ Computing cosine similarity between items...")
item_similarity = cosine_similarity(ratings.T)
item_similarity_df = pd.DataFrame(item_similarity, index=ratings.columns, columns=ratings.columns)
print("✅ Item–Item similarity matrix ready.")
display(item_similarity_df.iloc[:5, :5].style.background_gradient(cmap='coolwarm'))


⏳ Computing cosine similarity between items...
✅ Item–Item similarity matrix ready.


| business_id | -1MhPXk1FglglUAmuPLIGg | -2Axhv9AZ_n7qjQefECpVw | -3AooxIkg38UyUdlz5oXdw | -7GDqSUaXrpC8Ql7nDBxWA | -ATiAtTikuGuqvaW2O6tNA |
|---|---|---|---|---|---|
| -1MhPXk1FglglUAmuPLIGg | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| -2Axhv9AZ_n7qjQefECpVw | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
| -3AooxIkg38UyUdlz5oXdw | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| -7GDqSUaXrpC8Ql7nDBxWA | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| -ATiAtTikuGuqvaW2O6tNA | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
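
The preview is an identity matrix because ratings are non-negative, so two businesses get a cosine similarity of 0 exactly when no user rated both. The sketch below is a hand-written cosine on toy vectors (the `cosine` helper and the vectors `b1`, `b2`, `b3` are illustrative assumptions); `sklearn`'s `cosine_similarity` applies the same formula to every pair of columns at once.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two rating vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Columns of a toy rating matrix: one vector per business, one entry per user
b1 = np.array([5, 0, 0, 4])
b2 = np.array([4, 0, 0, 5])   # rated by the same two users as b1
b3 = np.array([0, 3, 5, 0])   # no rater in common with b1

print(cosine(b1, b2))  # high: shared raters with similar stars
print(cosine(b1, b3))  # 0.0: no shared raters
```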



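## Step 5️⃣: Predict Ratings (Item–Item CF)

For each business the user has not rated, we estimate a rating as the  
**similarity-weighted average** of the user's ratings on the businesses they have rated. If none of those businesses is similar to the candidate, the estimate is 0.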


In [5]:

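def predict_ratings_for_user(user_id):
    """Return predicted ratings for every business the user has not rated, best first."""
    user_ratings = ratings.loc[user_id]
    # A stored 0 means "not rated", so only positive entries count as ratings
    rated_items = user_ratings[user_ratings > 0].index.tolist()
    predictions = {}
    for item in ratings.columns:
        if item not in rated_items:
            # Similarity of the candidate to each business the user rated
            sim_items = item_similarity_df[item][rated_items]
            sim_scores = sim_items * user_ratings[rated_items]
            # Weighted average; 1e-9 avoids division by zero when all similarities are 0
            pred = sim_scores.sum() / (sim_items.sum() + 1e-9)
            predictions[item] = pred
    return pd.Series(predictions).sort_values(ascending=False)

print("✅ Rating prediction function defined.")


✅ Rating prediction function defined.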
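
To make the arithmetic inside the loop concrete, here is a small worked example with made-up numbers (the arrays `user_stars` and `sims` are assumptions for illustration, not values from the dataset). It applies the same weighted-average formula, including the `1e-9` guard:

```python
import numpy as np

# The user has rated two businesses (hypothetical values)
user_stars = np.array([5.0, 2.0])
# Similarity of the candidate business to each rated business
sims = np.array([0.8, 0.2])

# Similarity-weighted average, with the same 1e-9 guard against division by zero
pred = (sims * user_stars).sum() / (sims.sum() + 1e-9)
print(round(pred, 2))  # 4.4

# With no similar items at all, the numerator is 0 and so is the prediction
no_overlap = np.zeros(2)
pred_cold = (no_overlap * user_stars).sum() / (no_overlap.sum() + 1e-9)
print(pred_cold)  # 0.0
```

The candidate is pulled toward the 5-star rating because it is four times more similar to that business than to the 2-star one.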



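## Step 6️⃣: Add Geolocation Context

We apply **distance-aware scoring**:  
each predicted rating is multiplied by an exponential decay weight, `final_score = pred_rating * exp(-lambda_geo * distance_km)`, so closer POIs rank higher. With the default `lambda_geo=0.02`, a POI 50 km away keeps about 37% of its predicted rating.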


In [6]:

def recommend_poi(user_id, user_lat, user_lon, top_n=5, lambda_geo=0.02):
    """Rank unrated businesses by predicted rating, discounted by distance from the user."""
    preds = predict_ratings_for_user(user_id)
    # Series (index = business_id) -> DataFrame with business_id and pred_rating columns
    recs = pd.DataFrame(preds, columns=['pred_rating']).reset_index()
    recs.rename(columns={'index': 'business_id'}, inplace=True)
    # Attach coordinates and categories for each business
    recs = recs.merge(yelp[['business_id','latitude','longitude','categories']].drop_duplicates(), on='business_id', how='left')
    # Geodesic distance in km from the user to each business
    recs['distance_km'] = recs.apply(lambda x: geodesic((user_lat, user_lon), (x.latitude, x.longitude)).km, axis=1)
    # Exponential decay: the weight is 1 at 0 km and shrinks as distance grows
    recs['final_score'] = recs['pred_rating'] * np.exp(-lambda_geo * recs['distance_km'])
    return recs.sort_values('final_score', ascending=False).head(top_n)

print("✅ Location-aware recommender function ready.")


✅ Location-aware recommender function ready.
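
To get a feel for how strongly `lambda_geo` discounts distance, the sketch below evaluates the decay weight on its own. The helper name `geo_weight` is hypothetical; the formula and the default of 0.02 are taken from `recommend_poi`.

```python
import math

def geo_weight(distance_km, lambda_geo=0.02):
    """Multiplier applied to the predicted rating at a given distance."""
    return math.exp(-lambda_geo * distance_km)

# Weight at a few sample distances
for d in (0, 5, 50, 200):
    print(f"{d:>4} km -> weight {geo_weight(d):.3f}")
```

With the default value, the weight halves roughly every 35 km (ln 2 / 0.02), so a smaller `lambda_geo` suits city-to-city recommendations and a larger one suits walking-distance use cases.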



## Step 7️⃣: Generate Personalized Recommendations

Let's pick one user and generate top recommendations  
based on their preferences and proximity. As a stand-in for the user's position, we use the mean latitude and longitude over all rows in the sample.


In [7]:

sample_user = yelp['user_id'].iloc[0]
sample_lat = yelp['latitude'].mean()
sample_lon = yelp['longitude'].mean()
recommendations = recommend_poi(sample_user, sample_lat, sample_lon, top_n=5)
print(f"📍 Top Recommendations for User {sample_user[:8]}:")
display(recommendations[['business_id','categories','pred_rating','distance_km','final_score']].style.background_gradient(cmap='Oranges'))


📍 Top Recommendations for User JJ-qgqyl:


| | business_id | categories | pred_rating | distance_km | final_score |
|---|---|---|---|---|---|
| 2032 | -2Axhv9AZ_n7qjQefECpVw | Greek, Mediterranean, Restaurants, American (Traditional) | 0.0 | 158.665746 | 0.0 |
| 2016 | zYu2D8FzczailDkEMURExg | Sandwiches, Restaurants, Seafood, Steakhouses | 0.0 | 1015.722585 | 0.0 |
| 2015 | zYy9lS8HlpbCHsGPmORTfA | Bars, Pubs, Restaurants, Nightlife, Food, Breweries, Brewpubs, Burgers | 0.0 | 360.535401 | 0.0 |
| 2014 | zbrIMldF_O1ZQ0vpUaaa8A | Seafood, Venues & Event Spaces, Cocktail Bars, Restaurants, Nightlife, Breakfast & Brunch, American (Traditional), Event Planning & Services, Bars, American (New) | 0.0 | 2855.963034 | 0.0 |
| 2013 | zbvu8pRKcOQqdjqRGbncyQ | Southern, American (Traditional), American (New), Restaurants, Comfort Food | 0.0 | 162.235049 | 0.0 |
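
Every `pred_rating` above is 0.0, so the five rows are ties and their order carries no meaning. This follows from sparsity: 5,000 reviews by 4,906 users means at most 94 users have more than one review, and only such users can make two businesses similar. A minimal sketch on a made-up review log (the IDs and the names `reviews`, `per_user`, and `linking_users` are illustrative) of how one might count them:

```python
import pandas as pd

# Toy review log (hypothetical IDs); swap in the real `yelp` frame to diagnose it
reviews = pd.DataFrame({
    "user_id":     ["u1", "u1", "u2", "u3", "u4"],
    "business_id": ["b1", "b2", "b2", "b3", "b4"],
    "stars":       [5, 3, 4, 2, 5],
})

# Number of distinct businesses each user reviewed
per_user = reviews.groupby("user_id")["business_id"].nunique()

# Only users with 2+ businesses can link two items together
linking_users = int((per_user >= 2).sum())
print(f"Users who rated 2+ businesses: {linking_users} of {per_user.size}")
```

If that count is small on the real data, a denser sample (for example, keeping only users and businesses above a minimum review count) would likely be needed before the predictions become informative.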


## 🗺️ Step 8️⃣: Interactive Map Visualization (Inline in Colab)

Now that we have our personalized POI recommendations 🎯,  
let's **visualize them directly on an interactive map**, right here in the notebook.

---

### 🌍 What This Map Shows

| Marker | Meaning | Color |
|---------|----------|--------|
| 🔵 Circle | **User Location** | SRH Blue (`#003366`) |
| 🟠 Circle | **Recommended Places** | SRH Orange (`#f15a22`) |

Each recommendation is represented by a circle whose:
- **Size** corresponds to its **final recommendation score**  
- **Popup window** shows:
  - Category  
  - Predicted rating  
  - Distance from user (in km)  
  - Weighted final score  

---

### 🧠 How It Works
1. The code uses **Folium** to create a map centered on the user's location.  
2. Recommended businesses are added with dynamically scaled circle markers.  
3. You can **zoom**, **pan**, and **click markers** to see recommendation details.  
4. The map is displayed **inline** (no external file or download required).

> 💡 Tip: Try zooming in or panning the map. It's fully interactive!
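
The "dynamically scaled" marker size can be looked at in isolation. The sketch below wraps the scaling rule used in the map cell (5 px base plus up to 10 px in proportion to the score) in a hypothetical helper, `marker_radius`, whose name and keyword arguments are assumptions for illustration:

```python
def marker_radius(score, max_score, base=5, extra=10):
    """Circle radius in pixels: `base` for a zero score, `base + extra` for the top score."""
    if not max_score or max_score != max_score:   # zero or NaN maximum
        return base
    return base + extra * max(score, 0) / max_score

print(marker_radius(0.0, 0.0))   # 5: every score is zero, so all markers share the base size
print(marker_radius(2.0, 4.0))   # 10.0
print(marker_radius(4.0, 4.0))   # 15.0
```

When all scores are zero, as in the sample output of Step 7, every recommended place is drawn at the base size.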


In [17]:
import folium

try:
    # 🧹 Clean data: remove missing coordinates or scores
    recommendations = recommendations.dropna(subset=['latitude', 'longitude', 'final_score'])
    if recommendations.empty:
        raise ValueError("No valid recommendations with coordinates to plot.")

    # Handle zero or NaN scores safely
    max_score = recommendations['final_score'].max()
    if max_score == 0 or np.isnan(max_score):
        max_score = 1e-5  # Prevent division by zero

    # 🗺️ Create map centered on the user
    m = folium.Map(location=[sample_lat, sample_lon], zoom_start=12, tiles='CartoDB positron')

    # 🔵 Add user marker (SRH Blue)
    folium.CircleMarker(
        location=[sample_lat, sample_lon],
        radius=8,
        color='#003366',
        fill=True,
        fill_color='#003366',
        popup=folium.Popup("<b>You are here</b>", max_width=200)
    ).add_to(m)

    # 🟠 Add recommended POIs (SRH Orange)
    for _, row in recommendations.iterrows():
        score = max(row['final_score'], 0)
        popup_html = f"""
        <div style='font-family:sans-serif; font-size:12px;'>
            <b style='color:#f15a22;'>Recommended Place</b><br>
            <b>Category:</b> {str(row['categories'])[:80]}<br>
            <b>Predicted Rating:</b> {row['pred_rating']:.2f}<br>
            <b>Distance:</b> {row['distance_km']:.1f} km<br>
            <b>Score:</b> {score:.3f}
        </div>
        """
        folium.CircleMarker(
            location=[row['latitude'], row['longitude']],
            radius=5 + 10 * score / max_score,   # Safe scaling
            color='#f15a22',
            fill=True,
            fill_opacity=0.85,
            fill_color='#f78b55',
            popup=folium.Popup(popup_html, max_width=300)
        ).add_to(m)

    # ✅ Display the map directly in Colab.
    # A bare `m` inside a try block is not rendered, so call display() explicitly.
    print("✅ Interactive map created successfully!")
    display(m)

except Exception as e:
    print(f"⚠️ Map visualization skipped due to error: {e}")




✅ Interactive map created successfully!



<div style="background-color:#f4f6f8; border-left:5px solid #f15a22; padding:20px; border-radius:8px;">
<h2 style="color:#003366;">🧩 Step 9: Summary & Reflection</h2>
<p><strong>You built a full Point-of-Interest Recommender System!</strong></p>
<ul>
  <li>✅ Implemented Item–Item Collaborative Filtering</li>
  <li>✅ Integrated geolocation context for distance-aware results</li>
  <li>✅ Visualized recommendations interactively with Folium</li>
</ul>
<p><strong>Next Steps:</strong></p>
<ul>
  <li>📊 Add temporal context (time of day, recency)</li>
  <li>🧠 Use deep learning (autoencoders or embeddings)</li>
  <li>🗺️ Deploy as a web app using Streamlit</li>
</ul>
<p style="color:#003366; font-weight:bold;">SRH University Heidelberg · Applied Data Science</p>
</div>
