# Weekend Getaway Ranker 

This project builds a recommendation engine that suggests the **best weekend travel destinations**
based on a given **source city**.

### Ranking Factors:
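- Distance from source city (approximated in this notebook by staying within the source city's state and penalizing visit time)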
- User Rating
- Popularity

### Technologies Used:
- Python
- Pandas
- NumPy
- Jupyter Notebook




## Import Required Libraries

In [12]:
import pandas as pd
import numpy as np


## Load Travel Dataset
We load the provided dataset containing Indian travel destinations.


In [13]:
df = pd.read_csv(r"Data/travel_dataset.csv")
df.head()

|  | Unnamed: 0 | Zone | State | City | Name | Type | Establishment Year | time needed to visit in hrs | Google review rating | Entrance Fee in INR | Airport with 50km Radius | Weekly Off | Significance | DSLR Allowed | Number of google review in lakhs | Best Time to visit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Northern | Delhi | Delhi | India Gate | War Memorial | 1921 | 0.5 | 4.6 | 0 | Yes | NaN | Historical | Yes | 2.6 | Evening |
| 1 | 1 | Northern | Delhi | Delhi | Humayun's Tomb | Tomb | 1572 | 2.0 | 4.5 | 30 | Yes | NaN | Historical | Yes | 0.4 | Afternoon |
| 2 | 2 | Northern | Delhi | Delhi | Akshardham Temple | Temple | 2005 | 5.0 | 4.6 | 60 | Yes | NaN | Religious | No | 0.4 | Afternoon |
| 3 | 3 | Northern | Delhi | Delhi | Waste to Wonder Park | Theme Park | 2019 | 2.0 | 4.1 | 50 | Yes | Monday | Environmental | Yes | 0.27 | Evening |
| 4 | 4 | Northern | Delhi | Delhi | Jantar Mantar | Observatory | 1724 | 2.0 | 4.2 | 15 | Yes | NaN | Scientific | Yes | 0.31 | Morning |


## Dataset Overview

We inspect the dataset's shape, column names, data types, and missing values.


Number of rows and columns

In [4]:
df.shape

(325, 16)

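Column names (this cell was last run after the rename step below, so the output already shows the renamed columns)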

In [19]:
df.columns

Index(['Zone', 'State', 'City', 'Name', 'Type', 'Establishment Year',
       'Visit_Time_Hours', 'Rating', 'Entrance_Fee',
       'Airport with 50km Radius', 'Weekly Off', 'Significance',
       'DSLR Allowed', 'Popularity', 'Best Time to visit'],
      dtype='object')

In [18]:
# Remove unnecessary column
df = df.drop(columns=["Unnamed: 0"], errors="ignore")

# Rename columns to standard names
df = df.rename(columns={
    "Google review rating": "Rating",
    "Number of google review in lakhs": "Popularity",
    "time needed to visit in hrs": "Visit_Time_Hours",
    "Entrance Fee in INR": "Entrance_Fee"
})

df.columns


Index(['Zone', 'State', 'City', 'Name', 'Type', 'Establishment Year',
       'Visit_Time_Hours', 'Rating', 'Entrance_Fee',
       'Airport with 50km Radius', 'Weekly Off', 'Significance',
       'DSLR Allowed', 'Popularity', 'Best Time to visit'],
      dtype='object')

Data types

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 325 entries, 0 to 324
Data columns (total 16 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Unnamed: 0                        325 non-null    int64  
 1   Zone                              325 non-null    object 
 2   State                             325 non-null    object 
 3   City                              325 non-null    object 
 4   Name                              325 non-null    object 
 5   Type                              325 non-null    object 
 6   Establishment Year                325 non-null    object 
 7   time needed to visit in hrs       325 non-null    float64
 8   Google review rating              325 non-null    float64
 9   Entrance Fee in INR               325 non-null    int64  
 10  Airport with 50km Radius          325 non-null    object 
 11  Weekly Off                        32 non-null     object 
 12  Signific

Missing values

In [14]:
df.isnull().sum()


Unnamed: 0                            0
Zone                                  0
State                                 0
City                                  0
Name                                  0
Type                                  0
Establishment Year                    0
time needed to visit in hrs           0
Google review rating                  0
Entrance Fee in INR                   0
Airport with 50km Radius              0
Weekly Off                          293
Significance                          0
DSLR Allowed                          0
Number of google review in lakhs      0
Best Time to visit                    0
dtype: int64

## Data Cleaning

We ensure:
- No missing values in important columns
- Correct data types


In [20]:
# Convert to numeric first; unparseable values become NaN
df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")
df["Popularity"] = pd.to_numeric(df["Popularity"], errors="coerce")
df["Visit_Time_Hours"] = pd.to_numeric(df["Visit_Time_Hours"], errors="coerce")

# Then drop rows with missing values in the important columns
df = df.dropna(subset=[
    "City",
    "State",
    "Rating",
    "Popularity",
    "Visit_Time_Hours"
])

df.head()
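
The order of `dropna` and `pd.to_numeric(..., errors="coerce")` matters, because coercion can itself introduce NaN values. The sketch below illustrates this on a toy column; the values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy column with one malformed entry (illustrative values, not from the dataset)
raw = pd.Series(["4.6", "4.5", "n/a", "4.1"])

# errors="coerce" turns anything unparseable into NaN instead of raising
ratings = pd.to_numeric(raw, errors="coerce")
print(ratings.isna().sum())

# Dropping NaN after the conversion removes the malformed row
clean = ratings.dropna()
print(len(clean))
```

Running `dropna` only before the conversion would leave such a row in place, and its NaN would then propagate into any score computed from it.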


|  | Zone | State | City | Name | Type | Establishment Year | Visit_Time_Hours | Rating | Entrance_Fee | Airport with 50km Radius | Weekly Off | Significance | DSLR Allowed | Popularity | Best Time to visit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Northern | Delhi | Delhi | India Gate | War Memorial | 1921 | 0.5 | 4.6 | 0 | Yes | NaN | Historical | Yes | 2.6 | Evening |
| 1 | Northern | Delhi | Delhi | Humayun's Tomb | Tomb | 1572 | 2.0 | 4.5 | 30 | Yes | NaN | Historical | Yes | 0.4 | Afternoon |
| 2 | Northern | Delhi | Delhi | Akshardham Temple | Temple | 2005 | 5.0 | 4.6 | 60 | Yes | NaN | Religious | No | 0.4 | Afternoon |
| 3 | Northern | Delhi | Delhi | Waste to Wonder Park | Theme Park | 2019 | 2.0 | 4.1 | 50 | Yes | Monday | Environmental | Yes | 0.27 | Evening |
| 4 | Northern | Delhi | Delhi | Jantar Mantar | Observatory | 1724 | 2.0 | 4.2 | 15 | Yes | NaN | Scientific | Yes | 0.31 | Morning |


## Distance Calculation

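We calculate the great-circle distance between two cities using the **Haversine formula**. The function takes latitudes and longitudes in degrees and returns the distance in kilometres.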


In [21]:
from math import radians, sin, cos, sqrt, atan2

def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in KM

    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))

    return R * c
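
As a quick usage check, the sketch below repeats the function so that it runs on its own and applies it to two cities. The coordinates are approximate city-centre values assumed for illustration; they are not part of the dataset.

```python
from math import radians, sin, cos, sqrt, atan2, pi

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    R = 6371  # Earth radius in km
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

# Approximate coordinates (assumed values, not from the dataset)
kolkata = (22.5726, 88.3639)
delhi = (28.6139, 77.2090)

distance_km = haversine(*kolkata, *delhi)
print(f"Kolkata -> Delhi: {distance_km:.0f} km")

# Sanity check: a quarter of the equator should equal R * pi / 2
quarter_equator = haversine(0, 0, 0, 90)
print(abs(quarter_equator - 6371 * pi / 2) < 1e-6)
```

The result is a straight-line distance over the Earth's surface, so actual road or rail distances will be longer.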


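## Ranking Logic

The final score rewards quality and penalizes cost:
- Higher Rating → Better
- Higher Popularity → Better
- Shorter Distance → Better

The dataset has no latitude or longitude columns, so the recommendation function below does not call `haversine`. It approximates distance by keeping only destinations in the source city's state, and it uses the time needed for a visit as the penalty term.

### Score Formula:
Score = (Rating_norm × 0.5) + (Popularity_norm × 0.3) − (Time_norm × 0.2)

Each `_norm` value is the column divided by its maximum among the candidate destinations.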
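
As a worked example of the weighted score, the sketch below applies the 0.5 / 0.3 / 0.2 weights to max-normalized values, in the same way as the recommendation function below. The three candidates are made-up illustrative values, not rows from the dataset, and the penalty term uses visit time.

```python
# Hypothetical candidates: (name, rating, popularity in lakhs of reviews, visit time in hours)
candidates = [
    ("Place A", 4.8, 0.9, 1.0),
    ("Place B", 4.4, 1.2, 2.0),
    ("Place C", 4.0, 0.3, 4.0),
]

# Max-normalization: divide each factor by its largest value among the candidates
max_rating = max(c[1] for c in candidates)
max_popularity = max(c[2] for c in candidates)
max_time = max(c[3] for c in candidates)

def score(rating, popularity, visit_time):
    """Weighted score: rating and popularity add, visit time subtracts."""
    return (
        0.5 * rating / max_rating
        + 0.3 * popularity / max_popularity
        - 0.2 * visit_time / max_time
    )

# Highest score first
ranked = sorted(candidates, key=lambda c: score(*c[1:]), reverse=True)
for name, rating, popularity, visit_time in ranked:
    print(f"{name}: {score(rating, popularity, visit_time):.3f}")
```

Because every term is divided by its maximum, the score always lies between -0.2 and 0.8, and a highly rated place with a short visit time can outrank a more popular one.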


## Recommendation Function

In [22]:
def recommend_weekend_getaways(source_city, top_n=5):
    """Return the top_n highest-scoring attractions in the source city's state.

    Returns a message string instead of a DataFrame if the city is not in the dataset.
    """
    source = df[df["City"].str.lower() == source_city.lower()]

    if source.empty:
        return "Source city not found in dataset"

    source_state = source.iloc[0]["State"]

    # Restrict candidates to the same state (weekend friendly)
    candidates = df[df["State"] == source_state].copy()

    # Normalize each factor by its maximum within the state
    candidates["Rating_norm"] = candidates["Rating"] / candidates["Rating"].max()
    candidates["Popularity_norm"] = candidates["Popularity"] / candidates["Popularity"].max()
    candidates["Time_norm"] = candidates["Visit_Time_Hours"] / candidates["Visit_Time_Hours"].max()

    # Final ranking score: visit time is the penalty term
    candidates["Score"] = (
        candidates["Rating_norm"] * 0.5 +
        candidates["Popularity_norm"] * 0.3 -
        candidates["Time_norm"] * 0.2
    )

    # Remove source city itself
    candidates = candidates[candidates["City"].str.lower() != source_city.lower()]

    return candidates.sort_values("Score", ascending=False)[
        ["Name", "City", "State", "Rating", "Popularity", "Visit_Time_Hours", "Score"]
    ].head(top_n)
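
The function above penalizes visit time rather than geographic distance. One possible way to bring distance into the score is sketched below, assuming a hypothetical `CITY_COORDS` lookup of approximate coordinates; the dataset has no such columns, so the lookup, its values, and the `distance_norms` helper are all assumptions made for illustration.

```python
from math import radians, sin, cos, sqrt, atan2

# Hypothetical lookup of approximate (latitude, longitude) in degrees.
# These values are assumptions; the dataset has no coordinate columns.
CITY_COORDS = {
    "Kolkata": (22.57, 88.36),
    "Darjeeling": (27.04, 88.26),
    "Murshidabad": (24.18, 88.27),
}

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    R = 6371  # Earth radius in km
    lat1, lon1, lat2, lon2 = map(radians, [lat1, lon1, lat2, lon2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return R * 2 * atan2(sqrt(a), sqrt(1 - a))

def distance_norms(source_city, coords=CITY_COORDS):
    """Return {city: distance / largest distance} for every city except the source."""
    src = coords[source_city]
    distances = {
        city: haversine(*src, *point)
        for city, point in coords.items()
        if city != source_city
    }
    farthest = max(distances.values())
    return {city: d / farthest for city, d in distances.items()}

norms = distance_norms("Kolkata")
for city, value in sorted(norms.items(), key=lambda item: item[1]):
    print(f"{city}: {value:.2f}")
```

A value from `norms` could then replace `Time_norm` in the score, keeping the 0.2 weight, so that nearer destinations are penalized less.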


## Sample Recommendation – Source City: Kolkata


In [23]:
recommend_weekend_getaways("Kolkata")


|  | Name | City | State | Rating | Popularity | Visit_Time_Hours | Score |
|---|---|---|---|---|---|---|---|
| 220 | Kankalitala Temple | Bolpur | West Bengal | 4.7 | 0.045 | 0.5 | 0.48625 |
| 221 | Hangseswari Temple | Hooghly | West Bengal | 4.6 | 0.07 | 0.5 | 0.481862 |
| 223 | Cooch Behar Palace | Cooch Behar | West Bengal | 4.5 | 0.09 | 1.0 | 0.451223 |
| 219 | Hazarduari Palace | Murshidabad | West Bengal | 4.5 | 0.18 | 1.5 | 0.448723 |
| 215 | Tiger Hill | Darjeeling | West Bengal | 4.5 | 0.025 | 1.0 | 0.434973 |


Save Output for Kolkata

In [29]:
output_kolkata = recommend_weekend_getaways("Kolkata")

output_kolkata.to_csv(
    "sample_outputs/output_kolkata.txt",
    index=False,
    sep="\t"
)
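
The save step assumes that the `sample_outputs` folder already exists and that the function returned a DataFrame rather than its "not found" message. A minimal sketch of a more defensive version follows; `save_recommendations` is a hypothetical helper, not part of the notebook, and the demo table reuses a single row from the Kolkata output.

```python
from pathlib import Path
import tempfile

import pandas as pd

def save_recommendations(result, path):
    """Write a recommendations table as tab-separated text, creating the folder if needed."""
    if isinstance(result, str):
        # recommend_weekend_getaways returns a message string for an unknown city
        raise ValueError(result)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    result.to_csv(path, index=False, sep="\t")
    return path

# Illustrative table; in the notebook this would be the function's return value
demo = pd.DataFrame({"Name": ["Tiger Hill"], "City": ["Darjeeling"], "Score": [0.434973]})

# Write into a temporary folder so the sketch leaves the project tree untouched
out_dir = Path(tempfile.mkdtemp())
saved = save_recommendations(demo, out_dir / "sample_outputs" / "output_demo.txt")
print(saved.read_text())
```

Raising on the message string surfaces a misspelled city name at the point of saving instead of as an `AttributeError` from calling `.to_csv` on a string.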


## Sample Recommendation – Source City: Delhi


In [24]:
recommend_weekend_getaways("Delhi")


|  | Name | City | State | Rating | Popularity | Visit_Time_Hours | Score |
|---|---|---|---|---|---|---|---|
| 305 | Gurudwara Bangla Sahib | New Delhi | Delhi | 4.8 | 1.05 | 1.0 | 0.581154 |
| 313 | Jama Masjid | New Delhi | Delhi | 4.5 | 0.49 | 1.0 | 0.485288 |
| 318 | Rail Museum | New Delhi | Delhi | 4.4 | 0.24 | 2.0 | 0.406026 |


In [31]:
output_Delhi = recommend_weekend_getaways("Delhi")

output_Delhi.to_csv(
    "sample_outputs/output_Delhi.txt",
    index=False,
    sep="\t"
)

## Sample Recommendation – Source City: Mumbai


In [25]:
recommend_weekend_getaways("Mumbai")


|  | Name | City | State | Rating | Popularity | Visit_Time_Hours | Score |
|---|---|---|---|---|---|---|---|
| 130 | Mahalakshmi Temple | Kolhapur | Maharastra | 4.8 | 0.9 | 1.0 | 0.535 |
| 126 | Sai Baba Temple | Shirdi | Maharastra | 4.7 | 0.69 | 1.5 | 0.487083 |
| 123 | Shaniwar Wada | Pune | Maharastra | 4.4 | 1.2 | 2.0 | 0.478333 |
| 128 | Ganapatipule Temple | Ratnagiri | Maharastra | 4.7 | 0.1 | 1.0 | 0.457917 |
| 129 | Deekshabhoomi | Nagpur | Maharastra | 4.5 | 0.11 | 1.0 | 0.437917 |


In [32]:
output_mumbai = recommend_weekend_getaways("Mumbai")

output_mumbai.to_csv(
    "sample_outputs/output_Mumbai.txt",
    index=False,
    sep="\t"
)