# Touristic Tour Recommendation Application
This notebook outlines the steps for building an algorithm that generates a one-week itinerary for tourists in Algeria. The itinerary is optimized for user preferences, proximity, and travel cost. Several search techniques are used to build it, including **uninformed search algorithms** (BFS, DFS), **A\*** and **Hill Climbing**.


## Data Collection & Research
We gathered and cleaned data on **more than 100 Algerian tourist attractions**, including the following attributes:
- **Attraction Name**
- **Type of Attraction** (museum, nature, beach, etc.)
- **City**
- **Cost** (entry fee)
- **Rating** (user rating)
- **GPS Coordinates** (latitude, longitude)
- **Description** (short description)


In [5]:
import json
from collections import Counter

DATA_PATH = "../Data/attractions.json"

with open(DATA_PATH, "r", encoding="utf-8") as f:
    attractions_data = json.load(f)
    

if not isinstance(attractions_data, list):
    raise ValueError("The JSON file does not contain a list of attractions.")

print("Number of attractions:", len(attractions_data))

# Count attractions per city
city_counts = Counter(attraction.get("city", "Unknown") for attraction in attractions_data)

# Count attractions per category
category_counts = Counter(attraction.get("category", "Unknown") for attraction in attractions_data)

print("\nNumber of attractions per city:")
for city, count in city_counts.items():
    print(f"{city}: {count}")

print("\nNumber of attractions per category:")
for category, count in category_counts.items():
    print(f"{category}: {count}")


Number of attractions: 142

Number of attractions per city:
Algiers: 18
Tipaza: 2
Blida: 1
Médéa: 3
Oran: 13
Tlemcen: 8
Batna: 1
Ghardaïa: 2
Bejaia: 1
Constantine: 6
Djanet: 1
Sétif: 13
Annaba: 4
Guelma: 3
El Tarf: 6
Tamanrasset: 3
Béchar: 1
Bouira: 1
El Bayadh: 1
Khenchela: 1
Biskra: 4
Timimoun: 3
El Oued: 1
M'Sila: 1
Tizi Ouzou: 10
Beni Abbes: 1
Skikda: 3
Souk Ahras: 2
Tébessa: 3
Oum El-Bouaghi: 5
Jijel: 20

Number of attractions per category:
Garden: 3
Museum: 10
Cultural: 18
Historical: 25
Religious: 9
Amusement Park: 5
Port: 2
Shopping Mall: 4
Nature: 46
Lake: 2
Resort: 1
Beach: 11
Island: 1
Wildlife Park: 1
Coastal Town: 1
Recreational: 1
Wellness: 2


## Distance and Proximity Calculation
To calculate the optimal itinerary, we need to compute the distances between attractions. We will use the **Haversine Formula** to calculate the distance between two locations based on their GPS coordinates.
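
For reference, the Haversine formula in its usual form is given below. Here $\varphi$ is latitude and $\lambda$ is longitude (both in radians), $R$ is the Earth's mean radius (taken as 6371 km in this notebook), and $d$ is the great-circle distance:

```latex
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right)
    + \cos\varphi_1 \,\cos\varphi_2 \,\sin^2\!\left(\frac{\Delta\lambda}{2}\right),
\qquad
d = 2R \,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right)
```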


In [2]:
import math

# Haversine formula to calculate distance between two points (in km)
def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Radius of the Earth in kilometers
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2)**2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return R * c

# e.g.
distance = haversine(36.7452, 3.0750, 36.5894, 2.4477)
distance


58.570295060336306
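
Since the search algorithms query distances repeatedly, it may help to precompute a pairwise distance matrix once. The following is a minimal sketch: the `distance_matrix` helper, the `lat`/`lon` keys, and the sample records are assumptions made for illustration and do not necessarily match the dataset's schema.

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (same formula as the cell above)."""
    R = 6371  # Earth's mean radius in kilometers
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * R * math.atan2(math.sqrt(a), math.sqrt(1 - a))

# Hypothetical records: names and the "lat"/"lon" keys are placeholders.
sample = [
    {"name": "A", "lat": 36.7452, "lon": 3.0750},
    {"name": "B", "lat": 36.5894, "lon": 2.4477},
    {"name": "C", "lat": 35.6971, "lon": -0.6308},
]

def distance_matrix(places):
    """Return a symmetric n x n matrix of distances in km."""
    n = len(places)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine(places[i]["lat"], places[i]["lon"],
                          places[j]["lat"], places[j]["lon"])
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return matrix

dist = distance_matrix(sample)
```

With a matrix like `dist`, a lookup such as `dist[i][j]` replaces a fresh trigonometric computation inside the search loop.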

## Helper Functions

In [None]:
import re

DEFAULT_DURATION_HOURS = 2.0  # placeholder used when the duration cannot be parsed
DEFAULT_COST_DZD = 200        # placeholder used when the cost cannot be parsed

def parse_duration(duration_str):
    """
    Convert a duration string like "1-2 hours", "2-4 hours", or "3 hours"
    to a numeric estimate in hours (float).
    A range is replaced by its midpoint; any other text ("Variable",
    "Unknown", ...) falls back to DEFAULT_DURATION_HOURS.
    """
    if isinstance(duration_str, (int, float)):
        return float(duration_str)  # already numeric
    if not isinstance(duration_str, str):
        return DEFAULT_DURATION_HOURS
    text = duration_str.strip().lower()

    # Range such as "1-2 hours" -> midpoint 1.5
    match_range = re.match(r"(\d+)\s*-\s*(\d+)\s*hours?", text)
    if match_range:
        low = float(match_range.group(1))
        high = float(match_range.group(2))
        return (low + high) / 2.0

    # Single value such as "3 hours" -> 3.0
    match_single = re.match(r"(\d+)\s*hours?", text)
    if match_single:
        return float(match_single.group(1))

    # "Variable", "Unknown", or any other text
    return DEFAULT_DURATION_HOURS

def parse_cost(cost_str):
    """
    Convert the cost field (like "Free", "400 DZD", "Unknown", or "Variable")
    into a numeric value in DZD:
      - "Free" -> 0
      - "400 DZD" -> 400 (the first integer found)
      - anything else -> DEFAULT_COST_DZD
    """
    if isinstance(cost_str, (int, float)):
        return cost_str  # already numeric
    if not isinstance(cost_str, str):
        return DEFAULT_COST_DZD
    if "free" in cost_str.lower():
        return 0
    match = re.search(r"(\d+)", cost_str)
    if match:
        return int(match.group(1))
    # "Variable", "Unknown", etc.
    return DEFAULT_COST_DZD


## User Preferences Simulation 
The application will take inputs (from the website interface) such as:
- **Starting location**: Algiers
- **Preferred attractions**: Museums, Historical Sites
- **Budget**: 1500 DZD
- **Hotel rating**: 3 stars
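
Until the website interface exists, these inputs can be simulated with a plain dictionary. This is only a sketch: the key names and the `validate_preferences` helper are hypothetical and chosen for illustration.

```python
# Simulated user input; the key names are assumptions, not a fixed schema.
user_preferences = {
    "start_location": "Algiers",
    "preferred_categories": ["Museum", "Historical"],
    "budget": 1500,      # DZD
    "hotel_rating": 3,   # stars
}

def validate_preferences(prefs):
    """Return a list of problems found in the preferences (empty if none)."""
    problems = []
    if not prefs.get("start_location"):
        problems.append("missing start_location")
    if prefs.get("budget", 0) <= 0:
        problems.append("budget must be positive")
    if not 1 <= prefs.get("hotel_rating", 0) <= 5:
        problems.append("hotel_rating must be between 1 and 5")
    return problems

issues = validate_preferences(user_preferences)
```

Validating the input up front keeps malformed preferences from surfacing later as confusing search failures.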


## Problem Formulation
**1. Define the Problem Components** 

We model the itinerary planning as a **search problem** and a **CSP**.  
- **States**: Partial itineraries (sequence of attractions).  
- **Actions**: Adding a new attraction to the itinerary.  
- **Goal**: A 7-day itinerary that maximizes user satisfaction and minimizes cost/time.  
- **Constraints**: No repeated attractions, budget limits, and proximity.

**2. State Representation**

In this formulation, the **state** represents a partial itinerary for the traveler. It includes the list of visited attractions, remaining attractions, total cost so far, and total time spent so far.
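
As a small illustration, the initial state could be built as a dictionary with the four components listed above. The `make_initial_state` helper and the sample data are hypothetical; the keys mirror those used by the problem class later in this notebook.

```python
def make_initial_state(attractions, preferences):
    """Build the empty itinerary: nothing visited, everything remaining."""
    return {
        "visited": [],
        "remaining": list(attractions),  # copy so the source list is not mutated
        "total_cost": 0,
        "total_time": 0.0,
        "preferences": preferences,
    }

# Hypothetical sample data for illustration only
sample_attractions = [{"name": "A"}, {"name": "B"}]
state = make_initial_state(sample_attractions, {"budget": 1500})
```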

**Node Class**

A `Node` class is used to represent each state in the search space. It holds information about the current state, parent node, action taken, cumulative costs (`g`), and the evaluation function (`f`).

In [20]:
import copy
from math import radians, sin, cos, sqrt, atan2

class Node:
    def __init__(self, state, parent=None, action=None, g=0, f=0):
        """
        Initialize a search tree node.

        Input Parameters:
            - state: The state represented by this node. In this notebook it is a
                     partial-itinerary dict; in a simpler setting it could be a
                     city name such as "Algiers".
            - parent: The parent Node that generated this node. Default is None.
            - action: The action taken to reach this node from the parent. Often the same as the state.
                      Example: "Oran"
            - g: The cumulative cost (actual cost) from the start node to this node. Default is 0.
            - f: The evaluation function value for the node (e.g., for UCS, A*). Default is 0.

        Output:
            - A Node instance with attributes: state, parent, action, g, f, and depth.
        """
        self.state = state
        self.parent = parent
        self.action = action
        self.g = g  # Cumulative cost from start to this node
        self.f = f  # Evaluation cost (g + heuristic if applicable)
        # Calculate the depth of the node
        if parent is None:
            self.depth = 0
        else:
            self.depth = parent.depth + 1

    def __hash__(self):
        """
        Compute a hash value for the node.

        Input Parameters:
            - None (uses the node's state)

        Output:
            - An integer hash value.
        """
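        if isinstance(self.state, dict):
            # Itinerary states are dicts, which are unhashable. Hash the ordered
            # names of the visited attractions instead (assumes a "name" key).
            visited = self.state.get("visited", [])
            return hash(tuple(a.get("name") for a in visited))
        if isinstance(self.state, list):
            state_tuple = tuple([tuple(row) for row in self.state])
            return hash(state_tuple)
        return hash(self.state)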

    def __eq__(self, other):
        """
        Check equality with another Node based on the state.

        Input Parameters:
            - other: Another Node instance.

        Output:
            - True if the states are equal, False otherwise.
        """
        return isinstance(other, Node) and self.state == other.state

    def __gt__(self, other):
        """
        Compare this node with another node based on the evaluation function (f).

        Input Parameters:
            - other: Another Node instance.

        Output:
            - True if this node's f is greater than the other's f, else False.
        """
        return isinstance(other, Node) and self.f > other.f

**3. Actions**

Actions are the valid attractions that can be added to the itinerary based on constraints.
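
One way to apply the constraints when generating actions is to filter the remaining attractions by what the budget and time still allow. The sketch below is illustrative only: `feasible_actions`, the numeric `cost_dzd` and `hours` fields, and the sample state are assumptions.

```python
def feasible_actions(state, budget, max_time):
    """Return the remaining attractions that still fit the budget and time limits."""
    return [
        a for a in state["remaining"]
        if state["total_cost"] + a["cost_dzd"] <= budget
        and state["total_time"] + a["hours"] <= max_time
    ]

# Hypothetical state with pre-parsed numeric cost and duration fields
sample_state = {
    "visited": [],
    "remaining": [
        {"name": "A", "cost_dzd": 0, "hours": 2.0},
        {"name": "B", "cost_dzd": 400, "hours": 1.0},  # would exceed the budget
        {"name": "C", "cost_dzd": 200, "hours": 4.0},  # would exceed the time limit
    ],
    "total_cost": 1200,
    "total_time": 5.0,
}

options = feasible_actions(sample_state, budget=1500, max_time=8.0)
```

Pruning infeasible actions early keeps the search from expanding branches that can never reach a valid itinerary.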

**4. Goal Test**

Check if the itinerary is complete (7 days) and meets all constraints.

In [None]:
class TouristicTourProblem:
    def __init__(self, attractions, preferences, start_location, budget, max_time):
        """
        Initialize the touristic tour problem.

        Input Parameters:
            - attractions: List of available attractions with details (name, category, cost, gps, etc.)
            - preferences: A dictionary of user preferences (e.g., type of attractions preferred).
            - start_location: The starting city for the traveler.
            - budget: The total budget for the trip (in local currency).
            - max_time: The maximum time available for the trip (in hours).
        """
        self.attractions = attractions
        self.preferences = preferences
        self.start_location = start_location
        self.budget = budget
        self.max_time = max_time
        self.visited_attractions = []
        self.remaining_attractions = attractions

    def is_goal(self, state):
        """Check if the tour meets the constraints for budget, time, and satisfaction."""
        # Check if total time spent on the selected attractions is within the maximum allowed
        if state["total_time"] > self.max_time:
            return False

        # Check if total cost is within the maximum allowed budget
        if state["total_cost"] > self.budget:
            return False

        # Check if user preferences are satisfied
        satisfaction_score = self.calculate_satisfaction(state)
        # "min_satisfaction" is optional; default to 0 when it is not set
        if satisfaction_score < self.preferences.get("min_satisfaction", 0):
            return False

        # CAN add more conditions as needed, e.g., ensuring the user visited a certain category of attractions
        return True
    
    def calculate_satisfaction(self, state):
        """Calculate a satisfaction score based on user preferences."""
        satisfaction = 0
        for attraction in state["visited"]:
            # Score attractions based on categories that the user prefers
            if self.preferences.get("nature") and attraction["category"] == "Nature":
                satisfaction += attraction["rating"]
            if self.preferences.get("cultural") and attraction["category"] == "Cultural":
                satisfaction += attraction["rating"]
            if self.preferences.get("historical") and attraction["category"] == "Historical":
                satisfaction += attraction["rating"]
            # CAN add more categories and refine the logic as needed
        return satisfaction

    def get_valid_actions(self, state):
        """
        Get valid actions (next attractions to visit) from the current state.
        
        Input:
            - state: The current state (includes visited and remaining attractions).
        
        Output:
            - List of possible actions (attractions to visit next).
        """
        return state["remaining"]

    def transition(self, state, attraction):
        """
        Transition function: Move from one state to another after visiting an attraction.
        
        Input:
            - state: Current state (visited and remaining attractions).
            - attraction: The attraction to visit next.
        
        Output:
            - New state after the transition.
        """
        new_visited = state["visited"] + [attraction]
        new_remaining = [a for a in state["remaining"] if a != attraction]
        
        # Calculate the cost and time for the selected attraction
        cost = self.calculate_cost(attraction)
        time = self.calculate_time(attraction)
        
        # Create the new state
        new_state = {
            "visited": new_visited,
            "remaining": new_remaining,
            "total_cost": state["total_cost"] + cost,
            "total_time": state["total_time"] + time,
            "preferences": state["preferences"]
        }
        return new_state

    def calculate_cost(self, attraction):
        """
        Calculate the cost to visit an attraction.
        
        Input:
            - attraction: The attraction to visit.
        
        Output:
            - The cost to visit the attraction (in local currency).
        """
        # The cost field is a string such as "Free" or "400 DZD"; parse_cost handles both
        return parse_cost(attraction["cost"])

    def calculate_time(self, attraction):
        """
        Calculate the time to visit an attraction.
        
        Input:
            - attraction: The attraction to visit.
        
        Output:
            - The time to visit the attraction (in hours).
        """
        return parse_duration(attraction["visit_duration"])  # handles ranges such as "1-2 hours"

    def heuristic(self, state):
        """
        Heuristic function to estimate the cost/time to visit the remaining attractions.

        Input:
            - state: The current state.

        Output:
            - Estimated cost to complete the tour (remaining cost and time).
        """
        remaining_cost = sum([self.calculate_cost(a) for a in state["remaining"]])
        remaining_time = sum([self.calculate_time(a) for a in state["remaining"]])
        return remaining_cost + remaining_time

def is_goal_state(state, max_time):
    """
    Check if the current state is a goal state.
    
    Input:
        - state: The current state.
        - max_time: The maximum time available for the trip.
    
    Output:
        - True if all attractions are visited, and the total time and cost are within the limits.
    """
    total_time = state["total_time"]
    total_cost = state["total_cost"]
    return len(state["remaining"]) == 0 and total_time <= max_time and total_cost <= state["preferences"]["budget"]


## Search Algorithm Implementations

In this project, we use a combination of **uninformed search algorithms** (BFS, DFS) and **informed search algorithms** (A\* Search, Hill Climbing) to generate the optimal itinerary based on cost, proximity, and user preferences.

Each search algorithm has its strengths and weaknesses. In this section, we will implement and compare their performance in solving the itinerary optimization problem.


### BFS
Explores all paths at the current depth before moving on to the next level, so the first solution found uses the fewest steps. This is the shortest path only when every step has the same cost.


In [8]:
from collections import deque

def bfs(): # parameters needed !
    """
    Perform a BFS to find the shortest path from start to goal.
    
    Args:
    - start: Starting location (current location).
    - goal: Desired goal (final destination).
    - attractions: List of all attractions (places to visit).
    - distance_function: Function to calculate the distance between locations.
    - maybe more, depends on the one implementing the algorithm
    
    Returns:
    - List of attractions in the optimal path.
    """
    pass

# maybe other functions here
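
As a starting point for the stub above, the following sketch runs BFS over partial itineraries rather than over locations. It is a simplified illustration, not the final implementation: `bfs_itinerary`, the numeric `cost` field, and the sample data are assumptions, and proximity and preferences are ignored.

```python
from collections import deque

def bfs_itinerary(attractions, budget, days):
    """Breadth-first search over partial itineraries.

    Returns the names of the first `days` attractions found whose total
    cost fits the budget, or None if no such itinerary exists.
    """
    frontier = deque([((), 0)])  # (tuple of chosen indices, total cost)
    while frontier:
        chosen, cost = frontier.popleft()  # FIFO: shallowest itinerary first
        if len(chosen) == days:
            return [attractions[i]["name"] for i in chosen]
        for i, attraction in enumerate(attractions):
            if i in chosen:
                continue  # no repeated attractions
            new_cost = cost + attraction["cost"]
            if new_cost <= budget:
                frontier.append((chosen + (i,), new_cost))
    return None

# Hypothetical data with pre-parsed numeric costs (DZD)
sample = [
    {"name": "A", "cost": 0},
    {"name": "B", "cost": 400},
    {"name": "C", "cost": 200},
    {"name": "D", "cost": 900},
]
plan = bfs_itinerary(sample, budget=600, days=3)
```

The frontier grows factorially with the number of attractions, so on the full dataset this approach would need strong pruning (for example, restricting candidates to nearby cities).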

### DFS
Explores each path as deep as possible before backtracking. It needs little memory, but the first solution it finds is not guaranteed to be the shortest or the cheapest.

In [9]:
def dfs(): # parameters needed !
    """
    Perform a DFS to find a path from start to goal.
    
    Args:
    - start: Starting location.
    - goal: Desired goal.
    - attractions: List of all attractions.
    - distance_function: Function to calculate distance.
    - maybe more, depends on the one implementing the algorithm
    
    Returns:
    - List of attractions in the path, or None if no path is found.
    """
    pass

# maybe other functions here
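
A depth-first variant of the same idea can be sketched with recursion and backtracking. As before, `dfs_itinerary`, the numeric `cost` field, and the sample data are assumptions used only to illustrate the technique.

```python
def dfs_itinerary(attractions, budget, days, chosen=None, cost=0):
    """Depth-first search with backtracking over partial itineraries.

    Returns the names of the first complete itinerary found, or None.
    """
    chosen = [] if chosen is None else chosen
    if len(chosen) == days:
        return [attractions[i]["name"] for i in chosen]
    for i, attraction in enumerate(attractions):
        if i in chosen or cost + attraction["cost"] > budget:
            continue  # repeated attraction or over budget
        chosen.append(i)  # go one level deeper
        result = dfs_itinerary(attractions, budget, days,
                               chosen, cost + attraction["cost"])
        if result is not None:
            return result
        chosen.pop()  # dead end: backtrack and try the next attraction
    return None

# Hypothetical data with pre-parsed numeric costs (DZD)
sample = [
    {"name": "A", "cost": 0},
    {"name": "D", "cost": 900},
    {"name": "B", "cost": 400},
    {"name": "C", "cost": 200},
]
plan = dfs_itinerary(sample, budget=600, days=3)
```

Because DFS commits to the first feasible branch, the itinerary it returns depends on the order of the input list.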

### A\*
Expands the node with the lowest evaluation value `f = g + h`, where `g` is the cost accumulated so far and `h` is a heuristic estimate of the remaining cost to reach the goal.

In [10]:
import heapq

def a_star_search(): # parameters needed !
    """
    Perform A* search to find the optimal path from start to goal.
    
    Args:
    - start: Starting point.
    - goal: Goal point.
    - attractions: List of all attractions.
    - distance_function: Function to calculate distance between locations.
    - heuristic_function: Heuristic function to estimate the cost from a point to the goal.
    - maybe more, depends on the one implementing the algorithm
    
    Returns:
    - List of attractions in the optimal path.
    """
    pass
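
To make the `f = g + h` idea concrete, here is a hedged sketch that minimizes travel distance only. The function `a_star_route`, the distance matrix, and the heuristic (remaining legs times the shortest edge in the matrix, which never overestimates) are assumptions chosen for illustration, not the project's final design.

```python
import heapq
from itertools import count

def a_star_route(dist, start, k):
    """A* over paths: shortest route that visits k places starting at `start`.

    dist is a symmetric matrix of non-negative distances (needs n >= 2).
    """
    n = len(dist)
    min_edge = min(dist[i][j] for i in range(n) for j in range(n) if i != j)

    def h(path):
        # Each remaining leg costs at least min_edge, so h never overestimates.
        return (k - len(path)) * min_edge

    tie = count()  # tie-breaker so the heap never compares paths directly
    frontier = [(h((start,)), next(tie), 0.0, (start,))]
    while frontier:
        f, _, g, path = heapq.heappop(frontier)
        if len(path) == k:
            return list(path), g
        for nxt in range(n):
            if nxt in path:
                continue
            new_g = g + dist[path[-1]][nxt]
            new_path = path + (nxt,)
            heapq.heappush(frontier, (new_g + h(new_path), next(tie), new_g, new_path))
    return None, float("inf")

# Hypothetical distances in km between four places
dist = [
    [0, 10, 50, 20],
    [10, 0, 15, 40],
    [50, 15, 0, 30],
    [20, 40, 30, 0],
]
route, length = a_star_route(dist, start=0, k=3)
```

In this example the routes starting at place 0 have lengths 25, 50, 65, 80, 60 and 50, and the search returns the 25 km route `[0, 1, 2]`.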


### Hill-Climbing
A simple greedy approach that evaluates only the immediate next state and chooses the best option available without considering future paths.

In [11]:
def hill_climb_search(): # parameters needed !
    """
    Perform Hill Climbing to find the optimal path based on local evaluations.
    
    Args:
    - start: Starting location.
    - attractions: List of all attractions.
    - distance_function: Function to calculate distance.
    - heuristic_function: Function to evaluate the "goodness" of a path.
    - maybe more, depends on the one implementing the algorithm
    
    Returns:
    - List of attractions in the path, or None if no solution is found.
    """
    pass
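
One possible reading of hill climbing for this problem is to start from a complete route and repeatedly apply the best improving change. The sketch below assumes a swap neighborhood (exchange two stops) and route length as the objective; both choices, like the names `route_length` and `hill_climb`, are illustrative assumptions.

```python
def route_length(route, dist):
    """Total distance of visiting the stops in the given order."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

def hill_climb(route, dist):
    """Steepest-descent hill climbing: swap two stops while it shortens the route."""
    current = list(route)
    current_len = route_length(current, dist)
    while True:
        best, best_len = None, current_len
        for i in range(1, len(current)):  # index 0 (the start) stays fixed
            for j in range(i + 1, len(current)):
                candidate = current[:]
                candidate[i], candidate[j] = candidate[j], candidate[i]
                candidate_len = route_length(candidate, dist)
                if candidate_len < best_len:
                    best, best_len = candidate, candidate_len
        if best is None:  # no neighbor improves: local optimum reached
            return current, current_len
        current, current_len = best, best_len

# Hypothetical distances in km between four places
dist = [
    [0, 10, 50, 20],
    [10, 0, 15, 40],
    [50, 15, 0, 30],
    [20, 40, 30, 0],
]
start_route = [0, 2, 1, 3]
final_route, final_len = hill_climb(start_route, dist)
```

The loop stops at the first route with no better neighbor, which may be a local rather than a global optimum; random restarts are a common mitigation.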


## CSP Approach
We also model itinerary planning as a **Constraint Satisfaction Problem (CSP)**. Each day of the trip is a variable, and the domain of each variable is the set of attractions that can be visited on that day. Constraints include:
  1. **One destination per day**: Each day gets exactly one destination.
  2. **Proximity and time**: Destinations must be scheduled in a way that minimizes travel time and adheres to time constraints.
  3. **Cost**: The total cost of the trip should be within the user’s budget.
  4. **Preferences**: Ensure that the user’s preferences for types of attractions are respected.


In [12]:
# Define constraints for the CSP
def check_constraints(itinerary, user_preferences):
    pass
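
A backtracking solver is one common way to handle such a CSP. The sketch below treats each day as a variable and checks three simplified constraints (no repeats, total budget, and a maximum distance between consecutive days). The names `is_consistent` and `solve_csp`, the `max_leg_km` limit, and the sample data are assumptions for illustration; preferences are not modeled here.

```python
def is_consistent(assignment, candidate, costs, dist, budget, max_leg_km):
    """Check the constraints that can be tested on a partial assignment."""
    if candidate in assignment:
        return False  # each attraction is used at most once
    if sum(costs[i] for i in assignment) + costs[candidate] > budget:
        return False  # total cost must stay within the budget
    if assignment and dist[assignment[-1]][candidate] > max_leg_km:
        return False  # consecutive days must be close enough
    return True

def solve_csp(days, costs, dist, budget, max_leg_km, assignment=None):
    """Assign one attraction index per day using backtracking search."""
    assignment = [] if assignment is None else assignment
    if len(assignment) == days:
        return list(assignment)
    for candidate in range(len(costs)):
        if is_consistent(assignment, candidate, costs, dist, budget, max_leg_km):
            assignment.append(candidate)
            solution = solve_csp(days, costs, dist, budget, max_leg_km, assignment)
            if solution is not None:
                return solution
            assignment.pop()  # undo and try the next value
    return None

# Hypothetical entry costs (DZD) and distances (km) for four attractions
costs = [0, 400, 200, 900]
dist = [
    [0, 10, 50, 20],
    [10, 0, 15, 40],
    [50, 15, 0, 30],
    [20, 40, 30, 0],
]
solution = solve_csp(days=3, costs=costs, dist=dist, budget=650, max_leg_km=30)
```

Unlike the optimizing searches, this solver stops at the first assignment that satisfies every constraint, so it answers feasibility rather than optimality.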


## Comparative Evaluation of Algorithms 

We will compare the performance of the following search algorithms:

1. **BFS**: Guarantees the solution with the fewest steps (optimal only when every step costs the same) but may be slow and memory-hungry on large datasets.
2. **DFS**: Uses little memory and can be fast, but is not guaranteed to find the optimal solution.
3. **A\***: Finds an optimal solution when the heuristic is admissible; its efficiency depends on how informative the heuristic is.
4. **Hill Climbing**: Greedy and fast but may get stuck in local optima.
5. **CSP Approach**: The CSP solution is compared with the search-based methods.

We will evaluate the following metrics:
- **Execution time**: How long each algorithm takes to generate the itinerary.
- **Path optimality**: How close the solution is to the optimal itinerary in terms of cost, proximity, and user satisfaction.
- **Complexity**: The computational complexity of each approach.


In [5]:
def evaluate_solution(itinerary, total_cost, time_taken):
    # Compare the total cost, time efficiency, and other factors
    pass  # To be implemented
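
Execution time can be measured with a small wrapper around any of the algorithms. This is a sketch only: `run_and_measure` and `summarize` are hypothetical helpers, and `sorted` stands in for a real search function.

```python
import time

def run_and_measure(algorithm, *args, **kwargs):
    """Run an algorithm and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = algorithm(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def summarize(name, itinerary, total_cost, elapsed):
    """Collect the comparison metrics for one algorithm in a single record."""
    return {
        "algorithm": name,
        "stops": len(itinerary),
        "total_cost": total_cost,
        "seconds": elapsed,
    }

# Stand-in call: any search function could be passed in place of `sorted`
result, elapsed = run_and_measure(sorted, [3, 1, 2])
row = summarize("demo", result, total_cost=0, elapsed=elapsed)
```

Collecting one record per algorithm makes it straightforward to tabulate or plot the comparison afterwards.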


## Visualizations

We will include several visualizations to help understand the results:
- **Route Map**: Visualize the itinerary on a map.
- **Cost Breakdown**: Graph showing the total cost per destination.
- **Satisfaction**: Bar chart comparing user satisfaction for different search strategies.


In [6]:
import matplotlib.pyplot as plt

# Example plot for the cost breakdown
def plot_cost_breakdown(itinerary, attractions):
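    # Cost fields are strings such as "Free" or "400 DZD"; convert them to numbers before plotting
    costs = [parse_cost(attractions[a]['cost']) for a in itinerary]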
    plt.bar(range(len(itinerary)), costs)
    plt.xlabel('Attraction')
    plt.ylabel('Cost')
    plt.title('Cost Breakdown')
    plt.show()


## Demo

In this section, we will showcase the working prototype of the **Touristic Tour Recommendation Application**. This demo will cover:
- The **interactive input** from the user (e.g., preferences, current location, etc.).
- Displaying the **optimized itinerary** generated by the selected search algorithm.
- Visualizations of the **travel route**, **cost breakdown**, and **satisfaction level**.

In [13]:
pass

## Conclusion

In this notebook, we explored different search algorithms (A\*, Hill Climbing, BFS, DFS) and a CSP approach to generate optimal itineraries for travelers in Algeria. 
We found that **.....** performed well in terms of solution quality, but it was more computationally expensive compared to **.....**. 
Future work could involve integrating real-time weather data and optimizing routes based on current traffic conditions.
