# Drone Delivery Flights Dataset Generation Notebook 
**Paper Title**: Optimizing QoS fulfillment of Drone Services

**Conference Submission**: International Conference on Service-Oriented Computing, 2025

**Purpose**  
This notebook generates the dataset of drone delivery flights to support experimentation and evaluation of drone-based delivery services in urban environments.

**Overview**  
- This notebook generates multiple drone delivery flights across the real-world skyway network (refer to the Skyway Network Dataset).  
- Each flight is based on realistic parameters for real-world drone operations.
- The notebook can generate varying drone traffic loads by adjusting the number of requests per provider. 
  The baseline assumes **40 requests per provider per hour**, based on a real-world scenario. This value can be increased by **10% to 100%** (e.g., 44, 48, ..., up to 80) to model growing delivery demand.
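
The load levels described above can be enumerated with a short sketch. The names `baseline` and `load_levels` are illustrative and do not appear in the notebook; integer arithmetic is used only to keep every level a whole number.

```python
# Illustrative sketch: requests per provider for 10%..100% demand growth.
baseline = 40  # baseline requests per provider per hour (from the text above)

# Integer arithmetic keeps each level an exact whole number.
load_levels = [baseline * (100 + pct) // 100 for pct in range(10, 101, 10)]
print(load_levels)  # [44, 48, 52, 56, 60, 64, 68, 72, 76, 80]
```

Each value would be assigned to `requests_per_provider` before rerunning the notebook.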


**Drone Flight Characteristics**  
Each drone flight varies in the following aspects:
- **Flight range**
- **Payload weight**
- **Delivery time window**
- **Source and destination nodes**
- **Battery consumption**
- **Charging time at intermediate nodes**

**Dataset Generation**  
- Source and destination nodes are selected from the skyway network. Buildings such as restaurants and cafes serve as source nodes, and apartments serve as destination nodes.  
- Each drone flight is assigned a random payload within the drone's maximum payload capacity. 
- Each drone delivery path from the source to the destination node is generated using the A* shortest path algorithm.  
- Drones may recharge at intermediate nodes if the battery level is insufficient. The charging time is computed using the model proposed in [15] (refer to the paper).
- The energy consumption of each drone flight is computed using the energy consumption model proposed in [8] (refer to the paper), with the physical parameters of a DJI Phantom 3 drone.  
- The final dataset ('RequestsWithShortestPaths.csv') includes a list of drone delivery flights that serve as the skyway path requests.
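
As an illustration of the A* step named above, the following is a minimal sketch on a toy network using `nx.astar_path`. The path-finding cell in this notebook calls `nx.all_shortest_paths` instead, so this is not the notebook's own code; the node names, coordinates, and the straight-line heuristic are all assumptions made for the example.

```python
import math
import networkx as nx

# Hypothetical toy network: node -> (x, y) position in km (not from the skyway data).
positions = {"R1": (0.0, 0.0), "N1": (1.0, 0.0), "N2": (0.0, 2.0), "A1": (1.0, 1.0)}

G = nx.Graph()
for u, v in [("R1", "N1"), ("N1", "A1"), ("R1", "N2"), ("N2", "A1")]:
    # Edge weight is the straight-line length of the segment.
    G.add_edge(u, v, weight=math.dist(positions[u], positions[v]))

def straight_line(u, v):
    """Admissible heuristic: straight-line distance never exceeds path length."""
    return math.dist(positions[u], positions[v])

path = nx.astar_path(G, "R1", "A1", heuristic=straight_line, weight="weight")
length = sum(G[u][v]["weight"] for u, v in zip(path, path[1:]))
print(path, length)
```

On this toy network the route through `N1` (2 km) beats the route through `N2`, so A* returns the former.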

**Instructions**  
- Run **all cells sequentially** to generate the dataset.  
- To generate a different traffic load, change `requests_per_provider` (baseline **40 requests per provider per hour**, increased by **10% to 100%**, e.g., 44, 48, ..., up to 80) and rerun the notebook. For each generated dataset, the notebooks for all three approaches (**Fairness-based**, **Greedy**, and **Proposed**) must be executed to collect the results.

**Dependencies**  
- Python 3.x  
- pandas  
- numpy  
- networkx  
- Standard library modules: shutil, datetime, random, math  

(Ensure all required packages are installed before execution.)

**Note**  
All random seeds are set explicitly (where applicable) to ensure consistent results across runs.
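
A small sketch of what seeding buys: reseeding the generator with the same value reproduces the same draws. The helper `sample_payloads` is hypothetical and exists only for this illustration; the seeds 45 and 80 and the range 0.5 to 1.1 are taken from the notebook's code.

```python
import random

def sample_payloads(seed, n=3):
    """Draw n payload values after seeding the global generator."""
    random.seed(seed)
    return [random.uniform(0.5, 1.1) for _ in range(n)]

first = sample_payloads(45)
second = sample_payloads(45)   # same seed, so the same sequence
other = sample_payloads(80)    # different seed, different sequence

print(first == second)
print(first == other)
```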

In [14]:
import pandas as pd
import shutil
import networkx as nx
import random
import math
from datetime import datetime, timedelta

In [2]:
# Load the skyway network csv file
skyway_data = pd.read_csv('Skyway_network_data_retained.csv')  
# create backup file
df = skyway_data.copy()
df.to_csv('Skyway_network_data_copy1.csv', index=False)
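
Later cells read a fixed set of columns from the skyway file, so a quick schema check can catch a mismatched CSV early. This is an optional sketch, not part of the original pipeline; `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, and the column list is collected from the cells in this notebook.

```python
import pandas as pd

# Columns that the cells in this notebook read from the skyway CSV.
REQUIRED_COLUMNS = [
    "Node1", "Node2", "Node1_ID", "Node2_ID", "Segment_ID", "Segment_Length",
    "GPS_Coordinates_Node1", "GPS_Coordinates_Node2",
    "Recharging_Pads_Node1", "Recharging_Pads_Node2",
]

def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Toy frame standing in for the real file; it lacks the recharging-pad columns.
toy = pd.DataFrame(columns=REQUIRED_COLUMNS[:8])
print(missing_columns(toy))  # ['Recharging_Pads_Node1', 'Recharging_Pads_Node2']
```

With the real data, `missing_columns(skyway_data)` would be expected to return an empty list.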

In [3]:
# Random seeds used for reproducibility. We run it multiple times for different seeds and take the average result over multiple runs.
random.seed(45) 
#random.seed(80) 
#random.seed(62) 
#random.seed(20) 

skyway_data = pd.read_csv('Skyway_network_data_copy1.csv')
source_dest_data = pd.read_csv('Skyway_Source_Destination.csv')

# Step 1: Build the skyway network graph
G = nx.Graph()
for _, row in skyway_data.iterrows():
    G.add_edge(
        row['Node1'], row['Node2'],
        Segment_ID=row['Segment_ID'],
        Edge_Length=row['Segment_Length'],
        Coordinates_Node1=row['GPS_Coordinates_Node1'],
        Coordinates_Node2=row['GPS_Coordinates_Node2'],
        Node1_ID=row['Node1_ID'],
        Node2_ID=row['Node2_ID']
    )

# Step 2: Extract valid source and destination nodes
valid_sources = list(source_dest_data['Source_Node'].unique())
valid_destinations = list(source_dest_data['Destination_Node'].unique())

provider_count = 33 
# Baseline number of delivery requests per provider per hour is set to 40, based on a real-world scenario.
# This value is varied to model 10% to 100% growth in baseline demand (e.g., 44, 48, 52, ..., up to 80).
# Used to evaluate system performance under an increasing drone traffic load.
requests_per_provider = 40 

if len(valid_sources) < provider_count:
    raise ValueError("Not enough unique source nodes to assign one per provider.")

# Step 3: Generate requests
requests = []
drone_counter = 1
request_counter = 1

for sp_index in range(1, provider_count + 1):
    service_provider_id = f"SP_{sp_index}"
    # One source node per provider; the length check above guarantees the modulo never wraps
    source_node = valid_sources[(sp_index - 1) % len(valid_sources)]
    source_rows = skyway_data[
        (skyway_data['Node1'] == source_node) | (skyway_data['Node2'] == source_node)
    ]
    if source_rows.empty:
        continue

    source_row = source_rows.iloc[0]
    if source_row['Node1'] == source_node:
        source_id = source_row['Node1_ID']
        source_coord = source_row['GPS_Coordinates_Node1']
    else:
        source_id = source_row['Node2_ID']
        source_coord = source_row['GPS_Coordinates_Node2']

    for _ in range(requests_per_provider):
        drone_id = f"Drone_{drone_counter}"
        request_id = f"Request_{request_counter}"
        request_time = (datetime(2023, 1, 1, 8, 0) + timedelta(minutes=random.randint(0, 59))).strftime('%H:%M:%S')
        payload_weight = random.uniform(0.5, 1.1)

        destination_node = random.choice(valid_destinations)

        destination_rows = skyway_data[
            (skyway_data['Node1'] == destination_node) | (skyway_data['Node2'] == destination_node)
        ]
        if destination_rows.empty:
            continue
        destination_row = destination_rows.iloc[0]
        if destination_row['Node1'] == destination_node:
            dest_id = destination_row['Node1_ID']
            dest_coord = destination_row['GPS_Coordinates_Node1']
        else:
            dest_id = destination_row['Node2_ID']
            dest_coord = destination_row['GPS_Coordinates_Node2']

        request = {
            'Service_Provider_ID': service_provider_id,
            'Drone_ID': drone_id,
            'Request_ID': request_id,
            'Request_Time': request_time,
            'Payload_Weight (g)': payload_weight,
            'Source_Node_ID': source_id,
            'Source_Node_Name': source_node,
            'Source_Coordinates': source_coord,
            'Destination_Node_ID': dest_id,
            'Destination_Node_Name': destination_node,
            'Destination_Coordinates': dest_coord
        }
        requests.append(request)
        drone_counter += 1
        request_counter += 1


requests_df = pd.DataFrame(requests)
requests_df.to_csv('Requests_data.csv', index=False)
print(f"Total requests generated: {len(requests_df)} — saved to 'Requests_data.csv'")

Total requests generated: 1320 — saved to 'Requests_data.csv'


In [4]:
file_path = 'Requests_data.csv'  
df = pd.read_csv(file_path)
skyway_data_path = 'Skyway_network_data_copy1.csv'
df_skyway = pd.read_csv(skyway_data_path)

# Convert Request_Time to datetime format
df['Request_Time'] = pd.to_datetime(df['Request_Time'], format='%H:%M:%S')

# Sort the dataframe by Source_Node_ID, Request_Time, and Drone_ID
df = df.sort_values(by=['Source_Node_ID', 'Request_Time', 'Drone_ID']).reset_index(drop=True)

# Create a new column for adjusted Request_Time
df['Adjusted_Request_Time'] = df['Request_Time']

# Group by Source_Node_ID and Request_Time
grouped = df.groupby(['Source_Node_ID', 'Request_Time'])
# Scaling up segment length by a factor of 5
df_skyway['Segment_Length'] = df_skyway['Segment_Length']*5

# Stagger drones that share the same Request_Time and Source_Node_ID by 0.01875 seconds each to keep a safe separation and avoid collisions
delay_increment = pd.to_timedelta(0.01875, unit='s')

for _, group in grouped:
    if len(group) > 1:  # Check if there are multiple entries with the same Request_Time and Source_Node_ID
        for i, idx in enumerate(group.index):
            df.at[idx, 'Adjusted_Request_Time'] += i * delay_increment

# Convert Adjusted_Request_Time back to string format
df['Adjusted_Request_Time'] = df['Adjusted_Request_Time'].dt.strftime('%H:%M:%S.%f').str[:-3]


output_file_path = 'Requests_data.csv' 
df.to_csv(output_file_path, index=False)
output_file_path2 = 'Skyway_network_data_copy1.csv' 
df_skyway.to_csv(output_file_path2, index=False)
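
The staggering loop above can also be written without an explicit loop. The sketch below is one possible vectorized equivalent on a toy frame, using `groupby(...).cumcount()` to get each row's position within its group; the toy data is invented for illustration.

```python
import pandas as pd

# Toy requests: three drones leave node 7 at 08:05:00, one leaves node 9.
toy = pd.DataFrame({
    "Source_Node_ID": [7, 7, 7, 9],
    "Request_Time": pd.to_datetime(["08:05:00"] * 4, format="%H:%M:%S"),
})

# Position of each row within its (source, time) group: 0, 1, 2, ...
rank = toy.groupby(["Source_Node_ID", "Request_Time"]).cumcount()

# Shift each row by rank * 0.01875 s, then format with millisecond precision.
delay_increment = pd.to_timedelta(0.01875, unit="s")
adjusted = toy["Request_Time"] + rank * delay_increment
toy["Adjusted_Request_Time"] = adjusted.dt.strftime("%H:%M:%S.%f").str[:-3]
print(toy["Adjusted_Request_Time"].tolist())
```

Because the string slice truncates to milliseconds, offsets of 18.75 ms and 37.5 ms appear as `.018` and `.037`.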


In [5]:
skyway_data = pd.read_csv('Skyway_network_data_copy1.csv')
requests_df = pd.read_csv('Requests_data.csv')

# Step 1: Build a recharging pad availability map for intermediate charging at nodes
recharge_pad_map = {}
for _, row in skyway_data.iterrows():
    recharge_pad_map[row['Node1_ID']] = row['Recharging_Pads_Node1']
    recharge_pad_map[row['Node2_ID']] = row['Recharging_Pads_Node2']

# Step 2: Build the graph
G = nx.Graph()
for _, row in skyway_data.iterrows():
    G.add_edge(
        row['Node1_ID'],
        row['Node2_ID'],
        Segment_ID=row['Segment_ID'],
        weight=row['Segment_Length'],
        Node1_Name=row['Node1'],
        Node2_Name=row['Node2']
    )

# Step 3: Helper function to validate intermediate nodes
def is_valid_path(path, pad_map):
    intermediates = path[1:-1]
    return all(pad_map.get(node, 0) > 0 for node in intermediates)

# Output containers
shortest_paths = []
shortest_paths_ids = []
nodes_visited = []
segments_visited = []

# Step 4: Process each request
for _, request in requests_df.iterrows():
    source = request['Source_Node_ID']
    destination = request['Destination_Node_ID']

    if source not in G or destination not in G:
        shortest_paths.append('No Path Available (Node Missing)')
        shortest_paths_ids.append('No Path Available (Node Missing)')
        nodes_visited.append([])
        segments_visited.append([])
        continue

    try:
        # All shortest paths (equal cost)
        all_paths = list(nx.all_shortest_paths(G, source=source, target=destination, weight='weight'))

        # Filter valid ones
        valid_paths = [p for p in all_paths if is_valid_path(p, recharge_pad_map)]

        if not valid_paths:
            shortest_paths.append('No Valid Path (Intermediate Nodes Lack Pads)')
            shortest_paths_ids.append('No Valid Path (Intermediate Nodes Lack Pads)')
            nodes_visited.append([])
            segments_visited.append([])
            continue

        valid_path = valid_paths[0]  
        traced_path = []
        traced_path_ids = []
        node_list = []
        segment_list = []

        for i in range(len(valid_path) - 1):
            node1 = valid_path[i]
            node2 = valid_path[i + 1]
            edge_data = G[node1][node2]
            segment_id = edge_data['Segment_ID']

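            # Look up the segment's own row so each node ID is paired with its name
            # (the previous check compared against a stale `row` left over from the graph-building loop)
            seg_row = skyway_data.loc[skyway_data['Segment_ID'] == segment_id].iloc[0]
            forward = node1 == seg_row['Node1_ID']  # True when traversing Node1 -> Node2
            node1_name = seg_row['Node1'] if forward else seg_row['Node2']
            node2_name = seg_row['Node2'] if forward else seg_row['Node1']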
            traced_path.extend([node1_name, segment_id])
            traced_path_ids.extend([node1, segment_id])

            if i == 0:
                node_list.append(node1)
            node_list.append(node2)
            segment_list.append(segment_id)

        traced_path.append(node2_name)
        traced_path_ids.append(valid_path[-1])
        shortest_paths.append(' -> '.join(map(str, traced_path)))
        shortest_paths_ids.append(' -> '.join(map(str, traced_path_ids)))
        nodes_visited.append(node_list)
        segments_visited.append(segment_list)

    except nx.NetworkXNoPath:
        shortest_paths.append('No Path Available')
        shortest_paths_ids.append('No Path Available')
        nodes_visited.append([])
        segments_visited.append([])

requests_df['Shortest_Path'] = shortest_paths
requests_df['Shortest_Path_ID'] = shortest_paths_ids
requests_df['Node_IDs_Visited'] = nodes_visited
requests_df['Segment_IDs_Visited'] = segments_visited
requests_df.to_csv('RequestsWithShortestPaths.csv', index=False)
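
The filtering logic of this cell can be seen on a toy network. The sketch below assumes two equal-length routes and invented pad counts; `has_pads_on_intermediates` mirrors the role of `is_valid_path` but is named differently to mark it as illustrative.

```python
import networkx as nx

# Toy network: two equal-length routes from S to D, via B or via C.
G = nx.Graph()
G.add_edge("S", "B", weight=1.0)
G.add_edge("B", "D", weight=1.0)
G.add_edge("S", "C", weight=1.0)
G.add_edge("C", "D", weight=1.0)

# Hypothetical pad counts: B has no recharging pad, C has one.
pads = {"S": 1, "B": 0, "C": 1, "D": 1}

def has_pads_on_intermediates(path, pad_map):
    """True when every node strictly between source and destination has a pad."""
    return all(pad_map.get(node, 0) > 0 for node in path[1:-1])

candidates = list(nx.all_shortest_paths(G, "S", "D", weight="weight"))
valid = [p for p in candidates if has_pads_on_intermediates(p, pads)]
print(valid)  # [['S', 'C', 'D']]
```

Only equal-cost shortest paths are candidates here, so a slightly longer route whose intermediate nodes all have pads would not be considered by this approach.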

In [6]:
skyway_data = pd.read_csv('Skyway_network_data_copy1.csv') 
requests_df = pd.read_csv('RequestsWithShortestPaths.csv')  

# Create a dictionary to map segment IDs to distances
segment_distance_map = skyway_data.set_index('Segment_ID')['Segment_Length'].to_dict()

# Add columns for segment distances, total distance, and speed for each drone delivery flight
segment_distances_list = []
total_distances = []
speed_list = []

for _, row in requests_df.iterrows():
    segment_ids = row['Segment_IDs_Visited'] 
    if isinstance(segment_ids, str):
        segment_ids = eval(segment_ids)
    # Retrieve distances for each segment and calculate the total distance
    distances = [segment_distance_map.get(seg_id, 0) for seg_id in segment_ids]
    total_distance = sum(distances)
    segment_distances_list.append(distances)
    total_distances.append(total_distance)
    speed_list.append(57.6)  # Constant speed in km/h


requests_df['Segment_Distances (km)'] = segment_distances_list
requests_df['Total_Distance (km)'] = total_distances
requests_df['Speed (km/h)'] = speed_list
output_file = 'RequestsWithShortestPaths.csv'
requests_df.to_csv(output_file, index=False)

In [7]:
requests_df = pd.read_csv('RequestsWithShortestPaths.csv') 

# Convert payload weight from grams to kilograms and create a new column
requests_df['Payload_Weight (kg)'] = requests_df['Payload_Weight (g)'] / 1000
output_file = 'RequestsWithShortestPaths.csv'
requests_df.to_csv(output_file, index=False)

In [8]:
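# Constants for the battery consumption formula
M = 1.216  # drone mass without payload, kg
η = 0.5    # dimensionless efficiency factor of the energy model
r = 3      # dimensionless parameter of the energy model (see [8] in the paper)
pw = 0.1   # constant power term added to the flight power, kW
g = 9.81   # gravitational acceleration, m/s^2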

# Function to calculate battery consumption per segment
def calculate_battery_consumption(segment_distance_km, speed_kmh, payload_kg):
    M_total = M + payload_kg
    term1 = segment_distance_km / speed_kmh
    term2 = (M_total * speed_kmh) / ((3600 / g) * (η * r))
    battery_consumption_kWh = term1 * (term2 + pw)
    return battery_consumption_kWh

requests_df = pd.read_csv('RequestsWithShortestPaths.csv')
requests_df['Segment_Distances (km)'] = requests_df['Segment_Distances (km)'].apply(lambda x: eval(x) if isinstance(x, str) else x)
# Compute battery consumption for each segment
requests_df['Battery_Consumption_Per_Segment (kWh)'] = requests_df.apply(
    lambda row: [
        calculate_battery_consumption(dist, row['Speed (km/h)'], row['Payload_Weight (kg)']) 
        for dist in row['Segment_Distances (km)']
    ],
    axis=1
)

output_file = 'RequestsWithShortestPaths.csv'
requests_df.to_csv(output_file, index=False)
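
To make the formula in this cell easier to check by hand, the sketch below splits it into flight time (hours) and power draw (kW) and evaluates one segment. The function name, the 1 kg payload, and the 1 km segment are assumptions chosen for the example; the remaining constants are the ones used in the notebook.

```python
G_ACCEL = 9.81  # gravitational acceleration, m/s^2

def segment_energy_kwh(distance_km, speed_kmh, total_mass_kg, eta=0.5, r=3, p_kw=0.1):
    """Energy for one segment: flight time (h) times power draw (kW)."""
    flight_time_h = distance_km / speed_kmh
    power_kw = (total_mass_kg * speed_kmh) / ((3600 / G_ACCEL) * eta * r) + p_kw
    return flight_time_h * power_kw

# Assumed example values: a 1.216 kg drone carrying a hypothetical 1 kg payload.
one_km = segment_energy_kwh(1.0, 57.6, 1.216 + 1.0)
two_km = segment_energy_kwh(2.0, 57.6, 1.216 + 1.0)

# Share of the 0.068 kWh battery capacity used by the 1 km segment, in percent.
share_of_battery = one_km / 0.068 * 100
print(round(one_km, 5), round(share_of_battery, 1))
```

At constant speed the energy is proportional to distance, so the 2 km segment costs twice as much as the 1 km segment.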

In [9]:
requests_df = pd.read_csv('RequestsWithShortestPaths.csv')  

# Total battery capacity of drone in kWh (constant)
full_battery_capacity_kWh = 0.068  
requests_df['Battery_Consumption_Per_Segment (kWh)'] = requests_df['Battery_Consumption_Per_Segment (kWh)'].apply(
    lambda x: eval(x) if isinstance(x, str) else x
)

# Computing total battery consumption
requests_df['Total_Battery_Consumption (kWh)'] = requests_df['Battery_Consumption_Per_Segment (kWh)'].apply(sum)
# Computing battery consumption as a percentage of total capacity
requests_df['Battery_Consumption_Per_Segment (%)'] = requests_df['Battery_Consumption_Per_Segment (kWh)'].apply(
    lambda consumption_list: [(consumption / full_battery_capacity_kWh) * 100 for consumption in consumption_list]
)
requests_df['Total_Battery_Consumption (%)'] = (requests_df['Total_Battery_Consumption (kWh)'] / full_battery_capacity_kWh) * 100

output_file = 'RequestsWithShortestPaths.csv'
requests_df.to_csv(output_file, index=False)

In [10]:
requests_df = pd.read_csv('RequestsWithShortestPaths.csv')

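# Constants for recharging time calculation
T_full_minutes = 80  # Time for a full charge (to 100%), in minutes
factor = 0.2  # Fraction of the full charge time used as the time constant
tau = T_full_minutes * factor  # Time constant of the charging curve, in minutes

# Recharge threshold (%) and corresponding final state of charge (SOC, as a fraction) pairs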
threshold_soc_pairs = {
    20: 0.33,
    30: 0.44,
    40: 0.55,
    50: 0.66,
    60: 0.77,
    70: 0.88,
    80: 0.99
}

# Function to calculate recharging time
def calculate_recharging_time(consumption_percentages, node_ids, min_threshold, sf):
    recharging_times = []
    cumulative_consumption = 0

    for i in range(len(node_ids)):
        if i == 0:
            recharging_times.append(0)
        else:
            cumulative_consumption += consumption_percentages[i - 1]
            future_consumption = consumption_percentages[i] if i < len(consumption_percentages) else 0
            predicted_total_consumption = cumulative_consumption + future_consumption

            if cumulative_consumption >= min_threshold or predicted_total_consumption >= min_threshold or i == len(node_ids) - 1:
                Si = 1 - (consumption_percentages[i - 1] / 100)
                if Si < 0 or Si >= sf:
                    recharging_times.append(0)
                else:
                    try:
                        time_to_recharge = -tau * math.log((1 - sf) / (1 - Si))
                        recharging_times.append(time_to_recharge)
                    except ValueError:
                        recharging_times.append(0)
                cumulative_consumption = 0
            else:
                recharging_times.append(0)
    return recharging_times


requests_df['Battery_Consumption_Per_Segment (%)'] = requests_df['Battery_Consumption_Per_Segment (%)'].apply(
    lambda x: eval(x) if isinstance(x, str) else x
)
requests_df['Node_IDs_Visited'] = requests_df['Node_IDs_Visited'].apply(
    lambda x: eval(x) if isinstance(x, str) else x
)

min_thresholds = []
sfs = []
recharging_times_all = []

for _, row in requests_df.iterrows():
    min_thresh = random.choice(list(threshold_soc_pairs.keys()))
    sf_value = threshold_soc_pairs[min_thresh]
    min_thresholds.append(min_thresh)
    sfs.append(sf_value)
    rt = calculate_recharging_time(row['Battery_Consumption_Per_Segment (%)'], row['Node_IDs_Visited'], min_thresh, sf_value)
    recharging_times_all.append(rt)

requests_df['Minimum_Recharge_Threshold'] = min_thresholds
requests_df['Final_SOC'] = sfs
requests_df['Recharging_Time_Per_Node (minutes)'] = recharging_times_all

output_path = "RequestsWithShortestPaths.csv"
requests_df.to_csv(output_path, index=False)
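
The charging time used in this cell follows `t = -tau * ln((1 - S_f) / (1 - S_i))`, where `S_i` is the state of charge on arrival and `S_f` the target, both as fractions. The sketch below isolates that expression; `recharge_minutes` and the sample SOC values are hypothetical and serve only to show how the time grows as the arrival charge drops.

```python
import math

TAU_MIN = 80 * 0.2  # time constant in minutes (T_full_minutes * factor)

def recharge_minutes(soc_start, soc_target, tau=TAU_MIN):
    """Minutes to charge from soc_start to soc_target (both fractions below 1)."""
    if soc_start >= soc_target:
        return 0.0  # already at or above the target, no charging needed
    return -tau * math.log((1 - soc_target) / (1 - soc_start))

t_short = recharge_minutes(0.40, 0.55)  # arrives at 40%, charges to 55%
t_long = recharge_minutes(0.20, 0.55)   # arrives at 20%, charges to 55%
print(round(t_short, 2), round(t_long, 2))
```

With a time constant of 16 minutes, these two cases come to roughly 4.6 and 9.2 minutes.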

In [11]:
requests_df = pd.read_csv("RequestsWithShortestPaths.csv")
skyway_df = pd.read_csv("Skyway_network_data_copy1.csv")

# Helper function to compute travel time for a segment
def compute_travel_time(segment_id, speed):
    segment_info = skyway_df[skyway_df['Segment_ID'] == segment_id].iloc[0]
    segment_distance = segment_info['Segment_Length'] * 1000  # Convert to meters
    
    # Convert speed to m/min (km/h to m/min)
    speed_m_per_min = speed * 1000 / 60
    travel_time = segment_distance / speed_m_per_min  # Time in minutes
    return travel_time

# Create the travel time per segment column
travel_times = []

for index, row in requests_df.iterrows():
    segments = eval(row['Segment_IDs_Visited'])  # List of segment IDs
    speed = row['Speed (km/h)']  # Speed of the drone
    travel_time_dict = {}  # Dictionary to store travel time per segment
    for segment_id in segments:
        travel_time_dict[segment_id] = compute_travel_time(segment_id, speed)    
    travel_times.append(travel_time_dict)

requests_df['Travel_Time_Per_Segment'] = travel_times
requests_df.to_csv("RequestsWithShortestPaths.csv", index=False)
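
The unit conversion in `compute_travel_time` is easy to verify on round numbers: 57.6 km/h equals 960 m/min, so a 0.96 km segment takes about one minute. The sketch below also uses a plain dictionary of segment lengths, which is one way to avoid filtering the DataFrame for every segment; the segment IDs and lengths are invented.

```python
def travel_time_minutes(length_km, speed_kmh):
    """Travel time in minutes for one segment at constant speed."""
    speed_m_per_min = speed_kmh * 1000 / 60  # km/h -> m/min
    return length_km * 1000 / speed_m_per_min

# Hypothetical segment lengths keyed by Segment_ID, standing in for the CSV.
segment_length_km = {"S1": 0.96, "S2": 0.48}

times = {seg: travel_time_minutes(km, 57.6) for seg, km in segment_length_km.items()}
print(times)  # about 1.0 and 0.5 minutes
```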

In [12]:
requests_df = pd.read_csv("RequestsWithShortestPaths.csv")

def correct_adjusted_request_time_format(df):
    df['Adjusted_Request_Time'] = pd.to_datetime(df['Adjusted_Request_Time'], format='%H:%M:%S.%f')
    df['Adjusted_Request_Time'] = df['Adjusted_Request_Time'].dt.strftime('%H:%M:%S.%f').str[:-3]
    return df
requests_df = correct_adjusted_request_time_format(requests_df)

# Function to calculate trajectory time intervals
def calculate_trajectory_time_intervals(row):
    request_time = datetime.strptime(row['Adjusted_Request_Time'], "%H:%M:%S.%f")  # Start time of the drone
    node_ids = eval(row['Node_IDs_Visited'])  # List of nodes visited
    segment_ids = eval(row['Segment_IDs_Visited'])  # List of segment IDs
    segment_distances = eval(row['Segment_Distances (km)'])  # List of segment distances in km
    speed = row['Speed (km/h)']  # Drone speed in km/h
    recharging_times = eval(row['Recharging_Time_Per_Node (minutes)'])  # Time spent at intermediate nodes for charging
    trajectory_time = {}  # Dictionary to store time intervals
    current_time = request_time 
    
    # Iterate over nodes and segments
    for i in range(len(node_ids) - 1):  # -1 because segments are between nodes
        node = node_ids[i]
        segment = segment_ids[i]
        
        # Add the current node with its time interval
        node_start_time = current_time
        node_end_time = current_time + timedelta(minutes=recharging_times[i])
        trajectory_time[node] = f"{node_start_time.strftime('%H:%M:%S.%f')[:-3]}-{node_end_time.strftime('%H:%M:%S.%f')[:-3]}"
        current_time = node_end_time  # Update current time after staying at the node
        
        # Calculate travel time dynamically using distance and speed
        segment_distance = segment_distances[i]  # Distance in km
        segment_time = (segment_distance / speed) * 60  # Convert time to minutes
        
        # Add the segment with its time interval
        segment_start_time = current_time
        segment_end_time = current_time + timedelta(minutes=segment_time)
        trajectory_time[segment] = f"{segment_start_time.strftime('%H:%M:%S.%f')[:-3]}-{segment_end_time.strftime('%H:%M:%S.%f')[:-3]}"
        current_time = segment_end_time  # Update current time after traversing the segment
    
    # Add the final destination node
    final_node = node_ids[-1]
    node_start_time = current_time
    node_end_time = current_time + timedelta(minutes=recharging_times[-1])
    trajectory_time[final_node] = f"{node_start_time.strftime('%H:%M:%S.%f')[:-3]}-{node_end_time.strftime('%H:%M:%S.%f')[:-3]}"
    return trajectory_time

# Apply the function to calculate trajectory time intervals for each row
requests_df['Trajectory_Time'] = requests_df.apply(calculate_trajectory_time_intervals, axis=1)
requests_df.to_csv("RequestsWithShortestPaths.csv", index=False)
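
The resulting `Trajectory_Time` structure alternates node dwell intervals with segment travel intervals. The following compact sketch reproduces that layout for a hypothetical two-segment flight; `fmt` and `build_timeline` are illustrative helpers, and the travel and charging durations are made up.

```python
from datetime import datetime, timedelta

def fmt(t):
    """Format a datetime as HH:MM:SS.mmm."""
    return t.strftime("%H:%M:%S.%f")[:-3]

def build_timeline(start, nodes, segments, segment_minutes, dwell_minutes):
    """Alternate node dwell intervals and segment travel intervals."""
    timeline, now = {}, start
    for i, node in enumerate(nodes):
        leave = now + timedelta(minutes=dwell_minutes[i])
        timeline[node] = f"{fmt(now)}-{fmt(leave)}"
        now = leave
        if i < len(segments):  # no outgoing segment after the last node
            arrive = now + timedelta(minutes=segment_minutes[i])
            timeline[segments[i]] = f"{fmt(now)}-{fmt(arrive)}"
            now = arrive
    return timeline

# Hypothetical flight: two segments, 1.5 minutes of charging at node "N2".
timeline = build_timeline(
    datetime.strptime("08:00:00.000", "%H:%M:%S.%f"),
    nodes=["N1", "N2", "N3"], segments=["S1", "S2"],
    segment_minutes=[1.0, 0.5], dwell_minutes=[0, 1.5, 0],
)
print(timeline)
```

Because nodes and segments share one dictionary, this layout assumes that node IDs and segment IDs never coincide and that no node is visited twice on a path.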