# Introduction

This notebook demonstrates how to use a partitioning strategy to solve large-scale route optimization problems. The rationale for partitioning is that most optimization problems of this kind are NP-hard, so solving them directly at scale is often impractical. To trade off result optimality against running time, one can partition the big problem into many smaller problems, solve each smaller problem, and finally combine all the results into the final result. The whole pipeline is illustrated in the figure below.

<img src=../docs/media/pipeline.png width="90%" />

There are 4 main steps in the pipeline:
1.  Reduce: This step tries to assign some of the orders to truck routes heuristically. The remaining unscheduled orders are passed to the later steps for optimization. This step is optional: one can bypass it and let the optimizer search for a solution over all orders. However, a good heuristic can significantly reduce the search space, which makes it easier for the optimization solver to find a good solution.
2.  Partition: This is the core step, which partitions the big problem into smaller problems.
3.  Solve: This step solves each small problem with any optimization solver.
4.  Merge: This final step combines the results of all the small problems.
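
The four steps above can be sketched end to end on toy data. This is only an illustration of the control flow, not the logic in this repository: the order tuples, `TRUCK_CAPACITY`, and all four function bodies are hypothetical stand-ins, and the "solver" simply returns its input.

```python
from collections import defaultdict

# Toy orders: (order_id, source, destination, weight). All values are made up.
ORDERS = [
    ("A1", "City_1", "City_2", 10.0),
    ("A2", "City_1", "City_3", 4.0),
    ("A3", "City_2", "City_3", 3.0),
    ("A4", "City_2", "City_1", 2.0),
]
TRUCK_CAPACITY = 10.0  # hypothetical capacity


def reduce_step(orders):
    """Heuristically assign orders that fill a truck on their own."""
    assigned = [o for o in orders if o[3] >= TRUCK_CAPACITY]
    remaining = [o for o in orders if o[3] < TRUCK_CAPACITY]
    return assigned, remaining


def partition_step(orders):
    """Split the remaining orders into independent sub-problems, here by source."""
    parts = defaultdict(list)
    for order in orders:
        parts[order[1]].append(order)
    return list(parts.values())


def solve_step(sub_problem):
    """Stand-in for a real solver: returns the sub-problem unchanged."""
    return list(sub_problem)


def merge_step(partial, solved_parts):
    """Combine the heuristic assignment with the solved sub-problems."""
    merged = list(partial)
    for part in solved_parts:
        merged.extend(part)
    return merged


assigned, remaining = reduce_step(ORDERS)
sub_problems = partition_step(remaining)
schedule = merge_step(assigned, [solve_step(p) for p in sub_problems])
print(len(sub_problems), len(schedule))  # prints: 2 4
```

In the real pipeline each of these functions corresponds to one Azure ML component, and the solve step runs once per sub-problem in parallel.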




# 1.0 Load libraries

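We use an Azure ML pipeline for the implementation. Specifically, the sub-problems produced by the partitioning step are solved in parallel with parallel_run_function from the Azure ML SDK v2 (the counterpart of ParallelRunStep in SDK v1).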

In [1]:
# import required libraries
import os
from dotenv import load_dotenv

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient, Input, Output, load_component
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import Environment
from azure.ai.ml.constants import AssetTypes, InputOutputModes
from azure.ai.ml.parallel import parallel_run_function, RunFunction

# 1.1 Set up the environment
## 1.1.1 Load variables

Some parameters are managed by environment variables. To specify your values, create a .env file in the root folder of the repository and set the values for the following parameters.
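
Before connecting to Azure, it can help to check that every required variable is actually set. The sketch below is one possible way to do that. The three variable names are the ones this notebook reads; the helper `missing_env_vars` and the example values are hypothetical.

```python
import os

# Names taken from the notebook cell that reads them via os.environ.
REQUIRED_VARS = ("AML_WORKSPACE_NAME", "AML_SUBSCRIPTION_ID", "AML_RESOURCE_GROUP")


def missing_env_vars(environ=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"AML_WORKSPACE_NAME": "my-workspace", "AML_RESOURCE_GROUP": ""}
print(missing_env_vars(example_env))
# prints: ['AML_SUBSCRIPTION_ID', 'AML_RESOURCE_GROUP']
```

Calling `missing_env_vars()` with no arguments after `load_dotenv()` checks the real environment, which gives a clearer message than the `KeyError` raised by `os.environ[...]`.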

In [2]:
load_dotenv()

ws_name = os.environ['AML_WORKSPACE_NAME']
subscription_id = os.environ['AML_SUBSCRIPTION_ID']
resource_group = os.environ['AML_RESOURCE_GROUP']


print('---- Check Azure setting ----')
print(f'AML Workspace name       : {ws_name}')
print(f'Subscription ID          : {subscription_id}')
print(f'Resource group           : {resource_group}')

---- Check Azure setting ----
AML Workspace name       : amldemo
Subscription ID          : e4eda206-7aff-4e54-8f55-1e60f0e64093
Resource group           : aml-demo-rg


## 1.1.2 Azure authentication and Load Azure ML Workspace

We are using DefaultAzureCredential to get access to workspace.

DefaultAzureCredential should be capable of handling most Azure SDK authentication scenarios.

If it does not work for you, see the configure credential example and the azure-identity reference doc for other available credentials.

In [3]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [4]:
try:
    ml_client = MLClient.from_config(credential=credential)
except Exception as ex:
    # NOTE: Update the following workspace information if it was not configured correctly before
    client_config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": ws_name,
    }

    if client_config["subscription_id"].startswith("<"):
        print(
            "please update your <SUBSCRIPTION_ID> <RESOURCE_GROUP> <AML_WORKSPACE_NAME> in notebook cell"
        )
        raise ex
    else:  # write and reload from config file
        import json, os

        config_path = "../.azureml/config.json"
        os.makedirs(os.path.dirname(config_path), exist_ok=True)
        with open(config_path, "w") as fo:
            fo.write(json.dumps(client_config))
        ml_client = MLClient.from_config(credential=credential, path=config_path)
print(ml_client)

Found the config file in: C:\Users\zhianhe\Demo\.azureml\config.json


MLClient(credential=<azure.identity._credentials.default.DefaultAzureCredential object at 0x000001EDBC53BD90>,
         subscription_id=e4eda206-7aff-4e54-8f55-1e60f0e64093,
         resource_group_name=aml-demo-rg,
         workspace_name=amldemo)


## 1.1.3 Get Compute Cluster

The compute cluster name is specified in the cell below. If it does not exist in the Azure ML workspace, a new compute target will be created.

In [5]:
from azure.ai.ml.entities import AmlCompute

# specify aml compute name for the optimization job
cpu_compute_target = "op-cluster"

try:
    ml_client.compute.get(cpu_compute_target)
except Exception:
    print("Creating a new cpu compute target...")
    compute = AmlCompute(
        name=cpu_compute_target, size="Standard_E4ds_v4", min_instances=0, max_instances=10
    )
    ml_client.compute.begin_create_or_update(compute).result()

## 1.1.4 Create AML Environment and Run Configuration

In [6]:
# environment
env_name = 'op-env'

try:
    env = ml_client.environments.get(name=env_name, version="2")
    print("Found existing environment.")

except Exception as ex:
    #Print the error message
    print(ex)
    
    print("Creating new environment")
    env_docker_conda = Environment(
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
        conda_file="../src/env.yml",
        name=env_name,
        description="Environment created from a Docker image plus Conda environment.",
    )
    env = ml_client.environments.create_or_update(env_docker_conda)

Found existing environment.


## 1.1.5 Prepare Example Data

In [8]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

order_path = "../sample_data/order_large.csv"
distances_path = "../sample_data/distance.csv"
# set the version number of the data asset
v1 = "1"

order_data = Data(
    name="orders",
    version=v1,
    description="Example order data",
    path=order_path,
    type=AssetTypes.URI_FILE,
)

distances_data = Data(
    name="distances-matrix",
    version=v1,
    description="Example distance data",
    path=distances_path,
    type=AssetTypes.URI_FILE,
)

## create order data asset if it doesn't already exist:
try:
    order_data_asset = ml_client.data.get(name=order_data.name, version=order_data.version)
    print(
        f"Data asset already exists. Name: {order_data.name}, version: {order_data.version}"
    )
except Exception:
    ml_client.data.create_or_update(order_data)
    order_data_asset = ml_client.data.get(name=order_data.name, version=order_data.version)
    print(f"Data asset created. Name: {order_data.name}, version: {order_data.version}")

## create distances data asset if it doesn't already exist:
try:
    distances_data_asset = ml_client.data.get(name=distances_data.name, version=distances_data.version)
    print(
        f"Data asset already exists. Name: {distances_data.name}, version: {distances_data.version}"
    )
except Exception:
    ml_client.data.create_or_update(distances_data)
    distances_data_asset = ml_client.data.get(name=distances_data.name, version=distances_data.version)
    print(f"Data asset created. Name: {distances_data.name}, version: {distances_data.version}")


Uploading order_large.csv (< 1 MB): 100%|##########| 653k/653k [00:00<00:00, 836kB/s]



Data asset created. Name: orders, version: 1
Data asset created. Name: distances-matrix, version: 1


# 1.2 Set up Azure ML Pipeline

This section contains the main logic of the optimization pipeline.

## 1.2.1 Reduce the search space of the problem

The first step is to reduce the search space by assigning some of the orders based on a heuristic. The detailed logic is implemented in reduce.py. In general, if the heuristic is chosen properly, we can achieve a good trade-off between result optimality and running time.
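
As an illustration of what such a heuristic might look like, the sketch below commits a direct truck for any source-destination pair whose orders nearly fill one truck, and passes everything else on to the optimizer. This is an assumed example, not the logic in reduce.py: the function `reduce_orders`, the field names, and the `min_fill` threshold are all hypothetical.

```python
from collections import defaultdict


def reduce_orders(orders, capacity, min_fill=0.9):
    """Return (assigned_trucks, remaining_orders) for a simple fill-rate heuristic."""
    # Group orders by lane, i.e. by (source, destination) pair.
    by_lane = defaultdict(list)
    for order in orders:
        by_lane[(order["source"], order["destination"])].append(order)

    assigned, remaining = [], []
    for lane, lane_orders in by_lane.items():
        truck, load = [], 0.0
        # Greedily load one truck per lane, heaviest orders first.
        for order in sorted(lane_orders, key=lambda o: o["weight"], reverse=True):
            if load + order["weight"] <= capacity:
                truck.append(order)
                load += order["weight"]
            else:
                remaining.append(order)
        if load >= min_fill * capacity:
            # The truck is nearly full: commit it as a direct route.
            assigned.append({"route": lane, "orders": [o["order_id"] for o in truck]})
        else:
            # Not full enough: leave these orders for the optimizer.
            remaining.extend(truck)
    return assigned, remaining


orders = [
    {"order_id": "A1", "source": "City_1", "destination": "City_2", "weight": 6.0},
    {"order_id": "A2", "source": "City_1", "destination": "City_2", "weight": 4.0},
    {"order_id": "A3", "source": "City_1", "destination": "City_3", "weight": 3.0},
]
assigned, remaining = reduce_orders(orders, capacity=10.0)
print(assigned)
print([o["order_id"] for o in remaining])  # prints: ['A3']
```

A stricter `min_fill` keeps more orders in the optimizer's hands, which favors optimality. A looser one shrinks the search space further, which favors running time.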

In [9]:
src_dir = '../src'

In [10]:
from azure.ai.ml import command
from azure.ai.ml import Input, Output

reduce_component = command(
    name="reduce_step",
    display_name="Reduce the problem space",
    description="Read the model input, create partial assignment by heuristic",
    inputs={
        "model_input": Input(type="uri_file"),
        "distance": Input(type="uri_file"),
    },
    outputs=dict(
        model_result_partial=Output(type="uri_folder", mode="rw_mount"),
        model_input_reduced=Output(type="uri_folder", mode="rw_mount"),
    ),
    # The source folder of the component
    code=src_dir,
    command="""python reduce.py \
            --model_input ${{inputs.model_input}} --distance ${{inputs.distance}} \
            --model_result_partial ${{outputs.model_result_partial}} --model_input_reduced ${{outputs.model_input_reduced}} \
            """,
    environment=f"{env.name}:{env.version}",
)

## 1.2.2 Partition the problem

For a large-scale optimization problem, the problem space is often too big to solve in practice. A commonly used idea is to partition the big problem into many smaller problems, solve each smaller problem individually, and combine all the results into the final result. In some cases the partitioning does not affect result optimality; for example, in the route optimization problem we can partition the orders by delivery source. In other cases, partitioning introduces a trade-off between result optimality and running time.
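
Partitioning by delivery source can be sketched as follows: orders are grouped by their Source column and each group is written to its own file in an output folder, so that each file becomes one small problem. This is a minimal sketch under assumptions, not the code in partition.py. The function `partition_by_source` and the file naming are hypothetical; the column names follow the schedule output shown later in this notebook.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def partition_by_source(rows, out_dir):
    """Write one CSV per distinct Source value and return the created paths."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Source"]].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for source, group in sorted(groups.items()):
        path = out_dir / f"orders_{source}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        paths.append(path)
    return paths


rows = [
    {"Order_ID": "A1", "Source": "City_61", "Destination": "City_19"},
    {"Order_ID": "A2", "Source": "City_61", "Destination": "City_20"},
    {"Order_ID": "A3", "Source": "City_7", "Destination": "City_19"},
]
with tempfile.TemporaryDirectory() as tmp:
    created = partition_by_source(rows, tmp)
    print([p.name for p in created])
    # prints: ['orders_City_61.csv', 'orders_City_7.csv']
```

Writing one file per partition fits a folder-typed output, because a downstream parallel step can then treat each file as a separate unit of work.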

In [11]:
partition_component = command(
    name="partition_step",
    display_name="Partition the big problem to many small problems",
    description="Read the reduced model input, partition the problem based on some heuristic",
    inputs={
        "model_input_reduced": Input(type="uri_folder"),
        "distance": Input(type="uri_file"),
    },
    outputs=dict(
        model_input_list=Output(type="uri_folder", mode="rw_mount"),
    ),
    # The source folder of the component
    code=src_dir,
    command="""python partition.py \
            --model_input_reduced ${{inputs.model_input_reduced}} --distance ${{inputs.distance}} \
            --model_input_list ${{outputs.model_input_list}} \
            """,
    environment=f"{env.name}:{env.version}",
)

## 1.2.3 Solve individual problem

After the problem is partitioned, we can solve each individual sub-problem with any optimization solver. The solver itself may use multiple processes to speed up the search. That level of parallelism is controlled entirely by the solver, not by our Azure ML pipeline.
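
For orientation, an entry script for an Azure ML parallel job generally exposes an `init()` function, called once per worker process, and a `run(mini_batch)` function, called once per mini-batch. The sketch below shows that shape only. It is not the solve.py in this repository: `solve_one` is a hypothetical placeholder solver, and the assumption that each input file is a CSV with Order_ID, Source and Destination columns is illustrative.

```python
import argparse
import csv

_distance_path = None  # set in init() from the --distance program argument


def init():
    """Called once per worker process before any mini-batch is processed."""
    global _distance_path
    parser = argparse.ArgumentParser()
    parser.add_argument("--distance", default=None)
    # parse_known_args ignores extra arguments added by the runtime.
    args, _ = parser.parse_known_args()
    _distance_path = args.distance


def solve_one(order_rows):
    """Hypothetical placeholder solver: every order gets its own direct truck."""
    return [f'{row["Order_ID"]},{row["Source"]}->{row["Destination"]}' for row in order_rows]


def run(mini_batch):
    """Called once per mini-batch; for file inputs, mini_batch is a list of file paths."""
    results = []
    for file_path in mini_batch:
        with open(file_path, newline="") as handle:
            results.extend(solve_one(list(csv.DictReader(handle))))
    # Each returned element becomes one row of the appended output.
    return results
```

With `mini_batch_size="1"`, each call to `run` receives a single partition file, so one small problem is solved per call.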

In [12]:
# parallel task to process file data
solve_component = parallel_run_function(
    name="parallel_solver",
    display_name="Solve the small problems in parallel",
    description="parallel component for problem solver",
    inputs=dict(
        model_input_list=Input(
            type=AssetTypes.URI_FOLDER,
            description="The data to be split and scored in parallel",
        ),
        distance=Input(type=AssetTypes.URI_FOLDER, description='The distance file used by the solver.')
    ),
    outputs=dict(model_result_list=Output(type=AssetTypes.MLTABLE)),
    input_data="${{inputs.model_input_list}}",
    instance_count=10,
    max_concurrency_per_instance=1,
    mini_batch_size="1",
    mini_batch_error_threshold=1,
    retry_settings=dict(max_retries=2, timeout=240), # make sure the timeout is larger than the timeout of the task
    logging_level="DEBUG",
    task=RunFunction(
        code=src_dir,
        entry_script="solve.py",
        program_arguments="--distance ${{inputs.distance}}",
        environment=f"{env.name}:{env.version}",
        append_row_to="${{outputs.model_result_list}}",
    ),
)

## 1.2.4 Merge the results

Once all the smaller problems are solved, we can combine the results into the final one. If the earlier partitioning affected global optimality, there may be a chance to optimize the result further in this step. For example, one may combine two packages from two separate results into the same truck if the combined load is more cost-efficient.
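
The consolidation idea can be sketched as a first-fit-decreasing pass over all truck loads: loads on the same route are combined whenever their total still fits in one truck. This is an assumed illustration, not the logic in merge.py. The function `merge_results`, the dictionary layout, and the single `capacity` value are hypothetical.

```python
def merge_results(partial, solved_parts, capacity):
    """Concatenate all truck loads, then greedily combine loads on the same route."""
    trucks = list(partial)
    for part in solved_parts:
        trucks.extend(part)

    merged = []
    # First-fit decreasing: place the heaviest loads first.
    for truck in sorted(trucks, key=lambda t: t["load"], reverse=True):
        for target in merged:
            same_route = target["route"] == truck["route"]
            if same_route and target["load"] + truck["load"] <= capacity:
                target["orders"] = target["orders"] + truck["orders"]
                target["load"] += truck["load"]
                break
        else:
            # No existing truck can take this load: keep it as its own truck.
            merged.append(
                {"route": truck["route"], "orders": list(truck["orders"]), "load": truck["load"]}
            )
    return merged


partial = [{"route": "City_1->City_2", "orders": ["A1"], "load": 9.0}]
solved_parts = [
    [{"route": "City_1->City_2", "orders": ["A2"], "load": 4.0}],
    [
        {"route": "City_1->City_2", "orders": ["A3"], "load": 5.0},
        {"route": "City_2->City_3", "orders": ["A4"], "load": 3.0},
    ],
]
final = merge_results(partial, solved_parts, capacity=10.0)
print(len(final))  # prints: 3
```

Here the loads for A3 and A2 come from different sub-problems but end up sharing one truck, which reduces four trucks to three.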

In [13]:

merge_component = command(
    name="merge_step",
    display_name="Merge the result of the small problems",
    description="Merge the intermediate result as the final result",
    inputs={
        "model_input": Input(type="uri_file"),
        "distance": Input(type="uri_file"),
        "model_result_partial": Input(type="uri_folder"),
        "model_result_list": Input(type="uri_folder"),
    },
    outputs=dict(
        model_result_final=Output(type="uri_folder", mode="rw_mount"),
    ),
    # The source folder of the component
    code=src_dir,
    command="""python merge.py \
            --model_input ${{inputs.model_input}} --distance ${{inputs.distance}} \
            --model_result_partial ${{inputs.model_result_partial}} --model_result_list ${{inputs.model_result_list}} \
            --model_result_final ${{outputs.model_result_final}} \
            """,
    environment=f"{env.name}:{env.version}",
)

## 1.2.5 Run the Pipeline

Finally, we chain all the steps into a single Azure ML pipeline and submit it to run.

In [14]:
# the dsl decorator tells the sdk that we are defining an Azure Machine Learning pipeline
from azure.ai.ml import dsl, Input, Output


@dsl.pipeline(
    compute=cpu_compute_target, 
    description="E2E route optimization pipeline",
)
def route_optimization_pipeline(
    pipeline_job_model_input,
    pipeline_job_distance,
):
    # initialize the reduce step
    reduce_job = reduce_component(
        model_input=pipeline_job_model_input,
        distance=pipeline_job_distance
    )

    # initialize the partition step
    partition_job = partition_component(
        model_input_reduced=reduce_job.outputs.model_input_reduced,  # note: using outputs from previous step
        distance=pipeline_job_distance,  
    )

    # initialize the solve step
    solve_job = solve_component(
        model_input_list=partition_job.outputs.model_input_list,  # note: using outputs from previous step
        distance=pipeline_job_distance,  
    )

    # initialize the merge step
    merge_job = merge_component(
        model_input=pipeline_job_model_input,  # note: the original pipeline input, not a step output
        distance=pipeline_job_distance,  
        model_result_partial=reduce_job.outputs.model_result_partial, # note: using outputs from previous step
        model_result_list=solve_job.outputs.model_result_list, # note: using outputs from previous step
    )

    # a pipeline returns a dictionary of outputs;
    # the keys become the pipeline output identifiers,
    # here pipeline_job_model_result_final
    return {
        "pipeline_job_model_result_final": merge_job.outputs.model_result_final,
    }

In [15]:
# Let's instantiate the pipeline with the parameters of our choice
pipeline = route_optimization_pipeline(
    pipeline_job_model_input=Input(type="uri_file", path=order_data_asset.id, mode=InputOutputModes.RO_MOUNT),
    pipeline_job_distance=Input(type="uri_file", path=distances_data_asset.id, mode=InputOutputModes.RO_MOUNT),
)

In [16]:
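# Force the pipeline to rerun all steps instead of reusing cached results.
# This must be set before the job is submitted to take effect.
pipeline.settings.force_rerun = True

# submit the pipeline job
pipeline_job = ml_client.jobs.create_or_update(
    pipeline,
    # Project's name
    experiment_name="route_optimization_demo",
)

# stream the job logs until the pipeline finishes
ml_client.jobs.stream(pipeline_job.name)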


RunId: olive_pasta_sym445v2ny
Web View: https://ml.azure.com/runs/olive_pasta_sym445v2ny?wsid=/subscriptions/e4eda206-7aff-4e54-8f55-1e60f0e64093/resourcegroups/aml-demo-rg/workspaces/amldemo

Streaming logs/azureml/executionlogs.txt

[2025-01-31 04:23:43Z] Submitting 1 runs, first five are: 85c5104d:bb669507-09bc-4981-817d-d91d14f0c93b
[2025-01-31 04:28:24Z] Completing processing run id bb669507-09bc-4981-817d-d91d14f0c93b.
[2025-01-31 04:28:25Z] Submitting 1 runs, first five are: cea3f411:7aed87c9-d9ca-41e0-a0b2-9e3ad20cf663
[2025-01-31 04:29:07Z] Completing processing run id 7aed87c9-d9ca-41e0-a0b2-9e3ad20cf663.
[2025-01-31 04:29:08Z] Submitting 1 runs, first five are: e7030575:5da5e3bf-d676-42e0-9a8e-ff567ab67f85
[2025-01-31 04:47:22Z] Execution of experiment failed, update experiment status and cancel running nodes.

Execution Summary
RunId: olive_pasta_sym445v2ny
Web View: https://ml.azure.com/runs/olive_pasta_sym445v2ny?wsid=/subscriptions/e4eda206-7aff-4e54-8f55-1e60f0e64093/re

JobException: Exception : 
 {
    "error": {
        "code": "UserError",
        "message": "Pipeline has failed child jobs. Failed nodes: /parallel_solver. For more details and logs, please go to the job detail page and check the child jobs.",
        "message_format": "Pipeline has failed child jobs. {0}",
        "message_parameters": {},
        "reference_code": "PipelineHasStepJobFailed",
        "details": []
    },
    "environment": "southeastasia",
    "location": "southeastasia",
    "time": "2025-01-31T04:47:22.568872Z",
    "component_name": ""
} 

# 1.3 Check the Model Result

In [108]:
sample_schedule_path = '../tmp'

# output = ml_client.jobs.download(name=pipeline_job.name, download_path=sample_schedule_path, all=True)
# Download specific output
output = ml_client.jobs.download(name=pipeline_job.name, download_path=sample_schedule_path, output_name='pipeline_job_model_result_final')

Downloading artifact azureml://subscriptions/e4eda206-7aff-4e54-8f55-1e60f0e64093/resourcegroups/aml-demo-rg/workspaces/amldemo/datastores/workspaceblobstore/paths/azureml/1cb38885-7704-492b-ba20-3113e85b2d3b/model_result_final/ to ..\tmp\named-outputs\pipeline_job_model_result_final


In [109]:
import pandas as pd

model_output = pd.read_csv('../tmp/named-outputs/pipeline_job_model_result_final/schedule.csv')

In [110]:
model_output.head()

Unnamed: 0,Schedule_ID,Truck_Route,Order_ID,Material_ID,Item_ID,Danger_Type,Source,Destination,Start_Time,Arrival_Time,Deadline,Shared_Truck,Truck_Type,Area_Rate,Weight_Rate,Capacity_Rate
0,0157f3db-2a5b-4738-865a-bdec22d1b043,City_61->City_19,A230153,B-6298,P01-70d4c91f-65e7-434c-89dc-5653fa3d4dff,type_1,City_61,City_19,2022-04-05 23:59:59,2022-04-07 01:21:40,2022-04-07 11:59:59,N,16.5,0.880099,0.954333,0.954333
1,0157f3db-2a5b-4738-865a-bdec22d1b043,City_61->City_19,A230153,B-6298,P01-291a1abe-8b52-4fa9-9667-f6aaba4238bb,type_1,City_61,City_19,2022-04-05 23:59:59,2022-04-07 01:21:40,2022-04-07 11:59:59,N,16.5,0.880099,0.954333,0.954333
2,0157f3db-2a5b-4738-865a-bdec22d1b043,City_61->City_19,A230153,B-6298,P01-1dc8de6a-7ad4-4a3b-a5aa-4b536349f3d7,type_1,City_61,City_19,2022-04-05 23:59:59,2022-04-07 01:21:40,2022-04-07 11:59:59,N,16.5,0.880099,0.954333,0.954333
3,0157f3db-2a5b-4738-865a-bdec22d1b043,City_61->City_19,A230153,B-6298,P01-e1c4511f-f1fc-4f89-b3bd-dea13841510c,type_1,City_61,City_19,2022-04-05 23:59:59,2022-04-07 01:21:40,2022-04-07 11:59:59,N,16.5,0.880099,0.954333,0.954333
4,0157f3db-2a5b-4738-865a-bdec22d1b043,City_61->City_19,A230153,B-6298,P01-86e9b1e3-8f73-49f0-9573-a52a92bbd0f6,type_1,City_61,City_19,2022-04-05 23:59:59,2022-04-07 01:21:40,2022-04-07 11:59:59,N,16.5,0.880099,0.954333,0.954333


# Process the distance file

In [3]:
# # Read the order large file as a pandas dataframe
# import pandas as pd

# order_path = '../sample_data/order_large.csv'
# order_large = pd.read_csv(order_path)

# # trim the fractional seconds (everything after the decimal point) from Deadline
# order_large['Deadline'] = order_large['Deadline'].apply(lambda x: x.split('.')[0])

# # overwrite order large file
# order_large.to_csv('../sample_data/order_large.csv', index=False)

# # print(len(order_large))
# # # read the distance file as a pandas dataframe

# # distance = pd.read_csv(distances_path)

# # # join order_large and distance to get the distance between Source and Destination

# # order_large = order_large.merge(distance, left_on=['Source', 'Destination'], right_on=['Source', 'Destination'])

# # print(len(order_large))

# # # calculate the time difference between available time and deadline; the times must be converted to datetime objects first
# # order_large['Available_Time'] = pd.to_datetime(order_large['Available_Time'])
# # order_large['Deadline'] = pd.to_datetime(order_large['Deadline'])

# # order_large['Time_Diff'] = (order_large['Deadline'] - order_large['Available_Time']).dt.total_seconds()

# # # assume the truck speed is 40/3.6 m/s
# # truck_speed = 40/3.6

# # # calculate the delivery time 
# # order_large['Delivery_Time'] = order_large['Distance(M)']/truck_speed

# # # find the time difference between delivery time and deadline
# # order_large['Time_Diff_Delivery'] = order_large['Time_Diff'] - order_large['Delivery_Time']

# # # find all orders with Time_Diff_Delivery < 0
# # print(len(order_large[order_large['Time_Diff_Delivery'] < 0]))

# # # # calculate how many days we need to deliver the order
# # # order_large['Delivery_Days'] = order_large['Delivery_Time']/(24*3600)+2



# # # # if Time_Diff_Delivery < 0, set the deadline to be Available_Time + Delivery_Days
# # # order_large['Deadline'] = order_large.apply(lambda x: x['Available_Time'] + pd.Timedelta(days=x['Delivery_Days']) if x['Time_Diff_Delivery'] < 0 else x['Deadline'], axis=1)

# # # order_large['Time_Diff'] = (order_large['Deadline'] - order_large['Available_Time']).dt.total_seconds()
# # # order_large['Time_Diff_Delivery'] = order_large['Time_Diff'] - order_large['Delivery_Time']

# # # print(len(order_large))

# # # # find all orders with Time_Diff_Delivery < 0
# # # print(len(order_large[order_large['Time_Diff_Delivery'] < 0]))

# # # # Update the order large file with the new deadline, remove all the new added columns
# # # order_large = order_large.drop(columns=['Distance(M)','Time_Diff', 'Delivery_Time', 'Time_Diff_Delivery', 'Delivery_Days'])

# # # order_large.to_csv('../sample_data/order_large.csv', index=False)

In [111]:
# import json

# with open('../sample_data/region.json', 'r') as file:
#     data = file.read()
#     region = json.loads(data)

In [None]:
# cities = []

# for province in region['districts']:
#     for city in province['districts']:
#         if city['level'] == 'city':
#             print(city)
#             cities.append({
#                 'name': city['name'],
#                 'longitude': city['center']['longitude'],
#                 'latitude': city['center']['latitude']
#             })

In [None]:
# len(cities)

In [None]:
# import random

# random_numbers = random.sample(range(1, 369), 61)
# print(random_numbers)

In [None]:
# len(random_numbers)

In [116]:
# cities_mapping = {}
# for i in range(0, len(random_numbers)):
#     cities_mapping[f'City_{i}'] = cities[random_numbers[i]]

# cities_mapping['City_61'] = cities[0]

In [117]:
# cities_mapping

# # convert cities_mapping to a dataframe, keeping only name, longitude and latitude as the column headers
# df = pd.DataFrame(cities_mapping).T.reset_index()

# # remove the index column
# df = df.drop(columns=['index'])

# # save the dataframe to csv

# df.to_csv('../sample_data/cities.csv', index=False)


In [None]:
# df.head()

In [119]:
# # Replace the value in source and destination with the city name
# distances_path = '../sample_data/distance.csv'
# distances = pd.read_csv(distances_path)

# # replace the distance value with the distance between the two cities using the latitude and longitude
# from geopy.distance import geodesic

# distances['Distance(M)'] = distances.apply(lambda x: int(geodesic((cities_mapping[x['Source']]['latitude'], cities_mapping[x['Source']]['longitude']), (cities_mapping[x['Destination']]['latitude'], cities_mapping[x['Destination']]['longitude'])).kilometers*1000), axis=1)

# # distance is a panda dataframe object, replace the value in Source and Destination with the city name in dictionary cities_mapping
# # for example, if the value in source is 1, replace it with the city name in cities_mapping['City_1']['name']

# distances['Source'] = distances['Source'].apply(lambda x: cities_mapping[f'{x}']['name'])
# distances['Destination'] = distances['Destination'].apply(lambda x: cities_mapping[f'{x}']['name'])


In [None]:
# distances.head(20)

In [121]:
# # overwrite the distance file with the new distance value

# distances.to_csv('../sample_data/distance.csv', index=False)


In [122]:
# # read the order small and order large file
# order_small = pd.read_csv('../sample_data/order_small.csv')
# order_large = pd.read_csv('../sample_data/order_large.csv')

# # replace the value in source and destination with the city name
# order_small['Source'] = order_small['Source'].apply(lambda x: cities_mapping[f'{x}']['name'])
# order_small['Destination'] = order_small['Destination'].apply(lambda x: cities_mapping[f'{x}']['name'])

# order_large['Source'] = order_large['Source'].apply(lambda x: cities_mapping[f'{x}']['name'])
# order_large['Destination'] = order_large['Destination'].apply(lambda x: cities_mapping[f'{x}']['name'])

# # overwrite the order small and order large file with the new city name
# order_small.to_csv('../sample_data/order_small.csv', index=False)
# order_large.to_csv('../sample_data/order_large.csv', index=False)