# SurveyScout usage example

This notebook walks through the different flows implemented in SurveyScout.

In [None]:
import numpy as np
import pandas as pd

## Create dataset

We will generate random enumerator and target locations within the city of Chennai, India.

In [None]:
# Set a random seed for reproducibility
np.random.seed(35)
# Define the boundaries of Chennai (approximate)
min_lat = 12.9190
max_lat = 13.2400
min_lon = 80.08
max_lon = 80.2460

N_ENUMS = 6
N_TARGETS = 30

# Generate N_ENUMS enumerators with random latitudes and longitudes
enumerator_data = {
    "enum_id": [f"E{i+1:03d}" for i in range(N_ENUMS)],  # Creates IDs like E001, E002, ...
    "enum_lat": np.random.uniform(min_lat, max_lat, N_ENUMS), 
    "enum_long": np.random.uniform(min_lon, max_lon, N_ENUMS), 
}

# Generate N_TARGETS targets with random latitudes and longitudes
target_data = {
    "target_id": [
        f"T{i+1:04d}" for i in range(N_TARGETS)
    ],  # Creates IDs like T0001, T0002, ...
    "target_lat": np.random.uniform(min_lat, max_lat, N_TARGETS),  
    "target_long": np.random.uniform(min_lon, max_lon, N_TARGETS), 
}

# Create the DataFrames
df_enum = pd.DataFrame(enumerator_data)
df_target = pd.DataFrame(target_data)

# Display the shape of the created DataFrames to confirm the number of rows
print(f"Enumerators DataFrame shape: {df_enum.shape}")
print(f"Targets DataFrame shape: {df_target.shape}")

Enumerators DataFrame shape: (6, 3)
Targets DataFrame shape: (30, 3)


Create `LocationDataset`s using our toy data:

In [None]:
from surveyscout.utils import LocationDataset

enum_locations = LocationDataset(df_enum, "enum_id", "enum_lat", "enum_long")
target_locations = LocationDataset(
    df_target, "target_id", "target_lat", "target_long"
)

Let's map the data:

In [None]:
from surveyscout.visualize import plot_enum_targets

plot_enum_targets(enum_locations, target_locations)

## Find optimal assignments

### Case 1. Find optimal assignments using Haversine distance
This is the most lightweight way to generate assignments. It uses the Haversine distance: the shortest distance between two GPS points as the crow flies. For a more accurate travel distance, see the OSRM and Google cases below.
We will run the optimization on the toy dataset of 6 enumerators and 30 targets created above.
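
For reference, the Haversine distance between two GPS points can be sketched as follows. This is a minimal standalone illustration, not SurveyScout's internal implementation; the function name `haversine_km` and the Earth radius of 6371 km are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Distance between two opposite corners of the Chennai bounding box defined earlier
corner_distance = haversine_km(12.9190, 80.08, 13.2400, 80.2460)
print(f"{corner_distance:.1f} km")
```

Because it ignores roads, this distance is a lower bound on the actual travel distance between the two points.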

#### Basic min distance flow with haversine

This flow implements the basic min distance model: we specify the parameters and the model finds the optimal assignment. The parameters are:
- `min_target`: The minimum number of targets each enumerator is required to visit.
- `max_target`: The maximum number of targets each enumerator is allowed to visit.
- `max_cost`: The maximum cost assignable to an enumerator to visit a single target.
- `max_total_cost`: The maximum total cost assignable to an enumerator.
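
To make the role of `min_target` and `max_target` concrete, the sketch below checks an assignment table against those bounds. It assumes the table has `target_id` and `enum_id` columns, as in the results printed in this notebook; the helper `check_target_counts` and the toy data are hypothetical and not part of SurveyScout.

```python
import pandas as pd


def check_target_counts(assignments, min_target, max_target):
    """Return the enumerators whose number of assigned targets falls outside the bounds."""
    counts = assignments.groupby("enum_id")["target_id"].nunique()
    return counts[(counts < min_target) | (counts > max_target)]


# Toy assignment table: E001 gets three targets, E002 gets one
toy = pd.DataFrame(
    {
        "target_id": ["T0001", "T0002", "T0003", "T0004"],
        "enum_id": ["E001", "E001", "E001", "E002"],
    }
)

violations = check_target_counts(toy, min_target=2, max_target=3)
print(violations)  # only E002 is listed, since it has fewer than 2 targets
```

Note that an enumerator with no assigned targets does not appear in the table at all, so this check would not flag it.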

In [None]:
from surveyscout.flows import basic_min_distance_flow

results = basic_min_distance_flow(
    enum_locations=enum_locations,
    target_locations=target_locations,
    min_target=3,
    max_target=7,
    max_cost=10000,
    max_total_cost=100000,
)

Optimal value:  202.23429790087613


In [None]:
results.head()

Unnamed: 0,target_id,enum_id,cost
0,T0001,E001,1.0
1,T0002,E004,1.0
2,T0003,E002,1.0
3,T0004,E005,1.0
4,T0005,E003,1.0


In [None]:
from surveyscout.visualize import plot_assignments

plot_assignments(enum_locations, target_locations, results)

#### Recursive min distance flow with haversine
This flow recursively relaxes the parameters until a solution is found.
The parameters are as follows:
- `min_target`: The minimum number of targets each enumerator is required to visit.
- `max_target`: The maximum number of targets each enumerator is allowed to visit.
- `max_cost`: The maximum cost assignable to an enumerator to visit a single target.
- `max_total_cost`: The initial maximum total cost assignable to an enumerator.
- `max_perc`: The initial percentile used to determine the maximum enumerator-to-target cost (default is 80).
- `param_increment`: The value by which the parameter bounds and percentiles are adjusted during the recursion if no solution is found (default is 5).

In [None]:
from surveyscout.flows import recursive_min_distance_flow

results_df, params = recursive_min_distance_flow(
    enum_locations=enum_locations,
    target_locations=target_locations,
    min_target=6,
    max_target=6,
    max_cost=10000,
    max_total_cost=100000,
    param_increment=10,
)

Optimal value:  245.69207658596733


Notice that the constraints have been relaxed:

In [None]:
print(params)

{'min_target': 5, 'max_target': 7, 'max_cost': 11000.0, 'max_total_cost': 110000.00000000001}
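
The printed values are consistent with a relaxation step that widens the target bounds by one and scales the cost bounds by `param_increment` percent. The sketch below illustrates that reading only: `relax_params` is a hypothetical helper inferred from this single output, and the library's actual update rule (including how `max_perc` is adjusted) may differ.

```python
def relax_params(params, param_increment):
    """Loosen each constraint once: widen target bounds by 1, scale cost bounds by a percentage."""
    factor = 1 + param_increment / 100
    return {
        "min_target": max(params["min_target"] - 1, 0),  # never go below zero
        "max_target": params["max_target"] + 1,
        "max_cost": params["max_cost"] * factor,
        "max_total_cost": params["max_total_cost"] * factor,
    }


# Same starting values as the call above (assumed to be relaxed exactly once)
initial = {"min_target": 6, "max_target": 6, "max_cost": 10000, "max_total_cost": 100000}
relaxed = relax_params(initial, param_increment=10)
print(relaxed)
```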


In [None]:
results_df.head()

Unnamed: 0,target_id,enum_id,cost
0,T0001,E006,1.0
1,T0002,E003,1.0
2,T0003,E005,1.0
3,T0004,E001,1.0
4,T0005,E004,1.0


In [None]:
plot_assignments(enum_locations, target_locations, results_df)

### Case 2. Find optimal assignments using OSRM

To use OSRM we will specify one more parameter:
- `cost_function`: The API used to get the route distance between points. Can be one of
  'haversine', 'osrm', 'google_distance' or 'google_duration'.

You will also need access to an OSRM server serving a map of Chennai. You can follow [OSRM's quick start
guide](https://github.com/Project-OSRM/osrm-backend?tab=readme-ov-file#quick-start) to
download an OpenStreetMap extract and run an OSRM docker container.


First, download India's southern region map (source: [Geofabrik - Asia - India](https://download.geofabrik.de/asia/india.html)):

```shell
wget https://download.geofabrik.de/asia/india/southern-zone-latest.osm.pbf
```

Preprocess the map data with the following commands (this will take some time):

```shell
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend:latest osrm-extract -p /opt/car.lua /data/southern-zone-latest.osm.pbf || echo "osrm-extract failed"
```

```shell
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-partition /data/southern-zone-latest.osrm || echo "osrm-partition failed"
docker run -t -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-customize /data/southern-zone-latest.osrm || echo "osrm-customize failed"
```

Run the OSRM docker container:

```shell
docker run -t -i -p 5001:5000 -v "${PWD}:/data" ghcr.io/project-osrm/osrm-backend osrm-routed --algorithm mld /data/southern-zone-latest.osrm
```

By default, surveyscout expects the OSRM endpoint at `http://localhost:5001` (see
`surveyscout.config.py` for the default value). If your OSRM server is at a different
endpoint, make sure to export it as an environment variable:

```shell
export OSRM_URL=<your OSRM endpoint>
```
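
As a quick sanity check before running a flow, it can help to build an OSRM request by hand. The sketch below resolves the endpoint as described above (the `OSRM_URL` environment variable, falling back to `http://localhost:5001`) and builds a URL for OSRM's `table` service. The helper names are hypothetical, and how SurveyScout itself queries OSRM is not shown here.

```python
import os


def get_osrm_url(environ=os.environ):
    """Return the OSRM endpoint from OSRM_URL, or the default used in this notebook if unset."""
    return environ.get("OSRM_URL", "http://localhost:5001").rstrip("/")


def build_table_url(base_url, points, profile="driving"):
    """Build an OSRM table-service URL for a list of (lat, lon) points.

    OSRM expects coordinates as lon,lat pairs separated by semicolons.
    """
    coords = ";".join(f"{lon},{lat}" for lat, lon in points)
    return f"{base_url}/table/v1/{profile}/{coords}?annotations=distance"


# Two arbitrary points inside the Chennai bounding box
url = build_table_url(get_osrm_url(), [(13.05, 80.20), (13.10, 80.15)])
print(url)
```

If the server is running and its map covers these points, requesting the printed URL (for example with `curl`) should return a JSON document containing a `distances` matrix.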

Now you are ready to create assignments using OSRM distance!

In [None]:
from surveyscout.flows import basic_min_distance_flow

results = basic_min_distance_flow(
    enum_locations=enum_locations,
    target_locations=target_locations,
    min_target=5,
    max_target=100,
    max_cost=1000,
    max_total_cost=10000,
    cost_function="osrm",
)

Optimal value:  352.42999999999995


In [None]:
results.head()

Unnamed: 0,target_id,enum_id,cost
0,T0001,E008,1.0
1,T0002,E006,1.0
2,T0003,E009,1.0
3,T0004,E002,1.0
4,T0005,E009,1.0


#### Recursive min distance flow with OSRM
As with the basic flow, we select OSRM through the `cost_function` parameter:
- `cost_function`: The API used to get the route distance between points. Set it to 'osrm' here.

In [None]:
from surveyscout.flows import recursive_min_distance_flow

results_df, params = recursive_min_distance_flow(
    enum_locations=enum_locations,
    target_locations=target_locations,
    min_target=3,
    max_target=7,
    max_cost=1000,
    max_total_cost=10000,
    param_increment=2,
    cost_function="osrm",
)

Optimal value:  294.6057


In [None]:
print(params)

{'min_target': 15, 'max_target': 35, 'max_cost': 100, 'max_total_cost': 1000}


In [None]:
results_df.head()

Unnamed: 0,target_id,enum_id,cost
0,T0001,E005,1.0
1,T0002,E004,1.0
2,T0003,E002,1.0
3,T0004,E005,1.0
4,T0005,E003,1.0


### Case 3. Find optimal assignments with Google Distance Matrix API

Make sure your Google Distance Matrix API key is exported as an environment variable:

```shell
export GOOGLE_MAPS_PLATFORM_API_KEY=<your Google Maps Platform API Key>
```
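
A small check up front can make a missing key easier to diagnose. This sketch relies only on the environment variable name given above; `require_google_key` is a hypothetical helper, not a SurveyScout function.

```python
import os


def require_google_key(environ=os.environ):
    """Return the API key from the environment, or raise a clear error if it is missing."""
    key = environ.get("GOOGLE_MAPS_PLATFORM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_MAPS_PLATFORM_API_KEY is not set; export it before using "
            "cost_function='google_distance' or 'google_duration'."
        )
    return key
```

Calling `require_google_key()` in the first cell of the notebook would stop execution early with a readable message if the variable was not exported before the notebook server started.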

In [None]:
google_results, params = recursive_min_distance_flow(
    enum_locations=enum_locations,
    target_locations=target_locations,
    min_target=3,
    max_target=7,
    max_cost=1000,
    max_total_cost=10000,
    cost_function="google_distance",  # or "google_duration"
)

Optimal value:  305813.0000000001


Notice that `max_cost` and `max_total_cost` have been updated:

In [None]:
print(params)

{'min_target': 3, 'max_target': 7, 'max_cost': 98709.09500306344, 'max_total_cost': 55160.15367592254}


In [None]:
google_results.head()

Unnamed: 0,target_id,enum_id,cost
0,T0001,E004,1.0
1,T0002,E003,1.0
2,T0003,E005,1.0
3,T0004,E001,1.0
4,T0005,E004,1.0


In [None]:
plot_assignments(enum_locations, target_locations, google_results)