# Demo - MATSim Population for West London

This notebook demonstrates a complex example workflow for creating a sample population for an area in West London. It creates agent plans for people and households using a random process.

## Aim
Automatically create a larger, more realistic sample population for an area of West London that we call Londinium.
The sample population includes a range of activities, personal attributes and modes, and can be used as input for a MATSim transport simulation.

Steps:

1. [Import geographic data of Londinium;](#import-geographic-data-of-londinium)
2. [Sample facilities from OpenStreetMap data;](#facility-sampler)
3. [Build an activity generation model with home-based tours. Expand agents with different personal attributes, activities and trips;](#activity-generation-model)
4. [Perform data visualization and validation. Plot the activity plans, distances and durations of the population;](#data-visulazation-and-validation)
5. [Export intermediate CSV tables of the population.](#readwrite-data)

In [1]:
import os

import geopandas as gp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pam.activity import Activity, Leg
from pam.core import Household, Person, Population
from pam.plot.stats import plot_activity_times
from pam.read import load_travel_diary
from pam.report.benchmarks import distance_counts, duration_counts
from pam.samplers import facility
from pam.utils import minutes_to_datetime as mtdt
from pam.variables import END_OF_DAY
from pam.write import to_csv, write_matsim, write_od_matrices

%matplotlib inline


## Import geographic data of Londinium

In [2]:
# Paths to the geographic data for the West London area
network_bb_path = os.path.join("data", "network_bounding_box.geojson")
lsoas_path = os.path.join("data", "lsoas")  # lsoas: lower layer super output areas

We will start by plotting the Londinium boundary.

In [3]:
# Read the file and plot the boundary
boundary = gp.read_file(network_bb_path)

# Transform to epsg:27700
boundary = boundary.to_crs("epsg:27700")
boundary.plot()

Next we will plot the Londinium outline shown above over a map of London to see exactly where it is located.

In [4]:
# Plot the boundary area over the LSOAs
lsoas = gp.read_file(lsoas_path)
lsoas.crs = "EPSG:27700"
print(lsoas.crs)
lsoas = lsoas.set_index("LSOA_CODE")

fig, ax = plt.subplots(figsize=(10, 10))
lsoas.plot(ax=ax)
boundary.plot(ax=ax, color="red")

Finally, we will plot Londinium with LSOA boundaries included.

In [5]:
# Clip the LSOAs to the boundary using a geopandas overlay
lsoas_clipped = gp.overlay(lsoas, boundary, how="intersection")
lsoas_clipped.plot()

In [6]:
lsoas_clipped.head()

## Facility sampler

In [7]:
facilities_path = "data/londinium_facilities_sample.geojson"
facilities = gp.read_file(facilities_path)
facilities = facilities.rename({"activities": "activity"}, axis=1)
facilities.crs = "EPSG:27700"
facilities.head()

Start by plotting two of the facility types: educational and medical facilities.

In [8]:
education = facilities[facilities["activity"] == "education"]
medical = facilities[facilities["activity"] == "medical"]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 12))

boundary.plot(ax=ax1, color="steelblue")
education.plot(ax=ax1, color="orange", label="Educational facilities")
ax1.legend()

boundary.plot(ax=ax2, color="steelblue")
medical.plot(ax=ax2, color="red", label="Medical facilities")
ax2.legend()

In [9]:
lsoas_clipped.crs = "EPSG:27700"
len(lsoas_clipped)

In [10]:
lsoas_clipped = lsoas_clipped.set_index("LSOA_NAME")

In [11]:
# build the sampler
facility_sampler = facility.FacilitySampler(
    facilities=facilities, zones=lsoas_clipped, build_xml=True, fail=False, random_default=True
)
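
As a rough illustration of what a zone-based facility sampler does, the sketch below picks a known facility for a given zone and activity, and falls back to a random point inside the zone when none is available. This is not PAM's implementation: `FACILITIES`, `ZONE_BOUNDS` and `sample_location` are hypothetical names, the zones are simplified to bounding boxes, and the fallback behaviour is an assumption made for this example.

```python
import random

# Hypothetical facility table: zone -> activity -> list of (x, y) points.
FACILITIES = {
    "zone_a": {"education": [(1.0, 2.0), (1.5, 2.5)], "medical": [(3.0, 4.0)]},
    "zone_b": {"education": [(7.0, 8.0)]},
}
# Hypothetical zone bounding boxes (xmin, ymin, xmax, ymax) used for the fallback.
ZONE_BOUNDS = {"zone_a": (0.0, 0.0, 5.0, 5.0), "zone_b": (5.0, 5.0, 10.0, 10.0)}


def sample_location(zone, activity, rng):
    """Pick a known facility if one exists, else a random point in the zone's box."""
    candidates = FACILITIES.get(zone, {}).get(activity, [])
    if candidates:
        return rng.choice(candidates)
    # Assumed fallback: no facility of this type in the zone, so draw a random point.
    xmin, ymin, xmax, ymax = ZONE_BOUNDS[zone]
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


rng = random.Random(42)
known = sample_location("zone_a", "medical", rng)  # the only medical facility in zone_a
fallback = sample_location("zone_b", "medical", rng)  # zone_b has no medical facility
```

Passing the random generator in explicitly keeps the sketch reproducible, which is handy when comparing sampled populations between runs.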

## Activity generation model

In [12]:
# Sample a random area (LSOA) from the clipped zones


def random_area_sampler():
    indexes = list(lsoas_clipped.index)
    return np.random.choice(indexes)


random_area_sampler()  # test

- Each agent gets a simple home-based tour within 24 hours.
- We create different activity types (work, leisure, education, shopping, etc.) and different transport modes (car, bus, subway, etc.).
- A random duration is assigned to each activity and each trip leg.
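
The points above can be sketched with the standard library alone, before any PAM objects are involved. The function below is a simplified stand-in, not the notebook's generator: the name `sketch_home_based_tour`, the tuple layout and the shortened activity and mode lists are assumptions made for illustration, while the random ranges loosely mirror those used in the generator further down.

```python
import random

PRIMARY = ["leisure", "work", "shop", "medical", "education"]
SECONDARY = ["shop", "medical"]
MODES = ["car", "bus", "walk"]
DAY = 24 * 60  # minutes in a day


def sketch_home_based_tour(rng):
    """Return one day as a list of (kind, label, start_min, end_min) tuples."""
    leave = rng.randint(6, 7) * 60 + rng.randint(0, 99)  # leave home in the morning
    main_duration = rng.randint(3, 5) * 60  # used for work only
    sub_duration = rng.randint(1, 2) * 60  # used for everything else
    plan = [("activity", "home", 0, leave)]

    for k in range(rng.randint(1, 4)):
        arrive = leave + rng.randint(10, 89)  # travel time in minutes
        # The first two stops draw from the primary list, later ones from the shorter list.
        act = rng.choice(PRIMARY if k < 2 else SECONDARY)
        duration = main_duration if act == "work" else sub_duration
        plan.append(("leg", rng.choice(MODES), leave, arrive))
        plan.append(("activity", act, arrive, arrive + duration))
        leave = arrive + duration

    arrive_home = leave + rng.randint(10, 89)
    plan.append(("leg", rng.choice(MODES), leave, arrive_home))
    # A long chain can run past midnight in this sketch; the last home stay is then empty.
    plan.append(("activity", "home", arrive_home, max(arrive_home, DAY)))
    return plan


tour = sketch_home_based_tour(random.Random(0))
```

Each entry starts when the previous one ends, so activities and legs alternate without gaps, which is the same structure the `Activity` and `Leg` objects follow below.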

In [13]:
# Map each zone in the index (LSOA name) to its MSOA code and its local authority (LA) name
mapping_dict = dict(zip(lsoas_clipped.index, lsoas_clipped.MSOA_CODE))
mapping_dict1 = dict(zip(lsoas_clipped.index, lsoas_clipped.LA_NAME))

In [14]:
# Generate agents in the West London area


def generate_agents(no_of_agents):
    """
    Randomly create agents with simple home-based tours.
    Each tour starts at home, visits a random number of activities using random transport modes,
    and finally ends at home.

    """
    population = Population()  # Initialise an empty population

    # Create simple personal attributes
    income = ["low", "medium", "high"]
    gender = ["male", "female"]
    sort_age = [
        "0 to 4",
        "5 to 10",
        "11 to 15",
        "16 to 20",
        "21 to 25",
        "26 to 29",
        "30 to 39",
        "40 to 49",
        "50 to 59",
        "60 to 64",
        "65 to 69",
        "70 to 74",
        "75 to 79",
        "80 to 84",
        "85  and over",
    ]

    # Create mode and activities
    transport = ["car", "bus", "ferry", "rail", "subway", "bike", "walk"]
    # Removed gym and park due to osmox problem
    activity = [
        "leisure",
        "work",
        "shop",
        "medical",
        "education",
        "park",
        "pub",
        "gym",
    ]  # Primary activity
    sub_activity = [
        "shop",
        "medical",
        "pub",
        "gym",
    ]  # People usually spend less time on sub activity

    # Add activity plan for each person
    for i in range(no_of_agents):
        # Create different agents and household
        agent_id = f"agent_{i}"
        hh_id = f"hh_{i}"
        hh = Household(hh_id, freq=1)

        # Adding Activities and Legs alternately to different agents
        # Activity 1 - home
        leaves_home = (np.random.randint(6, 8) * 60) + np.random.randint(0, 100)  # minutes
        location1 = random_area_sampler()
        location1_loc = facility_sampler.sample(location1, "home")
        lsoa_name = mapping_dict.get(location1)
        lad_name = mapping_dict1.get(location1)

        agent = Person(
            agent_id,
            freq=1,
            attributes={
                "subpopulation": np.random.choice(income) + " income",
                "gender": np.random.choice(gender),
                "age": np.random.choice(sort_age),
                "household_zone": location1,
                "household_LSOA": lsoa_name,
                "household_LAD": lad_name,
            },
        )

        hh.add(agent)
        population.add(hh)

        # Activity durations in minutes (drawn once per agent)
        trip_duration_main_activity = np.random.randint(3, 6) * 60
        trip_duration_sub_activity = np.random.randint(1, 3) * 60

        agent.add(
            Activity(
                seq=1,
                act="home",
                area=location1,
                loc=location1_loc,
                start_time=mtdt(0),
                end_time=mtdt(leaves_home),
            )
        )

        # Initiated parameters
        location_prev = location1
        location_prev_loc = location1_loc
        leave_time = leaves_home

        # Add random numbers of activities
        no_of_activities = np.random.randint(1, 5)

        # Use j here so the outer agent counter i is not shadowed
        for j in range(no_of_activities):
            arrives_primary = leave_time + np.random.randint(10, 90)  # minutes

            # Activity 2.
            if j < 2:  # Start with main activity
                random_act = np.random.choice(activity)
            else:
                random_act = np.random.choice(sub_activity)

            # Only work uses the longer (main) duration
            if random_act == "work":
                leaves_primary = arrives_primary + trip_duration_main_activity
            else:
                leaves_primary = arrives_primary + trip_duration_sub_activity

            # Outbound leg
            location_next = random_area_sampler()
            location_next_loc = facility_sampler.sample(location_next, random_act)

            agent.add(
                Leg(
                    seq=j + 1,
                    mode=np.random.choice(transport),
                    start_area=location_prev,
                    start_loc=location_prev_loc,
                    end_area=location_next,
                    end_loc=location_next_loc,
                    start_time=mtdt(leave_time),
                    end_time=mtdt(arrives_primary),
                )
            )

            agent.add(
                Activity(
                    seq=j + 2,
                    act=random_act,
                    area=location_next,
                    loc=location_next_loc,
                    start_time=mtdt(arrives_primary),
                    end_time=mtdt(leaves_primary),
                )
            )

            # Update parameters
            leave_time = leaves_primary
            location_prev = location_next
            location_prev_loc = location_next_loc

        # Inbound leg
        arrives_home = leave_time + np.random.randint(10, 90)  # minutes
        agent.add(
            Leg(
                seq=no_of_activities + 1,
                mode=np.random.choice(transport),
                start_area=location_next,
                start_loc=location_next_loc,
                end_area=location1,
                end_loc=location1_loc,
                start_time=mtdt(leave_time),
                end_time=mtdt(arrives_home),
            )
        )

        # Activity
        agent.add(
            Activity(
                seq=no_of_activities + 2,
                act="home",
                area=location1,
                loc=location1_loc,
                start_time=mtdt(arrives_home),
                end_time=END_OF_DAY,
            )
        )

    return population

In [15]:
# Create 20 agents and check the population statistics
population = generate_agents(20)
print(population.stats)

In [16]:
population.random_person().print()

In [17]:
population.random_person().attributes

## Data Visulazation and validation

In [18]:
# Validate the population
population.validate()

In [19]:
# Print random person activity plan
population.random_person().print()

Plot the activities as 24-hour diary schedules for 5 randomly chosen agents.

In [20]:
for _i in range(5):
    p = population.random_person()
    p.plot()

Plot the frequency with which each of the activity types happens throughout the 24-hour period.

In [21]:
fig = plot_activity_times(population)

In [22]:
# Check the duration of trips
durations = duration_counts(population)
durations
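
To make the idea of counting trips per duration band concrete, here is a minimal standard-library sketch. It is not how `duration_counts` is implemented: the bin edges in `EDGES` and the helper names `bin_label` and `count_by_bin` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges in minutes; the last bin is open-ended.
EDGES = [0, 15, 30, 60, 90, 120]


def bin_label(minutes):
    """Return a label such as '30 to 60 min' for a non-negative trip duration."""
    idx = bisect_right(EDGES, minutes) - 1  # index of the left edge of the bin
    if idx >= len(EDGES) - 1:
        return f"{EDGES[-1]}+ min"
    return f"{EDGES[idx]} to {EDGES[idx + 1]} min"


def count_by_bin(durations):
    """Count trips per duration bin."""
    return Counter(bin_label(d) for d in durations)


counts = count_by_bin([12, 45, 47, 89, 130])
```

Bins are closed on the left in this sketch, so a 15-minute trip falls in "15 to 30 min" rather than "0 to 15 min".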

Now plot a histogram of the trip durations.

In [23]:
plt.barh(durations["duration"], durations["trips"])
plt.xlabel("Counts")
plt.ylabel("Duration for trips")
plt.title("Duration for different trips")
plt.ylim(ymax="90 to 120 min")

In [24]:
# Check the distance of trips
distances = distance_counts(population)
distances

Next we plot the distribution of trip distances.

In [25]:
plt.barh(distances["distance"], distances["trips"])
plt.xlabel("Counts")
plt.ylabel("Distance (km)")
plt.title("Distance for different trips")
plt.ylim(ymax="25 to 50 km")

## Read/write data

### Export intermediate CSV tables of population

In [26]:
to_csv(population, dir="outputs", crs="epsg:27700")

Plot the distribution of activities by type

In [27]:
df_activity = pd.read_csv(os.path.join("outputs", "activities.csv"))
totals = df_activity.activity.value_counts()
plt.barh(totals.index, totals)
plt.title("activities count")

In [28]:
write_od_matrices(population, path="outputs")
od_matrices = pd.read_csv(
    os.path.join("outputs", "total_od.csv")
)  # TODO: make write_od_matrices consistent with the other writers, i.e. return a DataFrame
od_matrices["total origins"] = od_matrices.drop("Origin", axis=1).sum(axis=1)
od_matrices
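
The origin totals computed above are simply row sums of an origin-destination table. A minimal sketch of that bookkeeping, using hypothetical zone labels and plain `(origin, destination)` pairs rather than PAM legs, might look like this:

```python
from collections import Counter

# Hypothetical legs as (origin_zone, destination_zone) pairs.
legs = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("A", "B")]

# Trips per (origin, destination) pair: the cells of the OD matrix.
od_counts = Counter(legs)

# Trips per origin: the row sums of the OD matrix.
origin_totals = Counter(origin for origin, _ in legs)
```

Every leg has exactly one origin, so the origin totals always add up to the number of legs.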

Plot the number of trips originating from each LSOA

In [29]:
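lsoas_clipped = lsoas_clipped.reset_index()
# Match origin totals to zones by name rather than by row position
origins_heat_map = lsoas_clipped.merge(
    od_matrices[["Origin", "total origins"]], left_on="LSOA_NAME", right_on="Origin", how="left"
)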

fig, ax = plt.subplots(figsize=(18, 10))
origins_heat_map.plot("total origins", legend=True, ax=ax)
ax.set_title("Total Origins")

### Reload Tabular Data

We load the CSV files that we previously wrote to disk. This replicates a simple synthesis process that we might typically use for travel diary survey data.

In [30]:
people = pd.read_csv(os.path.join("outputs", "people.csv")).set_index("pid")
hhs = pd.read_csv(os.path.join("outputs", "households.csv")).set_index("hid")
trips = pd.read_csv(os.path.join("outputs", "legs.csv")).drop(["Unnamed: 0"], axis=1)

trips = trips.rename(columns={"origin activity": "oact", "destination activity": "dact"})
trips.head()

In [31]:
population_reloaded = load_travel_diary(trips=trips, persons_attributes=people, hhs_attributes=hhs)

Plot the activities as 24-hour diary schedules for the same agent in the original and the reloaded population.

In [32]:
population["hh_0"]["agent_0"].plot()

In [33]:
population_reloaded["hh_0"]["agent_0"].plot()

In [34]:
population == population_reloaded

The two populations are not equal because the CSV files do not preserve the coordinates that we sampled earlier. The locations would have to be sampled again, and the reloaded population would still differ, because a new coordinate is drawn for each location.
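
The effect can be reproduced in miniature without PAM. In the sketch below, `PlanStep` and `same_ignoring_loc` are hypothetical names introduced for illustration only: two plans that agree on activity, zone and timing compare as unequal as soon as their coordinates differ, and as equal once the coordinates are left out of the comparison.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlanStep:
    """Hypothetical, simplified plan component (not a PAM class)."""

    act: str
    area: str
    start: int  # minutes after midnight
    end: int  # minutes after midnight
    loc: Optional[Tuple[float, float]] = None


def strip_loc(plan):
    """Return a copy of the plan with all coordinates removed."""
    return [replace(step, loc=None) for step in plan]


def same_ignoring_loc(plan_a, plan_b):
    """Compare two plans on everything except the sampled coordinates."""
    return strip_loc(plan_a) == strip_loc(plan_b)


original = [
    PlanStep("home", "zone_a", 0, 420, (1.0, 2.0)),
    PlanStep("work", "zone_b", 450, 900, (7.0, 8.0)),
]
reloaded = [
    PlanStep("home", "zone_a", 0, 420, (1.2, 2.3)),  # same zone, new coordinate
    PlanStep("work", "zone_b", 450, 900, (7.4, 8.1)),
]

strict_equal = original == reloaded  # False: coordinates differ
loose_equal = same_ignoring_loc(original, reloaded)  # True: zones and times agree
```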

### Write output to MATSim XML

In [35]:
write_matsim(population=population, plans_path=os.path.join("outputs", "population.xml"))