# In Focus

In this chapter, selected aspects of the
[Data Generation Process](../background.rst#data-generation-process)
are explained in more detail and supported by visuals.
For this purpose, some internal functions and methods are imported that are not part of the official interface.

In [None]:
import datetime
import pandas as pd
import matplotlib.pyplot as plt
import conflowgen

Load some internal classes and functions that are not part of the regular API.

In [None]:
from conflowgen.domain_models.container import Container
from conflowgen.tools.theoretical_distribution import multiply_discretized_probability_densities

Initialize ConFlowGen.

In [None]:
logger = conflowgen.setup_logger(
    logging_directory="./data/logger",  # use subdirectory relative to Jupyter Notebook
    format_string="%(message)s"  # only show log messages, discard timestamp etc.
)
database_chooser = conflowgen.DatabaseChooser()
database_chooser.create_new_sqlite_database(":memory:")

## Considering both truck arrival and container dwell time distribution

It is a challenge to synthetically generate container flows that take both distributions into account.
If, for instance, a container is delivered to the container terminal by a vessel and a truck is to be generated to pick up the container, two naive approaches exist.
First, a truck arrival time might be drawn from the truck arrival distribution.
This, e.g., ensures that no truck arrivals happen on a Sunday.
However, only considering the truck arrival distribution means that the container dwell time distribution is ignored.
Second, the truck arrival time might be drawn from the container dwell time distribution.
This ensures that the container dwell times are realistic.
At the same time, the truck arrival patterns are ignored.
This shows that both distributions must be taken into account at the same time.
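
The shortcoming of each naive approach can be made concrete with a toy example. The sketch below uses invented probabilities for six time slots (not ConFlowGen's actual distributions) and measures how much probability mass each approach places on time slots that the other distribution rules out.

```python
# Toy data, invented for illustration: six consecutive time slots.
# The terminal gate is closed in slots 2 and 3 (no truck arrivals),
# and the container must have left before slot 5 (maximum dwell time).
truck_arrival = [0.3, 0.2, 0.0, 0.0, 0.2, 0.3]
dwell_time = [0.1, 0.3, 0.3, 0.2, 0.1, 0.0]


def mass_on_forbidden_slots(sampled_from, other):
    """Probability that a draw from `sampled_from` hits a slot that `other` rules out."""
    return sum(p for p, q in zip(sampled_from, other) if q == 0.0)


# Approach 1: draw from the truck arrival distribution only.
dwell_time_violations = mass_on_forbidden_slots(truck_arrival, dwell_time)
# Approach 2: draw from the container dwell time distribution only.
closed_gate_violations = mass_on_forbidden_slots(dwell_time, truck_arrival)

print(f"Truck arrival only: {dwell_time_violations:.0%} of draws exceed the maximum dwell time")
print(f"Dwell time only: {closed_gate_violations:.0%} of draws fall into closed time slots")
```

In this toy setting, either approach on its own regularly produces a pickup time that contradicts the other distribution.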

Prepare the container that arrives at the terminal with a deep sea vessel and departs with a truck.

In [None]:
container = Container.create(
    weight=20,
    delivered_by=conflowgen.ModeOfTransport.deep_sea_vessel,
    picked_up_by=conflowgen.ModeOfTransport.truck,
    picked_up_by_initial=conflowgen.ModeOfTransport.truck,
    length=conflowgen.ContainerLength.twenty_feet,
    storage_requirement=conflowgen.StorageRequirement.standard
)
container_arrival_time = datetime.datetime.now().replace(second=0, microsecond=0)

print(f"The container arrives at the terminal at {container_arrival_time.isoformat()}")

In [None]:
earliest_truck_time_slot = container_arrival_time.replace(minute=0) + datetime.timedelta(hours=1)
print(f"The earliest available truck time slot is {earliest_truck_time_slot.isoformat()}")

Load the two distributions that fit the container characteristics.

In [None]:
manager = conflowgen.flow_generator.truck_for_import_containers_manager.TruckForImportContainersManager()
manager.reload_distributions()
container_dwell_time_distribution, truck_arrival_distribution = manager._get_distributions(container)

print(container_dwell_time_distribution)
print(truck_arrival_distribution)

Now the truck arrival distribution is converted to a distribution that reflects the probability that the container is picked up at a given time.
While the truck arrival distribution only covers a work week, the derived distribution must cover the whole time range from the time the container has arrived at the terminal until the point that is determined as the maximum dwell time.
This time range is often longer than a week.
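
The idea of stretching a weekly pattern over a longer horizon can be sketched as follows. The helper `tile_weekly_distribution` is hypothetical and not part of ConFlowGen. It assumes that the weekly distribution is given as a mapping from the hour of the week (0 to 167, starting Monday 00:00) to a fraction. How `get_distribution_slice` works internally is not covered by this sketch.

```python
import datetime

HOURS_PER_WEEK = 7 * 24


def tile_weekly_distribution(weekly, start, horizon_in_hours):
    """Repeat a weekly hourly pattern, beginning at `start`, and normalize it.

    `weekly` maps the hour of the week (0 = Monday 00:00) to a fraction.
    Returns a dict that maps hours after `start` to probabilities.
    """
    start_hour_of_week = start.weekday() * 24 + start.hour
    raw = {
        offset: weekly[(start_hour_of_week + offset) % HOURS_PER_WEEK]
        for offset in range(horizon_in_hours)
    }
    total = sum(raw.values())
    return {offset: value / total for offset, value in raw.items()}


# Hypothetical weekly pattern: trucks arrive Monday to Friday, 08:00 to 17:59.
weekly_pattern = {
    hour: (1.0 if hour // 24 < 5 and 8 <= hour % 24 < 18 else 0.0)
    for hour in range(HOURS_PER_WEEK)
}

start = datetime.datetime(2024, 1, 5, 16, 0)  # a Friday afternoon
two_weeks = tile_weekly_distribution(weekly_pattern, start, horizon_in_hours=14 * 24)
```

The modulo operation wraps the hour of the week around, so the weekend gap and the following work week appear automatically in the longer time range.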

In [None]:
truck_arrival_distribution_slice = truck_arrival_distribution.get_distribution_slice(earliest_truck_time_slot)

truck_arrival_distribution_slice_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in truck_arrival_distribution_slice.items()
}

df_truck_arrival_distribution = \
    pd.Series(truck_arrival_distribution_slice_as_dates).to_frame("Truck Arrival Distribution")

df_truck_arrival_distribution.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()

With the truck arrival distribution loaded, it is time to turn to the container dwell time distribution.
It assigns a probability of the container being picked up to any suggested time slot.
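
How a dwell time distribution can assign probabilities to hourly time slots is sketched below with a log-normal density that is clipped to a minimum and a maximum dwell time. The shape, the parameter values, and the function name `dwell_time_probabilities` are assumptions of this sketch and not taken from ConFlowGen's implementation.

```python
import math


def dwell_time_probabilities(hours, average, variance, minimum, maximum):
    """Evaluate a log-normal density at the given hours, clip it, and normalize."""
    # Convert mean and variance of the log-normal distribution to the
    # parameters of the underlying normal distribution.
    sigma_squared = math.log(1 + variance / average ** 2)
    mu = math.log(average) - sigma_squared / 2

    def density(x):
        # Outside of [minimum, maximum], a pickup is considered impossible.
        if x <= 0 or not minimum <= x <= maximum:
            return 0.0
        exponent = -((math.log(x) - mu) ** 2) / (2 * sigma_squared)
        return math.exp(exponent) / (x * math.sqrt(2 * math.pi * sigma_squared))

    raw = [density(hour) for hour in hours]
    total = sum(raw)
    return [value / total for value in raw]


# Invented parameters (in hours), chosen only for illustration
probabilities = dwell_time_probabilities(
    hours=range(0, 240), average=72, variance=1000, minimum=3, maximum=216
)
```

Time slots before the minimum or after the maximum dwell time receive a probability of zero, and the remaining values are rescaled so that they sum to one.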

In [None]:
time_windows_for_truck_arrival = list(truck_arrival_distribution_slice.keys())
container_dwell_time_probabilities = container_dwell_time_distribution.get_probabilities(
    time_windows_for_truck_arrival
)

container_dwell_time_probabilities_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in enumerate(container_dwell_time_probabilities)
}

df_container_dwell_time_distribution = \
    pd.Series(container_dwell_time_probabilities_as_dates).to_frame("Container Dwell Time Distribution")

df_container_dwell_time_distribution.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()

In the last step, the two distributions are merged by multiplication.
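
Conceptually, such a merge can be pictured as a slot-by-slot product that is rescaled so that it sums to one again. The function below is a stand-in written for this explanation. That `multiply_discretized_probability_densities` behaves the same way in every detail is an assumption.

```python
def multiply_and_normalize(*discretized_densities):
    """Multiply equally long lists of probabilities slot by slot and rescale to sum 1."""
    lengths = {len(density) for density in discretized_densities}
    if len(lengths) != 1:
        raise ValueError("All discretized densities must cover the same time slots")
    products = []
    for values in zip(*discretized_densities):
        product = 1.0
        for value in values:
            product *= value
        products.append(product)
    total = sum(products)
    if total == 0:
        raise ValueError("The densities do not overlap, no time slot is feasible")
    return [product / total for product in products]


# Toy input, invented for illustration
toy_truck_arrival = [0.5, 0.0, 0.5, 0.0]
toy_dwell_time = [0.1, 0.4, 0.3, 0.2]
toy_merged = multiply_and_normalize(toy_truck_arrival, toy_dwell_time)
print(toy_merged)
```

A time slot keeps a positive probability only if both input distributions allow it, which is why the product respects closed time slots and the maximum dwell time at once.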

In [None]:
merged_distribution = multiply_discretized_probability_densities(
    list(truck_arrival_distribution_slice.values()),
    container_dwell_time_probabilities
)

merged_distribution_as_dates = {
    earliest_truck_time_slot + datetime.timedelta(hours=hours_from_now): fraction * 100
    for hours_from_now, fraction in enumerate(merged_distribution)
}

df_merged_distributions = \
    pd.Series(merged_distribution_as_dates).to_frame("Multiplication of Both Distributions")

df_merged_distributions.plot(legend=False)
plt.ylabel("Probability (as percentage overall)")
plt.show()

Let's re-check how the multiplication of the two distributions affected the merged distribution.

In [None]:
df_merged = pd.concat([
    df_truck_arrival_distribution,
    df_container_dwell_time_distribution,
    df_merged_distributions
], axis=1)

ax = df_merged[["Container Dwell Time Distribution", "Truck Arrival Distribution"]].plot(
    color={
        "Truck Arrival Distribution": "navy",
        "Container Dwell Time Distribution": "dimgray",
    },
    alpha=0.5,
    style="--"
)
df_merged[["Multiplication of Both Distributions"]].plot(ax=ax, alpha=1, color="k")
plt.show()

Multiplying the two distributions yields a new distribution that reflects both the container dwell time distribution and the truck arrival distribution.
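
A truck arrival time could then be drawn from such a merged distribution. The sketch below uses `random.choices` from the Python standard library with invented probabilities and an invented start time. How ConFlowGen performs the draw internally is not covered in this chapter.

```python
import datetime
import random

# Invented example: probabilities for four consecutive hourly time slots
example_earliest_slot = datetime.datetime(2024, 1, 8, 9, 0)
example_merged = [0.25, 0.0, 0.75, 0.0]

rng = random.Random(42)  # fixed seed so that the sketch is reproducible
hours_after_earliest_slot = rng.choices(
    range(len(example_merged)), weights=example_merged, k=1000
)

# Translate the drawn slot indices back to points in time
pickup_times = [
    example_earliest_slot + datetime.timedelta(hours=hours)
    for hours in hours_after_earliest_slot
]
share_in_third_slot = hours_after_earliest_slot.count(2) / len(hours_after_earliest_slot)
print(f"Share of trucks in the third time slot: {share_in_third_slot:.1%}")
```

Time slots with a probability of zero are never drawn, so every sampled pickup time is compatible with both input distributions.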

## Further topics

If you have a topic in mind that should be presented step by step like the previous one, please open an issue at https://github.com/1kastner/conflowgen/issues or send an email directly to marvin.kastner@tuhh.de.