Fix/emme 24 compatibility #567

Draft: wants to merge 5 commits into base: olusanya (changes shown from 4 commits)
2 changes: 1 addition & 1 deletion Scripts/assignment/mock_assignment.py
@@ -46,7 +46,7 @@ def aggregate_results(self, resultdata):
pass

def calc_noise(self):
-return pandas.Series(0, zone_param.area_aggregation)
+return pandas.Series(0.0, zone_param.area_aggregation)

def prepare_network(self, car_dist_unit_cost: Optional[float]=None):
pass
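Most edits in this PR change a fill value from 0 to 0.0. A minimal sketch of why this matters, assuming pandas 2.x and made-up zone numbers: an integer fill value gives an int64 Series, a float fill value gives float64, and only the float64 Series can accumulate fractional values without a dtype change. (Since pandas 2.1, setting a value that does not fit the existing dtype, such as 0.5 into an int64 Series, is deprecated and emits a FutureWarning.)

```python
import pandas

zones = [101, 102, 103]  # hypothetical zone numbers

int_init = pandas.Series(0, zones)      # dtype inferred from the int fill value
float_init = pandas.Series(0.0, zones)  # dtype inferred from the float fill value

print(int_init.dtype)    # int64
print(float_init.dtype)  # float64

# Fractional increments are stored without any dtype change in the float64 Series
float_init.iat[0] += 0.5
float_init.iat[0] += 0.25
print(float_init.iat[0])  # 0.75
```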
2 changes: 1 addition & 1 deletion Scripts/datahandling/matrixdata.py
@@ -73,7 +73,7 @@ def __init__(self, omx_file: omx.File, zone_numbers: numpy.ndarray):
path)
log.error(msg)
raise IndexError(msg)
-if mtx_numbers != zone_numbers:
+if not numpy.array_equal(mtx_numbers, zone_numbers):
for i in mtx_numbers:
if i not in zone_numbers:
msg = "Zone number {} from file {} not found in network".format(
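Comparing two NumPy arrays with != is elementwise and returns an array, so using that result directly in an if statement raises ValueError whenever the arrays have more than one element. numpy.array_equal reduces the comparison to a single bool and returns False for arrays of different shapes. A minimal sketch, with made-up zone numbers standing in for mtx_numbers and zone_numbers:

```python
import numpy

mtx_numbers = numpy.array([101, 102, 103])   # hypothetical zone numbers from the matrix file
zone_numbers = numpy.array([101, 102, 103])  # hypothetical zone numbers from the network

elementwise = mtx_numbers != zone_numbers
print(elementwise)  # [False False False]

try:
    if elementwise:  # truth value of a multi-element array is ambiguous
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

same = numpy.array_equal(mtx_numbers, zone_numbers)  # single bool: True
different_length = numpy.array_equal(mtx_numbers, numpy.array([101, 102]))  # False, shapes differ
```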
2 changes: 1 addition & 1 deletion Scripts/datatypes/histogram.py
@@ -10,7 +10,7 @@ class TourLengthHistogram:
def __init__(self):
index = ["{}-{}".format(intervals[i], intervals[i + 1])
for i in range(len(intervals) - 1)]
-self.histogram = pandas.Series(0, index)
+self.histogram = pandas.Series(0.0, index)

def add(self, dist):
self.histogram.iat[numpy.searchsorted(self._u, dist, "right")] += 1
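The add method bins a distance with numpy.searchsorted(..., "right"). Both intervals and self._u are defined outside the lines shown, so the sketch below assumes, hypothetically, that _u holds the upper edge of each bin (intervals[1:]) and uses made-up bin edges. With side="right", a distance equal to an edge falls into the bin above it, so each bin covers [lower, upper).

```python
import numpy
import pandas

intervals = [0, 1, 3, 5, 10]  # hypothetical bin edges (km)

class TourLengthHistogramSketch:
    """Hypothetical stand-in for TourLengthHistogram; `_u` is assumed to hold upper bin edges."""

    def __init__(self):
        index = ["{}-{}".format(intervals[i], intervals[i + 1])
                 for i in range(len(intervals) - 1)]
        self.histogram = pandas.Series(0.0, index)
        self._u = numpy.array(intervals[1:])  # assumption: upper edge of each bin

    def add(self, dist):
        # side="right": a distance equal to an edge goes into the next bin.
        # Assumes dist < intervals[-1]; larger values would index past the last bin.
        self.histogram.iat[numpy.searchsorted(self._u, dist, "right")] += 1

hist = TourLengthHistogramSketch()
for d in [0.5, 3.0, 4.2, 9.9]:
    hist.add(d)
print(hist.histogram)
```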
2 changes: 1 addition & 1 deletion Scripts/demand/external.py
@@ -55,7 +55,7 @@ def calc_external(self, mode: str, internal_trips: pandas.Series) -> Demand:
Matrix of whole day trips from external to internal zones
"""
base_mtx = self.base_demand.get_external(mode)
-mtx = pandas.DataFrame(0, self.all_zone_numbers, self.growth.index)
+mtx = pandas.DataFrame(0.0, self.all_zone_numbers, self.growth.index)
municipalities = ZoneIntervals("municipalities")
# Base matrix is aggregated to municipality level,
# so we need to disaggregate it
2 changes: 1 addition & 1 deletion Scripts/helmet.py
@@ -110,7 +110,7 @@ def main(args):
log.error(
"Fatal error occured, simulation aborted.", extra=log_extra)
break
-gap = model.convergence.iloc[-1, :] # Last iteration convergence
+gap = model.convergence[-1] # Last iteration convergence
convergence_criteria_fulfilled = gap["max_gap"] < args.max_gap or gap["rel_gap"] < args.rel_gap
if i == iterations:
log_extra["status"]['state'] = 'finished'
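After the change in modelsystem.py, model.convergence is a list of per-iteration dictionaries rather than a DataFrame, so the last iteration is read with plain list indexing and the gaps with dictionary keys. A minimal sketch with made-up gap values, where max_gap_threshold and rel_gap_threshold are hypothetical stand-ins for args.max_gap and args.rel_gap, showing that the old and new lookups give the same numbers:

```python
import pandas

# Made-up per-iteration convergence records (same keys as used in helmet.py)
convergence = [
    {"max_gap": 0.40, "rel_gap": 0.020},
    {"max_gap": 0.05, "rel_gap": 0.004},
]
max_gap_threshold = 0.1    # stand-in for args.max_gap
rel_gap_threshold = 0.001  # stand-in for args.rel_gap

gap = convergence[-1]                                # new style: last dict in the list
old_gap = pandas.DataFrame(convergence).iloc[-1, :]  # old style: last DataFrame row

convergence_criteria_fulfilled = (gap["max_gap"] < max_gap_threshold
                                  or gap["rel_gap"] < rel_gap_threshold)
print(convergence_criteria_fulfilled)  # True, because 0.05 < 0.1
```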
2 changes: 1 addition & 1 deletion Scripts/models/generation.py
@@ -26,7 +26,7 @@ def __init__(self, purpose, resultdata):

def init_tours(self):
"""Initialize `tours` vector to 0."""
-self.tours = pandas.Series(0, self.purpose.zone_numbers)
+self.tours = pandas.Series(0.0, self.purpose.zone_numbers)

def add_tours(self):
"""Generate and add (peripheral) tours to zone vector."""
Comment on lines -29 to 32
Collaborator:
This may have a small impact on the demand, because I think that in the base case the self.tours vector changes to float32 in add_tours (since that is the dtype of the values added from self.zone_data). Now that self.tours is implicitly initialized as float64, this dtype will probably be broadcast to large parts of the demand model. I suggest adding dtype=numpy.float32 to see whether that changes the results. This could also be tested in ExternalModel.

Collaborator Author:
My understanding is that pandas does not change the Series dtype on addition. In the earlier version the Series was created with int64 dtype, and each addition of a real value would only add its integer part (floor()). Compared to that, the difference between float32 and float64 should be minimal. We can add dtype=numpy.float32 here if we want to reduce memory consumption, especially if larger matrices are calculated from this Series with the same dtype.

Collaborator:
We discussed this, and it turned out that we were both wrong: add_tours has always changed self.tours into float64.
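This conclusion follows from NumPy type promotion, which pandas arithmetic uses. A minimal sketch, assuming (neither is shown in this diff) that add_tours accumulates into self.tours with += and that the added zone data is float32: the result is float64 whether self.tours starts as int64 or as float64, so switching the initial value from 0 to 0.0 does not change the final dtype. Zone numbers and values are made up.

```python
import numpy
import pandas

zones = [101, 102]  # hypothetical zone numbers
zone_data = pandas.Series([0.5, 1.25], zones, dtype=numpy.float32)  # assumed float32 input

tours_int = pandas.Series(0, zones)      # old initialization (int64)
tours_float = pandas.Series(0.0, zones)  # new initialization (float64)

tours_int += zone_data    # int64 + float32 -> float64; the Series is replaced, not truncated
tours_float += zone_data  # float64 + float32 -> float64

print(tours_int.dtype, tours_float.dtype)  # float64 float64
print(tours_int.tolist())                  # [0.5, 1.25]
```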

8 changes: 4 additions & 4 deletions Scripts/modelsystem.py
@@ -84,7 +84,7 @@ def __init__(self,
self.cdm = CarDensityModel(
self.zdata_base, self.zdata_forecast, bounds, self.resultdata)
self.mode_share: List[Dict[str,Any]] = []
-self.convergence = pandas.DataFrame()
+self.convergence = []
self.trucks = self.fm.calc_freight_traffic("truck")
self.trailer_trucks = self.fm.calc_freight_traffic("trailer_truck")

@@ -340,8 +340,8 @@ def run_iteration(self, previous_iter_impedance, iteration=None):
gap = self.dtm.init_demand()
log.info("Demand model convergence in iteration {} is {:1.5f}".format(
iteration, gap["rel_gap"]))
-self.convergence = self.convergence.append(gap, ignore_index=True)
-self.resultdata._df_buffer["demand_convergence.txt"] = self.convergence
+self.convergence.append(gap)
+self.resultdata._df_buffer["demand_convergence.txt"] = pandas.DataFrame(self.convergence)
self.resultdata.flush()
return impedance
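DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, which is why the convergence history is now collected in a plain list and converted to a DataFrame only when it is written out. Building the frame once from a list of dicts also avoids copying the whole table on every iteration. A minimal sketch with made-up gap values; pandas.concat is shown as the documented replacement for the case where an existing DataFrame really has to be extended:

```python
import pandas

convergence = []  # one dict per demand-model iteration
for rel_gap in [0.02, 0.008, 0.003]:  # made-up gaps
    gap = {"rel_gap": rel_gap, "max_gap": rel_gap * 10}
    convergence.append(gap)  # list.append, not the removed DataFrame.append

table = pandas.DataFrame(convergence)  # columns are taken from the dict keys

# Row-by-row growth of an existing DataFrame in pandas >= 2.0
grown = pandas.DataFrame([convergence[0]])
for gap in convergence[1:]:
    grown = pandas.concat([grown, pandas.DataFrame([gap])], ignore_index=True)
```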

@@ -411,7 +411,7 @@ def _calculate_accessibility_and_savu_zones(self):
"result_summary")

def _sum_trips_per_zone(self, mode, include_dests=True):
-int_demand = pandas.Series(0, self.zdata_base.zone_numbers)
+int_demand = pandas.Series(0.0, self.zdata_base.zone_numbers)
for purpose in self.dm.tour_purposes:
if mode in purpose.modes and purpose.dest != "source":
bounds = (next(iter(purpose.sources)).bounds
3 changes: 2 additions & 1 deletion Scripts/requirements.txt
@@ -1 +1,2 @@
-openpyxl==2.6.4
+openpyxl==2.6.4;python_version<"3.8"
+openpyxl==3.1.4;python_version>="3.8"
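The two environment markers split installations at Python 3.8: openpyxl 2.6.4 stays pinned for older interpreters and 3.1.4 is installed from 3.8 on. Per PEP 508, pip compares python_version as a version rather than as text, which matters from Python 3.10 onwards. A minimal sketch of the same selection logic using tuple comparison; openpyxl_requirement is a hypothetical helper, not part of the project:

```python
import sys

def openpyxl_requirement(version_info=sys.version_info):
    """Hypothetical helper mirroring the two markers in requirements.txt."""
    major_minor = tuple(version_info[:2])  # e.g. (3, 10)
    if major_minor < (3, 8):
        return "openpyxl==2.6.4"  # python_version<"3.8"
    return "openpyxl==3.1.4"      # python_version>="3.8"

# Plain string comparison would get Python 3.10 wrong:
print("3.10" >= "3.8")                # False, although 3.10 is newer than 3.8
print(openpyxl_requirement((3, 10)))  # openpyxl==3.1.4
```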
16 changes: 8 additions & 8 deletions Scripts/utils/read_csv_file.py
@@ -1,9 +1,7 @@
-from decimal import DivisionByZero
-from itertools import groupby
import os
from typing import Optional
import pandas
-import numpy # type: ignore
+import numpy

import utils.log as log

@@ -49,9 +47,11 @@ def read_csv_file(data_dir: str,
raise NameError(msg)
header: Optional[str] = None if squeeze else "infer"
data: pandas.DataFrame = pandas.read_csv(
-path, delim_whitespace=True, squeeze=squeeze, keep_default_na=False,
+path, sep='\s+', keep_default_na=False,
na_values="", comment='#', header=header)
-if data.index.is_numeric() and data.index.hasnans: # type: ignore
+if squeeze:
+data = data.squeeze()
+if pandas.api.types.is_numeric_dtype(data.index) and data.index.hasnans:
msg = "Row with only spaces or tabs in file {}".format(path)
log.error(msg)
raise IndexError(msg)
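pandas 2.0 removed the squeeze argument of read_csv, and pandas 2.2 deprecated delim_whitespace=True in favour of sep=r"\s+". A minimal sketch of the replacement pattern, reading from an in-memory string; index_col=0 and the sample data are assumptions for the example, since the full read_csv call is not shown here. squeeze("columns") matches the old squeeze=True, which only turned a single-column result into a Series, whereas a bare .squeeze() also collapses a one-row, one-column frame into a scalar. The raw string r"\s+" avoids the invalid-escape warning that newer Python versions emit for '\s+'.

```python
import io
import pandas

def read_table(text, squeeze=False):
    """Sketch of whitespace-separated reading without the removed `squeeze` argument."""
    header = None if squeeze else "infer"
    data = pandas.read_csv(
        io.StringIO(text), sep=r"\s+", keep_default_na=False,
        na_values="", comment="#", header=header,
        index_col=0)  # assumption: the first column holds zone numbers
    if squeeze:
        data = data.squeeze("columns")  # only a single column becomes a Series
    return data

series = read_table("101 1.5\n102 2.5\n", squeeze=True)
one_row = read_table("101 1.5\n", squeeze=True)
print(type(series).__name__, series.tolist())  # Series [1.5, 2.5]
print(type(one_row).__name__)                  # Series (a bare .squeeze() would give 1.5)
```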
@@ -68,17 +68,17 @@ def read_csv_file(data_dir: str,
if data.index.has_duplicates:
raise IndexError("Index in file {} has duplicates".format(path))
if zone_numbers is not None:
-if not data.index.is_monotonic:
+if not data.index.is_monotonic_increasing:
data.sort_index(inplace=True)
log.warn("File {} is not sorted in ascending order".format(path))
map_path = os.path.join(data_dir, "zone_mapping.txt")
if os.path.exists(map_path):
log_path = map_path
-mapping = pandas.read_csv(map_path, delim_whitespace=True).squeeze()
+mapping = pandas.read_csv(map_path, sep='\s+').squeeze()
if "total" in data.columns:
# If file contains total and shares of total,
# shares are aggregated as averages with total as weight
-data = data.groupby(mapping).agg(avg, weights=data["total"])
+data = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))
elif "detach" in data.columns:
Collaborator:
What was the problem here? I see no indication in the pandas documentation that passing regular functions to agg has been deprecated.

Collaborator Author:
I could not make the avg function work without this. It seems that avg expects a Series as its argument but receives a DataFrame instead. For some reason, wrapping it in a lambda fixes that. There might be a more elegant fix.
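The behaviour described in this thread appears to match how DataFrameGroupBy.agg handles a plain function in pandas 2.x, as far as can be read from its implementation: when extra keyword arguments such as weights= are passed, the function is called once per group with the group's whole sub-DataFrame; when the function is passed alone (as the lambda is), it is applied column by column, so each call receives a Series. A minimal sketch with a hypothetical weighted-average avg (the project's own avg is not shown in this diff) and made-up zone data:

```python
import pandas

def avg(values, weights):
    """Hypothetical weighted mean of one column within one group."""
    w = weights.loc[values.index]  # weights of the zones in this group
    return (values * w).sum() / w.sum()

# Made-up zone data: "total" is the weight, "share" is a share of the total
data = pandas.DataFrame({"total": [10.0, 30.0, 20.0],
                         "share": [0.1, 0.5, 0.2]},
                        index=[101, 102, 201])
mapping = pandas.Series(["A", "A", "B"], index=[101, 102, 201])  # zone -> area

# Function passed alone: pandas calls it per column, so `values` is a Series.
# agg(avg, weights=...) would instead hand each group's whole DataFrame to avg.
result = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))
# Area A: total (10*10 + 30*30) / 40 = 25.0, share (0.1*10 + 0.5*30) / 40 = 0.4
```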

funcs = dict.fromkeys(data.columns, "sum")
funcs["detach"] = "mean"
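The other index edits in this file replace Index methods affected by pandas 2.0: Index.is_monotonic was removed (is_monotonic_increasing is the replacement) and Index.is_numeric() was deprecated in favour of pandas.api.types.is_numeric_dtype. A minimal sketch of both checks on made-up indices, mirroring how read_csv_file uses them:

```python
import numpy
import pandas

unsorted = pandas.Index([103, 101, 102])
with_gap = pandas.Index([101.0, numpy.nan, 103.0])  # a NaN label, as from an empty-looking row

# read_csv_file sorts the data and logs a warning when this is False
print(unsorted.is_monotonic_increasing)  # False

# read_csv_file raises IndexError when this is True
print(pandas.api.types.is_numeric_dtype(with_gap) and with_gap.hasnans)  # True

frame = pandas.DataFrame({"v": [3.0, 1.0, 2.0]}, index=unsorted)
frame.sort_index(inplace=True)
print(frame.index.is_monotonic_increasing)  # True
```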
8 changes: 4 additions & 4 deletions Scripts/utils/zone_interval.py
@@ -171,7 +171,7 @@ def __init__(self, zone_numbers):
self.init_matrix()

def init_matrix(self):
-self.matrix = pandas.DataFrame(0, self.keys, self.keys)
+self.matrix = pandas.DataFrame(0.0, self.keys, self.keys)

def add(self, orig, dest):
"""Add individual tour to aggregated matrix.
@@ -183,7 +183,7 @@ def add(self, orig, dest):
dest : int
Tour destination zone number
"""
-self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1
+self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1.0
Collaborator:
I do not understand why integers cannot be handled as integers.

Collaborator Author:
self.matrix cannot hold integers here because floating-point values are added to it in other methods. Adding 1 instead of 1.0 here would work as well (implicit conversion from int to float), but I think it is clearer to be consistent with the data types.

Collaborator:
The agent model and the aggregate model behave very differently here: the agent model uses add and the aggregate model uses aggregate. One idea would be to separate them better and keep integers in the agent version.

Collaborator Author:
Is there a good reason to duplicate code because of it? As far as I can see, adding 1 to float64 or int64 values works identically up to 2^53 (or 2^24 for float32). Those values are probably more than we need in this case. If the number range could be an issue in some cases, we should use float64.
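The limits quoted above are where consecutive integers stop being representable: float32 has a 24-bit significand and float64 a 53-bit one, so counting with += 1 stays exact up to 2^24 and 2^53 respectively, while int64 keeps counting exactly well beyond that. A minimal check with NumPy scalars:

```python
import numpy

f32_limit = numpy.float32(2**24)
f64_limit = numpy.float64(2**53)

# Just below the limit, adding 1 is exact
below_ok = numpy.float32(2**24 - 1) + numpy.float32(1) == f32_limit

# At the limit, the added 1 is lost to rounding
f32_lost = f32_limit + numpy.float32(1) == f32_limit
f64_lost = f64_limit + numpy.float64(1) == f64_limit

# int64 has no such problem in this range
int_ok = numpy.int64(2**53) + 1 == 2**53 + 1

print(below_ok, f32_lost, f64_lost, int_ok)  # True True True True
```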


def aggregate(self, matrix):
"""Aggregate (tour demand) matrix to larger areas.
@@ -194,7 +194,7 @@ def aggregate(self, matrix):
Disaggregated matrix with zone indices and columns
"""
self.init_matrix()
-tmp_mtx = pandas.DataFrame(0, self.keys, matrix.columns)
+tmp_mtx = pandas.DataFrame(0.0, self.keys, matrix.columns)
for area in self:
i = self._get_slice(area, matrix.index)
tmp_mtx.loc[area] = matrix.loc[i].sum(0).values
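aggregate first sums zone rows into areas (tmp_mtx) and ends with a keys by keys matrix, as set up in init_matrix. As an alternative illustration only, not the method used in zone_interval.py, the same zone-to-area summation can be sketched with groupby on a hypothetical mapping Series from zone number to area name: group the rows, transpose, and group again, which avoids the axis=1 form of groupby deprecated in pandas 2.1. Zone numbers and values are made up.

```python
import pandas

zones = [1, 2, 3]
matrix = pandas.DataFrame([[1.0, 2.0, 3.0],
                           [4.0, 5.0, 6.0],
                           [7.0, 8.0, 9.0]], index=zones, columns=zones)
mapping = pandas.Series(["a", "a", "b"], index=zones)  # hypothetical zone -> area

rows_aggregated = matrix.groupby(mapping).sum()           # areas x zones
area_matrix = rows_aggregated.T.groupby(mapping).sum().T  # areas x areas
print(area_matrix)
```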
@@ -210,7 +210,7 @@ def __init__(self, zone_numbers):
self.init_array()

def init_array(self):
-self.array = pandas.Series(0, self.keys)
+self.array = pandas.Series(0.0, self.keys)

def add(self, zone):
"""Add individual tour to aggregated array.