Fix/emme 24 compatibility #567
base: olusanya
Changes from 4 commits: 3e346c5, d238074, 1112888, 6d71dd1, 6d25939
```diff
@@ -1 +1,2 @@
-openpyxl==2.6.4
+openpyxl==2.6.4;python_version<"3.8"
+openpyxl==3.1.4;python_version>="3.8"
```
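These pins use PEP 508 environment markers, which pip evaluates at install time so that each interpreter version gets exactly one of the two requirements. As a side note, the same markers can be evaluated programmatically with the `packaging` library (the library pip itself vendors for this purpose); the snippet below is only an illustration of how the markers resolve:

```python
# Evaluate the two environment markers from the diff against the running
# interpreter; exactly one of them should apply.
from packaging.markers import Marker

old_pin = Marker('python_version < "3.8"')
new_pin = Marker('python_version >= "3.8"')

print(old_pin.evaluate(), new_pin.evaluate())
```

On any given interpreter the two markers are complementary, which is what makes the pair safe to keep in one requirements file.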
```diff
@@ -1,9 +1,7 @@
-from decimal import DivisionByZero
-from itertools import groupby
 import os
 from typing import Optional
 import pandas
-import numpy  # type: ignore
+import numpy

 import utils.log as log

```

```diff
@@ -49,9 +47,11 @@ def read_csv_file(data_dir: str,
         raise NameError(msg)
     header: Optional[str] = None if squeeze else "infer"
     data: pandas.DataFrame = pandas.read_csv(
-        path, delim_whitespace=True, squeeze=squeeze, keep_default_na=False,
+        path, sep='\s+', keep_default_na=False,
         na_values="", comment='#', header=header)
-    if data.index.is_numeric() and data.index.hasnans:  # type: ignore
+    if squeeze:
+        data = data.squeeze()
+    if pandas.api.types.is_numeric_dtype(data.index) and data.index.hasnans:
         msg = "Row with only spaces or tabs in file {}".format(path)
         log.error(msg)
         raise IndexError(msg)
```
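The hunk above follows the pandas 2.x migration pattern: the `squeeze` keyword was removed from `read_csv`, so a one-column result is squeezed afterwards with `DataFrame.squeeze()`, `delim_whitespace=True` is replaced by `sep='\s+'`, and the removed `Index.is_numeric()` is replaced by `pandas.api.types.is_numeric_dtype`. A minimal sketch of the same pattern, using made-up inline data instead of a file:

```python
# Sketch of the pandas>=2.0-compatible read path from the diff.
import io
import pandas

text = "zone value\n101 1.5\n102 2.5\n"
data = pandas.read_csv(
    io.StringIO(text), sep=r"\s+", keep_default_na=False,
    na_values="", comment='#', index_col="zone")
series = data.squeeze()  # one remaining column -> Series

# The index validity check now goes through pandas.api.types.
print(pandas.api.types.is_numeric_dtype(series.index))  # prints True
```

`DataFrame.squeeze()` only collapses to a `Series` when a single column remains, which matches the old `squeeze=True` behaviour of `read_csv`.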
```diff
@@ -68,17 +68,17 @@ def read_csv_file(data_dir: str,
     if data.index.has_duplicates:
         raise IndexError("Index in file {} has duplicates".format(path))
     if zone_numbers is not None:
-        if not data.index.is_monotonic:
+        if not data.index.is_monotonic_increasing:
             data.sort_index(inplace=True)
             log.warn("File {} is not sorted in ascending order".format(path))
         map_path = os.path.join(data_dir, "zone_mapping.txt")
         if os.path.exists(map_path):
             log_path = map_path
-            mapping = pandas.read_csv(map_path, delim_whitespace=True).squeeze()
+            mapping = pandas.read_csv(map_path, sep='\s+').squeeze()
             if "total" in data.columns:
                 # If file contains total and shares of total,
                 # shares are aggregated as averages with total as weight
-                data = data.groupby(mapping).agg(avg, weights=data["total"])
+                data = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))
             elif "detach" in data.columns:
                 funcs = dict.fromkeys(data.columns, "sum")
                 funcs["detach"] = "mean"
```

**Review comment** (on the `agg` change): What was the problem here? I see no indications in the pandas documentation that feeding regular functions into `agg` […]

**Reply:** I could not make the `avg` function work without this. It seems that the `avg` function expects to get a Series as an argument, but gets a DataFrame object instead. For some reason, adding a lambda here fixes that. There might be a more elegant way to fix this.
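The behaviour discussed in the thread above can be sketched as follows. The body of `avg` here is an assumption (the real implementation lives elsewhere in the repository), and the zone numbers and column values are made up; the point is that wrapping the call in a lambda guarantees `avg` receives one column (a `Series`) per group, with the full `total` column passed as weights:

```python
# Hedged sketch of the weighted-average aggregation from the diff.
import numpy
import pandas

def avg(ser, weights):
    # Assumed implementation: align the weight vector to the rows of
    # this group by index label, then take a weighted mean.
    return numpy.average(ser, weights=weights[ser.index])

data = pandas.DataFrame(
    {"total": [10.0, 30.0, 20.0], "share": [0.4, 0.8, 0.5]},
    index=[101, 102, 103])
mapping = pandas.Series(["A", "A", "B"], index=[101, 102, 103])

# agg with a plain callable may hand it a whole DataFrame in some code
# paths; the lambda pins the signature to one Series at a time.
result = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))
print(result.loc["A", "share"])  # (0.4*10 + 0.8*30) / 40 = 0.7
```

Note that the lambda closes over the full, unaggregated `data["total"]` column, which is why the weights must be realigned to the group's index inside `avg`.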
```diff
@@ -171,7 +171,7 @@ def __init__(self, zone_numbers):
         self.init_matrix()

     def init_matrix(self):
-        self.matrix = pandas.DataFrame(0, self.keys, self.keys)
+        self.matrix = pandas.DataFrame(0.0, self.keys, self.keys)

     def add(self, orig, dest):
         """Add individual tour to aggregated matrix.
```
```diff
@@ -183,7 +183,7 @@ def add(self, orig, dest):
         dest : int
             Tour destination zone number
         """
-        self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1
+        self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1.0

     def aggregate(self, matrix):
         """Aggregate (tour demand) matrix to larger areas.
```

**Review comment** (on the `+= 1.0` change): I do not understand why integers cannot be handled as integers?

**Reply:** `self.matrix` cannot be integers here, because we will add floating-point values to it in other methods. Adding `1` instead of `1.0` here would work as well (implicit conversion from int to float), but I think it is clearer to be consistent with the datatypes.

**Review comment:** The agent model and the aggregate model behave in very different ways here; the agent model is using […]

**Reply:** Is there a good reason to duplicate code because of it? As far as I can see, adding 1 to float64 or int64 values works identically up to 2^53 (or 2^24 for float32). Those values are probably more than we need in this case. If the number range could be an issue in some cases, we should use float64.
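The dtype reasoning in this thread can be illustrated with a small sketch (the zone keys below are made up): initializing the frame with the integer `0` yields int64 columns, while `0.0` yields float64 columns that can later absorb fractional demand values without truncation.

```python
# Why the diff initializes the matrix with 0.0 rather than 0.
import pandas

keys = [101, 102]
int_matrix = pandas.DataFrame(0, keys, keys)      # int64 columns
float_matrix = pandas.DataFrame(0.0, keys, keys)  # float64 columns
print(int_matrix.dtypes[101], float_matrix.dtypes[101])

# A float64 frame takes both whole tour counts and fractional
# increments without any dtype friction.
float_matrix.at[101, 102] += 1.0
float_matrix.at[101, 102] += 0.25
print(float_matrix.at[101, 102])
```

As the reply notes, whole-number counting is exact in float64 up to 2^53, so nothing is lost by counting tours in a float frame.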
```diff
@@ -194,7 +194,7 @@ def aggregate(self, matrix):
             Disaggregated matrix with zone indices and columns
         """
         self.init_matrix()
-        tmp_mtx = pandas.DataFrame(0, self.keys, matrix.columns)
+        tmp_mtx = pandas.DataFrame(0.0, self.keys, matrix.columns)
         for area in self:
             i = self._get_slice(area, matrix.index)
             tmp_mtx.loc[area] = matrix.loc[i].sum(0).values
```
```diff
@@ -210,7 +210,7 @@ def __init__(self, zone_numbers):
         self.init_array()

     def init_array(self):
-        self.array = pandas.Series(0, self.keys)
+        self.array = pandas.Series(0.0, self.keys)

     def add(self, zone):
         """Add individual tour to aggregated array.
```
**Review comment** (on the `init_array` change): This may have small impacts on the demand, because I think in the base case the `self.tours` vector will change to `float32` (because that is what is added from `self.zone_data`) in `add_tours`. Now when `self.tours` is implicitly initialized as `float64`, this will probably be broadcast to large parts of the demand model. I suggest adding a `dtype=numpy.float32` to see if that changes the results. This could also be tested in `ExternalModel`.

**Reply:** My understanding is that pandas does not change the Series datatype on addition. In the earlier version the series was created with int64 dtype, and each addition of real values would only add the integer part of the number (`floor()`). Compared to that, the difference between float32 and float64 should be minimal. We can add `dtype=numpy.float32` here if we want to reduce memory consumption, especially if there are bigger matrices calculated based on this Series with the same dtype.

**Reply:** We discussed this, and it turned out that we were both wrong: `add_tours` has always changed `self.tours` into float64.
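The thread's conclusion is easy to reproduce (the index labels below are illustrative): `+=` on a pandas Series is not a true in-place operation, it rebinds the name to the result of the addition, and adding a float to an int64 Series follows the usual numeric promotion to float64 rather than truncating.

```python
# Reproducing why an int64 tours Series turns into float64 on addition.
import pandas

tours = pandas.Series(0, index=["A", "B"])
print(tours.dtype)  # int64

tours += 0.5        # rebinds to the promoted result, no floor()
print(tours.dtype)  # float64
```

This is why the explicit `0.0` initialization in the diff mostly makes the existing behaviour visible rather than changing it.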