-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/emme 24 compatibility #567
base: olusanya
Are you sure you want to change the base?
Conversation
Hi Sami, unable to check these in detail now, but could you add the new pandas version to the requirements? Could there be some differences because of that? |
Pandas is not listed in the requirements. Only requirements is openpyxl that had to be updated from 2.6.4 to 3.1.4 when running with python 3.11. When using old Emme with Python 3.7 the old version will be used. Pandas comes directly from Emme environment and the model system has to use the version provided. This is 0.24.2 when using Emme 23 and 2.0.2 when using Emme 24. I don't think Pandas version would (or should) explain the differences because my two test cases "emme 23, olusanya" and "emme 23, fix/emme-24-compatibility" used the same Pandas version but still produced different results. |
Scripts/demand/trips.py
Outdated
self.zone_population = pandas.Series(0, zone_numbers) | ||
self.zone_population = pandas.Series(0.0, zone_numbers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The zone population is supposed to be an integer value in the agent model. Does that cause problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, this one should be integer. I think floats would work as well, but int is better if the population is an integer value.
data = data.groupby(mapping).agg(avg, weights=data["total"]) | ||
data = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the problem here? I see no indications in the pandas documentation that feeding regular functions into agg
would have been deprecated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could not make the avg function work without this. It seems that avg function expects to get a Series as an argument, but will get a DataFrame object instead. For some reason adding lambda here fixes that. There might be a more elegant way to fix this.
self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1 | ||
self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why integers cannot be handled as integers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.matrix can not be integers here because we will add floating point values to it in other methods. Adding 1 instead of 1.0 here would work as well (implicit conversion from int to float), but I think it's more clear to be consistent with the datatypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The agent model and the aggregate model behave in very different ways here, the agent model is using add
and aggregate model aggregate
. One idea would be to separate them better, keeping integers in the agent version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a good reason to duplicate code because of it? As far as I can see, adding 1 to float64 or int64 values works identically up to 253 (or 224 for float32). Those values are probably more than we need in this case. If the number range could be an issue in some cases we should use float64.
True, pandas and most other packages is not in requirements.txt. But pipfile needs to be updated, so that unit tests behave in the same way as production version of the model system. |
|
I cannot say what is the right solution for Helmet version "bleeding edge", but I think we could schedule the migration of LEM to the upcoming weeks. |
Helmet 5 at least should be switched to Emme24. Can the compatibility with Emme23 be retained though? Or is there something that would break the compatibility? |
self.tours = pandas.Series(0, self.purpose.zone_numbers) | ||
self.tours = pandas.Series(0.0, self.purpose.zone_numbers) | ||
|
||
def add_tours(self): | ||
"""Generate and add (peripheral) tours to zone vector.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may have small impacts on the demand, because I think in the base case the self.tours
vector will change to float32
(because that is what is added from self.zone_data
) in add_tours
. Now when self.tours
is implicitly initialized as float64
, this will probably be broadcasted to large parts of the demand model. I suggest adding a dtype=numpy.float32
to see if that changes results. This could also be tested in ExternalModel
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that Pandas does not change the Series datatype on addition. In the earlier version the series was created with int64 dtype and each addition of real values would only add the integer part of the number (floor()
). Comparing to that the difference between float32 and float64 should be minimal. We can add dtype=numpy.float32
here if want to reduce memory consumption, especially if there are bigger matrices calculated based on this Series and same dtype.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this and it turned out that we were both wrong. add_tours
has always changed self.tours
into float64.
I made a new commit to update the development environment and documentation to refer OpenPaths EMME 24 and python 3.11. I also updated the github actions to run the tests with the new dependencies and python version. The model-system will still works with older EMME and Python 3.7, but there are no automated tests to make sure it will do so in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good now, I think! However, I have no idea why the results are fluctuating.
OpenPaths Emme 24 has been released with Python 3.11 and updated Numpy and Pandas versions. The current Helmet model system is not compatible with the new versions.
This PR will make the model system compatible with Emme 24 while maintaining compatibility with Emme 23 (tested) and probably other versions using Python 3.7 (not tested).
While testing I noticed some differences between different Emme and Helmet branches. I run the same network and land use scenarios for 3 different test cases:
I run CBA calculation to compare the calculation results and found 3-6 M€/a differences between all scenarios caused by travel time saving. The code changes and the Emme version should not change the calculation results and I could not find any logical differences between the 3 scenarios. It seams the differences are caused by fluctuations in the demand model.
I marked the PR as a draft until the reason for differences can be identified.