Fix/emme 24 compatibility #567

samakinen · 2024-06-17T08:16:51Z

OpenPaths Emme 24 has been released with Python 3.11 and updated Numpy and Pandas versions. The current Helmet model system is not compatible with the new versions.
This PR will make the model system compatible with Emme 24 while maintaining compatibility with Emme 23 (tested) and probably other versions using Python 3.7 (not tested).
While testing, I noticed some differences between Emme versions and Helmet branches. I ran the same network and land use scenarios for 3 different test cases:

Emme 23, olusanya
Emme 23, fix/emme 24 compatibility
Emme 24, fix/emme 24 compatibility
I ran a CBA calculation to compare the results and found differences of 3-6 M€/a between all scenarios, caused by travel time savings. Neither the code changes nor the Emme version should change the calculation results, and I could not find any logical differences between the 3 scenarios. It seems the differences are caused by fluctuations in the demand model.
I marked the PR as a draft until the reason for differences can be identified.

hsl-petrhaj · 2024-06-17T09:39:45Z

Hi Sami, unable to check these in detail now, but could you add the new pandas version to the requirements? Could there be some differences because of that?

samakinen · 2024-06-17T12:06:44Z

Pandas is not listed in the requirements. The only requirement is openpyxl, which had to be updated from 2.6.4 to 3.1.4 to run with Python 3.11. With the old Emme and Python 3.7, the old version is still used. Pandas comes directly from the Emme environment, and the model system has to use the version provided: 0.24.2 with Emme 23 and 2.0.2 with Emme 24. I don't think the Pandas version would (or should) explain the differences, because my two test cases "emme 23, olusanya" and "emme 23, fix/emme-24-compatibility" used the same Pandas version but still produced different results.

zptro · 2024-06-17T14:12:15Z

Scripts/demand/trips.py

-        self.zone_population = pandas.Series(0, zone_numbers)
+        self.zone_population = pandas.Series(0.0, zone_numbers)


The zone population is supposed to be an integer value in the agent model. Does that cause problems?

Good catch, this one should be integer. I think floats would work as well, but int is better if the population is an integer value.

zptro · 2024-06-17T14:22:38Z

Scripts/utils/read_csv_file.py

-                data = data.groupby(mapping).agg(avg, weights=data["total"])
+                data = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))


What was the problem here? I see no indications in the pandas documentation that feeding regular functions into agg would have been deprecated.

I could not make the avg function work without this. It seems that the avg function expects a Series as an argument but gets a DataFrame object instead. For some reason, wrapping the call in a lambda fixes that. There might be a more elegant way to fix this.
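For illustration, a minimal sketch of the lambda workaround on toy data. The `avg` helper here is a hypothetical stand-in for the real function (assumed to be a weighted average that picks the weights matching the rows it receives), and the column names and zone numbers are made up. With the lambda, `agg` receives a single callable and no extra keyword arguments, so each column of each group should arrive as a Series.

```python
import numpy
import pandas


def avg(ser, weights):
    """Weighted average of one column within one group (hypothetical stand-in)."""
    # Pick only the weights that belong to the rows of this group.
    return numpy.average(ser, weights=weights[ser.index])


# Toy zone data: three zones, two of which are mapped to the same group.
data = pandas.DataFrame(
    {"total": [1.0, 3.0, 2.0], "share": [0.2, 0.6, 0.5]},
    index=[101, 102, 201],
)
mapping = pandas.Series(["A", "A", "B"], index=data.index)

# The lambda closes over the full weight column, so no keyword
# arguments have to be forwarded through agg itself.
result = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))
print(result)
```

If a Series is all that `avg` needs, the lambda is probably the least invasive fix, since it behaves the same way regardless of how `agg` forwards extra arguments.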

zptro · 2024-06-17T14:23:18Z

Scripts/utils/zone_interval.py

-        self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1
+        self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1.0


I do not understand why integers cannot be handled as integers?

self.matrix cannot hold integers here because we add floating point values to it in other methods. Adding 1 instead of 1.0 here would work as well (implicit conversion from int to float), but I think it is clearer to be consistent with the datatypes.

The agent model and the aggregate model behave very differently here: the agent model uses add and the aggregate model uses aggregate. One idea would be to separate them better, keeping integers in the agent version.

Is there a good reason to duplicate code because of it? As far as I can see, adding 1 to float64 or int64 values works identically up to 2⁵³ (or 2²⁴ for float32). Those values are probably more than we need in this case. If the number range could be an issue in some cases we should use float64.
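The limits mentioned above can be checked with plain Python. This is only a sketch: `to_float32` and `add_one` are hypothetical helpers, and float32 is emulated by rounding through the standard `struct` module rather than by using NumPy.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_one(value, single_precision=False):
    """Add 1.0 to value, optionally rounding the result to float32."""
    # For the values used below, the float64 sum is exact, so rounding it
    # to float32 afterwards gives the same result as a float32 addition.
    result = float(value) + 1.0
    return to_float32(result) if single_precision else result


# float64: counting by one is exact up to 2**53, then the increment is lost.
print(add_one(2**53 - 1) == 2**53)   # True
print(add_one(2**53) == 2**53)       # True, i.e. the +1 had no effect

# float32: the same happens already at 2**24.
print(add_one(2**24 - 1, single_precision=True) == 2**24)  # True
print(add_one(2**24, single_precision=True) == 2**24)      # True, +1 lost
```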

zptro · 2024-06-17T14:27:25Z

True, pandas and most other packages are not in requirements.txt. But the Pipfile needs to be updated so that the unit tests behave in the same way as the production version of the model system.

samakinen · 2024-06-17T15:16:50Z

Are we ready to switch the "production version" to be the one defined by Emme 24? I run the unit tests in separate virtualenvs for Emme 23 and Emme 24. I can update the pipfile to the new package versions if we are ready to switch.

zptro · 2024-06-18T09:26:49Z

I cannot say what the right solution is for the Helmet "bleeding edge" version, but I think we could schedule the migration of LEM for the coming weeks.

hsl-petrhaj · 2024-06-18T09:32:46Z

Helmet 5 at least should be switched to Emme24. Can the compatibility with Emme23 be retained though? Or is there something that would break the compatibility?

zptro · 2024-06-25T07:17:43Z

Scripts/models/generation.py

-        self.tours = pandas.Series(0, self.purpose.zone_numbers)
+        self.tours = pandas.Series(0.0, self.purpose.zone_numbers)

    def add_tours(self):
        """Generate and add (peripheral) tours to zone vector."""


This may have a small impact on the demand, because I think that in the base case the self.tours vector changes to float32 in add_tours (because that is the dtype of what is added from self.zone_data). Now that self.tours is implicitly initialized as float64, this dtype will probably propagate to large parts of the demand model. I suggest adding dtype=numpy.float32 to see if that changes the results. This could also be tested in ExternalModel.

My understanding is that Pandas does not change the Series datatype on addition. In the earlier version the Series was created with int64 dtype, and each addition of real values would only add the integer part of the number (floor()). Compared to that, the difference between float32 and float64 should be minimal. We can add dtype=numpy.float32 here if we want to reduce memory consumption, especially if bigger matrices with the same dtype are calculated based on this Series.

We discussed this and it turned out that we were both wrong. add_tours has always changed self.tours into float64.
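A small sketch of the dtype behaviour behind that conclusion, on made-up zone numbers and values. It assumes add_tours accumulates with something like `+=`, which is a guess; `dtype_after_add` is a hypothetical helper, and the point is only how pandas promotes dtypes on addition.

```python
import numpy
import pandas

zone_numbers = [101, 102, 103]  # made-up zone numbers

# float32 values standing in for what is added from the zone data.
increment = pandas.Series([0.5, 1.25, 2.0], zone_numbers, dtype=numpy.float32)


def dtype_after_add(initial, dtype=None):
    """Initialize a tour vector, add the increment, return the result."""
    tours = pandas.Series(initial, zone_numbers, dtype=dtype)
    tours += increment  # assumed accumulation pattern
    return tours


# Integer initialization: int64 + float32 is promoted to float64,
# and the fractional parts are kept rather than truncated.
from_int = dtype_after_add(0)
print(from_int.dtype, from_int.tolist())

# Float initialization: float64 + float32 stays float64.
from_float = dtype_after_add(0.0)
print(from_float.dtype)

# Only an explicit float32 initialization keeps the vector in float32.
from_float32 = dtype_after_add(0.0, dtype=numpy.float32)
print(from_float32.dtype)
```

Under these assumptions the integer and float initializations end up with the same dtype and values after the first addition, which is consistent with the dtype change not explaining the differing results.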

samakinen · 2024-06-26T09:38:30Z

I made a new commit to update the development environment and documentation to refer to OpenPaths EMME 24 and Python 3.11. I also updated the GitHub Actions to run the tests with the new dependencies and Python version. The model system still works with older EMME and Python 3.7, but there are no automated tests to make sure it will continue to do so in the future.

zptro

This looks good now, I think! However, I have no idea why the results are fluctuating.

samakinen added 3 commits June 17, 2024 10:35

Fix use of deprecated Pandas and Numpy features

3e346c5

Fix initialization of integer Series and Dataframes when they should contain floats

d238074

Upgrade openpyxl to version compatible with python 3.11

1112888

samakinen requested review from zptro, s-hiitola and hsl-petrhaj June 17, 2024 08:16

zptro reviewed Jun 17, 2024

Keep zone_population as integer Series

6d71dd1

zptro reviewed Jun 25, 2024

Update development environment to OpenPaths EMME 24

6d25939

zptro approved these changes Jun 27, 2024
