Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/emme 24 compatibility #567

Draft
wants to merge 5 commits into
base: olusanya
Choose a base branch
from
Draft

Conversation

samakinen
Copy link
Collaborator

OpenPaths Emme 24 has been released with Python 3.11 and updated Numpy and Pandas versions. The current Helmet model system is not compatible with the new versions.
This PR will make the model system compatible with Emme 24 while maintaining compatibility with Emme 23 (tested) and probably other versions using Python 3.7 (not tested).
While testing I noticed some differences between different Emme and Helmet branches. I run the same network and land use scenarios for 3 different test cases:

  1. Emme 23, olusanya
  2. Emme 23, fix/emme 24 compatibility
  3. Emme 24, fix/emme 24 compatibility
    I run CBA calculation to compare the calculation results and found 3-6 M€/a differences between all scenarios caused by travel time saving. The code changes and the Emme version should not change the calculation results and I could not find any logical differences between the 3 scenarios. It seams the differences are caused by fluctuations in the demand model.
    I marked the PR as a draft until the reason for differences can be identified.

@hsl-petrhaj
Copy link
Contributor

Hi Sami, unable to check these in detail now, but could you add the new pandas version to the requirements? Could there be some differences because of that?

@samakinen
Copy link
Collaborator Author

Pandas is not listed in the requirements. Only requirements is openpyxl that had to be updated from 2.6.4 to 3.1.4 when running with python 3.11. When using old Emme with Python 3.7 the old version will be used. Pandas comes directly from Emme environment and the model system has to use the version provided. This is 0.24.2 when using Emme 23 and 2.0.2 when using Emme 24. I don't think Pandas version would (or should) explain the differences because my two test cases "emme 23, olusanya" and "emme 23, fix/emme-24-compatibility" used the same Pandas version but still produced different results.

self.zone_population = pandas.Series(0, zone_numbers)
self.zone_population = pandas.Series(0.0, zone_numbers)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zone population is supposed to be an integer value in the agent model. Does that cause problems?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this one should be integer. I think floats would work as well, but int is better if the population is an integer value.

data = data.groupby(mapping).agg(avg, weights=data["total"])
data = data.groupby(mapping).agg(lambda ser: avg(ser, weights=data["total"]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the problem here? I see no indications in the pandas documentation that feeding regular functions into agg would have been deprecated.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not make the avg function work without this. It seems that avg function expects to get a Series as an argument, but will get a DataFrame object instead. For some reason adding lambda here fixes that. There might be a more elegant way to fix this.

self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1
self.matrix.at[self.mapping[orig], self.mapping[dest]] += 1.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand why integers cannot be handled as integers?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.matrix can not be integers here because we will add floating point values to it in other methods. Adding 1 instead of 1.0 here would work as well (implicit conversion from int to float), but I think it's more clear to be consistent with the datatypes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent model and the aggregate model behave in very different ways here, the agent model is using add and aggregate model aggregate. One idea would be to separate them better, keeping integers in the agent version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to duplicate code because of it? As far as I can see, adding 1 to float64 or int64 values works identically up to 253 (or 224 for float32). Those values are probably more than we need in this case. If the number range could be an issue in some cases we should use float64.

@zptro
Copy link
Collaborator

zptro commented Jun 17, 2024

Pandas is not listed in the requirements. Only requirements is openpyxl that had to be updated from 2.6.4 to 3.1.4 when running with python 3.11. When using old Emme with Python 3.7 the old version will be used. Pandas comes directly from Emme environment and the model system has to use the version provided. This is 0.24.2 when using Emme 23 and 2.0.2 when using Emme 24. I don't think Pandas version would (or should) explain the differences because my two test cases "emme 23, olusanya" and "emme 23, fix/emme-24-compatibility" used the same Pandas version but still produced different results.

True, pandas and most other packages is not in requirements.txt. But pipfile needs to be updated, so that unit tests behave in the same way as production version of the model system.

@samakinen
Copy link
Collaborator Author

True, pandas and most other packages is not in requirements.txt. But pipfile needs to be updated, so that unit tests behave in the same way as production version of the model system.
Are we ready to switch the "production version" to be the one defined by Emme 24? I run the unit tests in separate virtualenvs for Emme 23 and Emme 24. I can update the pipfile to the new package versions if we are ready to switch.

@zptro
Copy link
Collaborator

zptro commented Jun 18, 2024

True, pandas and most other packages is not in requirements.txt. But pipfile needs to be updated, so that unit tests behave in the same way as production version of the model system.
Are we ready to switch the "production version" to be the one defined by Emme 24? I run the unit tests in separate virtualenvs for Emme 23 and Emme 24. I can update the pipfile to the new package versions if we are ready to switch.

I cannot say what is the right solution for Helmet version "bleeding edge", but I think we could schedule the migration of LEM to the upcoming weeks.

@hsl-petrhaj
Copy link
Contributor

Helmet 5 at least should be switched to Emme24. Can the compatibility with Emme23 be retained though? Or is there something that would break the compatibility?

Comment on lines -29 to 32
self.tours = pandas.Series(0, self.purpose.zone_numbers)
self.tours = pandas.Series(0.0, self.purpose.zone_numbers)

def add_tours(self):
"""Generate and add (peripheral) tours to zone vector."""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may have small impacts on the demand, because I think in the base case the self.tours vector will change to float32 (because that is what is added from self.zone_data) in add_tours. Now when self.tours is implicitly initialized as float64, this will probably be broadcasted to large parts of the demand model. I suggest adding a dtype=numpy.float32 to see if that changes results. This could also be tested in ExternalModel.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that Pandas does not change the Series datatype on addition. In the earlier version the series was created with int64 dtype and each addition of real values would only add the integer part of the number (floor()). Comparing to that the difference between float32 and float64 should be minimal. We can add dtype=numpy.float32 here if want to reduce memory consumption, especially if there are bigger matrices calculated based on this Series and same dtype.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this and it turned out that we were both wrong. add_tours has always changed self.tours into float64.

@samakinen
Copy link
Collaborator Author

I made a new commit to update the development environment and documentation to refer OpenPaths EMME 24 and python 3.11. I also updated the github actions to run the tests with the new dependencies and python version. The model-system will still works with older EMME and Python 3.7, but there are no automated tests to make sure it will do so in the future.

Copy link
Collaborator

@zptro zptro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now, I think! However, I have no idea why the results are fluctuating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants