Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup of paths, freeform, and OD matrix functionality #678

Merged
merged 26 commits into from
Feb 25, 2021

Conversation

abyrd
Copy link
Member

@abyrd abyrd commented Feb 9, 2021

Recently much work has been done on reporting information other than accessibility, such as full paths or travel times. This is often done using freeform points (rather than grids) as origins or destinations. This PR will include various additional fixes and cleanup that were agreed upon in reviews of previous PRs on these subjects.

This includes:

@abyrd abyrd changed the title Cleanup of freeform and OD matrix functionality Cleanup of paths, freeform, and OD matrix functionality Feb 9, 2021
if (job.templateTask.recordTimes && !job.templateTask.oneToOne) {
if (nOriginsTotal * destinationPointSet.featureCount() > 1_000_000) {
if ((job.templateTask.recordTimes || job.templateTask.includePathResults) && !job.templateTask.oneToOne) {
if (nOriginsTotal * destinationPointSet.featureCount() > 16_000_000) {
Copy link
Member Author

@abyrd abyrd Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O*D will limit the size of the output CSV, but it won't really limit the number of destinations, In the extreme, 16 origins and 1 million destinations is allowed, which would still produce very large results from the worker (probably all 16 million at once). Maybe we should restrict D in addition to or instead of O*D.

Copy link
Member Author

@abyrd abyrd Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the exeption message below does not report the true limit. I'll add a destination limit and some new messages.

abyrd and others added 3 commits February 18, 2021 19:25
Even when counts were stored, the methods returned a count of 1 for every point. We now set the counts from CSV fields, or to 1 if no CSV field is present, so we can return the true stored values.
@abyrd
Copy link
Member Author

abyrd commented Feb 18, 2021

The Simpson Desert tests seem to be failing occasionally, but they pass when re-run. This is probably because they are checking Monte Carlo results against theoretical distributions, and the goodness of fit can change from one run to the next.

abyrd and others added 5 commits February 22, 2021 23:23
New Paths represent specific departure times. The PatternSequence
nested in them is similar to the old Paths. So we now accumulate and
deduplicate these PatternSequences and write them out as Taui paths.
This is replicating things that happen in travelTimeReducer.recordPathsForTarget but for the Taui case
@abyrd
Copy link
Member Author

abyrd commented Feb 23, 2021

Recent changes to Taui Path/PatternSequence have been tested locally on a tiny Taui regional run. Files were successfully generated without exceptions and uploaded to S3. the XY_paths.dat were downloaded, gunzipped and examined in a hex editor. They appear to contain valid path information.

abyrd and others added 3 commits February 23, 2021 23:08
Freeform pointset use is becoming mainstream, we don't want to be
stalling the broker on production by doing potentially slow things in
constructors called in the broker's synchronized methods.

We make sure any PointSet instances we need are preloaded into the
transient fields of the template task on the backend, where they are
visible to the Job and the MultiOriginAssembler.
It's easy to get high zero points with the exponential functions, or
by simply setting the cutoff to 120 minutes with any decay function.
Comment on lines +526 to +529
if (maxTripDurationMinutes > 120) {
LOG.warn("Distance decay function reached zero above 120 minutes. Capping travel time at 120 minutes.");
maxTripDurationMinutes = 120;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixes #677

abyrd and others added 7 commits February 23, 2021 23:29
Ensures we only end up with storage keys for files that were actually
written, also avoiding duplicating the logic for determining when
those writers are created/activated.

This also ensures that the "complete" field in the RegionalAnalysis is
flushed out to the database.
Also update some comments/logs about CSV and S3
This ensures all CSV outputs will be visible in the UI, which does not
re-fetch regional analyses from the database when they complete. We may
decide to change this UI behavior eventually, in which case we could do
the database updates when we mark the regional analysis complete.
This factors out a common interface for CSV result writers, removing the
amount of ad-hoc conditionals. It also enables writing out CSV for
multiple percentiles and cutoffs (times and paths), and even multiple
destination grids (for accessibility).
We allow accessibility CSV for gridded destinations, which aren't loaded
on the backend. We can only rely on their keys being present.

// Register the regional job with the broker, which will distribute individual tasks to workers and track progress.
broker.enqueueTasksForRegionalJob(regionalAnalysis);

// Flush to the database any information added to the RegionalAnalysis object when it was enqueued.
// This includes the paths of any CSV files that will be produced by this analysis.
// TODO verify whether there is a reason to use regionalAnalyses.modifyWithoutUpdatingLock() or put().
Copy link
Member

@trevorgerhardt trevorgerhardt Feb 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put updates the nonce, createdAt, and updatedAt and should be used in updates from HTTP handlers. modifyWithoutUpdatingLock should be used when updating a model multiple times by the backend. There could be a better name for it, suggestions welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a case that could be made to always use put also 🤷

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing I was wondering is why you'd avoid updating the nonce and update time when something was changed by the backend. I just wasn't sure of the reasoning - we should probably explain it on the method's Javadoc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a premature optimization, mainly skipping updates to fields that aren't needed when the multiple writes are done by the backend without user input.

Looking at all the instances it's used, I'm not sure it's necessary anywhere except in BundleController.setServiceBundleDates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even there, it could probably be removed

@abyrd abyrd marked this pull request as ready for review February 25, 2021 01:31
@abyrd abyrd enabled auto-merge February 25, 2021 01:32
@abyrd abyrd merged commit c205c44 into dev Feb 25, 2021
@abyrd abyrd deleted the paths-times-freeform branch February 25, 2021 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants