Cleanup of paths, freeform, and OD matrix functionality #678
```diff
-if (job.templateTask.recordTimes && !job.templateTask.oneToOne) {
+if ((job.templateTask.recordTimes || job.templateTask.includePathResults) && !job.templateTask.oneToOne) {
-    if (nOriginsTotal * destinationPointSet.featureCount() > 1_000_000) {
+    if (nOriginsTotal * destinationPointSet.featureCount() > 16_000_000) {
```
`O*D` will limit the size of the output CSV, but it won't really limit the number of destinations. In the extreme, 16 origins and 1 million destinations is allowed, which would still produce very large results from the worker (probably all 16 million at once). Maybe we should restrict `D` in addition to or instead of `O*D`.
Also, the exception message below does not report the true limit. I'll add a destination limit and some new messages.
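A validation along those lines could look like the following sketch. The class and method names and the destination limit are illustrative assumptions, not the actual r5 code; only the 16 million O*D figure comes from the diff above.

```java
public class OriginDestinationLimits {
    // Hypothetical per-dimension limit; only MAX_OD_PAIRS appears in the diff above.
    public static final long MAX_DESTINATIONS = 1_000_000;
    public static final long MAX_OD_PAIRS = 16_000_000;

    /** Return a message naming the limit that was exceeded, or null if the task size is acceptable. */
    public static String checkSize (long nOrigins, long nDestinations) {
        if (nDestinations > MAX_DESTINATIONS) {
            return "Number of destinations exceeds limit of " + MAX_DESTINATIONS + ".";
        }
        if (nOrigins * nDestinations > MAX_OD_PAIRS) {
            return "Number of origin-destination pairs exceeds limit of " + MAX_OD_PAIRS + ".";
        }
        return null;
    }
}
```

Checking the destination count first means the returned message always reports the limit that was actually hit, addressing the misleading-message problem mentioned above.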
Even when counts were stored, the methods returned a count of 1 for every point. We now set the counts from CSV fields, or to 1 if no CSV field is present, so we can return the true stored values.
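The count-setting behavior described in that commit can be sketched as below; the method and parameter names are hypothetical, not the actual r5 API.

```java
import java.util.Map;

public class FreeformPointCounts {
    /**
     * Count for one freeform point: parsed from the named CSV field if that field
     * is configured and present, otherwise defaulting to 1 per point.
     */
    public static double countForPoint (Map<String, String> csvFields, String countField) {
        if (countField == null) return 1;
        String raw = csvFields.get(countField);
        return (raw == null) ? 1 : Double.parseDouble(raw);
    }
}
```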
The Simpson Desert tests seem to be failing occasionally, but they pass when re-run. This is probably because they are checking Monte Carlo results against theoretical distributions, and the goodness of fit can change from one run to the next.
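A tiny illustration of why such tests can flake: a sampled frequency drifts around the theoretical probability from run to run, so a fixed tolerance will occasionally be missed by chance, while seeding the generator makes any single run reproducible. This is a generic sketch, not the actual Simpson Desert test code.

```java
import java.util.Random;

public class MonteCarloCheck {
    /** Fraction of heads in n simulated coin flips; deterministic for a given seed. */
    public static double headFraction (long seed, int n) {
        Random random = new Random(seed);
        int heads = 0;
        for (int i = 0; i < n; i++) {
            if (random.nextBoolean()) heads++;
        }
        return (double) heads / n;
    }
}
```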
New Paths represent specific departure times. The PatternSequence nested in them is similar to the old Paths. So we now accumulate and deduplicate these PatternSequences and write them out as Taui paths.
This replicates what happens in travelTimeReducer.recordPathsForTarget, but for the Taui case.
Recent changes to Taui Path/PatternSequence have been tested locally on a tiny Taui regional run. Files were successfully generated without exceptions and uploaded to S3. The XY_paths.dat files were downloaded, gunzipped, and examined in a hex editor. They appear to contain valid path information.
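The accumulate-and-deduplicate step described above can be sketched as follows, assuming a pattern sequence is represented as a list of pattern indexes. The class and method names are illustrative, not the actual r5 types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PatternSequenceIndex {
    // Each distinct sequence of pattern IDs gets a stable index, assigned on first sight.
    private final Map<List<Integer>, Integer> indexForSequence = new HashMap<>();

    /** Return the index for this sequence, deduplicating repeats of the same sequence. */
    public int getIndex (List<Integer> patternIds) {
        return indexForSequence.computeIfAbsent(patternIds, s -> indexForSequence.size());
    }

    /** Number of distinct sequences accumulated so far (e.g. to size the output file). */
    public int size () {
        return indexForSequence.size();
    }
}
```

Writing the output is then a matter of emitting each distinct sequence once, and referencing sequences by index per departure time.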
Freeform pointset use is becoming mainstream, so we don't want to stall the broker in production by doing potentially slow things in constructors called from the broker's synchronized methods. We make sure any PointSet instances we need are preloaded into the transient fields of the template task on the backend, where they are visible to the Job and the MultiOriginAssembler.
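The pattern described above, doing slow loading before entering the lock so synchronized broker methods only touch in-memory state, can be sketched like this. All names here are simplified illustrations, not the actual r5 classes.

```java
public class PreloadExample {
    public static class PointSet {
        public final int featureCount;
        public PointSet (int featureCount) { this.featureCount = featureCount; }
    }

    public static class TemplateTask {
        public final String pointSetId;
        // Transient: preloaded on the backend, not serialized out to workers.
        public transient PointSet loadedPointSet;
        public TemplateTask (String pointSetId) { this.pointSetId = pointSetId; }
    }

    /** Stand-in for slow work (e.g. fetching a pointset from storage); runs before any lock is taken. */
    static PointSet slowLoad (String id) {
        return new PointSet(100);
    }

    /** Preload the pointset into the task's transient field, outside the broker's lock. */
    public static TemplateTask preload (TemplateTask task) {
        task.loadedPointSet = slowLoad(task.pointSetId);
        return task;
    }

    private static final Object brokerLock = new Object();

    /** Only fast, in-memory work happens while holding the broker lock. */
    public static int registerJob (TemplateTask task) {
        synchronized (brokerLock) {
            return task.loadedPointSet.featureCount;
        }
    }
}
```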
It's easy to get high zero points (travel times at which the function reaches zero) with the exponential functions, or by simply setting the cutoff to 120 minutes with any decay function.
```java
if (maxTripDurationMinutes > 120) {
    LOG.warn("Distance decay function reached zero above 120 minutes. Capping travel time at 120 minutes.");
    maxTripDurationMinutes = 120;
}
```
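To make the capping above concrete, here is one way a decay function's zero point (the travel time beyond which its weight is effectively zero) can land past 120 minutes. This assumes a half-life exponential form w(t) = 2^(-t/h) for illustration; it is not necessarily the exact r5 decay implementation.

```java
public class DecayZeroPoint {
    /**
     * Smallest whole minute at which the weight 2^(-t/h) falls below epsilon,
     * capped at maxMinutes (the 120-minute cap applied in the snippet above).
     */
    public static int zeroPoint (double halfLifeMinutes, double epsilon, int maxMinutes) {
        for (int t = 0; t <= maxMinutes; t++) {
            if (Math.pow(2, -t / halfLifeMinutes) < epsilon) return t;
        }
        // With a long half-life the weight never reaches zero within the cap.
        return maxMinutes;
    }
}
```

With a 10-minute half-life and epsilon 0.001 the zero point is 100 minutes, but a 30-minute half-life would not reach zero until well past 120 minutes, triggering the cap and warning.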
Fixes #677
Ensures we only end up with storage keys for files that were actually written, and avoids duplicating the logic for determining when those writers are created and activated. This also ensures that the "complete" field in the RegionalAnalysis is flushed out to the database.
Also update some comments/logs about CSV and S3
Also updated a few comments.
This ensures all CSV outputs will be visible in the UI, which does not re-fetch regional analyses from the database when they complete. We may decide to change this UI behavior eventually, in which case we could do the database updates when we mark the regional analysis complete.
This factors out a common interface for CSV result writers, reducing the number of ad-hoc conditionals. It also enables writing out CSV for multiple percentiles and cutoffs (times and paths), and even multiple destination grids (for accessibility).
We allow accessibility CSV for gridded destinations, which aren't loaded on the backend. We can only rely on their keys being present.
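The factored-out writer interface might look roughly like this sketch; the interface and class names are illustrative, and the actual r5 interface likely differs.

```java
// One common interface for the different CSV result writers (times, paths, accessibility),
// so the code that creates and drives writers doesn't branch on the result type.
interface CsvResultWriter {
    String[] header ();
    /** One CSV row for one origin-destination result value. */
    String[] row (String originId, String destinationId, int value);
}

// One concrete writer per result type; here, travel times.
class TimesCsvWriter implements CsvResultWriter {
    public String[] header () {
        return new String[] {"origin", "destination", "time"};
    }
    public String[] row (String originId, String destinationId, int minutes) {
        return new String[] {originId, destinationId, Integer.toString(minutes)};
    }
}

public class CsvWriterExample {
    /** Format one record through whichever writer implementation applies. */
    public static String formatRow (CsvResultWriter writer, String o, String d, int v) {
        return String.join(",", writer.row(o, d, v));
    }
}
```

Because all writers share one shape, code that decides which files exist can simply iterate over the writers that were created, rather than re-deriving the activation conditions.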
```java
// Register the regional job with the broker, which will distribute individual tasks to workers and track progress.
broker.enqueueTasksForRegionalJob(regionalAnalysis);

// Flush to the database any information added to the RegionalAnalysis object when it was enqueued.
// This includes the paths of any CSV files that will be produced by this analysis.
// TODO verify whether there is a reason to use regionalAnalyses.modifyWithoutUpdatingLock() or put().
```
`put` updates the `nonce`, `createdAt`, and `updatedAt` fields and should be used for updates from HTTP handlers. `modifyWithoutUpdatingLock` should be used when updating a model multiple times from the backend. There could be a better name for it; suggestions welcome.
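The distinction can be sketched as follows. The Model fields and method bodies here are illustrative assumptions about the persistence layer, not the actual code (createdAt is omitted for brevity).

```java
public class ModelStore {
    public static class Model {
        public long nonce = 0;      // optimistic-lock value, compared on concurrent edits
        public long updatedAt = 0;  // last user-visible modification time
        public String data = "";
    }

    /** Used by HTTP handlers: a user-driven update bumps the nonce and timestamp. */
    public static void put (Model m, String newData, long now) {
        m.data = newData;
        m.nonce++;
        m.updatedAt = now;
    }

    /** Used for repeated backend-internal updates: writes data without touching the lock fields. */
    public static void modifyWithoutUpdatingLock (Model m, String newData) {
        m.data = newData;
    }
}
```

The effect is that a burst of backend writes does not invalidate a user's in-flight edit session, since the nonce the client holds stays valid.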
There's a case that could be made to always use `put` also 🤷
The main thing I was wondering is why you'd avoid updating the nonce and update time when something was changed by the backend. I just wasn't sure of the reasoning - we should probably explain it on the method's Javadoc.
It may be a premature optimization, mainly skipping updates to fields that aren't needed when the multiple writes are done by the backend without user input. Looking at all the places where it's used, I'm not sure it's necessary anywhere except in `BundleController.setServiceBundleDates`.
Even there, it could probably be removed.
…vwriter Polymorphic multidimensional regional result writers
Recently much work has been done on reporting information other than accessibility, such as full paths or travel times. This is often done using freeform points (rather than grids) as origins or destinations. This PR will include various additional fixes and cleanup that were agreed upon in reviews of previous PRs on these subjects.
This includes: