Cleanup of paths, freeform, and OD matrix functionality #678

abyrd · 2021-02-09T08:41:57Z

Recently much work has been done on reporting information other than accessibility, such as full paths or travel times. This is often done using freeform points (rather than grids) as origins or destinations. This PR will include various additional fixes and cleanup that were agreed upon in reviews of previous PRs on these subjects.

This includes:

Issue #660 (Improve Taui path scorer), and reuse of that scorer to select a small set of paths to report in single-point or regional analyses, including those with freeform pointsets.
The list of follow-up points on Write path information #650, in this comment: #650 (comment)

abyrd · 2021-02-18T11:17:04Z

src/main/java/com/conveyal/analysis/results/MultiOriginAssembler.java

-                if (job.templateTask.recordTimes && !job.templateTask.oneToOne) {
-                    if (nOriginsTotal * destinationPointSet.featureCount() > 1_000_000) {
+                if ((job.templateTask.recordTimes || job.templateTask.includePathResults) && !job.templateTask.oneToOne) {
+                    if (nOriginsTotal * destinationPointSet.featureCount() > 16_000_000) {


O*D will limit the size of the output CSV, but it won't really limit the number of destinations. In the extreme, 16 origins and 1 million destinations is allowed, which would still produce very large results from the worker (probably all 16 million at once). Maybe we should restrict D in addition to or instead of O*D.

Also, the exception message below does not report the true limit. I'll add a destination limit and some new messages.
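
A minimal sketch of what a combined check might look like, assuming hypothetical names (`OdGuardrailSketch`, `checkSize`) and a placeholder destination limit. Only the 16 million OD pair figure comes from the diff above. The product is widened to `long` so that very large inputs cannot overflow if both counts are `int`, and each message reports the limit that was actually exceeded.

```java
// Hypothetical sketch: names and the destination limit are assumptions, not the PR's actual code.
final class OdGuardrailSketch {
    // 16 million OD pairs matches the limit shown in the diff above.
    static final long MAX_OD_PAIRS = 16_000_000L;
    // Placeholder value for illustration only; the real destination limit is not stated here.
    static final int MAX_DESTINATIONS = 100_000;

    /** Throws if the request is too large, reporting the limit that was actually exceeded. */
    static void checkSize(int nOrigins, int nDestinations) {
        if (nDestinations > MAX_DESTINATIONS) {
            throw new IllegalArgumentException(String.format(
                "Number of destinations (%d) exceeds the limit of %d.",
                nDestinations, MAX_DESTINATIONS));
        }
        // Widen to long before multiplying so large inputs cannot overflow int.
        long nPairs = (long) nOrigins * nDestinations;
        if (nPairs > MAX_OD_PAIRS) {
            throw new IllegalArgumentException(String.format(
                "Number of origin-destination pairs (%d) exceeds the limit of %d.",
                nPairs, MAX_OD_PAIRS));
        }
    }

    public static void main(String[] args) {
        checkSize(1_000, 1_000); // 1 million pairs: accepted
        try {
            checkSize(16, 1_000_000); // passes the O*D check, but has too many destinations
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the placeholder limit, the 16 by 1 million case from the comment above is rejected by the destination check even though it passes the O*D check.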

Even when counts were stored, the methods returned a count of 1 for every point. We now set the counts from CSV fields, or to 1 if no CSV field is present, so we can return the true stored values.

abyrd · 2021-02-18T14:43:12Z

The Simpson Desert tests seem to be failing occasionally, but they pass when re-run. This is probably because they are checking Monte Carlo results against theoretical distributions, and the goodness of fit can change from one run to the next.

New Paths represent specific departure times. The PatternSequence nested in them is similar to the old Paths. So we now accumulate and deduplicate these PatternSequences and write them out as Taui paths.
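
As a rough illustration of the accumulate-and-deduplicate step, here is a sketch that models a pattern sequence as a plain list of pattern IDs. That representation and the names `PatternSequenceDedupSketch` and `indexOf` are assumptions for illustration; the real PatternSequence class carries more information.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a "pattern sequence" is modeled here as a plain list of pattern IDs.
final class PatternSequenceDedupSketch {
    // Maps each distinct sequence to the index it was assigned, in first-seen order.
    private final Map<List<Integer>, Integer> indexForSequence = new LinkedHashMap<>();

    /** Returns the index of the given sequence, assigning a new index if it was not seen before. */
    int indexOf(List<Integer> patternSequence) {
        Integer existing = indexForSequence.get(patternSequence);
        if (existing != null) {
            return existing;
        }
        int index = indexForSequence.size();
        // Store an immutable copy so later changes by the caller cannot corrupt the map key.
        indexForSequence.put(List.copyOf(patternSequence), index);
        return index;
    }

    /** The distinct sequences, in the order their indexes were assigned. */
    List<List<Integer>> distinctSequences() {
        return new ArrayList<>(indexForSequence.keySet());
    }

    public static void main(String[] args) {
        PatternSequenceDedupSketch dedup = new PatternSequenceDedupSketch();
        // Paths found at three departure minutes; the first two ride the same patterns.
        System.out.println(dedup.indexOf(List.of(3, 8)));
        System.out.println(dedup.indexOf(List.of(3, 8)));
        System.out.println(dedup.indexOf(List.of(5)));
        System.out.println(dedup.distinctSequences());
    }
}
```

Paths found at different departure times that ride the same patterns collapse to a single entry, so only the distinct sequences need to be written out.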

This is replicating things that happen in travelTimeReducer.recordPathsForTarget but for the Taui case

abyrd · 2021-02-23T11:12:21Z

Recent changes to Taui Path/PatternSequence have been tested locally on a tiny Taui regional run. Files were generated without exceptions and uploaded to S3. The XY_paths.dat files were downloaded, gunzipped, and examined in a hex editor. They appear to contain valid path information.

Freeform pointset use is becoming mainstream, so we don't want to stall the broker in production by doing potentially slow things in constructors called from the broker's synchronized methods. We make sure any PointSet instances we need are preloaded into the transient fields of the template task on the backend, where they are visible to the Job and the MultiOriginAssembler.

It's easy to get high zero points with the exponential functions, or by simply setting the cutoff to 120 minutes with any decay function.
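
To illustrate how an exponential function ends up with a high zero point, here is a sketch under stated assumptions: the weight halves every `halfLifeMinutes`, and the "zero point" is taken to be the first whole minute at which the weight falls below a small threshold. The class name, the threshold of 0.001, and the helper names are hypothetical; the actual decay functions may define their zero point differently.

```java
// Hypothetical sketch: the threshold and names are assumptions, not the actual decay function code.
final class DecayTruncationSketch {
    static final int MAX_MINUTES = 120;
    // Assumed threshold below which a weight is treated as zero.
    static final double EPSILON = 0.001;

    /** Weight of an opportunity reached after the given travel time, halving every halfLifeMinutes. */
    static double exponentialWeight(double minutes, double halfLifeMinutes) {
        return Math.pow(2.0, -minutes / halfLifeMinutes);
    }

    /** First whole minute at which the weight drops below EPSILON. Requires halfLifeMinutes > 0. */
    static int zeroPointMinutes(double halfLifeMinutes) {
        int minutes = 0;
        while (exponentialWeight(minutes, halfLifeMinutes) >= EPSILON) {
            minutes++;
        }
        return minutes;
    }

    /** Truncate the trip duration at MAX_MINUTES instead of rejecting the request. */
    static int cappedMaxTripDuration(int zeroPointMinutes) {
        return Math.min(zeroPointMinutes, MAX_MINUTES);
    }

    public static void main(String[] args) {
        int zeroPoint = zeroPointMinutes(20); // 200 with a 20 minute half-life
        System.out.println(zeroPoint + " -> " + cappedMaxTripDuration(zeroPoint));
    }
}
```

Under these assumptions a 20 minute half-life does not reach the threshold until minute 200, well past the 120 minute cap, which is why truncating is friendlier than asserting.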

ansoncfit · 2021-02-23T15:27:41Z

src/main/java/com/conveyal/r5/analyst/cluster/AnalysisWorker.java

+            if (maxTripDurationMinutes > 120) {
+                LOG.warn("Distance decay function reached zero above 120 minutes. Capping travel time at 120 minutes.");
+                maxTripDurationMinutes = 120;
+            }


Ensures we only end up with storage keys for files that were actually written, also avoiding duplicating the logic for determining when those writers are created/activated. This also ensures that the "complete" field in the RegionalAnalysis is flushed out to the database.

Also update some comments/logs about CSV and S3

Also updated a few comments.

This ensures all CSV outputs will be visible in the UI, which does not re-fetch regional analyses from the database when they complete. We may decide to change this UI behavior eventually, in which case we could do the database updates when we mark the regional analysis complete.

This factors out a common interface for CSV result writers, reducing the number of ad-hoc conditionals. It also enables writing out CSV for multiple percentiles and cutoffs (times and paths), and even multiple destination grids (for accessibility).
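
The general shape of such a refactoring might look like the sketch below. All names here (`ResultWritersSketch`, `Writer`, `CsvWriter`, `TotalWriter`, `handleResult`) are hypothetical, and the real interface likely differs. The point is only that the assembler loops over a list of writers instead of branching on which outputs are enabled.

```java
import java.util.List;

// Hypothetical names throughout; the PR's actual interface and classes may differ.
final class ResultWritersSketch {

    interface Writer {
        /** Record the values for one origin, indexed as values[percentileIndex][cutoffIndex]. */
        void writeOrigin(int originIndex, int[][] values);
    }

    /** Writes one CSV row per origin, percentile, and cutoff. */
    static final class CsvWriter implements Writer {
        final StringBuilder csv = new StringBuilder("origin,percentile,cutoff,value\n");
        private final int[] percentiles;
        private final int[] cutoffsMinutes;

        CsvWriter(int[] percentiles, int[] cutoffsMinutes) {
            this.percentiles = percentiles;
            this.cutoffsMinutes = cutoffsMinutes;
        }

        @Override
        public void writeOrigin(int originIndex, int[][] values) {
            for (int p = 0; p < percentiles.length; p++) {
                for (int c = 0; c < cutoffsMinutes.length; c++) {
                    csv.append(originIndex).append(',')
                       .append(percentiles[p]).append(',')
                       .append(cutoffsMinutes[c]).append(',')
                       .append(values[p][c]).append('\n');
                }
            }
        }
    }

    /** Stand-in for a non-CSV writer: keeps only a running total of all values. */
    static final class TotalWriter implements Writer {
        long total = 0;

        @Override
        public void writeOrigin(int originIndex, int[][] values) {
            for (int[] row : values) {
                for (int value : row) {
                    total += value;
                }
            }
        }
    }

    /** The assembler side: no conditionals about which outputs were requested. */
    static void handleResult(List<Writer> writers, int originIndex, int[][] values) {
        for (Writer writer : writers) {
            writer.writeOrigin(originIndex, values);
        }
    }

    public static void main(String[] args) {
        CsvWriter csv = new CsvWriter(new int[] {50}, new int[] {30, 60});
        List<Writer> writers = List.of(csv, new TotalWriter());
        handleResult(writers, 7, new int[][] {{10, 25}});
        System.out.print(csv.csv);
    }
}
```

Adding another output type then means adding a writer to the list when the job is set up, rather than adding another branch wherever results are handled.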

We allow accessibility CSV for gridded destinations, which aren't loaded on the backend. We can only rely on their keys being present.

trevorgerhardt · 2021-02-24T14:11:36Z

src/main/java/com/conveyal/analysis/controllers/RegionalAnalysisController.java


        // Register the regional job with the broker, which will distribute individual tasks to workers and track progress.
        broker.enqueueTasksForRegionalJob(regionalAnalysis);

+        // Flush to the database any information added to the RegionalAnalysis object when it was enqueued.
+        // This includes the paths of any CSV files that will be produced by this analysis.
+        // TODO verify whether there is a reason to use regionalAnalyses.modifyWithoutUpdatingLock() or put().


put updates the nonce, createdAt, and updatedAt and should be used in updates from HTTP handlers. modifyWithoutUpdatingLock should be used when updating a model multiple times by the backend. There could be a better name for it, suggestions welcome.

There's a case that could be made to always use put also 🤷

The main thing I was wondering is why you'd avoid updating the nonce and update time when something was changed by the backend. I just wasn't sure of the reasoning - we should probably explain it on the method's Javadoc.

It may be a premature optimization, mainly skipping updates to fields that aren't needed when the multiple writes are done by the backend without user input.

Looking at all the instances it's used, I'm not sure it's necessary anywhere except in BundleController.setServiceBundleDates.

Even there, it could probably be removed
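
For readers unfamiliar with the distinction discussed in this thread, here is a toy sketch of the two update styles. It assumes the nonce acts as an optimistic lock that `put` checks before replacing it; that check, the in-memory map, and the `Model` fields are assumptions for illustration, not the actual persistence code (`createdAt` is omitted for brevity).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of nonce-based updates; not the actual persistence layer.
final class NonceStoreSketch {

    static final class Model {
        final String id;
        String nonce;
        long updatedAt;
        String status;

        Model(String id, String nonce, long updatedAt, String status) {
            this.id = id;
            this.nonce = nonce;
            this.updatedAt = updatedAt;
            this.status = status;
        }

        Model copy() {
            return new Model(id, nonce, updatedAt, status);
        }
    }

    // Stands in for the database collection.
    private final Map<String, Model> database = new HashMap<>();

    Model create(String id, String status) {
        Model model = new Model(id, UUID.randomUUID().toString(), System.currentTimeMillis(), status);
        database.put(id, model.copy());
        return model;
    }

    Model find(String id) {
        return database.get(id).copy();
    }

    /** For updates from HTTP handlers: rejects stale copies (assumed), then refreshes nonce and updatedAt. */
    Model put(Model incoming) {
        Model current = database.get(incoming.id);
        if (current == null) {
            throw new IllegalArgumentException("No such model: " + incoming.id);
        }
        if (!current.nonce.equals(incoming.nonce)) {
            throw new IllegalStateException("Stale nonce; the model was changed by someone else.");
        }
        incoming.nonce = UUID.randomUUID().toString();
        incoming.updatedAt = System.currentTimeMillis();
        database.put(incoming.id, incoming.copy());
        return incoming;
    }

    /** For repeated backend writes: stores the model as-is, leaving nonce and updatedAt untouched. */
    Model modifyWithoutUpdatingLock(Model incoming) {
        database.put(incoming.id, incoming.copy());
        return incoming;
    }

    public static void main(String[] args) {
        NonceStoreSketch store = new NonceStoreSketch();
        Model analysis = store.create("ra1", "queued");
        analysis.status = "running";
        store.modifyWithoutUpdatingLock(analysis); // nonce unchanged
        analysis.status = "complete";
        store.put(analysis); // nonce replaced
        System.out.println(store.find("ra1").status);
    }
}
```

Under this reading, a client holding an older copy can still save after a `modifyWithoutUpdatingLock` call, but not after a `put`, which may be part of the reasoning worth recording in the Javadoc.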

abyrd changed the title ~~Cleanup of freeform and OD matrix functionality~~ Cleanup of paths, freeform, and OD matrix functionality Feb 9, 2021

abyrd force-pushed the paths-times-freeform branch from 126d385 to 596af72 Compare February 12, 2021 11:30

abyrd mentioned this pull request Feb 12, 2021

fix(TravelTimeReducer): null check, fixes #679 #682

Merged

abyrd commented Feb 18, 2021

abyrd and others added 3 commits February 18, 2021 19:25

return opportunity counts instead of 1

4220d6c

Even when counts were stored, the methods returned a count of 1 for every point. We now set the counts from CSV fields, or to 1 if no CSV field is present, so we can return the true stored values.

feat(guardrail): limit path analyses to 16m OD pairs

d388c16

check for situation where no ResultWriters are created

cb0d321

abyrd force-pushed the paths-times-freeform branch from e2a0147 to cb0d321 Compare February 18, 2021 11:25

impose max destinations and max od pairs

4756157

abyrd and others added 5 commits February 22, 2021 23:23

Enable Taui path recording using new PatternSequence

e3dbe30

New Paths represent specific departure times. The PatternSequence nested in them is similar to the old Paths. So we now accumulate and deduplicate these PatternSequences and write them out as Taui paths.

feat(paths): remove .gz extension from stored file

9318586

fix(paths): exclude bucket from storage location

1785b54

Store PatternSequence instead of Path, attaching egress info

d129397

This is replicating things that happen in travelTimeReducer.recordPathsForTarget but for the Taui case

Merge branch 'dev' into paths-times-freeform

e15b4f8

abyrd and others added 3 commits February 23, 2021 23:08

fix(local-downloads): add CSV as FileStorageFormat

1cbc7de

Truncate decay functions at 120 minutes instead of asserting

8957fea

It's easy to get high zero points with the exponential functions, or by simply setting the cutoff to 120 minutes with any decay function.

ansoncfit reviewed Feb 23, 2021

abyrd and others added 7 commits February 23, 2021 23:29

change mime type "text" to "text/plain"

c0883ff

Use correct logger name in ResultWriter

0c76631

Also update some comments/logs about CSV and S3

Always preload freeform destination pointsets, not just for one-to-one.

2c34f15

Also updated a few comments.

simplify parameter check

aacd100

abyrd mentioned this pull request Feb 24, 2021

Polymorphic multidimensional regional result writers #695

Merged

abyrd added 2 commits February 24, 2021 18:15

check pointset keys instead of pointsets

39b7f71

We allow accessibility CSV for gridded destinations, which aren't loaded on the backend. We can only rely on their keys being present.

clean up logging

c2f831a

abyrd added 2 commits February 24, 2021 18:29

remove duplicate derivation of CSV file names

207b8cc

remove unused fields, clean up comments

5633d4b

trevorgerhardt reviewed Feb 24, 2021

abyrd added 3 commits February 24, 2021 22:30

factor out common common interface for CSV and grid writers

acd4cb6

check origin and destination set size before adding writers

2d2fdf8

Merge pull request #695 from conveyal/polymorphic-multidimensional-cs…

bad3aa5

…vwriter Polymorphic multidimensional regional result writers

abyrd marked this pull request as ready for review February 25, 2021 01:31

abyrd enabled auto-merge February 25, 2021 01:32

trevorgerhardt approved these changes Feb 25, 2021

abyrd merged commit c205c44 into dev Feb 25, 2021

abyrd deleted the paths-times-freeform branch February 25, 2021 01:54
