Separate origin/destination subpoints #11

dabreegster · 2022-02-08T11:20:28Z

This splits the --subpoints-path flag into separate --subpoints-origins-path and --subpoints-destinations-path flags. If either one isn't specified, the tool falls back to picking random points instead.

No support for weighted subpoints yet; I'll do that separately. There was some other cleanup to do first, so this PR is already big

(Test fails, because of incorrect handling of rows with 0)

dabreegster

Going to merge and keep going. Next steps:

Do the renaming you proposed
Add weighted subpoint support
Make a separate command for full disaggregation, with a different output format

dabreegster · 2022-02-08T11:26:43Z

README.Rmd

@@ -67,7 +67,8 @@ With reference to the test data in this repo, you can run the `jitter` command l
 ```{bash}
 odjitter --od-csv-path data/od.csv \
  --zones-path data/zones.geojson \
-  --subpoints-path data/road_network.geojson \
+  --subpoints-origins-path data/road_network.geojson \


I only updated this file; I'm having lots of issues regenerating the .md...

The bash commands all rely on an odjitter command existing in my $PATH, but if I'm developing that command, the new copy is in ./target/{release,debug}/odjitter instead. I can temporarily change that, but that's a very clunky workflow. Or is the intention to do cargo install from my local development copy?

I had to Rscript -e 'install.packages("od")' and for readr, and also install pandoc. It very much weirds me out that these dependencies aren't more explicit, get installed in some system-wide place, and aren't pinned to a version. But we've discussed this philosophical difference before, it's fine. :)

When I regenerated the .md, there were a bunch of spurious diffs in the output with whitespace, and it left random output_max50.geojson files without any cleanup

Since there are more changes coming, I'm going to leave the Rmd and md out of sync and sort this out at the end.

Thanks for the heads up and apologies for the clunky nature of it. There should be absolutely minimal dependencies. Plan: I can take care of the docs. Thinking of porting the .Rmd to .qmd which should remove the R dependency totally. Sound like a plan?

Could be worth trying, but doesn't quarto turn around and use R or something else underneath for all the interactive script running? Why don't we just write a regular .md without any magic at all? Embedding command output in the README is an unusual practice for crates.io. Maybe the README could stay really narrowly focused, and the interactive walk-through style guide is better suited for a vignette of a future R package that interfaces with odjitter?

Could be worth trying, but doesn't quarto turn around and use R or something else underneath for all the interactive script running? Why don't we just write a regular .md without any magic at all? Embedding command output in the README is an unusual practice for crates.io. Maybe the README could stay really narrowly focused, and the interactive walk-through style guide is better suited for a vignette of a future R package that interfaces with odjitter?

I don't think so. Quarto does not depend on R (only if you try to compile to .pdf it by default tries to install a minimal LaTeX installation via R I think). Open minded and happy to look at reducing the length of the README although currently it provides context and usage about the odjitter crate not the (currently non existent) R/Python interfaces.

dabreegster · 2022-02-08T11:27:44Z

src/lib.rs

-                // TODO Crashes on NaNs, infinity
-                Value::Number(serde_json::Number::from_f64(x / repeat).unwrap())
+                // Scale all of the numeric values, unless all_key for this row is 0
+                let scaled = if repeat == 0.0 { x } else { x / repeat };


Here's part of the problem we discussed yesterday. We only transform numeric values if all_key / max_per_od is non-zero now.

Sounds reasonable. In my mind repeat would be 1 when the threshold_key is 0 but this makes that explicit (x / 1 equals x).

Looking at other commits and comments, maybe repeat could be set as 1 or more on creation (not zero). But that does matter so ignore these comments!

6f42425
You're right, setting to 1 is simpler. Thanks!

Awesome! Feel I'm taking baby steps in understanding Rust code that will make me a better programmer all round. Many thanks for quick implementation and glad the comment led to a positive change in the code base 🙏

dabreegster · 2022-02-08T11:28:53Z

src/scrape.rs

+    Ok(points)
+}
+
+fn geometry_to_points(geom: geo_types::Geometry<f64>) -> Vec<Point<f64>> {


Michael clued me in I can probably use https://docs.rs/geo/latest/geo/algorithm/coords_iter/trait.CoordsIter.html, I'll try this separately

dabreegster · 2022-02-08T11:29:32Z

src/tests.rs

+    }
+}
+
+#[test]


The unit test code is very verbose and awkward; I'm not happy with it at all, will keep iterating.

dabreegster · 2022-02-08T11:31:55Z

Oops, forgot to associate this with #7

dabreegster added 8 commits February 3, 2022 09:55

Add a disabled test in preparation for supporting weighted subpoints

cff706e

Specify different subsample sources for origins/destinations

08c9cb6

Scrape points from all GeoJSON types

d0e76cd

Split scraping code into a module

ee7a5f4

Handle cases when repeat = 0

b13d32b

Test that sums match for a few modes.

63c2f8d

(Test fails, because of incorrect handling of rows with 0)

Handle repeat = 0 properly, without discarding the counts

bf10e1c

Update just the Rmd to reflect new CLI flags, and fix a clippy warning

3bb0d16

dabreegster commented Feb 8, 2022

View reviewed changes

dabreegster merged commit 14756dd into main Feb 8, 2022

dabreegster deleted the weighted_subp branch February 8, 2022 11:31

Robinlovelace mentioned this pull request Feb 8, 2022

Update README #12

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate origin/destination subpoints #11

Separate origin/destination subpoints #11

dabreegster commented Feb 8, 2022

dabreegster left a comment

dabreegster Feb 8, 2022

Robinlovelace Feb 8, 2022

dabreegster Feb 8, 2022

Robinlovelace Feb 8, 2022

dabreegster Feb 8, 2022

Robinlovelace Feb 8, 2022

Robinlovelace Feb 8, 2022

dabreegster Feb 8, 2022

Robinlovelace Feb 8, 2022

dabreegster Feb 8, 2022

dabreegster Feb 8, 2022

dabreegster commented Feb 8, 2022

+                  }
+              }
+              #[test]

Separate origin/destination subpoints #11

Separate origin/destination subpoints #11

Conversation

dabreegster commented Feb 8, 2022

dabreegster left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dabreegster commented Feb 8, 2022