Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot upload geojsons with more than ~15k desire lines via web API #51

Closed
Robinlovelace opened this issue Feb 25, 2023 · 20 comments
Closed

Comments

@Robinlovelace
Copy link
Collaborator

Reproducible example: hope to add one soon. In the meantime any ideas @mvl22? We're using POST requests for this.

@Robinlovelace
Copy link
Collaborator Author

Reprex:

library(tidyverse)
library(cyclestreets)
# devtools::load_all()
# od = readRDS("~/nptscot/npt/inputdata/od_commute_jittered.Rds")
od_raw = pct::get_od()
#> No region provided. Returning national OD data.
#> Rows: 2402201 Columns: 14
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (2): Area of residence, Area of workplace
#> dbl (12): All categories: Method of travel to work, Work mainly at or from h...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 7201 Columns: 6
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): MSOA11CD, MSOA11NM
#> dbl (4): BNGEAST, BNGNORTH, LONGITUDE, LATITUDE
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
lsoas = pct::get_pct(layer = "z", national = TRUE, geography = "msoa")
#> Loading required package: sp
#> Warning in CPL_crs_from_input(x): GDAL Message 1: +init=epsg:XXXX syntax is
#> deprecated. It might return a CRS with a non-EPSG compliant axis order.
od = od_raw %>%
  slice(seq(20000))
summary(od$geo_code1 %in% lsoas)
#>    Mode   FALSE 
#> logical   20000
od = od::od_to_sf(x = od, z = lsoas)
#> 0 origins with no match in zone ids
#> 184 destinations with no match in zone ids
#>  points not in od data removed.
nrow(od) # 19k
#> [1] 19816
od_100 = od %>%
  slice(seq(100))
od_10k = od %>%
  slice(seq(10000))
od_15k = od %>%
  slice(seq(15000))


batch(desire_lines = od_100, wait = FALSE, silent = FALSE, username = "robinlovelace")
#> POSTing the request to create and start the job
#> Posting to: https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx
#> Job id: 3886
#> Sending data, wait...
#> Routing not complete
#> Routing job sent, check back in around 1 minutes if you've just sent this
#> Check at www.cyclestreets.net/journey/batch/ for route id: 3886
#> [1] 3886
batch(desire_lines = od_10k, wait = FALSE, silent = FALSE, username = "robinlovelace")
#> POSTing the request to create and start the job
#> Posting to: https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx
#> Job id: 3887
#> Sending data, wait...
#> Routing not complete
#> Routing job sent, check back in around 6 minutes if you've just sent this
#> Check at www.cyclestreets.net/journey/batch/ for route id: 3887
#> [1] 3887
batch(desire_lines = od_15k, wait = FALSE, silent = FALSE, username = "robinlovelace")
#> POSTing the request to create and start the job
#> Posting to: https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx
#> Job id:
#> Error in batch(desire_lines = od_15k, wait = FALSE, silent = FALSE, username = "robinlovelace"): Check your credentials, try again, and maybe contact CycleStreets

Created on 2023-02-25 with reprex v2.0.2

atumscott added a commit to nptscot/cyclestreets-r that referenced this issue Feb 25, 2023
@Robinlovelace
Copy link
Collaborator Author

It seems that the POST request silently fails:

  res = httr::POST(url = batch_url, body = body, httr::timeout(60))

@mvl22
Copy link
Member

mvl22 commented Feb 25, 2023

Could you run this and let us know an exact time it was run so we can trace this in the logs?

I suspect the API is hitting a POST size limit, which is easy to fix. The original batch API was designed for smaller amounts of data, but since the addition of the A-B mode I guess this will probably be hitting a limit.

If it's possible to supply a data file of what is posted that would help.

PS I recommend adding error handling logging in your code so that res tells you back what HTTP status code or error message you are getting.

@Robinlovelace
Copy link
Collaborator Author

Sure. Here is the error I'm seeing, seems it's CycleStreets side:

    "error": "The number of route calculations that would be required (112,492,500) is...

@Robinlovelace
Copy link
Collaborator Author

Robinlovelace commented Feb 25, 2023

Full details in debug mode. I'll aim to send a file with the body of the request sent:

Browse[2]>   res = httr::POST(url = batch_url, body = body, httr::timeout(60))
Browse[2]> res
Response [https://api.cyclestreets.net/v2/batchroutes.createjob?key=xxx]
  Date: 2023-02-25 14:56
  Status: 200
  Content-Type: application/json; charset=UTF-8
  Size: 241 B
{
    "error": "The number of route calculations that would be required (112,492,500) is...

@Robinlovelace
Copy link
Collaborator Author

Heads-up @mvl22 I have emailed you a link to the JSON uploaded. It seems that the checking mechanism is calculating the number of route calculations incorrectly on your side.

@mvl22
Copy link
Member

mvl22 commented Feb 25, 2023

Ah, it thinks you are sending 112 million calculations, which looks to me like 15k squared in one direction only...

You are intending 15k pre-defined lines, yes?

@Robinlovelace
Copy link
Collaborator Author

Yes, seems it does not realise that these are OD pairs, not points for which every combo is needed. Should be a quick fix.

@Robinlovelace
Copy link
Collaborator Author

Robinlovelace commented Feb 25, 2023

You are intending 15k pre-defined lines, yes?

Correct. Same data structure and POST request works for 10k lines as shown in the reproducible example above.

@Robinlovelace
Copy link
Collaborator Author

In the meantime, I'll try to pull out and print the error message if there is one. Good shout on that Martin, would have made the root cause of this quicker to identify.

@mvl22
Copy link
Member

mvl22 commented Feb 25, 2023

Could you also e-mail the 10k lines version?

@Robinlovelace
Copy link
Collaborator Author

Sure.

@mvl22
Copy link
Member

mvl22 commented Feb 25, 2023

Thanks.

Sorry you've ran into this problem. We did some server upgrades last week. It's possible that a deployment default that we were silently relying on has changed.

@Robinlovelace
Copy link
Collaborator Author

@mvl22
Copy link
Member

mvl22 commented Feb 25, 2023

See the GeoJSONs here: https://github.com/cyclestreets/cyclestreets-r/releases/tag/v0.6.0

The 15k example at least seems to be very long distance lines, e.g. from London to Newcastle. Just checking these are the right files? That seems to be different to the gist you sent.

Please ensure that the specific dataset you give is actually failing. Otherwise I don't have a reliable reproduce-case.

@mvl22
Copy link
Member

mvl22 commented Feb 25, 2023

I've confirmed we are not seeing POST size maxouts in the logs, i.e. the sent data is being correctly received, as indeed the error you are getting implies.

So there is something about the data or the processing code that is making the system this is a set of points rather than linestrings and therefore triggering combinations mode.

@Robinlovelace
Copy link
Collaborator Author

See the GeoJSONs here: https://github.com/cyclestreets/cyclestreets-r/releases/tag/v0.6.0

The 15k example at least seems to be very long distance lines, e.g. from London to Newcastle. Just checking these are the right files? That seems to be different to the gist you sent.

Please ensure that the specific dataset you give is actually failing. Otherwise I don't have a reliable reproduce-case.

That was just an example dataset to show the error. Happy to provide a more realistic one.

@Robinlovelace
Copy link
Collaborator Author

@mvl22
Copy link
Member

mvl22 commented Feb 26, 2023

This should now be working.

I found a bug, whereby the number of calculations in LineString mode was not done correctly. The 10k file was doing the incorrect calculation but was not triggering the calculation limit. This is now fixed.

@Robinlovelace
Copy link
Collaborator Author

Great to hear @mvl22, thanks for quickly identifying and fixing this issue your side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants