Different results when making multiple calls with nearest_pois #73

Closed
stefancoe opened this issue Apr 26, 2017 · 19 comments

@stefancoe

stefancoe commented Apr 26, 2017

Description of the bug

I am assigning network node_ids to a parcel data set and then running nearest_pois to find all of the parcels within a mile of a set of coordinates. I want to be able to re-run these queries without re-creating the network. However, if I query one set of coordinates, then a different set whose buffer intersects the first, and then the first set again, the first and third queries return a different number of parcels. Please see the code below, as it might be easier to understand.

I am pretty sure that this is not a copy-versus-reference problem in my code, because I use df.copy() to copy the original parcel DataFrame in the get_nearest_parcels function before any modifications are made to it. The reason I do not want to recreate the network after each query is to save time: I am prototyping a web service that uses pandana to generate GeoJSON isochrones from a passed-in set of coordinates. Ideally the network is loaded once and the app listens for coordinates to run the query. I noticed this behavior because the app started generating weird polygons when making multiple calls from the same area. I have added the network files below.

UPDATE: The parcel DataFrame has nothing to do with the error so I have updated the code for simplicity.

Thanks,

Stefan

Network data (optional)

https://file.ac/8UQg8JUbhvs/

Environment

  • Operating system:
    Windows

  • Python version:
    2.7

  • Pandana version:
    0.2.0

  • Pandana required packages versions (optional):

Paste the code that reproduces the issue here:

import pandana as pdna
import pandas as pd
import numpy as np

nodes_file_name = r'D:\Stefan\Isochrone\repository\data\all_streets_nodes_2014.csv'
links_file_name = r'D:\Stefan\Isochrone\repository\data\all_streets_links_2014.csv'

max_dist = 5280 

# nodes must be indexed by node_id column, which is the first column
nodes = pd.read_csv(nodes_file_name, index_col=0)
links = pd.read_csv(links_file_name)

# get rid of circular links
links = links.loc[links.from_node_id != links.to_node_id]

# assign impedance
imp = pd.DataFrame(links.Shape_Length)
imp = imp.rename(columns = {'Shape_Length':'distance'})

# create pandana network
network = pdna.network.Network(nodes.x, nodes.y, links.from_node_id, links.to_node_id, imp)

network.init_pois(1, max_dist, 1)

def get_nearest_nodes(x, y):
    print('Getting data')
    x = pd.Series(x)
    y = pd.Series(y)
    # Set as a point of interest on the pandana network
    network.set_pois('tstop', x, y)
    # Find distance to point from all nodes; everything over max_dist gets a value of 99999
    res = network.nearest_pois(max_dist, 'tstop', num_pois=1, max_distance=99999)
    # Mask the 99999 sentinel (those cells become NaN) and convert to miles
    res = (res[res != 99999] / 5280.).astype(res.dtypes)
    return res


#### ****Re-Run above code for each test**** #####

##### TEST 1 #####
# Same coordinates, same result
test1 = get_nearest_nodes(1272010.0, 228327.0)
print(len(test1[(test1[1] > 0)]))
test2 = get_nearest_nodes(1272010.0, 228327.0)
print(len(test2[(test2[1] > 0)]))

##### TEST 2: this shows the potential bug #####
# The first and third coordinates are the same but give different results.
# The second set of coordinates is within the buffer distance of the first/third.
test1 = get_nearest_nodes(1272010.0, 228327.0)
print(len(test1[(test1[1] > 0)]))
test2 = get_nearest_nodes(1268830.0, 228417.0)
print(len(test2[(test2[1] > 0)]))
# Same coords as the first call, but yields different results
test3 = get_nearest_nodes(1272010.0, 228327.0)
print(len(test3[(test3[1] > 0)]))

##### TEST 3 #####
# The first and third coordinates are the same and give the same result.
# The second set of coordinates is outside the buffer distance of the first/third.
test1 = get_nearest_nodes(1272010.0, 228327.0)
print(len(test1[(test1[1] > 0)]))
# These coords are outside the buffered area of the first call.
test2 = get_nearest_nodes(1264180.0, 193485.0)
print(len(test2[(test2[1] > 0)]))
# Same coords and same results as the first call.
test3 = get_nearest_nodes(1272010.0, 228327.0)
print(len(test3[(test3[1] > 0)]))

@fscottfoti
Contributor

@sablanchard have you guys been able to reproduce this yet?

@federicofernandez
Contributor

federicofernandez commented May 2, 2017 via email

@stefancoe
Author

I realized I was not using the most current pandana version so I upgraded. However, all of the queries now return a dataframe where none of the nodes are within the distance limit. So the print statements in the tests are all printing 0. I am having the same exact issue with an UrbanAccess network:

UDST/urbanaccess#24

Thanks!

@stefancoe
Author

Update: the set_pois method now seems to work only with x, y data that have more than one record. So the following, which passes the same point twice, returns a result:

coords_dict = [{'x' : -122.355 , 'y' : 47.689, 'var' : 1}, {'x' : -122.355 , 'y' : 47.689, 'var' : 1}]
df = pd.DataFrame(coords_dict)
net.init_pois(1, dist, 1)
net.set_pois('map_loc', df.x, df.y)  

I have also confirmed that the issue I first reported on this thread is still happening in the latest version. I will post the revised code to do the tests later today.

fscottfoti added a commit that referenced this issue May 10, 2017
@fscottfoti
Contributor

@stefancoe I added some code to test this using the test data here

everything seems to be working ok as best as I can tell.

(fyi adding this test makes other tests fail, but not related to this issue)

@pksohn
Contributor

pksohn commented May 10, 2017

Great call @fscottfoti. I can fix the sort issue today unless you're already working on it.

@fscottfoti
Contributor

fscottfoti commented May 10, 2017

I was actually confused by that test failure - why isn't master failing because of it?

The other test failure is because I added the test so now the network is initialized while the other test was expecting it to not be initialized yet. I can probably fix that one.

@pksohn
Contributor

pksohn commented May 10, 2017

I think only because the new pandas release made it to Conda in the last couple of days, and last build of master was before then: https://travis-ci.org/UDST/pandana/builds.

@stefancoe
Author

Thanks @fscottfoti! For my own sanity, can you run this code below? No files needed. From the print statements, I get 972, 693, and 678. Thanks!

import pandana as pdna
from pandana.loaders import osm
import pandas as pd

network = osm.pdna_network_from_bbox(37.859, -122.282, 37.881, -122.252)
dist = 1000 
network.init_pois(1, dist, 1)
def get_nearest_nodes(x, y):
    # The same point is listed twice because set_pois currently needs more than one record
    coords_dict = [{'x': x, 'y': y, 'var': 1}, {'x': x, 'y': y, 'var': 1}]
    df = pd.DataFrame(coords_dict)
    network.set_pois('map_loc', df['x'], df['y'])
    # Nodes with no POI within dist get the max_distance value (99999)
    res = network.nearest_pois(dist, 'map_loc', num_pois=1, max_distance=99999)
    return res

##### TEST 2- this shows the potential bug #####
# First and Third coordinates are the same but give different result. 
# The second set of coordinates are within the buffer distance of the first/third. 
test1 = get_nearest_nodes(-122.262634, 37.877165)
print(len(test1[(test1[1] < 99999)]))
test2 = get_nearest_nodes(-122.254116, 37.869361)
print(len(test2[(test2[1] < 99999)]))
# Same coords as the first call, but yields different results
test3 = get_nearest_nodes(-122.262634, 37.877165)
print(len(test3[(test3[1] < 99999)]))

@fscottfoti
Contributor

weird - I wonder if it's related to #43

when I ask for 10 pois rather than 1 the problem goes away

so you're trying to get an isochrone to this one point in your web service and it's not working in that use case?

@stefancoe
Author

stefancoe commented May 11, 2017

Yeah, if a user were to make multiple calls in the same area, the results after the first call may not be correct. I have a workaround (similar to what you found) that should work: call init_pois with many categories and then use each category only once. The code below produces stable results across calls, so it should be fine as long as nearest_pois is called only once per POI category.

import pandana as pdna
from pandana.loaders import osm
import pandas as pd

network = osm.pdna_network_from_bbox(37.859, -122.282, 37.881, -122.252)
dist = 1000 
num_poi = 10
network.init_pois(num_poi, dist, 1)
def get_nearest_nodes(x, y, poi_name):
    # The same point is listed twice because set_pois currently needs more than one record
    coords_dict = [{'x': x, 'y': y, 'var': 1}, {'x': x, 'y': y, 'var': 1}]
    df = pd.DataFrame(coords_dict)
    network.set_pois(poi_name, df['x'], df['y'])
    # Nodes with no POI within dist get the max_distance value (99999)
    res = network.nearest_pois(dist, poi_name, num_pois=1, max_distance=99999)
    return res

##### TEST 2, repeated with a separate POI category per iteration #####
# The first and third coordinates are the same.
# The second set of coordinates is within the buffer distance of the first/third.
for x in range(1, num_poi + 1):
    test1 = get_nearest_nodes(-122.262634, 37.877165, 'poi_' + str(x))
    print(len(test1[(test1[1] < 99999)]))
    test2 = get_nearest_nodes(-122.254116, 37.869361, 'poi_' + str(x))
    print(len(test2[(test2[1] < 99999)]))
    # Same coords as the first call; the results are stable with this setup
    test3 = get_nearest_nodes(-122.262634, 37.877165, 'poi_' + str(x))
    print(len(test3[(test3[1] < 99999)]))
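
For a long-running web service, the "one query per category" workaround could be wrapped in a small helper that hands out an unused category name for each request. This is only a sketch: PoiCategoryAllocator and next_category are hypothetical names, not part of pandana, and the sketch assumes num_categories matches the number of categories passed as the first argument to init_pois.

```python
import itertools


class PoiCategoryAllocator:
    """Hands out a fresh POI category name per query (hypothetical helper, not pandana API)."""

    def __init__(self, num_categories, prefix="poi_"):
        # Assumption: num_categories equals the category count given to init_pois.
        self.num_categories = num_categories
        self.prefix = prefix
        self._counter = itertools.count(1)

    def next_category(self):
        n = next(self._counter)
        if n > self.num_categories:
            # Every category has been used once; the caller has to decide what to do next.
            raise RuntimeError("no unused POI categories left")
        return "%s%d" % (self.prefix, n)


allocator = PoiCategoryAllocator(num_categories=3)
names = [allocator.next_category() for _ in range(3)]
print(names)  # ['poi_1', 'poi_2', 'poi_3']
```

Each request would then pass allocator.next_category() as the poi_name argument of get_nearest_nodes. The obvious limit is that the service can answer only as many queries as there are categories before the allocator raises.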

@fscottfoti
Contributor

OK I've looked into it a bit. Turns out @federicofernandez is actually looking at the same thing in #80. I read the underlying code wrong and set_pois actually adds pois rather than re-initializing the memory and starting with new pois. So test 2 adds a new poi and then test 3 adds another new poi (in the same location as the first) and so you get different results on successive calls. @federicofernandez was just looking into the initialization of these data structures so I'll continue this on issue #80 to see if it's possible to re-initialize a category from scratch. In short, for now, the solution you came up with (have lots of categories) is likely the only solution that will work.

Also, the fact that you needed two pois in the coords_dict list is a bug introduced in the latest release, but a considerably easier one to fix than the other so I'll get that out in the next couple of days.
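
To make the append-versus-replace explanation above concrete, here is a toy model. It is not pandana's implementation: ToyPoiIndex, its append flag, and the 1-D node positions are all invented for illustration, and it does not try to reproduce the exact counts reported in this thread. It only shows how a set_pois that appends to a category makes the same query return different counts on successive calls, while one that replaces the category does not.

```python
class ToyPoiIndex:
    """Toy stand-in for a POI category store (illustration only, not pandana)."""

    def __init__(self, node_positions, append=True):
        self.node_positions = list(node_positions)
        self.append = append
        self.pois = {}

    def set_pois(self, category, positions):
        if self.append:
            # Appending: earlier POIs in this category stay in place.
            self.pois.setdefault(category, []).extend(positions)
        else:
            # Replacing: the category starts from scratch on every call.
            self.pois[category] = list(positions)

    def nodes_within(self, category, max_dist):
        pois = self.pois.get(category, [])
        return [n for n in self.node_positions
                if any(abs(n - p) <= max_dist for p in pois)]


def run_three_queries(index):
    # Query point A, then a nearby point B, then A again.
    counts = []
    for poi in (10, 14, 10):
        index.set_pois("map_loc", [poi])
        counts.append(len(index.nodes_within("map_loc", 3)))
    return counts


nodes = range(0, 21)
appending = run_three_queries(ToyPoiIndex(nodes, append=True))
replacing = run_three_queries(ToyPoiIndex(nodes, append=False))
print(appending)  # [7, 11, 11]
print(replacing)  # [7, 7, 7]
```

In the appending case the third query still sees the POI left behind by the second one, so its count differs from the first query even though the coordinates are identical.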

This was referenced May 12, 2017
@fscottfoti
Contributor

@stefancoe I think I have this all solved for you in #81 . Do you have the means to build from source, or do you use Windows (which I think is harder to build from source)? I'm hoping the former so you can test it out before we cut a build.

@stefancoe
Author

@fscottfoti I am using Windows and have a virtual env set up that should work. Should I pull the code and run setup.py?

@fscottfoti
Contributor

I haven't a clue how to compile on Windows - does anyone else know if this works? @sablanchard @pksohn

@federicofernandez
Contributor

I did it a couple of times. Those are the correct instructions; the only detail is that you need the version of Visual C++ for Python that matches the Python version @stefancoe has. Let me know if I can help with that.
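
One quick way to check which compiler a given Python was built with is the standard library's platform module. This is just a diagnostic sketch; the output depends on the interpreter build. On the official Windows builds of Python 2.7 it reports MSC v.1500, which corresponds to Visual C++ 2008.

```python
import platform
import sys

# Python major/minor version of the interpreter that will run setup.py
version = sys.version_info[:2]
# Compiler the interpreter itself was built with, e.g. an "MSC v.NNNN ..." string on Windows
compiler = platform.python_compiler()

print(version)
print(compiler)
```

Extension modules generally need to be built with the compiler version matching the one reported here.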

@stefancoe
Author

I was at a conference all week, I'll try to test this by early next week.

fscottfoti mentioned this issue May 24, 2017

@stefancoe
Author

I cloned this branch https://github.com/UDST/pandana/tree/issue-73 but setup.py failed because it could not find vcvarsall.bat, the batch script installed with Visual Studio that sets up the C compiler environment. I tried all the usual fixes (e.g. https://stackoverflow.com/questions/2667069/cannot-find-vcvarsall-bat-when-running-a-python-script) that have worked in the past, but no luck this time. Sorry I could not be of more help testing this on Windows.

@fscottfoti
Contributor

Fixed in #87, which will be released soon
