Unable to close network in memory #52

Closed
pksohn opened this issue Jun 1, 2016 · 7 comments
@pksohn
Contributor

pksohn commented Jun 1, 2016

There doesn't seem to be a good way to "close" a pandana network instance in memory. I would like to get the following steps working:

  • Load a pandana network from its precomputed saved state as an HDF5 file (i.e. using from_hdf5)
  • Do some things with it (join data to it, run aggregation query, return data)
  • Close the network so I can repeat these steps

Since I can't seem to "close" the pandana network loaded in memory (the last step above), repeating these steps fails with: AssertionError: Adding more networks than have been reserved. I'm aware that pandana can be configured to hold more than one network in memory, but I'd like to avoid that solution.
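For concreteness, here is a minimal sketch of the loop I'd like to run (the file names and the `load_jobs_for` helper are hypothetical; the exact spelling of the `from_hdf5` loader may differ by version):

```python
import pandana.network as pdna

for path in ["network_a.h5", "network_b.h5"]:        # hypothetical input files
    net = pdna.Network.from_hdf5(path)               # step 1: load precomputed state
    jobs = load_jobs_for(path)                       # hypothetical: Series indexed by node id
    net.set(jobs.index, variable=jobs, name="jobs")  # step 2: join data to the network
    result = net.aggregate(3000, type="sum",
                           decay="linear", name="jobs")  # step 2: aggregation query
    # step 3: there is no close()/release for `net`, so the second pass
    # through this loop raises:
    #   AssertionError: Adding more networks than have been reserved
```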

@jiffyclub
Member

This would be really nice. Unfortunately this limitation is pretty deep in the contraction hierarchies code that underlies Pandana.

@waddell
Member

waddell commented Jun 1, 2016

That's what I thought might be the case. We'll have to get Juan or Fede to look into this.

@fscottfoti
Contributor

That's a good question. Unfortunately there is no quick solution. Pandana isn't really designed to be run in that mode, and it's a limitation I've known about for a while. Technically the limitation is in the nearest-neighbor part of Pandana (the ANN library, which I did not write), but because it was there I set Pandana up the same way. Basically it expects you to load your networks once and then run a bunch of queries against them, rather than reprocessing networks over and over.

Now that scipy has a nearest-neighbor implementation (added about 18 months ago, I think), this limitation could be removed by using scipy's NN instead. It would take a bit of coding down in the C internals to remove the limitation in Pandana, but it's possible.

The global memory in question is here: https://github.com/UDST/pandana/blob/master/src/pyaccesswrap.cpp#L8. There's just one pointer that needs to be moved inside the object that gets passed to Python. (And then the nearest-neighbor library needs to be removed and scipy used instead.)

You have a couple of options for workarounds. The most likely to work is to start a new process every time you need a new network. This would be pretty easy to write as a script: you could run a script that does exactly what you describe, and because each run is a totally separate process, the networks can be initialized fresh every time. The argument against this is that spawning a process per request is expensive, but reading the file from HDF5 is going to be slow anyway, so I think this will be a non-issue for your use case.

The only tricky part will be communicating between your code and the separate process/script. That script could read in a file and write out a file as you describe, or communicate over stdin and stdout; I would look at os.popen and similar functions for that. But in general the short-term solution, I think, is to start a new process for each request.
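A minimal sketch of that per-process workaround (using subprocess, which supersedes os.popen; worker.py and the file names are made up):

```python
# driver.py -- launches a fresh interpreter per network, so each run starts
# with clean global state in Pandana's C extension.
import subprocess
import sys

for network_file in ["network_a.h5", "network_b.h5"]:   # hypothetical inputs
    result_file = network_file.replace(".h5", "_result.csv")
    subprocess.run([sys.executable, "worker.py", network_file, result_file],
                   check=True)
```

```python
# worker.py -- loads exactly one network, runs the queries, writes the result,
# and exits; process teardown releases the reserved network slot.
import sys
import pandana.network as pdna

network_file, result_file = sys.argv[1:3]
net = pdna.Network.from_hdf5(network_file)
# ... net.set(...) and net.aggregate(...) as in the original post ...
# results.to_csv(result_file)
```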

The long-term solution would be to remove the few uses of global memory, which is not a huge deal but would take some time to do right.

@federicofernandez
Contributor

I have two comments / questions about this issue.

On one hand, I have been analyzing whether libANN is actually needed in the code. From what I can see, it isn't directly used in the C++ CH code; it's only used for generating the POI categories, and that part could be extracted and implemented on the Python side. Following that reasoning, it shouldn't be too difficult to replace the KD-tree with the scipy version (see the sketch at the end of this comment).

On the other hand, I haven't yet found what actually prevents closing an ANN KD-tree. The annClose() method can't be called because it frees global memory, so it makes sense that this line [1] is commented out. However, there are functions to delete the memory allocated by annAllocPts and the like. Perhaps I'm missing something, since I've only looked at the code for a couple of hours.

[1] https://github.com/UDST/pandana/blob/master/src/nearestneighbor.cpp#L21
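For what it's worth, here is roughly what the replacement could look like on the Python side with scipy.spatial.cKDTree (the coordinates below are random placeholders):

```python
import numpy as np
from scipy.spatial import cKDTree

node_xy = np.random.rand(10_000, 2)  # hypothetical network node coordinates
poi_xy = np.random.rand(200, 2)      # hypothetical POI locations for one category

tree = cKDTree(poi_xy)
# Distances to, and indices of, the 3 nearest POIs for every node.
dists, idx = tree.query(node_xy, k=3)

# Unlike the libANN globals, the tree is an ordinary Python object:
# dropping the reference frees it, so it can be rebuilt per network/category.
del tree
```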

@fscottfoti
Contributor

I think this is all correct.

I think there might be a way to free up parts of the kdtree and reallocate, but I don't think I ever got that far.

I think replacing libANN with scipy (you're absolutely correct that libANN and the accessibility code are separable) is a very promising path. You might want to do a benchmark first, though, because I remember libANN being a bit slower (though maybe not enough to matter).
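The scipy half of that benchmark is easy to sketch (the problem sizes below are arbitrary; the libANN numbers would have to come from an equivalent query through the existing extension):

```python
import time
import numpy as np
from scipy.spatial import cKDTree

nodes = np.random.rand(500_000, 2)   # arbitrary problem sizes
pois = np.random.rand(10_000, 2)

t0 = time.perf_counter()
tree = cKDTree(pois)
print(f"build: {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
tree.query(nodes, k=5)
print(f"query: {time.perf_counter() - t0:.3f}s")
```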

@federicofernandez
Contributor

Thanks Fletcher! I'll do the benchmarking and then decide what to do. However, I suspect that going with scipy might be the best idea.

@fscottfoti fscottfoti mentioned this issue May 24, 2017
@fscottfoti
Contributor

Fixed in #87
