Excessive number of Overpass API requests for addr:postcode=1533. FIX REQUIRED 🔧⚡️ #1
Comments
Hi mmd. I did look into this a bit when I stumbled upon it in late September, but I thought it might have been just intermittent as there was silence afterwards, and I ran out of time. Sorry if AWSbot interferes with Overpass, that is surely not intended! Do you still see hits of that magnitude? The AWSbot scripts don't seem to be very well maintained (or mature). I haven't been involved directly, but I have weighed in on different issues regarding the import's procedure, documentation and validation (1, 2), which were discussed on the talk-dk mailing list last year. I don't know the details of the history, but according to a post in one of the threads mentioned, Stephen Møller is now in charge of running the import to OSM as the OSM user AWSbot, which was originally done by Peter Brodersen. As he mentions himself, the user also got blocked by the DWG on two occasions that summer because of edits that were completely off the map. As I understood the procedure then from some other posts with further details (and from what I saw when the scripts became available), it is a fully manual process that he triggers from a website when time permits ("Q: Hvor ofte sker det? A: Nå jeg har tid." = "Q: How often does it happen? A: When I have time."). Studying the non-output scripts a bit, however, reveals that they use argv (1,2) intermixed with HTML output, which suggests that they do run on the command line. But no details about the context in which they execute were disclosed, so it's hard to say for sure whether they are being launched periodically by a cron job. I didn't pursue improving the setup any further back then, as it already felt a bit like tilting at windmills and the most serious problems seemed to be fixed. I have found contact info for Stephen and I'll ping him now, and probably give him a call in the next few days, so we can find out whether his setup does trigger this excessive load on Overpass. Hopefully we can then coordinate a quick remedy, assuming he is the source. Regards,
I have had an email conversation with Stephen who, immediately on becoming aware of the problem this morning, brought down Apache completely on the machine where he is running AWSbot on behalf of the OSM-DK community. He did this to mitigate any risk of others triggering the problem without his knowledge (I can't see how that could be possible using the scripts here on GitHub, though). I've also posted a plea on the Danish talk-dk mailing list for people to stop using the scripts for the time being. Have you seen a decrease in the load after these attempted remedies? Stephen notes that he hasn't performed any imports (it is not a very automated process) since summer, which coincides with the activity of the AWSbot user. I have asked him whether he can confirm that no PHP processes are running the scripts from the command line on the machine (which seems to be the intended use). I reckon it would be due diligence to actually determine whether the requests did originate from Stephen's machine or whether others are involved. How can we do that? Are you in a position to disclose the IP or IPs that misbehaved?
Apart from AWSbot I have seen two users work with Danish addresses: https://www.openstreetmap.org/user/J%C3%B8rn-osm, who has done considerable work "cleaning up" addresses (removing duplicates, moving misplaced addresses back, etc.), and a newbie who used the AWSbot scripts for a one-shot job importing addresses for a new residential area (and expressed surprise at finding a "goto" in the scripts).
Thanks for looking into this. Checking yesterday's log files on overpass-api.de, there are 13025 requests for the 1533 postcode, all originating from a US-CA IPv6 address ending in "::2" (sorry, I can't post more details for obvious privacy reasons). Requests from this static address seem to have started some time around March/April 2017. The requests provide neither a referrer nor a user agent. I'll recheck the figures in the next couple of days...
Right, it seems like the requests originate from someone else running those scripts. If there's no way to figure out more details, there's still the last resort of just blocking that IP address.
@mmd-osm I've also had conversations about that IP, in private and in public, with the SDFE agency running DAWA, and I got a hunch from the PTR record but am waiting for confirmation. Thanks for confirming that Overpass is still seeing requests; we'll need to dig a bit deeper then. It is nice, however, that only a single IP is responsible. At DAWA they have CloudFront in front of them, so they did not seem that worried. Doesn't Overpass employ any kind of caching? @Hjart
Solved by a different server setup in the meantime, hence closing.
Another production Overpass server was added at the end of 2017, so the issue is no longer urgent. I haven't really checked whether the bot was also fixed.
As far as I know, "the bot", aka AWSbot, was never fixed and was replaced earlier this year by a different and much more successful bot, "AutoAWS".
You must be thinking about the osm-dk community support when saying "replaced". The AWSbot code in this repository could just as well still be running somewhere, and if that somehow makes sense, that would be fine. However, I am of course concerned if valuable community resources at Overpass (and, since I'm a Danish taxpayer, at DAWA/DAR) are being wasted for nothing because of fixable bugs. I would like to help analyze and mitigate this situation if I am able to. I assumed that Stephen had killed his misbehaving AWSbot client after this conversation and the conversation I had with him. Thus I was surprised when mmd mentioned some production change as the reason for closing this issue. If Overpass still receives excessive requests with the AWSbot postcode query from a single source IP, my suggestion would be to block that IP. This shouldn't affect the current community-endorsed import of Danish addresses to OSM, which is now done using autoAWS from a different server (as far as I'm aware). Regarding the AWSbot code and its missing "fixing": I haven't seen any indication of anybody other than me working on it either. I made some experimental changes in my fork (maybe I never got around to pushing them, I don't recall at the moment): adding a user agent header, throttling the infinite retry, and handling empty datasets from upstream a little more sanely. However, it was obviously in need of a massive overhaul to ever get into a sensible state. Luckily JKHougaard (the autoAWS author) stepped in and did a better job of getting things done than I did.
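For illustration only, here is roughly what handling empty datasets sanely could mean. This is a sketch in Python rather than the bot's PHP, and it is not the code from the fork: the function names `classify_response` and `should_retry` and the limit of three attempts are hypothetical. The only thing it relies on is that Overpass JSON output carries its results in an `elements` list.

```python
import json


def classify_response(body):
    """Classify an Overpass JSON response body as 'data', 'empty' or 'error'."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        # Not JSON at all, e.g. an HTML error page from the server.
        return "error"
    if not isinstance(payload, dict) or not isinstance(payload.get("elements"), list):
        return "error"
    # A well-formed answer with zero elements is a valid result, not a failure.
    return "data" if payload["elements"] else "empty"


def should_retry(body, attempt, max_attempts=3):
    """Retry only genuine errors, and only while attempts remain (attempt is 0-based)."""
    return classify_response(body) == "error" and attempt + 1 < max_attempts
```

With this split, a postcode that legitimately has no matching nodes ends the loop after one request instead of being asked for again and again.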
For some reason PHPscript appears to trigger around 15000-25000 identical requests each day for one particular postcode:
[out:json];node["osak:identifier"]["addr:postcode"=1533];out;
I guess this is not really intended, as this query does not return any data and none of the other postcode queries exhibit the same behavior.
Also, I'd suggest adding some wait time in case of an error rather than continuously sending the same query again. An HTTP User-Agent header would also be helpful to better identify the source of those queries.
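A rough sketch of both suggestions, in Python for brevity rather than the script's PHP. The `USER_AGENT` string, the function names, the delay values and the 180 second timeout are made up for illustration; only the overpass-api.de endpoint is the real one.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
# Hypothetical identifier: a real client should name itself and give a contact.
USER_AGENT = "example-address-import/0.1 (contact: maintainer@example.org)"


def build_request(query):
    """Build a POST request that carries an identifying User-Agent header."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    return urllib.request.Request(OVERPASS_URL, data=data, headers={"User-Agent": USER_AGENT})


def backoff_delays(max_attempts=5, base=2.0, cap=300.0):
    """Seconds to wait before each retry: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def fetch(query, max_attempts=5, sleep=time.sleep, opener=urllib.request.urlopen):
    """Send the query, wait longer after each failure, give up after max_attempts."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        try:
            with opener(build_request(query), timeout=180) as response:
                return response.read()
        except urllib.error.URLError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: report the failure instead of looping forever
            sleep(delays[attempt])
```

The important properties are that the number of attempts is bounded and that each failure increases the pause, so a persistent error results in a handful of requests rather than thousands per day.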
Thanks!