Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Province name is ignored while doing IP-base geolocation #492

Closed
vzaliva opened this issue Jun 28, 2020 · 15 comments · Fixed by #542
Closed

BUG: Province name is ignored while doing IP-base geolocation #492

vzaliva opened this issue Jun 28, 2020 · 15 comments · Fixed by #542

Comments

@vzaliva
Copy link

vzaliva commented Jun 28, 2020

Please see attached screenshot of wttr and weather.com forecast.
There is no sign or rain here and none is likely at this time of year!

screenshot_20200628_131559
screenshot_20200628_131619

@chubin
Copy link
Owner

chubin commented Jun 28, 2020

wttr.in show the report for
Saratoga County, New York, United States of America,
not Saratoga,CA (by default).

please try saratoga,ca,us instead

@vzaliva
Copy link
Author

vzaliva commented Jun 29, 2020 via email

@mathieu-aubin
Copy link

shit happens i guess, now you know you should specify for your current situation (mobile carrier? ISP with way too broad/many ranges of ips and not using selective attribution?)..

Clearly, it's not your fault but it serves as a reminder that geolocation is only as precise as... well... as precise as insert something that is not very precise here.

Per example... my ISP attributes me ip addresses that are, for one great thing, part of a very limited range every time my router/modem reconnects. Great for ssh/firewalling stuff i think. As far as geolocation... well, i'm actually about 500km from where my ips 'discover' or locate me. Not so great for weather precision....

I am an owl with relish (the green type) stuck between the left middle sharpest claws, no spark plugs. Yup.

@chubin
Copy link
Owner

chubin commented Jun 29, 2020

I think that it is a bug indeed, because the IP-based geolocation correctly detected location of the IP address,
as Saratoga, United States, but it dropped the province, and so it became Saratoga of New York.
It is a bug

@chubin chubin added the bug label Jun 29, 2020
@chubin chubin changed the title wrong temperatures BUG: Province name is ignored while doing IP-base geolocation Jun 29, 2020
@elliotrushton
Copy link

I'll just add another data point to this in case it's helpful (I'm 99% sure this is the same bug):

When I do a plain curl to wttr.in it correctly geo locates me down to my location:

$ curl wttr.in
Weather report: Bowen Island, Canada

     \   /     Sunny
      .-.      30..31 °C
   ― (   ) ―   → 11 km/h
      `-’      16 km
     /   \     0.0 mm

However, I noticed that the temperate was way off (I wish it was 30°C). If I specify my location:
$ curl wttr.in/Bowen+Island I get a very similar weather report but the Location line at the end tells I think the story:

$ curl wttr.in/Bowen+Island
Weather report: Bowen+Island

    \  /       Partly cloudy
  _ /"".-.     29..33 °C
    \_(   ).   ↓ 0 km/h
    /(___(__)  14 km
               0.2 mm
[...snip...]
Location: Bowen Island, Charleston County, South Carolina, 29439, United States of America [32.6776793,-79.9598125]

So it looks like the geo-ip look up is correctly getting my location (I am on Bowen Island, in Canada) but when using it to look up the weather it's only using the city name.

@chubin
Copy link
Owner

chubin commented Oct 18, 2020

@elliotrushton Yes, it is the very same bug, and #536 (found now by @vzaliva ) is also another reincarnation of the bug. Some part of information (at least province/state name) is getting lost between IP -> LOCATION and LOCATION -> GPS conversion

@gregdan3
Copy link
Contributor

Was about to make a new issue when I spotted this. Same occurs for my IP in Helena,Alabama; I get results for Helena,Montana unless I explicitly pass the location.

@gregdan3
Copy link
Contributor

gregdan3 commented Nov 1, 2020

Breakdown

https://github.com/chubin/wttr.in/blob/master/lib/location.py#L177

I believe we can see the cause of the problem here, where all of your ip based locating methods exclusively return city and country, without any region information.

I had a look at all three of these, and I can see the fix for at least two of them.

geoip

geoip in location.py offers a region name under response.subdivisions.name, seen here

It also offers reponse.location.latitude and response.location.longitude.

ipinfo

ipinfo in location.py

ipinfo offers a region key seen here in their docs, which looks to be what we want.

It also offers a comma separated latlong in its loc key, if that would be useful.

ip2location

ip2location in location.py

Came back to this one cause I wasn't satisfied with it, and found that you're getting data with the package parameter.

https://api.ip2location.com/?ip=YOUR_IP&key=demo&package=WS3

This endpoints returns data like so:
countrycode;country;region;city
And currently you coalesce that to:
(city, country)

In fact, I had another look back at ip2location, and it offers a package WS5:
https://api.ip2location.com/?ip=YOUR_IP&key=demo&package=WS5
countrycode;country;region;city;latitude;longitude

In other words, all three of your ip to location services can retrieve latlong. I may start over with my pull request; there's a few minor issues with it that I've now noticed, but I'd love to contribute to this project, as I think it's fantastic.

Other notes

I think you're performing this invokation two times if the input is an IP address; there's a difference between them if the location is some actual location, but if it's an IP, I think you would perform it twice.

Another thing I noted is that you are using the IP address to fetch hemisphere information instead of the location; everyone sees the same phase at the same time, so you won't pass an incorrect name, but the /moon endpoint would show an incorrect side of the moon being lit for somebody in the northern hemisphere querying the moon phase in the southern hemisphere. Except, reading things over, I think fetching the moon phase on /moon makes it impossible to fetch a specific location, so... Fair enough!

@chubin
Copy link
Owner

chubin commented Nov 9, 2020

@gregdan3 Gregory thank you a lot for such a detailed analysis, and for the subsequent pull-request. Actually, lib/location.py is the messiest place in the code here, and it has been deserving refactoring for a long time. I think we will do it after we merge your pull-request.

Regarding cache invalidation: we already have more than 300k entries in the cache (I think that the name "cache" is a little bit misleading, it should be better call local database), and I would like to keep it, because it handle quite big part of the incoming queries. Some of the entries are, maybe, outdated, but not so much.

So, what I would suggest:

  • when writing, write all components of the ip address resolution into the cache (";".join(parts));
  • when reading, take only the three first of them;
  • the cache entries that have two parts, consider invalid (I will go through the cache and remove such entries).

The rest is looking fine.

Thank you once again fro the great contribution!

@chubin
Copy link
Owner

chubin commented Nov 9, 2020

Cache file format suggestion:

COUNTRY_CODE;COUNTRY;REGION;CITY[;REST]
  • Cache entry has 4 (or more) ; separated mandatory entries.
  • The first one, can be empty or contain the country code, and ignored in code;
    *The second, third, and fourth one are joined with , and used for conversion into GPS coordinates.

@gregdan3
Copy link
Contributor

@gregdan3 Gregory thank you a lot for such a detailed analysis, and for the subsequent pull-request. Actually, lib/location.py is the messiest place in the code here, and it has been deserving refactoring for a long time. I think we will do it after we merge your pull-request.

Regarding cache invalidation: we already have more than 300k entries in the cache (I think that the name "cache" is a little bit misleading, it should be better call local database), and I would like to keep it, because it handle quite big part of the incoming queries. Some of the entries are, maybe, outdated, but not so much.

So, what I would suggest:

* when writing, write all components of the ip address resolution into the cache (`";".join(parts)`);

* when reading, take only the three first of them;

* the cache entries that have two parts, consider invalid (I will go through the cache and remove such entries).

The rest is looking fine.

Thank you once again fro the great contribution!

Cache file format suggestion:

COUNTRY_CODE;COUNTRY;REGION;CITY[;REST]
* Cache entry has 4 (or more) `;` separated mandatory entries.

* The first one, can be empty or contain the country code, and ignored in code;
  *The second, third, and fourth one are joined with `,` and used for conversion into GPS coordinates.

Thanks for the advice; I'll make those changes this coming Sunday at the latest, and throw the changes back here.

Can I see some examples of the production cache files as well?

@chubin
Copy link
Owner

chubin commented Nov 11, 2020

Sure:

$ cat /wttr.in/cache/ip2l/100.24.119.233 ; echo
US;United States;Virginia;Ashburn;39.04372;-77.48749;20146;Amazon Data Services NoVa;amazon.com
$ cat /wttr.in/cache/ip2l/100.24.125.228; echo
US;United States;Virginia;Ashburn;39.04372;-77.48749;20146;Amazon Data Services NoVa;amazon.com
$ cat /wttr.in/cache/ip2l/100.12.67.210; echo
US;United States;New York;New York City;40.71427;-74.00597;10116;Verizon Communications Inc.;verizon.com

@gregdan3
Copy link
Contributor

Sure:

$ cat /wttr.in/cache/ip2l/100.24.119.233 ; echo
US;United States;Virginia;Ashburn;39.04372;-77.48749;20146;Amazon Data Services NoVa;amazon.com
$ cat /wttr.in/cache/ip2l/100.24.125.228; echo
US;United States;Virginia;Ashburn;39.04372;-77.48749;20146;Amazon Data Services NoVa;amazon.com
$ cat /wttr.in/cache/ip2l/100.12.67.210; echo
US;United States;New York;New York City;40.71427;-74.00597;10116;Verizon Communications Inc.;verizon.com

Would you prefer I emulate this structure? All of the existing IP locationing methods are able to return latlong (the others give it already, but we have to switch ip2location from WS3 to WS5), so I can also add that onto the cache.

@chubin
Copy link
Owner

chubin commented Nov 15, 2020

Yes, it is a good idea

@gregdan3
Copy link
Contributor

My pull request is now updated with the requested changes. It should no longer invalidate the cache, but it will not (and cannot) provide exactly the same data as you've shown; it now caches from all three sources, and since not all three sources provide the content after the latlong, I can't be sure how to source that data except for trying a different iplocation method.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants