ConnectionError error no 10054 #284
Comments
Hi @rishabhjhaver10, which provider are you using? Which method? (The geocode method, I guess?) Cheers, |
I'm using geocoder through Python 2.7 on my local machine. I have about
500,000 IP addresses and I want to impute city, state and zip code for all
of them.
|
OK, got it. Which provider are you using? Google? Could you copy and paste the command or script you are running? It will be easier to help you :) |
I have a dataset with the following six-column schema:
userid,ipaddress,col3,col4,col5,col6
What I want is this:
userid,(city,state,zip),col3,col4,col5,col6
Each IP address carries a junk prefix, so it looks like this: '::ffff:197.123.56.12'
The code is below:

from pyspark import SparkConf, SparkContext
import geocoder

conf = SparkConf().setMaster("local").setAppName("locationfromip")
sc = SparkContext(conf=conf)

def location(xy):
    # '::ffff:197.123.56.12'.split(':') -> ['', '', 'ffff', '197.123.56.12']
    xy = str(xy)
    x = geocoder.ip(xy.split(':')[3]).json
    return x.get('city'), x.get('state'), x.get('postal')

rdd = sc.textFile('C:/Users/risha/Downloads/mysampledata.csv')
rdd1 = rdd.map(lambda x: x.split(','))
# Tuple-unpacking lambda parameters are Python 2 only
rdd2 = rdd1.map(lambda (a, b, c, d, e, f): (a, location(b), c, d, e, f))
rdd2.collect()
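A note on the `split(':')[3]` step: the '::ffff:' prefix marks an IPv4-mapped IPv6 address rather than junk, so the value can be parsed instead of split by position. The following is only a sketch, assuming Python 3 (where `ipaddress` ships in the standard library; the thread itself uses Python 2.7). `normalize_ip` is a hypothetical helper name, not part of geocoder.

```python
import ipaddress


def normalize_ip(raw):
    """Return a plain IP string for a raw field, or None if it is empty or invalid.

    Hypothetical helper for illustration; not part of geocoder.
    """
    text = raw.strip()
    if not text:
        return None
    try:
        addr = ipaddress.ip_address(text)
    except ValueError:
        # Not a parseable IPv4 or IPv6 address
        return None
    # '::ffff:a.b.c.d' is an IPv4-mapped IPv6 address; unwrap it to 'a.b.c.d'
    if addr.version == 6 and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    return str(addr)


print(normalize_ip("::ffff:197.123.56.12"))  # 197.123.56.12
print(normalize_ip(""))                      # None
```

Returning None for empty or malformed fields lets the caller skip the lookup for those rows instead of failing on an index error.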
|
The previous approach uses pyspark; this is an alternative approach using
plain Python 2.7.
Each item of my_list is a record (a list):
[['userid', 'ipaddress', 'c3', 'c4', 'c5', 'c6'], ['userid', 'ipaddress',
'c3', 'c4', 'c5', 'c6'], ['userid', 'ipaddress', 'c3', 'c4', 'c5',
'c6'], ..., ['userid', 'ipaddress', 'c3', 'c4', 'c5', 'c6']]

import geocoder
import csv
import time

my_file = open("C:/Users/risha/Downloads/mysampledata.csv", 'r')
reader = csv.reader(my_file)
my_list = list(reader)
my_file.close()

a = 0
for i in my_list:
    time.sleep(5)  # wait 5 seconds between requests
    print a
    x = i[1]
    y = x.split(':')
    if len(y) == 4:  # an IP address is present, e.g. '::ffff:197.123.56.12'
        z = geocoder.ip(y[3]).json
        u = z.get('city')
        v = z.get('state')
        w = z.get('postal')
        i.append(u)
        i.append(v)
        i.append(w)
    elif len(y) == 1:  # the IP address is null
        s = ''
        i.append(s)
        i.append(s)
        i.append(s)
    i.pop(1)  # drop the original ipaddress column
    a = a + 1
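With 500,000 rows, two things may reduce the number of failing requests: looking up each distinct IP only once, and retrying with a growing delay when a connection error occurs. The following is a rough sketch under assumptions, written for Python 3. `lookup_with_retry` and `geocode_unique` are hypothetical names, and `lookup` stands for any callable that takes an IP string and raises on a connection failure. Whether `geocoder.ip` raises or only records the error on its result object is not stated in this thread, so a wrapper may be needed.

```python
import time


def lookup_with_retry(lookup, ip, retries=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call lookup(ip); on a connection-type error, back off and retry.

    retry_on defaults to OSError, the base class of the built-in
    ConnectionError. All names here are illustrative assumptions.
    """
    for attempt in range(retries + 1):
        try:
            return lookup(ip)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: let the caller see the error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)


def geocode_unique(ips, lookup, **retry_kwargs):
    """Look up each distinct, non-empty IP once and return {ip: result}."""
    cache = {}
    for ip in ips:
        if ip and ip not in cache:
            cache[ip] = lookup_with_retry(lookup, ip, **retry_kwargs)
    return cache
```

The resulting dictionary can then be used to fill in city, state and postal code for every row without issuing a second request for a repeated address. This does not lift any provider-side quota; it only avoids spending requests on duplicates.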
|
So I am calling the API once per iteration of the for loop (500,000
iterations), and even with a 5-second sleep between calls it gets through
no more than 400 iterations before giving me the error.
|
Hi @rishabhjhaver10, OK, you are using the provider ipinfo.io. They have a policy that limits the number of queries you can make per day.
The source is here, at the end of the page. Unfortunately, they do not support true bulk lookup; they recommend making multiple queries (as you already do). Hence there is nothing we can do right now, so I will close the issue. Keep in mind that if you decide to go for a paid plan, we will need to update geocoder to support the credentials. (You would be welcome to make a PR in that case ;) ) Hope that helps! |
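Given a per-day query limit, one way to stay inside it is to process a fixed budget of new IPs per run and persist the results, so the next run resumes where the previous one stopped. The following is a minimal sketch, assuming Python 3. `process_daily_batch`, `daily_limit` and `state_path` are hypothetical names, `lookup` is any callable returning a JSON-serializable result, and the actual limit must be taken from the provider's page since it is not given in this thread.

```python
import json
import os


def process_daily_batch(ips, lookup, daily_limit, state_path):
    """Look up at most daily_limit not-yet-seen IPs and save all results.

    Returns (results, used), where used is the number of requests made
    in this run. Names and file format are illustrative assumptions.
    """
    results = {}
    if os.path.exists(state_path):
        # Resume from the results saved by earlier runs
        with open(state_path) as fh:
            results = json.load(fh)

    used = 0
    for ip in ips:
        if ip in results:
            continue  # already resolved, costs no request
        if used >= daily_limit:
            break     # budget for this run is spent
        results[ip] = lookup(ip)
        used += 1

    with open(state_path, "w") as fh:
        json.dump(results, fh)
    return results, used
```

Running this once per day with the same `state_path` works through the list gradually; rows whose IP is already in the saved results are filled in without a new request.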
Thanks for your help Manu.
|
👍 Thanks Manu for the explanation. @rishabhjhaver10 Most providers won't allow large volumes of requests; that is where the paid plans come in. This geocoding library can't make 500K requests without being blocked or rate limited by the providers. |
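To put the 500K figure in perspective, a quick back-of-the-envelope calculation may help. This is only a sketch: the 1,000 requests/day value below is a made-up placeholder, not a limit quoted in this thread, and both function names are hypothetical.

```python
import math


def days_to_finish(total_requests, daily_limit):
    """Whole days needed when at most daily_limit requests are allowed per day."""
    return math.ceil(total_requests / daily_limit)


def sleep_overhead_days(total_requests, delay_seconds):
    """Days spent purely sleeping with a fixed delay before every request."""
    return total_requests * delay_seconds / 86400.0


# Placeholder quota of 1,000/day (an assumption for illustration only)
print(days_to_finish(500000, 1000))              # 500
# The 5-second sleep from the script above, across 500,000 rows
print(round(sleep_overhead_days(500000, 5), 1))  # 28.9
```

Even before any quota applies, the 5-second delay alone adds up to roughly four weeks of waiting for 500,000 rows.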
Hi, I am trying to get the city, state and country from an IP address.
My data frame has 500,000 rows and I need to apply the lookup to each of them.
I am getting the connection error after 200 records or so.
I even tried using time.sleep(5), and it still stops after 500 records or so.
Can you please suggest an alternative or a solution?
Thank you.