ConnectionError error no 10054 #284

rishabhjhaveri10 · 2017-08-27T07:57:20Z

Hi, I am trying to get the city, state, and country from an IP address.

My data frame has 500,000 rows, and I need to run the lookup on each of them.

I get a connection error after 200 records or so.

I even tried adding `time.sleep(5)` between requests, but it still stops after 500 records or so.

Can you please suggest an alternative or a solution?

Thank you.

ebreton · 2017-08-27T10:01:56Z

Hi @rishabhjhaver10,

Which provider are you using? Which method (the `geocode` method, I guess)?
If Google, which authentication?

Cheers,
Manu

rishabhjhaveri10 · 2017-08-27T20:27:52Z

I'm using geocoder with Python 2.7 on my local machine. I have about 500,000 IP addresses, and I want to impute the city, state, and zip code for all of them.


ebreton · 2017-08-27T20:30:07Z

OK, got it. Which provider are you using? Google?

Could you copy-paste the command/script you are running? It will be easier to help you :)

rishabhjhaveri10 · 2017-08-27T20:59:39Z

I have a dataset with the following schema of 6 columns:

`userid,ipaddress,col3,col4,col5,col6`

What I want is this:

`userid,(city,state,zip),col3,col4,col5,col6`

The IP address has a junk prefix, so it looks like this: `'::ffff:197.123.56.12'`

The code is below:

```python
from pyspark import SparkConf, SparkContext
import geocoder

conf = SparkConf().setMaster("local").setAppName("locationfromip")
sc = SparkContext(conf=conf)

def location(xy):
    # '::ffff:197.123.56.12'.split(':') -> ['', '', 'ffff', '197.123.56.12']
    x = geocoder.ip(xy.split(':')[3]).json
    return x.get('city'), x.get('state'), x.get('postal')

rdd = sc.textFile('C:/Users/risha/Downloads/mysampledata.csv')
rdd1 = rdd.map(lambda x: x.split(','))
# Python 2 tuple-unpacking lambda: replace column b (the IP) with its location
rdd2 = rdd1.map(lambda (a, b, c, d, e, f): (a, location(b), c, d, e, f))
rdd2.collect()
```
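The "junk" prefix is the `::ffff:` part of an IPv4-mapped IPv6 address, which wraps an ordinary IPv4 address. As a minimal sketch (assuming Python 3, since the standard-library `ipaddress` module is not built into 2.7), the IPv4 part can be extracted without relying on `split(':')[3]`, and empty or malformed values come back as `None` instead of raising an `IndexError`. `extract_ipv4` is a hypothetical helper, not part of geocoder.

```python
import ipaddress


def extract_ipv4(raw):
    """Return the IPv4 string inside `raw`, or None if there isn't one.

    Handles plain IPv4 ('197.123.56.12'), IPv4-mapped IPv6
    ('::ffff:197.123.56.12'), and empty or garbage values.
    """
    raw = (raw or "").strip()
    if not raw:
        return None  # null/empty IP column
    try:
        addr = ipaddress.ip_address(raw)
    except ValueError:
        return None  # not a valid IP address at all
    if addr.version == 6:
        # ipv4_mapped is None unless the address has the ::ffff:0:0/96 prefix
        mapped = addr.ipv4_mapped
        return str(mapped) if mapped is not None else None
    return str(addr)


if __name__ == "__main__":
    for value in ["::ffff:197.123.56.12", "197.123.56.12", "", "not-an-ip"]:
        print(repr(value), "->", extract_ipv4(value))
```

Inside the Spark job, `location(b)` could then call `geocoder.ip(extract_ipv4(b))` only when the helper returns a non-`None` value.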


rishabhjhaveri10 · 2017-08-27T21:05:45Z

The previous approach uses PySpark; this is an alternative approach using plain Python 2.7. Each item of `my_list` is a record (a list):

```
[['userid', 'ipaddress', 'c3', 'c4', 'c5', 'c6'],
 ['userid', 'ipaddress', 'c3', 'c4', 'c5', 'c6'],
 ['userid', 'ipaddress', 'c3', 'c4', 'c5', 'c6'],
 ........
 ['userid', 'ipaddress', 'c3', 'c4', 'c5', 'c6']]
```

```python
import geocoder
import csv
import time

with open("C:/Users/risha/Downloads/mysampledata.csv", 'r') as my_file:
    my_list = list(csv.reader(my_file))

a = 0
for i in my_list:
    time.sleep(5)
    print a
    y = i[1].split(':')
    if len(y) == 4:  # there is an IP address, e.g. '::ffff:197.123.56.12'
        z = geocoder.ip(y[3]).json
        i.append(z.get('city'))
        i.append(z.get('state'))
        i.append(z.get('postal'))
    elif len(y) == 1:  # the IP address is null
        i.extend(['', '', ''])
    i.pop(1)  # drop the raw IP column
    a = a + 1
```
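Because the loop calls the API once per row, any IP address that appears in several rows is looked up (and counted against the provider's limit) several times. A minimal sketch, assuming records shaped like `my_list` above (IP in column 1): collect the distinct IPs first and look each one up only once. `lookup_unique`, `enrich`, and `fake_lookup` are hypothetical names; in practice `lookup` would wrap something like `geocoder.ip(...).json`. How much this saves depends entirely on how many duplicate IPs the data contains.

```python
def lookup_unique(rows, lookup, ip_column=1):
    """Call lookup(ip) once per distinct, non-empty IP and return {ip: result}.

    `rows` is a list of CSV records (lists); `lookup` is any callable that
    takes an IP string and returns a dict such as geocoder's `.json`.
    """
    cache = {}
    for row in rows:
        ip = row[ip_column]
        if ip and ip not in cache:
            cache[ip] = lookup(ip)
    return cache


def enrich(rows, cache, ip_column=1):
    """Replace the IP column with a (city, state, postal) tuple from the cache."""
    out = []
    for row in rows:
        info = cache.get(row[ip_column]) or {}
        location = (info.get("city"), info.get("state"), info.get("postal"))
        out.append(row[:ip_column] + [location] + row[ip_column + 1:])
    return out


if __name__ == "__main__":
    def fake_lookup(ip):  # stand-in for a real geocoder call
        return {"city": "X", "state": "Y", "postal": "00000"}

    rows = [["u1", "1.2.3.4", "a"], ["u2", "1.2.3.4", "b"], ["u3", "", "c"]]
    print(enrich(rows, lookup_unique(rows, fake_lookup)))
```

Here the two rows sharing `1.2.3.4` cost a single lookup, and the empty IP costs none.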


rishabhjhaveri10 · 2017-08-27T21:07:30Z

So I am calling the API once per iteration of the for loop (500,000 iterations). Even with a 5-second sleep, it never gets past about 400 iterations before giving me the error.
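A fixed `time.sleep(5)` before every call slows the whole job down (500,000 requests at 5 s each is 2,500,000 s, roughly 29 days of sleeping alone) without reacting to the server actually refusing requests. A common alternative is to retry a failed request with exponential backoff. This is a minimal sketch, assuming Python 3 and that the failure surfaces as an exception such as the `ConnectionError` in the issue title; pass the exception type your lookup really raises as `retry_on`. `with_backoff` is a hypothetical helper. Backoff only helps with transient failures; it cannot get around a hard per-day quota.

```python
import time


def with_backoff(func, arg, retries=5, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call func(arg), retrying on `retry_on` exceptions with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last exception once `retries` retries are used up.
    `sleep` is injectable so the delays can be inspected in tests.
    """
    for attempt in range(retries + 1):
        try:
            return func(arg)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * (2 ** attempt))
```

Usage would look like `with_backoff(lambda ip: geocoder.ip(ip).json, ip_string, retry_on=(SomeError,))`, where `SomeError` stands for whatever exception class you actually observe.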


ebreton · 2017-08-28T06:27:29Z

Hi @rishabhjhaver10,

OK, you are using the ipinfo.io provider. It has a policy that limits the number of queries you can make per day:

> Free usage of our API is limited to 1,000 API requests per day. If you exceed 1,000 requests in a 24 hour period we'll return a 429 HTTP status code to you. If you need to make more requests or custom data, see our paid plans, which all have soft limits.

The source is here, at the end of the page.
The link to the paid plans is there.

Unfortunately, they do not support true bulk lookups; they recommend making multiple queries (as you already do).

Hence there is nothing we can do right now, so I will close the issue.

Keep in mind that if you decide to go for a paid plan, we will need to update geocoder to support the credentials. (You will be welcome to make a PR in that case ;) )
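To put the free tier in numbers: at 1,000 requests per 24 hours, 500,000 one-request-per-row lookups need at least 500 days, and a loop sleeping 5 s per request uses up a day's quota in 5,000 s (about 83 minutes). A minimal sketch of that arithmetic, plus a hypothetical `daily_batches` helper that splits the work into quota-sized chunks; the 1,000 figure is the free-tier limit quoted above and may differ on other plans or change over time.

```python
import math


def days_needed(n_lookups, daily_limit=1000):
    """Minimum number of days to make n_lookups requests under a per-24h cap."""
    return int(math.ceil(n_lookups / float(daily_limit)))


def daily_batches(items, daily_limit=1000):
    """Split `items` into consecutive chunks of at most daily_limit items,
    one chunk per 24-hour window."""
    return [items[i:i + daily_limit] for i in range(0, len(items), daily_limit)]


print(days_needed(500000))  # 500
print(len(daily_batches(list(range(2500)))))  # 3
```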

Hope that helps!
Manu

rishabhjhaveri10 · 2017-08-28T21:30:25Z

Thanks for your help, Manu.


DenisCarriere · 2017-08-29T05:53:29Z

👍 Thanks Manu for the explanation.

@rishabhjhaver10 Most providers won't allow large volumes of requests; that's where the paid plans come in. With this geocoding library, you won't be able to make 500K requests without being blocked or rate-limited by the providers.
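If a paid plan is not an option, one way to stay inside a free tier is to spread the job over many runs, spending at most a fixed budget of requests per run and saving results to disk so that each run resumes where the previous one stopped. This is a minimal sketch, assuming a JSON file as the cache and a caller-supplied `lookup` callable; `run_with_budget` is a hypothetical helper, not part of geocoder. At 1,000 requests per day it still means hundreds of days for 500K distinct IPs, so it only changes the picture when the number of distinct addresses is small.

```python
import json
import os


def run_with_budget(ips, lookup, cache_path, budget=1000):
    """Look up at most `budget` not-yet-cached IPs, persisting results to cache_path.

    Returns the number of lookups performed in this run. Already-cached IPs
    cost nothing, so repeated runs (e.g. once a day) make steady progress.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)

    done = 0
    try:
        for ip in ips:
            if done >= budget:
                break  # this run's request budget is spent
            if ip in cache:
                continue  # looked up in an earlier run
            cache[ip] = lookup(ip)
            done += 1
    finally:
        # Save whatever we got, even if lookup() raised part-way through.
        with open(cache_path, "w") as f:
            json.dump(cache, f)
    return done
```

Running it once per day with `budget` set a little below the provider's limit leaves some headroom for other requests made from the same IP.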

ebreton closed this as completed Aug 28, 2017
