Google App Engine / Pandas Request Failing: #19

Closed
mburke05 opened this issue May 16, 2016 · 7 comments

Comments

@mburke05

Hi guys,

I'm on Windows, so forgive me in advance if that's the cause of any problems. I'm also a beginner when it comes to async IO, so I might just be misunderstanding something.

I'm making a request to Google App Engine (Google Analytics) using the pandas ga module, which uses OAuth to communicate with the Analytics portion of App Engine.

Here's the code I had written:

import time

import pandas as pd
import pandas.io.ga as ga
from deco import concurrent, synchronized


@concurrent
def d_fetch(data, date, hour):
    # Fetch one hour of Google Analytics data and time the request.
    t0 = time.time()
    frame = ga.read_ga(
        account_id="xxx",
        profile_id="xxx",
        property_id="UA-xxx",
        metrics=['sessions', 'hits', 'bounces'],
        dimensions=['date', 'hour', 'minute', 'medium', 'keyword'],
        start_date=date,
        end_date=date,
        index_col=0,
        filters="hour==" + '{0:02d}'.format(hour))
    elapsed = round(time.time() - t0, 2)
    # Store the frame and its timing under a key such as "<date>h<hour>".
    data[str(date) + 'h' + str(hour)] = [frame, elapsed]
    print str(date) + str(hour) + ": completed in " + str(elapsed) + " secs."


@synchronized
def run(data, dates):
    # Queue one concurrent fetch per hour of each day.
    for date in dates:
        for hour in xrange(24):
            d_fetch(data, date, hour)


if __name__ == "__main__":
    somemute = {}
    date_range = pd.date_range(start='5/8/2016', end='5/8/2016', freq='D')

    t0 = time.time()
    run(somemute, date_range)
    t1 = time.time()
    print "TOOK", round(t1 - t0, 2)

And the error that was being raised:

[image: screenshot of the error]

Thanks!
Matt

@mburke05
Author

I've posted a stack q here

@alex-sherman
Owner

Yeah, Stack Exchange might have more answers for you; the problem may just be that something doesn't work across multiple processes. Have you tried this code without the decorators? If it doesn't work even without the decorators, you should debug that first and then try adding them back in.

If it continues to not work, we're hoping to add support for Python threads. Running this in separate threads rather than processes may fix your issue. Hopefully we'll have this done by the end of the week, so hold tight.

Thanks for bringing this up, and let us know if Stack Exchange has any insight.

@mburke05
Author

Thanks Alex.

Yeah, it runs fine without decorators! I remember trying to do something similar with Pooling before and running into a similar error with the locking of a file in file_cache.py, so it's probably just native to App Engine, but I figured I'd ask and see if you guys had any insight.

Would be cool to have threading! But I'm not sure that would help, as it needs to be async IO for any sort of speed benefit on something like the above, right?

@alex-sherman
Owner

It doesn't need to be explicit Python 3 async IO; I think a lot of IO operations will release the GIL on their own. HTTP requests will probably fall into this category. So with threading you may not be able to execute any computations in parallel, but at least your network requests can all happen at the same time and you'll see some speedup there.

Ideally when we support these different concurrent backends, you'll be able to have these network operations happen in threads, and maybe some computation on the results afterwards happen in processes.
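To illustrate the point about blocking I/O releasing the GIL, here is a minimal Python 3 sketch using only the standard library. fake_request is a hypothetical stand-in for an HTTP call (it just sleeps, which releases the GIL the way a blocking socket read would); it is not the ga.read_ga call from the original post.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_request(hour):
    """Hypothetical stand-in for a network call; sleeping releases the GIL."""
    time.sleep(0.2)
    return hour


def run_sequential(hours):
    # One request at a time: total time is roughly len(hours) * 0.2 s.
    return [fake_request(h) for h in hours]


def run_threaded(hours, workers=8):
    # The waits overlap across threads, so total time is roughly one request.
    with ThreadPool(workers) as pool:
        return pool.map(fake_request, hours)


def timed(fn, *args):
    # Return the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


hours = list(range(8))
seq_result, seq_secs = timed(run_sequential, hours)
thr_result, thr_secs = timed(run_threaded, hours)
print("sequential: %.2fs, threaded: %.2fs" % (seq_secs, thr_secs))
```

Both versions return the same results in the same order; only the waiting overlaps. Any CPU-heavy work done inside the function would still be serialized by the GIL.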

@alex-sherman
Owner

The latest commit contains a change that allows you to use ThreadPool from multiprocessing. If you have some time, would you mind trying that out? You can use the following little snippet:

from deco import concurrent
from multiprocessing.pool import ThreadPool

# Decorator variant that uses ThreadPool as its pool
threaded = concurrent.custom(ThreadPool)

Then just replace your use of @concurrent with @threaded.
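For comparison, the same hour-by-hour fan-out can be sketched with ThreadPool directly, without deco. This is only a Python 3 illustration of the pattern, not how deco works internally; fetch_hour and fetch_day are hypothetical names, and fetch_hour returns a placeholder dict where the real code would call ga.read_ga.

```python
from multiprocessing.pool import ThreadPool


def fetch_hour(date, hour):
    # Hypothetical placeholder for the real per-hour request.
    return {"date": date, "hour": hour,
            "filter": "hour==" + "{0:02d}".format(hour)}


def fetch_day(date, workers=8):
    # One job per hour of the day, run in a pool of threads.
    jobs = [(date, hour) for hour in range(24)]
    with ThreadPool(workers) as pool:
        rows = pool.starmap(fetch_hour, jobs)
    # Key each result as "<date>h<hour>", matching the original post.
    return {"%sh%d" % (date, hour): row
            for (_, hour), row in zip(jobs, rows)}


results = fetch_day("2016-05-08")
print(len(results), results["2016-05-08h7"]["filter"])
```

Because the worker threads return values and the dict is built in the calling thread, no shared state is mutated from inside the workers.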

@mburke05
Author

I'll check it out some time this week when I have time, thanks Alex!

@alex-sherman
Owner

I think I'm going to close this for now. Even if using threads doesn't fix it, I'm not sure there's much that I could do in deco to get things working. I did make it a little simpler to use threads, though: you can now use @concurrent.threaded instead of @concurrent, if you do have a chance to try it.

Do let me know if anything changes though, and thanks again for bringing this up!
