
Search stopping because search_metadata.next_results missing #6

Closed
jjoubert opened this issue Jun 25, 2013 · 8 comments

@jjoubert

Thanks for this library. It's working very well.

This is more of a question about the Twitter API, I guess, but maybe you've encountered it before.
Every now and again, the search (iterating via searchTweetsIterable) stops because the search_metadata.next_results item is completely missing from the Twitter response. Do you know why this happens? I don't see anything about it in the API documentation, and it isn't due to rate limiting.

If I manually run another search with my own max_id populated, I get another set of results, again with search_metadata.next_results missing.

@ckoepp
Owner

ckoepp commented Jun 26, 2013

Thanks a lot for submitting this issue!

From what you're saying, it sounds like a bug in the library, since errors within the Twitter API should result in non-200 HTTP status codes. All of those statuses raise an exception by default; this is especially the case when you hit your rate limit.

The problem is caused by line 65 of TwitterSearch.py:

    if self.response['content']['search_metadata'].get('next_results'):
        self.nextresults = self.response['content']['search_metadata']['next_results']
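
For context, the effect of that check can be sketched as a generator that simply ends when the key is absent. This is a hedged sketch with illustrative names, not the library's actual internals:

```python
def iterate_pages(fetch_page):
    """Yield pages of statuses until a response lacks 'next_results'."""
    params = ''
    while True:
        page = fetch_page(params)
        yield page['statuses']
        meta = page['search_metadata']
        if 'next_results' not in meta:
            return  # ends the generator, i.e. StopIteration in the caller's loop
        params = meta['next_results']
```

So a response without next_results terminates the loop silently, without any visible error.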

If you can, please tell me what's stored in your instance of TwitterSearch.response['meta']. This is the unmodified response from the Twitter API (including the received next_results parameter). Just print it in your exception handling block:

ts = TwitterSearch(...)
try:
    ....
except Exception:
    print ts.response['meta']

@jjoubert
Author

I actually don't agree that it's necessarily a bug; it's more a question about the Twitter API, I think.
I had a look, and in some cases the 'search_metadata' element in the Twitter response does not contain a 'next_results' element at all.
Of course, now that I'm trying to reproduce it and show you the output, it is always present!
Let me keep at it and see if I can reproduce it again.

@jjoubert
Author

I tried again and couldn't reproduce it. Apologies for the false alarm.
It might have been a temporary problem on the Twitter API's side.
I'm going to close this issue, as I don't think there's anything to fix in the code.

@ckoepp
Owner

ckoepp commented Jun 26, 2013

The next_results parameter is only available if there are more Tweets you can access via the API. I also wondered how you saw that no next_results was returned, since the library automatically raises a StopIteration exception, which causes the loop to terminate without any other exception.

However, if you encounter any other strange behaviour, just drop me a message :)
TwitterSearch is not extensively tested yet, so any feedback is welcome!

@jjoubert
Author

> I also wondered how you saw that no next_results was returned, since the library automatically raises a StopIteration exception, which causes the loop to terminate without any other exception.

The behaviour I saw was that I always got back slightly fewer than 'count' tweets in total. The loop terminated without any exception, but I expected more results. Every time I restarted the loop, I only got one 'page' back (again, no exceptions). I added some debug print statements in the code to find out why it wasn't performing another query to fetch more results; that's when I discovered the missing 'next_results' element.

@jjoubert
Author

I think I managed to reproduce this. I still think it's a problem with the Twitter API.
I submitted a query with the iterator, supplying my own max_id to start with. Here is the 'search_metadata' element after every query:

{'count': 100, 'completed_in': 0.093, 'max_id_str': '349847318960422913', 'since_id_str': '0', 'next_results': '?max_id=349846991402057729&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349847318960422913&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349847318960422913}
{'count': 100, 'completed_in': 0.072, 'max_id_str': '349846991402057729', 'since_id_str': '0', 'next_results': '?max_id=349846659523559423&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349846991402057729&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846991402057729}
{'count': 100, 'completed_in': 0.061, 'max_id_str': '349846659523559423', 'since_id_str': '0', 'next_results': '?max_id=349846282237509631&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349846659523559423&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846659523559423}
{'count': 100, 'completed_in': 0.078, 'max_id_str': '349846282237509631', 'since_id_str': '0', 'refresh_url': '?since_id=349846282237509631&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846282237509631}

You'll notice that the last response does not include a 'next_results' element.
But if I manually inspect the tweets returned in the last response, take the smallest tweet ID, and submit my own query with that as the new 'max_id', I start getting more results, this time again with 'next_results' elements:

{'count': 100, 'completed_in': 0.078, 'max_id_str': '349846282237509631', 'since_id_str': '0', 'refresh_url': '?since_id=349846282237509631&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846282237509631}
setting max id to: 349845983783428095 for query 1
{'count': 100, 'completed_in': 0.065, 'max_id_str': '349845983783428095', 'since_id_str': '0', 'next_results': '?max_id=349845679331483647&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349845983783428095&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349845983783428095}
setting max id to: 349845679331483647 for query 2
{'count': 100, 'completed_in': 0.071, 'max_id_str': '349845679331483647', 'since_id_str': '0', 'next_results': '?max_id=349845415916605439&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349845679331483647&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349845679331483647}
....

This shows that there are, in fact, more results, so I'm not sure why Twitter stopped sending the 'next_results' element in the first set.
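
The manual workaround described above (take the smallest tweet ID of the last page and continue from there) boils down to a one-liner; a sketch with an illustrative function name:

```python
def resume_max_id(statuses):
    # oldest (smallest) ID on the page, minus one so that tweet is not returned again
    return min(tweet['id'] for tweet in statuses) - 1
```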

@ckoepp
Owner

ckoepp commented Jun 26, 2013

Oh, that's quite interesting...

I just had a look at what dev.twitter.com says about this. It seems pages are not the best way to search for Tweets, which is perfectly logical considering the constant change within the pages of Tweets: https://dev.twitter.com/docs/working-with-timelines

I'll try to improve TwitterSearch to navigate by IDs instead of pages.
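
The ID-based cursoring that document recommends could look roughly like this. A hedged sketch under assumed names (`fetch` stands in for whatever performs the actual API call), not the library's eventual implementation:

```python
def search_by_max_id(fetch, query):
    """Page through search results via max_id instead of next_results."""
    max_id = None
    while True:
        statuses = fetch(query, max_id)
        if not statuses:
            return  # an empty page means everything has been seen
        for status in statuses:
            yield status
        # continue strictly below the oldest tweet seen so far
        max_id = min(s['id'] for s in statuses) - 1
```

Stopping on an empty page instead of a missing next_results key sidesteps the behaviour observed in this issue.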

@ckoepp reopened this Jun 26, 2013
@jjoubert
Author

I agree. I basically implemented my own max_id navigation while still using your 'searchTweets' function.
It's very naive and surely has a lot of room for improvement (this code includes throttling to make sure I don't get rate-limited):

import sys
import time

def idNavWithThrottle(ts, tso):
    # minimum delay between queries so we stay within the rate limit
    # (180 requests per 15-minute window, i.e. one query every 5 seconds)
    min_delay_s = 15 * 60 / 180

    done = False
    tweet_counter = 0
    query_counter = 0
    next_max_id = 0
    while not done:
        # throttle so we don't exceed the rate limit
        time.sleep(min_delay_s)

        response = ts.searchTweets(tso)
        query_counter += 1
        done = len(response['content']['statuses']) == 0
        for tweet in response['content']['statuses']:
            tweet_counter += 1
            tweet_id = tweet['id']
            if (next_max_id == 0) or (tweet_id < next_max_id):
                next_max_id = tweet_id
            #print tweet

        # subtract one so the oldest tweet of this page is not returned again
        next_max_id -= 1
        sys.stderr.write('setting max id to: %i for query %i\n' % (next_max_id, query_counter))
        tso.setMaxID(next_max_id)

    sys.stderr.write('*** Found %i tweets in %i queries\n' % (tweet_counter, query_counter))
