
Search stopping because search_metadata.next_results missing #6

Closed
jjoubert opened this issue Jun 25, 2013 · 8 comments

@jjoubert

Thanks for this library. It's working very well.

This is more of a question about the Twitter API, I guess, but maybe you've encountered it before.
Every now and again, the search (iterating via searchTweetsIterable) stops because the search_metadata.next_results item is completely missing from the Twitter response. Do you know why this happens? I don't see anything about it in the API documentation, and it isn't due to rate limiting.

If I manually run another search with my own max_id populated, I get another set of results, again with search_metadata.next_results missing.

@ckoepp
Owner

ckoepp commented Jun 26, 2013

Thanks a lot for submitting this issue!

From what you're saying, it sounds like a bug in the library, since errors within the Twitter API should result in non-200 HTTP status codes. All of those statuses raise an exception by default; this is especially the case when you hit your rate limit.

The problem is caused by line 65 of TwitterSearch.py:

    if self.response['content']['search_metadata'].get('next_results'):
        self.nextresults = self.response['content']['search_metadata']['next_results']
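
For context, the effect of that check can be sketched as a generator that simply ends when the key is absent. This is a hedged sketch with illustrative names, not the library's actual internals:

```python
def iterate_pages(fetch_page):
    """Yield pages of statuses until a response lacks 'next_results'."""
    params = ''
    while True:
        page = fetch_page(params)
        yield page['statuses']
        meta = page['search_metadata']
        if 'next_results' not in meta:
            return  # ends the generator, i.e. StopIteration in the caller's loop
        params = meta['next_results']
```

So a response without next_results terminates the loop silently, without any visible error.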

If you can, please tell me what's stored in your instance of TwitterSearch.response['meta']. This is the unmodified response from the Twitter API (including the received next_results parameter). Just print it in your exception handling block:

ts = TwitterSearch(...)
try:
    ....
except Exception:
    print ts.response['meta']

@jjoubert
Author

I actually don't agree that it's necessarily a bug; it's more a question about the Twitter API, I think.
I had a look, and in some cases the 'search_metadata' element in the Twitter response does not contain a 'next_results' element at all.
Of course, now that I'm trying to reproduce it and show you the output, it is always present!
Let me keep at it and see if I can reproduce it again.

@jjoubert
Author

I tried again and couldn't reproduce it. Apologies for the false alarm.
It might have been a temporary problem on the Twitter API's side.
I'm going to close this issue, as I don't think there's anything to fix in the code.

@ckoepp
Owner

ckoepp commented Jun 26, 2013

The next_results parameter is only available if there are more Tweets you can access via the API. I also wondered how you saw that no next_results was returned, since the library automatically raises a StopIteration exception, which causes the loop to terminate without any other exception.

However, if you encounter any other strange behaviour, just drop me a message :)
TwitterSearch is not extensively tested yet, so any feedback is welcome!

@jjoubert
Author

> I also wondered how you saw that no next_results was returned, since the library automatically raises a StopIteration exception, which causes the loop to terminate without any other exception.

The behaviour I saw was that I always got back slightly fewer than 'count' tweets in total. The loop terminated without any exception, but I expected more results. Every time I restarted the loop, I only got one 'page' back (again, no exceptions). I added some debug print statements in the code to find out why it wasn't performing another query to fetch more results; that's when I discovered the missing 'next_results' element.

@jjoubert
Author

I think I managed to reproduce this. I still think it's a problem with the Twitter API.
I submitted a query with the iterator, supplying my own max_id to start with. Here is the 'search_metadata' element after every query:

{'count': 100, 'completed_in': 0.093, 'max_id_str': '349847318960422913', 'since_id_str': '0', 'next_results': '?max_id=349846991402057729&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349847318960422913&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349847318960422913}
{'count': 100, 'completed_in': 0.072, 'max_id_str': '349846991402057729', 'since_id_str': '0', 'next_results': '?max_id=349846659523559423&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349846991402057729&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846991402057729}
{'count': 100, 'completed_in': 0.061, 'max_id_str': '349846659523559423', 'since_id_str': '0', 'next_results': '?max_id=349846282237509631&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349846659523559423&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846659523559423}
{'count': 100, 'completed_in': 0.078, 'max_id_str': '349846282237509631', 'since_id_str': '0', 'refresh_url': '?since_id=349846282237509631&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846282237509631}

You'll notice that the last response does not include a 'next_results' element.
But if I manually inspect the tweets returned in the last response, take the smallest tweet ID, and submit my own query with that as the new 'max_id', I start getting more results, this time again with 'next_results' elements:

{'count': 100, 'completed_in': 0.078, 'max_id_str': '349846282237509631', 'since_id_str': '0', 'refresh_url': '?since_id=349846282237509631&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349846282237509631}
setting max id to: 349845983783428095 for query 1
{'count': 100, 'completed_in': 0.065, 'max_id_str': '349845983783428095', 'since_id_str': '0', 'next_results': '?max_id=349845679331483647&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349845983783428095&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349845983783428095}
setting max id to: 349845679331483647 for query 2
{'count': 100, 'completed_in': 0.071, 'max_id_str': '349845679331483647', 'since_id_str': '0', 'next_results': '?max_id=349845415916605439&q=test&lang=en&count=100&include_entities=1', 'refresh_url': '?since_id=349845679331483647&q=test&lang=en&include_entities=1', 'since_id': 0, 'query': 'test', 'max_id': 349845679331483647}
....

This shows that there are, in fact, more results, so I'm not sure why Twitter stopped sending the 'next_results' element in the first set.
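
The manual workaround described above (take the smallest tweet ID of the last page and continue from there) boils down to a one-liner; a sketch with an illustrative function name:

```python
def resume_max_id(statuses):
    # oldest (smallest) ID on the page, minus one so that tweet is not returned again
    return min(tweet['id'] for tweet in statuses) - 1
```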

@ckoepp
Owner

ckoepp commented Jun 26, 2013

Oh, that's quite interesting...

I just had a look at what dev.twitter.com says about this. It seems pages are not the best way to search for Tweets, which is perfectly logical considering the constant change within the pages of Tweets: https://dev.twitter.com/docs/working-with-timelines

I'll try to improve TwitterSearch to navigate by IDs instead of pages.
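
The ID-based cursoring that document recommends could look roughly like this. A hedged sketch under assumed names (`fetch` stands in for whatever performs the actual API call), not the library's eventual implementation:

```python
def search_by_max_id(fetch, query):
    """Page through search results via max_id instead of next_results."""
    max_id = None
    while True:
        statuses = fetch(query, max_id)
        if not statuses:
            return  # an empty page means everything has been seen
        for status in statuses:
            yield status
        # continue strictly below the oldest tweet seen so far
        max_id = min(s['id'] for s in statuses) - 1
```

Stopping on an empty page instead of a missing next_results key sidesteps the behaviour observed in this issue.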

@ckoepp reopened this Jun 26, 2013
@jjoubert
Author

I agree. I basically implemented my own max_id navigation while still using your 'searchTweets' function.
It's very naive and surely has a lot of room for improvement (this code includes throttling to make sure I don't get rate-limited):

import sys
import time

def idNavWithThrottle(ts, tso):
    # minimum delay between queries so we stay within the rate limit
    # (180 requests per 15-minute window, i.e. one query every 5 seconds)
    min_delay_s = 15 * 60 / 180

    done = False
    tweet_counter = 0
    query_counter = 0
    next_max_id = 0
    while not done:
        # throttle so we don't exceed the rate limit
        time.sleep(min_delay_s)

        response = ts.searchTweets(tso)
        query_counter += 1
        done = len(response['content']['statuses']) == 0
        for tweet in response['content']['statuses']:
            tweet_counter += 1
            tweet_id = tweet['id']
            if (next_max_id == 0) or (tweet_id < next_max_id):
                next_max_id = tweet_id
            #print tweet

        # subtract one so the oldest tweet of this page is not returned again
        next_max_id -= 1
        sys.stderr.write('setting max id to: %i for query %i\n' % (next_max_id, query_counter))
        tso.setMaxID(next_max_id)

    sys.stderr.write('*** Found %i tweets in %i queries\n' % (tweet_counter, query_counter))
