Tweepy urllib3 IncompleteReadError #1

Kydlaw opened this issue Apr 7, 2020 · 5 comments

Comments

@Kydlaw
Owner

Kydlaw commented Apr 7, 2020

Exception raised during streaming.

Future exception example

@Kydlaw
Owner Author

Kydlaw commented Apr 7, 2020

Main topic on Tweepy: tweepy/tweepy#237

"The error seems to be indicative of a full buffer disconnect."
"This for me was happening due to a lot of processing in the on_status function leading to a delay from the API output. The best way to do this was to use threading to not clog the function for too long"

Possible solutions:

tweepy/tweepy#908

import json
import time
from http.client import IncompleteRead as http_incompleteRead
from urllib3.exceptions import IncompleteRead as urllib3_incompleteRead

def on_data(self, data):
    """
    This function overrides the on_data function in the tweepy package.
    It is called when raw data is received from the tweepy connection.
    :param data: the raw JSON string describing the tweet
    :return: True/False. Return False to stop stream and close connection.
    """
    if (time.time() - self.startTime) < self.timeLimit:
        try:
            tweet = json.loads(data)  # parse once and reuse the result
            # Filter out retweets
            if tweet['text'][:2] != 'RT':
                self.printStreamingTweet(tweet)
                if not debug_mode:
                    self.tweetFound.emit(tweet)
                return [tweet]
            return True  # retweet skipped; keep the stream open
        # The specific handlers come first: placed after a generic handler
        # they would never be reached.
        except http_incompleteRead as e:
            print("http.client Incomplete Read error: %s" % str(e))
            print("~~~ Restarting stream search in 5 seconds... ~~~")
            time.sleep(5)
            #restart stream - simple as return true just like previous exception?
            return True
        except urllib3_incompleteRead as e:
            print("urllib3 Incomplete Read error: %s" % str(e))
            print("~~~ Restarting stream search in 5 seconds... ~~~")
            time.sleep(5)
            return True
        # Exception rather than BaseException, so Ctrl+C still stops the program
        except Exception as e:
            print("Error on_data: %s, Pausing..." % str(e))
            time.sleep(5)
            return True
    else:
        if not debug_mode:
            self.tweetFound.emit(False)
        print('Timed out!')
        return False

Other proposition:

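# the regular imports, as well as this:
from urllib3.exceptions import ProtocolError

###################
# loads of functions here
###################

auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())

while True:
    try:
        # Run the stream blocking. `async` is a reserved word since Python 3.7
        # (Tweepy 3.7+ renamed the parameter to `is_async`), and an async call
        # returns immediately, so this loop would keep opening new streams
        # and the except clause below would never see the error.
        twitterStream.filter(track=[chip])

    except (ProtocolError, AttributeError):
        continue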

@Kydlaw
Owner Author

Kydlaw commented Apr 7, 2020

https://stackoverflow.com/questions/26638329/incompleteread-error-when-retrieving-twitter-data-using-python

Reasons mentioned:

  • "Your program isn't keeping up with the stream"
  • "The IncompleteRead could also be due to a temporary network issue"

Possible solutions:

  • Use Tweepy's stall_warnings parameter:
    stream.filter(track=[t], stall_warnings=True)
    Then use these warning messages to react before the connection fails.

  • Use a queue, for example https://github.com/nvie/rq/.
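
A minimal sketch of reacting to a stall warning. It assumes the warning arrives as a JSON message shaped like {"warning": {"code": "FALLING_BEHIND", "percent_full": 60}}, which should be checked against what the stream actually delivers. The helper names and the 50% threshold are hypothetical, not part of Tweepy.

```python
import json

def stall_percent(raw_message):
    """Return percent_full from a stall warning, or None for any other message."""
    try:
        message = json.loads(raw_message)
    except (TypeError, ValueError):
        return None  # not JSON at all
    if not isinstance(message, dict):
        return None
    warning = message.get("warning")
    # Assumed message shape; only FALLING_BEHIND warnings are of interest here
    if not isinstance(warning, dict) or warning.get("code") != "FALLING_BEHIND":
        return None
    percent = warning.get("percent_full")
    return percent if isinstance(percent, (int, float)) else None

def should_shed_load(raw_message, threshold=50):
    """True when the stream reports its buffer at or above the threshold (in percent)."""
    percent = stall_percent(raw_message)
    return percent is not None and percent >= threshold
```

A listener could call should_shed_load on each raw message and, when it returns True, skip optional work or hand tweets to a queue instead of processing them inline.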

@Kydlaw
Owner Author

Kydlaw commented Apr 7, 2020

Source: https://stackoverflow.com/questions/28717249/error-while-fetching-tweets-with-tweepy

Reasons mentioned:

  • Remove the language parameter in .filter() (not my case).
  • Consumption falls behind the stream.

Proposed solutions:

  • Reconnect after losing the connection:
from http.client import IncompleteRead # Python 3
...
while True:
    try:
        # Connect/reconnect the stream
        stream = Stream(auth, listener)
        # DON'T run this approach async or you'll just create a ton of streams!
        # Pass the terms by keyword: the first positional argument of filter() is `follow`
        stream.filter(track=terms)
    except IncompleteRead:
        # Oh well, reconnect and keep trucking
        continue
    except KeyboardInterrupt:
        # Or however you want to exit this loop
        stream.disconnect()
        break
...
  • Use a queue to consume tweets later: RabbitMQ, Kafka, Redis...
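
The reconnect loop above retries immediately and forever. A possible variant, sketched here with exponential backoff and a retry limit (neither is in the original answer), avoids hammering the API when the failure persists. `connect` is a hypothetical blocking callable, for example a function wrapping Stream(auth, listener).filter(track=terms); the delays and limits are arbitrary.

```python
import time
from http.client import IncompleteRead

def run_with_reconnect(connect, max_retries=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call connect() until it returns normally, backing off after each IncompleteRead.

    Returns the number of reconnects performed. Re-raises IncompleteRead once
    max_retries reconnects have already been attempted.
    """
    retries = 0
    while True:
        try:
            connect()  # blocks for as long as the stream stays healthy
            return retries
        except IncompleteRead:
            if retries >= max_retries:
                raise
            # Double the wait on every failure, capped at max_delay
            delay = min(base_delay * 2 ** retries, max_delay)
            sleep(delay)
            retries += 1
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default time.sleep applies.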

@Kydlaw
Owner Author

Kydlaw commented Apr 7, 2020

https://stackoverflow.com/questions/53326879/twitter-streaming-api-urllib3-exceptions-protocolerror-connection-broken-i

Solutions tried:

  • A try/except block catching http.client.IncompleteRead.
  • Setting stall_warnings to True.
  • Removing the English language filter.

Solution found:
Strip your "on_status/on_data/on_success" function to the bare essentials and handle any computations, e.g. storing or entity identification, separately after the streaming session has closed. Alternatively, you could make your computations efficient enough that the delay becomes insubstantial; it is up to you.
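
One way to read "bare essentials" is sketched below: the callback only appends the raw payload to a file, and all parsing happens after the session has closed. RawTweetSpool and process_spool are hypothetical names, and the sketch assumes each payload is a single-line JSON string, as the streaming API normally delivers it.

```python
import json

class RawTweetSpool:
    """Hypothetical helper whose on_data does nothing but write the raw payload."""

    def __init__(self, path):
        self.path = path
        self._fh = open(path, "a", encoding="utf-8")

    def on_data(self, data):
        # The only work done while streaming: one line appended to the file
        self._fh.write(data.strip() + "\n")
        return True  # keep the stream open

    def close(self):
        self._fh.close()

def process_spool(path):
    """Parse the spooled payloads once the streaming session is over."""
    tweets = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank keep-alive lines
                tweets.append(json.loads(line))
    return tweets
```

In a real listener the same on_data body would sit inside the tweepy listener subclass; the expensive steps (storage, entity identification) would then run over the result of process_spool.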

@Kydlaw
Owner Author

Kydlaw commented Apr 7, 2020

https://stackoverflow.com/questions/48034725/tweepy-connection-broken-incompleteread-best-way-to-handle-exception-or-can

Solution found:

  • Use a queue and a different thread for computation.
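from queue import Queue  # Python 3 (the module was named Queue in Python 2)
from threading import Thread

import tweepy

class My_Parser(tweepy.StreamListener):

    def __init__(self, q=None):
        super().__init__()  # let StreamListener set up its own state
        num_worker_threads = 4
        # Create the queue per instance: a Queue() default argument
        # would be shared by every listener
        self.q = q if q is not None else Queue()
        for i in range(num_worker_threads):
            t = Thread(target=self.do_stuff)
            t.daemon = True
            t.start()

    def on_data(self, data):
        # Hand the raw payload to the workers and return immediately
        self.q.put(data)

    def do_stuff(self):
        while True:
            # do_whatever is a placeholder for the actual processing
            do_whatever(self.q.get())
            self.q.task_done()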
