scrape mode should not duplicate tweets #7
Comments
That's weird, it was designed to not behave that way, and I've seen it working properly in the past. Can you share your log file?
@ruebot thanks for replicating the bug; I haven't had time to look into it yet, but will do shortly.
Thanks! @edsu++
Yes, my query was `./twarc.py --scrape '@nichtich'`. I just confirmed the bug on a fresh installation.
After running twarc for two days I analyzed the output and found that it downloads the same tweets over and over again. The script should hold a set of known tweet ids and only emit tweets that have not been written before.
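To illustrate what I mean by holding a set of known tweet ids, here is a minimal sketch. It is not twarc's actual code: `unique_tweets` is a hypothetical helper, and I'm assuming each tweet is a dict with an `"id"` key, as in the JSON the Twitter API returns.

```python
def unique_tweets(tweets, seen_ids=None):
    """Yield only tweets whose id has not been emitted before.

    `tweets` is any iterable of dicts with an "id" key (assumption).
    `seen_ids` is an optional set that the caller can keep around so
    that ids are remembered across several scrape passes.
    """
    if seen_ids is None:
        seen_ids = set()
    for tweet in tweets:
        tweet_id = tweet["id"]
        if tweet_id in seen_ids:
            # Already written once; skip the duplicate.
            continue
        seen_ids.add(tweet_id)
        yield tweet


if __name__ == "__main__":
    # Made-up sample data: ids 1 and 2 appear twice.
    sample = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}, {"id": 2}]
    for tweet in unique_tweets(sample):
        print(tweet["id"])
```

Passing the same `seen_ids` set into each pass would keep duplicates out for the lifetime of the process. For a run lasting days the set would also have to be saved to disk, or rebuilt from the existing output file, to survive a restart.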