
Support for --until #81

Closed
ppival opened this issue Sep 24, 2020 · 11 comments
Labels
enhancement New feature or request interface wontfix This will not be worked on

Comments

@ppival

ppival commented Sep 24, 2020

Another tool I have used, GetOldTweets3, supports not only the --since DATE flag but also --until DATE, which lets a Twitter scrape focus on the period between two dates. The syntax is simply --since 2015-05-24 --until 2015-08-23

Would it be possible to enable this for snscrape?

@JustAnotherArchivist
Owner

It could be implemented, but it wouldn't really gain anything. --since is scraper-independent: it works by exiting as soon as it reaches a result older than the specified date, which avoids paginating through potentially many older results and shortens the scrape. If I were to add --until, it would merely suppress the output of items newer than some date; in the background, it would still have to fetch all of those newer results. So you don't save any time or resources.
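To make the cost argument concrete, here is a toy simulation. It is only a sketch under stated assumptions: FakePaginatedSource, scrape and demo_items are hypothetical names, and the model is simply "results arrive newest-first, one page per request", not snscrape's actual internals. The since check can stop pagination early; an until check can only hide items from pages that were already fetched.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Item:
    id: int
    date: dt.date


class FakePaginatedSource:
    """Hypothetical stand-in for a scraper: yields results newest-first,
    one page per "request", and counts how many pages were requested."""

    def __init__(self, items: List[Item], page_size: int) -> None:
        self.items = items  # assumed already sorted newest-first
        self.page_size = page_size
        self.pages_fetched = 0

    def get_items(self) -> Iterator[Item]:
        for start in range(0, len(self.items), self.page_size):
            self.pages_fetched += 1  # a page is only fetched when it is needed
            yield from self.items[start:start + self.page_size]


def scrape(source: FakePaginatedSource,
           since: Optional[dt.date] = None,
           until: Optional[dt.date] = None) -> List[Item]:
    out = []
    for item in source.get_items():
        if since is not None and item.date < since:
            break  # --since: all remaining results are older, so stop paginating
        if until is not None and item.date >= until:
            continue  # hypothetical --until: hide the item; its page was already fetched
        out.append(item)
    return out


def demo_items() -> List[Item]:
    # One item per day, newest first: 2020-01-30 down to 2020-01-01.
    return [Item(i, dt.date(2020, 1, 30) - dt.timedelta(days=i)) for i in range(30)]
```

With demo_items() and a page size of 5, since=2020-01-21 returns 10 items after fetching 3 pages, while since=2020-01-01 together with until=2020-01-11 also returns 10 items but has to fetch all 6 pages.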

However, for Twitter specifically, you can use Twitter's own since and until search filters. For example, snscrape twitter-search 'foobar since:2019-01-01 until:2020-01-01' would return tweets containing 'foobar' from the year 2019. since:YYYY-MM-DD returns tweets posted on that date or later; until:YYYY-MM-DD returns tweets posted before (but not on!) that date. (This used to be documented on the Twitter website, but they have since removed it. You can find some more here by clicking on 'operators' below the search field. That list was also incomplete, though.)
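Because since: is inclusive and until: is exclusive, the two filters describe a half-open window, and the until date has to be the day after the last day you want. A minimal sketch of that arithmetic, assuming only the semantics stated in this comment; date_filter_query and in_window are hypothetical helpers, not part of snscrape:

```python
from datetime import date, timedelta


def date_filter_query(terms: str, first_day: date, last_day: date) -> str:
    """Build a search query covering first_day..last_day, both inclusive.

    since: is inclusive and until: is exclusive, so until is set to the
    day *after* last_day.
    """
    until = last_day + timedelta(days=1)
    return f"{terms} since:{first_day.isoformat()} until:{until.isoformat()}"


def in_window(day: date, since: date, until: date) -> bool:
    # Same half-open interval as the filters: since <= day < until.
    return since <= day < until


if __name__ == "__main__":
    # All of 2019, including December 31:
    print(date_filter_query("foobar", date(2019, 1, 1), date(2019, 12, 31)))
```

This reproduces the example query above, 'foobar since:2019-01-01 until:2020-01-01'.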

@Tachevaz

Tachevaz commented Sep 25, 2020

Thank you for this great tool, it's a savior!

About the "since" argument: is there a way to use it in a script as well, as an argument to snscrape.modules.twitter.TwitterUserScraper()? Or should it be applied as a condition after snscrape.modules.twitter.TwitterUserScraper() has finished running?

@JustAnotherArchivist
Owner

There are two ways to do that (without wasting time scraping things you're not actually interested in):

  1. You can check the date of the Tweet objects returned by the scraper and break out of the loop as soon as one is older than the time you're interested in. This is exactly what --since in the CLI does.
  2. The user scraper is actually just a fancy wrapper around the much more general search scraper. So you could use that instead and rely on Twitter's date filters: snscrape.modules.twitter.TwitterSearchScraper('from:username since:2020-01-01')
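A minimal sketch of approach 1, assuming the scraper yields objects newest-first and each has a timezone-aware .date attribute. tweets_since is a hypothetical helper; the snscrape calls are only shown in comments, so check the constructor arguments against your installed version. Because the generator stops pulling from the scraper, a lazily paginating scraper underneath it stops requesting further pages too.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def tweets_since(items: Iterable[Any], since: datetime) -> Iterator[Any]:
    """Yield items until the first one older than `since`, then stop.

    Assumes items arrive newest-first and each has a timezone-aware `.date`.
    """
    for item in items:
        if item.date < since:
            break  # everything after this point is older still
        yield item


# Real usage would look roughly like this (network access required):
#   import snscrape.modules.twitter as sntwitter
#   cutoff = datetime(2020, 1, 1, tzinfo=timezone.utc)
#   for tweet in tweets_since(sntwitter.TwitterUserScraper('username').get_items(), cutoff):
#       print(tweet.date)
# Approach 2 pushes the filter to Twitter instead:
#   sntwitter.TwitterSearchScraper('from:username since:2020-01-01').get_items()

if __name__ == "__main__":
    # Offline demo with fake objects that only carry a .date, newest first.
    fake = [SimpleNamespace(date=datetime(2020, 1, 10, tzinfo=timezone.utc) - timedelta(days=i))
            for i in range(10)]
    recent = list(tweets_since(fake, datetime(2020, 1, 5, tzinfo=timezone.utc)))
    print(len(recent))  # 6 (January 10 down to January 5)
```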

@Tachevaz

It worked, thank you so much!:))

@JustAnotherArchivist
Owner

Closing this; see my first comment for the rationale.

@JustAnotherArchivist JustAnotherArchivist added the wontfix This will not be worked on label Sep 28, 2020
@ppival
Author

ppival commented Sep 29, 2020

Any reason this works for twitter-search, but not for twitter-user? The following just appears to sit there forever:
snscrape twitter-user "barackobama since:2015-09-10 until:2015-09-12” > baracktweets.txt

TIA! You, @JustAnotherArchivist, are just about the most patient and responsive maintainer I've ever come across! :-)


@JustAnotherArchivist
Owner

twitter-user is actually just a wrapper around twitter-search using the search term from:username (plus code to extract user information from the profile page). So: snscrape twitter-search 'from:barackobama since:2015-09-10 until:2015-09-12'
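If you drive this from a script, one option is to build the twitter-search query for a user and pass it to snscrape as a single argument, which sidesteps shell-quoting mistakes entirely. A sketch under assumptions: user_search_command is a hypothetical helper, and the subprocess call (commented out) assumes the snscrape executable is on your PATH.

```python
import shlex
from datetime import date
from typing import List, Optional


def user_search_command(username: str,
                        since: Optional[date] = None,
                        until: Optional[date] = None) -> List[str]:
    """Build an snscrape twitter-search argv for one user's tweets.

    twitter-user wraps a 'from:username' search, so the date filters go into
    the same query string (since: inclusive, until: exclusive).
    """
    terms = [f"from:{username}"]
    if since is not None:
        terms.append(f"since:{since.isoformat()}")
    if until is not None:
        terms.append(f"until:{until.isoformat()}")
    # The whole query is one argv element, so no shell quoting is involved.
    return ["snscrape", "twitter-search", " ".join(terms)]


if __name__ == "__main__":
    cmd = user_search_command("barackobama", date(2015, 9, 10), date(2015, 9, 12))
    print(shlex.join(cmd))  # the equivalent shell command line
    # To actually run it:
    #   import subprocess
    #   with open("baracktweets.txt", "w") as f:
    #       subprocess.run(cmd, stdout=f, check=True)
```

shlex.join only renders the argument list for display; it prints the same command as the one above.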

Happy to help! I'm glad this work is finally getting some use outside of the immediate use case I wrote it for. :-)

@Uzair1947

Can we also give a time of day to until and since?

@JustAnotherArchivist
Owner

@Uzair1947 #259


Projects
None yet
Development

No branches or pull requests

5 participants