How would I scrape "random" public posts between two dates? #2180
-
I have posted the same question here.
-
@josh-ashkinaze I think I've found a way. I needed to find all the posts from an account within a date range, and I worked it out by analyzing how the web app translates its search parameters into API calls.
It appears to be an undocumented trick; the official documentation makes no mention of it:
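For illustration only, here is a minimal sketch of how a date-restricted search request could be built, assuming `searchPosts` accepts `since` and `until` as ISO 8601 UTC timestamps alongside `q` (the since/until parameters are mentioned in a later reply in this thread). The host `public.api.bsky.app` and the helper names `iso_utc` and `build_search_url` are assumptions, not confirmed details of the trick above; check the current lexicon before relying on them.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

# Assumption: public AppView host; it may differ or require authentication.
BASE_URL = "https://public.api.bsky.app/xrpc/app.bsky.feed.searchPosts"


def iso_utc(dt: datetime) -> str:
    """Format a datetime as an ISO 8601 UTC string ending in 'Z'."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_search_url(query: str, since: datetime, until: datetime,
                     limit: int = 100, cursor: Optional[str] = None) -> str:
    """Build a searchPosts URL for posts in [since, until) (hypothetical helper)."""
    params = {
        "q": query,
        "since": iso_utc(since),
        "until": iso_utc(until),
        "limit": limit,
    }
    if cursor is not None:
        params["cursor"] = cursor  # continue paging through one result set
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, `build_search_url("the", datetime(2024, 1, 1), datetime(2024, 1, 2))` produces a URL for a single UTC day; the colons in the timestamps are percent-encoded by `urlencode`.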
-
Disclaimer: this might be a terrible way of doing this, and I'm sure there are limitations I haven't found yet related to how searchPosts works (for example, it seems that automated posts by bots are excluded from search), but it works for my low-stakes purposes. I wanted to download all of the posts for a given day in order to play with topic modeling over time. I mostly do data science and machine learning work, but I haven't spent much time with bulk text data. So I threw together a tool that breaks each day into a set of searchPosts since/until time intervals. For each interval, it asynchronously fetches as many 1000 page blocks of posts as the interval contains, iteratively setting the until parameter to the earliest timestamp in the previous block until it reaches the since value or the query returns no new posts. It then does some cleanup and saves the results as JSON or inserts them into SQLite or Postgres. I run it at no more than half the rate limit because nothing I'm doing is time-sensitive. I can clean it up and post it as a gist or something if anyone is interested.
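A minimal, sequential sketch of the interval logic described above (the actual tool is asynchronous and its code isn't shown in this thread). `split_day`, `walk_window`, `parse_ts`, and the injected `fetch` callable are hypothetical names: `fetch` stands in for whatever pages through searchPosts for one block, and is assumed to return posts flattened to at least a `uri` and a `createdAt` timestamp.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterator, List, Tuple

Post = Dict[str, str]  # assumption: each post has at least "uri" and "createdAt"
# Hypothetical fetcher: (since, until) -> one block of posts in that window.
FetchBlock = Callable[[datetime, datetime], List[Post]]


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-01T12:00:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_day(day: datetime, hours: int = 1) -> List[Tuple[datetime, datetime]]:
    """Split the UTC day containing `day` into consecutive (since, until) windows."""
    if day.tzinfo is None:
        day = day.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    else:
        day = day.astimezone(timezone.utc)
    start = day.replace(hour=0, minute=0, second=0, microsecond=0)
    end = start + timedelta(days=1)
    step = timedelta(hours=hours)
    windows = []
    while start < end:
        windows.append((start, min(start + step, end)))
        start += step
    return windows


def walk_window(since: datetime, until: datetime, fetch: FetchBlock) -> Iterator[Post]:
    """Yield posts in [since, until), moving `until` back to each block's earliest post."""
    seen = set()
    while until > since:
        block = fetch(since, until)
        new = [p for p in block if p["uri"] not in seen]
        if not new:
            break  # the query returned no new posts
        for post in new:
            seen.add(post["uri"])
            yield post
        earliest = min(parse_ts(p["createdAt"]) for p in new)
        if earliest >= until:
            break  # guard against a fetcher that ignores `until`
        until = earliest  # next block ends where this one began
```

Because `until` is treated as exclusive here, posts that share the exact timestamp of a block's earliest post but didn't fit in that block could be skipped; stepping `until` forward by a millisecond and relying on the `seen` set is one way around that.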
-
For a project, I am interested in scraping random public posts between two dates. If I can't do random, then I am fine with searching via a proxy like "the", "of", "a", etc.
Now, I know the `app.bsky.feed.searchPosts` endpoint can search posts by query, but is there a way to search for posts within a date range? It looks like it only takes a query string as a parameter, not a date. Any help is appreciated!
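If the goal is a random subset rather than every matching post, one option (a swapped-in technique, not something proposed in this thread) is reservoir sampling (Algorithm R): stream the posts returned for a proxy query such as "the" over the date range and keep a uniform sample of `k` of them without storing the rest. This is only uniform over what the search returns, not over all public posts. The function name `sample_posts` is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def sample_posts(stream: Iterable[T], k: int,
                 rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from `stream` (Algorithm R)."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform over 0..i, inclusive
            if j < k:
                reservoir[j] = item  # keeps item i with probability k / (i + 1)
    return reservoir
```

Passing a seeded `random.Random` makes a sample reproducible, which helps when rerunning an analysis over the same date range.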