Adding query parameters to search terms imported via .txt? #688

Closed
lutemis opened this issue Feb 8, 2023 · 1 comment
Comments

lutemis commented Feb 8, 2023

Hi,

I have no coding experience and began using Twarc to collect Twitter data today. I'm looking to add the parameters -is:retweet lang:en to my query, but I'm not sure how to incorporate them along with my search terms, which I'm importing via .txt. As it stands, my full query reads: twarc2 search searchtermstest2.txt --limit 10000 --start-time 2020-03-16 --end-time 2021-01-08 --archive searchresults1.jsonl while searchterms2.txt contains:

example OR examples
#idea1 OR #idea2 OR #idea3 OR #idea4
-is:retweet lang:en

The aim is to collate tweets containing mentions of either "example" or "examples" in addition to one or more of the hashtags written. I'm unsure whether '-is:retweet lang:en' should be contained within the .txt file or written in the query.

Apologies for bringing such a question here; the Twitter Dev Forums are unavailable right now and I wasn't sure who else to consult.

@igorbrigadir
Contributor

Sure thing. The twarc2 search command only takes a single query as input, not a text file. For a file of queries you need searches:

twarc2 searches --limit 10000 --start-time 2020-03-16 --end-time 2021-01-08 --archive input.txt output.jsonl

The input text file input.txt holds one search query per line, and all the results are written to the same output file, output.jsonl in this case. You can combine the queries with OR using the --combine-queries flag. This sometimes speeds things up, but usually it's not necessary.

Note that the progress bar becomes less responsive this way, so it may look like twarc isn't working when it is. This is still an open issue (#561), and the same applies to the searches command and all bulk commands.

To get your search in the text file to work, put it all on one line, so:

(example OR examples OR #idea1 OR #idea2 OR #idea3 OR #idea4) -is:retweet lang:en

And add parentheses around OR clauses.
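If you end up with a longer list of terms, it may be easier to generate that line than to type it. Here is a minimal Python sketch of the idea, assuming you keep the terms in a list. The function name build_query is made up for illustration and is not part of twarc:

```python
def build_query(terms, operators):
    """Join terms with OR, wrap them in parentheses, then append the operators.

    Hypothetical helper for illustration only; not part of twarc.
    """
    or_clause = " OR ".join(terms)
    return f"({or_clause}) {' '.join(operators)}"


terms = ["example", "examples", "#idea1", "#idea2", "#idea3", "#idea4"]
query = build_query(terms, ["-is:retweet", "lang:en"])
print(query)
```

The printed line can be pasted into input.txt as a single query.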

If those were meant to be separate queries, you have to add the operators to every line:

(example OR examples) -is:retweet lang:en
(#idea1 OR #idea2 OR #idea3 OR #idea4) -is:retweet lang:en
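Adding the operators to every line by hand gets tedious with many queries. As a rough sketch, assuming each draft line holds one group of terms, something like this would do it. The name add_operators and the default operator string are my own choices for illustration, not anything twarc provides:

```python
def add_operators(lines, operators="-is:retweet lang:en"):
    """Return one query per non-empty line, with the shared operators appended.

    Hypothetical helper for illustration only; not part of twarc.
    """
    queries = []
    for line in lines:
        clause = line.strip()
        if not clause:
            continue  # skip blank lines
        # Wrap OR clauses in parentheses unless they already are.
        if " OR " in clause and not clause.startswith("("):
            clause = f"({clause})"
        queries.append(f"{clause} {operators}")
    return queries


draft = ["example OR examples", "#idea1 OR #idea2 OR #idea3 OR #idea4"]
queries = add_operators(draft)
for q in queries:
    print(q)
```

Saving the printed lines to input.txt, one query per line, gives the file that twarc2 searches expects.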

Hope that helps! See https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query on building queries in general.
