# Search Tweets

To find Tweets on a topic of interest, you can search using one of two endpoints: recent search or full-archive search. Whichever endpoint you use, you need to provide a search query to filter Tweets.
According to the Twitter documentation, "These search queries are created with a set of operators that match on Tweet and user attributes, such as message keywords, hashtags, and URLs. Operators can be combined into queries with boolean logic and parentheses to help refine the queries matching behavior."



## Requirements
The `search_tweet.py` script takes three arguments:
1. `--credential-file <CREDENTIAL_FILENAME>`
2. `--config-file <CONFIG_FILENAME>` 
3. `--env-overwrite <BOOLEAN>` (Overwrites YAML-parsed credentials with any set environment variables. Default is TRUE)
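
As a rough illustration of how these three flags could be parsed, here is a minimal `argparse` sketch. It is not the actual code of `search_tweet.py`; the helper names `str_to_bool` and `build_parser` are hypothetical, and the accepted boolean spellings are an assumption.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Interpret common spellings of true/false (assumed, not taken from the script)."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "1"}:
        return True
    if lowered in {"false", "f", "no", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the three documented flags."""
    parser = argparse.ArgumentParser(description="Search Tweets (illustrative parser)")
    parser.add_argument("--credential-file", default="~/.twitter_keys.yaml",
                        help="YAML file holding the Twitter credentials")
    parser.add_argument("--config-file",
                        help="file holding the search, request and output parameters")
    parser.add_argument("--env-overwrite", type=str_to_bool, default=True,
                        help="overwrite YAML credentials with set environment variables")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--credential-file", "config/.twitter_keys.yaml",
     "--config-file", "config/api_config.config"]
)
print(args.credential_file, args.config_file, args.env_overwrite)
```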


### 1. Credential file
The credential file holds your Twitter credentials. The simplest credential file should look like this:

```
search_tweets_v2:
  endpoint:  https://api.twitter.com/2/tweets/search/...
  consumer_key: ek...
  consumer_secret: hy...
  bearer_token: AA...
```

By default, this library expects this file at `~/.twitter_keys.yaml`, but you can pass the relevant location as needed with the `--credential-file <CREDENTIAL_FILENAME>` flag for the command-line app.

#### Recent Search
To execute a recent search, the endpoint in the credential file must end in `recent`. The ‘recent’ search endpoint provides Tweets from the **past 7 days**.

```
search_tweets_v2:
  endpoint:  https://api.twitter.com/2/tweets/search/recent
  consumer_key: ek...
  consumer_secret: hy...
  bearer_token: AA...
```

#### Full-archive Search
To execute a full-archive search, the endpoint in the credential file must end in `all`. The ‘all’ search endpoint, launched in January 2021 as part of the ‘academic research’ tier of Twitter API v2 access, provides access to all publicly available Tweets posted **since March 2006**.

```
search_tweets_v2:
  endpoint:  https://api.twitter.com/2/tweets/search/all
  consumer_key: ek...
  consumer_secret: hy...
  bearer_token: AA...
```
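
Because the two setups differ only in the last path segment of `endpoint`, a quick check before a run can catch a mistyped URL. The following is a minimal sketch; the function `search_kind` is hypothetical and not part of the script.

```python
from urllib.parse import urlparse


def search_kind(endpoint: str) -> str:
    """Return which search an endpoint URL points at, based on its last path segment."""
    last_segment = urlparse(endpoint).path.rstrip("/").rsplit("/", 1)[-1]
    if last_segment == "recent":
        return "recent"        # Tweets from the past 7 days
    if last_segment == "all":
        return "full-archive"  # Tweets posted since March 2006
    raise ValueError(f"endpoint does not end in 'recent' or 'all': {endpoint!r}")


print(search_kind("https://api.twitter.com/2/tweets/search/recent"))
print(search_kind("https://api.twitter.com/2/tweets/search/all"))
```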

### 2. Configuration file
The configuration file (i.e., `api_config.config`) contains all parameters. Placing all parameters in one file is far easier than passing each of them as a command-line argument. If a valid configuration file is found, all arguments are populated from it. N.B. Remaining command-line arguments will overrule arguments found in the config file (if `--env-overwrite` is not set to FALSE).

An example of such a config-file:

```
[search_rules]
start_time = 2020-05-01
end_time = 2020-06-01
query = (snow OR rain) has:media -is:retweet
tweet_fields = id,created_at,text

[search_params]
results_per_call = 10
max_tweets = 10

[output_params]
save_file = True
filename_prefix = output/weather_pics3
results_per_file = 10
```
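
A file in this layout can be read with Python's standard `configparser` module. The sketch below only shows how the sections and values come out; whether `search_tweet.py` reads the file this way is an assumption. Interpolation is switched off so that a `%` in a query would be kept literally.

```python
import configparser

# Same content as the example config file above, inlined so the sketch is self-contained.
EXAMPLE_CONFIG = """
[search_rules]
start_time = 2020-05-01
end_time = 2020-06-01
query = (snow OR rain) has:media -is:retweet
tweet_fields = id,created_at,text

[search_params]
results_per_call = 10
max_tweets = 10

[output_params]
save_file = True
filename_prefix = output/weather_pics3
results_per_file = 10
"""

parser = configparser.ConfigParser(interpolation=None)
parser.read_string(EXAMPLE_CONFIG)

# Values are stored as strings; convert them explicitly where needed.
query = parser["search_rules"]["query"]
tweet_fields = parser["search_rules"]["tweet_fields"].split(",")
max_tweets = parser.getint("search_params", "max_tweets")
save_file = parser.getboolean("output_params", "save_file")

print(query)
print(tweet_fields, max_tweets, save_file)
```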

### Search rules
In the config-file you can enter all search rules necessary for your research. 

Here are some examples:
```
[search_rules]
start_time = <Start of datetime window, format ‘YYYY-mm-DDTHH:MM’> 
end_time = <End of datetime window, format ‘YYYY-mm-DDTHH:MM’>
tweet_fields = <A comma-delimited list of Tweet JSON attributes to include in endpoint responses>
place_fields = <A comma-delimited list of Twitter Place JSON attributes to include in endpoint responses>
query = <Search query>
```
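
The example config file uses date-only values, while the format above includes a time. A small validation step, sketched below, accepts both; the helper `parse_window_bound` is hypothetical, and which formats the script itself accepts is an assumption.

```python
from datetime import datetime


def parse_window_bound(value: str) -> datetime:
    """Parse a start_time/end_time value given as 'YYYY-mm-DDTHH:MM' or 'YYYY-mm-DD'."""
    for fmt in ("%Y-%m-%dT%H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unsupported datetime format: {value!r}")


start = parse_window_bound("2020-05-01")
end = parse_window_bound("2020-06-01T12:30")

# A search window only makes sense if it starts before it ends.
if start >= end:
    raise ValueError("start_time must be earlier than end_time")
print(start.isoformat(), end.isoformat())
```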

#### Search query
In the query specification, you enter how you wish to filter Tweets. Commonly used operators are:
* `<key_word1> OR <key_word2>` (matches Tweets that include either key_word1 or key_word2)
* `lang:<lang>` (matches only Tweets in a specific language. Example: lang:en selects only English Tweets)
* `-is:<type>` ('-' is a negation operator; excludes certain types of Tweets. Example: -is:retweet excludes retweets, leaving only original Tweets)
* `-<key_word>` ('-' is a negation operator; excludes Tweets that contain key_word)
* `has:<prop>` (matches Tweets that have a specific property. Example: has:geo selects Tweets with Tweet-specific geolocation data provided by the Twitter user)

For example, to find original Tweets in English that mention happy or happiness and contain at least one hashtag, but are not related to birthday wishes, write:
```
query = (happy OR happiness) lang:en -birthday -is:retweet has:hashtags
```
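
When queries are put together in code, a small helper keeps the operators consistent, including the upper-case `OR`. This is a minimal sketch; `build_query` and its parameters are hypothetical names, and it covers only the operators listed above.

```python
def build_query(any_of, lang=None, exclude_words=(), exclude_types=(), has=()):
    """Assemble a search query string from the operators described above."""
    parts = []
    if len(any_of) > 1:
        # Group the alternatives so OR applies only to the keywords.
        parts.append("(" + " OR ".join(any_of) + ")")
    elif any_of:
        parts.append(any_of[0])
    if lang:
        parts.append(f"lang:{lang}")
    parts.extend(f"-{word}" for word in exclude_words)
    parts.extend(f"-is:{kind}" for kind in exclude_types)
    parts.extend(f"has:{prop}" for prop in has)
    return " ".join(parts)


query = build_query(
    ["happy", "happiness"],
    lang="en",
    exclude_words=["birthday"],
    exclude_types=["retweet"],
    has=["hashtags"],
)
print(query)
```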

For an extensive overview of how to structure a query, see Twitter's [build-a-query guide](https://developer.twitter.com/en/docs/twitter-api/tweets/counts/integrate/build-a-query).

### Search parameters
In the config-file you can enter all search parameters necessary for your research. 

Here are some examples:
```
[search_params]
results_per_call = <Number of results to return per call (default 10; max 100)>
max_tweets = <Maximum number of Tweets to return for this session of requests>
max_pages = <Maximum number of pages/API calls to use for this session>
```
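
These three parameters together bound how many API calls a session makes. The sketch below assumes that every call returns a full page of `results_per_call` Tweets and that `max_pages`, when set, caps the number of calls; `calls_needed` is a hypothetical helper, not a function of the script.

```python
import math


def calls_needed(max_tweets, results_per_call=10, max_pages=None):
    """Estimate the number of API calls a session will use (upper bound)."""
    if not 1 <= results_per_call <= 100:
        raise ValueError("results_per_call must be between 1 and 100")
    calls = math.ceil(max_tweets / results_per_call)
    if max_pages is not None:
        # max_pages stops the session even if max_tweets has not been reached.
        calls = min(calls, max_pages)
    return calls


print(calls_needed(max_tweets=250, results_per_call=100))
print(calls_needed(max_tweets=250, results_per_call=100, max_pages=2))
```

With `max_pages=2` in the second call, the session would stop after at most 200 Tweets even though `max_tweets` is 250.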


### Output parameters
In the config-file you can enter all output parameters necessary for your research. 

Here are some examples:
```
[output_params]
save_file = True
filename_prefix = <prefix for the filename where tweet json data will be stored>
results_per_file = <Maximum tweets to save per file>
```
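
To illustrate how `filename_prefix` and `results_per_file` could interact, the sketch below splits a list of Tweets over several files. The `<prefix>_<index>.json` naming scheme and the one-JSON-object-per-line layout are assumptions made for illustration; the script's real output naming may differ.

```python
import json
import tempfile
from pathlib import Path


def write_in_chunks(tweets, filename_prefix, results_per_file):
    """Write tweets to numbered files holding at most results_per_file tweets each."""
    paths = []
    for index, start in enumerate(range(0, len(tweets), results_per_file)):
        path = Path(f"{filename_prefix}_{index}.json")  # hypothetical naming scheme
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", encoding="utf-8") as handle:
            for tweet in tweets[start:start + results_per_file]:
                handle.write(json.dumps(tweet) + "\n")
        paths.append(path)
    return paths


# Made-up Tweets, written to a temporary directory so nothing is left in the project.
fake_tweets = [{"id": str(i), "text": f"tweet {i}"} for i in range(25)]
prefix = Path(tempfile.mkdtemp()) / "output" / "weather_pics3"
paths = write_in_chunks(fake_tweets, prefix, results_per_file=10)
print([p.name for p in paths])
```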


## Running script
After you've filled in your credentials in `.twitter_keys.yaml` and entered all parameters needed for your Twitter search in `api_config.config`, you can run the script from the command line with the following commands:

```
cd tweet_collector
python3 src/search_tweet.py --credential-file "config/.twitter_keys.yaml" --config-file "config/api_config.config"
```