# Guidelines for analyzing historical Twitter data

To analyze historical Twitter data (i.e., all public Tweets, going back to the first Tweet in March 2006), follow these steps:

1. Get an approved 'Academic Research' developer account
2. Enter Bearer Token
3. Determine your filter options
4. Collect relevant tweets
5. Analyze data

## 1. Get an Academic Research developer account
To get an account:

1. Go to https://developer.twitter.com/en/solutions/academic-research
2. Click on `Apply for an account`
3. Click on `Start Academic Research application`
4. Fill in the requested information (about your research)

You should get a response within about 7 days.

## 2. Enter Bearer Token
After you've been granted an Academic Research developer account, Twitter will provide you with credentials that are unique to your account:
* API Key
* API Secret Key
* Bearer Token
* Access Token and Access Token Secret

For the following steps, you'll only need the **Bearer Token**.

To be able to connect to the Twitter API, you need to save this Bearer Token as an *environment variable*. 
To do this, open your terminal and run the command for your operating system:

* **Mac/Linux**: `export BEARER_TOKEN='<insert_bearer_token_here>'`
* **Windows** (Command Prompt): `SET BEARER_TOKEN=<insert_bearer_token_here>`

A variable set this way only exists in that terminal session, so run the scripts from the same window (or set the variable again after opening a new one).

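If this doesn't work, you can also set the variable from within Python. This only affects the current Python session (and any programs started from it), so it only helps if you start the scripts from that same session, for example from a Jupyter notebook:

```python
import os

# Set the Bearer Token for this Python process only
os.environ['BEARER_TOKEN'] = '<insert_bearer_token_here>'
```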
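Whichever method you use, the token has to be visible to the process that runs the scripts. The sketch below, which assumes only the variable name `BEARER_TOKEN` used above, checks for it from Python and builds the `Authorization: Bearer <token>` header that the Twitter API v2 expects. The helper `bearer_headers` is a hypothetical name and not part of the scripts used in this guide.

```python
import os


def bearer_headers(env_var: str = "BEARER_TOKEN") -> dict:
    """Build HTTP headers from a Bearer Token stored in an environment variable.

    Hypothetical helper: the scripts in this guide may read the token differently.
    """
    token = os.environ.get(env_var)
    if not token:
        # Missing or empty in this process: set it again as described in step 2.
        raise RuntimeError(f"Environment variable {env_var!r} is not set")
    # Twitter API v2 expects OAuth 2.0 Bearer authentication in this header.
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    try:
        bearer_headers()
        print("BEARER_TOKEN is set")  # never print the token itself
    except RuntimeError as err:
        print(err)
```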

## 3. Determine your filter options

To select the Tweets that are relevant to your research, the code takes four 'filter options':
1) **query_params**: How do you want to filter the Tweets? <br>
2) **output_dir**: In what folder do you want to save the extracted Tweets? <br>
3) **output_file_name**: What name do you want to give the output file? <br>
4) **output_csv**: Do you want to save the output file as .csv or as .json (default)? <br>


### 3.1. Query parameters
**query_params** = `dictionary`

There is an extensive list of arguments you can use to fine-tune how the Tweets are filtered. Some basic options are already listed in the `options.json` file, such as:
* **query**: the most important argument, as it specifies what the Tweets will be filtered on (in this case: Tweets that contain the hashtag '#EurovisionAgain', are written in English, have a geo-location, and are not retweets)
* **max_results**: the maximum number of Tweets to return per request (the Twitter API accepts 10 to 100 for recent search and 10 to 500 for full-archive search)
* **end_time**: the most recent UTC timestamp up to which Tweets will be returned (format: `YYYY-MM-DDTHH:mm:ssZ`)
* **expansions**: requests additional data objects related to the returned Tweets (in this case, the place attached to each Tweet's geo-location)
* **tweet.fields**: which specific Tweet fields will be included in each returned Tweet object <br>

For more options, see: https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent
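The `query` string is a sequence of search operators separated by spaces, which the Twitter API combines with a logical AND: a keyword or hashtag matches Tweets that contain it, `lang:en` restricts the language, `has:geo` keeps only Tweets with geo data, and a leading `-` negates an operator (so `-is:retweet` excludes retweets). As a minimal sketch, a query like the one in the example below could be assembled programmatically; `build_query` and its parameters are hypothetical names, not part of the scripts.

```python
def build_query(terms, lang=None, has_geo=False, exclude_retweets=False):
    """Assemble a Twitter API v2 search query from its parts (hypothetical helper).

    Space-separated operators are combined with a logical AND by the API.
    """
    parts = list(terms)  # keywords or hashtags, e.g. "#EurovisionAgain"
    if lang:
        parts.append(f"lang:{lang}")  # restrict to one language, e.g. "en"
    if has_geo:
        parts.append("has:geo")  # only Tweets with geo data
    if exclude_retweets:
        parts.append("-is:retweet")  # "-" negates the operator
    return " ".join(parts)


query = build_query(["#EurovisionAgain"], lang="en", has_geo=True, exclude_retweets=True)
print(query)  # → #EurovisionAgain lang:en has:geo -is:retweet
```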

```python
# Example: select English Tweets that mention #EurovisionAgain, have a geo-location, and are not retweets.
# Return the latest 100 Tweets up to now (no end_time), and also expand their geo-location.
# For each selected Tweet, return its id, author id, creation date, text, entities, and geo-location.
{
    "query": "#EurovisionAgain lang:en has:geo -is:retweet",
    "max_results": 100,
    "end_time": "",
    "expansions": "geo.place_id",
    "tweet.fields": "id,author_id,created_at,text,entities,geo"
}
```
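The example leaves `end_time` empty. If you do want to set it, the value must be a UTC timestamp in the format `YYYY-MM-DDTHH:mm:ssZ`. A minimal sketch for producing one from a Python `datetime`, using the hypothetical helper name `to_twitter_timestamp` and assuming that datetimes without a time zone are already in UTC:

```python
from datetime import datetime, timezone


def to_twitter_timestamp(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:mm:ssZ in UTC (hypothetical helper).

    Assumption: naive datetimes (no tzinfo) are already in UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert time-zone-aware datetimes to UTC before formatting.
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_twitter_timestamp(datetime(2020, 5, 16, 23, 59, 59)))  # → 2020-05-16T23:59:59Z
```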

### 3.2. Output directory
**output_dir** = `string`

This is the path to the folder where the output file containing all extracted Tweets will be saved. 

```python
# Example: to save the filtered Tweets in the folder 'output', use:
"output"
```
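This guide doesn't say whether the scripts create the output folder if it doesn't exist yet. If they don't, a minimal sketch using Python's `pathlib` can create it beforehand; the helper name `ensure_output_dir` is hypothetical.

```python
from pathlib import Path


def ensure_output_dir(output_dir: str) -> Path:
    """Create the output folder (and any missing parents) if needed; return it as a Path."""
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    return path


print(ensure_output_dir("output").resolve())
```

`exist_ok=True` makes the call safe to repeat, and `parents=True` also creates any missing intermediate folders.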

### 3.3. Output file name
**output_file_name** = `string`

This is the name that will be given to the output file containing all extracted Tweets.

```python
# Example: to save the filtered Tweets in a file named 'extracted_tweets', use:
"extracted_tweets"
```

### 3.4. Output file type
**output_csv** = `boolean`

When this option is `True`, the output file will be saved as a .csv file in the output directory. If this option is `False` (or left empty), the output file will be saved as a .json file. 

```python
# Example: to save the extracted Tweets as a .json file, use:
False
```
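Together, `output_dir`, `output_file_name`, and `output_csv` determine where the results end up. Assuming the scripts add the file extension themselves (this guide doesn't confirm that), the resulting path could be computed as in this sketch; `output_path` is a hypothetical helper.

```python
from pathlib import Path


def output_path(output_dir: str, output_file_name: str, output_csv: bool = False) -> Path:
    """Return the assumed location of the output file (hypothetical helper).

    Assumption: the script appends .csv when output_csv is True, otherwise .json.
    """
    extension = ".csv" if output_csv else ".json"
    return Path(output_dir) / f"{output_file_name}{extension}"


print(output_path("output", "extracted_tweets"))  # output/extracted_tweets.json on macOS/Linux
```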

## 4. Collect relevant Tweets

Now that you've saved the Bearer Token as an environment variable and have determined how you want to filter the Tweets and where you want to save the results, you can run the code. To do so, follow these steps:
1) Open your terminal <br>
2) Type the following on a single line:
   * Python interpreter: `python3` <br>
   * script you want to run: `current_search.py` or `full-archive-search.py`<br>
   * query parameters: `--query_params '{"query": "#EurovisionAgain", "max_results": 10}'` <br>
   * output directory: `--output_dir 'output'` <br>
   * output file name: `--output_file_name 'extracted_tweets'`<br>
   * output file type: `--output_csv False` <br>

3) Press Enter

In total, this will look like:

`python3 full-archive-search.py --query_params '{"query": "#EurovisionAgain", "max_results": 10}' --output_dir 'output' --output_file_name 'extracted_tweets' --output_csv False`
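The command passes `--query_params` as a JSON string and `--output_csv` as the text `False`; how the scripts convert these is not shown in this guide. As a hypothetical sketch of a parser that accepts the same flags: `json.loads` turns the query text into a dictionary, and a custom converter handles the boolean, because `argparse` with `type=bool` would treat any non-empty string, including `"False"`, as `True`. The names `str_to_bool` and `build_parser` are our own.

```python
import argparse
import json


def str_to_bool(value: str) -> bool:
    """Convert text such as 'True' or 'False' (any case) to a bool."""
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes"):
        return True
    if normalized in ("false", "0", "no", ""):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser accepting the same flags as the command above."""
    parser = argparse.ArgumentParser(description="Search Tweets (hypothetical sketch)")
    # JSON text -> dict, e.g. '{"query": "#EurovisionAgain", "max_results": 10}'
    parser.add_argument("--query_params", type=json.loads, required=True)
    parser.add_argument("--output_dir")
    parser.add_argument("--output_file_name")
    # Not type=bool: bool("False") is True, because any non-empty string is truthy.
    parser.add_argument("--output_csv", type=str_to_bool, default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--query_params", '{"query": "#EurovisionAgain", "max_results": 10}',
        "--output_dir", "output",
        "--output_file_name", "extracted_tweets",
        "--output_csv", "False",
    ])
    print(args.query_params["query"], args.output_csv)  # → #EurovisionAgain False
```

In a real script you would call `parse_args()` without arguments so that it reads the flags typed in the terminal.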

## 5. Analyze data

### 5.1. Elasticsearch
After you've extracted and saved the Tweets relevant to your research, you can analyze the data. For this, you need to load the data into Elasticsearch:
1) Open Elasticsearch <br>
2) ... <br>
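One common route for loading documents is Elasticsearch's `_bulk` API, which expects newline-delimited JSON: an action line such as `{"index": {"_index": ..., "_id": ...}}` followed by the document itself, with a newline at the very end. A minimal standard-library sketch that converts Tweets into that format, assuming (hypothetically) that you have already read the output file into a list of Tweet dictionaries that each have an `id` field, and that the index is called `tweets`:

```python
import json


def to_bulk_ndjson(tweets, index="tweets"):
    """Build an Elasticsearch _bulk request body from a list of Tweet dicts.

    Assumptions: each Tweet dict has an "id" field (reused as the document _id),
    and "tweets" is a hypothetical index name.
    """
    lines = []
    for tweet in tweets:
        # Action line: index this document into `index` under the Tweet's id
        lines.append(json.dumps({"index": {"_index": index, "_id": tweet["id"]}}))
        # Source line: the Tweet itself
        lines.append(json.dumps(tweet))
    return "\n".join(lines) + "\n"  # the _bulk body must end with a newline


example = [{"id": "1", "text": "example Tweet text"}]
print(to_bulk_ndjson(example), end="")
```

The resulting text can be sent as a `POST` request to the `_bulk` endpoint with the header `Content-Type: application/x-ndjson`.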

### 5.2. Kibana
After you've uploaded the data, you can visualize it in Kibana:
1) Open Kibana <br>
2) ... <br>