*Hello again!* 👋

This notebook is the <u>third</u> part of a **tutorial** on how to **collect data from the Twitter API v2 using Python** 🤓

It contains a series of exercises that will help you get comfortable with collecting Twitter data using the recent search endpoint.

## Exercises

01. Take a look at the following two lists of rules.

rules_1 = [
    
    {"value": '("heat pump" OR "heat pumps") -is:retweet lang:en'},

    {"value": '("gas boiler" OR "gas boilers") -is:retweet lang:en'},

]

and 

rules_2 = [

    {"value": '("heat pump" OR "heat pumps" OR "gas boiler" OR "gas boilers") -is:retweet lang:en'},

]

Do we collect the exact same tweets from rules_1 and rules_2?
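
If you want to check your reasoning empirically, one option is to collect both rule sets and compare the tweet ids. The sketch below is only an illustration: `compare_id_sets` is a hypothetical helper (it is not part of `collect_and_process_search_data`), and the ids are made-up placeholders rather than real tweets.

```python
from collections import Counter


def compare_id_sets(ids_a, ids_b):
    """Compare two collections of tweet ids (hypothetical helper)."""
    set_a, set_b = set(ids_a), set(ids_b)
    # ids that occur more than once within the first collection
    duplicates_a = {i for i, n in Counter(ids_a).items() if n > 1}
    return {
        "only_in_a": set_a - set_b,
        "only_in_b": set_b - set_a,
        "in_both": set_a & set_b,
        "duplicated_in_a": duplicates_a,
    }


# made-up ids: "2" appears twice in the first list, as it would if one
# tweet were returned by two separate queries
ids_from_rules_1 = ["1", "2", "2", "3"]
ids_from_rules_2 = ["1", "2", "3"]

report = compare_id_sets(ids_from_rules_1, ids_from_rules_2)
print(report["duplicated_in_a"])  # {'2'}
```

An empty `only_in_a` and `only_in_b` would mean both collections cover the same tweets, while `duplicated_in_a` flags tweets that were stored more than once.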

*Your answer:*

02. Using the rules defined below, collect data including:
- tweet fields: tweet id, tweet text, author id, tweet creation date and time, context annotations, entities, geo, public metrics, source
- user fields: user id, name, username, date and time the account was created, description, location, whether the user is verified, public metrics
- place fields: place id, name, full name, country, country code, geo and place type


*Helpful links:*
- [Tweet fields](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet)
- [User fields](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/user)
- [Place fields](https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/place)
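
Each `*.fields` parameter and `expansions` expects a single comma-separated string of field names, without spaces. As a minimal sketch (deliberately using only a small subset of fields, so the exercise is still yours to complete), the strings can be built from Python lists. `join_fields` is a hypothetical helper, not something provided by the tutorial module.

```python
def join_fields(fields):
    """Join field names into one comma-separated string without spaces."""
    return ",".join(field.strip() for field in fields)


# illustrative subset only; look up the remaining field names in the
# data dictionary pages linked above
example_parameters = {
    "tweet.fields": join_fields(["id", "text", "created_at"]),
    "expansions": join_fields(["author_id"]),
    "max_results": 100,
}

print(example_parameters["tweet.fields"])  # id,text,created_at
```

Keeping the field names in lists makes it easier to add or remove a field later without introducing stray spaces or commas.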

In [None]:
from collect_and_process_search_data import *
import os
import pandas as pd

In [None]:
rules = [
{"value": '("heat pump" OR "heat pumps") -is:retweet lang:en', "tag":"exercise_2"},
]

In [None]:
# replace each ... with the fields you want to collect
query_parameters = {
    "tweet.fields": "...",
    "user.fields": "...",
    "place.fields": "...",
    "expansions": "...",
    "max_results": 100,
}

In [None]:
bearer_token = os.environ.get("BEARER_TOKEN")

In [None]:
# to check that your rules and query_parameters are defined correctly and that your code works
headers = request_headers(bearer_token)
query_parameters["query"] = rules[0]["value"]
json_response = connect_to_endpoint(endpoint_url, headers, query_parameters)
json_response

# to collect the data, uncomment the line below and comment out the ones above
#collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex2 = pd.read_pickle("tweets_exercise_2.pkl")
users_ex2 = pd.read_pickle("users_exercise_2.pkl")
places_ex2 = pd.read_pickle("places_exercise_2.pkl")

03. Using the same query parameters you defined above, change your rules so that the data you collect does not contain the hashtags #heatpump or #heatpumps.

*Helpful links:*
- Take a look at the list of operators [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#list)
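
Once you have collected the data, you may want to confirm that the exclusion worked. The sketch below checks plain tweet texts for a hashtag with a regular expression. `contains_hashtag` is a hypothetical helper, and the example texts are invented; how you extract the texts from `tweets_ex3` depends on the columns your collection produces.

```python
import re


def contains_hashtag(text, hashtag):
    """Return True if `text` contains #hashtag as a whole tag (case-insensitive)."""
    pattern = r"(?<!\w)#" + re.escape(hashtag) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# invented example texts, not real tweets
texts = [
    "Loving my new #HeatPump!",
    "Installing heat pumps this week",
    "#heatpumps are getting cheaper",
]

unwanted = [
    t for t in texts
    if contains_hashtag(t, "heatpump") or contains_hashtag(t, "heatpumps")
]
print(len(unwanted))  # 2
```

If your rule is correct, running the same check over the collected tweet texts should leave `unwanted` empty.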

In [None]:
# change this rule so that it excludes the hashtags
rules = [
{"value": '("heat pump" OR "heat pumps") -is:retweet lang:en', "tag":"exercise_3"},
]

In [None]:
# copy your answer from the previous exercise
query_parameters = {
    "tweet.fields": "...",
    "user.fields": "...",
    "place.fields": "...",
    "expansions": "...",
    "max_results": 100,
}

In [None]:
# to check that your rules and query_parameters are defined correctly and that your code works
headers = request_headers(bearer_token)
query_parameters["query"] = rules[0]["value"]
json_response = connect_to_endpoint(endpoint_url, headers, query_parameters)
json_response

# to collect the data, uncomment the line below and comment out the ones above
#collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex3 = pd.read_pickle("tweets_exercise_3.pkl")
users_ex3 = pd.read_pickle("users_exercise_3.pkl")
places_ex3 = pd.read_pickle("places_exercise_3.pkl")

04. Change **rules** and **query_parameters** below to collect data from Twitter satisfying the following requirements:
- mentioning **ChatGPT** but *not mentioning* programming, refactoring or code;
- no retweets;
- written in English;
- from verified authors;
- posted between 2pm and 3pm (UK time) on the 12th of January 2023;
- 100 results per call to the API;
- tweet fields, user fields and place fields as per exercise 2.


*Helpful links:*
- Take a look at the list of operators [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#list)
- To know more about start and end times parameters [check this page](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent)
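
The `start_time` and `end_time` parameters expect UTC timestamps in the form `YYYY-MM-DDTHH:mm:ssZ`, so a UK local time has to be converted first. Here is a minimal sketch of that conversion using the standard library (`zoneinfo` needs Python 3.9 or later). `to_api_timestamp` is a hypothetical helper, not part of the tutorial module.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_api_timestamp(local_time, tz_name="Europe/London"):
    """Convert 'YYYY-MM-DD HH:MM' in the given time zone to a UTC timestamp string."""
    local = datetime.strptime(local_time, "%Y-%m-%d %H:%M")
    local = local.replace(tzinfo=ZoneInfo(tz_name))
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# in January the UK is on GMT, so local time and UTC coincide
print(to_api_timestamp("2023-01-12 14:00"))  # 2023-01-12T14:00:00Z
# in July the UK is on BST (UTC+1), so the UTC time is one hour earlier
print(to_api_timestamp("2023-07-12 14:00"))  # 2023-07-12T13:00:00Z
```

Using a named time zone rather than a fixed offset means the same helper stays correct when daylight saving time applies.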

In [None]:
# write your rules according to the above requirements
rules = [
    {"value": '...', "tag":"exercise_4"},
]

In [None]:
# copy your query parameters from exercise 2 and change them to account for the above requirements
query_parameters = {
    
}

In [None]:
collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex4 = pd.read_pickle("tweets_exercise_4.pkl")
users_ex4 = pd.read_pickle("users_exercise_4.pkl")
places_ex4 = pd.read_pickle("places_exercise_4.pkl")

05. Check the *id* of the first tweet you collected in the previous exercise (which corresponds to the latest tweet). Change the query_parameters dictionary to collect only tweets posted after that one (*hint:* use the since_id query parameter).


*Helpful links:*
- Take a look at the list of operators [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query#list)
- To know more about since_id parameter [check this page](https://developer.twitter.com/en/docs/twitter-api/tweets/search/api-reference/get-tweets-search-recent) or [this page](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/paginate)
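
Tweet ids increase over time, so the latest tweet is the one with the largest id. Ids usually arrive as strings, and comparing them as text can give the wrong answer when their lengths differ. The sketch below picks the newest id by comparing numerically. `newest_tweet_id` is a hypothetical helper and the ids are made up; how you pull the ids out of `tweets_ex4` depends on the columns your collection produces.

```python
def newest_tweet_id(tweet_ids):
    """Return the largest id, comparing the id strings as integers."""
    return max(tweet_ids, key=int)


# made-up ids; "999" would win a plain string comparison
collected_ids = ["1613541000000000001", "1613541000000000500", "999"]

latest = newest_tweet_id(collected_ids)
print(latest)  # 1613541000000000500

# the id can then be passed on through the since_id parameter
example_parameters = {"max_results": 100, "since_id": latest}
```

The id stays a string throughout, which avoids any loss of precision if it is later written to JSON or a DataFrame.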

In [None]:
tweets_ex4

In [None]:
# copy the rule you defined in the previous exercise (keep the tag set to exercise_5)
rules = [
    {"value": '', "tag":"exercise_5"},
]

In [None]:
# copy the parameters you defined in the previous exercise and alter them to meet the requirements of this exercise
query_parameters = {
    
}

In [None]:
collect_and_process_twitter_data(bearer_token, rules, query_parameters)

In [None]:
tweets_ex5 = pd.read_pickle("tweets_exercise_5.pkl")
users_ex5 = pd.read_pickle("users_exercise_5.pkl")
places_ex5 = pd.read_pickle("places_exercise_5.pkl")