# First steps with the Twitter API
The goal of this Jupyter notebook is to perform basic manipulations of the Twitter API.

This notebook shows how to:
* make a request to get the tweets that mention `words` in their body text.
* make a request to get the tweets of a certain `user`.
* make a request to get the tweets in a certain date range (from `date_start` to `date_end`).

## Authentication beforehand
To be able to perform the requests from the forged query, one needs to be authenticated through one's Twitter developer application.

According to the [Twitter Developer Platform documentation](https://developer.twitter.com/en/docs/authentication/overview), authentication secures access to the data available on the platform.

There are several authentication methods:
* [**OAuth 1.0a User Context**](https://developer.twitter.com/en/docs/authentication/oauth-1-0a)
* [**OAuth 2.0 Authentication**](https://developer.twitter.com/en/docs/authentication/oauth-2-0)
    * [**Bearer Token**](https://developer.twitter.com/en/docs/authentication/oauth-2-0/application-only)
    * [**OAuth 2.0 Authorization Code Flow with PKCE**](https://developer.twitter.com/en/docs/authentication/oauth-2-0/authorization-code)
* [**Basic Authentication**](https://developer.twitter.com/en/docs/authentication/basic-auth)

You should also read the documentation about [*Credential Handling*](https://github.com/twitterdev/search-tweets-python#credential-handling) (the rest of this notebook will be clearer once you have) to learn more about how the credentials are handled from a code point of view.

The dictionary keys for the Twitter API credentials used here are expected to be:
* `SEARCHTWEETS_ENDPOINT`
* `SEARCHTWEETS_BEARER_TOKEN`
* `SEARCHTWEETS_CONSUMER_KEY`
* `SEARCHTWEETS_CONSUMER_SECRET`

In [1]:
import os
import searchtweets
from searchtweets import ResultStream, gen_request_parameters, load_credentials, collect_results

print('Searchtweets version:', searchtweets.__version__)

Searchtweets version: 1.1.1


In [2]:
# Taking a look at the env variables:
# os.environ

##
# /!\ if you source your .envrc with the credentials in it, they will appear here. BE SURE TO CLEAR OUTPUTS
# If the credentials are not in os.environ, you can retrieve them using the functions "retrieve_AWS_related" and "retrieve_TwitterAPI_related"
##

### Loading the credentials

In [3]:
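def retrieve_TwitterAPI_related(s_envrc: str) -> dict:
    """Extract the Twitter API credentials from the content of a `.envrc` file."""
    lst_to_retrieve = ['SEARCHTWEETS_ENDPOINT',
        'SEARCHTWEETS_BEARER_TOKEN',
        'SEARCHTWEETS_CONSUMER_KEY',
        'SEARCHTWEETS_CONSUMER_SECRET']
    s_twitter_related = {}
    for s in s_envrc.split('\n'):
        for needle in lst_to_retrieve:
            # Only lines starting with KEY= are considered
            if s.startswith(needle + '='):
                # Split on the first '=' only: the value itself may contain '='
                key, val = s.split('=', 1)
                # Drop surrounding whitespace and quotes from the value
                s_twitter_related[key] = val.strip().strip('\'"')
    return s_twitter_related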
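As a complement, here is a slightly more tolerant parsing sketch that also accepts lines prefixed with `export`, skips comments, and handles values containing `=`. The function name `parse_envrc` and the sample content are hypothetical; this is only one possible way to do it.

```python
def parse_envrc(text, wanted):
    """Extract the `wanted` variables from .envrc-style text.

    Handles optional 'export ' prefixes, comment lines, quoted values,
    and values that themselves contain '='.
    """
    found = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in wanted:
            found[key] = value.strip().strip("'\"")
    return found


# Made-up sample content, with placeholder values only.
sample = """
# credentials
export SEARCHTWEETS_ENDPOINT="https://example.invalid/search"
SEARCHTWEETS_BEARER_TOKEN='abc=def'
OTHER_VARIABLE=1
"""
wanted_keys = {"SEARCHTWEETS_ENDPOINT", "SEARCHTWEETS_BEARER_TOKEN"}
print(parse_envrc(sample, wanted_keys))
```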

In [9]:
# Awful but it will work for now: if one or several keys related to the Twitter API are missing from the environment, try to find them in the `.envrc` file
necessary_keys = ['SEARCHTWEETS_ENDPOINT', 'SEARCHTWEETS_BEARER_TOKEN', 'SEARCHTWEETS_CONSUMER_KEY', 'SEARCHTWEETS_CONSUMER_SECRET']

# Always define the dictionary so that the next cell cannot raise a NameError
dct_twitterapi = {}
if any(k not in os.environ for k in necessary_keys):
    with open('../.envrc', 'r') as f:
        content = f.read()
    dct_twitterapi = retrieve_TwitterAPI_related(content)

In [10]:
# Iterate over (key, value) pairs, not over the keys alone
for key, val in dct_twitterapi.items():
    os.environ[key] = val

Now the necessary credentials should be in the environment and accessible, allowing us to authenticate to the Twitter API.

In [6]:
# If the credentials are in a specific YAML file, they can be loaded directly from this file

# Loading the credentials:
# search_args = load_credentials(filename="./.searchtweets_credentials.yaml",
#                  yaml_key="search_tweets_v2_example",
#                  env_overwrite=False)
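For reference, the sketch below renders the kind of YAML content such a credentials file could hold. The field names (`endpoint`, `bearer_token`) are assumed from the search-tweets-python README, the helper `render_credentials_yaml` is hypothetical, and the values are placeholders; check the README for the exact layout your version expects.

```python
def render_credentials_yaml(yaml_key, **fields):
    """Render a one-level YAML mapping under `yaml_key` as plain text."""
    lines = [f"{yaml_key}:"]
    lines += [f"  {name}: {value}" for name, value in fields.items()]
    return "\n".join(lines) + "\n"


yaml_text = render_credentials_yaml(
    "search_tweets_v2_example",
    endpoint="https://api.twitter.com/2/tweets/search/recent",
    bearer_token="<BEARER_TOKEN>",  # placeholder, never commit a real token
)
print(yaml_text)
```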

In [7]:
search_args = load_credentials(filename=None)
#search_args

# load_credentials will pick up the necessary credentials from the environment;
# this can be used to avoid the custom parsing function retrieve_TwitterAPI_related

cannot read file ~/.twitter_keys.yaml
Error parsing YAML file; searching for valid environment variables
Your credentials are not configured correctly and  you are missing a required field. Please see the  readme for proper configuration


KeyError: 

## Tweets with the mention of `words` in the body text of tweets.
### Forging a query:
There are two ways to interact with the Twitter API:
* the fast way: using `gen_request_parameters` together with `collect_results`
* ResultStream: using a `ResultStream` class object

The first parameter of `gen_request_parameters` (or `gen_rule_payload`) must be a string representing what is called a *PowerTrack rule*.

In [None]:
help(gen_request_parameters)

According to the [Developer Platform documentation](https://developer.twitter.com/en/docs/twitter-api/enterprise/historical-powertrack-api/guides/historical_powertrack_rules_and_filtering), PowerTrack makes it possible to filter Twitter's full firehose.

We can filter on a wide range of attributes:
* geo-location,
* language,
* keyword,
* hashtag,
* mention,
* ...

See documentation [here](https://developer.twitter.com/en/docs/twitter-api/enterprise/rules-and-filtering/enterprise-operators) and especially [here](https://developer.twitter.com/en/docs/twitter-api/enterprise/powertrack-api/overview)

There are many possible filtering actions.
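To illustrate how several filters combine into one rule string, here is a small sketch. The helper `build_query` is hypothetical; it only concatenates a few common operators (`from:`, `@`, `#`, `lang:`) separated by spaces, which the API interprets as a logical AND.

```python
def build_query(keywords=(), lang=None, from_user=None, mention=None, hashtag=None):
    """Assemble a rule string from a few common filter operators."""
    parts = list(keywords)
    if from_user:
        parts.append(f"from:{from_user.lstrip('@')}")
    if mention:
        parts.append(f"@{mention.lstrip('@')}")
    if hashtag:
        parts.append(f"#{hashtag.lstrip('#')}")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


print(build_query(["EmmanuelMacron"], lang="fr"))
print(build_query(from_user="@JLMelenchon", hashtag="presidentielle"))
```

The resulting string can then be passed as the first argument of `gen_request_parameters`.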

The `start_time` and `end_time` values must be in a UTC format accepted by the function `convert_utc_time`:

![convert_utc_time](images/convert_utc_time.png)

In [None]:
s_time = e_time = None
query = gen_request_parameters("EmmanuelMacron lang:fr",
                               results_per_call=10,
                               granularity=None,
                               start_time=s_time,
                               end_time=e_time)
print(query)

In [None]:
tweets = collect_results(query, max_tweets=10, result_stream_args=search_args) 

### Remarks:
How is the return value of `collect_results` structured?

* `tweets` is a list in which each element is a dictionary containing `max_tweets` tweets: the results are *batches* of tweets.
* Each batch is a dictionary with 2 keys: "data" and "metadata".
* `batch['data']` is a list of dictionaries, one per tweet (`batch['data'][i]`).
* `batch['data'][i]['id']` is the tweet ID and `batch['data'][i]['text']` is the text of the tweet.
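The structure described above can be walked with a small generator. This is a sketch on hand-made data shaped like the description (the IDs and texts are invented); the helper name `iter_tweets` is hypothetical.

```python
def iter_tweets(batches):
    """Yield (tweet_id, text) pairs from a list of result batches."""
    for batch in batches:
        # A batch without a "data" key is simply skipped.
        for tweet in batch.get("data", []):
            yield tweet["id"], tweet["text"]


# Invented data that mimics the structure described above.
fake_batches = [
    {"data": [{"id": "1", "text": "first tweet"},
              {"id": "2", "text": "second tweet"}],
     "metadata": {}},
    {"metadata": {}},
]
for tweet_id, text in iter_tweets(fake_batches):
    print(tweet_id, text)
```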

In [None]:
tweets[0]['data'][0]['text']

In [None]:
func = lambda tweets, tweet_nb: tweets[0]['data'][tweet_nb]['text']

In [None]:
print(func(tweets, 2))

In [None]:
request = tweets
for element in request:
    # `idx` rather than `id`, to avoid shadowing the built-in function
    for idx, tweet in enumerate(element['data']):
        print(f"Tweet n*{idx}")
        print(tweet["text"])
        print()


### Forging a query with `ResultStream`
The `ResultStream` constructor needs at least the `endpoint` (*str*) and `request_parameters` (*dict* or *json*; named `rule_payload` in older versions of the library) arguments.
There are also other parameters such as `username`, `password`, `bearer_token` ...

In [None]:
help(ResultStream)

In [None]:
rs = ResultStream(**search_args,
                  request_parameters=query, # as mentioned by the doc, it can be a json.
                  max_tweets=3,
                  max_requests=1,
                  output_format='a')

Once the `ResultStream` object is instantiated, we can use its `stream` method, which returns a `generator`.

In [None]:
for ii in rs.stream():
    print(ii)

As mentioned in the documentation of the class, one can use a JSON file for the `request_parameters` parameter.

In [None]:
import json

In [None]:
# Creation of a json file:
dct_query = json.loads(query)

with open('first_tweet.json', 'w') as outputfile:
    json.dump(dct_query, outputfile)

In [None]:
rs = ResultStream(**search_args,
                  request_parameters='first_tweet.json', # as mentioned by the doc, it can be a json.
                  max_tweets=3,
                  max_requests=1,
                  output_format='a')

Passing the file name does not work: the documentation is not precise enough on this point.

What works is passing a dictionary built from the JSON content:

In [None]:
rs = ResultStream(**search_args,
                  request_parameters=dct_query, # a dictionary built from the JSON query
                  max_tweets=3,
                  max_requests=1,
                  output_format='a')
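If the query was previously saved to a JSON file, one option is to load the file back into a dictionary before handing it to `ResultStream`. The sketch below shows only that round trip, using a temporary directory; the helper name `load_request_parameters` and the example values are assumptions.

```python
import json
import os
import tempfile


def load_request_parameters(path):
    """Read a JSON file and return its content as a dictionary."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


params = {"query": "EmmanuelMacron lang:fr", "max_results": 10}

# Write the parameters to a temporary file, then read them back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "first_tweet.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(params, handle)
    loaded = load_request_parameters(path)

print(loaded == params)
```

The dictionary returned by `load_request_parameters` plays the same role as `dct_query` in the cell above.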

### Tweets from or mentioning a user:

Simply use `@user` (tweets mentioning the user) or `from:user` (tweets posted by the user) in the query string when forging the request:

In [None]:
query = gen_request_parameters("from:JLMelenchon",
                               results_per_call=10,
                               granularity=None)
print(query)

In [None]:
tweets = collect_results(query, max_tweets=3, result_stream_args=search_args)

In [None]:
print(tweets)

In [None]:
rs = ResultStream(**search_args,
                  request_parameters={'query':'from:JLMelenchon'}, # the operator goes inside the 'query' string
                  max_tweets=3,
                  max_requests=1,
                  output_format='a')

for ii in rs.stream():
    print(ii)

If the query is incorrectly formatted, one can get an `HTTPError` (Bad Request). For example, passing `from` as a separate key instead of inside the `query` string raises:
```
HTTP Error code: 400: {"errors":[{"parameters":{"from":["JLMelenchon"]},"message":"The query parameter [from] is not one of [query,start_time,end_time,since_id,until_id,max_results,next_token,pagination_token,sort_order,expansions,tweet.fields,media.fields,poll.fields,place.fields,user.fields]"}],"title":"Invalid Request","detail":"One or more parameters to your request was invalid.","type":"https://api.twitter.com/2/problems/invalid-request"} | Bad Request
 Request payload: {'query': '', 'from': 'JLMelenchon'}
Quitting... 
```
From this error, one can see that the following keys can be provided in the dictionary:
* **query**,
* **start_time**,
* **end_time**,
* **since_id**,
* **until_id**,
* **max_results**,
* **next_token**,
* **pagination_token**,
* **sort_order**,
* **expansions**,
* **tweet.fields**,
* **media.fields**,
* **poll.fields**,
* **place.fields**,
* **user.fields**
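A request dictionary can be checked against this list before it is sent, which avoids a round trip to the API. The sketch below copies the key names from the error message above; the helper name `unknown_keys` is hypothetical.

```python
ALLOWED_KEYS = frozenset({
    "query", "start_time", "end_time", "since_id", "until_id",
    "max_results", "next_token", "pagination_token", "sort_order",
    "expansions", "tweet.fields", "media.fields", "poll.fields",
    "place.fields", "user.fields",
})


def unknown_keys(request_parameters):
    """Return the keys of `request_parameters` that the API would reject."""
    return sorted(set(request_parameters) - ALLOWED_KEYS)


# The payload from the error above: 'from' is not an accepted key.
print(unknown_keys({"query": "", "from": "JLMelenchon"}))
# The corrected payload, with the operator inside the query string.
print(unknown_keys({"query": "from:JLMelenchon", "max_results": 10}))
```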

### Request to get the tweets in a certain date range (from `date_start` to `date_end`).
For the start and end times, one just has to give the date in one of the following formats:
- YYYYmmDDHHMM
- YYYY-mm-DD
- YYYY-mm-DD HH:MM
- YYYY-mm-DDTHH:MM
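To make these formats concrete, the sketch below parses each of them with the standard library and prints the timestamp shape that appears in the request payloads shown in this notebook. It is not the library's own `convert_utc_time`, only an illustration; the helper name `to_api_timestamp` is hypothetical.

```python
from datetime import datetime

# strptime patterns matching the four formats listed above.
ACCEPTED_FORMATS = ("%Y%m%d%H%M", "%Y-%m-%d", "%Y-%m-%d %H:%M", "%Y-%m-%dT%H:%M")


def to_api_timestamp(value):
    """Convert a date string in one of the accepted formats to 'YYYY-mm-DDTHH:MM:SSZ'."""
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next format
        return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError(f"Unsupported date format: {value!r}")


for example in ("202203011830", "2022-03-01", "2022-03-01 18:30", "2022-03-01T18:30"):
    print(example, "->", to_api_timestamp(example))
```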


**Remark:**

We cannot request tweets older than 7 days; otherwise one gets this type of error:

```
 HTTP Error code: 400: {"errors":[{"parameters":{"start_time":["2022-02-02T00:00Z"]},"message":"Invalid 'start_time':'2022-02-02T00:00Z'. 'start_time' must be on or after 2022-03-01T18:59Z"},{"parameters":{"end_time":["2022-02-10T00:00Z"]},"message":"Invalid 'end_time':'2022-02-10T00:00Z'. 'end_time' must be on or after 2022-03-01T18:59Z"}],"title":"Invalid Request","detail":"One or more parameters to your request was invalid.","type":"https://api.twitter.com/2/problems/invalid-request"} | Bad Request
 Request payload: {'query': 'EmmanuelMacron lang:fr', 'start_time': '2022-02-02T00:00:00Z', 'end_time': '2022-02-10T00:00:00Z', 'max_results': 10}
Quitting.
```
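This limit can be checked locally before sending the request. The sketch below assumes a 7-day window, as observed in the error above; the helper name `is_within_recent_window` is hypothetical, and the fixed `now` value is chosen to match the error message.

```python
from datetime import datetime, timedelta, timezone


def is_within_recent_window(start_time, now=None, window_days=7):
    """Return True if `start_time` is not older than `window_days` before `now`."""
    now = datetime.now(timezone.utc) if now is None else now
    return start_time >= now - timedelta(days=window_days)


# Fixed reference time: 7 days after the limit quoted in the error message.
now = datetime(2022, 3, 8, 18, 59, tzinfo=timezone.utc)
too_old = datetime(2022, 2, 2, tzinfo=timezone.utc)
recent = datetime(2022, 3, 2, tzinfo=timezone.utc)

print(is_within_recent_window(too_old, now=now))
print(is_within_recent_window(recent, now=now))
```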

In [None]:
# YYYY-mm-DD
start_time = '2022-03-01'
end_time = '2022-03-07'
query = gen_request_parameters("EmmanuelMacron lang:fr",
                               results_per_call=10,
                               granularity=None,
                               start_time=start_time,
                               end_time=end_time)

In [None]:
tweets = collect_results(query, max_tweets=3, result_stream_args=search_args)
print(tweets)