<h1 align="center" style="color:tomato; font-size:4em;">Twitter Data (Tweet) Collection</h1>

## Twitter API Access Levels


[Twitter API access Levels](https://developer.twitter.com/en/docs/twitter-api/getting-started/about-twitter-api#v2-access-level)

## After Academic Research Application is Approved

* Go to the [Developer Portal](https://developer.twitter.com/en/portal)
* Create an App (endpoint API) by clicking the **Add App** button
* Be sure to save all key information in a separate Python file (`config.py`)
    - `access tokens, client/consumer keys` are only needed for actions performed on behalf of a user, for example posting, retweeting, or liking a post.
    - `bearer token` is enough for searching tweets and users
* Set the App Settings and configure the User Authentication Service
    - Choose Read and Write
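
One possible layout for `config.py` is sketched below. It assumes the bearer token has been exported in an environment variable called `TWITTER_BEARER_TOKEN` (a name chosen for this example, not something Twitter or tweepy requires), so the secret itself never sits in a file that could end up in version control. The helper name `get_bearer_token` is likewise illustrative.

```python
# config.py: a minimal sketch; the variable name TWITTER_BEARER_TOKEN is an assumption
import os


def get_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the bearer token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return token
```

Other scripts can then use `from config import get_bearer_token` and pass the returned value as the `bearer_token` when creating the tweepy client.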

## Install `tweepy`

#### Install tweepy as: `pip install tweepy`

[Tweepy Official Documentation](https://docs.tweepy.org/en/stable/)

* Tweepy is an open source Python package that gives you a very convenient way to access the Twitter API with Python
* Once installed, it can be used in the program as `import tweepy`

## The tweepy `Client` class

[Official Documentation for ALL `Client` Methods](https://docs.tweepy.org/en/stable/client.html)

* Creates an endpoint (client) API that lets you access the different functionalities of tweepy
* Client can be created as:
    - `client = tweepy.Client(bearer_token = "YourBearerToken")`
    - Using `client` you can access the different [tweepy `Client` Methods](https://docs.tweepy.org/en/stable/client.html)


## Building a Search query

* The search query is a string that defines which keywords to search for when collecting tweets, e.g. `"covid OR covid19 OR covid-19 deaths"`

* For more details see: 

    - [Official Query Documentation](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query) 

    - [Some more Examples](https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research)
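
Query strings can also be assembled programmatically. The helper below is only a sketch: `build_query` and its parameters are hypothetical names, while `OR`, `lang:` and `-is:retweet` are operators described in the official query documentation. The keywords are wrapped in parentheses because that documentation states that terms joined by a space (AND) are combined before `OR`, so in a query like `covid OR covid19 OR covid-19 deaths` the word `deaths` would apply only to the last keyword.

```python
def build_query(keywords, lang=None, exclude_retweets=False):
    """Combine keywords with OR and append optional filters."""
    if not keywords:
        raise ValueError("at least one keyword is required")

    terms = []
    for kw in keywords:
        # Quote anything that is not purely alphanumeric (spaces, hyphens, ...)
        # so that it is matched as an exact phrase
        terms.append(kw if kw.isalnum() else f'"{kw}"')

    # Parentheses keep the OR group together when filters are appended
    query = "(" + " OR ".join(terms) + ")"
    if lang:
        query += f" lang:{lang}"
    if exclude_retweets:
        query += " -is:retweet"
    return query


query = build_query(["covid", "covid19", "covid-19"], lang="en", exclude_retweets=True)
print(query)  # (covid OR covid19 OR "covid-19") lang:en -is:retweet
```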


## Searching ALL tweets

### `ONLY available with Academic Research access`

[search_all_tweets()](https://docs.tweepy.org/en/stable/client.html#search-tweets)

* By default, it returns only the Tweet `id` and `text`, but you can request more with the predefined `Tweet and User Fields`

## Twitter Fields

[Official Documentation on Fields](https://developer.twitter.com/en/docs/twitter-api/fields)

### Tweet Fields: 

* If more information is required, pass the `tweet_fields` argument to the Client `method`, e.g.,

`tweet_fields = ['author_id', 'created_at', 'public_metrics']`

### User Fields

* Defaults are `id`, `name`, and `username`. If more is needed, extend them using `user_fields`

`user_fields = ['name', 'username', 'location', 'public_metrics']`
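
Putting the query and the two field lists together, a search call might look like the sketch below. The wrapper `search_with_authors` and the constants `TWEET_FIELDS` and `USER_FIELDS` are illustrative names, not part of tweepy. The keyword arguments are those documented for `Client.search_all_tweets`. `expansions="author_id"` is included so that the author objects are returned alongside the tweets, and `max_results` sets the page size of a single request (documented as between 10 and 500 for this endpoint at the time of writing).

```python
TWEET_FIELDS = ["author_id", "created_at", "public_metrics"]
USER_FIELDS = ["name", "username", "location", "public_metrics"]


def search_with_authors(client, query, max_results=100):
    """Run a full-archive search and request author details with each tweet."""
    return client.search_all_tweets(
        query=query,
        tweet_fields=TWEET_FIELDS,
        user_fields=USER_FIELDS,
        expansions="author_id",   # ask the API to include the author User objects
        max_results=max_results,  # results per request, not the overall total
    )


# Typical use (needs Academic Research access and a network connection,
# so it is left commented out here):
# import tweepy
# client = tweepy.Client(bearer_token="YourBearerToken")
# response = search_with_authors(client, "covid OR covid19")
```

Passing the client in as an argument keeps the function easy to try with a stand-in object before spending any API quota.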




## Getting ONLY Tweets data

(With the extended tweet fields defined in `tweet_fields = ['author_id', 'created_at', 'public_metrics']`)

> `for tweet in response.data:
     st.info(f"{tweet.id} {tweet.text} {tweet.author_id} {tweet.created_at} {tweet.public_metrics}")`

#### `response.data` 
* `response` is the object returned by the Client search method
* `data` is a list of Tweet objects that contain the tweet (field) information


## Getting ONLY the user information
(with default fields example)

> `for user in response.includes['users']:
    st.info(f"{user.id} {user.name} {user.username}")`
    
#### `includes` is a dictionary; `includes['users']` holds the list of User objects with user-related information


<div class="alert alert-block alert-danger"> The <b>expansions</b> argument of the search method MUST be set (e.g. to <b>author_id</b>) to extract user-related information. Otherwise, <b>includes['users']</b> will not be available.</div>


## The Challenge - How to combine both datasets, `tweet data and user info`

* Solution
    - First, create a user dictionary containing all the required user information
    - Then combine this dictionary (based on the author_id field of the tweet data) with the tweet data to merge user-tweet information as follows:

```python
user_dict = {}  # maps user id -> selected user fields (the USER dataset)

for user in response.includes['users']:
    # save the user information as a key:value pair, keyed by the user id
    user_dict[user.id] = {
        'name': user.name,
        'handler': user.username,
        'location': user.location,
        'followers': user.public_metrics['followers_count'],
        'following': user.public_metrics['following_count'],
    }
```


```python
# Creating the COMBINED dataset
results = []  # holds one dictionary per tweet with both tweet and user data

for tweet in response.data:
    # look up the author's details collected in user_dict
    auth_info = user_dict[tweet.author_id]

    results.append({
        'tweet_id': tweet.id,
        'author_id': tweet.author_id,
        'text': tweet.text,
        'date': tweet.created_at,
        'retweets': tweet.public_metrics['retweet_count'],
        'replies': tweet.public_metrics['reply_count'],
        'likes': tweet.public_metrics['like_count'],
        'name': auth_info['name'],
        'handler': auth_info['handler'],
        'location': auth_info['location'],
        'followers': auth_info['followers'],
        'following': auth_info['following'],
    })
```
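
The merge can be tried without API access by feeding it stand-in objects. The sketch below is a defensive variant, not the code above verbatim: `merge_tweets_with_users` is a hypothetical helper, the `SimpleNamespace` objects only mimic the attributes it reads, and it keeps fewer columns for brevity. It assumes, as tweepy typically behaves, that `response.data` is `None` when nothing matches and that `includes` has no `'users'` key when no expansion was requested.

```python
from types import SimpleNamespace


def merge_tweets_with_users(response):
    """Join each tweet with its author's details, tolerating missing data."""
    # .get avoids a KeyError when no user objects were returned
    users = response.includes.get("users", [])
    user_dict = {user.id: user for user in users}

    results = []
    for tweet in response.data or []:  # treat a missing data list as empty
        author = user_dict.get(tweet.author_id)
        results.append({
            "tweet_id": tweet.id,
            "text": tweet.text,
            "name": author.name if author else None,
            "handler": author.username if author else None,
        })
    return results


# Stand-in objects for illustration only (not real tweepy objects)
fake_response = SimpleNamespace(
    data=[
        SimpleNamespace(id=1, author_id=10, text="first"),
        SimpleNamespace(id=2, author_id=99, text="author not included"),
    ],
    includes={"users": [SimpleNamespace(id=10, name="Ada", username="ada_l")]},
)
rows = merge_tweets_with_users(fake_response)
```

Here the second tweet keeps its row but gets `None` for the author columns, rather than stopping the loop with a `KeyError`.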



## Create a pandas DataFrame (table) for the combined result

*pandas* should be imported (`import pandas as pd`) before doing this:

>### `dataset = pd.DataFrame(results)`


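## Save the DataFrame, `dataset`, as a csv/xlsx file

>### `dataset.to_csv("tweets.csv")`
>### `dataset.to_excel("tweets.xlsx")`

(The file names are examples. `to_excel` needs a target path, and `to_csv` only writes a file when one is given.)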

### OR

Use Streamlit's download button (`st.download_button`) to save the file
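
A download button needs the file content as text or bytes. The sketch below builds CSV bytes from the `results` list using only the standard library `csv` module, which is a swapped-in technique chosen so the example runs without pandas. With a DataFrame, `dataset.to_csv(index=False).encode("utf-8")` should give equivalent content. The helper name `results_to_csv_bytes`, the button label, and the file name are assumptions made for this example.

```python
import csv
import io


def results_to_csv_bytes(results):
    """Serialise a list of row dictionaries to UTF-8 encoded CSV bytes."""
    if not results:
        return b""
    buffer = io.StringIO()
    # Column order follows the keys of the first row
    writer = csv.DictWriter(buffer, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue().encode("utf-8")


# Inside a Streamlit app the bytes could then be offered for download
# (commented out because it only does something under `streamlit run`):
# import streamlit as st
# st.download_button("Download CSV", data=results_to_csv_bytes(results),
#                    file_name="tweets.csv", mime="text/csv")
```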