# Collecting Data Using the Twitter API

This tutorial introduces what an API is and documents the process of using the Twitter API v2, from gaining access to the API, to connecting to a search endpoint and collecting data relating to some keywords of interest.

### What is an API

An Application Programming Interface (API) is a software intermediary that allows two applications to communicate with each other to access data. Our focus will be on the Twitter API, which is a well-documented API that enables programmers to access Twitter in advanced ways. It can be used to analyze, learn from, and even interact with Tweets. 
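
To make this concrete: a call to a web API such as Twitter's is an HTTP request to a URL, with the search terms and options attached as parameters. The sketch below uses only the Python standard library to show what such a URL looks like. The endpoint is the recent search URL used later in this tutorial; `build_request_url` is a hypothetical helper written for illustration, not part of any Twitter library.

```python
from urllib.parse import urlencode

# Recent search endpoint, as used later in this tutorial.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_request_url(base_url, params):
    """Return the full URL that a GET request with these parameters targets."""
    return f"{base_url}?{urlencode(params)}"


url = build_request_url(SEARCH_URL, {"query": "ExtremeWeather", "max_results": 10})
print(url)
# https://api.twitter.com/2/tweets/search/recent?query=ExtremeWeather&max_results=10
```

The requests library used in the rest of the tutorial builds this URL for you from the `params` argument.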

### Account Setup

Before using the Twitter API, you need a Twitter account. You then apply for access to the Twitter API in order to obtain credentials.

To do this, open the developer portal, choose ‘create a new project’, fill out the required details, and finally give the new app a name.

Once you have named the app, you are taken to a keys and tokens page that shows your API Keys and the Bearer Token. These are needed to connect to the endpoints in the Twitter API v2. For details on setting up the developer portal account, refer to the links in the 'References' section.

In [None]:
# Import the libraries
import pandas as pd
import requests
import os
import json

Set the bearer token that you obtained while setting up the developer portal (the value below is my own token; replace it with yours).

In [None]:
os.environ["BEARER_TOKEN"] = "AAAAAAAAAAAAAAAAAAAAAM8%2BggEAAAAAEXsWYznpdauEwYDZp%2FyJqrh0I2k%3DIkWf7xTxYhzsspFvXnARNAxF2n6je6YhkUWveTLfqLnbdhmBNq"
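
The cell above stores the token inside the notebook itself. One alternative, sketched here under the assumption that `BEARER_TOKEN` was exported in the shell before Jupyter was started, is to only read the variable and fail with a clear message when it is missing. `load_bearer_token` is a hypothetical helper, and the value in the example is a placeholder, not a real token.

```python
import os


def load_bearer_token(env=None, name="BEARER_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing or empty."""
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Environment variable {name} is not set")
    return token


# Example with a stand-in mapping and a placeholder value.
print(load_bearer_token({"BEARER_TOKEN": "placeholder-token"}))
```

Failing early here gives a clearer message than an authorization error returned by the API later on.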

### Create Connection

We will create a connect_to_twitter() function that retrieves the bearer token from the environment variable and returns the headers used to access the API.

In [None]:
def connect_to_twitter():
    bearer_token = os.environ.get("BEARER_TOKEN")
    
    return {"Authorization": "Bearer {}".format(bearer_token)}
    
headers = connect_to_twitter()
print(headers)

{'Authorization': 'Bearer AAAAAAAAAAAAAAAAAAAAAM8%2BggEAAAAAEXsWYznpdauEwYDZp%2FyJqrh0I2k%3DIkWf7xTxYhzsspFvXnARNAxF2n6je6YhkUWveTLfqLnbdhmBNq'}


Twitter recently launched a new "ExtremeWeather Mini-Site". This gives a great idea of the data insights available on Twitter. Therefore, in this example, we will pull some recent Tweets relating to the #ExtremeWeather conversation on Twitter. 

### Retrieving Data

Now we will call the Twitter API with the query parameter set to 'ExtremeWeather'.

In [None]:
def make_request(headers):
    url = "https://api.twitter.com/2/tweets/search/recent"
    query_params = {'query': 'ExtremeWeather'}
    return requests.request("GET", url, params=query_params, headers=headers).json()
response = make_request(headers)
print(response)

{'data': [{'id': '1565856106479493133', 'text': 'RT @StephTweetChat: #5G network could spell massive difficulties for #weather satellites used for crucial Earth observations: https://t.co/…'}, {'id': '1565852239612284929', 'text': '#5G network could spell massive difficulties for #weather satellites used for crucial Earth observations: https://t.co/mdNFYBIwMj\n\n#Wx #ExtremeWeather #EmergencyPreparedness #NaturalDisaster #Disaster #Infrastructure #PublicSafety #Tech #EmergingTech https://t.co/ekcgc4EF9F'}, {'id': '1565847388442398720', 'text': 'Ay caray que calor que hace en Los Angeles, #extremeweather https://t.co/OfQz0bUyCn'}, {'id': '1565837017325371392', 'text': 'Are you prepared?\n\nCheck out our blog post for tips on what you can do before, during and after a blackout to keep your family safe:\n\nhttps://t.co/4CfXeXj8pp\n\n#ExtremeWeather #BePrepared #BlackOut #PowerOutage https://t.co/V41OwjlICG'}, {'id': '1565830907772141570', 'text': 'Special Issue on "Hydrology and Climate C
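
The request above assumes that the call succeeds and that at least one Tweet matches. A search with no matches typically comes back with only a 'meta' object and no 'data' key, and a failed call (for example with an invalid token) returns a JSON description of the problem instead. The sketch below handles both cases. It is an illustration only: `extract_tweets` is a hypothetical helper, and the assumption that error bodies carry 'title', 'detail', or 'errors' keys should be checked against the responses you actually receive.

```python
def extract_tweets(payload):
    """Return the list of Tweet dicts in a search response, or [] if none matched."""
    if "data" in payload:
        return payload["data"]
    # Assumption: a failed call describes the problem under these keys.
    if any(key in payload for key in ("title", "detail", "errors")):
        raise RuntimeError(
            f"Request failed: {payload.get('title')} ({payload.get('detail')})"
        )
    # No 'data' and no error description: the query matched nothing.
    return []


# Stand-in payloads shaped like a successful response and an empty one.
print(extract_tweets({"data": [{"id": "1", "text": "hello"}], "meta": {"result_count": 1}}))
print(extract_tweets({"meta": {"result_count": 0}}))
```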

We will transform the data, which is in JSON format, into a DataFrame.

In [None]:
def make_dataset(response):
    return pd.DataFrame(response['data'])
extreme_weather_tweets = make_dataset(response)
extreme_weather_tweets

Unnamed: 0,id,text
0,1565856106479493133,RT @StephTweetChat: #5G network could spell ma...
1,1565852239612284929,#5G network could spell massive difficulties f...
2,1565847388442398720,"Ay caray que calor que hace en Los Angeles, #e..."
3,1565837017325371392,Are you prepared?\n\nCheck out our blog post f...
4,1565830907772141570,"Special Issue on ""Hydrology and Climate Change..."
5,1565828734682337283,"RT @PIK_Klima: ""Der #Sommer 2022 ist erneut ei..."
6,1565828421439176706,"Violent #thunderstorm in Mattighofen, #Austria..."
7,1565828150638043137,RT @xWxClub: #OnThisDay 3 years ago: Amazing p...
8,1565825670785208320,Have we reached a #climate #tippingpoint ? Why...
9,1565813251765669888,"His name is Gaston, he is playful, sociable an..."


In [None]:
extreme_weather_tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      10 non-null     object
 1   text    10 non-null     object
dtypes: object(2)
memory usage: 288.0+ bytes


### More Data

In [None]:
def make_request(headers):
    url = "https://api.twitter.com/2/tweets/search/recent"
    query_params = {'query': 'ExtremeWeather -is:retweet',  # exclude retweets
                    'max_results': 100,
                    'tweet.fields': 'id,text,geo,conversation_id,created_at',
                    'user.fields': 'id,name,username,location',
                    'place.fields': 'full_name,country',
                    # No token yet: requests drops parameters whose value is None
                    'next_token': None}
    return requests.request("GET", url, params=query_params, headers=headers).json()

response = make_request(headers)
print(response)



The recent search endpoint returns at most 100 Tweets per request (max_results=100), in reverse-chronological order. When more Tweets match than max_results allows, the response includes a pagination token. To retrieve the next page, repeat the request and pass the ‘next_token’ value from the ‘meta’ object of the previous response as the ‘next_token’ parameter, rather than leaving it empty as in the first request. A loop can repeat this until all matching Tweets have been collected.

In [None]:
def make_request_all(headers, nt):
    url = "https://api.twitter.com/2/tweets/search/recent"
    query_params = {'query': 'ExtremeWeather -is:retweet',
                    'max_results': 100,
                    'tweet.fields': 'id,text,geo,conversation_id,created_at',
                    'user.fields': 'id,name,username,location',
                    'place.fields': 'full_name,country',
                    'next_token': nt}  # token taken from the previous page
    return requests.request("GET", url, params=query_params, headers=headers).json()

# Start from the first page, then fetch up to five more pages.
extreme_weather_tweets = pd.DataFrame(response['data'])
for i in range(5):
    # The last page has no 'next_token', so stop there.
    nt = response['meta'].get('next_token')
    if nt is None:
        break
    response = make_request_all(headers, nt)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
    extreme_weather_tweets = pd.concat([extreme_weather_tweets, pd.DataFrame(response['data'])])
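
The loop above fetches a fixed number of extra pages. To keep going until the API stops returning a ‘next_token’, the same idea can be written as a small function. The sketch below is a hedged illustration, not the method used to produce the output in this notebook: `collect_all_pages` and `fetch_page` are hypothetical names, `max_pages` is an assumed safety cap (the endpoint is rate limited), and a dictionary of fake pages stands in for the API so the example runs offline.

```python
def collect_all_pages(fetch_page, max_pages=50):
    """Follow next_token until it disappears or max_pages is reached."""
    tweets = []
    next_token = None  # the first request is made without a token
    for _ in range(max_pages):
        page = fetch_page(next_token)
        # A page without matches may lack the 'data' key entirely.
        tweets.extend(page.get("data", []))
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
    return tweets


# Stand-in for the API: three fake pages keyed by the token that requests them.
fake_pages = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "t1"}},
    "t1": {"data": [{"id": "3"}], "meta": {"next_token": "t2"}},
    "t2": {"data": [{"id": "4"}], "meta": {}},
}

all_tweets = collect_all_pages(lambda token: fake_pages[token])
print([tweet["id"] for tweet in all_tweets])  # ['1', '2', '3', '4']
```

With the real API, `fetch_page` would be a function that performs one search request with the given token and returns the parsed JSON.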


In [None]:
extreme_weather_tweets

Unnamed: 0,created_at,text,id,conversation_id,geo
0,2022-09-03T00:00:40.000Z,#5G network could spell massive difficulties f...,1565852239612284929,1565852239612284929,
1,2022-09-02T23:41:24.000Z,"Ay caray que calor que hace en Los Angeles, #e...",1565847388442398720,1565847388442398720,
2,2022-09-02T23:00:11.000Z,Are you prepared?\n\nCheck out our blog post f...,1565837017325371392,1565837017325371392,
3,2022-09-02T22:35:54.000Z,"Special Issue on ""Hydrology and Climate Change...",1565830907772141570,1565830907772141570,
4,2022-09-02T22:26:02.000Z,"Violent #thunderstorm in Mattighofen, #Austria...",1565828421439176706,1565828421439176706,
...,...,...,...,...,...
99,2022-08-27T05:11:13.000Z,#Pakistan Blames #Climate Change for Deadly Fl...,1563393676830654464,1563393676830654464,
0,2022-08-27T02:57:08.000Z,"""From the US to Italy to China, waters have re...",1563359931801411584,1563359931801411584,
1,2022-08-27T02:41:05.000Z,#ExtremeWeather #Methane #GlobalWarming #Clima...,1563355895165202437,1563355608832610304,
0,2022-08-27T02:57:08.000Z,"""From the US to Italy to China, waters have re...",1563359931801411584,1563359931801411584,


In [None]:
extreme_weather_tweets.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 503 entries, 0 to 1
Data columns (total 5 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   created_at       503 non-null    object
 1   text             503 non-null    object
 2   id               503 non-null    object
 3   conversation_id  503 non-null    object
 4   geo              12 non-null     object
dtypes: object(5)
memory usage: 23.6+ KB


### Save Data

In [None]:
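# to_csv does not create missing folders, so make sure 'data' exists first.
os.makedirs('data', exist_ok=True)
extreme_weather_tweets.to_csv('data/extreme_weather_tweets.csv')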

### References and Credit

References:

1) https://towardsdatascience.com/getting-started-with-data-collection-using-twitter-api-v2-in-less-than-an-hour-600fbd5b5558

2) https://towardsdatascience.com/collect-data-from-twitter-a-step-by-step-implementation-using-tweepy-7526fff2cb31

Collecting Data from Twitter by Bhawneet Singh is licensed under [CC BY NC SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).