# Fetching tweets from Twitter API using Twarc and Tweepy

For details on query parameters, see the official Twitter API v2 guide: [How to write search queries](https://github.com/twitterdev/getting-started-with-the-twitter-api-v2-for-academic-research/blob/main/modules/5-how-to-write-search-queries.md)

In [1]:
# import libraries 
import matplotlib.pyplot as plt 
from twarc import Twarc2, expansions
import tweepy 
import configparser
import time
import pandas as pd

## Authentication 

### 1. Read Configs 

With the `configparser` library, the credentials can be stored in a separate file, so the notebook can be shared without exposing them.

In [2]:
config = configparser.ConfigParser(interpolation=None)
config.read('config.ini')

api_key = config['twitter']['api_key']
api_key_secret = config['twitter']['api_key_secret']
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']
bearer_token = config['twitter']['bearer_token']
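
The cell above expects a `config.ini` file with a `[twitter]` section next to the notebook. A minimal sketch of that layout follows, parsed from a string so it runs without a real file. Every value is a placeholder, and the names `SAMPLE_CONFIG`, `sample`, and `required` are introduced here for illustration only.

```python
import configparser

# Placeholder contents of config.ini; replace the values with your own keys.
SAMPLE_CONFIG = """
[twitter]
api_key = YOUR_API_KEY
api_key_secret = YOUR_API_KEY_SECRET
access_token = YOUR_ACCESS_TOKEN
access_token_secret = YOUR_ACCESS_TOKEN_SECRET
bearer_token = YOUR_BEARER_TOKEN
"""

# interpolation=None keeps any '%' characters in a token from being
# treated as configparser interpolation syntax.
sample = configparser.ConfigParser(interpolation=None)
sample.read_string(SAMPLE_CONFIG)

# Check that every key the notebook reads is present.
required = ['api_key', 'api_key_secret', 'access_token',
            'access_token_secret', 'bearer_token']
missing = [key for key in required if key not in sample['twitter']]
print(missing)
```

Keeping the real `config.ini` out of version control (for example through `.gitignore`) is what makes the notebook safe to share.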

### 2. Authenticate 
Authenticate the account/app to the Twitter API. 

In [3]:
auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)

# Tweepy API v1.1 instance (despite its name, this is not a Twarc2 client)
twarc2_client = tweepy.API(auth)

# Tweepy API v2 client, used below for the full-archive search
client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)

## Fetching Data

### 1. User search

In [5]:
user = 'Nike'

# fetch up to the 300 most recent tweets
limit = 300

# the Cursor pages through the timeline (200 tweets per request) and stops at `limit`
tweets = tweepy.Cursor(twarc2_client.user_timeline, 
              screen_name = user, 
              count = 200,
              tweet_mode = 'extended').items(limit)

# Alternative without a Cursor (single request, at most 200 tweets):
# tweets = twarc2_client.user_timeline(
#     screen_name=user,
#     count=200,
#     tweet_mode='extended',  # return the full text instead of truncating to 140 characters
# )

# create DataFrame
columns = ['user_id','user_location','user_name','text']
data = []

for tweet in tweets: 
    # tab completion on `tweet` lists the other attributes that can be selected
    data.append([tweet.user.id, 
                tweet.user.location, 
                tweet.user.screen_name, 
                tweet.full_text])

df1 = pd.DataFrame(data, columns = columns)
df1.head()

Unnamed: 0,user_id,user_location,user_name,text
0,415859364,"Beaverton, Oregon",Nike,"@__agrdn Salut, nous sommes navrés de voir que..."
1,415859364,"Beaverton, Oregon",Nike,"@kam_htl Hey, @kam_htl nous sommes navrés de v..."
2,415859364,"Beaverton, Oregon",Nike,"@sarahmaispasla Eh bien, tu as été chanceuse ! 😀"
3,415859364,"Beaverton, Oregon",Nike,"@MeganTo09520759 Hey Megan, sorry to hear. For..."
4,415859364,"Beaverton, Oregon",Nike,"@sarahmaispasla Hello,\nL'offre anniversaire e..."


### 2. Keyword or hashtag search

In [6]:
keywords = 'sneakers'
limit = 500  # cap the number of tweets; without it the Cursor keeps paging until results or the rate limit run out

tweets_keyword_search = tweepy.Cursor(
              twarc2_client.search_tweets, 
              q= keywords, 
              count = 100,
              tweet_mode = 'extended').items(limit)

# create DataFrame
columns = ['author_id','author_name','created_time','location','text']
data = []

for tweet in tweets_keyword_search: 
    data.append([tweet.author.id, 
                 tweet.author.name,
                 # created_at is a full timestamp; reduce it to the granularity needed
                 tweet.created_at,
                 tweet.user.location, 
                 tweet.full_text])

df2 = pd.DataFrame(data, columns = columns)
df2.head()

Unnamed: 0,author_id,author_name,created_time,location,text
0,1057967416699957248,Chad Jones aka Sneaker Galactus,2022-10-06 13:49:14+00:00,United States,"I would love to do my own sneaker ""show"" to sh..."
1,251341001,Graham Cracker.,2022-10-06 13:49:11+00:00,,I put sneakers on a toddler once and I’ve neve...
2,539929374,jinlia,2022-10-06 13:48:59+00:00,,RT @midcys: itzy: put my sneakers on\n\nthe cr...
3,1094938177369001984,Benefits Everyone,2022-10-06 13:48:56+00:00,"Newcastle Upon Tyne, England",Fabulous footwear since 1975 - Exclusive Moda ...
4,1250741036390903808,ؘ,2022-10-06 13:48:55+00:00,s%her,RT @ITZYofficial: 💾 ITZY &lt;CHECKMATE&gt; 하드털...
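
The `limit` passed to `.items()` in the cell above caps how many tweets the cursor returns, so the loop cannot keep requesting pages indefinitely. A rough stand-in for that behaviour is sketched below with a fake, endless paged source instead of the real API. `fake_pages` is a hypothetical helper, and `islice` only mimics the capping; it is not how Tweepy implements it.

```python
from itertools import islice

def fake_pages(page_size=100):
    """Yield items page by page forever, like an endless search result."""
    page = 0
    while True:
        for i in range(page_size):
            yield f"tweet-{page * page_size + i}"
        page += 1

limit = 500
# islice plays the role of .items(limit): it stops after `limit` items
# even though the underlying source never runs out.
capped = list(islice(fake_pages(), limit))
print(len(capped))
```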


### 3. Full-Archive Search
Get more than 500 Tweets per query using `tweepy.Paginator`, which automatically requests the next page of results.

[Reference](https://www.youtube.com/watch?v=rQEsIs9LERM)
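
Conceptually, pagination means repeating the request with the token returned by the previous page until no token comes back or a page limit is reached. The sketch below mimics this with an in-memory stand-in. `fake_search`, `paginate`, and the `next_token` field are illustrative assumptions, not the Tweepy implementation.

```python
def fake_search(next_token=None, max_results=3):
    """Hypothetical endpoint: returns one page of items plus a token for the next page."""
    items = list(range(10))
    start = int(next_token) if next_token else 0
    end = start + max_results
    page = items[start:end]
    token = str(end) if end < len(items) else None
    return {'data': page, 'next_token': token}

def paginate(search, limit=20):
    """Collect pages until the token runs out or `limit` pages were fetched."""
    pages, token = [], None
    for _ in range(limit):
        response = search(next_token=token)
        pages.append(response['data'])
        token = response['next_token']
        if token is None:
            break
    return pages

pages = paginate(fake_search)
print(pages)
```

The `limit` argument here counts pages, not items, which is also how the `limit = 20` argument is used together with `max_results=500` in the search cell.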

In [8]:
# set the query; the keyword and operators can be changed
input_query = 'sneakers -is:retweet lang:en place_country:US'

def full_archive_search(input_query): 
    """
    Run a full-archive search for the given query.
    ----------------------
    Input: query string
    Output: list of Tweepy response objects, one per page

    """
    result = []

    for response in tweepy.Paginator(client.search_all_tweets, 
                                    # retweets are excluded by `-is:retweet` in the query
                                    query = input_query,
                                    user_fields = ['username', 'public_metrics', 'description', 'location'],
                                    tweet_fields = ['created_at', 'geo', 'public_metrics', 'text'],
                                    expansions = 'author_id',
                                    start_time = '2022-08-01T00:00:00Z',
                                    end_time = '2022-09-30T23:59:59Z',
                                    max_results=500, limit = 20):
        # `search_all_tweets` is limited to one request per second,
        # so wait a second before requesting the next page
        time.sleep(1)
        result.append(response)

    return result

tweepy_query = full_archive_search(input_query)
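
The query string combines a keyword with operators: `-is:retweet` excludes retweets, `lang:en` keeps English tweets, and `place_country:US` keeps tweets tagged with a place in the US. A small helper can assemble such strings. `build_query` is a hypothetical name introduced here, not part of Tweepy.

```python
def build_query(keyword, lang='en', country='US', include_retweets=False):
    """Assemble a search query string from a keyword and optional operators."""
    parts = [keyword]
    if not include_retweets:
        parts.append('-is:retweet')
    if lang:
        parts.append(f'lang:{lang}')
    if country:
        parts.append(f'place_country:{country}')
    return ' '.join(parts)

print(build_query('sneakers'))
```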

In [9]:
def full_archive_search_df(tweepy_query):

    """
    Convert the query result into a DataFrame.
    --------------------
    Input: list of Tweepy response objects returned by the query
    Output: pandas DataFrame with one row per tweet

    """
    result = []
    user_dict = {}

    # loop through each response object
    for response in tweepy_query:
        
        # take all of the users, and put them into a dictionary of dictionaries with the info we want to keep
        for user in response.includes.get('users', []):
            user_dict[user.id] = {'username': user.username, 
                                'followers': user.public_metrics['followers_count'],
                                'tweets': user.public_metrics['tweet_count'],
                                'description': user.description,
                                'location': user.location
                                }

        # for each tweet, find the author information                        
        for tweet in response.data or []:
            author_info = user_dict[tweet.author_id]

            # put all of the information we want to keep in a single dictionary for each tweet 
            result.append({'author_id': tweet.author_id, 
                        'username': author_info['username'],
                        'author_followers': author_info['followers'],
                        'author_tweets': author_info['tweets'],
                        'author_description': author_info['description'],
                        'author_location': author_info['location'],
                        'text': tweet.text,
                        'created_at': tweet.created_at,
                        'retweets': tweet.public_metrics['retweet_count'],
                        'replies': tweet.public_metrics['reply_count'],
                        'likes': tweet.public_metrics['like_count'],
                        'quote_count': tweet.public_metrics['quote_count']
                        })

    tweepy_query_df = pd.DataFrame(result)
    return tweepy_query_df

sneakers_df = full_archive_search_df(tweepy_query)
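
The function relies on the `author_id` expansion: user objects arrive separately from the tweets (under `includes`), so each tweet has to be matched to its author by id. The same lookup-then-merge step is sketched below on plain dictionaries with made-up records. `sample_users` and `sample_tweets` are illustrative data, not API output.

```python
# Made-up records standing in for response.includes['users'] and response.data
sample_users = [
    {'id': 1, 'username': 'alice', 'followers': 10},
    {'id': 2, 'username': 'bob', 'followers': 25},
]
sample_tweets = [
    {'author_id': 2, 'text': 'new sneakers'},
    {'author_id': 1, 'text': 'old sneakers'},
    {'author_id': 2, 'text': 'more sneakers'},
]

# Build the lookup once, then resolve each tweet's author by id
user_by_id = {user['id']: user for user in sample_users}

rows = [
    {
        'author_id': tweet['author_id'],
        'username': user_by_id[tweet['author_id']]['username'],
        'author_followers': user_by_id[tweet['author_id']]['followers'],
        'text': tweet['text'],
    }
    for tweet in sample_tweets
]
print(rows[0])
```

Building the dictionary first means each author is stored once even when they wrote several tweets in the same page.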

In [10]:
sneakers_df.shape

(2776, 12)

In [11]:
sneakers_df.head()

Unnamed: 0,author_id,username,author_followers,author_tweets,author_description,author_location,text,created_at,retweets,replies,likes,quote_count
0,1493612427304488965,1KHABYS,128,2168,Be ¥a $elf or Bring ¥a $hooters! 👀🎮 BYS LLC™️ ...,Around da corner⭐️,Privileged never paid over box price for sneakers,2022-09-30 23:28:41+00:00,0,0,0,0
1,29560488,CarrieMae_,344,12286,"the devil works hard, kris jenner works harder","Brooklyn, NY",Trying to have a peaceful evening and at the r...,2022-09-30 23:21:03+00:00,0,0,0,0
2,22680919,MattHalfhill,13053,535,Founder + CEO of @nicekicks. DMs are open. sz 11,"Austin, TX",Drop sneakers at a job fair if you don’t want ...,2022-09-30 23:13:22+00:00,112,46,1196,32
3,997270922976481282,JEFF_SON_334,609,3810,"Husband, Father to a son ,COOL MF in General U...","Montgomery, AL",I hate the fact that Puma ain’t got no sneaker...,2022-09-30 22:36:38+00:00,0,0,0,0
4,101915799,kwamemorgan,847,25444,Follow my IG : @Kwamemorgan,"ÜT: 38.899236,-76.797741",@1_Bundles You know my lil buddies gone geek t...,2022-09-30 22:33:35+00:00,0,1,0,0


### 4. Full-Archive Search for brands 

#### 4.1 Nike

In [12]:
# Nike 
input_query = 'nike -is:retweet lang:en place_country:US'
nike_query = full_archive_search(input_query)
nike_df = full_archive_search_df(nike_query)

In [13]:
print(nike_df.shape)
nike_df.head()

(7883, 12)


Unnamed: 0,author_id,username,author_followers,author_tweets,author_description,author_location,text,created_at,retweets,replies,likes,quote_count
0,87144412,GarrettKGray,373,12055,Land Economist & Economic Development Speciali...,"Coos Bay, OR",@ShaneDaleAZ Totally. The Nike uniforms since ...,2022-09-30 23:46:59+00:00,0,0,0,0
1,492330913,LockDown_Lopes,467,97661,"@nicekicks, sports, & memes | University of Ar...","Scottsdale, AZ",Hats off to Tom Sachs and the marketing team a...,2022-09-30 23:43:45+00:00,0,0,0,0
2,37706001,RyanGensler,6686,13613,315 Born and Raised: Assistant Basketball Coac...,"Champaign, IL",The look on @makiracook face! 😂 \n\nThanks @Ni...,2022-09-30 23:38:15+00:00,1,1,27,0
3,17417435,ShellzBoss,564,22077,"#TeamLibra #TeamLesbian Hibernating, should be...","Maryland, Michigan",Check out my new pickup from Nike⁠ SNKRS: http...,2022-09-30 23:35:28+00:00,0,1,0,0
4,853714067692806144,DJKingJam,349,3474,Jordan Shoe collector || DJ Jamez || Music Pro...,"Seattle, WA",@jameslfreelance @Jumpman23 @Nike @nikestore O...,2022-09-30 23:15:57+00:00,0,0,2,0


In [19]:
nike_df['text'][0]

'@ShaneDaleAZ Totally. The Nike uniforms since have replaced a distinguished/identifiable look to enhance their own brand at the expense of Arizona.'

#### 4.2 New Balance

In [20]:
# newbalance 
input_query = 'newbalance -is:retweet lang:en place_country:US'
nb_query = full_archive_search(input_query)
nb_df = full_archive_search_df(nb_query)

In [23]:
print(nb_df.shape)
nb_df['text'][200]

(240, 12)


'@snkr_twitr 🥵🥵😍😍😍 sexyyyy.\n\nI’m still stunned that when I worked at @striderite in 2010 @newbalance were not “IN” in Amerikkka at that time, and they used to be much cheaper lol.'

#### 4.3 Adidas

In [24]:
# adidas  
input_query = 'adidas -is:retweet lang:en place_country:US'
adidas_query = full_archive_search(input_query)
adidas_df = full_archive_search_df(adidas_query)

In [27]:
print(adidas_df.shape)
adidas_df['text'][178]

(2346, 12)


'Did you see the new Adidas x Gucci pop up in melrose? The new partner will open tomorrow to provide customers with a unique shopping experience. https://t.co/smEQcOfP33'

#### 4.4 Converse

In [28]:
# converse  
input_query = 'converse -is:retweet lang:en place_country:US'
converse_query = full_archive_search(input_query)
converse_df = full_archive_search_df(converse_query)

In [30]:
print(converse_df.shape)
converse_df['text'][169]

(942, 12)


'First wedding event of the trip! Indian kurta? Check. @scottevest pants? Check. Converse? Check. I live by my own rules. Also, my wife allowed it https://t.co/cS5llc3au2'

#### 4.5 Reebok

In [31]:
# reebok  
input_query = 'reebok -is:retweet lang:en place_country:US'
reebok_query = full_archive_search(input_query)
reebok_df = full_archive_search_df(reebok_query)

In [32]:
print(reebok_df.shape)
reebok_df['text'][152]

(244, 12)


'@chinababee Well if you’re only talking about one song vs the other then sure, but lets talk about his Reebok collection though, his new crackhead appearance, his new bitch, him continuously claiming he’s #1 worldwide when we all know who is lmaooo I can keep going.'

In [34]:
len(reebok_df)

244
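
Sections 4.1 to 4.5 repeat the same steps with a different keyword, so a loop could collect them in one pass. In the sketch below, `fetch` stands in for the combination of `full_archive_search` and `full_archive_search_df`. `collect_brands` and `fake_fetch` are hypothetical names, and the stub returns a canned row so the example runs offline.

```python
def collect_brands(brands, fetch):
    """Run one query per brand and return {label: result}."""
    results = {}
    for label, keyword in brands.items():
        query = f'{keyword} -is:retweet lang:en place_country:US'
        results[label] = fetch(query)
    return results

def fake_fetch(query):
    """Offline stub: returns one canned row that records the query it received."""
    return [{'query': query}]

brands = {'Nike': 'nike', 'New Balance': 'newbalance', 'Adidas': 'adidas',
          'Converse': 'converse', 'Reebok': 'reebok'}
brand_results = collect_brands(brands, fake_fetch)
print(brand_results['New Balance'])
```

With the notebook's own functions, `fetch` could be `lambda q: full_archive_search_df(full_archive_search(q))`, which would issue the real API requests.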

#### 4.6 Combine the info

In [40]:
# number of tweets collected per brand
d = {'brand':['Nike','New Balance','Adidas','Converse','Reebok'],
     'count':[len(nike_df), len(nb_df), len(adidas_df),len(converse_df),len(reebok_df)]}

brand_text_count = pd.DataFrame(columns=['brand','count'], data = d)

brand_text_count.sort_values(by = ['count'],ascending = False)

Unnamed: 0,brand,count
0,Nike,7883
2,Adidas,2346
3,Converse,942
4,Reebok,244
1,New Balance,240


In [45]:
# full dataset 
df = pd.concat([nike_df, adidas_df, nb_df, converse_df, reebok_df], axis = 0 )
df.shape

(11655, 12)
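
After concatenation the rows no longer record which brand query produced them, and the per-frame index values repeat. One option, sketched here on tiny made-up frames, is to tag each frame with a `brand` column before combining. That column is an addition for illustration and is not in the original dataset; `frames` and `labelled` are likewise hypothetical names.

```python
import pandas as pd

# Tiny made-up frames standing in for nike_df, adidas_df, ...
frames = {
    'Nike': pd.DataFrame({'text': ['a', 'b']}),
    'Adidas': pd.DataFrame({'text': ['c']}),
}

# Tag each frame with its brand, then stack them with a fresh 0..n-1 index
labelled = pd.concat(
    [frame.assign(brand=brand) for brand, frame in frames.items()],
    ignore_index=True,
)
print(labelled['brand'].value_counts().to_dict())
```

A tweet that mentions two brands would still appear once per matching query, so deduplication would be a separate step if it matters for the analysis.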