![twitter](images/twitter.jpg)

## Objective

* Retrieve data from the Twitter API

## Loading Python Packages

In [33]:
# numpy for high-level mathematical functions and working with arrays
import numpy as np
# pandas for manipulating and analysing tabular data
import pandas as pd

# tweepy to access the Twitter API
import tweepy
# requests to send HTTP requests
import requests
# os for operating system functionality, e.g. paths and environment variables
import os
# dotenv to load credentials from a .env file
from dotenv import load_dotenv
# path to the .env file
path_to_env = ".env"
load_dotenv(path_to_env)

True

In [34]:
# Credentials
consumer_key = os.getenv("API_KEY")
consumer_secret = os.getenv("API_KEY_SECRET")
access_token = os.getenv("ACCESS_TOKEN")
access_token_secret = os.getenv("ACCESS_TOKEN_SECRET")
bearer_token = os.getenv("BEARER_TOKEN")

In [35]:
# initialize the client (Tweepy's interface for Twitter API v2)

client = tweepy.Client(
    bearer_token=bearer_token,
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    return_type=requests.Response,
    wait_on_rate_limit=True
)
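`os.getenv` returns `None` when a variable is not set, so a typo in `.env` tends to surface only later as an authentication error. Below is a minimal sketch of an early check, assuming the same variable names as in the credentials cell. The helper name `missing_credentials` is hypothetical and not part of Tweepy or python-dotenv.

```python
import os

# Names of the environment variables read in the credentials cell above.
REQUIRED_VARS = [
    "API_KEY",
    "API_KEY_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
    "BEARER_TOKEN",
]


def missing_credentials(environ=os.environ, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Illustration with a made-up environment rather than real secrets.
example_env = {"API_KEY": "abc", "BEARER_TOKEN": ""}
print(missing_credentials(example_env))
```

Calling `missing_credentials()` with no arguments checks the real environment after `load_dotenv` has run.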

## Mentions

Retrieve the tweets that mention a single user. The endpoint identifies the user by their numeric user ID, which we look up from the Twitter username below. Each request returns the ten most recent tweets by default, but pagination lets you retrieve up to the 800 most recent mentions.

Find out more [here](https://developer.twitter.com/en/docs/twitter-api/tweets/timelines/api-reference/get-users-id-mentions) on Twitter's developer platform.
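To make the numbers above concrete, the following sketch works out how many requests are needed to reach the 800-tweet ceiling at a given page size. The 800 figure is the one quoted above; the helper name `pages_needed` is hypothetical.

```python
import math

MAX_MENTIONS = 800  # ceiling quoted above for the mentions timeline


def pages_needed(total, per_page):
    """Number of requests needed to fetch `total` tweets, `per_page` at a time."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total / per_page)


print(pages_needed(MAX_MENTIONS, 10))   # default page size -> 80
print(pages_needed(MAX_MENTIONS, 100))  # 100 tweets per page -> 8
```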

Kenyans are quick to turn to Twitter to raise complaints directed at government parastatals such as [Kenya Power](https://twitter.com/KenyaPower_Care?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor), banks such as [KCB](https://twitter.com/kcbcare?lang=en), and telecommunication companies such as [Safaricom](https://twitter.com/safaricom_care). These tweets can be collected and their sentiment analyzed to get a glimpse of what customers are complaining about.

Let's have a look at tweets sent to KenyaPower_Care.

In [36]:
# The key parameter for getting mentions is the user ID
def get_user_id(user_name):
    """Simple function to get a Twitter account's user ID"""
    response = client.get_user(username=user_name).json()
    # response is the parsed JSON payload; its "data" field contains username, name and id
    user_id = response["data"]["id"]

    return user_id



get_user_id("KenyaPower_Care") 

'147561402'

In [37]:
# getting five mentions of the KenyaPower_Care Twitter account
tweet = client.get_users_mentions(
    id=147561402,
    tweet_fields=['created_at', 'public_metrics'],
    max_results=5
)

tweet

<Response [200]>

In [38]:
# turn into a dictionary
tweet.json()

{'data': [{'id': '1592112018353684482',
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': '@KenyaPower_Care no power supply, kindly assist https://t.co/jSHU7MTROb',
   'created_at': '2022-11-14T11:07:39.000Z',
   'edit_history_tweet_ids': ['1592112018353684482']},
  {'id': '1592111416261197824',
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': '@Allan09036042 @KenyaPower_Care But inaletanga homa sana',
   'created_at': '2022-11-14T11:05:16.000Z',
   'edit_history_tweet_ids': ['1592111416261197824']},
  {'id': '1592110700100747264',
   'public_metrics': {'retweet_count': 0,
    'reply_count': 0,
    'like_count': 0,
    'quote_count': 0},
   'text': '@gladowjsang @KenyaPower_Care Pesa iko',
   'created_at': '2022-11-14T11:02:25.000Z',
   'edit_history_tweet_ids': ['1592110700100747264']},
  {'id': '1592110354620112897',
   'public_metrics': {'retweet

In [117]:
name = {"first": "Antonny", "second":"Muiko"}

name["first"]


'Antonny'
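The same key lookups apply to the nested response. Here is a short sketch, using a hand-copied subset of the first tweet from the output above rather than a live API call:

```python
# A trimmed copy of the response shown earlier (first tweet only).
payload = {
    "data": [
        {
            "id": "1592112018353684482",
            "public_metrics": {
                "retweet_count": 0,
                "reply_count": 0,
                "like_count": 0,
                "quote_count": 0,
            },
            "text": "@KenyaPower_Care no power supply, kindly assist https://t.co/jSHU7MTROb",
            "created_at": "2022-11-14T11:07:39.000Z",
        }
    ]
}

first_tweet = payload["data"][0]                     # "data" holds a list of tweets
print(first_tweet["created_at"])                     # a top-level field
print(first_tweet["public_metrics"]["reply_count"])  # a nested field
```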

In [39]:
# getting more mentions of the KenyaPower_Care Twitter account
tweets = client.get_users_mentions(
    id=get_user_id("KenyaPower_Care"),
    tweet_fields=['created_at', 'public_metrics'],
    max_results=100
    )

In [40]:
# Save data as dictionary
tweets_dict = tweets.json()

In [41]:
# Extract "data" value from dictionary
tweets_data = tweets_dict['data']

In [42]:
# transform to pandas DataFrame
data = pd.json_normalize(tweets_data)

In [43]:
# preview the data
data.head()

|   | id | created_at | text | edit_history_tweet_ids | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count |
|---|---|---|---|---|---|---|---|---|
| 0 | 1592112018353684482 | 2022-11-14T11:07:39.000Z | @KenyaPower_Care no power supply, kindly assis... | [1592112018353684482] | 0 | 0 | 0 | 0 |
| 1 | 1592111416261197824 | 2022-11-14T11:05:16.000Z | @Allan09036042 @KenyaPower_Care But inaletanga... | [1592111416261197824] | 0 | 0 | 0 | 0 |
| 2 | 1592110700100747264 | 2022-11-14T11:02:25.000Z | @gladowjsang @KenyaPower_Care Pesa iko | [1592110700100747264] | 0 | 0 | 0 | 0 |
| 3 | 1592110354620112897 | 2022-11-14T11:01:03.000Z | @KenyaPower_Care Well noted and appreciated MW... | [1592110354620112897] | 0 | 1 | 0 | 0 |
| 4 | 1592110217814495233 | 2022-11-14T11:00:30.000Z | @RobertAlai help us follow @KenyaPower_Care re... | [1592110217814495233] | 0 | 2 | 0 | 0 |


In [44]:
# the shape of the dataframe
data.shape

(99, 8)
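The dotted column names such as `public_metrics.reply_count` come from flattening the nested `public_metrics` dictionary. The sketch below is a hand-rolled, pure-Python illustration of that idea, not pandas' actual implementation. `flatten` is a hypothetical helper, and the sample record is trimmed from row 3 of the preview above.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys.

    Lists are left untouched, which is how the list-valued
    edit_history_tweet_ids column appears in the preview.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


tweet = {
    "id": "1592110354620112897",
    "edit_history_tweet_ids": ["1592110354620112897"],
    "public_metrics": {"retweet_count": 0, "reply_count": 1},
}
print(flatten(tweet))
```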

In [73]:
# using paginator to get more than 100
paginator = tweepy.Paginator(
    client.get_users_mentions,                     # The method you want to use
    id=get_user_id("KenyaPower_Care"),
    tweet_fields=['created_at', 'public_metrics'], # Some argument for this method
    max_results=100,                               # How many tweets per page
    limit=5                                        # How many pages to retrieve
)

In [74]:
# empty list to store the data
data_list = []
# using a loop to access the data 
for page in paginator:
    # store as a dictionary
    page_dict = page.json()
    # add the dicts to the empty list 
    data_list.append(page_dict)   

In [76]:
# len of the data_list
len(data_list)

5
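Behind the scenes, paginated v2 endpoints return a `next_token` inside the response's `meta` object, and the next request passes it back as `pagination_token`. The sketch below simulates that loop with a fake in-memory endpoint so it runs without credentials. `fake_mentions` and `iterate_pages` are hypothetical stand-ins, not Tweepy functions, and real tokens are opaque strings rather than offsets.

```python
def fake_mentions(pagination_token=None, max_results=2):
    """Stand-in for a paginated endpoint serving five made-up tweets."""
    tweets = [{"id": str(i), "text": f"tweet {i}"} for i in range(5)]
    start = int(pagination_token) if pagination_token else 0
    end = start + max_results
    page_items = tweets[start:end]
    payload = {"data": page_items, "meta": {"result_count": len(page_items)}}
    if end < len(tweets):
        payload["meta"]["next_token"] = str(end)  # more pages remain
    return payload


def iterate_pages(fetch, limit=None, **kwargs):
    """Yield pages until there is no next_token or `limit` pages were read."""
    token, pages_read = None, 0
    while limit is None or pages_read < limit:
        page = fetch(pagination_token=token, **kwargs)
        yield page
        pages_read += 1
        token = page["meta"].get("next_token")
        if token is None:
            break


pages = list(iterate_pages(fake_mentions, limit=5, max_results=2))
print(len(pages))                             # 3 pages: 2 + 2 + 1 tweets
print([len(page["data"]) for page in pages])  # [2, 2, 1]
```

As in the `Paginator` call above, `limit` caps the number of pages, so the loop stops at whichever comes first: the page limit or the last page.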

In [102]:
# empty list to store dicts 
data_dict_list = []
# Extract "data" value from dictionary
for item in data_list:
    item_data = item['data']
    data_dict_list.append(item_data)

In [103]:
# empty list to store dataframe 
dataframes_list = []
for page_dict in data_dict_list:
    page_df = pd.json_normalize(page_dict)
    dataframes_list.append(page_df)

We now have a list of dataframes, one per page of mentions of the KenyaPower_Care account, which we can concatenate into a single dataframe. The toy example below first shows how `pd.concat` stacks two dataframes row-wise.

In [118]:
df1 = pd.DataFrame({"A":[1,2,3,4], "B":["2a", "2b", "2c", "2d"]})
df2 = pd.DataFrame({"A":[5,6,7,8], "B":["3a", "3b", "3c", "3d"]})
df1

|   | A | B |
|---|---|---|
| 0 | 1 | 2a |
| 1 | 2 | 2b |
| 2 | 3 | 2c |
| 3 | 4 | 2d |


In [119]:
df2

|   | A | B |
|---|---|---|
| 0 | 5 | 3a |
| 1 | 6 | 3b |
| 2 | 7 | 3c |
| 3 | 8 | 3d |


In [122]:
df3 = pd.concat([df1, df2], axis=0, ignore_index=True)
df3

|   | A | B |
|---|---|---|
| 0 | 1 | 2a |
| 1 | 2 | 2b |
| 2 | 3 | 2c |
| 3 | 4 | 2d |
| 4 | 5 | 3a |
| 5 | 6 | 3b |
| 6 | 7 | 3c |
| 7 | 8 | 3d |


In [105]:
# concat the dataframes
df_kplc = pd.concat(dataframes_list, axis=0, ignore_index=True)

In [106]:
# confirm the lengths of the dataframes
length = []
for df in dataframes_list:
    length.append(len(df))

print(f"Total length should be {sum(length)}.")
length

Total length should be 496.


[98, 100, 100, 100, 98]

In [107]:
len(df_kplc)

496
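An alternative to building one dataframe per page is to flatten the pages into a single list of records first and normalize once. Below is a minimal sketch with made-up pages standing in for `data_list`; the record contents are invented for illustration.

```python
from itertools import chain

# Made-up pages standing in for the parsed responses collected in `data_list`.
example_pages = [
    {"data": [{"id": "1"}, {"id": "2"}], "meta": {"result_count": 2}},
    {"meta": {"result_count": 0}},  # a page without tweets
    {"data": [{"id": "3"}], "meta": {"result_count": 1}},
]

# Flatten the per-page lists into one list of tweet records.
# .get("data", []) skips pages that carry no "data" key.
records = list(chain.from_iterable(page.get("data", []) for page in example_pages))

print(len(records))
print([record["id"] for record in records])
```

With the real `data_list`, passing `records` to `pd.json_normalize` would then give a single dataframe without a separate concatenation step.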

In [116]:
# preview the data 
df_kplc.head()

|   | text | edit_history_tweet_ids | id | created_at | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count |
|---|---|---|---|---|---|---|---|---|
| 0 | @KenyaPower_Care Am trying to load my token an... | [1592132805341110272] | 1592132805341110272 | 2022-11-14T12:30:15.000Z | 0 | 0 | 0 | 0 |
| 1 | @KenyaPower_Care respond to ref no. 8737190 | [1592132057760944130] | 1592132057760944130 | 2022-11-14T12:27:17.000Z | 0 | 0 | 0 | 0 |
| 2 | @KenyaPower_Care Can it be reversed? | [1592131682765000706] | 1592131682765000706 | 2022-11-14T12:25:48.000Z | 0 | 0 | 0 | 0 |
| 3 | @KenyaPower_Care kindly check your dm | [1592131450610257922] | 1592131450610257922 | 2022-11-14T12:24:52.000Z | 0 | 0 | 0 | 0 |
| 4 | @KenyaPower_Care hello, please confirm is powe... | [1592131422621667328] | 1592131422621667328 | 2022-11-14T12:24:46.000Z | 0 | 0 | 0 | 0 |


## Tweets from Account

* Let's search for Tweets from Safaricom's Twitter account (@SafaricomPLC) from the last 7 days (search_recent_tweets).
* We exclude Retweets and limit the result to a maximum of 100 Tweets.
* We also request additional information with tweet_fields: when the Tweet was created and its reactions, i.e. likes and retweets.


In [47]:
# Define query
query = 'from:SafaricomPLC -is:retweet'

# retrieve recent tweets
saf_tweets = client.search_recent_tweets(
    query=query,
    tweet_fields=['created_at', 'public_metrics'],
    max_results=100
)

In [48]:
# Save data as dictionary
saf_tweets_dict = saf_tweets.json()

In [49]:
# Extract "data" value from dictionary
saf_tweets_data = saf_tweets_dict['data']

In [50]:
# transform to pandas DataFrame
saf_data = pd.json_normalize(saf_tweets_data)

In [52]:
# preview the data
saf_data.head()

|   | text | id | created_at | edit_history_tweet_ids | public_metrics.retweet_count | public_metrics.reply_count | public_metrics.like_count | public_metrics.quote_count |
|---|---|---|---|---|---|---|---|---|
| 0 | @Otu_montana Hello Otu, dial ##21#OK.^FO | 1592105684283400193 | 2022-11-14T10:42:29.000Z | [1592105684283400193] | 0 | 1 | 0 | 0 |
| 1 | @BrianKiriethe Hi Bob. Checking in a short whi... | 1592102151408738304 | 2022-11-14T10:28:27.000Z | [1592102151408738304] | 0 | 0 | 0 | 0 |
| 2 | @Mr_Sarapai We are on it. ^RM | 1592093896267513857 | 2022-11-14T09:55:39.000Z | [1592093896267513857] | 0 | 0 | 0 | 0 |
| 3 | @changer_Kenya Hello. Sorry for this. Kindly D... | 1592088943012585473 | 2022-11-14T09:35:58.000Z | [1592088943012585473] | 0 | 0 | 0 | 0 |
| 4 | @Rakimafp1 Hello. Sorry for that. Kindly hsar ... | 1592077243341508609 | 2022-11-14T08:49:28.000Z | [1592077243341508609] | 0 | 0 | 0 | 0 |


In [53]:
# the shape of the data 
saf_data.shape

(100, 8)
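The `created_at` values are ISO 8601 UTC timestamps stored as plain strings. Below is a small sketch, assuming Python 3.7+ (where `%z` accepts a trailing `Z`), that parses such a timestamp and checks whether it falls inside a 7-day window. The reference time is fixed so the example is reproducible, and `parse_created_at` and `within_window` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_created_at(value):
    """Parse a created_at string such as '2022-11-14T10:42:29.000Z'."""
    return datetime.strptime(value, TIMESTAMP_FORMAT)


def within_window(created_at, now, days=7):
    """True if the timestamp is no older than `days` days before `now`."""
    return now - timedelta(days=days) <= parse_created_at(created_at) <= now


# Fixed reference time so the result does not depend on when this is run.
reference = datetime(2022, 11, 14, 12, 0, tzinfo=timezone.utc)
print(within_window("2022-11-14T10:42:29.000Z", reference))  # True
print(within_window("2022-11-01T10:42:29.000Z", reference))  # False
```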

## Topics 

We can also get recent tweets about a certain topic.

In [130]:
# Define query
query = '(#moringaschool OR #Moringaschool)'

# retrieve recent tweets
ms_tweets = client.search_recent_tweets(
    query=query,
    tweet_fields=['created_at', 'public_metrics'],
    max_results=100
)

In [132]:
# save as a dict
ms_dict = ms_tweets.json()
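A hashtag search can legitimately match nothing. In that case the v2 response is expected to carry only a `meta` object with a `result_count` of 0 and no `data` key, so indexing `ms_dict['data']` directly would raise a `KeyError`. Below is a defensive sketch with made-up payloads and a hypothetical helper `extract_tweets`:

```python
def extract_tweets(payload):
    """Return the list of tweets in a parsed response, or [] if there are none."""
    return payload.get("data", [])


# Made-up payloads for illustration only.
empty_payload = {"meta": {"result_count": 0}}
full_payload = {
    "data": [{"id": "1", "text": "Learning data science #moringaschool"}],
    "meta": {"result_count": 1},
}

print(extract_tweets(empty_payload))      # []
print(len(extract_tweets(full_payload)))  # 1
```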

### Building queries for Search Tweets

We can build search queries that target the topic of discussion we are interested in. Options include ```hashtags```, logical AND (a space between operators) and ```OR```, ```has:media``` for tweets with media, ```has:links``` for tweets with links, the negation ```-is:retweet```, which excludes Retweets, and many more.

The queries we build depend on the kind of data we are interested in. You can read more [here](https://developer.twitter.com/en/docs/twitter-api/tweets/search/integrate/build-a-query) on Twitter's developer platform.
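As an illustration, queries like the ones used in this notebook can be assembled from parts. The helper below is a hypothetical convenience, not part of Tweepy, and it only uses operators already mentioned in this section.

```python
def build_query(hashtags=(), from_user=None, exclude_retweets=False,
                has_media=False, has_links=False):
    """Assemble a search query string from a few common operators."""
    parts = []
    if hashtags:
        # Hashtags are alternatives, so join them with OR inside parentheses.
        tags = " OR ".join("#" + tag.lstrip("#") for tag in hashtags)
        parts.append(f"({tags})")
    if from_user:
        parts.append(f"from:{from_user}")
    if has_media:
        parts.append("has:media")
    if has_links:
        parts.append("has:links")
    if exclude_retweets:
        parts.append("-is:retweet")
    # Operators separated by a space are combined with a logical AND.
    return " ".join(parts)


print(build_query(from_user="SafaricomPLC", exclude_retweets=True))
# from:SafaricomPLC -is:retweet
print(build_query(hashtags=["moringaschool", "Moringaschool"], has_media=True))
# (#moringaschool OR #Moringaschool) has:media
```

The resulting string can be passed as the `query` argument of `client.search_recent_tweets`, exactly like the hand-written queries above.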