## Research Questions Focus Areas

- Note: Please comment your code explicitly and document every function you create, so that other collaborators can learn from your code more quickly. Remember that the project is also a learning process.

---

### Dataset Import

In [3]:
import pandas as pd
import re

In [42]:
pd.set_option("display.max_colwidth", None)
tweets = pd.read_csv('citizensvoice_dataset.csv', index_col=0)

In [24]:
tweets.head(10)

Unnamed: 0,time_created,tweet,loca_tion
0,2022-10-25T23:44:56+00:00,"Tinubu Is An Emperor; Buhari, Osinbajo, Governors Begged Him To Forgive Ambode But He Refused —Dele Momodu | Sahara Reporters https://t.co/mO1zWgXQxn",Nigeria
1,2022-10-25T23:37:40+00:00,Dear @PeterObi please stop putting our future at risk. \r\nYou are the only reason I still believe in Nigeria. \r\nMy vote is for you https://t.co/nKhLhzrV8H,"Lagos, Nigeria"
2,2022-10-25T23:31:19+00:00,"Wike pointed to how the PDP presidential candidate, Alhaji Atiku Abubakar picked people from Rivers State as members of the presidential campaign council without any input from him.\r\n\r\nhttps://t.co/H2cicBJlu1",Nigeria
3,2022-10-25T23:03:57+00:00,@fkeyamo @apc_lagos https://t.co/KrKdTG8prX,"Ogun, Nigeria"
4,2022-10-27T23:59:39+00:00,"PDP is in total chaos in Ogun, dead in Lagos, Oyo PDP refusing to work for Atiku, the leaders in Ekiti and Ondo are refusing to mount a challenge, only in Osun does the party have a deem hope. https://t.co/w3r3dmSa0i","Ogun, Nigeria"
5,2022-10-20T23:59:41+00:00,One of their lekki lies . They lie like their presidential candidate @PeterObi https://t.co/mMxchiJIPc,"Port Harcourt, Nigeria"
6,2022-10-25T22:40:58+00:00,@Taikaz_adufe @valeron31 @Cold_n_dark @SugarCharles1 @MuchTalksBlog1 @PeterObi Can u say amen to what u wish Nigerians?,"Lagos, Nigeria"
7,2022-10-25T22:40:03+00:00,@Taikaz_adufe @valeron31 @Cold_n_dark @SugarCharles1 @MuchTalksBlog1 @PeterObi Amen bro. Saying that with my full chest,"Lagos, Nigeria"
8,2022-10-25T22:31:22+00:00,I just saw video of tinubu threatening to reduce our purchasing power😂😂😂😂,"Lagos, Nigeria"
9,2022-10-20T23:59:34+00:00,IPOB: South East senators beg Buhari to release Kanu - https://t.co/Oehx7JOFmS https://t.co/EJlS2ITvGs,Nigeria


### Dataset Wrangling

+ We use one of our research questions to guide the data wrangling. Consider a simple question: "What is being said about Peter Obi?"

#### 1. Filtering

+ Filtering: Filter for tweets directed at Peter Obi, based on the following rules:
        - Peter Obi's handle appears first in the tweet.
        - Peter Obi's name (not handle) appears anywhere in the tweet.
        - Peter Obi's handle appears in the tweet, but not directly after another handle.
+ These rules help us focus the results on tweets directed to or about Peter Obi, and exclude tweets that are simply replies to other Twitter users under Peter Obi's tweets, or replies to other Twitter users who posted a tweet with Peter Obi's handle in it.
+ These rules were derived from domain knowledge of the platform.

In [44]:
tweets.tweet = tweets.tweet.str.lower()

In [45]:
def filter_tweet(tweet, handle, mentions):
    """
    Return True if the tweet is directed at or about the subject, based on the following rules:
    - The handle appears first in the tweet.
    - The handle appears in the tweet, but not directly after another handle.
    - The subject is mentioned anywhere in the tweet, based on the list of mentions.

    Parameters:
        tweet (string): The tweet
        handle (string): The username of the subject to filter for; should start with '@'
        mentions (list): A list of other ways the subject could be mentioned in the text

    Returns:
        bool: True if any rule matches, False otherwise
    """

    # Split text into tokens
    tokens = tweet.split()

    # Find the positions of tokens that exactly match the handle
    indices = [i for i, token in enumerate(tokens) if token == handle]

    for index in indices:

        # Check if the handle appears first
        if index == 0:
            return True

        # Check that no other handle appears directly before it
        if not tokens[index-1].startswith("@"):
            return True

    # Check if the subject is mentioned anywhere in the tweet
    for mention in mentions:
        if mention in tweet:
            return True

    return False

In [46]:
po_tweets = tweets[tweets.tweet.apply(filter_tweet, handle="@peterobi", mentions=["peter obi", " peterobi", " po "])].copy()

In [47]:
len(po_tweets)

9134
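
As a quick sanity check of the three filtering rules, the sketch below restates them in compact form and runs them on a few hand-written tweets. The example tweets and the `@someone` handle are hypothetical and are not taken from the dataset; the function is a condensed equivalent for illustration, not a replacement for `filter_tweet`.

```python
def filter_tweet(tweet, handle, mentions):
    """Compact restatement of the notebook's three filtering rules."""
    tokens = tweet.split()
    for i, token in enumerate(tokens):
        # Handle appears first, or is not directly preceded by another handle
        if token == handle and (i == 0 or not tokens[i - 1].startswith("@")):
            return True
    # Subject is mentioned by name anywhere in the tweet
    return any(mention in tweet for mention in mentions)

# Hypothetical, hand-written tweets mapped to the result each rule should give
examples = {
    "@peterobi my vote is for you": True,    # handle appears first
    "i trust @peterobi on this": True,       # handle not preceded by another handle
    "@someone @peterobi amen bro": False,    # handle directly follows another handle
    "@someone peter obi was right": True,    # name appears in the text
    "@someone nothing to see here": False,   # neither handle nor name
}

mentions = ["peter obi", " peterobi", " po "]
results = {text: filter_tweet(text, "@peterobi", mentions) for text in examples}

for text, kept in results.items():
    print(f"{str(kept):5} {text}")
```

Because tokens are compared for exact equality, a handle with trailing punctuation, such as `@peterobi,`, is not matched by the handle rules.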

#### 2. Cleaning

+ Cleaning: after the results have been filtered down to the tweets we are confident are about Peter Obi, we clean any element of the dataset that might affect our NLP algorithm.
        - Remove the "@" sign from Peter Obi's handle when it appears first, and from any handle that does not directly follow another handle.
        - Remove all other handles entirely.
        - Remove newlines ("\n"), links, emojis, hashtags and punctuation.
        - Replace "&" (&amp;) with "and".
+ In future versions of this project, we might try to analyse some of these elements, such as the emojis, since they could be essential for our sentiment analysis. For now we keep it simple and focus on the execution.

In [48]:
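# Unicode ranges for emojis and related symbols
emojis = re.compile("["
        "\U0001F600-\U0001F64F"  # emoticons
        "\U0001F300-\U0001F5FF"  # symbols & pictographs
        "\U0001F680-\U0001F6FF"  # transport & map symbols
        "\U0001F1E0-\U0001F1FF"  # flags (regional indicator symbols)
        "\U00002500-\U00002BEF"  # box drawing through miscellaneous symbols and arrows
        "\U00002702-\U000027B0"  # dingbats
        "\U000024C2-\U0001F251"  # broad range; also covers CJK and other non-Latin scripts
        "\U0001f926-\U0001f937"  # supplemental gesture emojis
        "\U00010000-\U0010ffff"  # all supplementary-plane characters
        "\u2640-\u2642"  # gender signs
        "\u2600-\u2B55"  # miscellaneous symbols
        "\u200d"  # zero-width joiner
        "\u23cf"  # eject symbol
        "\u23e9"  # fast-forward symbol
        "\u231a"  # watch
        "\ufe0f"  # variation selector-16
        "\u3030"  # wavy dash
                      "]+", re.UNICODE)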

- Need to tell Michael about other handles that appear in the text without another handle before them.
- What if Peter Obi's handle appears after another handle?
- What if the first person's handle is not Peter Obi's?
- What is the point of removing handles?

In [49]:
def clean_tweet(tweet, handle):
    """
    Function to clean a tweet by:
    - Removing the '@' symbol from the subject's handle if it appears first
    - Removing the '@' symbol from handles that do not appear directly after another handle
    - Removing all other handles completely
    - Changing the '&' sign to 'and'
    - Removing newlines, links, emojis, hashtags and punctuation

    Parameters:
        tweet (string): The tweet
        handle (string): The username of the subject; should start with '@'

    Returns:
        string: The cleaned tweet
    """

    # Split tweet into tokens
    tokens = tweet.split()
    # Find the positions of all handles
    indices = [i for i, token in enumerate(tokens) if token.startswith("@")]

    # Strip the '@' if the token is the subject's handle in first position, or
    # if the handle does not have another handle directly before it
    for index in indices:
        if (index == 0 and tokens[index] == handle) or \
           (index != 0 and not tokens[index-1].startswith("@")):
            tweet = tweet.replace(tokens[index], tokens[index].replace("@", ""))

    tweet = re.sub(r"@\S+", "", tweet) # remove remaining handles
    tweet = re.sub(r"\n", "", tweet) # remove newlines
    tweet = re.sub(r"\r", "", tweet) # remove carriage returns
    tweet = re.sub(r"(?:https?://|www)\S+", "", tweet) # remove links
    tweet = emojis.sub("", tweet) # remove emojis
    tweet = re.sub(r"#(\w+)", "", tweet) # remove hashtags
    tweet = re.sub(r"&amp;|&", "and", tweet) # change '&' (or its HTML entity '&amp;') to 'and'
    tweet = re.sub(r"[^\w\s]", "", tweet) # remove punctuation
    tweet = tweet.strip()

    return tweet

In [50]:
po_tweets["clean_tweets"] = po_tweets.tweet.apply(clean_tweet, handle="@peterobi")

In [52]:
# index = 23987
index = 15871

po_tweets.tweet[index]

"at this rate @peterobi will wear out @atiku and @officialabat. both can't keep up with his forefront moves. such a vanguard. https://t.co/bkkpfdejuu"

In [53]:
po_tweets.clean_tweets[index]

'at this rate peterobi will wear out atiku and officialabat both cant keep up with his forefront moves such a vanguard'

In [54]:
index = 31452
po_tweets.tweet[index]

'@renoomokri @peterobi @mobilepunch @pmnewsnigeria @nairaland @vanguardngrnews @ap @afp @daily_trust @channelstv @arisetv it is evidence that you are jobless, campaigning for atiku by talking down peter obi will give you a job.'

In [55]:
po_tweets.clean_tweets[index]

'it is evidence that you are jobless campaigning for atiku by talking down peter obi will give you a job'
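
One limitation of `str.replace` in the cleaning step is that it acts on every occurrence of a handle's text, including inside longer handles (for example `@ap` inside `@apc_lagos`). The sketch below is a possible token-based variant that decides per token instead. It is an illustrative alternative under stated assumptions, not the notebook's method: the name `clean_tweet_tokens` is hypothetical, it joins tokens with single spaces rather than deleting line breaks, and it relies on the punctuation rule (rather than the `emojis` pattern) to drop non-word symbols.

```python
import re

def clean_tweet_tokens(tweet, handle):
    """Token-based variant of the cleaning rules (hypothetical helper)."""
    kept = []
    previous_was_handle = False
    for position, token in enumerate(tweet.split()):
        is_handle = token.startswith("@")
        if is_handle:
            if position == 0:
                keep = token == handle            # leading handle: keep only the subject's
            else:
                keep = not previous_was_handle    # keep only if no handle directly precedes it
            if keep:
                kept.append(token[1:])            # drop the leading '@'
        else:
            kept.append(token)
        previous_was_handle = is_handle

    text = " ".join(kept)
    text = re.sub(r"(?:https?://|www)\S+", "", text)   # remove links
    text = re.sub(r"#\w+", "", text)                    # remove hashtags
    text = re.sub(r"&amp;|&", "and", text)              # '&' or '&amp;' becomes 'and'
    text = re.sub(r"[^\w\s]", "", text)                 # remove punctuation and other non-word symbols
    return re.sub(r"\s+", " ", text).strip()            # collapse repeated whitespace

# The first tweet is the one displayed above; the second is a hypothetical edge case
shown = ("at this rate @peterobi will wear out @atiku and @officialabat. "
         "both can't keep up with his forefront moves. such a vanguard. "
         "https://t.co/bkkpfdejuu")
edge_case = "hello @ap @apc_lagos"

print(clean_tweet_tokens(shown, "@peterobi"))
print(clean_tweet_tokens(edge_case, "@peterobi"))
```

In the edge case, `@apc_lagos` directly follows another handle and is dropped as a whole, whereas a substring replacement of `@ap` would leave `apc_lagos` behind in the cleaned text.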

In [58]:
tweets.tweet[17968]

'@peterobi 🌹🌹🌹🌹'

In [56]:
po_tweets[["clean_tweets"]].sample(20)

Unnamed: 0,clean_tweets
17968,peterobi
4206,peterobi ogaa abeg dey use life jacket ooo
36721,yes o since peterobi will block every avenue let them loot as much as they can by whatever means they can unless they are doing so because we already have other place where our naira is printed apart from cbn or they want to force those who stole and hide money to used it
17973,peterobi with you peter we shall climb the tallest mountain nigeria is gonna be great again
12421,expensive advise 4 northenerspeterobi is a tribal n religious bigot his kismen subjected se to forceful bondage n sit at home ipob attacked hausa fulani businesses in se killing a lot obi presidency will embolden terrorists secessionists to carry out biafran agenda
3767,dont digress your candidate went to visit flood victims with a pdp cap ondid u say anything about iti thought peter obi was wrong to have made the appealyou should hide your face in shame
25196,peterobi we will never forget
17842,peterobi theres no stopping us now we on a cruise
25704,peter obi has cleared his name so many times concerning that event what did tinubu said about end sars your father there
29768,i have been labelled more than that maam its crystal clear that nigeria need a man of the people not for some elites peter obi stands tall to it


#### 3. Translate

+ Convert all tweets to lower case.
+ Translate: here we convert all non-English tweets to English for a smooth and uniform analysis.
        - If the text is not in English, convert it to English (using Google Translate or any other suitable library or API).
+ This saves a lot of time, because systems for such translation have already been developed. Instead of developing our own NLP kit or algorithm for each language used in the Nigerian Twitter space, we can simply translate with the existing systems and then analyse with already-trained systems.
+ In future versions of this project, we could look into developing our own custom NLP algorithm and kit tailored to our native languages.
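
The translation step could be structured roughly as follows. This is only a sketch: no translation library has been chosen yet, so the language detector and translator are passed in as plain functions. `translate_non_english`, `fake_detect` and `fake_translate` are hypothetical names, and the two stand-ins exist purely so the example runs without any external service.

```python
def translate_non_english(texts, detect_language, translate):
    """Return the texts with every non-English entry passed through `translate`.

    detect_language: callable returning a language code such as "en"
    translate: callable returning the English version of a text
    """
    translated = []
    for text in texts:
        if detect_language(text) == "en":
            translated.append(text)             # already English: keep as is
        else:
            translated.append(translate(text))  # anything else: translate
    return translated

# Stand-ins for illustration only; a real run would wrap a language-detection
# and translation library or API behind these two functions.
def fake_detect(text):
    return "yo" if "bawo" in text else "en"

def fake_translate(text):
    return f"[translated] {text}"

sample = ["peter obi has my vote", "bawo ni"]
print(translate_non_english(sample, fake_detect, fake_translate))
```

Keeping detection and translation behind two small functions means the service can be swapped later without touching the rest of the pipeline.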

### General Trends 

- This section covers the general trends in citizens' discussion groups.
        - What is most talked about (regardless of area or topic)?
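
One simple way to approach "what is most talked about" is to count the most frequent terms in the cleaned tweets. The sketch below assumes a list of cleaned tweet strings and a small stop-word set; both are hypothetical examples rather than project data, and `top_terms` is an illustrative helper name.

```python
from collections import Counter

def top_terms(texts, stop_words, n=3):
    """Return the n most frequent words across texts, ignoring stop words."""
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if word not in stop_words
    )
    return counts.most_common(n)

# Hypothetical cleaned tweets and stop words, for illustration only
cleaned = [
    "fuel price is too high",
    "fuel scarcity again",
    "election is near",
    "fuel price and election",
]
stop_words = {"is", "too", "and", "again"}

print(top_terms(cleaned, stop_words))
```

On the real data, `cleaned` could be taken from the `clean_tweets` column, and a fuller stop-word list would be needed for the counts to be meaningful.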

### Citizens' Sentiment

- This section covers citizens' reactions and general sentiment towards certain topics (e.g. areas of development, policies, politically significant events, public office holders' performance and so on).
        - What is the general sentiment of the citizens?
        - What is most discussed (election and governance related)?
        - What is the sentiment towards what is being discussed?

### Complaint Areas

- This section covers the extraction of various areas of complaint and dissatisfaction amongst citizens (in different aspects of government).
        - What are the various areas of complaint regarding governance?
        - What are the levels of sentiment towards the various areas of complaint?

### Politician's Reputation

- This section covers what citizens think about certain public office holders, their sentiment towards these individuals, and their general popularity or notoriety.
        - Who is most talked about?
        - How popular or notorious are the most talked about?
        - Who are the most popular and the most notorious candidates/politicians?
        - How much is a certain candidate being talked about?
        - What is being said about each candidate?
        - What is the general sentiment of what is being said?
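
For "how much is a certain candidate being talked about", a first pass could count the tweets that mention each candidate at least once. In the sketch below, the alias lists (which handles and names map to which candidate) and the sample tweets are assumptions made for illustration and would need to be confirmed against the data; `mention_counts` is a hypothetical helper.

```python
def mention_counts(texts, aliases):
    """Count how many texts mention each candidate at least once."""
    counts = {name: 0 for name in aliases}
    for text in texts:
        lowered = text.lower()
        for name, patterns in aliases.items():
            # Simple substring matching on handles and names
            if any(pattern in lowered for pattern in patterns):
                counts[name] += 1
    return counts

# Assumed alias lists; extend or correct these before using them on real data
aliases = {
    "Peter Obi": ["@peterobi", "peter obi"],
    "Atiku Abubakar": ["@atiku", "atiku"],
    "Bola Tinubu": ["@officialabat", "tinubu"],
}

# Hypothetical tweets, for illustration only
sample = [
    "@peterobi my vote is for you",
    "atiku and tinubu should debate peter obi",
    "i just saw a video of tinubu",
]

print(mention_counts(sample, aliases))
```

Substring matching is deliberately crude: short aliases can match inside unrelated words, so aliases should be chosen with care.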