# Creating features to measure a tweet's influence

References:
- https://medium.com/@tijesunimiolashore/mining-data-on-twitter-3c7969207e75
- https://medium.com/@brianodhiambo530/twitter-mining-and-analysis-on-influential-twitter-users-and-africa-government-officials-a61fced65e38
- https://towardsdatascience.com/twitter-data-mining-measuring-users-influence-ef76c9badfc0

Results:
- Popularity score = Retweets + Likes
- Reach score = Followers - Following (N/A: the dataset has no following count)
- Relevance score = Comments + Mentions (N/A: the dataset has no comment or mention counts)
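
The three formulas above can be sketched as one helper that computes whichever scores the available columns allow. This is a minimal sketch and not part of the notebook's pipeline: `friends_count`, `reply_count`, and `mention_count` are assumed column names that the dataset used here does not contain, and the toy rows are made-up numbers used only to illustrate the arithmetic.

```python
import pandas as pd


def add_influence_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every score its columns allow."""
    out = df.copy()

    # Popularity: both inputs are present in the dataset used here.
    out["popularity"] = out["retweet_count"] + out["favorite_count"]

    # Reach: needs a following count; "friends_count" is an assumed name.
    if {"followers_count", "friends_count"} <= set(out.columns):
        out["reach"] = out["followers_count"] - out["friends_count"]

    # Relevance: "reply_count" and "mention_count" are assumed names.
    if {"reply_count", "mention_count"} <= set(out.columns):
        out["relevance"] = out["reply_count"] + out["mention_count"]

    return out


# Toy rows with made-up numbers (not taken from the real dataset).
toy = pd.DataFrame(
    {
        "retweet_count": [10, 3],
        "favorite_count": [25, 7],
        "followers_count": [1000, 50],
        "friends_count": [200, 80],
    }
)
scored = add_influence_scores(toy)
```

With these toy rows, `scored` gains `popularity` and `reach` columns but no `relevance` column, because the comment and mention counts are missing. Skipping a score instead of raising keeps the helper usable on data like this notebook's, where only popularity can be computed.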

In [1]:
from os.path import dirname, join, realpath

import pandas as pd

In [2]:
def is_interactive():
    """Check if the script is being run interactively."""
    import __main__ as main

    return not hasattr(main, "__file__")


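if is_interactive():
    # No __file__ in a notebook: the literal "__file__" resolves against the
    # current working directory, so SCRIPT_DIR becomes that directory.
    SCRIPT_DIR = dirname(realpath("__file__"))
else:
    SCRIPT_DIR = dirname(realpath(__file__))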

DATA_DIR = join(dirname(SCRIPT_DIR), "data", "processed", "twitter")

In [3]:
tweet_data = pd.read_csv(join(DATA_DIR, "tweets_2022_03_05-2022_03_11.csv"))

# Drop the two unnamed index columns left over in the CSV.
tweet_data.drop(columns=["Unnamed: 0", "Unnamed: 0.1"], inplace=True)
tweet_data.head()

Unnamed: 0,text,retweet_count,favorite_count,followers_count,verified,listed_count,created_at,hashtags,name
0,"""It is an open ledger, trying to sneak lots of...",151,520,2437101,True,10788,2022-03-05 09:33:07+00:00,"[{'text': 'crypto', 'indices': [61, 68]}]",Bitcoin News
1,“The #crypto market today has a market capital...,87,245,2437101,True,10788,2022-03-05 12:03:14+00:00,"[{'text': 'crypto', 'indices': [5, 12]}]",Bitcoin News
2,G7 countries and the EU are looking at ways to...,95,245,2437101,True,10788,2022-03-05 13:33:29+00:00,"[{'text': 'cryptocurrency', 'indices': [88, 10...",Bitcoin News
3,JUST IN: 🇸🇬 Singapore has introduced sanctions...,149,1043,708057,True,1449,2022-03-05 11:01:13+00:00,"[{'text': 'cryptocurrency', 'indices': [77, 92]}]",Watcher.Guru
4,Make sure you check in on your bros in the str...,45,149,7822,True,80,2022-03-05 14:45:25+00:00,[],Ian Heinisch


In [4]:
tweet_data["popularity"] = (
    tweet_data["retweet_count"] + tweet_data["favorite_count"]
)

# Drop the raw counts now that they are combined into popularity.
tweet_data = tweet_data.drop(["retweet_count", "favorite_count"], axis=1)
tweet_data.to_csv(
    join(DATA_DIR, "tweets_2022_03_05-2022_03_11_popularity.csv"), index=False
)
tweet_data.head()

Unnamed: 0,text,followers_count,verified,listed_count,created_at,hashtags,name,popularity
0,"""It is an open ledger, trying to sneak lots of...",2437101,True,10788,2022-03-05 09:33:07+00:00,"[{'text': 'crypto', 'indices': [61, 68]}]",Bitcoin News,671
1,“The #crypto market today has a market capital...,2437101,True,10788,2022-03-05 12:03:14+00:00,"[{'text': 'crypto', 'indices': [5, 12]}]",Bitcoin News,332
2,G7 countries and the EU are looking at ways to...,2437101,True,10788,2022-03-05 13:33:29+00:00,"[{'text': 'cryptocurrency', 'indices': [88, 10...",Bitcoin News,340
3,JUST IN: 🇸🇬 Singapore has introduced sanctions...,708057,True,1449,2022-03-05 11:01:13+00:00,"[{'text': 'cryptocurrency', 'indices': [77, 92]}]",Watcher.Guru,1192
4,Make sure you check in on your bros in the str...,7822,True,80,2022-03-05 14:45:25+00:00,[],Ian Heinisch,194
