# ANIKULAPO - An Analysis of Twitter's Perception of the Movie Anikulapo



### 1. Introduction
For this analysis project, I scraped over 34,000 tweets from Twitter using snscrape, a Python library for scraping social networks. I then ran a sentiment analysis on the tweets using the TextBlob library in Python.

### Contents
1. Introduction
2. Data Gathering
3. Data Assessment and Wrangling
4. Data Preprocessing
5. Sentiment Analysis
6. Data Visualization
7. Conclusion

### 2. Data Gathering

In [None]:
#import libraries
import pandas as pd
import numpy as np
import snscrape.modules.twitter as sntwitter
import matplotlib.pyplot as plt
import seaborn as sns

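import nltk
nltk.download('stopwords')
nltk.download('punkt')    # tokenizer models used by word_tokenize
nltk.download('wordnet')  # corpus used by WordNetLemmatizer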
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.stem.porter import PorterStemmer

import string
import re

import wordcloud
from wordcloud import WordCloud

import textblob
from textblob import TextBlob

from emot.emo_unicode import UNICODE_EMOJI

Lemmatizer = WordNetLemmatizer() 

import warnings
%matplotlib inline

In [None]:
#scraping the tweet
query = '(anikulapo OR anikulapothemovie OR #anikulapo OR #anikulapothemovie) until:2022-10-30 since:2022-09-30'
tweets = []

for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i > 35000:
        break
    else:
        tweets.append([tweet.date, tweet.user.username, tweet.sourceLabel, tweet.content, tweet.user.location, tweet.likeCount, tweet.retweetCount])
df = pd.DataFrame(tweets, columns = ['Date', 'User', 'Source', 'Tweet', 'Location', 'Like_Count', 'Retweet_Count'])
df.to_csv('project_anikulapo.csv')

In [None]:
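#importing the scraped dataset into a dataframe (the first CSV column is the saved index)
df = pd.read_csv('project_anikulapo.csv', encoding = 'unicode_escape', index_col = 0)
#the cells below refer to this same dataframe as both anikulapo and df
anikulapo = df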

### 3. Data Assessment and Wrangling

In [None]:
#check first five rows
anikulapo.head()

In [None]:
#check the shape 
anikulapo.shape

In [None]:
#checking for null values
anikulapo.isna()

In [None]:
#checking the number of missing value in the whole dataset
anikulapo.isna().sum()

In [None]:
#only the Location column had missing values, so I replaced its null values with 'Unknown'
anikulapo['Location'] = anikulapo['Location'].fillna('Unknown')

### 4. Data Preprocessing

In [None]:
#defining a function to extract hashtags and creating a new column for those hashtags
def hashtag(Tweet):
    tweet = re.findall(r'#\w+', Tweet)
    return ' '.join(tweet)
anikulapo['hashtags'] = anikulapo['Tweet'].apply(hashtag)
anikulapo.head(25)

In [None]:
from collections import Counter

#listing all hashtags
hashtags_list = anikulapo['hashtags'].tolist()

# Iterate over all hashtags and split where there is more than one hashtag
hashtags = []
for item in hashtags_list:
    item = item.split()
    for i in item:
        hashtags.append(i)
        
# Determine Unique count of all hashtags used
counts = Counter(hashtags)
hashtags_anikulapo = pd.DataFrame.from_dict(counts, orient='index').reset_index()
hashtags_anikulapo.columns = ['Hashtags', 'Count']
hashtags_anikulapo.sort_values(by='Count', ascending=False, inplace=True)

In [None]:
#check the top 5 most used hashtags
hashtags_anikulapo.head()
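
The counting step above can also be tried in isolation, without the scraped dataset. The following is a minimal sketch using only the standard library; the sample tweets and the helper name `count_hashtags` are made up for illustration and are not part of the project data. Note that the count is case-sensitive, just like the regex used in the notebook.

```python
import re
from collections import Counter

# Hypothetical sample tweets standing in for the 'Tweet' column
sample_tweets = [
    "Just watched #Anikulapo tonight #AnikulapoTheMovie",
    "The soundtrack of #Anikulapo is lovely",
    "No hashtags in this one",
]

def count_hashtags(tweets):
    """Count every hashtag across an iterable of tweet strings."""
    counts = Counter()
    for text in tweets:
        # same pattern as the hashtag() function above
        counts.update(re.findall(r'#\w+', text))
    return counts

hashtag_counts = count_hashtags(sample_tweets)
print(hashtag_counts.most_common(2))
```

Passing the counter to `most_common(n)` gives the top `n` hashtags directly, which plays the same role as sorting the `Count` column in descending order.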

In [None]:
# Defining a list containing all stopwords in English, plus a user-defined list
stop_words_eng = list(stopwords.words('english'))
user_stop_words =["i", "me", "my", "myself", "we", "our", "ours", "ourselves", "you", "your", "yours", "yourself", 
                   "yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it", "its", "itself",
                   "they", "them", "their", "theirs", "themselves", "what", "which", "who", "whom", "this", "that", 
                   "these", "those", "am", "is", "are", "was", "were", "be", "been", "being", "have", "has", "had", 
                   "having", "do", "does", "did", "doing", "a", "an", "the", "and", "but", "if", "or", "because", "as", 
                   "until", "while", "of", "at", "by", "for", "with", "about", "against", "between", "into", "through", 
                   "during", "before", "after", "above", "below", "to", "from", "up", "down", "in", "out", "on", "off", 
                   "over", "under", "again", "further", "then", "once", "here", "there", "when", "where", "why", "how",
                   "all", "any", "both", "each", "few", "more", "most", "other", "some", "such", "no", "nor", "not",
                   "only", "own", "same", "so", "than", "too", "very", "s", "t", "can", "will", "just", "don", "should", 
                   "now","anyone","today","yesterday","day", "already"]
# combine both lists into a set to drop duplicates and speed up lookups
stop_words = set(stop_words_eng + user_stop_words)

In [None]:
emoji = list(UNICODE_EMOJI.keys())

In [None]:
#preprocessing tweet for sentiment analysis
def preprocessedTweets(Tweet):
    # converting tweets to lowercase characters
    tweet = Tweet.lower()
    # removing mentions, links/URLs and any symbol followed by a space, then collapsing whitespace
    tweet = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t]) |(\w+:\/\/\S+)", " ", tweet).split())
    # removing any remaining mentions, hashtags and digits
    tweet = re.sub(r'\@\w+|\#\w+|\d+', '', tweet)
    # removing non-ASCII characters (e.g. emojis)
    tweet = re.sub(r'[^\x00-\x7F]+', '', tweet)
    # removing punctuation and numbers
    punct = str.maketrans('', '', string.punctuation+string.digits)
    tweet = tweet.translate(punct)
    # tokenizing, then removing stopwords and emojis
    tokens = word_tokenize(tweet)
    filtered_words = [w for w in tokens if w not in stop_words]
    filtered_words = [w for w in filtered_words if w not in emoji]
    # lemmatizing words
    lemmatizer = WordNetLemmatizer()
    lemma_words = [lemmatizer.lemmatize(w) for w in filtered_words]
    tweet = ' '.join(lemma_words)
    return tweet
df['Processed_Tweets'] = df['Tweet'].apply(preprocessedTweets)
df
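
To see what the regex-based cleaning does to a single tweet, here is a simplified sketch that uses only the standard library. It is not the exact function above: the hypothetical helper `clean_text` merges the URL, mention, hashtag and digit removal into one pattern and skips tokenizing, stopword removal and lemmatizing. The sample tweet and username are invented.

```python
import re
import string

def clean_text(text):
    """Regex-only subset of the cleaning steps (no tokenizing or lemmatizing)."""
    text = text.lower()
    # drop URLs, mentions, hashtags and digits
    text = re.sub(r'\w+://\S+|@\w+|#\w+|\d+', ' ', text)
    # drop non-ASCII characters such as emojis
    text = re.sub(r'[^\x00-\x7F]+', '', text)
    # drop punctuation
    text = text.translate(str.maketrans('', '', string.punctuation))
    # collapse repeated whitespace
    return ' '.join(text.split())

# Hypothetical tweet, not taken from the scraped data
sample = "@kunle Loved the storyline of #Anikulapo!!! 10/10 https://t.co/abc123 🔥"
print(clean_text(sample))  # -> loved the storyline of
```

Only plain lowercase words survive, which is the form the sentiment step expects as input.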

### 5. Sentiment Analysis

In [None]:
#define function to get the polarity score of a tweet
def polarity(tweet):
    return TextBlob(tweet).sentiment.polarity

#define function to label the sentiment from the polarity score
def sentimenttextblob(polarity):
    if polarity < 0:
        return "Negative"
    elif polarity == 0:
        return "Neutral"
    else:
        return "Positive"

In [None]:
df['Polarity'] = df['Processed_Tweets'].apply(polarity)
df['Sentiment'] = df['Polarity'].apply(sentimenttextblob)
df['Sentiment'].value_counts()
df.tail(10)
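
The labeling rule is easy to check by hand: scores below zero are negative, exactly zero is neutral, and anything above zero is positive. The sketch below applies the same thresholds to a few made-up polarity scores and computes the share of each label. The scores and the helper name `label_polarity` are assumptions for illustration, not results from the dataset.

```python
from collections import Counter

def label_polarity(polarity):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity < 0:
        return "Negative"
    elif polarity == 0:
        return "Neutral"
    return "Positive"

# Hypothetical polarity scores, not taken from the real tweets
sample_polarities = [0.5, 0.0, -0.25, 0.8, 0.0]

labels = [label_polarity(p) for p in sample_polarities]
# share of each sentiment label among the sample scores
shares = {label: count / len(labels) for label, count in Counter(labels).items()}
print(shares)
```

On the real dataframe, `df['Sentiment'].value_counts(normalize=True)` should give the same kind of breakdown.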

### 6. Data Visualization

In [None]:
tweets_long_string = df['Processed_Tweets'].tolist()
tweets_long_string = " ".join(tweets_long_string)

tweet_wc = WordCloud(collocations = False, max_words = 200, background_color = 'White').generate(tweets_long_string)

plt.imshow(tweet_wc, interpolation = 'bilinear')
plt.axis("off")
plt.show()

In [None]:
#save final file
df.to_csv("anikulapo_final_file.csv", index = False)

In [None]:
tweet_wc.to_file("wordcloud.png")

### 7. Conclusion
I exported the final file to Power BI to create better visualizations for my analysis.