# Individual Exam CtH | Analysing "Brand Twitter"

### Author: *Name* Leonards Leimanis

Be sure to check the assignment description on Canvas once more.

Handing in the notebook should be done in a `zip` file, together with any files you create. We request you to name the exam submission as "exam_name_studentname.zip" with your corresponding name. The data is presented in the `data` folder. Also, include the files containing the data when handing in your exam, as this helps us to check your code when re-running it. To check if we can run your notebook from top to bottom without receiving errors, try to clear all the output of the cells and rerun everything. This can be done automatically by clicking on `Runtime --> Restart and run all`.

Good luck!

# Introduction

For this assignment we will investigate the phenomenon of "*Brand Twitter*". While corporate communication is traditionally rather bland, in recent years a trend has emerged of brands communicating on social media in a far less formal way. Social media teams of certain brands have started to use memes, edgy humour and an informal, personal style of writing, sometimes also engaging with other brand accounts in a way that is similar to how many Twitter users tweet and interact with each other. This phenomenon of corporate personhood has been called *Brand Twitter*.

In 2019, Vulture published a history of the phenomenon, just in case you find it interesting: https://www.vulture.com/2019/06/brand-twitter-jokes-history.html

For this assignment we will consider a group of brand accounts that are often considered part of *Brand Twitter*:

    * @Wendys - Wendy's
    * @PrimeVideo - Amazon Prime Video
    * @MerriamWebster - Merriam Webster
    * @BurgerKing - Burger King
    * @Netflix - Netflix US
    * @McDonalds - McDonald's
    * @DennysDiner - Denny's Diner

We thought it would be interesting to analyse the tweets of these brand accounts to learn more about this novel style of corporate communication and the ways corporations might be perceived as relatable people on social media.

# Data

The data can be found in the file `brand_tweets.csv`. This file contains Twitter data: every row is one tweet by one of these brands.

FYI: This data was acquired from Twitter using their API if you're interested, following the method from the optional Notebook 7.

*Please note: this dataset might contain content which could be considered as offensive. It is real unfiltered data directly from Twitter.*

# Tasks

We would like to look through some recent tweets of *Brand Twitter*, and be able to understand certain characteristics of their tweets. As these Twitter accounts represent major brands, one particularly interesting aspect of this dataset is the difference between regular tweets and tweets that are replies to other tweets, which could be replies to customers or other brands.

Make your code and results as pretty as possible, and feel free to use tabs and enumeration when printing text and formatting for the visualisations. 

This assignment is about getting familiar with Pandas' methods, so we suggest going through the lecture and seminar notebooks on Pandas again. You can of course use any course material in this assignment!

You are not limited to the structure of the cells below with ` # Your code here` only. Organise your code the way you think is most readable and appropriate.

If you do not fully manage to solve a question in the requested way, feel free to solve it in a different way to be able to proceed with later questions - you'll probably still get some points.

### Question 1: Pre-processing
* Add a column with a normalized version of the 'text' column. Use an appropriate tokenizer for this type of data in your normalization function. As the 'text' column contains strings, things will be easier if your normalized text column also contains strings.

Continue to work with this normalized column in the next tasks. You're of course free to add more columns if you think you need them. 

In [37]:
import pandas as pd
from nltk.tokenize import TweetTokenizer

csv_file = "C:\\Users\\leona\\Documents\\Coding the humanities\\notebooks\\CtH-final\\data\\brand_tweets.csv"

df_tweets = pd.read_csv(csv_file, encoding="utf-8")

# TweetTokenizer keeps hashtags, @handles and emoticons together as single tokens
tokenizer = TweetTokenizer(preserve_case=True, reduce_len=False, strip_handles=False)

def tweet_normalizer(text_to_normalize):
    # Tokenize one tweet and join the tokens back into a single string
    tokens = tokenizer.tokenize(str(text_to_normalize))
    return " ".join(tokens)

# Apply the normalizer to every tweet and store the result in a new column
df_tweets["normalized_tweets"] = df_tweets["text"].apply(tweet_normalizer)

print(df_tweets)

### Question 2: Description / statistics

Provide information on: 

1. Number of tweets (per brand and in total)
2. How many of those tweets are replies (per brand and in total)
3. Most liked tweet (per brand and in total)
4. Most frequent hashtags (per brand and in total)

You can present the answers in this notebook. If you prefer, you may also write your results to a separate text file (optional).
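
As a starting point, the four statistics can be sketched on a small toy DataFrame. This is only an illustration: the column names `brand`, `is_reply` and `likes` are assumptions and may not match the columns in `brand_tweets.csv`, and the toy rows are invented.

```python
import pandas as pd

# Toy data; "brand", "is_reply" and "likes" are assumed column names,
# not necessarily the real columns of brand_tweets.csv.
toy = pd.DataFrame({
    "brand": ["Wendys", "Wendys", "Netflix", "Netflix", "Netflix"],
    "text": [
        "fresh beef #NationalRoastDay",
        "@user thanks!",
        "new show #Trailer",
        "@user soon #Trailer",
        "watch this",
    ],
    "is_reply": [False, True, False, True, False],
    "likes": [120, 3, 50, 7, 200],
})

# 1. Number of tweets per brand and in total
tweets_per_brand = toy["brand"].value_counts()
total_tweets = len(toy)

# 2. Number of replies per brand and in total (True counts as 1)
replies_per_brand = toy.groupby("brand")["is_reply"].sum()
total_replies = int(toy["is_reply"].sum())

# 3. Most liked tweet per brand and overall
most_liked_per_brand = toy.loc[
    toy.groupby("brand")["likes"].idxmax(), ["brand", "text", "likes"]
]
most_liked_overall = toy.loc[toy["likes"].idxmax()]

# 4. Hashtag frequencies: one row per (tweet, hashtag) pair, lowercased
tags = (
    toy.assign(hashtag=toy["text"].str.lower().str.findall(r"#\w+"))
    .explode("hashtag")
    .dropna(subset=["hashtag"])
)
top_hashtags = tags["hashtag"].value_counts()
top_hashtags_per_brand = tags.groupby("brand")["hashtag"].value_counts()
```

Lowercasing before counting treats `#Trailer` and `#trailer` as the same hashtag; whether that is desirable is a choice you should justify.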

In [None]:
#Your code here

### Question 3: Analysis - Corporate personhood

To observe to what extent the brands encourage corporate personhood, it would be interesting to see what pronouns the brands use to refer to themselves: "we" or "I". Let us define a "First Person Pronoun Ratio" - the total number of times that the word "I" is used by a brand / the total number of times the words "I" or "we" are used by a brand. This should give us a value between 0 and 1, and a higher value indicates that the brand used the word "I" relatively more often compared to the word "we". If we multiply this value by 100, it becomes a percentage.
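
To make the definition concrete, here is a minimal sketch of the ratio on invented toy data. The helper `first_person_pronoun_ratio` and the column name `brand` are hypothetical, and the sketch assumes the normalized column holds tokens separated by spaces. It counts only the exact tokens "i" and "we" (case-insensitive), so forms such as "I'm" are not counted.

```python
import pandas as pd

def first_person_pronoun_ratio(texts):
    """Return count("i") / (count("i") + count("we")) over an iterable of strings."""
    i_count = 0
    we_count = 0
    for text in texts:
        tokens = str(text).lower().split()
        i_count += tokens.count("i")
        we_count += tokens.count("we")
    total = i_count + we_count
    # Without any "i" or "we" the ratio is undefined, so return NaN
    return i_count / total if total else float("nan")

# Toy data; "brand" is an assumed column name
toy = pd.DataFrame({
    "brand": ["A", "A", "B", "B"],
    "normalized_tweets": ["i love fries", "we are open", "we did it", "we agree , we do"],
})

# One ratio per brand; multiply by 100 for a percentage
ratios = toy.groupby("brand")["normalized_tweets"].apply(first_person_pronoun_ratio)
```

In the toy data brand A uses "i" once and "we" once, giving a ratio of 0.5, while brand B only uses "we", giving 0.0.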

1. For all of the 7 brands, compute their First Person Pronoun Ratio (or percentage) as defined here.
2. Choose the brand with the highest First Person Pronoun Ratio. For this brand, compute the First Person Pronoun Ratio separately for tweets that are replies and tweets that are not replies.

Briefly interpret the result (as a text block).

In [None]:
#Your code here

### Question 4: Analysis - Brand interaction

Among the tweets that are replies, do the *Brand Twitter* brands reply to each other? (in a tweet this is done by writing @username at the beginning of the tweet)

1. Print three tweets from the dataframe in which a brand mentions one of the other brands.
2. For all of the 7 brands, find out how often they mention each of the other brands in replies. Present the result as a DataFrame.
 
Briefly interpret the result (as a text block).
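
One possible way to build such a table is sketched below on invented toy data. The column names `brand` and `is_reply` are assumptions, and the sketch counts an @mention anywhere in a reply, not only at the beginning of the tweet.

```python
import pandas as pd

# Lowercased handles of the seven brands listed in the introduction
brand_handles = ["wendys", "primevideo", "merriamwebster", "burgerking",
                 "netflix", "mcdonalds", "dennysdiner"]

# Toy data; "brand" and "is_reply" are assumed column names
toy = pd.DataFrame({
    "brand": ["wendys", "wendys", "burgerking", "netflix"],
    "text": [
        "@BurgerKing nice try",
        "@McDonalds cold fries again?",
        "@Wendys says who",
        "@some_user enjoy the show",
    ],
    "is_reply": [True, True, True, True],
})

replies = toy[toy["is_reply"]]

# All @handles in each reply, lowercased and without the "@"
mentions = replies["text"].str.lower().str.findall(r"@(\w+)")

# One row per (reply, mentioned handle) pair
rows = replies.assign(mentioned=mentions).explode("mentioned")

# Keep only mentions of the seven brands and drop self-mentions
rows = rows[rows["mentioned"].isin(brand_handles) & (rows["mentioned"] != rows["brand"])]

# Rows: replying brand, columns: mentioned brand, cells: number of mentions
mention_table = pd.crosstab(rows["brand"], rows["mentioned"]).reindex(
    index=brand_handles, columns=brand_handles, fill_value=0
)
```

Reindexing with `fill_value=0` keeps all seven brands in the table, including brands that never mention or are never mentioned by another brand.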

In [None]:
#Your code here

### Question 5: Visualization

1. Plot the number of tweets in the whole dataset per week. 
    * Interpret the graph. Can you explain the overall pattern and/or some of the fluctuations that are visible? Feel free to also make reference to the numbers you computed for Question 2 in your explanation.
    * (If needed, restrict the dataframe to an active twitter timeframe)

2. Choose one of the seven brands and plot its (Twitter) popularity over time (choose the time unit and range of your choice) by:
    * Number of retweets
    * Number of likes
  
  You can either try to plot these two metrics (retweets/likes) in the same figure, or create multiple figures.
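
The aggregation behind both plots could look roughly like the sketch below, shown on invented toy data. The column names `created_at`, `brand`, `likes` and `retweets` are assumptions and may differ from the real dataset.

```python
import pandas as pd

# Toy data; all column names here are assumptions
toy = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2021-03-01", "2021-03-02", "2021-03-09", "2021-03-10", "2021-03-11",
    ]),
    "brand": ["wendys", "netflix", "wendys", "wendys", "netflix"],
    "likes": [10, 20, 5, 5, 30],
    "retweets": [1, 2, 0, 1, 4],
})

# Resampling needs a sorted datetime index
by_time = toy.set_index("created_at").sort_index()

# 1. Number of tweets per week (with "W", each week ends on a Sunday)
tweets_per_week = by_time.resample("W").size()

# 2. Weekly totals of likes and retweets for one brand
one_brand = by_time[by_time["brand"] == "wendys"]
brand_weekly = one_brand[["likes", "retweets"]].resample("W").sum()
```

With matplotlib installed, `tweets_per_week.plot()` and `brand_weekly.plot()` draw line charts of these results; the second call puts likes and retweets in the same figure.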
  


In [None]:
#Your code here