# Queen Charlotte: A Bridgerton Story

## Opinion Mining

Want to know how audiences across continents view the Queen Charlotte series? If so, this project is for you.

------------------------------

I extracted the tweets from Twitter using the Tweepy library. Now we need to clean and preprocess them so the data is fit for analysis.

In [1]:
import pandas as pd
import re
from textblob import TextBlob
from wordcloud import WordCloud

In [2]:
# loading the data into a dataframe
bridgerton = pd.read_csv("queen_charlotte.csv")

In [3]:
bridgerton.head()

|   | Text | User | Description | Timestamp | Location | Likes | Hashtags |
|---|------|------|-------------|-----------|----------|-------|----------|
| 0 | I feel like i am like George, Kate and Penelop... | clarissejh | 🌍🌏 | 2023-05-12 11:25:51+00:00 |  | 0 | QueenCharlotte |
| 1 | Ah!!!’ Thank you mama!!! Love na love na love ... | itsrayvriel | $100 for profile inspection | 2023-05-12 11:25:39+00:00 |  | 0 | QueenCharlotte |
| 2 | Bets acting/chemistry pair on the show so far.... | guiss89 | She/Her (fan account) | 2023-05-12 11:24:48+00:00 |  | 0 | QueenCharlotte |
| 3 | 🎙🎙🎙Join me tonight on my LIVE at #instagram -... | duewafrazier1 | Writer, professor, @TEDx speaker & digital cre... | 2023-05-12 11:24:37+00:00 | United States | 0 | instagram, Tupac, docuseries, podcasts, DuEwa,... |
| 4 | don't know if today I should binge #QueenCharl... | mary__lou_ | i get addicted easily\n•\n\n\n\n\n\n\n\n\n\n\n... | 2023-05-12 11:23:47+00:00 |  | 0 | QueenCharlotte, TheGreat |


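The tweet texts still contain URLs, @mentions, hashtag symbols, punctuation and emoji, which add noise to the analysis, so we strip them out. Only the `#` symbol is removed; the hashtag word itself (for example `QueenCharlotte`) stays in the text.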

In [4]:
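# Remove links starting with http or www., @mentions, and every character that is
# not an ASCII letter, digit or whitespace ('#', punctuation, emoji)
bridgerton['Text'] = bridgerton['Text'].str.replace(r'http\S+|www\.\S+|@\S+|[^a-zA-Z0-9\s]', '', regex=True)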
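To see what this pattern does to a single tweet, here is a small sketch that applies the same regular expression with Python's `re.sub` to a made-up tweet. The sample text and the helper name `clean_tweet` are illustrative assumptions, not part of the dataset or the original notebook.

```python
import re

# Same pattern as the cleaning cell: links starting with http or www.,
# @mentions, and any character that is not an ASCII letter, digit or whitespace.
PATTERN = r"http\S+|www\.\S+|@\S+|[^a-zA-Z0-9\s]"


def clean_tweet(text: str) -> str:
    """Hypothetical helper: strip links, mentions, punctuation and emoji."""
    return re.sub(PATTERN, "", text)


# Made-up tweet, not taken from the dataset.
sample = "Loved #QueenCharlotte!!! 😭 @netflix https://t.co/abc123 don't miss it"
print(clean_tweet(sample).split())  # ['Loved', 'QueenCharlotte', 'dont', 'miss', 'it']
```

The removed characters leave runs of extra spaces behind, and apostrophes are removed too, so `don't` becomes `dont`.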

In [5]:
bridgerton.head()

|   | Text | User | Description | Timestamp | Location | Likes | Hashtags |
|---|------|------|-------------|-----------|----------|-------|----------|
| 0 | I feel like i am like George Kate and Penelope... | clarissejh | 🌍🌏 | 2023-05-12 11:25:51+00:00 |  | 0 | QueenCharlotte |
| 1 | Ah Thank you mama Love na love na love na love... | itsrayvriel | $100 for profile inspection | 2023-05-12 11:25:39+00:00 |  | 0 | QueenCharlotte |
| 2 | Bets actingchemistry pair on the show so far O... | guiss89 | She/Her (fan account) | 2023-05-12 11:24:48+00:00 |  | 0 | QueenCharlotte |
| 3 | Join me tonight on my LIVE at instagram two ... | duewafrazier1 | Writer, professor, @TEDx speaker & digital cre... | 2023-05-12 11:24:37+00:00 | United States | 0 | instagram, Tupac, docuseries, podcasts, DuEwa,... |
| 4 | dont know if today I should binge QueenCharlot... | mary__lou_ | i get addicted easily\n•\n\n\n\n\n\n\n\n\n\n\n... | 2023-05-12 11:23:47+00:00 |  | 0 | QueenCharlotte, TheGreat |


In [6]:
print("Data types of columns present: ")
print(bridgerton.dtypes)

Data types of columns present: 
Text           object
User           object
Description    object
Timestamp      object
Location       object
Likes           int64
Hashtags       object
dtype: object


In [7]:
bridgerton.shape

(47201, 7)

Because this project is about where the audience is, we remove the rows that have a null value in the **Location** column.

In [8]:
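# keep only tweets with a known location; .copy() gives an independent DataFrame,
# so the column assignments below won't trigger a SettingWithCopyWarning
bridgerton = bridgerton.dropna(subset=['Location']).copy()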

In [9]:
bridgerton.shape

(33129, 7)
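The row count falls from 47,201 to 33,129, so 14,072 tweets had no location. `dropna(subset=...)` only looks at the listed columns, so a missing value elsewhere does not remove a row. The toy frame below (made-up values, assumed only for illustration) shows this.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values that mimics three of the tweet columns.
df = pd.DataFrame({
    "Text": ["tweet a", "tweet b", "tweet c"],
    "Location": ["London, England", np.nan, "Lagos"],
    "Description": [np.nan, "fan account", "writer"],
})

# Only the row whose Location is missing (index 1) is removed; the NaN in
# Description (index 0) is ignored because that column is not in `subset`.
located = df.dropna(subset=["Location"])
print(located.index.tolist())  # [0, 2]
```

This is why **Description** can still report nulls later on even though every remaining row has a location.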

In [10]:
bridgerton.head()

|    | Text | User | Description | Timestamp | Location | Likes | Hashtags |
|----|------|------|-------------|-----------|----------|-------|----------|
| 3  | Join me tonight on my LIVE at instagram two ... | duewafrazier1 | Writer, professor, @TEDx speaker & digital cre... | 2023-05-12 11:24:37+00:00 | United States | 0 | instagram, Tupac, docuseries, podcasts, DuEwa,... |
| 6  | That scene where Charlotte says Venus is beaut... | _wildflowersx | phool. | 2023-05-12 11:21:49+00:00 | Karnataka, India | 1 | QueenCharlotte |
| 7  | Corey Mylcheerest at the Vogue and Netflix pre... | archivequeenc | Fan account for Queen Charlotte: A Bridgerton ... | 2023-05-12 11:21:45+00:00 | London, England | 11 | queencharlotte |
| 10 | QueenCharlotte Netflix \nking George what a sw... | SumaiyaNadeem2 | MAngaaa😇😍 | 2023-05-12 11:18:23+00:00 | Manchester | 0 | QueenCharlotte, Netflix |
| 11 | Some scenes in QueenCharlotte made me cry | Disparue_ | Je vais étendre mon alie et je vais apprendre ... | 2023-05-12 11:16:50+00:00 | In my head | 0 | QueenCharlotte |


Let's convert the **Timestamp** column to a datetime type and then remove its UTC timezone, so the column holds timezone-naive datetimes.

In [11]:
# first let's convert the column to a datetime column

bridgerton['Timestamp'] = pd.to_datetime(bridgerton['Timestamp'])

In [12]:
# convert the values to UTC and drop the timezone, leaving timezone-naive datetimes
bridgerton['Timestamp'] = bridgerton['Timestamp'].dt.tz_convert(None)
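A quick sketch of what the conversion does, using two timestamp strings in the same format as the raw **Timestamp** column. Parsing yields timezone-aware values, and `tz_convert(None)` converts them to UTC and removes the timezone. The wall-clock times are unchanged here because the values were already in UTC.

```python
import pandas as pd

# Two raw values in the same string format as the Timestamp column.
raw = pd.Series(["2023-05-12 11:25:51+00:00", "2023-05-12 11:24:37+00:00"])

aware = pd.to_datetime(raw)        # timezone-aware datetimes (offset +00:00)
naive = aware.dt.tz_convert(None)  # converted to UTC, timezone information removed

print(naive.dt.tz)    # None
print(naive.iloc[0])  # 2023-05-12 11:25:51
```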

In [13]:
bridgerton.dtypes

Text                   object
User                   object
Description            object
Timestamp      datetime64[ns]
Location               object
Likes                   int64
Hashtags               object
dtype: object

Let's check whether any other column contains null values.

In [14]:
bridgerton.isnull().sum()

Text              0
User              0
Description    1292
Timestamp         0
Location          0
Likes             0
Hashtags         64
dtype: int64

Let's drop the **Description** and **Hashtags** columns, as they won't be needed for this analysis.

In [15]:
bridgerton.drop(columns=['Description', 'Hashtags'], inplace=True)

Let's also drop the **User** column.

In [16]:
bridgerton.drop(columns=['User'], inplace=True)

In [17]:
print(bridgerton.shape)

(33129, 4)


In [18]:
bridgerton.isnull().sum()

Text         0
Timestamp    0
Location     0
Likes        0
dtype: int64

Let's rename the **Timestamp** column to **Datetime**.

In [19]:
bridgerton = bridgerton.rename(columns={'Timestamp': 'Datetime'})

Next, from the **Datetime** column, we extract the **Day of the Month** and the **Day of the Week**.

In [20]:
bridgerton['Day_of_Month'] = bridgerton['Datetime'].dt.day

In [21]:
bridgerton['Day_of_Week'] = bridgerton['Datetime'].dt.day_name()
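For reference, `.dt.day` returns the day of the month as an integer and `.dt.day_name()` returns the weekday name. `.dt.dayofweek` (Monday=0 through Sunday=6) is a numeric alternative that can be handy for sorting weekdays in calendar order. A minimal sketch on two example datetimes:

```python
import pandas as pd

# Two example datetimes: 12 May 2023 (a Friday) and 14 May 2023 (a Sunday).
dt = pd.Series(pd.to_datetime(["2023-05-12 11:25:51", "2023-05-14 09:00:00"]))

print(dt.dt.day.tolist())         # [12, 14]
print(dt.dt.day_name().tolist())  # ['Friday', 'Sunday']
print(dt.dt.dayofweek.tolist())   # [4, 6]
```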

In [22]:
bridgerton.head()

|    | Text | Datetime | Location | Likes | Day_of_Month | Day_of_Week |
|----|------|----------|----------|-------|--------------|-------------|
| 3  | Join me tonight on my LIVE at instagram two ... | 2023-05-12 11:24:37 | United States | 0 | 12 | Friday |
| 6  | That scene where Charlotte says Venus is beaut... | 2023-05-12 11:21:49 | Karnataka, India | 1 | 12 | Friday |
| 7  | Corey Mylcheerest at the Vogue and Netflix pre... | 2023-05-12 11:21:45 | London, England | 11 | 12 | Friday |
| 10 | QueenCharlotte Netflix \nking George what a sw... | 2023-05-12 11:18:23 | Manchester | 0 | 12 | Friday |
| 11 | Some scenes in QueenCharlotte made me cry | 2023-05-12 11:16:50 | In my head | 0 | 12 | Friday |


Our data is now clean.

-------------------------

Since we are performing sentiment analysis, we need to classify each tweet as positive, negative or neutral. To do so, we compute the polarity and subjectivity of each text with TextBlob.

For reference:

**Subjectivity** is a score in the range [0, 1] that quantifies how much personal opinion, as opposed to factual information, a text contains. A higher subjectivity means the text expresses more personal opinion; a score near 0 means it reads as more factual.

**Polarity** is a score in the range [-1, 1], where -1 indicates a negative sentiment, +1 a positive sentiment, and 0 a neutral sentiment.


In [23]:
def subjectivity(txt):
  return TextBlob(txt).sentiment.subjectivity

In [24]:
def polarity(txt):
  return TextBlob(txt).sentiment.polarity

In [25]:
bridgerton['Subjectivity'] = bridgerton['Text'].apply(subjectivity)
bridgerton['Polarity'] = bridgerton['Text'].apply(polarity)

In [26]:
bridgerton.head()

|    | Text | Datetime | Location | Likes | Day_of_Month | Day_of_Week | Subjectivity | Polarity |
|----|------|----------|----------|-------|--------------|-------------|--------------|----------|
| 3  | Join me tonight on my LIVE at instagram two ... | 2023-05-12 11:24:37 | United States | 0 | 12 | Friday | 0.5 | 0.318182 |
| 6  | That scene where Charlotte says Venus is beaut... | 2023-05-12 11:21:49 | Karnataka, India | 1 | 12 | Friday | 1.0 | 0.85 |
| 7  | Corey Mylcheerest at the Vogue and Netflix pre... | 2023-05-12 11:21:45 | London, England | 11 | 12 | Friday | 0.0 | 0.0 |
| 10 | QueenCharlotte Netflix \nking George what a sw... | 2023-05-12 11:18:23 | Manchester | 0 | 12 | Friday | 0.0 | 0.0 |
| 11 | Some scenes in QueenCharlotte made me cry | 2023-05-12 11:16:50 | In my head | 0 | 12 | Friday | 0.0 | 0.0 |


Now that we have the subjectivity and polarity scores, we define two functions that label each tweet. The first adds a **Sentiment** column that maps each polarity score to negative (below 0), neutral (exactly 0) or positive (above 0). The second adds an **Opinions** column that maps each subjectivity score to factual information (below 0.5), neutral opinion (exactly 0.5) or personal opinion (above 0.5).

In [27]:
import numpy as np

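def sentiment(pol):
    # polarity == 0 -> neutral, > 0 -> positive, < 0 -> negative;
    # values matching no condition (e.g. NaN) get the default "unknown"
    conditions = [
        pol == 0.0,
        pol > 0,
        pol < 0
    ]
    choices = ["neutral", "positive", "negative"]
    return np.select(conditions, choices, default="unknown")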

bridgerton['Sentiment'] = sentiment(bridgerton.Polarity)

In [28]:
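def opinion(sub):
    # subjectivity == 0.5 -> neutral opinion, > 0.5 -> personal opinion,
    # < 0.5 -> factual information; NaN falls back to "unknown"
    conditions = [
        sub == 0.5,
        sub > 0.5,
        sub < 0.5
    ]
    choices = ["neutral opinion", "personal opinion", "factual information"]
    return np.select(conditions, choices, default="unknown")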

bridgerton['Opinions'] = opinion(bridgerton.Subjectivity)
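To check how the two functions treat boundary values, here is a self-contained sketch that repeats their `np.select` logic on plain NumPy arrays. The names `label_sentiment` and `label_opinion` are hypothetical stand-ins for `sentiment` and `opinion`. The scores come from the earlier `head()` output, plus a `NaN` to show that a value matching none of the conditions falls through to `default`.

```python
import numpy as np


def label_sentiment(pol):
    """Hypothetical stand-in for `sentiment`: polarity -> label."""
    pol = np.asarray(pol, dtype=float)
    conditions = [pol == 0.0, pol > 0, pol < 0]
    choices = ["neutral", "positive", "negative"]
    return np.select(conditions, choices, default="unknown")


def label_opinion(sub):
    """Hypothetical stand-in for `opinion`: subjectivity -> label."""
    sub = np.asarray(sub, dtype=float)
    conditions = [sub == 0.5, sub > 0.5, sub < 0.5]
    choices = ["neutral opinion", "personal opinion", "factual information"]
    return np.select(conditions, choices, default="unknown")


# Scores from the earlier head() output, plus NaN, which fails every comparison.
polarity = [0.318182, 0.85, 0.0, np.nan]
subjectivity = [0.5, 1.0, 0.0, np.nan]

print(label_sentiment(polarity).tolist())
# ['positive', 'positive', 'neutral', 'unknown']
print(label_opinion(subjectivity).tolist())
# ['neutral opinion', 'personal opinion', 'factual information', 'unknown']
```

The neutral buckets rely on exact equality (`== 0.0` and `== 0.5`), so only scores that are exactly 0 or exactly 0.5 land there.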

In [29]:
bridgerton.head()

|    | Text | Datetime | Location | Likes | Day_of_Month | Day_of_Week | Subjectivity | Polarity | Sentiment | Opinions |
|----|------|----------|----------|-------|--------------|-------------|--------------|----------|-----------|----------|
| 3  | Join me tonight on my LIVE at instagram two ... | 2023-05-12 11:24:37 | United States | 0 | 12 | Friday | 0.5 | 0.318182 | positive | neutral opinion |
| 6  | That scene where Charlotte says Venus is beaut... | 2023-05-12 11:21:49 | Karnataka, India | 1 | 12 | Friday | 1.0 | 0.85 | positive | personal opinion |
| 7  | Corey Mylcheerest at the Vogue and Netflix pre... | 2023-05-12 11:21:45 | London, England | 11 | 12 | Friday | 0.0 | 0.0 | neutral | factual information |
| 10 | QueenCharlotte Netflix \nking George what a sw... | 2023-05-12 11:18:23 | Manchester | 0 | 12 | Friday | 0.0 | 0.0 | neutral | factual information |
| 11 | Some scenes in QueenCharlotte made me cry | 2023-05-12 11:16:50 | In my head | 0 | 12 | Friday | 0.0 | 0.0 | neutral | factual information |


Finally, we are done preprocessing and cleaning the data for analysis.

Let's export the dataframe as both a CSV and an Excel file for further analysis.

In [30]:
bridgerton.to_csv('bridgerton.csv', index=False)

In [31]:
# writing .xlsx files needs an Excel writer engine such as openpyxl to be installed
bridgerton.to_excel('bridgerton.xlsx', index=False)
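One thing to keep in mind when reloading the CSV later: a CSV file stores **Datetime** as plain text, so `pd.read_csv` returns it as strings unless you ask for parsing with `parse_dates`. Here is a minimal round-trip sketch on a one-row made-up frame, using an in-memory buffer in place of `bridgerton.csv`:

```python
import io

import pandas as pd

# One-row made-up frame with a timezone-naive datetime column like Datetime.
df = pd.DataFrame({"Datetime": pd.to_datetime(["2023-05-12 11:25:51"]), "Likes": [0]})

buf = io.StringIO()
df.to_csv(buf, index=False)

buf.seek(0)
as_text = pd.read_csv(buf)                             # Datetime comes back as text
buf.seek(0)
as_dates = pd.read_csv(buf, parse_dates=["Datetime"])  # Datetime parsed as datetime64

print(pd.api.types.is_datetime64_any_dtype(as_text["Datetime"]))   # False
print(pd.api.types.is_datetime64_any_dtype(as_dates["Datetime"]))  # True
```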