### Project 4 - Elon Buys Twitter ###

DATA 620 Team "Lucky 7": Bonnie Cooper, George Cruz Deschamps, Rob Hodde

*Part 3: Topical Analysis By Impact Tier*

*A comparison of the top eight tweet topics related to Elon Musk's purchase of Twitter, stratified by Author Impact.*

<br>

In [105]:
#import necessary packages

import pandas as pd

import cleantext  
from emoji import demojize
import re
import nltk
from nltk.tokenize import word_tokenize
#nltk.download('wordnet')
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

from bertopic import BERTopic

import os
os.chdir('C:\\Data\\')

import pyodbc
sServer = 'localhost'
sDB = 'CUNY'
cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=" + sServer + ";"
                      "Database=" + sDB + ";"
                      "Trusted_Connection=yes;") 


In [106]:
# The next three functions are for cleaning up tweets:

# Changes text to lower case
# Removes:
#    numbers and punctuation 
#    extra spaces
#    stop words
# Translates emojis into phrases
def clean_text(x):
    x = demojize(x, language='alias') 
    x = re.sub(r"[:]+\ *", " ", x) #removes emoji colons and separates them with a space
    return cleantext.clean(x, extra_spaces=True, lowercase=True, numbers=True, punct=True, stopwords=True,
                     reg=r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", reg_replace=' ')
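
The colon-stripping step in `clean_text` can be tried in isolation. The sketch below uses only the standard library and a hand-written input in the `:alias:` form that `demojize` produces. The helper name `strip_emoji_colons` and the sample strings are hypothetical, chosen for illustration only.

```python
import re

def strip_emoji_colons(text):
    """Replace each run of colons (plus any spaces after it) with one space."""
    return re.sub(r"[:]+\ *", " ", text)

# Hypothetical input, written the way demojize renders emoji aliases.
demojized = "to the moon :rocket::rocket:"
tokens = strip_emoji_colons(demojized).split()
print(tokens)

# Side effect: colons that are not emoji delimiters are replaced as well.
print(strip_emoji_colons("offer at 10:30, $54.20 a share"))
```

Because the pattern matches every colon, times and URLs are split as well. That is probably harmless here, since the later cleaning step removes numbers and punctuation anyway.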

In [107]:
# Lemmatize a list of tokens (convert inflected forms to root words)
def lemmatize_word(text):
    lemmatizer = WordNetLemmatizer()
    lemma = [lemmatizer.lemmatize(word) for word in text]
    return lemma

In [109]:
# Rationalize the text: clean, tokenize and lemmatize, then rejoin the tokens into one string
def rationalize_text(txt):
    return (txt.apply(clean_text)
               .apply(word_tokenize)
               .apply(lemmatize_word)
               .apply(lambda tokens: ' '.join(tokens)))


In [113]:
# Build a BERTopic model for one impact tier and write two supporting CSV files:
# the raw tweets (filename) and the topic summary ('topic_info_' + filename).
# `where` is a SQL condition appended to b.LikeCount, e.g. " > 70000".
def make_model(filename, where):
    sSQL = """SELECT UserContent 
          FROM elonmusktwitter_tweets a 
          INNER JOIN tbl_Musk b ON a.UserName = b.UserName 
          WHERE a.UserLanguage = 'en' AND b.LikeCount """ + where
    df = pd.read_sql_query(sSQL, cnxn)
    df.columns = ['UserContent']
    df.to_csv(filename, encoding='utf-8')  # save the raw tweets before cleaning
    df['UserContent'] = rationalize_text(df['UserContent'])
    docs = df['UserContent'].tolist()
    model = BERTopic(verbose=True)
    topics, probabilities = model.fit_transform(docs)
    topic_info = model.get_topic_info()
    topic_info.to_csv('topic_info_' + filename, encoding='utf-8')
    return model
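
`make_model` builds its tier filter by string concatenation. As a hedged alternative, the same filter can be written with bound parameters. The sketch below is not the notebook's code: it uses an in-memory `sqlite3` database as a stand-in for SQL Server, the column types are assumed, every row is made-up toy data, and `fetch_tier` and `TIER_QUERY` are hypothetical names.

```python
import sqlite3

# In-memory stand-in for the SQL Server database; all rows are made-up toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE elonmusktwitter_tweets (UserName TEXT, UserLanguage TEXT, UserContent TEXT);
    CREATE TABLE tbl_Musk (UserName TEXT, LikeCount INTEGER);
    INSERT INTO tbl_Musk VALUES ('author_a', 90000), ('author_b', 5000), ('author_c', 250);
    INSERT INTO elonmusktwitter_tweets VALUES
        ('author_a', 'en', 'toy tweet one'),
        ('author_b', 'en', 'toy tweet two'),
        ('author_b', 'fr', 'toy tweet three'),
        ('author_c', 'en', 'toy tweet four');
""")

# Same join as make_model, with placeholders instead of concatenated text.
TIER_QUERY = """
    SELECT a.UserContent
    FROM elonmusktwitter_tweets a
    INNER JOIN tbl_Musk b ON a.UserName = b.UserName
    WHERE a.UserLanguage = ? AND b.LikeCount BETWEEN ? AND ?
"""

def fetch_tier(connection, low, high, language="en"):
    """Return tweet texts for authors whose LikeCount lies in [low, high]."""
    rows = connection.execute(TIER_QUERY, (language, low, high)).fetchall()
    return [row[0] for row in rows]

mid_tier = fetch_tier(conn, 1000, 70000)
low_tier = fetch_tier(conn, 100, 999)
print(mid_tier, low_tier)
```

`pyodbc` also uses `?` placeholders, and `pd.read_sql_query` accepts a `params` argument, so a similar query could in principle be passed through the existing connection. The open-ended top tier (`> 70000`) would need its own query or a sufficiently large upper bound.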
    

<br>

**Top Tier**

The top tier consists of the 100 authors with the most "Likes" in the dataset. The top ten, in descending order of tweet "Like Count", are:

1. catturd2
2. nypost
3. RubinReport
4. benshapiro
5. thebradfordfile
6. bennyjohnson
7. FoxNews
8. disclosetv
9. laurenboebert
10. RBReich

All of the above except RBReich are right-leaning influencers or news outlets. 

In [121]:
filename = "SourcesTop.csv"
modelTop = make_model(filename, " > 70000")

Batches:   0%|          | 0/84 [00:00<?, ?it/s]

2022-05-24 19:18:11,422 - BERTopic - Transformed documents to Embeddings
2022-05-24 19:18:18,466 - BERTopic - Reduced dimensionality
2022-05-24 19:18:18,588 - BERTopic - Clustered reduced embeddings


In [122]:
modelTop.visualize_barchart()

The Top Tier tweets focused on:

1. Motives for Musk's takeover of Twitter
2. Cost of Musk's takeover of Twitter
3. Board actions against a hostile Musk takeover
4. Free speech issue
5. Twitter employees' reaction to Musk's takeover
6. Tesla fans discussing Musk's takeover
7. Rumor that Musk would join Twitter's board
8. Musk securing financing to buy Twitter

<br>

**Mid-Tier**

The second group consists of authors who garnered between 1,000 and 70,000 Likes (Ranked 101 - 2779). 

The top ten in descending order of tweet "Like Count" in our dataset are:

1. ElectionWiz
2. scrowder
3. f_philippot
4. nytimes
5. pusholder
6. CleoEverest
7. burackbobby_
8. Mediavenir
9. JoshuaPotash
10. derekahunter

In this group, left-leaning voices begin to appear.

In [123]:
filename = "SourcesMid.csv"
modelMid = make_model(filename, " BETWEEN 1000 AND 70000 ")

Batches:   0%|          | 0/511 [00:00<?, ?it/s]

2022-05-24 19:22:20,416 - BERTopic - Transformed documents to Embeddings
2022-05-24 19:22:26,574 - BERTopic - Reduced dimensionality
2022-05-24 19:22:27,365 - BERTopic - Clustered reduced embeddings


In [124]:
modelMid.visualize_barchart()

The Mid-Tier tweets focused on:

1. Vanguard being the largest shareholder, and a passive one
2. The free speech issue
3. Musk improving Twitter
4. Leaving Twitter if Musk buys it
5. Possibility of Musk buying Twitter
6. Musk's reasons for buying Twitter
7. Musk making an offer to buy Twitter
8. Musk securing financing to buy Twitter

<br>

**Bottom Tier**

The third group consists of authors who garnered between 100 and 999 Likes (Ranked 2780 - 12884). These are primarily ordinary users. The top ten in this group are:

1. rhonda_harbison
2. TomRoyce
3. farokh
4. BlanikZ
5. Nguyen_anime3
6. trcfwtt
7. Z_LyhGomes
8. Cocochaneladair
9. roberthill91
10. JornalOGlobo
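
The three tier boundaries used in this analysis can be summarized as a small lookup. This is only an illustrative sketch: the function name `impact_tier` is hypothetical, and returning `None` for authors with fewer than 100 Likes is an assumption, since such authors fall outside the three tiers described here.

```python
def impact_tier(like_count):
    """Map an author's Like count to the tier labels used in this analysis."""
    if like_count > 70000:
        return "top"       # matches the " > 70000" filter
    if like_count >= 1000:
        return "mid"       # matches " BETWEEN 1000 AND 70000 "
    if like_count >= 100:
        return "bottom"    # matches " BETWEEN 100 AND 999 "
    return None            # assumption: below 100 Likes is not analyzed

# Toy values chosen to sit on the tier boundaries.
for likes in (70001, 70000, 1000, 999, 100, 99):
    print(likes, impact_tier(likes))
```

The boundary cases show that the tiers do not overlap: exactly 70,000 Likes falls in the middle tier, because the top-tier filter is a strict inequality.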


In [125]:
filename = "SourcesLow.csv"
modelLow = make_model(filename, " BETWEEN 100 AND 999 ")

Batches:   0%|          | 0/1379 [00:00<?, ?it/s]

2022-05-24 19:38:52,709 - BERTopic - Transformed documents to Embeddings
2022-05-24 19:39:09,144 - BERTopic - Reduced dimensionality
2022-05-24 19:39:12,423 - BERTopic - Clustered reduced embeddings


In [130]:
modelLow.visualize_barchart()

The Bottom Tier (least influential authors) tweets focused on:
    
1. Optimism about the future of Twitter following Musk's takeover
2. Free speech issue
3. Change coming to Twitter from Musk's takeover
4. Possibility that Musk will shut down Twitter's San Francisco headquarters
5. Sale price for Musk's takeover
6. Right-wing influencers discussing Musk's takeover
7. Crypto enthusiasts discussing Musk's takeover
8. Possibility that Musk will buy Twitter

**Conclusion**
    
The top-tier influencers seem most concerned with major power plays: Musk's political motivation, the Board's initial offer of a board seat to prevent a wholesale takeover, and Musk's ability to secure financing. This seems reasonable, given that these authors likely see themselves as power players entitled to weigh in on such matters.

The middle tier seems more passive, focused on uncertainty or doubt. Will Musk actually do this? Will employees leave Twitter if content moderation is scaled back? Can Vanguard block Musk from the takeover?

The bottom tier exhibits passive uncertainty similar to the middle tier's but appears more optimistic about the effects of the takeover.
