# Natural Language Processing - Public Opinion on Climate Change 

## 1. Sentiment Analysis with the BERTweet Model

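BERTweet is the first public large-scale language model pre-trained for English Tweets. It uses the same model configuration as BERT-base and is trained with the RoBERTa pre-training procedure. BERT is a pre-trained Transformer language model developed by Google that became a standard starting point for many Natural Language Processing (NLP) tasks. Note that the `vinai/bertweet-base` checkpoint is a general language model, not a sentiment classifier: its classification head must be fine-tuned on labeled data before its sentiment predictions are meaningful.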

### Import relevant libraries and data

In [16]:
import pandas as pd 
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import re
import torch

In [17]:
# Import data
df_original = pd.read_csv('../data/df_posts.csv')
# Keep only rows with a known year, then store the year as an integer
df = df_original[df_original['year'].notna()].copy()
df['year'] = df['year'].astype(int)
df.head()


Unnamed: 0,comment_id,score,self_text,subreddit,created_time,post_id,controversiality,ups,downs,post_score,post_self_text,post_title,post_upvote_ratio,post_thumbs_ups,post_total_awards_received,post_created_time,clean_text,clean_post_self_text,clean_title,year
0,lds45dg,1,At no point in the Milankovich cycle should th...,politics,2024-07-18 14:59:21,1e6bs6r,0.0,1.0,0,88,,123 House and Senate Republicans deny climate ...,0.93,88,0,2024-07-18 13:43:35,point milankovich cycle earth warm multiple de...,,123 house senate republican deny climate scien...,2024
1,lds42w7,1,"&gt; So, I have to ask: how can Americans crit...",changemyview,2024-07-18 14:58:59,1e6day6,0.0,1.0,0,0,"As an African, I've spent quite some time expl...",CMV: The USA has lost its moral high ground in...,0.22,0,0,2024-07-18 14:49:24,ask american criticize african leader politica...,african ive spent quite time exploring various...,cmv usa lost moral high ground criticizing afr...,2024
2,lds3yu1,1,Because they are paid handsomely by fossil fue...,energy,2024-07-18 14:58:22,1e5luu1,0.0,1.0,0,1531,Has anyone else watching the convention gotten...,Why does the RNC seem to think we don’t produc...,0.91,1531,0,2024-07-17 16:01:11,paid handsomely fossil fuel interest say thing,anyone else watching convention gotten impress...,rnc seem think dont produce oil gas u anymore,2024
3,lds3y2w,1,"Depends on your house, they want £4k for mine ...",unitedkingdom,2024-07-18 14:58:16,1e6c2xf,0.0,1.0,0,4,,Climate body CCC says cut electricity bills to...,0.83,4,0,2024-07-18 13:57:00,depends house want 4k mine grant modern house ...,,climate body ccc say cut electricity bill boos...,2024
4,lds3u4b,1,This is what happens when people face conseque...,climate,2024-07-18 14:57:39,1e5yxqk,0.0,1.0,0,527,,Texas residents endure days-long heat wave and...,0.99,527,0,2024-07-18 01:11:19,happens people face consequence blame wrong pe...,,texas resident endure dayslong heat wave power,2024


In [18]:
# Import keywords used to filter posts by climate-related topics
climate_key_words = pd.read_csv('../data/climate_key_words.csv')

key_words_list = climate_key_words.loc[0:2, "key_words"].apply(lambda x: str(x).strip().split(","))
key_words_list = pd.DataFrame(key_words_list)

# Explode the 'key_words' column and convert to a list
key_words_list = key_words_list['key_words'].explode()

# Clean up key words
key_words_list = key_words_list.str.strip().str.lower().tolist()

# Remove 'cop': it is also slang for police, so it produces false matches
key_words_list.remove('cop')
key_words_list

['affordable energy',
 'energy',
 'reliable energy',
 'modern energy',
 'access to energy',
 'electrification',
 'clean energy',
 'renewable energy',
 'energy efficiency',
 'renewables',
 'energy infrastructure',
 'fossil-fuel technology',
 'clean energy',
 'international cooperation on energy',
 'alternative energy',
 'energy resources',
 'solar energy',
 'photovoltaic',
 'photovoltaics',
 'electrification',
 'bioenergy',
 'biofuel',
 'biofuels',
 'biodiesel',
 'biogasoline',
 'carbon',
 'charcoal',
 'green energy',
 'biomass',
 'woodfuels',
 'sustainable energy',
 'sustainable energy investments',
 'energy developing countries',
 'energy land-locked countries',
 'energy least developed countries.',
 'safe housing',
 'affordable housing',
 'upgrade slums',
 'sustainable transport',
 'sustainable transportation',
 'public transport',
 'city air quality',
 'waste management',
 'sustainable cities and communities',
 'sustainable housing',
 'urbanization',
 'urban environmental impact',
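
The parsed list above contains repeated entries (for example 'clean energy' and 'electrification') and one entry with a trailing period. The following is a minimal sketch of an optional clean-up step, assuming the raw cells are comma-separated strings as in `climate_key_words.csv`. The helper name `parse_keywords` and the sample cells are hypothetical and not part of the original notebook.

```python
def parse_keywords(cells, exclude=()):
    """Split comma-separated cells into a flat, lower-cased, de-duplicated list."""
    seen = set()
    keywords = []
    for cell in cells:
        for raw in str(cell).split(","):
            # Normalize case and whitespace, and drop a stray trailing period
            keyword = raw.strip().lower().rstrip(".")
            if keyword and keyword not in seen and keyword not in exclude:
                seen.add(keyword)
                keywords.append(keyword)
    return keywords


# Hypothetical sample cells, for illustration only
sample_cells = [
    "Clean energy, electrification, COP",
    "clean energy, energy least developed countries.",
]
print(parse_keywords(sample_cells, exclude={"cop"}))
```

Passing the excluded words as a set also avoids the `ValueError` that `list.remove` raises when the word is not in the list.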
 

In [19]:
# Exclude comments from the 'conspiracy' subreddit
df = df[~(df.subreddit == 'conspiracy')]
df.head()

Unnamed: 0,comment_id,score,self_text,subreddit,created_time,post_id,controversiality,ups,downs,post_score,post_self_text,post_title,post_upvote_ratio,post_thumbs_ups,post_total_awards_received,post_created_time,clean_text,clean_post_self_text,clean_title,year
0,lds45dg,1,At no point in the Milankovich cycle should th...,politics,2024-07-18 14:59:21,1e6bs6r,0.0,1.0,0,88,,123 House and Senate Republicans deny climate ...,0.93,88,0,2024-07-18 13:43:35,point milankovich cycle earth warm multiple de...,,123 house senate republican deny climate scien...,2024
1,lds42w7,1,"&gt; So, I have to ask: how can Americans crit...",changemyview,2024-07-18 14:58:59,1e6day6,0.0,1.0,0,0,"As an African, I've spent quite some time expl...",CMV: The USA has lost its moral high ground in...,0.22,0,0,2024-07-18 14:49:24,ask american criticize african leader politica...,african ive spent quite time exploring various...,cmv usa lost moral high ground criticizing afr...,2024
2,lds3yu1,1,Because they are paid handsomely by fossil fue...,energy,2024-07-18 14:58:22,1e5luu1,0.0,1.0,0,1531,Has anyone else watching the convention gotten...,Why does the RNC seem to think we don’t produc...,0.91,1531,0,2024-07-17 16:01:11,paid handsomely fossil fuel interest say thing,anyone else watching convention gotten impress...,rnc seem think dont produce oil gas u anymore,2024
3,lds3y2w,1,"Depends on your house, they want £4k for mine ...",unitedkingdom,2024-07-18 14:58:16,1e6c2xf,0.0,1.0,0,4,,Climate body CCC says cut electricity bills to...,0.83,4,0,2024-07-18 13:57:00,depends house want 4k mine grant modern house ...,,climate body ccc say cut electricity bill boos...,2024
4,lds3u4b,1,This is what happens when people face conseque...,climate,2024-07-18 14:57:39,1e5yxqk,0.0,1.0,0,527,,Texas residents endure days-long heat wave and...,0.99,527,0,2024-07-18 01:11:19,happens people face consequence blame wrong pe...,,texas resident endure dayslong heat wave power,2024


### Sentiment Analysis for Post Titles

In [13]:
# Keep rows that have a post title
df_uniq_titles = df[df['post_title'].notna()]
# Drop duplicate titles: a post appears once per comment, so the same title repeats
df_uniq_titles = df_uniq_titles.drop_duplicates(subset=['post_title'])
df_uniq_titles.count()

comment_id                    16261
score                         16261
self_text                     16261
subreddit                     16261
created_time                  16261
post_id                       16261
controversiality              16261
ups                           16261
downs                         16261
post_score                    16261
post_self_text                 3987
post_title                    16261
post_upvote_ratio             16261
post_thumbs_ups               16261
post_total_awards_received    16261
post_created_time             16261
clean_text                    16080
clean_post_self_text           3840
clean_title                   16253
year                          16261
dtype: int64

In [20]:
# Keep only posts whose title mentions a climate-related keyword
pattern = '|'.join(re.escape(keyword) for keyword in key_words_list)

# Filter titles that contain any keyword (case-insensitive substring match)
matching_titles = df_uniq_titles[df_uniq_titles['post_title'].str.contains(pattern, case=False, na=False)]

# Count non-null values per column in the filtered DataFrame
print(matching_titles.count())

# Copy the matches into a new DataFrame so later column assignments do not touch df_uniq_titles
df_climate_titles = matching_titles.copy()
df_climate_titles.head()

comment_id                    6126
score                         6126
self_text                     6126
subreddit                     6126
created_time                  6126
post_id                       6126
controversiality              6126
ups                           6126
downs                         6126
post_score                    6126
post_self_text                1197
post_title                    6126
post_upvote_ratio             6126
post_thumbs_ups               6126
post_total_awards_received    6126
post_created_time             6126
clean_text                    6063
clean_post_self_text          1151
clean_title                   6125
year                          6126
dtype: int64


Unnamed: 0,comment_id,score,self_text,subreddit,created_time,post_id,controversiality,ups,downs,post_score,post_self_text,post_title,post_upvote_ratio,post_thumbs_ups,post_total_awards_received,post_created_time,clean_text,clean_post_self_text,clean_title,year
7,lds3oog,1,How much greenhouse gasses get released in war...,climate,2024-07-18 14:56:49,1e6d6k6,0.0,1.0,0,2,,Why the Era of China’s Soaring Carbon Emission...,1.0,2,0,2024-07-18 14:44:17,much greenhouse gas get released warfare,,era china soaring carbon emission might ending...,2024
14,lds32x7,1,"Is it. Is it really impossible, like Is there ...",Futurology,2024-07-18 14:53:29,1e68oum,0.0,1.0,0,183,,The world's largest renewable energy and trans...,0.95,183,0,2024-07-18 11:04:40,really impossible like physical way could like...,,world largest renewable energy transmission pr...,2024
42,lds09zc,1,It does not work like this.\n\nSolar deploymen...,energy,2024-07-18 14:37:54,1e5sntp,0.0,1.0,0,204,,How the Inflation Reduction Act is playing out...,0.98,204,0,2024-07-17 20:36:36,work like solar deployment want buy land lease...,,inflation reduction act playing one biased sta...,2024
55,ldrz4s5,1,&gt;And they’re on pace to improve on those mu...,energy,2024-07-18 14:31:28,1e5oxl5,0.0,1.0,0,139,,China is on track to reach its clean energy ta...,0.92,139,0,2024-07-17 18:04:43,theyre pace improve much faster country possib...,,china track reach clean energy target month si...,2024
89,ldrwiy1,1,China knows that if it becomes the leader in g...,energy,2024-07-18 14:16:33,1e666b5,0.0,1.0,0,22,,China to Boost Funding to Reduce Emissions at ...,0.92,22,0,2024-07-18 08:14:51,china know becomes leader green energy green p...,,china boost funding reduce emission coal power...,2024
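
The title filter matches keywords as substrings, so a short keyword such as 'cop' also matches inside unrelated words like 'scope'. A hedged alternative, not used in this notebook, is to anchor every keyword at word boundaries. The function name `build_keyword_pattern` and the sample titles below are hypothetical.

```python
import re


def build_keyword_pattern(keywords):
    """Compile a case-insensitive pattern that matches any keyword as a whole word or phrase."""
    # Longest keywords first so that 'clean energy' is tried before 'energy'
    ordered = sorted(set(keywords), key=len, reverse=True)
    alternatives = "|".join(re.escape(k) for k in ordered)
    # Non-capturing group wrapped in word boundaries
    return re.compile(r"\b(?:" + alternatives + r")\b", flags=re.IGNORECASE)


pattern = build_keyword_pattern(["cop", "energy", "clean energy"])

# Hypothetical titles, for illustration only
titles = [
    "The scope of the new housing bill",  # 'cop' appears only inside 'scope'
    "China is on track to reach its clean energy target",
    "COP summit ends without agreement",
]
matches = [bool(pattern.search(title)) for title in titles]
print(matches)
```

The pattern string (`pattern.pattern`) can be passed to `str.contains` with `case=False` in place of the plain alternation. Word boundaries only remove substring hits; a title that uses 'cop' as a standalone word for police would still match, so dropping the keyword remains a reasonable choice.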


In [21]:
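# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

# Load the pretrained model with a 3-class head (negative / neutral / positive).
# Note: "vinai/bertweet-base" has no sentiment head, so this head is randomly
# initialized and its predictions are meaningless until the model is fine-tuned.
model = AutoModelForSequenceClassification.from_pretrained("vinai/bertweet-base", num_labels=3)
model.eval()

# Assumed label order; it must match the order used during fine-tuning
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}

# Preprocess the data: the tokenizer expects a list of strings, so replace missing titles
texts = df_climate_titles['clean_title'].fillna('').astype(str).tolist()

# Fine-tune the model on your dataset (not shown here)

# Predict sentiment in batches to keep memory use bounded
batch_size = 64
sentiments, probabilities = [], []
with torch.no_grad():
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        inputs = tokenizer(batch, padding=True, truncation=True, max_length=128, return_tensors='pt')
        probs = torch.softmax(model(**inputs).logits, dim=1)

        # Results: most likely class and its probability for each title
        sentiments += [id2label[i] for i in probs.argmax(dim=1).tolist()]
        probabilities += probs.max(dim=1).values.tolist()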

emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


ImportError: 
AutoModelForSequenceClassification requires the PyTorch library but it was not found in your environment.
However, we were able to find a TensorFlow installation. TensorFlow classes begin
with "TF", but are otherwise identically named to our PyTorch classes. This
means that the TF equivalent of the class you tried to import would be "TFAutoModelForSequenceClassification".
If you want to use TensorFlow, please use TF classes instead!

If you really do want to use PyTorch please go to
https://pytorch.org/get-started/locally/ and follow the instructions that
match your environment.
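
The prediction step turns raw logits into a label and a probability. The sketch below shows that conversion in plain Python so it can be checked without loading a model or installing PyTorch. The three-class label order (`Negative`, `Neutral`, `Positive`) is an assumption made for illustration; a fine-tuned checkpoint defines its own order in `model.config.id2label`. The function names and the sample logits are hypothetical.

```python
import math

# Assumed label order, for illustration; check model.config.id2label on a real checkpoint
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)
    # Subtract the maximum before exponentiating for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one title
label, prob = classify([-1.2, 0.3, 2.1])
print(label, round(prob, 3))
```

Because `argmax` returns a class index (0, 1, 2, ...), the label has to come from a lookup table rather than from the sign of the index, and the probability has to come from the softmax output rather than from the index itself.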


In [None]:
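# Predictions were computed for df_climate_titles, so store them there (not on df)
df_climate_titles['sentiment'] = sentiments
df_climate_titles['sentiment_prob'] = probabilities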