# **ESG Sentiment Analysis**

#### This notebook applies sentiment analysis with four pretrained models to the [dataset](https://dagshub.com/Omdena/Voy-Finance/src/main/src/tasks/NLP-tasks-exploration/ESG_daily_news.csv) of daily news text that Farhad shared on DagsHub. The pretrained models are:

1. [DistilBERT](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
2. [FinBERT-ESG](https://huggingface.co/yiyanghkust/finbert-esg)
3. [ClimateBERT](https://huggingface.co/climatebert/distilroberta-base-climate-sentiment)
4. [BERT-ESG](https://huggingface.co/TrajanovRisto/bert-esg)

In [86]:
pip install transformers torch



## Initialise Pretrained Models

#### This section initialises all the sentiment analysis pretrained models mentioned above

In [87]:
import pandas as pd
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    BertForSequenceClassification,
    BertTokenizer,
    DistilBertForSequenceClassification,
    DistilBertTokenizer,
)

# DistilBERT model
distilbert = "distilbert-base-uncased-finetuned-sst-2-english"
distilbert_model_name = 'DistilBERT'
distilbert_model = DistilBertForSequenceClassification.from_pretrained(distilbert)
distilbert_tokenizer = DistilBertTokenizer.from_pretrained(distilbert)

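# FinBERT model. Note: the checkpoint loaded here is finbert-tone, which predicts
# Positive/Neutral/Negative; the 'FinBERT-ESG' name is only used for the output column.
finbert = 'yiyanghkust/finbert-tone'
finbert_model_name = 'FinBERT-ESG'
finbert_model = BertForSequenceClassification.from_pretrained(finbert, num_labels=3)
finbert_tokenizer = BertTokenizer.from_pretrained(finbert)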

# ClimateBERT model
climatebert = "climatebert/distilroberta-base-climate-sentiment"
climatebert_model_name = 'ClimateBERT'
climatebert_model = AutoModelForSequenceClassification.from_pretrained(climatebert)
# model_max_length is the current name of the older max_len argument
climatebert_tokenizer = AutoTokenizer.from_pretrained(climatebert, model_max_length=512)

# BERT-ESG model
bert_esg = "TrajanovRisto/bert-esg"
bert_esg_model_name = 'Bert-ESG'
bert_tokenizer = AutoTokenizer.from_pretrained(bert_esg)
bert_model = AutoModelForSequenceClassification.from_pretrained(bert_esg)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


## Upload dataset

#### Running this code prompts the user to upload a file, which is then read into a DataFrame. Rows that contain no news text are removed from the DataFrame in the code further below.

In [88]:
from google.colab import files

uploaded = files.upload()
esg_df = pd.read_csv(list(uploaded.keys())[0])

Saving ESG_daily_news.csv to ESG_daily_news (2).csv


In [89]:
esg_df.head()

Unnamed: 0,Date,headline,text
0,28/11/2022,Top-Ranked Hedge Fund Makes Contrarian Bet on ...,As most techology stocks reel from higher inte...
1,27/11/2022,Deutsche Bank’s DWS CEO Mulls New Legal Setup,DWS Group CEO Stefan Hoops is considering chan...
2,24/11/2022,"JPMorgan, Deutsche Bank Sued by Epstein Accusers",JPMorgan Chase & Co. and Deutsche Bank AG were...
3,23/11/2022,Tech Job Cuts Increase ‘Anxiety’ Across In...,"After years of exuberant growth and hiring, la..."
4,22/11/2022,"Amundi, DWS Reclassify Funds in Major Industry...",Amundi and Deutsche Bank’s DWS Group are dow...


In [90]:
esg_df.dropna(subset=['text'], inplace=True)

## Predicting the Sentiment

#### This section defines a function that calls a model to predict the sentiment of each news text in the dataset and stores the predictions in a new column. Because we are evaluating four models, four new columns are created in total.

In [91]:
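import torch

def predict_sentiments(model, tokenizer, model_name):
  """Predict a label for every row of esg_df['text'] and store it in a new column."""
  sentiments = []

  for text in esg_df['text']:
    # Truncate explicitly to 512 tokens, the input limit of these BERT-style models
    inputs = tokenizer(text, return_tensors='pt', truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, so no gradients are needed
      outputs = model(**inputs)
    logits = outputs.logits
    probabilities = logits.softmax(dim=1)
    predicted_class = torch.argmax(probabilities, dim=1).item()
    # Translate the class index into the model's own label string
    predicted_sentiment = model.config.id2label[predicted_class]

    sentiments.append(predicted_sentiment)

  column_name = f'{model_name}_Predicted_Sentiments'
  esg_df[column_name] = sentiments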
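
#### To make the decoding step in `predict_sentiments` concrete, the sketch below reproduces the softmax, argmax, and label lookup in plain Python. The logit values and the `example_id2label` dictionary are invented for illustration and are not output from any of the four models.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits, id2label):
    """Return the label with the highest probability, plus that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical three-class head; real models expose this mapping as model.config.id2label.
example_id2label = {0: "Neutral", 1: "Positive", 2: "Negative"}
example_logits = [0.2, 2.1, -1.3]  # invented values for illustration

label, confidence = decode(example_logits, example_id2label)
print(label, round(confidence, 3))
```

#### Because softmax preserves the ordering of the scores, taking the argmax of the logits directly selects the same class. The probabilities are only needed when a confidence score is wanted.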

In [92]:
predict_sentiments(distilbert_model, distilbert_tokenizer, distilbert_model_name)
predict_sentiments(finbert_model, finbert_tokenizer, finbert_model_name)
predict_sentiments(climatebert_model, climatebert_tokenizer, climatebert_model_name)
predict_sentiments(bert_model, bert_tokenizer, bert_esg_model_name)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


In [93]:
esg_df

Unnamed: 0,Date,headline,text,DistilBERT_Predicted_Sentiments,FinBERT-ESG_Predicted_Sentiments,ClimateBERT_Predicted_Sentiments,Bert-ESG_Predicted_Sentiments
0,28/11/2022,Top-Ranked Hedge Fund Makes Contrarian Bet on ...,As most techology stocks reel from higher inte...,NEGATIVE,Neutral,opportunity,Governance Negative
1,27/11/2022,Deutsche Bank’s DWS CEO Mulls New Legal Setup,DWS Group CEO Stefan Hoops is considering chan...,NEGATIVE,Neutral,neutral,Governance Neutral
2,24/11/2022,"JPMorgan, Deutsche Bank Sued by Epstein Accusers",JPMorgan Chase & Co. and Deutsche Bank AG were...,NEGATIVE,Neutral,risk,Governance Negative
3,23/11/2022,Tech Job Cuts Increase ‘Anxiety’ Across In...,"After years of exuberant growth and hiring, la...",NEGATIVE,Neutral,neutral,Governance Negative
4,22/11/2022,"Amundi, DWS Reclassify Funds in Major Industry...",Amundi and Deutsche Bank’s DWS Group are dow...,NEGATIVE,Neutral,risk,Governance Positive
...,...,...,...,...,...,...,...
340,28/10/2021,Citi Pitches $1 Billion Social Bond Amid Race ...,Citigroup Inc. is returning to the social bond...,NEGATIVE,Neutral,opportunity,Governance Positive
341,26/10/2021,Jet Fuel Surges in Price as Travel Restriction...,Jet fuel is back in a big way. The oil product...,NEGATIVE,Positive,risk,Environmental Negative
342,25/10/2021,Rich Nations Fail to Meet Climate Target Befor...,Rich countries have failed to meet their pledg...,NEGATIVE,Negative,neutral,Environmental Negative
343,24/10/2021,Negotiators Edge Closer to Global Carbon Marke...,Nations are edging toward a deal that might cr...,NEGATIVE,Neutral,neutral,Governance Negative


In [94]:
# Prediction of DistilBERT

print('Sentiments of DistilBERT:', list(esg_df['DistilBERT_Predicted_Sentiments'].unique()))
print('Number of rows with positive sentiment:', esg_df[esg_df['DistilBERT_Predicted_Sentiments']=='POSITIVE'].shape[0])
print('Number of rows with negative sentiment:', esg_df[esg_df['DistilBERT_Predicted_Sentiments']=='NEGATIVE'].shape[0])
print('')

# Prediction of FinBERT-ESG

print('Sentiments of FinBERT-ESG:', list(esg_df['FinBERT-ESG_Predicted_Sentiments'].unique()))
print('Number of rows with positive sentiment:', esg_df[esg_df['FinBERT-ESG_Predicted_Sentiments']=='Positive'].shape[0])
print('Number of rows with neutral sentiment:', esg_df[esg_df['FinBERT-ESG_Predicted_Sentiments']=='Neutral'].shape[0])
print('Number of rows with negative sentiment:', esg_df[esg_df['FinBERT-ESG_Predicted_Sentiments']=='Negative'].shape[0])
print('')

# Prediction of ClimateBERT

print('Sentiments of ClimateBERT:', list(esg_df['ClimateBERT_Predicted_Sentiments'].unique()))
print('Number of rows with opportunity sentiment:', esg_df[esg_df['ClimateBERT_Predicted_Sentiments']=='opportunity'].shape[0])
print('Number of rows with neutral sentiment:', esg_df[esg_df['ClimateBERT_Predicted_Sentiments']=='neutral'].shape[0])
print('Number of rows with risk sentiment:', esg_df[esg_df['ClimateBERT_Predicted_Sentiments']=='risk'].shape[0])
print('')

# Prediction of Bert-ESG

print('Sentiments of Bert-ESG:', list(esg_df['Bert-ESG_Predicted_Sentiments'].unique()))
print('Number of rows with Environment Positive sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Environmental Positive'].shape[0])
print('Number of rows with Environment Neutral sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Environmental Neutral'].shape[0])
print('Number of rows with Environment Negative sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Environmental Negative'].shape[0])
print('Number of rows with Social Positive sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Social Positive'].shape[0])
print('Number of rows with Social Neutral sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Social Neutral'].shape[0])
print('Number of rows with Social Negative sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Social Negative'].shape[0])
print('Number of rows with Governance Positive sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Governance Positive'].shape[0])
print('Number of rows with Governance Neutral sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Governance Neutral'].shape[0])
print('Number of rows with Governance Negative sentiment:', esg_df[esg_df['Bert-ESG_Predicted_Sentiments']=='Governance Negative'].shape[0])

Sentiments of DistilBERT: ['NEGATIVE', 'POSITIVE']
Number of rows with positive sentiment: 43
Number of rows with negative sentiment: 301

Sentiments of FinBERT-ESG: ['Neutral', 'Negative', 'Positive']
Number of rows with positive sentiment: 56
Number of rows with neutral sentiment: 170
Number of rows with negative sentiment: 118

Sentiments of ClimateBERT: ['opportunity', 'neutral', 'risk']
Number of rows with opportunity sentiment: 122
Number of rows with neutral sentiment: 88
Number of rows with risk sentiment: 134

Sentiments of Bert-ESG: ['Governance Negative', 'Governance Neutral', 'Governance Positive', 'Environmental Negative', 'Environmental Positive', 'Social Negative']
Number of rows with Environment Positive sentiment: 35
Number of rows with Environment Neutral sentiment: 0
Number of rows with Environment Negative sentiment: 27
Number of rows with Social Positive sentiment: 0
Number of rows with Social Neutral sentiment: 0
Number of rows with Social Negative sentiment: 2
Nu

#### The DataFrame and the label counts above show that the models use different output labels. BERT-ESG, for example, combines an E, S, or G category with a positive, neutral, or negative sentiment. The models also produce noticeably different predictions for the same text. To decide which model suits our project, we may need to evaluate the accuracy of each one. That process could be time consuming, because it means reading through all 344 rows of text and checking every model's prediction.
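
#### A minimal sketch of such an accuracy check is shown below. It assumes a set of manually assigned labels exists and that each model's predictions have already been expressed in the same label set. The `manual_labels` list and the `model_a` / `model_b` outputs are hypothetical and not taken from the dataset.

```python
def accuracy(predictions, gold):
    """Fraction of positions where the predicted label equals the manual label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(p == g for p, g in zip(predictions, gold))
    return matches / len(gold)

# Hypothetical manual labels for five articles (not taken from the dataset).
manual_labels = ["Negative", "Neutral", "Negative", "Positive", "Neutral"]

# Hypothetical model outputs, already mapped to a common label set.
model_outputs = {
    "model_a": ["Negative", "Neutral", "Negative", "Negative", "Neutral"],
    "model_b": ["Negative", "Negative", "Negative", "Positive", "Positive"],
}

scores = {name: accuracy(preds, manual_labels) for name, preds in model_outputs.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

#### With a DataFrame, the same comparison could be made between a prediction column and a column of manual labels.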

## Change the Output Labels (Only Positive, Neutral, and Negative)

#### Because the models use different label sets, the code below maps every label to 'Positive', 'Neutral', or 'Negative', treating 'opportunity' as positive and 'risk' as negative. This makes the results directly comparable, so we can count how many rows receive the same label from the models.

In [95]:
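def map_to_binary_sentiment(sentiment):
  """Map a model-specific label to 'Positive', 'Neutral', or 'Negative' (three classes, despite the name)."""
  sentiment_lower = sentiment.lower()
  if 'positive' in sentiment_lower or 'opportunity' in sentiment_lower:
    return 'Positive'
  elif 'neutral' in sentiment_lower:
    return 'Neutral'
  elif 'negative' in sentiment_lower or 'risk' in sentiment_lower:
    return 'Negative'
  return None  # the label matched none of the known keywords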
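
#### The same mapping can also be written as an explicit keyword table that raises an error on an unexpected label instead of returning `None`. This is an alternative sketch, not the notebook's method. The names `POLARITY_KEYWORDS` and `to_common_label` are illustrative, and the keywords are the ones that appear in the labels printed earlier.

```python
# Keyword -> common label, checked in order. The keywords come from the label
# sets printed by the four models; the table itself is an illustrative choice.
POLARITY_KEYWORDS = (
    ("positive", "Positive"),
    ("opportunity", "Positive"),
    ("neutral", "Neutral"),
    ("negative", "Negative"),
    ("risk", "Negative"),
)

def to_common_label(raw_label):
    """Map a model-specific label to Positive, Neutral, or Negative."""
    lowered = raw_label.lower()
    for keyword, common in POLARITY_KEYWORDS:
        if keyword in lowered:
            return common
    # Fail loudly so that a new, unmapped label cannot slip through unnoticed.
    raise ValueError(f"Unrecognised sentiment label: {raw_label!r}")

for raw in ["NEGATIVE", "opportunity", "risk", "Governance Neutral", "Environmental Positive"]:
    print(raw, "->", to_common_label(raw))
```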

In [96]:
esg_df_binary_sentiment = esg_df.copy()

esg_df_binary_sentiment['DistilBERT_Predicted_Sentiments'] = esg_df_binary_sentiment['DistilBERT_Predicted_Sentiments'].apply(map_to_binary_sentiment)
esg_df_binary_sentiment['FinBERT-ESG_Predicted_Sentiments'] = esg_df_binary_sentiment['FinBERT-ESG_Predicted_Sentiments'].apply(map_to_binary_sentiment)
esg_df_binary_sentiment['ClimateBERT_Predicted_Sentiments'] = esg_df_binary_sentiment['ClimateBERT_Predicted_Sentiments'].apply(map_to_binary_sentiment)
esg_df_binary_sentiment['Bert-ESG_Predicted_Sentiments'] = esg_df_binary_sentiment['Bert-ESG_Predicted_Sentiments'].apply(map_to_binary_sentiment)

In [97]:
esg_df_binary_sentiment

Unnamed: 0,Date,headline,text,DistilBERT_Predicted_Sentiments,FinBERT-ESG_Predicted_Sentiments,ClimateBERT_Predicted_Sentiments,Bert-ESG_Predicted_Sentiments
0,28/11/2022,Top-Ranked Hedge Fund Makes Contrarian Bet on ...,As most techology stocks reel from higher inte...,Negative,Neutral,Positive,Negative
1,27/11/2022,Deutsche Bank’s DWS CEO Mulls New Legal Setup,DWS Group CEO Stefan Hoops is considering chan...,Negative,Neutral,Neutral,Neutral
2,24/11/2022,"JPMorgan, Deutsche Bank Sued by Epstein Accusers",JPMorgan Chase & Co. and Deutsche Bank AG were...,Negative,Neutral,Negative,Negative
3,23/11/2022,Tech Job Cuts Increase ‘Anxiety’ Across In...,"After years of exuberant growth and hiring, la...",Negative,Neutral,Neutral,Negative
4,22/11/2022,"Amundi, DWS Reclassify Funds in Major Industry...",Amundi and Deutsche Bank’s DWS Group are dow...,Negative,Neutral,Negative,Positive
...,...,...,...,...,...,...,...
340,28/10/2021,Citi Pitches $1 Billion Social Bond Amid Race ...,Citigroup Inc. is returning to the social bond...,Negative,Neutral,Positive,Positive
341,26/10/2021,Jet Fuel Surges in Price as Travel Restriction...,Jet fuel is back in a big way. The oil product...,Negative,Positive,Negative,Negative
342,25/10/2021,Rich Nations Fail to Meet Climate Target Befor...,Rich countries have failed to meet their pledg...,Negative,Negative,Neutral,Negative
343,24/10/2021,Negotiators Edge Closer to Global Carbon Marke...,Nations are edging toward a deal that might cr...,Negative,Neutral,Neutral,Negative


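#### The results below show that only 101 of the 344 rows (13 positive, 8 neutral, 80 negative) receive the same label from the models. DistilBERT has no neutral class, so the neutral count compares only the other three models.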

In [98]:
models_predicted_sentiments = ['DistilBERT_Predicted_Sentiments', 'FinBERT-ESG_Predicted_Sentiments', 'ClimateBERT_Predicted_Sentiments', 'Bert-ESG_Predicted_Sentiments']
# DistilBERT has no neutral class, so it is left out of the neutral comparison
models_with_neutral_label = ['FinBERT-ESG_Predicted_Sentiments', 'ClimateBERT_Predicted_Sentiments', 'Bert-ESG_Predicted_Sentiments']

# A row is counted only if every listed model column carries the given label
all_positive_rows_count = esg_df_binary_sentiment[models_predicted_sentiments].eq('Positive').all(axis=1).sum()
all_neutral_rows_count = esg_df_binary_sentiment[models_with_neutral_label].eq('Neutral').all(axis=1).sum()
all_negative_rows_count = esg_df_binary_sentiment[models_predicted_sentiments].eq('Negative').all(axis=1).sum()
print('Number of rows where all models show Positive sentiment:', all_positive_rows_count)
print('Number of rows where all models show Neutral sentiment:', all_neutral_rows_count)
print('Number of rows where all models show Negative sentiment:', all_negative_rows_count)

Number of rows where all models show Positive sentiment: 13
Number of rows where all models show Neutral sentiment: 8
Number of rows where all models show Negative sentiment: 80
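
#### Full agreement across every model is a strict criterion. A complementary view is the share of rows on which each pair of models agrees, sketched below in plain Python. The function name `pairwise_agreement` and the toy labels are hypothetical, and the sketch assumes every model has a non-empty list of the same length.

```python
from itertools import combinations

def pairwise_agreement(predictions):
    """Return {(model_a, model_b): share of rows where both labels match}."""
    rates = {}
    for a, b in combinations(sorted(predictions), 2):
        pairs = list(zip(predictions[a], predictions[b]))
        rates[(a, b)] = sum(x == y for x, y in pairs) / len(pairs)
    return rates

# Hypothetical mapped labels for four rows; not taken from the notebook output.
toy_predictions = {
    "model_a": ["Negative", "Negative", "Positive", "Neutral"],
    "model_b": ["Negative", "Neutral", "Positive", "Neutral"],
    "model_c": ["Negative", "Negative", "Negative", "Neutral"],
}

agreement = pairwise_agreement(toy_predictions)
for (a, b), rate in agreement.items():
    print(f"{a} vs {b}: {rate:.2f}")
```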


#### Based on the results above, the pretrained models give different predictions for the same text. We may need to spend some time manually labelling the news text to see which model performs best at sentiment analysis for our project's needs. We could also create our own dataset, with every news text labelled, and use it to fine-tune a pretrained model.
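
#### One possible way to keep the manual labelling manageable is to start with the rows on which the models disagree. The sketch below selects those rows from a list of dictionaries. The helper name `rows_with_disagreement`, the headlines, and the labels are all invented for illustration. Estimating accuracy would still need a random sample, because disagreement rows alone are not representative.

```python
def rows_with_disagreement(rows, model_columns):
    """Return the rows whose model columns do not all carry the same label."""
    return [row for row in rows if len({row[col] for col in model_columns}) > 1]

# Hypothetical rows: headlines and labels invented for illustration.
toy_rows = [
    {"headline": "Example headline A", "model_a": "Negative", "model_b": "Negative"},
    {"headline": "Example headline B", "model_a": "Positive", "model_b": "Neutral"},
    {"headline": "Example headline C", "model_a": "Neutral", "model_b": "Negative"},
]

to_label = rows_with_disagreement(toy_rows, ["model_a", "model_b"])
for row in to_label:
    print(row["headline"])
```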