# Advanced NLP Techniques with Transformers for ESG Tasks

### **Goal**:
Explore and review the application of advanced Natural Language Processing (NLP) techniques in Environmental, Social, and Governance (ESG) domains using pre-trained models from the Hugging Face `transformers` library. These techniques can be employed to automate and enhance ESG-related tasks such as text classification, sentiment analysis, and summarization.

### **Introduction**:
The increasing demand for transparency and accountability in ESG practices has driven the need for advanced tools to analyze large volumes of ESG-related data. NLP models, especially those built on transformers, provide the ability to process and understand the complex narratives found in sustainability reports, regulatory filings, corporate disclosures, and news articles.

In this review, we will explore how transformer-based models can be adapted for ESG tasks, leveraging pre-trained models available in the Hugging Face library. These models help automate the analysis of textual data related to environmental policies, social impact, and corporate governance, offering insights that are critical for investors, regulatory bodies, and companies aiming to improve their sustainability practices.

### **Content**:

1. **Overview of ESG-Specific Models on Hugging Face**:
   Hugging Face offers a variety of transformer models that can be fine-tuned for ESG-specific tasks. Some models have been pre-trained on ESG data to enable faster deployment for real-world use cases. These models can handle tasks such as classifying ESG-related documents, analyzing sentiments around sustainability topics, and summarizing lengthy reports on environmental and social initiatives.

   - **ESG BERT**: A BERT-based model fine-tuned on ESG-related datasets to classify documents based on their environmental, social, and governance content.
   - **RoBERTa for ESG Sentiment Analysis**: A RoBERTa model trained to analyze sentiment in ESG reports and sustainability disclosures.
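   - **DistilBERT for ESG Summarization**: A lightweight model for condensing complex ESG-related documents so that users can quickly extract key information. Because DistilBERT is an encoder-only architecture, it is best suited to extractive approaches (selecting the most relevant sentences) rather than generating new summary text.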

2. **NLP Tasks for ESG Applications**:
   Below are the key tasks that can be performed using pre-trained transformer models in the ESG context:

   - **Text Classification**:
     ESG reports and sustainability disclosures often contain a mix of environmental, social, and governance information. Transformer models can automatically classify text into relevant ESG categories, helping to streamline the analysis of these documents.

     Example task: Categorizing a company's disclosure into 'Environmental,' 'Social,' or 'Governance' based on the content.

   - **Summarization**:
     ESG reports can be lengthy and detailed, making it difficult to extract key information. Summarization models built with transformers can condense these reports into concise summaries, highlighting the most important points related to sustainability initiatives.

     Example task: Summarizing a company's annual sustainability report to focus on key environmental achievements.

   - **Sentiment Analysis**:
     Understanding public perception of a company's ESG practices is crucial for assessing reputational risk and brand value. Sentiment analysis models help gauge the overall sentiment in news articles, social media, or investor reports related to a company's ESG performance.

     Example task: Analyzing public sentiment towards a company’s recent environmental policy changes by classifying the sentiment of related news articles.
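The classification task above can be sketched at document level as follows: split a report into sentences, classify each one, and report the share of sentences per ESG category. This is a minimal sketch, not a definitive implementation. The names `split_sentences`, `category_shares`, and `keyword_stub` are hypothetical, the sentence splitter is deliberately naive, and the keyword stub only stands in for a real model so that the example runs without downloading anything.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

# A classifier is any callable that maps a list of sentences to a list of
# {"label": ..., "score": ...} dicts, the shape a Hugging Face
# text-classification pipeline returns for a list input.
Classifier = Callable[[List[str]], List[Dict[str, object]]]


def split_sentences(report: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def category_shares(report: str, classify: Classifier) -> Dict[str, float]:
    """Classify each sentence and return the share of sentences per label."""
    sentences = split_sentences(report)
    if not sentences:
        return {}
    labels = [str(pred["label"]) for pred in classify(sentences)]
    counts = Counter(labels)
    return {label: n / len(sentences) for label, n in counts.items()}


def keyword_stub(sentences: List[str]) -> List[Dict[str, object]]:
    """Hypothetical stand-in for a real model (assumption, not a real classifier)."""
    predictions = []
    for sentence in sentences:
        lower = sentence.lower()
        if "emission" in lower:
            label = "Environmental"
        elif "board" in lower:
            label = "Governance"
        else:
            label = "Social"
        predictions.append({"label": label, "score": 1.0})
    return predictions


report = (
    "We cut emissions by a tenth. Our board met monthly. "
    "Staff training expanded. Emissions data is audited."
)
print(category_shares(report, keyword_stub))
```

A real text-classification pipeline could be passed in place of `keyword_stub`, since it accepts a list of strings and returns one prediction dict per input.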

3. **Practical Applications of ESG NLP Models**:
   Transformer models are already being applied in a variety of ESG use cases, such as:

   - **ESG Risk Assessment**:
     Using NLP to assess potential risks related to a company’s ESG practices. Models can analyze text from news outlets, regulatory bodies, and public statements to highlight risks related to environmental impact, social responsibility, or corporate governance issues.
     
   - **Regulatory Compliance**:
     Automating the process of ensuring that companies adhere to regulatory requirements in ESG reporting. Transformer models can be used to detect inconsistencies, omissions, or deviations in compliance reports.

   - **Investor Decision Support**:
     Helping investors make data-driven decisions by analyzing ESG disclosures, identifying trends, and providing insights into a company’s sustainability initiatives. NLP models can enhance the efficiency of ESG data analysis, providing real-time insights on how companies perform in key ESG areas.
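As an illustration of the risk-assessment use case, the sketch below flags passages that are both ESG-related and negative in sentiment. It is a hedged example under stated assumptions: `screen_for_risk` and the two stub classifiers are hypothetical names, the label strings `"negative"` and `"None"` are assumed and depend on the models actually used, and the stubs are keyword rules rather than real models.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]
# One text in, one top prediction out. A real pipeline returns a list for a
# single string, so it would be wrapped as: lambda text: pipe(text)[0]
Classifier = Callable[[str], Prediction]


def screen_for_risk(
    passages: List[str],
    topic_model: Classifier,
    sentiment_model: Classifier,
    negative_label: str = "negative",  # assumed label name
    ignore_topic: str = "None",        # assumed label for non-ESG text
) -> List[Dict[str, object]]:
    """Return passages that are on an ESG topic and carry negative sentiment."""
    flagged = []
    for text in passages:
        topic = topic_model(text)
        if topic["label"] == ignore_topic:
            continue  # not ESG-related, so skip the sentiment step
        sentiment = sentiment_model(text)
        if sentiment["label"] == negative_label:
            flagged.append(
                {
                    "text": text,
                    "topic": topic["label"],
                    "sentiment_score": sentiment["score"],
                }
            )
    return flagged


def stub_topic(text: str) -> Prediction:
    """Hypothetical keyword rule standing in for an ESG topic model."""
    label = "Environmental" if "spill" in text.lower() else "None"
    return {"label": label, "score": 0.9}


def stub_sentiment(text: str) -> Prediction:
    """Hypothetical keyword rule standing in for a sentiment model."""
    label = "negative" if "fined" in text.lower() else "positive"
    return {"label": label, "score": 0.8}


passages = [
    "The company was fined after an oil spill.",
    "The spill response plan won an industry award.",
    "The office was fined for a parking violation.",
]
flagged = screen_for_risk(passages, stub_topic, stub_sentiment)
print(flagged)
```

Running the topic model first means the sentiment model is only called on ESG-related passages, which reduces inference cost on large document sets.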

4. **Advantages of Using Transformer-Based Models for ESG**:
   - **Accuracy**: Transformer models like BERT, RoBERTa, and GPT-3 have set new benchmarks for NLP tasks due to their ability to understand the context of language better than traditional methods.
   - **Scalability**: Pre-trained models allow for rapid deployment across large volumes of ESG data, enabling efficient scaling for enterprise-level use cases.
   - **Fine-tuning Capabilities**: These models can be fine-tuned with ESG-specific datasets to improve performance on industry-relevant tasks, ensuring that they adapt well to the nuances of sustainability and corporate governance data.
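Fine-tuning on an ESG-specific dataset usually starts with mapping text labels to integer ids. The following is a minimal sketch of that preparation step only, not a full training loop. The label set, its order, the example sentences, and the helper names `build_label_maps` and `encode_labels` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Assumed label set and order; a real project would take these from its own
# annotation scheme.
LABELS = ["Environmental", "Social", "Governance", "None"]


def build_label_maps(labels: List[str]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Create the label-to-id and id-to-label mappings a classifier head needs."""
    label2id = {label: idx for idx, label in enumerate(labels)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


def encode_labels(
    examples: List[Tuple[str, str]], label2id: Dict[str, int]
) -> List[Dict[str, object]]:
    """Turn (text, label) pairs into records with integer labels."""
    encoded = []
    for text, label in examples:
        if label not in label2id:
            raise ValueError(f"Unknown label: {label!r}")
        encoded.append({"text": text, "label": label2id[label]})
    return encoded


label2id, id2label = build_label_maps(LABELS)

# Hypothetical training sentences, for illustration only.
train_examples = [
    ("We reduced water use at our plants.", "Environmental"),
    ("The audit committee met four times.", "Governance"),
]
encoded = encode_labels(train_examples, label2id)
print(encoded)
```

The two mappings can then be passed to `AutoModelForSequenceClassification.from_pretrained(...)` together with `num_labels=len(LABELS)` via the `id2label` and `label2id` arguments, so that the fine-tuned model reports readable label names instead of generic ids.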

### **Conclusion**:
Transformer-based NLP models provide a powerful toolset for tackling a wide range of ESG-related tasks. By leveraging pre-trained models from Hugging Face, professionals in sustainability, finance, and governance can automate and enhance the analysis of ESG data, making it easier to monitor compliance, assess risks, and drive informed decision-making in the realm of sustainability.

The remainder of this review walks through several ESG classification models in practice, highlighting the flexibility and utility of transformer models for ESG tasks. Whether for text classification, summarization, or sentiment analysis, these models play a crucial role in the evolution of ESG analysis.


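## Install and Import necessary modules

The cells below assume that the `transformers` library and a backend such as PyTorch are already installed, for example with `pip install transformers torch`.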

In [76]:
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
from transformers import AutoTokenizer, AutoModelForSequenceClassification

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

## Define sample text(s) 

Define sample texts to use across different models and tasks.

In [68]:
text1 = """
For 2002, our total net emissions were approximately 60 million metric tons of CO2 equivalents for all businesses and operations we have ﬁnancial interests in, based on its equity share in those businesses and operations.
"""

text2 = """
We use technology systems on our trucks to track driver behaviors, which has increased accountability among our managers and resulted in a reduction in speeding and safer fleet operations.
"""

text3 = """
Our Board is composed entirely of independent directors other than our chairman and CEO, and is diverse, with diversity reflecting gender, age, race, ethnicity, background, professional experience, and perspectives.
"""

## MODEL: FinBERT-esg-9-categories

ESG analysis can help investors determine a business' long-term sustainability and identify associated risks. **FinBERT-esg-9-categories** is a FinBERT model fine-tuned on about 14,000 manually annotated sentences from firms' ESG reports and annual reports.

**finbert-esg-9-categories** classifies a text into nine fine-grained ESG topics: Climate Change, Natural Capital, Pollution & Waste, Human Capital, Product Liability, Community Relations, Corporate Governance, Business Ethics & Values, and Non-ESG. This model complements **finbert-esg** which classifies a text into four coarse-grained ESG themes (E, S, G or None).

A detailed description of the nine fine-grained ESG topic definitions, examples for each topic, the training sample, and the model’s performance can be found [here](https://www.allenhuang.org/uploads/2/6/5/5/26555246/esg_9-class_descriptions.pdf).

- **Input**: A text.
- **Output**: Climate Change, Natural Capital, Pollution & Waste, Human Capital, Product Liability, Community Relations, Corporate Governance, Business Ethics & Values, or Non-ESG.
- HuggingFace description: [page link](https://huggingface.co/yiyanghkust/finbert-esg-9-categories).

In [33]:
# Load the nine-class model and its tokenizer, then wrap both in a classification pipeline.
model = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-esg-9-categories', num_labels=9)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-esg-9-categories')
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

In [35]:
results = nlp(text1)
print(results)

[{'label': 'Climate Change', 'score': 0.9955655932426453}]


In [37]:
results = nlp(text2)
print(results)

[{'label': 'Human Capital', 'score': 0.6967359185218811}]


In [39]:
results = nlp(text3)
print(results)

[{'label': 'Corporate Governance', 'score': 0.9947186708450317}]
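The scores above differ noticeably: `text2` is assigned to Human Capital with a score of about 0.70, while the other two texts score above 0.99. One way to act on this is to route low-scoring predictions to manual review. The sketch below is a minimal illustration that uses the outputs recorded above. The helper name `needs_review` and the 0.8 threshold are assumptions chosen for the example, not recommendations from the model authors.

```python
from typing import Dict


def needs_review(prediction: Dict[str, object], threshold: float = 0.8) -> bool:
    """Flag a prediction whose top score falls below the (assumed) threshold."""
    return float(prediction["score"]) < threshold


# Top predictions copied from the cell outputs above.
recorded = {
    "text1": {"label": "Climate Change", "score": 0.9955655932426453},
    "text2": {"label": "Human Capital", "score": 0.6967359185218811},
    "text3": {"label": "Corporate Governance", "score": 0.9947186708450317},
}

to_review = [name for name, pred in recorded.items() if needs_review(pred)]
print(to_review)
```

To see how the remaining probability is distributed across the other categories, the pipeline can be called with `top_k=None`, which should return a score for every class rather than only the top one.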


## MODEL: FinBERT-esg

ESG analysis can help investors determine a business' long-term sustainability and identify associated risks. FinBERT-ESG is a FinBERT model fine-tuned on 2,000 manually annotated sentences from firms' ESG reports and annual reports. 

- **Input**: A financial text.
- **Output**: Environmental, Social, Governance or None.
- HuggingFace description: [page link](https://huggingface.co/yiyanghkust/finbert-esg).

In [43]:
# Load the four-class model and its tokenizer, then wrap both in a classification pipeline.
model = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-esg', num_labels=4)
tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-esg')
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)

In [47]:
results = nlp(text1)
print(results)

[{'label': 'Environmental', 'score': 0.9887692928314209}]


In [49]:
results = nlp(text2)
print(results)

[{'label': 'Social', 'score': 0.9806464910507202}]


In [51]:
results = nlp(text3)
print(results)

[{'label': 'Governance', 'score': 0.7229688167572021}]
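Because the two FinBERT models describe the same texts at different granularity, their outputs can be cross-checked. The sketch below maps each fine-grained topic to a coarse theme and tests whether the two recorded predictions agree. The grouping in `FINE_TO_COARSE` is an assumption based on the order in which the nine topics are listed above (three environmental, three social, two governance, one non-ESG), and `is_consistent` is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Assumed grouping of the nine fine-grained topics into the four coarse themes.
FINE_TO_COARSE: Dict[str, str] = {
    "Climate Change": "Environmental",
    "Natural Capital": "Environmental",
    "Pollution & Waste": "Environmental",
    "Human Capital": "Social",
    "Product Liability": "Social",
    "Community Relations": "Social",
    "Corporate Governance": "Governance",
    "Business Ethics & Values": "Governance",
    "Non-ESG": "None",
}


def is_consistent(fine_label: str, coarse_label: str) -> bool:
    """True when the fine-grained topic falls under the predicted coarse theme."""
    return FINE_TO_COARSE.get(fine_label) == coarse_label


# (nine-class label, four-class label) pairs copied from the outputs above.
recorded_pairs: Dict[str, Tuple[str, str]] = {
    "text1": ("Climate Change", "Environmental"),
    "text2": ("Human Capital", "Social"),
    "text3": ("Corporate Governance", "Governance"),
}

agreement = {
    name: is_consistent(fine, coarse)
    for name, (fine, coarse) in recorded_pairs.items()
}
print(agreement)
```

Under this assumed mapping the two models agree on all three sample texts, even though their confidence differs (for example, 0.70 versus 0.98 on `text2`).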


## MODEL: EnvRoBERTa-environmental 

Based on [this paper](https://doi.org/10.1016/j.frl.2024.104979), **EnvRoBERTa-environmental** is a language model trained to better classify environmental texts in the ESG domain.

Using the **EnvRoBERTa-base** model as a starting point, **EnvRoBERTa-environmental** is additionally fine-tuned on a 2k environmental dataset to detect environmental text samples.

- **Input**: A text.
- **Output**: `environmental` or `none`.
- HuggingFace description: [page link](https://huggingface.co/ESGBERT/EnvRoBERTa-environmental).

Similar models for the Social and Governance domains are also available [here](https://huggingface.co/ESGBERT). Simply change the model and tokenizer names to one of the following:

- ESGBERT/EnvRoBERTa-environmental
- ESGBERT/SocRoBERTa-social
- ESGBERT/GovRoBERTa-governance

In [78]:
tokenizer_name = "ESGBERT/EnvRoBERTa-environmental"
model_name = "ESGBERT/EnvRoBERTa-environmental"

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, model_max_length=512)  # model_max_length is the current name for the older max_len argument
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer) # set device=0 to use GPU

In [80]:
results = nlp(text1)
print(results)

[{'label': 'environmental', 'score': 0.9925256967544556}]


In [82]:
results = nlp(text2)
print(results)

[{'label': 'none', 'score': 0.8960307836532593}]


In [84]:
results = nlp(text3)
print(results)

[{'label': 'none', 'score': 0.9951488375663757}]
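Since each ESGBERT model is a binary detector for one domain, running all three on the same text yields a multi-label tag set. The sketch below shows one way to combine their top predictions. It is an illustration under assumptions: `combine_binary_predictions` is a hypothetical helper, the 0.5 threshold is arbitrary, and only the environmental prediction for `text1` is taken from the output above. The social and governance entries are placeholder values, not real model outputs.

```python
from typing import Dict, List


def combine_binary_predictions(
    per_model: Dict[str, Dict[str, object]], threshold: float = 0.5
) -> List[str]:
    """Collect positive labels from binary detectors, dropping 'none' and low scores."""
    tags = []
    for prediction in per_model.values():
        is_positive = prediction["label"] != "none"
        if is_positive and float(prediction["score"]) >= threshold:
            tags.append(str(prediction["label"]))
    return sorted(tags)


text1_outputs = {
    # Recorded above for text1.
    "ESGBERT/EnvRoBERTa-environmental": {"label": "environmental", "score": 0.9925256967544556},
    # Hypothetical placeholders: these two models were not run in this review.
    "ESGBERT/SocRoBERTa-social": {"label": "none", "score": 0.98},
    "ESGBERT/GovRoBERTa-governance": {"label": "none", "score": 0.97},
}

print(combine_binary_predictions(text1_outputs))
```

With real models, each entry could be produced by building a pipeline per model name, for example `pipeline("text-classification", model=name, tokenizer=name)`, and taking the first element of its output for the text.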


## For further exploration

- **ESGify** on https://huggingface.co/ai-lab/ESGify
- **Financial-RoBERTa** on https://huggingface.co/soleimanian/financial-roberta-large-sentiment
- **AdaptationBERT** on https://huggingface.co/ClimateLouie/AdaptationBERT
- **distilBERT_ESG** on https://huggingface.co/descartes100/distilBERT_ESG